Auto-tools/Projects/OrangeFactor
Revision as of 16:42, 19 October 2010
Status
We have weekly meetings!
Wednesdays, 2PM PDT, 1-800-707-2533, PIN: 369, Conf# 304 (or) 650-903-0800 ext: 92, Conf# 304
Goals
Primary Goal
To develop a web dashboard that is useful for identifying and tracking the state of intermittent oranges in our tinderbox unit tests. This should help developers identify which oranges are most 'interesting', and should give people a notion of the overall state of oranges over time.
Secondary Goals
Since the dashboard's implementation will require storing tinderbox failures in a database, we could potentially reuse that database in the tinderbox+pushlog UI, allowing it to query a (fast) database rather than parsing buildbot logs as it sometimes does today.
History
Topfails
Topfails was the first database-driven orange tracker developed in our team. It shows failures in terms of overall occurrences. It suffers from a buggy log parser, and a UI with relatively few views.
source: http://hg.mozilla.org/automation/topfails/
Orange Factor
Orange Factor is a newer dashboard by jmaher. It calculates the average number of oranges per push (the 'orange factor'), and tracks that number over time. We're currently using it as a base to explore the usefulness of other statistics.
source: http://github.com/jmaher/Orange-Factor
Architecture
The system will have several moving parts:
- an HBASE database (bug 601028), hosted by the metrics team, which will store data parsed out of the buildbot logs and provide a queryable interface that returns test failure information in JSON format
- a flume agent, which will move logs from the build/test system to a storage area hosted by metrics
- a unittest logparser (bug 601216), that will parse buildbot test logs and produce output that gets fed into the HBASE db
- a web dashboard that pulls data from the database and displays various interesting statistics about it
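The query interface and its JSON schema are not yet defined (bug 601028), but the dashboard's consumption of it might look something like the sketch below. The field names (test, date, os, buildtype) and the sample payload are assumptions for illustration only:

```python
import json
from collections import Counter

# Hypothetical example of the JSON the HBASE query interface might return;
# the real endpoint and field names will be decided in bug 601028.
sample_response = """
[
  {"test": "test_foo.js", "date": "2010-10-18", "os": "win7",  "buildtype": "opt"},
  {"test": "test_foo.js", "date": "2010-10-18", "os": "linux", "buildtype": "debug"},
  {"test": "test_bar.js", "date": "2010-10-19", "os": "win7",  "buildtype": "opt"}
]
"""

def failures_per_test(records):
    """Aggregate failure records into a count per test name."""
    return Counter(r["test"] for r in records)

records = json.loads(sample_response)
counts = failures_per_test(records)
print(counts.most_common())
```

Whatever schema is settled on, keeping per-failure records tagged with OS, build type, and architecture would let the dashboard aggregate along any of those axes without re-parsing logs.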
Making Oranges Interesting
Currently, our intermittent oranges are not very interesting. After they've been identified, they are usually more-or-less ignored. This has caused us to accumulate oranges to the point where we have to deal with several of them for every commit (and by 'deal with', I mean 'log it and forget it'), which is time consuming for the sheriffs and for anyone who pushes a commit. At the same time, it demotivates any effort to actually fix them.
We'd like to help change that. We think we can help by creating a dashboard to analyze oranges in the following ways:
- identify the oranges that occur most frequently; these are the oranges that would produce the greatest improvement in our orange factor if fixed
- identify significant changes in the frequency of a given orange; if a known intermittent orange suddenly begins to occur more frequently, it may be related to a recent code change, and this might give developers more information about when/why it occurred, which would hopefully help in fixing it
- identify interesting patterns in failures; some failures may occur more frequently on certain OSes, build types, architectures, or other factors; by providing views which can track oranges across a range of factors, we might be able to provide developers with data that would help them reproduce failures or give them insight into their cause
- identify overall trends in orange occurrences, already part of Orange Factor; this can help track the 'orangeness' of a product over time, and can help measure the helpfulness of orange-fixing activities
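The first two analyses above reduce to simple arithmetic once failure and push counts are in the database. A minimal sketch, assuming per-day totals are available (the sample numbers and the 1.5x spike threshold are illustrative assumptions, not project decisions):

```python
def orange_factor(failures, pushes):
    """Average number of oranges per push for one day."""
    return failures / pushes if pushes else 0.0

# Hypothetical per-day data: (date, total failures, total pushes).
days = [
    ("2010-10-15", 12, 40),
    ("2010-10-16", 40, 50),
    ("2010-10-17", 14, 35),
]

factors = {date: orange_factor(f, p) for date, f, p in days}

# A day whose factor is well above the recent average may indicate a
# regression worth flagging to developers; 1.5x is a placeholder threshold.
baseline = sum(factors.values()) / len(factors)
spikes = [d for d, v in factors.items() if v > 1.5 * baseline]
```

Per-failure versions of the same calculation (failures of one test per push) would feed the "significant change in frequency" view directly.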
Dashboard Views
A list of dashboard views that may be interesting. We're currently using Orange Factor as a platform to experiment with views.
- [DONE] display of overall orange factor over time
- do we need to be able to display orange factor per OS, build type, etc.?
- [DONE] display of failures/day, for a given failure
- [DONE] display of failures/commit/day, for a given failure
- [ON TRACK] display of moving averages of the above
- which moving averages are most useful?
- display of failure frequencies which exceed certain limits (probably based on standard deviation)
- display of most common failures, in aggregate, and separated by various factors: platform, OS version, architecture, build type, etc
- other...?
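The moving-average and exceeds-limits views could be prototyped along these lines. This is a sketch only; the window size and the two-sigma threshold are placeholder choices, not decisions the project has made:

```python
import statistics

def moving_average(values, window=7):
    """Trailing moving average; uses shorter prefix windows at the start."""
    return [sum(values[max(0, i - window + 1): i + 1]) /
            (i - max(0, i - window + 1) + 1)
            for i in range(len(values))]

def exceeds_limit(values, num_sigma=2.0):
    """Indices of days whose failure count exceeds mean + num_sigma * stdev."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [i for i, v in enumerate(values) if v > mean + num_sigma * sigma]

# Hypothetical failures-per-day series with one obvious spike on day 6.
daily_failures = [3, 4, 2, 5, 3, 4, 20, 3, 4]
smoothed = moving_average(daily_failures, window=3)
flagged = exceeds_limit(daily_failures)
```

Experimenting in Orange Factor should tell us which window sizes and thresholds actually separate signal from noise on real tinderbox data.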