State of Performance Testing: November 2012

While the Signal from Noise project is not yet complete, there have been considerable improvements to Talos and the supporting infrastructure in the preceding year.

State of Talos: November 2012

The following areas have been improved in Talos as part of the SfN project:

Talos has been made more developer friendly
- It is now a standard python package with appropriate dependencies
- It is now easier to run: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code
- It provides `talos` and `PerfConfigurator` executables
- `talos` may be executed in a single step
- Standalone Talos has been deprecated
- The Pageloader extension is now versioned with Talos
Per recommendation of Lewchuk and Metrics ( https://wiki.mozilla.org/Metrics/Talos_Investigation#Proposed_Filtering_Changes ), page load tests no longer ignore the maximum value and instead ignore the first 5 values per page: https://bugzilla.mozilla.org/show_bug.cgi?id=710484 , see ignore_first:5 in http://k0s.org:8080/?show=active
Talos statistics reported to graphserver are now configurable via filters (see http://k0s.org/mozilla/blog/20120215124438 ). That said, we ultimately want to remove filters entirely once we no longer have to maintain graphserver. Talos shouldn't be doing statistics; this is a stop-gap measure (albeit a year-long one) until we switch to using Datazilla and turn off graphserver.
All raw Talos measurements are now reported to Datazilla: https://datazilla.mozilla.org/talos
Talos tp tests no longer touch the network: https://bugzilla.mozilla.org/show_bug.cgi?id=720852
PerfConfigurator has been made a robust YAML/JSON parser and generator
- run_tests.py now utilizes PerfConfigurator
- PerfConfigurator and remotePerfConfigurator have also been combined
- Separate PerfConfigurator step no longer required
Pageloader no longer calculates statistics: https://bugzilla.mozilla.org/show_bug.cgi?id=723571
Test definitions are no longer duplicated throughout the Talos codebase: They are now python files, see http://hg.mozilla.org/talos/file/tip/talos/ttest.py. This allows the test definitions to take advantage of inheritance instead of repeating code, and has made Talos far easier to configure, change, maintain, and expand. There is more we want to do here; this is merely a start. For more details see https://bugzilla.mozilla.org/show_bug.cgi?id=814228. (TODO, jmaher: we should expand on the usage of this; the audience is developers.)
Improved Talos testing with talos.json: Release engineering put forth considerable effort to allow Talos changes to be tested with try server. The results of this effort include having the URL of a talos.zip file listed in https://mxr.mozilla.org/mozilla-central/source/testing/talos/talos.json . This change alone probably saved hundreds of man-hours. (TODO, ctalbert: expand this so that we can detail a bit about how to actually go about making use of the functionality.)
Software has been written, including a web app component, that details the up-to-date names of Talos tests and suites in buildbot, TBPL, Talos, and graphserver: http://k0s.org/mozilla/hg/talosnames/ . A deployed instance is at http://k0s.org:8080/
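
As an illustration of why Python test definitions reduce duplication, here is a minimal sketch of inheritance-based definitions. The class names, attribute names, manifest path, and filter format below are hypothetical and are not taken from the Talos source; the point is only that a shared default lives in one base class and a concrete test overrides just what differs.

```python
class Test(object):
    """Hypothetical base class: defaults shared by every test."""
    cycles = 1
    filters = []

    @classmethod
    def name(cls):
        # The test name is simply the class name.
        return cls.__name__

    @classmethod
    def config(cls):
        """Merge public, non-callable class attributes, base classes first."""
        items = {}
        for klass in reversed(cls.__mro__):
            for key, value in vars(klass).items():
                if key.startswith('_'):
                    continue
                if callable(value) or isinstance(value, classmethod):
                    continue
                items[key] = value
        return items


class PageloaderTest(Test):
    """Hypothetical defaults shared by all page load tests."""
    tpcycles = 10
    filters = [['ignore_first', [5]], ['median', []]]


class tp_example(PageloaderTest):
    """A concrete test only states what differs from its parents."""
    tpmanifest = 'page_load_test/example.manifest'  # made-up path
    tpcycles = 25


if __name__ == '__main__':
    print(tp_example.name(), tp_example.config())
```

Under these assumptions, changing the default filters for every page load test is a one-line edit in `PageloaderTest` rather than an edit per test.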

Several contributors have also participated in Talos development. \o/ The scope of their contributions has ranged from good first bug fixes to over-arching rewrites of parts of the software. Thanks go out to all the folks who volunteered their time to help out here.

There are several remaining areas where the Talos software should be improved, such as:

Complete our work toward creating a single, central definition of what a Talos test is, rather than having it split between the various definitions: https://bugzilla.mozilla.org/show_bug.cgi?id=814228
Unification of Talos counters: https://bugzilla.mozilla.org/show_bug.cgi?id=812352

State of Datazilla: November 2012

Datazilla manages Talos data with three distinct database schemas: talos_objectstore_1, talos_perftest_1, and pushlog_hgmozilla_1. The objectstore contains a single table designed to store JSON objects. Each object contains a set of untreated replicate values for every page in a given Talos test suite. The objects are indexed in a separate schema called talos_perftest_1. In addition to indexing test data and reference data (product type, platform information, test suite and page names), the index also stores associated metrics data. This includes the results of Welch's one-sided t-test, the application of false discovery rate, and exponentially smoothed means and standard deviations. The application of metrics is treated generically in the schema, so any number of statistical treatments of the raw data can be supported in the future. The pushlog_hgmozilla_1 schema maintains an ordered list of pushes, which is used to compare consecutive pushes to one another. The raw JSON data generated by Talos in production is received asynchronously, and not necessarily in the order in which the pushes occurred in the repository. All of the database schemas can be found here: https://github.com/mozilla/datazilla/tree/master/datazilla/model/sql/template_schema .
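
To make the role of the ordered pushlog concrete, the following is a small sketch of how results that arrive out of order could be put back into push order and paired for comparison. It is not Datazilla's implementation: the function names and the in-memory data layout are assumptions made for illustration only.

```python
def order_by_pushlog(push_order, results):
    """Return results grouped by revision, in repository push order.

    push_order: list of revision ids, oldest push first (assumed layout).
    results: iterable of (revision, replicates) in arrival order.
    """
    by_revision = {}
    for revision, replicates in results:
        # Several submissions may arrive for the same revision.
        by_revision.setdefault(revision, []).extend(replicates)
    # Pushes with no data yet are simply skipped.
    return [(rev, by_revision[rev]) for rev in push_order if rev in by_revision]


def consecutive_pairs(ordered):
    """Pair each push with the one that follows it."""
    return list(zip(ordered, ordered[1:]))


if __name__ == '__main__':
    pushes = ['rev_a', 'rev_b', 'rev_c']
    arrived = [('rev_c', [412.0]), ('rev_a', [400.0]), ('rev_b', [405.0])]
    for before, after in consecutive_pairs(order_by_pushlog(pushes, arrived)):
        print(before[0], '->', after[0])
```

Each resulting pair is the input that a per-push statistical comparison would work on.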

The user interface for Datazilla was initially designed to drill down and examine the raw data associated with a Talos test. This was helpful in Q1-Q2 2012 in determining what needed to be done, but it does not address the issue of performance regression detection, which is most relevant to developers and sheriffs. A new user interface was designed and implemented in Q4 to display the results of the new metrics treatment.

Datazilla is now deployed in production. The source code can be found here: https://github.com/mozilla/datazilla
Datazilla utilizes production Talos data
RESTful API: http://datazilla.readthedocs.org/en/latest/webservice/
There is a python client, as used by Talos: https://github.com/mozilla/datazilla_client

State of Statistics: November 2012

Pageload tests ignore the first 5 data points: http://k0s.org:8080/?show=active
Pageload tests are run non-interleaved
We use more replicates per page
Datazilla utilizes improved statistical methodologies: Welch's t-test, false discovery rate (FDR) control, and exponential smoothing.
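
The effect of ignoring the first 5 data points can be sketched as follows. This is an illustration rather than the Talos filter code: the function names are hypothetical, and summarizing the remaining replicates with a median is an assumption made for the example.

```python
from statistics import median


def ignore_first(values, count=5):
    """Drop the first `count` replicates, which tend to include warm-up noise."""
    return values[count:]


def page_summary(replicates, count=5):
    """Summarize one page: discard the leading replicates, then take the median."""
    kept = ignore_first(replicates, count)
    if not kept:
        raise ValueError('need more than %d replicates' % count)
    return median(kept)


if __name__ == '__main__':
    # Made-up load times (ms) for one page; the early values are inflated.
    times = [900, 450, 420, 410, 405, 400, 402, 398, 401, 399]
    print(page_summary(times))
```

With the made-up values above, the inflated first load (900) never enters the summary, which is the reason for preferring this filter over dropping only the maximum.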

A datazilla-metrics repository, https://github.com/mozilla/datazilla-metrics , has been created. It is a python package that implements statistical methods useful for Datazilla.
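
For readers unfamiliar with the methods named above, here is a standard-library sketch of two of them: the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and the Benjamini-Hochberg procedure, which is one common way to control the false discovery rate. This is not the datazilla-metrics API; the function names are hypothetical, and it is an assumption that Benjamini-Hochberg is the FDR method in use. Converting the t statistic to a p-value needs the Student t distribution and is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two samples with unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError('each sample needs at least two values')
    se_a = variance(sample_a) / n_a  # squared standard error of mean a
    se_b = variance(sample_b) / n_b
    if se_a + se_b == 0:
        raise ValueError('both samples have zero variance')
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[index] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected


if __name__ == '__main__':
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
    print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

In a per-push setting, one t-test would be run per page and the resulting p-values passed together through the FDR step, so that testing many pages at once does not inflate the number of false alarms.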

Performance Testing Roadmap: 2013

A goal for 2013 is to tie up the loose ends for Talos, Datazilla, and Signal from Noise in general:

Switch primary performance UI to be Datazilla: bug 824813
Deprecate Graphserver: bug 824814
Turn regressions orange on TBPL: bug 824812

Conclusion

In the last year, we've dug into every part of the performance testing automation at Mozilla. We have analyzed the test harness, the reporting tools, and the statistical soundness of the results that were being generated. Over the course of that year, we used what we learned to make the Talos framework easier to maintain, easier to run, simpler to set up, easier to test on try, and less error prone. We have created Datazilla, an extensible system for storing and retrieving all our performance metrics from Talos and any future performance automation. We have rebooted our performance statistical analysis and created statistically viable, per-push regression/improvement detection. We have made all these systems easier to use and more open so that any contributor anywhere can take a look at our code and even experiment with new methods of statistical analysis on our performance data.

But we're not finished yet. There are more fixes to be done to the Talos framework itself. And the most critical piece of the infrastructure move still has to take place: we have to shift to using Datazilla in production and deprecate our use of Graphserver for new versions of Firefox. As we do that, we can clean out the remaining cruft in the Talos test framework and focus our efforts on new, groundbreaking performance automation.

Stay tuned. Or better yet, get involved: https://wiki.mozilla.org/Auto-tools#Want_to_Help.3F

