=== Problems We Aimed to Solve with Datazilla ===
* Preserve and capture raw performance numbers: the Talos test framework is a bad place to do statistics, because any averaging done before uploading the results destroys the ability to ever retrieve the original data. Instead, Datazilla should take in all raw values from Talos and provide a central platform for regression/improvement detection and statistical study.
* Reduce the granularity of Talos from a page set to a single page: http://k0s.org/mozilla/blog/20120425093346 ; statistics and regressions should be dealt with on a per-page basis, as pages may have wildly different performance values; see also https://wiki.mozilla.org/Metrics/Talos_Investigation#Unrolling_Talos .
* Statistics should be self-evident: Talos+Graphserver and other statistical systems have often been treated as a "black box": a number comes out that is "good" or "bad". This leaves an interested developer in the dark about where that number came from and discourages understanding the system and playing with the data. Datazilla was designed to expose the statistics being used so that there are no mysteries here.
* No requirement to update the database every time a test or machine changes: unlike the maintenance nightmare that is the current [http://hg.mozilla.org/graphs/file/tip/sql/data.sql data.sql] in Graphserver, the Datazilla schema should adapt dynamically to uploaded data.
* Allow experimentation with statistics: while in practice there will be a canonical method (or conceivably methods) for determining regressions and improvements, it should be possible to investigate and swap in alternatives. This can only be done by creating a system that stores all the raw data from the performance tests.
* Ability to utilize data from arbitrary performance suites, not just Talos: whatever we create next for performance analysis should be able to use Datazilla as a data storage and retrieval system. This way Datazilla becomes a building block for our next performance automation effort.
* Datazilla should scale to accumulate data per push and generate a "regression/improvement" analysis for that push in real time.
* The system should also provide a UI atop its own REST interfaces so that an interested developer can start on TBPL and drill into the results of a push. The developer should be able to drill all the way down to the raw replicate values for the page (i.e. each page is loaded some number of times, and you should be able to drill down to that level if you want to).
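To make the first few goals concrete, here is a minimal sketch of the kind of submission the bullets above describe: every raw replicate is kept, keyed per page rather than per page set, so statistics can be recomputed later. The field names and structure here are illustrative assumptions, not the actual Datazilla wire format.

```python
import json

# Hypothetical payload shape: all field names below are assumptions for
# illustration, not Datazilla's real schema.
payload = {
    "test_build": {"name": "Firefox", "revision": "abc123def456"},
    "test_machine": {"name": "talos-r3-w7-042", "os": "win", "platform": "x86_64"},
    "testrun": {"suite": "tp5", "date": 1343761200},
    # Per-page raw replicate values: no averaging happens on the client,
    # so the original data is never lost.
    "results": {
        "amazon.com": [512.0, 498.5, 505.2, 501.7, 499.9],
        "wikipedia.org": [298.1, 301.4, 297.8, 305.0, 299.6],
    },
}

# The JSON blob would then be POSTed to the collection service; because the
# schema is driven by the data itself, adding a new page or machine requires
# no database changes on the server side.
blob = json.dumps(payload)
```

Keeping the replicates per page (rather than one pre-averaged number per page set) is what makes the later bullets — per-page regression detection and drilling down to raw values in the UI — possible at all.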
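The "self-evident statistics" and "experimentation with statistics" goals can be sketched as a registry of swappable per-page regression detectors. This is an illustrative design under stated assumptions (the 5% threshold, the fixed t cutoff, and the function names are all hypothetical), not Datazilla's actual analysis code; a real Welch's test would also compare against the t distribution rather than a constant.

```python
import math
import statistics

def percent_change(baseline, new):
    """Flag a regression if the new mean is more than 5% worse (higher)
    than the baseline mean. The 5% threshold is an illustrative choice."""
    b = statistics.mean(baseline)
    return (statistics.mean(new) - b) / b > 0.05

def welch_t(baseline, new, cutoff=3.0):
    """Flag a regression if Welch's t statistic exceeds a fixed cutoff.
    Simplified sketch: a full test would consult the t distribution's CDF."""
    mb, mn = statistics.mean(baseline), statistics.mean(new)
    se = math.sqrt(statistics.variance(baseline) / len(baseline)
                   + statistics.variance(new) / len(new))
    return se > 0 and (mn - mb) / se > cutoff

# Because every detector shares one signature, alternatives are swappable:
# nothing in the system is a black box, and new methods can be plugged in
# without touching the stored raw data.
DETECTORS = {"percent_change": percent_change, "welch_t": welch_t}

def find_regressions(baseline_pages, new_pages, detector="percent_change"):
    """Compare raw replicate values page by page, per the per-page goal."""
    flag = DETECTORS[detector]
    return [page for page in new_pages
            if page in baseline_pages and flag(baseline_pages[page], new_pages[page])]
```

Storing the raw replicates is what makes this swappability work: each detector recomputes from the original values, so trying a new statistic never requires re-running the tests.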