Socorro:Rapid Betas
For Rapid Betas, we need to aggregate data by build date and provide "rolling windows" over it (e.g. "builds from the last 7 days") in top crashers and probably also in graphs. The same functionality has been requested for some time, especially for Nightly analysis, and bug 672606 has been open for a while with a proposed solution.
Current Situation & Proposal Overview
Current topcrash reports and graphs already provide rolling "last x days" views, but those are based on crash dates.
The proposal is to create a new, additional aggregated data set (matview) based on build days and to reuse the same UI and middleware code, adding only a switch that toggles between crash-day and build-day oriented views of the data.
UI and Middleware
As stated above, the goal is to keep the same UI and middleware code and only add a switch between crash-day and build-day oriented views of data to the topcrasher and graph UIs. The switch is handed down to the middleware, which selects the matview to use. No further changes to that code should be necessary.
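As a rough illustration of how small that middleware change could be, the sketch below maps a report type and a date basis to a matview name. The table names, the `date_basis` parameter and the `select_matview` helper are all hypothetical; none of them are taken from the existing Socorro code.

```python
# Hypothetical matview names, for illustration only; the real table
# names are not specified in this proposal.
MATVIEWS = {
    ("topcrashers", "crash"): "tcbs",
    ("topcrashers", "build"): "tcbs_build",
    ("graph", "crash"): "daily_crashes",
    ("graph", "build"): "daily_crashes_build",
}


def select_matview(report, date_basis="crash"):
    """Return the matview backing `report` for the requested date basis.

    `date_basis` is the switch passed down from the UI: "crash" for the
    existing crash-day views, "build" for the new build-day views.
    """
    try:
        return MATVIEWS[(report, date_basis)]
    except KeyError:
        raise ValueError(
            "unknown report/date basis: %r/%r" % (report, date_basis)
        ) from None
```

Because both matviews are meant to share the same structure, the rest of the query code would not need to know which one it is reading from.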
New Calculations
The meat of the needed work is new, parallel matviews and the stored procedures to create them. The new views would be structured to exactly match the crash-day-based ones (so that users can easily switch between them), except that their date column is derived from the build IDs stored in crashes instead of from crash days.
The other change is that for betas, reports will not be made for each beta separately, but for all builds of a version in beta. That is, instead of separate reports for "16.0b1", "16.0b2", "16.0b3", etc., there would be one category of "16.0 Beta", looking at all builds with a "16.0" version number and a "beta" channel, similar to how reports for the Nightly and Aurora channels are done now.
Mostly for performance reasons, a window of crash dates to include in the aggregation probably still needs to be provided; the proposal calls for 7 days starting with the build date.
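The grouping of betas into one category can be sketched as a small labelling function. This is only an illustration: the `report_category` name is hypothetical, and leaving non-beta versions unchanged is an assumption rather than a description of how Nightly and Aurora are labelled today.

```python
import re


def report_category(version, channel):
    """Collapse all betas of a version into a single "<version> Beta" label.

    Assumption: a beta number, if present, appears as a trailing "bN"
    suffix on the version string (e.g. "16.0b2").
    """
    if channel == "beta":
        base = re.sub(r"b\d+$", "", version)
        return "%s Beta" % base
    # Assumption: other channels keep their version string as-is.
    return version


# All betas of 16.0 fall into the same category.
labels = {report_category(v, "beta") for v in ("16.0b1", "16.0b2", "16.0")}
```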
Example
So, when the matview entries for "16.0 Beta" are generated on 2012-09-13, the value for the 2012-09-12 date is calculated as the number of crashes with a "16.0" version number, a "beta" channel and a build ID starting with "20120912", with crash submission/collection timestamps between 2012-09-12 00:00:00 UTC and 2012-09-19 00:00:00 UTC. Because of the crash date window, the values for the 6 days before that need to be regenerated as well. This means that every day, the values for 7 days of data need to be calculated/aggregated.
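The example above can be sketched in a few lines of Python. This is a minimal illustration rather than the stored procedure itself: the crash record fields (`version`, `channel`, `build_id`, `received`) and both function names are assumptions, and the crash window is treated as half-open (start inclusive, end exclusive).

```python
from datetime import date, datetime, timedelta, timezone

WINDOW_DAYS = 7  # crash-date window called for by the proposal


def build_day_count(crashes, version, channel, build_day):
    """Count crashes for one build day (YYYYMMDD string) inside its window."""
    start = datetime.strptime(build_day, "%Y%m%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=WINDOW_DAYS)
    return sum(
        1
        for c in crashes
        if c["version"] == version
        and c["channel"] == channel
        and c["build_id"].startswith(build_day)
        and start <= c["received"] < end
    )


def build_days_to_regenerate(run_date):
    """Build days whose values must be recalculated on `run_date`.

    These are the 7 days before the run date: yesterday plus the
    6 days before it.
    """
    return [
        (run_date - timedelta(days=n)).strftime("%Y%m%d")
        for n in range(WINDOW_DAYS, 0, -1)
    ]


# On 2012-09-13 the build days 20120906 through 20120912 are regenerated.
days = build_days_to_regenerate(date(2012, 9, 13))
```

A daily job would then call `build_day_count` once per entry in `days` for each version and channel being reported.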
Graph Data
What applies to topcrashers also applies to the daily data used in graphs: it needs a parallel DB structure based on build day instead of crash day. In this case, the same technique as for the crash matviews needs to be applied to ADUs as well, i.e. summing up the ADUs from multiple collection days for build IDs starting with a certain day string. This data is available in metrics, but the metrics data push needs to be changed to make it available to Socorro, the tables need to be updated to hold it, and the stored procedures for ADU matviews need to be updated as well. (There's some math in the bug explaining why summing up ADUs for multiple days is fine here, for those with doubts on a statistical basis.)
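The ADU side of this can be sketched the same way. The row layout (`collection_date`, `build_id`, `adu_count`) and the function name are assumptions, since the format of the changed metrics push is not defined here. Limiting the sum to 7 collection days is also an assumption, mirroring the crash window.

```python
from datetime import date, datetime, timedelta


def adu_for_build_day(adu_rows, build_day, window_days=7):
    """Sum ADUs over several collection days for one build day (YYYYMMDD)."""
    start = datetime.strptime(build_day, "%Y%m%d").date()
    end = start + timedelta(days=window_days)
    return sum(
        row["adu_count"]
        for row in adu_rows
        if row["build_id"].startswith(build_day)
        and start <= row["collection_date"] < end
    )


# Made-up rows for illustration: two collection days for the same build day.
sample_rows = [
    {"collection_date": date(2012, 9, 12), "build_id": "20120912030201", "adu_count": 1000},
    {"collection_date": date(2012, 9, 13), "build_id": "20120912030201", "adu_count": 4000},
    {"collection_date": date(2012, 9, 13), "build_id": "20120913030201", "adu_count": 900},
]
total = adu_for_build_day(sample_rows, "20120912")
```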
Synergy with Existing Data
Similar data, at least on a highly aggregated basis, might already be available from the bug 640238 work. Some adjustments are possible to avoid redundancy in that data (e.g. using a 10-day instead of a 7-day crash timestamp window, if that fits both report types better).
Comments and Open Issues
Middleware / UI
Josh
We will most likely need to do all of the following before the above is viable. The interfaces Kairo talks about contain some of our oldest (and hairiest) UI code in the application; they are extremely fragile and really can't be adapted to new functionality of any kind without refactoring. So:
- purge OldTCBS code
- refactor and clean up TCBS UI/mware code
- refactor and clean up home graph page UI/mware code
Laura
Navigation:
- Rework the main navigation: At the end of the beta period there may be up to 42 beta builds. How many of these should we show in the drop downs? Last 7? Or do we add a different type of navigation?
TCBS:
- To be clear, you want the following reports:
- Topcrashers per daily beta (buildid), generated each day, for the last N days (as it is at present)
- Topcrashers aggregated for the last seven days' worth of daily betas, generated each day, for the last N days
Graphs:
- To be clear, you want the following reports:
- One graph per daily beta (buildid), generated each day, for the last N days (as it is at present)
- Graph showing...I don't know what this one is meant to show. The last N betas/buildids aggregated together for crashiness and ADUs? Is that useful?
Crons
- Rework aggregate, daily, and ADU crons
Homepage:
- What do you want to appear on the graph here?
Data / Database
Josh
There are three different time windows for crash statistics, and I think the above spec confuses them.
- Receipt Window: a window of time during which a crash was received, e.g. crashes received by the collectors between 4/12/2012 and 4/19/2012, regardless of what builds they are from.
- Build Window: a span of time during which releases on a specific channel were built, e.g. crashes from builds between 20120412**** and 20120419**** on the beta channel, regardless of when they were installed or crashed.
- Build-Crash Window: a period between the build date of a given build and the datetime its crash was received, e.g. all crashes which were received within 7 days of the build date for builds 20120412***, 20120413*** and 20120415***.
The above specification seems to use these three windows interchangeably, when they all refer to different sets of crashes, and each window will require its own distinct matviews and UIs. Can we have some clarification on which particular window is meant in each case when expressing parameters like "7 day window"?
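To make the difference concrete, the three windows can be written as three separate filters. This is a sketch under assumptions: the crash record fields (`build_id`, `received`) and the function names are hypothetical, the build date is taken from the first 8 characters of the build ID, and all windows are treated as half-open.

```python
from datetime import datetime, timedelta, timezone


def build_date(build_id):
    """Midnight UTC of the day encoded in the first 8 characters of a build ID."""
    return datetime.strptime(build_id[:8], "%Y%m%d").replace(tzinfo=timezone.utc)


def in_receipt_window(crash, start, end):
    """Receipt Window: the crash was received in [start, end), any build."""
    return start <= crash["received"] < end


def in_build_window(crash, start, end):
    """Build Window: the build was made in [start, end), received any time."""
    return start <= build_date(crash["build_id"]) < end


def in_build_crash_window(crash, days=7):
    """Build-Crash Window: received within `days` days of the build date."""
    age = crash["received"] - build_date(crash["build_id"])
    return timedelta(0) <= age < timedelta(days=days)


# One crash, three different answers: built 2012-04-12, received 2012-04-25.
start = datetime(2012, 4, 12, tzinfo=timezone.utc)
end = datetime(2012, 4, 19, tzinfo=timezone.utc)
late_crash = {
    "build_id": "20120412030201",
    "received": datetime(2012, 4, 25, tzinfo=timezone.utc),
}
results = (
    in_receipt_window(late_crash, start, end),
    in_build_window(late_crash, start, end),
    in_build_crash_window(late_crash),
)
```

The sample crash falls inside the Build Window but outside both the Receipt Window and the 7-day Build-Crash Window, which is why a bare "7 day window" is ambiguous.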
TCBS, crash date view
"Mostly for performance reasons, a window of crash dates to include in the aggregation probably still needs to be provided; the proposal calls for 7 days starting with the build date."
Actually, there's no real performance benefit in this view; we can happily aggregate all data regardless of build date for betas until they go past sunset; we do that now. So my question is: what makes the most sense to the developers?
TCBS, build date view
At what point (Build-Crash Window) do we stop aggregating crash data for each build after it's released? This is the place, if any, where we should have a 7-day or 14-day window.
Once we can agree on a spec, I don't anticipate the database work involved being particularly arduous. We already have all the data we need to back the above.
Open Source
Dropping the beta numbers may break things for our external users. Or possibly not; our use of beta numbers wasn't very portable in the first place.
- I wasn't planning on dropping them, don't see how that's a requirement