|
|
(4 intermediate revisions by 2 users not shown) |
Line 3: |
Line 3: |
| [[Image:Breakpad.jpg|Pictogram of the breakpad server architecture]] | | [[Image:Breakpad.jpg|Pictogram of the breakpad server architecture]] |
|
| |
|
| == Breakpad Client == | | == Milestones == |
| * Platform integration for Windows, Mac OS X, Linux
| |
| * UI parity with Mozilla products
| |
| * Crash minidumps will be stored on the user's hard disk automatically. NOTE: they need to be deleted on "Clear Private Data".
| |
| * Opt-in control for users over whether to send data at all, plus confirmation before sending any individual blackbox. Users can also supply e-mail contact info, the last visited site, and comments about activity leading up to the crash.
| |
| * Collect and send data to Breakpad Server
| ** Product info (product, version, platform, build id, airbag version)
| |
| ** System information
| |
| *** os version
| |
| *** processor, processor speed
| |
| *** available and in use memory and diskspace
| |
| *** screen size, resolution, and any available info on the graphics system
| |
| *** process list for other programs that might be in use and conflicting
| |
| *** command-line startup options
| |
| *** extensions loaded (new)
| |
| *** we could also consider improvements like auto filling crash url with info out of history, or send a recent history list under user control.
| |
| ** Stability statistics (total runtime, time since last crash, crash frequency, etc.)
| |
| ** Minidumps automatically contain the list of DLLs in the process that crashed, which will give us plugin data
| |
| * Store incident queue/history
| |
| * Talkback also has the ability to receive config changes back from the server to control operations of the client. This is useful for shutting down the client for obsolete versions, slowing down the transmission retry rate, or other throttling mechanisms.
| |
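A minimal sketch of how a client might apply server-supplied configuration to its retry behaviour. The `ClientConfig` fields, the `next_retry_delay` helper, and the use of capped exponential backoff are all assumptions made for illustration; the page does not say how Talkback or Breakpad implement throttling.

```python
from dataclasses import dataclass


@dataclass
class ClientConfig:
    """Hypothetical server-supplied settings; field names are illustrative."""
    enabled: bool = True            # False models a "turn yourself off" message
    retry_base_seconds: int = 60    # delay before the first retry
    retry_max_seconds: int = 3600   # upper bound on any retry delay


def next_retry_delay(config, attempt):
    """Return seconds to wait before retry number `attempt` (0-based),
    or None if the server has disabled the client."""
    if not config.enabled:
        return None
    # Capped exponential backoff (an assumed policy): the delay doubles on
    # each attempt but never exceeds the configured maximum.
    return min(config.retry_base_seconds * (2 ** attempt), config.retry_max_seconds)
```

With this shape, the server could slow every client down by raising `retry_base_seconds`, or retire an obsolete version by sending `enabled=False`.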
|
| |
|
| == Breakpad Minidump Collector == | | A [http://spreadsheets.google.com/ccc?key=pz4NfkoyHy_bnEjHrg9seDA spreadsheet] has milestones and task information. The actual dates for the milestones are not yet finalized. |
|
| |
|
| 1) the collector accepts the crash reports (minidump + metadata) from clients
| | == Detailed Information == |
|
| |
| Requires write access to the minidump store
| |
| read/write access to the database
| |
|
| |
| 2) the symbol uploader takes symbol information from
| |
| 2a) the tinderboxes/build systems
| |
| 2b) uploaded symbol information from extension authors
| |
|
| |
| Requires write access to the symbol store
| |
| and read/write access to the database
| |
|
| |
|
| * Apache web server to manage incoming minidumps via HTTPS | | * [[Breakpad/Design/Client]] |
| * Pass minidump data through the firewall
| | * [[Breakpad/Design/Database]] |
| * Monitor queue and minidump status
| | * [[Breakpad/Design/Collector]] |
| * Check client version and state and be able to serve config changes to Breakpad Client (e.g. send message to disable client)
| | * [[Breakpad/Design/Processor]] |
| * Prep minidump for handoff to the processor | | * [[Breakpad/Design/Reporter]] |
| * If we provide the configuration control feature, the repeater also needs to push configuration info back down to the client for things like "turn yourself off", "slow down your retry rate", and other control options.
| | * [[Breakpad/Design/Bootstrap]] |
| | | * [[Breakpad/Design/Symbol Server]] |
| == Breakpad Symbols Store ==
| | * [[Breakpad/Design/Loadtesting]] |
| 5) symbol store maintenance
| |
|
| |
| Need to clean up old symbol information (e.g. nightly builds greater than
| |
| a week old, alphas/betas that are no longer relevant, etc.)
| |
|
| |
| Requires delete access to the symbol store
| |
| Read/write access to the database
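The cleanup rule described above could be sketched roughly as follows. The record keys (`build_id`, `channel`, `built_at`, `relevant`) and the `symbols_to_flush` helper are hypothetical; only the one-week threshold for nightlies and the "no longer relevant" test for alphas/betas come from the text.

```python
from datetime import datetime, timedelta


def symbols_to_flush(builds, now, nightly_max_age=timedelta(days=7)):
    """Return the build IDs whose symbols could be deleted.

    `builds` is an iterable of dicts with hypothetical keys:
    'build_id', 'channel' ('nightly', 'alpha', 'beta', 'release'),
    'built_at' (datetime) and 'relevant' (bool).
    """
    stale = []
    for build in builds:
        if build["channel"] == "nightly":
            # Nightly symbols expire once the build is older than the limit.
            if now - build["built_at"] > nightly_max_age:
                stale.append(build["build_id"])
        elif build["channel"] in ("alpha", "beta") and not build["relevant"]:
            # Alphas/betas are flushed only when flagged as no longer relevant.
            stale.append(build["build_id"])
    return stale
```

Release builds are never returned by this sketch, on the assumption that their symbols must stay available for as long as crash reports arrive.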
| |
| | |
| * Infrastructure should allow build systems to push symbols to the Symbols Store | |
| * Extension and plugin authors should also be able to upload PDB files for inclusion.
| |
| * Store symbols for each build/release based on product information (similar to AUS URLs). Key parameters should include:
| |
| ** Vendor/Project, Product, Product Version (?), Platform, OS Version (?), Build ID
| |
| ** Currently don't use Product or OS Versions with Talkback, but might want to consider doing it with Breakpad. | |
| * Provide a Windows Symbol Server impl with the symbol data
| |
| | |
| NOTE: the airbag processor does not need or use any product/version information to find symbols in the symbol store: all symbols are retrieved by unique image keys or checksums. However, in order to flush out old symbol data that is no longer relevant (especially from nightly builds), we will need to have a buildid/flushing strategy.
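To illustrate the point in the note, here is a small sketch of a lookup that depends only on a module's debug file name and unique identifier, with no product or version in the path. The directory layout and the `symbol_path` helper are assumptions for illustration; the page does not specify how the store is organised.

```python
from pathlib import PurePosixPath


def symbol_path(root, debug_file, debug_id):
    """Build a store path from the module's debug file name and unique ID only.

    Assumed layout: <root>/<debug_file>/<debug_id>/<name>.sym, where a
    trailing ".pdb" is dropped from <name>.
    """
    stem = debug_file[:-4] if debug_file.lower().endswith(".pdb") else debug_file
    return PurePosixPath(root) / debug_file / debug_id / (stem + ".sym")
```

Because nothing in such a path records which build produced the file, a flushing strategy would need a separate mapping from build ID to the identifiers it uploaded.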
| |
| | |
| == Breakpad Processor ==
| |
| 3) The processor takes the minidump and turns it into a stack
| |
|
| |
| Requires read/delete access to the minidump store
| |
| read/write access to the database
| |
|
| |
| this doesn't need to be (probably shouldn't be) a web app, but a
| |
| background daemon
| |
| | |
| * Grab minidumps from the collector | |
| * Process minidumps
| |
| ** Extract info from the minidump
| |
| ** Map stack trace to symbol info from the Symbols Store to decipher function names, file paths, and line numbers.
| |
| ** Store crash information to Breakpad Database
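The symbol-mapping step can be sketched as a sorted-address lookup. This is a simplified illustration, not the processor's actual algorithm: the `symbolize` helper and the `(start_address, name)` tuples are hypothetical stand-ins for data loaded from the Symbols Store, and function sizes are ignored.

```python
import bisect


def symbolize(frames, functions):
    """Map raw frame addresses to function names.

    `functions` is a list of (start_address, name) tuples sorted by address.
    Each frame is attributed to the last function starting at or before it;
    addresses below the first function are returned as hex strings.
    """
    starts = [start for start, _ in functions]
    names = []
    for address in frames:
        # Index of the last function whose start address is <= the frame address.
        index = bisect.bisect_right(starts, address) - 1
        names.append(functions[index][1] if index >= 0 else hex(address))
    return names
```

A fuller version would also check each function's size so that addresses falling in gaps between functions are left unresolved.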
| |
| | |
| == Breakpad Database ==
| |
| | |
| * Use MySQL, like everything else (we should revisit this, as it may NOT be the right choice --morgamic)
| |
| * Define schema that works well with current query/reporting needs
| |
| ** [need to dig up all common queries - jay] | |
| * The schema now accounts for the following kinds of information
| |
| ** individual minidump records
| |
| ** product/OS/buildid information
| |
| ** control information used to "register" an incoming minidump and monitor/track/control its progress through all the stages, from being transferred from the client until it is fully collected and processed on the server and available for reporting and analysis. | |
| * Using the same database and schema for the minidump data and for control over the processing of the minidumps has created problems in the past. When corruption is introduced, database performance problems crop up, or maintenance is required to manage the collection of minidumps, the processing of incoming minidumps is hindered and we start into a downward spiral of compounding problems. We should do some thinking to see if there are ways to improve on the existing design. | |
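One way to picture the control information is as a small state machine that tracks each minidump from receipt to processing. The stage names, the `ALLOWED` transition table, and the `advance` helper below are assumptions for illustration; the page does not define the actual stages.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages a minidump passes through on the server."""
    RECEIVED = "received"
    QUEUED = "queued"
    PROCESSING = "processing"
    PROCESSED = "processed"
    FAILED = "failed"


# Transitions permitted from each stage; a failed dump may be re-queued.
ALLOWED = {
    Stage.RECEIVED: {Stage.QUEUED},
    Stage.QUEUED: {Stage.PROCESSING},
    Stage.PROCESSING: {Stage.PROCESSED, Stage.FAILED},
    Stage.FAILED: {Stage.QUEUED},
    Stage.PROCESSED: set(),
}


def advance(current, target):
    """Return `target` if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping this state in a store separate from the crash data itself is one possible way to stop reporting-side problems from stalling intake, though whether that fits the existing design is an open question.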
| | |
| == Analysis and Reporting System(s) ==
| |
| 4) The query engine provides reporting and querying from the database
| |
|
| |
| Requires read access to the database
| |
| | |
| * figure out how much of http://talkback-public.mozilla.org/search/start.jsp we can leverage to cover the three main areas of analysis approaches
| |
| ** tools for looking at large volumes of data and quickly making sense of it.
| |
| *** reports like http://talkback-public.mozilla.org/reports/firefox/FF2001/FF2001-topcrashers.html
| |
| and http://talkback-public.mozilla.org/reports/firefox/FF2001/smart-analysis.all
| |
| ** the ability to find an individual report and put it into the context of a larger set of problems.
| |
| *** find blackbox(s) by incident id, comments, and configuration info, plus e-mail address and other potentially confidential info (for reporting system behind the firewall.)
| |
| ** the ability to quickly find and isolate when crash regressions appeared in nightly builds or between major releases.
| |
| ***
| |
| ** the ability to find common themes behind crash reports by looking at things like common comments, common configuration characteristics such as OS version, graphics systems, process lists, time since last crash, most frequently visited sites, etc...
| |
| *** http://talkback-public.mozilla.org/reports/firefox/FF2001/url-analysis-all.html
| |
| * Properly segregate "public" data (stacks) and "private" data (parameters, which may contain sensitive information)
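A minimal sketch of the public/private split, assuming a crash report is a flat dictionary. The field names in `PRIVATE_FIELDS` and the `split_report` helper are hypothetical; which fields actually count as sensitive is a policy decision the page leaves open.

```python
# Hypothetical names for fields that may contain sensitive information.
PRIVATE_FIELDS = {"email", "url", "comments", "parameters"}


def split_report(report):
    """Split a report dict into (public, private) parts by field name."""
    public = {key: value for key, value in report.items() if key not in PRIVATE_FIELDS}
    private = {key: value for key, value in report.items() if key in PRIVATE_FIELDS}
    return public, private
```

The public part could then feed the outward-facing reports, while the private part stays with the reporting system behind the firewall.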
| |