Release Management/WER Investigation
This page tracks the initial investigation of Windows Error Reporting (WER). Our current plan is to hook our bug tracking and metrics into WinQual, the site/API used to access Microsoft's crash info. Using this info will help diagnose hangs (which aren't currently tracked) on Windows XP and later, and may even uncover new crashes.
Meetings
- 2011-10-10 - Kickoff
Associated Bugs
- bug 429592 - discussion of what to do with process hangs
- bug 600275
- bug 693315 - Add annotation for source of crashes
Resources
- Short description on MSDN: WER: Getting Started
- Longer guide (more targeted at hw/drivers though): Developers Guide to WER
- Full WinQual Docs
- Microsoft's WER client: StackHash
- MSDN Webcast: Getting Started with Windows Error Reporting (video from 2006)
- Breakpad design docs - https://code.google.com/p/google-breakpad/w/list
Builds
Creating File Mappings
- MS suggests using their Microsoft Product Feedback Mapping Tool (AppMap.exe) to create a file manifest - required for crash collection for a new product version
- (?) Need to explore generating this file manifest so that the build machines can create it
- Or use the command-line interface during the build on a Windows machine, as outlined in the Microsoft Product Feedback Mapping Tool Readme
Uploading File Mappings
- MS suggests using the "Upload File Mappings option" on the Administration menu
- (?) Need to figure out if the WER web API can instead be used to perform an upload
WinQual Update Frequency
- (?) Determine if there are any limitations on the number of products that can be registered (for possible use with nightlies)
- Need to take into consideration the >2 day lag time on getting reports
- (?) Bandwidth limits for pulling down
- Only the first few cab files are stored for a crash. The event viewer web interface offers the ability to make a "Data Request" to collect "Processor & Memory Information", the heap, specific files (logs, etc.), and additional cab files
- (?) Can the web API be used to expose this to developers or will we need more accounts?
- (?) Need to find out what it means when no cab files are available and the web interface offers "(click here to switch to collection mode)" - aren't we always collecting?
Accessing WinQual Data
Windows Live Login
- (?) Access to WinQual requires a Windows Live login. Similarly, MS's StackHash client requires the Windows Live Sign-in Assistant. Need to determine if the associated library is required for logging in.
- (?) Need to figure out what account we'd use for automated tools
WER Web API
- API documentation: StackHash source download > 3rdparty > WinQual API > Data Services.docx
Breakpad/Socorro
- (?) Need to understand the overlap between WinQual crash data and Breakpad crash data, map the applicable info, and decide what to do with "additional" info
- (?) Can we make use of individual hit event info in general? (as opposed to just crash cabs)
- (?) Will hangs (heap dumps) need to be handled any differently than minidumps?
- (?) Who should be given access to minidumps?
- (?) What is our current data retention policy?
- We may want to keep hangs around for longer since there may be a lot, and they've never been investigated
- (?) What ability do we have to audit access?
Cab File Contents (for collector/processor)
- WERInternalMetadata.xml - (possibly) not present if version.txt is. Includes:
- OSVersionInformation - Windows version info, architecture, etc.
- ProblemSignatures - event type (crash/hang), crashing executable name, exe version/timestamp, methodDef token of faulting method (?), and IL offset of faulting instruction (?)
- DynamicSignatures -
- SystemInformation - HW info. What's an MID?
- AppCompat.txt (file name also seen in all lower case) - not present if WERDataCollectionFailure.txt is. Includes information on all images loaded by the process.
- WERDataCollectionFailure.txt - includes an error message if processing failed on Microsoft's side.
- version.txt - only came across this once. Only included OS version.
- For crashes
- ______.{m}dmp - minidump file. These types of dumps appear to be handled already, according to http://code.google.com/p/google-breakpad/wiki/ProcessorDesign
- For hangs
- <process-name>.xml - additional hang metadata like the wait chain list
- memory.hdmp - info about the difference between a heapdump and a minidump outlined here
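As a starting point for the collector/processor, the file list above could be used to sort cabs into crashes and hangs before any dump processing. The following is a minimal sketch, not an agreed design. The function names are hypothetical, checking for a hang before a crash is an assumption, and the metadata reader assumes the named sections are direct children of the XML root, which has not been verified against real cabs.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Dump file patterns taken from the cab contents list above.
CRASH_PATTERNS = ("*.mdmp", "*.dmp")
HANG_PATTERNS = ("*.hdmp",)

# Section names listed above for WERInternalMetadata.xml.
METADATA_SECTIONS = (
    "OSVersionInformation",
    "ProblemSignatures",
    "DynamicSignatures",
    "SystemInformation",
)


def classify_cab(file_names):
    """Return 'hang', 'crash', or 'unknown' based on the files in a cab."""
    # Compare in lower case, since AppCompat.txt was seen in both casings.
    names = [name.lower() for name in file_names]

    def any_match(patterns):
        return any(
            fnmatch.fnmatchcase(name, pattern)
            for name in names
            for pattern in patterns
        )

    # Assumption: a heap dump marks the cab as a hang even if other files exist.
    if any_match(HANG_PATTERNS):
        return "hang"
    if any_match(CRASH_PATTERNS):
        return "crash"
    return "unknown"


def collection_failed(file_names):
    """True if the cab contains WERDataCollectionFailure.txt."""
    return any(n.lower() == "werdatacollectionfailure.txt" for n in file_names)


def read_metadata_sections(xml_data, sections=METADATA_SECTIONS):
    """Map each known section to a dict of its child element names and text.

    Assumes the sections are direct children of the root element.
    Sections missing from the document are left out of the result.
    """
    root = ET.fromstring(xml_data)
    result = {}
    for section in sections:
        node = root.find(section)
        if node is not None:
            result[section] = {
                child.tag: (child.text or "").strip() for child in node
            }
    return result
```

For example, classify_cab(["AppCompat.txt", "firefox.exe.mdmp"]) gives "crash", while a cab containing memory.hdmp is reported as a hang. Cabs that come back "unknown" (for instance, one with only version.txt) would need manual review until we know what else can show up.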
Future Investigations
- Consider providing a solution (link) when the user is presented with the Windows crash dialog
- Can even link to an exe (if part of the "Designed for Windows" logo program), which may be a good last-ditch option if even Firefox's safe mode fails (application files no longer pristine, need reinstall).