Release Management/WER Investigation
Revision as of 01:25, 24 October 2011
This page will track the initial investigation of Windows Error Reporting (WER). Our current plan is to hook our bug tracking and metrics into Winqual, the site/API used to access Microsoft's crash info. Using this info will help diagnose hangs (which aren't currently tracked) on Windows XP and later, and may even uncover new crashes.
Meetings
- 2011-10-10 - Kickoff
- 2011-10-19
Associated Bugs
- Meta bug - bug 695713
- bug 429592 - discussion of what to do with process hangs
- bug 600275
- bug 692859
- bug 693315 - Add annotation for source of crashes
- bug 696207 - Add AppMap.exe to the build process
Resources
- Short description on MSDN: WER: Getting Started
- Longer guide (more targeted at hw/drivers though): Developers Guide to WER
- Full WinQual Docs
- Microsoft's WER client: Stackhash
- MSDN Webcast: Getting Started with Windows Error Reporting (video from 2006)
- Breakpad design docs - https://code.google.com/p/google-breakpad/w/list
- FAQ - WER Services
Builds
Creating File Mappings
- MS suggests using their Microsoft Product Feedback Mapping Tool (AppMap.exe) to create a file manifest - required for crash collection for a new product version
- (?) Need to explore generating this file manifest so that the build machines can create it
- Or use the command-line interface during the build on a Windows machine, as outlined in the Microsoft Product Feedback Mapping Tool Readme
Uploading File Mappings
- MS suggests using the "Upload File Mappings option" on the Administration menu
- (?) Need to figure out if the WER web API can instead be used to perform an upload
WinQual Update Frequency
- "By default we collect 10 cab (minidump) files per event"
- Lag times
- "Once we receive cab files for an event you will generally be able to see these cabs within a few hours of us receiving them."
- "For newly detected crashes it can take more than 4 days to get the crashes processed and up on the site."
- (?) Determine if there are any limitations to the number of products that can be registered (for possible use with nightlies)
- (?) Bandwidth limits for pulling down
- Only the first few cab files are stored for a crash. The event viewer web interface offers the ability to make a "Data Request" to collect "Processor & Memory Information", the heap, specific files (logs, etc.), and additional cab files.
- (?) Can the web API be used to expose this to developers or will we need more accounts?
- (?) Need to find out what it means when no cab files are available and the web interface offers "(click here to switch to collection mode)" - aren't we always collecting?
Accessing WinQual Data
Windows Live Login
- (?) Access to WinQual requires a Windows Live login. Similarly, MS's StackHash client requires the Windows Live Sign-in Assistant. Need to determine if the associated library is required for logging in.
- (?) Need to figure out what account we'd use for automated tools
WER Web API
- API documentation: StackHash source download > 3rdparty > WinQual API > Data Services.docx
Breakpad/Socorro
- Crash dumps are stored in buckets
- "For crash events the bucketing parameters are Application Name, Application Version, Application Build Date, Module Name, Module Version, Module Build Date, Exception Code, and Code Offset"
- (?) Need to understand the overlap between WinQual crash data and Breakpad crash data, map the applicable info, and decide what to do with "additional" info
- (?) Can we make use of individual hit event info in general? (as opposed to just crash cabs)
- (?) Will hangs (heap dumps) need to be handled any differently than minidumps?
- (?) Who should be given access to minidumps?
- (?) What is our current data retention policy?
- We may want to keep hangs around for longer since there may be a lot, and they've never been investigated
- (?) What is our access audit ability?
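As a rough illustration of the bucketing rule quoted above, the sketch below groups crash events by the eight listed parameters. The class name, field names, and the dict-shaped input are hypothetical and chosen only for this example; they are not taken from the WinQual API or from Socorro.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class CrashBucketKey(NamedTuple):
    """The eight crash bucketing parameters quoted from the WER docs.

    Field names are illustrative; WinQual may label them differently.
    """
    application_name: str
    application_version: str
    application_build_date: str
    module_name: str
    module_version: str
    module_build_date: str
    exception_code: str
    code_offset: str


def count_events_per_bucket(events: Iterable[Dict[str, str]]) -> Counter:
    """Count events per bucket.

    Each event is assumed to be a dict with exactly the eight keys
    named in CrashBucketKey (a hypothetical input shape).
    """
    return Counter(CrashBucketKey(**event) for event in events)
```

Two events land in the same bucket only when all eight fields match, so a change in either the application build or the faulting module build would start a new bucket under this model.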
Cab File Contents (for collector/processor)
- FAQ - WER Services - more info here under the question "What are the different types of memory dumps?"
- WERInternalMetadata.xml - (possibly) not present if version.txt is. Includes
- OSVersionInformation - windows version info, architecture, etc.
- ProblemSignatures - event type (crash/hang), crashing executable name, exe version/timestamp, methodDef token of faulting method (?), and IL offset of faulting instruction (?)
- DynamicSignatures -
- SystemInformation - HW info. What's an MID?
- AppCompat.txt (also seen in all lowercase) - not present if WERDataCollectionFailure.txt is. Includes information on all images loaded by the process.
- WERDataCollectionFailure.txt - includes an error message if processing failed at MS.
- version.txt - only came across this once. Only included OS version.
- For crashes
- ______.{m}dmp - minidump file. These types of dumps already appear to be handled, according to http://code.google.com/p/google-breakpad/wiki/ProcessorDesign
- For hangs
- <process-name>.xml - additional hang metadata like the wait chain list
- memory.hdmp - info about the difference between a heapdump and a minidump outlined here
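A collector/processor could start by sorting cab members into the categories listed above. The following is a minimal sketch based only on the file names observed so far; the function names and category labels are made up for this example, it assumes `{m}dmp` means either `.dmp` or `.mdmp`, and treating any other `.xml` file as hang metadata is a guess that may not hold for all cabs.

```python
from typing import Iterable


def classify_cab_member(filename: str) -> str:
    """Map a cab member's file name to a rough category label.

    Matching is case-insensitive because AppCompat.txt has been seen
    in all lowercase as well.
    """
    name = filename.lower()
    known = {
        "werinternalmetadata.xml": "metadata",
        "appcompat.txt": "loaded-images",
        "werdatacollectionfailure.txt": "collection-failure",
        "version.txt": "os-version-only",
        "memory.hdmp": "hang-heap-dump",
    }
    if name in known:
        return known[name]
    # Assumption: "{m}dmp" in the notes covers both .mdmp and .dmp.
    if name.endswith(".mdmp") or name.endswith(".dmp"):
        return "minidump"
    # Assumption: any remaining .xml is the <process-name>.xml hang file.
    if name.endswith(".xml"):
        return "hang-metadata"
    return "unknown"


def guess_event_kind(filenames: Iterable[str]) -> str:
    """Guess whether a cab came from a hang or a crash from its members."""
    kinds = {classify_cab_member(f) for f in filenames}
    if "hang-heap-dump" in kinds or "hang-metadata" in kinds:
        return "hang"
    if "minidump" in kinds:
        return "crash"
    return "unknown"
```

The event type recorded under ProblemSignatures in WERInternalMetadata.xml is likely a more reliable signal than file names, so this guess is best treated as a fallback for cabs where that file is missing.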
Future Investigations
- Consider providing a solution (link) when the user is presented with the Windows crash dialog
- Can even link to an exe (if part of the "Designed for Windows" logo program), which may be a good idea as a last-ditch effort if even Firefox's safe mode fails (application files no longer pristine, reinstall needed).