Release Management/WER Investigation
Latest revision as of 16:13, 24 October 2011
This page tracks the initial investigation of Windows Error Reporting (WER). Our current plan is to hook our bug tracking and metrics into WinQual, the site/API used to access Microsoft's crash info. Using this info will help diagnose hangs (which aren't currently tracked) on Windows XP and later, and may even uncover new crashes.
Meetings
- 2011-10-10 - Kickoff
- 2011-10-19
Associated Bugs
- Meta bug - bug 695713
- bug 429592 - discussion of what to do with process hangs
- bug 600275
- bug 692859
- bug 693315 - Add annotation for source of crashes
- bug 696207 - Add AppMap.exe to the build process
Resources
- Short description on MSDN: WER: Getting Started
- Longer guide (more targeted at hw/drivers though): Developers Guide to WER
- Full WinQual Docs
- Microsoft's WER client: StackHash
- MSDN Webcast: Getting Started with Windows Error Reporting (video from 2006)
- Breakpad design docs - https://code.google.com/p/google-breakpad/w/list
- FAQ - WER Services
- WinQual Event List Columns Explanation
Builds
Creating File Mappings
- MS suggests using their Microsoft Product Feedback Mapping Tool (AppMap.exe) to create a file manifest - required for crash collection for a new product version
- (?) Need to explore generating this file manifest so that the build machines can create it
- Or use the command line interface during the build on a Windows machine, as outlined at Microsoft Product Feedback Mapping Tool Readme
Uploading File Mappings
- MS suggests using the "Upload File Mappings option" on the Administration menu
- Can upload file mappings, but might be out of date: blog post
WinQual Update Frequency
- "By default we collect 10 cab (minidump) files per event"
- Lag times
  - "Once we receive cab files for an event you will generally be able to see these cabs within a few hours of us receiving them."
  - "For newly detected crashes it can take more than 4 days to get the crashes processed and up on the site."
- (?) Determine if there are any limitations to the number of products that can be registered (for possible use with nightlies)
- (?) Bandwidth limits for pulling down
- Only the first few cab files are stored for a crash. The event viewer web interface offers the ability to make a "Data Request" to collect "Processor & Memory Information", the heap, specific files (logs, etc.), additional cab files
- (?) Can the web API be used to expose this to developers or will we need more accounts?
- (?) Need to find out what it means when no cab files are available and the web interface offers "(click here to switch to collection mode)" - aren't we always collecting?
Accessing WinQual Data
Windows Live Login
- (?) Access to WinQual requires a Windows Live login. Similarly, MS's StackHash client requires the Windows Live Sign-in Assistant. Need to determine if the associated library is required for logging in.
- (?) Need to figure out what account we'd use for automated tools
WER Web API
- API documentation: StackHash source download > 3rdparty > WinQual API > Data Services.docx
Breakpad/Socorro
- Crash dumps are stored in buckets
  - "For crash events the bucketing parameters are Application Name, Application Version, Application Build Date, Module Name, Module Version, Module Build Date, Exception Code, and Code Offset"
- (?) Need to understand the overlap between WinQual crash data and Breakpad crash data, map the applicable info, and decide what to do with "additional" info
  - (?) Can we make use of individual hit event info in general? (as opposed to just crash cabs)
- (?) Will hangs (heap dumps) need to be handled any differently than minidumps?
  - Hang blog posts: part 3 and part 4
  - They are bucketed differently. On XP, "hangs really only have 2 effective bucketing parameters... all of particular version of an application’s hangs ended up in a single bucket." On Vista it's better, but "there are still edge cases (just as there are in crash bucketing) where a bucket does not uniquely identify a single bug." (See the hang blog posts listed above.)
  - Need to also understand how to represent cross-process hangs
- (?) Who do we give access to minidumps?
- (?) What is our current data retention policy?
- We may want to keep hangs around for longer since there may be a lot, and they've never been investigated
- (?) What is our access audit ability?
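The crash bucketing parameters quoted earlier in this section could be modelled roughly as below. This is only a sketch for discussion: the class name, field names, and sample values are hypothetical, and it does not reflect how WinQual or Socorro actually store buckets. It simply shows that two events land in the same bucket only when all eight parameters match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashBucketKey:
    """The eight crash bucketing parameters quoted from the WER docs.

    Field names are our own (hypothetical); all values are kept as strings.
    """
    application_name: str
    application_version: str
    application_build_date: str
    module_name: str
    module_version: str
    module_build_date: str
    exception_code: str
    code_offset: str


def group_by_bucket(events):
    """Group (key, event_id) pairs; identical keys share one bucket."""
    buckets = defaultdict(list)
    for key, event_id in events:
        buckets[key].append(event_id)
    return dict(buckets)


# Made-up sample values, for illustration only.
key_a = CrashBucketKey("firefox.exe", "7.0.1", "2011-09-28",
                       "xul.dll", "7.0.1", "2011-09-28",
                       "c0000005", "0x1234")
# Same crash site except for the code offset, so it gets its own bucket.
key_b = CrashBucketKey("firefox.exe", "7.0.1", "2011-09-28",
                       "xul.dll", "7.0.1", "2011-09-28",
                       "c0000005", "0x5678")

sample_buckets = group_by_bucket([(key_a, 1), (key_a, 2), (key_b, 3)])
```

A frozen dataclass is used so the key is hashable and can index a dictionary directly. If hangs on XP really have only two effective parameters, the same grouping would collapse all hangs of one application version into a single bucket.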
Cab File Contents (for collector/processor)
- FAQ - WER Services - more info here under the question "What are the different types of memory dumps?"
- WERInternalMetadata.xml - (possibly) not present if version.txt is. Includes
  - OSVersionInformation - Windows version info, architecture, etc.
  - ProblemSignatures - event type (crash/hang), crashing executable name, exe version/timestamp, methodDef token of faulting method (?), and IL offset of faulting instruction (?)
  - DynamicSignatures -
  - SystemInformation - HW info. What's an MID?
- AppCompat.txt (also all lower) - not present if WERDataCollectionFailure.txt is. Includes information on all images loaded by the process.
- WERDataCollectionFailure.txt - includes error message if processing failed in MS.
- version.txt - only came across this once. Only included OS version.
- For crashes
  - ______.{m}dmp - minidump file. These types of dumps already appear to be handled, according to http://code.google.com/p/google-breakpad/wiki/ProcessorDesign
- For hangs
  - <process-name>.xml - additional hang metadata like the wait chain list
  - memory.hdmp - info about the difference between a heapdump and a minidump outlined here
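As a starting point for the collector/processor, the file list above could drive a first-pass classification of an already-extracted cab. This is a minimal sketch under assumptions: the function name and the returned keys are hypothetical, it treats the presence of memory.hdmp as the hang indicator and any .mdmp/.dmp file as the crash indicator, and it works on a list of file names only (it does not unpack cab files).

```python
def summarize_cab(file_names):
    """Classify a cab from the names of the files it contains.

    Assumption: memory.hdmp marks a hang and a *.mdmp / *.dmp file marks
    a crash, based on the cabs inspected so far. Names are compared
    case-insensitively because AppCompat.txt also shows up in lower case.
    """
    names = {name.lower() for name in file_names}

    if "memory.hdmp" in names:
        kind = "hang"
    elif any(name.endswith((".mdmp", ".dmp")) for name in names):
        kind = "crash"
    else:
        kind = "unknown"

    return {
        "kind": kind,
        # Metadata may be absent when only version.txt is present.
        "has_metadata": "werinternalmetadata.xml" in names,
        # Microsoft-side processing failure leaves an error message file.
        "collection_failed": "werdatacollectionfailure.txt" in names,
    }


# Made-up file lists, for illustration only.
crash_cab = ["WERInternalMetadata.xml", "AppCompat.txt", "firefox.exe.mdmp"]
hang_cab = ["WERInternalMetadata.xml", "appcompat.txt", "firefox.exe.xml", "memory.hdmp"]
failed_cab = ["WERDataCollectionFailure.txt", "version.txt"]
```

Calling summarize_cab on each sample list shows which cabs could go straight to the existing minidump processor and which would need separate hang handling or should be skipped.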
Future Investigations
- Consider providing a solution (link) when the user is presented with the Windows crash dialog
  - Can even link to an exe (if part of the "Designed for Windows" logo program), which may be a good idea as a last-ditch effort if even Firefox's safe mode fails (application files no longer pristine, reinstall needed).