Release Management/WER Investigation

 
This page will track the initial investigation of Windows Error Reporting (WER). Our current plan is to hook our bug tracking and metrics into [https://winqual.microsoft.com/ Winqual], the site/API used to access Microsoft's crash info. Using this info will help diagnose hangs (which aren't currently tracked) on Windows XP and later, and may even uncover new crashes.
== Meetings ==
* [[/2011-10-10|2011-10-10]] - Kickoff
* [[/2011-10-19|2011-10-19]]


== Associated Bugs ==
* '''Meta bug''' - {{bug|695713}}
* {{bug|429592}} - discussion of what to do with process hangs
* {{bug|600275}}
* {{bug|692859}}
* {{bug|693315}} - Add annotation for source of crashes
* {{bug|696207}} - Add AppMap.exe to the build process


== Resources ==
* [https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032286008&EventCategory=5&culture=en-US&CountryCode=US MSDN Webcast: Getting Started with Windows Error Reporting] (video from 2006)
* Breakpad design docs - [https://code.google.com/p/google-breakpad/w/list https://code.google.com/p/google-breakpad/w/list]
* [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=16&WeblogPostName=faq&GroupKeys= FAQ - WER Services]
* [https://winqual.microsoft.com/help/dp_eventlist.htm WinQual Event List Columns Explanation]

== Needed for Investigation ==
*<strike>VMware/Windows 7 ({{bug|690490}} & {{bug|690491}})</strike>
*<strike>Winqual account</strike>


== Builds ==
* MS suggests using their Microsoft Product Feedback Mapping Tool (AppMap.exe) to create a file manifest - required for crash collection for a new product version
* (?) Need to explore generating this file manifest so that the build machines can create it
** Or run the tool's command line interface during the build on a Windows machine, as outlined in the [https://winqual.microsoft.com/help/dp_appmap.htm Microsoft Product Feedback Mapping Tool Readme]


=== Uploading File Mappings ===
* MS suggests using the "Upload File Mappings" option on the Administration menu
* (?) Need to figure out if the WER web API can instead be used to perform an upload
* File mappings can be uploaded through a web service, but the [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=06&d=12&WeblogPostName=using-the-product-mapping-file-upload-web-service&GroupKeys= blog post] describing it might be out of date


== WinQual Update Frequency ==
* "By default we collect 10 cab (minidump) files per event"
* Lag times
** "Once we receive cab files for an event you will generally be able to see these cabs within a few hours of us receiving them."
** "For newly detected crashes it can take more than 4 days to get the crashes processed and up on the site."
* (?) Determine whether there is a limit on the number of products that can be registered (for possible use with nightlies)
** Need to take into consideration the >2 day lag time on getting reports
* (?) Bandwidth limits for pulling down  
* Only the first few cab files are stored for a crash. The event viewer web interface offers a "Data Request" to collect "Processor & Memory Information", the heap, specific files (logs, etc.), and additional cab files.


== Breakpad/Socorro ==
* Crash dumps are stored in buckets
** "For crash events the bucketing parameters are Application Name, Application Version, Application Build Date, Module Name, Module Version, Module Build Date, Exception Code, and Code Offset"
* (?) Need to understand the overlap between WinQual crash data and Breakpad crash data, map the applicable info, and decide what to do with "additional" info
** (?) Can we make use of individual hit event info in general? (as opposed to just crash cabs)
* (?) Will hangs (heap dumps) need to be handled any differently than minidumps?
** Hang blog posts: [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=19&WeblogPostName=let-there-be-hangs-part-3-the-hungapp-module&GroupKeys= part 3] and [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=19&WeblogPostName=let-there-be-hangs-part-4-better-bucketing-in-windows-vista&GroupKeys= part 4]
** Hangs are bucketed differently from crashes. On XP, "hangs really only have 2 effective bucketing parameters... all of particular version of an application’s hangs ended up in a single bucket." On Vista bucketing is better, but "there are still edge cases (just as there are in crash bucketing) where a bucket does not uniquely identify a single bug." (See the blog posts linked above.)
** Need to also understand how to represent [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2010&m=08&d=16&WeblogPostName=xproc-application-hang-cabs-in-windows-7&GroupKeys= cross process hangs]
* (?) Who do we give access to minidumps?
* (?) What is our current data retention policy?
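As a starting point for mapping WinQual crash data onto our own, the eight crash bucketing parameters quoted above can be treated as a composite key. The sketch below is only an illustration: the class, the field names, and the <code>group_by_bucket</code> helper are hypothetical and do not correspond to any WinQual, Breakpad, or Socorro API. It assumes each report is available as a dictionary of strings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashBucketKey:
    """The eight crash bucketing parameters (field names are our own)."""
    app_name: str
    app_version: str
    app_build_date: str
    module_name: str
    module_version: str
    module_build_date: str
    exception_code: str
    code_offset: str


def group_by_bucket(reports):
    """Group report dicts by bucket key; extra keys in a report are ignored."""
    field_names = list(CrashBucketKey.__dataclass_fields__)
    buckets = defaultdict(list)
    for report in reports:
        key = CrashBucketKey(**{name: report[name] for name in field_names})
        buckets[key].append(report)
    return dict(buckets)
```

Two reports that agree on all eight parameters land in the same bucket even if other fields (a cab identifier, for instance) differ. Hang buckets would need a different key, since they use other parameters.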


=== Cab File Contents (for collector/processor) ===
* [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=16&WeblogPostName=faq&GroupKeys= FAQ - WER Services] - more info here under the question "What are the different types of memory dumps?"
* WERInternalMetadata.xml - (possibly) not present if version.txt is. Includes
** OSVersionInformation - windows version info, architecture, etc.
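
For the collector/processor, reading the OSVersionInformation block could look roughly like the sketch below. The child element names and values in the sample (WindowsNTVersion, Build, Architecture) are assumptions for illustration and should be checked against a real cab. <code>read_os_version_info</code> is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names and values are assumed, not taken from a real cab.
SAMPLE = """<?xml version="1.0"?>
<WERReportMetadata>
  <OSVersionInformation>
    <WindowsNTVersion>6.1</WindowsNTVersion>
    <Build>7601</Build>
    <Architecture>X64</Architecture>
  </OSVersionInformation>
</WERReportMetadata>
"""


def read_os_version_info(xml_data):
    """Return the children of OSVersionInformation as a {tag: text} dict.

    Accepts str or bytes. Returns an empty dict if the element is absent.
    """
    root = ET.fromstring(xml_data)
    # Search at any depth so the exact nesting does not matter.
    node = next(root.iter("OSVersionInformation"), None)
    if node is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in node}
```

When reading the file from an extracted cab, passing the raw bytes lets the parser honor whatever encoding the XML declaration names.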