Release Management/WER Investigation

This page will track the initial investigation of Windows Error Reporting (WER). Our current plan is to hook our bug tracking and metrics into [https://winqual.microsoft.com/ Winqual], the site/API used to access Microsoft's crash info. Using this info will help diagnose hangs (which aren't currently tracked) on Windows XP and later, and may even uncover new crashes.


== Meetings ==
* [[/2011-10-10|2011-10-10]] - Kickoff
* [[/2011-10-19|2011-10-19]]

== Areas of Interest ==
* Terms associated with using Winqual (data after 6mos, etc.)
* Creating a library/service to act as a:
** Sync layer between Winqual and our crash storage service
** Method of providing symbols/app version/file mappings for new builds
** Auto-filer for [[Bugzilla]], possibly dealing with dupes
** Provider for [[Socorro]] crash metrics
* Applicability for other platforms such as Mac OS X


== Associated Bugs ==
* '''Meta bug''' - {{bug|695713}}
* {{bug|429592}} - discussion of what to do with process hangs
* {{bug|600275}}
* {{bug|692859}}
* {{bug|693315}} - Add annotation for source of crashes
* {{bug|696207}} - Add AppMap.exe to the build process


== Resources ==
* Short description on MSDN: [http://msdn.microsoft.com/en-us/windows/hardware/gg487440 WER: Getting Started]
* Longer guide (more targeted at hw/drivers though): [https://winqual.microsoft.com/help/developers_guide_to_wer.htm Developers Guide to WER]
* [http://winqual.microsoft.com/help/winqual_help.pdf Full WinQual Docs]
* Microsoft's WER client: [http://stackhash.codeplex.com/ Stackhash]
* [https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032286008&EventCategory=5&culture=en-US&CountryCode=US MSDN Webcast: Getting Started with Windows Error Reporting] (video from 2006)
* Breakpad design docs - [https://code.google.com/p/google-breakpad/w/list https://code.google.com/p/google-breakpad/w/list]
* [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=16&WeblogPostName=faq&GroupKeys= FAQ - WER Services]
* [https://winqual.microsoft.com/help/dp_eventlist.htm WinQual Event List Columns Explanation]
== Builds ==
=== Creating File Mappings ===
* MS suggests using their Microsoft Product Feedback Mapping Tool (AppMap.exe) to create a file manifest - required for crash collection for a new product version
* (?) Need to explore generating this file manifest so that the build machines can create it
** Or use the command line interface during the build on a windows machine as outlined at [https://winqual.microsoft.com/help/dp_appmap.htm Microsoft Product Feedback Mapping Tool Readme]
=== Uploading File Mappings ===
* MS suggests using the "Upload File Mappings" option on the Administration menu
* File mappings can also be uploaded through a web service, though this description might be out of date: [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=06&d=12&WeblogPostName=using-the-product-mapping-file-upload-web-service&GroupKeys= blog post]
== WinQual Update Frequency ==
* "By default we collect 10 cab (minidump) files per event"
* Lag times
** "Once we receive cab files for an event you will generally be able to see these cabs within a few hours of us receiving them."
** "For newly detected crashes it can take more than 4 days to get the crashes processed and up on the site."
* (?) Determine if there are any limitations to the number of products that can be registered (for possible use with nightlies)
* (?) Bandwidth limits for pulling down
* Only the first few cab files are stored for a crash. The event viewer web interface offers the ability to make a "Data Request" to collect "Processor & Memory Information", the heap, specific files (logs, etc.), additional cab files
** (?) Can the web API be used to expose this to developers or will we need more accounts?
* (?) Need to find out what it means when no cab files are available and the web interface offers "(click here to switch to collection mode)" - aren't we always collecting?
== Accessing WinQual Data ==
=== Windows Live Login ===
* (?) Access to Winqual requires a Windows Live login. Similarly, MS's StackHash client requires the Windows Live Sign-in Assistant. Need to determine if the associated library is required for logging in.
** [http://msdn.microsoft.com/en-us/library/bb676633.aspx Windows Live ID Web Authentication]
** [http://msdn.microsoft.com/en-us/library/ff748607.aspx Using the REST API Service]
* (?) Need to figure out what account we'd use for automated tools
=== WER Web API ===
* API documentation: [http://stackhash.codeplex.com/SourceControl/list/changesets# StackHash source download] > 3rdparty > WinQual API > Data Services.docx


== Needed for Investigation ==
* VMware/Windows 7 (bug needed)
* Winqual account

== Breakpad/Socorro ==
* Crash dumps are stored in buckets
** "For crash events the bucketing parameters are Application Name, Application Version, Application Build Date, Module Name, Module Version, Module Build Date, Exception Code, and Code Offset"
* (?) Need to understand the overlap between WinQual crash data and Breakpad crash data, map the applicable info, and decide what to do with "additional" info
** (?) Can we make use of individual hit event info in general? (as opposed to just crash cabs)
* (?) Will hangs (heap dumps) need to be handled any differently than minidumps?
** Hang blog posts: [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=19&WeblogPostName=let-there-be-hangs-part-3-the-hungapp-module&GroupKeys= part 3] and [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=19&WeblogPostName=let-there-be-hangs-part-4-better-bucketing-in-windows-vista&GroupKeys= part 4]
** They are bucketed differently. On XP, "hangs really only have 2 effective bucketing parameters... all of particular version of an application’s hangs ended up in a single bucket." On Vista it's better, but "there are still edge cases (just as there are in crash bucketing) where a bucket does not uniquely identify a single bug." (^^ see blog posts)
** Need to also understand how to represent [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2010&m=08&d=16&WeblogPostName=xproc-application-hang-cabs-in-windows-7&GroupKeys= cross process hangs]
* (?) Who do we give access to minidumps?
* (?) What is our current data retention policy?
** We may want to keep hangs around for longer since there may be a lot, and they've never been investigated
* (?) What is our access audit ability?
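
To make the crash bucketing parameters quoted above concrete, here is a minimal Python sketch of how a sync layer might model a crash bucket key and count hits per bucket. The class name, field names, and helper function are hypothetical and are not taken from the Winqual API; only the eight parameters themselves come from the quote.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashBucketKey:
    """Hypothetical model of the eight crash bucketing parameters."""
    app_name: str
    app_version: str
    app_build_date: str
    module_name: str
    module_version: str
    module_build_date: str
    exception_code: str
    code_offset: str


def count_hits(keys):
    """Group crash events by bucket key and count the hits in each bucket.

    Frozen dataclasses are hashable, so events with identical parameters
    collapse into the same Counter entry.
    """
    return Counter(keys)
```

Two events that agree on all eight parameters land in the same bucket; changing any one of them (for example the code offset) creates a new bucket. Hang buckets would need a different key, since the blog posts describe different bucketing for hangs.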


=== Cab File Contents (for collector/processor) ===
* [https://blogs.msdn.com/themes/blogs/generic/post.aspx?WeblogApp=wer&y=2009&m=03&d=16&WeblogPostName=faq&GroupKeys= FAQ - WER Services] - more info here under the question "What are the different types of memory dumps?"
* WERInternalMetadata.xml - (possibly) not present if version.txt is. Includes:
** OSVersionInformation - Windows version info, architecture, etc.
** ProblemSignatures - event type (crash/hang), crashing executable name, exe version/timestamp, methodDef token of faulting method (?), and IL offset of faulting instruction (?)
** DynamicSignatures -
** SystemInformation - HW info. What's an MID?
* AppCompat.txt (also seen in all lowercase) - not present if WERDataCollectionFailure.txt is. Includes information on all images loaded by the process.
* WERDataCollectionFailure.txt - includes an error message if processing failed at MS.
* version.txt - only came across this once. Only included OS version.
* For crashes
** ______.{m}dmp - minidump file. These types of dumps already appear to be handled according to http://code.google.com/p/google-breakpad/wiki/ProcessorDesign
* For hangs
** <process-name>.xml - additional hang metadata like the wait chain list
** memory.hdmp - info about the difference between a heapdump and a minidump [http://msdn.microsoft.com/en-us/library/windows/desktop/bb513622%28v=vs.85%29.aspx outlined here]

== Questions that need Answering ==
* What needs to be done to create the file mapping XML files from each build?
* What information is actually provided by Winqual (on the site, in the REST data, and in the CAB files)?
* How will we symbolicate the crash dumps?
** The Developers Guide mentions that "Microsoft encourages people to submit their symbols when they submit drivers to be signed". Can we take advantage of that?
* How much overlap is there with current crash tracking at Mozilla?
* MS: Will we run into any limits if staying synchronized with the server?
* MS: Any limitations based upon the number of builds (delays, etc.)? Especially necessary if we decide to track nightlies.
* How does debugging using minidump files work?
** Requires both the associated symbols as well as the associated images.
* Crash/hang "buckets" only collect CAB files for the first few instances of the crash - what does that mean for debugging?
** In the Developers Guide, it mentions "We can collect additional files on request if you find that you need more information for debugging."
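
As a rough illustration of how a collector might use the file names listed under Cab File Contents, the following Python sketch classifies an already extracted cab by its file listing. The function name and the returned dictionary keys are hypothetical; the rules only encode the observations noted on this page (memory.hdmp for hangs, a .dmp/.mdmp file for crashes) and may not cover every cab.

```python
from pathlib import PureWindowsPath


def classify_cab(file_names):
    """Guess what kind of report a cab holds from its file names.

    file_names is a list of paths already extracted from the cab.
    Names are compared in lowercase because some files were seen in
    both mixed case and all lowercase.
    """
    names = [PureWindowsPath(n).name.lower() for n in file_names]
    info = {
        "kind": "unknown",
        "has_metadata": "werinternalmetadata.xml" in names,
        "has_app_compat": "appcompat.txt" in names,
        "collection_failed": "werdatacollectionfailure.txt" in names,
    }
    # Check for the hang heap dump first, then for a crash minidump.
    if "memory.hdmp" in names:
        info["kind"] = "hang"
    elif any(n.endswith((".mdmp", ".dmp")) for n in names):
        info["kind"] = "crash"
    return info
```

A processor could use the result to decide whether to hand the dump to the existing Breakpad minidump path or to a separate hang path, and to skip cabs where data collection failed.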


== Future Investigations ==
* Consider providing a solution (link) when the user is presented with the Windows crash dialog
** Can even link to an exe (if part of the "Designed for Windows" logo program), which may be a good idea for last ditch effort if even Firefox's safe mode fails (application files no longer pristine, need reinstall).