(32 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
Notes collected from Stability Week 2013, Aug 19-23, Mozilla MV (in reverse order). | Notes collected from Stability Week 2013, Aug 19-23, Mozilla MV (in reverse order). [[CrashKill/StabilityWeek2013/Actions|Action Items have been extracted to a separate page]]. | ||
= Day 4: 08/22/2013 = | = Day 4: 08/22/2013 = | ||
Line 67: | Line 67: | ||
We should knock some windows in this airport. | We should knock some windows in this airport. | ||
=== Actions === | === Actions === | ||
* File a bug and do something about it. [lonnen] | * File a bug and do something about it. [lonnen] -> {{bug|908334}} | ||
= Day 3: 08/21/2013 = | = Day 3: 08/21/2013 = | ||
== Socorro: future of dev == | == Socorro: future of dev == | ||
Line 89: | Line 90: | ||
===Actions=== | ===Actions=== | ||
* redo Vagrant-lite (rhelmer) | * redo Vagrant-lite (rhelmer) | ||
* developer HBase polished and working (tmary bug 872810, rhelmer to follow up) bug 907964 | * developer HBase polished and working (tmary {{bug|872810}}, rhelmer to follow up) {{bug|907964}} | ||
* keep dev for now for use as a datastore etc (no real action here) | * keep dev for now for use as a datastore etc (no real action here) | ||
* build new dev env on the VM dumitru gave us (lonnen) | * build new dev env on the VM dumitru gave us (lonnen) | ||
Line 149: | Line 150: | ||
* Redesign crash reporter to better collect email addresses. | * Redesign crash reporter to better collect email addresses. | ||
===Actions=== | ===Actions=== | ||
* [KaiRo] Finish up bug on listing last 3 days of crashes in about:support | * [KaiRo] Finish up bug on listing last 3 days of crashes in about:support - {{bug|765285}} | ||
* [gps] expose "disable addon with ID X" API from chrome to "healthreport" | * [gps] expose "disable addon with ID X" API from chrome to "healthreport" - {{bug|912815}} | ||
* [Arun] come back for ideas for design changes in the client | * [Arun] come back for ideas for design changes in the client | ||
https://etherpad.mozilla.org/ux-stability-workweek (my notes -Arun) | https://etherpad.mozilla.org/ux-stability-workweek (my notes -Arun) | ||
* [bsmedberg] Support classification | * [bsmedberg] Support classification | ||
* [ | * [brandonsavage] API to say "tell me what classification for this crash is" [Magic 8 ball API] - {{bug|915667}} | ||
* [bsmedberg to follow up] Reduce the junk in the help menu. | * [bsmedberg to follow up] Reduce the junk in the help menu. | ||
* [bsmedberg] Redesign email flow | * [bsmedberg] Redesign email flow | ||
Line 192: | Line 193: | ||
* Build IDs of final release are highlighted in green on this spreadsheet : https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Av0LdM1CVycIdDBxVFhqNmNkSnBhVnNSc05vMmhRdXc#gid=4 | * Build IDs of final release are highlighted in green on this spreadsheet : https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Av0LdM1CVycIdDBxVFhqNmNkSnBhVnNSc05vMmhRdXc#gid=4 | ||
* bajaj - checklist item for channel to be release- | * bajaj - checklist item for channel to be release- | ||
* KaiRo to poke catlee for proper channel naming internally | * KaiRo to poke catlee for proper channel naming internally (pinged on {{bug|869905}} for now) | ||
== B2G Stability Process == | == B2G Stability Process == | ||
Line 224: | Line 225: | ||
* communicate the device-specific crash report | * communicate the device-specific crash report | ||
* [akeybl] Propose creation of a confidential partner-specific Boot2Gecko::Partner Issues (just like POVB) | * [akeybl] Propose creation of a confidential partner-specific Boot2Gecko::Partner Issues (just like POVB) | ||
* [nhirata w/ robert wood] automated testing ( http://woodrobert.wordpress.com/2013/05/31/gaia-ui-endurance-tests/ ) should enable crash metrics for TAMs to use (endurance testing) | * [done][nhirata w/ robert wood] automated testing ( http://woodrobert.wordpress.com/2013/05/31/gaia-ui-endurance-tests/ ) should enable crash metrics for TAMs to use (endurance testing) | ||
== B2G Stability Requirements == | == B2G Stability Requirements == | ||
Line 258: | Line 259: | ||
** KaiRo mentioned to have a test-case where they could crash and pass on the details in a bug # | ** KaiRo mentioned to have a test-case where they could crash and pass on the details in a bug # | ||
===Actions=== | ===Actions=== | ||
* | * exposing system app URLs publicly - {{bug|915397}} | ||
* | * reports per-device - {{bug|853455}} | ||
* [NEEDS BUG, Socorro] report by OS version 1.1.*, 1.1.2, etc. | * [NEEDS BUG, Socorro] report by OS version 1.1.*, 1.1.2, etc. | ||
* [NEEDS BUG, Socorro] surface the buildID to understand geo-specific stability issues | * [NEEDS BUG, Socorro] surface the buildID to understand geo-specific stability issues | ||
* [bsmedberg] replacing | * [bsmedberg] replacing debuggerd (bug #?) | ||
* [NEEDS BUG, bsmedberg] need to check for process responsiveness from an external place (app pings), and also pull system logs when reporting - https://bugzilla.mozilla.org/show_bug.cgi?id=908000 | * [NEEDS BUG, bsmedberg] need to check for process responsiveness from an external place (app pings), and also pull system logs when reporting - https://bugzilla.mozilla.org/show_bug.cgi?id=908000 | ||
* | * way to report issues to Mozilla from settings, with crash reports (input? etc?) - {{bug|915409}} | ||
* KaiRo->Fabrice to discuss B2G version number-less reports ({{bug|910836}}) | * KaiRo->Fabrice to discuss B2G version number-less reports ({{bug|910836}}) | ||
* nhirata/Laura to look at https://bugzilla.mozilla.org/show_bug.cgi?id=895246 (duplicate reports) | * nhirata/Laura to look at https://bugzilla.mozilla.org/show_bug.cgi?id=895246 (duplicate reports) | ||
** client side bug as well, why are we getting dupes at all? | ** client side bug as well, why are we getting dupes at all? | ||
** | ** switch dupe detection to use minidump checksumming - {{bug|907499}} | ||
* | * Get activation time to be reported as install time - {{bug|915405}} | ||
* | * {{bug|908896 }} about:crashes on the phone | ||
* | * fg OOM kills (FHR? crashes?) - {{bug|915407}} | ||
== Meet with B2G Engineering working on Crashes == | == Meet with B2G Engineering working on Crashes == | ||
Line 325: | Line 326: | ||
===Actions=== | ===Actions=== | ||
*KaiRo to drive project on long-term crash rate graph. | *KaiRo to drive project on long-term crash rate graph. {{bug|915438}} | ||
== High Level CrashKill Review == | == High Level CrashKill Review == | ||
Line 365: | Line 366: | ||
** crash and don't get your tabs back (session restore) [non-actionable?] | ** crash and don't get your tabs back (session restore) [non-actionable?] | ||
* "Soak" time for RCs | * "Soak" time for RCs | ||
Actions | ===Actions=== | ||
* arm/flash top crash bugs -kairo | * arm/flash top crash bugs -kairo - filed {{bug|918085}} for a solution to include that | ||
* [socorro] triage bugs that kairo | * [socorro] triage bugs that kairo has on file for custom crash reports | ||
* longterm crashes, plugin crashes, hangs, plugin hangs | * longterm crashes, plugin crashes, hangs, plugin hangs - filed as {{bug|915438}} | ||
* annotate phase of startup as part of crash, along with actual time -bsmedberg (NEEDS BUG) - | * annotate phase of startup as part of crash, along with actual time -bsmedberg (NEEDS BUG) - {{bug|907994}} | ||
* bug on new crash severity rating {{bug|918077}} | |||
== Socorro Brainstorming == | == Socorro Brainstorming == | ||
* Combining signatures associated with the same bug # in Socorro (m:n relationship) | * Combining signatures associated with the same bug # in Socorro (m:n relationship) | ||
Line 458: | Line 461: | ||
* Could we get some kind of MTBF instead of only crashes per ADI? | * Could we get some kind of MTBF instead of only crashes per ADI? | ||
* Can we annotate amount of usage of number of pageloads? | * Can we annotate amount of usage of number of pageloads? | ||
Actions | ===Actions=== | ||
* [gps] Annotation on "how many tabs were open?" | * [gps] Annotation on "how many tabs were open?" | ||
* [dmajor] Annotation on OS (maybe also build architecture?) so EMPTY dump reports still get that (and use it in Socorro) | * [dmajor] Annotation on OS (maybe also build architecture?) so EMPTY dump reports still get that (and use it in Socorro) | ||
** Bug already exists: {{bug|838061}} | |||
* [ask Cww if to do at all] put "increasing memory swap helps to crash less" on SUMO radar | * [ask Cww if to do at all] put "increasing memory swap helps to crash less" on SUMO radar | ||
* [tabled] Investigate with UX/UR of warning users of impending doom and proposing ways out | * [tabled] Investigate with UX/UR of warning users of impending doom and proposing ways out | ||
* [bsmedberg to file] Annotate that the slow script dialog came up - | * [bsmedberg to file] Annotate that the slow script dialog came up - {{bug|907993}} | ||
* [JS team, laura to nag] bug 630464 get JS stacks on the toplevel for uncaught exceptions | * [JS team, laura to nag] {{bug|630464}} get JS stacks on the toplevel for uncaught exceptions | ||
* [not yet] Talk to jjensen about what crash data we can put in FHR | * [not yet] Talk to jjensen about what crash data we can put in FHR - {{bug|875562}} is leading up to this | ||
* Can we determine and eliminate duplicate crash reports (ones that are really just re-submitted) | * Can we determine and eliminate duplicate crash reports (ones that are really just re-submitted) | ||
** [crashkill]review heuristics | ** [crashkill]review heuristics - {{bug|907499}} will do this in a better way. | ||
** [laura/kairo]triage old bugs for aggregates with and without dupes | ** [laura/kairo]triage old bugs for aggregates with and without dupes - with {{bug|907499}} we'll be able to eliminate real dupes completely | ||
* We have information on gfx chipset and driver, memory, hardware/cpu but don't use it to narrow down the cause of crashes | * We have information on gfx chipset and driver, memory, hardware/cpu but don't use it to narrow down the cause of crashes | ||
** [laura/kairo] prioritize/review bugs during triage | ** [laura/kairo] prioritize/review bugs during triage - {{bug|853468}} gives us summarized info on graphics chipsets | ||
** Can potentially compare with information like add-ons that are installed | ** Can potentially compare with information like add-ons that are installed | ||
* [kairo to file] Annotate/show events like Firefox or MS releases on the crash charts? SUMO already has the ability to note "events" on their graphs. They use it for releases as well. Maybe we can steal or learn. | * [kairo to file] Annotate/show events like Firefox or MS releases on the crash charts? SUMO already has the ability to note "events" on their graphs. They use it for releases as well. Maybe we can steal or learn. | ||
* [Kairo to follow up with privacy] Submit all URLs from all tabs with a crash report | * [Kairo to follow up with privacy, contacted afowler] Submit all URLs from all tabs with a crash report | ||
== Automated Stability == | == Automated Stability == | ||
* Jesse, Rob, :gkw, :tracy, :bc | * Jesse, Rob, :gkw, :tracy, :bc | ||
Line 493: | Line 498: | ||
** Or utilize the nightly population by filtering fuzzer test cases (as fuzzing sometimes finds security issues), if cost is an issue? | ** Or utilize the nightly population by filtering fuzzer test cases (as fuzzing sometimes finds security issues), if cost is an issue? | ||
** We found ~10 bugs a week running fuzzers on ~100 machines. The fuzzers run Breakpad, parse its output to text, match it against known bugs, and check for assertion failures, too much recursion, ref count leaks, and inconsistent rendering for a given DOM tree | ** We found ~10 bugs a week running fuzzers on ~100 machines. The fuzzers run Breakpad, parse its output to text, match it against known bugs, and check for assertion failures, too much recursion, ref count leaks, and inconsistent rendering for a given DOM tree | ||
===Actions=== | |||
* QA to look at the output from the bug hunter tool given the current scenario | * QA to look at the output from the bug hunter tool given the current scenario | ||
*[bsmedberg to help with the action needed to resolve the issues] Disconnect between fuzzer bugs and the signature in crash-stat ? | *[bsmedberg to help with the action needed to resolve the issues] Disconnect between fuzzer bugs and the signature in crash-stat ? | ||
* gkw, to ensure decoder uploads stacks as well with the bug reports | * gkw, to ensure decoder uploads stacks as well with the bug reports | ||
* Jesse to release part of fuzzers so nightly population can run it | * Jesse to release part of fuzzers so nightly population can run it | ||
* [KaiRo] escalate necessity of tests passing on various sanitizers so we can have tests and fuzzing find issues using those | * [KaiRo] escalate necessity of tests passing on various sanitizers so we can have tests and fuzzing find issues using those - as of the platform meeting on 9/24, it looks like tests are run and passing on mozilla-central on ASan builds. | ||
* If we can't pass all TBPL tests using ASan/etc, can we enable ASan/etc for the subset of tests that do pass (using test manifests for ASan opt-in)? | * If we can't pass all TBPL tests using ASan/etc, can we enable ASan/etc for the subset of tests that do pass (using test manifests for ASan opt-in)? | ||
* [KaiRo] Ease routine jobs of putting ranks into bugs and updating topcrash keyword | * [KaiRo] Ease routine jobs of putting ranks into bugs and updating topcrash keyword - {{bug|913437}} | ||
** Get a service on Socorro that will give current ranks of a signature on current releases. | ** Get a service on Socorro that will give current ranks of a signature on current releases. - {{bug|915373}} | ||
* Can we get Nightly ASan builds that can auto-update so adventurous users can use Nightly ASan as their regular browser? | * Can we get Nightly ASan builds that can auto-update so adventurous users can use Nightly ASan as their regular browser? | ||
== bsmedberg hour (see '''#s''') | |||
== bsmedberg hour == | |||
(see '''#s''') | |||
* ('''1''') JSON version of minidump stackwalk | * ('''1''') JSON version of minidump stackwalk | ||
** Goal is to add all sorts of additional data to identify JIT frames, go back to the caller etc (??) | ** Goal is to add all sorts of additional data to identify JIT frames, go back to the caller etc (??) | ||
Line 542: | Line 549: | ||
** Implement in a way that the existing crash-reporter UI should be able to do it when he restarts firefox | ** Implement in a way that the existing crash-reporter UI should be able to do it when he restarts firefox | ||
** Store a crash-report on user's computer and upload on demand ? | ** Store a crash-report on user's computer and upload on demand ? | ||
===Actions=== | |||
* See above '''bolded''' #s | * See above '''bolded''' #s | ||
* [lars] JSON minidump_stackwalk getting deployed | * [lars] JSON minidump_stackwalk getting deployed | ||
== JS Engineering | |||
== JS Engineering == | |||
* :nbp,Jesse from the JS team are here | * :nbp,Jesse from the JS team are here | ||
* What are the pain points for JS team related to Socorro ? | * What are the pain points for JS team related to Socorro ? | ||
Line 597: | Line 605: | ||
* Making flexible Elastic search Queries ? [Adrian] | * Making flexible Elastic search Queries ? [Adrian] | ||
===Actions=== | |||
* [bajaj] Create a bug# to have the total GC crashes on Nightly graphed | * [bajaj] Create a bug# to have the total GC crashes on Nightly graphed | ||
* [KaiRo] to ask Scoobidiver for information on anything that is potentially automatable (Scoobidiver could get hit by a bus, God forbid!) | * [KaiRo] to ask Scoobidiver for information on anything that is potentially automatable (Scoobidiver could get hit by a bus, God forbid!) - actually, it looks like he disappeared right before this stability week :( | ||
* [Laura] to find list of what JS team wants to filter on, determine whether these are common between teams or we need a flexible solution | * [Laura] to find list of what JS team wants to filter on, determine whether these are common between teams or we need a flexible solution | ||
* Enable search results to be split by custom criteria (e.g. addresses) instead of signature | * Enable search results to be split by custom criteria (e.g. addresses) instead of signature | ||
* Better way of classification for JIT crashes | * [bsmedberg working that out with nbp] Better way of classification for JIT crashes | ||
* Do something (in Socorro UI) with "interesting" addresses (mark them?) | * Do something (in Socorro UI) with "interesting" addresses (mark them?) - {{bug|918101}} | ||
* [bsmedberg/nbp] Store where we are in JIT code in a page, dump it as part of the crash stack | * [bsmedberg/nbp] Store where we are in JIT code in a page, dump it as part of the crash stack | ||
* [Socorro] exploitable crashes could be marked differently, see above for heuristics | * [Socorro] exploitable crashes could be marked differently, see above for heuristics | ||
* Make stackwalker know about (Ion) JIT frames | * [bsmedberg] Make stackwalker know about (Ion) JIT frames | ||
== Radeon Update & Driver Investigation == | == Radeon Update & Driver Investigation == | ||
Line 625: | Line 633: | ||
** We need to give their QA to have them engaged? -milan | ** We need to give their QA to have them engaged? -milan | ||
** Getting to the right person would help | ** Getting to the right person would help | ||
===Actions=== | |||
* [bsmedberg] learning kernel debugging for the radeon crash, from our new in-house expert | * [bsmedberg] learning kernel debugging for the radeon crash, from our new in-house expert | ||
* '''[KaiRo/akeybl/bizdev?] following up on escalation with AMD, in preparation for blogging''' | * '''[KaiRo/akeybl/bizdev?] following up on escalation with AMD, in preparation for blogging''' - pushed off as dmajor found out more details on what's behind this issue | ||
** having an engineer would make things much much more likely to be resolved, at MS you can even pay $$ for support tickets to be reviewed by egr | ** having an engineer would make things much much more likely to be resolved, at MS you can even pay $$ for support tickets to be reviewed by egr | ||
* [lsblakk/bsmedberg] highly correlated crashes in FF23.0, discussing in post-mortem | * [lsblakk/bsmedberg] highly correlated crashes in FF23.0, discussing in post-mortem | ||
* [bsmedberg] to find bug # for empty crash (probably bug 837835) - | * [bsmedberg] to find bug # for empty crash (probably {{bug|837835}}) - | ||
** [milan] to follow up on above bug | ** [milan] to follow up on above bug | ||
* MS support ticket on bug 812695 (text corruption on Win7) | * MS support ticket on {{bug|812695}} (text corruption on Win7) [KaiRo sent email to jrmuizel/bas to check if we really think it's an MS issue] | ||
== GFX Engineering | |||
== GFX Engineering == | |||
* Ran into situations that happen on older cards | * Ran into situations that happen on older cards | ||
** Can we purchase gfx cards before they're EOL'd or when they're released? -milan | ** Can we purchase gfx cards before they're EOL'd or when they're released? -milan | ||
Line 665: | Line 674: | ||
* what about issues where millions of users crash once? -bsmedberg | * what about issues where millions of users crash once? -bsmedberg | ||
** still impacts overall stability perception, obviously | ** still impacts overall stability perception, obviously | ||
===Actions=== | |||
* [Milan/Anthony/Marc] how to get access to gfx hardware, or access to user computers (remote access?) | * [Milan/Anthony/Marc] how to get access to gfx hardware, or access to user computers (remote access?) | ||
** is there a gfx card service like https://appthwack.com/ | ** is there a gfx card service like https://appthwack.com/ | ||
Line 672: | Line 681: | ||
** [bajaj] bug # incoming | ** [bajaj] bug # incoming | ||
* [bsmedberg] flow chart for developers: how to get the data that you need, next steps in the investigation, etc. | * [bsmedberg] flow chart for developers: how to get the data that you need, next steps in the investigation, etc. | ||
* [Kairo for filing bugs] Put things that are now in app notes into proper crash annotations instead | * [Kairo for filing bugs] Put things that are now in app notes into proper crash annotations instead - {{bug|918102}} filed as a tracker | ||
* affected graphics chips per signature - bug 853468 (planned for Q3) | * affected graphics chips per signature - {{bug|853468}} (planned for Q3) | ||
** [brandon] | ** [brandon] {{bug|853468}} Graphics vendors and devices - q3 goal to add this to signature summary | ||
* [Laura/bsmedberg] help get around the legalities involved with crash data access to contributors and external partners (and once those are resolved, the logistics) | * [Laura/bsmedberg] help get around the legalities involved with crash data access to contributors and external partners (and once those are resolved, the logistics) | ||
** needs subgoals! | ** needs subgoals! |