Platform/Uptime: Difference between revisions

m (name fixup)
 
(20 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''Project Uptime ran from April 2016 to June 2017.'''
''Preventing and fixing crashes remains an important task, and there is ongoing work there. But this work is no longer being coordinated under Project Uptime. This page is being kept in place as a historical record.''
Project Uptime's goal is to reduce the crash rate of Firefox (desktop and mobile) and keep it down. This project is a Platform Engineering initiative that aims to extend and complement existing work relating to stability within Mozilla.  
Project Uptime's goal is to reduce the crash rate of Firefox (desktop and mobile) and keep it down. This project is a Platform Engineering initiative that aims to extend and complement existing work relating to stability within Mozilla.  


Line 202: Line 205:
'''Extra:''' A [https://docs.google.com/presentation/d/1j-w1Mxgh7xQBPa57gdx_PSP2yf9yMJh8Oqu_7izReAY/edit#slide=id.p cross-variate analysis of FHR data], by Brendan Colloran, which may have useful techniques.
'''Extra:''' A [https://docs.google.com/presentation/d/1j-w1Mxgh7xQBPa57gdx_PSP2yf9yMJh8Oqu_7izReAY/edit#slide=id.p cross-variate analysis of FHR data], by Brendan Colloran, which may have useful techniques.


=== Improve understanding of OOM causes ===
=== {{mdone|}} Improve understanding of OOM causes ===
* {{mdone|}} Do a large-scale analysis of memory reports from OOM crashes [njn]
* {{mdone|}} Do a large-scale analysis of memory reports from OOM crashes [njn]
* Show important data from memory reports in crash-stats [njn]
* {{mdone|}} Show important data from memory reports in crash-stats [njn]
* ? Discuss common OOM cases with partners [digitarald?]
* ? Discuss common OOM cases with partners [digitarald?]


Line 366: Line 369:
** [https://health.graphics/crashes/beta Telemetry Crash Rate (Beta)] [harald]
** [https://health.graphics/crashes/beta Telemetry Crash Rate (Beta)] [harald]
** [https://bsmedberg.github.io/telemetry-dashboard/crashes/office-dashboard.html Crashes per 1000 usage hours (Beta, DevEd, Nightly)] [bsmedberg]
** [https://bsmedberg.github.io/telemetry-dashboard/crashes/office-dashboard.html Crashes per 1000 usage hours (Beta, DevEd, Nightly)] [bsmedberg]
** {{bug|1324528}} - Bug about removing old dashboards.
** {{bug|1324526}} - Bug about the new dashboards we want.
* [https://dataviz.mozilla.org/views/PlatformVersionFirefoxADI/DesktopADIbyPlatform Firefox ADI dashboard] (requires Tableau/dataviz privileges to view)
* [https://dataviz.mozilla.org/views/PlatformVersionFirefoxADI/DesktopADIbyPlatform Firefox ADI dashboard] (requires Tableau/dataviz privileges to view)


Line 383: Line 388:
North America (Pacific)
North America (Pacific)
* Andrew McCreight, platform engineering
* Andrew McCreight, platform engineering
* Chris Lonnen, Socorro
* Lonnen, Socorro
* David Baron, platform engineering
* David Baron, platform engineering


Line 401: Line 406:
* Julian Seward, dynamic analysis
* Julian Seward, dynamic analysis
* Sylvestre Ledru, static analysis, release management & stability
* Sylvestre Ledru, static analysis, release management & stability
* Gabriele Svelto, Firefox engineering


Europe (Eastern)
Europe (Eastern)
Line 407: Line 413:
= Meetings =
= Meetings =


Meetings are every two weeks. Because Uptime participants span so many timezones, there are two meetings. People should attend the meeting that best suits their timezone.
We held meetings for several months, then switched to email updates because we judged them to be a more effective use of everybody's time.
 
Meetings will take place in the [https://v.mozilla.com/flex.html?roomdirect.html&key=tGTDjguBXn29Ldaww7BCeVhp4M Uptime Vidyo room].
 
== Meeting A ==


This meeting is at 9am US (Pacific) time. The times for Europe may shift by one hour during the transition to/from summer time.
Here are the minutes of the meetings we have had.
 
{| class="wikitable"
! North America (Pacific)
! North America (Eastern)
! Europe (Western)
! Europe (Central)
! Europe (Eastern)
|-
| Monday 9am
| Monday 12pm
| Monday 5pm
| Monday 6pm
| Monday 7pm
|}
 
== Meeting B ==
 
This meeting is at 9am Taiwan time. Taiwan does not observe daylight saving time, so the local meeting time changes in locations that do. The table below shows the resulting times for the two main segments of the year. The date on which the local time shifts for each non-Taiwanese location depends on when that location enters or leaves summer time.
 
{| class="wikitable"
!
! Taiwan
! Australia (Eastern)
! North America (Pacific)
! North America (Eastern)
|-
| Northern summer<br>(e.g. July)
| Tuesday 9am
| Tuesday 11am
| Monday 6pm
| Monday 9pm
|-
| Northern winter<br>(e.g. January)
| Tuesday 9am
| Tuesday 12pm
| Monday 5pm
| Monday 8pm
|}
 
== Minutes ==
 
Meeting minutes are taken so people who are unable to attend a meeting can know what happened, and also to record action items. Minutes are taken in etherpads to allow multiple people to edit them during the meeting. The following etherpad contains a minutes template that can be copied into a new etherpad for each meeting. After doing so, please (a) fill in the date, and (b) copy action items from the previous meeting's minutes into the relevant section of the new meeting's minutes.
 
* [https://public.etherpad-mozilla.org/p/uptime-template Minutes template]
 
Please write minutes so they are comprehensible to people who were not at the meeting. In particular, links to bug reports, project pages, etc., are very helpful. Prior to the meeting, the minutes document can serve as the agenda; please feel free to add items ahead of time.


* [https://public.etherpad-mozilla.org/p/uptime20170410 2017-04-10 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20170327 2017-03-27 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20170313 2017-03-13 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20170227 2017-02-27 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20170213 2017-02-13 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20161219 2016-12-19 minutes]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_general Hawaii minutes: general]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_general Hawaii minutes: general]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_windows Hawaii minutes: Windows third-party crashes]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_windows Hawaii minutes: Windows third-party crashes]
Line 466: Line 428:
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_analysis Hawaii minutes: crash report analysis]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_analysis Hawaii minutes: crash report analysis]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_metrics Hawaii minutes: crash metrics]
* [https://public.etherpad-mozilla.org/p/uptime_hawaii_metrics Hawaii minutes: crash metrics]
* [https://public.etherpad-mozilla.org/p/uptime20161121 2016-11-21 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20161121 2016-11-21 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20161107 2016-11-07 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20161107 2016-11-07 minutes]
Line 475: Line 436:
* [https://public.etherpad-mozilla.org/p/uptime20160829 2016-08-29 minutes]
* [https://public.etherpad-mozilla.org/p/uptime20160829 2016-08-29 minutes]


== Follow-up email threads ==
Here is the meeting minutes template.


Follow-up threads are encouraged on the email list, for discussing things that were not clear from the meeting minutes. (Updating the minutes with additional information from these threads is also encouraged.)
* [https://public.etherpad-mozilla.org/p/uptime-template Minutes template]


= Communication channels =
= Communication channels =
Line 489: Line 450:
= Nightly crash triage =
= Nightly crash triage =


We aim to analyze the crashes for every Nightly build.
''This documentation has been moved to [[NightlyCrashTriage]].''
 
== Roster ==
 
Nightly builds are produced at 3am each day (California time).
 
* Monday (Australian time): njn analyzes Friday's build.
* Monday (US East time): marcia analyzes Saturday's build.
* Tuesday (Taiwan time): ting analyzes Sunday's build.
* Wednesday (Taiwan time): kanru analyzes Monday's build.
* Wednesday (US East time): jchen analyzes Tuesday's build.
* Thursday (US West time): mccr8 analyzes Wednesday's build.
* Friday (US East time): jchen analyzes Thursday's build.
 
A [https://calendar.google.com/calendar/b/1/embed?src=mozilla.com_37e791c3iohijr18mi02l0dvi0@group.calendar.google.com&ctz=America/Los_Angeles live calendar] is also available for Mozilla employees. Please use it to schedule deviations from the usual roster, e.g. for PTO.
 
== Notes ==
 
Triage notes are kept in the following pages.
 
* [[Platform/Uptime/NightlyCrashTriage/2016Q4|2016 Q4]]
* [[Platform/Uptime/NightlyCrashTriage/2016Q3|2016 Q3]]
* [[Platform/Uptime/NightlyCrashTriage/2016Q2|2016 Q2]]
 
Use the date on which you do the triage, rather than the date of the build, to decide which page to put your notes in. The triage date has its own heading, which makes it more prominent in the notes than the build date.
 
== Data sources, tools, documentation ==
 
Crucial links
* [http://dbaron.org/mozilla/crashes-by-build Nightly and Aurora crashes by build]
* [https://mozilla.github.io/stab-crashes/correlations.html The new correlation reports]
 
Other links
* [https://crash-stats.mozilla.com/search/?product=Firefox&_facets=moz_crash_reason&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=moz_crash_reason#facet-moz_crash_reason All MOZ_CRASH] crashes.
* [http://bsmedberg.github.io/socorro-toolbox/html/multiple-minidumps.html Display stacks from multi-dump hangs/crashes]
* [https://crash-analysis.mozilla.com/crash_analysis/ The old, busted correlation reports]
 
Documentation
* A [[Platform/Uptime/NightlyCrashAnalysis|rough guide to Nightly crash analysis]]
* [https://developer.mozilla.org/en-US/docs/Understanding_crash_reports Understanding crash reports]
* [https://developer.mozilla.org/en-US/docs/A_guide_to_searching_crash_reports A guide to searching crash reports]