Release Management/Nightly Respin: Difference between revisions

{{DISPLAYTITLE:The Firefox Nightly respin process}}
{{Processbox
  | Process name = Nightly respin
  | Purpose      = Update our users' broken nightlies
  | Why          = A patch on mozilla-central broke Nightly badly (crash, broken UI…)
  | Goals        = Nightly population retention
  | People      = Relman, releng, sheriffs
}}
<p style="font-size: larger; font-weight:bold;">This page details the process to back out a faulty patch and, if needed, get nightlies respun.</p>


== Summarized process ==


===Desktop===
# File a bug with as much detail as possible about the regression or crash, if one hasn't been filed yet.
# Ask releng to stop automatic nightly updates, citing the bug filed in step 1.
# Warn our users about the regression via our Twitter account and the #nightly IRC channel, and give the bug number.
# Investigate to find the faulty patch, using mozregression or, for crashes, stack traces.
# Ask sheriffs to back out the patch and respin the nightly, giving the bug number as reference.
# Mark the bug as blocking the bug referenced for the faulty patch.
# Ask the patch author to investigate the regression (NeedInfo in Bugzilla).
# When updates are back, announce on Twitter and IRC that the fix is being served.
===Mobile===
# Find the regression as soon as possible and notify its author via #mobile-android-team (tag managers).
# Identify the affected population.
# Identify the best method to limit the population exposed to the unstable build.
# Warn our users about the regression via Matrix (#Fenix or #Nightly channel).
# Ask the patch author or the Mobile team to investigate the regression (NeedInfo in Bugzilla or tag on GitHub).
# When updates are back, announce that the fix is available.


== When should we back out a patch? ==


===Desktop and Mobile===
We want to back out a patch when a significant regression is identified. This is usually either a functional regression (browser unusable, content rendering broken) or a sudden spike of crashes on the Nightly channel.


== How to find the patch to back out ==


===Desktop===
If it is a functional regression (a reproducible case), we should use [https://mozilla.github.io/mozregression/quickstart.html mozregression]. If it is a spike in crashes that is not necessarily reproducible (random crashes while surfing), our crash analysis experts in the Release Management team should be contacted. Analysis of the stack trace, combined with hg logs on mozilla-central, often allows finding the bug number that introduced the instability.
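
As an illustration of combining a stack trace with hg logs, the following minimal sketch scans commit summaries for keywords taken from the crashing stack (for example a module name) and collects the bug numbers they mention. It assumes the summaries were exported one per line, for instance with something like <code>hg log --template "{desc|firstline}\n"</code>. The function names, the sample log, and the bug numbers are hypothetical and only serve the example.

```python
import re

# Matches "Bug 1234567" or "bug 1234567" anywhere in a commit summary.
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)


def bug_numbers(log_text):
    """Return the bug numbers found in the log, in first-seen order, without repeats."""
    seen = []
    for line in log_text.splitlines():
        for match in BUG_RE.finditer(line):
            number = int(match.group(1))
            if number not in seen:
                seen.append(number)
    return seen


def suspect_bugs(log_text, keywords):
    """Return bug numbers from commit summaries that mention any of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    matching = [
        line
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in lowered)
    ]
    return bug_numbers("\n".join(matching))


# Hypothetical commit summaries, one per line (not real mozilla-central history).
SAMPLE_LOG = """\
Bug 1000001 - Refactor the layout frame constructor. r=alice
Bug 1000002 - Update localization strings. r=bob
Backed out changeset abcdef123456 (bug 1000003) for causing crashes in gfx
Bug 1000001 - Follow-up: fix a typo in layout comments. r=alice
"""

# Keyword taken from the crashing stack frames, here the "layout" module.
print(suspect_bugs(SAMPLE_LOG, ["layout"]))
```

A keyword match only narrows the list of candidates; the stack trace and the patches themselves still need to be reviewed before asking for a back out.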
===Mobile===
TBD


== Bug filing ==


===Desktop===
* If the bug was already filed by a community member, then use it to track the regression and qualify it. Add the nightly-community keyword if missing.
* If it is a crash, get a Crash ID from the people that reported it and file a bug via Socorro.
* If it is a functional regression and no bug was filed yet, file it.


Set the ''status-firefoxN'' tracking flag to ''affected'', the ''tracking-firefoxN'' flag to ''blocking'', and the target milestone to ''mozillaN'', where N is the version number for Nightly.
 
Once the back out is done, mark the bug as FIXED and change the ''status-firefoxN'' tracking flag from ''affected'' to ''fixed''.
 
The bug number will be used to track the work to fix the regression. Communicate to our community that a bug exists so as to avoid having many duplicate bugs filed.
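
The flag changes described above can be sketched as two small helpers that build the field updates for a given Nightly version. This is only an illustration: the field names <code>cf_status_firefoxN</code> and <code>cf_tracking_firefoxN</code>, and the <code>status</code>/<code>resolution</code> keys, are assumptions about how these flags are exposed by the Bugzilla REST API, and the helper names are hypothetical. Nothing is sent to Bugzilla here.

```python
def flags_when_filing(nightly_version):
    """Field updates for a freshly qualified regression on Nightly version N."""
    n = nightly_version
    return {
        # Assumed REST field names for the status-firefoxN / tracking-firefoxN flags.
        f"cf_status_firefox{n}": "affected",
        f"cf_tracking_firefox{n}": "blocking",
        "target_milestone": f"mozilla{n}",
    }


def flags_after_backout(nightly_version):
    """Field updates once the faulty patch has been backed out."""
    n = nightly_version
    return {
        f"cf_status_firefox{n}": "fixed",
        # Assumed keys for marking the bug as FIXED.
        "status": "RESOLVED",
        "resolution": "FIXED",
    }


# Example with a hypothetical Nightly version number.
print(flags_when_filing(120))
print(flags_after_backout(120))
```

Keeping the two steps as separate helpers mirrors the process: the first set of flags is applied when the bug is qualified, the second only after the back out has landed.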


== Stopping automatic background updates ==


===Desktop===
If you think that a lot of people are going to be impacted by a regression, ask releng or relman to stop automatic updates.

Blocking automatic updates will not prevent new users from installing Firefox Nightly from mozilla.org, but it will greatly mitigate the impact on our existing user base.

We can ask releng to stop automatic updates for a specific OS and potentially set up a fallback update mechanism to the last known good builds.

Most of the time, Relman stops updates via Balrog, which stops them for all OSes.


Most major regressions are reported immediately by followers of our @FirefoxNightly Twitter account. When more than 2 people report a similar regression there, there is a high chance that it is serious, and automatic updates should be stopped rapidly.


'''Note:''' Some members of the release management team have the technical knowledge and permissions to stop automatic updates.
===Mobile===
If you think that a lot of people are going to be impacted by a regression, you need to identify the best method to limit the population exposed to the unstable build.

If the offending commit or issue has not reached the mobile build, you can stop the mobile nightly builds by cancelling the scheduled hook.

If the significant regression is due to GeckoView changes, the Relman team can create a new commit to "roll back" the GeckoView bump to the last stable build. If a mobile change or commit is causing a crash spike or significant regression, the Relman team can back out the patch. Depending on the timing of either of these changes, you might need to manually trigger the nightly builds.


If the issue is specific to a device or OS version, the app can be limited or blocked from those devices in the Google Play Store via the device catalog.


== Asking for a back out of the patch and new nightlies ==
===Desktop===
You can contact sheriffs in the #sheriffs IRC channel to back out the patch that caused the regression when you have identified it. The back out commit will reference the bug number.




'''Note:''' Some members of the release management team have the technical knowledge and permissions to back out patches.
===Mobile===
If the next nightly is about to be triggered, it can be canceled via Taskcluster.
If a new build is needed, the Relman team can trigger new mobile nightly builds via a Taskcluster hook once the branch is in a stable state.


== Communicating about the issue ==


====Desktop====
We should not hesitate to communicate the issue with a reference to the bug number to our community so as to minimize the number of duplicate bugs. If the issue needs steps to reproduce which are not obvious or a specific hardware/OS combination, having all communications centralized in a single bug helps.


Communicating about major regressions in Nightly is also part of the informal social contract we have with our alpha testers. Making sure they are informed of major technical issues impacting them helps keep them engaged.


The main channels for communicating a regression are our [https://twitter.com/firefoxnightly @FirefoxNightly] Twitter account, our [https://mozilla.social/@FirefoxNightly @FirefoxNightly@mozilla.social] Mastodon account, and our [https://matrix.to/#/#nightly:mozilla.org #Nightly] chatroom on Matrix/Element.
 
When updates are stopped, this will be automatically indicated on https://whattrainisitnow.com/release/?version=nightly with the reason message (usually linking to a bug) entered by release managers or sheriffs in Balrog when they stopped updates.


====Mobile====
Once a major regression is identified, we should inform the right party or team of the situation, along with a mitigation plan. Include the affected population, or set forth a plan to investigate this metric. Be sure to tag mobile managers via Slack when communicating the problem. In case of an emergency, or if immediate escalation is required, use Mozilla People resources to contact the Mobile Managers.
The main channels for communicating a mobile nightly regression are our [https://matrix.to/#/#nightly:mozilla.org #Fenix] chatroom on Matrix/Element and our #Mobile-Android-Team Slack channel.
[[Category:Release_Management]]
[[Category:Release_Management:Processes|Nightly Respin]]