ReleaseEngineering/Releaseduty/FAQ

Revision as of 16:15, 14 March 2016 by Mtabara (talk | contribs) (Add some more info in misc section.)

0. Which IRC channels should I join?

Good starters are: #releaseduty, #tbdrivers, #release-drivers. There are also mailing lists worth subscribing to: release-automation-notifications, as well as <thunderbird-drivers@mozilla.org>, <release-automation-notifications-thunderbird@mozilla.org>, <release-drivers@mozilla.org> and <release@mozilla.com>.

1. How does the Ship-it workflow work in terms of shipping a new release?

A Release Manager (RelMan) submits a new release form and another RelMan reviews it. Once it hits 'Ready', the release enters the 'Reviewed' section and waits to be run. Since there's a release-runner.sh script running in a loop on bm81, there's a maximum window of (60 seconds-ish?) until the job gets picked up. It then enters the 'Running/Complete' table, where we can observe its state, and the "Reviewed" tab goes back to "No pending release". This drawing depicts more technical details on this topic, in the Release-runner / Ship-it section.
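
As a rough illustration of the flow above, here is a minimal Python sketch of a polling runner. It is not the real release-runner code: the `Release` class, the state names and `poll_once` are hypothetical names made up for this example.

```python
from dataclasses import dataclass

# Hypothetical states, loosely following the Ship-it tables described above.
SUBMITTED, REVIEWED, RUNNING = "submitted", "reviewed", "running"


@dataclass
class Release:
    name: str
    state: str = SUBMITTED


def poll_once(releases):
    """One pass of the loop: start every release that has been reviewed."""
    started = []
    for release in releases:
        if release.state == REVIEWED:
            release.state = RUNNING  # would now show under 'Running/Complete'
            started.append(release.name)
    return started


if __name__ == "__main__":
    queue = [Release("Firefox-45.0b2-build1", REVIEWED),
             Release("Firefox-45.0b3-build1")]
    print(poll_once(queue))
```

A real runner would call something like `poll_once` repeatedly with a sleep in between, which is where the "60 seconds-ish" window would come from.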

2. With respect to release-runner, I grasped the release-runner part in the graph. However the http://hg.mozilla.org/build/tools/file/tip/buildfarm/release/release-runner.py#l309 shows me a TC task is created.

  • Where are those operations (patch, tag, sanity check, push & reconfig, trigger) being run?
  • What does the scheduler.createTaskGraph mean and the mark_as_completed part?

TODO answer

3. What does release-promotion refer to within all this context?

'Release promotion' is simply the idea that we take an already existing CI build from (e.g.) beta and 'promote' it to being the build we release/ship to users. Prior to this idea, we always rebuilt Firefox at the start of each new release. The difference between the current CI builds and CI builds going forward is that the latter will have slightly different mozconfigs and bits that make them ready to ship. The assumption is that each CI build is releasable without changing it, so the signatures are final (yes, no more bumping versions/build configs/mozconfigs/signatures!!). Long story short, release promotion entails taking an existing set of builds that have already been created and passed QA and “promoting” them to be used as a release candidate. This represents a fundamental shift in how we deliver Firefox to end users, and as such is both very exciting and terrifying at the same time.

4. In a few steps, what does a beta generation process look like?

Please refer to this schema for details.

5. Somebody on IRC said "we need to update several stuff till the beta goes out in the release channel". Are all these hotfixes for stuff that has been observed since the changes started riding the trains?

Yes! Since 2012, Mozilla has followed a fixed-schedule release model, otherwise known as the Train Model, in which we release Firefox every six weeks to get features and updates to users faster and move at the speed of the Web. Hence, every six weeks the following merges take place:

mozilla-central => mozilla-aurora

mozilla-aurora => mozilla-beta

mozilla-beta => mozilla-release

So before we even ship a new release, there are 6-8 weeks of betas in which we test things. Unexpected bugs can, and usually do, come up in this time, so there's a need for hotfixes.
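
The merge chain above can be sketched as a small lookup table. This is only an illustration; `MERGES`, `path_to_release` and `merges_until_release` are names invented for this example, not part of any Mozilla tooling.

```python
# Merge direction taken from the list above: source repo -> destination repo.
MERGES = {
    "mozilla-central": "mozilla-aurora",
    "mozilla-aurora": "mozilla-beta",
    "mozilla-beta": "mozilla-release",
}


def path_to_release(repo):
    """Return the repos a change rides through, starting at `repo`."""
    path = [repo]
    while path[-1] in MERGES:
        path.append(MERGES[path[-1]])
    return path


def merges_until_release(repo):
    """Number of merge days left before a change in `repo` reaches release."""
    return len(path_to_release(repo)) - 1


if __name__ == "__main__":
    print(" => ".join(path_to_release("mozilla-central")))
    print(merges_until_release("mozilla-aurora"))
```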

6. Overall rate vs update rate?

  • What's the overall rate everyone keeps talking about in the release channel mtg?

(e.g. Overall rate: 1.0 - yellow (almost green), on the upper edge of what 42 data looked like)

  • When they talk about "update rate from 30 to 100", are they talking about throttling a release?

TODO answer

7. All of the following bugs are excerpted from a channel mtg. Were all these bugs noticed when 44 was already riding the trains in the beta channel?

Overall rate: 1.1 - yellow, slightly up from last week of 43

  • bug 1212133 (a11y::DocAccessible::RemoveDependentIDsFor) is 2.2% of 44.0b1 data
  • bug 1233481 (js::AutoEnterOOMUnsafeRegion::crash) is 2.0%
  • bug 1215970 (CondVar::Wait) is 1.5%, see release
  • bug 1217135 (layers::ImageBridgeChild::EndTransaction) is 1.3%
  • bug 1234170 (nsIWebSocketChannel::Serial) is 1.2%
  • bug 1233962 (net::InterceptedChannelBase::DoNotifyController) is 1.1%

Therefore, are all of these hotfixes that need to get into the release immediately?

TODO answer

8. Scenario: Release-runner triggers a sendchange to buildbot. Buildbot builds. Is there any automatic testing chained up as soon as the build is ready? (as in, the same testing I see on treeherder for regular granular commits?)

No. To understand that, you must first understand the train model. Any change being added by developers is first tested on a specific branch in treeherder (the try branch, for example). Assuming it has passed all the tests, the change is ready to land in mozilla-inbound, where the tests run again. Assuming everything is green yet again, the Sheriffs pull that changeset into mozilla-central, from where it starts riding the trains: first in Nightly, then Aurora, then Beta, to finally arrive in the release channel. Whenever a hotfix/chemspill comes up and needs to be addressed, it is tested before the code lands in the specific mercurial repository. Therefore, it is not actually tested within the release process but beforehand.


9. Just a general overview, no need for micro-level details, how do the QA tools work?

  • their testing is subsequent to automatic testing, right? (that is regular tests from treeherder)
  • how do they test a build? Do they have specific tools?
  • do they focus on verifying bugs that are "stated as fixed" for that specific release? (whatsnew page)

TODO answer

10. What is a partner repack change for FF? What does "partner" refer to in this context? (e.g. bug 1231679)

Partner repacks are customized, branded versions of Firefox that Mozilla produces for some of its third-party partners. With some exceptions, most of the partner repack configurations live in private repositories. For example, whenever we ship a release build on our servers, we also produce and host a Bing repack. For the others that we don't publish, all the resulting artifacts are private. Mostly, the partner repacks don't need much RelEng intervention, as all bits are held in private git repos and are handled directly by the partnering companies. The usual first encounter with them for a releaseduty person is when a release is delayed in being published to <channel>-cdntest because some of the partner repacks are still running, which holds up the checksums step, which in turn holds up the push-to-mirrors step.


11. During the release-drivers meeting, people often refer to conversations noticed on Twitter (as in complaints or positive feedback on something). How do people know what's on Twitter? Is there a specific handle or hashtag they're following or it's just the @mozilla and related conversations?

TODO answer

12. Is there a scheduled calendar for the Thunderbird events, as there is for FF desktop, or is it just something that is set up in #release-drivers and synced up with the TB folks?

We don't have a fixed calendar for Thunderbird. The developers take care of it internally and notify us whenever they are ready to ship.

13. I got confused by the order of the things happening.

As soon as the build is completed and available, does QE start testing it (manual, regression, smoke, etc.)? I ask because earlier today I was debugging some firefox_antivirus and update_verify_release on win64 steps when I saw an email like "[desktop] Firefox 43.0.4 (build 3) - Sign Off on Manual Functional Testing". Indeed the build was successful, but there were many following steps that were not yet completed. How does it work, when does QE actually start testing?

It's easy! QE starts testing as soon as they get their hands on the en-US factory builds (for desktop Firefox) and multi-builds (for Fennec). In order to better understand the main logic, please see this drawing. To better correlate, please find below the list of emails that arrive, usually in this chronological order (excerpt from releasing 45.0b2):

  • initial email to notify the release has been triggered
  • [release] Firefox 45.0b2 build1: tagging started for Firefox 45.0b2
  • [release] Firefox 45.0b2 build1: completed firefox_reset_schedulers
  • [release] Firefox 45.0b2 build1: Source Repo tagging complete
  • [release] Firefox 45.0b2 build1: L10n Repo tagging complete
  • [release] Firefox 45.0b2 build1: completed firefox_bouncer_submitter
  • [release] Firefox 45.0b2 build1: completed firefox_source
  • (depending on the platform, e.g. on macosx64 here) => [release] Firefox 45.0b2 build1: macosx64 en-US build available
  • Firefox desktop only - once this watershed is hit for all platforms, QE starts the manual functional testing. Attention again - Firefox desktop only!
  • (for each platform, 10 emails tracking l10n repacks, e.g. the Linux 10/10 repacks email) => [release] Firefox 45.0b2 build1: completed repack_10/10 on linux
  • (depending on the platform, e.g. on win64 here) => [release] Firefox 45.0b2 build1: All win64 builds now available
  • Fennec only - once all builds are gathered and this watershed is hit for all platforms, QE starts the manual functional testing. Attention again - Fennec only!
  • [release] Firefox 45.0b2 build1: Updates available on beta-localtest
  • Once QE signs off the manual testing, the RelMan Quality team follows up with update testing
  • (not a blocker for the release, runs on separate "thread" - depending on the platform, e.g. on win64 here) => [release] Firefox 45.0b2 build2: completed partner_repack on win64
  • [release] Fennec 45.0b2 build1: completed firefox_checksums
  • [release] Firefox 45.0b2 release
  • (6/6 emails for each platform) => [release] Firefox 45.0b2 build1: completed update_verify_beta_6/6 on linux64
  • [release] Firefox 45.0b2 build1: completed firefox_antivirus
  • if it's a normal release, not a beta, there's an additional email like [release] Firefox 44.0.1 build2: ready to ship to the release channel coming over
  • Upon pushing to Balrog from RelEng side, the following email announces final victory!
  • [release] Firefox 45.0b2 build1: Updates available on beta
  • [release] Firefox 45.0b2 build1: completed firefox_postrelease
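
When following a release by email, it can help to split these subjects into their parts. Below is a minimal sketch that assumes the subject shape seen in the excerpt above (`[release] <product> <version> build<N>: <message>`). The regular expression and the `parse_subject` helper are assumptions made for this example, not an official format definition.

```python
import re

# Assumed subject shape, inferred only from the excerpt above.
SUBJECT_RE = re.compile(
    r"^\[release\] (?P<product>\w+) (?P<version>\S+) "
    r"build(?P<build>\d+): (?P<message>.+)$"
)


def parse_subject(subject):
    """Split a release-automation subject line, or return None if it differs."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["build"] = int(info["build"])
    return info


if __name__ == "__main__":
    line = "[release] Firefox 45.0b2 build1: Updates available on beta-localtest"
    print(parse_subject(line))
```

Subjects that do not follow this shape, such as "[release] Firefox 45.0b2 release", return `None` rather than a partial result.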

14. How come the same RC that is ready to be shipped (e.g. Desktop Firefox 44) results in a bunch of errors (GTK3-related, binary found on diff, etc.) on update_verify_beta but completes successfully on update_verify_release?

TODO answer

15. How come Windows64 is not in the platform list for 38.6.0esr? http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/release-firefox-mozilla-esr38.py#l71

TODO answer

16. Email stages I've noticed for <release> and <beta>.

Are these stages correct?

release

  • [desktop] Firefox 43.0.4 (build 3) - Sign Off on Manual Functional Testing by Softvision

assumption: QA performs manual and smoke testing following which it signs-off and hands to release-drivers

  • [Desktop] Firefox 43.0.4 build3 signed off on the release-localtest channel by QE

assumption: release-drivers perform update testing on release-localtest channel following which they sign-off and hand to release-management

  • [desktop] Please push Firefox 43.0.3 (build#1) to the release-cdntest channel

assumption: RelMan asks releng to push to release-cdntest (aka push to mirrors / push to cdn)

  • releng pushes to mirrors
  • [Desktop] Firefox 43.0.3 signed off on the release-cdntest channel by QE

assumption: QE performs the update tests on the release-cdntest channel following which they sign off and hand to release managers

  • [desktop] Please push Firefox Desktop 43.0.3 (build #1) to the release channel - 100% update rate

assumption: release managers ask releng to push to release channel (aka push to balrog)

  • releng pushes to release channel
  • release-automation sends notification: "[release] Firefox 43.0.3 build1: Updates available on release"
  • [Desktop] QE signing off on 43.0.3 release

assumption: release-drivers perform the update tests on the release channel

  • releng runs post-release for the release
  • release-automation sends notification stating that: "[release] Firefox 43.0.3 build1: completed firefox_postrelease"
  • done

YES!

betas

  • [desktop] Firefox 44.0 Beta 4 (build 1) - Sign Off on Manual Functional Testing

assumption: QA performs manual, regression and smoke testing on beta-localtest following which it signs-off and hands to release-drivers

  • for beta releases, push to mirrors (or pushing the beta release to the beta-cdntest) is done automatically
  • [Desktop] Please push Firefox 44.0b4 build1 to the beta channel

assumption: QE and release-drivers perform the updates on the beta-cdntest channel and ask releng to push it to beta channel

  • releng performs the 'Push to balrog' action
  • release-automation sends notification stating: "[release] Firefox 44.0b4 build1: Updates available on beta"
  • [Desktop] Firefox 44.0b4 updates on beta channel signed off by QE

assumption: QE performs update testing on the beta channel and asks releng for the post-release step

  • releng performs the 'post-release' action step
  • release-automation sends notification stating that: "[release] Firefox 44.0b4 build1: completed firefox_postrelease"
  • done

YES!

Questions:

  • My explanation so far is that betas are auto when it comes to push-to-mirrors, while for any other releases this step is to be done manually. Is this correct?

> Yes!

  • what does the "updates testing" refer to? I saw it performed at various steps during the release process. Is it automated / stage?

> It's a step performed by QE/release-drivers as soon as QE finishes the sign-off on that specific release. It means testing that updates work properly and that no issues come up (e.g. testing the WhatsNewPage and many other things)

  • for betas, is there a beta-localtest or is it the same thing with beta-cdntest?

> All of the releases have the <channel>-localtest, <channel>-cdntest and <channel>. Each of these requires a QE sign-off and update testing from QE/release-drivers. Once this is done, releng moves it forward, either manually or automatically (e.g. from <channel>-localtest to <channel>-cdntest for beta releases).
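
The channel progression described in the answers above can be sketched as follows. The function names are hypothetical, and the "automatic for beta only" rule simply restates the answer given earlier in this question.

```python
def channel_stages(channel):
    """Stages a release goes through for a given channel, in order."""
    return [f"{channel}-localtest", f"{channel}-cdntest", channel]


def next_stage(channel, current):
    """Stage that follows `current`, or None once the final channel is reached."""
    stages = channel_stages(channel)
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None


def push_to_mirrors_is_automatic(channel):
    """Per the answer above: betas are pushed to mirrors automatically."""
    return channel == "beta"


if __name__ == "__main__":
    print(channel_stages("release"))
    print(next_stage("beta", "beta-localtest"), push_to_mirrors_is_automatic("beta"))
```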

17. Why don't I see update_verify_beta for dot releases?

From time to time, a handful of issues precipitate a dot release. When that happens, its behavior varies slightly from a normal release. A normal release (e.g. 43.0, 44.0, etc.) has its RC shipped to the beta channel first before making it to the release channel. For testing purposes, update verification takes place for both channels, hence the update_verify_release and update_verify_beta steps. Upon successful testing we ship the RC on the beta channel and then on the release channel, following which we merge the code for the next release cycle so that the beta release bumps its version. In light of this, a dot release (e.g. 43.0.1 or 44.0.1) happens a certain amount of time after the official release. For that reason, a dot release can't be tested in the beta channel: the beta version at that moment is greater than the dot release version, so the updater would refuse to downgrade. Therefore, there is only one cycle of update_verify for dot releases (update_verify_release == update_verify in this case).
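
The "updater would refuse to downgrade" argument can be illustrated with a simplified version comparison. This is a sketch only: it handles just the version shapes used on this page (e.g. 43.0, 43.0.1, 44.0b4) and is not the comparator Firefox itself uses. `version_key` and `can_update` are hypothetical helper names.

```python
import re

VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:b(\d+))?")


def version_key(version):
    """Turn e.g. '43.0.1' or '44.0b4' into a tuple that sorts correctly."""
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version}")
    major, minor, patch, beta = match.groups()
    # A final release sorts after every beta of the same version number.
    beta_rank = int(beta) if beta else float("inf")
    return (int(major), int(minor), int(patch or 0), beta_rank)


def can_update(installed, offered):
    """An update is only acceptable if it is newer than what is installed."""
    return version_key(offered) > version_key(installed)


if __name__ == "__main__":
    # Beta users are already on 44.0bN when 43.0.1 is built.
    print(can_update("44.0b4", "43.0.1"))
    # Release users on 43.0 can take the dot release.
    print(can_update("43.0", "43.0.1"))
```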

18. As part of the release process, which step takes the user's Firefox to prompt the "There is a new update. Please click below [...]"?

The tl;dr is "Publish to Balrog". That is the magic step that makes the hot stuff available to all the users. However, there are certain bits needed to complete the equation here. Once a release completes its deliverables and its signing is done, that release is available in the release-localtest channel. After the release passes the push-to-mirrors step, it arrives in the release-cdntest channel. Finally, when the release is pushed to Balrog, it reaches the release channel. Normally, each of these steps requires a QE sign-off (and, for some, a RelMan approval).
Balrog also has so-called throttling rules, which control the share of the user population that receives each release. In the most common scenario, the throttling steps come before the release actually gets published in Balrog. That is, change/add/update the rules so that the reached user population does not exceed a certain X% (0 <= X <= 100). Even though the window would be small, we wouldn't want to publish first and then update the rules. However, there are certain scenarios in which this order is reversed. Consider a release (e.g. Firefox 122.0) that was found to have high crash rates and has been set to a 0% update rate. A follow-up dot release is imminent in this case (say Firefox 122.0.1). If we raise the rate before pushing the dot release to Balrog, users will be pointed to the old release (Firefox 122.0). That's why, in this case, we first push the dot release to Balrog and only then raise the rate. Whenever we push a new release to Balrog, its buildbot-configs update configs use the ruleId to map to that given rule and update the Balrog database. Long story short, if it's a dot release, the rules are there already, so you push it first and then adjust the rules.
Upon a successful push to Balrog, users across the globe will start getting a Firefox prompt asking them to update. So this happens after Push-to-Balrog, and NOT after post-release. However, not all browsers will prompt right away. Each Firefox process has an internal clock that checks for updates once within a window of X hours (typically 24), so the exact moment depends on each installation's own timing. However, we can be sure that once the entire window has elapsed, all running browsers across the world will have checked for the update. It is worth mentioning that there is a Ship-it option that enforces the update with immediate or planned action. The option is controlled by RelMan whenever they trigger the release in Ship-it.
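
To make the ordering argument about throttling concrete, here is a toy model of a single throttled rule. It is not Balrog's actual data model or API: the `Rule` class, its `mapping` and `rate` fields, and `offered_release` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    mapping: str  # release the rule currently points at
    rate: int     # percentage of update requests that get served, 0..100


def offered_release(rule, roll):
    """Release offered to one request; `roll` is a per-request draw in [0, 100)."""
    return rule.mapping if roll < rule.rate else None


if __name__ == "__main__":
    # Wrong order for a dot release: raise the rate while the rule still
    # points at the crashy release.
    rule = Rule(mapping="Firefox 122.0", rate=0)
    rule.rate = 25
    print(offered_release(rule, roll=10))

    # Right order: point the rule at the dot release first, then raise the rate.
    rule = Rule(mapping="Firefox 122.0", rate=0)
    rule.mapping = "Firefox 122.0.1"
    print(offered_release(rule, roll=10))
    rule.rate = 25
    print(offered_release(rule, roll=10))
```

In the first scenario a request is briefly offered the old release, which is exactly the situation the paragraph above warns about.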


Misc info

  • We don't usually publish updates for the first release of an ESR; there are usually 2 cycles of overlap (e.g. 45.2.0 will probably be the first release for which we can offer updates to 38 users)