Releases:Release Post Mortem:2016-02-17

From MozillaWiki

Latest revision as of 20:29, 17 February 2016

Meeting Details



Release Duty

  • FF 45 cycle: mtabara

Misc

Shipped

Firefox 45.0b6 (nick/mtabara/rail)

  • shipped during live channel mtg :)
  • all update verify steps are good, AV came in, awaiting only the QE sign-off to push it live
  • "Starting the build 6 without the last Hello changes. They broke m-b"
  • intermittent errors:
    • the usual known GTK3 issue on linux/linux64 for 3-4 update verify steps
    • bouncer_submitter failure, which is server-side and apparently not new, but we retry around it; nthomas filed bug 1248490. Retried the job and it succeeded, with several auto-retries in the log.

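Thunderbird 38.6.0 (nthomas/rail/mtabara)

  • shipped with a 50% update rate
  • nthomas found a Balrog UI bug here: bug 1248475
  • release notes available here: https://www.mozilla.org/en-US/thunderbird/38.6.0/releasenotes/
  • intermittent issues:
    • mock install error in linux64 repack 3/10, retriggered
    • exception in win32 repack 1/10 after a hang while creating a MAR file, retriggered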

Firefox 45.0b5 (mtabara/nick/rail)

  • victory
    • initially a bunch of the update verify steps failed: an infra bug caused high load on the servers, and Balrog was somewhat broken for a short window - see bug 1247869 for more details.
    • failed at update_verify_beta_4/6 on macosx64, failed mercurial cloning - automatic retry
    • failed at update_verify_beta_6/6 on win64 - intermittent error while downloading the complete MAR
    • several others with GTK3 known issue errors
    • several others with download issues on complete MARs or Balrog.
    • antivirus failed in several attempts due to IncompleteRead - nthomas filed bug 1248299 to track this

Fennec 44.0.2 (mtabara/rail/nthomas)

  • we deeeed it!
  • new security issue, bug 1245724. 44.0.2 is underway for both desktop and mobile
  • build1:
    • stopped in order to add one more critical fennec issue and start a build 2
  • build2:
    • abandoned here for yet-another build to follow with a hotfix
    • intermittent errors:
      • Fennec 44.0.2 build2: build step failed on android-api-11 - failure to clone build/tools when the fingerprint didn't match. gps suspects AWS is rolling out new certs
  • build3:
    • abandoned here as buildbot-master73 froze our builds and was really slow today; given that there was too much room for human error to interfere, we'll follow up with a fourth build.
  • build4:
    • intermittent errors:
      • [release-runner] WARNING: Reconfig exceeded 900 then 1800 seconds - looks like buildbot-master73 is naughty today and really slow, which delayed the whole reconfig step
      • at least three builders have been grabbed by the same bm73 yet again. We might end up in the same scenario as build3.

Firefox 38.6.1esr (mtabara/rail/nthomas)

  • victory eventually!
  • for the font-related issue mentioned in another thread, bug 1246093, we are building and testing a dot release for ESR, 38.6.1.
  • building from a relbranch with just the one sec fix
  • build1:
    • some intermittent errors:
      • antivirus check failed for a downloading issue when scanning, retriggered
    • main issues:
      • we rushed into pushing it to the esr-release channel without QE sign-off. Update tests were failing because of a WNP (What's New page) error. mtabara first changed the throttling to 0, rail fixed the WNP, and then the rate was set back to 100%.

Firefox 44.0.2 (mtabara/rail/nthomas)

  • build2:
    • intermittent errors for Firefox:
      • failed at firefox_antivirus, retriggered - intermittent download error for locale/partial
    • abandoned because a follow-up build3 was underway
  • build3:
    • this build is needed to address a critical windows startup issue (backed out bug 1218473)
    • intermittent errors for Firefox:
      • a few update verify steps failed due to download issues

Thunderbird 45.0b1 (jlund/rail/callek/nick/mtabara)

  • victory!
  • instead of disabling updates I pointed Linux users to 44.0b1 and others to 45.0b1
  • TODO: awaiting a decision, since Balrog has no Thunderbird equivalent of the Firefox beta GTK3 watershed rule; please see the email on the tb-drivers mailing list
  • build1
    • we had two win32 repacks failing
      • failed at repack_6/10 on win32 - retriggered, intermittent timeout
      • failed at repack_2/10 on win32 - retriggered
        • retriggered after the 'da' locale failed while submitting to Balrog, specifically around the make_incremental_update.sh script
        • retriggered after losing the slave instance
        • retriggered after a timeout
    • from tb-drivers mailing list: "We'll likely abandon build1 and go for build2 after getting some fixes"

  • build2:
    • "Same changesets as before, but buildbot changes merged to production."
    • gave up on build2 because of a build error
  • build3:
    • intermittent errors:
      • failed at repack_7/10 on win32, automatic retry
      • failed at repack_3/10 on win32, automatic retry
      • failed at update_verify_beta_2/6 on linux64 - GTK3 known issue error
      • failed at update_verify_beta_2/6 on linux - GTK3 known issue error

Ongoing

Fennec 45.0b6 (nick/mtabara/rail)

  • awaiting the Google Play email to run the post-release and move this to the Shipped section
  • "Starting the build 6 without the last Hello changes. They broke m-b"

Roundtable

  • bhearsum: should we really be blocking shipping a chemspill release on what's new page configuration? I don't have an opinion on this particular what's new page, but holding back in-the-wild fixes because of a what's new page seems bad

Context:

20:12:19 <bhearsum> a question for the postmortem, maybe: should we really be blocking shipping a chemspill release on what's new page configuration?
20:12:55 <rail> we should stop showing it 
20:12:59 — ~mtabara agrees
20:13:19 <rail> we are about to ship esr45 :)
20:13:27 <bhearsum> i don't have opinion on this particular what's new page
20:13:55 <bhearsum> but holding back in-the-wild fixes because of a what's new page seems bad
20:14:59 <lizzard> we’r only holding it back for a short time
20:15:03 <lizzard> but good question....
20:15:32 <bhearsum> yeah, and it's only for esr in this case
20:15:41 <bhearsum> i doubt it made a practical difference for that userbase
20:15:51 <lizzard> For esr, i can’t imagine enterprise folks can deploy this so quickly as to mind an hour’s diference
20:15:52 <bhearsum> but if were the firefox release channel it might be a different story
20:17:26 <bhearsum> i guess it's also an important point that screwing up the WNP has more effect on the release channel
  • mtabara: while deploying TB 38.6.0 with nthomas we had to change the Balrog update rate to 50%. While attempting this, we realized it had already been changed: a UI-only issue in Balrog meant rate changes were not shown in the rule history (bug 1248475). nthomas ran a DB query to get the answers we were looking for:
mysql> select change_id, changed_by, from_unixtime(substr(timestamp, 1, 10)) as timestamp, backgroundRate from rules_history where rule_id=170 order by change_id desc limit 10;
+-----------+---------------------+----------------------------+----------------+
| change_id | changed_by          | timestamp                  | backgroundRate |
+-----------+---------------------+----------------------------+----------------+
|      4423 | tbirdbld            | 2016-02-15 20:41:30.000000 |             50 |
|      3890 | tbirdbld            | 2016-01-07 22:21:11.000000 |             50 |
|      3889 | jlund@mozilla.com   | 2016-01-07 22:20:21.000000 |             50 |
|      3810 | jwood@mozilla.com   | 2015-12-30 16:03:00.000000 |            100 |
|      3740 | raliiev@mozilla.com | 2015-12-23 19:01:11.000000 |             30 |
|      3734 | tbirdbld            | 2015-12-23 15:18:47.000000 |             30 |
|      3733 | raliiev@mozilla.com | 2015-12-23 15:16:31.000000 |             30 |
|      3425 | jlund@mozilla.com   | 2015-12-02 18:54:06.000000 |            100 |
|      3378 | jlund@mozilla.com   | 2015-11-27 17:07:23.000000 |              0 |
|      3334 | tbirdbld            | 2015-11-25 18:58:35.000000 |             30 |
+-----------+---------------------+----------------------------+----------------+

Question for bhearsum: any chance I can get *read-only* access to that DB as well, for future scenarios?
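A possible follow-up to the query above is to list only the history rows where backgroundRate actually differs from the previous change to the same rule. The sketch below loads the ten rows shown above into an in-memory SQLite table and runs a self-join against the preceding change_id. This is a sketch under assumptions, not Balrog's tooling: the table layout, the `ts` text column (the real column holds a millisecond epoch value), and the `rate_changes` helper are hypothetical. The same SQL shape avoids window functions, so it should also suit older MySQL versions, but that has not been checked against the real database.

```python
import sqlite3

# rules_history rows for rule_id=170, copied from the query output above.
# Assumption: timestamps are stored as plain text here for illustration only.
ROWS = [
    (4423, "tbirdbld", "2016-02-15 20:41:30", 50),
    (3890, "tbirdbld", "2016-01-07 22:21:11", 50),
    (3889, "jlund@mozilla.com", "2016-01-07 22:20:21", 50),
    (3810, "jwood@mozilla.com", "2015-12-30 16:03:00", 100),
    (3740, "raliiev@mozilla.com", "2015-12-23 19:01:11", 30),
    (3734, "tbirdbld", "2015-12-23 15:18:47", 30),
    (3733, "raliiev@mozilla.com", "2015-12-23 15:16:31", 30),
    (3425, "jlund@mozilla.com", "2015-12-02 18:54:06", 100),
    (3378, "jlund@mozilla.com", "2015-11-27 17:07:23", 0),
    (3334, "tbirdbld", "2015-11-25 18:58:35", 30),
]

# Self-join each row to the immediately preceding change of the same rule and
# keep only rows whose backgroundRate differs from that previous value.
RATE_CHANGES_SQL = """
SELECT cur.change_id, cur.changed_by, cur.ts,
       prev.backgroundRate AS old_rate, cur.backgroundRate AS new_rate
FROM rules_history AS cur
JOIN rules_history AS prev
  ON prev.rule_id = cur.rule_id
 AND prev.change_id = (SELECT MAX(p.change_id) FROM rules_history AS p
                       WHERE p.rule_id = cur.rule_id
                         AND p.change_id < cur.change_id)
WHERE cur.rule_id = ? AND cur.backgroundRate <> prev.backgroundRate
ORDER BY cur.change_id DESC
"""


def rate_changes(rows, rule_id=170):
    """Hypothetical helper: return (change_id, changed_by, ts, old, new) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules_history (rule_id INTEGER, change_id INTEGER,"
        " changed_by TEXT, ts TEXT, backgroundRate INTEGER)"
    )
    conn.executemany(
        "INSERT INTO rules_history VALUES (?, ?, ?, ?, ?)",
        [(rule_id, *row) for row in rows],
    )
    result = conn.execute(RATE_CHANGES_SQL, (rule_id,)).fetchall()
    conn.close()
    return result


for change_id, who, ts, old, new in rate_changes(ROWS):
    print(f"{change_id} {ts} {who}: {old} -> {new}")
```

Run against the rows above, the newest actual rate change is change 3889 (100 to 50 on 2016-01-07), which is consistent with the rate already being 50% when TB 38.6.0 shipped. The oldest row (3334) has no predecessor in this sample, so the inner join drops it.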

  • mtabara: bug 1241263 on the FAQ; feel free to add or amend input if you like
  • mtabara: improvement proposal related to partner_repack

Action items