Marketplace/PushDuty: Difference between revisions

(added banner)
 
(25 intermediate revisions by 6 users not shown)
Line 1: Line 1:
{{Marketplace_banner}}


== Marketplace Push Duty ==
== Marketplace Push Duty ==


So, you're going to update marketplace.firefox.com, eh?  You've come to the right place.  As release manager your responsibilities include:
So, you're going to update marketplace.firefox.com, eh?  You've come to the right place.  As release manager your responsibilities are:


* Tagging releases when the milestone closes and updating stage
* Tagging releases when the milestone closes and updating stage
* '''Evaluating any potential impact of the push on system performance'''
* '''Evaluating''' and cherry picking requests for the tag after it closes
* '''Evaluating''' and cherry picking requests for the tag after it closes
* Ensuring the waffle flags on stage are set appropriately (if it's going out in the next push, it's enabled, otherwise it is equivalent to production)
* Ensuring the waffle flags on stage are set appropriately (if it's going out in the next push, it's enabled, otherwise it is equivalent to production)
* Working with ops during our push window to make sure the release is smooth
* Working with Ops during push to make sure the release is smooth
* Working with QA during our push window to make sure any concerns are addressed
* Working with QA to make sure any concerns are addressed
* Follow up with ops and QA to do repeat pushes to address any critical issues.
* Following up with Ops and QA to do repeat pushes to address any critical issues
* Noting any new major features going out on the etherpad
* Noting any new major features going out on the etherpad
* Telling the person after you that they are on for the next week
* Telling the person after you that they are on for the next week
Line 15: Line 17:
=== Calendar ===
=== Calendar ===


If you'd like some calendar events to remind you when to do things, I've set one up:
If you'd like some calendar events to remind you when to do things, AndyM set one up: [https://www.google.com/calendar/embed?src=mozilla.com_5efh3ujitq1gt5l86qpiaka39g%40group.calendar.google.com&ctz=America/Vancouver HTML] or [https://www.google.com/calendar/ical/mozilla.com_5efh3ujitq1gt5l86qpiaka39g%40group.calendar.google.com/public/basic.ics iCal].


https://mail.mozilla.com/home/amckay@mozilla.com/Marketplace%20Push%20Duty
You can subscribe to this for the week you are on push duty, then turn off when you are off it.
 
You can subscribe to this for the week you are on push duty, then turn off when you are off it. If iCal doesn't work for you, ask Andy to send you the calendar from Zimbra.


== A walkthrough of a push ==
== A walkthrough of a push ==


=== Tagging ===
=== Tagging and Pushing to Stage ===
You should be tagging '''Friday at 11am PST''' before you expect to push.  It
You should be tagging '''Friday at 11am PST''' before you expect to push.  Remind folks on IRC that you're tagging and make sure they don't have half-finished patches.
doesn't hurt to chat with folks ahead of time on IRC to remind people you're
tagging and making sure they don't have half finished patches.
 
==== Automatic tagging and staging ====
 
In the zamboni repository, you can run:


    make release
# '''Tag the repositories and push the tags to Stage''' -- this can be done [[#Automatic tagging and pushing to Stage|automatically]] or [[#Manual tagging and pushing to Stage|manually]].
# '''Actively steward the push to Stage''' -- if there's an error during push '''or''' if the push will have adverse effects on production performance, work with Ops and commit authors to either redo or adjust the push (more on that [[#What it means to steward the push|below]]).
# '''Update the etherpad with the compare URLs for each repo''' -- add in the github compare URLs into the etherpad, so when the push comes people can easily see what is about to go out.


If this is your first time running it, it will prompt you for dreadnot URL (http://dreadnot-stage.addonsadm.private.phx1.mozilla.com/) and region (phx1), as well as username and password. If you have the "keyring" python module installed, it will securely remember your password. Remember to be connected to the VPN to run this.


The script will then tag all the appropriate repos, and push them to stage.
==== Manual tagging and pushing to Stage ====
 
==== Manual instructions ====


Name your tag with the '''date of the push''' in the format YYYY.MM.DD.
Name your tag with the '''date of the push''' in the format YYYY.MM.DD.


The following repositories need tagging:
The following repositories need tagging:
* https://github.com/mozilla/solitude
* https://github.com/mozilla/webpay
* https://github.com/mozilla/commbadge
* https://github.com/mozilla/commbadge
* https://github.com/mozilla/fireplace
* https://github.com/mozilla/fireplace
* https://github.com/mozilla/marketplace-stats
* https://github.com/mozilla/marketplace-stats
* https://github.com/mozilla/monolith-aggregator
* https://github.com/mozilla/monolith-aggregator
* https://github.com/mozilla/rocketfuel
* https://github.com/mozilla/discoplace
* https://github.com/mozilla/transonic
* https://github.com/mozilla/transonic
* https://github.com/mozilla/zamboni
* https://github.com/mozilla/zamboni
* https://github.com/mozilla/spartacus
* https://github.com/mozilla/marketplace-operator-dashboard
* https://github.com/mozilla/marketplace-content-tools


There is [https://github.com/cvan/tagz/ a script] which can do all that for you.  Try:
There is [https://github.com/cvan/tagz/ a script] which can do all that for you.  Try:
   python tagz.py -r mozilla/solitude,mozilla/webpay,mozilla/commbadge,mozilla/fireplace,mozilla/marketplace-stats,mozilla/monolith-aggregator,mozilla/rocketfuel,mozilla/discoplace,mozilla/transonic,mozilla/zamboni,mozilla/spartacus -c create -t YYYY.MM.DD
   python tagz.py -r mozilla/commbadge,mozilla/fireplace,mozilla/marketplace-operator-dashboard,mozilla/marketplace-stats,mozilla/monolith-aggregator,mozilla/transonic,mozilla/zamboni,mozilla/marketplace-content-tools -c create -t YYYY.MM.DD
 
Next add in the github compare URLs into the etherpad, so when the push comes people can easily see what is about to go out.


Next you'll need to update the staging servers:
Next you'll need to update the staging servers:


# Load [http://dreadnot-stage.addonsadm.private.phx1.mozilla.com/ dreadnot] (restricted, you'll need VPN+permissions to get here)
# Go to [https://deploy.mozaws.net/job/marketplace-STAGE/ jenkins] (restricted, you'll need VPN+LDAP login to get here)
# Push items in the following order (push by clicking "phx1" and then typing in the name of the tag or hash and hitting deploy):
# Push items by choosing "Build with Parameters" (on the left, above "Build History" -- if you don't see that option, you need to ask Ops to change your permissions).
## payments.allizom.org-solitude
# Enter the tag to be deployed where it says "DeployRef" -- note that the tag '''must''' be the same for all repos.
## payments-proxy.allizom.org-solitude
## marketplace.allizom.org-spartacus
## marketplace.allizom.org-webpay
## monolith.allizom.org-aggregator
## marketplace.allizom.org-marketplace-stats
## marketplace.allizom.org-rocketfuel
## marketplace.allizom.org-commbadge
## marketplace.allizom.org-discoplace
## marketplace.allizom.org-transonic
## marketplace.allizom.org-fireplace
## marketplace.allizom.org-zamboni (would be good to wait for everything else to finish deploying before hitting this one)


There is [https://github.com/jasonthomas/random/blob/master/dreadnot.deploy a script] which can also do all this for you. You'll need to set up a config file (mine is '''dreadnot.settings''' which will be used in the commands that follow).


<code><pre>
==== What it means to steward the push ====
# I have this saved as dreadnot.settings
While the ideal is for pushes to be uneventful, that's not always the case. The push hero isn't expected to single-handedly resolve any issues, but they are expected to work with Ops to identify issues and get the proper help (most likely the relevant commit author). '''It's important that this happens as part of the push to Stage, rather than on Tuesday as part of the push to Production.''' That's part of the point of having a Staging site.
[dev]
username = <your dreadnot user name>
password = <your dreadnot password>
dreadnot = https://dreadnot-stage.addonsadm.private.phx1.mozilla.com
region = phx1
</pre></code>


Once you have the settings file configured, you can run the deploy commands, monitoring the deploys on the dreadnot landing page.
'''Important note about data migrations:''' in our system, as with any system that isn't under immediate control (due to load-balancing or caching), we have to ensure that a push doesn't incur unreasonable system downtime. Data migrations are a known risk in this regard. If a migration on Stage shows that an unacceptable lag in performance will occur, the relevant commit should be refactored so that the to-be-pushed code does not rely on to-be-pushed data changes -- and Ops will need to know that updating the database servers must be handled differently.


<code><pre>
For example:
python dreadnot.deploy payments.allizom.org-solitude -r [TAG] --conf=dreadnot.settings -e dev
python dreadnot.deploy payments-proxy.allizom.org-solitude  -r [TAG] --conf=dreadnot.settings -e dev
python dreadnot.deploy marketplace.allizom.org-spartacus -r [TAG] --conf=dreadnot.settings -e dev
python dreadnot.deploy marketplace.allizom.org-webpay -r [TAG] --conf=dreadnot.settings -e dev


python dreadnot.deploy marketplace.allizom.org-rocketfuel  -r [TAG] --conf=dreadnot.settings -e dev
''Task: rename a column in a 4-million row table.'' This can take minutes, and can render the system unresponsive during that time. To do this without noticeable downtime:
python dreadnot.deploy marketplace.allizom.org-marketplace-stats  -r [TAG] --conf=dreadnot.settings -e dev
# Add a new column with the new column name
python dreadnot.deploy marketplace.allizom.org-commbadge -r [TAG] --conf=dreadnot.settings -e dev
# Copy data from old column to new column via SQL script
python dreadnot.deploy marketplace.allizom.org-fireplace -r [TAG] --conf=dreadnot.settings -e dev
# Push the code that uses the new column
# Update any rows that may have been added during previous steps
# Remove the old column


python dreadnot.deploy marketplace.allizom.org-discoplace -r [TAG] --conf=dreadnot.settings -e dev
... with Ops performing steps 1, 2, and 5 on each database server individually (by taking it out of rotation, running updates, and then putting it back into rotation to catch up via replication). We don't want to surprise Ops with this on Tuesday; we'd want to identify this if not during tagging, at least after the push to Stage.
python dreadnot.deploy marketplace.allizom.org-transonic -r [TAG] --conf=dreadnot.settings -e dev


# wait for the rest to finish deploying, then:
python dreadnot.deploy marketplace.allizom.org-zamboni -r [TAG] --conf=dreadnot.settings -e dev
</pre></code>
Presumably we can script this too, but for now it can be useful to run each deploy separately to catch problems before they cascade.


=== Pushing ===
=== Pushing ===
Pushes happen '''Tuesday at 11am'''.  There is an etherpad made each week named
Pushes happen '''Tuesday at 11am'''.  There is an etherpad made each week named mkt-YYYY-MM-DD.  An [https://etherpad.mozilla.org/amo-2014-01-14 example].  The push will mostly follow this etherpad and any special notes should be in that pad.
mkt-YYYY-MM-DD.  An [https://etherpad.mozilla.org/amo-2014-01-14 example].  The
push will mostly follow this etherpad and any special notes should be in that pad.


You might want to add in meeting for yourself in Zimbra for the push time so that people won't try and schedule you for meetings.
You might want to add in a meeting for yourself for the push time so that people won't try and schedule you for meetings.


To push:
To push:
# The release manager (you), QA (krupa), and Ops (Jason) should be in contact on IRC and in the Marketplace vidyo room
# The release manager (you), QA (krupa), and Ops (jason or jlaz) should be in contact on IRC and in the Marketplace vidyo room.
# Once everyone gives the thumbs up Ops will push the actual code using [https://mana.mozilla.org/wiki/display/websites/Services#Services-DreadnotInstances dreadnot].  Ops will push the projects in order (same order you did for stage).  Talk on vidyo if there are any questions.
# Once everyone gives the thumbs up Ops will push the actual code using jenkins.  Ops will push the projects in order (same order you did for stage).  Talk on vidyo if there are any questions.
# The IRC bots will say when the pushes are done
# The IRC bots will say when the pushes are done.
# Once the push is done, QA will verify any changes.  Work with them to flip any waffle switches or tweak any adjustments
# Once the push is done, QA will verify changes.  Work with them to flip any waffle switches or tweak any adjustments.
# Whilst QA is reviewing....
# Whilst QA is reviewing...
## Review [http://sentry.mktmon.services.phx1.mozilla.com/ sentry] errors, there will often be errors during the push, but after there should be nothing to worry about
## Review [http://sentry.mktmon.services.phx1.mozilla.com/ sentry] errors; there will often be errors during the push, but after there should be nothing to worry about at this point;
## Review the graphite graphs on the [http://dashboard.mktadm.ops.services.phx1.mozilla.com/ dashboard] to see if anything looks amiss
## Review the graphite graphs on the [http://dashboard.mktadm.ops.services.phx1.mozilla.com/ dashboard] to see if anything looks amiss;
## You can look at nagios as well if you like on the dashboard, but Ops will do that.
## You can look at nagios as well if you like on the dashboard, but Ops will do that.
# If QA or Ops finds something that needs fixing immediately:
# If QA or Ops finds something that needs fixing immediately:
## Either write a patch or find someone who can
## Write a patch (or find someone who can);
## Cherry-pick the patch onto the previous tag([https://gist.github.com/anonymous/8787796 Example])
## Cherry-pick the patch onto the previous tag ([https://gist.github.com/anonymous/8787796 Example]);
## Go back to step 2 until QA is (relatively ;) happy
## Go back to step 2 until QA is happy. OK, until QA is satisfied, then.
# Once QA, Ops, and you all sign off the push is over.  Record the time it took in the bottom of the etherpad
# Once QA, Ops, and you all sign off the push is over.  Record the time it took in the bottom of the etherpad.


=== Post Push ===
=== After the Push ===


# Create a new etherpad for the next week using the [https://wiki.mozilla.org/Marketplace/Templates#Push_Etherpad push template]
# Create a new etherpad for the next week using the [https://wiki.mozilla.org/Marketplace/Templates#Push_Etherpad push template].
# Edit the topic in the secret channel pointing to the new etherpad.
# Edit the topic in the secret channel pointing to the new etherpad.
# Remind next week's release manager they are on the hook :)
# Remind next week's release manager they are on the hook! :)
# Send an email to the public mailing list (dev-marketplace@lists.mozilla.org) saying how the push went. If there was reason for multiple pushes, or anything that could be improved or fixed (eg: dodgy migration), let the team know[https://wiki.mozilla.org/Marketplace/Templates#Push_Email simple template]
# Send an email to the public mailing list (dev-marketplace@lists.mozilla.org) saying how the push went. If there was reason for multiple pushes, or anything that could be improved or fixed (e.g. dodgy migration), let the team know using [https://wiki.mozilla.org/Marketplace/Templates#Push_Email this handy template].
 


=== Release manager rotation ===
=== Release manager rotation ===


* ashort
* kumar
* spasovski
* chuck
* ngoke
* mstriemer
* robhudson
* ddurst
* ddurst
* jared
* andym


(there will be exceptions.  No problem, we just need to be aware of them and plan for them)
(There will be exceptions to the rotation.  No problem, we just need to be aware of them and plan for them.)

Latest revision as of 02:33, 1 April 2016

The Marketplace has been placed into maintenance mode. It is no longer under active development. You can read complete details here.

Marketplace Push Duty

So, you're going to update marketplace.firefox.com, eh? You've come to the right place. As release manager your responsibilities are:

  • Tagging releases when the milestone closes and updating stage
  • Evaluating any potential impact of the push on system performance
  • Evaluating and cherry picking requests for the tag after it closes
  • Ensuring the waffle flags on stage are set appropriately (if it's going out in the next push, it's enabled, otherwise it is equivalent to production)
  • Working with Ops during push to make sure the release is smooth
  • Working with QA to make sure any concerns are addressed
  • Following up with Ops and QA to do repeat pushes to address any critical issues
  • Noting any new major features going out on the etherpad
  • Telling the person after you that they are on for the next week

Calendar

If you'd like some calendar events to remind you when to do things, AndyM set one up: HTML or iCal.

You can subscribe to this for the week you are on push duty, then turn it off when your week is over.

A walkthrough of a push

Tagging and Pushing to Stage

You should be tagging Friday at 11am PST before you expect to push. Remind folks on IRC that you're tagging and make sure they don't have half-finished patches.

  1. Tag the repositories and push the tags to Stage -- this can be done automatically or manually.
  2. Actively steward the push to Stage -- if there's an error during push or if the push will have adverse effects on production performance, work with Ops and commit authors to either redo or adjust the push (more on that below).
  3. Update the etherpad with the compare URLs for each repo -- add the GitHub compare URLs to the etherpad, so that when the push comes people can easily see what is about to go out.
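
As a rough illustration of step 3, the compare URLs can be generated rather than typed by hand. This is only a sketch: the helper names and the two example tags are hypothetical, and the only thing it relies on is GitHub's standard compare URL form, `https://github.com/OWNER/REPO/compare/OLD...NEW`.

```python
# Hypothetical helper: build the GitHub compare URLs to paste into the etherpad.
# The repo names come from the tagging list on this page; the tags are examples.

def compare_url(repo, previous_tag, new_tag):
    """Return the GitHub compare URL between two tags of one repository."""
    return "https://github.com/{0}/compare/{1}...{2}".format(
        repo, previous_tag, new_tag
    )


def compare_urls(repos, previous_tag, new_tag):
    """Return one compare URL per repository, in the order given."""
    return [compare_url(repo, previous_tag, new_tag) for repo in repos]


if __name__ == "__main__":
    # Example tags only: last week's push date and this week's push date.
    for url in compare_urls(
        ["mozilla/fireplace", "mozilla/zamboni"], "2016.03.22", "2016.03.29"
    ):
        print(url)
```

The previous tag is whatever went out in the last push, so the resulting diff shows exactly what is new in this one.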


Manual tagging and pushing to Stage

Name your tag with the date of the push in the format YYYY.MM.DD.

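The following repositories need tagging:

  • https://github.com/mozilla/commbadge
  • https://github.com/mozilla/fireplace
  • https://github.com/mozilla/marketplace-stats
  • https://github.com/mozilla/monolith-aggregator
  • https://github.com/mozilla/transonic
  • https://github.com/mozilla/zamboni
  • https://github.com/mozilla/marketplace-operator-dashboard
  • https://github.com/mozilla/marketplace-content-tools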

There is a script (https://github.com/cvan/tagz/) which can do all that for you. Try:

 python tagz.py -r mozilla/commbadge,mozilla/fireplace,mozilla/marketplace-operator-dashboard,mozilla/marketplace-stats,mozilla/monolith-aggregator,mozilla/transonic,mozilla/zamboni,mozilla/marketplace-content-tools -c create -t YYYY.MM.DD
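
If it helps, the tag name and the tagz command line can be derived from the tagging date. The sketch below assumes the usual schedule described on this page (tag on Friday, push the following Tuesday); the function names are hypothetical, and the only tagz flags used are the ones shown in the command above.

```python
import datetime

# Repositories taken from the tagz command on this page.
REPOS = [
    "mozilla/commbadge",
    "mozilla/fireplace",
    "mozilla/marketplace-operator-dashboard",
    "mozilla/marketplace-stats",
    "mozilla/monolith-aggregator",
    "mozilla/transonic",
    "mozilla/zamboni",
    "mozilla/marketplace-content-tools",
]


def push_tag(tagging_day):
    """Return the tag (YYYY.MM.DD) for the first Tuesday after tagging_day."""
    # weekday(): Monday is 0, so Tuesday is 1. If tagging_day is itself a
    # Tuesday, use the Tuesday one week later.
    days_ahead = (1 - tagging_day.weekday()) % 7 or 7
    push_day = tagging_day + datetime.timedelta(days=days_ahead)
    return push_day.strftime("%Y.%m.%d")


def tagz_command(tag, repos=REPOS):
    """Return the tagz command line that creates `tag` on every repo."""
    return "python tagz.py -r {0} -c create -t {1}".format(",".join(repos), tag)


if __name__ == "__main__":
    tag = push_tag(datetime.date.today())
    print(tagz_command(tag))
```

The script only prints the command, so it can be checked before anything is tagged.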

Next you'll need to update the staging servers:

  1. Go to jenkins (restricted, you'll need VPN+LDAP login to get here)
  2. Push items by choosing "Build with Parameters" (on the left, above "Build History" -- if you don't see that option, you need to ask Ops to change your permissions).
  3. Enter the tag to be deployed where it says "DeployRef" -- note that the tag must be the same for all repos.
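
For reference, Jenkins jobs that take parameters can normally also be addressed through a `buildWithParameters` URL. The sketch below only builds that URL and does not send anything. It assumes the stage job lives at https://deploy.mozaws.net/job/marketplace-STAGE/ and that remote triggering is enabled for your account, neither of which is guaranteed; the function name is hypothetical. When in doubt, use the web form as described above.

```python
from urllib.parse import urlencode

# Assumed location of the stage deploy job (VPN + LDAP login required).
STAGE_JOB_URL = "https://deploy.mozaws.net/job/marketplace-STAGE/"


def build_with_parameters_url(job_url, tag):
    """Return the Jenkins URL that would start the job with DeployRef=tag."""
    query = urlencode({"DeployRef": tag})
    return job_url.rstrip("/") + "/buildWithParameters?" + query


if __name__ == "__main__":
    # Printing rather than requesting: an authenticated POST to this URL is
    # what would actually queue the build, if Ops allows remote triggering.
    print(build_with_parameters_url(STAGE_JOB_URL, "2016.03.29"))
```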


What it means to steward the push

While the ideal is for pushes to be uneventful, that's not always the case. The push hero isn't expected to single-handedly resolve any issues, but they are expected to work with Ops to identify issues and get the proper help (most likely the relevant commit author). It's important that this happens as part of the push to Stage, rather than on Tuesday as part of the push to Production. That's part of the point of having a Staging site.

Important note about data migrations: in our system, as with any system that isn't under immediate control (due to load-balancing or caching), we have to ensure that a push doesn't incur unreasonable system downtime. Data migrations are a known risk in this regard. If a migration on Stage shows that an unacceptable lag in performance will occur, the relevant commit should be refactored so that the to-be-pushed code does not rely on to-be-pushed data changes -- and Ops will need to know that updating the database servers must be handled differently.

For example:

Task: rename a column in a 4-million row table. This can take minutes, and can render the system unresponsive during that time. To do this without noticeable downtime:

  1. Add a new column with the new column name
  2. Copy data from old column to new column via SQL script
  3. Push the code that uses the new column
  4. Update any rows that may have been added during previous steps
  5. Remove the old column

... with Ops performing steps 1, 2, and 5 on each database server individually (by taking it out of rotation, running updates, and then putting it back into rotation to catch up via replication). We don't want to surprise Ops with this on Tuesday; we'd want to identify this if not during tagging, at least after the push to Stage.
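
The five steps can be walked through on a toy table. This is a minimal sketch, not the Marketplace migration tooling: the table and column names (`apps`, `name`, `title`) are made up, an in-memory SQLite database stands in for the real database servers only so the example runs anywhere, and a production copy of millions of rows would likely be done in batches.

```python
import sqlite3


def rename_column_in_steps(conn):
    """Rename apps.name to apps.title without a single blocking rename."""
    cur = conn.cursor()
    # Step 1: add a new column with the new name (old column stays in place).
    cur.execute("ALTER TABLE apps ADD COLUMN title TEXT")
    # Step 2: copy existing data from the old column to the new column.
    cur.execute("UPDATE apps SET title = name")
    # Meanwhile the old code is still live and writes only the old column.
    cur.execute("INSERT INTO apps (name) VALUES ('late arrival')")
    # Step 3: push the code that uses the new column (a deploy, not SQL).
    # Step 4: backfill rows that were added during the previous steps.
    cur.execute("UPDATE apps SET title = name WHERE title IS NULL")
    conn.commit()
    # Step 5 (not run here): drop the old column once nothing reads it.
    # How to drop a column, and how expensive it is, depends on the database.


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO apps (name) VALUES (?)", [("alpha",), ("beta",)])
rename_column_in_steps(conn)
rows = conn.execute("SELECT name, title FROM apps ORDER BY id").fetchall()
print(rows)
```

The row inserted between steps 2 and 4 is the reason step 4 exists: without the backfill it would reach the new code with an empty new column.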


Pushing

Pushes happen Tuesday at 11am. There is an etherpad made each week named mkt-YYYY-MM-DD (for an example, see https://etherpad.mozilla.org/amo-2014-01-14). The push will mostly follow this etherpad, and any special notes should be in that pad.

You might want to add a meeting for yourself at the push time so that people won't try to schedule you for other meetings.

To push:

  1. The release manager (you), QA (krupa), and Ops (jason or jlaz) should be in contact on IRC and in the Marketplace vidyo room.
  2. Once everyone gives the thumbs up, Ops will push the actual code using jenkins. Ops will push the projects in order (same order you did for stage). Talk on vidyo if there are any questions.
  3. The IRC bots will say when the pushes are done.
  4. Once the push is done, QA will verify changes. Work with them to flip any waffle switches or make any adjustments.
  5. Whilst QA is reviewing...
    1. Review sentry errors; there will often be errors during the push itself, but once it has finished there should be nothing to worry about;
    2. Review the graphite graphs on the dashboard to see if anything looks amiss;
    3. You can look at nagios as well if you like on the dashboard, but Ops will do that.
  6. If QA or Ops finds something that needs fixing immediately:
    1. Write a patch (or find someone who can);
    2. Cherry-pick the patch onto the previous tag (Example);
    3. Go back to step 2 until QA is happy. OK, until QA is satisfied, then.
  7. Once QA, Ops, and you all sign off, the push is over. Record the time it took at the bottom of the etherpad.

After the Push

  1. Create a new etherpad for the next week using the push template.
  2. Edit the topic in the secret channel pointing to the new etherpad.
  3. Remind next week's release manager they are on the hook! :)
  4. Send an email to the public mailing list (dev-marketplace@lists.mozilla.org) saying how the push went. If there was reason for multiple pushes, or anything that could be improved or fixed (e.g. dodgy migration), let the team know using this handy template.


Release manager rotation

  • ddurst

(There will be exceptions to the rotation. No problem, we just need to be aware of them and plan for them.)