Marketplace/PushDuty: Difference between revisions
Latest revision as of 02:33, 1 April 2016
Marketplace Push Duty
So, you're going to update marketplace.firefox.com, eh? You've come to the right place. As release manager your responsibilities are:
- Tagging releases when the milestone closes and updating stage
- Evaluating any potential impact of the push on system performance
- Evaluating and cherry picking requests for the tag after it closes
- Ensuring the waffle flags on stage are set appropriately (if it's going out in the next push, it's enabled, otherwise it is equivalent to production)
- Working with Ops during push to make sure the release is smooth
- Working with QA to make sure any concerns are addressed
- Following up with Ops and QA to do repeat pushes to address any critical issues
- Noting any new major features going out on the etherpad
- Telling the person after you that they are on for the next week
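The waffle-flag rule above (enabled on stage if it ships in the next push, otherwise the same as production) can be checked mechanically. This is a minimal sketch using plain dictionaries rather than the real waffle API; the flag names and function names are hypothetical:

```python
def expected_stage_flags(production_flags, shipping_next_push):
    """Return the flag values stage should have before the push.

    production_flags: dict of flag name -> bool, as currently set in production.
    shipping_next_push: names of flags whose feature goes out in the next push.
    """
    expected = dict(production_flags)      # default: stage mirrors production
    for name in shipping_next_push:
        expected[name] = True              # features in the next push are enabled
    return expected


def mismatches(stage_flags, expected):
    """List the flags whose value on stage differs from the expected value."""
    return sorted(n for n in expected if stage_flags.get(n) != expected[n])


if __name__ == "__main__":
    production = {"new-search": False, "old-thing": True}   # hypothetical flags
    stage = {"new-search": False, "old-thing": True}
    wanted = expected_stage_flags(production, ["new-search"])
    print(mismatches(stage, wanted))
```

Any name the check reports is a flag to flip on stage before QA starts looking.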
Calendar
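If you'd like some calendar events to remind you when to do things, AndyM set one up: HTML (https://www.google.com/calendar/embed?src=mozilla.com_5efh3ujitq1gt5l86qpiaka39g%40group.calendar.google.com&ctz=America/Vancouver) or iCal (https://www.google.com/calendar/ical/mozilla.com_5efh3ujitq1gt5l86qpiaka39g%40group.calendar.google.com/public/basic.ics).
You can subscribe to this for the week you are on push duty, then turn it off when your week is over.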
A walkthrough of a push
Tagging and Pushing to Stage
You should tag on the Friday before you expect to push, at 11am PST. Remind folks on IRC that you're tagging and make sure they don't have half-finished patches.
- Tag the repositories and push the tags to Stage -- this can be done automatically or manually.
- Actively steward the push to Stage -- if there's an error during the push or if the push will have adverse effects on production performance, work with Ops and commit authors to either redo or adjust the push (more on that below).
- Update the etherpad with the compare URLs for each repo -- add the GitHub compare URLs to the etherpad, so that when the push comes people can easily see what is about to go out.
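The compare URLs for the etherpad follow GitHub's `compare/BASE...HEAD` URL pattern, so they can be generated rather than typed. This is a minimal sketch, assuming the base is the tag from the previous push; the two tag values are made-up examples and the function names are illustrative only:

```python
# Repositories listed on this page as needing a tag.
REPOS = [
    "mozilla/commbadge",
    "mozilla/fireplace",
    "mozilla/marketplace-stats",
    "mozilla/monolith-aggregator",
    "mozilla/transonic",
    "mozilla/zamboni",
    "mozilla/marketplace-operator-dashboard",
    "mozilla/marketplace-content-tools",
]


def compare_url(repo, previous_tag, new_tag):
    """Return the GitHub compare URL between two tags of one repository."""
    return "https://github.com/{0}/compare/{1}...{2}".format(
        repo, previous_tag, new_tag)


def compare_urls(previous_tag, new_tag, repos=REPOS):
    """Return one compare URL per repository, in list order."""
    return [compare_url(repo, previous_tag, new_tag) for repo in repos]


if __name__ == "__main__":
    # Hypothetical tags: last week's push and this week's push.
    for url in compare_urls("2014.01.07", "2014.01.14"):
        print(url)
```

The printed lines can be pasted into the etherpad as they are.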
Manual tagging and pushing to Stage
Name your tag with the date of the push in the format YYYY.MM.DD.
The following repositories need tagging:
- https://github.com/mozilla/commbadge
- https://github.com/mozilla/fireplace
- https://github.com/mozilla/marketplace-stats
- https://github.com/mozilla/monolith-aggregator
- https://github.com/mozilla/transonic
- https://github.com/mozilla/zamboni
- https://github.com/mozilla/marketplace-operator-dashboard
- https://github.com/mozilla/marketplace-content-tools
There is a script (https://github.com/cvan/tagz/) which can do all that for you. Try:
python tagz.py -r mozilla/commbadge,mozilla/fireplace,mozilla/marketplace-operator-dashboard,mozilla/marketplace-stats,mozilla/monolith-aggregator,mozilla/transonic,mozilla/zamboni,mozilla/marketplace-content-tools -c create -t YYYY.MM.DD
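To avoid typos in the date, the tag name and the command line above can be assembled programmatically. This is a sketch under two assumptions: that the push falls on the first Tuesday after the day you tag, and that `tagz.py` accepts exactly the `-r`, `-c create` and `-t` arguments shown above (nothing else about the script is assumed). The helper names are hypothetical, and the command is only built, not executed:

```python
import datetime


def push_tag(tagging_day):
    """Return the YYYY.MM.DD tag for the first Tuesday after tagging_day."""
    # Monday is weekday 0, so Tuesday is weekday 1; "or 7" skips to next
    # week when tagging_day is itself a Tuesday.
    days_ahead = (1 - tagging_day.weekday()) % 7 or 7
    push_day = tagging_day + datetime.timedelta(days=days_ahead)
    return push_day.strftime("%Y.%m.%d")


def tagz_command(tag, repos):
    """Build the tagz.py argument list for creating `tag` in every repo."""
    return ["python", "tagz.py", "-r", ",".join(repos), "-c", "create", "-t", tag]


if __name__ == "__main__":
    tag = push_tag(datetime.date(2014, 1, 10))   # a Friday, used as an example
    print(" ".join(tagz_command(tag, ["mozilla/fireplace", "mozilla/zamboni"])))
```

Pass the full repository list from above in place of the two-repo example.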
Next you'll need to update the staging servers:
- Go to jenkins (https://deploy.mozaws.net/job/marketplace-STAGE/); it is restricted, so you'll need a VPN+LDAP login to get there.
- Push items by choosing "Build with Parameters" (on the left, above "Build History" -- if you don't see that option, you need to ask Ops to change your permissions).
- Enter the tag to be deployed where it says "DeployRef" -- note that the tag must be the same for all repos.
What it means to steward the push
While the ideal is for pushes to be uneventful, that's not always the case. The push hero isn't expected to single-handedly resolve any issues, but they are expected to work with Ops to identify issues and get the proper help (most likely the relevant commit author). It's important that this happens as part of the push to Stage, rather than on Tuesday as part of the push to Production. That's part of the point of having a Staging site.
Important note about data migrations: in our system, as with any system that isn't under immediate control (due to load-balancing or caching), we have to ensure that a push doesn't incur unreasonable system downtime. Data migrations are a known risk in this regard. If a migration on Stage shows that an unacceptable lag in performance will occur, the relevant commit should be refactored so that the to-be-pushed code does not rely on to-be-pushed data changes -- and Ops will need to know that updating the database servers must be handled differently.
For example:
Task: rename a column in a 4-million row table. This can take minutes, and can render the system unresponsive during that time. To do this without noticeable downtime:
1. Add a new column with the new column name
2. Copy data from old column to new column via SQL script
3. Push the code that uses the new column
4. Update any rows that may have been added during previous steps
5. Remove the old column
... with Ops performing steps 1, 2, and 5 on each database server individually (by taking it out of rotation, running updates, and then putting it back into rotation to catch up via replication). We don't want to surprise Ops with this on Tuesday; we'd want to identify this if not during tagging, at least after the push to Stage.
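The five steps can be traced end to end on a toy table. The sketch below uses an in-memory SQLite database purely so that it is self-contained; it is not the production database, and the table and column names (`apps`, `title`, `name`) are hypothetical. The old column is only dropped when the bundled SQLite is 3.35 or newer, since older versions lack `DROP COLUMN`:

```python
import sqlite3


def rename_column_in_steps(conn):
    """Rename apps.title to apps.name without a single blocking rename."""
    # Step 1: add the new column alongside the old one.
    conn.execute("ALTER TABLE apps ADD COLUMN name TEXT")
    # Step 2: copy existing data from the old column to the new one.
    conn.execute("UPDATE apps SET name = title")
    # Old code is still live here and writes only the old column.
    conn.execute("INSERT INTO apps (title) VALUES (?)", ("three",))
    # Step 3: push the code that uses the new column (the SELECT below).
    # Step 4: backfill rows that were added during the previous steps.
    conn.execute("UPDATE apps SET name = title WHERE name IS NULL")
    # Step 5: remove the old column (needs SQLite 3.35+).
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        conn.execute("ALTER TABLE apps DROP COLUMN title")
    return [row[0] for row in conn.execute("SELECT name FROM apps ORDER BY id")]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE apps (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO apps (title) VALUES (?)", [("one",), ("two",)])
    return rename_column_in_steps(conn)


if __name__ == "__main__":
    print(demo())
```

The backfill in step 4 only catches rows that were added in the meantime, as in the list above; rows whose old column was edited after step 2 would need a broader comparison.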
Pushing
Pushes happen Tuesday at 11am. There is an etherpad made each week named mkt-YYYY-MM-DD (an example: https://etherpad.mozilla.org/amo-2014-01-14). The push will mostly follow this etherpad, and any special notes should be in that pad.
You might want to add a meeting for yourself at the push time so that people won't try to schedule you for other meetings.
To push:
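1. The release manager (you), QA (krupa), and Ops (jason or jlaz) should be in contact on IRC and in the Marketplace vidyo room.
2. Once everyone gives the thumbs up, Ops will push the actual code using jenkins. Ops will push the projects in order (same order you did for stage). Talk on vidyo if there are any questions.
3. The IRC bots will say when the pushes are done.
4. Once the push is done, QA will verify changes. Work with them to flip any waffle switches or make any adjustments.
5. Whilst QA is reviewing...
   - Review sentry (http://sentry.mktmon.services.phx1.mozilla.com/) errors; there will often be errors during the push, but afterwards there should be nothing to worry about;
   - Review the graphite graphs on the dashboard (http://dashboard.mktadm.ops.services.phx1.mozilla.com/) to see if anything looks amiss;
   - You can look at nagios as well if you like on the dashboard, but Ops will do that.
6. If QA or Ops finds something that needs fixing immediately:
   - Write a patch (or find someone who can);
   - Cherry-pick the patch onto the previous tag (example: https://gist.github.com/anonymous/8787796);
   - Go back to step 2 until QA is happy. OK, until QA is satisfied, then.
7. Once QA, Ops, and you all sign off, the push is over. Record the time it took at the bottom of the etherpad.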
After the Push
- Create a new etherpad for the next week using the push template (https://wiki.mozilla.org/Marketplace/Templates#Push_Etherpad).
- Edit the topic in the secret channel to point to the new etherpad.
- Remind next week's release manager they are on the hook! :)
- Send an email to the public mailing list (dev-marketplace@lists.mozilla.org) saying how the push went. If there was reason for multiple pushes, or anything that could be improved or fixed (e.g. a dodgy migration), let the team know using this handy template (https://wiki.mozilla.org/Marketplace/Templates#Push_Email).
Release manager rotation
- ddurst
(There will be exceptions to the rotation. No problem, we just need to be aware of them and plan for them.)