CloudServices/Loop/Deploy

For Fx 34/35, we are choosing Option B, until we reach a point where we have a history of introducing bugs due to this architecture.
= Deploying flow =
See full version at: https://old.etherpad-mozilla.org/deploy-release-process
== How does a release get to production? ==
* QA/DEV creates a stage deployment ticket and adds dependencies and blockers
** (e.g. "Loop — Please deploy loop-server 0.13.0 to Stage")
* DEV makes a tag
** Here we should try to make sure that the changelog has all Resolved/Fixed bugs going into this release.
* OPS deploys build to stage
* QA validates that the release was actually deployed to stage by OPS
* OPS sets the stage bug to fixed as soon as the build is deployed (yes, this is fine)
* QA runs verification steps, quick tests, and loadtests
** (after having set a window with partners)
* QA sets the bug to verified as soon as it is OK to deploy to Production
* QA creates a deployment bug for production and adds dependencies and blockers
* OPS deploys the release to production and sets the bug to Resolved/Fixed
* QA sets the bug to verified as soon as the deployment has been verified.
** (This may include verification by the Loop client and QA teams)
* OPS should be monitoring the release for a specific period of time
** (to watch out for unforeseen issues and side-effects)
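The flow above can be read as an ordered list of steps, each with an owner. The following is a minimal sketch of that reading, not an existing tool: the step names, the <code>OWNER</code> table and both helper functions are hypothetical, and the stage steps are slightly condensed (the QA check that the build landed on stage is folded into the OPS deploy step).

```python
from enum import Enum


class Step(Enum):
    # Hypothetical step names; the order follows the list above.
    STAGE_TICKET_FILED = "stage deployment ticket created"
    TAGGED = "release tagged"
    STAGE_DEPLOYED = "deployed to stage, stage bug set to fixed"
    STAGE_VERIFIED = "stage verified (verification steps, quick tests, loadtests)"
    PROD_TICKET_FILED = "production deployment bug created"
    PROD_DEPLOYED = "deployed to production, bug set to Resolved/Fixed"
    PROD_VERIFIED = "production deployment verified"
    MONITORING = "release monitored for unforeseen issues"


# Team responsible for each step, as described in the flow above.
OWNER = {
    Step.STAGE_TICKET_FILED: "QA/DEV",
    Step.TAGGED: "DEV",
    Step.STAGE_DEPLOYED: "OPS",
    Step.STAGE_VERIFIED: "QA",
    Step.PROD_TICKET_FILED: "QA",
    Step.PROD_DEPLOYED: "OPS",
    Step.PROD_VERIFIED: "QA",
    Step.MONITORING: "OPS",
}

# Enum members iterate in definition order, which is the flow order.
ORDER = list(Step)


def next_step(current):
    """Return the step that follows `current`, or None at the end of the flow."""
    i = ORDER.index(current)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None


def can_deploy_to_production(done):
    """True only if every step before the production deploy is in `done`."""
    required = ORDER[: ORDER.index(Step.PROD_DEPLOYED)]
    return all(step in done for step in required)
```

For example, <code>OWNER[next_step(Step.TAGGED)]</code> names the team that acts once the tag exists, and <code>can_deploy_to_production</code> stays false until the stage ticket is verified and the production bug has been created.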
== What do we do in the following cases? ==
* A bug is found during the stage validation
** DEV fixes the issue, makes a new minor release 0.13.1 and creates a new deployment request bug (e.g. "Loop — Please deploy loop-server 0.13.1 to Stage")
*** (Do we morph the existing one? jbonacci said no the last time I did.) So we close the current deployment bug and create a new one. OK.
** Or OPS fixes the issue and makes a new minor release or a re-release of the same build. We have had circumstances where the change is OPS-specific, not DEV-specific.
** QA closes the previous stage ticket as invalid and the story restarts with the new bug
*** I am pondering this idea for minor vs. major releases. On the one hand, having a history in the ticket (12.0, 12.1, 12.2) is good. On the other hand, the ticket can get too large (see Loop-Server 12.2)...
* A bug is found in production
** DEV fixes the issue and makes a new minor release from the production release (e.g. 0.12.3)
** DEV creates a stage bug (e.g. "Loop — Please deploy loop-server 0.12.3 to Stage"). Well, QA should create the Stage ticket with information gathered from DEV. But either way works for me...
*** Then same story as a usual release
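In both cases above, the fix ships as the next release number in the same series (0.13.0 becomes 0.13.1, the 0.12.x production release becomes 0.12.3) and a fresh stage ticket is filed. A small sketch of that bookkeeping, assuming three-part numeric versions; the function names are hypothetical and the ticket title simply copies the wording of the examples on this page:

```python
def next_fix_release(version):
    """Bump the last component of a three-part version, e.g. 0.13.0 -> 0.13.1.

    This page calls such a release a "minor release".
    """
    major, minor, fix = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{fix + 1}"


def stage_ticket_title(version):
    """Build a stage ticket title in the style used on this page.

    \u2014 is the dash character that appears in the example titles.
    """
    return f"Loop \u2014 Please deploy loop-server {version} to Stage"


if __name__ == "__main__":
    fixed = next_fix_release("0.13.0")
    print(stage_ticket_title(fixed))
```

A version string that does not have exactly three numeric parts raises <code>ValueError</code>, which is probably the desired behavior for a release script.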
== Who gives the green light when prod is ready to be updated? ==
For instance, we recently had a bug in production even though stage validation had been passed by QA.
In this case, it's a bit tricky to know whether we should deploy to production or not.
In order to avoid things going wrong, should we wait for QA to give the green light again before pushing something new to production?
Keep in mind that waiting for it can block the resolution of a problem.
* As soon as the stage ticket has been verified and the production bug has been created
** Then OPS has a QA green light and can start the deployment.
** Right. And issues specific to Production are a special case anyway. If tests pass in Stage but something goes wrong in Production, then we need to add the fix to both. If there is a Production-specific issue (that we would never see in Stage), then we should approach it on a case-by-case basis. There are cases where we have had to push a special, urgent or break-fix change for other Production environments. It's not something we should consider "normal procedure" though, because it requires Stage and Prod to be out of sync.
* There is the idea of a code change that always needs to go through this process. This is DEV driven.
* Then, there is the idea of a service-level change that always needs to go through this process. This should be OPS driven.
* Then, sometimes we have a real emergency in Production that requires a change (DEV or OPS). We have not always been good about the process for this case.
Examples:
1. The service is broken and needs a code change
2. Server issues like stack size, cpu/memory/disk issues, config issues, DB issues