CloudServices/Loop/Deploy: Difference between revisions

Revision as of 10:37, 4 January 2016

4,044 bytes added, 4 January 2016

ADDING OPS, DEV, QA DEPLOYING FLOW.

Natim

@@ Line 152: / Line 152: @@
 For Fx 34/35, we are choosing Option B until we reach a point where we have a history of injecting bugs due to this architecture.
+= Deploying flow =
+See full version at: https://old.etherpad-mozilla.org/deploy-release-process
+== How does a release get to production? ==
+* QA/DEV creates a stage deployment ticket and adds dependencies and blockers
+** (e.g. "Loop — Please deploy loop-server 0.13.0 to Stage")
+* DEV makes a tag
+** Here we should try to make sure that the changelog has all Resolved/Fixed bugs going into this release.
+* OPS deploys the build to stage
+* QA validates that OPS has deployed the release to stage
+* OPS sets the stage bug to fixed as soon as the release is deployed (yes, this is fine)
+* QA runs verification steps, quick tests, and load tests
+** (after having set a window with partners)
+* QA sets the bug to verified as soon as it is OK to deploy to Production
+* QA creates a deployment bug for production and adds dependencies and blockers
+* OPS deploys the release to production and sets the bug to Resolved/Fixed
+* QA sets the bug to verified as soon as the deployment has been verified.
+** (This may include verification by the Loop client and QA teams)
+* OPS should be monitoring the release for a specific period of time
+** (to watch out for unforeseen issues and side-effects)
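The flow above can be read as a linear sequence of steps, each owned by one role. The following is only a minimal sketch of that reading; the names `Step`, `OWNER`, and `next_step` are hypothetical and do not come from any Loop tooling.

```python
from enum import Enum


class Step(Enum):
    """Release steps in the order listed above (names are illustrative)."""
    STAGE_TICKET_CREATED = 1   # stage deployment ticket filed
    TAGGED = 2                 # tag made, changelog checked
    DEPLOYED_TO_STAGE = 3      # stage bug set to fixed
    STAGE_VERIFIED = 4         # stage bug set to verified
    PROD_TICKET_CREATED = 5    # production deployment bug filed
    DEPLOYED_TO_PROD = 6       # production bug set to Resolved/Fixed
    PROD_VERIFIED = 7          # production bug set to verified
    MONITORING = 8             # release watched for side-effects


# Role responsible for reaching each step, as described in the list above.
OWNER = {
    Step.STAGE_TICKET_CREATED: "QA/DEV",
    Step.TAGGED: "DEV",
    Step.DEPLOYED_TO_STAGE: "OPS",
    Step.STAGE_VERIFIED: "QA",
    Step.PROD_TICKET_CREATED: "QA",
    Step.DEPLOYED_TO_PROD: "OPS",
    Step.PROD_VERIFIED: "QA",
    Step.MONITORING: "OPS",
}


def next_step(current):
    """Return the step that follows `current`, or None after the last one."""
    steps = list(Step)  # Enum members iterate in definition order
    index = steps.index(current)
    return steps[index + 1] if index + 1 < len(steps) else None
```

For example, `next_step(Step.STAGE_VERIFIED)` yields `Step.PROD_TICKET_CREATED`, which `OWNER` assigns to QA.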
+== What do we do in the following cases? ==
+* A bug is found during the stage validation
+** DEV fixes the issue, makes a new minor release (0.13.1), and creates a new deployment request bug (e.g. "Loop — Please deploy loop-server 0.13.1 to Stage")
+*** (Do we morph the existing one? jbonacci said no the last time I did.) So we close the current deployment bug and create a new one. OK.
+** OPS fixes the issue and makes a new minor release or a re-release of the same build. We have had circumstances where the change is OPS-specific, not DEV-specific.
+** QA closes the previous stage ticket as invalid, and the story restarts with the new bug
+*** I am pondering this idea for minor vs. major releases. On the one hand, having a history in the ticket (12.0, 12.1, 12.2) is good. On the other hand, the ticket can get too large (see Loop-Server 12.2)...
+* A bug is found in production
+** DEV fixes the issue and makes a new minor release from the production release (e.g. 0.12.3)
+** DEV creates a stage bug (e.g. "Loop — Please deploy loop-server 0.12.3 to Stage"). Well, QA should create the Stage ticket with information gathered from DEV. But either way works for me...
+*** Then the same story as a usual release
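Both cases above come down to the same two actions: bump the last component of the release number, then close the old stage ticket and restart with a new one. The sketch below illustrates this under assumptions: the ticket is a plain dictionary with hypothetical field names, and no real bug tracker API is involved.

```python
def next_fix_release(version):
    """Return the next fix release, e.g. the release after 0.13.0 is 0.13.1."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def restart_stage_cycle(old_ticket):
    """Close the old stage ticket as invalid and open one for the fix release.

    `old_ticket` is a hypothetical dict with a "version" key; the field
    names and status values are illustrative only.
    """
    closed = dict(old_ticket, status="RESOLVED", resolution="INVALID")
    new_ticket = {
        "version": next_fix_release(old_ticket["version"]),
        "environment": "stage",
        "status": "NEW",
        "resolution": None,
    }
    return closed, new_ticket
```

The existing ticket is closed rather than morphed, which matches the decision recorded above; the original dictionary is left unmodified so the history of the old ticket stays intact.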
+== Who gives the green light when prod is ready to be updated? ==
+For instance, we recently had a bug in production even though QA had passed stage validation.
+In this case, it's a bit tricky to know whether we should deploy to production or not.
+In order to avoid things going wrong, should we wait for QA to give the green light again before pushing something new to production?
+Keep in mind that this can block the resolution of a problem.
+* As soon as the stage ticket has been verified and the production bug has been created
+** Then OPS has the QA green light and can start the deployment.
+** Right. And issues specific to Production are a special case anyway. If tests pass in Stage but something goes wrong in Production, then we need to add the fix to both. If there is a Production-specific issue (that we would never see in Stage), then we should approach it on a case-by-case basis. There are cases where we have had to push something special/specific/urgent/break-fix for other Production environments. It's not something we should consider "normal procedure" though, because it requires Stage and Prod to be out of sync.
+* There is the idea of a code change that always needs to go through this process. This is DEV driven.
+* Then, there is the idea of a service-level change that always needs to go through this process. This should be OPS driven.
+* Then, sometimes we have a real emergency in Production that requires a change (DEV or OPS). We have not always been good about the process for this case.
+Examples:
+* The service is broken and needs a code change
+* Server issues like stack size, CPU/memory/disk issues, config issues, DB issues
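The green-light rule and the three kinds of change described in this section can be summarised in a few lines. This is a sketch only: the function names and the `change_kind` labels are hypothetical, and the emergency case is deliberately left as a case-by-case decision, as the text says.

```python
# Who drives each kind of change, per the bullets above (labels are illustrative).
DRIVER = {
    "code": "DEV",       # a code change always goes through the process
    "service": "OPS",    # a service-level change always goes through the process
}


def has_green_light(stage_ticket_verified, prod_bug_created):
    """OPS may start the production deployment only when both conditions hold."""
    return stage_ticket_verified and prod_bug_created


def who_drives(change_kind):
    """Return the driving role, or flag the change for case-by-case handling."""
    return DRIVER.get(change_kind, "case-by-case (DEV or OPS)")
```

With this reading, `has_green_light(True, False)` is false: a verified stage ticket alone is not enough until the production bug exists.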