Identity/DevOps: Difference between revisions

From MozillaWiki
= Roadmap =
== Q1 2013 roadmap ==
In Q1 2013 the Identity DevOps team will be moving services out of the physical datacenter SCL2 and into Amazon Web Services.

2/4
* 1/28 - 2/4 : roadmap defined and signed off
* 1/28 - 2/4 : technology stack justification written and shared

2/11
* 2/4 - 2/11 : chef server built and working
* 2/4 - 2/11 : established 1 region VPC

2/18
* 2/4 - 2/18 : completed a mini provisioning test and plan
* 2/11 - 2/18 : written the webhead chef provisioning logic
* 2/11 - 2/18 : written the nginx chef provisioning logic and carried over existing nginx routing logic
* '''milestone : chef can fully provision webheads'''
 
2/25
* 2/18 - 2/25 : zeus routing logic is converted into nginx logic
* 2/18 - 2/25 : written the nagios chef provisioning logic
* 2/18 - 2/25 : basic webhead nagios checks created
* '''milestone : admin can see the monitored availability and performance of the webhead'''
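
The 2/25 items call for "basic webhead nagios checks" without saying what they test. Below is a hypothetical sketch, assuming the check simply fetches the webhead's main page over HTTP and compares the response time against illustrative thresholds (1 s warning, 3 s critical). It follows the standard Nagios plugin conventions: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and a single status line with performance data after the `|`. The function names, thresholds and default URL are placeholders, not code from the identity-ops repository.

```python
import time
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(status, elapsed, warn, crit):
    """Map an HTTP status code and response time (seconds) to a Nagios state."""
    if status != 200 or elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def format_output(state, status, elapsed, warn, crit):
    """One status line; Nagios treats the text after '|' as performance data."""
    return (f"HTTP {STATE_NAMES[state]} - status {status}, {elapsed:.3f}s"
            f" | time={elapsed:.3f}s;{warn};{crit};0")


def check(url, warn=1.0, crit=3.0, timeout=10.0):
    """Fetch url once and return (exit_code, output_line)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status
        status = err.code
    except ValueError as err:  # malformed URL: the check itself is misconfigured
        return UNKNOWN, f"HTTP UNKNOWN - {err}"
    except OSError as err:  # connection refused, DNS failure, timeout
        return CRITICAL, f"HTTP CRITICAL - {err}"
    elapsed = time.monotonic() - start
    state = classify(status, elapsed, warn, crit)
    return state, format_output(state, status, elapsed, warn, crit)


def main(argv):
    """Plugin entry point: print the status line and return the exit code."""
    # Hypothetical default target; pass the webhead URL as the first argument.
    target = argv[1] if len(argv) > 1 else "http://localhost/"
    code, line = check(target)
    print(line)
    return code
```

Installed as a plugin, the script would end with `sys.exit(main(sys.argv))` so Nagios receives the exit code. Pointing the same check at the ELB address instead of a single webhead would cover the 3/4 "nagios checks against the ELB" item.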
 
3/4
* 2/25 - 3/4 : ELB is setup and sending traffic to the webhead
* 2/18 - 2/25 : basic webhead nagios checks against the ELB created
* '''milestone : internet client can fetch persona main page from AWS traversing ELB'''
 
3/11
* 3/4 - 3/11 : written swebhead chef provisioning logic
* 3/4 - 3/11 : ELB configured for swebhead cluster
* 3/4 - 3/11 : written db chef provisioning logic
* 3/4 - 3/11 : ELB configured for db cluster
* 3/4 - 3/11 : written keysign chef provisioning logic
* 3/4 - 3/11 : ELB configured for keysign cluster
* 3/4 - 3/11 : established VPN to PHX1
* 3/4 - 3/11 : basic swebhead db and keysign nagios checks created
* '''milestone : internet client can login using persona in AWS'''
 
3/18
* 3/11 - 3/18 : written bigtent and squid proxy chef provisioning logic
* 3/11 - 3/18 : ELB configured for bigtent and squid clusters
* 3/11 - 3/18 : basic bigtent and squid proxy nagios checks created
* '''milestone : internet client can login with a yahoo address using yahoo bigtent'''
 
3/25
* 3/18 - 3/25 : load tested/validated that region 1 is ready for prod traffic
* 3/18 - 3/25 : full security group logic is in place replicating existing physical network
* '''milestone : security : network security is in place and all tiers use proxies for communication'''
* '''milestone : load testing complete for region 1'''
 
4/1
* 3/25 - 4/1 : moved master from PHX1 to region 1 AWS
* dynect is changed to balance between AWS region 1 and PHX1. SCL2 sits running as a backup
* '''milestone : all db writes are now going to AWS'''
* '''milestone : AWS region 1 is live in production, SCL2 no longer receives traffic'''
 
=== State at end of Q1 2013 ===
* SCL2 is dark
* Production is running off of 1 AWS region and 1 physical datacenter
* runbooks for AWS deployments & core troubleshooting have been developed
* The staging environment has been moved to AWS
** key differences between production and staging AWS areas: server localization & access
* monitoring: existing monitoring minus some cepmon rate-of-change monitors has been moved into a new nagios deployment in AWS
* alerting: existing alerting, minus cepmon-triggered alerts, has been migrated
 
== Q2 2013 roadmap ==
In Q2 DevOps will be bringing up the second AWS region and completing the remaining tasks needed for a truly highly available architecture, ready to graduate from beta.
 
4/8
* 3/25 - 4/8 : spun up region 2 AWS
* 3/25 - 4/8 : determined how to do log processing (logstash?) and pump data into zenoss
 
4/15
* 4/8 - 4/15 : load tested/validated that region 2 is ready for prod traffic
* 4/1 - 4/15 : written auto provisioning logic to call AWS and spin up instances, assign them roles, and pass them to chef for provisioning
* dynect is changed to balance between AWS region 1 and AWS region 2
* '''milestone : persona is fully hosted in AWS multi-region'''
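
The 4/15 item describes auto provisioning as three steps: call AWS to start instances, give each one a role, and hand it to Chef. A minimal sketch of that hand-off, assuming chef-client and its validation key are already baked into the AMI so that first-boot user-data only needs to run `chef-client` with the role's run list. It only builds the arguments for the EC2 `RunInstances` call, using boto3 parameter names (a modern substitution, not necessarily what the team used in 2013), and does not contact AWS. The role list, the `Role` tag key, the instance type, and the AMI and subnet IDs are hypothetical placeholders.

```python
import json

# Hypothetical role names, taken from the tiers mentioned in this roadmap.
KNOWN_ROLES = {"webhead", "swebhead", "keysign", "dbwriter", "db", "bigtent", "proxy", "nagios"}


def chef_user_data(role):
    """First-boot shell script: converge the node with the given Chef role."""
    return (
        "#!/bin/bash\n"
        "set -e\n"
        f"chef-client -r 'role[{role}]'\n"
    )


def build_launch_request(role, ami_id, subnet_id, instance_type="m1.small", count=1):
    """Build keyword arguments for boto3's EC2 client run_instances()."""
    if role not in KNOWN_ROLES:
        raise ValueError(f"unknown role: {role}")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "SubnetId": subnet_id,
        "UserData": chef_user_data(role),
        # Tag the instance with its role so inventory and monitoring can find it.
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Role", "Value": role}],
        }],
    }


if __name__ == "__main__":
    request = build_launch_request("webhead", "ami-00000000", "subnet-00000000")
    print(json.dumps(request, indent=2))
```

With AWS credentials configured, the request could be submitted as `boto3.client("ec2").run_instances(**request)`. Recording the role both as an instance tag and in the Chef run list keeps the AWS view and the Chef view of each node in agreement; that is one reasonable design choice, not necessarily the one identity-ops adopted.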
 
4/30
* Final day to turn down servers at SCL2
 
5/13
* Modify DB architecture to remove single point of failure (single write master)
** This is '''not''' re-evaluating our choice of persistence. It's just making our existing architecture truly fault-tolerant and highly available.
* Add more performance monitoring to enable later platform improvements
** There are many ways we could further scale. To make intelligent choices, we need to gather information about the performance and behavior of our servers.
 
== Beyond  ==
[https://github.com/mozilla/identity-ops/wiki/Operational-Improvements-List Additional Operational Improvements]

Latest revision as of 17:59, 20 June 2014

Documentation

All DevOps documentation can be found in the documentation section of our identity-ops GitHub repository.

Source Code

All DevOps tools and provisioning logic can be found in our identity-ops GitHub repository.

Roadmap

Q1/Q2 2013 roadmap

In Q1 2013 the Identity DevOps team will be moving services out of the physical datacenter SCL2 and into Amazon Web Services.

  • 3/19 : VPC outbound internet access via NAT instance
  • 3/21 : Webhead AMI
  • 3/22 : Zeus routing logic converted to nginx
  • 3/26 : keysigner AMI
  • 3/27 : dbwriter AMI
  • 3/29 : Cross DC VPN for DBs
  • 4/2 : db AMI
  • 4/3 : Handoff environment to QA
  • 4/5 : Nagios monitoring
  • 4/9 : Create region 2
  • 4/10 : QA approval of region 1
  • 4/30 : Final day to turn down servers at SCL2