| | = Documentation = |
| | All DevOps documentation can be found in our [https://github.com/mozilla/identity-ops/blob/master/docs/ identity-ops GitHub repository documentation] section. |
| | |
| | = Source Code = |
| | All DevOps tools and provisioning logic can be found in our [https://github.com/mozilla/identity-ops/ identity-ops GitHub repository]. |
| | |
| = Roadmap = | | = Roadmap = |
| == Q1 2013 roadmap == | | == Q1/Q2 2013 roadmap == |
| In Q1 2013 the Identity DevOps team will be moving services out of the physical datacenter SCL2 and into Amazon Web Services. | | In Q1 2013 the Identity DevOps team will be moving services out of the physical datacenter SCL2 and into Amazon Web Services. |
|
| |
|
| 2/4
| | * 3/19 : VPC outbound internet access via NAT instance |
| * 1/28 - 2/4 : roadmap defined and signed off | | * 3/21 : Webhead AMI |
| * 1/28 - 2/4 : technology stack justification written and shared
| | * 3/22 : Zeus routing logic converted to nginx |
| | | * 3/26 : keysigner AMI |
| 2/11
| | * 3/27 : dbwriter AMI |
| * 2/4 - 2/11 : chef server built and working
| | * 3/29 : Cross DC VPN for DBs |
| * 2/4 - 2/11 : established 1 region VPC
| | * 4/2 : db AMI |
| | | * 4/3 : Handoff environment to QA |
| 2/18
| | * 4/5 : Nagios monitoring |
| * 2/4 - 2/18 : completed a mini provisioning test and plan | | * 4/9 : Create region 2 |
| * 2/11 - 2/18 : written the webhead chef provisioning logic | | * 4/10 : QA approval of region 1 |
| * 2/11 - 2/18 : written the nginx chef provisioning logic and carried over existing nginx routing logic
| | * 4/30 : Final day to turn down servers at SCL2 |
| * '''milestone : chef can fully provision webheads'''
| |
| | |
| 2/25
| |
| * 2/18 - 2/25 : zeus routing logic is converted into nginx logic
| |
| * 2/18 - 2/25 : written the nagios chef provisioning logic
| |
| * 2/18 - 2/25 : basic webhead nagios checks created
| |
| * '''milestone : admin can see the monitored availability and performance of the webhead'''
| |
| | |
| 3/4
| |
| * 2/25 - 3/4 : ELB is set up and sending traffic to the webhead | |
| * 2/18 - 2/25 : basic webhead nagios checks against the ELB created
| |
| * '''milestone : internet client can fetch persona main page from AWS traversing ELB'''
| |
| | |
| 3/11
| |
| * 3/4 - 3/11 : written swebhead chef provisioning logic | |
| * 3/4 - 3/11 : ELB configured for swebhead cluster | |
| * 3/4 - 3/11 : written db chef provisioning logic | |
| * 3/4 - 3/11 : ELB configured for db cluster
| |
| * 3/4 - 3/11 : written keysign chef provisioning logic
| |
| * 3/4 - 3/11 : ELB configured for keysign cluster | |
| * 3/4 - 3/11 : established VPN to PHX1
| |
| * 3/4 - 3/11 : basic swebhead db and keysign nagios checks created | |
| * '''milestone : internet client can login using persona in AWS'''
| |
| | |
| 3/18
| |
| * 3/11 - 3/18 : written bigtent and squid proxy chef provisioning logic
| |
| * 3/11 - 3/18 : ELB configured for bigtent and squid clusters
| |
| * 3/11 - 3/18 : basic bigtent and squid proxy nagios checks created
| |
| * '''milestone : internet client can login with a yahoo address using yahoo bigtent'''
| |
| | |
| 3/25
| |
| * 3/18 - 3/25 : load tested/validated that region 1 is ready for prod traffic
| |
| * 3/18 - 3/25 : full security group logic is in place replicating existing physical network
| |
| * '''milestone : security : network security is in place and all tiers use proxies for communication'''
| |
| * '''milestone : load testing complete for region 1'''
| |
| | |
| 4/1
| |
| * 3/25 - 4/1 : moved master from PHX1 to region 1 AWS
| |
| * dynect is changed to balance between AWS region 1 and PHX1. SCL2 sits running as a backup
| |
| * '''milestone : all db writes are now going to AWS'''
| |
| * '''milestone : AWS region 1 is live in production, SCL2 no longer receives traffic'''
| |
| | |
| === State at end of Q1 2013 ===
| |
| * SCL2 is dark
| |
| * Production is running off of 1 AWS region and 1 physical datacenter
| |
| * runbooks for AWS deployments & core troubleshooting have been developed
| |
| * The staging environment has been moved to AWS
| |
| ** key differences between production and staging AWS areas: server localization & access
| |
| * monitoring: existing monitoring, minus some cepmon rate-of-change monitors, has been moved into a new nagios deployment in AWS
| |
| * alerting: existing alerting, minus cepmon-triggered alerts, has been migrated | |
| | |
| == Q2 2013 roadmap ==
| |
| In Q2, DevOps will be bringing up the second AWS region and executing remaining tasks to get us to a truly highly available architecture, ready to graduate from beta.
| |
| | |
| 4/8
| |
| * 3/25 - 4/8 : spun up region 2 AWS
| |
| * 3/25 - 4/8 : determined how to do log processing (logstash?) and pump data into zenoss
| |
| | |
| 4/15
| |
| * 4/8 - 4/15 : load tested/validated that region 2 is ready for prod traffic | |
| * 4/1 - 4/15 : written auto provisioning logic to call AWS and spin up instances, assign them roles, and pass them to chef for provisioning
| |
| * dynect is changed to balance between AWS region 1 and AWS region 2 | |
| * '''milestone : persona is fully hosted in AWS multi-region'''
| |
| | |
| 4/30 | |
| * Final day to turn down servers at SCL2
| |
| | |
| 5/13
| |
| * Modify DB architecture to remove single point of failure (single write master)
| |
| ** This is '''not''' re-evaluating our choice of persistence. It's just making our existing architecture truly fault-tolerant and highly available.
| |
| * Add more performance monitoring to enable later platform improvements
| |
| ** There are many ways we could further scale. To make intelligent choices, we need to gather information about the performance and behavior of our servers.
| |
| | |
| == Beyond ==
| |
| [https://github.com/mozilla/identity-ops/wiki/Operational-Improvements-List Additional Operational Improvements]
| |