Taskcluster/Monitoring/Services

From MozillaWiki
Revision as of 12:01, 18 September 2015 by SelenaDeckelmann

Service Tier Definitions

  • Tier 1: Required for the TaskCluster Platform to function
  • Tier 2: Provides insight into operations of the TaskCluster Platform
  • Tier 3: External infrastructure whose outages cause task failures

Services in each tier

Tier 1

  • AWS (us-east-1, us-west-1/2)
  • Heroku
  • Tutum
  • Azure
  • DockerHub (moving to AWS S3 for primary automation)
  • Pulse / CloudAMQP
  • Mozilla LDAP (not yet, soon)

Tier 2

  • Papertrail
  • Influx

Tier 3

  • Hg.mozilla.org
  • git.mozilla.org
  • github.com
  • VPN -> Balrog
  • AWS (anything not covered by Tier 1)
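
The tier lists above could be captured as a small lookup table so that tooling can ask which tier a service belongs to. The following is a minimal sketch, not an existing TaskCluster component: the key spellings, the `tier_of` and `services_in` helpers, and the reading of "AWS not in Tier 1" as "any AWS region other than the three listed" are all assumptions made for illustration. Mozilla LDAP is left out because the list marks it as "not yet, soon".

```python
from enum import IntEnum


class Tier(IntEnum):
    REQUIRED = 1   # required for TaskCluster Platform function
    INSIGHT = 2    # insights into operations of the platform
    EXTERNAL = 3   # external infra that causes task failures


# Service names come from the lists above; the key spellings are hypothetical.
SERVICE_TIERS = {
    "aws:us-east-1": Tier.REQUIRED,
    "aws:us-west-1": Tier.REQUIRED,
    "aws:us-west-2": Tier.REQUIRED,
    "heroku": Tier.REQUIRED,
    "tutum": Tier.REQUIRED,
    "azure": Tier.REQUIRED,
    "dockerhub": Tier.REQUIRED,
    "pulse": Tier.REQUIRED,
    "papertrail": Tier.INSIGHT,
    "influx": Tier.INSIGHT,
    "hg.mozilla.org": Tier.EXTERNAL,
    "git.mozilla.org": Tier.EXTERNAL,
    "github.com": Tier.EXTERNAL,
    "balrog": Tier.EXTERNAL,
}


def tier_of(service):
    """Return the Tier for a service name, or None if it is not listed."""
    if service in SERVICE_TIERS:
        return SERVICE_TIERS[service]
    # Assumption: "AWS not in Tier 1" means any other AWS region.
    if service.startswith("aws:"):
        return Tier.EXTERNAL
    return None


def services_in(tier):
    """Return the explicitly listed services in a tier, sorted by name."""
    return sorted(name for name, t in SERVICE_TIERS.items() if t == tier)
```

Keeping the table in one place means a status page and an alerting rule would agree on what counts as Tier 1.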

Project summary

  • Make an API for events related to infrastructure status
  • Emit pulse messages for events
  • Have an out-of-band status page
  • Consider: pause, stop accepting tasks, or stop scheduling tasks on Tier 1 failures
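
As a rough illustration of how the items above might fit together, the sketch below models a status event, serializes it into a message body with a routing key, and applies the "pause on Tier 1 failure" idea. Everything here is hypothetical: the `StatusEvent` fields, the `infra-status...` routing-key layout, and the `scheduling_action` policy are placeholders, and no real Pulse or TaskCluster API is called.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StatusEvent:
    service: str   # e.g. "heroku" (hypothetical naming)
    tier: int      # 1, 2, or 3, per the tier definitions
    state: str     # "up" or "down"


def to_message(event):
    """Build a (routing_key, json_body) pair for an event.

    The routing-key layout is an assumption for illustration only.
    """
    routing_key = f"infra-status.tier-{event.tier}.{event.service}.{event.state}"
    body = json.dumps(asdict(event), sort_keys=True)
    return routing_key, body


def scheduling_action(events):
    """Return "pause" if any Tier 1 service is down, otherwise "continue"."""
    for event in events:
        if event.tier == 1 and event.state == "down":
            return "pause"
    return "continue"
```

For example, a Papertrail outage (Tier 2) would leave scheduling untouched, while a Heroku outage (Tier 1) would yield "pause". Whether pausing, refusing new tasks, or halting scheduling is the right response is still the open question in the last bullet.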