Devops/monitoring-alerting: Difference between revisions

From MozillaWiki
* For accounts, questions, or suggestions, email jp at mozillafoundation.org


===== Monitoring =====
'''MONITORING TOOLS, SYSTEMS, AND LINKS'''
Mozilla Foundation applications are monitored and measured in a number of systems:
* '''Opsview, a Nagios-based monitoring system with a much friendlier interface.'''
:: [http://opsview.mofoprod.net:3000/status/service?filter=unhandled&order=state_desc&order=host&order=service&includeunhandledhosts=1 Current Unhandled Alerts (Login required)]
:: [http://opsview.mofoprod.net:3000/event Recent Alerts in Opsview]
:: !!!TODO: Add a guide for notification and contact settings


* '''New Relic monitoring ''(Login Required)'''''
:: * Watching Mongo server utilization and metrics
:: * Marking and comparing new and old deployed versions of software
:: !!!TODO: Add a guide for notification and contact settings


::  '''Important New Relic Links'''


* '''Log monitoring with [https://loggins.mofoprod.net Loggins (Kibana) (Login Required)]'''
* '''AWS Infrastructure and Autoscaling Monitoring/Alerting'''
:: * An email group exists to be notified of any autoscaling activities (up or down).  Contact jp at mozillafoundation.org to be added to this list.
:: * CloudWatch, in the AWS console, can monitor many utilization metrics, such as CPU or network usage for an Auto Scaling group, database, server, or ELB.  Few CloudWatch alarms are configured other than those used to trigger scaling.
:: Most AWS infrastructure is monitored via New Relic.  See the side menu options in New Relic for RDS, ELB, EC2, ElastiCache, and other services.

Revision as of 05:06, 31 May 2014

Mozilla Foundation Monitoring & Alerting

===== TLDR =====

  • MONITORING TOOLS, SYSTEMS, AND LINKS

Mozilla Foundation applications are monitored and measured in a number of systems:

  • Opsview, a Nagios-based monitoring system with a much friendlier interface.
* Monitors and alerts when servers in load balancers are unhealthy
* Monitors and alerts on uptime/downtime of overall endpoints, such as https://webmaker.org
* Monitors and alerts on database utilization and downtime.
Important Opsview Links
Public Status Page
Current Unhandled Alerts (Login required)
Recent Alerts in Opsview
!!!TODO: Add a guide for notification and contact settings
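Opsview, like Nagios, can run checks as plugins that report their result through the process exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line status message. To illustrate the endpoint uptime checks listed above, here is a minimal sketch of such a check in Python. It is not one of the checks actually configured in Opsview: the function names and the 2 s / 5 s response-time thresholds are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request

# Nagios-style plugin exit codes, also understood by Opsview.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status_code, elapsed_s, warn_s=2.0, crit_s=5.0):
    """Map an HTTP status and response time to (exit_code, message).

    warn_s and crit_s are hypothetical thresholds, not Opsview defaults.
    """
    if status_code is None:
        return CRITICAL, "CRITICAL - no response"
    if status_code >= 500:
        return CRITICAL, f"CRITICAL - HTTP {status_code}"
    if elapsed_s >= crit_s:
        return CRITICAL, f"CRITICAL - HTTP {status_code} in {elapsed_s:.2f}s"
    if status_code >= 400 or elapsed_s >= warn_s:
        return WARNING, f"WARNING - HTTP {status_code} in {elapsed_s:.2f}s"
    return OK, f"OK - HTTP {status_code} in {elapsed_s:.2f}s"


def check(url, timeout=10.0):
    """Fetch url once and classify the response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        status = None  # DNS failure, refused connection, or timeout
    return classify(status, time.monotonic() - start)


def main(argv):
    """Plugin entry point: print one status line and return the exit code."""
    url = argv[1] if len(argv) > 1 else "https://webmaker.org"
    code, message = check(url)
    print(message)
    return code
```

A standalone plugin script would end with `sys.exit(main(sys.argv))` so the exit code reaches Opsview. Keeping `classify` separate from the network call makes the threshold logic easy to test without hitting a live endpoint.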
  • New Relic monitoring (Login Required)
* Watching application response time on the browser and server side
* Watching database and web server utilization, transactions, timings, and throughput
* Watching load balancer (ELB) metrics
* Performing server-side and client-side tracing of long-running transactions
* Overall endpoint monitoring, such as https://webmaker.org
* Watching cache server utilization and metrics
* Watching Elasticsearch server utilization and metrics
* Watching Mongo server utilization and metrics
* Marking and comparing new and old deployed versions of software
!!!TODO: Add a guide for notification and contact settings
Important New Relic Links
New Relic Dashboards
Recent New Relic Alerts
New Relic Applications Overview
Recent Deployments
Browser / Front-end Performance Overview
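New Relic can only compare new and old deployed versions if each deploy is recorded as a deployment marker. This page does not say how markers are recorded; one option is New Relic's REST API (v2) deployments endpoint. The sketch below assumes that endpoint and an `X-Api-Key` header; the application id, API key, and helper names are hypothetical placeholders, so check the current New Relic API documentation before relying on it.

```python
import json
import urllib.request

# Assumed endpoint: New Relic REST API v2 deployment markers.
API_URL = "https://api.newrelic.com/v2/applications/{app_id}/deployments.json"


def build_deployment_request(app_id, api_key, revision, user=None, description=None):
    """Build (but do not send) a POST request that records a deployment marker."""
    deployment = {"revision": revision}
    if user:
        deployment["user"] = user
    if description:
        deployment["description"] = description
    body = json.dumps({"deployment": deployment}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(app_id=app_id),
        data=body,
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )


def record_deployment(app_id, api_key, revision, **kwargs):
    """Send the marker and return the HTTP status code of the response."""
    req = build_deployment_request(app_id, api_key, revision, **kwargs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A deploy script could call `record_deployment(app_id, api_key, git_sha, user="deployer")` right after a release so the marker lines up with the new version in the Recent Deployments view.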
  • AWS Infrastructure and Autoscaling Monitoring/Alerting
* An email group exists to be notified of any autoscaling activities (up or down). Contact jp at mozillafoundation.org to be added to this list.
* CloudWatch, in the AWS console, can monitor many utilization metrics, such as CPU or network usage for an Auto Scaling group, database, server, or ELB. Few CloudWatch alarms are configured other than those used to trigger scaling.
Most AWS infrastructure is monitored via New Relic. See the side menu options in New Relic for RDS, ELB, EC2, ElastiCache, and other services.
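The scaling alarms and autoscaling email notifications described above could be configured with boto3 roughly as sketched below. This is an assumption about how such a setup might look, not the actual Mozilla Foundation configuration: the group name, the 70% CPU threshold, the 5-minute period, the scaling policy ARN, and the SNS topic (with the email group subscribed to it) are all hypothetical.

```python
def cpu_scale_up_alarm(group_name, policy_arn, threshold=70.0):
    """Keyword arguments for boto3's CloudWatch put_metric_alarm.

    Fires when average CPU across the Auto Scaling group stays above
    threshold for two consecutive 5-minute periods (hypothetical values).
    """
    return {
        "AlarmName": f"{group_name}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],  # the scaling policy to execute
    }


def scaling_notifications(group_name, topic_arn):
    """Keyword arguments for boto3's Auto Scaling put_notification_configuration.

    Publishes scale-up/scale-down events to an SNS topic; an email list
    subscribed to that topic would receive them.
    """
    return {
        "AutoScalingGroupName": group_name,
        "TopicARN": topic_arn,
        "NotificationTypes": [
            "autoscaling:EC2_INSTANCE_LAUNCH",
            "autoscaling:EC2_INSTANCE_TERMINATE",
            "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
            "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
        ],
    }


def apply(group_name, policy_arn, topic_arn):
    """Create the alarm and notification settings using AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    boto3.client("cloudwatch").put_metric_alarm(
        **cpu_scale_up_alarm(group_name, policy_arn)
    )
    boto3.client("autoscaling").put_notification_configuration(
        **scaling_notifications(group_name, topic_arn)
    )
```

Building the parameters as plain dictionaries keeps the alarm definitions reviewable and testable without AWS access; only `apply` talks to AWS.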