Devops/monitoring-alerting: Difference between revisions
Revision as of 05:12, 31 May 2014
Mozilla Foundation Monitoring & Alerting
===== TLDR =====
- Performance and Most Infrastructure Monitoring in New Relic
- New Relic Dashboards are a good way to get info fast (login required)
- Load balancer health, database, and overall application healthchecks in Opsview (login required) (Public Viewport)
- Many dashboards for traffic, metrics, performance, and monitoring on this Dashboard of Dashboards
- For accounts, questions, or suggestions, email jp at mozillafoundation.org
MONITORING TOOLS, SYSTEMS, AND LINKS
- Opsview, a Nagios clone with a much friendlier interface.
  - Monitors and alerts when servers in load balancers are unhealthy
  - Monitors and alerts on uptime/downtime of overall endpoints, such as https://webmaker.org
  - Monitors and alerts on database utilization and downtime
  - Important Opsview Links
    - Public Status Page: http://opsview.mofoprod.net:3000/viewport
    - Current Unhandled Alerts (Login required): http://opsview.mofoprod.net:3000/status/service?filter=unhandled&order=state_desc&order=host&order=service&includeunhandledhosts=1
    - Recent Alerts in Opsview
  - TODO: Add the guide for notifications & contact settings
- New Relic monitoring (Login Required)
  - Watching application response time, both in the browser and on the server side
  - Watching database and web server utilization, transactions, timings, and throughput
  - Watching load balancer (ELB) metrics
  - Performing server-side and client-side tracing of long-running transactions
  - Overall endpoint monitoring, such as https://webmaker.org
  - Watching cache server utilization and metrics
  - Watching Elasticsearch server utilization and metrics
  - Watching Mongo server utilization and metrics
  - Marking and comparing new/old deployed versions of software
  - TODO: Add the guide for notifications & contact settings
  - Important New Relic Links
    - New Relic Dashboards: https://rpm.newrelic.com/accounts/255689/custom_dashboards/1695/pages
    - Recent New Relic Alerts
    - New Relic Applications Overview
    - Recent Deployments
    - Browser / Front-end Performance Overview
- Log monitoring with Loggins (Kibana) (Login Required): https://loggins.mofoprod.net
- AWS Infrastructure and Autoscaling Monitoring/Alerting
  - An email group is notified of all autoscaling activity (scaling up or down). Contact jp at mozillafoundation.org to be added to this list.
  - CloudWatch in the AWS console can monitor many utilization metrics, including CPU or network usage for a group, database, server, or ELB. Few alarms are triggered from CloudWatch other than those that trigger scaling.
  - Most AWS infrastructure is monitored via New Relic. See the side menu options in New Relic for RDS, ELB, EC2, ElastiCache, etc.
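
The CloudWatch item above says alarms are used mainly to trigger scaling. As a rough illustration of how a threshold alarm of that kind decides to fire, here is a minimal Python sketch. The `ScalingAlarm` class, the 70% CPU threshold, and the three-period window are hypothetical values chosen for the example; they are not taken from the actual Mozilla Foundation configuration, and the code does not call the AWS API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingAlarm:
    """Hypothetical threshold alarm, loosely modeled on a CloudWatch alarm."""

    threshold: float         # assumed metric limit, e.g. average CPU percent
    evaluation_periods: int  # consecutive breaching datapoints required

    def __post_init__(self) -> None:
        if self.evaluation_periods < 1:
            raise ValueError("evaluation_periods must be at least 1")

    def is_breaching(self, datapoints: Sequence[float]) -> bool:
        """Return True when the most recent `evaluation_periods`
        datapoints all exceed the threshold."""
        if len(datapoints) < self.evaluation_periods:
            # Not enough data yet to make a decision.
            return False
        recent = datapoints[-self.evaluation_periods:]
        return all(value > self.threshold for value in recent)


# Illustrative values only: scale up after three periods above 70% CPU.
scale_up = ScalingAlarm(threshold=70.0, evaluation_periods=3)
cpu_samples = [45.0, 72.5, 81.0, 90.2]
print(scale_up.is_breaching(cpu_samples))
```

Requiring several consecutive breaching datapoints, rather than a single one, is a common way to avoid scaling on a short spike; the right window length depends on the workload.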