Devops/monitoring-alerting: Difference between revisions

Revision as of 05:12, 31 May 2014

Mozilla Foundation Monitoring & Alerting

TLDR:

Performance and most infrastructure monitoring is in New Relic
New Relic Dashboards are a good way to get info fast (login required)
Load balancer health, database, and overall application healthchecks in Opsview (login required) (Public Viewport)
Many dashboards for traffic, metrics, performance, and monitoring on this Dashboard of Dashboards
For accounts, questions, or suggestions, email jp at mozillafoundation.org

MONITORING TOOLS, SYSTEMS, AND LINKS

Opsview, a Nagios clone with a much friendlier interface.

* Monitors and alerts when servers in load balancers are unhealthy

* Monitors and alerts on uptime/downtime of overall endpoints, such as https://webmaker.org

* Monitors and alerts on database utilization and downtime.
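The endpoint uptime checks described above can be illustrated with a minimal sketch. This is not how Opsview implements its checks; it only shows the general idea of polling an endpoint and treating a 2xx/3xx response as healthy. The function names `http_status` and `check_endpoints`, the 5-second timeout, and the "healthy" status range are all assumptions made for this example.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def check_endpoints(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True if it looks healthy (assumed here: status 200-399)."""
    return {url: 200 <= fetch(url) < 400 for url in urls}
```

Calling `check_endpoints(["https://webmaker.org"])` performs a real request; passing a different `fetch` callable allows the logic to be exercised without network access.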

Important Opsview Links

Public Status Page

Current Unhandled Alerts (Login required)

Recent Alerts in Opsview

!!!TODO : Add the guide for notifications & contact settings

New Relic monitoring (Login Required)

* Watching application response time in browser and server side

* Watching database and web server utilization, transactions, timings, and throughput

* Watching load balancer (ELB) metrics

* Performing server-side and client-side tracing of long-running transactions

* Overall endpoint monitoring, such as https://webmaker.org

* Watching cache server utilization and metrics

* Watching Elasticsearch server utilization and metrics

* Watching Mongo server utilization and metrics

* Marking and comparing new/old deployed versions of software

!!!TODO : Add the guide for notifications & contact settings

Important New Relic Links

New Relic Dashboards

@@ Line 8: / Line 8: @@
 * For accounts, questions, or suggestions, email jp at mozillafoundation.org
-"MONITORING TOOLS, SYSTEMS, AND LINKS "
+'''MONITORING TOOLS, SYSTEMS, AND LINKS '''
-''Mozilla Foundation applications are monitored and measured in a number of systems:
-''
 * '''Opsview, a Nagios clone with a much friendlier interface.'''
 :: * Monitors and alerts when servers in load balancers are unhealthy
 :: * Monitors and alerts on uptime/downtime of overall endpoints, such as https://webmaker.org
 :: * Monitors and alerts on database utilization and downtime.
+::  '''Important Opsview Links'''
-::  "Important Opsview Links'
 :: [http://opsview.mofoprod.net:3000/viewport Public Status Page]
 :: [http://opsview.mofoprod.net:3000/status/service?filter=unhandled&order=state_desc&order=host&order=service&includeunhandledhosts=1 Current Unhandled Alerts (Login required)]
@@ Line 33: / Line 30: @@
 :: * Marks and compares new/old deployed versions of software
 :: !!!TODO : Add the guide for notifications & contact settings
 ::  '''Important New Relic Links'''
 :: [https://rpm.newrelic.com/accounts/255689/custom_dashboards/1695/pages New Relic Dashboards ]
@@ Line 43: / Line 39: @@
 * '''Log monitoring with [https://loggins.mofoprod.net Loggins (Kibana) (Login Required)]'''
-* "AWS Infrastructure and Autoscaling Monitoring/Alerting"
+* '''AWS Infrastructure and Autoscaling Monitoring/Alerting'''
 :: * An email group exists to be notified of any autoscaling activities (up or down).  Contact jp at mozillafoundation.org to be added to this list.
 :: * CloudWatch in the AWS console can monitor many utilization metrics, including CPU usage or network usage for a group, database, server, or ELB.  Few alarms are triggered from CloudWatch other than those that trigger scaling.
 :: Most AWS infrastructure is monitored via New Relic.  See the side menu options in New Relic for RDS, ELB, EC2, ElastiCache, etc.
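
As an illustration of how a metric alarm can drive scaling, here is a simplified sketch of the "threshold breached for N consecutive periods" rule. It is not the CloudWatch API and does not reflect the actual alarm configuration used here; the function name `should_scale_up`, the 70% CPU threshold, and the 3-period window are hypothetical values chosen for the example. Real CloudWatch alarms also support options such as missing-data handling that are not modeled.

```python
from typing import Sequence


def should_scale_up(
    cpu_samples: Sequence[float],
    threshold: float = 70.0,  # hypothetical CPU percentage threshold
    periods: int = 3,         # hypothetical number of consecutive periods
) -> bool:
    """Return True if the most recent `periods` samples all exceed `threshold`."""
    if periods <= 0 or len(cpu_samples) < periods:
        # Not enough data to evaluate the alarm.
        return False
    return all(sample > threshold for sample in cpu_samples[-periods:])
```

Requiring several consecutive breaches, rather than a single one, keeps a brief spike from triggering a scale-up.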
