Flume ElasticSearch WOO Maintenance Page
Revision as of 16:15, 7 February 2011
The Flume WOO project provides real-time ingestion of buildbot logs into HDFS/Hive and ElasticSearch via Flume. This page describes the machines involved, the software installed on them, and the steps to restart services.
ElasticSearch cluster:
elasticsearch1.metrics.sjc1.mozilla.com (master)
elasticsearch2.metrics.sjc1.mozilla.com (slave)
elasticsearch3.metrics.sjc1.mozilla.com (slave)
Symptom: Nagios ElasticSearch alert indicates one or more machines are down.
Fix: Log in to the affected machine(s).
Kill all running elasticsearch processes (ps ax|grep elasticsearch).
Restart the services in the following order: elasticsearch1, elasticsearch2, elasticsearch3.
Restart command: /usr/lib/es/bin/elasticsearch
Please email aphadke@mozilla.com or deinspanjer@mozilla.com if the problem persists.
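The ElasticSearch steps above can be sketched as a small shell helper. This is a minimal sketch, not a tested production script: the function names `es_pids` and `print_es_restart_plan` are hypothetical, and reaching the machines with `ssh` is an assumption (the page only says to log in). The plan is printed rather than executed so that the operator can review each command first.

```shell
#!/usr/bin/env bash
# Hosts in the restart order given on this page.
ES_HOSTS="elasticsearch1.metrics.sjc1.mozilla.com elasticsearch2.metrics.sjc1.mozilla.com elasticsearch3.metrics.sjc1.mozilla.com"
ES_BIN=/usr/lib/es/bin/elasticsearch

# Hypothetical helper: read `ps ax` output on stdin and print the PID
# (first column) of every line that mentions elasticsearch, skipping the
# grep process itself.
# Caution: the hostnames here also contain "elasticsearch", so any process
# started with a hostname argument will match too. Review before killing.
es_pids() {
  grep elasticsearch | grep -v grep | awk '{print $1}'
}

# Hypothetical helper: print, host by host and in order, the commands to run.
# Nothing is executed. Using ssh to reach each host is an assumption.
print_es_restart_plan() {
  local host
  for host in $ES_HOSTS; do
    echo "ssh $host 'ps ax | grep elasticsearch'   # then kill the listed PIDs"
    echo "ssh $host $ES_BIN"
  done
}
```

On an affected machine, `ps ax | es_pids` lists the candidate PIDs, and `print_es_restart_plan` prints the per-host commands in the documented order.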
Flume cluster:
elasticsearch3.metrics.sjc1.mozilla.com (master)
elasticsearch4.metrics.sjc1.mozilla.com (node-collector)
elasticsearch5.metrics.sjc1.mozilla.com (node-agent)
Symptom: Nagios Flume alert indicates a given machine is down.
Hostname: elasticsearch4.metrics.sjc1.mozilla.com
Stop Flume: /usr/lib/flume/bin/flume-daemon.sh stop
Confirm Flume has stopped (ps ax|grep flume); if it has not, kill -9 the pid.
Start Flume: /usr/lib/flume/bin/flume-daemon.sh start node_nowatch -n elasticsearch4.metrics.sjc1.mozilla.com
Symptom: Nagios Flume alert indicates a given machine is down.
Hostname: elasticsearch5.metrics.sjc1.mozilla.com
Stop Flume: /usr/lib/flume/bin/flume-daemon.sh stop
Confirm Flume has stopped (ps ax|grep flume); if it has not, kill -9 the pid.
Start Flume: /usr/lib/flume/bin/flume-daemon.sh start node -n elasticsearch5.metrics.sjc1.mozilla.com
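The stop, confirm, and start sequence for the two Flume node machines could be wrapped roughly as follows. This is a sketch under stated assumptions: the function names, the `DRY_RUN` switch (on by default, so nothing is executed unless it is set to 0), and the 5-second wait after stopping are all hypothetical and not taken from this page. Only the paths, start modes, and hostnames come from the steps above.

```shell
#!/usr/bin/env bash
FLUME_DAEMON=/usr/lib/flume/bin/flume-daemon.sh

# Print the documented start arguments for a Flume node host.
flume_start_args() {
  case "$1" in
    elasticsearch4.metrics.sjc1.mozilla.com) echo "node_nowatch -n $1" ;;
    elasticsearch5.metrics.sjc1.mozilla.com) echo "node -n $1" ;;
    *) echo "unknown Flume node host: $1" >&2; return 1 ;;
  esac
}

# Read `ps ax` output on stdin and print the PIDs of lines mentioning flume,
# skipping the grep process itself.
flume_pids() {
  grep flume | grep -v grep | awk '{print $1}'
}

# Restart Flume on the local machine, which must be the named node host.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
restart_flume_node() {
  local host=$1 args pids
  args=$(flume_start_args "$host") || return 1
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$FLUME_DAEMON stop"
    echo "ps ax | grep flume   # kill -9 any PID still listed"
    echo "$FLUME_DAEMON start $args"
    return 0
  fi
  "$FLUME_DAEMON" stop
  sleep 5   # assumed grace period, not from this page
  # Assumption: this script's own file name does not contain "flume",
  # otherwise it would show up in the list below.
  pids=$(ps ax | flume_pids)
  [ -z "$pids" ] || kill -9 $pids
  "$FLUME_DAEMON" start $args
}
```

For example, `restart_flume_node elasticsearch5.metrics.sjc1.mozilla.com` prints the three commands for the node-agent machine; running it with `DRY_RUN=0` on that machine would execute them. The master on elasticsearch3 is deliberately left out, since the page asks for investigation before restarting it.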
Symptom: Nagios Flume alert indicates a given machine is down.
Hostname: elasticsearch3.metrics.sjc1.mozilla.com
Resolution: Please email aphadke@mozilla.com (213-509-0575) or deinspanjer@mozilla.com. While we can restart the Flume master, a master going down might indicate deeper problems. Because Flume is still a young project, it's best to investigate further before simply restarting it.
Stop Flume: /usr/lib/flume/bin/flume-daemon.sh stop
Confirm Flume has stopped (ps ax|grep flume); if it has not, kill -9 the pid.
Start Flume: /usr/lib/flume/bin/flume-daemon.sh start master