Flume ElasticSearch WOO Maintenance Page
The Flume WOO project provides real-time ingestion of buildbot logs into HDFS/Hive and ElasticSearch via Flume. This page describes the machines involved, the installed software, and the steps to restart each service.
ElasticSearch cluster:
elasticsearch1.metrics.sjc1.mozilla.com (master)
elasticsearch2.metrics.sjc1.mozilla.com (slave)
elasticsearch3.metrics.sjc1.mozilla.com (slave)
Symptom: Nagios ElasticSearch alert indicates that one or more machines are down.
Fix: Log in to the affected machine(s) and kill all running elasticsearch processes (find them with ps ax|grep elasticsearch).
Restart the services in the following order: elasticsearch1, elasticsearch2, elasticsearch3.
Restart command: /usr/lib/es/bin/elasticsearch
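The ElasticSearch restart steps above can be sketched as a small per-host Bash script. This is a hedged sketch, not existing team tooling. It assumes the ElasticSearch JVM is a `java` process whose command line contains the package name `org.elasticsearch`; matching on that string rather than the bare word `elasticsearch` matters on these hosts, because their names themselves contain "elasticsearch", so a plain `grep elasticsearch` can match unrelated processes that merely mention a host name. The SIGTERM-then-SIGKILL sequence, the 30-second grace period, and the helper names `java_pids` and `stop_java` are choices made for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: restart ElasticSearch on the current host.
# ASSUMPTION: the ES JVM's command line contains "org.elasticsearch".
set -u

ES_BIN=/usr/lib/es/bin/elasticsearch   # restart command from this page
ES_PATTERN=org.elasticsearch           # assumed marker in the JVM command line
GRACE_SECONDS=30                       # assumed grace period before kill -9

# Read "PID ARGS" lines on stdin and print the PID of every java process
# whose line contains the pattern $1 (grep, tail, etc. are ignored).
java_pids() {
  awk -v pat="$1" '{
    n = split($2, parts, "/")            # basename of the executable
    if (parts[n] == "java" && index($0, pat) > 0) print $1
  }'
}

# SIGTERM every matching java process, wait up to $2 seconds,
# then SIGKILL whatever is still running.
stop_java() {
  local pat=$1 grace=$2 pids
  pids=$(ps ax -o pid=,args= | java_pids "$pat")
  [ -z "$pids" ] && return 0
  kill $pids 2>/dev/null                 # unquoted on purpose: one PID per word
  for _ in $(seq "$grace"); do
    sleep 1
    pids=$(ps ax -o pid=,args= | java_pids "$pat")
    [ -z "$pids" ] && return 0
  done
  echo "still running after ${grace}s, sending SIGKILL: $pids" >&2
  kill -9 $pids
}

# Only act when called explicitly, e.g. "./restart-es.sh restart".
if [ "${1:-}" = "restart" ]; then
  stop_java "$ES_PATTERN" "$GRACE_SECONDS"
  "$ES_BIN"
fi
```

Run it (under a hypothetical name such as restart-es.sh) as `./restart-es.sh restart` on elasticsearch1, then elasticsearch2, then elasticsearch3, following the order given above. Without the `restart` argument the script only defines its functions and touches no processes.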
Please email aphadke@mozilla.com or deinspanjer@mozilla.com if the problem persists.
Flume cluster:
elasticsearch3.metrics.sjc1.mozilla.com (master)
elasticsearch4.metrics.sjc1.mozilla.com (node-collector)
elasticsearch5.metrics.sjc1.mozilla.com (node-agent)
Symptom: Nagios Flume alert indicates a given machine is down.
Hostname: elasticsearch4.metrics.sjc1.mozilla.com
Stop Flume: /usr/lib/flume/bin/flume-daemon.sh stop
Confirm Flume has stopped (ps ax|grep flume); otherwise kill -9 the pid
Start Flume: /usr/lib/flume/bin/flume-daemon.sh start node_nowatch -n elasticsearch4.metrics.sjc1.mozilla.com
Symptom: Nagios Flume alert indicates a given machine is down.
Hostname: elasticsearch5.metrics.sjc1.mozilla.com
Stop Flume: /usr/lib/flume/bin/flume-daemon.sh stop
Confirm Flume has stopped (ps ax|grep flume); otherwise kill -9 the pid
Start Flume: /usr/lib/flume/bin/flume-daemon.sh start node -n elasticsearch5.metrics.sjc1.mozilla.com
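The two node restarts above differ only in the start mode (`node_nowatch` on the collector, `node` on the agent), so they can be sketched as one Bash script that picks the mode from the host name. This is a hedged illustration under several assumptions: `hostname -f` returns the same fully qualified name used on this page; the Flume JVM is a `java` process whose command line contains `flume`; and 10 seconds is enough for `flume-daemon.sh stop` to finish before any leftover process is killed with `kill -9`. The helper names `flume_mode` and `java_pids` are hypothetical. The master on elasticsearch3 is deliberately left out, in line with the resolution note for that host.

```shell
#!/usr/bin/env bash
# Sketch only: restart the Flume node on elasticsearch4 or elasticsearch5.
# ASSUMPTIONS: `hostname -f` returns the FQDN used on this page, and the
# Flume JVM's command line contains "flume".
set -u

FLUME_DAEMON=/usr/lib/flume/bin/flume-daemon.sh
STOP_WAIT_SECONDS=10   # assumed time for "flume-daemon.sh stop" to finish

# Print the start mode this page uses for a given node; print nothing for
# any other host (including the master, which should not be auto-restarted).
flume_mode() {
  case "$1" in
    elasticsearch4.metrics.sjc1.mozilla.com) echo node_nowatch ;;  # collector
    elasticsearch5.metrics.sjc1.mozilla.com) echo node ;;          # agent
  esac
}

# Read "PID ARGS" lines on stdin; print PIDs of java processes matching $1.
java_pids() {
  awk -v pat="$1" '{
    n = split($2, parts, "/")            # basename of the executable
    if (parts[n] == "java" && index($0, pat) > 0) print $1
  }'
}

# Only act when called explicitly, e.g. "./restart-flume-node.sh restart".
if [ "${1:-}" = "restart" ]; then
  host=$(hostname -f)
  mode=$(flume_mode "$host")
  if [ -z "$mode" ]; then
    echo "$host is not a Flume node covered by this sketch" >&2
    exit 1
  fi
  "$FLUME_DAEMON" stop
  sleep "$STOP_WAIT_SECONDS"
  # Scripted version of "ps ax|grep flume, otherwise kill -9 the pid".
  pids=$(ps ax -o pid=,args= | java_pids flume)
  if [ -n "$pids" ]; then
    kill -9 $pids                        # unquoted on purpose: one PID per word
  fi
  "$FLUME_DAEMON" start "$mode" -n "$host"
fi
```

Keying the mode off the host name keeps the collector/agent distinction in one place; if `hostname -f` does not return the names listed here, passing the host name in explicitly would be the safer variant.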
Symptom: Nagios Flume alert indicates a given machine is down.
Hostname: elasticsearch3.metrics.sjc1.mozilla.com
Resolution: Please email aphadke@mozilla.com (213-509-0575) or deinspanjer@mozilla.com. Although the Flume master can be restarted, a master going down may indicate deeper problems. Because Flume is still immature, it is best to investigate further before simply restarting it.
Stop Flume: /usr/lib/flume/bin/flume-daemon.sh stop
Confirm Flume has stopped (ps ax|grep flume); otherwise kill -9 the pid
Start Flume: /usr/lib/flume/bin/flume-daemon.sh start master