ReleaseEngineering/How To/Handle an Idle Slave: Difference between revisions

Updated the instructions on how to deal with an idle slave
{{Release Engineering How To|Handle an Idle Slave}}
We are working toward not having slaves go idle, but progress is slow.
* Check the SlaveHealth dashboard for idle machines:
 
** https://secure.pub.build.mozilla.org/builddata/reports/slave_health/index.html
The Nagios alert is currently set for four days. The current strategy is just to reboot them, which should buy another 4 hours and flush any oddities in the master/slave communication:
** Any machine that has not taken a job for more than 5 hours is considered "idle", and the row for that machine turns orange.
 
* Check buildbot.tac to ensure the slave is enabled and to see which master it is on.
* Tail twistd.log to see if the slave is really idle. In most cases, the slave has been idle since it started up. Note any differences.
* Reboot. Don't use `buildbot restart`, since it won't work on Macs!
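
The checks above could be scripted roughly as follows. This is only a minimal sketch, not an existing Release Engineering tool: the helper names `is_idle`, `find_master`, and `tail_lines` are hypothetical, and it assumes buildbot.tac names its master on a line of the form `buildmaster_host = '...'`. It does not check whether the slave is enabled, and it does not reboot anything.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Threshold from the dashboard rule: more than 5 hours without a job.
IDLE_THRESHOLD = timedelta(hours=5)


def is_idle(last_job_time, now, threshold=IDLE_THRESHOLD):
    """Return True if the slave has gone longer than `threshold` without a job."""
    return (now - last_job_time) > threshold


def find_master(tac_text):
    """Return the master host named in buildbot.tac text, or None if not found.

    Assumption: the file contains a line like `buildmaster_host = '...'`.
    """
    pattern = r"""^\s*buildmaster_host\s*=\s*['"]([^'"]+)['"]"""
    match = re.search(pattern, tac_text, re.MULTILINE)
    return match.group(1) if match else None


def tail_lines(path, count=20):
    """Return the last `count` lines of a log file such as twistd.log."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

For example, `is_idle(datetime(2014, 1, 1, 6, 0), datetime(2014, 1, 1, 12, 0))` compares a six-hour gap against the five-hour threshold, and `tail_lines("twistd.log")` returns the most recent log lines for manual inspection before deciding to reboot.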