Line 174:
= Rebooting workers in batches =
When many workers are disconnected, e.g. after a network event, it is useful to be able to reboot many of them at once. The various slave type subpages in slave health (e.g. [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w732-ix t-w732-ix]) let you do this via batch actions.
Two batch actions are currently available:
# Reboot all broken workers: reboots every worker that has not reported a result in more than 6 hours.
# Reboot workers that have not reported in # minutes: lets you specify the time cut-off used for rebooting workers. This is sometimes useful when you have many workers (or even a whole pool) that are failing to connect after, e.g., a network event, and you don't want to wait for them all to idle for 6 hours.
'''Note:''' these actions do not check whether a worker is currently running a job, only when its last job was run. As such, you ''may'' lose work-in-progress. However, if you're having issues across an entire pool, it is sometimes preferable to lose a few in-progress jobs to ensure the health of the larger pool.
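The time cut-off used by both actions can be illustrated with a small shell sketch. This is not how slave health implements it; the input file format (one worker per line as a name followed by the epoch seconds of its last reported result) and the function name <code>select_stale_workers</code> are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: the report file format and this function are hypothetical,
# they are not part of slave health or slaveapi.
#
# Input file: one worker per line, "<worker-name> <last-result-epoch-seconds>".
# Prints the names of workers whose last result is older than the cut-off.
select_stale_workers() {
    local cutoff_minutes="$1"   # e.g. 360 for the default 6 hour cut-off
    local now_epoch="$2"        # current time in epoch seconds
    local report_file="$3"      # hypothetical last-report listing

    # Skip malformed lines (fewer than two fields), then compare ages in seconds.
    awk -v cutoff="$cutoff_minutes" -v now="$now_epoch" \
        'NF >= 2 && (now - $2) > cutoff * 60 { print $1 }' "$report_file"
}
```

For example, <code>select_stale_workers 360 "$(date +%s)" last_reports.txt</code> would list the workers that have been silent for more than 6 hours. Like the batch actions, the sketch looks only at the last reported result and does not check whether a worker is currently running a job.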
These actions use slaveapi to perform the reboots. You can also do this by hand: create a text file listing the workers you want to reboot (let's call it bad_workers.list), set the MY_LDAP_USER and MY_LDAP_PASSWORD environment variables to your LDAP credentials, make sure you are on the VPN, and then run:
 cat bad_workers.list | \
   while read slave; do \
     curl -u "${MY_LDAP_USER}:${MY_LDAP_PASSWORD}" \