CIDuty/How To/High Pending Counts

= Rebooting workers in batches =
When many workers are disconnected, e.g. after a network event, it is useful to be able to reboot many of them at one time. The various slave type subpages in slave health (e.g. [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w732-ix t-w732-ix]) let you do this via batch actions.


Two batch actions are currently available:
# Reboot all broken workers - will reboot all workers that haven't reported a result in more than 6 hours
# Reboot workers that have not reported in # minutes - allows you to specify the time cut-off used for rebooting workers. This is sometimes useful when you have many workers (or even a whole pool) that are failing to connect after, e.g., a network event, and you don't want to wait for them all to idle for 6 hours (see the sketch below for one way to compute the same cut-off by hand).


'''Note:''' these actions don't check whether the worker is currently running a job, only when its last job was run. As such, you *may* lose work in progress. However, if you're having issues across an entire pool, it is sometimes preferable to lose a few in-progress jobs to ensure the health of the larger pool.
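
For reference, the cut-off used by the second batch action is easy to reproduce by hand. The sketch below is only an illustration: it supposes you have already exported a two-column file, last_report.tsv, of worker names and last-report times as Unix timestamps. The file name, its format, and the 120-minute cut-off are placeholders, not something slave health produces for you.

  # Emit the names of workers whose last report is older than the cut-off.
  # last_report.tsv is a hypothetical export: "<worker-name><TAB><last-report-unix-timestamp>"
  CUTOFF_MINUTES=120
  now=$(date +%s)
  awk -v now="$now" -v cutoff=$((CUTOFF_MINUTES * 60)) \
      'now - $2 > cutoff { print $1 }' last_report.tsv > bad_workers.list

The resulting bad_workers.list is exactly the kind of file the manual reboot loop below expects.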


These actions use slaveapi to perform the reboots. You can also do this by hand: create a text file containing the list of the workers you want to reboot (let's call it bad_workers.list), set the MY_LDAP_USER and MY_LDAP_PASSWORD environment variables to your LDAP credentials, make sure you are on the VPN, and then run something like the following (double-check the slaveapi reboot URL for your deployment; the one below is only an example):


  # Assumption: the slaveapi reboot URL below is an example; adjust it for your deployment.
  cat bad_workers.list | \
   while read slave; do \
     curl -u "${MY_LDAP_USER}:${MY_LDAP_PASSWORD}" \
       -X POST "https://secure.pub.build.mozilla.org/slaveapi/slaves/${slave}/actions/reboot"; \
   done