===== When and How to quarantine taskcluster instances =====
: 1. Choose the worker types you want to investigate. You can find them [https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/ here].
: 2. Check which instances have a task state of exception (orange) or failed (red), and investigate each of them separately.
[[File:Worker List.png|Worker List.png]]
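As a rough illustration of step 2, the filtering can be written as a small Python function. This is a sketch under stated assumptions: the input is a hypothetical mapping of worker ID to the state of its latest task, noted down by hand from the worker list; it does not call any Taskcluster API, and the worker IDs in the example are made up.

```python
from typing import Dict, List

# Task states shown in orange (exception) and red (failed) on the worker list.
PROBLEM_STATES = {"exception", "failed"}


def workers_to_investigate(latest_state: Dict[str, str]) -> List[str]:
    """Return, sorted, the worker IDs whose latest task ended in exception or failed.

    `latest_state` is a hypothetical workerId -> task-state mapping copied
    from the worker list page; it is not fetched from any API here.
    """
    return sorted(
        worker_id
        for worker_id, state in latest_state.items()
        if state in PROBLEM_STATES
    )


if __name__ == "__main__":
    # Hypothetical worker IDs, for illustration only.
    snapshot = {
        "machine-01": "completed",
        "machine-02": "exception",
        "machine-03": "failed",
    }
    print(workers_to_investigate(snapshot))
```

Each worker it returns still needs the manual log review described below; the function only narrows down where to look.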
===== Log analysis and quarantining the machine =====
: 1. If the last 4-5 or more tests on an instance are problematic, check a few of them.
[[File:Test name.png]]
:: Check public/logs/live_backing.log for errors on a few of the latest tests by going to <Test-Name> -> Run Artifacts -> public/logs/live_backing.log, as shown below:
[[File:Log location.png]]
: 2. The error logs tell you whether the machine is faulty (if it is, quarantine it). There is no definitive rule for this; it comes with experience. So far, a machine is considered faulty when the above conditions are met and the error log ends with error code -1 and a message like this:
[[File:Error log.png]]
: 3. Quarantine any machine for which all of the above is true by pressing the Quarantine button and leaving the default expiration date of 1000 years, as shown below:
[[File:Quarantine pic.png|center]]
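The checks in this section can be sketched as a small Python helper, assuming you have already collected the task states of the instance's recent tasks (newest last) and downloaded the text of a few live_backing.log files. The exit-code regex is a placeholder assumption, since the exact wording of the final error line is only visible in the screenshot, and the thresholds are hypothetical; experience still makes the final call. The helper does not quarantine anything itself. `quarantine_until` only formats the 1000-year expiry as an ISO 8601 UTC timestamp, the form a `quarantineUntil` value would take if you quarantined through the Taskcluster API instead of the button (treat that field name as an assumption and check the queue reference).

```python
import re
from datetime import datetime, timezone
from typing import Optional, Sequence

# Task states treated as problematic (orange = exception, red = failed).
PROBLEM_STATES = {"exception", "failed"}

# ASSUMPTION: placeholder pattern for the final error line. The exact wording
# is only visible in the screenshot, so adapt it to the real live_backing.log.
EXIT_CODE_RE = re.compile(r"(?:exit|error) code:?\s*(-?\d+)", re.IGNORECASE)


def trailing_problem_streak(states: Sequence[str]) -> int:
    """Count consecutive exception/failed tasks at the end of `states` (newest last)."""
    streak = 0
    for state in reversed(states):
        if state not in PROBLEM_STATES:
            break
        streak += 1
    return streak


def last_exit_code(log_text: str) -> Optional[int]:
    """Return the last exit/error code matched in a log, or None if there is none."""
    matches = EXIT_CODE_RE.findall(log_text)
    return int(matches[-1]) if matches else None


def looks_faulty(states: Sequence[str], sampled_logs: Sequence[str], min_streak: int = 4) -> bool:
    """Hypothetical heuristic: a long failure streak AND every sampled log ends with -1."""
    if trailing_problem_streak(states) < min_streak or not sampled_logs:
        return False
    return all(last_exit_code(log) == -1 for log in sampled_logs)


def quarantine_until(years: int = 1000, now: Optional[datetime] = None) -> str:
    """ISO 8601 UTC timestamp `years` after `now` (a timezone-aware UTC datetime)."""
    now = now or datetime.now(timezone.utc)
    try:
        until = now.replace(year=now.year + years)
    except ValueError:
        # Feb 29 has no counterpart when the target year is not a leap year.
        until = now.replace(year=now.year + years, day=28)
    return until.strftime("%Y-%m-%dT%H:%M:%S.000Z")
```

Requiring every sampled log to match keeps a single flaky test from getting a machine quarantined.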
===== Bugzilla =====
: 1. Check whether a bug is already open for the affected machine in [https://bugzilla.mozilla.org/ Bugzilla], under CIDuty and/or RelOps, by searching for:
 ALL machine_name
: 2. If a bug exists, add a comment saying that you have quarantined the machine and why.
: 3. If no bug exists, file one in Bugzilla under [https://bugzilla.mozilla.org/enter_bug.cgi?product=Infrastructure%20%26%20Operations&component=RelOps%3A%20Hardware RelOps].
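Both Bugzilla steps can be started from prefilled URLs. A minimal Python sketch, assuming Bugzilla's standard `buglist.cgi?quicksearch=` search and the `product`, `component` and `short_desc` parameters of `enter_bug.cgi`. The product and component values are decoded from the RelOps link above; the bug summary wording is a hypothetical convention, not a team rule.

```python
from urllib.parse import quote, urlencode

BUGZILLA = "https://bugzilla.mozilla.org"


def search_url(machine_name: str) -> str:
    """Quick-search URL for 'ALL machine_name' (the ALL prefix also matches closed bugs)."""
    # quote_via=quote encodes spaces as %20, matching the links on this page.
    query = urlencode({"quicksearch": f"ALL {machine_name}"}, quote_via=quote)
    return f"{BUGZILLA}/buglist.cgi?{query}"


def new_bug_url(machine_name: str, reason: str) -> str:
    """Prefilled RelOps: Hardware bug form; the summary format is hypothetical."""
    params = {
        "product": "Infrastructure & Operations",
        "component": "RelOps: Hardware",
        "short_desc": f"{machine_name} quarantined: {reason}",
    }
    return f"{BUGZILLA}/enter_bug.cgi?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(search_url("machine_name"))
    print(new_bug_url("machine_name", "repeated exception tasks"))
```

Open the search URL first; only use the new-bug URL when the search comes back empty.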
===== Update Moonshot inventory =====
: 1. Update the [https://docs.google.com/spreadsheets/d/1IPTmppvqDw0PQV-O1LgXLJg_7TC-H_IAAnSxcur8c7I/edit?ts=5ad7748a#gid=562893333 Master Moonshot Inventory] spreadsheet with the bug number for the affected machine.
See also how to quarantine a single machine or multiple machines using the [[CIDuty/How_To/QuarantineMultipleInstances|taskcluster cli]].