CIDuty:QuarantineInstances

= When and How to quarantine taskcluster instances =


: 1. Choose the worker types you wish to investigate. You can find them [https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/ here].


: 2. Check which instances show exception (orange) or failed (red) as their task state, and investigate each of them separately.


[[File:Worker List.png|left]]


: 3. If the last five or six (or more) tests are problematic, inspect a few of them.


[[File:Test name.png]]




:: Check public/logs/live_backing.log for errors on a few of the latest tests by going to <Test-Name> -> Run Artifacts -> public/logs/live_backing.log, as shown below:


[[File:Log location.png]]




: 4. Use the error logs to judge whether the machine is faulty, and quarantine it if it is. There is no black-and-white rule for this; the judgment comes with experience. So far, one known sign of a faulty machine is that the above conditions are met and the error log terminates with error code -1 and a message like:


[[File:Error log.png]]




: 5. Quarantine every instance for which all of the above is true by pressing the Quarantine button and leaving the default expiration of 1000 years, as shown in this [https://irccloud.mozilla.com/file/JTN97Erw/image.png image].
 


: 6. File a bug in Bugzilla under RelOps, for example: https://bugzilla.mozilla.org/show_bug.cgi?id=1441820


: 7. Update the [https://docs.google.com/spreadsheets/d/1IPTmppvqDw0PQV-O1LgXLJg_7TC-H_IAAnSxcur8c7I/edit?ts=5ad7748a#gid=562893333 Master Moonshot Inventory] spreadsheet with the details of the bug (usually BUG:<NUMBER>).
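
The triage in steps 2 to 4 can be sketched as a small Python helper. This is only an illustration of the rule of thumb, not a Taskcluster tool: the <code>TaskRun</code> record, the function names, the threshold of 5, and the <code>marker</code> string are all assumptions made for the example, and the task states and log tails are expected to be copied by hand from the worker page. The final decision still rests on reading the logs.

```python
from dataclasses import dataclass
from typing import List

# States the page treats as problematic: exception (orange) and failed (red).
PROBLEM_STATES = {"exception", "failed"}


@dataclass
class TaskRun:
    """Hypothetical record of one recent task on a worker (not a Taskcluster type)."""
    name: str
    state: str          # e.g. "completed", "failed", "exception"
    log_tail: str = ""  # last lines of public/logs/live_backing.log, pasted by hand


def trailing_problem_count(runs: List[TaskRun]) -> int:
    """Count consecutive problematic runs, starting from the newest (index 0)."""
    count = 0
    for run in runs:
        if run.state not in PROBLEM_STATES:
            break
        count += 1
    return count


def should_review_for_quarantine(runs: List[TaskRun], marker: str,
                                 threshold: int = 5,
                                 logs_to_check: int = 3) -> bool:
    """Return True when the newest `threshold` runs are all problematic and at
    least one of the newest `logs_to_check` log tails contains `marker`.

    `marker` is whatever text you have learned to associate with a faulty
    machine (an assumption here; the page only shows it as a screenshot).
    """
    if trailing_problem_count(runs) < threshold:
        return False
    return any(marker in run.log_tail for run in runs[:logs_to_check])


if __name__ == "__main__":
    # Made-up sample data, newest run first.
    recent = [TaskRun(f"test-{i}", "failed", "process exited with code -1")
              for i in range(6)]
    print(should_review_for_quarantine(recent, marker="code -1"))
```

A <code>True</code> result only means the worker matches the pattern described above and deserves a closer look before pressing the Quarantine button.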