CIDuty/How To/Troubleshoot AWS

Sometimes AWS spins up bad instances. Usually sheriffs notify CIDuty about these, but if you spot one, escalate to CIDuty in #ci. A job may appear as failed if the instance it was running on disappears; spot instances can disappear when they are outbid.
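To confirm whether a vanished instance was a spot instance reclaimed by EC2, you can query the state reason EC2 recorded for it. The following is a minimal sketch, assuming boto3 is installed with working AWS credentials; the region and instance ID are placeholders, and note that terminated instances only stay queryable through the API for a short time after termination.

<syntaxhighlight lang="python">
# Minimal sketch: check whether a terminated instance was a spot instance
# reclaimed by EC2. Assumes boto3 credentials are configured; the region
# and instance ID below are placeholders, not values from this page.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def termination_reason(instance_id):
    """Return the (code, message) EC2 recorded for the instance's state."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            reason = instance.get("StateReason", {})
            # "Server.SpotInstanceTermination" means EC2 reclaimed the spot
            # instance, e.g. because the spot price rose above the bid.
            return reason.get("Code"), reason.get("Message")
    return None, None

print(termination_reason("i-0123456789abcdef0"))  # hypothetical instance ID
</syntaxhighlight>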
= Figuring out if the instance is bad or not =
When jobs fail, the first thing to determine is whether the failures are isolated incidents or whether they affect a large portion of the worker pool (usually over 30%).
1. For an isolated incident, check the other jobs the affected machine has run. If the rest of its jobs are green, the instance isn't faulty, and the failure was probably caused by a bad build, a bad config, or a network issue. If all of its jobs failed or completed as exception, the instance is in a bad state and should be terminated (see the termination sketch after this list).
2. If a large portion of the worker pool is affected, start by reading the logs of the failed tests and look for an error common to the instances. Such widespread failures are usually caused by infra or network problems, or by tasks that weren't properly configured (a sketch of the 30% check also follows this list).
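Once an instance's job history confirms it is bad, it can be terminated. This is a minimal sketch, assuming credentials with permission to call ec2:TerminateInstances; the instance ID is hypothetical, and DryRun is left on so nothing is actually terminated until you flip it.

<syntaxhighlight lang="python">
# Minimal sketch: terminate a confirmed-bad instance. Assumes boto3
# credentials with terminate permission; the instance ID is hypothetical.
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

def terminate_bad_instance(instance_id, dry_run=True):
    """Terminate a suspected-bad instance. DryRun=True only checks permissions."""
    try:
        resp = ec2.terminate_instances(InstanceIds=[instance_id], DryRun=dry_run)
        return resp["TerminatingInstances"][0]["CurrentState"]["Name"]
    except ClientError as err:
        # With DryRun=True, a "DryRunOperation" error means the real call
        # would have succeeded.
        if err.response["Error"]["Code"] == "DryRunOperation":
            return "dry-run ok"
        raise

print(terminate_bad_instance("i-0123456789abcdef0"))  # hypothetical instance ID
</syntaxhighlight>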
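For the pool-wide case, the 30% rule of thumb can be checked once job results are grouped per instance. A minimal sketch with made-up data; in practice the results would come from the CI job records rather than a hard-coded dict.

<syntaxhighlight lang="python">
# Minimal sketch of the 30% check. The results below are made up for
# illustration; real data would be collected from the failing jobs.
results_by_instance = {
    "i-aaa": ["failed", "failed"],
    "i-bbb": ["green", "green", "green"],
    "i-ccc": ["exception"],
    "i-ddd": ["green"],
}

def pool_failure_fraction(results):
    """Fraction of instances whose jobs all failed or ended as exception."""
    bad = sum(
        1 for jobs in results.values()
        if jobs and all(state in ("failed", "exception") for state in jobs)
    )
    return bad / len(results)

fraction = pool_failure_fraction(results_by_instance)
if fraction > 0.30:
    print(f"{fraction:.0%} of the pool is bad: look for a common infra cause")
else:
    print(f"{fraction:.0%} of the pool is bad: treat the failures as isolated")
</syntaxhighlight>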


= Bad Instances =