CIDuty/How To/AWS Pending Test
Revision as of 14:33, 18 April 2019
What to do in case of a high pending-test count in an AWS worker pool
Sometimes an AWS worker pool gets overloaded with tests, or we simply don't have enough workers in a specific pool. When this happens, you will see an alert such as:
nagios1.private.releng.mdc1.mozilla.com:Pending tests is CRITICAL: CRITICAL Pending tests: 3589 on gecko-t-linux-xlarge.
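If you want to pull the pool name and the pending count out of an alert like the one above (for example, to paste into #ci), a minimal Python sketch could look like the following. The regular expression is modeled only on the single example alert shown here, so treat it as an assumption: other checks may word their output differently, and `parse_pending_alert` is a hypothetical helper, not an existing tool.

```python
import re
from typing import Optional, Tuple

# Modeled on the example alert only (assumption): other Nagios checks
# may phrase their output differently.
ALERT_RE = re.compile(
    r"Pending tests is (?P<state>\w+): \w+ Pending tests: "
    r"(?P<count>\d+) on (?P<pool>[\w.-]+?)\.?$"
)


def parse_pending_alert(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (state, pending_count, worker_pool), or None if the line doesn't match."""
    match = ALERT_RE.search(line.strip())
    if match is None:
        return None
    # The lazy pool group plus the optional trailing "\." drops the final period.
    return match["state"], int(match["count"]), match["pool"]
```

Applied to the alert above, this yields the state `CRITICAL`, a count of 3589 and the pool `gecko-t-linux-xlarge`.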
When such an alert fires, the first step is to check whether we are being outbid. You can see this here. Look for the number of InsufficientInstanceCapacity errors reported for the affected pool.
The next step is to check Papertrail, where you can filter the logs by worker type.
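Tallying the capacity errors per pool makes it easy to see whether the affected pool stands out. The sketch below assumes a hypothetical record shape (a dict with `worker_type` and `code` keys); whatever the real dashboard or API returns will almost certainly look different and need adapting. `capacity_errors_by_pool` is an illustrative name only.

```python
from collections import Counter
from typing import Iterable, Mapping


def capacity_errors_by_pool(errors: Iterable[Mapping[str, str]]) -> Counter:
    """Count InsufficientInstanceCapacity errors per worker pool.

    Each record is assumed (hypothetically) to look like:
    {"worker_type": "gecko-t-linux-xlarge", "code": "InsufficientInstanceCapacity"}
    """
    return Counter(
        record["worker_type"]
        for record in errors
        # Ignore every other error code; only capacity errors matter here.
        if record.get("code") == "InsufficientInstanceCapacity"
    )
```

Calling `.most_common()` on the result lists the pools with the most capacity errors first.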
Escalation
It is always a good idea to let people in #ci know about the queue before starting the steps above. If we are simply short of workers, or the number of jobs keeps piling up, escalate to the sheriffs so they can close the trees until the queues go down, and notify #ci that the trees are closed because of InsufficientInstanceCapacity.
Sometimes the issue isn't as easy to figure out, so pinging people is the next best step:
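To judge whether the number of jobs "keeps piling up", one option is to sample the pending count a few times and check whether it rises every time. The helper below is a hypothetical heuristic, not an agreed escalation threshold: `keeps_piling_up` and its `window` parameter are illustrative names, and a sensible window depends on how often you sample.

```python
from typing import Sequence


def keeps_piling_up(pending_counts: Sequence[int], window: int = 3) -> bool:
    """Return True if the last `window` samples are strictly increasing.

    `window` is an illustrative default (assumption), not a team policy.
    """
    recent = list(pending_counts)[-window:]
    if len(recent) < window:
        # Not enough samples yet to call it a trend.
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A strictly increasing check is deliberately conservative: a single dip in the samples means the queue may be draining, so it is worth sampling again before escalating.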
For the EU time zone: pmoore
For the US time zone: bstack, wcosta