CIDuty/How To/Troubleshoot AWS

Revision as of 19:54, 23 October 2018

Sometimes AWS spins up bad instances. Usually sheriffs notify CiDuty about these, but if you see one, escalate to CiDuty in #ci. A job may appear as failed if the instance it was running on disappears; spot instances can disappear when they are outbid.

Bad Instances

To understand whether a job failure is caused by a spot instance, it's best to first understand the various ways a task can be resolved. See [https://docs.taskcluster.net/docs/reference/platform/taskcluster-queue/references/api#status this page] for more information.
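As a rough sketch of what "how a task was resolved" looks like in practice: the queue's task status response lists each run with a state and a reasonResolved value, and worker-side resolutions such as worker-shutdown or claim-expired tend to point at the instance rather than the task. The payload below is illustrative sample data, not real output, though the field names follow the queue's status format.

```python
# Sketch: classify runs from a taskcluster-queue status payload.
# Resolutions that typically indicate the instance disappeared,
# rather than the task itself failing.
WORKER_SIDE_REASONS = {"worker-shutdown", "claim-expired"}

def suspicious_runs(status):
    """Return runs whose resolution suggests the worker/instance vanished."""
    return [
        run for run in status.get("runs", [])
        if run.get("state") == "exception"
        and run.get("reasonResolved") in WORKER_SIDE_REASONS
    ]

# Hypothetical sample status payload for a task whose first run lost its worker.
sample = {
    "taskId": "abc123",  # made-up task id for illustration
    "runs": [
        {"runId": 0, "state": "exception", "reasonResolved": "worker-shutdown"},
        {"runId": 1, "state": "completed", "reasonResolved": "completed"},
    ],
}

print(suspicious_runs(sample))
```

If a task's failed runs all resolve with reasons like these, the spot instance (not the job) is the likely culprit.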

When AWS spins up a bad instance (usually identified by the fact that it fails every job), find it in the worker explorer of [https://tools.taskcluster.net/provisioners/aws-provisioner-v1/worker-types AWS Provisioner] and terminate it; AWS will spin up a new one. You can do this even while a task is running, thanks to the built-in mechanism for retrying jobs. To further understand the interaction between the queue and a worker, check out the [https://docs.taskcluster.net/docs/reference/platform/taskcluster-queue/docs/worker-interaction official docs].
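The retry mechanism mentioned above can be sketched as follows: when a worker is terminated mid-task, the current run is resolved as an exception and, while retries remain, a fresh run is scheduled for another worker to claim. This is a simplified simulation of that behavior, not the queue's actual implementation; the field names (retriesLeft, runs) mirror the queue's task status format.

```python
# Simplified model of the queue retrying a task after its worker is terminated.

def resolve_worker_lost(task):
    """Mark the current run as lost and, if retries remain, schedule a new run."""
    runs = task["runs"]
    # The run the terminated worker was executing resolves as an exception.
    runs[-1].update(state="exception", reasonResolved="worker-shutdown")
    if task["retriesLeft"] > 0:
        task["retriesLeft"] -= 1
        # A fresh pending run is added for another worker to pick up.
        runs.append({"runId": len(runs), "state": "pending"})
    return task

# Hypothetical task with one run in progress when its instance is terminated.
task = {"retriesLeft": 2, "runs": [{"runId": 0, "state": "running"}]}
resolve_worker_lost(task)
print(task["runs"][-1])  # a new pending run replaces the lost one
```

This is why terminating a bad instance mid-task is safe: the interrupted run is retried rather than counted as a task failure.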