Some of the criteria used include:
* Broken build on an integration or main tree (e.g. mozilla-inbound, mozilla-central, autoland)
* Excessive backlog for builds or tests in any platform
* Infrastructure or systems failures that affect a significant number of tests or builds (e.g. AWS, data center, networking issues)
* Mass "bustage" that could hide other test failures (this is when code lands and causes multiple tests to fail across multiple chunks of tests or suites of tests, making it harder to catch further failures if something else lands *during* the period in which these tests are failing from the original code landing)
* Infra failure that affects our ability to see what's happening (e.g. treeherder being down or not ingesting jobs or the data it consumes not being updated, or treestatus being broken so we're closed by default)
== Actions to take ==
Most tree closures are caused by failing code (such as build bustage or test failures) and are normally fixed very quickly. However, there can be longer tree closures caused by unplanned infrastructure problems that need deeper investigation by teams such as IT, Releng, or Taskcluster.
In these cases we may need to notify developers in order to:
* Avoid frustration when people want to push to try and find that it is not possible
* Avoid pending or running test runs (especially on try) failing for unrelated reasons and causing developers to spend time on issues that are not caused by their changes
* Reduce the number of questions to sheriffs about why a tree is closed, what the ETA is, etc.
=== Steps for the on-duty sheriff ===
1. Make clear that there is an outage and who the current sheriff is by changing the topic of the #developers channel on IRC. Also add the tracking bug and, if possible, an ETA to the topic of #developers.
Example:
Onduty-Sheriff: Tomcat - All Tree Closure - Database issue Bug 1234567 - no eta yet
2. Also post to #treestatus about the tree closure.
3. If the tree closure is expected to last longer, post a short mail to the mozilla.dev.platform and dev-fxos <dev-fxos@lists.mozilla.org> newsgroups, like https://groups.google.com/forum/#!topic/mozilla.dev.platform/Kzd1es4KiYA. As in that example, the next sheriff can send an all-clear message once the issue is fixed.
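To keep the #developers topic consistent between sheriffs, the topic line could be generated rather than typed by hand. The following is only a minimal sketch that reproduces the format of the example topic above; the function name `format_closure_topic` and its parameters are hypothetical and not part of any existing sheriffing tool.

```python
from typing import Optional


def format_closure_topic(
    sheriff: str,
    scope: str,
    reason: str,
    bug: Optional[int] = None,
    eta: Optional[str] = None,
) -> str:
    """Build a topic line in the format of the example topic above.

    Hypothetical helper: the field order simply mirrors the example
    (sheriff, scope of the closure, reason plus tracking bug, ETA).
    """
    parts = [f"Onduty-Sheriff: {sheriff}", scope]
    # Attach the tracking bug to the reason when one has been filed.
    parts.append(f"{reason} Bug {bug}" if bug is not None else reason)
    # Fall back to the wording of the example when no ETA is known.
    parts.append(f"ETA {eta}" if eta else "no eta yet")
    return " - ".join(parts)


print(format_closure_topic("Tomcat", "All Tree Closure", "Database issue", bug=1234567))
# prints: Onduty-Sheriff: Tomcat - All Tree Closure - Database issue Bug 1234567 - no eta yet
```

Once an ETA is known, passing `eta="14:00 UTC"` (an illustrative value) replaces the trailing "no eta yet" with "ETA 14:00 UTC".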