Auto-tools/Projects/Stockwell/backfill-retrigger

= finding bugs to work on =
We have a [https://charts.mozilla.org/FreshOranges/index.html fresh oranges dashboard], which is like the neglected oranges dashboard except that it shows new failures that are high frequency, ignoring [stockwell infra] bugs.


As these are new bugs, there will be issues here that are infra or harness related. Use this as an opportunity to annotate [stockwell infra] if a bug is build, taskcluster, network, or machine related. Otherwise, the rules are similar to disable-recommended: if a test case is in the bugzilla summary, we should be able to retrigger and find the patch which caused it to become so frequent.
 
'''Skip test-verify bugs''': test-verify already repeats tests, and only runs tests which were modified on a push. There is no need to retrigger or backfill a test-verify failure.
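
If you prefer to pull candidates straight from Bugzilla rather than the dashboard, the sketch below queries the Bugzilla REST search API for open intermittent-failure bugs that are not already annotated as infra. The endpoint and field names are standard Bugzilla REST search parameters; the exact filter values are assumptions about what you want to triage.

<syntaxhighlight lang="python">
import requests

BUGZILLA = "https://bugzilla.mozilla.org/rest/bug"

def fresh_orange_candidates(limit=50):
    """Open intermittent-failure bugs, skipping [stockwell infra] ones."""
    params = {
        "keywords": "intermittent-failure",  # intermittent oranges carry this keyword
        "resolution": "---",                 # open bugs only
        "include_fields": "id,summary,whiteboard",
        "limit": limit,
    }
    bugs = requests.get(BUGZILLA, params=params, timeout=30).json()["bugs"]
    # Skip bugs already annotated as infrastructure problems.
    return [b for b in bugs if "[stockwell infra]" not in (b.get("whiteboard") or "")]

for bug in fresh_orange_candidates():
    print(bug["id"], bug["summary"])
</syntaxhighlight>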


= choosing a config to test =


If there is not a clear winner, then consider a few factors which could help (a rough scoring sketch follows the list):
* debug typically provides more data than opt, but takes longer
* pgo is harder to backfill and builds take longer: try to avoid this
* ccov/jsdcov builds/tests are only run on mozilla-central: avoid these configs
* nightly is only run on mozilla-central: avoid these configs
* mac osx has a limited device pool: try to pick linux or windows
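
There is no hard rule here; the snippet below is only a toy sketch that encodes the heuristics above as a scoring function. The platform names and weights are illustrative assumptions, not anything Treeherder defines.

<syntaxhighlight lang="python">
# Build types we avoid per the list above.
AVOID = ("pgo", "ccov", "jsdcov", "nightly")

def score_config(platform: str, build_type: str) -> int:
    """Higher score = better candidate for retriggers/backfills."""
    if build_type in AVOID:
        return -1                          # hard to backfill, or mozilla-central only
    score = 0
    if build_type == "debug":
        score += 1                         # more data, slower turnaround
    if platform.startswith(("linux", "windows")):
        score += 2                         # bigger device pools than osx
    return score

configs = [("linux64", "debug"), ("osx-10-10", "opt"), ("windows10-64", "pgo")]
print(max(configs, key=lambda c: score_config(*c)))  # -> ('linux64', 'debug')
</syntaxhighlight>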


= choosing a starting point =


= what to do with the data =
Once you have retriggered/backfilled a job, you wait for it to finish. Opt tests usually finish in <30 minutes once they start running; debug can take up to 1 hour.

When your initial tests finish, you might see a view like this:

[[File:TH_repeat.jpg|500px]]

Here you can see the 2-4 oranges per push. Check each failure to make sure the same test is failing. In the above case that is true, so we need to go further back in history.

After repeating the process a few times, the root cause will become visible:

[[File:TH_rootcause.jpg|500px]]

You can see that the failing test switched from bc1 to bc2, so the filter is now on bc instead of bc1. There is a clear pattern of failures on every push after the offending patch landed, and almost no failures before it.
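
If you want to scan pushes for this pattern from a script instead of eyeballing Treeherder, here is a minimal sketch assuming the treeherder-client Python package (<code>pip install treeherder-client</code>); the repo name, the bc job filter string, and the counts are assumptions to adapt to your bug.

<syntaxhighlight lang="python">
from thclient import TreeherderClient

client = TreeherderClient()
# Newest pushes first; widen count to walk further back in history.
pushes = client.get_pushes("autoland", count=20)
for push in pushes:
    jobs = client.get_jobs("autoland", push_id=push["id"], count=2000)
    failures = [j for j in jobs
                if j["result"] == "testfailed"
                and "mochitest-browser-chrome" in j["job_type_name"]]
    # A run of pushes with failures followed by pushes with none
    # brackets the offending landing.
    print(push["revision"][:12], len(failures), "bc failures")
</syntaxhighlight>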
= exceptions and odd things =
Some common exceptions to watch out for:
* root cause looks like a merge: repeat the process on the other integration branch
* rarely, but sometimes, failures start on mozilla-central landings, or as a result of code merging
* sometimes it is obvious from check-in messages (or TV failures) that the failing test case was modified on a certain push: if the test was modified around the time it started failing, that is suspicious and can be used as a shortcut to find the regressing changeset