Auto-tools/Projects/Stockwell/backfill-retrigger

 
= finding bugs to work on =
We have a [https://charts.mozilla.org/FreshOranges/index.html fresh oranges dashboard], which looks like the neglected oranges dashboard except that it shows new, high-frequency failures and ignores [stockwell infra] bugs.


As these are new bugs, there will be issues here that are infra or harness related. Use this as an opportunity to annotate [stockwell infra] if the failure is build, taskcluster, network, or machine related.  Otherwise, the rules are similar to disable-recommended: if a test case is in the bugzilla summary, we should be able to retrigger and find the patch which caused it to become so frequent.
 
'''Skip test-verify bugs''': test-verify already repeats tests, and only runs tests which were modified on a push. There is no need to retrigger or backfill a test-verify failure.


= choosing a config to test =


If there is not a clear winner, then consider a few factors which could help (a small scoring sketch follows the list):
* debug typically provides more data than opt, but takes longer
* pgo is harder to backfill and builds take longer: try to avoid this
* ccov/jsdcov builds/tests are only run on mozilla-central: avoid these configs
* nightly is only run on mozilla-central: avoid this config
* Mac OS X has a limited device pool: try to pick Linux or Windows
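For illustration only, here is a minimal sketch of how those preferences could be encoded if you were scripting the choice; the penalty weights and config names are made-up assumptions, not anything Treeherder or OrangeFactor defines.
<pre>
# Hypothetical helper: rank candidate configs using the heuristics above.
# Config strings and penalty weights are assumptions for illustration only.
PENALTIES = {
    "pgo": 2,        # harder to backfill, slower builds
    "ccov": 10,      # mozilla-central only: effectively avoid
    "jsdcov": 10,    # mozilla-central only: effectively avoid
    "nightly": 10,   # mozilla-central only: effectively avoid
    "osx": 3,        # limited device pool
    "debug": 1,      # more data than opt, but slower
}

def score(config, failure_count):
    """Lower is better: more failures lower the score, heuristic penalties raise it."""
    penalty = sum(p for key, p in PENALTIES.items() if key in config)
    return -failure_count + penalty

def pick_config(failures_by_config):
    """failures_by_config: e.g. {'windows7-32 pgo': 12, 'osx-10-10 debug': 5}."""
    return min(failures_by_config, key=lambda c: score(c, failures_by_config[c]))

counts = {"windows7-32 pgo": 12, "linux64 debug": 9, "osx-10-10 opt": 6}
print(pick_config(counts))  # -> "windows7-32 pgo": frequency outweighs the pgo penalty here
</pre>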


= choosing a starting point =


In many cases you will pick a different failure as the first point: I often like to pick the second instance of the branch/config so I can confirm that multiple revisions show the failure (show a pattern).
BEWARE: in many cases the first failure posted is not the earliest revision.  Timestamps in OrangeFactor are based on when the job was completed, not when the revision was pushed.
[[File:1-OF first failures.jpg|300px]]
The above example shows that windows 7 opt/pgo is common; I am picking win7-pgo on mozilla-inbound as it is where the pattern seems to be the most frequent.
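Because of that timestamp caveat, if you script this step it helps to order failures by push time rather than by when the job finished. A minimal sketch, assuming you already have failure records carrying both timestamps (the field names are hypothetical, not an OrangeFactor API):
<pre>
from datetime import datetime

# Hypothetical failure records: push_time is when the revision landed,
# job_time is when the failing job finished (what the dashboard sorts by).
failures = [
    {"revision": "abc123", "push_time": "2018-03-02T10:15:00", "job_time": "2018-03-02T18:40:00"},
    {"revision": "def456", "push_time": "2018-03-01T22:05:00", "job_time": "2018-03-02T19:10:00"},
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")

# Sorting by job_time would suggest abc123 failed first, but def456 was
# actually pushed earlier, so that is the better starting point.
earliest = min(failures, key=lambda f: parse(f["push_time"]))
print(earliest["revision"])  # -> "def456"
</pre>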


= how to find which job to retrigger =


Picking the first job is easy: that is usually very obvious when choosing the config that you are running against and pulling up the revision to start with.  For example, it might be linux64/debug mochitest-browser-chrome-e10s-3.
[[File:TH_filtered_view.jpg|500px]]
Note in the above picture we filter on |win pgo bc1| and then we need to click the '20' link for 20 more revisions.
[[File:TH_history.jpg|500px]]
Note in the above picture we have bc1 available to retrigger on many revisions, you can see the specific error highlighted in the preview pane, and the 'retrigger' button is circled.
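If you pull job lists out of Treeherder and want to narrow them the same way as the quick filter box, an all-terms substring match is a reasonable approximation (this is a sketch of the behaviour, not Treeherder's actual code, and the job dictionaries are hypothetical):
<pre>
def matches_filter(job, filter_text):
    """Every space-separated term must appear in the job's searchable text
    (an approximation of Treeherder's quick filter, for local use)."""
    haystack = " ".join([
        job.get("platform", ""),
        job.get("build_type", ""),
        job.get("job_type_name", ""),
        job.get("job_type_symbol", ""),
    ]).lower()
    return all(term.lower() in haystack for term in filter_text.split())

# Hypothetical job entries, shaped loosely like Treeherder job data.
jobs = [
    {"platform": "windows7-32", "build_type": "pgo",
     "job_type_name": "mochitest-browser-chrome-e10s-1", "job_type_symbol": "bc1"},
    {"platform": "linux64", "build_type": "debug",
     "job_type_name": "mochitest-browser-chrome-e10s-3", "job_type_symbol": "bc3"},
]

print([j["job_type_symbol"] for j in jobs if matches_filter(j, "win pgo bc1")])  # -> ['bc1']
</pre>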


As a sanity check, I pull up the log file and search for the test name; it should show up as TEST-START, and then shortly after as TEST-UNEXPECTED-FAIL.
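The same sanity check can be scripted against a raw log you have downloaded. A minimal sketch; the log file name and test path are placeholders:
<pre>
# Scan a downloaded raw log for the failing test; both names are placeholders.
TEST = "browser/components/sessionstore/test/browser_broadcast.js"

with open("live_backing.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if TEST in line and ("TEST-START" in line or "TEST-UNEXPECTED-FAIL" in line):
            print(line.rstrip())
</pre>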
* we do not run every job/chunk on every push, so it could be 30 failures in 75 data points
* there could be retriggers on the existing data and we could have 3 or 4 failures on a few pushes, making it fail less than 20% of the time
[[File:TH_retriggered.jpg|500px]]
The above shows 20 retriggers (21 data points each) for the bc1 job.  40 would give us a clearer pattern, but I wanted to save a few resources and make sure 20 retriggers would show an error and possibly narrow the range.
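The percentages involved are simple, but worth writing down: failure rate is failures over completed jobs (data points), not over pushes. A small sketch using the numbers from the examples above:
<pre>
def failure_rate(failures, data_points):
    """Failure rate as a percentage of completed jobs, not of pushes."""
    return 100.0 * failures / data_points

# 30 failures across 75 completed jobs (not every push runs every chunk).
print(round(failure_rate(30, 75)))  # -> 40 (percent)
# A push that already had retriggers: 4 failures in 21 data points.
print(round(failure_rate(4, 21)))   # -> 19 (percent), i.e. under 20% on that push
</pre>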


= backfilling =


= what to do with the data =
Once you have retriggered/backfilled a job, you wait for it to finish.  Opt tests usually finish in <30 minutes once they start running; debug can take up to 1 hour.
When your initial tests finish, you might see a view like this:
[[File:TH_repeat.jpg|500px]]
Here you can see 2-4 oranges per push.  Check each failure to make sure the same test is failing.  In the above case that is true and we need to go further back in history.
After repeating the process a few times, the root cause will become visible:
[[File:TH_rootcause.jpg|500px]]
You can see that the failing test moved from bc1 to bc2, so now the filter is on bc instead of bc1.  You can see a clear pattern of failures for every push and then almost no failures before the offending patch landed.
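The pattern you are looking for can also be stated mechanically: per-push failure counts jump from near zero to consistently non-zero at the offending push. A small sketch of that idea over hypothetical per-push counts (oldest first):
<pre>
# Hypothetical per-push (revision, failures, runs) triples, oldest first.
pushes = [
    ("1111aaa", 0, 21), ("2222bbb", 1, 21), ("3333ccc", 0, 21),
    ("4444ddd", 6, 21), ("5555eee", 5, 21), ("6666fff", 7, 21),
]

def first_consistently_failing(pushes, threshold=0.15):
    """Return the first revision from which every later push fails above threshold."""
    for i, (rev, fails, runs) in enumerate(pushes):
        if all(f / r > threshold for _, f, r in pushes[i:]):
            return rev
    return None

print(first_consistently_failing(pushes))  # -> "4444ddd", the suspected regressor
</pre>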
= exceptions and odd things =
Some common exceptions to watch out for:
* root cause looks like a merge: repeat on the other integration branch
* rarely, but sometimes, failures occur on mozilla-central landings or as a result of code merging
* sometimes it is obvious from check-in messages (or TV failures) that the failing test case was modified on a certain push: if the test was modified around the time it started failing, that is suspicious and can be used as a short-cut to find the regressing changeset (see the sketch below)
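One quick way to check that shortcut is to ask version control when the test file last changed. A minimal sketch using Mercurial from a local checkout; the test path is a placeholder:
<pre>
import subprocess

# Placeholder path; substitute the failing test from the bug summary.
TEST_PATH = "browser/components/sessionstore/test/browser_broadcast.js"

# Show the most recent changesets that touched the test file.
out = subprocess.run(
    ["hg", "log", "--limit", "5",
     "--template", "{node|short} {date|isodate} {desc|firstline}\n",
     TEST_PATH],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
</pre>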