Auto-tools/Projects/Stockwell/backfill-retrigger

 
= finding bugs to work on =
We have a [https://charts.mozilla.org/FreshOranges/index.html fresh oranges dashboard], which looks like the neglected oranges dashboard except that it shows new, high-frequency failures and ignores [stockwell infra] bugs.


As these are new bugs, there will be issues here that are infra or harness related. Use this as an opportunity to annotate [stockwell infra] if the failure is build, taskcluster, network, or machine related.  Otherwise, the rules are similar to disable-recommended: if a test case is in the bugzilla summary, we should be able to retrigger and find the patch which caused it to become so frequent.
 
'''Skip test-verify bugs''': test-verify already repeats tests, and only runs tests which were modified on a push. There is no need to retrigger or backfill a test-verify failure.


= choosing a config to test =


If there is not a clear winner, then consider a few factors which could help (a small scoring sketch follows the list):
* debug typically provides more data than opt, but takes longer
* pgo is harder to backfill and builds take longer: try to avoid this
* ccov/jsdcov builds/tests are only run on mozilla-central: avoid these configs
* nightly is only run on mozilla-central: avoid this config
* Mac OS X has a limited device pool: try to pick Linux or Windows
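For illustration only, here is a minimal sketch of how those preferences could be encoded if you were scripting the choice; the penalty weights and config names are made-up assumptions, not anything Treeherder or OrangeFactor defines.
<pre>
# Hypothetical helper: rank candidate configs using the heuristics above.
# Config strings and penalty weights are assumptions for illustration only.
PENALTIES = {
    "pgo": 2,        # harder to backfill, slower builds
    "ccov": 10,      # mozilla-central only: effectively avoid
    "jsdcov": 10,    # mozilla-central only: effectively avoid
    "nightly": 10,   # mozilla-central only: effectively avoid
    "osx": 3,        # limited device pool
    "debug": 1,      # more data than opt, but slower
}

def score(config, failure_count):
    """Lower is better: more failures lower the score, heuristic penalties raise it."""
    penalty = sum(p for key, p in PENALTIES.items() if key in config)
    return -failure_count + penalty

def pick_config(failures_by_config):
    """failures_by_config: e.g. {'windows7-32 pgo': 12, 'osx-10-10 debug': 5}."""
    return min(failures_by_config, key=lambda c: score(c, failures_by_config[c]))

counts = {"windows7-32 pgo": 12, "linux64 debug": 9, "osx-10-10 opt": 6}
print(pick_config(counts))  # -> "windows7-32 pgo": frequency outweighs the pgo penalty here
</pre>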


= choosing a starting point =


In many cases you will pick a different failure as the first point: I often like to pick the second instance of the branch/config so I can confirm that multiple revisions show the failure (show a pattern).
BEWARE: in many cases the first failure posted is not the earliest revision.  Timestamps in OrangeFactor are based on when the job was completed, not when the revision was pushed.
[[File:1-OF first failures.jpg|300px]]
The above example shows that windows 7 opt/pgo is common; I am picking win7-pgo on mozilla-inbound as it is where the pattern seems to be the most frequent.
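Because of that timestamp caveat, if you script this step it helps to order failures by push time rather than by when the job finished. A minimal sketch, assuming you already have failure records carrying both timestamps (the field names are hypothetical, not an OrangeFactor API):
<pre>
from datetime import datetime

# Hypothetical failure records: push_time is when the revision landed,
# job_time is when the failing job finished (what the dashboard sorts by).
failures = [
    {"revision": "abc123", "push_time": "2018-03-02T10:15:00", "job_time": "2018-03-02T18:40:00"},
    {"revision": "def456", "push_time": "2018-03-01T22:05:00", "job_time": "2018-03-02T19:10:00"},
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")

# Sorting by job_time would suggest abc123 failed first, but def456 was
# actually pushed earlier, so that is the better starting point.
earliest = min(failures, key=lambda f: parse(f["push_time"]))
print(earliest["revision"])  # -> "def456"
</pre>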


= how to find which job to retrigger =


Picking the first job is easy: that is usually very obvious when choosing the config that you are running against and pulling up the revision to start with.  For example, it might be linux64/debug mochitest-browser-chrome-e10s-3.
[[File:TH_filtered_view.jpg|500px]]
Note in the above picture we filter on |win pgo bc1| and then we need to click the '20' link for 20 more revisions.
[[File:TH_history.jpg|500px]]
Note in the above picture we have bc1 available to retrigger on many revisions, you can see the specific error highlighted in the preview pane, and the 'retrigger' button is circled.
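If you pull job lists out of Treeherder and want to narrow them the same way as the quick filter box, an all-terms substring match is a reasonable approximation (this is a sketch of the behaviour, not Treeherder's actual code, and the job dictionaries are hypothetical):
<pre>
def matches_filter(job, filter_text):
    """Every space-separated term must appear in the job's searchable text
    (an approximation of Treeherder's quick filter, for local use)."""
    haystack = " ".join([
        job.get("platform", ""),
        job.get("build_type", ""),
        job.get("job_type_name", ""),
        job.get("job_type_symbol", ""),
    ]).lower()
    return all(term.lower() in haystack for term in filter_text.split())

# Hypothetical job entries, shaped loosely like Treeherder job data.
jobs = [
    {"platform": "windows7-32", "build_type": "pgo",
     "job_type_name": "mochitest-browser-chrome-e10s-1", "job_type_symbol": "bc1"},
    {"platform": "linux64", "build_type": "debug",
     "job_type_name": "mochitest-browser-chrome-e10s-3", "job_type_symbol": "bc3"},
]

print([j["job_type_symbol"] for j in jobs if matches_filter(j, "win pgo bc1")])  # -> ['bc1']
</pre>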


As a sanity check, I pull up the log file and search for the test name; it should show up as TEST-START, and then shortly after as TEST-UNEXPECTED-FAIL.
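The same sanity check can be scripted against a raw log you have downloaded. A minimal sketch; the log file name and test path are placeholders:
<pre>
# Scan a downloaded raw log for the failing test; both names are placeholders.
TEST = "browser/components/sessionstore/test/browser_broadcast.js"

with open("live_backing.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if TEST in line and ("TEST-START" in line or "TEST-UNEXPECTED-FAIL" in line):
            print(line.rstrip())
</pre>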
* we do not run every job/chunk on every push, so it could be 30 failures in 75 data points
* there could be retriggers on the existing data and we could have 3 or 4 failures on a few pushes, making it fail less than 20% of the time
[[File:TH_retriggered.jpg|500px]]
The above shows 20 retriggers (21 data points each) for the bc1 job.  40 would give us a clearer pattern, but I wanted to save a few resources and make sure 20 retriggers would show an error and possibly narrow the range.
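The percentages involved are simple, but worth writing down: failure rate is failures over completed jobs (data points), not over pushes. A small sketch using the numbers from the examples above:
<pre>
def failure_rate(failures, data_points):
    """Failure rate as a percentage of completed jobs, not of pushes."""
    return 100.0 * failures / data_points

# 30 failures across 75 completed jobs (not every push runs every chunk).
print(round(failure_rate(30, 75)))  # -> 40 (percent)
# A push that already had retriggers: 4 failures in 21 data points.
print(round(failure_rate(4, 21)))   # -> 19 (percent), i.e. under 20% on that push
</pre>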


= backfilling =


= what to do with the data =
Once you have retriggered/backfilled a job, you wait for it to finish.  Opt tests usually finish in <30 minutes once they start running; debug can take up to 1 hour.
When your initial tests finish, you might see a view like this:
[[File:TH_repeat.jpg|500px]]
Here you can see 2-4 oranges per push.  Check each failure to make sure the same test is failing.  In the above case that is true and we need to go further back in history.
After repeating the process a few times, the root cause will become visible:
[[File:TH_rootcause.jpg|500px]]
You can see that the failing test moved from bc1 to bc2, so now the filter is on bc instead of bc1.  You can see a clear pattern of failures for every push and then almost no failures before the offending patch landed.
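The pattern you are looking for can also be stated mechanically: per-push failure counts jump from near zero to consistently non-zero at the offending push. A small sketch of that idea over hypothetical per-push counts (oldest first):
<pre>
# Hypothetical per-push (revision, failures, runs) triples, oldest first.
pushes = [
    ("1111aaa", 0, 21), ("2222bbb", 1, 21), ("3333ccc", 0, 21),
    ("4444ddd", 6, 21), ("5555eee", 5, 21), ("6666fff", 7, 21),
]

def first_consistently_failing(pushes, threshold=0.15):
    """Return the first revision from which every later push fails above threshold."""
    for i, (rev, fails, runs) in enumerate(pushes):
        if all(f / r > threshold for _, f, r in pushes[i:]):
            return rev
    return None

print(first_consistently_failing(pushes))  # -> "4444ddd", the suspected regressor
</pre>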
= exceptions and odd things =
Some common exceptions to watch out for:
* root cause looks like a merge: repeat on the other integration branch
* rarely, but sometimes, failures occur on mozilla-central landings or as a result of code merging
* sometimes it is obvious from check-in messages (or TV failures) that the failing test case was modified on a certain push: if the test was modified around the time it started failing, that is suspicious and can be used as a short-cut to find the regressing changeset (see the sketch below)
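One quick way to check that shortcut is to ask version control when the test file last changed. A minimal sketch using Mercurial from a local checkout; the test path is a placeholder:
<pre>
import subprocess

# Placeholder path; substitute the failing test from the bug summary.
TEST_PATH = "browser/components/sessionstore/test/browser_broadcast.js"

# Show the most recent changesets that touched the test file.
out = subprocess.run(
    ["hg", "log", "--limit", "5",
     "--template", "{node|short} {date|isodate} {desc|firstline}\n",
     TEST_PATH],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
</pre>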