Sheriffing/How To/Intermittent bugs

**# The test is run multiple times in a TV-bf (test-verify backfill) job. If it fails in the later job but passes in the earlier one, that strongly indicates the failure was caused by changes in the push with the TV-bf failure.
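The comparison of TV-bf results across pushes can be sketched in Python as below. This is a minimal illustration only, assuming the repeated run outcomes have already been collected per push, oldest push first. The data shape and the `suspect_push` helper are hypothetical and not part of Treeherder or any other Mozilla tool.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical data shape: (push id, outcomes of the repeated TV-bf runs on
# that push), ordered oldest push first. True = the run passed.
PushResults = Tuple[str, Sequence[bool]]


def suspect_push(pushes: List[PushResults]) -> Optional[str]:
    """Return the first push whose TV-bf runs fail while every run on the
    immediately preceding push passed, or None if there is no such push."""
    for (_, prev_runs), (push_id, runs) in zip(pushes, pushes[1:]):
        # Require at least one run on the earlier push so that an empty
        # result list is not mistaken for a clean pass.
        prev_clean = len(prev_runs) > 0 and all(prev_runs)
        if prev_clean and not all(runs):
            return push_id
    return None
```

If the earlier push also shows failures, the function returns None, matching the idea that only a pass-then-fail transition points at a specific push.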


= General failure messages: deciding whether a new bug is needed =
Sometimes CI jobs only provide a general failure message, e.g. [https://bugzilla.mozilla.org/show_bug.cgi?id=1411358 bug 1411358]: "Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container."
If a job type that did not previously fail with such a general failure message starts doing so, and the bug for that general failure message is not new, file a new bug scoped only to that job type and failure message.
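The decision rule above can be written as a small Python predicate. This is a sketch under stated assumptions: the `GenericBug` record and `needs_new_bug` function are hypothetical, and the 14-day window for treating a bug as "new" is an arbitrary illustrative threshold, not an official sheriffing policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class GenericBug:
    """Hypothetical summary of an existing bug for a general failure message."""
    bug_id: int
    first_seen: date            # when failures were first classified against it
    job_types: frozenset[str]   # job types already classified against it


def needs_new_bug(bug: GenericBug, job_type: str, today: date,
                  new_window: timedelta = timedelta(days=14)) -> bool:
    """True if this job type should get its own bug for the general message."""
    # Assumption: a bug younger than `new_window` still counts as "new".
    bug_is_new = today - bug.first_seen <= new_window
    # The job type has not failed with this general message before.
    job_type_is_new = job_type not in bug.job_types
    return job_type_is_new and not bug_is_new
```

Both conditions must hold: a brand-new general bug can simply absorb the new job type, and a job type that already failed this way is covered by the existing bug.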
Example: Btup builds also started to fail intermittently with the message from [https://bugzilla.mozilla.org/show_bug.cgi?id=1411358 bug 1411358]. The logs for these jobs showed no output before the timeout was hit, often for more than 40 minutes. [https://bugzilla.mozilla.org/show_bug.cgi?id=1480494 bug 1480494] was filed, and because its scope was limited to that build type, developers started investigating quickly.
The jobs [https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-08-03&endday=2018-08-10&tree=trunk&bug=1411358 classified as bug 1411358] (sort by "Test Suite" and look for the Test Suite "opt") showed that the issue started on August 3rd, while many similar failure messages had already occurred for other job types before that date.
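Finding when each job type started failing amounts to taking the earliest failure date per test suite. The Python sketch below assumes the classified failures have already been copied out of the bug details view as `(test suite, date)` pairs; that export format and both function names are hypothetical, not a Treeherder API.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple


def first_failure_per_suite(failures: Iterable[Tuple[str, date]]) -> Dict[str, date]:
    """Map each test suite to the earliest date it was classified against the bug."""
    first: Dict[str, date] = {}
    for suite, day in failures:
        if suite not in first or day < first[suite]:
            first[suite] = day
    return first


def suites_starting_on_or_after(first: Dict[str, date], since: date) -> List[str]:
    """Test suites whose first classified failure is on or after `since`."""
    return sorted(suite for suite, day in first.items() if day >= since)
```

A suite that only shows up from a given date onward, while other suites have older entries, is the pattern this section describes.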


= How to file a bug for an intermittent failure =