Sheriffing/How To/Intermittent bugs

From MozillaWiki
Revision as of 14:04, 16 July 2018

When a test fails sometimes but not every time, you've hit an intermittent failure. This is the most annoying class of failures for both sheriffs and developers: the failure is not necessarily related to the code under test, and it more likely indicates that the test itself needs to change to become more stable.

The test failure may have happened before and a bug may already be on file. In such cases, Treeherder should suggest the bug number and title under the failure, e.g.:

Treeherder suggestions

How to file a bug for an intermittent failure

If there's no bug on file, you'll need to file one.

Bugfiler

Clicking the small bug icon beside a failure opens a tool called "bugfiler" that automates most of the manual steps. You still need to open the log, copy the relevant lines for the failure (e.g. from the last TEST-PASS up to the failure, including any stack trace below it) and paste them into Treeherder's bug filing form.
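Picking the lines to copy can be sketched in Python as below. This is a hypothetical helper, not part of bugfiler or Treeherder; it assumes the result lines contain the plain markers TEST-PASS and TEST-UNEXPECTED-..., and that the next line containing TEST- marks the end of any stack trace.

```python
from typing import List


def extract_failure_excerpt(log_lines: List[str]) -> List[str]:
    """Return the lines from the last TEST-PASS before the first unexpected
    result through any trailing lines (e.g. a stack trace).

    Hypothetical helper: assumes Mozilla-style "TEST-..." result markers.
    """
    # Locate the first unexpected result; a clean log yields nothing.
    fail_idx = next(
        (i for i, line in enumerate(log_lines) if "TEST-UNEXPECTED-" in line),
        None,
    )
    if fail_idx is None:
        return []

    # Walk backwards to the last TEST-PASS before the failure (or the log start).
    start = 0
    for i in range(fail_idx - 1, -1, -1):
        if "TEST-PASS" in log_lines[i]:
            start = i
            break

    # Keep following lines (such as stack frames) until the next TEST-* line.
    end = fail_idx + 1
    while end < len(log_lines) and "TEST-" not in log_lines[end]:
        end += 1

    return log_lines[start:end]
```

The excerpt still needs a human check: a failure that is caused by an earlier test may need more context than this window gives.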

Two things must be included in the bug so that Treeherder can suggest it automatically when this intermittent failure happens again:

  1. In the summary: Intermittent test_file test failure
  2. In the Keywords field, choose the keyword: intermittent-failure
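A quick self-check of a draft bug against the two requirements can be sketched as below. It is a simplified, hypothetical check: it only tests that the summary starts with "Intermittent" and that the keyword is set, and it does not reproduce how Treeherder actually matches failures to bugs.

```python
from typing import Iterable, List

REQUIRED_KEYWORD = "intermittent-failure"


def missing_requirements(summary: str, keywords: Iterable[str]) -> List[str]:
    """List which of the two requirements a draft bug still misses."""
    problems = []
    # Requirement 1: the summary follows the "Intermittent <test> ..." pattern.
    if not summary.startswith("Intermittent "):
        problems.append('summary should start with "Intermittent"')
    # Requirement 2: the intermittent-failure keyword is set.
    if REQUIRED_KEYWORD not in set(keywords):
        problems.append(f"keyword {REQUIRED_KEYWORD!r} is missing")
    return problems
```

An empty list means both requirements are met.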

Manually filing a bug

Suppose bugfiler isn't working, or you can't use it for another reason (e.g. the bug is security-sensitive and should not be public). You see a test failure like TEST-UNEXPECTED-TIMEOUT | /navigation-timing/test_timing_xserver_redirect.html | expected OK in Treeherder, and no bug is on file for it.
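A failure line like this has three parts separated by "|": the status, the test path and the message. The hypothetical helper below splits it and builds a summary in the same shape as the example summary used in the steps that follow (the leading "/" of the path is dropped to match that example).

```python
from typing import NamedTuple


class FailureLine(NamedTuple):
    status: str   # e.g. "TEST-UNEXPECTED-TIMEOUT"
    test: str     # e.g. "/navigation-timing/test_timing_xserver_redirect.html"
    message: str  # e.g. "expected OK"


def parse_failure_line(line: str) -> FailureLine:
    """Split a "STATUS | test | message" failure line into its parts."""
    # maxsplit=2 keeps any further "|" characters inside the message intact.
    status, test, message = [part.strip() for part in line.split("|", 2)]
    return FailureLine(status, test, message)


def build_summary(line: str) -> str:
    """Build an "Intermittent <test> | <message>" bug summary."""
    failure = parse_failure_line(line)
    return f"Intermittent {failure.test.lstrip('/')} | {failure.message}"


if __name__ == "__main__":
    print(build_summary(
        "TEST-UNEXPECTED-TIMEOUT | "
        "/navigation-timing/test_timing_xserver_redirect.html | expected OK"
    ))
```

Lines with fewer than two "|" separators raise a ValueError, which is usually a sign that the wrong line was copied.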

  1. Open the log in Treeherder.
  2. Log in to Bugzilla in a different tab or window.
  3. Find the Product and Component in which to file this bug (DXR and hg.mozilla.org can be very helpful if you are in doubt):
    1. Copy the file path from the failure line: /navigation-timing/test_timing_xserver_redirect.html
    2. Find it in the repository, either with the search term 'path:/navigation-timing/test_timing_xserver_redirect.html' on DXR or by entering '/navigation-timing/test_timing_xserver_redirect.html' in the path filter field on the right in Searchfox. If nothing is found, the path still contains folders from outside the source tree: delete everything up to e.g. 'gecko' or 'build' and try again.
    3. Copy the file's full path in the repository, e.g. testing/web-platform/tests/navigation-timing/test_timing_xserver_redirect.html
    4. In a terminal, from the root of your mozilla-unified checkout, run the following command to get the Bugzilla product and component for bugs related to the file:

./mach file-info bugzilla-component testing/web-platform/tests/navigation-timing/test_timing_xserver_redirect.html

In this case, the output shows that the bug belongs in Core :: DOM:

Core :: DOM
  testing/web-platform/tests/navigation-timing/test_timing_xserver_redirect.html

  4. Copy the failure text from the log window into the bug's description.
  5. Set the Summary to: Intermittent navigation-timing/test_timing_xserver_redirect.html | expected OK
  6. In the Keywords field, choose intermittent-failure.
  7. Submit the bug.
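The path trimming in step 3 and reading the mach output can be sketched in Python as below. This is a hypothetical helper: the marker folders come from the 'gecko'/'build' hint above, the example log path in the usage note is made up, and the parser assumes the output format shown above (an unindented "Product :: Component" line followed by indented file paths).

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumption: folder names that mark where the source tree starts in a log path.
SOURCE_ROOT_MARKERS = ("gecko", "build")


def repo_relative_path(log_path: str) -> str:
    """Drop leading folders that live outside the source tree.

    Everything up to and including the first marker folder is removed. Without
    a marker, the path is returned without its leading "/".
    """
    parts = [p for p in PurePosixPath(log_path).parts if p != "/"]
    for i, part in enumerate(parts):
        if part in SOURCE_ROOT_MARKERS:
            return "/".join(parts[i + 1:])
    return "/".join(parts)


def parse_bugzilla_component(output: str) -> Optional[Tuple[str, str]]:
    """Read (product, component) from `./mach file-info bugzilla-component`."""
    for line in output.splitlines():
        # Component lines are unindented; the file paths below them are indented.
        if line and not line.startswith(" ") and " :: " in line:
            product, component = line.split(" :: ", 1)
            return product.strip(), component.strip()
    return None  # e.g. the file has no component ("UNKNOWN")
```

For a made-up log path such as /builds/worker/checkouts/gecko/testing/web-platform/tests/navigation-timing/test_timing_xserver_redirect.html, repo_relative_path returns the repository path used in step 3.3. Cutting at the first marker rather than the last avoids over-trimming when a marker name also appears inside the source tree (mozilla-central has a top-level build/ folder, for instance).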

The bug should look like https://bugzilla.mozilla.org/show_bug.cgi?id=1172135

Treeherder syncs with Bugzilla several times a day. Once your bug is added and the systems sync, Treeherder will suggest your new bug as a match for the next intermittent failure of this type.

Machine-specific failures

A machine can get into a bad state, or be in one from the start (e.g. because of bad memory). This makes all tests fail, or just more tests than usual, often of the same test type.
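Spotting a machine that fails "more tests than usual" can be approximated by comparing its failure rate with the rate on all other machines running the same job type. The sketch below is a hypothetical heuristic, not a Treeherder feature; the machine names and the default thresholds (factor 3, at least 3 failures) are arbitrary assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suspicious_machines(
    results: Iterable[Tuple[str, bool]],
    factor: float = 3.0,
    min_failures: int = 3,
) -> List[str]:
    """Flag machines that fail far more often than all other machines.

    `results` holds (machine_name, failed) pairs for recent jobs of one type.
    A machine is flagged when it has at least `min_failures` failures and its
    failure rate is more than `factor` times the rate on the other machines.
    """
    runs: Counter = Counter()
    failures: Counter = Counter()
    for machine, failed in results:
        runs[machine] += 1
        if failed:
            failures[machine] += 1

    total_runs = sum(runs.values())
    total_failures = sum(failures.values())
    flagged = []
    for machine in sorted(runs):
        other_runs = total_runs - runs[machine]
        if other_runs == 0:
            continue  # only one machine: nothing to compare against
        other_rate = (total_failures - failures[machine]) / other_runs
        rate = failures[machine] / runs[machine]
        if failures[machine] >= min_failures and rate > factor * other_rate:
            flagged.append(machine)
    return flagged
```

Comparing against the other machines, rather than the overall rate, keeps one badly broken machine from raising the baseline it is measured against.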

WebGL ("gl") and reftest ("R") jobs can fail because of dead pixels, which may be far away from any rendered content. In the following zoomed-out example, the red rectangle at the bottom marks a dead pixel that causes the test to fail.

reftest analyzer with highlighted dead pixel outside of area with content created for testing

Terminate the machine if you discover such an issue.
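An isolated differing pixel, as opposed to a cluster of differences around the rendered content, is a hint of a dead pixel rather than a rendering bug. The sketch below assumes both renderings are already decoded into equally sized rows of (R, G, B) tuples (decoding the PNGs from the reftest analyzer is out of scope), and it only flags candidates: the final call still needs a look at the image.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]       # (R, G, B)
Image = Sequence[Sequence[Pixel]]  # rows of pixels, top to bottom


def differing_pixels(test: Image, reference: Image) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates where the two renderings differ."""
    if len(test) != len(reference) or any(
        len(a) != len(b) for a, b in zip(test, reference)
    ):
        raise ValueError("images must have the same size")
    return [
        (x, y)
        for y, (row_t, row_r) in enumerate(zip(test, reference))
        for x, (p_t, p_r) in enumerate(zip(row_t, row_r))
        if p_t != p_r
    ]


def suspected_dead_pixels(test: Image, reference: Image) -> List[Tuple[int, int]]:
    """Return differing pixels with no differing neighbour (incl. diagonals)."""
    diffs = differing_pixels(test, reference)
    diff_set = set(diffs)
    return [
        (x, y)
        for (x, y) in diffs
        if not any(
            (x + dx, y + dy) in diff_set
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
    ]
```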