Sheriffing/Manifest Scheduling: Difference between revisions

Revision as of 17:59, 16 July 2020

The "manifest scheduling" project is a major shift in Mozilla's CI system. It aims to reduce costs and improve regression detection by running only the tests that we need.

Changes

Under the old system, the CI roughly performs these steps:

Compute all tests
Split them across a hardcoded number of tasks (i.e., total chunks)
Figure out which tests should run
Schedule the tasks that contain at least one of the tests we want to run
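
The steps above can be sketched roughly as follows. This is only an illustration, not the actual taskgraph code: the function names, the test names and the round-robin split are all assumptions made for the example.

```python
from typing import Dict, List, Set


def chunk_evenly(items: List[str], total_chunks: int) -> List[List[str]]:
    """Split items across a fixed number of chunks (round-robin, for simplicity)."""
    chunks: List[List[str]] = [[] for _ in range(total_chunks)]
    for index, item in enumerate(items):
        chunks[index % total_chunks].append(item)
    return chunks


def schedule_old(all_tests: List[str], wanted: Set[str], total_chunks: int) -> Dict[int, List[str]]:
    """Return {chunk number: tests} for every chunk holding at least one wanted test."""
    scheduled: Dict[int, List[str]] = {}
    for number, chunk in enumerate(chunk_evenly(all_tests, total_chunks), start=1):
        if wanted.intersection(chunk):
            # The whole chunk is scheduled, including tests nobody asked for.
            scheduled[number] = chunk
    return scheduled


# Hypothetical data: 12 tests, 4 chunks, and only 2 tests we care about.
all_tests = [f"test_{i}.js" for i in range(12)]
scheduled = schedule_old(all_tests, wanted={"test_0.js", "test_5.js"}, total_chunks=4)
tests_run = sum(len(chunk) for chunk in scheduled.values())
```

In this made-up example two chunks are scheduled and six tests run, even though only two of them were wanted.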

The major downside of the above system is that each task containing a test we care about also contains many tests we don't care about. With "manifest scheduling" enabled, the steps now become:

Compute the tests we care about
Figure out how many chunks it would take to run them given a hardcoded time interval
Split them across said chunks
Schedule all chunks

With this new method, we *only* schedule the exact set of manifests that we have deemed important. This should represent a huge improvement in CI efficiency.
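
A minimal sketch of the new flow, under stated assumptions: the "hardcoded time interval" is treated as a target maximum runtime per chunk, the per-manifest runtimes are hypothetical, and the greedy "longest manifest first" split is just one plausible way to balance chunks rather than the algorithm the CI actually uses.

```python
import math
from typing import Dict, List


def schedule_by_manifest(runtimes: Dict[str, float], max_chunk_seconds: float) -> List[List[str]]:
    """Chunk only the wanted manifests; every resulting chunk gets scheduled.

    runtimes maps each wanted manifest to an estimated runtime in seconds.
    """
    total = sum(runtimes.values())
    # The number of chunks is derived from the work to do, not hardcoded.
    total_chunks = max(1, math.ceil(total / max_chunk_seconds))

    chunks: List[List[str]] = [[] for _ in range(total_chunks)]
    loads = [0.0] * total_chunks
    # Greedy balancing: place the longest manifests first on the lightest chunk.
    for manifest in sorted(runtimes, key=runtimes.get, reverse=True):
        lightest = loads.index(min(loads))
        chunks[lightest].append(manifest)
        loads[lightest] += runtimes[manifest]
    return chunks


# Hypothetical manifests and runtimes.
wanted = {"a/browser.ini": 600.0, "b/browser.ini": 300.0, "c/browser.ini": 300.0}
chunks = schedule_by_manifest(wanted, max_chunk_seconds=900.0)
```

Every manifest in the output is one we asked for, so no task carries unwanted tests. The trade-off is that the contents of a given chunk number now depend on which manifests were selected for that push.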

Sheriffing Implications

The benefits of "manifest scheduling" are fairly clear, but there are several drawbacks as well, most of which relate to sheriffing.

Push Continuity

The main issue is that under "manifest scheduling", the mochitest-1 task on push A can run a completely different set of tests than the mochitest-1 task on push B. In other words, it will no longer be possible to filter Treeherder by task label to identify test-level regressions (though this should still work for many types of infra-related issues). Instead, sheriffs will need to filter Treeherder by "test path". Read this blog post (https://medium.com/@armenzg/filter-treeherder-jobs-by-test-or-manifest-path-af0e1ae74e61) for details of the feature.

UI showing active filter for a test path

Push showing tasks that executed the same test path
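
To illustrate why the task label stops being a reliable key, here is a small sketch. It does not reflect how Treeherder implements the filter; the push data, the manifest paths and the `tasks_running` helper are hypothetical.

```python
from typing import Dict, List

# A push is modelled as {task label: manifests that the task ran}.
Push = Dict[str, List[str]]


def tasks_running(push: Push, test_path: str) -> List[str]:
    """Return the labels of tasks whose manifests match the given test path."""
    return sorted(
        label
        for label, manifests in push.items()
        if any(test_path in manifest for manifest in manifests)
    )


# Hypothetical pushes: the same labels hold different manifests.
push_a: Push = {
    "mochitest-1": ["dom/base/mochitest.ini", "layout/mochitest.ini"],
    "mochitest-2": ["netwerk/mochitest.ini"],
}
push_b: Push = {
    "mochitest-1": ["netwerk/mochitest.ini"],
    "mochitest-2": ["dom/base/mochitest.ini"],
}

on_a = tasks_running(push_a, "dom/base")
on_b = tasks_running(push_b, "dom/base")
```

Here the `dom/base` tests ran in mochitest-1 on push A but in mochitest-2 on push B, so comparing mochitest-1 across the two pushes would compare unrelated tests. Filtering by test path finds the right task on each push.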

Backfills

Another major issue is backfilling. If tasks run different sets of tests on different pushes, a naive backfill breaks, because the exact same set of manifests needs to be scheduled on each backfilled push.

The backfill action can automatically detect whether "manifest scheduling" was used for the task. If not, it performs the standard backfill that sheriffs are used to. If so, it runs the same set of manifests from the originating push on all of the backfilled pushes. Because it's possible to run more than one backfill at a time, we need a way to identify which tasks were backfilled from where. To that end, the symbols of backfilled tasks have been changed to something like `<group>-bk(<symbol>-<rev>-bk)`. For example, if `M-fis(bc3)` was backfilled from revision `abcdef`, then the symbol for the backfill tasks would be `M-fis-bk(bc3-abcdef-bk)`. This tells sheriffs that the task was backfilled starting at revision `abcdef` and contains the same set of test manifests as on that push.

UI showing a failed task, two backfill requests and a few retriggered tasks
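
The symbol scheme described above can be sketched as a pair of helpers. This is an illustration of the naming pattern only, not the code of the backfill action. The function names and the regular expression are assumptions, and the parser assumes the revision itself contains no hyphen or parenthesis.

```python
import re
from typing import NamedTuple, Optional


class BackfillSymbol(NamedTuple):
    group: str   # e.g. "M-fis"
    symbol: str  # e.g. "bc3"
    rev: str     # revision the backfill started from


def backfill_symbol(group: str, symbol: str, rev: str) -> str:
    """Build a symbol of the form <group>-bk(<symbol>-<rev>-bk)."""
    return f"{group}-bk({symbol}-{rev}-bk)"


# Assumed pattern: the revision is the last hyphen-separated piece before "-bk)".
_PATTERN = re.compile(r"^(?P<group>.+)-bk\((?P<symbol>.+)-(?P<rev>[^-()]+)-bk\)$")


def parse_backfill_symbol(text: str) -> Optional[BackfillSymbol]:
    """Return the parts of a backfill symbol, or None for a regular symbol."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    return BackfillSymbol(**match.groupdict())


built = backfill_symbol("M-fis", "bc3", "abcdef")
parsed = parse_backfill_symbol(built)
```

Parsing the symbol recovers the originating revision, which is what lets two concurrent backfills of the same task be told apart.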

You can narrow the view to the task you're backfilling and all of its backfilled tasks by selecting the task and clicking "Filter jobs containing these keywords" (the text that appears when you hover over the link). See the screenshot below for the location in the UI.

Link to filter tasks and backfilled tasks

Other than being aware of this change, there shouldn't be any differences in performing a backfill.

You can read this blog post (https://medium.com/@armenzg/new-backfill-action-26788d0db81a) for more details.

Add New Jobs

There is currently no way to specify test manifests when adding new jobs via Treeherder's "Add New Jobs" UI. This means that it can't be used to fill in tasks for the purpose of bisecting a regression. There are plans to add this feature in the future.

Retriggers

Retriggers should remain unaffected, though there are plans to add an action that retriggers only the manifests that failed. This may or may not become the default.

Intermittent Risk

One risk of "manifest scheduling" is that, since tests no longer run in a deterministic order, we may get more bad interactions between tests that produce intermittents. It's possible we will see many new intermittent bugs after "manifest scheduling" is enabled (though by the same logic, these new intermittents should also be much less frequent).

To mitigate this risk, we have only enabled "manifest scheduling" for suites that have the "run-by-manifest" feature. That is, the harness restarts Firefox between manifests. In practice, this seems to eliminate nearly all of these "bad interaction" types of failures. It is still certainly possible for tests to influence one another even across restarts (e.g., if they write to disk outside of the profile).

