Auto-tools/Projects/Stockwell/backfill-retrigger
Revision as of 17:01, 14 March 2018
finding bugs to work on
choosing a config to test
It is best to start from the pattern you see across all the starred instances of the failure. When triaging, it is normal to add a comment to the bug listing the configurations on which the failure is most frequent. Usually, pick the most frequent configuration; if two configurations tie, choose both.
If there is not a clear winner, then consider a few factors which could help:
- debug typically provides more data, but takes longer
- pgo is harder to backfill and its builds take longer, so try to avoid it
- ccov/jsdcov builds/tests are only run on mozilla-central, so avoid these configs
- nightly is only run on mozilla-central, so avoid these configs
- mac osx has a limited device pool, so try to pick linux or windows
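The selection rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the config strings, the `AVOID_MARKERS` tuple, and the `choose_configs` helper are hypothetical names, not part of any Mozilla tool, and the debug-versus-opt trade-off is left to your judgment rather than encoded.

```python
from collections import Counter

# Hypothetical substrings that mark the configs the list above says to avoid.
AVOID_MARKERS = ("pgo", "ccov", "jsdcov", "nightly", "osx")


def choose_configs(starred_configs):
    """Return the config(s) to test, given one config string per starred failure.

    A clear winner, or a tie between two configs, is returned as-is.
    With a wider tie there is no clear winner, so configs matching
    AVOID_MARKERS are dropped when that still leaves a candidate.
    """
    counts = Counter(starred_configs)
    if not counts:
        return []
    top = max(counts.values())
    winners = sorted(c for c, n in counts.items() if n == top)
    if len(winners) <= 2:
        return winners
    preferred = [c for c in winners if not any(m in c for m in AVOID_MARKERS)]
    return preferred or winners
```

For example, `choose_configs(["linux64/debug", "linux64/debug", "windows10/opt"])` returns only `linux64/debug`, while a three-way tie that includes a pgo and an osx config falls back to the remaining one.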
choosing a starting point
Ideally you want to pick the first instance of a failure and work backwards in time to find the root cause. In practice this can be confusing as we have multiple branches or sometimes different configs that fail at different times.
I would look at the first 10 failures and weigh:
- what branch is most common
- where do the timestamps end up close to each other
- is the most common config on the same branch and with close timestamps
In many cases you will end up picking a failure other than the very first one as your starting point. I often like to pick the second instance of the branch/config so I can confirm that multiple revisions show the failure (that is, that there is a pattern).
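As a rough sketch of this weighing, the snippet below takes the earliest ten failures, finds the most common branch/config pair, and returns its second instance when there is one. The `Failure` record and `choose_starting_point` function are hypothetical, and the sketch deliberately simplifies: it treats branch and config as a single pair and does not model how close the timestamps are, which you would still check by eye.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Failure:
    # Hypothetical record shape for one starred failure.
    branch: str
    config: str
    when: datetime


def choose_starting_point(failures, window=10):
    """Pick a starting failure from the earliest `window` failures."""
    earliest = sorted(failures, key=lambda f: f.when)[:window]
    if not earliest:
        return None
    # Most common (branch, config) pair among the earliest failures.
    pair, _ = Counter((f.branch, f.config) for f in earliest).most_common(1)[0]
    matches = [f for f in earliest if (f.branch, f.config) == pair]
    # Prefer the second instance so that an earlier revision already
    # shows the same failure, which confirms a pattern.
    return matches[1] if len(matches) > 1 else matches[0]
```

From the returned failure you would then work backwards in time through earlier revisions on the same branch and config.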