TestEngineering/Performance/Talos/RegressionBugsHandling

From MozillaWiki
< TestEngineering‎ | Performance‎ | Talos
Revision as of 18:09, 9 February 2015 by Avih (talk | contribs) (minor edit)
Jump to navigation Jump to search

Handling of Performance Regression Bugs

In order to better understand and track our performance regressions, the performance team came up with the following system:

When a performance regression is detected and the patch which caused it is identified
  • Whoever landed the patch is responsible to drive a decision on whether or not the regression is acceptable.
  • Such decision should preferably involve a module owner/peer of the regressed module.
  • A decision has to be made within two weeks, otherwise the offending patch may be backed out.
If the decision was to try and fix the regression, preferably a time frame and an owner would be provided as well.
It's OK to decide to not fix a regression for valid reasons, including
  • There is other higher priority work, the patch's functionality is absolutely required, and no one else is available to investigate in the foreseeable future.
  • The magnitude of the regression is too small to warrant the effort required for an investigation.
  • The performance improvements from the patch outweigh the regression on the regressing test
  • The functionality provided by the patch is absolutely required (e.g. a security patch) or otherwise out-weighs the regression.
  • etc.
Invalid reasons for closing a regression bug
  • "I don't understand how my patch could regress this test"
    • If you think your patch is not to blame, we can re-trigger Talos test runs to confirm the before-and-after behavior. Talk to the person who filed the bug, they would be happy to help you figure out if the regression is real and if it is indeed your patch that is responsible.
    • It's up to the patch author to contact the right people (test authors, another team, etc) to figure out why the test regressed.
    • We're all responsible for the performance of the code we write, even if the regression is in an unrelated part of the codebase.
  • "I don't have time to deal with this right now"
    • This one can be valid (see "valid reasons" above). However, if the regression is quite large and you're pressed for time, consider backing the patch out and re-landing it when you have time to study its performance impact.
    • If the patch can't be backed out and you don't have time to work on it at the moment, make sure to file a follow-up bug to fix the regression, assign yourself and provide a timeline for the investigation & fixing. We should fix regressions before they reach Beta/Release users.

The bottom line is that we want a visible decision on each case where the cause for the regression is known.