TestEngineering/Performance/Triage Process

* P4 - Not used (reserved for bots)
* P5 - Used for intermittent failures, or for bugs we don't intend to fix but will accept patches for
== Strategies for investigating intermittents ==
* Look for patterns in the intermittent failures view (platforms, build types, trees, etc.)
** Take the association between failure logs and intermittent bugs with a grain of salt, as Code sheriffs may occasionally misattribute some failure logs.
** For example, if 90% of the failures happen on Android and the rest on a few desktop platforms, the desktop failures may have been incorrectly assigned.
* Recognise and mark duplicates as early as possible (example)
* Use generic intermittent bugs when appropriate
** Ask a Code sheriff to group them (a needinfo? request with some guidelines should suffice)
** Use this when you have lots of bugs covering the exact same underlying issue
*** Pick the oldest bug
*** Replace the parts of the bug summary that differ between the bugs with <random>
*** Use this only for common patterns you notice
*** There are some risks involved here, especially if we’re not entirely sure about the underlying problem. Any mistake could hide other Raptor regressions.
*** Making the bug too generic can inflate the failure rate attributed to what seems to be a common culprit, which gives Code sheriffs more reason to turn our tests off.
* Move anything related to crashes out of Raptor, as crashes indicate problems in Firefox itself and not in the test harness
** Figure out the right component
** Don't rush to assume that the first frame of the crashing thread is the culprit, especially if its source location points to a header (*.h) file
** If the first frame isn't the culprit, move on to the next frame in the logs.
** Note: this is usually not a trivial task. Even if you end up in another source file, the problem is still likely to be further up the stack. If you get blocked, ask an engineer for assistance and follow their work until they identify the right component, so you can learn from their experience.
** How do you know which engineer to ask? Look at the source file you got stuck in and figure out its component (Mercurial's blame feature can help). The component tells you which team likely knows the most about the problem; contact that team and ask someone there to assist you.
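
The platform-pattern check from the first strategy can be sketched in Python. This is a minimal, hypothetical sketch: it assumes you have already exported the failures classified against one intermittent bug as (platform, build type) pairs, for example by copying them from the intermittent failures view. The function names, the platform labels and the 10% threshold are illustrative only and are not part of any Mozilla tooling.

```python
from collections import Counter


def platform_shares(failures):
    """Return each platform's share of the failures as a fraction of the total.

    `failures` is a list of (platform, build_type) tuples (hypothetical
    export format); only the platform is used here.
    """
    counts = Counter(platform for platform, _build_type in failures)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {platform: count / total for platform, count in counts.items()}


def flag_minority_platforms(failures, threshold=0.10):
    """List platforms whose share is below `threshold` (an arbitrary cut-off).

    These are the failure logs most worth re-checking for misattribution,
    e.g. a few desktop failures on a bug that otherwise fails on Android.
    """
    shares = platform_shares(failures)
    return sorted(p for p, share in shares.items() if share < threshold)


if __name__ == "__main__":
    # Illustrative data only: 18 Android failures, 1 Windows, 1 Linux.
    failures = [("android", "opt")] * 18 + [("windows", "opt"), ("linux", "pgo")]
    print(platform_shares(failures))
    print(flag_minority_platforms(failures))  # prints ['linux', 'windows']
```

A low share is only a prompt to re-read those logs, not proof of misattribution: a genuine cross-platform failure can also be rare on some platforms.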

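The "pick the oldest bug and replace parts of the summary with <random>" step for generic intermittent bugs can be illustrated with a small Python helper. This is a hypothetical sketch: it assumes the duplicate bugs are given as (bug number, summary) pairs, that the lowest bug number is the oldest bug, and that the summaries differ only in whitespace-separated tokens at the same positions. Real summaries often need manual editing, and the bug numbers and test names below are made up.

```python
def generic_summary(bugs):
    """Build a generic summary from duplicate intermittent bugs (hypothetical helper).

    `bugs` is a list of (bug_number, summary) pairs. Returns the oldest bug
    number (assumed to be the lowest one) and a summary in which every token
    that differs between the summaries is replaced with "<random>".
    """
    if not bugs:
        raise ValueError("need at least one bug")
    oldest_number = min(number for number, _summary in bugs)
    token_lists = [summary.split() for _number, summary in bugs]
    if len({len(tokens) for tokens in token_lists}) != 1:
        # Position-by-position comparison only makes sense for equal lengths.
        raise ValueError("summaries have different token counts; edit by hand")
    generic_tokens = []
    for tokens_at_position in zip(*token_lists):
        # Keep a token only if every summary agrees on it at this position.
        if len(set(tokens_at_position)) == 1:
            generic_tokens.append(tokens_at_position[0])
        else:
            generic_tokens.append("<random>")
    return oldest_number, " ".join(generic_tokens)


if __name__ == "__main__":
    # Made-up bugs that differ only in the test page name.
    bugs = [
        (1500002, "Intermittent raptor test-page-a TEST-UNEXPECTED-FAIL: test did not finish"),
        (1500001, "Intermittent raptor test-page-b TEST-UNEXPECTED-FAIL: test did not finish"),
    ]
    print(generic_summary(bugs))
```

Keeping only the tokens that every summary shares errs toward the overly generic summaries this page warns about, so review the result before asking a Code sheriff to use it.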

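The advice about not trusting a first frame that points to a header file can be sketched as a Python helper that walks the crashing thread's frames from the top and skips header frames. This is a hypothetical sketch: it assumes the frames have already been extracted from the logs as (function, source file) pairs, innermost frame first. It does not parse real crash-report output, and the function and file names in the example are invented.

```python
from pathlib import PurePosixPath


def first_candidate_frame(frames, skip_suffixes=(".h",)):
    """Return the first frame whose source file is not a header, or None.

    `frames` is a hypothetical list of (function_name, source_path) tuples
    ordered from the innermost frame (frame 0) outwards; source_path may be
    None when the logs carry no source information for a frame.
    """
    for function_name, source_path in frames:
        if source_path is None:
            continue  # no source location to inspect for this frame
        if PurePosixPath(source_path).suffix in skip_suffixes:
            continue  # headers often hold inlined helpers, not the real culprit
        return function_name, source_path
    return None


if __name__ == "__main__":
    # Invented frames, innermost first.
    frames = [
        ("Foo::Get", "dom/foo/Foo.h"),
        ("unknown", None),
        ("Bar::Run", "dom/bar/Bar.cpp"),
    ]
    print(first_candidate_frame(frames))
```

The returned frame is only a place to start reading: as noted in the crash strategies, the real problem is often further up the stack.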
== FAQ ==