TestEngineering/Performance/Triage Process: Difference between revisions


Revision as of 21:17, 28 April 2020

2,194 bytes added , 28 April 2020

m

add initial draft of strategies

Mjzffr


@@ Line 50: / Line 50: @@
 * P4 - Not used (reserved for bots)
 * P5 - Used for intermittent failures, or no intention to fix but will accept patches
+== Strategies for investigating intermittents ==
+* Look for patterns in the intermittent failures view (platforms, build types, tree, etc.)
+** Take the log associations on an intermittent bug with a grain of salt, as Code sheriffs may occasionally misattribute failure logs.
+** E.g. if 90% of failures happen on Android and the rest on some desktop platforms, there’s a chance that the desktop failures were incorrectly assigned.
+* Recognise and mark duplicates as early as possible (example)
+* Use generic intermittent bugs where that is warranted
+** Simply ask a Code sheriff to group them (a needinfo? + some guidelines should suffice)
+** Use this when you have lots of bugs covering the exact same underlying issue
+*** Pick the oldest bug
+*** Replace parts of the bug summary with <random>
+*** Use this only for common patterns you notice
+*** There are some risks involved here, especially if we’re not entirely sure about the underlying problem. Any mistake could hide other Raptor regressions.
+*** Making the summary too generic can inflate the failure rate of what seems to be a single common culprit. Code sheriffs will then have more reasons to turn our tests off.
+* Move anything related to crashes out of Raptor, as crashes point to problems in Firefox itself and not in the test harness
+** Figure out the right component
+** Don’t rush to assume that the 1st frame of the crashing thread is the culprit, especially if its corresponding source code points to a header (*.h) file
+** If the 1st frame indeed isn’t the culprit, move on to the next frame in the logs.
+** Note: most often, this is not a trivial task. Even if you end up at another source file, it’s still very likely that the problem lies a bit further up the stack. If you get blocked, request an engineer’s assistance. Follow their work until they identify the right component, so you learn from their experience.
+** How do you know which engineer to ask for assistance? Look over the source file you got stuck at and figure out its component (use Mercurial’s blame feature). With that component, you can identify the team that likely knows more about the problem. Contact that team and ask someone there to assist you.
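Two of the checks in the list above can be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function names, the 90% threshold default, and the input shapes (a plain list of platform names, and a list of `(function, source_path)` tuples ordered from the top of the stack) are all assumptions. Real data would have to be extracted by hand from the intermittent failures view and from the crash logs.

```python
from collections import Counter


def minority_platforms(failure_platforms, dominant_share=0.9):
    """Return platforms whose failures deserve a second look.

    Hypothetical helper: if a single platform accounts for at least
    `dominant_share` of all failures, the remaining platforms are
    returned, since their logs may have been misattributed.
    """
    counts = Counter(failure_platforms)
    total = sum(counts.values())
    if total == 0:
        return []
    top_platform, top_count = counts.most_common(1)[0]
    if top_count / total < dominant_share:
        # No platform dominates, so there is no obvious outlier.
        return []
    return sorted(p for p in counts if p != top_platform)


def first_non_header_frame(frames):
    """Return (index, function, source_path) of the first frame whose
    source file is not a header (*.h), or None if every frame is one.

    Hypothetical helper: this only picks a starting point. The real
    culprit may still sit further up the stack.
    """
    for index, (function, source_path) in enumerate(frames):
        if not source_path.endswith(".h"):
            return index, function, source_path
    return None


if __name__ == "__main__":
    # Made-up inputs, purely for illustration.
    platforms = ["android"] * 19 + ["linux"]
    print(minority_platforms(platforms))

    stack = [
        ("placeholder_inline_helper", "some/dir/helper.h"),
        ("placeholder_caller", "some/dir/caller.cpp"),
    ]
    print(first_non_header_frame(stack))
```

Neither helper replaces judgement. The first only points at failures worth re-reading, and the second only tells you which source file to start from when looking up the owning component.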
 == FAQ ==
