TestEngineering/Performance/Sheriffing/Alerts: Difference between revisions

From MozillaWiki

Revision as of 09:18, 16 January 2018

Perfherder alerts

General triage process

<how triaging is similar among all alerts>

Types of alerts

Talos

  • short description: <provide it> link
  • frequency: daily, a little more than a dozen alerts
  • coalesced by SETA?: yes (often requires backfilling)
  • available on platforms:
    • Windows: 7 32bit, 10 64bit (OPT, PGO builds)
    • Linux: 64bit (OPT, PGO builds)
    • OS X: 10.10 (OPT builds only)
  • triaging specifics:
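Because Talos jobs are coalesced by SETA, the regression range of an alert can span several pushes on which the test never ran, and those pushes are the ones to backfill. The following is a minimal sketch of how to find them. It assumes a hypothetical, oldest-first list of push IDs and a parallel list of flags that say whether the test produced a data point on each push. `backfill_range` is an illustrative helper, not a Perfherder or Treeherder API.

```python
from typing import Sequence


def backfill_range(pushes: Sequence[str], has_data: Sequence[bool], alert_index: int) -> list[str]:
    """Return the pushes between the previous data point and the alerting push
    that have no data for the test (i.e. were coalesced) and would need backfilling.

    `pushes` is ordered oldest to newest; `has_data[i]` says whether the test
    ran on `pushes[i]`. Both inputs are hypothetical, for illustration only.
    """
    if len(pushes) != len(has_data):
        raise ValueError("pushes and has_data must have the same length")
    if not has_data[alert_index]:
        raise ValueError("the alerting push must have a data point")
    # Walk back from the alerting push to the most recent push that ran the test.
    prev = alert_index - 1
    while prev >= 0 and not has_data[prev]:
        prev -= 1
    # Every push strictly between that one and the alerting push was skipped.
    return list(pushes[prev + 1:alert_index])
```

For example, with pushes `a`, `b`, `c` and `d`, where the test only ran on `a` and `d` and the alert fired on `d`, the sketch returns `b` and `c` as the pushes to backfill.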

build_metrics

  • short description: Monitors build times on multiple platforms, installer sizes, and other compiler-specific metrics.
  • frequency: every 1-2 days, around 5 alerts
  • contact: :froydnj, :ted.mielczarek, :gps
  • coalesced by SETA?: no (shouldn't require backfilling)
  • available on platforms:
    • Windows: 32/64bit (OPT, No-OPT, Mingw builds)
    • Linux: 32/64bit (OPT, No-OPT builds)
    • OS X: 10.10 (cross, no-cross builds)
    • Android: 4.0, 4.2, 5.0
  • triaging specifics:
    • often easy to investigate
    • most alerts aren't noisy
    • when investigating, one should look for build config changes <ask :gps to provide more data>
    • build times often spike briefly, then drop back to previous levels thanks to the caching mechanisms in place. We mark these as invalid alerts.
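The "spike, then settle back" pattern for build times can be expressed as a simple heuristic. The following is a minimal sketch, not an agreed sheriffing rule. The inputs are hypothetical lists of build times taken before and after the alerting push. The `tolerance` (relative) and `settle_points` defaults are arbitrary illustrative values.

```python
from statistics import median


def is_transient_spike(baseline, after_alert, tolerance=0.05, settle_points=3):
    """Heuristic: True if the first value after the alert jumps above the baseline
    median by more than `tolerance` (relative), and the median of the last
    `settle_points` values has returned to within `tolerance` of the baseline.

    Inputs and thresholds are illustrative assumptions, not Perfherder settings.
    """
    # Need a baseline plus one spike point and `settle_points` later points.
    if not baseline or len(after_alert) <= settle_points:
        return False
    base = median(baseline)
    spiked = after_alert[0] > base * (1 + tolerance)
    settled = abs(median(after_alert[-settle_points:]) - base) <= tolerance * base
    return spiked and settled
```

A `True` result here only hints that the alert may be invalid. A sheriff should still look at the graph before marking the alert.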

Autophone

  • short description: <provide it>
  • frequency: every week or so, around 4 alerts
  • contact: <mention Bob Clary>
  • coalesced by SETA?: no
  • available on platforms:
    • Android: 4.2, 4.4, 6.0, 7.1
  • triaging specifics:
    • when investigating, look for Android-related changes
    • many of these tests are quite noisy and often turn out to be invalid (one reason is that devices overheat, which affects test results)
    • consider setting needinfo? on Bob Clary to check the status of suspect phone devices
    • tricky to investigate; use Phonedash as a more precise investigation tool
    • retriggers are almost always needed, but their results only show up after a day or so
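With noisy tests, comparing several retriggered values on either side of the suspect push tells you more than comparing single data points. The following is a minimal sketch that uses Welch's t statistic. This is a generic statistical check, not necessarily the exact test that Perfherder uses. The sheriff decides which threshold counts as a real regression. The input lists are hypothetical replicate values.

```python
import math
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic for the difference in means of two samples
    (e.g. retriggered results before and after a suspect push)."""
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two values on each side")
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    diff = mean(after) - mean(before)
    if se == 0:
        # Both samples are constant: any difference is infinitely "significant".
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se
```

A large absolute value (for example, two tight clusters far apart) suggests a real shift. A value near zero, which is typical when a device misbehaves intermittently, suggests that the alert is noise.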

AWSY

  • short description: <provide it> link
  • frequency: every 2-3 days, around half a dozen alerts
  • contact: :erahm
  • coalesced by SETA?: yes
  • available on platforms:
    • Windows: 32/64bit (OPT, PGO builds)
    • Linux: 64bit (OPT builds)
    • OS X: 10.10 (OPT builds)
    • Android: 4.2, 4.3 (OPT builds)
  • triaging specifics:
    • retriggering/backfilling is slow (>1h per test), so don't overuse it to collect missing graph data
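It helps to estimate the machine time before you request backfills or retriggers. The following is a minimal sketch of that arithmetic. `backfill_cost_hours` is a hypothetical helper, and the per-run durations are the rough figures given on this page.

```python
def backfill_cost_hours(pushes: int, retriggers_per_push: int, hours_per_run: float) -> float:
    """Total machine time (hours) for running `retriggers_per_push` extra jobs
    on each of `pushes` pushes, given an approximate duration per run."""
    if pushes < 0 or retriggers_per_push < 0 or hours_per_run <= 0:
        raise ValueError("counts must be non-negative and run time positive")
    return pushes * retriggers_per_push * hours_per_run
```

For example, 5 pushes with 3 retriggers each costs at least 15 hours for AWSY (>1h per run). The same request costs roughly 5 hours for platform_microbench (<20 min per run). This difference is why AWSY backfills need extra care.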

platform_microbench

  • short description: <provide it>
  • frequency: daily, around 1-2 dozen alerts
  • contact:
  • coalesced by SETA?: yes
  • available on platforms:
    • Linux: 32bit (OPT builds), 64bit (OPT, PGO, ASAN builds)
    • Windows: 7 32bit, 10 64bit (OPT builds)
    • OS X: 10.10 (OPT builds)
  • triaging specifics:
    • alerts happen very often; unless triaged promptly, they quickly pile up
    • very noisy; many of the alerts turn out to be invalid
    • cheap to retrigger, as each test takes <20 min to finish; still, don't overuse retriggers