This page collects information about running *tests*: how to understand where the time goes, which metrics to gather, and how to determine the efficiency of the system.

Information about jobs

Non-running-tests wall time:

machine reboot time (if applicable)
runner (if applicable)
buildslave connecting to the master and being assigned a job
buildbot steps besides mozharness call
buildbot steps lag (due to master lagginess)
mozharness non-running-tests actions
- clobber
- download-and-extract
- checkout
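The breakdown above can be turned into a per-job overhead figure. The following is a minimal sketch, assuming each phase's wall time is already available in seconds; the `JobTimings` class and its field names are hypothetical and simply mirror the list above, they are not an existing Buildbot or Mozharness structure.

```python
from dataclasses import dataclass


@dataclass
class JobTimings:
    """Wall-clock seconds per phase of one test job (hypothetical fields)."""
    reboot: float = 0.0                 # machine reboot time (if applicable)
    runner: float = 0.0                 # runner (if applicable)
    connect_and_assign: float = 0.0     # buildslave connects, master assigns job
    other_buildbot_steps: float = 0.0   # buildbot steps besides the mozharness call
    step_lag: float = 0.0               # buildbot step lag caused by the master
    clobber: float = 0.0                # mozharness action
    download_and_extract: float = 0.0   # mozharness action
    checkout: float = 0.0               # mozharness action
    running_tests: float = 0.0          # time actually spent running tests

    def overhead(self) -> float:
        """Sum of every phase that is not running tests."""
        return (self.reboot + self.runner + self.connect_and_assign
                + self.other_buildbot_steps + self.step_lag
                + self.clobber + self.download_and_extract + self.checkout)

    def total(self) -> float:
        return self.overhead() + self.running_tests

    def overhead_ratio(self) -> float:
        """Fraction of the job's wall time not spent running tests."""
        total = self.total()
        return self.overhead() / total if total else 0.0


# Made-up numbers, for illustration only.
job = JobTimings(reboot=120, clobber=30, download_and_extract=90,
                 checkout=60, running_tests=900)
print(f"overhead: {job.overhead():.0f}s of {job.total():.0f}s "
      f"({job.overhead_ratio():.0%})")
```

Tracking the ratio per job type would show which jobs spend most of their wall time outside of the tests themselves.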

We always reboot on Windows testers, since runner isn't managing all the processes there. We also reboot after any Android, emulator, mochitest or reftest job, since those change the system state in ways we haven't been able to identify; the only way to get back to a known good state is to reboot.

Known bugs

We are currently experiencing lag introduced by the masters. Possible ways to reduce it:
- reduce the number of active jobs running on a master
- reduce the number of buildbot steps
- reduce output
  - this impacts step lag because log processing happens over the same channel as the start/stop commands
  - can we make mozharness not output to stdio, and make log_uploader.py upload the Mozharness log and set log_url to it?
- send logs back to the master in bigger chunks (fewer interruptions of the masters)
- http://hg.mozilla.org/build/buildbotcustom/file/03644c855bb4/bin/log_uploader.py#l111
  - the data is somewhat structured already - that function serializes it out to the current format
bug 1209112 - Virtualenv cache always gets clobbered
bug 1208223 - We lack Mozharness metrics for test jobs (per-action)
We lack per-step Buildbot metrics
- We have some data on pulse, but we don't know the real elapsedTime
We don't have runner for Windows test jobs
- This would move clean up steps prior to Buildbot start up
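One of the proposals above is to send logs back to the master in bigger chunks. The sketch below only illustrates the batching idea; it is not how Buildbot's slave-to-master protocol is implemented, and the `batch_lines` helper and its `max_bytes` parameter are hypothetical.

```python
def batch_lines(lines, max_bytes=64 * 1024):
    """Group log lines into chunks of at most max_bytes each.

    A single line longer than max_bytes is still sent as its own chunk.
    """
    chunk, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Flush the current chunk before it would exceed the limit.
        if chunk and size + n > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield "".join(chunk)


# Ten short lines sent as a few chunks instead of ten separate messages.
log_lines = ["a\n"] * 10
chunks = list(batch_lines(log_lines, max_bytes=6))
print(len(log_lines), "lines ->", len(chunks), "chunks")
```

Fewer, larger messages mean fewer interruptions on the channel that also carries the start/stop commands, at the cost of logs appearing on the master with more delay.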

Optimizations

Auditing

Evaluate which jobs can be combined or re-shuffled

Sources

http://activedata.allizom.org/tools/query.html#query_id=SDcCQmDR

buildbot_status    duration
exception              3473
failure             1353995
retry                107128
success           174430338
warnings            8688192
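To put the table above in proportion, the durations can be converted to shares of the total. This is a small sketch using only the numbers listed; the unit of `duration` is not stated in the source, so the shares are the meaningful output. With these figures, `success` accounts for roughly 94.5% of the total.

```python
# Durations per Buildbot status, copied from the table above.
durations = {
    "exception": 3473,
    "failure": 1353995,
    "retry": 107128,
    "success": 174430338,
    "warnings": 8688192,
}

total = sum(durations.values())
shares = {status: value / total for status, value in durations.items()}

# Print the largest share first.
for status, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{status:<10} {share:8.3%}")
```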

Buildbot master lags: dashboard
- The master lag is calculated by measuring the reported time of one of the initial steps that should be nearly instantaneous
- What is the impact on jobs?
Tree uptimes, end to end, branch load, time per push dashboard
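The master lag measurement described above can be sketched as follows. This assumes each step is available as a record with a name and start/end timestamps in seconds; the record layout, the `estimate_master_lag` function and the probe step name are hypothetical and do not reflect the dashboard's actual implementation.

```python
def estimate_master_lag(steps, probe_step, expected_seconds=0.0):
    """Estimate master lag from a step that should be nearly instantaneous.

    Returns the reported duration of the probe step beyond what it is
    expected to take, or None if the job has no such step.
    """
    for step in steps:
        if step["name"] == probe_step:
            reported = step["end"] - step["start"]
            return max(0.0, reported - expected_seconds)
    return None


# Made-up step records; "probe" stands in for one of the initial steps.
job_steps = [
    {"name": "probe", "start": 100.0, "end": 112.5},
    {"name": "run_script", "start": 112.5, "end": 1500.0},
]
lag = estimate_master_lag(job_steps, probe_step="probe")
print(f"estimated master lag: {lag:.1f}s")
```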

Per buildbot step metrics - pulse stream

runner dashboard
- We only have support for Linux and Mac

User:Armenzg/Test pool efficiency
