Talos Tests

Where to get this information

Talos tests are defined in http://hg.mozilla.org/build/talos/file/tip/talos/test.py
TBPL abbreviations are defined in http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js#l302
Perf-o-matic names are defined in http://hg.mozilla.org/graphs/file/tip/sql/data.sql
Talos suites are configured for production in http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py; these names are mapped to TBPL via regexes: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/e2e344885c80/js/Data.js#l512
Inline help is available by running talos --print-tests to get a list of all tests and their descriptions. To get help on a subset of tests, including their run-time parameters, use e.g. talos --print-tests -a ts:tsvg [options]

A table detailing information flow from buildbot to talos to TBPL and graphserver is available at http://k0s.org:8080/ . This is generated with the talosnames script, as detailed in http://k0s.org/mozilla/blog/20120724135349 . See also bug 770460.

Talos Test Types

There are two different species of Talos tests:

Startup Tests: start up the browser, wait for either the load event or the paint event, then exit, measuring the time
Page Load Tests: load a manifest of pages

Startup Tests

Startup tests launch Firefox and measure the time to the onload or paint events. Firefox is invoked with a URL to:

http://hg.mozilla.org/build/talos/file/tip/talos/startup_test/startup_test.html for the onload event
http://hg.mozilla.org/build/talos/file/tip/talos/startup_test/tspaint_test.html for the paint event

Page Load Tests

Many of the Talos tests use the page loader to load a manifest of pages. These tests load a specific page and measure the time it takes to load the page, scroll the page, draw the page, etc. To run a page load test, you need a manifest of pages to run. The manifest is simply a list of URLs of pages to load, one per line, e.g.:

http://www.mozilla.org
http://www.mozilla.com

Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/svg/svg.manifest

Manifests may also specify that a test computes its own data by prepending a % in front of the line:

% http://www.mozilla.org
% http://www.mozilla.com

Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/v8_7/v8.manifest
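The two manifest forms above can be read with a few lines of Python. This is a minimal sketch, not the pageloader extension's own parser: `parse_manifest` is a hypothetical helper, and it assumes blank lines carry no meaning and are skipped.

```python
from typing import List, Tuple


def parse_manifest(text: str) -> List[Tuple[str, bool]]:
    """Return (url, computes_own_data) pairs for each non-blank manifest line.

    A leading '%' marks a page that reports its own measurements instead
    of being timed by the page loader.
    """
    entries: List[Tuple[str, bool]] = []
    for raw in text.splitlines():  # handles both \n and \r\n line endings
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith("%"):
            entries.append((line[1:].strip(), True))
        else:
            entries.append((line, False))
    return entries


manifest = "http://www.mozilla.org\n% http://www.mozilla.com\n"
print(parse_manifest(manifest))
# [('http://www.mozilla.org', False), ('http://www.mozilla.com', True)]
```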

Reference the manifest you created in your config file. For example, open sample.config and look for the entry for the test you want to run:

- name: tp4
  url: '-tp page_load_test/tp4.manifest -tpchrome -tpnoisy -tpformat tinderbox -tpcycles 10'

-tp controls the location of your manifest
-tpchrome tells Talos to run the browser with the normal browser UI active
-tpnoisy means "generate lots of output"
-tpformat controls the format of the results; by default, results are formatted for displays like graphserver and TBPL.
-tpcycles controls the number of times we run the entire test.
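To see how the tp4 entry's url string breaks down into the flags described above, it can be tokenized with Python's shlex module. `parse_pageloader_options` is a hypothetical helper, not part of Talos; it knows only the five flags listed here and rejects anything else, although the real page loader accepts more options.

```python
import shlex
from typing import Dict, Union

# Flags from the list above that take a value; the others are on/off switches.
VALUE_FLAGS = {"-tp", "-tpformat", "-tpcycles"}
SWITCH_FLAGS = {"-tpchrome", "-tpnoisy"}


def parse_pageloader_options(url: str) -> Dict[str, Union[str, bool]]:
    """Split a config 'url' string into a flag -> value mapping."""
    tokens = shlex.split(url)
    options: Dict[str, Union[str, bool]] = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag in VALUE_FLAGS:
            if i + 1 >= len(tokens):
                raise ValueError(f"{flag} expects a value")
            options[flag] = tokens[i + 1]
            i += 2
        elif flag in SWITCH_FLAGS:
            options[flag] = True
            i += 1
        else:
            raise ValueError(f"flag not covered by this sketch: {flag}")
    return options


opts = parse_pageloader_options(
    "-tp page_load_test/tp4.manifest -tpchrome -tpnoisy -tpformat tinderbox -tpcycles 10"
)
print(opts["-tp"], int(opts["-tpcycles"]))  # page_load_test/tp4.manifest 10
```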

Paint Tests

Paint tests measure the time to receive both the MozAfterPaint and OnLoad events, instead of just the OnLoad event.

Currently we run _paint tests for these tests:

ts_paint
tpaint
tp5n
sunspider
a11y
tscroll/tscrollx

NoChrome Tests

All tests run through the pageloader extension can be run with or without browser chrome. The tests load the same pages as described above in either case. The majority of tests are run with browser chrome enabled. On mobile (native Android builds), we have to run everything as nochrome, since we don't support additional XUL windows.

The ability to run tests without the browser chrome opens up the ability to further isolate performance regressions.

Test Descriptions

CanvasMark

contact: :jmaher
source: https://github.com/kevinroast/CanvasMark
type: PageLoader

Talos test name	Graphserver name	Description
tcanvasmark	Canvasmark
tcanvasmark_nochrome	Canvasmark, NoChrome	see NoChrome Tests

These tests run the third-party CanvasMark benchmark suite, which measures the browser's ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex.

Results are a score "based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type" (higher is better).

tp5

contact: :jhammel, :jmaher
source: not available
type: PageLoader

Talos test name	Graphserver name	Description
tp5r	Tp5r MozAfterPaint	tp5 with responsiveness
tp5row	Tp5 Row Major MozAfterPaint	tp5r running in Row Major with 25 cycles/page, ignoring the first 5
tp5n	Tp5 No Network Row Major MozAfterPaint	tp5row with a new tp5.zip that has no 404s and no external network access
tp5	Tp5 MozAfterPaint	Measures the time to load a webpage and receive both a MozAfterPaint and OnLoad event.

Tests the time it takes Firefox to load the tp5 web page test set. The set was culled from the Alexa top 500 list of April 8th, 2011 and consists of 100 pages.

Unfortunately, we do not distribute a copy of the set of test web pages as these would not constitute fair use. Here are the broad steps we use to create the test set:

Take the Alexa top 500 sites list
Remove all sites with questionable or explicit content
Remove duplicate sites (e.g. the many Google search front pages)
Manually select interesting pages to keep (such as pages in different locales)
Select a more representative page from any site presenting a simple search/login/etc. page
Deal with the Windows 255-character limit for cached pages
Limit test set to top 100 pages

The above steps did not eliminate all outside network access, so we took further action to scrub the pages until there were zero outside network accesses. This makes the tp test as deterministic a measurement of our rendering/layout/paint process as possible. If you are on the Mozilla intranet, you can obtain the current page set for local testing. DO NOT DISTRIBUTE IT.

Private Bytes

A memory metric tracked during tp4 test runs. This metric is sampled every 20 seconds.

On Windows, see the description from Microsoft TechNet.

RSS (Resident Set Size)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Linux/Mac only.

See the description on Wikipedia.

Xres (X Resource Monitoring)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Linux only.

See the xres man page.

Working Set (tp5_memset)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows only. See the description from Microsoft TechNet.

Modified Page List Bytes

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows 7 only. See the description from Microsoft MSDN.

% CPU

CPU usage tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows only.
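The Talos counter code is not reproduced here. As a generic sketch of "sampled every 20 seconds", the loop below reads a metric at a fixed interval; `read_metric` is a hypothetical callback for whichever counter is tracked, and `sleep` is injectable so the sampler can be exercised without actually waiting.

```python
import time
from typing import Callable, List, Tuple


def sample_metric(read_metric: Callable[[], float], interval_s: float = 20.0,
                  samples: int = 3,
                  sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, float]]:
    """Read a metric `samples` times, `interval_s` seconds apart.

    Returns (nominal_offset_seconds, value) pairs; the offset is the
    scheduled sample time, not a measured timestamp.
    """
    readings: List[Tuple[float, float]] = []
    for i in range(samples):
        if i:
            sleep(interval_s)  # wait between samples, not before the first
        readings.append((i * interval_s, read_metric()))
    return readings


# Fake counter values and a no-op sleep, for illustration only.
values = iter([100.0, 120.0, 110.0])
readings = sample_metric(lambda: next(values), sleep=lambda s: None)
print(readings)  # [(0.0, 100.0), (20.0, 120.0), (40.0, 110.0)]
```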

Responsiveness

Reports the delay in milliseconds for the event loop to process a tracer event. For more details, see bug 631571.
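The Firefox implementation is described in bug 631571. The sketch below only illustrates the idea with Python's asyncio event loop, as an analogy rather than how Gecko does it: a tracer callback is posted to the loop, and the delay until the loop gets around to running it is reported in milliseconds. `measure_tracer_delay` and its `block_s` parameter are hypothetical.

```python
import asyncio
import time


async def measure_tracer_delay(block_s: float = 0.0) -> float:
    """Post a tracer callback to the event loop and return its delay in ms.

    `block_s` simulates busy work that holds the loop after the tracer
    has been posted, the situation a responsiveness metric detects.
    """
    loop = asyncio.get_running_loop()
    done: asyncio.Future = loop.create_future()
    posted = time.perf_counter()
    # The tracer records how long it waited between posting and running.
    loop.call_soon(lambda: done.set_result((time.perf_counter() - posted) * 1000.0))
    time.sleep(block_s)  # deliberately blocks the loop, like a long-running task
    return await done


print(f"{asyncio.run(measure_tracer_delay(0.05)):.1f} ms")  # roughly 50 ms or more
```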

ts_paint

contact: :mak, :jimm, :jhammel, :jmaher
source: tspaint_test.html
Perfomatic: "Ts, Paint"
type: Startup

Launches tspaint_test.html with the current timestamp in the URL, waits for MozAfterPaint and onLoad to fire, then records the end time and calculates the time to startup.

The basic ts test uses a blank profile. This test was formerly known as ts, before we started waiting for the MozAfterPaint event.
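The real measurement happens in JavaScript inside tspaint_test.html; the Python sketch below only shows the arithmetic. It assumes the launch time is passed in milliseconds as a query parameter named `begin`, which is an assumption (check test.py for the URL Talos actually builds), and both helper functions are hypothetical.

```python
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def build_startup_url(page: str, launch_ms: int) -> str:
    """Append the launch timestamp to the test page URL ('begin' is an assumed name)."""
    return f"{page}?{urlencode({'begin': launch_ms})}"


def startup_time_ms(url: str, end_ms: int) -> int:
    """What the page computes once MozAfterPaint and onload have fired: end - begin."""
    begin = int(parse_qs(urlsplit(url).query)["begin"][0])
    return end_ms - begin


launch = int(time.time() * 1000)
url = build_startup_url("startup_test/tspaint_test.html", launch)
print(startup_time_ms(url, launch + 850))  # 850
```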

ts_places_generated_med

contact: :mak, :mattn, :jhammel, :jmaher
source: tspaint_test.html
type: Startup
dirty: this is also referred to as the dirty test

Runs the same test as ts_paint, but uses a generated profile to simulate what an average user would have. The profile consists of 4 files:

[permissions.sqlite] - allowXULXBL in moz_hosts
[prefs.js] - these are just prefs to allow script access to chrome
places.sqlite - updated daily via [buildbot script] to have [recent dates] for these tables:

moz_historyvisits, 111750 items
moz_bookmarks, 1354 items
moz_favicons, 22042 items
moz_annos, 0 items
moz_items_annos, 8 items

Other tables, which are not updated:
moz_places, 22088 items
moz_keywords, 7 items
moz_anno_attributes, 6 items
moz_bookmarks_roots, 5 items
moz_inputhistory, 342 items

[localstore.rdf]
- todo: this is outdated

ts_places_generated_max

contact: :mak, :mattn, :jhammel, :jmaher
source: tspaint_test.html
type: Startup
dirty: this is also referred to as the dirty test

Runs the same test as ts_paint, but uses a generated profile that simulates a much heavier user than ts_places_generated_med. The profile consists of 4 files:

[permissions.sqlite] - allowXULXBL in moz_hosts
[prefs.js] - these are just prefs to allow script access to chrome
places.sqlite - updated daily via [buildbot script] to have [recent dates] for these tables:

moz_historyvisits, 725054 items
moz_bookmarks, 144757 items
moz_favicons, 144705 items
moz_annos, 0 items
moz_items_annos, 8 items

Other tables, which are not updated:
moz_places, 144751 items
moz_keywords, 601 items
moz_anno_attributes, 6 items
moz_bookmarks_roots, 5 items
moz_inputhistory, 342 items

[localstore.rdf]
- todo: this is outdated

tdhtml

Turned off on all branches and platforms on November 1st, 2012.
contact: :peterv, :jhammel, :jmaher
source: dhtml.manifest
type: PageLoader

Talos test name	Graphserver name	Description
tdhtmlr	DHTML Row Major	Row based and 25 cycles/page.
tdhtml.2	DHTML 2	Ignoring the first value instead of the highest (usually the highest is the first)

Tests which measure the time to cycle through a set of DHTML test pages. This test will be updated in the near future.

This test is also run with the nochrome option.
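The "Row Major" and "Column based" descriptions in these tables differ only in the order of page loads. A minimal sketch, where `load_order` is a hypothetical helper and the page names are placeholders:

```python
from typing import List


def load_order(pages: List[str], cycles: int, row_major: bool) -> List[str]:
    """Return the sequence of page loads for a manifest.

    Column based: visit every page once per cycle, repeating the whole list.
    Row major: load each page `cycles` times in a row before moving on.
    """
    if row_major:
        return [page for page in pages for _ in range(cycles)]
    return [page for _ in range(cycles) for page in pages]


pages = ["a.html", "b.html"]
print(load_order(pages, 2, row_major=False))  # ['a.html', 'b.html', 'a.html', 'b.html']
print(load_order(pages, 2, row_major=True))   # ['a.html', 'a.html', 'b.html', 'b.html']
```

With row major order, repeated loads of the same page happen back to back, so cache and warm-up effects show up as a run of values for that page rather than being spread across the whole cycle.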

tsvg, tsvgx

contact: :jwatt, :jhammel, :jmaher, :avih
source: svg.manifest, svgx
type: PageLoader

Talos test name	Graphserver name	Description
tsvgx	SVG-ASAP	Replacing tscroll,tsvg with tscrollx,tsvgx
svgr	SVG Row Major	Row Major and 25 cycles/page.
svg	SVG	Column based and 5 cycles.

An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. The ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time or, for animations/iterations, the overall duration the sequence/animation took to complete. To turn on ASAP mode, we set these preferences:

preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
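Talos applies these preferences itself. To reproduce ASAP mode in a local profile by hand, the same preferences can be written to the profile's user.js file, which Firefox reads at startup (a restart is needed). `write_user_prefs` is a hypothetical helper, and the temporary directory stands in for a real profile.

```python
import json
import os
import tempfile
from typing import Dict, Union

PrefValue = Union[bool, int, str]


def write_user_prefs(profile_dir: str, prefs: Dict[str, PrefValue]) -> str:
    """Append each pref as a user_pref(...) line to <profile_dir>/user.js."""
    path = os.path.join(profile_dir, "user.js")
    with open(path, "a", encoding="utf-8") as f:
        for name, value in prefs.items():
            # json.dumps renders bool/int/str as valid JavaScript literals.
            f.write(f"user_pref({json.dumps(name)}, {json.dumps(value)});\n")
    return path


# ASAP-mode preferences, as listed above.
preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
profile = tempfile.mkdtemp()  # stand-in for a real Firefox profile directory
with open(write_user_prefs(profile, preferences), encoding="utf-8") as f:
    print(f.read())
# user_pref("layout.frame_rate", 0);
# user_pref("docshell.event_starvation_delay_hint", 1);
```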

tsvg-opacity

contact: :jwatt, :jhammel, :jmaher
source: svg.manifest
type: PageLoader

Talos test name	Graphserver name	Description
svgr_opacity	SVG, Opacity Row Major	Row Major and 25 cycles/page.
svg_opacity	SVG, Opacity	Column based and 5 cycles.

An svg-only number that measures SVG rendering performance.

tpaint

contact: :jimm, :jhammel, :jmaher
source: [tpaint-window.html]
type: Startup

Talos test name	Graphserver name	Description
tpaint	Paint	twinopen, but measuring the time after we receive the MozAfterPaint and OnLoad events.
twinopen		Original test to measure the time to open a window, based on the OnLoad event.
txul	Txul	Another name for twinopen; txul is also the name reported in regression emails.

Tests the amount of time it takes to open a new window. This test does not include startup time. Multiple test windows are opened in succession; the reported result is the average time required to create and display a window in the running instance of the browser. (Measures ctrl-n performance.)

JSS/Dromaeo Tests

Dromaeo suite of tests for JavaScript performance testing. See the Dromaeo wiki for more information.

This suite is divided into several sub-suites.

Dromaeo CSS

contact: :dmandelin, :jhammel, :jmaher
source: [css.manifest]
type: PageLoader
reporting: speed in test runs per second (higher is better)

Each page in the manifest is part of the Dromaeo CSS benchmark.

Dromaeo DOM

contact: :dmandelin, :jhammel, :jmaher
source: [dom.manifest]
type: PageLoader
reporting: speed in test runs per second (higher is better)

Each page in the manifest is part of the Dromaeo DOM benchmark.
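The Dromaeo harness itself does more than this (repeated runs and error estimates, for example); the sketch below only illustrates what a "test runs per second" figure means. `runs_per_second` is a hypothetical helper, and the fake clock in the demo makes the arithmetic easy to check.

```python
import time
from typing import Callable


def runs_per_second(test: Callable[[], None], budget_s: float = 0.5,
                    clock: Callable[[], float] = time.perf_counter) -> float:
    """Run `test` back to back for at least budget_s seconds; return runs per second."""
    if budget_s <= 0:
        raise ValueError("budget_s must be positive")
    runs = 0
    start = clock()
    elapsed = 0.0
    while elapsed < budget_s:
        test()
        runs += 1
        elapsed = clock() - start
    return runs / elapsed


# A fake clock that advances 0.1 s per run: 5 runs in 0.5 s -> 10 runs/second.
ticks = iter([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
print(runs_per_second(lambda: None, budget_s=0.5, clock=lambda: next(ticks)))  # 10.0
```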

a11y

contact: :davidb, :tbsaunde, :jhammel, :jmaher
source: [a11y.manifest]
type: PageLoader
measuring: ???
reporting: test time in ms (lower is better)

Talos test name	Graphserver name	Description
a11yr	a11y Row Major MozAfterPaint	Row Major testing with 25 cycles per page
a11y.2	a11y 2 MozAfterPaint	same as a11y ignoring the first value collected instead of the largest
a11y	a11y MozAfterPaint	iterate through each page, 5 cycles through the list, ignore the highest value from each page

This test ensures basic a11y tables and permutations do not cause performance regressions.
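The variants in these tables differ in which per-page value they discard: the plain tests ignore the highest value, while the ".2" variants ignore the first (and tp5row ignores the first 5). A minimal sketch of both rules; the helper names and sample timings are hypothetical, and how the remaining values are summarized is not covered here.

```python
from typing import List


def drop_highest(values: List[float]) -> List[float]:
    """Discard the single largest value (the a11y / tscroll behaviour)."""
    trimmed = list(values)
    trimmed.remove(max(trimmed))
    return trimmed


def drop_first(values: List[float], count: int = 1) -> List[float]:
    """Discard the first `count` values (a11y.2 / tscroll.2 use 1; tp5row ignores 5)."""
    return list(values[count:])


# Five cycles for one page: the first load is slow, but a later one is slower still.
page_times = [120.0, 95.0, 101.0, 130.0, 98.0]
print(drop_highest(page_times))  # [120.0, 95.0, 101.0, 98.0]
print(drop_first(page_times))    # [95.0, 101.0, 130.0, 98.0]
```

The two rules give the same result whenever the slowest load is the first, warm-up load; they diverge, as in this example, when a later cycle is the outlier.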

tscroll, tscrollx

contact: :jrmuizel, :jhammel, :jmaher, :avih
source: [scroll.manifest]
type: PageLoader
measuring: Scroll performance
reporting: Average frame interval (1/FPS). Lower is better.

Talos test name	Graphserver name	Description
tscrollx	tscroll-ASAP MozAfterPaint	Replacing tscroll,tsvg with tscrollx,tsvgx
tscrollr	tscroll Row Major	Row Major testing with 25 cycles
tscroll.2	tscroll 2	Ignore the first value for each page instead of the largest
tscroll	tscroll	run through each page in the manifest and cycle 5 times. For each page, ignore the largest value

This test scrolls several pages, each representing a different known "hard" case to scroll (* needinfo), and measures the average frame interval (1/FPS) on each. The ASAP test (tscrollx) iterates in unlimited frame-rate mode, thus reflecting the maximum scroll throughput per page. To turn on ASAP mode, we set these preferences:

preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
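The reported "average frame interval (1/FPS)" can be computed from a list of paint timestamps. A minimal sketch, assuming timestamps in milliseconds; `frame_interval_stats` is a hypothetical helper, not the tscroll code.

```python
from typing import List, Tuple


def frame_interval_stats(paint_times_ms: List[float]) -> Tuple[float, float]:
    """Return (average frame interval in ms, equivalent FPS) from paint timestamps."""
    if len(paint_times_ms) < 2:
        raise ValueError("need at least two paint timestamps")
    intervals = [b - a for a, b in zip(paint_times_ms, paint_times_ms[1:])]
    avg_ms = sum(intervals) / len(intervals)
    return avg_ms, 1000.0 / avg_ms


avg_ms, fps = frame_interval_stats([0.0, 16.0, 33.0, 50.0, 66.0])
print(avg_ms, round(fps, 1))  # 16.5 60.6
```

The average interval equals (last - first) / (number of frames - 1), so lower values mean more frames per second; in ASAP mode the interval is no longer capped at roughly 16.7 ms by a 60 Hz refresh.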

tresize

contact: :jimm, :jmaher
source: [tresize-test.html]
type: StartupTest
measuring: Time to do XUL resize, in ms (lower is better).
reporting: ???

Talos test name	Graphserver name	Description
tresize	tresize	TODO

A purer form of paint measurement than tpaint. This test opens a single window positioned at 10,10 and sized to 300,300, then resizes the window outward |max| times, measuring the time it takes to repaint after each resize. It dumps the resulting dataset and average to stdout or a logfile.
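The loop structure described above can be sketched as follows. This is not tresize-test.html: `resize_and_time` is a hypothetical hook that resizes the window and returns the repaint time in ms, and the 2-pixel growth per step is an assumed value.

```python
from typing import Callable, List, Tuple

# Hypothetical hook: resize the window to (width, height) and return the
# milliseconds until the resulting repaint is observed.
ResizeAndTime = Callable[[int, int], float]


def run_resize_test(resize_and_time: ResizeAndTime, max_steps: int,
                    start_size: Tuple[int, int] = (300, 300),
                    step_px: int = 2) -> Tuple[List[float], float]:
    """Grow the window max_steps times; return per-resize repaint times and their average."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    width, height = start_size
    times: List[float] = []
    for _ in range(max_steps):
        width += step_px  # step_px is an assumption, not the test's real increment
        height += step_px
        times.append(resize_and_time(width, height))
    return times, sum(times) / len(times)


# Fake hook for illustration: "repaint time" grows with the width.
times, average = run_resize_test(lambda w, h: float(w - 300), max_steps=3)
print(times, average)  # [2.0, 4.0, 6.0] 4.0
```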

xperf

contact: :taras, :aklotz, :jmaher, :jhammel
source: [xperf instrumentation]
type: Pageloader (tp5n)
measuring: IO counters from Windows
reporting: Summary of read/write counters for disk, network (lower is better)

Talos will turn orange for 'x' jobs on Windows 7 if your changeset accesses files that are not predefined in the [whitelist]. If your job turns orange, you will see a list of the unexpectedly accessed files in TBPL (or in the log file), similar to this:

* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768

If your changeset is expected to access these files, we can add them to the [whitelist].
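When triaging an orange xperf run, it can help to pull the file names and IO counters out of the log. The sketch below assumes the line format shown in the sample failures above; `unexpected_files` is a hypothetical helper, not part of Talos.

```python
import re
from typing import Dict

# Matches the TEST-UNEXPECTED-FAIL line format shown above.
FAIL_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL : xperf: File '(?P<path>[^']+)' was accessed and we were not "
    r"expecting it\. (?P<counters>.*)$"
)
COUNTER_RE = re.compile(r"(\w+): (\d+)")


def unexpected_files(log_text: str) -> Dict[str, Dict[str, int]]:
    """Map each unexpectedly accessed file to its IO counters."""
    result: Dict[str, Dict[str, int]] = {}
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            counters = {k: int(v) for k, v in COUNTER_RE.findall(match.group("counters"))}
            result[match.group("path")] = counters
    return result


sample = r"""* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768"""
for path, counters in unexpected_files(sample).items():
    print(path, counters["DiskReadBytes"], counters["DiskWriteBytes"])
```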

Xperf runs tp5 while collecting xperf metrics for disk IO and network IO. The providers we listen for are:

['PROC_THREAD', 'LOADER', 'HARD_FAULTS', 'FILENAME', 'FILE_IO', 'FILE_IO_INIT']

The values we collect during stackwalk are:

['FileRead', 'FileWrite', 'FileFlush']

kraken

contact: :dmandelin, :jhammel, :jmaher
source: [kraken.manifest]
type: PageLoader
measuring: JavaScript performance
reporting: Total time for all tests, in ms (lower is better)
Perfomatic name: Kraken Benchmark MozAfterPaint

This is the Kraken JavaScript benchmark, taken verbatim and then slightly modified to fit into our pageloader extension and Talos harness.

V8, version 7

contact: :jhammel, :jmaher
source: [v8.manifest]
type: PageLoader
measuring: ???
reporting: weighted score (higher is better)
Perfomatic name: V8 version 7 MozAfterPaint

This is the V8 (version 7) JavaScript benchmark, taken verbatim and then slightly modified to fit into our pageloader extension and Talos harness.

The previous version of this test was V8 version 5, which was run on selected branches and operating systems.

TART/CART

contact: :avih, :jmaher, :MattN
source: tart
type: PageLoader
measuring: Desktop Firefox UI animation speed and smoothness
reporting: intervals in ms (lower is better) - see below for details
Perfomatic name: Tab Animation Test, Customization Animation Tests

TART is the Tab Animation Regression Test and CART is the Customize Animation Regression Test.

TART tests tab animation on these cases:

Simple: single new tab of about:blank open/close without affecting (shrinking/expanding) other tabs.
icon: same as above with favicons and long title instead of about:blank.
Newtab: newtab open with thumbnails preview - without affecting other tabs, with and without preload.
Fade: opens a tab, then measures fadeout/fadein (tab animation without the overhead of opening/closing a tab).
- Case 1 is tested with DPI scaling of 1.
- Case 2 is tested with DPI scaling of 1.0 and 2.0.
- Case 3 is tested with the default scaling of the test system.
- Case 4 is tested with DPI scaling of 2.0 with the "icon" tab (favicon and long title).
- Each animation produces 3 test results:
  - error: difference between the designated duration and the actual completion duration from the trigger.
  - half: average interval over the 2nd half of the animation.
  - all: average interval over all recorded intervals.
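The three per-animation results can be computed from a list of frame timestamps. This is a sketch under stated assumptions, not TART's code: the first timestamp is the trigger, the last is completion, "error" is the actual minus the designated duration (the sign convention is assumed), and "the 2nd half" is taken as the second half of the recorded intervals by count.

```python
from typing import Dict, List


def tart_results(frame_times_ms: List[float], designated_ms: float) -> Dict[str, float]:
    """Compute error / half / all for one animation from its frame timestamps."""
    if len(frame_times_ms) < 2:
        raise ValueError("need a trigger time and at least one frame")
    intervals = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    actual_ms = frame_times_ms[-1] - frame_times_ms[0]
    second_half = intervals[len(intervals) // 2:]  # assumption: split by interval count
    return {
        "error": actual_ms - designated_ms,  # assumed sign: positive means it ran long
        "half": sum(second_half) / len(second_half),
        "all": sum(intervals) / len(intervals),
    }


print(tart_results([0.0, 10.0, 20.0, 35.0, 50.0], designated_ms=40.0))
# {'error': 10.0, 'half': 15.0, 'all': 12.5}
```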

CART uses the same framework to measure performance of the Australis "customize" animation (for entering the toolbar/menu customization view). Subtests include:

Customize-enter animation (full and css-animation-only part).
Customize-exit animation

TART/CART can be used as a stand-alone addon:

Set the browser to ASAP mode (preferences layout.frame_rate=0, docshell.event_starvation_delay_hint=1). This makes the browser refresh the screen as fast as possible instead of limiting it to 60 Hz, which allows higher-resolution measurements. Requires a restart to take effect.
Zip the add-on directory from the source code and rename the file extension to .xpi.
Install the addon xpi and restart the browser.
Visit chrome://tart/content/tart.html
Select subtests to run. By default the selected tests are all the TART tests. CART is the "Customize" test.
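The packaging step can be scripted with Python's zipfile module. A sketch, assuming the add-on's top-level files must sit at the root of the archive rather than inside a folder; `build_xpi` is a hypothetical helper, and the demo file layout is a placeholder, not the real TART tree.

```python
import os
import tempfile
import zipfile


def build_xpi(addon_dir: str, xpi_path: str) -> str:
    """Zip the contents of addon_dir into xpi_path (write xpi_path outside addon_dir)."""
    with zipfile.ZipFile(xpi_path, "w", zipfile.ZIP_DEFLATED) as xpi:
        for root, _dirs, files in os.walk(addon_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Archive names are relative to addon_dir, so the add-on's
                # top-level files end up at the root of the .xpi.
                xpi.write(full_path, os.path.relpath(full_path, addon_dir))
    return xpi_path


# Demo with a throwaway directory standing in for the add-on source.
addon_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(addon_dir, "content"))
for rel in ("chrome.manifest", os.path.join("content", "tart.html")):
    with open(os.path.join(addon_dir, rel), "w", encoding="utf-8") as f:
        f.write("placeholder\n")
xpi_file = build_xpi(addon_dir, os.path.join(tempfile.mkdtemp(), "tart.xpi"))
with zipfile.ZipFile(xpi_file) as z:
    print(sorted(z.namelist()))  # ['chrome.manifest', 'content/tart.html']
```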

Robocop

Robocop is Mozilla's Android test framework based on Robotium. In addition to functional/unit tests, there are several robocop performance tests run by Talos.

Robocop Checkerboarding Benchmark (tcheckerboard/trobocheck/rck): "Checkerboard" refers to cases where we can't render new portions of a page as fast as a user scrolls to them, and so they see a blank area, low-resolution rendering, or a checkerboard pattern as they scroll. This test measures "checkerboarding" by scrolling up and down a page and recording the average percentage of the screen that is "checkerboarded" over time. (Lower is better.) (source)
Robocop Checkerboarding Real User Benchmark (tcheck2/trobocheck2/rck2): This test is similar to tcheckerboard but designed to stress the browser harder. It uses a real-world test page, scrolls in all directions, and also zooms the page in various ways. Reports the average percentage of the screen that is "checkerboarded" over time (lower is better). (source)
Robocop Pan Benchmark (robopan/trobopan/rp): This test measures "jank" during scrolling. It scrolls down a page repeatedly, and records each "missed" frame (any frame drawn more than 1/40 sec after the previous frame). For each missed frame, it calculates how much it was delayed past 1/40 second. The test reports the sum of the squares of the delays for all missed frames (lower is better). (source)
Robocop Database Benchmark (roboprovider/troboprovider/rpr): This test measures the performance of the history and bookmarks ContentProvider database in Firefox for Android. It performs several database operations and reports the time to complete the operations, in milliseconds (lower is better). (source)
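The trobopan score described above reduces to a short formula: for every frame drawn more than 1/40 s (25 ms) after the previous one, take how far past 25 ms it was, square it, and sum. A minimal sketch over frame timestamps in milliseconds; `pan_jank_score` is a hypothetical helper, not the Robocop source.

```python
from typing import List

FRAME_BUDGET_MS = 1000.0 / 40  # 25 ms: a frame later than this counts as "missed"


def pan_jank_score(frame_times_ms: List[float]) -> float:
    """Sum of squared delays past 1/40 s for every missed frame (lower is better)."""
    score = 0.0
    for prev, cur in zip(frame_times_ms, frame_times_ms[1:]):
        delay = (cur - prev) - FRAME_BUDGET_MS
        if delay > 0:
            score += delay * delay
    return score


# Intervals of 20, 40 and 10 ms: only the 40 ms frame is missed, 15 ms late.
print(pan_jank_score([0.0, 20.0, 60.0, 70.0]))  # 225.0
```

Squaring the delays means one long stall is penalized far more than several slightly late frames with the same total delay.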

Other data

These are not part of the Talos code, but like Talos they are benchmarks that record data using the graphserver and are analyzed by the same scripts for regressions.

Number of Constructors (num_ctors)

This test runs at build time and measures the number of static initializers in the compiled code. Reducing this number is helpful for startup optimizations.

https://hg.mozilla.org/build/tools/file/348853aee492/buildfarm/utils/count_ctors.py

Codesighs

Codesighs measures the size of the compiled libraries and executables. Runs on Linux and Mac at build time, triggered by the --enable-codesighs configure flag.

For details see Codesighs.

Trace Malloc

This test is run as part of the "make leaktest" step during debug build jobs. It uses the trace-malloc tool from tools/trace-malloc to wrap calls to malloc and log information about every memory allocation. See also leaktest.py.

trace_malloc_leaks
trace_malloc_maxheap
trace_malloc_alloc
