Talos Tests
Where to get this information
- Talos tests are defined in http://hg.mozilla.org/build/talos/file/tip/talos/test.py
- TBPL abbreviations are defined in http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js#l302
- Perf-o-matic names are defined in http://hg.mozilla.org/graphs/file/tip/sql/data.sql
- Talos suites are configured for production in http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py; these names are mapped to TBPL via regexes: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/e2e344885c80/js/Data.js#l512
- Inline help is available by running talos --print-tests to get a list of all tests and their descriptions. To get help on a subset of tests, including their run-time parameters, use e.g. talos --print-tests -a ts:tsvg [options]
A table detailing information flow from buildbot to talos to TBPL and graphserver is available at http://k0s.org:8080/ . This is generated with the talosnames script, as detailed in http://k0s.org/mozilla/blog/20120724135349 . See also bug 770460.
Talos Test Types
There are two different species of Talos tests:
- Startup Tests: start up the browser, wait for either the load event or the paint event, then exit, measuring the time
- Page Load Tests: load a manifest of pages
Startup Tests
Startup tests launch Firefox and measure the time to the onload or paint events. Firefox is invoked with a URL to:
- http://hg.mozilla.org/build/talos/file/tip/talos/startup_test/startup_test.html for the onload event
- http://hg.mozilla.org/build/talos/file/tip/talos/startup_test/tspaint_test.html for the paint event
Page Load Tests
Many of the Talos tests use the page loader to load a manifest of pages. These tests load a specific page and measure the time it takes to load the page, scroll the page, draw the page, etc. To run a page load test, you need a manifest of pages to run. The manifest is simply a list of page URLs, one per line, e.g.:
http://www.mozilla.org
http://www.mozilla.com
Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/svg/svg.manifest
Manifests may also specify that a test computes its own data by prepending a % in front of the line:
% http://www.mozilla.org
% http://www.mozilla.com
Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/v8_7/v8.manifest
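The manifest format described above is simple enough to read with a few lines of Python. The following is only a minimal sketch of that format, not the page loader's own parser; the function name parse_manifest is hypothetical.

```python
def parse_manifest(text):
    """Return a list of (url, computes_own_data) tuples from manifest text."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # A leading '%' marks a page that computes its own data.
        computes_own_data = line.startswith("%")
        if computes_own_data:
            line = line[1:].strip()
        entries.append((line, computes_own_data))
    return entries


manifest = """http://www.mozilla.org
% http://www.mozilla.com
"""
for url, computes_own_data in parse_manifest(manifest):
    print(url, computes_own_data)
```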
The manifest you created should be referenced in your config file. For example, open sample.config and look for the entry for the test you want to run:
- name: tp4
  url: '-tp page_load_test/tp4.manifest -tpchrome -tpnoisy -tpformat tinderbox -tpcycles 10'
- -tp controls the location of your manifest
- -tpchrome tells Talos to run the browser with the normal browser UI active
- -tpnoisy means "generate lots of output"
- -tpformat controls the format of the results; the default is the format we send to displays like graphserver and TBPL.
- -tpcycles controls the number of times we run the entire test.
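To make the structure of the url option string concrete, here is a hedged sketch that splits it into a dictionary. It assumes, based only on the list above, that -tp, -tpformat and -tpcycles take a value and that the other options are plain flags; the helper name parse_tp_options is hypothetical and this is not how Talos itself processes the string.

```python
import shlex

# Options assumed to take a value, per the list above; all others are flags.
VALUE_OPTIONS = {"-tp", "-tpformat", "-tpcycles"}


def parse_tp_options(url):
    """Split a page-load 'url' option string into a dict of options."""
    tokens = shlex.split(url)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in VALUE_OPTIONS:
            if i + 1 >= len(tokens):
                raise ValueError("missing value for " + token)
            options[token] = tokens[i + 1]
            i += 2
        else:
            options[token] = True  # flag with no value
            i += 1
    return options


opts = parse_tp_options(
    "-tp page_load_test/tp4.manifest -tpchrome -tpnoisy "
    "-tpformat tinderbox -tpcycles 10"
)
print(opts["-tp"], opts["-tpcycles"])
```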
Paint Tests
Paint tests measure the time to receive both the MozAfterPaint and OnLoad events instead of just the OnLoad event.
Currently we run _paint tests for these tests:
- ts_paint
- tpaint
- tp5n
- sunspider
- a11y
- tscroll/tscrollx
NoChrome Tests
All tests run through the pageloader extension can be run with or without browser chrome. The tests load the same pages as described above in either case. The majority of tests are run with browser chrome enabled. On mobile (native Android builds) we have to run everything as nochrome since we don't support additional XUL windows.
The ability to run tests without the browser chrome opens up the ability to further isolate performance regressions.
Test Descriptions
CanvasMark
- contact: :jmaher
- source: https://github.com/kevinroast/CanvasMark
- type: PageLoader
Talos test name | Graphserver name | Description |
tcanvasmark | Canvasmark | |
tcanvasmark_nochrome | Canvasmark, NoChrome | see NoChrome Tests |
These tests run the third-party CanvasMark benchmark suite, which measures the browser's ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex.
Results are a score "based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type" (higher is better).
tp5
- contact: :jhammel, :jmaher
- source: not available
- type: PageLoader
Talos test name | Graphserver name | Description |
tp5r | Tp5r MozAfterPaint | tp5 with responsiveness |
tp5row | Tp5 Row Major MozAfterPaint | tp5r running in Row Major with 25 cycles/page, ignoring the first 5 |
tp5n | Tp5 No Network Row Major MozAfterPaint | tp5row with a new tp5.zip that has no 404s and no external network access |
tp5 | Tp5 MozAfterPaint | Measures the time to load a webpage and receive both a MozAfterPaint and OnLoad event. |
Tests the time it takes Firefox to load the tp5 web page test set. The web set was culled from the Alexa top 500 April 8th, 2011 and consists of 100 pages.
Unfortunately, we do not distribute a copy of the set of test web pages as these would not constitute fair use. Here are the broad steps we use to create the test set:
- Take the Alexa top 500 sites list
- Remove all sites with questionable or explicit content
- Remove duplicate sites (e.g. the many Google search front pages)
- Manually select to keep interesting pages (such as pages in different locales)
- Select a more representative page from any site presenting a simple search/login/etc. page
- Deal with Windows 255 char limit for cached pages
- Limit test set to top 100 pages
Note that the above steps did not eliminate all outside network access, so we took further action to scrub all the pages so that there are 0 outside network accesses (this is done so that the tp test is as deterministic a measurement of our rendering/layout/paint process as possible). If you are on the Mozilla intranet, you can obtain the current page set for local testing. DO NOT DISTRIBUTE IT.
Private Bytes
A memory metric tracked during tp4 test runs. This metric is sampled every 20 seconds.
For windows, a description from Microsoft TechNet.
RSS (Resident Set Size)
A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on linux/mac only.
Xres (X Resource Monitoring)
A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on linux only.
Working Set (tp5_memset)
A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on windows only. Description from Microsoft TechNet.
Modified Page List Bytes
A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows7 only. Description from Microsoft MSDN.
% CPU
CPU usage tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows only.
Responsiveness
Reports the delay in milliseconds for the event loop to process a tracer event. For more details, see bug 631571.
ts_paint
- contact: :mak, :jimm, :jhammel, :jmaher
- source: tspaint_test.html
- Perfomatic: "Ts, Paint"
- type: Startup
Launches tspaint_test.html with the current timestamp in the url, waits for MozAfterPaint and onLoad to fire, then records the end time and calculates the time to startup.
The basic ts test uses a blank profile. Formerly known as ts before we looked for the MozAfterPaint event.
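The timing arithmetic described above can be sketched as follows. The query parameter name begin and both helper names are assumptions made for illustration; the source only says that the current timestamp is placed in the URL and subtracted from the end time.

```python
import time
from urllib.parse import parse_qs, urlencode, urlparse


def build_startup_url(base_url, now_ms=None):
    """Append the launch timestamp (ms since epoch) to the test page URL."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # 'begin' is a hypothetical parameter name used for this sketch.
    return base_url + "?" + urlencode({"begin": now_ms})


def startup_time_ms(url, end_ms):
    """Startup time is the end timestamp minus the timestamp in the URL."""
    begin_ms = int(parse_qs(urlparse(url).query)["begin"][0])
    return end_ms - begin_ms


url = build_startup_url("tspaint_test.html", now_ms=1000)
print(url)
print(startup_time_ms(url, end_ms=1450))
```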
ts_places_generated_med
- contact: :mak, :mattn, :jhammel, :jmaher
- source: tspaint_test.html
- type: Startup
- dirty: this is also referred to as the dirty test
Runs the same test as ts_paint, but uses a generated profile to simulate what an average user would have. The profile consists of 4 files:
- [permissions.sqlite] - allowXULXBL in moz_hosts
- [prefs.js] - these are just prefs to allow script access to chrome
- places.sqlite - updated daily via [buildbot script] to have [recent dates] for these tables:
  - moz_historyvisit, 111750 items
  - moz_bookmarks, 1354 items
  - moz_favicons, 22042 items
  - moz_annos, 0 items
  - moz_items_annos, 8 items
  other tables which are not updated:
  - moz_places, 22088 items
  - moz_keywords, 7 items
  - moz_anno_attributes, 6 items
  - moz_bookmarks_roots, 5 items
  - moz_inputhistory, 342 items
- [localstore.rdf]
- todo: this is outdated
ts_places_generated_max
- contact: :mak, :mattn, :jhammel, :jmaher
- source: tspaint_test.html
- type: Startup
- dirty: this is also referred to as the dirty test
Runs the same test as ts_paint, but uses a generated profile to simulate what an average user would have. The profile consists of 4 files:
- [permissions.sqlite] - allowXULXBL in moz_hosts
- [prefs.js] - these are just prefs to allow script access to chrome
- places.sqlite - updated daily via [buildbot script] to have [recent dates] for these tables:
  - moz_historyvisit, 725054 items
  - moz_bookmarks, 144757 items
  - moz_favicons, 144705 items
  - moz_annos, 0 items
  - moz_items_annos, 8 items
  other tables which are not updated:
  - moz_places, 144751 items
  - moz_keywords, 601 items
  - moz_anno_attributes, 6 items
  - moz_bookmarks_roots, 5 items
  - moz_inputhistory, 342 items
- [localstore.rdf]
- todo: this is outdated
tdhtml
- turned off on all branches and platforms November 1st, 2012
- contact: :peterv, :jhammel, :jmaher
- source: dhtml.manifest
- type: PageLoader
Talos test name | Graphserver name | Description |
tdhtmlr | DHTML Row Major | Row based and 25 cycles/page. |
tdhtml.2 | DHTML 2 | Ignoring the first value instead of the highest (usually the highest is the first) |
Tests which measure the time to cycle through a set of DHTML test pages. This test will be updated in the near future.
This test is also run with the nochrome option.
tsvg, tsvgx
- contact: :jwatt, :jhammel, :jmaher, :avih
- source: svg.manifest, svgx
- type: PageLoader
Talos test name | Graphserver name | Description |
tsvgx | SVG-ASAP | Replacing tscroll,tsvg with tscrollx,tsvgx |
svgr | SVG Row Major | Row Major and 25 cycles/page. |
svg | SVG | Column based and 5 cycles. |
An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. The ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations - overall duration the sequence/animation took to complete. To turn on ASAP mode, we set these preferences:
preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
tsvg-opacity
- contact: :jwatt, :jhammel, :jmaher
- source: svg.manifest
- type: PageLoader
Talos test name | Graphserver name | Description |
svgr_opacity | SVG, Opacity Row Major | Row Major and 25 cycles/page. |
svg_opacity | SVG, Opacity | Column based and 5 cycles. |
An svg-only number that measures SVG rendering performance.
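The SVG tables above distinguish "Column based" runs (a number of cycles through the list) from "Row Major" runs (a number of cycles per page). The sketch below shows one reading of that difference in page ordering. It is an interpretation of the table descriptions, not code taken from the page loader, and both function names are hypothetical.

```python
def column_based_order(pages, cycles):
    """Cycle through the whole page list `cycles` times."""
    return [page for _ in range(cycles) for page in pages]


def row_major_order(pages, cycles_per_page):
    """Load each page `cycles_per_page` times before moving to the next."""
    return [page for page in pages for _ in range(cycles_per_page)]


pages = ["a.svg", "b.svg"]
print(column_based_order(pages, 2))
print(row_major_order(pages, 2))
```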
tpaint
- contact: :jimm, :jhammel, :jmaher
- source: [tpaint-window.html]
- type: Startup
Talos test name | Graphserver name | Description |
tpaint | Paint | twinopen but measuring the time after we receive the MozAfterPaint and OnLoad event. |
twinopen | | original test to measure the time to open window based on OnLoad event. |
txul | Txul | another name for twinopen. Also we report txul in the regression emails. |
Tests the amount of time it takes to open a new window. This test does not include startup time. Multiple test windows are opened in succession; the result reported is the average amount of time required to create and display a window in the running instance of the browser. (Measures ctrl-n performance.)
JSS/Dromaeo Tests
Dromaeo suite of tests for JavaScript performance testing. See the Dromaeo wiki for more information.
This suite is divided into several sub-suites.
Dromaeo CSS
- contact: :dmandelin, :jhammel, :jmaher
- source: [css.manifest]
- type: PageLoader
- reporting: speed in test runs per second (higher is better)
Each page in the manifest is part of the Dromaeo CSS benchmark.
Dromaeo DOM
- contact: :dmandelin, :jhammel, :jmaher
- source: [dom.manifest]
- type: PageLoader
- reporting: speed in test runs per second (higher is better)
Each page in the manifest is part of the Dromaeo DOM benchmark.
a11y
- contact: :davidb, :tbsaunde, :jhammel, :jmaher
- source: [a11y.manifest]
- type: PageLoader
- measuring: ???
- reporting: test time in ms (lower is better)
Talos test name | Graphserver name | Description |
a11yr | a11y Row Major MozAfterPaint | Row Major testing with 25 cycles per page |
a11y.2 | a11y 2 MozAfterPaint | same as a11y ignoring the first value collected instead of the largest |
a11y | a11y MozAfterPaint | iterate through each page, 5 cycles through the list, ignore the highest value from each page |
This test ensures basic a11y tables and permutations do not cause performance regressions.
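The a11y variants in the table above differ in which collected value is ignored for each page: the highest or the first. A minimal sketch of the two filters follows. Averaging the remaining values is an assumption made for illustration, since the table only states which value is dropped, and both function names are hypothetical.

```python
def average_ignoring_highest(values):
    """Drop one occurrence of the largest value, then average the rest."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    remaining = list(values)
    remaining.remove(max(remaining))
    return sum(remaining) / len(remaining)


def average_ignoring_first(values):
    """Drop the first collected value, then average the rest."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    remaining = values[1:]
    return sum(remaining) / len(remaining)


# Hypothetical load times in ms; the first load is usually the slowest.
samples = [300, 100, 110, 90, 100]
print(average_ignoring_highest(samples))
print(average_ignoring_first(samples))
```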
tscroll, tscrollx
- contact: :jrmuizel, :jhammel, :jmaher, :avih
- source: [scroll.manifest]
- type: PageLoader
- measuring: Scroll performance
- reporting: Average frame interval (1/FPS). Lower is better.
Talos test name | Graphserver name | Description |
tscrollx | tscroll-ASAP MozAfterPaint | Replacing tscroll,tsvg with tscrollx,tsvgx |
tscrollr | tscroll Row Major | Row Major testing with 25 cycles |
tscroll.2 | tscroll 2 | Ignore the first value for each page instead of the largest |
tscroll | tscroll | run through each page in the manifest and cycle 5 times. For each page, ignore the largest value |
This test scrolls several pages, each of which represents a different known "hard" case to scroll (* needinfo), and measures the average frame interval (1/FPS) on each. The ASAP test (tscrollx) iterates in unlimited frame-rate mode, thus reflecting the maximum scroll throughput per page. To turn on ASAP mode, we set these preferences:
preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
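For local experiments, the same two preferences can be written as user.js lines for a Firefox profile. This is a sketch for manual use under that assumption and does not show how Talos applies the preferences itself; the helper name to_user_js is hypothetical.

```python
import json

# The ASAP-mode preferences listed above.
ASAP_PREFS = {
    "layout.frame_rate": 0,
    "docshell.event_starvation_delay_hint": 1,
}


def to_user_js(prefs):
    """Render a dict of preferences as user.js lines, sorted by name."""
    lines = []
    for name, value in sorted(prefs.items()):
        # json.dumps quotes strings and renders ints and booleans as JS literals.
        lines.append("user_pref(%s, %s);" % (json.dumps(name), json.dumps(value)))
    return "\n".join(lines) + "\n"


print(to_user_js(ASAP_PREFS), end="")
```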
tresize
- contact: :jimm, :jmaher
- source: [tresize-test.html]
- type: StartupTest
- measuring: Time to do XUL resize, in ms (lower is better).
- reporting: ???
Talos test name | Graphserver name | Description |
tresize | tresize | TODO |
A purer form of paint measurement than tpaint. This test opens a single window positioned at 10,10 and sized to 300,300, then resizes the window outward |max| times measuring the amount of time it takes to repaint each resize. Dumps the resulting dataset and average to stdout or logfile.
xperf
- contact: :taras, :aklotz, :jmaher, :jhammel
- source: [xperf instrumentation]
- type: Pageloader (tp5n)
- measuring: IO counters from windows
- reporting: Summary of read/write counters for disk, network (lower is better)
Talos will turn orange for 'x' jobs on windows 7 if your changeset accesses files which are not predefined in the [whitelist]. If your job turns orange, you will see a list of files in tbpl (or in the log file) which have been accessed unexpectedly (similar to this):
* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768
If your changeset is expected to access these files, we can add them to the [whitelist].
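When a log contains many of these failures, a short script can pull out the file names and counters. The sketch below matches only the message format shown in the example output above; the function name unexpected_files is hypothetical and is not part of Talos.

```python
import re

# Matches the TEST-UNEXPECTED-FAIL message format shown in the sample output.
FAIL_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL : xperf: File '(?P<path>[^']+)' was accessed "
    r"and we were not expecting it\. "
    r"DiskReadCount: (?P<reads>\d+), DiskWriteCount: (?P<writes>\d+), "
    r"DiskReadBytes: (?P<read_bytes>\d+), DiskWriteBytes: (?P<write_bytes>\d+)"
)


def unexpected_files(log_text):
    """Return {path: counters} for each unexpected file access in the log."""
    found = {}
    for match in FAIL_RE.finditer(log_text):
        counters = {
            key: int(value)
            for key, value in match.groupdict().items()
            if key != "path"
        }
        found[match.group("path")] = counters
    return found


log = (
    r"* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed "
    r"and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, "
    r"DiskReadBytes: 16904, DiskWriteBytes: 0"
)
for path, counters in unexpected_files(log).items():
    print(path, counters)
```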
Xperf runs tp5 while collecting xperf metrics for disk IO and network IO. The providers we listen for are:
The values we collect during stackwalk are:
kraken
- contact: :dmandelin, :jhammel, :jmaher
- source: [kraken.manifest]
- type: PageLoader
- measuring: JavaScript performance
- reporting: Total time for all tests, in ms (lower is better)
- Perfomatic name: Kraken Benchmark MozAfterPaint
This is the Kraken JavaScript benchmark, taken verbatim and slightly modified to fit into our pageloader extension and Talos harness.
V8, version 7
- contact: :jhammel, :jmaher
- source: [v8.manifest]
- type: PageLoader
- measuring: ???
- reporting: weighted score (higher is better)
- Perfomatic name: V8 version 7 MozAfterPaint
This is the V8 (version 7) JavaScript benchmark, taken verbatim and slightly modified to fit into our pageloader extension and Talos harness.
The previous version of this test is V8 version 5, which was run on select branches and operating systems.
TART/CART
- contact: :avih, :jmaher, :MattN
- source: tart
- type: PageLoader
- measuring: Desktop Firefox UI animation speed and smoothness
- reporting: intervals in ms (lower is better) - see below for details
- Perfomatic name: Tab Animation Test, Customization Animation Tests
TART is the Tab Animation Regression Test and CART is the Customize Animation Regression Test.
TART tests tab animation on these cases:
- Simple: single new tab of about:blank open/close without affecting (shrinking/expanding) other tabs.
- icon: same as above with favicons and long title instead of about:blank.
- Newtab: newtab open with thumbnails preview - without affecting other tabs, with and without preload.
- Fade: opens a tab, then measures fadeout/fadein (tab animation without the overhead of opening/closing a tab).
- Case 1 is tested with DPI scaling of 1.
- Case 2 is tested with DPI scaling of 1.0 and 2.0.
- Case 3 is tested with the default scaling of the test system.
- Case 4 is tested with DPI scaling of 2.0 with the "icon" tab (favicon and long title).
- Each animation produces 3 test results:
- error: difference between the designated duration and the actual completion duration from the trigger.
- half: average interval over the 2nd half of the animation.
- all: average interval over all recorded intervals.
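The three per-animation results listed above can be sketched as follows. Two points are assumptions made for illustration: "2nd half" is taken by count of recorded intervals (it could equally be split by elapsed time), and the error is reported as actual minus designated duration. The function name tart_results and its parameters are hypothetical.

```python
def tart_results(intervals_ms, designated_ms, actual_ms):
    """Compute the error, half and all results for one animation."""
    if not intervals_ms:
        raise ValueError("no intervals recorded")
    # Assumption: the second half is split by interval count, not by time.
    second_half = intervals_ms[len(intervals_ms) // 2:]
    return {
        # Assumption: sign convention is actual minus designated.
        "error": actual_ms - designated_ms,
        "half": sum(second_half) / len(second_half),
        "all": sum(intervals_ms) / len(intervals_ms),
    }


# Hypothetical frame intervals for an animation designated to last 50 ms.
print(tart_results([20.0, 20.0, 10.0, 10.0], designated_ms=50.0, actual_ms=60.0))
```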
CART uses the same framework to measure performance of the Australis "customize" animation (for entering the toolbar/menu customization view). Subtests include:
- Customize-enter animation (full and css-animation-only part).
- Customize-exit animation
TART/CART can be used as a stand-alone addon:
- Set the browser to ASAP mode (preferences layout.frame_rate=0, docshell.event_starvation_delay_hint=1). This makes the browser refresh the screen as fast as possible instead of limiting it to 60 Hz, which allows higher-resolution measurements. Requires a restart to take effect.
- Zip the addon dir of the source code and rename the extension to xpi.
- Install the addon xpi and restart the browser.
- Visit chrome://tart/content/tart.html
- Select subtests to run. By default the selected tests are all the TART tests. CART is the "Customize" test.
Robocop
Robocop is Mozilla's Android test framework based on Robotium. In addition to functional/unit tests, there are several robocop performance tests run by Talos.
- Robocop Checkerboarding Benchmark (tcheckerboard/trobocheck/rck)
- "Checkerboard" refers to cases where we can't render new portions of a page as fast as a user scrolls to them, and so they see a blank area, low-resolution rendering, or a checkerboard pattern as they scroll. This test measures "checkerboarding" by scrolling up and down a page and recording the average percentage of the screen that is "checkerboarded" over time. (Lower is better.) (source)
- Robocop Checkerboarding Real User Benchmark (tcheck2/trobocheck2/rck2)
- This test is similar to tcheckerboard but designed to stress the browser harder. It uses a real-world test page, scrolls in all directions, and also zooms the page in various ways. Reports the average percentage of the screen that is "checkerboarded" over time (lower is better). (source)
- Robocop Pan Benchmark (robopan/trobopan/rp)
- This test measures "jank" during scrolling. It scrolls down a page repeatedly, and records each "missed" frame (any frame drawn more than 1/40 sec after the previous frame). For each missed frame, it calculates how much it was delayed past 1/40 second. The test reports the sum of the squares of the delays for all missed frames (lower is better). (source)
- Robocop Database Benchmark (roboprovider/troboprovider/rpr)
- This test measures the performance of the history and bookmarks ContentProvider database in Firefox for Android. It performs several database operations and reports the time to complete the operations, in milliseconds (lower is better). (source)
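The Robocop Pan Benchmark score described above (sum of squared delays for frames drawn more than 1/40 second after the previous frame) can be sketched as follows. The use of milliseconds and the function name pan_jank_score are assumptions for illustration; this is not the Robocop implementation.

```python
# A frame counts as "missed" if it arrives more than 1/40 s after the previous one.
FRAME_BUDGET_MS = 1000.0 / 40


def pan_jank_score(frame_times_ms):
    """Sum the squared delays past the frame budget over all missed frames."""
    score = 0.0
    for previous, current in zip(frame_times_ms, frame_times_ms[1:]):
        delay = (current - previous) - FRAME_BUDGET_MS
        if delay > 0:
            score += delay ** 2
    return score


# Hypothetical frame timestamps: one 35 ms gap, 10 ms over budget.
print(pan_jank_score([0, 20, 40, 75, 95]))
```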
See also:
Other data
These are not part of the Talos code, but like Talos they are benchmarks that record data using the graphserver and are analyzed by the same scripts for regressions.
Number of Constructors (num_ctors)
This test runs at build time and measures the number of static initializers in the compiled code. Reducing this number is helpful for startup optimizations.
Codesighs
Codesighs measures the size of the compiled libraries and executables. Runs on Linux and Mac at build time, triggered by the --enable-codesighs configure flag.
For details see Codesighs.
Trace Malloc
This test is run as part of the "make leaktest" step during debug build jobs. It uses the trace-malloc tool from tools/trace-malloc to wrap calls to malloc and log information about every memory allocation. See also leaktest.py.
- trace_malloc_leaks
- trace_malloc_maxheap
- trace_malloc_alloc