Buildbot/Talos/Tests

Revision as of 13:11, 19 February 2014 by Avih (talk | contribs)

Talos Tests

Where to get this information

A table detailing information flow from buildbot to talos to TBPL and graphserver is available at http://k0s.org:8080/ . This is generated with the talosnames script, as detailed in http://k0s.org/mozilla/blog/20120724135349 . See also bug 770460.

Talos Test Types

There are two types of Talos tests:

  • Startup Tests: start up the browser, wait for either the load event or the paint event, then exit, measuring the time
  • Page Load Tests: load a manifest of pages

Startup Tests

Startup tests launch Firefox and measure the time to the onload or paint events. Firefox is invoked with a URL to:

Page Load Tests

Many of the Talos tests use the page loader to load a manifest of pages. These tests load a specific page and measure the time it takes to load, scroll, or draw it. To run a page load test, you need a manifest of pages. The manifest is simply a list of URLs of pages to load, one per line, e.g.:

http://www.mozilla.org
http://www.mozilla.com

Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/svg/svg.manifest

Manifests may also specify that a test computes its own data by prepending a % in front of the line:

% http://www.mozilla.org
% http://www.mozilla.com

Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/v8_7/v8.manifest
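As a rough illustration of the manifest format described above, the following Python sketch reads manifest text and records whether each line carries the % prefix. It is not the pageloader's own parser; the names ManifestEntry and parse_manifest are hypothetical and exist only for this example.

```python
from typing import List, NamedTuple


class ManifestEntry(NamedTuple):
    url: str
    self_reporting: bool  # True when the line starts with '%'


def parse_manifest(text: str) -> List[ManifestEntry]:
    """Parse manifest text into entries, one per non-empty line."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("%"):
            # '%' marks a page that computes its own data
            entries.append(ManifestEntry(line[1:].strip(), True))
        else:
            entries.append(ManifestEntry(line, False))
    return entries


sample = "% http://www.mozilla.org\nhttp://www.mozilla.com\n"
for entry in parse_manifest(sample):
    print(entry.url, entry.self_reporting)
```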

Reference the manifest you created in your config file. For example, open sample.config and look for the line referring to the test you want to run:

- name: tp4
  url: '-tp page_load_test/tp4.manifest -tpchrome -tpnoisy -tpformat tinderbox -tpcycles 10'
  • -tp controls the location of your manifest
  • -tpchrome tells Talos to run the browser with the normal browser UI active
  • -tpnoisy means "generate lots of output"
  • -tpformat controls the format of the results; the default is the format we send to displays like graphserver and TBPL.
  • -tpcycles controls the number of times we run the entire test.
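To make the structure of the url value easier to see, here is a small Python sketch that splits such an option string into flags and valued options. This is only an illustration: the pageloader does its own argument handling, and parse_tp_options is a hypothetical helper, not a Talos function.

```python
import shlex


def parse_tp_options(url_value: str) -> dict:
    """Split a '-tp ...' option string into a dict of option -> value.

    An option followed by another option (or by nothing) is treated as a
    flag and mapped to True.
    """
    tokens = shlex.split(url_value)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            raise ValueError(f"unexpected token: {token!r}")
        name = token.lstrip("-")
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        if has_value:
            options[name] = tokens[i + 1]
            i += 2
        else:
            options[name] = True
            i += 1
    return options


opts = parse_tp_options(
    "-tp page_load_test/tp4.manifest -tpchrome -tpnoisy "
    "-tpformat tinderbox -tpcycles 10"
)
print(opts)
```

Values are kept as strings here; a caller that needs the cycle count as a number would convert opts["tpcycles"] itself.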

Paint Tests

Paint tests measure the time to receive both the MozAfterPaint and OnLoad events instead of just the OnLoad event.

Currently we run _paint tests for these tests:

  • ts_paint
  • tpaint
  • tp5n
  • sunspider
  • a11y
  • tscroll/tscrollx

NoChrome Tests

All tests run through the pageloader extension can be run with or without browser chrome. The tests load the same pages as described above in either case. The majority of tests are run with browser chrome enabled. On mobile (native Android builds) we have to run everything as nochrome since we don't support additional XUL windows.

The ability to run tests without the browser chrome opens up the ability to further isolate performance regressions.

Test Descriptions

CanvasMark

Talos test name | Graphserver name | Description
tcanvasmark | Canvasmark |
tcanvasmark_nochrome | Canvasmark, NoChrome | see NoChrome Tests

These tests run the third-party CanvasMark benchmark suite, which measures the browser's ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex.

Results are a score "based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type" (higher is better).

tp5

  • contact: :jhammel, :jmaher
  • source: not available
  • type: PageLoader
Talos test name | Graphserver name | Description
tp5r | Tp5r MozAfterPaint | tp5 with responsiveness
tp5row | Tp5 Row Major MozAfterPaint | tp5r running in Row Major with 25 cycles/page, ignoring the first 5
tp5n | Tp5 No Network Row Major MozAfterPaint | tp5row with a new tp5.zip that has no 404s and no external network access
tp5 | Tp5 MozAfterPaint | Measures the time to load a webpage and receive both a MozAfterPaint and OnLoad event.

Tests the time it takes Firefox to load the tp5 web page test set. The set was culled from the Alexa top 500 as of April 8th, 2011 and consists of 100 pages.

Unfortunately, we do not distribute a copy of the set of test web pages as these would not constitute fair use. Here are the broad steps we use to create the test set:

  1. Take the Alexa top 500 sites list
  2. Remove all sites with questionable or explicit content
  3. Remove duplicate sites (e.g. many Google search front pages)
  4. Manually select to keep interesting pages (such as pages in different locales)
  5. Select a more representative page from any site presenting a simple search/login/etc. page
  6. Deal with Windows 255 char limit for cached pages
  7. Limit test set to top 100 pages

Note that the above steps did not eliminate all outside network access, so we took further action to scrub the pages until there were 0 outside network accesses. This is done so that the tp test is as deterministic a measurement of our rendering/layout/paint process as possible. If you are on the Mozilla intranet, you can obtain the current page set for local testing. DO NOT DISTRIBUTE IT.

Private Bytes

A memory metric tracked during tp4 test runs. This metric is sampled every 20 seconds.

For Windows, see the description from Microsoft TechNet.

RSS (Resident Set Size)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Linux/Mac only.

Description from wikipedia.

Xres (X Resource Monitoring)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Linux only.

xres man page.

Working Set (tp5_memset)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows only. Description from Microsoft TechNet.

Modified Page List Bytes

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows 7 only. Description from Microsoft MSDN.

% CPU

CPU usage tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows only.

Responsiveness

Reports the delay in milliseconds for the event loop to process a tracer event. For more details, see bug 631571.

ts_paint

  • contact: :mak, :jimm, :jhammel, :jmaher
  • source: tspaint_test.html
  • Perfomatic: "Ts, Paint"
  • type: Startup

Launches tspaint_test.html with the current timestamp in the url, waits for MozAfterPaint and onLoad to fire, then records the end time and calculates the time to startup.

The basic ts test uses a blank profile. This test was formerly known as ts, before we started waiting for the MozAfterPaint event.

ts_places_generated_med

  • contact: :mak, :mattn, :jhammel, :jmaher
  • source: tspaint_test.html
  • type: Startup
  • dirty: this is also referred to as the dirty test

Runs the same test as ts_paint, but uses a generated profile to simulate what an average user would have. The profile consists of 4 files:

moz_historyvisit, 111750 items
moz_bookmarks, 1354 items
moz_favicons, 22042 items
moz_annos, 0 items
moz_items_annos, 8 items

other tables which are not updated:
moz_places, 22088 items
moz_keywords, 7 items
moz_anno_attributes, 6 items
moz_bookmarks_roots, 5 items
moz_inputhistory, 342 items

ts_places_generated_max

  • contact: :mak, :mattn, :jhammel, :jmaher
  • source: tspaint_test.html
  • type: Startup
  • dirty: this is also referred to as the dirty test

Runs the same test as ts_paint, but uses a generated profile to simulate what an average user would have. The profile consists of 4 files:

moz_historyvisit, 725054 items
moz_bookmarks, 144757 items
moz_favicons, 144705 items
moz_annos, 0 items
moz_items_annos, 8 items

other tables which are not updated:
moz_places, 144751 items
moz_keywords, 601 items
moz_anno_attributes, 6 items
moz_bookmarks_roots, 5 items
moz_inputhistory, 342 items

tdhtml

  • turned off on all branches and platforms November 1st, 2012
  • contact: :peterv, :jhammel, :jmaher
  • source: dhtml.manifest
  • type: PageLoader
Talos test name | Graphserver name | Description
tdhtmlr | DHTML Row Major | Row based and 25 cycles/page.
tdhtml.2 | DHTML 2 | Ignoring the first value instead of the highest (usually the highest is the first)

Tests which measure the time to cycle through a set of DHTML test pages. This test will be updated in the near future.

This test is also run with the nochrome option.

tsvg, tsvgx

  • contact: :jwatt, :jhammel, :jmaher, :avih
  • source: svg.manifest, svgx
  • type: PageLoader
Talos test name | Graphserver name | Description
tsvgx | SVG-ASAP | Replacing tscroll,tsvg with tscrollx,tsvgx
svgr | SVG Row Major | Row Major and 25 cycles/page.
svg | SVG | Column based and 5 cycles.

An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. The ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time or, for animations/iterations, the overall duration the sequence/animation took to complete. To turn on ASAP mode, we set these preferences:

preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
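For running a browser in ASAP mode by hand, one option is to write these preferences into a profile's user.js file. The Python sketch below renders the dictionary above as user_pref lines. It assumes you manage the profile yourself; how Talos applies the preferences internally is not shown here, and prefs_to_user_js is a hypothetical helper name.

```python
import json


def prefs_to_user_js(preferences: dict) -> str:
    """Render a preferences dict as the contents of a user.js file."""
    lines = []
    for name, value in sorted(preferences.items()):
        # json.dumps renders ints, booleans (true/false) and quoted strings
        # in the literal forms that user.js accepts.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"


asap_prefs = {"layout.frame_rate": 0, "docshell.event_starvation_delay_hint": 1}
print(prefs_to_user_js(asap_prefs), end="")
```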

tsvg-opacity

  • contact: :jwatt, :jhammel, :jmaher
  • source: svg.manifest
  • type: PageLoader
Talos test name | Graphserver name | Description
svgr_opacity | SVG, Opacity Row Major | Row Major and 25 cycles/page.
svg_opacity | SVG, Opacity | Column based and 5 cycles.

An svg-only number that measures SVG rendering performance.
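The "Row Major" and "Column based" variants listed above differ in the order in which pages are loaded. The Python sketch below shows one reading of the two orderings: it assumes that column-based means cycling through the whole manifest N times, and that row major means loading each page N times before moving to the next. The function names and page names are hypothetical.

```python
from typing import List


def column_major_order(pages: List[str], cycles: int) -> List[str]:
    """Cycle through the whole list `cycles` times."""
    return [page for _ in range(cycles) for page in pages]


def row_major_order(pages: List[str], cycles_per_page: int) -> List[str]:
    """Load each page `cycles_per_page` times before moving on."""
    return [page for page in pages for _ in range(cycles_per_page)]


pages = ["a.svg", "b.svg"]
print(column_major_order(pages, 2))  # a.svg, b.svg, a.svg, b.svg
print(row_major_order(pages, 2))     # a.svg, a.svg, b.svg, b.svg
```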

tpaint

Talos test name | Graphserver name | Description
tpaint | Paint | twinopen but measuring the time after we receive the MozAfterPaint and OnLoad event.
twinopen | | original test to measure the time to open window based on OnLoad event.
txul | Txul | another name for twinopen. Also we report txul in the regression emails.

Tests the amount of time it takes to open a new window. This test does not include startup time. Multiple test windows are opened in succession; the results reported are the average amount of time required to create and display a window in the running instance of the browser. (Measures ctrl-n performance.)

JSS/Dromaeo Tests

Dromaeo suite of tests for JavaScript performance testing. See the Dromaeo wiki for more information.

This suite is divided into several sub-suites.

Dromaeo CSS

  • contact: :dmandelin, :jhammel, :jmaher
  • source: [css.manifest]
  • type: PageLoader
  • reporting: speed in test runs per second (higher is better)

Each page in the manifest is part of the Dromaeo CSS benchmark.

Dromaeo DOM

  • contact: :dmandelin, :jhammel, :jmaher
  • source: [dom.manifest]
  • type: PageLoader
  • reporting: speed in test runs per second (higher is better)

Each page in the manifest is part of the Dromaeo DOM benchmark.

a11y

  • contact: :davidb, :tbsaunde, :jhammel, :jmaher
  • source: [a11y.manifest]
  • type: PageLoader
  • measuring: ???
  • reporting: test time in ms (lower is better)
Talos test name | Graphserver name | Description
a11yr | a11y Row Major MozAfterPaint | Row Major testing with 25 cycles per page
a11y.2 | a11y 2 MozAfterPaint | same as a11y ignoring the first value collected instead of the largest
a11y | a11y MozAfterPaint | iterate through each page, 5 cycles through the list, ignore the highest value from each page

This test ensures basic a11y tables and permutations do not cause performance regressions.
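The a11y and a11y.2 variants differ only in which sample is discarded for each page: the highest value, or the first value collected. A minimal Python sketch of the two filters follows. The sample numbers are made up, the function names are hypothetical, and the mean is used purely for illustration, since this page does not say how the remaining values are summarised.

```python
from statistics import mean
from typing import List


def drop_highest(values: List[float]) -> List[float]:
    """Discard one occurrence of the largest value (the a11y behaviour)."""
    remaining = list(values)
    remaining.remove(max(remaining))
    return remaining


def drop_first(values: List[float], count: int = 1) -> List[float]:
    """Discard the first `count` values (the a11y.2 behaviour)."""
    return list(values[count:])


samples = [120.0, 80.0, 150.0, 82.0, 78.0]  # hypothetical per-cycle times in ms
print(mean(drop_highest(samples)))
print(mean(drop_first(samples)))
```

The two filters give the same result whenever the first sample is also the slowest, which the tdhtml.2 description above suggests is the usual case.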

tscroll, tscrollx

  • contact: :jrmuizel, :jhammel, :jmaher, :avih
  • source: [scroll.manifest]
  • type: PageLoader
  • measuring: Scroll performance
  • reporting: Average frame interval (1/FPS). Lower is better.
Talos test name | Graphserver name | Description
tscrollx | tscroll-ASAP MozAfterPaint | Replacing tscroll,tsvg with tscrollx,tsvgx
tscrollr | tscroll Row Major | Row Major testing with 25 cycles
tscroll.2 | tscroll 2 | Ignore the first value for each page instead of the largest
tscroll | tscroll | run through each page in the manifest and cycle 5 times. For each page, ignore the largest value


This test scrolls several pages, each of which represents a different known "hard" case to scroll (* needinfo), and measures the average frame interval (1/FPS) on each. The ASAP test (tscrollx) iterates in unlimited frame-rate mode, thus reflecting the maximum scroll throughput per page. To turn on ASAP mode, we set these preferences:

preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
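To illustrate the reported metric, the following Python sketch computes an average frame interval from a list of frame timestamps and converts it to frames per second. The timestamps are invented, and the assumption that they are in milliseconds is ours; the actual measurement code lives in the test pages and is not reproduced here.

```python
def average_frame_interval(timestamps_ms):
    """Return the mean gap between consecutive frame timestamps."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frame timestamps")
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return sum(intervals) / len(intervals)


frames = [0.0, 10.0, 20.0, 40.0]  # hypothetical paint times in ms
interval = average_frame_interval(frames)
fps = 1000.0 / interval  # intervals are in ms, so FPS = 1000 / interval
print(interval, fps)
```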

tresize

  • contact: :jimm, :jmaher
  • source: [tresize-test.html]
  • type: StartupTest
  • measuring: Time to do XUL resize, in ms (lower is better).
  • reporting: ???
Talos test name | Graphserver name | Description
tresize | tresize | TODO

A purer form of paint measurement than tpaint. This test opens a single window positioned at 10,10 and sized to 300,300, then resizes the window outward |max| times measuring the amount of time it takes to repaint each resize. Dumps the resulting dataset and average to stdout or logfile.

xperf

  • contact: :taras, :aklotz, :jmaher, :jhammel
  • source: [xperf instrumentation]
  • type: Pageloader (tp5n)
  • measuring: IO counters from windows
  • reporting: Summary of read/write counters for disk, network (lower is better)

Talos will turn orange for 'x' jobs on Windows 7 if your changeset accesses files which are not predefined in the [whitelist]. If your job turns orange, you will see a list of files in TBPL (or in the log file) which have been accessed unexpectedly (similar to this):

* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768

If these files are expected to be accessed by your changeset, we can add them to the [whitelist].
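When triaging an orange xperf job, it can help to pull the offending file names and counters out of the log. The Python sketch below does this with a regular expression modelled on the failure lines shown above. It is a hypothetical helper for reading logs, not part of the xperf tooling, and it assumes the message format matches those sample lines exactly.

```python
import re

FAIL_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL : xperf: File '(?P<path>[^']+)' was accessed "
    r"and we were not expecting it\. "
    r"DiskReadCount: (?P<read_count>\d+), DiskWriteCount: (?P<write_count>\d+), "
    r"DiskReadBytes: (?P<read_bytes>\d+), DiskWriteBytes: (?P<write_bytes>\d+)"
)


def unexpected_files(log_text):
    """Return {path: counters} for every xperf failure found in the log."""
    found = {}
    for match in FAIL_RE.finditer(log_text):
        counters = {
            key: int(value)
            for key, value in match.groupdict().items()
            if key != "path"
        }
        found[match.group("path")] = counters
    return found


log = (
    "TEST-UNEXPECTED-FAIL : xperf: File '{profile}\\secmod.db' was accessed "
    "and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, "
    "DiskReadBytes: 16904, DiskWriteBytes: 0\n"
)
for path, counters in unexpected_files(log).items():
    print(path, counters)
```

Because the paths are used as dictionary keys, a file that is reported more than once appears only once in the result.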

Xperf runs tp5 while collecting xperf metrics for disk IO and network IO. The providers we listen for are:

The values we collect during stackwalk are:

kraken

  • contact: :dmandelin, :jhammel, :jmaher
  • source: [kraken.manifest]
  • type: PageLoader
  • measuring: JavaScript performance
  • reporting: Total time for all tests, in ms (lower is better)
  • Perfomatic name: Kraken Benchmark MozAfterPaint

This is the Kraken JavaScript benchmark, taken verbatim and slightly modified to fit into our pageloader extension and Talos harness.

V8, version 7

  • contact: :jhammel, :jmaher
  • source: [v8.manifest]
  • type: PageLoader
  • measuring: ???
  • reporting: weighted score (higher is better)
  • Perfomatic name: V8 version 7 MozAfterPaint

This is the V8 (version 7) JavaScript benchmark, taken verbatim and slightly modified to fit into our pageloader extension and Talos harness.

The previous version of this test is V8 version 5 which was run on selective branches and operating systems.

TART/CART

  • contact: :avih, :jmaher, :MattN
  • source: tart
  • type: PageLoader
  • measuring: Desktop Firefox UI animation speed and smoothness
  • reporting: intervals in ms (lower is better) - see below for details
  • Perfomatic name: Tab Animation Test, Customization Animation Tests

TART is the Tab Animation Regression Test and CART is the Customize Animation Regression Test.

TART tests tab animation on these cases:

  • Simple: single new tab of about:blank open/close without affecting (shrinking/expanding) other tabs.
  • icon: same as above with favicons and long title instead of about:blank.
  • Newtab: newtab open with thumbnails preview - without affecting other tabs, with and without preload.
  • Fade: opens a tab, then measures fadeout/fadein (tab animation without the overhead of opening/closing a tab).
    • Case 1 is tested with DPI scaling of 1.
    • Case 2 is tested with DPI scaling of 1.0 and 2.0.
    • Case 3 is tested with the default scaling of the test system.
    • Case 4 is tested with DPI scaling of 2.0 with the "icon" tab (favicon and long title).
    • Each animation produces 3 test results:
      • error: difference between the designated duration and the actual completion duration from the trigger.
      • half: average interval over the 2nd half of the animation.
      • all: average interval over all recorded intervals.
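The three per-animation results can be sketched in Python as follows. This is an interpretation, not TART's code: it assumes the actual duration equals the sum of the recorded frame intervals, that "2nd half" means the second half of the recorded intervals by count, and that "error" is reported as an absolute difference. The name tart_results and the sample numbers are hypothetical.

```python
from statistics import mean


def tart_results(intervals_ms, designated_duration_ms):
    """Compute the 'error', 'half' and 'all' values for one animation."""
    if not intervals_ms:
        raise ValueError("no intervals recorded")
    actual_duration = sum(intervals_ms)  # assumption: no delay before frame 1
    second_half = intervals_ms[len(intervals_ms) // 2:]  # assumption: by count
    return {
        "error": abs(actual_duration - designated_duration_ms),
        "half": mean(second_half),
        "all": mean(intervals_ms),
    }


# Hypothetical animation: designated to take 50 ms, four recorded intervals.
print(tart_results([20.0, 20.0, 10.0, 10.0], 50.0))
```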

CART uses the same framework to measure performance of the Australis "customize" animation (for entering the toolbar/menu customization view). Subtests include:

  • Customize-enter animation (full and css-animation-only part).
  • Customize-exit animation

TART/CART can be used as a stand-alone addon:

  • Set the browser to ASAP mode (preferences layout.frame_rate=0, docshell.event_starvation_delay_hint=1). This makes the browser refresh the screen as fast as possible instead of limiting it to 60 Hz, which allows higher-resolution measurements. Requires a restart to take effect.
  • Zip the addon dir of the source code and rename the extension to xpi.
  • Install the addon xpi and restart the browser.
  • Visit chrome://tart/content/tart.html
  • Select subtests to run. By default the selected tests are all the TART tests. CART is the "Customize" test.

Robocop

Robocop is Mozilla's Android test framework based on Robotium. In addition to functional/unit tests, there are several robocop performance tests run by Talos.

  • Robocop Checkerboarding Benchmark (tcheckerboard/trobocheck/rck): "Checkerboard" refers to cases where we can't render new portions of a page as fast as a user scrolls to them, and so they see a blank area, low-resolution rendering, or a checkerboard pattern as they scroll. This test measures "checkerboarding" by scrolling up and down a page and recording the average percentage of the screen that is "checkerboarded" over time (lower is better).
  • Robocop Checkerboarding Real User Benchmark (tcheck2/trobocheck2/rck2): This test is similar to tcheckerboard but designed to stress the browser harder. It uses a real-world test page, scrolls in all directions, and also zooms the page in various ways. Reports the average percentage of the screen that is "checkerboarded" over time (lower is better).
  • Robocop Pan Benchmark (robopan/trobopan/rp): This test measures "jank" during scrolling. It scrolls down a page repeatedly, and records each "missed" frame (any frame drawn more than 1/40 sec after the previous frame). For each missed frame, it calculates how much it was delayed past 1/40 second. The test reports the sum of the squares of the delays for all missed frames (lower is better).
  • Robocop Database Benchmark (roboprovider/troboprovider/rpr): This test measures the performance of the history and bookmarks ContentProvider database in Firefox for Android. It performs several database operations and reports the time to complete the operations, in milliseconds (lower is better).
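The scoring rule of the Robocop Pan Benchmark described above can be written out as a short Python sketch. It assumes frame times are given in seconds, since the units of the reported score are not stated here, and pan_score is a hypothetical name; the real test is implemented in the Robocop Java code.

```python
FRAME_BUDGET_S = 1.0 / 40.0  # a frame is "missed" if it arrives later than this


def pan_score(frame_times_s):
    """Sum the squared delays past 1/40 s over all missed frames."""
    score = 0.0
    for previous, current in zip(frame_times_s, frame_times_s[1:]):
        delay = (current - previous) - FRAME_BUDGET_S
        if delay > 0:  # only missed frames contribute
            score += delay ** 2
    return score


# Hypothetical run: the second gap is 50 ms, i.e. 25 ms past the budget.
print(pan_score([0.0, 0.02, 0.07, 0.09]))
```

Squaring the delay means that one long stall is penalised more heavily than several short ones adding up to the same total delay.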

See also:

Other data

These are not part of the Talos code, but like Talos they are benchmarks that record data using the graphserver and are analyzed by the same scripts for regressions.

Number of Constructors (num_ctors)

This test runs at build time and measures the number of static initializers in the compiled code. Reducing this number is helpful for startup optimizations.

Codesighs

Codesighs measures the size of the compiled libraries and executables. Runs on Linux and Mac at build time, triggered by the --enable-codesighs configure flag.

For details see Codesighs.

Trace Malloc

This test is run as part of the "make leaktest" step during debug build jobs. It uses the trace-malloc tool from tools/trace-malloc to wrap calls to malloc and log information about every memory allocation. See also leaktest.py.

  • trace_malloc_leaks
  • trace_malloc_maxheap
  • trace_malloc_alloc