== Tripwire Web Regression Test Suite ==

Broadly, this is planned to be an automatable tool that takes "snapshots" of a page as rendered in a known-good build and compares those snapshots to how the page renders in a newer build, so that regressions can be detected.

This will require snapshots to be both a bundle of all the resources needed to load the page offline as deterministically as possible (the "page snapshot") and an analysis of the results of loading the page in that initial build (the "result snapshot"). Initially only the CSS layout will be compared between snapshots to determine a pass/fail result, but this will be extensible.
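As an illustration of how such a bundle might be organized, here is a rough TypeScript sketch of a hypothetical .webtest structure, split into the two parts described above. Every field name here is an assumption for illustration, not part of any finalized format.

 // Hypothetical sketch of a .webtest bundle; none of these names are final.
 interface RecordedResponse {
   requestUrl: string;
   status: number;
   headers: Record<string, string>;
   bodyBase64: string;               // raw (possibly compressed) payload
 }
 
 interface PageSnapshot {
   url: string;                                  // page the test navigates to
   innerWindowSize: { width: number; height: number };
   sessionState: Record<string, string>;         // cookies etc. at load time
   responses: RecordedResponse[];                // replayed offline when re-running
   systemFonts: string[];                        // fonts used during the recording
   seeds: { rng: number; dateTime: string };     // values used to pin non-determinism
 }
 
 interface ResultSnapshot {
   frameTree: unknown;               // layout boxes as rendered by the browser
   screenshotPng?: string;           // base64 screenshot of the final render
   consoleLog: string[];             // web console output, for debugging
   unexpectedRequests: string[];     // resources not present in the page snapshot
 }
 
 interface WebTest {
   page: PageSnapshot;
   expected: ResultSnapshot;         // result recorded in the known-good build
 }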

It is hoped that this tool can be created as a WebExtension so that it may be used by multiple browsers (i.e., Firefox and Servo), assuming that test builds can be made with the APIs required to gather the metrics necessary for a useful comparison of snapshots.

== Current Status ==

[https://bugzilla.mozilla.org/show_bug.cgi?id=1404474 Active meta-bug]

=== What works now ===

* creating a .webtest file in a known-good version, to compare against in the future;
  - the base addon gathers information on tabs as they load.
  - it heuristically detects when the page has finished loading (see the sketch after this list).
  - it lets the user save a .webtest file ("freezing" the page to record it).
  - it saves the recorded network requests and data, plus the layout information as rendered by the browser.
* allows comparing results "live" in-browser (presumably in a newer build than the one that recorded the test);
  - the base addon allows the user to load a previously-saved .webtest file.
  - it re-runs the test with the saved network/state data in the .webtest file.
  - it compares the results with the old result.
  - it presents the "differences" it found in a simple UI with screenshots.
* also allows comparing results in automation;
  - the base addon also works with automation via a marionette driver.
  - it similarly loads and re-runs .webtests, comparing results.
  - it presents a pass/fail to the marionette driver.
  - pass/fail is based on whether *any* differences were found.
  - environment variables are used to load a single .webtest, or a "manifest" with multiple tests:
      WEBTEST="file:///path/to/test.webtest" ./mach marionette-test toolkit/components/tripwire/tests/marionette/test_tripwire.py
      WEBTEST_MANIFEST="http://server.org/webtests.json" ./mach marionette-test toolkit/components/tripwire/tests/marionette/test_tripwire.py
        where webtests.json contains an array: ["domain1.webtest", "domain2.webtest", "https://another-server.org/another.webtest"]
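As referenced in the list above, the "finished loading" heuristic could look roughly like the following sketch, assuming a WebExtension background script with the webRequest, tabs, and host permissions (and WebExtension typings available). The quiet period and helper names are illustrative only, not the addon's actual code.

 // Illustrative heuristic: consider a tab "settled" once the load event has
 // fired and no network requests have completed for a short quiet period.
 // QUIET_MS and whenSettled() are assumptions, not the addon's real code.
 const QUIET_MS = 3000;
 const lastActivity = new Map<number, number>();  // tabId -> timestamp
 
 browser.tabs.onUpdated.addListener((tabId, changeInfo) => {
   if (changeInfo.status === "complete") {
     lastActivity.set(tabId, Date.now());
   }
 });
 
 browser.webRequest.onCompleted.addListener(
   details => {
     if (details.tabId >= 0) {
       lastActivity.set(details.tabId, Date.now());
     }
   },
   { urls: ["<all_urls>"] }
 );
 
 // Poll until the tab has been quiet long enough, then signal readiness.
 function whenSettled(tabId: number): Promise<void> {
   return new Promise(resolve => {
     const timer = setInterval(() => {
       const last = lastActivity.get(tabId);
       if (last !== undefined && Date.now() - last > QUIET_MS) {
         clearInterval(timer);
         resolve();
       }
     }, 500);
   });
 }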

=== What still needs to be done ===

* improving the heuristics for detecting when the page has loaded (they do not yet work on some sites).
* making results deterministic and stable;
  - fuzzing: nightly builds vary more than expected, and layout differences aren't pixel-precise even over the short term. Fuzzing the comparison may be enough to mitigate this, as it seems related to sub-pixel layout differences accumulating (see the sketch after this list).
  - determinism: still locking down the variables that cause a .webtest file to render differently not because the browser is at fault, but because the site uses RNGs, timestamps, or other inputs that can trigger different ads, A/B tests, or animation frames, which end up appearing as major differences.
  - stability: there may still be network requests which aren't being logged due to CORS/etc. Work is underway to mitigate this, and seems likely to succeed.
* deciding where to host the .webtest files for automation, as they cannot be in-tree.
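To illustrate the fuzzy-comparison idea mentioned above (not the tool's actual algorithm), two layout boxes could be treated as matching when their rectangles agree within a small tolerance, which would absorb accumulated sub-pixel differences. The types and threshold below are hypothetical.

 // Hypothetical layout-box shape; real result snapshots may differ.
 interface LayoutBox {
   x: number;
   y: number;
   width: number;
   height: number;
   children: LayoutBox[];
 }
 
 // Tolerance (in CSS pixels) for accumulated sub-pixel differences.
 const TOLERANCE_PX = 2;
 
 function boxesMatch(a: LayoutBox, b: LayoutBox): boolean {
   const close = (m: number, n: number) => Math.abs(m - n) <= TOLERANCE_PX;
   return close(a.x, b.x) && close(a.y, b.y) &&
          close(a.width, b.width) && close(a.height, b.height);
 }
 
 // Fuzzy pass/fail: trees match if every box matches its counterpart and the
 // trees have the same shape. A real comparison would also need to pair up
 // "analogous" boxes when the trees differ structurally.
 function treesMatch(a: LayoutBox, b: LayoutBox): boolean {
   if (!boxesMatch(a, b) || a.children.length !== b.children.length) {
     return false;
   }
   return a.children.every((child, i) => treesMatch(child, b.children[i]));
 }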

=== Future work ===

* adding artifacts to the marionette test results (screenshots and other differences) to aid in debugging.
* user-interface polish;
  - the "diff" tool is rudimentary and needs to better explain what the differences are.
  - irrelevant differences may need to be filtered out (boxes which lay out differently but do not visibly affect the final result).
  - WebExtension APIs are needed for opening a file-input dialog via hotkey, or to otherwise reduce the number of clicks needed to load a .webtest file.
  - adding a mobile-friendly interface for running .webtests on Android.
* increasing the amount and type of data that's being considered by the tests;
  - the actual resulting markup, not just the layout box-model information.
  - RAM usage and other performance metrics.
  - JS console logs, uncaught promises, CORS failures, etc.
  - which CSS rules were being applied/not applied.
  - sounds, canvases, videos, and other animation.
  - recording user interactions and/or taking multiple snapshots per test file.
* pulling out the network request recording sub-module so it can be used elsewhere in automation (a sketch of one possible recording approach follows).
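For the recording sub-module mentioned above, one possible approach in Firefox is to capture raw response bodies with the StreamFilter API. The sketch below assumes a Manifest V2 background script with the webRequest, webRequestBlocking, and host permissions, and is illustrative rather than the module's real code.

 // Illustrative standalone recorder using Firefox's StreamFilter API
 // (browser.webRequest.filterResponseData); not the addon's real module.
 const recorded = new Map<string, Uint8Array[]>();  // requestId -> body chunks
 
 browser.webRequest.onBeforeRequest.addListener(
   details => {
     const filter = browser.webRequest.filterResponseData(details.requestId);
     const chunks: Uint8Array[] = [];
     recorded.set(details.requestId, chunks);
 
     filter.ondata = event => {
       chunks.push(new Uint8Array(event.data));  // keep a copy of the raw bytes
       filter.write(event.data);                 // pass the data through unchanged
     };
     filter.onstop = () => {
       filter.disconnect();                      // stop intercepting the stream
     };
   },
   { urls: ["<all_urls>"] },
   ["blocking"]
 );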