Autophone
Autophone is a test framework for Firefox for Android (Fennec) which runs tests on actual Android devices. Autophone is currently used to measure page load performance and to run various Unit tests in Fennec.
Autophone is unlike most test frameworks at Mozilla.
- Its source is maintained outside of the Mozilla source repositories.
- It runs on a small number of devices.
- It is hosted and managed separately from the other test frameworks in use at Mozilla.
- It only tests builds from mozilla-central, mozilla-inbound, fx-team, mozilla-aurora, mozilla-beta and mozilla-release.
Due to these limitations, Autophone test results on Treeherder are currently Tier-2.
In addition to reporting pass/fail results to Treeherder, Autophone reports performance measurements to Perfherder and phonedash.
Autophone Status
- Autophone is up.
2017-06-29
Autophone servers have been converted to Fedora 25. File any regressions as blocking 1377108.
Autophone for Sheriffs
Maintainers
Autophone is maintained by Bob Clary (Bugzilla :bc) with help from Joel Maher (Bugzilla :jmaher) and Geoff Brown (Bugzilla :gbrown). Dan Minor (Bugzilla :dminor) is working to get WebRTC tests running in production. You can usually find at least one of us in #ateam on irc.mozilla.org.
Bugs
File any bugs or infrastructure issues with Autophone in bugzilla under product Testing component Autophone.
Disabling an individual failing test
The procedure for disabling tests depends on whether you wish to prevent Autophone from running a test suite entirely or to disable an individual Unit test contained in the mozilla source tree.
To disable an entire Autophone test such as an S1S2 Test, or an entire Unit test suite such as Autophone Mochitest WebRTC, please file a bug naming the test to be disabled and the reason it should be disabled, along with a link to a Treeherder job illustrating the problem if possible. The Autophone maintainers will update the appropriate test manifest files and restart the Autophone instances.
To disable individual tests within Unit test suites, follow the same procedures for disabling the test as you would for disabling the test normally. This typically involves editing the test manifests to skip the test for Android. You can find the Unit test manifests used by Autophone in the Tests table later in this document.
Autophone for Developers
Introduction
Submitting try builds to Autophone
The TryServer offers developers who do not possess a rooted Android device the means to run Autophone tests, as well as the ability to run tests on the exact same devices used in production.
Autophone will only execute tests against try builds if the try build's commit message explicitly specifies Autophone tests. Thus Autophone will not execute any tests when All is selected under Unit Test Suites since the try commit message will contain only '-u all'. This behavior is intentional and is intended to prevent developers from inadvertently scheduling Autophone try jobs.
Autophone has the following devices dedicated to running try builds:
- nexus-4-7
- nexus-5-6
- nexus-6-2
- nexus-6p-11
- nexus-9-2
- pixel-10
In order to help reduce the turnaround time for developers, Autophone prioritizes try builds and will test them before any normal tinderbox build. Note: Be careful not to DoS Autophone by submitting unnecessary requests, since this can prevent the two shared devices from testing new tinderbox builds in a timely fashion.
Trychooser try commit message
Trychooser provides an easy method for selecting Autophone tests. Trychooser lists the tests which run in production on mozilla-inbound, fx-team, mozilla-central, mozilla-aurora, mozilla-beta or mozilla-release. It does not list all of the available tests, however, since some of them (reftest, jsreftest and mochitest) can tie up devices for many hours. If the test you wish to run on try is not available from Trychooser, you can still compose a manual try commit message using any of the supported tests.
- Under Build Types, select either Opt or Debug. Autophone only supports opt builds when running performance tests such as S1S2 Test or Talos. Autophone does support both Opt and Debug when running Unit tests.
- Under Platforms, select either one or both of Android api 9-10 constrained or Android api 16+. Note that currently (June 2016) Android api 9-10 is only supported on mozilla-release. After the release of Firefox 48 in August, Android 2.3 will no longer be supported.
- Under Android-Only Unittest Suites, select only the tests that you need to run.
Trychooser will produce a try commit message of the form:
try: -b o -p android-api-16 -u autophone-mochitest-dom-media -t none
Manual try commit message
Autophone specifies its tests using the unittests argument in the try message. Autophone supports both -u and --unittests.
try: -b o -p android-api-16 -u autophone-mochitest-dom-media -t none
or
try: -b o -p android-api-16 --unittests autophone-mochitest-dom-media -t none
The list of test names which can be used in the try commit message can be found in the Tests table later in this document.
Note that if a test specifies more than one chunk, you can either specify the test name to get all chunks or append a dash followed by a chunk number to only specify that chunk.
For example, to run all 16 reftest chunks:
try: -b o -p android-api-16 --unittests autophone-reftest -t none
To run only reftest chunk 3 use:
try: -b o -p android-api-16 --unittests autophone-reftest-3 -t none
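The chunk naming rule above can be sketched in Python. This is only an illustration: the helper name try_message and its parameters are hypothetical and are not part of Autophone or Trychooser. The defaults simply mirror the examples shown above.

```python
def try_message(test, chunk=None, build="o", platform="android-api-16"):
    """Compose a try commit message for one Autophone test.

    Hypothetical helper for illustration only. When chunk is None the
    plain test name is used, which requests every chunk of that test.
    """
    name = test if chunk is None else f"{test}-{chunk}"
    return f"try: -b {build} -p {platform} --unittests {name} -t none"


# All reftest chunks:
print(try_message("autophone-reftest"))
# try: -b o -p android-api-16 --unittests autophone-reftest -t none

# Only reftest chunk 3:
print(try_message("autophone-reftest", chunk=3))
# try: -b o -p android-api-16 --unittests autophone-reftest-3 -t none
```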
Following your try build
- Once you have submitted your build to try, you will be able to follow the execution of the tests on Treeherder using
https://treeherder.mozilla.org/#/jobs?repo=try&filter-searchStr=autophone&exclusion_profile=all&filter-tier=1&filter-tier=2&filter-tier=3&filter-job_group_name=autophone&author=<youremail>
where <youremail> is the email account registered with hg.mozilla.org. Note that until Autophone reaches Tier 2 status, results are hidden by default on Treeherder. You must click the "Show/hide hidden jobs" icon to set 'exclusion_profile=all' and include Tier 3 in order to show all jobs.
- If your tests included the Autophone performance related tests, you can view the results on Perfherder or Phonedash. Phonedash may be a better choice when reviewing your Try server results since it allows you to choose to display only your Try server results.
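If you follow try builds often, the Treeherder link described above can be generated rather than edited by hand. The following is a minimal sketch using only the Python standard library. The function name treeherder_try_url is hypothetical, and the query parameters are copied from the URL shown above, not from any Treeherder API documentation.

```python
from urllib.parse import urlencode


def treeherder_try_url(author_email):
    """Build the Autophone try view URL for one author (illustrative only)."""
    params = [
        ("repo", "try"),
        ("filter-searchStr", "autophone"),
        ("exclusion_profile", "all"),
        # The tier filter is repeated once per tier, as in the URL above.
        ("filter-tier", "1"),
        ("filter-tier", "2"),
        ("filter-tier", "3"),
        ("filter-job_group_name", "autophone"),
        ("author", author_email),
    ]
    # Treeherder keeps its query string inside the URL fragment (#/jobs?...).
    return "https://treeherder.mozilla.org/#/jobs?" + urlencode(params)


print(treeherder_try_url("developer@example.com"))
```

Note that urlencode percent-encodes the "@" in the email address as %40.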
Running Autophone tests locally
Once you have built Fennec, you can run Autophone locally, if you have a rooted Android device available, using the command
mach autophone
This will download the necessary python packages and install them locally into a virtual environment on your computer. You will be able to select the tests you wish to run from a series of text prompts.
Tests
Autophone currently supports the following tests:
Smoke test
smoketest.py tests if the build can be installed, a profile created and initialized and whether the Throbber messages can be detected in the logcat output. The Smoke test results appear on Treeherder as A(s).
Note: The Smoke test does not automatically run in production though it is available via the Try Server.
S1S2 Test
s1s2test.py measures the Throbber times for loading web pages. Three web pages are used in Autophone: a blank page, a saved Twitter page and a saved NY Times page.
Note: git.mozilla.org is going away soon. We had permission to host the copyrighted Twitter and NY Times pages on git.mozilla.org, but will be manually distributing the saved pages in the future. If you need access to the actual files, contact one of the Autophone maintainers.
Autophone runs two versions of each test:
- "Local" tests which load the pages from the device's internal storage.
- "Remote" tests which load the pages from a web server running on the server hosting the device.
The S1S2 Tests appear on Treeherder as A(t).
As Mark Finkle described in bug 1120511#c6:
"Throbber Start and Throbber Stop should map to the points where Gecko nsIWebProgressListener fires START|NETWORK and STOP|NETWORK notifications, respectively. In the UI we use those to control the visibility of the "page progress" indicator. It used to be a throbber(spinner) but is now a simple progress line.
"The time between Throbber Start and Throbber Stop is a combination of Gecko networking and page parsing & loading, and rendering. We also have some Java UI affects too."
We need to convert the Throbber start and stop times from values relative to the system time to values relative to the time Firefox was started. We use the "Fennec application start" message added in bug 1214810. If it is not available, we fall back on the system time of the first logcat message after starting Fennec which contains the string "Gecko".
The logcat fennec start message used by Autophone is produced in GeckoApplication.java.
The logcat throbber messages used by Autophone are produced in ToolbarDisplayLayout.java and look like:
06-07 11:20:16.035 I/GeckoApplication( 7247): zerdatime 117697 - Fennec application start
06-07 11:20:18.323 I/GeckoToolbarDisplayLayout( 7247): zerdatime 119985 - Throbber start
06-07 11:20:18.548 I/GeckoToolbarDisplayLayout( 7247): zerdatime 120210 - Throbber stop
where "zerdatime" is the value of SystemClock.uptimeMillis(). For those of you who are curious about the meaning of zerda, it refers to the scientific name for Fennec foxes, Vulpes zerda.
For historical reasons due to the initial lack of the "Fennec application start" message, Autophone uses logcat with the "time" format which provides the device's system time with millisecond resolution at the beginning of each logcat message. Autophone uses that value instead of the reported zerdatime. Now that "Fennec application start" is available in the current train of builds, we may be able to revisit the use of the logcat time stamps and begin using the zerda time directly.
The system time of the Throbber start and stop messages is determined similarly. The reported values of the Throbber start and stop times are the differences between the Throbber start and stop system times and the fennec start time.
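The calculation described above can be illustrated with a short Python sketch applied to the sample logcat lines. This is not Autophone's actual parsing code. The regular expression, the function name throbber_times and the year parameter are assumptions made for the example (logcat's "time" format carries no year, so one is supplied only to make the timestamps parseable). The fallback to the first "Gecko" message is not implemented here.

```python
import re
from datetime import datetime, timedelta

# Matches the logcat "time" format prefix and the three messages of interest.
LOGCAT_RE = re.compile(
    r"^(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*zerdatime \d+ - "
    r"(Fennec application start|Throbber start|Throbber stop)"
)


def throbber_times(lines, year=2016):
    """Return (throbber_start_ms, throbber_stop_ms) relative to Fennec start."""
    stamps = {}
    for line in lines:
        match = LOGCAT_RE.match(line)
        if match:
            when = datetime.strptime(
                f"{year}-{match.group(1)}", "%Y-%m-%d %H:%M:%S.%f"
            )
            # Keep only the first occurrence of each message.
            stamps.setdefault(match.group(2), when)
    fennec_start = stamps["Fennec application start"]

    def elapsed_ms(event):
        return (stamps[event] - fennec_start) // timedelta(milliseconds=1)

    return elapsed_ms("Throbber start"), elapsed_ms("Throbber stop")


sample = [
    "06-07 11:20:16.035 I/GeckoApplication( 7247): zerdatime 117697 - Fennec application start",
    "06-07 11:20:18.323 I/GeckoToolbarDisplayLayout( 7247): zerdatime 119985 - Throbber start",
    "06-07 11:20:18.548 I/GeckoToolbarDisplayLayout( 7247): zerdatime 120210 - Throbber stop",
]
print(throbber_times(sample))  # (2288, 2513)
```

For these sample lines the differences of the zerdatime values (119985 - 117697 and 120210 - 117697) give the same 2288 ms and 2513 ms.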
The blank, twitter and nytimes pages contain JavaScript which invokes Jesse Ruderman's quitter.xpi extension to cleanly shut down the browser after the page completes loading. Shutting down the browser cleanly is important because killing the browser has side effects which can negatively impact performance measurements.
Each build to be tested is installed, and then a test run consists of performing the following operations 8 times:
- Create a new profile containing the quitter extension.
- Initialize the profile by starting the browser loading initialize_profile.html. initialize_profile.html is an empty page which calls quitter to shut down the browser.
- Measure the "First Run" (uncached) Throbber start and stop values by starting the browser loading the test page.
- Measure the "Second Run" (cached) Throbber start and stop values by starting the browser with the same profile loading the test page again.
The values for each iteration are posted to Perfherder and phonedash.mozilla.org where the measurements can be displayed.
Talos Tests
TODO: jmaher?
Tp4m
Tsvg
Unit Tests
runtestsremote.py can run Reftest and Mochitest based tests though due to the time required to run each test and the limited number of devices, only the following tests are currently run:
- Mdb - autophone-mochitest-dom-browser-element
- Cdm1 - autophone-crashtest-dom-media
- Cdm2 - autophone-crashtest-dom-media-tests
- Cdm3 - autophone-crashtest-dom-media-mediasource
- Cdm4 - autophone-crashtest-dom-media-webspeech-synth
- Mdm1 - autophone-mochitest-dom-media
- Mdm2 - autophone-mochitest-dom-media-tests
- Mdm3 - autophone-mochitest-dom-media-mediasource
- Mdm4 - autophone-mochitest-dom-media-tests-identity
- Mdm5 - autophone-mochitest-dom-media-webaudio
- Mdm6 - autophone-mochitest-dom-media-webaudio-blink
- Mdm7 - autophone-mochitest-dom-media-webspeech-recognition
- Mdm8 - autophone-mochitest-dom-media-webspeech-synth
- Mdm9 - autophone-mochitest-dom-media-webspeech-synth-startup
- Msk - autophone-mochitest-skia
- Mtw - autophone-mochitest-toolkit-widgets
- Rov - autophone-reftest-ogg-video
- Rwv - autophone-reftest-webm-video
- rca - autophone-robocoptest-autophone
- t - autophone-s1s2
- tpn/svg - autophone-talos
Devices
Autophone currently tests Nexus S (Android 2.3), Nexus 4 (Android 4.2.2), Nexus 5 (Android 4.4.2), Nexus 6 (Android 5.1.1), Nexus 9 (Android 5.0.2), Nexus 6P (Android 6.0.1).
The Nexus S devices are especially good at showing performance changes due to their slow speed and their single core processor. They are being phased out of testing as support for Android 2.3 is dropped and will be completely removed when Firefox 48 is released in August 2016. The other devices are faster, have multiple core processors and behave differently for multi-threaded code paths.
Reviewing Autophone test results
Monitoring Autophone tests on Treeherder
Load https://treeherder.mozilla.org/, then select one of the mozilla-inbound, fx-team, mozilla-central, mozilla-aurora, mozilla-beta or mozilla-release repositories.
Click "Show/hide hidden jobs" and select Tier 3 in order to view Autophone results since they are currently hidden by default on Treeherder until Autophone achieves Tier 2 status.
Autophone jobs appear with the group name A. To see only Autophone jobs, you can set the quick filter for "Platforms & jobs" to Autophone or use the "Filters" drop down to add a new filter based on "group name" Autophone.
For example, the following will show Autophone jobs on mozilla-central:
Note that if you change the repository, you will need to click "Show/hide hidden jobs" and filter for Autophone again.
Links to the logcat output, Autophone log and any available tombstone or ANR files are available in the Job Details panel which is opened by clicking on an Autophone test symbol. If the test is an S1S2 test, the Job Details panel will contain a link to phonedash.mozilla.org which will display a graph of the performance measurements.
Note that you can retrigger and cancel Autophone jobs using the Treeherder UI.
Monitoring Autophone Performance tests on Perfherder
TODO
Monitoring Autophone Performance tests on Phonedash
When you first load phonedash, it defaults to loading a summary graph of all repositories, tests and devices for the last day. The graph is scaled to the window size at the time the graph is displayed. If you wish to change the size of the graph, resize the window and reload the page.
Controls
Date selection
Changing the date range controls at the top left of the page will automatically download the data for the date range and redraw the graph with the new date selection.
TIP: If you wish to pick a date range that ends in the past, you can prevent phonedash from unnecessarily loading data by first changing the end date, then changing the start date.
Many of the non-date controls are created from the test runs contained in the selected date range. If you make changes to the non-date controls and then change the date selection, new controls will be created for any new tests, repositories or devices which were detected. In this case, you may need to manually select the new devices, then click the Apply button to update the graph with the new data. You can quickly select all of the non-date controls and redraw the graph by clicking the Reset button.
Apply and Reset buttons
Below the Apply and Reset buttons is a set of inputs which control which data is displayed in the graph. Changing these controls does not automatically redraw the graph.
In order to make your non-date control changes effective and redraw the graph, click on the Apply button. This allows you to change what and how data is graphed without having to download the data again.
To reset all of the non-date controls to their default values and redraw the graph, click the Reset button.
Non-date Controls
- Binning
Binning controls how the various measurements are combined together to create the data series in the graph. Binning involves combining measurements by taking the geometric mean of all measurements which have the same binning value. This can sometimes help when reviewing all of the repositories and/or tests looking for regressions or improvements. Once an interesting repository, test, metric or phone type is identified, the irrelevant items can be eliminated and the binning increased to highlight changes in detail.
Note that changes in small values have a small effect on binned results. Relying solely on too coarse a binning may result in missing changes to small values such as the Throbber start times.
The possible binnings are:
- repo
- repo phonetype
- repo phonetype phoneid
- repo phonetype phoneid test_name
- repo phonetype phoneid test_name cached_label
- repo phonetype phoneid test_name cached_label metric
The finest level of binning is repo phonetype phoneid test_name cached_label metric, where the measurements are not binned at all.
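As a rough illustration of binning, the sketch below groups measurements by the first few binning fields and takes the geometric mean of each group. It is not phonedash's implementation. The row layout (a dict with a "value" key), the depth parameter and the function name bin_measurements are assumptions made for this example, and all values are assumed to be positive.

```python
import math
from collections import defaultdict

# The binning fields, from coarsest to finest, as listed above.
FIELDS = ("repo", "phonetype", "phoneid", "test_name", "cached_label", "metric")


def bin_measurements(rows, depth):
    """Geometric mean of "value" for rows sharing the first `depth` fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in FIELDS[:depth])
        groups[key].append(row["value"])
    return {
        key: math.exp(sum(math.log(v) for v in values) / len(values))
        for key, values in groups.items()
    }


rows = [
    {"repo": "mozilla-central", "phonetype": "nexus-5", "phoneid": "nexus-5-1",
     "test_name": "remote-blank", "cached_label": "first",
     "metric": "throbberstart", "value": 100.0},
    {"repo": "mozilla-central", "phonetype": "nexus-5", "phoneid": "nexus-5-1",
     "test_name": "remote-blank", "cached_label": "first",
     "metric": "throbberstop", "value": 400.0},
]

# Binning by repo alone merges both rows into one series.
print(bin_measurements(rows, depth=1))
# Binning on all six fields leaves each measurement in its own bin.
print(bin_measurements(rows, depth=6))
```

With depth=1 the small Throbber start value and the larger Throbber stop value collapse into a single number, which is why a coarse binning can hide changes in small values. The test and metric names in the sample rows are made up for the example.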
- Trim min/max values
When the Trim min/max values checkbox is checked, the minimum and maximum values of the 8 iterations for a data point are ignored when displaying the graph. This can be helpful if a measurement contains outliers which obscure the true behavior.
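A minimal sketch of the trimming idea follows. The function name trim_min_max is hypothetical, and the choice to leave lists of two or fewer values untouched is an assumption of this example, not documented phonedash behavior.

```python
def trim_min_max(values):
    """Drop one minimum and one maximum value from a set of iterations."""
    if len(values) <= 2:
        # Assumption: too few values to trim, so return them unchanged.
        return list(values)
    return sorted(values)[1:-1]


iterations = [2288, 2301, 2275, 5120, 2290, 2310, 2295, 1900]
print(trim_min_max(iterations))
# [2275, 2288, 2290, 2295, 2301, 2310]
```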
- Exclude rejected results
In order to deal with the sometimes flaky behavior of the devices, Autophone will "reject" a set of measurements if the estimated standard error percentage exceeds a given threshold which is currently 50% of the value. If a set of measurements is rejected, Autophone will re-run the test in the hope that the variability was temporary. Even though the measurements are "rejected", they are still stored and are available when "Include rejected results" is selected.
By default, phonedash ignores measurements which were originally rejected by Autophone due to a high standard error. You can include these values by changing the value from Exclude rejected results to Include rejected results.
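The rejection rule can be sketched as follows. This is an interpretation, not Autophone's code: it assumes the "standard error percentage" is the standard error of the mean expressed as a percentage of the mean, and the function names are hypothetical. Only the 50% threshold comes from the text above.

```python
import math
import statistics


def standard_error_percent(values):
    """Standard error of the mean as a percentage of the mean (assumed definition)."""
    mean = statistics.mean(values)
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return 100.0 * stderr / mean


def is_rejected(values, threshold_percent=50.0):
    """True when the set of measurements is too variable to be trusted."""
    return standard_error_percent(values) > threshold_percent


steady = [2288, 2301, 2275, 2290, 2310, 2295, 2280, 2299]
flaky = [10, 10, 10, 10, 10, 10, 10, 1000]
print(is_rejected(steady), is_rejected(flaky))
```

Here the steady run is kept, while the run with a single extreme outlier exceeds the threshold and would be re-run.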
- Error bars
Error bars are only displayed when the maximum binning, repo phonetype phoneid test_name cached_label metric, is selected.
By default, phonedash does not display error bars since they can obscure details in the graph. If you wish to display Error bars, change No Error bars to Error bars.
- Error type
By default, phonedash displays the standard error, which is calculated by dividing the standard deviation by the square root of the number of observations.
To see the standard deviation instead of the standard error, change the selection.
- Measurement type
The data reported to the phonedash database consists of the raw data for all iterations in a test. Measurement type refers to how the phonedash web application treats these individual iterations when reporting the value for a test.
- All
- Mean
- Geometric Mean
- Median
- Minimum
All displays each iteration separately. The other choices display the respective calculation on the iteration values. Measurement type controls what kinds of values are used in binning.
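The measurement types can be illustrated with the Python standard library. The mapping below is a sketch of the idea only. The names REDUCERS and apply_measurement_type are hypothetical, and phonedash itself is a web application whose implementation may differ.

```python
import statistics

# One reducer per measurement type other than "All".
REDUCERS = {
    "Mean": statistics.mean,
    "Geometric Mean": statistics.geometric_mean,  # Python 3.8 or later
    "Median": statistics.median,
    "Minimum": min,
}


def apply_measurement_type(values, kind):
    """Return the values to plot for one test under the given measurement type."""
    if kind == "All":
        # Every iteration is reported separately.
        return list(values)
    return [REDUCERS[kind](values)]


iterations = [4, 1, 3, 2]
for kind in ("All", "Mean", "Median", "Minimum"):
    print(kind, apply_measurement_type(iterations, kind))
```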
- Tests
Each test detected in the requested date range is given a checkbox control. By checking or unchecking the tests, you can control which tests are displayed.
- Metrics
Metrics refers to the measurement of the Throbber start, Throbber stop values or their difference, Throbber time. By checking or unchecking the metrics, you can control which are displayed.
- Cached
Cached refers to whether the measurement is for the first or second visit to the test page. By checking or unchecking the cached values, you can control which are displayed.
- Repositories
Each repository detected in the requested date range is given a checkbox control. By default, all repositories except try are checked. By checking or unchecking the repositories, you can control which repositories are displayed.
- Phones
Phones are named according to their model and a sequential numeric identifier. For example, nexus-6p-1 is the first Nexus 6P device.
Each phone detected in the requested date range is given a checkbox control. By default, all devices are checked. By checking or unchecking the devices, you can control which devices are displayed.
Individual devices are grouped under their type which is also given a checkbox. Changing this device type checkbox will force the devices of that type to be either checked or unchecked to match the state of the device type checkbox.
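The naming convention makes it straightforward to recover the device type from a phone name. The sketch below assumes every name ends in a dash followed by the numeric identifier. The function names are hypothetical and the grouping only mimics what the device type checkboxes do.

```python
from collections import defaultdict


def split_phone_id(phone_id):
    """Split a name such as nexus-6p-1 into its type and sequence number."""
    phone_type, _, number = phone_id.rpartition("-")
    return phone_type, int(number)


def group_by_type(phone_ids):
    """Group phone names under their device type."""
    groups = defaultdict(list)
    for phone_id in phone_ids:
        groups[split_phone_id(phone_id)[0]].append(phone_id)
    return dict(groups)


print(split_phone_id("nexus-6p-1"))  # ('nexus-6p', 1)
print(group_by_type(["nexus-4-7", "nexus-6-2", "nexus-6p-11", "pixel-10"]))
```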