B2G/QA/Automation/UI/Strategy/Integration vs End to end
Objective
Separate concerns for Continuous Integration (CI) vs. User Acceptance Testing (UAT) so that test processes and automation harnesses can optimize for their specific purposes and minimize risk and dependency.
Challenges Addressed
- Development and QA have different approaches and needs from UI testing
- Team has been blocked too much by cross-team dependencies
The Problem
To date, UI automation has been treated as a single type of testing, with the differences expressed largely in terms of whether it ran on TBPL, whether it was written in JS or Python, who sheriffed it, and so forth. As a result, much discussion, debate, and analysis paralysis has centered on whether the automation should be run, written, or treated in a particular way, and it has been all too common to reach an impasse over differing opinions.
But not all automation is the same just because it targets the UI, just as not all testing is the same just because it targets the system under test (SUT).
In particular, automation run under continuous integration is generally subject to restrictions meant to ensure that test results are returned quickly and unambiguously. Those restrictions reduce its usefulness for acceptance testing, which must accept a higher level of fragility in order to be more comprehensive.
When viewed through the lens of the other purpose, each type of automation is inferior:
Continuous integration automation is incomplete and sloppy, trading coverage for expedience, and is generally written from a developer's perspective to verify what they did write, which might not actually meet the requirements of what they should have written. As such, it frequently falls short of the needs of acceptance.
And acceptance automation is too slow, device-bound, and prone to spurious failures due to non-determinism inherent in the SUT. Since the user scenarios it tests often bridge multiple parts of the system, it is fragile and poorly isolated. Its scope also means it often can't be written adequately until entire subsystems are in place. So it's often wholly unsuitable for the incremental growth, quick land-or-not decisions, and debugging assistance that a good continuous integration suite provides.
However, these are weaknesses in perception. When used for their own specific purposes, each type of automation provides great value. Further, the compromises each makes are compensated for by the other.
The Solution
The solution is to separate the concerns. And since each type has a different primary stakeholder, ownership should be separated as well.
Two different suites of UI automation
- Continuous Integration
  - Must be quick-running, with no non-deterministic factors.
  - Results must be absolutely unambiguous.
  - Coverage is restricted as a consequence of these rules.
  - Will be run prior to each commit and can prevent a bad code change from landing.
  - Scope, detail, and maintenance are owned by the functional teams as part of their code.
  - Small, isolated tests can be checked in with code changes immediately (see the first sketch below).
  - Is sheriffed reliably and quickly through long-standing, established process.
  - Currently runs on B2G Desktop; will soon run on devices as well.
- User Acceptance Testing
  - Can have longer-running tests, with a reasonable amount of fragility due to non-determinism.
  - Results can occasionally be ambiguous as a trade-off for higher coverage.
  - Has far fewer restrictions than CI, so long as the time spent triaging failures isn't extreme.
  - Runs after each Tinderbox build and finds post-integration bugs quickly.
  - Scope, detail, and maintenance are owned by QA.
  - Tests can be created once an area is delivered and its UI is stable enough to make the cost acceptable.
  - Must be sheriffed by QA, via a combination of alerts and result reviews.
  - Primarily runs on-device to test the full stack (see the second sketch below).
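For illustration, here is a minimal sketch of a CI-style test, in Python with the standard unittest module. The ClockFace fragment and FakeTime clock are hypothetical stand-ins, not part of any real Gaia suite; the point is the shape: small, isolated, and with the mock removing every non-deterministic factor, so a failure can only mean a defective code change.

    import unittest

    class FakeTime:
        """Deterministic stand-in for the system clock; no real time source."""
        def now(self):
            return "12:00"

    class ClockFace:
        """Tiny UI fragment under test; renders whatever its time source says."""
        def __init__(self, time_source):
            self.time_source = time_source

        def render(self):
            return "Time: %s" % self.time_source.now()

    class TestClockFace(unittest.TestCase):
        def test_renders_current_time(self):
            # The fake clock removes all non-determinism, so a failure can
            # only mean the rendering code itself changed.
            face = ClockFace(FakeTime())
            self.assertEqual(face.render(), "Time: 12:00")

    if __name__ == "__main__":
        unittest.main()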
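And a matching sketch of a UAT-style scenario. The Messages flow and every object in it are illustrative; the FakeDevice exists only so the sketch is self-contained, where a real run would use an on-device session instead. Here the shape is the opposite: a complete user scenario, driven and verified the way a user would see it, with longer timeouts tolerated.

    import unittest

    class FakeMessagesApp:
        """Stand-in for the Messages app so this sketch is self-contained."""
        def __init__(self):
            self.sent = False

        def create_new_message(self, recipient, body):
            self.recipient, self.body = recipient, body
            return self

        def tap_send(self):
            self.sent = True

        def wait_for_sent_banner(self, timeout):
            # On a real device this would poll the UI for up to `timeout` seconds.
            return self.sent

    class FakeDevice:
        """Stand-in driver; a real run would use an on-device session instead."""
        def launch_app(self, name):
            return FakeMessagesApp()

    class TestSendMessageScenario(unittest.TestCase):
        def setUp(self):
            self.device = FakeDevice()

        def test_user_can_send_a_message(self):
            # Exercise the whole flow as a user would: launch, compose, send,
            # then verify from the UI rather than from internal state.
            messages = self.device.launch_app("Messages")
            thread = messages.create_new_message(recipient="555-0100", body="hi")
            thread.tap_send()
            # Longer timeouts and some flakiness are tolerated here; that's
            # the price of exercising the full stack together.
            self.assertTrue(thread.wait_for_sent_banner(timeout=30))

    if __name__ == "__main__":
        unittest.main()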
Different Contexts
The main differences are:
- CI tests run faster and are generally more isolated.
- UATs have more coverage than CI tests because they operate under fewer rules.
- CI tests can never fail unless there's a defective code change. They must be unambiguous.
- UATs may still fail non-deterministically, so long as they remain reliable enough to be a net gain.
- CI tests are maintained by development to give themselves confidence in code changes.
- UATs are maintained by QA to replace or extend manual regression testing.
- CI tests frequently test fragments of UI behavior and may be based on mocks or other low-level objects.
- UATs are always written as complete, user-like scenarios, and generally operate and verify as the user would.
- It is more important that CI tests be solid than complete. Any incremental gain is valuable.
- It is more important that UATs be complete than solid. All acceptance criteria must be tested to accept a build.
Ownership and Overlap
Ownership is separated because these differences can lead to variant needs in terms of breadth and depth of verification.
In addition, tests can and should overlap between CI and UATs. It would be an unacceptable process complication for developers to have to consult QA's needs, and vice versa, on every change; and if the suites were united, it would be all too easy for one group to inadvertently make a change that damages the other's purpose. For acceptance in particular, this is too risky.
Of course, each group should have an opinion on coverage for either suite, and can (and should) help expand each suite, but a single point of ownership allows decisions to be made quickly, as appropriate for each set of primary stakeholders.
Through careful code reuse, each suite can be treated as an independent target without increasing the maintenance burden unacceptably (see the sketch below).
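As a sketch of how that reuse might look (the SettingsPage and driver objects below are hypothetical, not real Gaia modules): the page object encodes how to drive the UI exactly once, and each suite supplies its own driver and its own pass/fail policy around it.

    import unittest

    class SettingsPage:
        """Shared page object: encodes *how* to drive the Settings UI, without
        caring which suite is driving it or what the pass/fail policy is."""
        def __init__(self, driver):
            self.driver = driver

        def toggle_airplane_mode(self):
            self.driver.tap("airplane-mode-switch")

        def airplane_mode_enabled(self):
            return self.driver.is_checked("airplane-mode-switch")

    class FakeDriver:
        """Deterministic fake driver, suitable for the CI suite."""
        def __init__(self):
            self.checked = set()

        def tap(self, element_id):
            # Toggle the element's checked state.
            self.checked.symmetric_difference_update({element_id})

        def is_checked(self, element_id):
            return element_id in self.checked

    class CIAirplaneModeTest(unittest.TestCase):
        # CI usage: fast, isolated, unambiguous, thanks to the fake driver.
        def test_toggle_flips_switch(self):
            page = SettingsPage(FakeDriver())
            page.toggle_airplane_mode()
            self.assertTrue(page.airplane_mode_enabled())

    # The UAT suite would build the same SettingsPage around a real on-device
    # driver and wrap it in a full user scenario; only the driver and the
    # surrounding policy differ, so the page object is maintained once.

    if __name__ == "__main__":
        unittest.main()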
Timeline
The time is now.
Unlike other aspects of our strategy, this is a change in perspective, articulating a dual path toward the rest of our plans:
QA has defined and will own UATs as an aid to its acceptance mandate; development will continue to own CI as an aid to its own processes.
Risks
None.