Auto-tools/Projects/Lifeguard
Goals
Lifeguard will be a library used to manage a pool of devices, primarily those used by buildbot to run mobile/B2G unit tests. Lifeguard will track the status of every device registered with it, attempt to recover devices that go offline, and respond to requests for a device by "checking out" a known good device and returning its identifier to the requester.
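The device lifecycle described above can be sketched as a small state machine. The three status values come from the requirements on this page ('online', 'offline', 'checked_out'). The `DeviceStatus` enum, the `transition` helper and the exact set of allowed transitions are assumptions made for illustration, not part of the design, which is still TBD.

```python
from enum import Enum


class DeviceStatus(Enum):
    # Hypothetical enum; the string values match the statuses named on this page.
    ONLINE = "online"            # known good and available for checkout
    CHECKED_OUT = "checked_out"  # handed to a requester for testing
    OFFLINE = "offline"          # unreachable after recovery failed, or marked offline by a user


# Assumed legal transitions (not specified by the design):
#   online      -> checked_out (device requested), offline (ping/reset failure or user action)
#   checked_out -> online (returned and verified alive), offline (e.g. a user marks it offline)
#   offline     -> online (recovery succeeded or a user marked it online)
ALLOWED_TRANSITIONS = {
    DeviceStatus.ONLINE: {DeviceStatus.CHECKED_OUT, DeviceStatus.OFFLINE},
    DeviceStatus.CHECKED_OUT: {DeviceStatus.ONLINE, DeviceStatus.OFFLINE},
    DeviceStatus.OFFLINE: {DeviceStatus.ONLINE},
}


def transition(current: DeviceStatus, new: DeviceStatus) -> DeviceStatus:
    """Return `new` if moving from `current` is allowed, else raise ValueError."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Under this assumed table an offline device has to be brought back online before it can be checked out, so a requester is never handed a device that failed its last health check.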
Non-Goals
Lifeguard will not be responsible for flashing or installing builds on target devices.
Design
TBD. Autophone already provides some of this functionality, so it is probably a good place to start.
Requirements
- Lifeguard should provide a mechanism for devices to register themselves with it.
- Lifeguard should periodically ping each registered device to verify that it is still alive.
- If the ping fails, Lifeguard should attempt to recover the device by resetting its power up to N times.
- If the device is still unresponsive after N power resets, Lifeguard should mark it as offline.
- Lifeguard should provide a web UI for viewing and changing the status of connected devices.
- The UI should allow users to set the status of a device (offline or online), and to reboot the device or reset its power.
- If a user marks a device as online, Lifeguard should attempt to bring the device online; if this fails, it should set the status back to offline.
- Lifeguard should provide an API through which remote components can interact with it (via TCP sockets or HTTP). It should include:
  - an API to request a device for testing. It should accept the following parameters: processor (armv6 vs. armv7), hardware type (panda, ...), and pool (b2g vs. mobile). It should return the identifier of a matching device that has a recent successful ping, and then mark that device's status accordingly (e.g., 'checked_out').
  - an API to return a checked-out device to the pool. It should accept a device identifier. Once the device has been returned, Lifeguard should reboot it and verify that it is alive, and then update its status to online.
  - any additional APIs needed to support the web UI described above.
- Lifeguard should scale to handle a large number of devices (several hundred, still TBD)
- Lifeguard should have unit tests that can be run before changes are landed.
- Lifeguard should have a staging environment and integration tests that can be run in a live environment.
- Lifeguard should maintain a detailed log including, among other things, details on device registrations, device state transitions, and all API requests.
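The requirements above can be sketched as a single-threaded, in-memory device pool. This is a minimal sketch under several assumptions, not the Lifeguard implementation: the `Device` and `DevicePool` classes and all of their method names are hypothetical; `ping`, `reset_power` and `reboot` are injected callables standing in for real device-control code (for example, whatever Autophone provides); and the defaults of 3 for N and 300 seconds for a "recent" ping are placeholders. Status names follow this page ('online', 'offline', 'checked_out'). The web UI and the TCP/HTTP transport are left out; `mark`, `checkout` and `release` are the operations such handlers would call.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

log = logging.getLogger("lifeguard")


@dataclass
class Device:
    device_id: str
    processor: str   # e.g. "armv6" or "armv7"
    hardware: str    # e.g. "panda"
    pool: str        # e.g. "b2g" or "mobile"
    status: str = "offline"
    last_ping: float = 0.0  # time.monotonic() of the last successful ping


class DevicePool:
    """Hypothetical single-threaded, in-memory sketch of the Lifeguard requirements.
    ping returns True if the device answers; reset_power and reboot are fire-and-forget hooks.
    """

    def __init__(self, ping: Callable[[str], bool], reset_power: Callable[[str], None],
                 reboot: Callable[[str], None], max_resets: int = 3,
                 max_ping_age: float = 300.0) -> None:
        self._ping = ping
        self._reset_power = reset_power
        self._reboot = reboot
        self.max_resets = max_resets      # the "N" in the requirements; value assumed
        self.max_ping_age = max_ping_age  # seconds a ping counts as "recent"; value assumed
        self._devices: Dict[str, Device] = {}

    def _set_status(self, device: Device, status: str) -> None:
        if device.status != status:  # log every state transition
            log.info("%s: %s -> %s", device.device_id, device.status, status)
        device.status = status

    def _bring_up(self, device: Device) -> bool:
        """Ping; on failure, reset power up to max_resets times, pinging after each reset."""
        for attempt in range(self.max_resets + 1):
            if attempt > 0:
                log.warning("%s: power reset %d/%d", device.device_id, attempt, self.max_resets)
                self._reset_power(device.device_id)
            if self._ping(device.device_id):
                device.last_ping = time.monotonic()
                return True
        return False

    def _recover(self, device: Device) -> None:
        self._set_status(device, "online" if self._bring_up(device) else "offline")

    def register(self, device: Device) -> None:
        log.info("register %s", device.device_id)
        self._devices[device.device_id] = device
        self._recover(device)

    def status(self, device_id: str) -> str:
        return self._devices[device_id].status

    def check_all(self) -> None:
        """Meant to run periodically; skipping checked-out devices is an assumption."""
        for device in self._devices.values():
            if device.status != "checked_out":
                self._recover(device)

    def mark(self, device_id: str, status: str) -> str:
        """Web UI action: marking online triggers a recovery attempt; offline is immediate."""
        log.info("mark %s %s", device_id, status)
        device = self._devices[device_id]
        if status == "online":
            self._recover(device)
        elif status == "offline":
            self._set_status(device, "offline")
        else:
            raise ValueError(f"unknown status {status!r}")
        return device.status

    def checkout(self, processor: str, hardware: str, pool: str) -> Optional[str]:
        """Return the id of a matching online device with a recent ping, or None."""
        log.info("checkout processor=%s hardware=%s pool=%s", processor, hardware, pool)
        now = time.monotonic()
        for device in self._devices.values():
            if (device.status == "online"
                    and (device.processor, device.hardware, device.pool) == (processor, hardware, pool)
                    and now - device.last_ping <= self.max_ping_age):
                self._set_status(device, "checked_out")
                return device.device_id
        return None

    def release(self, device_id: str) -> None:
        """Return a checked-out device: reboot it, verify it answers, then mark it online."""
        log.info("release %s", device_id)
        device = self._devices[device_id]
        self._reboot(device_id)
        self._recover(device)
```

For example, with hooks that always succeed, `register(Device("panda-01", "armv7", "panda", "b2g"))` leaves the device online, `checkout("armv7", "panda", "b2g")` returns `"panda-01"`, and an identical second request returns `None` until `release("panda-01")` reboots the device and puts it back online. Injecting the hooks keeps the pool logic testable without hardware, which fits the unit-test requirement; a real service would also need locking or a single event loop, since several clients may request devices at once.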