Auto-tools/Projects/Lifeguard

Goals

Lifeguard will be a library used to manage a pool of devices, primarily those in use by buildbot for running mobile/B2G unit tests. Lifeguard will track the status of all devices registered with it, attempt to recover devices that go offline, and respond to requests for a device by "checking out" a known good device and returning its identifier to the requester.

Non-Goals

Lifeguard will not be responsible for flashing or installing builds on target devices.

Design

TBD, but Autophone already provides some of this, so it is probably a good place to start.

Requirements

  1. Lifeguard should provide a mechanism for devices to register themselves with it.
  2. Once a device is registered with Lifeguard, Lifeguard should periodically ping the device to verify that it's still alive.
    1. If the ping fails, Lifeguard should attempt to recover the device by resetting its power up to N times.
    2. If resetting the device fails, Lifeguard should mark the device as offline.
  3. Lifeguard should provide a web UI that users can use to see and change the status of connected devices.
    1. The UI should allow users to set the status of the device (offline or online) and to attempt rebooting or resetting its power.
    2. If a user marks a device as online, Lifeguard should attempt to bring the device online; if it fails, it should return the status to offline.
  4. Lifeguard should provide an API with which remote components can interact with it (via TCP sockets or HTTP), and should include the following:
    1. an API to request a device for testing. This API should accept some parameters: processor (armv6 vs armv7), hardware type (panda, ...), and pool (b2g vs mobile). It should return the identifier of a device that has a valid recent ping, and then mark the status of the device accordingly (e.g., 'checked_out').
    2. an API to return a checked_out device to the pool. This API should accept a device identifier. After the device is returned to the pool, Lifeguard should reboot it and verify that it is alive, after which its status should be updated to online.
    3. additional APIs needed to support the Web UI above
  5. Lifeguard should scale to handle a large number of devices (several hundred, still TBD)
  6. Lifeguard should have unit tests that we can run before changes.
  7. Lifeguard should have a staging environment and integration tests that we can run in a live environment.
  8. Lifeguard should maintain a detailed log including, among other things, details on device registrations, device state transitions, and all API requests.
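
As an illustration only, the device lifecycle implied by requirements 1, 2, and 4 could be sketched in Python roughly as follows. Nothing here is a settled design: the names DevicePool, Device, register, check, checkout, and checkin are hypothetical, as are the injected ping, reboot, and power_reset callables, which stand in for whatever device control Lifeguard ends up using. The sketch also pings at check-out time as a simple stand-in for "has a valid recent ping", and it omits the periodic scheduling, web UI, network API, and logging.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Status names are assumptions; the requirements only mention
# online, offline, and 'checked_out'.
ONLINE, OFFLINE, CHECKED_OUT = "online", "offline", "checked_out"


@dataclass
class Device:
    device_id: str
    processor: str  # e.g. "armv6" or "armv7"
    hardware: str   # e.g. "panda"
    pool: str       # e.g. "b2g" or "mobile"
    status: str = OFFLINE


class DevicePool:
    """Hypothetical in-memory pool; device control is injected as callables."""

    def __init__(self, ping: Callable[[str], bool],
                 reboot: Callable[[str], None],
                 power_reset: Callable[[str], None],
                 max_resets: int = 3) -> None:
        self.ping = ping
        self.reboot = reboot
        self.power_reset = power_reset
        self.max_resets = max_resets  # the "N" in requirement 2.1
        self.devices: Dict[str, Device] = {}

    def register(self, device: Device) -> None:
        """Requirement 1: add a device, then verify it right away."""
        self.devices[device.device_id] = device
        self.check(device.device_id)

    def check(self, device_id: str) -> bool:
        """Requirement 2: ping; on failure reset power up to max_resets times."""
        device = self.devices[device_id]
        alive = self.ping(device_id)
        resets = 0
        while not alive and resets < self.max_resets:
            self.power_reset(device_id)
            resets += 1
            alive = self.ping(device_id)
        if not alive:
            device.status = OFFLINE
        elif device.status != CHECKED_OUT:
            device.status = ONLINE
        return alive

    def checkout(self, processor: str, hardware: str, pool: str) -> Optional[str]:
        """Requirement 4.1: hand out a matching online device, or None."""
        for device in self.devices.values():
            matches = (device.processor, device.hardware, device.pool) == (
                processor, hardware, pool)
            if device.status == ONLINE and matches and self.ping(device.device_id):
                device.status = CHECKED_OUT
                return device.device_id
        return None

    def checkin(self, device_id: str) -> bool:
        """Requirement 4.2: reboot the returned device and re-verify it."""
        device = self.devices[device_id]
        self.reboot(device_id)
        device.status = OFFLINE  # not available again until the check passes
        return self.check(device_id)
```

With stub callables, for example ping=lambda device_id: True, a registered device goes to online, checkout returns its identifier and marks it checked_out, and checkin returns it to online once the post-reboot ping succeeds. Injecting the callables keeps the pool logic testable without real hardware, which fits requirement 6.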