Auto-tools/Projects/Lifeguard: Difference between revisions

no edit summary
No edit summary
Line 1: Line 1:
= Goals =
= Goals =


Lifeguard will be a library used to manage a pool of devices, primarily those in use by buildbot for running mobile/B2G unittests.  Lifeguard will track the status of all devices registered with it, attempt to recover devices that go offline, and respond to requests for a device by "checking out" a known good device and returning its identifier to the requester.
Lifeguard is a library that provides the ability to query the state of mobile devices, generally via SUTAgent, and to attempt to recover devices that experience problems, such as bad SD cards. It will be an important part of [[Auto-tools/Projects/MozPool]]. Ideally it will function in a standalone environment as well, though it will probably still require a local MozPool.


= Non-Goals =
= Non-Goals =


* Since Lifeguard will be running on a small number of machines (potentially just one), I think we can target a specific OS (Linux?) and specific Python version (2.7); I don't think we need to require Python 2.4/2.5/2.6 compatibility, or explicit support or testing for Mac OS X or Windows.
* As with MozPool, ideally it will work on all major platforms, though Linux will be targeted initially. Lifeguard will target Python 2.7 but may work with earlier versions.


= Design =
= Design =


TBD, but [https://github.com/mozilla/autophone Autophone] provides some of this already, so it is probably a good place to start.  There's also a bunch of code in [http://mxr.mozilla.org/build/source/tools/sut_tools/ sut_tools] that's currently used to verify device state; we could potentially re-use some of this.
TBD, but [[Auto-tools/Projects/AutoPhone|AutoPhone]] provides some of this already, so it is probably a good place to start.  There's also a bunch of code in [http://mxr.mozilla.org/build/source/tools/sut_tools/ sut_tools] that's currently used to verify device state; we could potentially re-use some of this.


= Requirements =
= Requirements =


# Lifeguard should provide a mechanism for devices to register themselves with it.
# Once a device is registered with Lifeguard, it should track its state (via MozPool), and should periodically ping the device to verify that it's still alive.
# Once a device is registered with Lifeguard, it should track its state, and should periodically ping the device to verify that it's still alive.
## If the ping fails, Lifeguard should attempt to recover the device by resetting its power up to N times.
## If the ping fails, Lifeguard should attempt to recover the device by resetting its power up to N times.
## If resetting the device fails, Lifeguard should mark the device as offline.
## If resetting the device fails, Lifeguard should mark the device (in MozPool) as offline.
# Lifeguard should provide a web UI that users can use to see and change the status of connected devices.
## The UI should allow users to set the status of the device (offline or online) and to attempt rebooting or resetting its power.
## If a user marks a device as online, Lifeguard should attempt to bring the device online; if it fails, it should return the status to offline.
# Lifeguard should provide an API with which remote components can interact with it (via TCP sockets or HTTP), and should include the following:
# Lifeguard should provide an API with which remote components can interact with it (via TCP sockets or HTTP), and should include the following:
## an API to request a device for testing.  This API should accept some parameters: processor (armv6 vs armv7), hardware type (panda, ...), pool (b2g vs mobile), and potentially android version.  It should return the identifier of a device that has a valid recent ping, and then mark the status of the device accordingly (e.g., 'checked_out').
## an API to return a checked_out device to the pool.  This API should accept a device identifier.  After the device is returned to the pool, Lifeguard should reboot it (?) and verify it is alive, after which the status should be updated to online.
## an API to flash a given B2G build on a device (do we need to be able to flash fennec boards as well?).
## an API to flash a given B2G build on a device (do we need to be able to flash fennec boards as well?).
## an API to reboot a device, given its identifier.
## an API to reboot a device, given its identifier.
## an API to reset the power on a device, given its identifier.
## an API to reset the power on a device, given its identifier.
## an API to get the current status of a device, given its identifier.  The status should include device state, and any other details that another process would need in order to initiate a remote flash of the device.
## an API to set the current state of a device, given its identifier.
## additional APIs needed to support the Web UI above
# Lifeguard should scale to handle a large number of devices (several hundred, exact number still TBD)
# Lifeguard should have unit tests, that we can run before committing changes.
# Lifeguard should have unit tests, that we can run before committing changes.
# Lifeguard should have a staging environment, and integration tests that we can run in a live environment.
# Lifeguard should maintain a detailed log including, among other things, errors and all API requests.
# Lifeguard should maintain a detailed log including, among other things, details on device registrations, device state transitions, and all API requests.
# Lifeguard should be well-documented.
# Lifeguard should be well-documented.
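
The ping/recover and check-out/return requirements above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Lifeguard's design: the <code>DevicePool</code> class, its method names, the state strings other than 'checked_out', and the injected <code>ping</code> and <code>power_cycle</code> callables are all hypothetical, and the reboot on return (still marked "(?)" above) is left out.

```python
# Hypothetical sketch; none of these names come from Lifeguard itself.
ONLINE, OFFLINE, CHECKED_OUT = "online", "offline", "checked_out"


class DevicePool(object):
    def __init__(self, ping, power_cycle, max_resets=3):
        # ping(device_id) -> bool and power_cycle(device_id) are injected, so
        # the sketch assumes nothing about SUTAgent or the power hardware.
        self.ping = ping
        self.power_cycle = power_cycle
        self.max_resets = max_resets  # the "N" in the requirements
        self.devices = {}  # device_id -> {"state": ..., "attrs": {...}}

    def register(self, device_id, **attrs):
        """Register a device with attributes such as hardware or pool."""
        self.devices[device_id] = {"state": ONLINE, "attrs": attrs}

    def check(self, device_id):
        """Ping a device; on failure, reset its power up to max_resets times."""
        device = self.devices[device_id]
        if self.ping(device_id):
            return device["state"]
        for _ in range(self.max_resets):
            self.power_cycle(device_id)
            if self.ping(device_id):
                return device["state"]
        # Every reset attempt failed, so mark the device offline.
        device["state"] = OFFLINE
        return OFFLINE

    def request(self, **wanted):
        """Check out the first online, responsive device matching `wanted`."""
        for device_id in sorted(self.devices):
            device = self.devices[device_id]
            if device["state"] != ONLINE:
                continue
            if any(device["attrs"].get(k) != v for k, v in wanted.items()):
                continue
            if self.check(device_id) == ONLINE:
                device["state"] = CHECKED_OUT
                return device_id
        return None  # no suitable device is available

    def release(self, device_id):
        """Return a device to the pool, then verify that it is still alive."""
        self.devices[device_id]["state"] = ONLINE
        return self.check(device_id)
```

A caller would construct the pool with its own ping and power-reset functions, for example <code>DevicePool(ping=my_ping, power_cycle=my_reset, max_resets=3)</code>, register devices with attributes such as <code>hardware="panda"</code>, and then call <code>request(hardware="panda")</code> and <code>release(device_id)</code>. Injecting the two callables keeps the state handling testable without real devices.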


= Open Questions =
= Open Questions =


* How should Lifeguard verify a device is online?  Is a simple ping enough?  Should we periodically reboot online devices, à la Autophone?
* How should Lifeguard verify a device is online?  Is a simple ping enough?
* Do we need to provide a Python client for this, that buildbot will use to communicate with Lifeguard?  No, mozharness will interact with Lifeguard and so can use a Python REST/HTTP client.
* How does Lifeguard fit in with [[ReleaseEngineering/BlackMobileMagic|BlackMobileMagic]]?