ReleaseEngineering:Sheriffing:HowTo
This page serves as a clearinghouse of information on how to perform the various duties associated with buildduty.
Try Server
How do I trigger a talos run for a given try build?
- When someone pings you in #build with their try run dir name (format: email-changeset):
- ssh into production-master 01 or 03, or run tools/buildfarm/maintenance/talos_sendchanges.py from your own machine. On the production-masters the script is aliased as "talos":
talos mak77@bonardo.net-04da41d5f2ce #example email-changeset
- The script prints every sendchange it issues.
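Before running the command it can help to check that the name you were given really has the email-changeset shape. The following is a minimal sketch, not part of the buildfarm tools: the function name `split_try_dir` is hypothetical, and it assumes the changeset is the lowercase hexadecimal string after the last hyphen.

```python
import re

# Assumed shape of a try run dir name: "<email>-<changeset>", where the
# changeset is a lowercase hexadecimal string after the last hyphen.
TRY_DIR_RE = re.compile(r"^(?P<email>.+@.+)-(?P<changeset>[0-9a-f]+)$")


def split_try_dir(name):
    """Split 'email-changeset' into (email, changeset).

    Raises ValueError if the name does not have that shape.
    """
    match = TRY_DIR_RE.match(name.strip())
    if match is None:
        raise ValueError("not in email-changeset form: %r" % name)
    return match.group("email"), match.group("changeset")


if __name__ == "__main__":
    # The example name used on this page.
    print(split_try_dir("mak77@bonardo.net-04da41d5f2ce"))
```

Because the pattern backtracks to the last hyphen, an email address that itself contains hyphens is still split correctly.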
To loan slaves
- change the cltbld and root passwords (passwd)
- change the VNC password (Linux: vncpasswd / Windows: UltraVNC server "admin properties" on bottom right task bar / OSX: System Preferences -> Sharing)
- prevent buildbot from starting after a reboot (rename buildbot.tac; on Windows, rename startTalos.bat)
- [only for build slaves] remember to remove all .ssh keys
- give the developer the IP address, the cltbld password and the VNC password
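The "rename buildbot.tac" step above could be scripted along these lines. This is only a sketch under assumptions: the function name, the `.off` suffix and the directory you pass in are all hypothetical, and the real location of buildbot.tac (or startTalos.bat) depends on the slave.

```python
from pathlib import Path


def disable_buildbot_start(slave_dir, filename="buildbot.tac", suffix=".off"):
    """Rename the start-up file so buildbot does not start after a reboot.

    slave_dir, filename and suffix are illustrative defaults, not values
    taken from any real slave. Returns the new path, or None if the file
    was not found (for example, if it was already renamed).
    """
    original = Path(slave_dir) / filename
    if not original.exists():
        return None
    renamed = original.with_name(original.name + suffix)
    original.rename(renamed)
    return renamed
```

On a Windows talos slave the same helper could be called with `filename="startTalos.bat"`. Renaming rather than deleting keeps the file available for when the loan ends.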
To change autologin
- start -> run -> control userpasswords2
- (on Windows 7: start -> Search programs and files -> netplwiz)
- check the option “Users must enter a user name and password to use this computer”
- apply
- uncheck the option “Users must enter a user name and password to use this computer”
- apply
- when prompted, enter the account (cltbld) and the new password twice
Dealing with machines
- The current downtime bug should always be aliased as "releng-downtime": http://is.gd/cQO7I
- The current machine reboots bug should always be aliased as "reboots": http://is.gd/dqSV0
Mobile
n810s
Once a device reaches a hard state (it has used 100% of its retries), treat it as dead. Please use this template to file a new bug with the device names.
- 8 devices per bug max
- if the newest open reimage bug has fewer than 8 devices, add to it until it has 8
- once the newest bug has 8 devices in it, open a new bug
- do not add devices to a bug that is already resolved
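The batching rules above can be sketched as a small helper. This is an illustration only: the function name and the device names in the usage note are hypothetical, and it assumes you already know which devices are listed in the newest open (unresolved) reimage bug.

```python
MAX_DEVICES_PER_BUG = 8  # limit stated in the list above


def assign_devices(dead_devices, newest_open_bug_devices=None):
    """Decide where newly dead devices should be filed.

    dead_devices: names of devices that need a reimage.
    newest_open_bug_devices: devices already in the newest open bug,
        or None if there is no open bug (resolved bugs never count).

    Returns (add_to_open_bug, new_bugs), where new_bugs is a list of
    device lists, each holding at most MAX_DEVICES_PER_BUG names.
    """
    if newest_open_bug_devices is None:
        room = 0
    else:
        room = max(0, MAX_DEVICES_PER_BUG - len(newest_open_bug_devices))
    devices = list(dead_devices)
    add_to_open_bug = devices[:room]
    rest = devices[room:]
    new_bugs = [
        rest[i:i + MAX_DEVICES_PER_BUG]
        for i in range(0, len(rest), MAX_DEVICES_PER_BUG)
    ]
    return add_to_open_bug, new_bugs
```

For example, with five devices already in the newest open bug and ten newly dead devices, the first three are added to the open bug and the remaining seven go into one new bug.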
Nagios
- All unacknowledged problems:
- All unacknowledged problems which have notifications enabled:
- All unacknowledged problems with notifications enabled with HARD failure states (i.e. have hit the retry attempt ceiling):
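The three views above are successively narrower filters over the same set of problems. The sketch below shows that relationship on plain dictionaries; the keys `acknowledged`, `notifications_enabled` and `state_type` are hypothetical names chosen for illustration, not the output format of any particular Nagios interface.

```python
def unacknowledged(problems):
    """All problems that nobody has acknowledged yet."""
    return [p for p in problems if not p["acknowledged"]]


def with_notifications(problems):
    """Unacknowledged problems that also have notifications enabled."""
    return [p for p in unacknowledged(problems) if p["notifications_enabled"]]


def hard_failures(problems):
    """Of those, only problems in a HARD state (retry ceiling reached)."""
    return [p for p in with_notifications(problems) if p["state_type"] == "HARD"]
```

Each function builds on the previous one, so the third view is always a subset of the second, and the second a subset of the first.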
Coordinate downtime with IT
- Some IT maintenance requires tree closure. Details here: https://wiki.mozilla.org/ReleaseEngineering:RelEngITSharedDowntime
- If possible, consolidate RelEng and IT downtimes that need tree closures, to avoid having two tree closures soon after each other. This is "nice to do", not a "requirement"; if doing two separate downtimes reduces risk, that's fine!