ReleaseEngineering:Sheriffing:HowTo
This page serves as a clearinghouse of information on how to perform the various duties associated with buildduty.
Try Server
How do I trigger a talos run for a given try build?
- When someone pings you in #build with their try run dir name (format: $email-$changeset, e.g. lsblkk@mozilla.org-4asf23fsd251d):
- Either:
- ssh into production-master{01,02,03}
- OR run from your machine tools/buildfarm/maintenance/try_sendchange.py
- On the production-masters, there is a ~/try_sendchange.sh wrapper script which uses argparse in /tools/buildbot-0.8.0/bin/python
Then run:
    ./try_sendchange.sh $email-$changeset
    # OR, to run a custom set of talos suites:
    ./try_sendchange.sh $email-$changeset --t scroll,svg,nochrome
    # NOTICE: no spaces between comma-separated suite names!
- It will spew back to you all the sendchanges it does.
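The invocation above can be sketched in Python as a small helper that checks the run dir name and assembles the argument list. This is only an illustration: the function name `build_try_sendchange_command` and its parameters are hypothetical, it does not call or reproduce try_sendchange.py, and it assumes the changeset part contains no hyphen. The `--t` flag and the wrapper name are copied from the example above.

```python
import re


def build_try_sendchange_command(run_dir, suites=None, script="./try_sendchange.sh"):
    """Build the argument list for the wrapper script (hypothetical helper).

    run_dir is the try run dir name in $email-$changeset form.
    suites is an optional list of talos suite names.
    """
    # Split on the LAST hyphen: the email may contain hyphens,
    # the changeset is assumed not to.
    email, sep, changeset = run_dir.rpartition("-")
    if not sep or "@" not in email or not changeset:
        raise ValueError("expected $email-$changeset, got %r" % run_dir)

    argv = [script, run_dir]
    if suites:
        cleaned = [name.strip() for name in suites]
        for name in cleaned:
            # A suite name must be non-empty with no whitespace or comma inside,
            # so the joined list has no spaces between names.
            if not name or re.search(r"[\s,]", name):
                raise ValueError("bad suite name: %r" % name)
        argv += ["--t", ",".join(cleaned)]
    return argv
```

The returned list can be joined with spaces for copy-paste, or handed to `subprocess.run` when running from the directory that holds the wrapper.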
How do I cancel existing jobs?
On pm02, run 'history | grep cancell' to see sample usage.
To loan slaves
- change cltbld's and root's password (passwd)
- change vnc's password (Linux: vncpasswd / Windows: UltraVNC server "admin properties" on bottom right task bar / OSX: Control Panel -> Sharing)
- disable buildbot from running after reboot (rename buildbot.tac / rename startTalos.bat for Windows)
- [only for build slaves] remember to remove all .ssh keys
- give the developer the IP address, cltbld's password and the VNC password
To change autologin
- start -> run -> control userpasswords2
- (on Windows 7: start -> Search programs and files -> netplwiz)
- check the option “Users must enter a user name and password to use this computer”
- apply
- uncheck the option “Users must enter a user name and password to use this computer”
- apply
- account: cltbld, enter new password twice
Dealing with machines
- The current downtime bug should always be aliased as "releng-downtime": http://is.gd/cQO7I
- The current machine reboots bug should always be aliased as "reboots": http://is.gd/dqSV0
Mobile
n810s
Once a device hits a hard state (100% of retries), it is dead. Please use this template to file a new bug with the device names.
- 8 devices per bug max
- if the newest open reimage bug has fewer than 8 devices, please add to it until it has 8
- once the newest bug has 8 devices in it, open a new bug
- any bug that is resolved should not have any devices added to it
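The batching rules above can be sketched as a short Python function. It is an illustration only: `assign_devices` and its arguments are hypothetical names, device names are plain strings, and nothing here talks to the bug tracker. Pass `None` for the existing bug when the newest reimage bug is already resolved, since resolved bugs must not get more devices.

```python
MAX_DEVICES_PER_BUG = 8  # limit stated in the rules above


def assign_devices(dead_devices, newest_open_bug=None):
    """Split dead devices between the newest open bug and new bugs.

    dead_devices: device names that need a reimage bug.
    newest_open_bug: devices already listed on the newest OPEN reimage bug,
        or None if there is no open bug to add to.
    Returns (additions, new_bugs): devices to add to the existing bug, and
    a list of device batches, one batch per new bug to file.
    """
    queue = list(dead_devices)
    additions = []
    if newest_open_bug is not None:
        # Top up the existing open bug until it holds 8 devices.
        room = max(0, MAX_DEVICES_PER_BUG - len(newest_open_bug))
        additions, queue = queue[:room], queue[room:]
    # Everything left goes into new bugs, at most 8 devices each.
    new_bugs = [
        queue[i:i + MAX_DEVICES_PER_BUG]
        for i in range(0, len(queue), MAX_DEVICES_PER_BUG)
    ]
    return additions, new_bugs
```

For example, with 12 dead devices and an open bug that already lists 6, the first 2 go onto the open bug and the remaining 10 are split into one new bug of 8 and one of 2.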
Nagios
- All unacknowledged problems:
- All unacknowledged problems which have notifications enabled:
- All unacknowledged problems with notifications enabled with HARD failure states (i.e. have hit the retry attempt ceiling):
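The three views above are successively narrower filters over the same set of problems. A minimal Python sketch of that narrowing follows, assuming each problem is a dict with the hypothetical keys `acknowledged`, `notifications_enabled` and `state_type`; these are illustrative field names, not the Nagios CGI or API.

```python
def unacknowledged(problems):
    """All problems nobody has acknowledged yet."""
    return [p for p in problems if not p["acknowledged"]]


def unacknowledged_with_notifications(problems):
    """Unacknowledged problems that also have notifications enabled."""
    return [p for p in unacknowledged(problems) if p["notifications_enabled"]]


def unacknowledged_hard(problems):
    """As above, restricted to HARD states (retry attempt ceiling reached)."""
    return [
        p for p in unacknowledged_with_notifications(problems)
        if p["state_type"] == "HARD"
    ]
```

Each function builds on the previous one, so the last list is always a subset of the second, which is a subset of the first.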
Coordinate downtime with IT
- Some IT maintenance requires tree closure. Details here: ReleaseEngineering:RelEngITSharedDowntime
- If possible, consolidate RelEng and IT downtimes that need tree closures, to avoid having two tree closures soon after each other. This is "nice to do", not a "requirement"; if doing two separate downtimes reduces risk, that's fine!