ReleaseEngineering:Sheriffing:HowTo

From MozillaWiki
<small>Return to [[ReleaseEngineering:Sheriffing]]</small>
See [[ReleaseEngineering:Buildduty]]
 
This page serves as a clearinghouse of information on how to perform the various duties associated with buildduty.
 
__TOC__
 
= Try Server =
== How do I trigger a talos run for a given try build? ==
* When someone pings you in #build with the name of their try run directory (format: email-changeset):
** ssh into production-master 01 or 03, or run tools/buildfarm/maintenance/talos_sendchanges.py from your own machine (on the production-masters, the script is aliased as "talos")
** <pre>talos mak77@bonardo.net-04da41d5f2ce #example email-changeset</pre>
** The script prints every sendchange it issues.
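The run directory name above packs two values into one string. As a minimal sketch (the helper name <code>split_try_dir</code> is hypothetical and is not part of talos_sendchanges.py), it can be split on the last hyphen, assuming the changeset part is plain hexadecimal:

```python
import string

def split_try_dir(run_dir):
    """Split a try run dir name of the form email-changeset."""
    # Split on the LAST hyphen: the email part may contain hyphens itself,
    # while a hexadecimal changeset ID never does.
    email, sep, changeset = run_dir.rpartition("-")
    if not sep or not email or not changeset:
        raise ValueError("expected email-changeset, got %r" % run_dir)
    # Assumption: changeset IDs are hexadecimal, so reject anything else early.
    if not all(c in string.hexdigits for c in changeset):
        raise ValueError("changeset is not hexadecimal: %r" % changeset)
    return email, changeset

email, changeset = split_try_dir("mak77@bonardo.net-04da41d5f2ce")
print(email, changeset)
```

Checking the name this way before running the command catches a mistyped or truncated directory name early.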
 
= To loan slaves =
* Change the cltbld and root passwords (passwd)
* Change the VNC password (Linux: vncpasswd / Windows: UltraVNC server "admin properties", bottom right of the task bar / OS X: System Preferences -> Sharing)
* Stop buildbot from starting after a reboot (rename buildbot.tac, or startTalos.bat on Windows)
* [build slaves only] Remove all keys from .ssh
* Give the developer the IP address, the cltbld password and the VNC password
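The "rename buildbot.tac" step can be sketched as below. This is only an illustration: the function name, the <code>.disabled</code> suffix and the directory layout are assumptions, and the demo runs against a scratch directory rather than a real slave.

```python
import os
import tempfile

def disable_buildbot(basedir, suffix=".disabled"):
    """Rename buildbot.tac so buildbot cannot start from this directory."""
    tac = os.path.join(basedir, "buildbot.tac")
    if not os.path.exists(tac):
        # Already renamed, or basedir is not a buildbot directory.
        return None
    disabled = tac + suffix
    os.rename(tac, disabled)
    return disabled

# Demo on a scratch directory so that no real slave is touched.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "buildbot.tac"), "w").close()
    first = disable_buildbot(scratch)
    second = disable_buildbot(scratch)  # nothing left to rename
    print(os.path.basename(first), second)
```

Renaming rather than deleting keeps the original file around, so the slave can be put back into service by reversing the rename once the loan ends.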
 
== To change autologin ==
* Start -> Run -> control userpasswords2
** (on Windows 7: Start -> Search programs and files -> netplwiz)
* Check the option “Users must enter a user name and password to use this computer”
* Apply
* Uncheck the option “Users must enter a user name and password to use this computer”
* Apply
* In the dialog that appears, set the account to cltbld and enter the new password twice
 
= Dealing with machines =
* The current downtime bug should always be aliased as "releng-downtime": http://is.gd/cQO7I
* The current machine reboots bug should always be aliased as "reboots": http://is.gd/dqSV0
== Mobile ==
=== n810s ===
Once a device hits a hard state (100% of retries), it is dead.  Please use [https://bugzilla.mozilla.org/enter_bug.cgi?alias=&assigned_to=server-ops%40mozilla-org.bugs&blocked=588156&bug_file_loc=http%3A%2F%2F&bug_severity=normal&bug_status=NEW&cc=release%40mozilla-org.bugs&comment=Please%20reimage%20the%20following%20n810s%3A%0D%0Amaemo-n810-%0D%0Amaemo-n810-%0D%0Amaemo-n810-%0D%0Amaemo-n810-%0D%0Amaemo-n810-%0D%0Amaemo-n810-%0D%0Amaemo-n810-%0D%0Amaemo-n810-&component=Server%20Operations&contenttypeentry=&contenttypemethod=autodetect&contenttypeselection=text%2Fplain&data=&dependson=&description=&flag_type-4=X&flag_type-607=X&form_name=enter_bug&keywords=&maketemplate=Remember%20values%20as%20bookmarkable%20template&op_sys=Maemo&priority=--&product=mozilla.org&qa_contact=mrz%40mozilla.com&rep_platform=ARM&short_desc=Mobile%20Imaging%20-%20n810%20%28YYYY-MM-DD%20%23X%29&status_whiteboard=%5Bmobile%5D%5Breimaging%5D&target_milestone=---&version=other this template] to file a new bug with the device names.
* 8 devices per bug, maximum
* If the newest open reimage bug has fewer than 8 devices, add to it until it has 8
* Once the newest bug has 8 devices in it, open a new bug
* Do not add devices to a bug that is already resolved
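The batching rules above can be sketched as follows. The function name and the numbered device names are hypothetical (the template only gives the <code>maemo-n810-</code> prefix), and filing the bugs themselves still happens by hand.

```python
MAX_DEVICES_PER_BUG = 8

def assign_devices(devices, open_bug_count=None):
    """Split dead devices between the newest open bug and new bugs.

    open_bug_count is the number of devices already listed in the newest
    open reimage bug, or None when there is no open (unresolved) bug.
    Returns (top_up, new_bugs): devices to add to the existing bug, and
    a list of device batches, one per new bug to file.
    """
    if open_bug_count is None:
        room = 0  # resolved bugs never receive more devices
    else:
        room = max(0, MAX_DEVICES_PER_BUG - open_bug_count)
    top_up = devices[:room]
    rest = devices[room:]
    new_bugs = [rest[i:i + MAX_DEVICES_PER_BUG]
                for i in range(0, len(rest), MAX_DEVICES_PER_BUG)]
    return top_up, new_bugs

# Hypothetical device names, for illustration only.
dead = ["maemo-n810-%02d" % n for n in range(1, 12)]
top_up, new_bugs = assign_devices(dead, open_bug_count=6)
print(len(top_up), [len(batch) for batch in new_bugs])
```

With 11 dead devices and 6 already in the newest open bug, two devices top up that bug and the remaining nine are spread over two new bugs.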
 
== Nagios ==
* All unacknowledged problems:
** https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28&hoststatustypes=15&serviceprops=10
* All unacknowledged problems that have notifications enabled:
** https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28&hoststatustypes=15&serviceprops=8202
* All unacknowledged problems with notifications enabled that are in a HARD failure state (i.e. they have hit the retry attempt ceiling):
** https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28&hoststatustypes=15&serviceprops=270346
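The <code>serviceprops</code> values in these URLs are bitmasks. The sketch below decodes them, assuming the flag values are the ones defined in the Nagios CGI headers (cgiutils.h); only the four flags used on this page are listed, and the descriptive names are paraphrased rather than official.

```python
# Assumed flag values, as recalled from Nagios cgiutils.h; verify them
# against your Nagios version before relying on this table.
SERVICE_PROPS = {
    2: "no scheduled downtime",
    8: "not acknowledged",
    8192: "notifications enabled",
    262144: "hard state",
}

def decode_serviceprops(value):
    """Return the names of the flags set in a serviceprops bitmask."""
    names = []
    bit = 1
    while bit <= value:
        if value & bit:
            # Bits missing from the table are reported rather than dropped.
            names.append(SERVICE_PROPS.get(bit, "unknown flag %d" % bit))
        bit <<= 1
    return names

for props in (10, 8202, 270346):
    print(props, decode_serviceprops(props))
```

Read this way, each URL adds one more filter to the previous one: 10 = 2 + 8, 8202 = 10 + 8192, and 270346 = 8202 + 262144.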
 
== Coordinate downtime with IT ==
* Some IT maintenance requires tree closure. Details here: [https://wiki.mozilla.org/ReleaseEngineering:RelEngITSharedDowntime https://wiki.mozilla.org/ReleaseEngineering:RelEngITSharedDowntime]
* If possible, consolidate RelEng and IT downtimes that need tree closures, to avoid two tree closures in quick succession. This is "nice to do", not a "requirement"; if doing two separate downtimes reduces risk, that's fine!

Latest revision as of 17:42, 18 July 2011