ReleaseEngineering/How To/Manage Buildbot with Fabric: Difference between revisions

 


[http://docs.fabfile.org/0.9.3/ Fabric] is a prerequisite for running these tools.  It can be easily installed into a virtual environment. Setting up <tt>ssh-agent</tt> is strongly recommended (see [[#Hosts_and_role_groups|below]] for details).
= Setup =
  hg clone ssh://hg.mozilla.org/build/tools
  cd tools
  mkvirtualenv tools
  pip install fabric


= Usage =
Don't use Fabric to reconfig the test masters if you are in a rush (e.g. backing something out): the reconfigs run sequentially and take a very long time.


If you need to reconfig everything, it is much better to run four instances of Fabric, each in a different terminal. The reconfig step is blocking: Fabric won't continue to the next host in a role group until the current one finishes. (Remember that the reconfig step does NOT update.)


  # In case it is not clear: run each one in a different window
  python manage_masters.py -f production-masters.json -j16 -R scheduler update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -R build     update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -R try       update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -R tests    update checkconfig reconfig
 
The tests reconfig can take a really long time, so you can parallelize it by using -M {macosx|windows|linux|tegra|panda} (instead of "-R tests"), each in a different tab, plus -j16. To do so, replace the last line/window with these 5 (for a total of 8 windows):


  python manage_masters.py -f production-masters.json -j16 -M macosx  update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -M windows update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -M linux   update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -M tegra   update checkconfig reconfig
  python manage_masters.py -f production-masters.json -j16 -M panda   update checkconfig reconfig
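The eight per-window command lines can also be generated rather than typed by hand. The following is only a sketch (written for Python 3, separate from the tools repository): the names <tt>BASE</tt>, <tt>ROLES</tt>, <tt>PLATFORMS</tt> and <tt>window_commands</tt> are made up for illustration, and the script only prints the commands; it does not run them.

```python
# Sketch only: build the command line for each of the eight windows.
# The flags and actions are copied from the commands on this page;
# the helper names are hypothetical and not part of manage_masters.py.
BASE = ["python", "manage_masters.py", "-f", "production-masters.json", "-j16"]
ACTIONS = ["update", "checkconfig", "reconfig"]
ROLES = ["scheduler", "build", "try"]                       # selected with -R
PLATFORMS = ["macosx", "windows", "linux", "tegra", "panda"]  # selected with -M


def window_commands():
    """Return one argument list per terminal window."""
    cmds = [BASE + ["-R", role] + ACTIONS for role in ROLES]
    cmds += [BASE + ["-M", platform] + ACTIONS for platform in PLATFORMS]
    return cmds


if __name__ == "__main__":
    for cmd in window_commands():
        print(" ".join(cmd))
```

Each printed line is meant to be pasted into its own window, since every invocation blocks until its hosts are done.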
 
To validate the above (i.e. we haven't added any new platforms since the docs were updated), run:
  diff -u \
    <(./manage_masters.py -f production-masters.json -l -R tests) \
    <(./manage_masters.py -f production-masters.json -l -M macosx \
        -M windows -M linux -M tegra -M panda)
If any differences are reported, include those platforms and update the docs.
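If you would rather see which masters are missing than read a unified diff, the two listings can be compared as sets. This is a sketch under a loud assumption: it treats the output of <tt>-l</tt> as one master name per line, which this page does not confirm. The function name and the sample listings are hypothetical.

```python
# Sketch only: compare two "-l" listings as sets of names.
# ASSUMPTION: each listing holds one master name per line.
def compare_host_lists(role_listing, platform_listing):
    """Return (only_in_role, only_in_platforms) as sorted lists."""
    role_hosts = {ln.strip() for ln in role_listing.splitlines() if ln.strip()}
    platform_hosts = {ln.strip() for ln in platform_listing.splitlines() if ln.strip()}
    return sorted(role_hosts - platform_hosts), sorted(platform_hosts - role_hosts)


if __name__ == "__main__":
    # Made-up listings; "bm99-tests1-newplatform" is a hypothetical name.
    by_role = "bm19-tests1-tegra\nbm99-tests1-newplatform\n"
    by_platform = "bm19-tests1-tegra\n"
    missing, extra = compare_host_lists(by_role, by_platform)
    print("in -R tests but not matched by any -M:", missing)
    print("matched by -M but not in -R tests:", extra)
```

A non-empty first list means a test master is not covered by any of the -M patterns, which is the case the check above is meant to catch.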


= Hosts and role groups =
Hosts are selected with the -H flag, and roles are selected with the -R flag.  Hosts correspond to the 'name' field in the masters JSON file and are short abbreviations for each master, e.g. bm13-build1, bm19-tests1-tegra, bm33-try1, bm36-build_scheduler.  We have 4 roles defined: '''build''', '''scheduler''', '''try''', and '''tests'''.  Selecting a role restricts Fabric to the masters that serve that role.


The string 'all', when specified via -H or -R, means that all masters in the masters file will be operated on. You can also use the -M flag to match on strings in the master name, e.g. -M tests1-windows to pick up all the Windows test masters. Note that manage_masters.py will "or" all host specifications from the command line, e.g. "-R tests -M windows" will return all hosts in role "tests", not just the Windows test masters.
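The "or" behaviour is easy to trip over, so here is a small model of it. This is not the code from manage_masters.py: the <tt>select</tt> function, the 'role' key, and the master named bm99-tests1-windows are assumptions made for illustration. Only the 'name' field and the other host names come from this page.

```python
# Sketch only: model how -H, -R and -M selections are OR-ed together.
# ASSUMPTIONS: the 'role' key and "bm99-tests1-windows" are hypothetical.
MASTERS = [
    {"name": "bm13-build1", "role": "build"},
    {"name": "bm19-tests1-tegra", "role": "tests"},
    {"name": "bm99-tests1-windows", "role": "tests"},
    {"name": "bm33-try1", "role": "try"},
    {"name": "bm36-build_scheduler", "role": "scheduler"},
]


def select(masters, hosts=(), roles=(), matches=()):
    """Return the names of masters matched by ANY selector (a union)."""
    selected = []
    for master in masters:
        by_host = "all" in hosts or master["name"] in hosts
        by_role = "all" in roles or master["role"] in roles
        by_match = any(text in master["name"] for text in matches)
        if by_host or by_role or by_match:
            selected.append(master["name"])
    return selected


if __name__ == "__main__":
    # "-R tests -M windows": the role already selects every test master.
    print(select(MASTERS, roles=["tests"], matches=["windows"]))
    # "-M tests1-windows" alone narrows to the Windows test masters.
    print(select(MASTERS, matches=["tests1-windows"]))
```

In this model, combining flags can only widen the selection, so to narrow it use a single, more specific -M string.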


Fabric relies on being able to ssh to the masters without password authentication, so be sure to have your ssh keys set up! That means adding the needed keys to the running instance of your ssh-agent (your "<tt>~/.ssh/config</tt>" file is ''not'' consulted by Paramiko). If you don't have the keys set up, you'll be asked for your password once per invocation, so use multiple commands per invocation where appropriate.
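Before starting a long run, it can help to confirm that an agent is running and holds at least one key. The sketch below (Python 3, not part of the tools repository) checks the <tt>SSH_AUTH_SOCK</tt> environment variable and then calls <tt>ssh-add -l</tt>. The function name <tt>agent_status</tt> is made up, and the exit-code meanings follow the OpenSSH man page as I understand it, so treat the messages as a hint rather than a guarantee.

```python
# Sketch only: report whether ssh-agent is reachable and has keys loaded.
import os
import subprocess


def agent_status(environ=None):
    """Return a short human-readable status string for ssh-agent."""
    environ = os.environ if environ is None else environ
    if not environ.get("SSH_AUTH_SOCK"):
        return "no agent: SSH_AUTH_SOCK is not set"
    try:
        result = subprocess.run(
            ["ssh-add", "-l"],
            env=dict(environ),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return "ssh-add was not found on PATH"
    # Assumed OpenSSH exit codes: 0 = keys listed, 1 = no identities,
    # 2 = the agent could not be contacted.
    if result.returncode == 0:
        return "ok: agent has at least one key loaded"
    if result.returncode == 1:
        return "agent is running but has no keys; run ssh-add"
    return "agent could not be contacted"


if __name__ == "__main__":
    print(agent_status())
```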


= Reconfigure =
'''''Reminder:''''' ''<tt>reconfigure</tt> only does the reconfig; you need to have previously done an '<tt>update</tt>' and '<tt>checkconfig</tt>'''
<pre>
python manage_masters.py -f production-masters.json -R build reconfig
</pre>


If the reconfig gets stuck, see [https://wiki.mozilla.org/ReleaseEngineering/How_To/Unstick_a_Stuck_Slave_From_A_Master How To/Unstick a Stuck Slave From A Master].
As a special case for test masters, you can unstick things by either:
* triggering a "Clean Shutdown" from the web UI for that master, or
* using the manage_masters.py <tt>graceful_restart</tt> command
After jobs complete, the master will shut down (the web page will no longer be served). Fabric should notice and unstick itself at that point. If Fabric doesn't notice, run the update and start steps individually in a separate window. If Fabric still doesn't notice, good luck and document what works.