(3 intermediate revisions by 2 users not shown) | |||
Line 5: | Line 5: | ||
[http://docs.fabfile.org/0.9.3/ Fabric] is a prerequisite for running these tools. It is easy-installable into a virtual environment. Setup of <tt>ssh-agent</tt> is strongly recommended (see [[#Hosts_and_role_groups|below]] for details). | [http://docs.fabfile.org/0.9.3/ Fabric] is a prerequisite for running these tools. It is easy-installable into a virtual environment. Setup of <tt>ssh-agent</tt> is strongly recommended (see [[#Hosts_and_role_groups|below]] for details). | ||
= Setup = | |||
hg clone ssh://hg.mozilla.org/build/tools | |||
cd tools | |||
mkvirtualenv tools | |||
pip install fabric | |||
= Usage = | = Usage = | ||
Line 35: | Line 42: | ||
Don't use fabric with the test masters to reconfig if you are in a rush (backing something out) as it takes forever (sequential reconfigs). | Don't use fabric with the test masters to reconfig if you are in a rush (backing something out) as it takes forever (sequential reconfigs). | ||
If you need to reconfig everything it is much better if you run four instances of fabric (each on a different terminal). The reconfig step is blocking and it won't continue to the next host on a role group until it finishes. | If you need to reconfig everything it is much better if you run four instances of fabric (each on a different terminal). The reconfig step is blocking and it won't continue to the next host on a role group until it finishes. (Remember the reconfig step does NOT update.) | ||
# in case it is not clear; Run each one on a different window | # in case it is not clear; Run each one on a different window | ||
python manage_masters.py -f production-masters.json -R scheduler reconfig | python manage_masters.py -f production-masters.json -j16 -R scheduler update checkconfig reconfig | ||
python manage_masters.py -f production-masters.json -R build reconfig | python manage_masters.py -f production-masters.json -j16 -R build update checkconfig reconfig | ||
python manage_masters.py -f production-masters.json -R try reconfig | python manage_masters.py -f production-masters.json -j16 -R try update checkconfig reconfig | ||
python manage_masters.py -f production-masters.json -R tests reconfig | python manage_masters.py -f production-masters.json -j16 -R tests update checkconfig reconfig | ||
The tests reconfig can take a really long time, so you can parallelize it by using -M {macosx|windows|linux|tegra|panda} (instead of "-R tests") plus -j16, each in a different tab. So, replace the last line/window with these 5 (for a total of 8 windows): | |||
python manage_masters.py -f production-masters.json -j16 -M macosx update checkconfig reconfig | |||
python manage_masters.py -f production-masters.json -j16 -M windows update checkconfig reconfig | |||
python manage_masters.py -f production-masters.json -j16 -M linux update checkconfig reconfig | |||
python manage_masters.py -f production-masters.json -j16 -M tegra update checkconfig reconfig | |||
python manage_masters.py -f production-masters.json -j16 -M panda update checkconfig reconfig | |||
To validate the above (i.e. we haven't added any new platforms since the docs were updated), run: | |||
diff -u \ | |||
<(./manage_masters.py -f production-masters.json -l -R tests) \ | |||
<(./manage_masters.py -f production-masters.json -l -M macosx \ | |||
-M windows -M linux -M tegra -M panda) | |||
If any differences are reported, include those platforms and update the docs. | |||
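The same coverage check can be sketched in Python. This is only an illustration, not part of manage_masters.py: the helper name <tt>uncovered</tt> and the master names prefixed bm97, bm98 and bm99 are hypothetical, and the sketch checks one direction only (tests masters that no -M string matches), whereas the diff above also reports extra matches.

```python
def uncovered(test_master_names, platforms):
    """Return the test master names that none of the platform strings match.

    Mirrors the substring matching that -M is described as doing on master names.
    """
    return sorted(
        name for name in test_master_names
        if not any(platform in name for platform in platforms)
    )


# The platform strings used in the five -M invocations above.
platforms = ["macosx", "windows", "linux", "tegra", "panda"]

# Hypothetical list of masters in the "tests" role; only bm19-tests1-tegra
# appears in this page, the others are made up for the example.
names = [
    "bm19-tests1-tegra",
    "bm98-tests1-linux",
    "bm99-tests1-windows",
    "bm97-tests1-newplatform",
]

print(uncovered(names, platforms))
```

Any name this prints would need its own -M window (and a docs update).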
= Hosts and role groups = | = Hosts and role groups = | ||
Line 50: | Line 70: | ||
Hosts are selected with the -H flag, and roles are selected with the -R flag. Hosts correspond to the 'name' field in the masters json file, and are short abbreviations to refer to each master, e.g. bm13-build1, bm19-tests1-tegra, bm33-try1, bm36-build_scheduler. We have 4 roles defined: '''build''', '''scheduler''', '''try''', and '''tests'''. Selecting a role will restrict fabric to only operate on masters that operate on that role. | Hosts are selected with the -H flag, and roles are selected with the -R flag. Hosts correspond to the 'name' field in the masters json file, and are short abbreviations to refer to each master, e.g. bm13-build1, bm19-tests1-tegra, bm33-try1, bm36-build_scheduler. We have 4 roles defined: '''build''', '''scheduler''', '''try''', and '''tests'''. Selecting a role will restrict fabric to only operate on masters that operate on that role. | ||
The string 'all' when specified via -H or -R means that all masters in the masters file will be operated on. You can also use the -M flag to match on strings in the master name, e.g. -M tests1-windows to pick up all the windows test masters. | The string 'all' when specified via -H or -R means that all masters in the masters file will be operated on. You can also use the -M flag to match on strings in the master name, e.g. -M tests1-windows to pick up all the windows test masters. Note that manage_masters.py will "or" all host specifications from the command line, e.g. "-R tests -M windows" will return all hosts in role "tests", not just the windows test masters. | ||
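To make the "or" behaviour concrete, here is a minimal Python sketch of selection over a masters list. It is not the actual manage_masters.py code: the function <tt>select_masters</tt> is hypothetical, the <tt>role</tt> field name is an assumption (only the <tt>name</tt> field is described above), and bm99-tests1-windows is a made-up master.

```python
def select_masters(masters, hosts=(), roles=(), matches=()):
    """Return masters that satisfy ANY of the host, role, or substring specs."""
    # 'all' via -H or -R selects every master in the file.
    if "all" in hosts or "all" in roles:
        return list(masters)
    return [
        m for m in masters
        if m["name"] in hosts                      # -H: exact master name
        or m["role"] in roles                      # -R: role (assumed field name)
        or any(s in m["name"] for s in matches)    # -M: substring of the name
    ]


masters = [
    {"name": "bm13-build1", "role": "build"},
    {"name": "bm19-tests1-tegra", "role": "tests"},
    {"name": "bm33-try1", "role": "try"},
    {"name": "bm36-build_scheduler", "role": "scheduler"},
    {"name": "bm99-tests1-windows", "role": "tests"},  # hypothetical master
]

# Equivalent of "-R tests -M windows": every tests master, not only windows.
names = [m["name"] for m in select_masters(masters, roles=["tests"], matches=["windows"])]
print(names)
```

To restrict to the windows test masters only, pass the -M string on its own, without -R tests.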
Fabric relies on being able to ssh to the masters without password authentication, so be sure to have your ssh keys set up: the needed keys must be added to the running instance of your ssh-agent (your "<tt>~/.ssh/config</tt>" file is ''not'' consulted by Paramiko). If you don't have the keys set up, you'll be asked for your password once per invocation, so use multiple commands per invocation where appropriate. | Fabric relies on being able to ssh to the masters without password authentication, so be sure to have your ssh keys set up: the needed keys must be added to the running instance of your ssh-agent (your "<tt>~/.ssh/config</tt>" file is ''not'' consulted by Paramiko). If you don't have the keys set up, you'll be asked for your password once per invocation, so use multiple commands per invocation where appropriate. | ||
Line 138: | Line 158: | ||
= Reconfigure = | = Reconfigure = | ||
'''''Reminder:''''' ''<tt>reconfigure</tt> only does the reconfig; you need to have previously done an '<tt>update</tt>' and '<tt>checkconfig</tt>''' | |||
<pre> | <pre> | ||
python manage_masters.py -f production-masters.json -R build reconfig | python manage_masters.py -f production-masters.json -R build reconfig | ||
Line 162: | Line 185: | ||
If the reconfig gets stuck, see [https://wiki.mozilla.org/ReleaseEngineering/How_To/Unstick_a_Stuck_Slave_From_A_Master How To/Unstick a Stuck Slave From A Master]. | If the reconfig gets stuck, see [https://wiki.mozilla.org/ReleaseEngineering/How_To/Unstick_a_Stuck_Slave_From_A_Master How To/Unstick a Stuck Slave From A Master]. | ||
As a special case for test masters, you can unstick things by either: | |||
* triggering a "Clean Shutdown" from the web UI for that master, or | |||
* using the manage_masters.py graceful_restart command | |||
After jobs complete, the master will shut down (its web page will no longer be served). Fabric should notice and unstick itself at that point. If fabric doesn't notice, run the update and start steps individually in a separate window. If fabric still doesn't notice, good luck and document what works. | |||