ReleaseEngineering/How To/Manage Buildbot with Fabric
RelEng has started writing some tools to manage all the buildbot masters using fabric.
The tools are currently available from the braindump repository.
Fabric is a prerequisite for running these tools. It can be installed into a virtual environment with easy_install.
Usage
python manage_masters.py -f http://people.mozilla.org/~catlee/production-masters.json -R scheduler check
buildbot-wrangler.py
Run fabric from the "buildbot-related" directory: buildbot-wrangler.py lives there and must be uploaded to the masters whenever we do a reconfigure. If you run fabric from anywhere else, the upload fails with an error like this:
Traceback (most recent call last):
  File "build/bdist.macosx-10.6-universal/egg/fabric/main.py", line 540, in main
  File "/Users/armenzg/repos/releng/braindump/buildbot-related/master_fabric.py", line 99, in reconfig
    put('buildbot-wrangler.py', '%s/buildbot-wrangler.py' % m['master_dir'])
  File "build/bdist.macosx-10.6-universal/egg/fabric/network.py", line 391, in host_prompting_wrapper
  File "build/bdist.macosx-10.6-universal/egg/fabric/operations.py", line 283, in put
ValueError: 'buildbot-wrangler.py' is not a valid local path or glob.
Disconnecting from production-master02.build.mozilla.org... done.
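A small pre-flight check can catch this before fabric connects to any master. The following is only a sketch: the helper name wrangler_available is hypothetical and is not part of the braindump tools.

```python
import os

# File that the reconfigure step uploads to each master.
WRANGLER = "buildbot-wrangler.py"


def wrangler_available(directory="."):
    """Return True if buildbot-wrangler.py exists in the given directory.

    Hypothetical helper: call it before starting a reconfigure to confirm
    that the current directory is "buildbot-related".
    """
    return os.path.isfile(os.path.join(directory, WRANGLER))
```

Calling wrangler_available() with no arguments checks the current working directory, which is where fabric looks for the file to upload.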
Suggestions
If you are in a rush (for example, backing something out), don't use fabric to reconfigure the test masters: the reconfigures run sequentially and take a long time.
If you need to reconfigure everything, it is much faster to run four instances of fabric, one per role, each in its own terminal. The reconfigure step is blocking: fabric does not move on to the next host in a role group until the current host has finished.
# In case it is not clear: run each one in a different window
python manage_masters.py -f production_masters.json -R scheduler reconfig
python manage_masters.py -f production_masters.json -R build reconfig
python manage_masters.py -f production_masters.json -R try reconfig
python manage_masters.py -f production_masters.json -R tests reconfig
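If separate terminals are inconvenient, the same idea can be scripted. The sketch below starts one manage_masters.py process per role and waits for all of them. The function names are hypothetical, and the action name is passed through to manage_masters.py unchanged.

```python
import subprocess

# The four roles described in "Hosts and role groups".
ROLES = ["scheduler", "build", "try", "tests"]


def build_commands(masters_file, action, roles=ROLES):
    """Build one manage_masters.py command line per role."""
    return [
        ["python", "manage_masters.py", "-f", masters_file, "-R", role, action]
        for role in roles
    ]


def run_in_parallel(commands):
    """Start every command at once, then wait; return the exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example (run from the buildbot-related directory):
#   codes = run_in_parallel(build_commands("production_masters.json", "reconfig"))
```

Output from the four processes is interleaved on a single console, which is one reason separate terminals may still be easier to read.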
Hosts and role groups
Fabric works on individual hosts, and supports organizing these hosts into groups. This is mostly a good fit for how we need to work, except we often have multiple buildbot masters on a single host, so there is a bit of hacking in master_fabric.py to pick out the right hosts to operate on depending on what the user has selected.
Hosts are selected with the -H flag, and roles are selected with the -R flag. Hosts correspond to the 'name' field in the masters json file, and are short abbreviations that refer to each master, e.g. pm01-bm, pm01-sm, pm02-try. We have four roles defined: 'build', 'scheduler', 'try', and 'tests'. Selecting a role restricts fabric to the masters that have that role.
The string 'all' when specified via -H or -R means that all masters in the masters file will be operated on.
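The selection rules above can be sketched as follows. This is not the code in master_fabric.py. Only the 'name' field is confirmed by this page; the 'role' and 'hostname' field names, the sample entries, and the choice to combine -H and -R as a union are all assumptions made for illustration.

```python
import json

# Illustrative masters file. Only "name" is a documented field;
# "role" and "hostname" are assumed field names.
EXAMPLE_JSON = """
[
  {"name": "pm01-bm",  "role": "build",     "hostname": "production-master01.build.mozilla.org"},
  {"name": "pm01-sm",  "role": "scheduler", "hostname": "production-master01.build.mozilla.org"},
  {"name": "pm02-sm",  "role": "scheduler", "hostname": "production-master02.build.mozilla.org"},
  {"name": "pm02-try", "role": "try",       "hostname": "production-master02.build.mozilla.org"}
]
"""
EXAMPLE_MASTERS = json.loads(EXAMPLE_JSON)


def select_masters(masters, hosts=(), roles=()):
    """Return masters whose name is in hosts or whose role is in roles.

    The string 'all' in either argument selects every master.
    """
    if "all" in hosts or "all" in roles:
        return list(masters)
    return [m for m in masters if m["name"] in hosts or m.get("role") in roles]


def hostnames(selected):
    """Return the distinct hostnames of the selected masters, in order.

    Several masters can share one host, so duplicates are dropped.
    """
    seen = []
    for m in selected:
        if m["hostname"] not in seen:
            seen.append(m["hostname"])
    return seen
```

Selecting the 'scheduler' role from the sample data yields pm01-sm and pm02-sm, which live on two different hosts, while selecting both 'build' and 'scheduler' yields three masters on the same two hosts.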
Fabric relies on being able to ssh to the masters without password authentication, so be sure to have your ssh keys set up!
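One way to confirm that key-based login works before starting a long fabric run is to ask ssh to connect in batch mode, which makes it fail instead of prompting for a password. This is a sketch only; the function names are hypothetical, and the ten-second timeout is an arbitrary choice.

```python
import subprocess


def ssh_check_command(hostname):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password or passphrase
        "-o", "ConnectTimeout=10",   # arbitrary timeout, in seconds
        hostname,
        "true",                      # trivial remote command
    ]


def can_ssh_without_password(hostname):
    """Return True if the batch-mode ssh connection exits with status 0."""
    return subprocess.call(ssh_check_command(hostname)) == 0
```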
Updating checkout
python manage_masters.py -f production_masters.json -R scheduler update
[production-master02.build.mozilla.org] run: hg pull
[production-master02.build.mozilla.org] out: pulling from http://hg.mozilla.org/build/buildbotcustom
[production-master02.build.mozilla.org] out: searching for changes
[production-master02.build.mozilla.org] out: adding changesets
[production-master02.build.mozilla.org] out: adding manifests
[production-master02.build.mozilla.org] out: adding file changes
[production-master02.build.mozilla.org] out: added 11 changesets with 19 changes to 12 files
[production-master02.build.mozilla.org] out: (run 'hg update' to get a working copy)
[production-master02.build.mozilla.org] run: hg update -r default
[production-master02.build.mozilla.org] err: .hgtags@8546abc704ee, line 93: tag 'FIREFOX_3_6_9_BUILD1' refers to unknown node
[production-master02.build.mozilla.org] err: .hgtags@8546abc704ee, line 94: tag 'FIREFOX_3_6_9_RELEASE' refers to unknown node
[production-master02.build.mozilla.org] out: 12 files updated, 0 files merged, 2 files removed, 0 files unresolved
[production-master02.build.mozilla.org] run: hg pull
[production-master02.build.mozilla.org] out: pulling from http://hg.mozilla.org/build/buildbot-configs
[production-master02.build.mozilla.org] out: searching for changes
[production-master02.build.mozilla.org] out: adding changesets
[production-master02.build.mozilla.org] out: adding manifests
[production-master02.build.mozilla.org] out: adding file changes
[production-master02.build.mozilla.org] out: added 35 changesets with 49 changes to 32 files
[production-master02.build.mozilla.org] out: (run 'hg update' to get a working copy)
[production-master02.build.mozilla.org] run: hg update -r default
[production-master02.build.mozilla.org] err: .hgtags@ac95f8973f7e, line 221: tag 'FIREFOX_3_6_13_RELEASE' refers to unknown node
[production-master02.build.mozilla.org] err: .hgtags@ac95f8973f7e, line 222: tag 'FIREFOX_3_6_13_BUILD1' refers to unknown node
[production-master02.build.mozilla.org] out: 32 files updated, 0 files merged, 0 files removed, 0 files unresolved
[production-master01.build.mozilla.org] run: hg pull
[production-master01.build.mozilla.org] out: pulling from http://hg.mozilla.org/build/buildbotcustom
[production-master01.build.mozilla.org] out: searching for changes
[production-master01.build.mozilla.org] out: adding changesets
[production-master01.build.mozilla.org] out: adding manifests
[production-master01.build.mozilla.org] out: adding file changes
[production-master01.build.mozilla.org] out: added 5 changesets with 13 changes to 10 files
[production-master01.build.mozilla.org] out: (run 'hg update' to get a working copy)
[production-master01.build.mozilla.org] run: hg update -r default
[production-master01.build.mozilla.org] out: 10 files updated, 0 files merged, 2 files removed, 0 files unresolved
[production-master01.build.mozilla.org] run: hg pull
[production-master01.build.mozilla.org] out: pulling from http://hg.mozilla.org/build/buildbot-configs
[production-master01.build.mozilla.org] out: searching for changes
[production-master01.build.mozilla.org] out: adding changesets
[production-master01.build.mozilla.org] out: adding manifests
[production-master01.build.mozilla.org] out: adding file changes
[production-master01.build.mozilla.org] out: added 10 changesets with 11 changes to 9 files
[production-master01.build.mozilla.org] out: (run 'hg update' to get a working copy)
[production-master01.build.mozilla.org] run: hg update -r default
[production-master01.build.mozilla.org] out: 9 files updated, 0 files merged, 0 files removed, 0 files unresolved
Done.
Disconnecting from production-master01.build.mozilla.org... done.
Disconnecting from production-master02.build.mozilla.org... done.
Show which revisions are checked out
python manage_masters.py -f production_masters.json -R build,scheduler show_revisions
bm3      94b7596a2523   632937d89dd7
pm02-sm  94b7596a2523   632937d89dd7
pm01-bm  94b7596a2523+  632937d89dd7
pm01-sm  94b7596a2523+  632937d89dd7
bm4      94b7596a2523   632937d89dd7
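Looks like we have some local modifications! The trailing + on the first revision for pm01-bm and pm01-sm is how Mercurial marks a working copy with uncommitted changes.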
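With many masters it is easy to miss a stray +. The sketch below scans show_revisions output for masters with locally modified checkouts. It assumes each output line has the form "name revision revision", as in the sample above; the function name is hypothetical.

```python
def masters_with_local_changes(output):
    """Return the names of masters that report a revision ending in '+'.

    Assumes whitespace-separated lines of the form:
        <master name> <revision> <revision>
    Lines with fewer than two fields are skipped.
    """
    dirty = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, revisions = fields[0], fields[1:]
        if any(rev.endswith("+") for rev in revisions):
            dirty.append(name)
    return dirty


SAMPLE = """\
bm3 94b7596a2523 632937d89dd7
pm02-sm 94b7596a2523 632937d89dd7
pm01-bm 94b7596a2523+ 632937d89dd7
pm01-sm 94b7596a2523+ 632937d89dd7
bm4 94b7596a2523 632937d89dd7
"""

print(masters_with_local_changes(SAMPLE))  # ['pm01-bm', 'pm01-sm']
```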
Checkconfig
python manage_masters.py -f production_masters.json -R build,scheduler checkconfig
bm3      OK
pm02-sm  OK
pm01-bm  OK
pm01-sm  OK
bm4      OK
Done.
Disconnecting from buildbot-master1.build.mozilla.org... done.
Disconnecting from production-master01.build.mozilla.org... done.
Disconnecting from buildbot-master2.build.mozilla.org... done.
Disconnecting from production-master02.build.mozilla.org... done.
Reconfigure
python manage_masters.py -f production_masters.json -R build reconfig
[buildbot-master1.build.mozilla.org] put: buildbot-wrangler.py -> /builds/buildbot/build_master3/master/buildbot-wrangler.py
[buildbot-master1.build.mozilla.org] run: rm -f *.pyc
[buildbot-master1.build.mozilla.org] run: python buildbot-wrangler.py reconfig .
[production-master01.build.mozilla.org] put: buildbot-wrangler.py -> /builds/buildbot/builder_master1/buildbot-wrangler.py
[production-master01.build.mozilla.org] run: rm -f *.pyc
[production-master01.build.mozilla.org] run: python buildbot-wrangler.py reconfig .
[production-master01.build.mozilla.org] err: 2010-11-24 06:58:26-0800 [Broker,252,10.2.71.15] Unhandled Error
[production-master01.build.mozilla.org] err: Traceback (most recent call last):
[production-master01.build.mozilla.org] err: Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
[production-master01.build.mozilla.org] err: ]
[buildbot-master2.build.mozilla.org] put: buildbot-wrangler.py -> /builds/buildbot/build_master4/master/buildbot-wrangler.py
[buildbot-master2.build.mozilla.org] run: rm -f *.pyc
[buildbot-master2.build.mozilla.org] run: python buildbot-wrangler.py reconfig .
Done.
Disconnecting from buildbot-master1.build.mozilla.org... done.
Disconnecting from production-master01.build.mozilla.org... done.
Disconnecting from buildbot-master2.build.mozilla.org... done.