CIDuty/Other Duties
Tree Maintenance
Repo Errors
If a dev reports a problem pushing to hg (either m-c or try repo) then you need to do the following:
- File a bug (or have the dev file it), then ping noahm in #ops
- If he doesn't respond, escalate the bug to page the on-call
- Follow the steps below for "How do I close the tree"
How do I see problems in TBPL?
All "infrastructure" (that's us!) problems should be purple at http://tbpl.mozilla.org. Some aren't, so keep your eyes open in IRC, but get on any purples quickly.
How do I close the tree?
See ReleaseEngineering/How_To/Close_or_Open_the_Tree
How do I claim a rentable project branch?
See ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE
Clean up the scheduler DB
Sometimes we get some jobs pending for days: https://secure.pub.build.mozilla.org/buildapi/pending
These are supposed to be cleaned up automatically now. See bug 755012 for details.
Re-run jobs
How to trigger Talos jobs
see ReleaseEngineering/How_To/Trigger_Talos_Jobs
How to re-trigger all Talos runs for a build (by using sendchange)
see ReleaseEngineering/How_To/Trigger_Talos_Jobs
How to re-run a build
Do not go to the page of the build you'd like to re-run and cook up a sendchange to try to re-create the change that caused it. Changes without revlinks trigger releases, which is not what you want.
Find the revision you want, find a builder page for the builder you want (preferably, but not necessarily, on the same master), and plug the revision, your name, and a comment into the "Force Build" form. Note that YOU MUST specify the branch, so that there are no null keys in builds-running.js. Otherwise your build will not show up in self-serve or tbpl.
Nightlies
How do I re-spin mozilla-central nightlies?
To rebuild the same nightly for some platform
buildbot's Rebuild button works fine. BUT, if the original build uploaded files or published updates you don't want to do this (caching on ftp.m.o will mean the old file is served for some time). Check the log of the original build - if it failed before uploading it's OK to rebuild.
To build new nightlies
NB: For b2g, the revision set can only be the mercurial gecko revision.
To build nightlies on all the platforms
Use self-serve for this. Scroll to the bottom of the page and put the requested revision into the box next to Create new nightly builds on mozilla-central revision. Click on the Submit button. This takes care of using the same buildID for all builds.
To rebuild some subset of all the nightly builders (eg only desktop/Android/b2g)
To reach a specific nightly builder on a build master, you can use a finished nightly job on tbpl and this snippet of javascript code (add it to the bookmark toolbar):
javascript:(function(){function%20JSONRequest(url,callback){var%20req=new%20XMLHttpRequest();req.open('GET',url+'?format=json');req.withCredentials=true;req.timeout=5000;req.onreadystatechange=function(){if(req.readyState===4){if(req.status!==200||!req.responseText){if(req.status===0){alert('Self-Serve%20API%20request%20timed%20out.');}else{alert('Self-Serve%20API%20request%20failed%20('+req.status+').%20\nIs%20the%20job%20still%20pending?%20\n\n'+req.responseText);}return;}try{callback(JSON.parse(req.responseText));}catch(e){alert(e);}}};req.send();}function%20openJob(job){var%20bm=job.claimed_by_name.split(':');var%20port=/build[0-9]/.test(bm[1])?"8001":/try[0-9]/.test(bm[1])?"8101":/tests[0-9]/.test(bm[1])?"8201":'unknown';window.open('http://'+bm[0]+':'+port+'/builders/'+job.buildername+'/builds/'+job.buildnumber);}function%20openCompletedJob(jobs){var%20selectedJob=UserInterface._activeResultObject();var%20startTime=Date.parse(selectedJob.startTime)/1000;jobs=jobs.filter(function(j){return(j.buildername===selectedJob.machine.name)&&((j.starttime-startTime)<=1);});if(jobs.length!==1){alert('Should%20have%20found%20exactly%20one%20matching%20job,%20found%20'+jobs.length+'!');return;}openJob(jobs[0]);}var%20id=(typeof%20UserInterface!=="undefined")?UserInterface._activeResult:'';if(!id){alert('Error%20-%20are%20you%20on%20TBPL%20with%20a%20job%20selected?');return;}var%20baseURL='https://secure.pub.build.mozilla.org/buildapi/self-serve/'+UserInterface.treeInfo.buildbotBranch;var%20re=/^(?:running%7Cpending)-/;if(!re.test(id)){var%20rev=UserInterface._activeResultObject().revs[UserInterface.treeInfo.primaryRepo];JSONRequest(baseURL+'/rev/'+rev,openCompletedJob);}else{JSONRequest(baseURL+'/build/'+id.replace(re,''),openJob);}})()
You will need to force the specific builders on a build master, setting several parameters. First open a waterfall page (eg mozilla-central) and search for 'nightly'. Open each builder you want in a new tab.
For each builder you should set
- 'Your name' to your name
- 'Reason for build' to 'bug ###' or 'requested by <someone>'
- 'Branch to build' to 'mozilla-central' (other branches may have a prefix, eg releases/mozilla-aurora)
- 'Revision to build' to the <revision>
- 'Property 1': 'Name' to 'buildid', 'Value' to the current Pacific time (eg from running TZ=US/Pacific date +%Y%m%d%H%M%S)
Then click on the 'Force build' button.
The first two are nice-to-have for later debugging. The branch and revisions are required so that TBPL shows the builds while they are pending and running (rather than just after completion). Setting the buildid property the same across all builds helps keep nightlies consistent. An add-on like AutoFill Forms is invaluable in storing these values and filling the form for you, but there is probably a way to do this with curl too (it's a POST).
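As a rough sketch of the curl route mentioned above: the helper below generates one buildid and prints (rather than runs) a curl command for a builder's force form. The `/force` path and the form field names (`username`, `comments`, `branch`, `revision`, `property1name`, `property1value`) are assumptions, not taken from this page; check the HTML of the form on your master before using them. The function names are hypothetical.

```shell
# Sketch only. ASSUMPTIONS: the force form is submitted to <builder URL>/force
# and uses the field names username, comments, branch, revision,
# property1name and property1value. Verify against the form's HTML first.

# Generate one buildid (current Pacific time) to share across all builders.
make_buildid() {
    TZ=US/Pacific date +%Y%m%d%H%M%S
}

# Print, without running it, a curl command that would POST the force form.
# Usage: force_build_cmd BUILDER_URL NAME REASON BRANCH REVISION BUILDID
force_build_cmd() {
    printf "curl -sS"
    printf " --data-urlencode 'username=%s'" "$2"
    printf " --data-urlencode 'comments=%s'" "$3"
    printf " --data-urlencode 'branch=%s'" "$4"
    printf " --data-urlencode 'revision=%s'" "$5"
    printf " --data-urlencode 'property1name=buildid'"
    printf " --data-urlencode 'property1value=%s'" "$6"
    printf " '%s/force'\n" "$1"
}

# Example: compute the buildid once, then reuse it for every builder.
#   BUILDID=$(make_buildid)
#   force_build_cmd "$BUILDER_URL" "yourname" "bug ###" mozilla-central "$REVISION" "$BUILDID"
```

Printing the command instead of executing it lets you review each request before sending it, and reusing a single `BUILDID` keeps the nightlies consistent across builders.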
Disable updates
See ReleaseEngineering/How_To/Shut_off_all_updates for global shutoff. We use Balrog now for nightly & aurora updates.
Talos
How to update the talos zips
We only need to do this for mobile requests.
This deployment is super safe (NPOTB: not part of the build).
# running this from cruncher is faster than downloading/uploading from your localhost
ssh -A cruncher
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/talos.07322bbe0f7d.zip
export TALOS_ZIP=`basename $URL`
wget $URL
# relengwebadm has limited access to the internet - that is why we scp from another host
scp ${TALOS_ZIP} relengwebadm.private.scl3.mozilla.com:/mnt/netapp/relengweb/talos-bundles/zips
ssh relengwebadm.private.scl3.mozilla.com "chmod 644 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh relengwebadm.private.scl3.mozilla.com "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
curl -I http://talos-bundles.pvt.build.mozilla.org/zips/${TALOS_ZIP}
For talos.zip changes: Once deployed, notify the a-team and let them know that they can land at their own convenience.
- Please verify that the shasum matches what is in the [comment]; we have had a few instances where the talos.zip was incorrect.
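The checksum comparison can be scripted so that a mismatch is hard to miss. This is a minimal sketch, assuming `sha1sum` is available on the host where you run it; `verify_sha1` is a hypothetical helper name, and the expected digest is whatever was quoted in the bug.

```shell
# verify_sha1 FILE EXPECTED
# Succeeds only if the SHA-1 digest of FILE equals EXPECTED.
verify_sha1() {
    actual=$(sha1sum "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 has $actual, expected $2" >&2
        return 1
    fi
}

# Example (digest copied from the bug comment):
#   verify_sha1 "$TALOS_ZIP" "<sha1 from the bug>"
```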
Update mobile talos webhosts
Keep track of which revision is being run. Copy/paste the output into the bug. Please update our maintenance page.
This could affect mobile talos numbers or break the jobs altogether. Please coordinate with the sheriffs.
NOTE: There's a great deal of data we cannot check into revision control for legal reasons, so there's an extensive .hgignore file. If you're adding new data to the tree that cannot be checked in, please make sure to add it to the .hgignore file as well so that people are not confused by files when they perform an hg status.
- Please ask a talos developer/reviewer or [file a bug] for any [.hgignore] changes
NOTE: For now, we're only using the "old, mac mini" setup. Update both but only talk about the old setup
new, webapp cluster
ssh relengwebadm.private.scl3.mozilla.com
sudo su -
cd /data/releng/src/talos-remote/www/talos-repo
# NOTICE that we have uncommitted files
hg status
# Take note of the current revision to revert to (just in case)
hg id
hg pull -u
# 488bc187a3ef tip
# ..capture the output here; the remainder will be long and not that useful..
/data/releng/src/talos-remote/update
old, mac mini
We have a load balancer (bm-remote) in front of three web hosts (bm-remote-talos-0{1,2,3}). Update procedure:
ssh root@bm-remote-talos-webhost-01
cd /var/www/html/talos-repo
# NOTICE that we have uncommitted files
hg status
# Take note of the current revision to revert to (just in case)
hg id
hg pull -u
# 488bc187a3ef tip
rsync -azv --delete /var/www/html/. bm-remote-talos-webhost-02:/var/www/html/.
rsync -azv --delete /var/www/html/. bm-remote-talos-webhost-03:/var/www/html/.
TBPL
How to deploy changes
RelEng no longer has access to do this. TBPL devs will request a push from Server Ops.
How to hide/unhide builders
- In the 'Tree Info' menu select 'Open tree admin panel'
- Filter/select the builders you want to change
- Save changes
- Enter the sheriff password and a description (with bug number if available) of your changes
- CC :edmorley & :philor on the relevant bug so that they know what to expect when sheriffing.
Ganglia
- If a host is reporting to Ganglia incorrectly, restarting gmond as root may be all it takes to fix it (e.g. bug 674233):
service gmond restart
Queue Directories
If you see this in #build:
<nagios-sjc1> [54] buildbot-master12.build.scl1:Command Queue is CRITICAL: 4 dead items
It means that there are items in the "dead" queue for the given master. You need to look at the logs and fix any underlying issue and then retry the command by moving *only* the json file over to the "new" queue. See the Queue directories wiki page for details.
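The retry step can be sketched as a small helper that moves exactly one item and leaves everything else in the dead queue untouched. The directory layout (a queue directory containing `dead/` and `new/` subdirectories) and the name `requeue_dead_item` are assumptions for illustration; the real paths are on the Queue directories wiki page.

```shell
# requeue_dead_item QUEUE_DIR ITEM
# Move a single item from QUEUE_DIR/dead to QUEUE_DIR/new.
# Any other files in dead/ (logs and so on) stay where they are.
# ASSUMPTION: QUEUE_DIR has "dead" and "new" subdirectories.
requeue_dead_item() {
    src="$1/dead/$2"
    if [ ! -f "$src" ]; then
        echo "no such dead item: $src" >&2
        return 1
    fi
    mv -v "$src" "$1/new/"
}

# Example, after fixing the underlying issue:
#   requeue_dead_item /path/to/queue/commands <item>.json
```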
Cruncher
If you get an alert about cruncher running out of space it might be a sendmail issue (backed up emails taking up too much space and not getting sent out):
<nagios-sjc1> [07] cruncher.build.sjc1:disk - / is WARNING: DISK WARNING - free space: / 384 MB (5% inode=93%):
As root:
du -s -h /var/spool/*
# confirm that mqueue or clientmqueue is the oversized culprit
# stop sendmail, clean out the queues, restart sendmail
/etc/init.d/sendmail stop
rm -rf /var/spool/clientmqueue/*
rm -rf /var/spool/mqueue/*
/etc/init.d/sendmail start
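To confirm the culprit before deleting anything, the du check can be reduced to a single answer. This is a minimal sketch; `largest_subdir` is a hypothetical helper name, and it only reports, it does not clean anything up.

```shell
# largest_subdir DIR
# Print the entry directly under DIR that uses the most disk space.
largest_subdir() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n 1 | cut -f2
}

# Example:
#   largest_subdir /var/spool
# If this prints the mqueue or clientmqueue directory, the sendmail
# cleanup above is likely the right fix.
```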
hg<->git conversion
This is a production system that RelEng built, but it has not yet transitioned to full IT operation. As a production system, it is supported 24x7x365 - escalate to IT oncall (who can page) as needed.
We'll get problem reports from 2 sources:
- via email from vcs2vcs user to release+vcs2vcs@m.c - see email handling instructions for those.
- via a bug report for a customer visible condition - this should only be if there is a new error we aren't detecting ourselves. See the resources below and/or page hwine.
Documentation for this system:
- recent docs (troubleshooting)
- source code: http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-tools/
- config files: http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-configs/
All services run as user vcs2vcs on one of the following hosts (as of 2013-01-07): github-sync1-dev.dmz.scl3.mozilla.com, github-sync1.dmz.scl3.mozilla.com, github-sync2.dmz.scl3.mozilla.com, github-sync3.dmz.scl3.mozilla.com.
Handling alert_major_errors
# SSH as yourself to the hostname in the 'from' address of the alert_major_errors email.
$ ssh yourname@github-sync3.dmz.scl3.mozilla.com
$ sudo su - vcs2vcs
$ cd etc
# find the repo name that vcs2vcs is complaining about. For example:
$ grep releases-mozilla-central-no-cvs *
job02_cmds:# "hg:$HOME/repos/releases-mozilla-central-no-cvs" "github"
# discover where that job runs
$ grep job02 status
job02_cmds,github-sync3.dmz.scl3.mozilla.com,m-c w/o cvs as used by b2g
# connect to that host the same as we did above (if not already connected)
# then
$ cd logs/job02 # same job as above
$ show_update_errors update.log
# Note: the command exit code precedes the command itself
# eg. ...;255;hg --cwd...
Continue with instructions here.
disable/re-enable aurora updates
Taken care of by the person doing the final release, since merge day activities are on the Monday before the release.
Upload
Python packages

From your local machine:
FILE=your_python_package.tar.gz
scp $FILE $LDAP_SHORT_USERNAME@relengwebadm.private.scl3.mozilla.com:
From relengwebadm:
ssh $LDAP_SHORT_USERNAME@relengwebadm.private.scl3.mozilla.com
FILE=your_python_package.tar.gz
sudo mv -vi $FILE /mnt/netapp/relengweb/pypi/pub/
sudo chmod 644 /mnt/netapp/relengweb/pypi/pub/$FILE
From your local machine:
curl -I http://pypi.pub.build.mozilla.org/pub/$FILE # You should see "HTTP/1.1 200 OK"
How to upload to Tooltool
If you don't want to upload from your own laptop (because, eg, you have a slow uplink) you can do this from cruncher.
Access cruncher with your credentials:
ME="your_short_ldap_username" # or `whoami`
ssh -A $ME@cruncher.build.mozilla.org
Download the file and then:
scp filename.tar.xz $ME@relengwebadm.private.scl3.mozilla.com:
Log in to relengwebadm:
ssh $ME@relengwebadm.private.scl3.mozilla.com
And deploy the file to tooltool:
FILE=~/emulator.zip # or whatever you're uploading
export SHA512=`openssl sha512 $FILE | cut -d' ' -f2`
sudo mv -i $FILE /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
sudo chmod 644 /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
ls -l /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
- Add the filename, filesize, and sha512 digest to the bug you are working on. These can be added to the tooltool manifests later.
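The three values for the bug can be collected in one go. This is a minimal sketch, run before the file is moved into place; `tooltool_info` is a hypothetical helper name. It uses `sha512sum` rather than the `openssl sha512` call above, which should yield the same digest.

```shell
# tooltool_info FILE
# Print "filename size-in-bytes sha512-digest" on one line, ready to paste
# into the bug.
tooltool_info() {
    size=$(wc -c < "$1" | tr -d ' ')
    digest=$(sha512sum "$1" | cut -d' ' -f1)
    echo "$(basename "$1") $size $digest"
}

# Example:
#   tooltool_info ~/emulator.zip
```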
How to upload Talos ZIPs
See How to update the talos zips.