CIDuty/Other Duties
Tree Maintenance
Repo Errors
If a dev reports a problem pushing to hg (either m-c or try repo) then you need to do the following:
- File a bug (or have the dev file it), then poke noahm in #ops
- If he doesn't respond, then escalate the bug to page on-call
- Follow the steps below for "How do I close the tree"
How do I see problems in TBPL?
All "infrastructure" (that's us!) problems should be purple at http://tbpl.mozilla.org. Some aren't, so keep your eyes open in IRC, but get on any purples quickly.
How do I close the tree?
See ReleaseEngineering/How_To/Close_or_Open_the_Tree
How do I claim a rentable project branch?
See ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE
Clean up the scheduler DB
Sometimes we get some jobs pending for days: https://secure.pub.build.mozilla.org/buildapi/pending
Here's how to clean them: TODO
Re-run jobs
How to trigger Talos jobs
see ReleaseEngineering/How_To/Trigger_Talos_Jobs
How to re-trigger all Talos runs for a build (by using sendchange)
see ReleaseEngineering/How_To/Trigger_Talos_Jobs
How to re-run a build
Do not go to the page of the build you'd like to re-run and cook up a sendchange to try to re-create the change that caused it. Changes without revlinks trigger releases, which is not what you want.
Find the revision you want, find a builder page for the builder you want (preferably, but not necessarily, on the same master), and plug the revision, your name, and a comment into the "Force Build" form. Note that you MUST specify the branch, so that there are no null keys in builds-running.js.
Nightlies
How do I re-spin mozilla-central nightlies?
To rebuild the same nightly, buildbot's Rebuild button works fine.
To build a different revision, Force build all builders matching /.*mozilla-central.*nightly/, on any of the regular build masters. Set revision to the desired revision. With no revision set, the tip of the default branch will be used, but it's probably best to get an explicit revision from hg.mozilla.org/mozilla-central. (For b2g, the revision set can only be the mercurial gecko revision.)
You can use https://build.mozilla.org/buildapi/self-serve/mozilla-central to initiate this build, using the changeset at the tip of http://hg.mozilla.org/mozilla-central. Sometimes the developer will request a specific changeset in the bug.
To respin just the android nightlies, find the revisions in the fennec*txt file here and here. Then kick off a build (specifying the revision in the revision field) for armv6 and armv7.
To start a new b2g Unagi nightly, force a build on a build master such as bm58. You may want to provide a value for the 'buildid' property such as 20130828155234 (which represents a Pacific timezone date/time).
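The example buildid above follows a YYYYMMDDHHMMSS pattern in Pacific time. A minimal sketch for generating one for the current moment, assuming the `America/Los_Angeles` zone data is installed on the machine; the function name `make_buildid` is only illustrative and is not part of any RelEng tooling:

```shell
# Hypothetical helper: print a buildid-style timestamp (YYYYMMDDHHMMSS)
# for the current time in the Pacific timezone.
make_buildid() {
    TZ=America/Los_Angeles date +%Y%m%d%H%M%S
}

make_buildid
```

The printed value can be pasted into the 'buildid' property field of the force-build form.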
Disable updates
If you're asked to disable updates for whatever reason, you can do it on aus3-staging. Depending on what you're asked to shut off, you'll have to chmod a different directory (or directories) to 700. Log on to aus3-staging.mozilla.org with your LDAP account and use 'sudo su - ffxbld' (or tbirdbld) to gain the correct privileges. Some examples of shutting off different updates are below:
- 64-bit Windows on the ux branch:
chmod 700 /opt/aus2/incoming/2/Firefox/ux/WINNT_x86_64-msvc
- All updates on Nightly:
chmod 700 /opt/aus2/incoming/2/Firefox/mozilla-central
- Linux (32-bit + 64-bit) updates on Aurora:
chmod 700 /opt/aus2/incoming/2/Firefox/mozilla-aurora/Linux_x86-gcc3 /opt/aus2/incoming/2/Firefox/mozilla-aurora/Linux_x86_64-gcc3
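The examples above all follow the same product/branch/platform layout under /opt/aus2/incoming/2. As a rough sketch, assuming that layout holds for whatever you are asked to disable, a small helper can print the command for review before you run it. The name `disable_updates_cmd` is hypothetical:

```shell
# Hypothetical helper: print (not run) the chmod that would disable updates
# for one product/branch, optionally narrowed to a single platform directory.
AUS_ROOT=/opt/aus2/incoming/2   # path taken from the examples above

disable_updates_cmd() (
    product=$1
    branch=$2
    platform=${3:-}
    dir="$AUS_ROOT/$product/$branch"
    if [ -n "$platform" ]; then
        dir="$dir/$platform"
    fi
    echo "chmod 700 $dir"
)

# All updates on Nightly:
disable_updates_cmd Firefox mozilla-central
# 64-bit Windows on the ux branch:
disable_updates_cmd Firefox ux WINNT_x86_64-msvc
```

Printing rather than executing keeps the actual chmod a deliberate, manual step.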
Talos
Note: because a change to the Talos bundle always causes changes in the baseline times, the following should be done for *any* change:
- close all trees that are impacted by the change
- ensure all pending builds are done and GREEN
- do the update step below
- send a Talos changeset to all trees to generate new baselines
How to update the talos zips
NOTE: Deploying talos.zip is not scary anymore: we no longer replace the existing file, and the a-team has to land a change in the tree.
You may need to get IT to turn on access to build.mozilla.org.
# on your localhost
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/talos.07322bbe0f7d.zip
export TALOS_ZIP=`basename $URL`
wget $URL
# wget from people doesn't work anymore
export RELENGWEB_USER=`whoami`
scp ${TALOS_ZIP} ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com:/mnt/netapp/relengweb/talos-bundles/zips
ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com "chmod 644 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh cruncher "curl -I http://talos-bundles.pvt.build.mozilla.org/zips/${TALOS_ZIP}"
Note that you can get to root by running |sudo su -|
For talos.zip changes: Once deployed, notify the a-team and let them know that they can land at their own convenience.
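The remote `sha1sum` in the deployment steps is most useful when compared against a checksum taken locally before the upload. A minimal sketch, assuming `sha1sum` is available on your localhost; the helper names `local_sha1` and `verify_sha1` are illustrative only:

```shell
# Hypothetical helpers for checking that an uploaded bundle arrived intact.

# Print only the SHA-1 hex digest of a local file.
local_sha1() {
    sha1sum "$1" | cut -d' ' -f1
}

# Compare a local file against a digest reported by the remote host.
# Returns 0 on match, non-zero on mismatch.
verify_sha1() {
    [ "$(local_sha1 "$1")" = "$2" ]
}

# Example use, with the variables from the deployment steps:
#   remote=$(ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com \
#       "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}" | cut -d' ' -f1)
#   verify_sha1 "${TALOS_ZIP}" "$remote" && echo "checksums match"
```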
Updating talos for Tegras
To update talos on Android:
# for foopy05-20 and foopy22-24
csshX --login=cltbld foopy{05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,22,23,24}
cd /builds/talos-data/talos
hg pull -u
This will update talos on each foopy to the tip of default.
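If csshX is not available, the same update can be driven by a plain loop. The sketch below only prints one ssh command per foopy so the list can be reviewed first. It assumes the short `foopyNN` hostnames resolve from where you run it and that you can ssh in as cltbld; the function name is hypothetical:

```shell
# Hypothetical non-interactive alternative to csshX: print one ssh command
# per foopy instead of opening a terminal for each host.
FOOPIES="05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 22 23 24"

foopy_update_cmds() {
    for n in $FOOPIES; do
        echo "ssh cltbld@foopy$n 'cd /builds/talos-data/talos && hg pull -u'"
    done
}

foopy_update_cmds
```

Piping the output to `sh` would run the updates one host at a time.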
B2G Emulator
How to update the emulator
- The password is on our intranet
curl -u b2g -o emulator.zip http://ec2-107-20-108-245.compute-1.amazonaws.com/jenkins/job/b2g-build/ws/package.zip
# enter password, wait for download to finish
Then upload the file to tooltool; see "How to upload to tooltool", below.
- Test that the file is readable from your localhost:
curl -I http://runtime-binaries.pvt.build.mozilla.org/tooltool/sha512/${SHA512}
- There will need to be an in-tree patch like this one to update the emulator; a-team will probably handle this.
B2G Dogfood promotion
https://intranet.mozilla.org/RelEngWiki/index.php/How_To/Perform_b2g_dogfood_tasks
TBPL
How to deploy changes
RelEng no longer has access to do this. TBPL devs will request a push from Server Ops.
How to hide/unhide builders
- In the 'Tree Info' menu select 'Open tree admin panel'
- Filter/select the builders you want to change
- Save changes
- Enter the sheriff password and a description (with bug number if available) of your changes
- CC :edmorley & :philor on the relevant bug so that they know what to expect when sheriffing.
Ganglia
- If a host is reporting to ganglia incorrectly, restarting gmond may be all it takes to fix it (e.g. bug 674233). Switch to root, then run:
service gmond restart
Queue Directories
If you see this in #build:
<nagios-sjc1> [54] buildbot-master12.build.scl1:Command Queue is CRITICAL: 4 dead items
It means that there are items in the "dead" queue for the given master. You need to look at the logs and fix any underlying issue and then retry the command by moving *only* the json file over to the "new" queue. See the Queue directories wiki page for details.
Cruncher
If you get an alert about cruncher running out of space it might be a sendmail issue (backed up emails taking up too much space and not getting sent out):
<nagios-sjc1> [07] cruncher.build.sjc1:disk - / is WARNING: DISK WARNING - free space: / 384 MB (5% inode=93%):
As root:
du -s -h /var/spool/*
# confirm that mqueue or clientmqueue is the oversized culprit
# stop sendmail, clean out the queues, restart sendmail
/etc/init.d/sendmail stop
rm -rf /var/spool/clientmqueue/*
rm -rf /var/spool/mqueue/*
/etc/init.d/sendmail start
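Before deleting anything, it helps to confirm which spool directory is actually the largest. A minimal sketch, assuming `du`, `sort`, `tail` and `cut` behave as on a typical Linux host; the function name `largest_spool_entry` is illustrative:

```shell
# Hypothetical helper: print the largest entry under a spool directory
# (defaults to /var/spool) so the culprit can be confirmed before cleanup.
largest_spool_entry() {
    du -sk "${1:-/var/spool}"/* 2>/dev/null | sort -n | tail -n 1 | cut -f2
}

# Example (as root):
#   largest_spool_entry
#   largest_spool_entry /var/spool
```

If the result is not mqueue or clientmqueue, sendmail is probably not the cause and the cleanup above will not free much space.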
hg<->git conversion
This is a production system that RelEng built but that has not yet transitioned to full IT operation. As a production system, it is supported 24x7x365: escalate to IT oncall (who can page) as needed.
We'll get problem reports from 2 sources:
- via email from vcs2vcs user to release+vcs2vcs@m.c - see email handling instructions for those.
- via a bug report for a customer visible condition - this should only be if there is a new error we aren't detecting ourselves. See the resources below and/or page hwine.
Documentation for this system:
- recent docs
- source code: http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-tools/
- config files: http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-configs/
All services run as user vcs2vcs on one of the following hosts (as of 2013-01-07): github-sync1-dev.dmz.scl3.mozilla.com, github-sync1.dmz.scl3.mozilla.com, github-sync2.dmz.scl3.mozilla.com, github-sync3.dmz.scl3.mozilla.com.
Handling alert_major_errors
# SSH as yourself to the hostname in the 'from' address of the alert_major_errors email.
$ ssh yourname@github-sync3.dmz.scl3.mozilla.com
$ sudo su - vcs2vcs
$ cd etc
# find the repo name that vcs2vcs is complaining about. For example:
$ grep releases-mozilla-central-no-cvs *
job02_cmds:# "hg:$HOME/repos/releases-mozilla-central-no-cvs" "github"
# discover where that job runs
$ grep job02 status
job02_cmds,github-sync3.dmz.scl3.mozilla.com,m-c w/o cvs as used by b2g
# connect to that host the same as we did above (if not already connected)
# then
$ cd logs/job02 # same job as above
$ show_update_errors update.log
# Note: the command exit code precedes the command itself
# eg. ...;255;hg --cwd...
Continue with instructions here.
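The "discover where that job runs" step can be scripted. This is only a sketch: it assumes every line of the status file uses the comma-separated layout seen in the example output (job name, hostname, description), and `job_host` is a hypothetical name, not part of repo-sync-tools:

```shell
# Hypothetical helper: look up which host runs a given job.
# $1 = job prefix such as job02, $2 = path to the status file
job_host() {
    awk -F, -v job="$1" 'index($1, job) == 1 { print $2 }' "$2"
}

# Example, run from the etc directory as vcs2vcs:
#   job_host job02 status
```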
disable/reenable aurora updates
After merge day.
Disable
We need to disable aurora updates on merge day until aurora builds pass QA.
- RelMan sends email
- Write a patch like this; get review; land.
- reconfig
Reenable
After QA signs off, we'll get an email/bug about reenabling.
- To enable the previous nightly:
# ffxbld@aus3-staging
cd /opt/aus2/incoming/2/Firefox
rsync -av mozilla-aurora-test/ mozilla-aurora/
cd /opt/aus2/incoming/2/Fennec
rsync -av mozilla-aurora-test/ mozilla-aurora/
- Then, to reenable updates for further nightlies, revert the previous patch and reconfig.
- Update bouncer links for stub installer (increment the major version in each of these):
Upload
Python packages
ReleaseEngineering/How_To/Add_a_Python_Package_to_PuppetAgain#Step_by_step_instructions
How to upload to Tooltool
SSH to relengwebadm.private.scl3.mozilla.com, or any host with the relengweb volume mounted.
FILE=~/emulator.zip # or whatever you're uploading
export SHA512=`openssl sha512 $FILE | cut -d' ' -f2`
sudo mv -i $FILE /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
sudo chmod 644 /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
ls -l /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
Copy and save the file size (from ls -l) and the sha512 so you can add them to tooltool manifests later.
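To avoid copying the size and digest by hand, both can be captured in one step before the file is moved. The sketch below uses `sha512sum` rather than `openssl` and prints a JSON entry. The field names (`size`, `digest`, `algorithm`, `filename`) are an assumption about the manifest format, so check them against an existing manifest in the tree. `manifest_entry` is a hypothetical helper name:

```shell
# Hypothetical helper: print a tooltool-style manifest entry for a file.
# Run it before the file is moved and renamed to its digest.
manifest_entry() (
    file=$1
    size=$(wc -c < "$file" | tr -d ' ')
    digest=$(sha512sum "$file" | cut -d' ' -f1)
    printf '{"size": %s, "digest": "%s", "algorithm": "sha512", "filename": "%s"}\n' \
        "$size" "$digest" "$(basename "$file")"
)

# Example:
#   manifest_entry ~/emulator.zip
```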