CIDuty/Other Duties

From MozillaWiki

Tree Maintenance

Repo Errors

If a dev reports a problem pushing to hg (either m-c or try repo) then you need to do the following:

  • File a bug (or have the dev file it), then poke noahm in #ops
    • If he doesn't respond, then escalate the bug to page on-call
  • Follow the steps below for "How do I close the tree"

How do I see problems in TBPL?

All "infrastructure" (that's us!) problems should be purple at http://tbpl.mozilla.org. Some aren't, so keep your eyes open in IRC, but get on any purples quickly.

How do I close the tree?

See ReleaseEngineering/How_To/Close_or_Open_the_Tree

How do I claim a rentable project branch?

See ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE

Clean up the scheduler DB

Sometimes we get some jobs pending for days: https://secure.pub.build.mozilla.org/buildapi/pending

Here's how to clean them: TODO

Re-run jobs

How to trigger Talos jobs

see ReleaseEngineering/How_To/Trigger_Talos_Jobs

How to re-trigger all Talos runs for a build (by using sendchange)

see ReleaseEngineering/How_To/Trigger_Talos_Jobs

How to re-run a build

Do not go to the page of the build you'd like to re-run and cook up a sendchange to try to re-create the change that caused it. Changes without revlinks trigger releases, which is not what you want.

Find the revision you want, find a builder page for the builder you want (preferably, but not necessarily, on the same master), and plug the revision, your name, and a comment into the "Force Build" form. Note that you MUST specify the branch, so that there are no null keys in builds-running.js.

Nightlies

How do I re-spin mozilla-central nightlies?

To rebuild the same nightly, buildbot's Rebuild button works fine.

To build a different revision, Force build all builders matching /.*mozilla-central.*nightly/, on any of the regular build masters. Set revision to the desired revision. With no revision set, the tip of the default branch will be used, but it's probably best to get an explicit revision from hg.mozilla.org/mozilla-central. (For b2g, the revision set can only be the mercurial gecko revision.)

You can use https://build.mozilla.org/buildapi/self-serve/mozilla-central to initiate this build, using the changeset at the tip of http://hg.mozilla.org/mozilla-central. Sometimes the developer will request a specific changeset in the bug.

To respin just the android nightlies, find the revisions in the fennec*txt file here and here. Then kick off a build (specifying the revision in the revision field) for armv6 and armv7.

To start a new b2g Unagi nightly, force a build on a build master such as bm58. You may want to provide a value for the 'buildid' property such as 20130828155234 (which represents a Pacific timezone date/time).
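A buildid in that format can be generated from the shell. This is a minimal sketch, assuming the buildid is simply a YYYYMMDDHHMMSS timestamp (inferred from the example above) and that the America/Los_Angeles zone is the Pacific time intended; make_buildid is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print a buildid-style timestamp (YYYYMMDDHHMMSS)
# for the current time in the Pacific timezone.
# Assumption: the buildid is just this timestamp, as the example suggests.
make_buildid() {
    TZ=America/Los_Angeles date +%Y%m%d%H%M%S
}

# Print a value to paste into the 'buildid' property field.
make_buildid
```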

Disable updates

If you're asked to disable updates for any reason, you can do it on aus3-staging. Depending on what you're asked to shut off, you'll have to chmod a different directory (or directories) to 700. Log on to aus3-staging.mozilla.org with your LDAP account and use 'sudo su - ffxbld' (or tbirdbld) to gain the correct privileges. Some examples of shutting off different updates are below:

  • 64-bit Windows on the ux branch:
chmod 700 /opt/aus2/incoming/2/Firefox/ux/WINNT_x86_64-msvc
  • All updates on Nightly:
chmod 700 /opt/aus2/incoming/2/Firefox/mozilla-central
  • Linux (32-bit + 64-bit) updates on Aurora:
chmod 700 /opt/aus2/incoming/2/Firefox/mozilla-aurora/Linux_x86-gcc3 /opt/aus2/incoming/2/Firefox/mozilla-aurora/Linux_x86_64-gcc3
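The examples above all follow the same pattern: product, then branch, then an optional platform directory. A small wrapper might look like the sketch below. The function name disable_updates and the AUS_ROOT variable are hypothetical, introduced only for illustration; the directory layout is taken from the examples above.

```shell
# AUS_ROOT defaults to the path used in the examples above; override it
# to rehearse against a scratch directory. (Hypothetical variable.)
AUS_ROOT=${AUS_ROOT:-/opt/aus2/incoming/2}

# Hypothetical helper: set one or more update directories to mode 700.
# usage: disable_updates PRODUCT BRANCH [PLATFORM...]
disable_updates() {
    product=$1
    branch=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        # No platform given: shut off every update on the branch.
        chmod 700 "$AUS_ROOT/$product/$branch"
    else
        # One chmod per platform directory named on the command line.
        for platform in "$@"; do
            chmod 700 "$AUS_ROOT/$product/$branch/$platform"
        done
    fi
}
```

For instance, the Aurora Linux example would become `disable_updates Firefox mozilla-aurora Linux_x86-gcc3 Linux_x86_64-gcc3`, and the Nightly example `disable_updates Firefox mozilla-central`.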

Talos

Note: because a change to the Talos bundle always changes the baseline times, do the following for *any* change:

  1. close all trees that are impacted by the change
  2. ensure all pending builds are done and GREEN
  3. do the update step below
  4. send a Talos changeset to all trees to generate new baselines

How to update the talos zips

NOTE: Deploying talos.zip is no longer scary, because we don't replace the existing file and the a-team has to land a change in the tree.

You may need to get IT to turn on access to build.mozilla.org.

# on your localhost
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/talos.07322bbe0f7d.zip
export TALOS_ZIP="$(basename "$URL")"
wget "$URL"
# wget from people doesn't work anymore
export RELENGWEB_USER="$(whoami)"
# copy the zip to the relengweb volume and make it world-readable
scp "${TALOS_ZIP}" ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com:/mnt/netapp/relengweb/talos-bundles/zips
ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com "chmod 644 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
# print the checksum, then confirm the zip is served over HTTP
ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh cruncher "curl -I http://talos-bundles.pvt.build.mozilla.org/zips/${TALOS_ZIP}"

Note that you can get to root by running |sudo su -|

For talos.zip changes: Once deployed, notify the a-team and let them know that they can land at their own convenience.
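Before notifying anyone, it can help to compare the checksum printed by the remote sha1sum with one computed on your localhost. A minimal sketch, assuming sha1sum is installed locally; file_sha1 is a hypothetical helper name.

```shell
# Hypothetical helper: print only the SHA-1 hex digest of a file, so it
# can be compared against the digest printed by the remote sha1sum.
file_sha1() {
    sha1sum "$1" | cut -d' ' -f1
}

# Example, with TALOS_ZIP set as in the deployment steps:
#   file_sha1 "${TALOS_ZIP}"
```

If the two digests differ, the copy on the relengweb volume is not the file you downloaded.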

Updating talos for Tegras

To update talos on Android:

# for foopy05-20 and foopy22-24
csshX --login=cltbld foopy{05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,22,23,24}
cd /builds/talos-data/talos
hg pull -u

This will update talos on each foopy to the tip of default.
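Where csshX is not available, the same update could be run host by host over plain ssh. The sketch below is an assumption-laden alternative, not the documented procedure: foopy_hosts and update_talos_on_foopies are hypothetical names, and it assumes you can ssh to each foopy as cltbld using the same short host names that csshX uses.

```shell
# Print the foopy host names from the csshX command: 05-20 and 22-24.
foopy_hosts() {
    n=5
    while [ "$n" -le 24 ]; do
        # foopy21 is not in the original list, so skip it.
        if [ "$n" -ne 21 ]; then
            printf 'foopy%02d\n' "$n"
        fi
        n=$((n + 1))
    done
}

# Hypothetical non-interactive alternative to csshX: update talos on
# each foopy in turn. Defined here but not called automatically.
update_talos_on_foopies() {
    for host in $(foopy_hosts); do
        echo "== $host =="
        ssh "cltbld@$host" 'cd /builds/talos-data/talos && hg pull -u'
    done
}
```

Running the hosts one at a time is slower than csshX but leaves a per-host log of what hg pulled.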

B2G Emulator

How to update the emulator

  • Download the emulator package (the password is on our intranet):
curl -u b2g -o emulator.zip http://ec2-107-20-108-245.compute-1.amazonaws.com/jenkins/job/b2g-build/ws/package.zip
# enter password, wait for download to finish

Then upload the file to tooltool; see "How to upload to tooltool", below.

  • Test that the file is readable from your localhost:

curl -I http://runtime-binaries.pvt.build.mozilla.org/tooltool/sha512/${SHA512}

  • There will need to be an in-tree patch like this one to update the emulator; a-team will probably handle this.

B2G Dogfood promotion

https://intranet.mozilla.org/RelEngWiki/index.php/How_To/Perform_b2g_dogfood_tasks

TBPL

How to deploy changes

RelEng no longer has access to do this. TBPL devs will request a push from Server Ops.

How to hide/unhide builders

  • In the 'Tree Info' menu select 'Open tree admin panel'
  • Filter/select the builders you want to change
  • Save changes
  • Enter the sheriff password and a description (with bug number if available) of your changes
  • CC :edmorley & :philor on the relevant bug so that they know what to expect when sheriffing.

Ganglia

  • If a host is reporting to ganglia incorrectly, restarting gmond may be all it takes to fix it (e.g. bug 674233). Switch to root, then run:
service gmond restart

Queue Directories

If you see this in #build:

<nagios-sjc1> [54] buildbot-master12.build.scl1:Command Queue is CRITICAL: 4 dead items

It means that there are items in the "dead" queue for the given master. You need to look at the logs and fix any underlying issue and then retry the command by moving *only* the json file over to the "new" queue. See the Queue directories wiki page for details.
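The retry step can be sketched as a small helper. This is only an illustration: retry_dead_items is a hypothetical name, and the layout it assumes (a queue directory containing "dead" and "new" subdirectories) must be checked against the Queue directories wiki page before use. Run it only after the underlying issue is fixed.

```shell
# Hypothetical helper: move only the .json files from QUEUE_DIR/dead to
# QUEUE_DIR/new so the commands are retried. Logs and any other files
# stay in the dead queue.
# usage: retry_dead_items QUEUE_DIR
retry_dead_items() {
    queue_dir=$1
    for item in "$queue_dir"/dead/*.json; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$item" ] || continue
        # -i asks before overwriting an item already in the new queue.
        mv -i "$item" "$queue_dir/new/"
    done
}
```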

Cruncher

If you get an alert about cruncher running out of space it might be a sendmail issue (backed up emails taking up too much space and not getting sent out):

<nagios-sjc1> [07] cruncher.build.sjc1:disk - / is WARNING: DISK WARNING - free space: / 384 MB (5% inode=93%):

As root:

du -s -h /var/spool/*
# confirm that mqueue or clientmqueue is the oversized culprit
# stop sendmail, clean out the queues, restart sendmail
/etc/init.d/sendmail stop
rm -rf /var/spool/clientmqueue/*
rm -rf /var/spool/mqueue/*
/etc/init.d/sendmail start

hg<->git conversion

This is a production system that RelEng built but that has not yet transitioned to full IT operation. As a production system, it is supported 24x7x365; escalate to IT oncall (who can page) as needed.

We'll get problem reports from 2 sources:

  • via email from vcs2vcs user to release+vcs2vcs@m.c - see email handling instructions for those.
  • via a bug report for a customer visible condition - this should only be if there is a new error we aren't detecting ourselves. See the resources below and/or page hwine.

Documentation for this system:

All services run as user vcs2vcs on one of the following hosts (as of 2013-01-07): github-sync1-dev.dmz.scl3.mozilla.com, github-sync1.dmz.scl3.mozilla.com, github-sync2.dmz.scl3.mozilla.com, github-sync3.dmz.scl3.mozilla.com.

Handling alert_major_errors

# SSH as yourself to the hostname in the 'from' address of the alert_major_errors email.
$ ssh yourname@github-sync3.dmz.scl3.mozilla.com
$ sudo su - vcs2vcs
$ cd etc
# find the repo name that vcs2vcs is complaining about. For example:
$ grep releases-mozilla-central-no-cvs *
job02_cmds:#    "hg:$HOME/repos/releases-mozilla-central-no-cvs" "github"
# discover where that job runs
$ grep job02 status
job02_cmds,github-sync3.dmz.scl3.mozilla.com,m-c w/o cvs as used by b2g
# connect to that host the same as we did above (if not already connected)
# then
$ cd logs/job02 # same job as above
$ show_update_errors update.log
# Note: the command exit code precedes the command itself
# eg. ...;255;hg --cwd...

Continue with instructions here.

disable/reenable aurora updates

After merge day.

Disable

We need to disable aurora updates on merge day until aurora builds pass QA.

  • RelMan sends email
  • Write a patch like this; get review; land.
  • reconfig

Reenable

After QA signs off, we'll get an email/bug about reenabling.

  • To enable the previous nightly:
# ffxbld@aus3-staging
cd /opt/aus2/incoming/2/Firefox
rsync -av mozilla-aurora-test/ mozilla-aurora/
cd /opt/aus2/incoming/2/Fennec
rsync -av mozilla-aurora-test/ mozilla-aurora/
  • Then, to reenable updates for further nightlies, revert the previous patch and reconfig.

Upload

Python packages

ReleaseEngineering/How_To/Add_a_Python_Package_to_PuppetAgain#Step_by_step_instructions

How to upload to Tooltool

SSH to relengwebadm.private.scl3.mozilla.com, or any host with the relengweb volume mounted.

FILE=~/emulator.zip # or whatever you're uploading
export SHA512=`openssl sha512 $FILE | cut -d' ' -f2`
sudo mv -i $FILE /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
sudo chmod 644  /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
ls -l  /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}

Copy and save the filesize (from ls -l) and sha512 to add to tooltool manifests later.
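To avoid copying the two values by hand, they can be printed together. A minimal sketch, assuming sha512sum is available on the host (the upload steps use openssl instead); tooltool_entry is a hypothetical helper name, and its output format is just for convenience, not the tooltool manifest format.

```shell
# Hypothetical helper: print "<sha512> <size-in-bytes>" for a file.
# Assumption: sha512sum is installed; the upload steps use openssl.
tooltool_entry() {
    digest=$(sha512sum "$1" | cut -d' ' -f1)
    # wc -c pads its output on some systems, so strip the spaces.
    size=$(wc -c < "$1" | tr -d ' ')
    printf '%s %s\n' "$digest" "$size"
}
```

Run it on the file before the mv step, or on the final path under /mnt/netapp/relengweb/tooltool/pvt/build/sha512/ afterwards.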

How to upload Talos ZIPs

See How to update the talos zips.