CIDuty: Difference between revisions

From MozillaWiki
(Moved 2 links.)
 
(34 intermediate revisions by 12 users not shown)
__TOC__


= What is buildduty? =
= What is CIDuty? =
Every month, there is one person from the Release Engineering (releng) team dedicated to helping out developers with releng-related issues. This person will be available during his or her regular work hours for the whole month. This is similar to the sheriff role that rotates through the [[Sheriff|sheriffing team]]. To avoid confusion, the releng sheriff position is known as "'''buildduty'''."
CiDuty (formerly BuildDuty) is a team dedicated to helping developers with Firefox continuous integration (CI) infrastructure issues and enquiries. We currently have six people based in Romania who provide 24/7 support. CiDuty complements the [[Sheriff|sheriffing team]]: sheriffs respond to Firefox code regressions, while CiDuty responds to problems with the infrastructure that builds and tests Firefox code.


= Who is on buildduty? (schedule) =
Have a question or issue with the Firefox build and test infrastructure? CiDuty can help and will make sure your inquiry gets answered.
The person on buildduty should have 'buildduty' appended to their IRC nick, and should be available in the #developers, #releng, and #buildduty IRC channels.


Mozilla Releng Buildduty Schedule ([https://www.google.com/calendar/embed?src=mozilla.com_30qa9d8c380jrqi454kjo34624%40group.calendar.google.com&ctz=America/Toronto Google Calendar]|[https://www.google.com/calendar/ical/mozilla.com_30qa9d8c380jrqi454kjo34624%40group.calendar.google.com/public/basic.ics iCal]|[https://www.google.com/calendar/feeds/mozilla.com_30qa9d8c380jrqi454kjo34624%40group.calendar.google.com/public/basic XML])
= Communication =


== Buildduty not around? ==
As a 24/7 support team, CiDuty is available via IRC, email, and Bugzilla.
It happens, especially outside of standard North American working hours (0600-1800 PT). Please [https://bugzilla.mozilla.org/enter_bug.cgi?product=Release%20Engineering&component=Buildduty open a bug] under these circumstances.


= Buildduty priorities =
IRC:
== How should I make myself available for duty? ==
* #ci: look for 'ciduty' in the nick (CiDuty monitors other channels as well)
* Add 'buildduty' to your IRC nick
* Be available in the following IRC channels (at least): [irc://irc.mozilla.org/#developers #developers], [irc://irc.mozilla.org/#releng #releng], and [irc://irc.mozilla.org/#buildduty #buildduty] (as well as #mozbuild of course)
** also useful to be in [irc://irc.mozilla.org/#mobile #mobile] and [irc://irc.mozilla.org/#ateam #ateam]
** if you are in the middle of an outage, or need IT help, it is useful to be in [irc://irc.mozilla.org/#moc #moc], [irc://irc.mozilla.org/#infra #infra], and [irc://irc.mozilla.org/#sysadmins #sysadmins].
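A minimal sketch of spotting whoever is on duty, assuming you already have a channel's nick list as plain strings (for example from your IRC client's NAMES output). The sample nicks and the `|ciduty` suffix are hypothetical. The match is a case-insensitive substring, so the same helper covers the current 'ciduty' marker and the older 'buildduty' one.

```python
def find_on_duty(nicks, marker="ciduty"):
    """Return the nicks that contain the duty marker, ignoring case."""
    marker = marker.lower()
    return [nick for nick in nicks if marker in nick.lower()]


if __name__ == "__main__":
    # Hypothetical nick list, e.g. copied from a NAMES reply for #ci.
    names = ["riman|ciduty", "jlund", "Tomcat", "apop|CIDuty"]
    print(find_on_duty(names))                      # current convention
    print(find_on_duty(names, marker="buildduty"))  # older convention
```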


== What should I take care of? ==
Bugzilla:
=== Outages ===
* needinfo or assign the bug to ciduty@mozilla.com
Things fail. It's sad. Getting systems and services stood back up again is buildduty's top priority. Note: this doesn't mean you need to do all the work yourself. For big outages, rope in whatever help you need: domain experts from releng, managers, netops, relops...whoever.
* file under the [https://bugzilla.mozilla.org/enter_bug.cgi?product=Infrastructure%20%26%20Operations&component=CIDuty CIDuty] component if you are not sure where to file your CI-related ticket
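The "open a bug" links on this page are ordinary Bugzilla `enter_bug.cgi` URLs with the product and component pre-filled. A small sketch of building one with Python's standard library; `enter_bug_url` is a hypothetical helper name, not part of any Mozilla tool.

```python
from urllib.parse import quote, urlencode

ENTER_BUG = "https://bugzilla.mozilla.org/enter_bug.cgi"


def enter_bug_url(product, component):
    """Build a Bugzilla 'enter bug' link with product and component pre-filled."""
    # quote_via=quote encodes spaces as %20 and '&' as %26, like the links on
    # this page; urlencode's default (quote_plus) would encode spaces as '+'.
    query = urlencode({"product": product, "component": component}, quote_via=quote)
    return f"{ENTER_BUG}?{query}"


if __name__ == "__main__":
    print(enter_bug_url("Infrastructure & Operations", "CIDuty"))
```

The same helper reproduces the older Release Engineering :: Buildduty link by passing those two names instead.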


The [[ReleaseEngineering/Buildduty/Dealing With Outages|Dealing with Outages]] wiki has more instructions.
Email:
* ciduty@mozilla.com


=== Daily ===
= Manifesto =
==== Buildduty Triage ====
The [[ReleaseEngineering/Buildduty_manifesto| CIDuty manifesto]] describes the team's responsibilities in a nutshell.
The [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/buildduty_report.html Buildduty report] (generated hourly) should be your starting point for triage.


'''Note:''' Use the "View list in bugzilla" links in the buildduty report to navigate the bugs more easily.
= Team =


At the top, it lists unassigned bugs for loan requests. You should try to keep this queue empty to make sure developers are unblocked. The wiki has [[ReleaseEngineering/How_To/Loan_a_Slave|instructions for how to loan a slave]].
{| border=1
| '''Name'''
| '''Profile'''
| '''Social'''
| '''Blog'''
|-
| Jordan Lund
| [https://mozillians.org/u/jlund jlund]
| [https://github.com/lundjordan github]
| [http://jordan-lund.ghost.io/ blog]
|-
| Zsolt Fay
| [https://mozillians.org/en-US/u/zfay/ zfay]
| [https://github.com/Rivulu5 github]
| N/A
|-
| Radu Iman
| [https://mozillians.org/en-US/u/riman/ riman]
| [https://github.com/raduiman github]
| N/A
|-
| Bogdan Crisan
| [https://mozillians.org/en-US/u/bcrisan/ bcrisan]
| [https://github.com/bccrisan github]
| N/A
|-
| Danut Labici
| [https://mozillians.org/en-US/u/dlabici/ dlabici]
| [https://github.com/akhliskun github]
| N/A
|-
| Roland Mutter
| [https://mozillians.org/en-US/u/rmutter/ rmutter]
| [https://github.com/mutterroland github]
| N/A
|-
| Adrian Pop
| [https://mozillians.org/en-US/u/apop/ apop]
| [https://github.com/popadrianc github]
| N/A
|}


After loans are taken care of, make sure that bugs in the "No dependencies" section get dependencies filed, e.g. diagnosis bug, decomm bug, etc.
= CiDuty priorities =
The [[ReleaseEngineering/Buildduty_actionable| CiDuty actionable]] page lists the team's daily and weekly sanity checks.


Do the same for bugs in the "All dependencies resolved" section to make sure the next action is taken (re-image, decomm, return to production, etc).
= Documentation =
There's a [https://wiki.mozilla.org/CIDuty/How_To HowTo wiki page] that aggregates useful information about the tasks CiDuty takes care of (as of January 2019).


'''Note:''' systemic issues (e.g. test failures that require further investigation) should '''not''' stay in the buildduty Bugzilla component. Depending on how much time you have, it may be OK for you to take the bug and work on it, but generally these bugs should be moved to a more appropriate component (e.g. General Automation) once buildduty has triaged them.
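The triage order described in this section (unassigned loan requests first, then bugs with no dependencies, then bugs whose dependencies are all resolved) can be sketched as a small decision function. The `SlaveBug` fields are a hypothetical, simplified view of what the buildduty report shows, not its actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SlaveBug:
    """Hypothetical, simplified view of one bug listed in the buildduty report."""
    bug_id: int
    is_loan_request: bool = False
    assignee: Optional[str] = None
    open_deps: List[int] = field(default_factory=list)      # unresolved blockers
    resolved_deps: List[int] = field(default_factory=list)  # resolved blockers


def next_action(bug: SlaveBug) -> str:
    """Suggest the next triage step, following the order used on this page."""
    if bug.is_loan_request:
        # Unassigned loans come first so developers are unblocked.
        return "loan" if bug.assignee is None else "in progress"
    if not bug.open_deps and not bug.resolved_deps:
        # "No dependencies": file one, e.g. a diagnosis or decomm bug.
        return "file dependency"
    if not bug.open_deps:
        # "All dependencies resolved": re-image, decomm, return to production...
        return "take next step"
    return "wait on dependencies"


if __name__ == "__main__":
    queue = [SlaveBug(1, resolved_deps=[7]), SlaveBug(2, is_loan_request=True)]
    # Sorting on "not a loan request" puts loan requests first (False < True).
    for bug in sorted(queue, key=lambda b: not b.is_loan_request):
        print(bug.bug_id, next_action(bug))
```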
= Useful Links =
* [[ReleaseEngineering/Buildduty/day_1_checklist|Day 1 checklist]]
* [https://tools.taskcluster.net/provisioners Provision Explorer]
* [https://wiki.mozilla.org/Buildduty/How_To Public "How To" documents]
* [https://mana.mozilla.org/wiki/dosearchsite.action?queryString=title%3A%22How%20To%22&where=RelEng Private "How To" documents]


Aside from the buildduty report, there may also be [https://nagios.mozilla.org/releng-scl3/cgi-bin/status.cgi?host=all&servicestatustypes=28&hoststatustypes=15&serviceprops=270346&hostprops=270346 unacknowledged nagios alerts] in the #buildduty IRC channel. Deal with them, filing bugs as needed.
= Deprecated / Archived =
 
The following links and pages are out-of-date or not used anymore. They are still here for historical reasons.
==== Infrastructure performance ====
In addition to the individual slave bugs tackled in triage above, there may be systemic issues that need investigating. The [[ReleaseEngineering/Buildduty/Infrastructure_Performance|Infrastructure performance]] wiki has more details about how to do this, and links to the wiki page for [[ReleaseEngineering/How_To/Dealing_with_high_pending_counts|how to deal with high pending counts]].
 
=== Semi-Daily ===
* '''Reconfigs'''
** Run [[ReleaseEngineering/Buildduty/Reconfigs|reconfigs]] (every day or two days) for other relengers
*'''Review "long running" and "lazy" AWS instances'''
** ''When'': 2-3 times a week (e.g. Mondays or after weekends/holidays, Wednesdays, and Fridays)
** ''How'': use aws sanity check email (sent daily):
*** Email filter => to: release+aws-sanity-check@mozilla.com, subject: [cron] aws sanity check
*** for each host under heading "Long running instances", follow steps in dealing with [https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_AWS_slaves#Long_Running_Instances long running instances]
*** for each host under heading "Lazy long running instances", figure out why they're still up and not taking jobs
**** twistd.log, uptime, reboot history on the slave health page et al
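A sketch of the email filter described above, using Python's standard `email` package. The address and subject come from this page; matching the subject as a case-insensitive substring is an assumption, in case the cron job adds extra text to it.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses

SANITY_TO = "release+aws-sanity-check@mozilla.com"
SANITY_SUBJECT = "[cron] aws sanity check"


def is_aws_sanity_check(msg: Message) -> bool:
    """True if msg looks like the daily aws sanity check report."""
    recipients = [addr.lower() for _, addr in getaddresses(msg.get_all("To", []))]
    subject = (msg.get("Subject") or "").lower()
    return SANITY_TO in recipients and SANITY_SUBJECT in subject


if __name__ == "__main__":
    raw = (
        "To: Release <release+aws-sanity-check@mozilla.com>\n"
        "Subject: [cron] aws sanity check\n"
        "\n"
        "...report body...\n"
    )
    print(is_aws_sanity_check(message_from_string(raw)))
```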
 
=== Weekly ===
*'''Review AWS instances that have 'Unknown State/Type or have stopped for a while' '''
** ''When'':
*** Once a week, ideally on the same day each week (e.g. Fridays) so the reviews are evenly spaced.
** ''How'':
*** use aws sanity check email (sent daily):
**** Email filter => to: release+aws-sanity-check@mozilla.com, subject: [cron] aws sanity check
*** for each host under heading "Unknown State", "Unknown Type"
**** follow steps in dealing with [https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_AWS_slaves#Unknown_Type_Or_State_Instances unknown state or type] instances
*** for each host under heading "Stopped For A While"
**** follow steps in dealing with [https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_AWS_slaves#Stopped_For_A_While_Instances stopped for a while instances]
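The semi-daily and weekly reviews both work through headings in the same aws sanity check email. A sketch of that mapping as a lookup table: the runbook anchors are the ones linked from this page, while the table and `runbook_for` are illustrative only. "Lazy long running instances" has no runbook link here, so it maps to `None` (investigate by hand).

```python
from typing import Optional, Tuple

RUNBOOK = "https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_AWS_slaves"

# Heading in the aws sanity check email -> (runbook anchor, review cadence).
SECTIONS = {
    "Long running instances": ("Long_Running_Instances", "semi-daily"),
    "Lazy long running instances": (None, "semi-daily"),  # no runbook: check by hand
    "Unknown State": ("Unknown_Type_Or_State_Instances", "weekly"),
    "Unknown Type": ("Unknown_Type_Or_State_Instances", "weekly"),
    "Stopped For A While": ("Stopped_For_A_While_Instances", "weekly"),
}


def runbook_for(heading: str) -> Tuple[Optional[str], str]:
    """Return (runbook URL or None, cadence) for a sanity check heading."""
    anchor, cadence = SECTIONS[heading]
    url = f"{RUNBOOK}#{anchor}" if anchor else None
    return url, cadence


if __name__ == "__main__":
    for heading in SECTIONS:
        print(heading, runbook_for(heading))
```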


== Others ==
There is a long list of '''[[ReleaseEngineering/Buildduty/Other_Duties|other, less-frequent duties]]''' that buildduty can assist with.
* '''[[CIDuty/Other_Duties|other, less-frequent duties]]''' that CiDuty can assist with.
 
* [[ReleaseEngineering/How_To|Old/Deprecated Public "How To" documents]]
= Useful Links =
* [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/ Slave Health]
* [https://secure.pub.build.mozilla.org/buildapi/ Build Dashboard Main Page]
** You can get JSON dumps for people to analyze by adding <code>&format=json</code>
* [[ReleaseEngineering/How_To|Public "How To" documents]]
* [https://mana.mozilla.org/wiki/dosearchsite.action?queryString=title%3A%22How%20To%22&where=RelEng Private "How To" documents]
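Adding `&format=json` by hand works, but a small helper avoids producing a duplicate `format` parameter when the URL already has one. A sketch using `urllib.parse`; any query parameters other than `format` are whatever the dashboard page already carries.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_format(url: str) -> str:
    """Return url with format=json set, replacing any existing format value."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "format"]
    params.append(("format", "json"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(with_json_format("https://secure.pub.build.mozilla.org/buildapi/"))
```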
 
= Standard Bugs =
* For IT bugs that are marked "infra only" but still need to be readable by RelEng, adding the release@ alias is not enough: people get updates but cannot comment or read prior comments. Instead, cc the following:
** :bhearsum, :Callek, :catlee, :coop, :hwine, :jlund, :kmoir, :mrrrgn, :nthomas, :rail
** :Tomcat, :RyanVM, :KWierso


= Meeting Notes =
== Meeting Notes ==
* [https://releng.etherpad.mozilla.org/buildduty Daily buildduty stand-up notes]
Old meeting docs from BuildDuty era.
* [https://etherpad.mozilla.org/buildduty-notes Daily buildduty stand-up notes]
* [[ReleaseEngineering/Buildduty/Meetings|Old buildduty weekly meetings notes]]
* [[ReleaseEngineering/Buildduty/SVMeetings| SoftVision buildduty stand-up notes]]

Latest revision as of 15:09, 15 January 2019

What is CIDuty?

CiDuty (formerly BuildDuty) is a team dedicated to helping developers with Firefox continuous integration (CI) infrastructure issues and enquiries. We currently have six people based in Romania who provide 24/7 support. CiDuty complements the sheriffing team: sheriffs respond to Firefox code regressions, while CiDuty responds to problems with the infrastructure that builds and tests Firefox code.

Have a question or issue with the Firefox build and test infrastructure? CiDuty can help and will make sure your inquiry gets answered.

Communication

As a 24/7 support team, CiDuty is available via IRC, email, and Bugzilla.

IRC:

  • #ci: look for 'ciduty' in the nick (CiDuty monitors other channels as well)

Bugzilla:

  • needinfo or assign the bug to ciduty@mozilla.com
  • file under the CIDuty component if you are not sure where to file your CI-related ticket
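Assigning or needinfo-ing ciduty@mozilla.com can also be scripted against the Bugzilla REST API (`PUT /rest/bug/<id>`). A minimal sketch that only builds the request, assuming you have a personal Bugzilla API key; check the field names against the Bugzilla REST documentation before relying on them.

```python
import json
from urllib.request import Request

BMO_REST = "https://bugzilla.mozilla.org/rest"
CIDUTY = "ciduty@mozilla.com"


def ciduty_request(bug_id: int, api_key: str, assign: bool = False) -> Request:
    """Build (but do not send) a request that assigns or needinfos CiDuty."""
    if assign:
        body = {"assigned_to": CIDUTY}
    else:
        # needinfo is a Bugzilla flag; status "?" requests it from the requestee.
        body = {"flags": [{"name": "needinfo", "status": "?", "requestee": CIDUTY}]}
    return Request(
        f"{BMO_REST}/bug/{bug_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-BUGZILLA-API-KEY": api_key},
        method="PUT",
    )
```

Sending it is a matter of `urllib.request.urlopen(req)`; that call is left out so the sketch never touches the live server.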

Email:

  • ciduty@mozilla.com

Manifesto

The CIDuty manifesto describes the team's responsibilities in a nutshell.

Team

Name             Profile   Social   Blog
Jordan Lund      jlund     github   blog
Zsolt Fay        zfay      github   N/A
Radu Iman        riman     github   N/A
Bogdan Crisan    bcrisan   github   N/A
Danut Labici     dlabici   github   N/A
Roland Mutter    rmutter   github   N/A
Adrian Pop       apop      github   N/A

CiDuty priorities

The CiDuty actionable page lists the team's daily and weekly sanity checks.

Documentation

There's a HowTo wiki page that aggregates useful information about the tasks CiDuty takes care of (as of January 2019).

Useful Links

Deprecated / Archived

The following links and pages are out-of-date or not used anymore. They are still here for historical reasons.

Others

Meeting Notes

Old meeting docs from BuildDuty era.