Buildduty/manifesto: Difference between revisions

From MozillaWiki
m (Update BuildDuty to CiDuty. Renaming project)
(updated manifesto to reflect ciduty support and newly defined escalation methods)

Revision as of 23:06, 25 May 2018

Intro

CiDuty is an operational support team dedicated to monitoring and maintaining the health of Firefox’s continuous integration (CI) infrastructure. Team members are contractors located in Romania who provide 24/7 support. The team's responsibilities include, but are not limited to, the areas listed under "Things CiDuty can help with".

The team’s responsibilities cover a wide variety of tasks; however, team members are not deeply knowledgeable about any particular tool, worker, or task running in our infrastructure. They should therefore be treated as quick responders who can assess the state of the system in a timely manner and escalate issues and inquiries to the appropriate person.

Things CiDuty can help with

Firefox CI support and case management

First and foremost, CiDuty are "case managers" for your CI needs as a developer. They have escalation paths and a well-defined knowledge of the CI system as a whole. Given that, they are excellent at responding to issues and inquiries, and at making sure anything related to Firefox CI is triaged and managed appropriately.

Firefox CI infrastructure outage coordination and investigation

When the Firefox CI system fails, getting services online again is CiDuty's top priority. They are the initial point of contact for outages, but will likely escalate to additional teams with subject matter experts for resolution.

Monitoring, investigating, and debugging issues with the Linux, Windows, and OS X Firefox CI infrastructure

CiDuty monitors the Firefox CI infrastructure using the Nagios GUI and IRC alerts in the #ci IRC channel. They routinely look for system issues, resolve them using our automation tooling, or work with datacenter staff to repair offline or degraded hardware. They also monitor email from AWS about infrastructure that is degraded or requires maintenance.

Monitoring Firefox CI backlog/pending counts

CiDuty is the first point of contact for monitoring the load on the Firefox CI system and determining the cause of any high backlog or pending job counts. If they are unable to determine the root cause and solve the issue, CiDuty escalates to other teams who have subject matter experts.

Tree closing and opening

Closing and opening the trees (denying and allowing code check-ins to our Mercurial repos) is typically handled by the Mozilla Code Sheriffs, but CiDuty can also help out with this if needed.

Loaning Firefox build/test instances to developers

CiDuty processes Bugzilla requests from developers for Firefox CI build or test loaners. To obtain a loaner, submit a request to Bugzilla under CiDuty and expect a response in less than one working day (UTC+2).

Upload new packages or Python modules to our internal mirrors

CiDuty can help a developer who needs a new software package uploaded to tooltool or a Python package uploaded to our internal PyPI mirror. They can also grant other developers access to upload packages to tooltool, for a given subset of paths, to allow for future self-service.

Routine maintenance of the Firefox CI configuration

While most of the Taskcluster configuration is handled by the end developer, some of our infrastructure still uses Buildbot. CiDuty has the knowledge and capability to modify the Firefox buildbot-configs and perform general maintenance of the Buildbot systems. Maintenance includes tasks such as retasking machines from one platform to another as capacity requirements demand, decommissioning machines, and updating keys and secrets.

Things CiDuty are not responsible for

Fixing Firefox build and test tasks

While CiDuty have the skills to diagnose CI infrastructure health and make sure that the workers are in a good state, they are not knowledgeable about the internal logic of build and test tasks. They do, however, know who owns what and can help you escalate to the appropriate team.