TestEngineering/Performance/Sheriffing/Tree FAQ: Difference between revisions

Redirected page to Performance FAQ
(remove extra brackets)
(Redirected page to Performance FAQ)
 
#REDIRECT [[Performance FAQ|https://wiki.mozilla.org/TestEngineering/Performance/FAQ#Sheriffing]]
= Branch names and confusion =
We have a variety of branches at Mozilla. Here are the main ones that we see alerts on:
* Mozilla-Inbound (PGO, Non-PGO)
* Autoland (PGO, Non-PGO)
* Mozilla-Beta (all PGO)


Linux and Windows builds have [[TestEngineering/Performance/Sheriffing/Tree_FAQ#What_is_PGO|PGO]]; OSX does not.


When investigating alerts, always look at the Non-PGO branch first. Expect to find about 50% of the changes on Mozilla-Inbound and 50% on Autoland.


Be aware of the volume on each branch. Mozilla-Inbound and Autoland have higher volume, which means that alerts are generated faster and it is easier to track down the offending revision.


A final note: Mozilla-Beta is a branch where little development takes place. The volume is really low and alerts arrive 5 days (or more) later. It is important to address Mozilla-Beta alerts ASAP because that is what we are shipping to customers.


= What is coalescing =
* missing builds - we would have to generate builds, which automatically schedules test jobs; sometimes these test jobs are coalesced and not run.
* results might not be possible due to build failures, or test failures
* [[TestEngineering/Performance/Sheriffing/Tree_FAQ#What_is_PGO|PGO builds]] are not coalesced; they just run much less frequently. Most likely a PGO build isn't the root cause.
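
To illustrate why missing data matters when tracking down a regression, here is a minimal Python sketch. It assumes we already have an ordered list of pushes and a mapping of the pushes that produced a data point; the function name and the data shapes are hypothetical and are not part of Treeherder or Perfherder.

```python
def suspect_pushes(pushes, results, alerting_rev):
    """Return the pushes that could have introduced a change seen at alerting_rev.

    pushes       -- revisions in push order, oldest first (hypothetical input)
    results      -- dict mapping revision -> test value, only for pushes with data
    alerting_rev -- the revision on which the change first shows up
    """
    idx = pushes.index(alerting_rev)
    start = idx
    # Walk backwards over pushes that have no data (coalesced, failed, or missing
    # builds). Any of them may be the real culprit.
    while start > 0 and pushes[start - 1] not in results:
        start -= 1
    return pushes[start:idx + 1]


pushes = ["a", "b", "c", "d", "e"]
results = {"a": 100, "d": 110, "e": 111}
print(suspect_pushes(pushes, results, "d"))
```

In this made-up example the change shows up on "d", but "b" and "c" have no data, so all three pushes remain suspects until the missing jobs are run.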


Here is a view on treeherder of missing data (usually coalescing):


= What is an uplift =
Every [[RapidRelease/Calendar|6 weeks]] we release a new version of Firefox. When we do that, the code that developers check into the nightly branch gets uplifted (think of this as a large [[TestEngineering/Performance/Sheriffing/Tree_FAQ#What_is_a_merge|merge]]) to the Beta branch. Now all the code, features, and Talos regressions are on Beta.


This affects the Performance Sheriffs because we will get a big pile of alerts for Mozilla-Beta. These need to be addressed rapidly. Luckily, almost all the regressions seen on Mozilla-Beta will already have been tracked on Mozilla-Inbound or Autoland.


= What is a merge =
Many times each day we merge code from the integration branches into the main branch and back. This is a common process in large projects. At Mozilla, this means that the majority of the code for Firefox is checked into Mozilla-Inbound and Autoland, then merged into Mozilla-Central (also referred to as Firefox), and once merged, it gets merged back into the other branches. If you want to read more about this merge procedure, here are [[Sheriffing/How_To/Merges|the details]].


Here is an example of a view of what a merge looks like on [https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=126a1ec5c7c5 TreeHerder]:
Note that the topmost revision has the commit message "merge m-c to m-i". This is pretty standard, and you can see that there is a series of [https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=126a1ec5c7c5 changesets], not just a few related patches.


This affects alerts because when a regression lands on Mozilla-Inbound, it will be merged into Firefox, then Autoland. Most likely this means that you will see duplicate alerts on the other integration branch.


* note: we do not generate alerts for the Firefox (Mozilla-Central) branch.
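
One way to spot alerts that are likely duplicates caused by a merge is to group them by test and platform and see which groups appear on more than one integration branch. The sketch below is only an illustration: the alert dictionaries, their keys, and the branch identifiers are assumptions made for this example, not the Perfherder data model.

```python
from collections import defaultdict

# Assumed identifiers for the integration branches discussed above.
INTEGRATION_BRANCHES = {"mozilla-inbound", "autoland"}


def likely_merge_duplicates(alerts):
    """Return (test, platform) pairs alerted on more than one integration branch."""
    branches_by_key = defaultdict(set)
    for alert in alerts:
        if alert["branch"] in INTEGRATION_BRANCHES:
            key = (alert["test"], alert["platform"])
            branches_by_key[key].add(alert["branch"])
    return sorted(key for key, seen in branches_by_key.items() if len(seen) > 1)


alerts = [
    {"branch": "mozilla-inbound", "test": "tp5o", "platform": "linux64"},
    {"branch": "autoland", "test": "tp5o", "platform": "linux64"},
    {"branch": "mozilla-inbound", "test": "tsvgx", "platform": "windows7"},
]
print(likely_merge_duplicates(alerts))
```

A pair reported here is only a candidate. A sheriff would still confirm on the graph that the second alert lines up with the merge push.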


= What is a backout =
Many times we back out or hotfix code because it is causing a build failure or unittest failure. The [[Sheriffing/Sheriff_Duty|Sheriff team]] handles this process in general, and backouts/hotfixes are usually done within 3 hours of the original fix (i.e. we won't have [[TestEngineering/Performance/Sheriffing/Noise_FAQ#Why_do_we_need_12_future_data_points|12 future changesets]]). As you can imagine, we could get an alert 6 hours later, go to look at the graph, and see that there is no regression; instead there is a temporary spike for a few data points.


When looking for a backout on TreeHerder, note that they all mention a backout in the commit message:
* note ^ the above image mentions the bug that was backed out; sometimes it is the revision
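
Because backouts are identified by their commit message, a simple text check can flag candidate pushes. The following Python sketch assumes that messages contain some form of "backout" or "backed out" plus either a bug number or a changeset hash; the patterns and the function name are illustrative guesses, not an official convention or tool.

```python
import re

# Matches "backout", "back out", "backed out", "backing out" (case-insensitive).
BACKOUT_RE = re.compile(r"\bback(?:ed|ing)?\s?out\b", re.IGNORECASE)
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)
# Mercurial changeset hashes are usually shown as 12 to 40 hex characters.
CHANGESET_RE = re.compile(r"\b[0-9a-f]{12,40}\b")


def describe_backout(message):
    """Return the bugs and changesets named in a backout message, or None."""
    if not BACKOUT_RE.search(message):
        return None
    return {
        "bugs": [int(number) for number in BUG_RE.findall(message)],
        "changesets": CHANGESET_RE.findall(message),
    }


print(describe_backout("merge m-c to m-i"))
```

A merge message such as the one above is not reported as a backout, so the function returns None for it.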


Backouts which affect [[TestEngineering/Performance/Sheriffing/Alerts|Perfherder alerts]] always generate a set of improvements and regressions. These are usually easy to spot on the graph server, and we just need to annotate the set of alerts for the given revision as a 'backout', with the bug, to track what took place.
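
The "temporary spike" pattern described above can be approximated in code by checking whether the values settle back to the earlier baseline. This is a rough sketch under stated assumptions: the window of 12 points mirrors the "12 future data points" mentioned earlier, while the 2% tolerance, the 3-point settle window, and the function name are arbitrary choices for illustration, not how Perfherder detects alerts.

```python
from statistics import mean


def classify_change(values, change_index, window=12, tolerance=0.02, settle=3):
    """Classify a shift at change_index as a temporary spike or a sustained change."""
    before = values[max(0, change_index - window):change_index]
    after = values[change_index:change_index + window]
    if len(before) < settle or len(after) < window:
        return "insufficient data"
    baseline = mean(before)
    # Compare the last few points of the window with the old baseline.
    settled = mean(after[-settle:])
    if abs(settled - baseline) <= tolerance * abs(baseline):
        return "temporary spike"
    return "sustained change"


# Three elevated points followed by a return to the old level.
series = [100] * 12 + [110] * 3 + [100] * 9
print(classify_change(series, 12))
```

A "temporary spike" result is consistent with a backout, but it could also be a fix that landed quickly, so the push log still needs to be checked.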


Here is a view on graph server of what appears to be a backout (it could also be a fix that landed quickly):


= What is PGO =
PGO is [https://developer.mozilla.org/en-US/docs/Building_with_Profile-Guided_Optimization Profile Guided Optimization], where we do a build, run it to collect metrics, and optimize based on the output of the metrics. We only release PGO builds, and for the integration branches we do these periodically (6 hours) or as needed. For Mozilla-Central we follow the same pattern. As the builds take considerably longer (2+ times as long), we don't do this for every commit into our integration branches.


How does this affect alerts? We care most about PGO alerts, because that is what we ship! Most of the time an alert will be generated for a -Non-PGO build, and then a few hours or a day later we will see alerts for the PGO build.
* OSX does not do PGO builds, so we do not have -Non-PGO branches for those platforms. (i.e. we only have Mozilla-Inbound)
* PGO alerts will probably have different regression percentages, but the overall list of platforms/tests for a given revision will be almost identical
* [https://wiki.mozilla.org/index.php?title=Buildbot/Talos/Sheriffing/Tree_FAQ&action=edit duplicated & updated from old page]
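
Since the list of platforms/tests for a PGO alert is expected to be almost identical to the earlier Non-PGO alert, one simple check is to measure how much the two sets overlap. The sketch below uses a plain set-overlap ratio; the alert dictionaries and their keys are assumptions for this example, and no threshold for "almost identical" is implied by the text above.

```python
def alert_overlap(non_pgo_alerts, pgo_alerts):
    """Return the share of (platform, test) pairs common to both alert sets."""
    non_pgo = {(a["platform"], a["test"]) for a in non_pgo_alerts}
    pgo = {(a["platform"], a["test"]) for a in pgo_alerts}
    if not non_pgo and not pgo:
        return 1.0  # two empty sets are trivially identical
    return len(non_pgo & pgo) / len(non_pgo | pgo)


non_pgo = [
    {"platform": "linux64", "test": "tp5o"},
    {"platform": "windows7", "test": "tp5o"},
]
pgo = [{"platform": "linux64", "test": "tp5o"}]
print(alert_overlap(non_pgo, pgo))
```

A value close to 1.0 suggests the PGO alert is the same regression seen earlier on the Non-PGO build, even though the regression percentages will probably differ.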