ReferencePlatforms/How To/Setup a New Reference Platform

 
Congratulations. You have been chosen to set up a new reference platform. Armen summarized this journey as "It will be difficult". In addition to testing the new image on the test or build machines, you must take several other steps to ensure that our build infrastructure is ready to work with the new platform and its associated slaves.
Several tasks, noted in the checklist below, can be done ahead of time to make this easier. Many of them involve IT, so open bugs accordingly.
{{Release Engineering How To|Setup a New Reference Platform}}


== Do you need a new master? ==
=== If so, open bugs for the new master ===


If the answer to any of the questions above is no, you'll need to set up a new master, unless there are unused masters that are already provisioned and available to you. Open a bug with IT to bring up some VMs where you can install a new master (example: {{bug|782870}}). Read [[ReleaseEngineering/Master_Setup]] to understand the steps to set up a new master. That document also describes some bugs that need to be opened with various teams when setting up the new master, so read it now.


=== Open a bug to establish network flows to the sql server from the new master ===
=== Open bugs so you can puppetize the new master(s) and add them to productionmasters.json ===


{{bug|783455}} is an example. The buildmaster-production.pp file (puppet-manifests) needs the new masters added to its nodes so that you can puppetize them. The productionmasters.json file (tools) needs to list the new masters. I initially set them to disabled; they will be enabled when we are ready for production. Before the reconfig that enables the new master, add ssh keys and an updated authorized_keys file to the master. Once the reconfig is complete, start the new master.
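Flipping new masters from disabled to enabled in productionmasters.json can be scripted. The sketch below makes three assumptions. The file is a JSON list of objects. Each object has `name` and `enabled` keys. The master name `bm-new-tests` is hypothetical. Check the real file's schema before relying on this.

```python
import json


def set_masters_enabled(masters_json, names, enabled):
    """Return masters_json (a JSON string) with `enabled` set for the named masters.

    Assumes (hypothetically) a JSON list of objects with "name" and "enabled" keys.
    Raises KeyError if any requested master is not present in the file.
    """
    masters = json.loads(masters_json)
    found = set()
    for master in masters:
        if master.get("name") in names:
            master["enabled"] = enabled
            found.add(master["name"])
    missing = set(names) - found
    if missing:
        raise KeyError("masters not found: %s" % ", ".join(sorted(missing)))
    return json.dumps(masters, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical entry: a new master that was added in the disabled state.
    sample = json.dumps([{"name": "bm-new-tests", "enabled": False}])
    print(set_masters_enabled(sample, {"bm-new-tests"}, True))
```

Raising on unknown names guards against typos silently leaving a master disabled at reconfig time.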
 
== Open a bug for buildbot and puppet changes ==
 
There are changes needed to buildbotcustom, buildbot-configs and puppet-manifests to support the new platform - example {{bug|777759}}.  The buildbot-configs patch will have to wait to be released until all your testing is complete and the platform is ready to land in a reconfig.  The buildbotcustom and puppet-manifests changes can be landed at any time.
There are also changes required to enable a platform's tests on mozilla-central and its peer branches; see https://bugzilla.mozilla.org/show_bug.cgi?id=777759#c31 for an example.


The puppet-manifests change is to the modules/buildmaster/templates/BuildSlaves-tests.py.erb file. (From rail: the BuildSlaves-tests.py.erb file is in the puppet-manifests repo, but these machines run from the PuppetAgain masters.) You'll also need to update secrets.pp.template and secrets.pp on master-puppet1, and replicate these changes to the other masters. Changes to the PuppetAgain servers are deployed automatically; changes to the puppet-manifests servers are not. See [[ReleaseEngineering/Puppet/Usage#Deploy_changes]] for how to deploy changes to servers pulling from the old puppet-manifests repo.


== Open bugs for graph server changes ==


* Test machines and each type of build need graph server changes.
** Land changes to 'sql/data.sql' on the default branch of http://hg.mozilla.org/graphs (to match your inserts).
** If this is a new build platform, make sure that the graph server knows about the build platform.
*** Insert a machine name like %OS%_%branch% (e.g. "WINNT_5.2_mozilla-central" and "WINNT_5.2_mozilla-central_leak_test").
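The %OS%_%branch% naming convention can be generated mechanically. That helps keep the data.sql patch and the database request consistent. This is a minimal sketch. Whether a platform also needs the "_leak_test" variant is an assumption: it depends on whether that platform runs leak tests. The OS string must be whatever the graph server uses for the platform.

```python
def graph_machine_names(os_name, branches, leak_test=True):
    """Build graph-server machine names of the form %OS%_%branch%.

    If leak_test is True, also emit the "<name>_leak_test" variant for each branch
    (an assumption: only needed for platforms that run leak tests).
    """
    names = []
    for branch in branches:
        base = "%s_%s" % (os_name, branch)
        names.append(base)
        if leak_test:
            names.append(base + "_leak_test")
    return names


print(graph_machine_names("WINNT_5.2", ["mozilla-central"]))
# ['WINNT_5.2_mozilla-central', 'WINNT_5.2_mozilla-central_leak_test']
```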
 
We no longer have access to update the graph server. Instead, open a bug with the [https://bugzilla.mozilla.org/enter_bug.cgi?product=Data%20%26%20BI%20Services%20Team&component=Database%20Operations database operations team] to add the entries. {{bug|1131072}} is an example of such a request.


NOTE: If you inadvertently add an incorrect entry to the graph database, remove it so that it doesn't appear as an option on pages such as http://graphs.mozilla.org/graph.html.


== Open a bug for Treeherder changes ==
* Treeherder needs to be updated to support the new platform. File a bug [https://bugzilla.mozilla.org/enter_bug.cgi?product=Tree+Management&component=Treeherder%3A+Data+Ingestion here] as soon as the builder names are known, to avoid delays. Unlike TBPL, Treeherder needs regex support for the new builders before the jobs go live, because jobs are categorised during ingestion and not just in the UI layer.
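Because categorisation happens at ingestion, it can help to check that every new builder name is covered by some pattern before the jobs go live. This is a minimal sketch with three assumptions. The patterns are plain Python regular expressions. The builder names and patterns are hypothetical. The code does not reflect Treeherder's actual ingestion code.

```python
import re


def unmatched_builders(builder_names, patterns):
    """Return the builder names that none of the regex patterns match."""
    compiled = [re.compile(p) for p in patterns]
    return [name for name in builder_names
            if not any(rx.search(name) for rx in compiled)]


# Hypothetical patterns and builder names, for illustration only.
patterns = [r"^Rev3 WINNT 6\.1 ", r"^Rev3 WINNT 6\.2 "]
builders = [
    "Rev3 WINNT 6.2 mozilla-central opt test mochitest-1",
    "Rev4 MacOSX Mountain Lion 10.8 mozilla-central opt test reftest",
]
print(unmatched_builders(builders, patterns))
# ['Rev4 MacOSX Mountain Lion 10.8 mozilla-central opt test reftest']
```

An empty result means every listed builder would be picked up by at least one pattern.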


== Open a bug for buildfaster changes ==
As easy as this ([https://hg.mozilla.org/build/braindump/file/default/reports/buildfaster_report.py buildfaster_report.py]):
<pre>
        ('winxp', ['Rev3 WINNT 5.1']),
+        ('win8', ['Rev3 WINNT 6.2']),
        ]
</pre>
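The snippet above suggests that `_os_patterns` is a list of `(platform, [builder-name strings])` pairs. The sketch below shows one way such a list could classify builder names, assuming simple substring matching. The real report's matching logic may differ. A builder name that matches no entry gets no platform here. That is one way an unlisted platform can cause trouble.

```python
# Hypothetical excerpt modelled on the _os_patterns entries shown above.
_os_patterns = [
    ('winxp', ['Rev3 WINNT 5.1']),
    ('win8', ['Rev3 WINNT 6.2']),
]


def platform_for(builder_name, os_patterns=_os_patterns):
    """Return the first platform whose patterns occur in builder_name, else None.

    Assumes substring matching; the real report may match differently.
    """
    for platform, patterns in os_patterns:
        if any(p in builder_name for p in patterns):
            return platform
    return None


print(platform_for('Rev3 WINNT 6.2 mozilla-central opt test xpcshell'))  # win8
```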


== Slave health ==
Example changesets:
* https://hg.mozilla.org/build/tools/rev/5dbaa5080bcd
* https://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/f3eda2cc7d72
* https://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/128bcd9c0e58


== Disable tests on branches where this platform isn't needed ==
* {{bug|786424}} shows an example: it disabled Mountain Lion tests for mozilla-esr10.
* {{bug|803248}}: buildbot config changes to support panda_android*.


When testing, send changes (sendchange) both to the full superset of branches and to the branches you want to limit the tests to. That way there are no unexpected builds when the configuration runs in production.
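The sendchange check above compares two sets: the branches where the platform should run, and the branches that actually scheduled jobs for it. This is a minimal sketch built on a hypothetical per-platform exclusion table. The branch names and the `EXCLUDED` structure are illustrative only. The real buildbot-configs layout differs.

```python
# Hypothetical branch list and exclusions, for illustration only.
ALL_BRANCHES = ["mozilla-central", "mozilla-inbound", "mozilla-esr10"]
EXCLUDED = {"mountainlion": {"mozilla-esr10"}}


def branches_for(platform, all_branches=ALL_BRANCHES, excluded=EXCLUDED):
    """Return the branches the platform should run on, preserving order."""
    skip = excluded.get(platform, set())
    return [b for b in all_branches if b not in skip]


def unexpected_branches(platform, branches_with_jobs):
    """Return branches that scheduled jobs for the platform but should not have."""
    allowed = set(branches_for(platform))
    return sorted(set(branches_with_jobs) - allowed)


print(branches_for("mountainlion"))  # ['mozilla-central', 'mozilla-inbound']
print(unexpected_branches("mountainlion", ["mozilla-central", "mozilla-esr10"]))
# ['mozilla-esr10']
```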


== Slavealloc changes and cnames for slaves ==
* Open a bug with IT for the cnames for the slaves.  Example {{bug|782870}}


* Add the new slaves to slavealloc (initially disabled)


* Add the slave password to the slave_passwords table for the appropriate poolid and distro
** Reconfig  
*** to enable changes to buildbot-configs and make the platform available for tests {{bug|777759}}
*** to enable the new masters in productionmasters.json; see {{bug|783455}}
 
Once the slaves are in production, monitor the last job per slave at http://build.mozilla.org/builds/last-job-per-slave.html to make sure there are no problems with hung slaves.
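One way to act on that report is to flag slaves whose last job is older than some threshold. The sketch below works on a hypothetical mapping of slave name to last-job time. It does not fetch or parse the actual report page. The 6-hour threshold and the slave names are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def stale_slaves(last_job_by_slave, now, max_idle=timedelta(hours=6)):
    """Return slaves whose most recent job finished more than max_idle before now."""
    return sorted(name for name, finished in last_job_by_slave.items()
                  if now - finished > max_idle)


# Hypothetical data, for illustration only.
now = datetime(2012, 8, 20, 12, 0)
last_jobs = {
    "talos-mtnlion-r5-001": datetime(2012, 8, 20, 11, 30),
    "talos-mtnlion-r5-002": datetime(2012, 8, 19, 22, 0),
}
print(stale_slaves(last_jobs, now))  # ['talos-mtnlion-r5-002']
```

A slave that shows up here is not necessarily hung; it may just be idle. Treat the list as a starting point for investigation.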


== Notes on configuring the client ==
(Perhaps this should be moved to another section or document.)
* disable screensaver
* disable power savings
* test that resolution meets requirements set forth by devs.  This may require a dongle.
== Update releng monitoring/reporting ==
=== Wait times emails ===
Your new platform will appear as 'other' in the wait times emails unless you add a pattern match to the [https://hg.mozilla.org/build/buildapi buildapi libs]:
https://hg.mozilla.org/build/buildapi/file/default/buildapi/model/util.py
You will need to update the buildapi code in /home/buildapi/src on buildapi01.build.mozilla.org, and then restart the buildapi daemon for your code changes to take effect.
<pre>
# as buildapi@buildapi01
cd /home/buildapi/src
hg pull && hg up -r default
su -
# as root@buildapi01
/etc/init.d/buildapi restart
</pre>
=== Buildfaster ===
The buildfaster report will break unless you add your new platform to the list of _os_patterns in buildfaster_report.py:
https://hg.mozilla.org/build/braindump/file/ceaecd3c5b4f/reports/buildfaster_report.py#l73
=== BuildAPI ===
See https://bug770579.bugzilla.mozilla.org/attachment.cgi?id=747452
=== Slave health ===
TBD