CIDuty/SVMeetings/Mar13-Mar17: Difference between revisions

From MozillaWiki
Line 19: Line 19:
= Possible bugs =  
* Write a tool to update Bugzilla bugs for outstanding machine loans and ask if the loaner is still needed on, say, a weekly basis: https://bugzilla.mozilla.org/show_bug.cgi?id=1171165
* Long Inter-Job Timing on t-yosemite-r7 https://bugzilla.mozilla.org/show_bug.cgi?id=1258480
* Mitigating DoS Potential on BuildBot Masters web service on port 8201: https://bugzilla.mozilla.org/show_bug.cgi?id=1324802
* Create new script to setup a buildbot master for development: https://bugzil.la/1275135
** [alin]: Iris & Alison are working on this


= From coop: =
Line 44: Line 42:
* How is your workload these days? Do you feel like there is too much to do or too little? What percentage of time are you spending on operational requests vs writing patches for automation etc?
* How are things after the s3 outage yesterday?
= SV: 2017/03/13 - 2017/03/17 =
= SV: 2017/03/20 - 2017/03/24 =
* New buildduty team member: Sebastian Păcurar (:spacurar) \o/
** It seems that he isn’t in ‘mozilla-release’ mail group
** It would be nice if Sebi could get access to Nagios dashboard too (not urgent)
*** [alin] DONE
*** Kmoir - will ask arr about this
* Bug 1332337 - migrate machine health dashboard to releng services
*** [alin] there’s a bug on file for this, still not working atm
** Had some short meetings with Rok
 
** Created api.yml and api.py for test purposes at this phase
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1332337 Bug 1332337] - migrate machine health dashboard to releng services
** Plan:
** working on loading the information for the machine page ([https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=tst-linux32-spot&name=tst-linux32-spot-124 example])
*** create api.py,api.yml structure
** Need to speak with Rok after his return to clarify some possible issues
*** Redirect the frontend to use the information from api.py
 
*** Rewrite api.py
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1233170 Bug 1233170] - Slave health page: details page opens several/multiple http auth requests
** Did some tests on the frontend using the information from api.py. I was able to generate and load the main page (slavehealth) and the pool page (pool example), and am now working on loading the information for the machine page (example)
** Callek spoke with :jabba and added Aryx to vpn_sheriffs group
** Most of these changes are now in my GitHub repository: slave_health_tests
** Now he should be able to log in to that page
* Bug 1342825 - Don't run mochitest browser screenshots on PGO builds
 
** mochitest-browser-chrome tests run on PGO on Windows 7 ix and Windows 8 ix on the mozilla-central branch in buildbot. They can be found in Treeherder as Tier 3
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1338871 Bug 1338871] - Enable Talos tests for linux64-stylo builds
** Tried to find a way to remove the tests on Linux64/opt on nightlies, but I received an error from Task Graph when trying to generate graphs:
** Still investigating why the linux64-stylo talos jobs aren’t appearing on the master
*** There is an exception thrown when trying to generate graphs from the parameters.yml artifact of the task: “extra parameters: include_nightly”
** The last patch was not pushed, so for talos-chrome we have linux64-stylo/.*: [] instead of linux64-stylo/.*: ['mozilla-central']
*** This exception blocks me from generating the graphs I need to work on, since I need to disable mochitest-browser-chrome tests on nightlies.
 
*** Kmoir - attach the parameters.yml file and the patch to the bug and I will try to figure out what is wrong
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1338239 Bug 1338239] - Remove tc() prefix from platforms that no longer exist in buildbot
* Bug 1338239 - Remove tc() prefix from platforms that no longer exist in buildbot
** currently investigating this
** will need to find a way to differentiate platforms when setting up group symbols
* Releng tech-talks for the Buildduty team
** we really appreciate the initiative and feel that it’ll help us a lot
** Link: https://docs.google.com/document/d/1EMneZ8F9YpxljpajvlE6mYA_xHKSBrCXbQ8AXsy_rmA/
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1309812 Bug 1309812] - Windows 7 VM reftest jobs are not running by default if 'Windows 7' is not explicitly specified
** Asked catlee if we should have reftest jobs running by default on try, without specifying "Windows 7" in the try syntax.
*** Catlee answered that we should give it a try but we should keep a close eye on capacity and problems getting new instances from Amazon.
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1342825 Bug 1342825] - Don't run mochitest browser screenshots on PGO builds
** I already provided a patch to remove "linux64-nightly opt" variant in TaskCluster (not landed yet)
** I don't see an easy way to remove the pgo variant of mochitest-browser-screenshots.

Revision as of 09:39, 27 March 2017

Meeting Details

  • Upcoming vacation/PTO:
    • Alin:
    • Andrei:
    • Sebastian:
    • Florin:
    • Coop:
    • Kmoir:
  • Holidays:
    • Canada:
    • Romania:

Possible bugs

From coop:

From kmoir:

  • Other possible bugs:
    • Is there further automation that you can implement to make regular tasks easier, e.g. the machine loan email as a template?
    • [alin]: as discussed in Hawaii, we’ll start investigating what improvements can be done here
  • Other goals
    • Puppet master upgrades: we're on a really old version, and should have a better story for upgrades going forward. We can move a single platform to start (kmoir needs to open a bug here and ask arr about the current state)
    • Talked to Rok this week re moving releng web properties to releng services. He said that his documentation is ready to use and he is willing to pair program with people as needed. We could move one of these projects (slave health, buildapi, slavealloc) as a start, depending on what you think is most useful. https://docs.mozilla-releng.net/.
  • Priorities update. Lawrence had a meeting with his mgmt team in Toronto recently and established the following ordered priorities:
    • 1. Data centre move - two different data centres (will ask Amy for plan)
    • 2. OSX on TC (we need to do this so we don’t have to move mac builders into new DC)
    • 3. Quantum
    • 4. Security, Disaster recovery (part of having two data centres now). Securing the autoland pipeline
    • 5. Phase out aurora (Q1 planning)
    • 6. Reduce orange factor
    • 7. Blocking bad commits
    • 8. TC for Linux (almost done); talos still needs to be done
    • 9. TC Windows
  • How is your workload these days? Do you feel like there is too much to do or too little? What percentage of time are you spending on operational requests vs writing patches for automation etc?
  • How are things after the s3 outage yesterday?

SV: 2017/03/20 - 2017/03/24

  • New buildduty team member: Sebastian Păcurar (:spacurar) \o/
    • It would be nice if Sebi could get access to Nagios dashboard too (not urgent)
      • Kmoir - will ask arr about this
      • [alin] there’s a bug on file for this, still not working atm
  • Bug 1332337 - migrate machine health dashboard to releng services
    • working on loading the information for the machine page (example)
    • Need to speak with Rok after his return to clarify some possible issues
  • Bug 1233170 - Slave health page: details page opens several/multiple http auth requests
    • Callek spoke with :jabba and added Aryx to vpn_sheriffs group
    • Now he should be able to log in to that page
  • Bug 1338871 - Enable Talos tests for linux64-stylo builds
    • Still investigating why the linux64-stylo talos jobs aren’t appearing on the master
    • The last patch was not pushed, so for talos-chrome we have linux64-stylo/.*: [] instead of linux64-stylo/.*: ['mozilla-central']
  • Bug 1338239 - Remove tc() prefix from platforms that no longer exist in buildbot
    • currently investigating this
    • will need to find a way to differentiate platforms when setting up group symbols
  • Bug 1309812 - Windows 7 VM reftest jobs are not running by default if 'Windows 7' is not explicitly specified
    • Asked catlee if we should have reftest jobs running by default on try, without specifying "Windows 7" in the try syntax.
      • Catlee answered that we should give it a try but we should keep a close eye on capacity and problems getting new instances from Amazon.
  • Bug 1342825 - Don't run mochitest browser screenshots on PGO builds
    • I already provided a patch to remove "linux64-nightly opt" variant in TaskCluster (not landed yet)
    • I don't see an easy way to remove the pgo variant of mochitest-browser-screenshots.