DeveloperServices/TeamMeetings/2014-08-12

From MozillaWiki
Latest revision as of 19:38, 12 August 2014

Meeting Info

Hot items

  • Still seeing intermittent sync failures (bug 1038678); the ssh timeout was tweaked on zeus, but mirror-pull could still be made more resilient
  • Additional hgweb nodes? (bug 1049519) Two spares were added, but how many should we have?
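
One way mirror-pull could be made more resilient is to retry a timed-out or failed pull with exponential backoff instead of giving up after the first ssh timeout. This is a hypothetical sketch: `pull_with_retry` and its parameters are illustrative names, not part of mirror-pull, and the attempt count, timeout, and delays are placeholder values rather than tuned numbers. The runner and sleep functions are injectable so the retry logic can be exercised without a network.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def pull_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    timeout: float = 300.0,
    base_delay: float = 5.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run a pull command, retrying on timeout or non-zero exit.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            result = run(cmd, timeout=timeout, capture_output=True, text=True)
            if result.returncode == 0:
                return result
            # Non-zero exit (e.g. hg aborting on a dropped ssh connection).
            last_error = subprocess.CalledProcessError(
                result.returncode, cmd, result.stdout, result.stderr
            )
        except subprocess.TimeoutExpired as exc:
            # The command hung past `timeout` seconds and was killed.
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise last_error
```

For example, `pull_with_retry(["hg", "pull", "-R", "/path/to/mirror"])` would make up to three attempts, sleeping 5 s and then 10 s between them; the repository path there is a placeholder.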

Last week

  • bkero
    • Deployed user repository fixes
    • Deployed serverlog extension on cluster, debugged
    • Built and installed Python debugging packages on hgweb1
    • Diagnosed some traffic statistics and reported them verbally on IRC
  • fubar
    • Added two build trees to DXR! Also got staging working again, moved all cron and config bits into the build repo, and refactored the build script
    • Two new hgweb nodes provisioned (9 & 10); added new webhead docs
    • Configured local2 syslog logging for new pash_wrapper and gps' extensions
  • hwine
    • on call last week: only one late page. Clarified how unimportant the current Nagios alert is (it is a leading indicator, and about 80% of its alerts are false positives)
    • started releng intern Mihai Tabara on examining logs near the start of each event to find the root cause of issues
    • installed pash_wrapper for ssh
  • laura
    • more headcount justifications
    • approval from bmoss for the extra blades
  • erik
    • Chased hgweb spins around. With bkero, got debug symbols installed on hgweb1 to pull Python tracebacks from running processes. What actually seems to be the case (n=2) is that the spins happen inside apr_poll()→poll() in mod_wsgi, which is weird. I'd like to get a few more backtraces out of a spinning webhead to be sure that wasn't a fluke. Did a bunch of theorizing around mod_wsgi spin causes, directives we could frob, etc.
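
The tracebacks above were pulled from running processes using debug symbols. As a complementary in-process option, Python's standard `faulthandler` module can dump every thread's Python stack when a signal arrives, without attaching a debugger. This is a minimal sketch, assuming the hgweb WSGI startup script could call the hypothetical `install_traceback_dumper` helper and that SIGUSR2 is otherwise unused by the hosting server (Apache httpd already uses SIGUSR1 for graceful restarts). It only shows Python frames, so a spin inside mod_wsgi's C code, like the apr_poll()→poll() case, would still need native backtraces.

```python
import faulthandler
import signal
import sys
from typing import TextIO


def install_traceback_dumper(signum: int = signal.SIGUSR2, out: TextIO = sys.stderr) -> None:
    """Dump the Python traceback of every thread to `out` whenever `signum` arrives.

    Hypothetical helper for illustration. faulthandler installs a C-level
    signal handler that writes directly to out.fileno(), so it does not wait
    for the interpreter to run Python code. Unix-only: faulthandler.register
    is not available on Windows.
    """
    faulthandler.register(signum, file=out, all_threads=True, chain=False)
```

Sending `kill -USR2 <pid>` to a stuck process would then write its Python stacks to the chosen file.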

Planned for this week

  • bkero
    • Update wsgi, deploy
    • Tune wsgi settings to see if it alleviates unavailability
    • Parse logs for patterns/errors/statistics
    • Push for hg update
  • fubar
    • ReviewBoard web heads/admin node
    • hg firefighting
    • more build repos in DXR
  • hwine
    • add more monitoring and correlation
  • laura
    • get help from srich for rb deployment
    • status board bug
    • other than that: what's the most helpful thing I can do?
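
For the planned log parsing, a first pass could tally status codes and group server errors by path. This sketch assumes Apache's Common Log Format, which may not match the actual hgweb or serverlog output; `summarize` and the regex are illustrative, not an existing tool.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Union

# Assumed Common Log Format: host ident user [time] "METHOD path PROTO" status size
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines: Iterable[str]) -> Dict[str, Union[Counter, int]]:
    """Count responses by status code and 5xx responses by path (hypothetical helper)."""
    statuses: Counter = Counter()
    error_paths: Counter = Counter()
    unparsed = 0
    for line in lines:
        m = CLF.match(line)
        if not m:
            # Keep a count of lines that don't fit the assumed format.
            unparsed += 1
            continue
        status = int(m["status"])
        statuses[status] += 1
        if status >= 500:
            # Drop the query string so all ?cmd=... requests for a repo group together.
            error_paths[m["path"].split("?", 1)[0]] += 1
    return {"statuses": statuses, "error_paths": error_paths, "unparsed": unparsed}
```

Stripping the query string trades per-command detail for a shorter list. Counting per `cmd=` value instead would be a small change if the errors turn out to cluster on particular wire-protocol commands.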

Other business

PTOs, etc

Links

Goals