DeveloperServices/TeamMeetings/2014-08-12: Difference between revisions

(Hal's updates)
 
(7 intermediate revisions by 4 users not shown)
Line 14: Line 14:
== Last week ==
* bkero
** Deployed user repository fixes
** Deployed serverlog extension on cluster, debugged
** Built packages and installed Python debugging packages, then installed them on hgweb1
** Diagnosed some traffic statistics and reported them verbally (over IRC)
* fubar
** Added two build trees to DXR! Also, staging working again, all cron and config bits now in [http://hg.mozilla.org/webtools/dxr build repo], and build script refactored
Line 23: Line 25:
** oncall last week: only one late page; clarified how unimportant the current Nagios alert is (it's a leading indicator with a false-positive rate of about 80%)
** started releng intern Mihai Tabara on looking at logs near the start of event to find root cause of issues.
** installed pash_wrapper for ssh
* laura
** more headcount justifications
** approval from bmoss for the extra blades
* erik
** Chased hgweb spins around. With bkero, got debug symbols installed on hgweb1 to pull Python tracebacks from running processes. What actually seems to be the case (n=2) is that the spins happen while in apr_poll()→poll() in mod_wsgi, which is weird. I'd like to get a few more backtraces out of a spinning webhead to be sure that wasn't a fluke. Did a bunch of theorizing about mod_wsgi spin causes, directives we could frob, etc.


=== Planned for this week ===
* bkero
** Update wsgi, deploy
** Tune wsgi settings to see if it alleviates unavailability
** Parse logs for patterns/errors/statistics
** Push for hg update
* fubar
** ReviewBoard web heads/admin node
Line 36: Line 45:
** more added monitoring and correlation.
* laura
** get help from srich for the ReviewBoard (rb) deployment
** status board bug
** other than that: what's the most helpful thing I can do?


== Other business ==