DeveloperServices/TeamMeetings/2014-08-12: Difference between revisions
Revision as of 19:09, 12 August 2014
Meeting Info
Hot items
- Still seeing intermittent sync failures (bug 1038678); ssh timeout tweaked on zeus, but mirror-pull could still stand to be more resilient
- Additional hgweb nodes? (bug 1049519) Added two spares, but how many should we have?
Last week
- bkero
- Deployed user repository fixes
- Deployed serverlog extension on cluster, debugged
- Built Python debugging packages and installed them on hgweb1
- Analyzed some traffic statistics and reported on them verbally (over IRC)
- fubar
- Added two build trees to DXR! Also, staging working again, all cron and config bits now in build repo, and build script refactored
- Two new hgweb nodes provisioned (9 & 10); added new webhead docs
- Configured local2 syslog logging for new pash_wrapper and gps' extensions
- hwine
- On call last week; only one late page. Clarified how unimportant the current nagios alert is (it's a leading indicator with a false-positive rate of about 80%)
- Started releng intern Mihai Tabara on looking at logs from near the start of an event to find the root cause of issues.
- installed pash_wrapper for ssh
- laura
- more headcount justifications
- approval from bmoss for the extra blades
- erik
- Chased hgweb spins around. With bkero, got debug symbols installed on hgweb1 to pull Python tracebacks from running processes. The spins actually seem to happen while in apr_poll()→poll() in mod_wsgi, which is weird. I'd like to get a few more backtraces out of a spinning webhead to be sure that wasn't a fluke. Did a bunch of theorizing around mod_wsgi spin causes, directives we could frob, etc.
Planned for this week
- bkero
- Update wsgi, deploy
- Tune wsgi settings to see if that alleviates unavailability
- Parse logs for patterns/errors/statistics
- Push for hg update
- fubar
- ReviewBoard web heads/admin node
- hg firefighting
- more build repos in DXR
- hwine
- add more monitoring and correlation
- laura
- what's the most helpful thing I can do?