Breakpad/Status Meetings/2010-Feb-10: Difference between revisions

= Production Status =
* Are we 100% caught up?
** CrashKill is having a tough time with TCBS crash movement
** Were all crash volumes artificially reduced and then later increased during recovery?
** Effects of date processed in prod since 1/28
** Everything is based on date_processed, right? Do we need to adjust based on client crash date?
** Next steps?
*** Yes, we are caught up as of Tuesday 10am
*** Some crashes may have been lost between Sunday and Tuesday because the Sun filer was slower
*** We're on a FreeBSD filer and back to normal now
* {{Bug|544583}} - ADU service: who owns it? What priority?
** Do we want to do a bug fix release before 1.5?
*** Case by case (?)


= OOPP Blocker =
== Next Steps ==
* chofmann to file a bug on a specific 'jumpy' crash signature (top 10 -> top 300)
* {{Bug|544583}} - lower priority, we'll timebox investigation
* {{Bug|545035}} - Daily ADU/Crash bug - griswolf is on it
 
= OOPP Blocker Post Mortem =
* I suck -- [[User:Aking|Aking]] 23:39, 8 February 2010 (UTC)
** More QA needed?
** Platform testing took place in production; work on cross-team communication
* A testing error or an invalid stage environment slowed the release
** Deployed to prod and then rolled back
* Deployment issues
** Stage and Prod release documentation
= Homepage Slow Queries =
* bandaid in place
* pgfouine | database administration
* {{Bug|545006}} - More RAM
* Do we need to trim TCBS data?
** if yes {{Bug|545004}}, {{Bug|545000}} who/when?
== Next Steps ==
* aravind to provide slow query logs
** ozten will set up a daily report
** webdev and IT to proactively monitor Postgres health
* morgamic to evaluate memory purchase and prod requirements
= Security Bugs =
* {{Bug|543921}} - next steps or are we done for now?
* URL reports not useful in current state
* Reports generated outside of Socorro have superseded these
* Urchin shows a couple dozen views of bydomain and 120+ views of byurl
== Next Steps ==
* ozten to file a bug to put all URL reports behind auth


= CrashKill =
* {{Bug|545019}} - Update prod nav product versions
* Others covered in prod status above
= 1.5 Release Status =
* Review bug list
* UI status
* HBase (Hadoop) status
* Impediments, blockers, etc. for webdevs?
== Next Steps ==
* UI on stage needs more QA and feedback
** morgamic to talk to QA
** ozten to email mozilla.dev.planning
= Hadoop Project status =
* Metrics: any issues or questions?
= ADU tweaks =
* [https://bugzilla.mozilla.org/show_bug.cgi?id=539337 539337] - Need requirements
== Next Steps ==
* ken or chofmann to comment on bug
= SkipList =
* No new SkipList as of last Monday

Latest revision as of 00:33, 11 February 2010

Production Status

  • Are we 100% caught up?
    • CrashKill is having a tough time with TCBS crash movement
    • Were all crash volumes artificially reduced and then later increased during recovery?
    • Effects of date processed in prod since 1/28
    • Everything is based on date_processed, right? Do we need to adjust based on client crash date?
    • Next steps?
      • Yes, we are caught up as of Tuesday 10am
      • Some crashes may have been lost between Sunday and Tuesday because the Sun filer was slower
      • We're on a FreeBSD filer and back to normal now
  • bug 544583 - ADU service: who owns it? What priority?
    • Do we want to do a bug fix release before 1.5?
      • Case by case (?)

Next Steps

  • chofmann to file a bug on a specific 'jumpy' crash signature (top 10 -> top 300)
  • bug 544583 - lower priority, we'll timebox investigation
  • bug 545035 - Daily ADU/Crash bug - griswolf is on it

OOPP Blocker Post Mortem

  • I suck -- Aking 23:39, 8 February 2010 (UTC)
    • More QA needed?
    • Platform testing took place in production; work on cross-team communication
  • A testing error or an invalid stage environment slowed the release
    • Deployed to prod and then rolled back
  • Deployment issues
    • Stage and Prod release documentation

Homepage Slow Queries

Next Steps

  • aravind to provide slow query logs
    • ozten will set up a daily report
    • webdev and IT to proactively monitor Postgres health
  • morgamic to evaluate memory purchase and prod requirements

Security Bugs

  • bug 543921 - next steps or are we done for now?
  • URL reports not useful in current state
  • Reports generated outside of Socorro have superseded these
  • Urchin shows a couple dozen views of bydomain and 120+ views of byurl

Next Steps

  • ozten to file a bug to put all URL reports behind auth

CrashKill

  • bug 545019 - Update prod nav product versions
  • Others covered in prod status above

1.5 Release Status

  • Review bug list
  • UI status
  • HBase (Hadoop) status
  • Impediments, blockers, etc. for webdevs?

Next Steps

  • UI on stage needs more QA and feedback
    • morgamic to talk to QA
    • ozten to email mozilla.dev.planning

Hadoop Project status

  • Metrics: any issues or questions?

ADU tweaks

Next Steps

  • ken or chofmann to comment on bug

SkipList

  • No new SkipList as of last Monday