Breakpad/Status Meetings/2015-08-19


Meeting Info

Breakpad status meetings occur on Wed at 11:00am Pacific Time.

Conference numbers:

   Vidyo: Stability 
   650-903-0800 x92 conf 98200#
   800-707-2533 (pin 369) conf 98200# 

IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)

Operations Updates

  • stage is working again
    • fair and balanced submitter is running
      • 10% of what we receive is now going to stage
    • adi is coming in now
  • JSON problems recurred in prod
    • stage has a patch that filters keys as well as values
    • manually submitting a problematic crash leads us to believe stage is fixed
    • write out the null bytes
  • updating supersearchfields is not working
    • adrian working on a patch
  • datadog runs on UDP
    • we've been losing windows of time
  • webapp has been registering as down to pingdom/nagios
    • the webapp shows high response times, but it is still serving requests in those windows
    • can we get newrelic to look?
      • talk to Travis
      • we need 8 hosts for a while
    • moztrap was seeing this
      • cache was acclimating pingdom to fast responses
      • whole cache was invalidating at once
    • we are not actually down at these times; response times may just be exceeding the timeouts
    • the only thing more dangerous than no alerting is noisy alerting
  • pingdom accounts for the team?
    • going to take approx 1 month to set us up for pingdom on the mozilla account
  • sentry is down
    • cannot ingest
    • should be back by the end of the day
  • loggly
    • django errors don't end up in syslog; they don't go to stdout
    • maybe we don't want to continue with loggly
      • the features it has over other log aggregators are not that useful
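
The key-and-value filtering mentioned under the JSON problems item could look roughly like the sketch below. This is not the actual patch on stage; the function name `strip_null_bytes` is hypothetical, and it assumes the problem is NUL characters (`\x00`) appearing in both dictionary keys and string values of a crash's JSON.

```python
def strip_null_bytes(value):
    """Recursively remove NUL characters from strings, dict keys, and nested containers.

    Hypothetical helper for illustration only; not the real Socorro patch.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        # Clean the keys as well as the values; cleaning values alone
        # would leave NUL characters in the key names.
        return {strip_null_bytes(k): strip_null_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_null_bytes(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value
```

One caveat with this approach: two keys that differ only by a NUL character collapse into a single key after cleaning, so one of the values is silently dropped.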


Other

  • How many EC2 webhead nodes do we have in prod?
    • Wanna scale that down?

Project Updates

Socorro Bug Tracker
this week's bugs

Deployment Triage

PR Triage

QA

  • Help tracking down an intermittent failure
    • ReadTimeout: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out. (read timeout=10)
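
One way to gather data on an intermittent timeout like the one above is to wrap the call in a retry helper that records how long each attempt took. The sketch below is an assumption about how a test could do this, not something the QA suite is known to contain; `call_with_retries` and its parameters are hypothetical names.

```python
import time


def call_with_retries(func, retry_on=(TimeoutError,), attempts=3,
                      base_delay=0.5, sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with exponential backoff.

    Returns (result, timings), where timings holds the seconds each attempt took.
    Re-raises the exception if the final attempt also fails.
    Hypothetical helper for illustration only.
    """
    timings = []
    for attempt in range(attempts):
        start = time.monotonic()
        try:
            result = func()
        except retry_on:
            timings.append(time.monotonic() - start)
            if attempt == attempts - 1:
                raise
            # Back off: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
        else:
            timings.append(time.monotonic() - start)
            return result, timings
```

With the `requests` library, the wrapped call could be something like `lambda: requests.get(url, timeout=10)` with `retry_on=(requests.exceptions.ReadTimeout,)`. The recorded timings show whether the failing attempts sit just over the 10-second read timeout or far beyond it.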

Other Business

Travel, etc

Links