Unified Telemetry/Status reports/August 7 2015: Difference between revisions

[https://wiki.mozilla.org/Status_reports/July_31_2015 previous week's report]
[https://wiki.mozilla.org/Unified_Telemetry/Status_reports/July_31_2015 previous week's report]


== Unified Telemetry status report August 7, 2015 ==
Last week: Yellow

This week: Red
This week: Yellow - The primary risk is now the executive report. We're still debugging the discrepancies between the v4 and the v2 versions. We will not turn off v2 data if we don't have confidence in the executive report.


=== Exec Summary ===
* Some outstanding issues from the July 30 milestone were missed, so not all data validation is complete yet; the remaining items moved to August 10 (the team has adopted Firefox iterations to align with the greater org)
* Data quality, validation
* Final client changes needed by August 10
** Team is focusing on executive dashboard roll ups in validation effort
* Testing plan up on wiki:[[Telemetry/Testing]]
** added client probes to beta population to narrow down missing pings question and a few other issues
* Added a Windows drill down for executive report data
* Starting work on executive dashboard with combined v2 + v4 data
* Healthreport content is up and work to use the new API is underway (no longer a big risk)
* '''Proposal''': if we don't feel confident turning off v2 data for Fx41, collect v4 data for 5% of the population


=== Risks/Issues ===
=== Accomplished for Last Period ===
Engineering & Ops
* Unexpected jump in traffic last Friday (beta and release); doubled instance size on Monday
* FHR Jelly PR
* Databricks meeting
* Client work: [https://docs.google.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing Spreadsheet]
* Data validation
* probes for telemetry
** Missing pings [https://etherpad.mozilla.org/u230yVoP9S doc]
** Generated v4 data set with complete set of pings from all clients seen on nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
Engineering
* Client
** uplifts required to hit beta
** uplifts for probes
** focus on work required for about:healthreport (use new apis and migrate content)
** data quality investigations
** datachoices infobar bug
* Pipeline
** In talks with Databricks regarding Spark hosting
** data retention spec
* Data validation
** update data sets (exec dashboard)
** [https://docs.google.com/document/d/1KpcQy_QEfizd6Q4MFvOt5rMCL32Ef49_2T5yXNqWIWw/edit#Callingoutvarious acceptance criteria]
** missing subsessions ping investigation
Ops
* building automated jenkins deployments
* nginx load balancing
QA:
* Look into prod T issue with Ops

=== Important Links/References ===
* [project timeline]
* [[Unified_Telemetry|https://wiki.mozilla.org/Unified_Telemetry]]
* [https://etherpad.mozilla.org/MkHZgXc84o etherpad]
* [[CloudServices/DataPipeline|https://wiki.mozilla.org/CloudServices/DataPipeline]]
* [http://mzl.la/1FPWObo bug list]

Latest revision as of 23:20, 7 August 2015

previous week's report

Unified Telemetry status report August 7, 2015

Overall Project Health

Last week: Yellow

This week: Yellow - The primary risk is now the executive report. We're still debugging the discrepancies between the v4 and the v2 versions. We will not turn off v2 data if we don't have confidence in the executive report.

Exec Summary

  • Data quality, validation
    • The team is focusing its validation effort on the executive dashboard rollups
    • Added client probes to the beta population to narrow down the missing-pings question and a few other issues
  • Added a Windows drill down for executive report data
  • Starting work on executive dashboard with combined v2 + v4 data
  • Healthreport content is up and work to use the new API is underway (no longer a big risk)
  • Proposal: if we don't feel confident turning off v2 data for Fx41, collect v4 data for 5% of the population

Risks/Issues

Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date
Investigate gaps in pings | Open | Stuart/Alessio | https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, working doc | 8/10
Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 8/10
Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan, Metabug | 8/10
Legal review | Open | BDS/Legal | Meeting between groups | 8/10
QA sign off (functional, load) | Open | Stuart | Telemetry/Testing | 8/10
Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/10
Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram | 8/10
Data loss incident | Fixed | mreid/whd/trink | Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15
Remote about:healthreport content | Open | Katie/BDS | Working on pr for fhr-jelly, will deploy next week | 8/10
Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/10
Analysis difficulty | Open | Katie/tbd | Spark training; need comprehensive plan | 8/10

Accomplished for Last Period

Engineering & Ops

QA

  • Load testing
  • work with softvision

Project management

  • meetings, emails, hand waving

Planned for Upcoming Period

Engineering

  • Client
    • uplifts for probes
    • data quality investigations
    • datachoices infobar bug
  • Pipeline
    • In talks with Databricks regarding Spark hosting
    • Mechanism for Heka state preservation when it gets wedged
    • UT specific monitoring and alerting
    • data retention spec
  • Data validation
    • update data sets (exec dashboard)
    • acceptance criteria
    • missing subsessions ping investigation
    • "Many submissions from few clients" issue
  • Data continuity
    • Document strategy for executive dashboards with v2 + v4 data

Ops

  • building automated jenkins deployments
  • nginx load balancing

QA:

  • Look into prod T issue with Ops
  • continue test suite creation
  • finalizing long-term QA engagement (softvision engagement, tooling asks for CI-loop-based testing)

Project Management

  • Finish triage of bugs
  • remainder of release tasks scheduled

Outstanding requests not yet roadmapped into a release

Description | State | Owner | Plan to Resolve/Mitigation | Target Date
Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD
Histograms for loop/hello | Open | Katie | Need to schedule and understand impact on project | TBD

Important Links/References