Unified Telemetry/Status reports/August 7 2015
Previous week's report: https://wiki.mozilla.org/Unified_Telemetry/Status_reports/July_31_2015
Important Links/References
- Unified Telemetry: https://wiki.mozilla.org/Unified_Telemetry
- CloudServices/DataPipeline: https://wiki.mozilla.org/CloudServices/DataPipeline
- Bug list: http://mzl.la/1FPWObo
Latest revision as of 23:20, 7 August 2015
Unified Telemetry status report August 7, 2015
Overall Project Health
Last week: Yellow
This week: Yellow. The primary risk is now the executive report: we are still debugging the discrepancies between its v4 and v2 versions. We will not turn off v2 data unless we are confident in the executive report.
Exec Summary
- Data quality, validation
  - Team is focusing on executive dashboard roll-ups in the validation effort
  - Added client probes to the beta population to narrow down the missing pings question and a few other issues
- Added a Windows drill-down for executive report data
- Starting work on the executive dashboard with combined v2 + v4 data
- Healthreport content is up, and work to use the new API is underway (no longer a big risk)
- Proposal: if we don't feel confident turning off v2 data for Fx41, collect v4 data for 5% of the population
Risks/Issues
Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Investigate gaps in pings | Open | Stuart/Alessio | https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, working doc | 8/10 |
Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 8/10 |
Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan, Metabug | 8/10 |
Legal review | Open | BDS/Legal | Meeting between groups | 8/10 |
QA sign off (functional, load) | Open | Stuart | Telemetry/Testing | 8/10 |
Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/10 |
Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram | 8/10 |
Data loss incident | Fixed | mreid/whd/trink | Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15 |
Remote about:healthreport content | Open | Katie/BDS | Working on PR for fhr-jelly; will deploy next week | 8/10 |
Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/10 |
Analysis difficulty | Open | Katie/tbd | Spark training; need comprehensive plan | 8/10 |
Accomplished for Last Period
Engineering & Ops
- FHR Jelly PR
- Databricks meeting
- Client work: Spreadsheet
- Data validation
- Probes for telemetry
  - Missing pings doc
  - Generated v4 data set with complete set of pings from all clients seen on nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
  - Work on missing subsessions analysis (hints at a client bug): https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Pipeline scaling work
- Backfill of executive summary pings (Hindsight)
- Added Snappy support to the Spark and Heka infrastructure
QA
- Load testing
- Work with Softvision
Project management
- meetings, emails, hand waving
Planned for Upcoming Period
Engineering
- Client
  - Uplifts for probes
  - Data quality investigations
  - datachoices infobar bug
- Pipeline
  - In talks with Databricks regarding Spark hosting
  - Mechanism for preserving Heka state when Heka gets wedged
  - UT-specific monitoring and alerting
  - Data retention spec
- Data validation
  - Update data sets (executive dashboard)
  - Acceptance criteria
  - Missing subsessions ping investigation
  - Issue of many submissions from a few clients
- Data continuity
  - Document strategy for executive dashboards with v2 + v4 data
Ops
- Building automated Jenkins deployments
- nginx load balancing
QA
- Look into prod T issue with Ops
- continue test suite creation
- Finalizing long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
- Finish triage of bugs
- Schedule the remainder of release tasks
Outstanding requests not yet road mapped into a release
Description | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Firefox OS app pings | Open | Katie | Need to schedule and understand impact on project | TBD |
Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD |