Unified Telemetry/Status reports/August 14 2015: Difference between revisions

 
(5 intermediate revisions by the same user not shown)
Line 6: Line 6:
Last week: Yellow
Last week: Yellow


This week: Yellow - The primary risk is now the executive report. We're still debugging the discrepancies between the v4 and the v2 versions. We will not turn off v2 data if we don't have confidence in the executive report.  
This week: Yellow - Discrepancies in the executive rollups remain a significant risk; if we don't resolve these in the next two-week iteration, we can't turn off FHRv4. As a risk mitigation strategy, we've implemented the ability to send v4 data for a 5% sample of v4 clients for the opt-out release population. We have a meeting scheduled on '''Aug 24''' to make the decision, so that we can have everything set to go by Fx41 beta6.


=== Exec Summary ===
=== Exec Summary ===
* Data quality, validation
* Data quality, validation
** Team is focusing on executive dashboard roll ups in validation effort
** Client side validation tool being built
** Added client probes to the beta population to narrow down the missing pings question and a few other issues
** Continued focus on v2/v4 executive dashboard roll ups
* Added a Windows drill down for executive report data
* Healthreport content was pushed live; a first pass at API changes has been implemented.
* Starting work on executive dashboard with combined v2 + v4 data
* Healthreport content is up and work to use the new API is underway (no longer a big risk)
* '''Proposal''': if we don't feel confident turning off v2 data for Fx41, collect v4 data for 5% of the population


=== Risks/Issues ===
=== Risks/Issues ===
Line 22: Line 19:
! Description of Risks/Issues !! State !! Owner !! Plan to Resolve/Mitigation !! Target Date
! Description of Risks/Issues !! State !! Owner !! Plan to Resolve/Mitigation !! Target Date
|-
|-
| Investigate gaps in pings || Open || Stuart/Alessio || https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, [https://etherpad.mozilla.org/u230yVoP9S working doc] || 8/10
| Investigate gaps in pings || Open || Stuart/Alessio || https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, [https://etherpad.mozilla.org/u230yVoP9S working doc] || 8/24
|-
|-
| Data integrity between V2/V4 and V4 internal data consistency || Open || Brendan/Sam || Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation || 8/10
| Data integrity between V2/V4 and V4 internal data consistency || Open || Brendan/Sam || Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation || 8/24
|-
|-
| Data continuity across V2/V4 || Open || Katie/Mark/Trink || [https://docs.google.com/a/mozilla.com/document/d/1VzQHfzfA-S_lO2wpXDFjDzSJntJCMwP03TzefIj7RrE/edit?usp=sharing Plan], [https://bugzilla.mozilla.org/show_bug.cgi?id=1182684 Metabug] || 8/10
| Data continuity across V2/V4 || Open || Katie/Mark/Trink || [https://docs.google.com/a/mozilla.com/document/d/1VzQHfzfA-S_lO2wpXDFjDzSJntJCMwP03TzefIj7RrE/edit?usp=sharing Plan], [https://bugzilla.mozilla.org/show_bug.cgi?id=1182684 Metabug] || 8/24
|-
|-
| Legal review || Open || BDS/Legal || Meeting between groups || 8/10
| Legal review || Open || BDS/Legal || Meeting between groups || 8/24
|-
|-
| QA sign off (functional, load) || Open || Stuart || [[Telemetry/Testing]] || 8/10
| QA sign off (functional, load) || Open || Stuart || [[Telemetry/Testing]] || 8/24
|-
|-
| Operations - data retention requirements || Open || Travis/Katie || Eng team owes ops a doc defining ping types and data retention requirements || 8/10
| Operations - data retention requirements || Open || Travis/Katie || Eng team owes ops a doc defining ping types and data retention requirements || 8/24
|-
|-
| Operations - analysis tools & microservices || Open || Travis/Mark/Roberto || [https://docs.google.com/a/mozilla.com/document/d/1KoLtIFV-aZtxruSVNmcc26F22MfqWjDynKgZ6adYk54/edit?usp=sharing%20 Architecture/Data flow diagram]|| 8/10
| Operations - analysis tools & microservices || Open || Travis/Mark/Roberto || [https://docs.google.com/a/mozilla.com/document/d/1KoLtIFV-aZtxruSVNmcc26F22MfqWjDynKgZ6adYk54/edit?usp=sharing%20 Architecture/Data flow diagram]|| 8/24
|-
|-
| Data loss incident || Fixed || mreid/whd/trink || [https://bugzilla.mozilla.org/show_bug.cgi?id=1179128 Tee server needs to return error status from old or new]. Added Ops resources (Daniel Thornton). || 7/15
| Data loss incident || Fixed || mreid/whd/trink || [https://bugzilla.mozilla.org/show_bug.cgi?id=1179128 Tee server needs to return error status from old or new]. Added Ops resources (Daniel Thornton). || 7/15
|-
|-
| Remote about:healthreport content || Open || Katie/BDS || Working on PR for [https://github.com/mozilla/fhr-jelly fhr-jelly], will deploy next week || 8/10
| Remote about:healthreport content || Done || Katie/BDS || Working on PR for [https://github.com/mozilla/fhr-jelly fhr-jelly], will deploy next week || 8/10
|-
|-
| Budget, size of UT pings || Open || Mark/BDS || https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 || 8/10
| Budget, size of UT pings || Open || Mark/BDS || https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 || 8/10
Line 48: Line 45:
please see http://benjamin.smedbergs.us/weekly-updates.fcgi/project/firefox-measurement
please see http://benjamin.smedbergs.us/weekly-updates.fcgi/project/firefox-measurement


=== Planned for Upcoming Period ===
=== Planned for Next Period ===
 
please see http://benjamin.smedbergs.us/weekly-updates.fcgi/project/firefox-measurement
Engineering
* Client
** uplifts for probes
** data quality investigations
** datachoices infobar bug
* Pipeline
** In talks with Databricks regarding Spark hosting
** Mechanism for Heka state preservation when it gets wedged
** UT specific monitoring and alerting
** data retention spec
* Data validation
** update data sets (exec dashboard)
** [https://docs.google.com/document/d/1KpcQy_QEfizd6Q4MFvOt5rMCL32Ef49_2T5yXNqWIWw/edit#Callingoutvarious acceptance criteria]
** missing subsessions ping investigation
** Many submissions for few clients [https://bugzilla.mozilla.org/show_bug.cgi?id=1142543 issue]
* Data continuity
** Document strategy for executive dashboards with v2 + v4 data
Ops
* building automated Jenkins deployments
* nginx load balancing
QA
* Look into prod T issue with Ops
* continue test suite creation
* finalizing long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
* Finish triage of bugs
* remainder of release tasks scheduled


=== Outstanding requests not yet road mapped into a release ===
=== Outstanding requests not yet road mapped into a release ===