CloudServices/DataPipeline: Difference between revisions

From MozillaWiki
(→Overview: Now processing Telemetry data)
(→Telemetry: Added a bunch of code links)
* https://github.com/mozilla/pipeline-monitoring-dashboard
=== Telemetry ===
* https://github.com/mozilla/telemetry-server
 
* https://github.com/bsmedberg/telemetry-experiments-dashboard
{| class="wikitable"
|-
! Link !! Description
|-
| https://github.com/vitillo/telemetry-onboarding || Slides / notebooks for Telemetry Onboarding
|-
| https://github.com/mozilla/telemetry-server || Code for analysis.telemetry.mozilla.org among other things
|-
| https://github.com/bsmedberg/telemetry-experiments-dashboard || A dashboard to track the deployment of Firefox Telemetry Experiments
|-
| https://github.com/mozilla/telemetry-batch-view || A Scala framework to build derived datasets, aka batch views, of Telemetry data.
|-
| https://github.com/mozilla/cerberus || Automatic alert system for telemetry histograms
|-
| https://github.com/mozilla/emr-bootstrap-spark || AWS bootstrap scripts for Mozilla's flavoured Spark setup.
|-
| https://github.com/mozilla/jupyter-notebook-gist || Plugin to create, list, and load GitHub Gists from Jupyter notebooks
|-
| https://github.com/mreid-moz/jupyter-spark || Jupyter Notebook extension for Apache Spark integration
|-
| https://github.com/mozilla/python_mozaggregator || Aggregator job for telemetry.mozilla.org
|-
| https://github.com/mozilla/python_moztelemetry || Spark bindings for Mozilla Telemetry
|-
| https://github.com/mozilla/telemetry-analysis-service || Eventual home of the revamped a.t.m.o (per Bug 1248688)
|-
| https://github.com/mozilla/telemetry-tools || Utility code to work with Mozilla Telemetry data
|}


= Archive =
* [https://docs.google.com/a/mozilla.com/document/d/1CTazW99zBK5K40f-fgSyTPw9IXgmFYjQmNhzxTT9Tts/edit?usp=sharing post workweek roadmap]
* [https://id.etherpad.mozilla.org/data-team old etherpad]

Revision as of 14:08, 13 April 2016

Overview

The cloud services data pipeline ingests data for analysis, monitoring and reporting. The pipeline is currently used for processing desktop and device Telemetry data and cloud services server logs. The Firefox Measurement Team is building the data pipeline.

Team Communication

Cross-Team Communication

Resources

Pipeline specs/docs

Reporting and tools

Planning

Pipeline Milestones

  • Q1 2015: Launch pipeline prototype
    • Architecture decisions completed; production stack up and running with monitoring dashboards
    • Business Intelligence/Data Warehouse proof of concept implemented
    • Ingestion process completed for FHR+Telemetry (data collection starts 2015-02-23)
    • Backprocessing from the pipeline datastore implemented
    • Per-client-ID analysis supported
    • Pipeline runs in parallel to existing infrastructure; not yet the source of truth
  • Q2 2015: Pipeline officially supports business use cases
    • FHR v4 feeds the executive dashboard
    • Complete set of use cases TBD (most likely primarily FHR+Telemetry use cases)
    • Complete set of monitoring and reporting outputs TBD: dashboards, data warehouse, monitoring, self-service access to data
    • FHR+Telemetry reaches full release on 2015-05-19; pipeline handles the full production load
  • Q3 2015: Fill out monitoring and reporting capabilities; add sources and use cases

Related Dates and Schedules

  • FHR+Telemetry client work
    • Current plan: FF38
    • 2015-02-23 Nightly
    • 2015-03-30 Aurora
    • 2015-05-11 Release

Work Queue

Tasks are tracked in Bugzilla: http://mzl.la/1DOOBZt

Risks and Open Questions

  • Old-FHR data through pipeline? Yes/No: [telliot]
  • Deletes & legal policy [telliot]
  • Security review [telliot]

Code

V2 Pipeline

Telemetry

Link | Description
https://github.com/vitillo/telemetry-onboarding | Slides / notebooks for Telemetry Onboarding
https://github.com/mozilla/telemetry-server | Code for analysis.telemetry.mozilla.org, among other things
https://github.com/bsmedberg/telemetry-experiments-dashboard | A dashboard to track the deployment of Firefox Telemetry Experiments
https://github.com/mozilla/telemetry-batch-view | A Scala framework for building derived datasets (a.k.a. batch views) of Telemetry data
https://github.com/mozilla/cerberus | Automatic alert system for Telemetry histograms
https://github.com/mozilla/emr-bootstrap-spark | AWS bootstrap scripts for Mozilla's flavoured Spark setup
https://github.com/mozilla/jupyter-notebook-gist | Plugin to create, list, and load GitHub Gists from Jupyter notebooks
https://github.com/mreid-moz/jupyter-spark | Jupyter Notebook extension for Apache Spark integration
https://github.com/mozilla/python_mozaggregator | Aggregator job for telemetry.mozilla.org
https://github.com/mozilla/python_moztelemetry | Spark bindings for Mozilla Telemetry
https://github.com/mozilla/telemetry-analysis-service | Eventual home of the revamped a.t.m.o (analysis.telemetry.mozilla.org), per Bug 1248688
https://github.com/mozilla/telemetry-tools | Utility code for working with Mozilla Telemetry data

Archive