Telemetry/Available Telemetry Datasets and their Applications

=Data Set Documentation=
This document now lives here:
https://github.com/mozilla/telemetry-batch-view/blob/master/docs/choosing_a_dataset.md

==Longitudinal==
[[Telemetry/LongitudinalExamples|Complete documentation]]

{{longitudinal data intro}}


==Main Summary==
[https://wiki.mozilla.org/api.php?action=query&list=backlinks&bltitle=Telemetry/Available_Telemetry_Datasets_and_their_Applications Wiki.mo pages linking to this dead page]
[https://github.com/mozilla/telemetry-batch-view/blob/master/docs/MainSummary.md Complete Documentation]
 
Like the longitudinal dataset, the main summary dataset summarizes [https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/main-ping.html main pings]. Each row corresponds to a single ping. This table is not sampled and includes all desktop pings.
 
===Caveats===
Querying main summary on sql.t.m.o/re:dash can '''impact performance for other users''' and can '''take a while to complete''' (about 30 minutes, even for simple queries). Because main summary has a row for every ping, it contains a very large number of records, and scanning them consumes substantial resources on the shared cluster.
 
Instead, if you are querying from re:dash/sql.t.m.o, we recommend using the longitudinal dataset where possible. The longitudinal dataset is a 1% sample of all data and is organized by client_id. In the rare cases where querying main summary is necessary, limit the query to a short submission_date_s3 range and, ideally, filter on the sample_id field. Better still, use Spark.
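When a main summary query really is unavoidable, the advice above (short submission_date_s3 range, filter on sample_id) can be applied mechanically. The Python sketch below builds such a query string. It is only an illustration under stated assumptions: the helper name build_main_summary_query is hypothetical, the table name main_summary is assumed, submission_date_s3 is assumed to be a 'YYYYMMDD' string, and sample_id is assumed to hold integers 0 to 99 (each bucket roughly 1% of clients). Check the complete documentation linked above for the actual schema.

```python
from datetime import date, timedelta
from typing import Iterable


def build_main_summary_query(start: date, days: int, sample_ids: Iterable[int]) -> str:
    """Hypothetical helper (not an official API): build a SQL query over an
    assumed main_summary table, restricted to a short date window and a
    subset of sample_id buckets to keep the scan small."""
    if days < 1:
        raise ValueError("days must be at least 1")
    ids = sorted(set(sample_ids))
    # Assumption: sample_id values range over 0..99.
    if not ids or any(i < 0 or i > 99 for i in ids):
        raise ValueError("sample_ids must be a non-empty subset of 0..99")

    # Assumption: submission_date_s3 is a 'YYYYMMDD' string, so string
    # comparison follows date order. The window is inclusive of both ends.
    end = start + timedelta(days=days - 1)
    first, last = start.strftime("%Y%m%d"), end.strftime("%Y%m%d")

    # Assumption: sample_id is numeric; quote the values if it is a string column.
    id_list = ", ".join(str(i) for i in ids)

    return (
        "SELECT count(*) AS pings\n"
        "FROM main_summary\n"
        f"WHERE submission_date_s3 >= '{first}'\n"
        f"  AND submission_date_s3 <= '{last}'\n"
        f"  AND sample_id IN ({id_list})"
    )


if __name__ == "__main__":
    # One week of data from a single 1% bucket.
    print(build_main_summary_query(date(2016, 11, 1), 7, [42]))
```

Restricting both the date range and the sample buckets shrinks the data scanned, which is what reduces the load on the shared cluster; for anything larger, Spark remains the better tool.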
 
==Cross Sectional==
 
 
==Client Count==
 
==Crash Aggregates==
 
==Mobile Metrics==
The android_events, android_clients, android_addons, and mobile_clients tables are documented here:
https://wiki.mozilla.org/Mobile/Metrics/Redash

Latest revision as of 21:49, 7 November 2016