Telemetry/Available Telemetry Datasets and their Applications: Difference between revisions

→‎Cross Sectional: Clarify XSec docs
(Add docs for XSec dataset)


==Cross Sectional==
The Cross Sectional dataset is a simplified version of the Longitudinal dataset.  


Most Longitudinal columns contain array values with one element per ping, which are difficult to work with in SQL. The Cross Sectional dataset '''replaces these array-valued columns with summary statistics'''. For example, the Longitudinal dataset contains a column named "geo_country" in which each row is an array of locales for one client (e.g. array<"en_US", "en_US", "en_GB">). The Cross Sectional dataset instead includes a column named "geo_country_mode" in which each row holds a single string, the mode of that array (e.g. "en_US"). The Cross Sectional column is '''easier to work with''' in SQL and is more representative than a single value picked from the Longitudinal array.
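
The array-to-summary step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by the Cross Sectional view (which is written in Scala). The row layout, the <code>mode</code> helper, and the tie-breaking rule (the first value seen wins) are all assumptions made for the example.

```python
from collections import Counter


def mode(values):
    """Return the most frequent non-null value, or None if there is none.

    Ties go to the value that appears first (an assumption for this sketch).
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical Longitudinal-style rows: one row per client, one array
# element per ping.
longitudinal_rows = [
    {"client_id": "a", "geo_country": ["en_US", "en_US", "en_GB"]},
    {"client_id": "b", "geo_country": ["de_DE"]},
]

# Cross Sectional-style rows: each array is collapsed to a single value.
cross_sectional_rows = [
    {"client_id": row["client_id"], "geo_country_mode": mode(row["geo_country"])}
    for row in longitudinal_rows
]

print(cross_sectional_rows)
```

Each output row carries one scalar per client, so it can be filtered or grouped in SQL without unnesting an array.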
 
Note that the Cross Sectional dataset is derived from the Longitudinal dataset, so it is also a '''1% sample of main pings'''.
 
This dataset is sometimes abbreviated as the '''xsec dataset'''. You can find the current version of the code [https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/CrossSectionalView.scala here]. This dataset is under active development; please '''contact rharter@mozilla.com with any questions'''.


==Client Count==