(Add docs for XSec dataset) |
(→Cross Sectional: Clarify XSec docs) |
||
Line 16: | Line 16: | ||
==Cross Sectional== | ==Cross Sectional== | ||
The Cross Sectional dataset is a simplified version of the Longitudinal dataset | The Cross Sectional dataset is a simplified version of the Longitudinal dataset. | ||
This dataset is sometimes abbreviated as the xsec dataset. You can find the current version of the code [https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/CrossSectionalView.scala here]. This dataset is under active development, please contact rharter@mozilla.com with any questions. | The majority of Longitudinal columns contain array values with one element for each ping, which is difficult to work with in SQL. The Cross Sectional dataset '''replaces these array-valued columns with summary statistics'''. To give an example, the Longitudinal dataset will contain a column named "geo_country" where each row is an array of locales for one client (e.g. array<"en_US", "en_US", "en_GB">). Instead, the Cross Sectional dataset includes a column named "geo_country_mode" where each row contains a single string representing the mode (e.g. "en_US"). The Cross Sectional column is '''easier to work with''' in SQL and is more representative than just choosing a single value from the Longitudinal array. | ||
Note that the Cross Sectional dataset is derived from the Longitudinal dataset, so the dataset is a '''1% sample of main pings'''. | |||
This dataset is sometimes abbreviated as the '''xsec dataset'''. You can find the current version of the code [https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/CrossSectionalView.scala here]. This dataset is under active development; please '''contact rharter@mozilla.com with any questions'''. | |||
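The array-to-summary idea described above can be illustrated with a minimal Python sketch. It is not the code from CrossSectionalView.scala: the row layout, the <code>mode</code> helper, and the tie-breaking rule are assumptions made for illustration, and the real view may compute its statistics differently (for example, by weighting values).

```python
from collections import Counter


def mode(values):
    """Return the most frequent non-null value, or None if there is none.

    Hypothetical helper: ties go to the value seen first.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return None
    # most_common(1) yields a list holding one (value, count) pair.
    return Counter(observed).most_common(1)[0][0]


# Assumed Longitudinal-style row: one array element per ping.
longitudinal_row = {
    "client_id": "client-a",
    "geo_country": ["en_US", "en_US", "en_GB"],
}

# Cross Sectional-style row: the array is replaced by a single summary value.
cross_sectional_row = {
    "client_id": longitudinal_row["client_id"],
    "geo_country_mode": mode(longitudinal_row["geo_country"]),
}

print(cross_sectional_row)
```

Because each client ends up with one scalar per column, a query can filter or group on "geo_country_mode" directly instead of unpacking an array first.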
==Client Count== | ==Client Count== |