(small prose changes) |
(Make l10l intro more assertive) |
||
Line 1: | Line 1: | ||
The longitudinal dataset is a summary of main pings. | The longitudinal dataset is a summary of main pings. | ||
The longitudinal dataset: | The longitudinal dataset differs from main_summary in two important ways: | ||
* | * The longitudinal dataset groups all data for a client-id in the same row. This makes it easy to report profile-level metrics. Without this deduping, metrics would be weighted by the number of submissions instead of by clients. | ||
* | * The dataset uses a 1% sample of all recent profiles, which reduces query computation time and saves resources. The sample of clients will be stable over time. | ||
Accordingly, one should prefer using the Longitudinal dataset except in the rare case where a 100% sample is strictly necessary. | |||
As discussed in the [https://gist.github.com/vitillo/627eab7e2b3f814725d2 Longitudinal Data Set Example Notebook]: | As discussed in the [https://gist.github.com/vitillo/627eab7e2b3f814725d2 Longitudinal Data Set Example Notebook]: |
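The first bullet of the revised text says that without one row per client, metrics end up weighted by the number of submissions rather than by clients. A minimal sketch of that effect in plain Python follows. The client ids, metric values, and function names are hypothetical and chosen only for illustration; they are not taken from the longitudinal dataset or from the linked notebook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-submission rows: (client_id, metric_value).
# Client "a" submitted four pings, client "b" only one.
submissions = [
    ("a", 10.0), ("a", 10.0), ("a", 10.0), ("a", 10.0),
    ("b", 2.0),
]

def mean_per_submission(rows):
    """Average over raw rows: heavy submitters dominate the result."""
    return mean(value for _, value in rows)

def mean_per_client(rows):
    """Group rows by client first, then average the per-client means."""
    by_client = defaultdict(list)
    for client_id, value in rows:
        by_client[client_id].append(value)
    return mean(mean(values) for values in by_client.values())

submission_weighted = mean_per_submission(submissions)
client_weighted = mean_per_client(submissions)

print(submission_weighted, client_weighted)
```

In this toy data the per-submission average is pulled toward client "a" because it sent more pings, while the per-client average counts each profile once. A dataset that already holds one row per client gives the second behaviour without the explicit grouping step.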