Template:Longitudinal data intro: Difference between revisions


248 bytes added, 1 September 2016

Make l10l intro more assertive

@@ Line 1: / Line 1: @@
 The longitudinal dataset is a summary of main pings.
-In general, you should prefer using the longitudinal set to main_summary unless there are extenuating circumstances.
-The longitudinal dataset:
+The longitudinal dataset differs from main_summary in two important ways:
-* makes it easy to report profile level metrics by grouping all data for a client-id in the same row
+* The longitudinal dataset groups all data for a client-id into a single row. This makes it easy to report profile-level metrics: without this deduplication, metrics would be weighted by the number of submissions instead of by the number of clients.
-* samples to 1% of all recent profiles, which will reduce query computation time and save resources
+* The dataset uses a 1% sample of all recent profiles, which reduces query computation time and saves resources. The sample of clients is stable over time.
+Accordingly, one should prefer the longitudinal dataset except in the rare cases where a 100% sample is strictly necessary.
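The weighting point in the first bullet can be illustrated with a small Python sketch. The ping tuples, the `session_hours` values, and the `in_sample` hashing rule below are hypothetical and are not taken from the longitudinal dataset's actual schema or sampling code. They only show why grouping by client-id changes an average, and how hashing a client-id can yield a sample that stays stable over time.

```python
import zlib
from collections import defaultdict
from statistics import mean

# Hypothetical toy pings: (client_id, session_hours).
# Client "a" submits many more pings than the other clients.
PINGS = [
    ("a", 1.0), ("a", 1.0), ("a", 1.0), ("a", 1.0),
    ("b", 5.0),
    ("c", 3.0),
]


def per_submission_mean(pings):
    """Average over raw pings, so heavy submitters dominate the result."""
    return mean(value for _, value in pings)


def group_by_client(pings):
    """Collect every ping for a client-id into one row (a list of values)."""
    rows = defaultdict(list)
    for client_id, value in pings:
        rows[client_id].append(value)
    return dict(rows)


def per_client_mean(pings):
    """Average of per-client means, so each client counts exactly once."""
    rows = group_by_client(pings)
    return mean(mean(values) for values in rows.values())


def in_sample(client_id, percent=1):
    """Hypothetical stable sampling rule: hash the client-id into one of
    100 buckets and keep the lowest `percent` buckets. The same client
    always lands in the same bucket, so the sample does not drift."""
    return zlib.crc32(client_id.encode("utf-8")) % 100 < percent


if __name__ == "__main__":
    print(per_submission_mean(PINGS))
    print(per_client_mean(PINGS))
```

With this toy data, `per_submission_mean(PINGS)` returns 2.0 because client "a" contributes four of the six pings, while `per_client_mean(PINGS)` returns 3.0 because each client is counted once.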
 As discussed in the [https://gist.github.com/vitillo/627eab7e2b3f814725d2 Longitudinal Data Set Example Notebook]: