Telemetry/LongitudinalExamples: Difference between revisions

Fixing l10l sampling methodology.
(Filling out resources section)
(Fixing l10l sampling methodology.)
Line 24: Line 24:
  SELECT * FROM longitudinal LIMIT 1000 ...


Or to look at a 1% sample of the clients:
For a statistically sound sample, use TABLESAMPLE:
  SELECT * FROM longitudinal TABLESAMPLE BERNOULLI(xx)


SELECT * FROM longitudinal WHERE sample_id[1] = 5 ...
Here xx is an integer giving the percentage of the data to include in your sample (e.g. for a 10% sample, xx=10).


The sample_id partitions the clients into stable ~1% samples.
A couple of caveats:
* This sampling method only decreases your query run time if the query does a lot of processing on the sampled rows. Bernoulli sampling still requires reading the whole table before proceeding.
* This sample is not deterministic, i.e. you will not get the same sample on every run. This can cause problems when using Presto Views or logical tables.
* Unlike LIMIT, this method does not guarantee a fixed number of results.
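
The last two caveats can be illustrated with a small simulation outside Presto. This is only a sketch: the client ids are made up, bernoulli_sample is a hypothetical stand-in for what TABLESAMPLE BERNOULLI does (keep each row independently with the given probability), and bucket_sample assumes a stable sample is derived by hashing the client id into 100 buckets, which may not match how sample_id is actually computed.

```python
import random
import zlib


def bernoulli_sample(rows, percent, rng):
    """Keep each row independently with probability percent/100."""
    p = percent / 100.0
    return [r for r in rows if rng.random() < p]


def bucket_sample(rows, bucket, buckets=100):
    """Keep rows whose hashed id lands in one fixed bucket.

    Assumption: CRC32 modulo 100 is used here purely for illustration.
    """
    return [r for r in rows if zlib.crc32(r.encode("utf-8")) % buckets == bucket]


# Hypothetical client ids standing in for rows of the longitudinal table.
rows = [f"client-{i}" for i in range(10_000)]

# Two "runs" of a 10% Bernoulli sample: different rows, sizes only near 1000.
run_a = bernoulli_sample(rows, 10, random.Random(1))
run_b = bernoulli_sample(rows, 10, random.Random(2))

# Two "runs" of the hash-bucket sample: identical every time.
stable_a = bucket_sample(rows, 5)
stable_b = bucket_sample(rows, 5)

print(len(run_a), len(run_b), run_a == run_b)
print(len(stable_a), stable_a == stable_b)
```

The Bernoulli runs differ from each other and only approximate the requested 10%, while the bucket-based sample returns the same rows on every run. That trade-off is why a deterministic sample may be preferable for views or logical tables.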


=== Arrays ===