Data Publishing: Difference between revisions

no edit summary
(Minor cleanup, including adding a link to the data blog and fixing the doc link.)
No edit summary
Line 82: Line 82:
The goal of this process is to (1) make the “easy” (that is, safe) data publishing requests relatively frictionless, (2) have guard rails in place so we don’t publish something that exposes us or our users to risk in some way, and (3) ensure that the dataset publishing request process closely matches other processes that are familiar to the data stewards.


Having a dataset published requires filing a bug. Use the nomenclature defined in the preceding sections to answer the following four questions. If the answer to all of them is “no”, you may publish. A “yes” to any of them means extra review is required.
Having a dataset published requires filing a bug. Requests will use the nomenclature defined in the preceding sections to answer a series of questions, including the following four. If the answer to all of them is “no”, the data may be published. A “yes” to any of them means extra review is required.


*  Is the level of aggregation 3 or higher?
Line 111: Line 111:
'''Tabular Data''' - Data that consists of rows (or records) and columns (or fields). Each row has the same number of columns, and each column represents a dimension or metric for that row. A spreadsheet or a CSV file is an example of this type of data.


<big>'''What's Been Published So Far?'''</big>
Our publicly available datasets are [https://public-data.telemetry.mozilla.org/all-datasets.json here].

<big>'''Example Data'''</big>
Here are some examples of data aggregated to the levels described above.

*  Level 7: raw data, with fine-grained timestamps
*  Level 6: individual-level data, aggregated to day-level time granularity
*  Level 5: anonymized individual-level data, identifiers replaced with pseudonyms
*  Level 4: probabilistic aggregates
*  Level 3: dimension-level aggregates without a minimum group size
*  Level 2: dimension-level aggregates with a minimum group size
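The aggregation question from the request process ("Is the level of aggregation 3 or higher?") can be sketched against the levels listed above. This is only an illustration, not part of the actual review tooling: the names <code>AGGREGATION_LEVELS</code>, <code>REVIEW_THRESHOLD</code> and <code>needs_extra_review</code> are hypothetical, only the levels shown in this list are encoded, and the other questions in the request are not modeled.

```python
# Hypothetical encoding of the aggregation levels listed above.
# Only the levels shown in this section are included.
AGGREGATION_LEVELS = {
    7: "raw data, with fine-grained timestamps",
    6: "individual-level data, aggregated to day-level time granularity",
    5: "anonymized individual-level data, identifiers replaced with pseudonyms",
    4: "probabilistic aggregates",
    3: "dimension-level aggregates without a minimum group size",
    2: "dimension-level aggregates with a minimum group size",
}

# Assumption taken from the request question: level 3 or higher is a "yes".
REVIEW_THRESHOLD = 3


def needs_extra_review(level: int) -> bool:
    """Return True if the aggregation question alone is answered "yes"."""
    if level not in AGGREGATION_LEVELS:
        raise ValueError(f"level {level} is not one of the levels listed here")
    return level >= REVIEW_THRESHOLD


if __name__ == "__main__":
    for level in sorted(AGGREGATION_LEVELS, reverse=True):
        answer = "yes" if needs_extra_review(level) else "no"
        print(f"Level {level} ({AGGREGATION_LEVELS[level]}): {answer}")
```

A "no" from this check does not by itself clear a dataset for publication, because the remaining questions in the request must also be answered "no".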