(Remove Ben from the stewards list) |
Groovecoder (talk | contribs) (Adding myself (Luke Crouch)) |
||
(43 intermediate revisions by 23 users not shown) | |||
Line 9: | Line 9: | ||
* Accountability - We assign accountability for the design, approval, and implementation of data collection | * Accountability - We assign accountability for the design, approval, and implementation of data collection | ||
Owner: | Owner: Nneka Soyinka | ||
Data Stewards: | Data Stewards: | ||
* [https:// | * [https://people.mozilla.org/p/kennylong/ Kenny Long] | ||
* [https:// | * [https://people.mozilla.org/p/jhirsch Jared Hirsch] | ||
* [https:// | * [https://people.mozilla.org/p/adavis Alex Davis] | ||
* [https:// | * [https://people.mozilla.org/p/TheOne Andreas Wagner] | ||
* [https:// | * [https://people.mozilla.org/p/tlong/ Travis Long] | ||
* [https:// | * [https://people.mozilla.org/p/willkg Will Kahn-Greene] | ||
* [https:// | * [https://people.mozilla.org/p/p--n8wmyowcldls6pvp6ab1pj Roger Yang] | ||
* [https:// | * [https://people.mozilla.org/p/sancus :sancus] | ||
* [https:// | * [https://people.mozilla.org/p/charlie-humphreys Charlie Humphreys] | ||
* [https:// | * [https://people.mozilla.org/p/cboozarjomehri Cameron Boozarjomehri] | ||
* [https:// | * [https://people.mozilla.org/p/chutten/ :chutten] | ||
* [https:// | * [https://people.mozilla.org/p/sergiosonline Sergio Betancourt] | ||
* [https:// | * [https://people.mozilla.org/p/aminomancer Shane Hughes] | ||
* [https:// | * [https://people.mozilla.org/p/roux Roux Buciu] | ||
* [https://people.mozilla.org/p/groovecoder Luke Crouch] | |||
Data stewards come from a variety of teams within Mozilla, including data science, Firefox engineering, mobile products, Pocket, Common Voice, AMO, and Thunderbird. You are welcome to tag any steward for any collection request, without respect to the nature of your collection. | Data stewards come from a variety of teams within Mozilla, including data science, Firefox engineering, mobile products, Pocket, Common Voice, AMO, and Thunderbird. You are welcome to tag any steward for any collection request, without respect to the nature of your collection. | ||
Line 35: | Line 36: | ||
Most assets involved in data review can be found [https://github.com/mozilla/data-review in this repository]. References to who fills out a form when are covered in the documentation below. | Most assets involved in data review can be found [https://github.com/mozilla/data-review in this repository]. References to who fills out a form when are covered in the documentation below. | ||
= Scope = | |||
These guidelines are '''required''' for data collection in products with an active user base and established privacy policies under the Firefox organization, but may be applied to any Mozilla product as needed. Changes to policies themselves or the creation of a policy for a new product is out of scope of what is described here. | |||
= Key Roles for Data Collection = | = Key Roles for Data Collection = | ||
Line 47: | Line 51: | ||
Mozilla always strives to make data reviews public. However, there are sometimes limited sets of circumstances when we may conduct our reviews in a private bug; for example, a service is part of an agreement where the partnership is not yet public. These reviews will be made public once the actual data collection begins. | Mozilla always strives to make data reviews public. However, there are sometimes limited sets of circumstances when we may conduct our reviews in a private bug; for example, a service is part of an agreement where the partnership is not yet public. These reviews will be made public once the actual data collection begins. | ||
= | = Adding or Modifying Data Collection = | ||
The process is slightly different for collections in [https://hg.mozilla.org/mozilla-central/ mozilla-central] code (Firefox Desktop, Firefox & Focus for Android, and Gecko) than it is elsewhere. Please consult the relevant section below. | |||
== Firefox Desktop, Firefox and Focus for Android, Gecko (from May 7, 2024) == | |||
When a developer uploads a change to Phabricator that adds or modifies any data collection, Phabricator will automatically add the <tt>needs-data-classification</tt> tag, and explain what happens next. | |||
If you’re adding or modifying data collection in your Phabricator revision and this doesn’t happen automatically, please manually add this tag and then follow the same procedure. | |||
Once this tag is in place, Herald will ask the patch author and reviewer to assess the [[#Data_Collection_Categories|correct category for the data collection]]: | |||
* If the data being collected fits in the “technical data” or “interaction data” categories described there, use the <tt>data-classification-low</tt> tag. | |||
* If it’s any other category, or patch author and reviewer disagree about the right category, use the <tt>data-classification-high</tt> tag, and go through [[#Step_3:_Sensitive_Data_Collection_Review_Process|the sensitive data collection review process]]. | |||
* If you think that the data in question fits in “technical” or “interaction” data but would benefit from additional review, you can also explicitly choose to use the <tt>data-classification-high</tt> tag and thereby opt in to the sensitive data collection review process. | |||
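The tag choice described in the list above can be sketched as a small decision function. This is only an illustration of the rules as written here, not tooling that exists in Phabricator or Herald; the names <tt>Category</tt>, <tt>choose_tag</tt>, and <tt>wants_extra_review</tt> are hypothetical.

```python
from enum import IntEnum


class Category(IntEnum):
    """The four data collection categories (hypothetical enum for illustration)."""
    TECHNICAL = 1
    INTERACTION = 2
    STORED_CONTENT = 3
    HIGHLY_SENSITIVE = 4


LOW_TAG = "data-classification-low"
HIGH_TAG = "data-classification-high"


def choose_tag(author_category, reviewer_category, wants_extra_review=False):
    """Return the classification tag implied by the rules in the list above."""
    # Disagreement between author and reviewer always goes to the sensitive process.
    if author_category != reviewer_category:
        return HIGH_TAG
    # Anyone may opt in to the sensitive review even for low-category data.
    if wants_extra_review:
        return HIGH_TAG
    # Only technical and interaction data qualify for the low tag.
    if author_category in (Category.TECHNICAL, Category.INTERACTION):
        return LOW_TAG
    return HIGH_TAG


if __name__ == "__main__":
    print(choose_tag(Category.TECHNICAL, Category.TECHNICAL))
    print(choose_tag(Category.INTERACTION, Category.STORED_CONTENT))
```

Whatever the outcome, the revision still needs a comment explaining the choice; the function only captures which tag the rules point to.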
When using Glean for the data collection, the data classification of the new or expanded data collections should match the <tt>data_sensitivity</tt> property in the metric definitions. The entry in the <tt>data_reviews</tt> list should reflect the bug URL. | |||
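As a rough sketch of that consistency check, the function below compares a metric definition (represented as a plain dictionary, as it might look after loading <tt>metrics.yaml</tt>) against the chosen tag and the bug URL. It assumes that the low-sensitivity values are spelled <tt>technical</tt> and <tt>interaction</tt>; the function name, the example metric, and the placeholder bug URL are hypothetical, and nothing here is part of Glean or <tt>glean_parser</tt>.

```python
# Assumed spellings of the low-sensitivity values in data_sensitivity.
LOW_SENSITIVITIES = {"technical", "interaction"}


def check_metric(metric, tag, bug_url):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    sensitivities = set(metric.get("data_sensitivity", []))
    if not sensitivities:
        problems.append("data_sensitivity is missing or empty")
    elif tag == "data-classification-low" and not sensitivities <= LOW_SENSITIVITIES:
        # A low tag is only consistent with technical/interaction data.
        # A high tag on low-sensitivity data is allowed (explicit opt-in).
        problems.append("low tag used with non-low data_sensitivity")
    if bug_url not in metric.get("data_reviews", []):
        problems.append("data_reviews does not list the bug URL")
    return problems


if __name__ == "__main__":
    # Placeholder bug URL and metric, for illustration only.
    bug = "https://bugzilla.mozilla.org/show_bug.cgi?id=0000000"
    metric = {"data_sensitivity": ["interaction"], "data_reviews": [bug]}
    print(check_metric(metric, "data-classification-low", bug))
```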
If the reviewer is unsure or feels uncomfortable making this assessment themselves, they can [mailto:data-stewards@mozilla.com email the data stewards group] or [https://chat.mozilla.org/#/room/#data-stewards:mozilla.org contact them on matrix] for help. | |||
Whichever tag you choose, please '''leave a comment explaining your choice'''. Note that you will not be able to land this revision until the revision has one of these tags and you remove the <tt>needs-data-classification</tt> tag. For low sensitivity data collection, you will be able to land the patch once this sensitivity is marked and you remove the <tt>needs-data-classification</tt> tag. For high sensitivity data collection, the [https://phabricator.services.mozilla.com/project/view/209/ <tt>data-stewards</tt>] group will be added as a blocking reviewer on the patch. They will approve or request changes to the patch based on the [[#Step_3:_Sensitive_Data_Collection_Review_Process|sensitive data collection review process]]. | |||
Patch authors are encouraged to add these tags themselves, but '''reviewers are responsible for making sure the right tag is used'''. | |||
If you do not yet have a code change but are in the planning stages of a change and want to proactively discuss data collection options, reach out to [mailto:data-stewards@mozilla.com the data stewards group]. | |||
== Other Products == | |||
== Step 1: Submit Request == | == Step 1: Submit Request == | ||
To request a review for new or changed Data Collection in a Mozilla product, Data Review requesters are required to provide the following: | To request a review for new or changed Data Collection in a Mozilla product, Data Review requesters are required to provide the following: | ||
* A completed Request Form, documenting what data is to be collected, why Mozilla needs to collect this data, how much data will be collected, and for how long it will be collected: | * A completed Request Form, documenting what data is to be collected, why Mozilla needs to collect this data, how much data will be collected, and for how long it will be collected: | ||
** Take [https://github.com/mozilla/data-review/blob/ | ** Take [https://github.com/mozilla/data-review/blob/main/request.md this request] and fill it out completely. | ||
*** (If you are renewing a previously-reviewed data collection, you may use [https://github.com/mozilla/data-review/blob/ | *** (If you are renewing a previously-reviewed data collection, you may use [https://github.com/mozilla/data-review/blob/main/renewal_request.md this shorter form] instead.) | ||
** If your collection uses [https://mozilla.github.io/glean/book/index.html Glean], you can [https://blog.mozilla.org/data/2021/09/07/this-week-in-glean-data-reviews-are-important-glean-parser-makes-them-easy/ use <tt>glean_parser</tt> to generate a partially-filled template for you]. | |||
* A bug to attach the completed Request Form to: | * A bug to attach the completed Request Form to: | ||
** If you already have a bug filed to add the collection code, attach the form to that one. | ** If you already have a bug filed to add the collection code, attach the form to that one. | ||
Line 58: | Line 89: | ||
** Tell Bugzilla that your form's extension is <tt>.txt</tt> so it can render it inline and so your Data Steward can review it more easily. | ** Tell Bugzilla that your form's extension is <tt>.txt</tt> so it can render it inline and so your Data Steward can review it more easily. | ||
* A notification so the Data Steward knows it's time to review your Request Form: | * A notification so the Data Steward knows it's time to review your Request Form: | ||
** Flag the attached, completed Request Form for <tt>data-review</tt>. | ** Flag the attached, completed Request Form for <tt>data-review</tt> by setting the <tt>data-review</tt> flag to <tt>?</tt> and selecting your chosen Data Steward in the "Requestee" field that appears. | ||
** If a Data Steward doesn't get to your review within a couple of days, please [https://chat.mozilla.org/#/room/#data-stewards:mozilla.org reach out to us on | ** If a Data Steward doesn't get to your review within a couple of days, please [https://chat.mozilla.org/#/room/#data-stewards:mozilla.org reach out to us on Element]. | ||
== Step 2: Request is reviewed == | == Step 2: Request is reviewed == | ||
Line 65: | Line 96: | ||
* Data stewards receive a <tt>data-review?</tt> on a file in a bug | * Data stewards receive a <tt>data-review?</tt> on a file in a bug | ||
* Data stewards complete the [https://github.com/mozilla/data-review/blob/ | * Data stewards complete the [https://github.com/mozilla/data-review/blob/main/review.md data review form] based on the information provided in the data collection request. They ensure that the request: | ||
** Follows Lean Data Practices & Guidelines | ** Follows Lean Data Practices & Guidelines | ||
** The basic mechanics of what is being measured is documented publicly. | ** The basic mechanics of what is being measured is documented publicly. | ||
Line 76: | Line 107: | ||
* Complex requests that pose broader policy and legal implications may be escalated to the Trust and Legal teams. (See Step 3) | * Complex requests that pose broader policy and legal implications may be escalated to the Trust and Legal teams. (See Step 3) | ||
== Step 3: | == Step 3: Sensitive Data Collection Review Process == | ||
=== Determine if you need to follow this process === | |||
For any data collection that is classified as category 3 or 4 (described below) – including in pre-release channels and experiments – we require additional review to be performed and an announcement to a mailing list. The reason for this is that while our privacy policies describe what we can do without additional user notice, this is an upper bound; even for collection which fits within the policy, we need to determine whether that collection is appropriate and conforms to our overall commitment to privacy and minimization. | |||
=== Create documentation and request review === | |||
As a first step, it is important that the details of the implementation, intended use, and value to users be clearly documented for future reference and efficient review. As soon as this is ready (we recommend as early as possible, before you move forward with the implementation), send an email to the [https://groups.google.com/a/mozilla.com/g/data-review data-review@mozilla.com] mailing list. | |||
The initial documentation from engineering/data stewardship and privacy/technical review should be completed as a prerequisite ahead of legal and security. | |||
{| class="wikitable" | |||
|- | |||
! Risk Assessment !! Owner !! Facilitator | |||
|- | |||
| Privacy/Technical Review || Office of the Firefox CTO || Martin Thompson | |||
|- | |||
| Legal/Trust Review || Legal || Nneka Soyinka | |||
|- | |||
| Security Review || Office of the CSO || Marc Perreault | |||
|- | |||
| Data Review || Data || Mark Reid | |||
|} | |||
Facilitators (named above) are expected to express judgement about how much risk is involved and will involve the appropriate reviewers. | |||
If the level of risk is determined to be low enough and/or there is clear precedent, further discussion may not be necessary and each reviewer may give a sign-off immediately; otherwise, mitigations should be incorporated and documentation updated once they have been addressed. Live discussion is often very helpful – and should be planned for – when there is significant risk involved. | |||
Data collection may not be shipped to users until final sign-offs have been obtained. | |||
=== Escalation === | |||
In the case of a dispute about sensitive data collection and/or which mitigations are appropriate, the proposer or any reviewer should work with one of the facilitators to escalate the decision to the VP/XLT member in charge of the product (e.g., Head of Firefox, Head of Pocket). Depending on the scope and nature of the risk, there may also be cases where escalation goes beyond the immediate product owner (i.e., to the CPO or CEO). When this happens, the facilitator and escalating party: | |||
* Give each party a chance to document their recommended approach in writing. | |||
* Share the document with all involved parties for asynchronous review/comment. | |||
* Schedule a meeting for discussion if necessary. | |||
* Record the final decision by the product owner. | |||
= Data Collection Categories = | = Data Collection Categories = | ||
There are four "categories" of data collection | There are four "categories" of data collection: | ||
; '''Category 1 “Technical data”''' | ; '''Category 1 “Technical data”''' | ||
Line 93: | Line 155: | ||
:Examples include OS, crashes and errors, outcome of automated processes like updates, activation, version #s, etc. This also includes aggregated compatibility information about features and API usage by websites, addons, and other 3rd-party software that interact with the application during usage. | :Examples include OS, crashes and errors, outcome of automated processes like updates, activation, version #s, etc. This also includes aggregated compatibility information about features and API usage by websites, addons, and other 3rd-party software that interact with the application during usage. | ||
: It also includes information about the user's settings that is necessary to provide functionality. For example, what applications users have connected to a service or what services users have logged into using a | : It also includes information about the user's settings that is necessary to provide functionality. For example, what applications users have connected to a service or what services users have logged into using a Mozilla account. | ||
; '''Category 2 “Interaction data”''' | ; '''Category 2 “Interaction data”''' | ||
Line 116: | Line 178: | ||
: It also includes any data from different categories that, when combined, can identify a person, device, household or account. For example: Category 1 log data combined with Category 3 saved URLs. | : It also includes any data from different categories that, when combined, can identify a person, device, household or account. For example: Category 1 log data combined with Category 3 saved URLs. | ||
: Additional examples are: voice audio commands (including a voice audio file), speech-to-text or text-to-speech (including transcripts), biometric data, demographic information, and precise location data associated with a persistent identifier, individual or small population cohorts. This is location inferred or determined from mechanisms other than IP such as wi-fi access | : Additional examples are: voice audio commands (including a voice audio file), speech-to-text or text-to-speech (including transcripts), biometric data, demographic information, and precise location data associated with a persistent identifier, individual or small population cohorts. This is location inferred or determined from mechanisms other than IP such as wi-fi access points, Bluetooth beacons, cell phone towers or provided directly to us, such as in a survey or a profile. | ||
: | : | ||
== Eligibility for Default on Data Collection == | == Eligibility for Default on Data Collection == | ||
At installation, Mozilla’s products and services include one or more preferences and settings. These preferences and settings typically belong to a data collection state: a status that describes whether data collection occurs by default or not. | |||
{| class="wikitable" | |||
|- | |||
! State !! What it Means | |||
|- | |||
| Default ON || Data may be collected automatically. | |||
Users must have a way to turn off data collection. [https://support.mozilla.org/en-US/kb/telemetry-clientid Learn how to opt out] of data collection in Firefox. | |||
|- | |||
| Default OFF || Data may be collected, but only if a user takes a clear, express action to opt in to the collection. This can be through a configuration option, a prompt or an update through an account profile. | |||
Users must have a way to turn off data collection. | |||
|} | |||
“'''Release'''” means products that are not experimental. These include Firefox, Pocket, Lockwise, Monitor, and others. | |||
“'''Pre-release'''” means experimental products. They are typically identified by the words “Beta,” “Nightly,” “Preview,” “Reference Browser,” or “Developer Edition” in the name of the product. | |||
{| class="wikitable" | |||
|- | |||
! Category 1 “Technical data” | |||
|- | |||
| ''Release & Pre-Release'' - eligible for Default ON. | |||
|} | |||
{| class="wikitable" | |||
|- | |||
! Category 2 “Interaction data” | |||
|- | |||
| ''Release & Pre-Release'' - eligible for Default ON. | |||
|} | |||
{| class="wikitable" | |||
|- | |||
! Category 3 “Stored Content and Communications” | |||
|- | |||
| ''Release'': Default OFF. Default ON requires prior Trust approval. | |||
''Pre-Release'': Default ON eligible | |||
On a case-by-case basis, collections may be eligible to be "Default ON" if mitigations are identified. Mitigations may include UX changes that make users aware of additional risk, technical mechanisms that remove the risk, or a risk assessment done on a case-by-case basis that determines the risk is limited. | |||
|} | |||
{| class="wikitable" | |||
|- | |||
! Category 4 “Highly Sensitive or Clearly identifiable personal data” | |||
|- | |||
| ''Release & Pre-Release'': Default OFF | |||
Any collection requires prior Trust approval and (i) advance user notice (ii) consent and (iii) an opt-out. | |||
|} | |||
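The eligibility tables above can be summarized as a lookup keyed by category and channel. The sketch below is only a reading aid under the assumption that "prior Trust approval" can be modeled as a boolean; the function name and the channel strings are hypothetical, and the case-by-case mitigations for Category 3 are not modeled.

```python
def default_on_allowed(category, channel, trust_approved=False):
    """Return True if a collection may be Default ON per the tables above.

    category: 1-4, matching the Data Collection Categories.
    channel: "release" or "pre-release" (hypothetical labels).
    trust_approved: whether prior Trust approval has been obtained.
    """
    if category not in (1, 2, 3, 4):
        raise ValueError("category must be 1, 2, 3 or 4")
    if channel not in ("release", "pre-release"):
        raise ValueError("channel must be 'release' or 'pre-release'")

    # Categories 1 and 2 are eligible for Default ON everywhere.
    if category in (1, 2):
        return True
    # Category 3: eligible in pre-release; release needs prior Trust approval.
    if category == 3:
        return channel == "pre-release" or trust_approved
    # Category 4 is always Default OFF. Any collection additionally needs
    # Trust approval, advance notice, consent, and an opt-out.
    return False


if __name__ == "__main__":
    for cat in (1, 2, 3, 4):
        for chan in ("release", "pre-release"):
            print(cat, chan, default_on_allowed(cat, chan))
```

Being eligible for Default ON does not remove the requirement that users have a way to turn data collection off.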
= Other Practices = | = Other Practices = |