* This is why European and German law *requires* opt-in for any gathering of data about the user.
= Discussion of using random sampling =
== Comments from [[User:DEinspanjer|DEinspanjer]] 20:10, 2 February 2012 (PST) ==
During the security review meeting, [[User:BenB]] brought up the idea of using random sampling to enroll installations into data submission, instead of enrolling all installations by default. The Metrics team had discussed this previously. It is a viable option with some possibly moderate drawbacks. Anyone who manually opts in to the system must be flagged as such so that their self-selection bias does not skew analysis. The currently proposed system generates aggregate views of the data which roll up any high-cardinality groups to an acceptable level (the initial threshold was set at 1000). It is reasonable to assume there will be many long-tail groups sitting at the minimum aggregation threshold. Heavy sampling is likely to make that long tail of little use for comparison analysis. For example, it is very likely that even a 10% sample would not allow Mozilla, or an individual user doing local analysis, to compare the performance of their installation with other installations that have a particular add-on installed.
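To put rough numbers on the long-tail concern, here is a minimal sketch of the arithmetic. It assumes a uniform sampling rate and uses the initial threshold of 1000 mentioned above; the add-on names and population sizes are made up for illustration and are not real Mozilla data, and the function names are hypothetical.

```python
# Illustration only: hypothetical group sizes, not real Mozilla data.
THRESHOLD = 1000  # initial minimum group size for an aggregate view


def clears_threshold(group_size, rate_percent, threshold=THRESHOLD):
    """True if the expected sampled size of a group still meets the threshold.

    Integer arithmetic (percent instead of a float rate) avoids rounding noise.
    """
    return group_size * rate_percent >= threshold * 100


def min_population_needed(rate_percent, threshold=THRESHOLD):
    """Smallest full-population group size expected to survive sampling."""
    return -(-threshold * 100 // rate_percent)  # ceiling division


# Hypothetical numbers of installations that have a given add-on installed.
groups = {"addon_a": 50000, "addon_b": 8000, "addon_c": 1500}

for rate in (100, 10):
    visible = [name for name, size in groups.items() if clears_threshold(size, rate)]
    print(f"{rate}% enrollment: groups at or above threshold: {visible}")
```

Under these assumptions, a 10% sample needs a group of at least 10,000 installations in the full population before the group is expected to reach the threshold, so the two smaller hypothetical groups would be rolled up and lost to comparison analysis. Real samples also vary around the expected count, which this sketch ignores.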
I do not consider this a closed topic by any means, but my personal preference is to make sure the system has adequate privacy controls and can handle the load of the full installation base, and so avoid potential issues with sampling errors or reduced analytic capability for both the user and Mozilla.