Firefox/Input/Data
Revision as of 18:42, 30 March 2011
Summary
Currently, Input offers two exports of the user feedback data. Both are TSV coded tables:
- opinions.tsv.bz2 contains everything except the ratings
- ratings.tsv.bz2 has the ratings data
The two tables form a 1:n relationship (one opinion, any number of ratings) and can be joined on the first column (the opinion ID). Both files are compressed with bzip2, so decompress them first, e.g. using bunzip2 or bzip2 -d.
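As an alternative to the command-line tools, the files can be decompressed from a script. The following is a minimal sketch using Python's standard bz2 module; the function names decode_export and read_export are hypothetical and not part of Input.

```python
import bz2


def decode_export(data):
    """Decompress the bytes of a .tsv.bz2 export and decode them as UTF-8."""
    # Working on bytes avoids any newline translation, so LF characters
    # inside the data arrive unchanged.
    return bz2.decompress(data).decode("utf-8")


def read_export(path):
    """Read a downloaded export file (hypothetical helper) into one string."""
    with open(path, "rb") as handle:
        return decode_export(handle.read())
```

This loads the whole file into memory, which is the simplest option but may not suit very large exports.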
TSV Coding
The data is a UTF-8 encoded Unicode stream. Lines (= records) are separated by LF (newline, U+000A). There are no header/title records. Fields (= columns) are separated by TAB (U+0009), so TAB and LF characters inside a field need escaping: each is preceded by a backslash (U+005C). Consequently, a backslash inside a field is itself escaped with a backslash.
- Example FSM to parse input data
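The escaping rules above can be sketched as a two-state machine (normal and escaped). This is only an illustration, not the linked example. It assumes that the character following a backslash is the literal TAB, LF or backslash itself, and the name parse_records is hypothetical.

```python
def parse_records(text):
    """Yield one list of field strings per record of backslash-escaped TSV."""
    fields, current, escaped = [], [], False
    for ch in text:
        if escaped:
            # Escaped state: take the character literally (TAB, LF or backslash).
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "\t":
            # Unescaped TAB ends the current field.
            fields.append("".join(current))
            current = []
        elif ch == "\n":
            # Unescaped LF ends the current field and the record.
            fields.append("".join(current))
            yield fields
            fields, current = [], []
        else:
            current.append(ch)
    if fields or current:
        # Final record without a trailing LF.
        fields.append("".join(current))
        yield fields
```

Splitting the stream on LF before unescaping would cut a record in two whenever a field contains an escaped newline, which is why the sketch walks the text character by character.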
Opinions
When a column has no value (e.g. Device on desktop Firefox), it still occupies an empty cell, so the record contains multiple consecutive tabs.
Fields
- 1. Opinion ID
- coded as a base10 integer, used to look up ratings or items on the Input website
- 2. Time of feedback
- base10 integer; this is UNIX time, i.e. seconds since 1970-01-01T00:00:00Z (UTC)
- 3. Type
- one of issue, praise, suggestion, rating
- 4. Product
- one of firefox, mobile
- 5. Version
- a version identifier such as 4.0b11 or 3.6.13
- 6. Platform
- one of mac, windows, linux, android, maemo
- 7. Locale
- a locale identifier such as en-US
- 8. Manufacturer
- for product:mobile only, the device manufacturer
- 9. Device
- for product:mobile only, a device identifier
- 10. URL
- an http, https, chrome or about URL given by the user with their feedback
- 11. Description
- free text entered by the user, limited to 140 Unicode characters (not bytes)
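The field list above can be mapped onto a typed record. The following is a sketch only: the class name Opinion, its attribute names and the function opinion_from_fields are hypothetical, and it assumes the record has already been split into its 11 unescaped fields. Empty cells in the optional columns become None.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    opinion_id: int
    time: datetime               # converted from UNIX time, timezone-aware UTC
    type: str                    # issue, praise, suggestion or rating
    product: str                 # firefox or mobile
    version: str
    platform: str
    locale: str
    manufacturer: Optional[str]  # product:mobile only
    device: Optional[str]        # product:mobile only
    url: Optional[str]
    description: str


def opinion_from_fields(fields):
    """Build an Opinion from the 11 string fields of one opinions record."""
    if len(fields) != 11:
        raise ValueError("expected 11 fields, got %d" % len(fields))
    (oid, ts, kind, product, version, platform,
     locale, manufacturer, device, url, description) = fields
    return Opinion(
        int(oid),
        datetime.fromtimestamp(int(ts), tz=timezone.utc),
        kind, product, version, platform, locale,
        manufacturer or None,    # empty cell -> None
        device or None,
        url or None,
        description,
    )
```

Passing tz=timezone.utc keeps the timestamp independent of the local timezone of the machine running the script.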
Ratings
One line per (opinion × rating category) pair. Rows are keyed to the opinions table by opinion ID.
Fields
- 1. Opinion ID
- base10 integer, used to group related ratings
- 2. Rating Type
- one of startup, pageload, responsive, crashy (higher = more stable), features
- 3. Rating Value
- base10 integer ranging from 1 to 5
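To join ratings back to their opinions, the rows can be grouped by opinion ID. The sketch below assumes each row is already a list of three unescaped strings; the names RATING_TYPES and group_ratings are hypothetical.

```python
from collections import defaultdict

# Rating categories as listed in the field description above.
RATING_TYPES = {"startup", "pageload", "responsive", "crashy", "features"}


def group_ratings(rows):
    """Map opinion ID -> {rating type: value} from parsed ratings rows."""
    grouped = defaultdict(dict)
    for opinion_id, rating_type, value in rows:
        score = int(value)
        # Reject rows that fall outside the documented categories or range.
        if rating_type not in RATING_TYPES or not 1 <= score <= 5:
            raise ValueError("unexpected rating: %r=%r" % (rating_type, value))
        grouped[int(opinion_id)][rating_type] = score
    return dict(grouped)
```

The resulting dictionary can then be looked up with the integer opinion ID from the first column of the opinions table.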