Loop/Data Collection

Tracked in {{bug|1024568}}


=== Use Cases ===
This data collection is intended to handle the following use cases:

# "I was expecting a call, but never got alerted, even though the other party said they tried to call." For this use case, the user needs to be able to click on something, probably in the Loop panel, to proactively indicate a problem.
# "I am in the middle of a call right now, and this is really bad." If the media experience becomes unsatisfactory, we want the user to be able to tell the client while we still have the media context available, so we can grab metrics before they go away.
# "I was in the middle of a call and it ended unexpectedly."
# "I just tried to make a call but it failed to set up."
# "I just received a call alerting and tried to answer, but it failed to set up."
 
=== Data to Collect ===
 
When the user indicates an issue, the Loop client will create a ZIP file containing information that potentially pertains to the failure. For the initial set of reports, it will include the following files:
 
# ''index.txt'': Simple text file containing a JSON object. This JSON object contains indexable meta-information relating to the failure report. The format is described in [[#Index_Format]], below.
# ''sdk_log.txt'': All entries in the Browser Console emitted by chrome://browser/content/loop/*
# ''stats.txt'': JSON serialization of the call's stats object
# ''local_sdp.txt'': SDP body for local end of connection
# ''remote_sdp.txt'': SDP body for remote end of connection
# ''ice_log.txt'': Contents of ICE log ring-buffer
# ''WebRTC.log'': Contents of WebRTC log; only applicable for reports submitted mid-call. This will require activation of the WebRTC debugging mode, collection of data for several seconds, and then termination of WebRTC debugging mode. See the "Start/Stop Debug Mode" buttons on the about:webrtc panel.
# ''push_log.txt'': Logging from the Loop Simple Push module (note: this isn't currently implemented, but should be added as part of this data collection).
 
=== Index Format ===
The "index.txt" file MUST be the first file in the ZIP file, and MUST use compression method of "stored" (uncompressed). This can be easily achieved by passing a compression level of 0 to [https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIZipWriter nsIZipWriter] for this file. These properties are critical to allow for rapid indexing of the reports.
 
The index file is a JSON object with the following fields:
* ''id'': UUID of report (must match ZIP file name)
* ''timestamp'': Date of report submission, in seconds past the epoch
* ''phase'': Indicates what state the call was in when the user indicated an issue.
** '''noalert''' - The user proactively indicated that a call should have arrived, but did not.
** '''setup''' - The call was in the process of being set up when the user indicated an issue.
** '''midcall''' - The call was set up and in progress when the user indicated an issue.
** '''postcall''' - The call was already finished when the user indicated an issue. The call may have never set up successfully.
* ''setupState'': call setup state; see [[MVP#Call_Setup_States]] '''(only if phase == setup)'''
** '''init'''
** '''alerting'''
** '''connecting'''
** '''half-connected'''
* ''type'': Indicates whether the user was receiving or placing a call:
** '''incoming'''
** '''outgoing'''
* ''client'': Indicates which client is generating the report. Currently, only the builtin client has the ability to collect the necessary information; this field is included to allow for future expansion, should the interfaces needed to collect this information become available to other clients.
** '''builtin'''
* ''channel'': Browser channel (nightly, aurora, beta, release, esr)
* ''version'': Browser version
* ''callId'': Identifier of the call (for correlating to the other side, if they also made a report)
* ''apiKey'': Copied from the field provided during call setup.
* ''sessionId'': Copied from the field provided during call setup (to correlate to TB servers)
* ''sessionToken'': Copied from the field provided during call setup (to correlate to TB servers)
* ''simplePushUrl'': Copied from the field provided during user registration
* ''loopServer'': Value of the "loop.server" pref
* ''callerId'': Firefox accounts ID or MSISDN of calling party, if direct
* ''calleeId'': Firefox accounts ID or MSISDN of called party
* ''callUrl'': URL that initiated the call, if applicable
* ''dndMode'': Indication of user "availability" state; one of the following values:
** '''available'''
** '''contactsOnly'''
** '''doNotDisturb'''
** '''disconnected'''
* ''termination'': If the call was terminated during setup, the "reason" code from the call progress signaling (see [[MVP#Termination_Reasons]]).
* ''reason'': User-selected value, one of the following:
** '''quality'''
** '''failure'''
* ''comment'': User-provided description of problem.
* ''okToContact'': Set to "true" only if the user opts in to letting developers contact them about this report
* ''contactEmail'': If "okToContact" is true, preferred email address for contact
 
These objects will look roughly like the following:
<code><pre>
{
  "id": "4b42e9ff-5406-4839-90f5-3ccb121ec1a7",
  "timestamp": "1407784618",
  "phase": "midcall",
  "type": "incoming",
  "client": "builtin",
  "channel": "aurora",
  "version": "33.0a1",
  "callId": "35e7c3a511f424d3b1d6fba442b3a9a5",
  "apiKey": "44669102",
  "sessionId": "1_MX40NDY2OTEwMn5-V2VkIEp1bCAxNiAwNjo",
  "sessionToken": "T1==cGFydG5lcl9pZD00NDY2OTEwMiZzaW",
  "simplePushURL": "https://push.services.mozilla.com/update/MGlYke2SrEmYE8ceyu",
  "loopServer": "https://loop.services.mozilla.com",
  "callerId": "adam@example.com",
  "callurl": "http://hello.firefox.com/nxD4V4FflQ",
  "dndMode": "available",
  "reason": "quality",
  "comment": "The video is showing up about one second after the audio",
  "okToContact": "true",
  "contactEmail": "adam@example.org",
}
</pre></code>
 
=== Uploading a User Issue Report ===
 
No existing Mozilla data ingestion systems provide a perfect fit for the data that these user reports will contain. Instead, we will create a very simple system that has the ability to grow more complex as needs evolve.
 
To that end, we will be using Microsoft Azure blob storage for the report upload and storage. The Loop client will perform POST requests directly against a blob container. Azure includes an access control mechanism that allows servers to hand out time-limited signed URLs that can then be used to access the indicated resource.
 
When a user indicates an issue, the Loop client selects a unique issue ID, and contacts the Loop server asking for a new URL to store the issue report in (including the date on which the report was generated):
 
<code><pre>
POST /issue-report HTTP/1.1
Accept: application/json
Content-Type: application/json; charset=utf-8
Authorization: <authentication information>

{
  "id": "13b09e3f-0839-495e-a9b0-1e917d983766",
  "timestamp": "1407958471"
}
</pre></code>
 
The Loop server forms a blob storage URL to upload the information to. The fields are constructed as follows:
 
* The host is the Azure instance assigned to Mozilla's account
* The container is "loop-" followed by a four-digit year and two-digit month (e.g., if the report date sent by the client in its post falls in August of 2014, UTC, then the container would be named "loop-201408").
* The filename is {issueID}.zip, using the issueID field provided by the client.
* The "signedversion" field (sv) is the Azure API version we're currently using
* The "signedexpiry" field (se) is the current time plus five minutes (this simply needs to be long enough to upload the report)
* The "signedresource" field (sr) is "b" (blob storage)
* The "signedpermissions" field (sp) is "w" (write only)
* The "signature" field (sig) is computed with our Azure shared key, [http://msdn.microsoft.com/en-us/library/azure/dn140255.aspx as described by the Azure SAS documentation]
 
This URL is then returned to the user:
 
<code><pre>
HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET,POST
Access-Control-Allow-Origin: https://localhost:3000
Content-Type: application/json; charset=utf-8

{
  "issueURL": "https://mozilla.blob.core.windows.net/loop-201408/13b09e3f-0839-495e-a9b0-1e917d983766.zip?sv=2012-02-12&se=2014-08-13T08%3a49Z&sr=b&sp=w&sig=Rcp6gQRfV7WDlURdVTqCa%2bqEArnfJxDgE%2bKH3TCChIs%3d"
}
</pre></code>
 
To mitigate potential abuse, the Loop server needs to throttle handing out issue URLs on a per-IP basis. If a Loop client attempts to send a request more frequently than the throttle allows, then the Loop server will send an HTTP 429 response indicating how long the client must wait before submitting the report. The client will then re-attempt sending the report once that period has passed.
 
<code><pre>
HTTP/1.1 429 Too Many Requests
Access-Control-Allow-Methods: GET,POST
Access-Control-Allow-Origin: https://localhost:3000
Content-Type: application/json; charset=utf-8
Retry-After: 3600

{
  "code": "429",
  "errno": "114",  // or whatever errno is allocated for this use
  "error": "Too Many Requests",
  "retryAfter": "3600"
}
</pre></code>
 
This means that, upon startup, the Loop client code needs to check for outstanding (not-yet-uploaded) reports and attempt to send them. If a report is over 30 days old and has not been successfully uploaded, the client will delete the report. The Loop server will similarly check that the timestamp field is no older than 30 days, and will reject requests for upload URLs that fail this check.
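The client side of this logic lives in Firefox chrome JavaScript; the Python sketch below only illustrates the intended behavior (prune reports older than 30 days, retry the rest, and leave throttled reports for a later attempt). The pending-report directory and helper functions are assumptions:

<code><pre>
import os
import time

MAX_REPORT_AGE = 30 * 24 * 60 * 60  # 30 days, matching the server-side check

def flush_pending_reports(pending_dir, request_upload_url, upload):
    """On startup, delete expired reports and try to send the rest.
    request_upload_url/upload stand in for the operations described above."""
    for name in os.listdir(pending_dir):
        path = os.path.join(pending_dir, name)
        if time.time() - os.path.getmtime(path) > MAX_REPORT_AGE:
            os.remove(path)  # too old; the server would reject it anyway
            continue
        status, body = request_upload_url(name)
        if status == 429:
            # Throttled: keep the report and retry later (or on next startup).
            continue
        upload(body["issueURL"], path)
        os.remove(path)
</pre></code>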
 
Once it acquires an issue upload URL, the Loop client then performs a POST against the supplied URL to upload the report ZIP file. The [http://msdn.microsoft.com/en-us/library/azure/dd135733.aspx Azure REST API documentation] contains more detailed information about this operation.
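A minimal sketch of the upload step follows (Python with the requests library, illustrative only); note that the Put Blob operation described in the linked documentation is defined as an HTTP PUT and requires the x-ms-blob-type header:

<code><pre>
import requests

def upload_report(issue_url, zip_path):
    """Upload the report ZIP to the signed Azure blob URL."""
    with open(zip_path, "rb") as f:
        resp = requests.put(issue_url, data=f,
                            headers={"x-ms-blob-type": "BlockBlob",
                                     "Content-Type": "application/zip"})
    resp.raise_for_status()  # Azure returns 201 Created on success
</pre></code>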
 
=== Data Indexing ===
 
Periodically (proposal: every hour), a job will traverse any new files that have appeared in the current month's container, as well as the previous month's container, and extract the information to be placed in the index. Data extraction is performed using two ranged GET requests.
 
To help understand the procedure that follows, the ZIP file will begin with the following fields (see [http://www.pkware.com/documents/casestudies/APPNOTE.TXT the PKWARE documentation for full details]):
{| class="wikitable"
|-
! !! +0 !! +1
|-
! Byte 0
| rowspan = 2 colspan = 2 | ''Local file header signature (0x04034b50)''
|-
! Byte 2
|-
! Byte 4
| colspan = 2 | Version needed to extract
|-
! Byte 6
| colspan = 2 | General purpose bit flag
|-
! Byte 8
| colspan = 2 | ''Compression method''
|-
! Byte 10
| colspan = 2 | Last modified time
|-
! Byte 12
| colspan = 2 | Last modified date
|-
! Byte 14
| rowspan = 2 colspan = 2 | CRC-32
|-
! Byte 16
|-
! Byte 18
| rowspan = 2 colspan = 2 | ''Compressed size''
|-
! Byte 20
|-
! Byte 22
| rowspan = 2 colspan = 2 | ''Uncompressed size''
|-
! Byte 24
|-
! Byte 26
| colspan = 2 | ''Filename Length''
|-
! Byte 28
| colspan = 2 | ''Extra field length''
|-
! Byte 30 ...
| colspan = 2 | <br>Filename (variable length)<br>&nbsp;
|-
! Byte 30 +<br>filename length
| colspan = 2 | <br>Extra Field (variable length)<br>&nbsp;
|-
! Byte 30 +<br>filename length +<br>extra field length
| colspan = 2 | <br>''File Contents''<br>&nbsp;
|-
|}
 
 
First, the indexing job retrieves the first 30 bytes of the file, and performs the following steps:
* Verify signature == 0x04034b50
* Verify compression method == 0
* Verify compressed size == uncompressed size
* Verify compressed size < 8 kB (these should typically be ~1 kB in size)
* Set index_file_start = 30 + file_name_length + extra_field_length
* Read the range of bytes from index_file_start to index_file_start + compressed_size
* Perform a JSON parse of the resulting body
 
If any of the verification or parsing steps fail, then the indexing job deletes the blob from the container.
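The header checks and ranged reads above might be sketched as follows (Python, illustrative only; authentication for the ranged GETs is omitted, and the helper name is an assumption):

<code><pre>
import json
import struct

import requests

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30-byte ZIP local file header
SIGNATURE = 0x04034b50
MAX_INDEX_SIZE = 8 * 1024

def fetch_index(blob_url):
    """Read index.txt from a report blob with two ranged GETs.
    Returns the parsed index object, or None if any verification fails."""
    head = requests.get(blob_url, headers={"Range": "bytes=0-29"}).content
    (sig, _version, _flags, method, _mtime, _mdate, _crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack(head)
    if sig != SIGNATURE or method != 0 or csize != usize or csize >= MAX_INDEX_SIZE:
        return None
    start = 30 + name_len + extra_len
    body = requests.get(blob_url, headers={
        "Range": "bytes=%d-%d" % (start, start + csize - 1)}).content
    try:
        return json.loads(body)
    except ValueError:
        return None
</pre></code>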
 
Once the parsing is complete, the indexing job then adds an entry to the index table (location TBD -- should we put this in S3?), to allow developers to search for reports by specific criteria.
 
=== Data Retrieval ===
 
To retrieve reports, authorized developers will need to be able to perform the following operations:
* Given a report ID, retrieve the corresponding report ZIP file
* Given a set of criteria selected from [[#Index_Format|the fields described above]], list the values of the other index fields, including the report ID (which should be a hyperlink to retrieve the report ZIP file itself)
 
Aside from satisfying those two use cases, and enforcing that only authenticated and authorized developers have access, this interface can be very rudimentary.
 
=== Data Purging ===
At the end of each month, a data retention job will locate containers whose names indicate that they fall outside the data retention policy for these reports (proposal: 6 months), and remove those containers (including all contained reports). This job also removes the corresponding data from the index table.
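A sketch of the container-selection step, assuming the proposed 6-month retention period and the "loop-YYYYMM" naming scheme described above (the actual deletion calls against blob storage and the index table are omitted):

<code><pre>
from datetime import datetime, timezone

RETENTION_MONTHS = 6  # proposed retention period

def expired_containers(container_names, now=None):
    """Return the 'loop-YYYYMM' containers older than the retention period."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now.year * 12 + now.month - 1) - RETENTION_MONTHS
    expired = []
    for name in container_names:
        if not name.startswith("loop-"):
            continue
        year, month = int(name[5:9]), int(name[9:11])
        if (year * 12 + month - 1) < cutoff:
            expired.append(name)
    return expired
</pre></code>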