Thirdparty

Overview

Problem: Third party cookies, along with other techniques such as browser fingerprinting, enable behavioral tracking, which users frequently do not want. However, disabling third party cookies by default (as has been attempted in the past) breaks too many legitimate use cases.

Goals:

  • Stop behavioral tracking to the maximum extent possible, except where the user specifically wants it. (Note: the tracking vectors to address include more than cookies.)
  • Allow common legitimate cases, such as Facebook Connect, OpenID, and bank logins, to work as seamlessly as possible.
  • Make this the default setting in Firefox 4.

Use cases

Evil:

1. User visits multiple shopping sites that embed resources from ad sites, whether in iframes, as images, or via requests made directly from script. They do not want the advertiser to track their movements across those sites.
2. User visits a site that embeds Facebook Connect, but does not want their Facebook login cookies automatically sent.

Good:

3. User visits their credit union, which uses third party resources for banking functions, and wants those functions to work.
4. User visits a site that uses OpenID or Facebook Connect, and wants to be able to log in to those services and use them with the site.

Proposal

Currently, cookies are keyed (i.e. set for and sent back to) by the domain that set them. Instead, double-key cookies by (first party base domain, setting domain). A cookie is first party if the setting domain falls within the first party base domain, e.g. (google.com, mail.google.com); otherwise it is third party.

Cookies are sent back only when the double key matches. For instance, cookies set by an image hosted on ads.google.com while browsing buy.com would only be sent back while browsing buy.com, not while browsing another site.
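A minimal sketch of the double-keyed store, assuming cookies can be reduced to name/value pairs. The class and method names are hypothetical, and real cookie matching rules (domain attribute, path, expiry, Secure) are deliberately left out; the point is only that lookups succeed when both halves of the key match.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

# (first party base domain, setting domain)
DoubleKey = Tuple[str, str]


class DoubleKeyedCookieJar:
    """Toy cookie store keyed by (first party base domain, setting domain)."""

    def __init__(self) -> None:
        # Each double key gets its own independent name -> value map.
        self._store: DefaultDict[DoubleKey, Dict[str, str]] = defaultdict(dict)

    def set_cookie(self, first_party_base: str, setting_domain: str,
                   name: str, value: str) -> None:
        # The cookie is filed under both the first party and the setter.
        self._store[(first_party_base, setting_domain)][name] = value

    def cookies_for(self, first_party_base: str,
                    setting_domain: str) -> Dict[str, str]:
        # Only cookies stored under exactly this double key are sent back;
        # .get() avoids creating empty entries for keys never written.
        return dict(self._store.get((first_party_base, setting_domain), {}))


jar = DoubleKeyedCookieJar()
# An image from ads.google.com, embedded while browsing buy.com, sets a cookie.
jar.set_cookie("buy.com", "ads.google.com", "uid", "12345")
print(jar.cookies_for("buy.com", "ads.google.com"))    # {'uid': '12345'}
print(jar.cookies_for("other.com", "ads.google.com"))  # {}
```

The same ad domain embedded on other.com sees an empty store, so it cannot link the two visits through cookies.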

In addition, third party cookies are discarded after the session (i.e. on browser close).
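The session-only rule can be sketched as a filter applied when the browser closes. This assumes the store is keyed by (first party base domain, setting domain) and, for simplicity, treats every first party cookie as persistent (ordinary session cookie expiry is ignored). The function names are hypothetical, and the boolean parameter merely stands in for the network.cookie.thirdparty.sessionOnly pref mentioned under Implementation; a real implementation would more likely mark cookies as session cookies when they are set.

```python
from typing import Dict, Tuple

DoubleKey = Tuple[str, str]  # (first party base domain, setting domain)
CookieStore = Dict[DoubleKey, Dict[str, str]]


def is_first_party_key(first_party_base: str, setting_domain: str) -> bool:
    # First party if the setting domain is the base domain or a subdomain of it,
    # e.g. ("google.com", "mail.google.com"); third party otherwise.
    return (setting_domain == first_party_base
            or setting_domain.endswith("." + first_party_base))


def cookies_after_restart(store: CookieStore,
                          third_party_session_only: bool = True) -> CookieStore:
    """Return the cookies that survive closing the browser.

    third_party_session_only is a stand-in for the
    network.cookie.thirdparty.sessionOnly pref.
    """
    if not third_party_session_only:
        return dict(store)
    # Drop every entry whose double key is third party.
    return {key: cookies for key, cookies in store.items()
            if is_first_party_key(*key)}


store: CookieStore = {
    ("google.com", "mail.google.com"): {"SID": "abc"},
    ("buy.com", "ads.google.com"): {"uid": "12345"},
}
print(cookies_after_restart(store))  # {('google.com', 'mail.google.com'): {'SID': 'abc'}}
```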

Definitions:

  • first party domain: the domain of the site that the user is browsing; specifically, what appears in the urlbar.
  • base domain: the registrable domain for a given site, i.e. its public suffix plus one more label; e.g. for mail.google.com the base domain is google.com.
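A sketch of computing the base domain and classifying a cookie as first or third party. It assumes "base domain" means the registrable domain (often called eTLD+1), and it uses a tiny hard-coded suffix set as a hypothetical stand-in for the full Public Suffix List that a real browser would consult. The function names are illustrative only.

```python
from typing import FrozenSet

# Hypothetical, hard-coded subset of the Public Suffix List.
PUBLIC_SUFFIXES: FrozenSet[str] = frozenset({"com", "org", "net", "uk", "co.uk"})


def base_domain(host: str) -> str:
    """Return the registrable domain: the longest public suffix plus one label."""
    labels = host.lower().rstrip(".").split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                raise ValueError(f"{host!r} is itself a public suffix")
            return ".".join(labels[i - 1:])
    raise ValueError(f"no known public suffix for {host!r}")


def is_first_party(urlbar_host: str, setting_host: str) -> bool:
    # First party when the setter shares the urlbar site's base domain.
    return base_domain(setting_host) == base_domain(urlbar_host)


print(base_domain("mail.google.com"))                       # google.com
print(base_domain("news.bbc.co.uk"))                        # bbc.co.uk
print(is_first_party("www.google.com", "mail.google.com"))  # True
print(is_first_party("www.buy.com", "ads.google.com"))      # False
```

The suffix list matters for hosts like news.bbc.co.uk: taking just the last two labels would yield co.uk and lump every .co.uk site into one first party.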

Analysis

This prevents tracking by third parties across different sites (case 1), since there is effectively a separate third party store per first party site. It also prevents session information from being sent automatically in a third party context (case 2), even when the user has logged into that site as a first party, because a site's third party store is separate from its first party store.

It allows cases where temporary third party cookies are required on a given first party site (case 3), since those cookies share the same (first party, third party) key and last for the session. It can also allow OpenID and similar services to work (case 4; see below), again because the (first party, third party) context can be kept the same. In this case, the user will have to log in separately for each first party, and the login will persist only for the session, unless the user whitelists the third party site via the usual mechanism.

The tricky part is deciding in which cases the first party context should carry over. For instance, an iframe within a page has an obvious first party domain (the urlbar). But what about a redirect (such as a clickthrough ad, or an OpenID login)? Leaving redirects untracked would be an obvious hole, so we have to carry first party context through them. (So going to digg.com --> redirect to clickthrough ad on ads.google.com --> click back to digg.com would maintain a first party context of digg.com throughout.) If we didn't, those clickthrough ads would be first parties, and could track the user across sites.

Facebook Connect uses a JS lightbox to display the login dialog (http://wiki.developers.facebook.com/index.php/Authenticating_Users_with_Facebook_Connect). The lightbox is part of the page rather than a popup window, so Facebook is treated as a third party under the embedding site, and double-keying works fine here. Note that the embedder can request a popup dialog instead, but we assume that's not the common case.

OpenID probably uses redirects in general (http://www.merchantos.com/makebeta/php/single-sign-on-with-openid-and-google-part-1/), though I'm not sure about the specifics of each provider. If we carry first party context through redirects and treat the redirect targets as third parties (which would require some extra mechanics), then this would work just fine. It also closes the clickthrough ad hole described above, so doing this is good all around.
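To see how redirect tracking and double-keying combine for a login flow, here is a toy simulation. It assumes hosts are already reduced to base domains, uses the reserved name openid.example as a hypothetical provider, and ignores the assertion data real OpenID passes in URL parameters. The Tab class and its methods are illustrative only.

```python
from typing import Dict, Optional, Tuple

DoubleKey = Tuple[str, str]  # (first party, host that set the cookie)


class Tab:
    """Toy browser tab; hosts are assumed to be base domains already."""

    def __init__(self, jar: Dict[DoubleKey, Dict[str, str]]) -> None:
        self.jar = jar
        self.first_party: Optional[str] = None
        self.host: Optional[str] = None

    def navigate(self, host: str, via_redirect: bool = False) -> Dict[str, str]:
        """Load host and return the cookies that would be sent with the request."""
        # A toplevel load resets the first party; a redirect carries it over.
        if not via_redirect or self.first_party is None:
            self.first_party = host
        self.host = host
        return dict(self.jar.get((self.first_party, host), {}))

    def set_cookie(self, name: str, value: str) -> None:
        # The current host sets a cookie under the current double key.
        if self.first_party is None or self.host is None:
            raise RuntimeError("nothing loaded yet")
        self.jar.setdefault((self.first_party, self.host), {})[name] = value


jar: Dict[DoubleKey, Dict[str, str]] = {}
tab = Tab(jar)
tab.navigate("digg.com")                           # user types digg.com
tab.navigate("openid.example", via_redirect=True)  # login redirect; first party stays digg.com
tab.set_cookie("session", "alice")                 # provider logs the user in
tab.navigate("digg.com", via_redirect=True)        # redirected back to digg.com

tab.navigate("other.com")                                 # a different site
print(tab.navigate("openid.example", via_redirect=True))  # {}

tab.navigate("digg.com")                                  # back on digg.com
print(tab.navigate("openid.example", via_redirect=True))  # {'session': 'alice'}
```

The provider's login cookie lives under (digg.com, openid.example) only, which matches the behavior described earlier: the user logs in separately for each first party, and the login lasts for the session.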

Note that Opera does something interesting here: by default, it considers redirects to be "unverified transactions", which are treated as third party, while link clicks are verified transactions and thus first party. This is actually part of RFC 2965 (http://www.faqs.org/rfcs/rfc2965.html) section 3.3.6: "A transaction is verifiable if the user, or a user-designated agent, has the option to review the request-URI prior to its use in the transaction." In Opera, with "automatic redirection" turned off, I believe this means that a redirect displays a page saying something like "this is a redirect to http://foo.com, continue?". Clicking that link then makes the transaction verified, and the cookies are first party.

With that, I propose the following rules, under which the first party domain carries over from one load to the next until something resets it:

1) Typing in the urlbar, loading bookmarks, and other fully toplevel actions -- resets the first party domain.
2) Link clicks (<a href> links) -- resets (but I'm not sure about this yet).
3) Setting document.location -- carries over the first party domain. (It's hard to distinguish a user-initiated action that results in a document.location change from an automated one, so we have to carry over here.)
4) Redirects -- carry over.
5) Popup windows -- carry over.
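Rules 1-5 can be sketched as a single function that decides the first party for each new load. The enum and function names are hypothetical, targets are assumed to be base domains already, and for a popup `current` is taken to be the opener's first party. The `link_click_resets` flag toggles the open question in rule 2.

```python
from enum import Enum, auto
from typing import Optional


class LoadKind(Enum):
    TOPLEVEL = auto()      # 1) urlbar, bookmarks, other toplevel actions
    LINK_CLICK = auto()    # 2) user clicks an <a href> link
    LOCATION_SET = auto()  # 3) script sets document.location
    REDIRECT = auto()      # 4) HTTP redirect
    POPUP = auto()         # 5) popup window opened by the page


def next_first_party(current: Optional[str], target: str, kind: LoadKind,
                     link_click_resets: bool = True) -> str:
    """Return the first party base domain to use for a load of `target`."""
    resets = (kind is LoadKind.TOPLEVEL
              or (kind is LoadKind.LINK_CLICK and link_click_resets))
    # With nothing to inherit from, the target becomes the first party.
    if resets or current is None:
        return target
    # document.location changes, redirects and popups keep the existing first party.
    return current


fp = next_first_party(None, "digg.com", LoadKind.TOPLEVEL)
fp = next_first_party(fp, "google.com", LoadKind.REDIRECT)    # clickthrough ad
print(fp)                                                     # digg.com
print(next_first_party(fp, "shop.com", LoadKind.LINK_CLICK))  # shop.com
print(next_first_party(fp, "shop.com", LoadKind.LINK_CLICK,
                       link_click_resets=False))              # digg.com
```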

We might want link clicks to carry over the first party instead. Rationale: a site that relies on a link click to a third party, rather than a redirect or document.location, to perform a login operation needs that load to carry over the first party so that things work when the user is redirected back. The downside is that in a long browsing session in a single tab, every site reached by clicking links would be considered a third party under the original first party, which allows behavioral tracking for the lifetime of that tab. Resetting is probably a good tradeoff, since it's less surprising. But it leaves holes: for example, a site could link to ads.google.com, which then redirects back to some content, so that ads.google.com is loaded as a first party along the way.

Implementation

Step 1: Make third party cookies persist for the session only, by default. (This can be disabled via the network.cookie.thirdparty.sessionOnly pref.) See bug 565475; patch posted.

Step 2: Double-key cookies by (first party domain, setting domain). See bug 565965; patch in progress.

Step 3: Implement the first party carry-over rules described above, probably as a separate service so that localStorage and other storage mechanisms can use it too.
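One way such a shared service might look, sketched under loud assumptions: the service and store names are hypothetical, the first party is tracked per tab (a real browser would track it per toplevel document), and the navigation code is assumed to call `set_first_party` after applying the carry-over rules. A localStorage-like store then keys its data by (first party, origin) in the same way the cookie jar does.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Optional, Tuple


class FirstPartyService:
    """Hypothetical single source of truth for each tab's first party."""

    def __init__(self) -> None:
        self._by_tab: Dict[int, str] = {}

    def set_first_party(self, tab_id: int, base_domain: str) -> None:
        # Called by navigation code after applying the carry-over rules.
        self._by_tab[tab_id] = base_domain

    def first_party(self, tab_id: int) -> str:
        return self._by_tab[tab_id]


class DoubleKeyedLocalStorage:
    """localStorage-like store that asks the service for the first party."""

    def __init__(self, service: FirstPartyService) -> None:
        self._service = service
        self._data: DefaultDict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)

    def set_item(self, tab_id: int, origin: str, key: str, value: str) -> None:
        fp = self._service.first_party(tab_id)
        self._data[(fp, origin)][key] = value

    def get_item(self, tab_id: int, origin: str, key: str) -> Optional[str]:
        fp = self._service.first_party(tab_id)
        # .get() avoids creating empty entries on reads.
        return self._data.get((fp, origin), {}).get(key)


service = FirstPartyService()
storage = DoubleKeyedLocalStorage(service)
service.set_first_party(1, "buy.com")
service.set_first_party(2, "other.com")
# An ads.google.com iframe writes in tab 1, then tries to read it in tab 2.
storage.set_item(1, "https://ads.google.com", "uid", "12345")
print(storage.get_item(1, "https://ads.google.com", "uid"))  # 12345
print(storage.get_item(2, "https://ads.google.com", "uid"))  # None
```

Keeping the first party decision in one place means cookies, localStorage, and any later storage mechanism cannot drift apart in how they apply the carry-over rules.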