Thirdparty

Overview

Problem: Third party cookies and other browser fingerprinting techniques allow behavioral tracking, which is frequently undesirable to the user. However, disabling third party cookies by default (as attempted in the past) breaks too many legitimate cases.

Goals:

  • Stop behavioral tracking to the maximum extent possible, except where the user specifically wants it. (Note: this includes more than cookies.)
  • Allow common legitimate cases, such as Facebook Connect, OpenID, bank logins and such, to work as seamlessly as possible.
  • Make this the default setting in Firefox 4.

Use cases

Evil:

1. User visits multiple shopping sites, which have resources from ad sites embedded in iframes, images, or requests made directly from script. They do not want the advertiser to track their movements across those sites.
2. User visits a site that embeds Facebook Connect, but does not want their Facebook login cookies automatically sent.

Good:

3. User visits their credit union, which uses third party resources for banking functions, and wants those functions to work.
4. User visits a site that uses OpenID, Facebook Connect, or other federated login service, and wants to be able to log in to those services and use them with the site. Todo: OpenID may actually not require cookies on the first party site at all -- information is passed in a backchannel. Need to confirm. What about other authentication-related services?

Proposal Overview

Currently, cookies are keyed (i.e. set for and sent back to) by the domain that set them. Instead, double-key the cookies by (first party base domain, setting domain). Cookies are first party if the second key is derived from the first key, e.g. (google.com, mail.google.com); third party otherwise, e.g. (huffingtonpost.com, doubleclick.net).

Cookies are only sent back when the double key of the request matches the double key the cookie was stored under. For instance, when browsing buy.com, cookies set by an image hosted on ads.google.com would be sent back only while browsing buy.com, not when browsing another site.
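The send-back rule can be illustrated with a minimal sketch of a double-keyed cookie jar. The class and method names are hypothetical, and the sketch ignores real cookie domain-matching (e.g. a cookie for google.com also being sent to mail.google.com); it only shows that a lookup succeeds when both keys match exactly.

```python
from collections import defaultdict


class DoubleKeyedCookieJar:
    """Hypothetical cookie jar keyed by (first party base domain, setting domain)."""

    def __init__(self):
        # (first_party, setting_domain) -> {cookie name: value}
        self._store = defaultdict(dict)

    def set_cookie(self, first_party, setting_domain, name, value):
        self._store[(first_party, setting_domain)][name] = value

    def cookies_for(self, first_party, request_domain):
        # Only cookies stored under the exact same double key are sent back.
        # dict.get does not insert a new entry into the defaultdict.
        return dict(self._store.get((first_party, request_domain), {}))


jar = DoubleKeyedCookieJar()
# While browsing buy.com, an image from ads.google.com sets a cookie.
jar.set_cookie("buy.com", "ads.google.com", "id", "abc123")
print(jar.cookies_for("buy.com", "ads.google.com"))    # {'id': 'abc123'}
print(jar.cookies_for("other.com", "ads.google.com"))  # {}
```

The ad server sees its cookie again on buy.com, but on other.com it starts from an empty store, which is what prevents cross-site correlation.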

In addition, third party cookies are discarded after the session (i.e. on browser close). (This part may be non-default behavior; it does not necessarily strike a good balance between UX and privacy.)

Definitions:

first party domain: the domain of the site that the user is browsing; specifically, what appears in the urlbar.
base domain: the registrable domain for a given site, i.e. its public suffix (such as .com or .co.uk) plus one more label; e.g. for mail.google.com the base domain would be google.com.
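As a rough illustration of these definitions, the sketch below computes a base domain and classifies a (first party, setting domain) pair. It assumes a tiny hard-coded suffix set as a stand-in for the full Public Suffix List that a browser would actually consult.

```python
# Assumption: a tiny stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "net", "org", "co.uk"}


def base_domain(host: str) -> str:
    """Return the public suffix plus one label, e.g. mail.google.com -> google.com."""
    labels = host.lower().rstrip(".").split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host  # Unknown suffix: fall back to the full host.


def is_first_party(first_party_host: str, setting_host: str) -> bool:
    """A cookie is first party if both hosts share the same base domain."""
    return base_domain(setting_host) == base_domain(first_party_host)


print(base_domain("mail.google.com"))                          # google.com
print(is_first_party("google.com", "mail.google.com"))         # True
print(is_first_party("huffingtonpost.com", "doubleclick.net"))  # False
```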

Discussion

This prevents automatic tracking by third parties across different sites (cases 1 and 2), since there's effectively a separate third party store per first party site. It also prevents automatic sending of session information in a third party context, even when the user has logged into that site as a first party, since the third party store is separate from the first party store for a given site.

It allows cases where temporary third party cookies are required on a given first party site (case 3), since those situations will have the same (first party, third party) key. Whether we limit third party cookie lifetime to session only will have no effect here.

It can also allow OpenID and Facebook Connect to work (case 4), with some additional user interaction. The (first party, third party) context will prevent an existing Facebook login (via facebook.com) from automatically carrying over to another site (huffingtonpost.com). However, the user can log in to Facebook from Huffington Post, and separately for each site that embeds Facebook, and things can work as usual. If we optionally limit third party lifetime to session, this login will persist for the session only. If the user trusts Facebook, they can whitelist facebook.com (via the usual mechanism) to bypass the double keying restrictions, so that their Facebook login carries over to all other sites.
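One way to picture the whitelist escape hatch is a key function that drops the first party component for trusted domains. This is a minimal sketch: the WHITELIST set and cookie_key function are hypothetical, and hosts are assumed to be already reduced to base domains.

```python
# Hypothetical: domains the user has chosen to trust.
WHITELIST = {"facebook.com"}


def cookie_key(first_party: str, setting_domain: str) -> tuple:
    """Return the key under which a cookie is stored and looked up."""
    if setting_domain in WHITELIST:
        # Whitelisted: effectively single-keyed, as today, so one login
        # is visible from every site that embeds this domain.
        return (None, setting_domain)
    return (first_party, setting_domain)


# A login made on facebook.com is visible when facebook.com is embedded elsewhere.
print(cookie_key("facebook.com", "facebook.com"))        # (None, 'facebook.com')
print(cookie_key("huffingtonpost.com", "facebook.com"))  # (None, 'facebook.com')
# Non-whitelisted third parties stay partitioned per first party.
print(cookie_key("buy.com", "doubleclick.net"))          # ('buy.com', 'doubleclick.net')
```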

The tricky part, and what determines what we will break, is defining what actions result in a first party vs third party context. For instance, an iframe within a page has an obvious first party domain (the urlbar). What about a redirect (such as a clickthrough ad, or a federated login)? A popup window? Facebook Connect uses a combination of popup windows (for the Facebook login itself), and iframes on the first party page (to display relevant Facebook information), to deliver its experience. All these elements need to exist within the same context -- whether it be a first party or third party one -- for the service to work as intended.

What this logic really comes down to is separating actions related to the current site vs. actions that are completely unrelated. For instance, an OpenID login process is conceptually related to whatever site the user is on. Clicking a link to browse to another site, at the discretion of the user, is not.

With that, some terminology: there will be actions that reset (i.e. result in a new) first party domain, and actions that carry over the existing first party domain. We should decide on, and the browser should implement, the logic for each case. Note that this only applies to toplevel actions (changing the URL of the current page, or opening a popup or new tab) -- any actions embedded inside the page always have a very well-defined first party domain.

Proposal Details

1. Typing in the urlbar, loading bookmarks, other totally toplevel actions -- resets first party domain.
2. Link clicks (<a href> links) -- resets.
3. Setting the toplevel document.location -- resets.
4. Toplevel redirects -- resets.
5. Popup windows (window.open) -- carries over.
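The rules in this list can be written down as a lookup table. The sketch below is illustrative only; the Action names and next_first_party function are hypothetical, and the target is assumed to be already reduced to a base domain.

```python
from enum import Enum, auto


class Action(Enum):
    URLBAR_OR_BOOKMARK = auto()
    LINK_CLICK = auto()
    SET_DOCUMENT_LOCATION = auto()
    TOPLEVEL_REDIRECT = auto()
    POPUP = auto()


# True means the action resets the first party domain; False carries it over.
RESETS = {
    Action.URLBAR_OR_BOOKMARK: True,
    Action.LINK_CLICK: True,
    Action.SET_DOCUMENT_LOCATION: True,
    Action.TOPLEVEL_REDIRECT: True,
    Action.POPUP: False,
}


def next_first_party(action: Action, current_first_party: str, target_base_domain: str) -> str:
    """First party domain of the document loaded by a toplevel action."""
    return target_base_domain if RESETS[action] else current_first_party


print(next_first_party(Action.LINK_CLICK, "digg.com", "nytimes.com"))  # nytimes.com
print(next_first_party(Action.POPUP, "digg.com", "facebook.com"))      # digg.com
```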

Rationale

Again, what matters here is not how the user thinks of a particular action, but whether the action is integrally related to the current site.

We have some hard data points here, but more is always better, and will allow us to make a more informed decision on how these changes will affect the web.

1. Typing in the urlbar is clearly not something that can be considered as integral to the functioning of a particular site.
2. There are two cases when clicking on a link: a) the link is targeted at the same domain; b) it is not. For the former, what we do is irrelevant. For the latter, by and large, it means the link is not integrally related -- I strongly doubt, for instance, that any federated login processes use <a href> links pointing at their domain. Is this really true? Are there other relevant use cases?
3. The answer to this really depends on what use cases exist on the web. Someone out there undoubtedly uses document.location to implement an authentication scheme. Need more hard data here. However, I suspect that the right thing to do is consider it unrelated.
4. This case is clearer. Many services, such as bit.ly, immediately and permanently redirect to a target. Sites that redirect from original to target and back to original probably mean the two are related, and could easily be an implementation of federated login. It could also be an implementation of an auto-redirect ad. What we do here is important to get right.
The key is [TBD...]

Since it's an obvious hole, we have to track first party context through redirects. (So going to digg.com --> redirect to clickthrough ad on ads.google.com --> click back to digg.com would maintain a first party context of digg.com throughout.) If we didn't, those clickthrough ads would be first parties, and could track the user across sites.
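The hole can be made concrete by computing the first party for each hop of the example chain under both policies. This sketch is hypothetical and, for simplicity, models both the hop to ads.google.com and the hop back to digg.com as redirects.

```python
def first_parties(hops, redirects_carry_over):
    """Return the first party in effect for each hop of a toplevel navigation chain.

    hops: list of (kind, base_domain) pairs, where kind is "typed" or "redirect".
    """
    current = None
    result = []
    for kind, domain in hops:
        carry = kind == "redirect" and redirects_carry_over and current is not None
        if not carry:
            current = domain  # This hop resets the first party domain.
        result.append(current)
    return result


CHAIN = [("typed", "digg.com"), ("redirect", "ads.google.com"), ("redirect", "digg.com")]
print(first_parties(CHAIN, redirects_carry_over=True))   # ['digg.com', 'digg.com', 'digg.com']
print(first_parties(CHAIN, redirects_carry_over=False))  # ['digg.com', 'ads.google.com', 'digg.com']
```

In the second case ads.google.com briefly becomes a first party, so it reads and writes its single first party cookie store on every site that routes clickthroughs through it.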

Facebook Connect uses a JS lightbox to display the login dialog (http://wiki.developers.facebook.com/index.php/Authenticating_Users_with_Facebook_Connect). This counts as part of the page, rather than a popup window, and thus would be considered a third party load. So double-keying would work fine here. Note that the embedder can specify that they want to use a popup dialog instead, but we assume that is not the common case.

OpenID probably uses redirects in general (http://www.merchantos.com/makebeta/php/single-sign-on-with-openid-and-google-part-1/), though I'm not sure about provider specifics. If we track redirects and consider them third parties -- which would require some extra mechanics -- then this would work just fine. Tracking first party context through redirects also closes the clickthrough ad hole described above, so doing this is good all around.

Note that Opera does something interesting here: by default, they consider redirects to be "unverified transactions", which are considered third party. Link clicks are verified transactions -- first party. This is actually part of RFC2965 (http://www.faqs.org/rfcs/rfc2965.html) section 3.3.6: "A transaction is verifiable if the user, or a user-designated agent, has the option to review the request-URI prior to its use in the transaction." In Opera, with "automatic redirection" turned off, I believe this means that redirects display a page which says "this is a redirect to http://foo.com, continue?" or similar. Clicking that link then makes the transaction verified, and the cookies are first party.
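The verifiable-transaction model described above can be sketched as a small classifier. This is not Opera's code; the load kinds and the confirmed_by_user flag are hypothetical labels for the cases named in the text.

```python
# A load is "verifiable" if the user had the option to review the request-URI first.
VERIFIABLE = {"typed_url": True, "link_click": True, "redirect": False}


def cookie_party(load_kind: str, confirmed_by_user: bool = False) -> str:
    """Classify cookies set during a load as 'first' or 'third' party."""
    if VERIFIABLE[load_kind] or confirmed_by_user:
        return "first"
    return "third"


print(cookie_party("link_click"))                        # first
print(cookie_party("redirect"))                          # third
print(cookie_party("redirect", confirmed_by_user=True))  # first
```

The last call corresponds to a user clicking through the "this is a redirect, continue?" page, which turns an unverified transaction into a verified one.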

With that, I propose (where it is implied that the first party domain carries over, until reset):

1. Typing in the urlbar, loading bookmarks, other totally toplevel actions -- resets first party domain.
2. Link clicks (<a href> links) -- resets (but I'm not sure about this yet).
3. Setting document.location -- carries over first party domain. (It's hard to distinguish a user-initiated action that results in a document.location change vs. an automated change. So we have to go with carrying over here.)
4. Redirects -- carries over.
5. Popup windows -- carries over.
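To see how these rules combine across windows, here is a minimal sketch of first party tracking for toplevel loads. The BrowsingContext type and navigate function are hypothetical, targets are assumed to be base domains, and link clicks are treated as resetting per item 2 above.

```python
from dataclasses import dataclass
from typing import Optional

# Actions that reset the first party domain. All others
# ("set_location", "redirect", "popup") carry it over.
RESETTING_ACTIONS = {"urlbar", "bookmark", "link_click"}


@dataclass
class BrowsingContext:
    url_base_domain: str
    first_party: str


def navigate(ctx: Optional[BrowsingContext], action: str, target: str) -> BrowsingContext:
    """Return the context after a toplevel load of `target`.

    For "popup", `ctx` is the opener; the new window inherits its first party.
    """
    if ctx is None or action in RESETTING_ACTIONS:
        return BrowsingContext(target, target)
    return BrowsingContext(target, ctx.first_party)


tab = navigate(None, "urlbar", "huffingtonpost.com")
popup = navigate(tab, "popup", "facebook.com")  # e.g. a Facebook login popup
print(popup.first_party)                        # huffingtonpost.com
tab = navigate(tab, "link_click", "nytimes.com")
print(tab.first_party)                          # nytimes.com
```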

We might want to make link clicks carry over the first party. Rationale: a site that relies on an href click (to a third party) to perform a login operation, rather than using a redirect or document.location, needs that load to carry over the first party so that things work when the user is redirected back. The downside is that long browsing sessions in a single tab, across multiple sites, would result in all of those sites being considered third party. (This would allow behavioral tracking for the lifetime of that tab.) Having link clicks reset is probably the better tradeoff, since it is less surprising, but it leaves holes, e.g. where a site has a link targeted at ads.google.com which then redirects back to some content.

Implementation

Step 1: Make third party cookies persist for the session only, by default. (Can be disabled by a network.cookie.thirdparty.sessionOnly pref.) See bug 565475; patch up for review.
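The effect of Step 1 at shutdown can be sketched as follows. The pref name comes from the text, but the PREFS dict is a stand-in for the real pref system, and the per-cookie set_in_third_party_context flag is a hypothetical way of remembering the context a cookie was set in (Step 1 does not require the double keying of Step 2). The sketch also ignores each cookie's own expiry.

```python
from dataclasses import dataclass

# Stand-in for the pref system; the pref name is the one named in Step 1.
PREFS = {"network.cookie.thirdparty.sessionOnly": True}


@dataclass
class Cookie:
    host: str
    name: str
    value: str
    # Hypothetical flag recorded when the cookie is set.
    set_in_third_party_context: bool


def cookies_surviving_shutdown(cookies):
    """Return the cookies kept across a browser restart."""
    if not PREFS["network.cookie.thirdparty.sessionOnly"]:
        return list(cookies)
    # Third party cookies are treated as session cookies and discarded.
    return [c for c in cookies if not c.set_in_third_party_context]


jar = [
    Cookie("mail.google.com", "SID", "x", set_in_third_party_context=False),
    Cookie("doubleclick.net", "id", "y", set_in_third_party_context=True),
]
print([c.host for c in cookies_surviving_shutdown(jar)])  # ['mail.google.com']
```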

Step 2: Double-key cookies by (first party domain, setting domain). See bug 565965; patch in progress.

Step 3: Implement the first party carry-over rules described above, probably as a separate service so that localStorage and other storage mechanisms can use it.
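A separate service might look roughly like the sketch below: navigation code records the first party per toplevel window, and any storage backend partitions its data by (first party, origin). Both classes and all method names are hypothetical.

```python
class FirstPartyService:
    """Hypothetical shared service recording the first party domain per toplevel window."""

    def __init__(self):
        self._by_window = {}

    def set_first_party(self, window_id, base_domain):
        # Called by navigation code whenever an action resets or carries over
        # the first party domain.
        self._by_window[window_id] = base_domain

    def first_party_for(self, window_id):
        return self._by_window[window_id]


class PartitionedLocalStorage:
    """localStorage-like store double-keyed by (first party, origin) via the service."""

    def __init__(self, service):
        self._service = service
        self._areas = {}

    def area(self, window_id, origin):
        key = (self._service.first_party_for(window_id), origin)
        return self._areas.setdefault(key, {})


service = FirstPartyService()
storage = PartitionedLocalStorage(service)
service.set_first_party(1, "buy.com")
service.set_first_party(2, "other.com")
storage.area(1, "ads.google.com")["id"] = "abc123"
print(storage.area(2, "ads.google.com"))  # {}
```

Because cookies and storage would consult the same service, they cannot disagree about which first party a given load belongs to.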

Further Steps

Other services such as localStorage should use a set of policies consistent with the above.

Make the browser fingerprint more anonymous, by reducing the uniqueness of queryable information other than cookies. See Fingerprinting for details.