Public Suffix List/Uses: Difference between revisions

From MozillaWiki
(Added a Java library)
(Rejig page to classify uses properly)
Line 1: Line 1:
This page attempts to list all the known uses of the Public Suffix List, to help us work out what problems any replacement for it would need to solve.
{{draft}}


The PSL website has [http://publicsuffix.org/learn/ a list], on which this one is based, and this data may migrate there later.
This page attempts to list all the things people are using the Public Suffix List for. For each use, it also attempts to outline some caveats with using the PSL for that purpose.


In this document, the "registered domain" is the part of a domain consisting of the public suffix plus one additional label. ("Registered" can also be "Registrable" if the domain is not yet registered; we ignore this for linguistic convenience.)
In this document, the "registered domain" is the part of a domain consisting of the public suffix plus one additional label. ("Registered" can also be "Registrable" if the domain is not yet registered; we ignore this for linguistic convenience.)
Line 7: Line 7:
The modern PSL has two sections, the ICANN area and the PRIVATE area, delimited by structured comments. Most applications use both areas without distinction; if an application uses only one or the other, that is noted (where known).  
The modern PSL has two sections, the ICANN area and the PRIVATE area, delimited by structured comments. Most applications use both areas without distinction; if an application uses only one or the other, that is noted (where known).  


The PRIVATE area exists because some registered domain owners wish to delegate subdomains to mutually-untrusting parties, and therefore wish to have them occupy different origins, as far as web browsers are concerned. Getting added to the PSL is an effective way to accomplish this. Entries in this part of the PSL come from many pseudo-NICs such as CentralNIC (owner of e.g. eu.com and us.org), and companies such as Amazon, Google, GitHub, Heroku, Microsoft and Red Hat, who provide cloud services. They are segregated into a different part of the PSL because some applications need to distinguish between the two types.
The PRIVATE area exists because some registered domain owners wish to delegate subdomains to mutually-untrusting parties, and find that being added to the PSL gives their solution more favourable security properties. Entries in this part of the PSL come from many pseudo-NICs such as CentralNIC (owner of e.g. eu.com and us.org), and companies such as Amazon, Google, GitHub, Heroku, Microsoft and Red Hat, who provide cloud services. They are segregated into a different part of the PSL because some applications need to distinguish between the two types.


==Browsers==
==Same Origin Policy==


Many of the uses in browsers boil down to their need to distinguish which sites are controlled by the same entity and which are not, for example to implement the [https://en.wikipedia.org/wiki/Same_origin_policy Same Origin Policy], or some equivalent.
The [https://en.wikipedia.org/wiki/Same_origin_policy Same Origin Policy] is the bedrock of the browser security model. It defines which domain names trust one another and which do not. This use case was the original one for which the PSL was created.  


===Common===
===Browser Uses===
 
Browsers use these divisions for:


====Cookies====
====Cookies====


Browsers restrict the domains for which cookies can be set, to avoid "supercookies" being set for e.g. "co.uk", which would allow sites to track users across multiple domains owned by different entities.
All browsers restrict the domains for which cookies can be set, to avoid "supercookies" being set for e.g. "co.uk", which would allow sites to track users across multiple domains owned by different entities.


====document.domain====
====document.domain====


The [http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#dom-document-domain document.domain attribute] is used to enable pages on different hosts of a domain to access each others' DOMs. Browsers restrict the values to which the document.domain property can be set, to maintain the same origin policy.
The [http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#dom-document-domain document.domain attribute] is used to enable pages on different hosts of a domain to access each others' DOMs. All browsers restrict the values to which the document.domain property can be set, to maintain the same origin policy.
 
Chrome implements of a multi-process architecture involving a singular "browser" process and multiple "renderer" processes. It uses the PSL (via document.domain) to identify pages that cannot script each other, helping to determine when to create a new renderer process.
 
It does not make a distinction between private domains and ICANN-delegated domains.


====window.external.IsSearchProviderInstalled()====
====window.external.IsSearchProviderInstalled()====
Line 29: Line 35:
====URL Bar====
====URL Bar====


Both Firefox and Chrome highlight the registered domain within the UI when displaying a page address.
Firefox, Chrome and IE all highlight the registered domain within the UI when displaying a page address.


====General UI====
====General UI====


Both Firefox and Chrome make use of the PSL to order entries within their interfaces for managing cookies and local data.
Both Firefox and Chrome make use of the PSL to order entries within their interfaces for managing cookies and local data.
===Chrome===
====URL Bar====
Chrome uses a combined search and URL bar. "name-shaped" queries - such as foo.com - query the PSL to determine whether the entered text is likely a search or a domain name. A term of "com" will be treated as a search for the phrase "com", because the term does not resolve to a registered domain (as it is just a public suffix). A term for "foo.com" is treated as a navigation, because it does contain a registered domain ("foo.com")
For this purpose, PRIVATE domains are ignored, permitting navigation to domains like "appspot.com", which are listed within the private section.
====Certificates====
Chrome will reject wildcard certificates (*.foo.bar) if foo.bar is a Public Suffix.
For this purpose, PRIVATE domains are ignored, permitting certificates for domains like "*.appspot.com"


====Safe Browsing====
====Safe Browsing====
Line 53: Line 46:


For this purpose, PRIVATE domains are ignored, although this may change in the future.
For this purpose, PRIVATE domains are ignored, although this may change in the future.
====Multi-process Architecture====
Chrome implements of a multi-process architecture involving a singular "browser" process and multiple "renderer" processes. It uses the PSL to identify pages that cannot script each other, helping to determine when to create a new renderer process.
It does not make a distinction between private domains and ICANN-delegated domains.
(annevk remark: This is a direct fallout from document.domain, not sure it should count as a distinct use.)


====SDCH====
====SDCH====
Line 68: Line 53:
It does not make a distinction between private domains and ICANN-delegated domains.
It does not make a distinction between private domains and ICANN-delegated domains.


===Firefox===
====Downloads====
====Downloads====


Line 75: Line 59:
====DOM Storage Manager and Permissions====
====DOM Storage Manager and Permissions====


Firefox sets quotas in the DOM Storage Manager, and sets other site-based permissions, based on registered domain.
Firefox and IE set quotas in the DOM Storage Manager, and set other site-based permissions, based on registered domain.


====Miscellaneous====
====Miscellaneous====


* In login prompts, the displayed domain name is stripped back to the registered domain.  
* In login prompts in Firefox, the displayed domain name is stripped back to the registered domain.  
* It is possible to configure Firefox such that whether a Referer is sent can depend on whether the two sites are in the same registered domain.
* It is possible to configure Firefox such that whether a Referer is sent can depend on whether the two sites are in the same registered domain.
* Providers are distinguished from each other in the Social API via registered domain.
* Providers are distinguished from each other in the Firefox Social API via registered domain.
* IE does Compatibility View on a per-registered-domain basis.
 
===Other Uses===
 
====DMARC====
 
The [https://datatracker.ietf.org/doc/draft-kucherawy-dmarc-base/ DMARC draft RFC] uses the PSL to determine the "organizational domain". This is where the DMARC algorithm looks for DNS records relating to DMARC. (This usage should probably exclude the PRIVATE area, but the draft does not currently say that it should.)
 
==Determining Valid Domains==


===Internet Explorer===
Some browsers and applications use the PSL for determining whether a particular string is "name-shaped" - i.e. whether it is, or could be, a domain that someone could navigate to. There is advantage in being able to do this with some degree of accuracy without needing to consult the DNS.


Integrate [http://blogs.msdn.com/b/ieinternals/archive/2009/09/19/private-domain-names-and-public-suffixes-in-internet-explorer.aspx http://blogs.msdn.com/b/ieinternals/archive/2009/09/19/private-domain-names-and-public-suffixes-in-internet-explorer.aspx] somehow.
===Caveats===


==Standards==
* For a number of reasons, the PSL may say something is name-shaped when it is not actually a domain anyone can navigate to. In other words, you will get false positives. For example, until recently, the PSL had a rule "*.il", even though the * represented only about a dozen possibilities. So "foo.wibble.il" would have passed this check, but would not be a navigable domain name.


===CAB Forum Baseline Requirements===
* New gTLDs are constantly being added to the DNS as part of the ICANN process, and it takes time for them to make it into the PSL and for copies of the PSL to be updated. Using the PSL this way risks therefore making some new gTLDs unnavigable, or have a degraded user experience, for a period of time after they are registered. In other words, you will get false negatives. For example, if there is a new gTLD ".cheese", and a user attempts to navigate to "edam.cheese" in software which has an outdated PSL, the navigation will not be possible as the software will think "edam.cheese" is not "name-shaped".


The [https://cabforum.org/baseline-requirements-documents/ CAB Forum Baseline Requirements], in section 11.1.3, require that CAs, before issuing a wildcard certificate, make sure that such a certificate is not for *.public.suffix, e.g. *.co.uk. (Or, that the entity actually owns the entirety of the public suffix, which could be true for suffixes in the PRIVATE area).  
It is therefore strongly recommended that if you use the PSL for this purpose, you a) make sure it is regularly updated in all deployed software, and b) design the software to be tolerant to false positives.


===DMARC===
===Specific Uses===


The [https://datatracker.ietf.org/doc/draft-kucherawy-dmarc-base/ DMARC draft RFC] uses the PSL to determine the "organizational domain". This is where the DMARC algorithm looks for DNS records relating to DMARC. (This usage should probably exclude the PRIVATE area, but the draft does not currently say that it should.)
====Google Chrome's URL Bar====
 
Chrome uses a combined search and URL bar. "name-shaped" queries - such as foo.com - query the PSL to determine whether the entered text is likely a search or a domain name. A term of "com" will be treated as a search for the phrase "com", because the term does not resolve to a registered domain (as it is just a public suffix). A term for "foo.com" is treated as a navigation, because it does contain a registered domain ("foo.com")
 
For this purpose, PRIVATE domains are ignored, permitting navigation to domains like "appspot.com", which are listed within the private section.
 
==Determining Valid Wildcard Certificates==
 
Some standards, browsers and applications use the PSL to give guidance on whether a particular wildcard certificate should be permitted or not.
 
===Caveats===
 
There is a risk of false negatives here with the new gTLDs. For example, "amazon" is a new gTLD in the main ICANN section. Amazon, Inc. is perfectly entitled to have a "*.amazon" wildcard certificate if they want one. However, rejecting "*.<psl>" wildcards unilaterally would cause this certificate to be rejected. This is why the CAB Forum Baseline Requirements do not forbid issuance of a certificate for "*.<psl>", but instead require the CA to be particularly diligent.
 
===Specific Uses===
 
====CAB Forum Baseline Requirements====


===HTML===
The [https://cabforum.org/baseline-requirements-documents/ CAB Forum Baseline Requirements], in section 11.1.3, require that CAs, before issuing a wildcard certificate, make sure that such a certificate is not for *.public.suffix, e.g. *.co.uk. (Or, that the entity actually owns the entirety of the public suffix, which could be true for suffixes in the PRIVATE area and some new gTLDs).


As noted above, the HTML Standard references the PSL for document.domain and IsSearchProviderInstalled().
====Google Chrome====


==Other==
Chrome will reject wildcard certificates (*.foo.bar) if foo.bar is a Public Suffix.  
===Services===
* [http://www.whoismind.com/ WhoisMind] uses the PSL to get the registered domain name out of inputted URLs.
* [http://ct-watch.tom-fitzhenry.me.uk/ Certificate Transparency Watch] uses the PSL to allow querying for all SSL certificate for domains, but not TLDs. For example, http://api.ct-watch.tom-fitzhenry.me.uk/domain/mozilla.org 200s, but http://api.ct-watch.tom-fitzhenry.me.uk/domain/org 404s .


===Programming Languages and Libraries===
For this purpose, PRIVATE domains are ignored, permitting certificates for domains like "*.appspot.com"
* [http://godoc.org/code.google.com/p/go.net/publicsuffix The Go Language] uses the public suffix to determine whether or not Internet users can register domain names under the given domain.
* [http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/net/InternetDomainName.html Guava] provides an interface for Java applications to query the Public Suffix List
* [https://github.com/whois-server-list/public-suffix-list Public Suffix List API] is a Java library.
* [https://pypi.python.org/pypi/publicsuffix/ publicsuffix] is a Python library based on the PSL.
* [https://github.com/john-kurkowski/tldextract tldextract] is another Python library.
* [https://wiki.gnome.org/Projects/libsoup libsoup] contains a [https://git.gnome.org/browse/libsoup/tree/libsoup/tld-parser.py tld-parser.py] which appears to consume the PSL. It is apparently based on a tld-parser.c.

Revision as of 15:49, 30 September 2015

THIS PAGE IS A WORKING DRAFT
The page may be difficult to navigate, and some information on its subject might be incomplete and/or evolving rapidly.
If you have any questions or ideas, please add them as a new topic on the discussion page.

This page attempts to list all the things people are using the Public Suffix List for. For each use, it also attempts to outline some caveats with using the PSL for that purpose.

In this document, the "registered domain" is the part of a domain consisting of the public suffix plus one additional label. ("Registered" can also be "Registrable" if the domain is not yet registered; we ignore this for linguistic convenience.)
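The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RULES` is a hypothetical three-entry subset rather than the real list, the function names are invented here, and the PSL's wildcard and exception rules are deliberately left out.

```python
# Hypothetical, tiny rule set for illustration; the real PSL has thousands of rules.
RULES = {"com", "uk", "co.uk"}


def public_suffix(domain, rules=RULES):
    """Return the longest suffix of `domain` that appears in `rules`."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest candidate first, then drop one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    # Fallback when nothing matches: treat the last label as the suffix.
    return labels[-1]


def registered_domain(domain, rules=RULES):
    """Public suffix plus one additional label, or None if there is no extra label."""
    labels = domain.lower().rstrip(".").split(".")
    n = len(public_suffix(domain, rules).split("."))
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])


print(registered_domain("www.example.co.uk"))
print(registered_domain("co.uk"))
```

With this rule set, "www.example.co.uk" reduces to "example.co.uk", while "co.uk" on its own has no registered domain.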

The modern PSL has two sections, the ICANN area and the PRIVATE area, delimited by structured comments. Most applications use both areas without distinction; if an application uses only one or the other, that is noted (where known).
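As a rough sketch of how an application might separate the two areas, the following Python reads a PSL-formatted text and groups rules by section. The function name and the `SAMPLE` text are made up for illustration; the marker comments follow the form used in the published list file, which should be checked against the current copy before relying on it.

```python
def split_psl_sections(text):
    """Group PSL rules by the section (ICANN or PRIVATE) they appear in."""
    sections = {"ICANN": [], "PRIVATE": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ") and line.endswith(" DOMAINS==="):
            # Extract the section name between the fixed prefix and suffix.
            name = line[len("// ===BEGIN "):-len(" DOMAINS===")]
            current = name if name in sections else None
        elif line.startswith("// ===END "):
            current = None
        elif line and not line.startswith("//") and current:
            # A rule is the first whitespace-delimited token on the line.
            sections[current].append(line.split()[0])
    return sections


# Hypothetical miniature list, not real PSL content.
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// Google, Inc.
appspot.com
// ===END PRIVATE DOMAINS===
"""

print(split_psl_sections(SAMPLE))
```

An application that wants only ICANN rules can then use `sections["ICANN"]` and ignore the rest.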

The PRIVATE area exists because some registered domain owners wish to delegate subdomains to mutually-untrusting parties, and find that being added to the PSL gives their solution more favourable security properties. Entries in this part of the PSL come from many pseudo-NICs such as CentralNIC (owner of e.g. eu.com and us.org), and companies such as Amazon, Google, GitHub, Heroku, Microsoft and Red Hat, who provide cloud services. They are segregated into a different part of the PSL because some applications need to distinguish between the two types.

Same Origin Policy

The Same Origin Policy is the bedrock of the browser security model. It defines which domain names trust one another and which do not. This use case was the original one for which the PSL was created.

Browser Uses

Browsers use these divisions for:

Cookies

All browsers restrict the domains for which cookies can be set, to avoid "supercookies" being set for e.g. "co.uk", which would allow sites to track users across multiple domains owned by different entities.
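A minimal sketch of such a check, in Python, might look like the following. It is not taken from any browser; `RULES` is a hypothetical subset of the list, the function names are invented, and the real cookie rules contain further cases (for example, host-only cookies) that are omitted here.

```python
# Hypothetical subset of the PSL, for illustration only.
RULES = {"com", "uk", "co.uk"}


def is_public_suffix(domain, rules=RULES):
    """True if `domain` is exactly one of the listed public suffixes."""
    return domain.lower().strip(".") in rules


def may_set_cookie(request_host, cookie_domain, rules=RULES):
    """Decide whether `request_host` may set a cookie scoped to `cookie_domain`."""
    host = request_host.lower().rstrip(".")
    dom = cookie_domain.lower().strip(".")
    # Refuse "supercookies" scoped to a whole public suffix such as co.uk.
    if is_public_suffix(dom, rules):
        return False
    # Otherwise the host must be the cookie domain or a subdomain of it.
    return host == dom or host.endswith("." + dom)


print(may_set_cookie("www.example.co.uk", "co.uk"))
print(may_set_cookie("www.example.co.uk", "example.co.uk"))
```

The first call is refused because "co.uk" is a public suffix; the second is accepted because the host is a subdomain of "example.co.uk".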

document.domain

The document.domain attribute is used to enable pages on different hosts of a domain to access each other's DOMs. All browsers restrict the values to which the document.domain property can be set, to maintain the same origin policy.

Chrome implements a multi-process architecture involving a single "browser" process and multiple "renderer" processes. It uses the PSL (via document.domain) to identify pages that cannot script each other, helping to determine when to create a new renderer process.

It does not make a distinction between private domains and ICANN-delegated domains.

window.external.IsSearchProviderInstalled()

The IsSearchProviderInstalled() method uses the Public Suffix List.

URL Bar

Firefox, Chrome and IE all highlight the registered domain within the UI when displaying a page address.

General UI

Both Firefox and Chrome make use of the PSL to order entries within their interfaces for managing cookies and local data.

Safe Browsing

Chrome uses the PSL to restrict Safe Browsing exceptions to registered domains. That is, if a domain is believed to have hosted malware/phishing, and a user chooses to proceed, that exception is remembered at the level of a registered domain.

For this purpose, PRIVATE domains are ignored, although this may change in the future.

SDCH

Chrome implements Shared Dictionary Compression over HTTP (SDCH). It uses the PSL to determine whether or not a given dictionary may be shared between services.

It does not make a distinction between private domains and ICANN-delegated domains.

Downloads

Firefox uses the registered domain to sort entries in the Download Manager.

DOM Storage Manager and Permissions

Firefox and IE set quotas in the DOM Storage Manager, and set other site-based permissions, based on registered domain.

Miscellaneous

  • In login prompts in Firefox, the displayed domain name is stripped back to the registered domain.
  • Firefox can be configured so that whether a Referer is sent depends on whether the two sites are in the same registered domain.
  • Providers are distinguished from each other in the Firefox Social API via registered domain.
  • IE applies Compatibility View on a per-registered-domain basis.

Other Uses

DMARC

The DMARC draft RFC uses the PSL to determine the "organizational domain". This is where the DMARC algorithm looks for DNS records relating to DMARC. (This usage should probably exclude the PRIVATE area, but the draft does not currently say that it should.)
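To make the lookup location concrete, here is a small Python sketch that lists the DNS names a DMARC check would consult for a given sender domain. It is an illustration, not the draft's algorithm verbatim: `RULES` is a hypothetical subset of the list, both function names are invented, and the `_dmarc.` prefix and the fallback from the exact domain to the organizational domain come from the DMARC specification rather than from this page.

```python
# Hypothetical subset of the PSL, for illustration only.
RULES = {"com", "uk", "co.uk"}


def organizational_domain(domain, rules=RULES):
    """Public suffix plus one label, found by matching the longest listed suffix."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in rules:
            # Keep one label to the left of the matched suffix, if there is one.
            return ".".join(labels[max(i - 1, 0):])
    # No rule matched: assume the last label is the suffix.
    return ".".join(labels[-2:])


def dmarc_lookup_names(from_domain, rules=RULES):
    """DNS names to query, most specific first, without duplicates."""
    names = ["_dmarc." + from_domain.lower().rstrip(".")]
    org = "_dmarc." + organizational_domain(from_domain, rules)
    if org not in names:
        names.append(org)
    return names


print(dmarc_lookup_names("mail.news.example.co.uk"))
```

If the PRIVATE area were included in `RULES`, a sender under a hosting suffix would get a different organizational domain, which is why the choice of area matters here.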

Determining Valid Domains

Some browsers and applications use the PSL for determining whether a particular string is "name-shaped" - i.e. whether it is, or could be, a domain that someone could navigate to. There is advantage in being able to do this with some degree of accuracy without needing to consult the DNS.

Caveats

  • For a number of reasons, the PSL may say something is name-shaped when it is not actually a domain anyone can navigate to. In other words, you will get false positives. For example, until recently, the PSL had a rule "*.il", even though the * represented only about a dozen possibilities. So "foo.wibble.il" would have passed this check, but would not be a navigable domain name.
  • New gTLDs are constantly being added to the DNS as part of the ICANN process, and it takes time for them to make it into the PSL and for copies of the PSL to be updated. Using the PSL this way therefore risks making some new gTLDs unnavigable, or giving them a degraded user experience, for a period of time after they are registered. In other words, you will get false negatives. For example, if there is a new gTLD ".cheese", and a user attempts to navigate to "edam.cheese" in software which has an outdated PSL, the navigation will not be possible as the software will think "edam.cheese" is not "name-shaped".

It is therefore strongly recommended that if you use the PSL for this purpose, you a) make sure it is regularly updated in all deployed software, and b) design the software to be tolerant of false positives.
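The false-negative case can be reproduced with a short Python sketch. Everything here is hypothetical: `STALE` and `FRESH` stand in for an outdated and an updated copy of the list, ".cheese" is the made-up gTLD from the example above, and `is_name_shaped` is an invented helper that ignores wildcard rules.

```python
def is_name_shaped(text, rules):
    """True if `text` is a listed public suffix preceded by at least one label."""
    labels = text.lower().strip(".").split(".")
    # Start at 1 so that a bare public suffix does not count as name-shaped.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in rules:
            # Reject inputs with empty labels such as "foo..com".
            return all(labels)
    return False


# Hypothetical snapshots of the list before and after ".cheese" is added.
STALE = {"com", "uk", "co.uk"}
FRESH = STALE | {"cheese"}

print(is_name_shaped("edam.cheese", STALE))
print(is_name_shaped("edam.cheese", FRESH))
```

The same input is rejected by the stale snapshot and accepted by the fresh one, which is the gap that regular updates are meant to close.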

Specific Uses

Google Chrome's URL Bar

Chrome uses a combined search and URL bar. "Name-shaped" queries, such as foo.com, are checked against the PSL to determine whether the entered text is likely a search or a domain name. A term of "com" will be treated as a search for the phrase "com", because the term does not resolve to a registered domain (it is just a public suffix). A term of "foo.com" is treated as a navigation, because it does contain a registered domain ("foo.com").

For this purpose, PRIVATE domains are ignored, permitting navigation to domains like "appspot.com", which are listed within the private section.
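The behaviour described above can be approximated as follows. This Python is a sketch of the decision only, not Chrome's actual code: `classify_input` is an invented name, and `ICANN` and `PRIVATE` are hypothetical subsets of the two areas of the list.

```python
# Hypothetical subsets of the two PSL areas, for illustration only.
ICANN = {"com", "uk", "co.uk"}
PRIVATE = {"appspot.com"}


def classify_input(text, suffixes=ICANN):
    """Return "navigate" if `text` contains a registered domain, else "search"."""
    term = text.strip().lower().rstrip(".")
    if not term or " " in term:
        return "search"
    # A bare public suffix such as "com" is treated as a search phrase.
    if term in suffixes:
        return "search"
    labels = term.split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return "navigate"
    return "search"


print(classify_input("com"))
print(classify_input("foo.com"))
print(classify_input("appspot.com"))
print(classify_input("appspot.com", ICANN | PRIVATE))
```

With only the ICANN rules, "appspot.com" is an ordinary registered domain and is navigated to. If the PRIVATE rules were included, it would be a public suffix and fall back to a search.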

Determining Valid Wildcard Certificates

Some standards, browsers and applications use the PSL to give guidance on whether a particular wildcard certificate should be permitted or not.

Caveats

There is a risk of false negatives here with the new gTLDs. For example, "amazon" is a new gTLD in the main ICANN section. Amazon, Inc. is perfectly entitled to have a "*.amazon" wildcard certificate if they want one. However, rejecting all "*.<psl>" wildcards outright would cause this certificate to be rejected. This is why the CAB Forum Baseline Requirements do not forbid issuance of a certificate for "*.<psl>", but instead require the CA to be particularly diligent.

Specific Uses

CAB Forum Baseline Requirements

The CAB Forum Baseline Requirements, in section 11.1.3, require that CAs, before issuing a wildcard certificate, make sure that such a certificate is not for *.public.suffix, e.g. *.co.uk (or that the entity actually owns the entirety of the public suffix, which could be true for suffixes in the PRIVATE area and some new gTLDs).

Google Chrome

Chrome will reject wildcard certificates (*.foo.bar) if foo.bar is a Public Suffix.

For this purpose, PRIVATE domains are ignored, permitting certificates for domains like "*.appspot.com".
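A check of this kind could be sketched in Python as below. It is not Chrome's implementation: `wildcard_allowed` is an invented name, and `ICANN` and `PRIVATE` are hypothetical subsets of the two areas of the list.

```python
# Hypothetical subsets of the two PSL areas, for illustration only.
ICANN = {"com", "uk", "co.uk"}
PRIVATE = {"appspot.com"}


def wildcard_allowed(pattern, suffixes=ICANN):
    """Reject a wildcard whose base (the part after "*.") is a public suffix."""
    if not pattern.startswith("*."):
        return True  # not a wildcard name, so this check does not apply
    base = pattern[2:].lower().rstrip(".")
    return base not in suffixes


print(wildcard_allowed("*.co.uk"))
print(wildcard_allowed("*.example.co.uk"))
print(wildcard_allowed("*.appspot.com"))
```

Passing `ICANN | PRIVATE` as `suffixes` would reject "*.appspot.com" as well, which shows why the choice of area matters for this use.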