IDN Display Algorithm: Difference between revisions

Line 22: Line 22:
has not applied for inclusion, or because we do not think they have sufficiently
has not applied for inclusion, or because we do not think they have sufficiently
strong protections in place. In addition, ICANN is about to open
strong protections in place. In addition, ICANN is about to open
a [http://newgtlds.icann.org/ large number of new TLDs]. So either maintaining a whitelist is going to become
a [http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en large number of new TLDs]. So either maintaining a whitelist is going to become
burdensome, or the list will become wildly out of date and we will not
burdensome, or the list will become wildly out of date and we will not
be serving our users.
be serving our users.
Line 63: Line 63:
* Common + Inherited + Latin + Han + Bopomofo; or
* Common + Inherited + Latin + Han + Bopomofo; or
* Common + Inherited + Latin + Han + Hangul; or
* Common + Inherited + Latin + Han + Hangul; or
* Common + Inherited + Latin + any single other script except Cyrillic, Greek, or Cherokee
* Common + Inherited + Latin + any single other "Recommended" or "Aspirational" script except Cyrillic or Greek
</blockquote>
</blockquote>
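
The three combinations visible in this hunk can be sketched as a set check. This is a minimal illustration, not the Firefox implementation: the function name, the `permitted_scripts` parameter (standing in for the "Recommended" and "Aspirational" script lists), and the assumption that the caller has already computed the set of scripts in a label are all hypothetical. It covers only the list items shown here and is meant for labels already known to be mixed-script; earlier items in the list, and single-script labels, are outside its scope.

```python
# Scripts that every listed combination allows alongside the others.
ALWAYS_ALLOWED = frozenset({"Common", "Inherited", "Latin"})

# The two CJK combinations visible in this part of the list.
CJK_COMBINATIONS = (
    frozenset({"Han", "Bopomofo"}),
    frozenset({"Han", "Hangul"}),
)

# Scripts that may not be the "single other script" next to Latin.
EXCLUDED_WITH_LATIN = frozenset({"Cyrillic", "Greek"})


def matches_listed_combination(scripts, permitted_scripts):
    """Return True if `scripts` fits one of the three combinations shown above.

    `scripts` is the set of Unicode script names found in a label.
    `permitted_scripts` is a caller-supplied set standing in for the
    "Recommended" and "Aspirational" scripts (hypothetical parameter).
    """
    others = frozenset(scripts) - ALWAYS_ALLOWED
    if not others:
        # Only Common, Inherited and Latin are present.
        return True
    if any(others <= combo for combo in CJK_COMBINATIONS):
        # Han with Bopomofo, or Han with Hangul (or a subset of either).
        return True
    if len(others) == 1:
        (script,) = others
        return script in permitted_scripts and script not in EXCLUDED_WITH_LATIN
    return False
```

For example, `matches_listed_combination({"Latin", "Cyrillic"}, {"Cyrillic", "Thai"})` is rejected because Cyrillic is explicitly excluded, while `{"Latin", "Thai"}` is accepted with the same `permitted_scripts`.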


[http://www.unicode.org/reports/tr39/#Mixed_Script_Detection Unicode Technical Report 39] gives  
[http://www.unicode.org/reports/tr39/#Mixed_Script_Detection Unicode Technical Report 39] gives  
a definition for how we detect whether a string is "single script". Some Common or Inherited characters
a definition for how we detect whether a string is "single script".  
 
Some Common or Inherited characters
are only used in a small number of scripts (but more than one). Mark Davis writes:
are only used in a small number of scripts (but more than one). Mark Davis writes:
"The Unicode Consortium in U6.1 (due out soon) is adding the property [http://unicode.org/Public/6.1.0/ucd/ScriptExtensions.txt Script_Extensions],  
"The Unicode Consortium in U6.1 (due out soon) is adding the property [http://unicode.org/Public/6.1.0/ucd/ScriptExtensions.txt Script_Extensions],  
to provide data about characters which are only used in a few (but more than one) script.  
to provide data about characters which are only used in a few (but more than one) script.  
The sample code in #39 should be updated to include that, so handling such cases." We should
The sample code in #39 should be updated to include that, so handling such cases."  
take this enhancement when the data becomes available; in the mean time, Common and Inherited  
This data is now available, but not yet in the Firefox platform. In the meantime, Common and Inherited  
characters are permitted without restriction.
characters are permitted without restriction.


Additional checks, as suggested by TR #39 section 5:
We also implement additional checks, as suggested by TR #39 sections 5.3 and 5.4:


* Display as Punycode labels which use more than one numbering system
* Display as Punycode labels which use more than one numbering system
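
The numbering-system check can be sketched as follows. This is an illustrative approach rather than the code Firefox uses: it identifies each decimal digit's numbering system by the code point of that system's zero, relying on Unicode decimal digits (general category Nd) being encoded in contiguous runs of ten. The function names are hypothetical.

```python
import unicodedata


def digit_system_zeros(label):
    """Return the set of zero code points for the decimal digits in `label`."""
    zeros = set()
    for ch in label:
        if unicodedata.category(ch) == "Nd":
            # Subtracting the digit's value gives the code point of the
            # zero digit in the same numbering system.
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros


def mixes_numbering_systems(label):
    """True if the label uses decimal digits from more than one system."""
    return len(digit_system_zeros(label)) > 1
```

A label for which `mixes_numbering_systems` returns `True`, such as one combining ASCII digits with Arabic-Indic digits, would be displayed as Punycode under this rule.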