I18n:Updating Unicode version

From MozillaWiki
Latest revision as of 10:26, 28 June 2016

This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require

  • UnicodeData.txt
  • Scripts.txt
  • EastAsianWidth.txt
  • BidiMirroring.txt
  • HangulSyllableType.txt
  • SpecialCasing.txt
  • ReadMe.txt (to record version/date of the UCD)
  • Unihan_Variants.txt (from Unihan.zip)

though this may change if we find a need for additional properties.

The Unicode data files listed above should be together in one directory.

We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.

We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
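Before running the generator, it can help to confirm that the directory layout described above is complete. The following is a minimal sketch, not part of the Mozilla tools: the helper name missing_ucd_files is hypothetical, and it assumes the UTR50 file keeps its downloaded name VerticalOrientation-13.txt inside the "vertical" sub-directory.

```python
from pathlib import Path

# Files named in the list above, relative to the UCD directory.
# Assumption: the UTR50 file keeps its download name (revision 13).
REQUIRED_FILES = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
    "security/xidmodifications.txt",
    "vertical/VerticalOrientation-13.txt",
]


def missing_ucd_files(ucd_dir):
    """Return the required files that are absent from ucd_dir, in list order."""
    root = Path(ucd_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty result from missing_ucd_files("/path/to/UCD-directory") means every file listed above is in place; otherwise the returned names show what still needs to be downloaded.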

From intl/unicharutil/util, run the command:

perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory

(where hb-common.h is found in the gfx/harfbuzz/src directory).

This will generate (or overwrite!) the files

  • nsUnicodePropertyData.cpp
  • nsUnicodeScriptCodes.h

in the current directory.
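Because the generated files are written to the current directory, the working directory matters when the command is scripted. The sketch below wraps the command shown above; the function names and the src_root parameter (the top of the mozilla source tree) are assumptions made for illustration, while the paths themselves are the ones given in this section.

```python
import subprocess
from pathlib import Path


def property_data_command(hb_common_h, ucd_dir):
    """Build the argv for the generator command shown above."""
    return [
        "perl",
        "../tools/genUnicodePropertyData.pl",
        str(hb_common_h),
        str(ucd_dir),
    ]


def regenerate_property_data(src_root, ucd_dir):
    """Run the generator from intl/unicharutil/util so the output lands there."""
    src_root = Path(src_root).resolve()
    util_dir = src_root / "intl" / "unicharutil" / "util"
    hb_common_h = src_root / "gfx" / "harfbuzz" / "src" / "hb-common.h"
    cmd = property_data_command(hb_common_h, Path(ucd_dir).resolve())
    # check=True raises CalledProcessError if perl exits with a non-zero status.
    subprocess.run(cmd, cwd=util_dir, check=True)
```

Both paths are resolved to absolute form before the call, since the relative ../tools path is interpreted from intl/unicharutil/util rather than from wherever the script was started.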

Casing

We require Unicode data files from http://www.unicode.org/Public/UNIDATA/
In addition to the UnicodeData.txt downloaded in the previous step, we need

  • SpecialCasing.txt

From intl/unicharutil/util, run the command:

perl ../tools/genSpecialCasingData.pl /path/to/UCD-directory/UnicodeData.txt /path/to/UCD-directory/SpecialCasing.txt > nsSpecialCasingData.cpp

This will generate (or overwrite!) the files

  • nsSpecialCasingData.cpp
  • all-lower-ref.html
  • all-lower.html
  • all-title-ref.html
  • all-title.html
  • all-upper-ref.html
  • all-upper.html

in the current directory.

Then move the six *.html files to layout/reftests/text-transform.
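The move step can be scripted so that no reftest file is left behind. This is a minimal sketch under stated assumptions: move_reftests is a hypothetical helper, both directories are assumed to exist already, and any older copy in the destination is assumed safe to replace.

```python
import shutil
from pathlib import Path

# The six generated reftest files listed above.
REFTEST_FILES = [
    "all-lower-ref.html",
    "all-lower.html",
    "all-title-ref.html",
    "all-title.html",
    "all-upper-ref.html",
    "all-upper.html",
]


def move_reftests(util_dir, reftest_dir):
    """Move the generated reftest files; return the names that were moved."""
    src, dest = Path(util_dir), Path(reftest_dir)
    moved = []
    for name in REFTEST_FILES:
        if (src / name).is_file():
            # Remove any stale copy first so the move replaces it on every platform.
            (dest / name).unlink(missing_ok=True)
            shutil.move(str(src / name), str(dest / name))
            moved.append(name)
    return moved
```

If the returned list has fewer than six names, the generator probably did not produce every file, and its output is worth checking before committing.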

Normalization

Our normalization data is currently frozen at Unicode 3.2 to conform to RFC 3454 (Stringprep); see Bug 728180.

JavaScript Unicode support

To update SpiderMonkey's Unicode support:

  • move into js/src/vm/
  • run python ./make_unicode.py
  • verify that UnicodeData.txt, CaseFolding.txt, and the derived files were correctly updated
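One way to support the verification step is to record a checksum of every file in js/src/vm/ before running the script and compare afterwards. The sketch below is an illustration only: snapshot and changed_files are hypothetical helpers, and they show which files changed, not whether the new contents are correct.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each regular file name in directory to the SHA-256 of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(directory).iterdir()
        if p.is_file()
    }


def changed_files(before, after):
    """Return the sorted names that were added, removed, or modified."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Taking before = snapshot("js/src/vm") ahead of the run and after = snapshot("js/src/vm") once it finishes, changed_files(before, after) lists the files to review by hand.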

To generate new data without overwriting the current js/src/vm/UnicodeData.txt and js/src/vm/CaseFolding.txt, run python ./make_unicode.py FILENAME1 FILENAME2 instead. FILENAME1 is then used as the UnicodeData.txt and FILENAME2 as the CaseFolding.txt.