I18n:Updating Unicode version

Latest revision as of 10:26, 28 June 2016


This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all of the files there are needed; currently, we require

  • UnicodeData.txt
  • Scripts.txt
  • EastAsianWidth.txt
  • BidiMirroring.txt
  • HangulSyllableType.txt
  • SpecialCasing.txt
  • ReadMe.txt (to record version/date of the UCD)
  • Unihan_Variants.txt (from Unihan.zip)

though this may change if we find a need for additional properties.

The Unicode data files listed above should be together in one directory.

We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.

We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
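The directory layout described above can be checked before running the generator. The following is a minimal sketch, not part of the Mozilla tooling: the names REQUIRED_FILES and missing_ucd_files are hypothetical, and it assumes the downloaded files keep their original names (including VerticalOrientation-13.txt for revision-13).

```python
from pathlib import Path

# Files named in this document, relative to the UCD directory.
# Assumption: downloaded files keep their upstream names.
REQUIRED_FILES = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
    "security/xidmodifications.txt",
    "vertical/VerticalOrientation-13.txt",
]


def missing_ucd_files(ucd_dir, required=REQUIRED_FILES):
    """Return the required files (as relative paths) that are absent from ucd_dir."""
    root = Path(ucd_dir)
    return [name for name in required if not (root / name).is_file()]
```

Calling missing_ucd_files("/path/to/UCD-directory") returns an empty list when every file is in place; anything else names the files still to be downloaded or moved.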

From intl/unicharutil/util, run the command:

perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory

(where hb-common.h is found in the gfx/harfbuzz/src directory).

This will generate (or overwrite!) the files

  • nsUnicodePropertyData.cpp
  • nsUnicodeScriptCodes.h

in the current directory.
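For repeated updates, the invocation above could be wrapped in a small script. This is only a sketch under stated assumptions: the function names are hypothetical, srcdir is assumed to be the root of a Mozilla source checkout, and perl is assumed to be on the PATH. Paths are made absolute because the command runs with intl/unicharutil/util as its working directory.

```python
import subprocess
from pathlib import Path


def property_data_command(srcdir, ucd_dir):
    """Build the argument list for genUnicodePropertyData.pl.

    The script path is relative, so the command must be run from
    intl/unicharutil/util; the other two paths are made absolute.
    """
    hb_common = Path(srcdir).resolve() / "gfx" / "harfbuzz" / "src" / "hb-common.h"
    return [
        "perl",
        "../tools/genUnicodePropertyData.pl",
        str(hb_common),
        str(Path(ucd_dir).resolve()),
    ]


def regenerate_property_data(srcdir, ucd_dir):
    """Run the generator; this overwrites the generated files in intl/unicharutil/util."""
    workdir = Path(srcdir).resolve() / "intl" / "unicharutil" / "util"
    subprocess.run(property_data_command(srcdir, ucd_dir), cwd=workdir, check=True)
```

Keeping the command construction separate from the call that runs it makes it possible to print and inspect the command before anything is overwritten.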

Casing

We require Unicode data files from http://www.unicode.org/Public/UNIDATA/
In addition to UnicodeData.txt, downloaded in the previous step, we need

  • SpecialCasing.txt

From intl/unicharutil/util, run the command:

perl ../tools/genSpecialCasingData.pl /path/to/UCD-directory/UnicodeData.txt /path/to/UCD-directory/SpecialCasing.txt > nsSpecialCasingData.cpp

This will generate (or overwrite!) the files

  • nsSpecialCasingData.cpp
  • all-lower-ref.html
  • all-lower.html
  • all-title-ref.html
  • all-title.html
  • all-upper-ref.html
  • all-upper.html

in the current directory.

Then move the six *.html files to layout/reftests/text-transform.
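The final move can also be scripted. The sketch below is illustrative only: REFTEST_FILES and move_reftests are hypothetical names, and the two directories are passed in explicitly rather than discovered from the source tree.

```python
import shutil
from pathlib import Path

# The six reftest pages that genSpecialCasingData.pl writes, per this document.
REFTEST_FILES = [
    "all-lower-ref.html",
    "all-lower.html",
    "all-title-ref.html",
    "all-title.html",
    "all-upper-ref.html",
    "all-upper.html",
]


def move_reftests(util_dir, reftest_dir):
    """Move the six generated reftest pages into reftest_dir and return their new paths.

    Raises FileNotFoundError if one of the generated files is missing.
    """
    moved = []
    for name in REFTEST_FILES:
        target = Path(reftest_dir) / name
        shutil.move(str(Path(util_dir) / name), str(target))
        moved.append(target)
    return moved
```

For example, move_reftests("intl/unicharutil/util", "layout/reftests/text-transform") when run from the root of the source tree.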

Normalization

Currently our normalization data is frozen at Unicode 3.2 to conform to RFC 3454 (Stringprep, https://www.ietf.org/rfc/rfc3454.txt); see Bug 728180 (https://bugzilla.mozilla.org/show_bug.cgi?id=728180).

JavaScript Unicode support

To update SpiderMonkey's Unicode support:

  • change into the js/src/vm/ directory
  • run python ./make_unicode.py
  • verify that UnicodeData.txt, CaseFolding.txt, and the derived files were correctly updated

Note that running python ./make_unicode.py FILENAME1 FILENAME2 instead uses FILENAME1 as UnicodeData.txt and FILENAME2 as CaseFolding.txt. This is useful if you ever want to generate new data without overwriting the current js/src/vm/UnicodeData.txt and js/src/vm/CaseFolding.txt.
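One way to approach the verification step is to compare the old and new UnicodeData.txt and list the code points that were added. The following is a rough sketch, separate from make_unicode.py: the function names are hypothetical, and it relies only on the UnicodeData.txt format, in which fields are separated by semicolons, the first field is the code point in hexadecimal, and the second is the character name.

```python
def read_code_points(path):
    """Return {code point: name} for each line of a UnicodeData.txt-format file.

    Large ranges appear in the file only as their First/Last lines,
    so only those two endpoints are recorded for each range.
    """
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = line.split(";")
            entries[int(fields[0], 16)] = fields[1]
    return entries


def added_code_points(old_path, new_path):
    """Return the sorted code points listed in new_path but not in old_path."""
    old = read_code_points(old_path)
    new = read_code_points(new_path)
    return sorted(cp for cp in new if cp not in old)
```

An empty result after a Unicode version bump suggests that the data file was not actually replaced.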