I18n:Updating Unicode version: Difference between revisions
(→Unicode properties: updated for version 8.0 of Unicode)
(Updated for version 8.0 of Unicode)
Revision as of 12:37, 15 July 2015
This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.
Unicode properties
To regenerate the tables in nsUnicodePropertyData.cpp:
Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require
- UnicodeData.txt
- Scripts.txt
- EastAsianWidth.txt
- BidiMirroring.txt
- HangulSyllableType.txt
- SpecialCasing.txt
- ReadMe.txt (to record version/date of the UCD)
- Unihan_Variants.txt (from Unihan.zip)
though this may change if we find a need for additional properties.
The Unicode data files listed above should be together in one directory.
We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.
We also require the latest data file for UTR #50 (Unicode Vertical Text Layout), currently revision 13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
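Before running the generator, you can check the directory layout described above with a small script. This is a minimal Python sketch, not part of the Mozilla tree. The helper name missing_ucd_files is hypothetical, and the file list simply mirrors the requirements above (including the revision-13 file name), so it will need updating whenever those requirements change.

```python
from pathlib import Path

# Hypothetical list mirroring the requirements above; update it when the
# generator script starts needing different files.
UCD_FILES = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
]
# Files expected in sub-directories immediately below the UCD directory.
SUBDIR_FILES = [
    Path("security") / "xidmodifications.txt",
    Path("vertical") / "VerticalOrientation-13.txt",
]


def missing_ucd_files(ucd_dir):
    """Return the relative paths (POSIX style) of required files absent under ucd_dir."""
    root = Path(ucd_dir)
    expected = [Path(name) for name in UCD_FILES] + SUBDIR_FILES
    return [p.as_posix() for p in expected if not (root / p).is_file()]
```

An empty result from missing_ucd_files("/path/to/UCD-directory") means every file is in place.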
From intl/unicharutil/util, run the command:
perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory
(where hb-common.h is found in the gfx/harfbuzz/src directory).
This will generate (or overwrite!) the files
- nsUnicodePropertyData.cpp
- nsUnicodeScriptCodes.h
in the current directory.
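You can also script this step, for example when updating several generated files at once. This is a hypothetical Python wrapper that assumes srcdir is the top of the source tree. It only builds and runs the perl command shown above and does not change what the script does. Both paths are made absolute because the script runs with intl/unicharutil/util as its working directory.

```python
import subprocess
from pathlib import Path


def property_data_command(hb_common_h, ucd_dir):
    """Build the argv for genUnicodePropertyData.pl (run from intl/unicharutil/util)."""
    return ["perl", "../tools/genUnicodePropertyData.pl", str(hb_common_h), str(ucd_dir)]


def regenerate_property_data(srcdir, ucd_dir):
    """Regenerate nsUnicodePropertyData.cpp and nsUnicodeScriptCodes.h in place.

    srcdir is assumed to be the top of the source tree; hb-common.h is taken
    from gfx/harfbuzz/src as described above.
    """
    src = Path(srcdir).resolve()
    workdir = src / "intl" / "unicharutil" / "util"
    hb_common_h = src / "gfx" / "harfbuzz" / "src" / "hb-common.h"
    cmd = property_data_command(hb_common_h, Path(ucd_dir).resolve())
    # check=True raises CalledProcessError if the perl script fails.
    subprocess.run(cmd, cwd=workdir, check=True)
```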
Casing
We require Unicode data files from http://www.unicode.org/Public/UNIDATA/
As well as UnicodeData.txt, downloaded in the previous step, we need
- SpecialCasing.txt
From intl/unicharutil/util, run the command:
perl ../tools/genSpecialCasingData.pl /path/to/UCD-directory/UnicodeData.txt /path/to/UCD-directory/SpecialCasing.txt > nsSpecialCasingData.cpp
This will generate (or overwrite!) the files
- nsSpecialCasingData.cpp
- all-lower-ref.html
- all-lower.html
- all-title-ref.html
- all-title.html
- all-upper-ref.html
- all-upper.html
in the current directory.
Then move the six *.html files to layout/reftests/text-transform.
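You could automate both casing steps with the Python sketch below. The function names are hypothetical. generate_special_casing reproduces the shell redirection by passing an open file as stdout. move_reftests assumes reftest_dir is the layout/reftests/text-transform directory of the same checkout. shutil.move replaces an existing destination file on POSIX systems, which matches overwriting the old reftests.

```python
import shutil
import subprocess
from pathlib import Path

# The six reftest files written alongside nsSpecialCasingData.cpp.
REFTEST_FILES = [
    "all-lower-ref.html", "all-lower.html",
    "all-title-ref.html", "all-title.html",
    "all-upper-ref.html", "all-upper.html",
]


def generate_special_casing(util_dir, ucd_dir):
    """Run genSpecialCasingData.pl in util_dir, sending stdout to
    nsSpecialCasingData.cpp like the shell redirection above."""
    ucd = Path(ucd_dir).resolve()
    with open(Path(util_dir) / "nsSpecialCasingData.cpp", "w") as out:
        subprocess.run(
            ["perl", "../tools/genSpecialCasingData.pl",
             str(ucd / "UnicodeData.txt"), str(ucd / "SpecialCasing.txt")],
            cwd=util_dir, stdout=out, check=True)


def move_reftests(util_dir, reftest_dir):
    """Move the generated *.html files into reftest_dir; return their new paths."""
    moved = []
    for name in REFTEST_FILES:
        dest = Path(reftest_dir) / name
        shutil.move(str(Path(util_dir) / name), str(dest))
        moved.append(dest)
    return moved
```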
Normalization
Currently, our normalization data is frozen at Unicode 3.2 to conform to RFC 3454 (Stringprep); see Bug 728180.
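Python's standard library ships the same kind of frozen tables (unicodedata.ucd_3_2_0, intended for applications such as IDNA), which makes it a convenient way to see what freezing normalization at 3.2 means in practice. The sketch below is only an illustration and is not Mozilla code. U+2150 VULGAR FRACTION ONE SEVENTH was added in Unicode 5.2, so the 3.2 tables treat it as unassigned and leave it unchanged under NFKC. The interpreter's current tables decompose it to 1⁄7.

```python
import unicodedata

# U+2150 VULGAR FRACTION ONE SEVENTH: unassigned in Unicode 3.2, added in 5.2.
ONE_SEVENTH = "\u2150"


def nfkc_both(text):
    """Return (NFKC using the frozen Unicode 3.2 tables,
    NFKC using the interpreter's current Unicode tables)."""
    return (unicodedata.ucd_3_2_0.normalize("NFKC", text),
            unicodedata.normalize("NFKC", text))
```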
JavaScript Unicode support
To update SpiderMonkey's Unicode support:
- move into js/src/vm/
- run python ./make_unicode.py
- verify that UnicodeData.txt and the derived files were correctly updated
Note that running python ./make_unicode.py FILENAME uses FILENAME as the UnicodeData.txt input instead. This is useful if you want to generate new data without overwriting the current js/src/vm/UnicodeData.txt.
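If an update script runs SpiderMonkey's table generation along with the other steps, you could wrap the command above as follows. This is a hypothetical Python sketch that assumes srcdir is the top of the source tree. An explicit input path is resolved to an absolute path first, because the script runs with js/src/vm as its working directory.

```python
import subprocess
from pathlib import Path


def make_unicode_command(unicode_data=None):
    """Build the argv for make_unicode.py.

    With no argument, the in-tree js/src/vm/UnicodeData.txt is updated;
    with a FILENAME, that file is used instead and the in-tree copy is kept.
    """
    cmd = ["python", "./make_unicode.py"]
    if unicode_data is not None:
        cmd.append(str(unicode_data))
    return cmd


def run_make_unicode(srcdir, unicode_data=None):
    """Run make_unicode.py from js/src/vm, as the steps above describe."""
    vm_dir = Path(srcdir).resolve() / "js" / "src" / "vm"
    if unicode_data is not None:
        # Resolve relative to the caller's directory before changing cwd.
        unicode_data = Path(unicode_data).resolve()
    subprocess.run(make_unicode_command(unicode_data), cwd=vm_dir, check=True)
```

Afterwards, the verification step above still applies: review the diff of UnicodeData.txt and the derived files.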