I18n:Updating Unicode version
This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.
Unicode properties
To regenerate the tables in nsUnicodePropertyData.cpp:
Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all of the files are needed; currently, we require:
- UnicodeData.txt
- Scripts.txt
- EastAsianWidth.txt
- BidiMirroring.txt
- HangulSyllableType.txt
- SpecialCasing.txt
- ReadMe.txt (to record version/date of the UCD)
- Unihan_Variants.txt (from Unihan.zip)
This list may change if we find a need for additional properties.
The Unicode data files listed above should be together in one directory.
We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.
We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
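Before running the generator, it can help to confirm that the directory matches the layout described above. The following is a minimal sketch, not a script from the Mozilla tree: the function name missing_ucd_files is hypothetical, the file list simply mirrors the one given in this document, and it assumes Unihan_Variants.txt has already been extracted from Unihan.zip into the main directory.

```python
import os

# Files expected directly in the UCD directory (list taken from this document).
REQUIRED_FILES = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",  # assumed to be already extracted from Unihan.zip
]

# Files expected in sub-directories immediately below the UCD directory.
# The UTR50 file name is revision-specific and will change with later revisions.
REQUIRED_SUBDIR_FILES = [
    os.path.join("security", "xidmodifications.txt"),
    os.path.join("vertical", "VerticalOrientation-13.txt"),
]


def missing_ucd_files(ucd_dir):
    """Return the relative paths of expected files that are absent from ucd_dir."""
    expected = REQUIRED_FILES + REQUIRED_SUBDIR_FILES
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(ucd_dir, rel))]
```

Calling missing_ucd_files("/path/to/UCD-directory") returns an empty list when everything listed above is in place; otherwise it names the files still to be downloaded.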
From intl/unicharutil/util, run the command:
perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory
(where hb-common.h is found in the gfx/harfbuzz/src directory).
This will generate (or overwrite!) the files
- nsUnicodePropertyData.cpp
- nsUnicodeScriptCodes.h
in the current directory.
Normalization
Transliteration
- Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
- Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the Mozilla source tree.
- Run perl gentransliterate.pl in intl/unicharutil/tools. This creates a new version of intl/unicharutil/tables/transliterate.properties.
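The copy step above can be sketched in Python as follows. This is an illustrative helper only, not something shipped in the tree: the function name stage_unicode_data and its parameters are hypothetical, and it deliberately stops short of running gentransliterate.pl.

```python
import os
import shutil

# Destination relative to the top of the source tree, as given in the steps above.
LATEST_RELATIVE_PATH = os.path.join(
    "intl", "unicharutil", "tools", "UnicodeData-Latest.txt")


def stage_unicode_data(downloaded_file, source_root):
    """Copy a downloaded UnicodeData.txt into the tools directory.

    Raises FileNotFoundError if source_root does not contain the expected
    tools directory, which usually means source_root is not the top of the tree.
    """
    dest = os.path.join(source_root, LATEST_RELATIVE_PATH)
    tools_dir = os.path.dirname(dest)
    if not os.path.isdir(tools_dir):
        raise FileNotFoundError("no such directory: " + tools_dir)
    shutil.copyfile(downloaded_file, dest)
    return dest
```

The returned path is the file that the Perl script is then run against from the tools directory.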
JavaScript Unicode support
To update SpiderMonkey's Unicode support:
- Move into js/src/vm/
- Run python ./make_unicode.py
- Verify that UnicodeData.txt and the derived files were correctly updated
Note that running python ./make_unicode.py FILENAME uses FILENAME as the UnicodeData.txt instead, if you ever want to generate new data without overwriting the current js/src/vm/UnicodeData.txt.
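One way to approach the verification step is to compare the old and new UnicodeData.txt by code point. The sketch below is an assumption about how such a check might look, not part of make_unicode.py; the function names are hypothetical. It relies only on the UnicodeData.txt format, in which each line is a semicolon-separated record whose first field is the code point in hexadecimal. Large ranges appear in the file only as their First and Last lines, so they are compared as those two records.

```python
def read_code_points(lines):
    """Map each code point (as an int) to its full UnicodeData.txt record."""
    records = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        code_point_hex = line.split(";", 1)[0]
        records[int(code_point_hex, 16)] = line
    return records


def diff_unicode_data(old_lines, new_lines):
    """Return (added, removed, changed) code points between two versions."""
    old = read_code_points(old_lines)
    new = read_code_points(new_lines)
    added = sorted(cp for cp in new if cp not in old)
    removed = sorted(cp for cp in old if cp not in new)
    changed = sorted(cp for cp in new if cp in old and new[cp] != old[cp])
    return added, removed, changed
```

Both arguments can be open file objects, for example a saved copy of the previous js/src/vm/UnicodeData.txt and the freshly updated one. A new Unicode version would normally be expected to add code points and remove none.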