I18n:Updating Unicode version

Revision as of 12:22, 15 July 2015 by Smontagu (talk | contribs) (→‎Unicode properties: updated for version 8.0 of Unicode)

This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.

Unicode properties

To regenerate the tables in nsUnicodePropertyData.cpp:

Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require

  • UnicodeData.txt
  • Scripts.txt
  • EastAsianWidth.txt
  • BidiMirroring.txt
  • HangulSyllableType.txt
  • SpecialCasing.txt
  • ReadMe.txt (to record version/date of the UCD)
  • Unihan_Variants.txt (from Unihan.zip)

though this may change if we find a need for additional properties.

The Unicode data files listed above should be together in one directory.

We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.

We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
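Because the generator expects this exact layout, it can help to check the directory before running it. The following is a minimal sketch, not part of the Mozilla tools: the helper name missing_ucd_files is hypothetical, and the file names are simply those listed above (including the revision-13 name of the UTR50 file, which will change with later revisions).

```python
from pathlib import Path

# Files named in the list above, expected directly in the UCD directory.
TOP_LEVEL_FILES = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
]

# Files expected in sub-directories immediately below the UCD directory.
SUBDIR_FILES = [
    Path("security") / "xidmodifications.txt",
    Path("vertical") / "VerticalOrientation-13.txt",
]


def missing_ucd_files(ucd_dir):
    """Return the expected relative paths (POSIX style) that are absent from ucd_dir."""
    root = Path(ucd_dir)
    expected = [Path(name) for name in TOP_LEVEL_FILES] + SUBDIR_FILES
    return [p.as_posix() for p in expected if not (root / p).is_file()]
```

An empty list from missing_ucd_files("/path/to/UCD-directory") suggests the directory is laid out as described; any returned path is a file that still needs to be downloaded or moved.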

From intl/unicharutil/util, run the command:

perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory

(where hb-common.h is found in the gfx/harfbuzz/src directory).

This will generate (or overwrite!) the files

  • nsUnicodePropertyData.cpp
  • nsUnicodeScriptCodes.h

in the current directory.
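Since the script overwrites both files in place, one cautious approach is to copy the existing versions aside before regenerating. The sketch below is an illustration only: back_up_generated, generator_command, regenerate and the ".bak" suffix are hypothetical names, and the command line is the one quoted above.

```python
import shutil
import subprocess
from pathlib import Path

# The two files the generator writes into intl/unicharutil/util.
GENERATED = ["nsUnicodePropertyData.cpp", "nsUnicodeScriptCodes.h"]


def back_up_generated(util_dir, suffix=".bak"):
    """Copy each existing generated file to <name><suffix>; return the copy names."""
    copies = []
    for name in GENERATED:
        src = Path(util_dir) / name
        if src.is_file():
            dst = src.with_name(src.name + suffix)
            shutil.copy2(src, dst)
            copies.append(dst.name)
    return copies


def generator_command(hb_common_h, ucd_dir):
    """Build the argument list for the generator, relative to intl/unicharutil/util."""
    return ["perl", "../tools/genUnicodePropertyData.pl", str(hb_common_h), str(ucd_dir)]


def regenerate(util_dir, hb_common_h, ucd_dir):
    """Back up the generated files, then run the generator from util_dir."""
    back_up_generated(util_dir)
    # check=True raises CalledProcessError if perl exits with a non-zero status.
    subprocess.run(generator_command(hb_common_h, ucd_dir), cwd=util_dir, check=True)
```

Because the command runs with util_dir as its working directory, hb_common_h and ucd_dir are best given as absolute paths.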

Normalization

Transliteration

  1. Download the latest version of UnicodeData.txt from the Unicode website. The current version can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
  2. Copy this file to intl/unicharutil/tools/UnicodeData-Latest.txt in the Mozilla source tree.
  3. Run perl gentransliterate.pl in intl/unicharutil/tools. This creates a new version of intl/unicharutil/tables/transliterate.properties.
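The three steps above could be scripted roughly as follows. This is a sketch under assumptions, not an existing tool: the function names are hypothetical, and it assumes gentransliterate.pl lives in intl/unicharutil/tools, the same directory that step 2 copies the data file into.

```python
import shutil
import subprocess
import urllib.request
from pathlib import Path

UNICODE_DATA_URL = "http://www.unicode.org/Public/UNIDATA/UnicodeData.txt"


def tools_dir(src_root):
    """Directory assumed to hold gentransliterate.pl and its input file."""
    return Path(src_root) / "intl" / "unicharutil" / "tools"


def install_unicode_data(downloaded_file, src_root):
    """Step 2: copy a downloaded UnicodeData.txt to UnicodeData-Latest.txt."""
    target = tools_dir(src_root) / "UnicodeData-Latest.txt"
    shutil.copyfile(downloaded_file, target)
    return target


def update_transliteration(src_root, download_to):
    """Run steps 1 to 3 in order; needs network access and perl."""
    urllib.request.urlretrieve(UNICODE_DATA_URL, download_to)  # step 1
    install_unicode_data(download_to, src_root)  # step 2
    # Step 3: check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(["perl", "gentransliterate.pl"], cwd=tools_dir(src_root), check=True)
```

Keeping the copy step in its own function makes it possible to reuse a UnicodeData.txt that has already been downloaded for the other sections of this page.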

JavaScript Unicode support

To update SpiderMonkey's Unicode support:

  • move into js/src/vm/
  • run python ./make_unicode.py
  • verify that UnicodeData.txt and the derived files were correctly updated

Note that running python ./make_unicode.py FILENAME instead reads FILENAME in place of UnicodeData.txt, which is useful if you want to generate new data without overwriting the current js/src/vm/UnicodeData.txt.
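For the verification step, a quick sanity check of UnicodeData.txt itself can catch a truncated or mangled download. The sketch below is hypothetical and independent of make_unicode.py; it relies only on the published UnicodeData.txt format of 15 semicolon-separated fields per line, with the code point, name and general category in the first three.

```python
def parse_unicode_data(lines):
    """Map code point -> (name, general category) from UnicodeData.txt lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
        # Field 0 is the code point in hexadecimal.
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table
```

Passing an open file object for js/src/vm/UnicodeData.txt works because the function only iterates over lines; spot-checking a few characters added in the new Unicode version then shows whether the file was really updated. The derived files are probably best reviewed as a diff in version control.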