Update:Remora Localization

From MozillaWiki
Revision as of 20:21, 12 December 2006 by Wenzel (talk | contribs) (→‎example: Deutsch :))

Localizing 101

Note: Expressions are converted to tags according to the L10n standards on this page (error_blah and such) so that multiple occurrences of the same expression only have to be localized once.

Note 2: I assume you have the GNU gettext tools installed; otherwise the scripts won't work and, more importantly, the wrath of the merciless babelfish[TM] will come upon you.

L10n standards

  • If the string is a "widget" as defined in the shavictionary (labels, common navigational elements like "Home", "Top" or "Next"):
    • element_name_additional
  • If the string is prose for explanations, error messages or instructions:
    • type_name_additional
  • If the string is structural text like headers, titles, breadcrumbs, etc.:
    • If the string is not in a form:
      • namespace_pagename_name_additional
    • If the string is inside of a form:
      • namespace_pagename_formname_element_name_additional

Where:

  • namespace is the location in cake of the view, so if it's under /views/developers/ the namespace is "developers"
  • pagename is the name of the file, with underscores taken out
  • formname is just what you think the form should be called with "form" appended, since cake is actually naming them... (should we make this more specific?)
  • name is a unique name for the element (preferably its id)
  • element is the closest tag or attribute, so for an image's alt attribute it would be "alt". For a label it would be "label"
  • additional is anything else needed to make a string unique
  • type is a global category, like "error"
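
As a rough illustration of the longest pattern above (a string inside a form), the key can be assembled mechanically from its parts. The helper name l10n_form_key() and all sample values are hypothetical and not part of the Remora code; this is only a sketch of the naming rule as written here.

```php
<?php
// Hypothetical helper, not part of Remora: builds a key of the form
// namespace_pagename_formname_element_name_additional.
function l10n_form_key(
    string $namespace,
    string $pagename,
    string $formname,
    string $element,
    string $name,
    string $additional = ''
): string {
    $parts = [
        $namespace,
        str_replace('_', '', $pagename), // pagename: file name with underscores taken out
        $formname,                       // expected to already end in "form"
        $element,
        $name,
    ];
    if ($additional !== '') {
        $parts[] = $additional;          // only needed to make the key unique
    }
    return implode('_', $parts);
}

// Invented sample values: a label in an upload form on /views/developers/add_step1
echo l10n_form_key('developers', 'add_step1', 'uploadform', 'label', 'file'), "\n";
// prints "developers_addstep1_uploadform_label_file"
```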


Static Strings (PHP and gettext)

Use PHP's gettext functions to make localized strings, for example:

echo _('error_empty_glass');

To do string replacement, use sprintf like this:

echo sprintf(_('refill_something'), $glass, $beer);

Localizers can then translate it to something like:

The waiter pours some more %2$s into your %1$s.

and PHP will put in the value of $glass for %1$s and $beer for %2$s.

Note that we use ordinal parameters (%1$s) rather than simply %s, which allows localizers to use a different order of the parameters (or drop some altogether).
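
To make the point about ordinal parameters concrete, here is a small sketch that runs without a gettext catalog. The three format strings stand in for what _('refill_something') might return in different locales; the first and third are invented for illustration, and the helper name refill_message() is hypothetical.

```php
<?php
// Stand-ins for translated strings. Single quotes matter here:
// inside double quotes PHP would try to interpolate "$s".
$formats = [
    'Your %1$s is refilled with %2$s.',                 // parameters in call order
    'The waiter pours some more %2$s into your %1$s.',  // parameters swapped
    'Your %1$s is full again.',                         // second parameter dropped
];

// Hypothetical wrapper: the call site always passes both values in the same order.
function refill_message(string $format, string $glass, string $beer): string
{
    return sprintf($format, $glass, $beer);
}

foreach ($formats as $format) {
    echo refill_message($format, 'glass', 'beer'), "\n";
}
// prints:
// Your glass is refilled with beer.
// The waiter pours some more beer into your glass.
// Your glass is full again.
```

The calling code never changes; only the translated format string decides where (and whether) each value appears.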

Dynamic Strings

We need strings from the database to be localizable as well. This includes all English content in the Remora code (Categories, etc.) but also the add-ons themselves (title, description, etc.).

The translations table is the hub of our localization. It was born out of the PEAR Translation2 specs, but we decided that the PEAR class wouldn't fit our needs. The translations table has 3 primary columns, and all the additional columns are supported locales. The 3 main columns are:

  • `foreign_id` - The unique id of the row in the foreign table (that you are getting the translation for)
  • `foreign_table` - The name of the foreign table (as seen by cake!)
  • `foreign_column` - The name of the column in the foreign table that you want the translation for.

For example, if I wanted the name of the second category in French, I could run:

 SELECT `fr` FROM `translations` WHERE `foreign_id`=2 AND `foreign_table`="Addontypes" AND `foreign_column`="name";

There are two methods in the Translation model (again, pulled from PEAR's Translation2): getOne() and getPage(). getOne() retrieves a single row, and getPage() retrieves an entire "page". Technically, it retrieves all rows with a matching table name (foreign_table).

The table looks like this, for reference:

+----------------+------------------+------+-----+---------+-------+
| Field          | Type             | Null | Key | Default | Extra | 
+----------------+------------------+------+-----+---------+-------+
| foreign_id     | int(11) unsigned |      | PRI | 0       |       |       
| foreign_table  | varchar(50)      |      | PRI |         |       |       
| foreign_column | varchar(50)      |      | PRI |         |       |       
| en-US          | text             | YES  |     | NULL    |       |       
| en-GB          | text             | YES  |     | NULL    |       |       
| de             | text             | YES  |     | NULL    |       |       
...
...
| zh-CN          | text             | YES  |     | NULL    |       |
| zh-TW          | text             | YES  |     | NULL    |       |       
+----------------+------------------+------+-----+---------+-------+
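
The actual signatures of getOne() and getPage() are not shown on this page, so the following is only a sketch of the kind of queries such methods could issue against the table above. The function names and the locale list are assumptions. One detail worth noting: because the locale is a column name, it cannot be bound as a query parameter, so it is checked against a whitelist before being placed in the SQL.

```php
<?php
// Assumption: a subset of locale columns, taken from the table and the example query above.
const SUPPORTED_LOCALES = ['en-US', 'en-GB', 'de', 'fr', 'zh-CN', 'zh-TW'];

// Hypothetical guard: rejects anything that is not a known locale column.
function assert_supported_locale(string $locale): void
{
    if (!in_array($locale, SUPPORTED_LOCALES, true)) {
        throw new InvalidArgumentException("Unsupported locale: $locale");
    }
}

// Query for a single translation (what a getOne()-style method might run).
function build_get_one_query(string $locale): string
{
    assert_supported_locale($locale);
    return "SELECT `$locale` FROM `translations`"
         . " WHERE `foreign_id` = ? AND `foreign_table` = ? AND `foreign_column` = ?";
}

// Query for a whole "page": every row that belongs to one foreign table.
function build_get_page_query(string $locale): string
{
    assert_supported_locale($locale);
    return "SELECT `foreign_id`, `foreign_column`, `$locale` FROM `translations`"
         . " WHERE `foreign_table` = ?";
}

echo build_get_one_query('fr'), "\n";
echo build_get_page_query('de'), "\n";
```

The ? placeholders are meant to be bound by whatever database layer runs the query, for example to 2, "Addontypes" and "name" for the French category name above.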

Lars Digression

From the standpoint of an old school SQL guy, this technique just doesn't feel right. I want to preface this digression with the disclaimer that I do not know how your development tools work. Perhaps my discomfort with this technique stems from ignorance.

Here are a couple of reasons why the translations table schema bothers me. First, it is brittle: changing or adding a locale means that the table schema must be altered. Does the code have to be modified to handle a changing schema? Second, the database's referential integrity checking is subverted because the foreign_id column refers to more than one table. I've never seen that kind of feature supported for referential integrity in an RDBMS, which means the referential integrity must be handled externally. The foreign key relationship seems backwards to me.

Consider this alternative: make the translations table a simple three-column table with a non-unique id, a locale and a string. Together the non-unique id and the locale make up the primary key. For each column in another table that needs a translation, replace that column with a partial key to the translations table id column. Whenever you select a row needing translations from a table, you simply add a join to the translations table using the partial key and the locale.

+------------------+------------------+------+-----+---------+-------+
| Field            | Type             | Null | Key | Default | Extra | 
+------------------+------------------+------+-----+---------+-------+
| id               | int(11) unsigned |      | PRI | 0       |       |       
| locale           | varchar(10)      |      | PRI |         |       |       
| localized_string | text             | YES  |     | NULL    |       |
+------------------+------------------+------+-----+---------+-------+

pros

  • referential integrity
  • extensible - new or changed locale just means additional rows

cons

  • more complicated SQL - enough so to affect performance? It depends on how many translations are needed in one query. I'll start to get worried at six in a single query.

example

translation table

+----+--------+------------------+
| id | locale | localized_string |
+----+--------+------------------+
| 1  | en-us  | hello            |
| 1  | de     | Guten Tag        |
| 2  | en-us  | help             |
| 2  | de     | Hilfe            |
+----+--------+------------------+

a table

+----+----------+----------+
| id | greeting | danger   |
+----+----------+----------+
| 1  | 1        | 2        |
+----+----------+----------+

sql for english

select a.id, g.localized_string as greeting, d.localized_string as danger
from ((a join translation as g on a.greeting = g.id)
         join translation as d on a.danger = d.id)
where
 g.locale = 'en-us' 
 and d.locale = 'en-us';
+----+----------+----------+
| id | greeting | danger   |
+----+----------+----------+
| 1  | hello    | help     |
+----+----------+----------+

Now there are a couple of problems with this simplified technique. First, if you have a query that requires ten localized strings, then you need to add ten joins to the query. Nesting joins that deep makes the SQL nearly unreadable. Second, if a localized string is missing, there is no simple (non-hacky) way to fetch a default value without resorting to another query.
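
To show how the number of joins grows with the number of localized columns, here is a hypothetical query builder for the three-column schema. The function name build_localized_select() is invented for this sketch; table and column names are inserted into the SQL unescaped, so they are assumed to come from trusted code, never from user input.

```php
<?php
// Hypothetical helper: one join against `translation` per localized column.
// $table and $columns must be trusted identifiers (they are not escaped).
function build_localized_select(string $table, array $columns): string
{
    $select = ['t.id'];
    $joins  = [];
    $where  = [];
    foreach (array_values($columns) as $i => $column) {
        $alias    = "tr{$i}";
        $select[] = "{$alias}.localized_string AS {$column}";
        $joins[]  = "JOIN translation AS {$alias} ON t.{$column} = {$alias}.id";
        $where[]  = "{$alias}.locale = ?"; // each placeholder is bound to the same locale
    }
    return 'SELECT ' . implode(', ', $select)
         . " FROM {$table} AS t " . implode(' ', $joins)
         . ' WHERE ' . implode(' AND ', $where);
}

// The two-column example from above: two joins, two locale placeholders.
echo build_localized_select('a', ['greeting', 'danger']), "\n";
```

Generating the query keeps the call site readable, but the database still has to execute one join per localized column, so the performance concern listed under "cons" is unchanged.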

Updating locales

These steps are most commonly executed in order.

extracting

After l10n strings have been added to the code files/views, they have to be extracted. There's a shell script that goes through the controllers, models and views (.php and .thtml files) and searches for gettext strings. The extracted strings are stored in ./messages.po.

Execute from the app dir:
./locale/extract-po.sh

merging

To bring the respective .po files of the individual locales up to date, execute from the app directory:

./locale/merge-po.sh messages.po ./locale

Here messages.po is the file created by the extraction step and ./locale is the directory that contains all the locales. The merge script merges the new strings from messages.po into every *.po file underneath ./locale.

"compiling"

After translation, plain text .po files (PO = portable object) have to be compiled into binary .mo files (MO = machine object). There's a third script you can run for that:

./locale/compile-mo.sh ./locale

This will make a .mo file for every .po file, in the same directory as that .po file.