Update:Remora Localization: Difference between revisions

From MozillaWiki
Latest revision as of 18:59, 27 January 2007

Localizing 101

Note: Expressions are converted to tags according to the standards described below (error_blah and such) so that multiple occurrences of the same expression only have to be localized once.

Note 2: I assume you have the GNU gettext tools installed, otherwise the scripts won't work and, more importantly, the wrath of the merciless babelfish[TM] will come upon you.

L10n standards

  • If the string is a "widget" as defined in the shavictionary: (labels, common navigational elements like "Home" or "Top" or "Next")
    • element_name_additional
  • If the string is prose for explanations, error messages, instructions
    • type_name_additional
  • If the string is structural text like headers, titles, breadcrumbs, etc.
    • If the string is not in a form:
      • namespace_pagename_name_additional
    • If the string is inside of a form:
      • namespace_pagename_formname_element_name_additional

Where:

  • namespace is the location in cake of the view, so if it's under /views/developers/ the namespace is "developers"
  • pagename is the name of the file, with underscores taken out
  • formname is just what you think the form should be called with "form" appended, since cake is actually naming them... (should we make this more specific?)
  • name is a unique name for the element (preferably its id)
  • element is the closest tag or attribute, so for an image's alt attribute it would be "alt". For a label it would be "label"
  • additional is anything else needed to make a string unique
  • type is a global category, like "error"
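
As a rough illustration of how the pieces above combine, here is a minimal Python sketch. The helper names (`build_tag`, `page_name`) and the example view file `add_step1.thtml` are hypothetical and only serve to show the naming rule for structural text outside a form.

```python
def page_name(filename):
    """Derive pagename from a view filename: drop the extension and the underscores."""
    stem = filename.rsplit(".", 1)[0]
    return stem.replace("_", "")


def build_tag(*parts):
    """Join the non-empty parts of a tag with underscores, lowercased."""
    return "_".join(p.strip().lower() for p in parts if p)


# Structural text outside a form: namespace_pagename_name_additional
# (hypothetical view /views/developers/add_step1.thtml)
tag = build_tag("developers", page_name("add_step1.thtml"), "header", "title")
print(tag)
```

The same `build_tag` call covers the other patterns, for example `build_tag("error", "empty", "glass")` for a type_name_additional tag.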


Static Strings (PHP and gettext)

Use PHP's gettext functions to make localized strings, for example:

echo _('error_empty_glass');

To do string replacement, use sprintf like this:

echo sprintf(_('refill_something'), $glass, $beer);

Localizers can then translate it similar to:

The waiter pours some more %2$s into your %1$s.

and PHP will put in the value of $glass for %1$s and $beer for %2$s.

Note that we use ordinal parameters (%1$s) rather than simply %s, which allows localizers to put the parameters in a different order (or drop some altogether).
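
To make the effect of ordinal parameters concrete, here is a small Python sketch. Python's own `%` formatting does not understand `%1$s`, so the hypothetical helper `sprintf_ordinal` below imitates only that one feature of PHP's sprintf; it is an illustration, not what remora runs.

```python
import re

# Matches PHP-style ordinal string placeholders such as %1$s or %2$s.
_ORDINAL = re.compile(r"%(\d+)\$s")


def sprintf_ordinal(template, *args):
    """Replace each %N$s with the N-th argument (1-based), in any order."""
    return _ORDINAL.sub(lambda m: str(args[int(m.group(1)) - 1]), template)


glass, beer = "glass", "beer"

# The localizer is free to reorder the parameters...
print(sprintf_ordinal("The waiter pours some more %2$s into your %1$s.", glass, beer))

# ...or to drop one of them altogether.
print(sprintf_ordinal("More %2$s, please!", glass, beer))
```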

Static pluralization

Gettext also supports some pluralization (for example, adding an 's' to the end of words when there is more than one). To support that, we need to add a Plural-Forms header to the .po file, like so:

"Plural-Forms: nplurals=2; plural=n != 1;\n"

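This Plural-Forms header is appropriate for English; headers for other languages are listed in the gettext manual (http://www.gnu.org/software/gettext/manual/html_node/gettext_150.html).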

After you have the plural forms header, you need to change the translation in the .po file. Which means, for example, this:

msgid "addons_display_a_previous_releases"
msgstr "View %d previous releases"

becomes:

msgid "addons_display_a_previous_releases"
msgid_plural "addons_display_a_previous_releases"
msgstr[0] "View %d previous release"
msgstr[1] "View %d previous releases"

Note that msgid and msgid_plural are identical. That's because we use placeholder tags rather than English text as message IDs.

The final change is in the code: instead of calling _(), you'll need to use ngettext():

sprintf(ngettext('addons_display_x_comments_total','addons_display_x_comments_total',$total_comments), $total_comments)

The full gettext manual (http://www.gnu.org/software/gettext/manual/) is intimidating, but it will probably answer any other questions.
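
The following Python sketch shows how the Plural-Forms rule selects one of the msgstr[n] entries. It is a hand-rolled stand-in for what gettext does internally: the `CATALOG` dictionary, `plural_index` and `ngettext_sketch` are hypothetical names, and only the English rule (plural = n != 1) is implemented.

```python
# Stand-in for a compiled catalog: msgid -> list of msgstr[0], msgstr[1], ...
CATALOG = {
    "addons_display_a_previous_releases": [
        "View %d previous release",   # msgstr[0]
        "View %d previous releases",  # msgstr[1]
    ],
}


def plural_index(n):
    """English rule from the header: nplurals=2; plural=n != 1."""
    return int(n != 1)


def ngettext_sketch(msgid, msgid_plural, n):
    """Pick the plural form for n, falling back to the raw ids if untranslated."""
    forms = CATALOG.get(msgid)
    if forms is None:
        return msgid if n == 1 else msgid_plural
    return forms[plural_index(n)]


tag = "addons_display_a_previous_releases"
for count in (0, 1, 3):
    print(ngettext_sketch(tag, tag, count) % count)
```

Because the tag is passed as both msgid and msgid_plural, an untranslated catalog shows the raw tag for every count, which makes missing translations easy to spot.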

Updating gettext files in the remora tree

These steps are most commonly executed in order.

extracting

After l10n strings have been added to the PHP code files etc., they have to be extracted into gettext's .po files. There's a shell script that goes through the application source tree (.php and .thtml files) and searches for gettext strings. The extracted strings are stored in ./messages.po.

Execute from the app dir:
./locale/extract-po.sh

merging

To bring the respective .po files of the individual locales up to date, execute from the app directory:

./locale/merge-po.sh messages.po ./locale
where messages.po is the file created by the extraction step and ./locale is the directory containing all the locales. The merge script then merges the new strings from messages.po into every *.po file underneath ./locale.

Note that translations already made will not be overwritten. New tags will be inserted and strings that aren't used anymore will be deprecated (i.e. commented out and put at the end of the file).

"compiling"

After translation, the plain-text .po files (PO = portable object) have to be compiled into binary .mo files (MO = machine object). There's a third script you can run for that:

./locale/compile-mo.sh ./locale

This will create a .mo file in the same directory as each .po file.
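
The contents of compile-mo.sh are not reproduced here, so the following Python sketch is only a guess at the equivalent logic: walk the locale directory and build one `msgfmt -o` command per .po file. `compile_commands` is a hypothetical helper; `msgfmt` and its `-o` option are part of GNU gettext.

```python
import os


def compile_commands(locale_dir):
    """Return one msgfmt argument list per .po file found under locale_dir."""
    commands = []
    for root, _dirs, files in os.walk(locale_dir):
        for name in sorted(files):
            if name.endswith(".po"):
                po_path = os.path.join(root, name)
                mo_path = po_path[:-len(".po")] + ".mo"  # same directory, .mo suffix
                commands.append(["msgfmt", "-o", mo_path, po_path])
    return commands


# Print the commands instead of running them; a missing directory yields nothing.
for command in compile_commands("./locale"):
    print(" ".join(command))
```

To actually compile, each list could be passed to `subprocess.run(command, check=True)` on a machine with the gettext tools installed.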


Dynamic Strings

We need strings from the database to be localizable as well. This includes all English content in the remora code (categories, etc.) but also the addons themselves (title, description, etc.)

Our original method was related closely to Pear::Translation2, but Lars showed us the error of our ways with the great Lars Digression of 2006. He came up with a new method that allowed referential integrity and a more stable table structure, which is detailed below:

Make the translations table a simple three-column table: a non-unique id, a locale and a string. Together the non-unique id and the locale make up the primary key. For each column in another table that needs a translation, replace that column with a partial key to the translations table's id column. Whenever you select a row needing translations from a table, you simply add a join to the translations table using the partial key and the locale.

+------------------+------------------+------+-----+---------+-------+
| Field            | Type             | Null | Key | Default | Extra | 
+------------------+------------------+------+-----+---------+-------+
| id               | int(11) unsigned |      | PRI | 0       |       |       
| locale           | varchar(10)      |      | PRI |         |       |       
| localized_string | text             |  yes |     | NULL    |       |       
+------------------+------------------+------+-----+---------+-------+

pros

  • referential integrity
  • extensible - new or changed locale just means additional rows

cons

  • more complicated SQL - enough so to affect performance? It depends on how many translations are needed in one query. I'll start to get worried at six in a single query.

example - fetching values

translations table

+----+--------+------------------+
| id | locale | localized_string |
+----+--------+------------------+
| 1  | en-us  | hello            |
| 1  | de     | Guten Tag        |
| 2  | en-us  | help             |
| 2  | de     | Hilfe            |
+----+--------+------------------+

A table

+----+----------+----------+
| id | greeting | danger   |
+----+----------+----------+
| 1  | 1        | 2        |
+----+----------+----------+

SQL for English

select a.id, g.localized_string as greeting, d.localized_string as danger
from ((a left outer join translations as g on a.greeting = g.id and g.locale = 'en-us')
         left outer join translations as d on a.danger = d.id and d.locale = 'en-us')
where
  a.id = myTargetForFetching

+----+----------+----------+
| id | greeting | danger   |
+----+----------+----------+
| 1  | hello    | help     |
+----+----------+----------+
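
The example above can be tried out with the following Python sketch. It uses an in-memory SQLite database from the standard library as a stand-in for the real MySQL database, so the column types are simplified; `fetch_a` is a hypothetical helper, and the second join is written against the danger column so that it yields the result shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
db.executescript("""
    CREATE TABLE translations (
        id               INTEGER NOT NULL,
        locale           TEXT    NOT NULL,
        localized_string TEXT,
        PRIMARY KEY (id, locale)
    );
    CREATE TABLE a (id INTEGER PRIMARY KEY, greeting INTEGER, danger INTEGER);
    INSERT INTO translations VALUES (1, 'en-us', 'hello');
    INSERT INTO translations VALUES (1, 'de', 'Guten Tag');
    INSERT INTO translations VALUES (2, 'en-us', 'help');
    INSERT INTO translations VALUES (2, 'de', 'Hilfe');
    INSERT INTO a VALUES (1, 1, 2);
""")


def fetch_a(row_id, locale):
    """Fetch one row of table a with both translated columns for a locale."""
    return db.execute(
        """
        SELECT a.id, g.localized_string, d.localized_string
        FROM a
        LEFT OUTER JOIN translations AS g
               ON a.greeting = g.id AND g.locale = :locale
        LEFT OUTER JOIN translations AS d
               ON a.danger = d.id AND d.locale = :locale
        WHERE a.id = :id
        """,
        {"id": row_id, "locale": locale},
    ).fetchone()


print(fetch_a(1, "en-us"))
print(fetch_a(1, "de"))
```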

Now there are a couple of problems with this simplified technique. First, if you have a query that requires ten localized strings, then you need to add ten joins to the query. Nesting joins that deep makes the SQL nearly unreadable. Second, if a localized string is missing, there is no simple (non-hacky) way to fetch a default value without resorting to another query.
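
For illustration, one workaround for the missing-string problem is to join the translations table a second time for a fallback locale and pick the first non-NULL value with COALESCE. This doubles the number of joins, so it makes the readability problem worse and is arguably the kind of hack the text warns about. The Python sketch below again uses SQLite as a stand-in for MySQL; `fetch_with_fallback` and the choice of 'en-us' as the fallback locale are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
db.executescript("""
    CREATE TABLE translations (
        id INTEGER NOT NULL, locale TEXT NOT NULL, localized_string TEXT,
        PRIMARY KEY (id, locale)
    );
    CREATE TABLE a (id INTEGER PRIMARY KEY, greeting INTEGER, danger INTEGER);
    INSERT INTO translations VALUES (1, 'en-us', 'hello');
    INSERT INTO translations VALUES (1, 'de', 'Guten Tag');
    INSERT INTO translations VALUES (2, 'en-us', 'help');  -- no 'de' row for id 2
    INSERT INTO a VALUES (1, 1, 2);
""")


def fetch_with_fallback(row_id, locale, fallback="en-us"):
    """Like the plain join, but each column falls back to a second locale."""
    return db.execute(
        """
        SELECT a.id,
               COALESCE(g.localized_string, g0.localized_string),
               COALESCE(d.localized_string, d0.localized_string)
        FROM a
        LEFT OUTER JOIN translations AS g  ON a.greeting = g.id  AND g.locale  = :locale
        LEFT OUTER JOIN translations AS g0 ON a.greeting = g0.id AND g0.locale = :fallback
        LEFT OUTER JOIN translations AS d  ON a.danger   = d.id  AND d.locale  = :locale
        LEFT OUTER JOIN translations AS d0 ON a.danger   = d0.id AND d0.locale = :fallback
        WHERE a.id = :id
        """,
        {"id": row_id, "locale": locale, "fallback": fallback},
    ).fetchone()


print(fetch_with_fallback(1, "de"))
```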

example - inserting new values

Let's say we need to add a new row to the A table above. We must insert the localized strings into the translations table before inserting the new row in the A table. And, since we cannot rely on the database to automatically generate a new primary key, we'll need to get a new key from a sequence first.

To get a guaranteed unique new key:

UPDATE translations_seq SET id=LAST_INSERT_ID(id+1);
SELECT LAST_INSERT_ID();

With that new key, we can insert a new translation:

insert into translations (id, locale, localized_string) values (newKey, 'en-US', 'howdy');

Assuming we've done those steps twice (once for each new localized string required by a new row in the A table), we can insert the new row:

insert into a (greeting, danger) values (newKey, secondNewKey);

To add a second language option for this new row, we need both newKey and secondNewKey. Either these have been stored programmatically or we've refetched them by querying the A table.

insert into translations (id, locale, localized_string) values (newKey, 'de', 'Gruss Gott');
insert into translations (id, locale, localized_string) values (secondNewKey, 'de', 'der Himmel fällt');
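
The whole insertion sequence can be sketched in Python as follows. SQLite has no `LAST_INSERT_ID(expr)`, so this stand-in increments and reads the sequence row inside a single transaction instead; the helper names and the English text for the second string are illustrative assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
db.executescript("""
    CREATE TABLE translations_seq (id INTEGER NOT NULL);
    INSERT INTO translations_seq VALUES (0);
    CREATE TABLE translations (
        id INTEGER NOT NULL, locale TEXT NOT NULL, localized_string TEXT,
        PRIMARY KEY (id, locale)
    );
    CREATE TABLE a (id INTEGER PRIMARY KEY, greeting INTEGER, danger INTEGER);
""")


def next_translation_id():
    """Allocate a new translation key from the one-row sequence table."""
    with db:  # the update and the read share one transaction
        db.execute("UPDATE translations_seq SET id = id + 1")
        return db.execute("SELECT id FROM translations_seq").fetchone()[0]


def add_translation(key, locale, text):
    """Insert one localized string for an already allocated key."""
    with db:
        db.execute(
            "INSERT INTO translations (id, locale, localized_string) VALUES (?, ?, ?)",
            (key, locale, text),
        )


# Allocate one key per translated column, insert the strings, then the row itself.
new_key = next_translation_id()
second_new_key = next_translation_id()
add_translation(new_key, "en-US", "howdy")
add_translation(second_new_key, "en-US", "the sky is falling")  # illustrative text
with db:
    db.execute("INSERT INTO a (greeting, danger) VALUES (?, ?)", (new_key, second_new_key))

# Adding a second language later reuses the same keys.
add_translation(new_key, "de", "Gruss Gott")
add_translation(second_new_key, "de", "der Himmel fällt")
```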

example - deleting rows

Unfortunately, the cascading deletes using referential integrity within the database do not help us much in deleting translations if their parent row is deleted. This is because cascading deletes only work to delete rows that refer to a deleted primary key. If we were to delete a row from the A table, the corresponding rows in the translations table would not disappear because they do not refer to A's primary keys. (This behavior could be repaired at the expense of complicating the new value insertion technique.)

For now, we must manually make sure that we delete the translations after we've deleted the target row in the A table. However, we must save the keys to the translations from the target row before we delete it. Once the target row is deleted, we can iterate through the saved list of keys and delete the translations.

Here's an example paraphrased from the python migration script:

# targetAddonIDForDeletion is a placeholder for the id of the addon being deleted.
# Save the translation keys first; they cannot be looked up once the addon row is gone.
listOfTranslationsToDelete = newDB.executeSql("""
    select name from addons where id = targetAddonIDForDeletion
    union
    select homepage from addons where id = targetAddonIDForDeletion
    union
    select description from addons where id = targetAddonIDForDeletion
    union
    select summary from addons where id = targetAddonIDForDeletion
    union
    select developercomments from addons where id = targetAddonIDForDeletion
    union
    select eula from addons where id = targetAddonIDForDeletion
    union
    select privacypolicy from addons where id = targetAddonIDForDeletion""")
newDB.executeSql("delete from addons where id = targetAddonIDForDeletion")
newDB.executeManySql("delete from translations where id = %s", listOfTranslationsToDelete)
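
A self-contained version of the same idea, as a rough sketch: it uses SQLite from the Python standard library instead of the migration script's own database wrapper, and a cut-down addons table with only two translated columns. `delete_addon` is a hypothetical helper, not code from the migration script.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
db.executescript("""
    CREATE TABLE translations (
        id INTEGER NOT NULL, locale TEXT NOT NULL, localized_string TEXT,
        PRIMARY KEY (id, locale)
    );
    -- cut-down addons table: only two of the translated columns
    CREATE TABLE addons (id INTEGER PRIMARY KEY, name INTEGER, description INTEGER);
    INSERT INTO translations VALUES (10, 'en-US', 'Foo');
    INSERT INTO translations VALUES (10, 'de', 'Fu');
    INSERT INTO translations VALUES (11, 'en-US', 'A thing');
    INSERT INTO translations VALUES (12, 'en-US', 'Other');
    INSERT INTO addons VALUES (1, 10, 11);
    INSERT INTO addons VALUES (2, 12, NULL);
""")


def delete_addon(addon_id):
    """Delete an addon and its translations; return the number of keys removed."""
    row = db.execute(
        "SELECT name, description FROM addons WHERE id = ?", (addon_id,)
    ).fetchone()
    if row is None:
        return 0
    # Save the translation keys before the addon row disappears.
    keys = [(key,) for key in row if key is not None]
    with db:  # both deletes commit together or not at all
        db.execute("DELETE FROM addons WHERE id = ?", (addon_id,))
        db.executemany("DELETE FROM translations WHERE id = ?", keys)
    return len(keys)


print(delete_addon(1))
print(db.execute("SELECT id, locale FROM translations").fetchall())
```

Running both deletes in one transaction means a failure part-way through does not leave orphaned translations behind.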