L10n:Locale Codes: Difference between revisions

From MozillaWiki
m (minor ISO/RFC link adjustments)
(Rewrite the page to clearly state what we have decided upon.)
We might want to support/allow generic 2/3-letter language names without an [http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html ISO 3166] region identifier.


For example, someone is working on Esperanto, which has no region. Or take our current German builds: they are actually a generic German L10n, usable in every country where German is used, not just Germany (de-DE, aviary) or Austria (de-AT, SeaMonkey). I'm sure there are other examples. Is there any reason not to allow language names like "de", "eo" et al.?




Some comments from different people about that idea and possible problems:


<dveditz> KaiRo: it's not something I'm going to spend much time on, but if you can identify what needs to change and what stands in the way I'll help work out any roadblocks


I asked gandalf whether there are issues with the build system, and he tested it on the Firefox trunk:
<gandalf> KaiRo: --enable-locale-ui=pl works ok.


bsmedberg noted there are some problems in other areas, though:
<bsmedberg> The tinderbox build system will choke, however.
<bsmedberg> Because it uses regex like [a-z]{2,3}-[A-Z]{2,3} to ship files and such.


Chase could shed more light on the problems with tools:
<KaiRo> Chase: the exact specification for what it could be is \w{2,3}(-\w{2}(-\w+)?)?; but we don't expect to get any with a third part (dialects) soon, so it's best to use \w{2,3}(-\w{2})?
<KaiRo> Chase: so what tools are relying on the region right now (off the top of your head)?
<Chase> My automation scripts, download.m.o redirect tool, mirror management. Additionally we need to know about it to properly prepare for the correct way to update clients.
<Chase> My concerns specifically concern dmo and the update client code.
<Chase> We need to think that out thoroughly and prepare for what life may be like for us in a year (since we'll still have 1.0/1.0.1/1.1 clients out there at that time that should be able to handle changes).
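To make bsmedberg's concern concrete, here is a small sketch that runs the tinderbox-style pattern he quoted and KaiRo's specification against a few codes from this page. Assumptions: Python's `re` module stands in for whatever regex engine the real scripts use, KaiRo's `{2-3}` is rewritten as `{2,3}` (Python treats `{2-3}` as literal characters, not a range), and the groups are made non-capturing, which does not change what matches. `accepted` is a hypothetical helper, not part of any Mozilla tool.

```python
import re

# Pattern bsmedberg quoted for the tinderbox scripts: a region is mandatory.
TINDERBOX_RE = re.compile(r"[a-z]{2,3}-[A-Z]{2,3}")

# KaiRo's specification with the quantifier written as {2,3};
# groups are non-capturing since we only test for a match.
SPEC_RE = re.compile(r"\w{2,3}(?:-\w{2}(?:-\w+)?)?")


def accepted(pattern: re.Pattern, code: str) -> bool:
    """Hypothetical helper: True if the whole locale code matches the pattern."""
    return pattern.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ["de", "eo", "en-US", "pt-PT", "roa-IT-vec", "de-DE-bavarian"]:
        print(f"{code:15} tinderbox={accepted(TINDERBOX_RE, code)!s:5} "
              f"spec={accepted(SPEC_RE, code)}")
```

The tinderbox pattern rejects the language-only codes ("de", "eo") and anything with a dialect part, which is exactly where it would choke. Note that `\w` is looser than the scheme itself: it also accepts digits and underscores, and does not enforce lowercase languages or uppercase regions.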


As a side note, the [http://www.mozilla-world.org/ mozilla-world.org] domain (operated by [http://www.mozilla-europe.org/ Mozilla Europe]) now shortens locale names in that way by default (after I had some talk with peterv about that matter), see their language navigation at the bottom.




After recent discussions (prompted by someone wanting to register a Venetian localization, which has no [http://www.loc.gov/standards/iso639-2/ ISO 639-2] code, and by some of our people reading the "language tag" RFC 3066), it seems we want to support the third (dialect) part of locale codes as well. That's extremely useful for languages that have no ISO 639-2 code defined: we can use a generic code (e.g. roa) and add the [http://www.ethnologue.com/codes/default.asp SIL code] as the "dialect" identifier. This way, we stay within standardized values and can properly support the languages we had problems with in the old scheme.

Revision as of 16:03, 1 April 2005

After recent discussions, MLP staff decided on using an extended scheme for locale names, following the "language tag" RFC 3066.

This means that in addition to the previous style of ab-CD locale names, we also support simple language-only names, and even extended names for dialects.

The basic form of our new locale identifiers is <language>-<region>-<dialect>, where the region and dialect parts are optional.

In principle, every language that does not differ between regions should use the ISO 639-1/639-2 (2-letter/3-letter) language code alone ("de", "eo", "pl", "cs", etc.), while every language where the region does matter should include it (a 2-letter ISO 3166 code; such locale strings look like those we have used until now: "es-ES", "es-AR", "pt-PT", "en-US"). In some rare cases, we might need a dialect as a third part (a basically free-form part of 3 to 8 letters); we can currently imagine two such cases:

1. (Real case we have at the moment): there's no ISO 639-2 code for a language whose community wants to do a localization (Venetian Firefox in our current case). In this case, we can use the generic ISO 639-2 identifier for the language family (Romance: roa) as the language code, and add an identifier for the specific language as the dialect (if one exists, we prefer the 3-letter SIL code). For Venetian, this gives "roa-IT-vec".

2. (hypothetical case): we have a real dialect, e.g. a Bavarian L10n, which would get "de-DE-bavarian" or similar.
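The <language>-<region>-<dialect> form can be illustrated with a short parser that splits a code into its parts. This is a minimal sketch, not an existing Mozilla tool: `LocaleCode` and `parse_locale` are hypothetical names, and it assumes the casing used in the examples on this page (lowercase language, uppercase region, lowercase dialect of 3 to 8 letters) and that a dialect only appears after a region, as in the summarized schemes.

```python
import re
from typing import NamedTuple, Optional


class LocaleCode(NamedTuple):
    """Parts of a locale code of the form <language>[-<region>[-<dialect>]]."""
    language: str                  # ISO 639-1 (2 letters) or ISO 639-2 (3 letters)
    region: Optional[str] = None   # ISO 3166 2-letter region code
    dialect: Optional[str] = None  # SIL code or free-form dialect name


# Assumed casing and lengths, taken from the examples on this page.
# The dialect group is nested inside the region group, so a dialect
# is only accepted after a region.
_LOCALE_RE = re.compile(r"([a-z]{2,3})(?:-([A-Z]{2})(?:-([a-z]{3,8}))?)?")


def parse_locale(code: str) -> LocaleCode:
    """Hypothetical helper: split a locale code into language, region and dialect."""
    match = _LOCALE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a supported locale code: {code!r}")
    # Unmatched optional groups come back as None.
    return LocaleCode(*match.groups())


if __name__ == "__main__":
    for code in ["de", "es-AR", "roa-IT-vec", "de-DE-bavarian"]:
        print(code, "->", parse_locale(code))
```

For example, `parse_locale("roa-IT-vec")` returns `LocaleCode(language='roa', region='IT', dialect='vec')`, while `parse_locale("de")` leaves region and dialect as `None`. A code such as "de-bavarian" is rejected under these assumptions, since it has a dialect but no region.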


To summarize, these schemes are supported:

  • ab
  • ab-CD
  • abc-CD
  • abc-CD-SIL
  • abc-CD-dialect

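where

  • ab/abc - from ISO 639-1/639-2
  • CD - from ISO 3166
  • SIL - from the SIL list
  • dialect - following the rules from RFC 3066
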
Currently, we only support these names on releases that use the new version of the "source L10n" approach, with localized files in CVS under the l10n/ partition (main CVS directory). This means it works for Firefox and Thunderbird "trunk" (1.1), but the new naming scheme is not supported for Firefox/Thunderbird 1.0.x or Mozilla suite releases. For those, only the "ab-CD" and "abc-CD" schemes are currently supported.
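The split between release lines can be expressed as a check that picks a pattern depending on whether a build uses the l10n/ source layout. This is a hypothetical sketch, not part of any Mozilla build or server tool; `locale_supported` and the `uses_l10n_partition` flag are illustrative names, and the patterns assume the same casing as the examples on this page.

```python
import re

# Hypothetical release-line check, not an existing Mozilla tool.
# Extended scheme: ab, abc, ab(c)-CD, ab(c)-CD-dialect (l10n/ builds, trunk 1.1).
EXTENDED_SCHEME = re.compile(r"[a-z]{2,3}(?:-[A-Z]{2}(?:-[a-z]{3,8})?)?")
# Legacy scheme: only ab-CD and abc-CD (Firefox/Thunderbird 1.0.x, Mozilla suite).
LEGACY_SCHEME = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def locale_supported(code: str, uses_l10n_partition: bool) -> bool:
    """Return True if the locale code fits the scheme for this kind of build."""
    scheme = EXTENDED_SCHEME if uses_l10n_partition else LEGACY_SCHEME
    return scheme.fullmatch(code) is not None
```

Under these assumptions, "roa-IT-vec" and "de" pass only for trunk builds, while "de-AT" passes for both.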

Mozilla Foundation is currently getting this new naming scheme to work with their servers. If you're interested in the progress, look at bug 288244.