Labs/Ubiquity/Parser 2/Localization Tutorial: Difference between revisions

As you read, you may find it helpful to follow along in some of the more complete language settings files included in Parser 2: [https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/modules/parser/new/en.js English], [https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/modules/parser/new/ja.js Japanese], [https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/modules/parser/new/da.js Danish].


== Writing your language settings ==
 
=== The structure of the language file ===


Each language in Parser 2 gets its own settings file. You'll need to look up the [http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes ISO 639-1 code for your language]. Here we'll use English (code <code>en</code>) as an example; the language settings file would then be called <code>en.js</code> and go in the <code>/ubiquity/modules/parser/new/</code> directory of the repository.
Now let's walk through the parameters you must set to get your language working. For reference, the language parser object is required to have these properties: <code>branching</code>, <code>anaphora</code>, and <code>roles</code>.
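
As a quick sanity check while you write your file, the sketch below tests whether a settings object carries the three required properties. It is only an illustration: the plain object literal stands in for the real language parser object, and <code>missingProperties()</code> and the <code>delimiter</code> key are hypothetical names that are not part of Parser 2.

```javascript
// Hypothetical stand-in for a language parser object. In Parser 2 the real
// object comes from the parser framework, not from a plain literal like this.
var en = {
  branching: 'right',
  anaphora: ['this', 'that', 'it'],
  roles: [{role: 'goal', delimiter: 'to'}] // 'delimiter' key is an assumption
};

// The three properties the tutorial lists as required.
var REQUIRED_PROPERTIES = ['branching', 'anaphora', 'roles'];

// Hypothetical helper: returns the names of required properties that are missing.
function missingProperties(parser) {
  return REQUIRED_PROPERTIES.filter(function (name) {
    return parser[name] === undefined;
  });
}

console.log(missingProperties(en));                  // nothing missing
console.log(missingProperties({branching: 'left'})); // anaphora and roles missing
```

An empty result means all three properties are present; anything else names what you still need to define.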


=== Identifying your branching parameter ===


   en.branching = 'right'; // or 'left'
In general, if your language has prepositions, you should use <code>.branching = 'right'</code>, and if your language has postpositions, you should use <code>.branching = 'left'</code>.
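
To make the rule of thumb concrete, here is a toy illustration of why the direction matters. It is not Parser 2's actual algorithm: <code>groupArgument()</code> is a hypothetical helper that simply attaches the words on one side of a delimiter to it, on the right for prepositions and on the left for postpositions.

```javascript
// Toy illustration only; this is NOT how Parser 2 parses input.
// Given a list of words and one delimiter, return the argument that
// belongs to that delimiter under the given branching direction.
function groupArgument(words, delimiter, branching) {
  var i = words.indexOf(delimiter);
  if (i === -1) {
    return null; // delimiter not present in the input
  }
  // A preposition precedes its argument, so the argument lies to the right.
  // A postposition follows its argument, so the argument lies to the left.
  var argumentWords = branching === 'right'
    ? words.slice(i + 1)
    : words.slice(0, i);
  return {delimiter: delimiter, argument: argumentWords.join(' ')};
}

// English uses the preposition "to": the argument is on its right.
console.log(groupArgument(['to', 'Spanish'], 'to', 'right'));
// Japanese uses the postposition "に": the argument is on its left.
console.log(groupArgument(['スペイン語', 'に'], 'に', 'left'));
```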


==== For more info ====


* See [http://en.wikipedia.org/wiki/Branching_%28linguistics%29 branching] on Wikipedia.


=== Defining your roles ===


   en.roles = [
The second required property is the inventory of semantic roles and their corresponding delimiters. Each entry has a <code>role</code> from the [[Labs/Ubiquity/Parser_2/Semantic_Roles|inventory of semantic roles]] and a corresponding delimiter. Note that this mapping can be many-to-many: each role can have multiple possible delimiters, and different roles can share delimiters. Try to cover all of the roles in the inventory.
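
The many-to-many mapping can be sketched as follows. The entries are illustrative rather than a copy of any shipped language file, the <code>delimiter</code> key name is an assumption, and the two lookup helpers are hypothetical and exist only to show the mapping in both directions.

```javascript
// Illustrative role inventory; entries and the 'delimiter' key are assumptions.
var roles = [
  {role: 'goal',     delimiter: 'to'},
  {role: 'goal',     delimiter: 'into'}, // one role, several delimiters
  {role: 'source',   delimiter: 'from'},
  {role: 'location', delimiter: 'in'},
  {role: 'format',   delimiter: 'in'}    // one delimiter, several roles
];

// Hypothetical helper: every role a given delimiter could mark.
function rolesForDelimiter(roleList, delimiter) {
  return roleList
    .filter(function (entry) { return entry.delimiter === delimiter; })
    .map(function (entry) { return entry.role; });
}

// Hypothetical helper: every delimiter that can mark a given role.
function delimitersForRole(roleList, role) {
  return roleList
    .filter(function (entry) { return entry.role === role; })
    .map(function (entry) { return entry.delimiter; });
}

console.log(rolesForDelimiter(roles, 'in'));   // location and format
console.log(delimitersForRole(roles, 'goal')); // to and into
```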


==== For more info ====


* [http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/ Writing commands with semantic roles], the original proposal
* Wikipedia entry on [http://en.wikipedia.org/wiki/Thematic_relation thematic relations]


=== Entering your anaphora ("magic words") ===


   en.anaphora = ["this", "that", "it", "selection", "him", "her", "them"];


The final required property is <code>anaphora</code>, which takes a list of "magic words". Currently the parser makes no distinction among the different [http://en.wikipedia.org/wiki/Deixis deictic] [http://en.wikipedia.org/wiki/Anaphora_%28linguistics%29 anaphora], even though they might refer to different things.
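
Because all magic words are treated alike, checking for one amounts to a flat list lookup. The sketch below assumes a simple whitespace split and a hypothetical <code>containsAnaphor()</code> helper; it does not reproduce how Parser 2 itself detects or substitutes these words.

```javascript
// The English list from the example above.
var anaphora = ["this", "that", "it", "selection", "him", "her", "them"];

// Hypothetical helper: true if any whitespace-separated word of the
// argument is one of the magic words. All magic words are equivalent here.
function containsAnaphor(argument, magicWords) {
  return argument
    .toLowerCase()
    .split(/\s+/)
    .some(function (word) { return magicWords.indexOf(word) !== -1; });
}

console.log(containsAnaphor('this', anaphora));          // true
console.log(containsAnaphor('the selection', anaphora)); // true
console.log(containsAnaphor('hello world', anaphora));   // false
```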
== Register your language ==
Before testing your new language settings file, you must register the language with the parser. There is a parser registry file at <code>ubiquity/modules/parser/new/parser_registry.json</code>. Open it and add a new line to the JSON object mapping your language code to the native name of your language or locale. For example, to add Danish (language code <code>da</code>), add the following line:
  da: "Dansk",


== Special cases ==
=== Languages with no spaces ===


If your language does not delimit arguments (or words, more generally) with spaces, you will need to write a custom <code>wordBreaker()</code> method and set <code>usespaces = false</code> and <code>joindelimiter = <nowiki>''</nowiki></code>. For an example, please take a look at the [https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/modules/parser/new/ja.js Japanese] or [https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/modules/parser/new/zh.js Chinese] settings file.
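
To give a feel for what a word breaker might do, here is a minimal standalone sketch. It assumes that the method takes the raw input string and returns it with a separator inserted around known particles. The particle list, the choice of a zero-width space (U+200B) as separator, and the exact signature are all assumptions; consult the linked Japanese and Chinese files for the real implementations.

```javascript
// Hypothetical particle list; a real language file would cover its full
// set of delimiters.
var particles = ['から', 'を', 'に'];

// Assumed separator: a zero-width space, invisible to the user.
var SEPARATOR = '\u200b';

// Sketch of a word breaker: surround every particle with the separator
// so that later steps can split the input into words.
function wordBreaker(input) {
  // Try longer particles first so a short one never splits a long one.
  var sorted = particles.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  var pattern = new RegExp('(' + sorted.join('|') + ')', 'g');
  return input.replace(pattern, SEPARATOR + '$1' + SEPARATOR);
}

// Split on the separator and drop empty pieces to see the resulting words.
var words = wordBreaker('東京から京都に')
  .split(SEPARATOR)
  .filter(function (w) { return w.length > 0; });
console.log(words);
```

A naive pattern like this will also break inside ordinary words that happen to contain a particle, which is one reason real word breakers need more care.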


=== Case marking languages ===
# "'''au''' chat" = "'''to the''' cat"


These types of ''portmanteau'ed prepositions'' can be handled through a process of argument normalization. Each language's parser can optionally define a <code>normalizeArgument()</code> method, which takes an argument and returns a list of normalized alternates. Each normalized argument is returned in the form <code><nowiki>{prefix: '', newInput: '', suffix: ''}</nowiki></code>. For example, if you feed "la table" to the French <code>normalizeArgument()</code>, it should return


   [{prefix: 'la ', newInput: 'table', suffix: ''}]
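
A minimal standalone sketch of such a method is shown below. It handles only a few French articles, and the article list and matching strategy are assumptions made for illustration; a real implementation would be attached to the language parser object and would likely cover more forms.

```javascript
// Hypothetical list of leading articles to strip; not exhaustive.
var ARTICLES = ['le ', 'la ', 'les ', "l'"];

// Sketch of normalizeArgument(): returns a list of alternates, each in the
// {prefix, newInput, suffix} form described above. An empty list means no
// normalization applies.
function normalizeArgument(input) {
  var alternates = [];
  ARTICLES.forEach(function (article) {
    // Only strip the article if something remains after it.
    if (input.indexOf(article) === 0 && input.length > article.length) {
      alternates.push({
        prefix: article,
        newInput: input.slice(article.length),
        suffix: ''
      });
    }
  });
  return alternates;
}

console.log(JSON.stringify(normalizeArgument('la table')));
// [{"prefix":"la ","newInput":"table","suffix":""}]
console.log(JSON.stringify(normalizeArgument('table')));
// []
```

Returning a list rather than a single object leaves room for inputs that can be normalized in more than one way.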


== Test your parser ==
Now you can go into your <code>about:config</code> page, change the value of "extensions.ubiquity.language" to your language code, and restart. At this point all the verbs and nountypes will remain the same as in the English version, but the parser should obey the argument structure (the word order and delimiters) of your language.
[[Image:Ubiquity_Parser_2_Playpen.png|300px]]
You can also test your parser in the Parser 2 Playpen at <code>chrome://ubiquity/content/playpen.html</code>. There's a video explaining [http://vimeo.com/5013787 how you can use the Parser Playpen].
== Conclusion ==
If you run into any trouble, feel free to ask for help on the [http://groups.google.com/group/ubiquity-i18n Ubiquity i18n listhost] or find mitcho on the Ubiquity IRC channel (mitcho @ irc.mozilla.org#ubiquity). Of course, once you're at a good stopping point, please [http://ubiquity.mozilla.com/trac/ticket/662 contribute your language file to Ubiquity].
The next logical step to getting a better Ubiquity experience in your language is to [[Labs/Ubiquity/Ubiquity_0.5_Command_Localization_Tutorial|localize commands]].