Labs/Ubiquity/Parser 2/Localization Tutorial
Revision as of 07:47, 22 June 2009
If you are interested in command localization, please read the wiki pages on localizing commands and making commands localizable. This entry is for adding your language to Parser 2, so you can use Ubiquity with the grammar of your language.
Introduction
Ubiquity's Parser 2 was written from the ground up with internationalization as one of its greatest priorities: not just making commands localizable, but making it easy to teach Parser 2 the grammars of other languages. Key to this undertaking is an idea from the Principles and Parameters school of linguistics, that all languages' grammars are made up of the following (from Wikipedia):
- A finite set of fundamental principles that are common to all languages; e.g., that a sentence must always have a subject, even if it is not overtly pronounced.
- A finite set of parameters that determine syntactic variability amongst languages; e.g., a binary parameter that determines whether or not the subject of a sentence must be overtly pronounced.
Following this idea, we built a flexible universal parser, Parser 2, and paired it with an (often very small) set of individual language settings.
The result of this architecture is that it takes very little code to teach Parser 2 a new language. With a little bit of JavaScript and some knowledge of, and interest in, your own language, you'll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step-by-step guide and please submit your (even incomplete) language files.
Set up your environment
If you’re new to Ubiquity core development, you’ll want to first read the Ubiquity Development Tutorial to learn how to get a live copy of the Ubiquity repository using Mercurial.
As you read along, you may find it beneficial to follow along in some of the more complete language settings files included in Parser 2: English, Japanese, Danish.
The structure of the language file
Each language in Parser 2 gets its own settings file. You'll need to look up the ISO 639-1 code for your language. Here we'll use English (code en) as an example, so the language settings file would be called en.js and go in the /ubiquity/modules/parser/new/ directory of the repository.
Here is the basic template for a Ubiquity Parser 2 language file:
function makeParser() {
  var en = new Parser('en');
  ...
  return en;
};
Everything here is wrapped in a factory function called makeParser. This function initializes the new Parser object with the appropriate language code, sets a bunch of parameters (elided above) and returns it. That's it!
Now let's walk through some of the parameters you must set to get your language working. For reference, the properties the language parser object is required to have are: branching, anaphora, and roles.
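Putting the template and the three required properties together, a minimal language file might look roughly like the sketch below. Two things here are assumptions rather than part of Parser 2: the small Parser stub exists only so the snippet runs on its own (in the repository the real constructor is provided for you), and the list-of-words format for anaphora is a guess, since this section does not show it. missingProperties is a hypothetical helper, not a Parser 2 function.

```javascript
// Hypothetical stand-in for Parser 2's Parser constructor, included only
// so this sketch is self-contained. The real one lives in the repository.
function Parser(lang) {
  this.lang = lang;
}

function makeParser() {
  var en = new Parser('en');

  // English has prepositions, so arguments sit to the right of delimiters.
  en.branching = 'right';

  // ASSUMPTION: anaphora ("magic words") are given as a list of words.
  en.anaphora = ['this', 'that', 'it'];

  // A shortened role inventory, taken from the English example.
  en.roles = [
    {role: 'goal', delimiter: 'to'},
    {role: 'source', delimiter: 'from'}
  ];

  return en;
}

// Hypothetical helper: names of required properties that are still unset.
function missingProperties(parser) {
  var required = ['branching', 'anaphora', 'roles'];
  return required.filter(function (name) {
    return parser[name] === undefined;
  });
}
```

Calling missingProperties(makeParser()) on an incomplete language file is a quick way to see which of the required properties you have not filled in yet.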
Identifying your branching parameter
en.branching = 'right'; // or 'left'
One of the first things you'll have to set for your parser is the branching parameter. Ubiquity Parser 2 uses the branching parameter to decide which direction to look for an argument after finding a delimiter or "role marker" (most often, these are prepositions or postpositions). For example, in English "from" is a delimiter for the source role and its argument is on its right.
to | Mary | from | John |
So "John" is a possible argument for the source role, but "Mary" should not be. Ubiquity can figure this out because English has the property en.branching = 'right'.
In Japanese, on the other hand, the argument of a delimiter like から ("from") is found on the left of that delimiter, so the Japanese settings file uses ja.branching = 'left'.
メアリー | -から | ジョン | -に |
Mary | from | John | to |
In general, if your language has prepositions, you should use .branching = 'right' and if your language has postpositions, you can use .branching = 'left'.
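To make the effect of the branching parameter concrete, here is a small illustrative sketch. It is not Parser 2's actual algorithm: findArguments is a hypothetical function that assumes the input is already split into words and that every argument is a single word. The jaRoles list is likewise an assumption based on the glosses in the Japanese example.

```javascript
// Illustration only: pair each delimiter with the neighboring word on the
// side indicated by the branching parameter ('right' or 'left').
function findArguments(words, roles, branching) {
  var step = branching === 'right' ? 1 : -1;
  var found = [];
  words.forEach(function (word, i) {
    var neighbor = words[i + step];
    roles.forEach(function (entry) {
      if (entry.delimiter === word && neighbor !== undefined) {
        found.push({role: entry.role, argument: neighbor});
      }
    });
  });
  return found;
}

var enRoles = [
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'}
];

// ASSUMPTION: Japanese delimiters inferred from the example glosses.
var jaRoles = [
  {role: 'goal', delimiter: 'に'},
  {role: 'source', delimiter: 'から'}
];

// Right-branching: "John" (right of "from") is the source, not "Mary".
var enResult = findArguments(['to', 'Mary', 'from', 'John'], enRoles, 'right');

// Left-branching: "メアリー" (left of "から") is the source.
var jaResult = findArguments(['メアリー', 'から', 'ジョン', 'に'], jaRoles, 'left');
```

With these inputs, enResult pairs goal with "Mary" and source with "John", while jaResult pairs source with "メアリー" and goal with "ジョン".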
For more info
- see branching on Wikipedia.
Defining your roles
en.roles = [
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'},
  {role: 'position', delimiter: 'at'},
  {role: 'position', delimiter: 'on'},
  {role: 'alias', delimiter: 'as'},
  {role: 'instrument', delimiter: 'using'},
  {role: 'instrument', delimiter: 'with'}
];
The second required property is the inventory of semantic roles and their corresponding delimiters. Each entry has a role from the inventory of semantic roles and a corresponding delimiter. Note that this mapping can be many-to-many: each role can have multiple possible delimiters, and different roles can share a delimiter. Try to cover all of the roles in the inventory of semantic roles.
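One way to see the many-to-many mapping is to index the roles list in both directions. The sketch below uses a hypothetical helper, indexRoles, which is not part of Parser 2; it simply groups the entries of the English example by role and by delimiter.

```javascript
// Hypothetical helper: build lookups from role to delimiters and from
// delimiter to roles, so the many-to-many mapping is easy to inspect.
function indexRoles(roles) {
  // Object.create(null) avoids clashes with inherited keys.
  var byRole = Object.create(null);
  var byDelimiter = Object.create(null);
  roles.forEach(function (entry) {
    if (!byRole[entry.role]) byRole[entry.role] = [];
    if (!byDelimiter[entry.delimiter]) byDelimiter[entry.delimiter] = [];
    byRole[entry.role].push(entry.delimiter);
    byDelimiter[entry.delimiter].push(entry.role);
  });
  return {byRole: byRole, byDelimiter: byDelimiter};
}

var roleIndex = indexRoles([
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'},
  {role: 'position', delimiter: 'at'},
  {role: 'position', delimiter: 'on'},
  {role: 'alias', delimiter: 'as'},
  {role: 'instrument', delimiter: 'using'},
  {role: 'instrument', delimiter: 'with'}
]);

// One role, several delimiters: roleIndex.byRole.position is ['at', 'on'].
// In a language where two roles share a delimiter, the corresponding
// byDelimiter entry would list both roles.
```

Comparing the keys of byRole against the inventory of semantic roles is a simple way to spot roles your language file does not cover yet.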
For more info:
- Writing commands with semantic roles, the original proposal
- Semantic Roles in Parser 2
- Wikipedia entry on thematic relations