Labs/Ubiquity/Ubiquity 0.1 Localization Tutorial: Difference between revisions

(Replaced content with 'This page is obsolete and has been removed. Localization is now being done in parser 2. Please go to the [Labs/Ubiquity/Parser_2/Localization_Tutorial Parser 2 Localization...')
(See the evolution of the parser plugin idea in this blog post and comment thread: [http://jonoscript.wordpress.com/2008/10/01/%E3%83%90%E3%83%93%E3%83%AB%E3%81%AE%E5%A1%94/].)
This page is obsolete and has been removed. Localization is now being done in parser 2. Please go to the [Labs/Ubiquity/Parser_2/Localization_Tutorial Parser 2 Localization Tutorial].
 
= Localizing the Parser =
 
(I'm assuming you already have a checkout of the Ubiquity source code.  If not, you can find out how to get it on the [https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial Development Tutorial] page.)
 
Existing parser plugins are located at ubiquity/chrome/content/nlparser/en/parser_plugin.js  (for English) and ubiquity/chrome/content/nlparser/jp/parser_plugin.js  (for Japanese).  Please take a look at the source code for these and use it for reference.
 
# Go to ubiquity/chrome/content/nlparser.
# Create a directory named for the two-letter language code for your language.
# In that directory, create a file called parser_plugin.js.
 
For example, if we're making a parser plugin for Japanese:
<pre>
cd ubiquity/chrome/content/nlparser
mkdir jp
touch jp/parser_plugin.js
hg add jp/parser_plugin.js
</pre>
 
In that file, create a namespace object.  It can hold whatever other data or functions you want, but it must have the following two attributes:
 
# .PRONOUNS, a list of strings.  Any string in this list will be treated as a "magic word" which can refer to the user's selection or to the output of a previous command, the way that "it" and "this" work in English Ubiquity.
# .parseSentence, a function that takes (inputString, nounList, verbList, selObj) and returns a list of NLParser.PartiallyParsedSentence objects.
 
So for example:
 
<pre>
var JpParser = {};
JpParser.PRONOUNS = ["これ", "それ", "あれ"];
JpParser.parseSentence = function(inputString, nounList, verbList, selObj) {
  var parsingList = [];  // will hold NLParser.PartiallyParsedSentence objects
  //...
  return parsingList;
};
</pre>
 
NOTE:  If you have non-ASCII characters in your JavaScript file, make sure to save it as UTF-8, and not as any other encoding.
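Before going further, it can help to confirm that your namespace object has the two required attributes. Here is a minimal sketch of such a check. Note that checkParserPlugin is a hypothetical helper I made up for illustration, not part of Ubiquity, and the parseSentence body is only a placeholder.

```javascript
// Hypothetical helper, not part of Ubiquity: returns a list of problems
// found in a parser plugin object (an empty list means the shape is right).
function checkParserPlugin(plugin) {
  var problems = [];
  var pronounsOk = Array.isArray(plugin.PRONOUNS) &&
    plugin.PRONOUNS.every(function (p) { return typeof p === "string"; });
  if (!pronounsOk) {
    problems.push("PRONOUNS must be a list of strings");
  }
  if (typeof plugin.parseSentence !== "function") {
    problems.push("parseSentence must be a function");
  }
  return problems;
}

var JpParser = {};
JpParser.PRONOUNS = ["これ", "それ", "あれ"];
JpParser.parseSentence = function (inputString, nounList, verbList, selObj) {
  return [];  // placeholder; a real plugin returns PartiallyParsedSentence objects
};

var problemsFound = checkParserPlugin(JpParser);
```

For a well-formed plugin the returned list is empty; an object missing both attributes produces two messages.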
 
 
== Writing Your parseSentence() Function ==
 
The arguments that will be passed into your parseSentence method are as follows:
 
'''inputString''':  The user's literal input string -- exactly what they've typed into the Ubiquity input box.
 
'''nounList''': The list of all NounTypes that the parser knows about.  Each one is a NounType object with a suggest() method.
 
'''verbList''': The list of all Verbs (commands) that the parser knows about.  Each one is an NLParser.Verb object (as defined in chrome/content/nlparser/verbtypes.js).
 
'''selObj''': An object representing the user's selection at the time that Ubiquity was invoked.  It has .text and .html attributes (which are often the same -- they'll be different only if the user's selection is a formatted chunk of a web page), like so:
 
<pre>
  selObj = {
    text: "The text of the user's selection",
    html: "<p>The <b>html</b> of the <i>user's</i> selection</p>"
  };
</pre>
 
=== Creating PartiallyParsedSentence Objects ===
 
Since most strings of input can be parsed in more than one way, your parseSentence() function needs to return a ''list'' of PartiallyParsedSentence objects -- one for each possible parsing of the input.  The constructor for PartiallyParsedSentence is like this:
 
<pre>
  new NLParser.PartiallyParsedSentence(verb, argStrings, selObj, matchScore);
</pre>
 
Where:
 
'''verb''' is just the verb object that represents the main verb for this sentence (i.e. it's one of the items from '''verbList''').
 
'''argStrings''' is a dictionary-type object where each key is the name of an argument, and the value is the substring of the input that is being assigned to that argument.
 
'''selObj''' is the same selection object that is passed in to parseSentence().
 
'''matchScore''' is a number from 0 to 1 representing how good a match the verb is to the input.
 
Verb objects have a .match() method which takes a string and returns a matchScore number.  If a verb's match() method returns 0, that verb doesn't match the given input at all, and no PartiallyParsedSentences should be made using it.
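To make the match-score rule concrete, here is a rough sketch that builds one PartiallyParsedSentence per matching verb and skips verbs that score 0. Everything in it is a stand-in so that the example runs on its own: the NLParser object below is a stub, makeFakeVerb and sentencesForVerbs are hypothetical helpers, and the prefix-based scoring is my assumption, not necessarily how the real NLParser.Verb computes its score.

```javascript
// Stand-in for the real Ubiquity object, for illustration only.
var NLParser = {
  PartiallyParsedSentence: function (verb, argStrings, selObj, matchScore) {
    this.verb = verb;
    this.argStrings = argStrings;
    this.selObj = selObj;
    this.matchScore = matchScore;
  }
};

// Hypothetical verb whose match() scores a prefix match by how much of
// the verb name was typed. The real NLParser.Verb scoring may differ.
function makeFakeVerb(name) {
  return {
    name: name,
    match: function (verbString) {
      if (verbString.length === 0 || name.indexOf(verbString) !== 0) {
        return 0;
      }
      return verbString.length / name.length;
    }
  };
}

// Hypothetical helper: one sentence per verb with a non-zero score.
function sentencesForVerbs(verbString, verbList, selObj) {
  var sentences = [];
  verbList.forEach(function (verb) {
    var matchScore = verb.match(verbString);
    if (matchScore === 0) {
      return;  // no match: make no sentence for this verb
    }
    sentences.push(
      new NLParser.PartiallyParsedSentence(verb, {}, selObj, matchScore));
  });
  return sentences;
}

var selObj = { text: "", html: "" };
var verbs = [makeFakeVerb("translate"), makeFakeVerb("email")];
var result = sentencesForVerbs("trans", verbs, selObj);
```

With the input "trans", only the fake translate verb produces a sentence; the email verb scores 0 and is skipped.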
 
=== A Logical Framework for parseSentence() ===
 
So here's a skeletal version of what your parseSentence() function will probably look like:
 
<pre>
JpParser.parseSentence = function( inputString, nounList, verbList, selObj ) {
  let parsings = [];
  let verbString = "";
  /* Do language-specific string processing on inputString to decide what
  * substring of the input is the verb, and assign that to verbString */
 
  for each ( let verb in verbList ) {
    let matchScore = verb.match( verbString );
    if (matchScore == 0)
      continue;  // verb won't match; skip it.
   
    let argStrings = {};
    /* Do some more language-specific string processing on inputString to
    * decide what substring of the input goes with what verb argument, and
    * fill in argStrings' keys and values based on that. */
    parsings.push( new NLParser.PartiallyParsedSentence( verb,
                                                        argStrings,
                                                        selObj,
                                                        matchScore ) );
  }
  return parsings;
}
</pre>
 
Of course, "language-specific string processing", the part I've hand-waved with a comment above, is where all the difficulty happens.
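As one very simple illustration of that string processing, here is a sketch for a language that puts the verb first and separates words with spaces, as English does. splitVerbFirst is a hypothetical helper of my own, not something from the Ubiquity source. A verb-final language such as Japanese would need a different strategy.

```javascript
// Hypothetical helper: treats the first whitespace-delimited word as the
// verb and everything after it as the not-yet-parsed argument text.
function splitVerbFirst(inputString) {
  var trimmed = inputString.trim();
  var firstSpace = trimmed.search(/\s/);
  if (firstSpace === -1) {
    // Only one word was typed, so there is no argument text yet.
    return { verbString: trimmed, rest: "" };
  }
  return {
    verbString: trimmed.slice(0, firstSpace),
    rest: trimmed.slice(firstSpace).trim()
  };
}

var parts = splitVerbFirst("translate hello from english");
```

Here parts.verbString is the string you would pass to each verb's match() method, and parts.rest is what remains to be divided among the verb's arguments.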
 
=== Making an argStrings Dictionary ===
 
The most difficult part of the language-specific string processing is generating the argStrings dictionary, so I should explain that a little more.
 
The keys of the argStrings dictionary must correspond to the keys of the verb's '''_arguments''' attribute.  So for instance, if we had the English version of the translateVerb object and said:
 
<pre>
  for (let argName in translateVerb._arguments)
    //...
</pre>
 
we would get the following argNames, in no particular order:
* "to"
* "from"
* "direct_object"
 
These argument names are the keys that the argStrings object should provide values for.  So for instance, if we had the following input to the English parser:
 
<pre>
  translate buenas tardes from spanish to japanese
</pre>

Then the verbString would be "translate", the matching verb object would be the translate verb, and the argStrings dictionary we create would be like this:

<pre>
  argStrings = {
    direct_object: "buenas tardes",
    from: "spanish",
    to: "japanese"
  };
</pre>

If the inputString provides nothing for one of the arguments, it's OK to leave that argument out of your argStrings dictionary.  So if the input had just been

<pre>
  translate buenas tardes from spanish
</pre>

it would be OK to create an argStrings object that's just:

<pre>
  argStrings = {
    direct_object: "buenas tardes",
    from: "spanish"
  };
</pre>
 
Finally, if the sentence is ambiguous, so that for a certain verb there is more than one way to assign substrings to arguments, you should create an argStrings dictionary for ''each'' possible way of assigning the substrings, create a PartiallyParsedSentence out of ''each'' of those argStrings dictionaries, and return all of them.
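Here is one rough way to enumerate every possible assignment for an English-like language, where a word such as "to" may be either a preposition that starts an argument or ordinary text. This is only a sketch under my own assumptions: PREPOSITIONS, allArgStrings and the helper functions are hypothetical names, and the rule that each argument is filled at most once is a simplification I chose for the example.

```javascript
// Hypothetical table: preposition word -> name of the argument it introduces.
var PREPOSITIONS = { from: "from", to: "to" };

// Shallow copy; the word lists inside are never mutated, only replaced.
function copyArgs(args) {
  var result = {};
  for (var name in args) {
    result[name] = args[name];
  }
  return result;
}

// Turns lists of words into strings; returns null if any argument is empty.
function finishArgs(args) {
  var argStrings = {};
  for (var name in args) {
    if (args[name].length === 0) {
      return null;
    }
    argStrings[name] = args[name].join(" ");
  }
  return argStrings;
}

// Returns one argStrings dictionary per way of reading the argument text.
function allArgStrings(rest) {
  var words = rest.split(/\s+/).filter(function (w) { return w.length > 0; });
  var results = [];

  function walk(index, currentArg, args) {
    if (index === words.length) {
      var finished = finishArgs(args);
      if (finished !== null) {
        results.push(finished);
      }
      return;
    }
    var word = words[index];
    var isPreposition =
      Object.prototype.hasOwnProperty.call(PREPOSITIONS, word);
    if (isPreposition && !(PREPOSITIONS[word] in args) &&
        index + 1 < words.length) {
      // Reading 1: the word is a preposition that opens a new argument.
      var opened = copyArgs(args);
      opened[PREPOSITIONS[word]] = [];
      walk(index + 1, PREPOSITIONS[word], opened);
    }
    // Reading 2: the word is plain text belonging to the current argument.
    var extended = copyArgs(args);
    extended[currentArg] = (extended[currentArg] || []).concat([word]);
    walk(index + 1, currentArg, extended);
  }

  walk(0, "direct_object", {});
  return results;
}

var candidates = allArgStrings("hello from english to japanese");
```

Each dictionary in candidates would then be passed to its own PartiallyParsedSentence. Deciding which of them hold valid values is left to the generic parser, so the plugin does not need to filter them.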
 
=== Stuff Your Function Does NOT Have To Worry About ===
 
All of the following is done for you by the generic parser code that calls your plugin:
 
* Getting suggestions from NounTypes
* Deciding whether a given substring is or is not a ''valid'' value for the argument you're assigning it to.
* Autocompleting a partially-typed verb or any partially-typed arguments.
* Filling in default values for required arguments that are missing.
* Ranking the suggestions from best to worst.
* Making suggestions based on just a selection and no input.
* Making suggestions based on input that matches ''no'' verbs (if it matches no verbs, your function should return an empty list, and the generic parser logic will fall back on other ways of producing a suggestion.)
 
== Registering Your Parser Plugin ==
 
In the future, maybe this registration will be able to happen automatically (we're sticking to a standardized naming scheme for exactly this reason) but for now you'll have to manually add your new parser plugin in a couple of places:
 
# Your .js file needs to be added to the following places:
#* xpcshell_tests.js
#* chrome/content/test.html
#* chrome/content/browser.xul
# In chrome/content/nlparser/parser.js, you need to add the namespace object from your parser_plugin file to the makeParserForLanguage() function, like so:
 
<pre>
NLParser.makeParserForLanguage = function(languageCode, verbList, nounList) {
  let parserPlugin = null;
  if (languageCode == "en") {
    parserPlugin = EnParser;
  }
  if (languageCode == "jp") {
    parserPlugin = JpParser;
  }
  // etc.
</pre>
 
= Localizing Verbs =
 
TODO
 
= Localizing Noun Types =
 
TODO

Revision as of 19:49, 23 June 2009
