Long-term project goals
- 1) Machine translation at Mozilla
Many Mozilla l10n teams consist of only 1-2 people. While they would love to provide l10n coverage for all of the Mozilla support sites (and other projects), they do not have the time or resources to accomplish the task. As a result, users have a localized Firefox but lack product support documentation in their language. Proving that MT works at Mozilla is the first hurdle to clear before we can accomplish the rest of our goals.
- 2) Create a multi-methodological, open MT service for Mozilla first, then the world!
MT users are limited to using engines that follow a single MT methodology for all language pairs and content types. Studies have shown that a one-size-fits-all approach in MT does not provide the user with optimal translation output. Users need a single access point to multiple MT engines, each following a different MT methodology, that produces the best-quality output by selecting the right engine for each language pair.
- 3) Breaking closed ecosystems
Intellego seeks to further establish an open MT ecosystem, as we feel it is the best way to quickly provide high-quality MT services to users on the web at the lowest cost and in a way that engages the open source community. The current machine translation ecosystem is dominated by proprietary, closed systems. They are closed in their code bases, in their data collection processes, and in public access to their language resources. Additionally, the open MT ecosystem has been unable to reach the vast majority of participants on the web through web services or APIs.
- 4) Machine translation in Firefox
Google's ability to provide users with automatic translation of web content using Google Translate attracts global users to the Chrome browser. Intellego aims to be to Firefox what Google Translate is to Chrome by powering the automatic translation feature within the browser.
- 5) Advancements in MT research
Language support selection for machine translation projects is driven, in part, by ROI and availability of resources. This often results in minority languages, and even some majority languages (see Indic languages), being under-represented in the machine translation ecosystem. As long as ROI continues to be a primary motivator for incorporating support for these languages, they will remain under-represented and unsupported.
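Goal 2 above describes a single access point that selects the right engine for each language pair. The following is a minimal sketch of that idea only, not the Intellego design: the `EngineRouter` class, the `Engine` type, and the two placeholder engines are all hypothetical names introduced for illustration, and real engines (for example a Moses or Apertium instance) would sit behind the same callable interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical engine interface: takes source text, returns translated text.
Engine = Callable[[str], str]


@dataclass
class EngineRouter:
    """Single access point that picks an engine per (source, target) pair."""
    routes: Dict[Tuple[str, str], Engine]
    default: Engine

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Fall back to the default engine when no pair-specific route exists.
        engine = self.routes.get((src, tgt), self.default)
        return engine(text)


# Placeholder engines (assumptions): they only tag the input so the
# routing decision is visible; they do not translate anything.
def fake_smt(text: str) -> str:
    return f"[smt] {text}"


def fake_rbmt(text: str) -> str:
    return f"[rbmt] {text}"


router = EngineRouter(routes={("es", "ca"): fake_rbmt}, default=fake_smt)
print(router.translate("hola", "es", "ca"))
print(router.translate("hello", "en", "fr"))
```

Keeping the routing table as plain data means the choice of engine per language pair can be revised from evaluation results without changing calling code.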
These milestones are based on what a user will hope to be able to do using the Intellego platform.
- Q1 2015
- Create usable corpus data from multilingual Mozilla assets.
- Stand up an internal instance of MosesMT (Intellego server).
- Produce output for 5 European languages with the top 10 translated SUMO articles.
- Score output based on MT evaluation methods.
- Gather community feedback on MT output.
- Identify external sources for corpus data.
- Define what open data means for MT corpus data.
- Q2 2015
- Identify a low-barrier platform for integrating the Intellego server (possibly Pontoon).
- Pilot the use of MT within a low-impact Mozilla project (snippets?).
- Scale language output to 15 languages.
- Gather feedback from localizers and incorporate it into project spec.
- Design Intellego API, based on existing standards (see TAUS?) with a low barrier to adoption.
- Design terminology extraction project.
- Q3 2015
- Evangelize Intellego API spec among open MT projects.
- Attend Machine Translation Marathon to propose Intellego project, gather feedback, and hack on API design.
- Build community of skilled MT, API, and web dev specialists.
- Develop API and design web platform.
- Link Mozilla Moses MT instance + Apertium to Intellego web platform.
- Q4 2015
- Intellego platform support for Mozilla l10n tools, with necessary UI design for selecting between MT engines.
- Add Pontoon-based feedback mechanism to Intellego UI and API.
- Add 1 RBMT engine to Intellego platform.
- Milestone planning for 2016
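
The Q1 milestone "Score output based on MT evaluation methods" does not name a metric. As one illustration, the sketch below computes clipped n-gram precision, the building block of BLEU-style scores, for a single candidate against a single reference. The function name `clipped_precision` and the whitespace tokenization are assumptions made for this example; a real evaluation would more likely rely on an established tool and proper tokenization.

```python
from collections import Counter


def clipped_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram's count is clipped to its count in the reference, so
    repeating a correct word does not inflate the score.
    """
    def ngrams(tokens):
        return Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )

    cand = ngrams(candidate.split())  # naive whitespace tokenization
    ref = ngrams(reference.split())
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Unigram precision: only one of the three "the" tokens is credited.
print(clipped_precision("the the the", "the cat"))
# Bigram precision: one of two candidate bigrams appears in the reference.
print(clipped_precision("the cat sat", "the cat slept", n=2))
```

Automatic scores like this are only a rough proxy for quality, which is why the same quarter also lists gathering community feedback on MT output.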