L10n:Pontoon/API: Difference between revisions

(30 intermediate revisions by 3 users not shown)
''For now, this page serves as a scratchpad for documenting the research into different API solutions for Pontoon. Once one solution is chosen and implemented, this page will document that solution.''
==Description==


High-level Q3 2017 goal: Create an API endpoint supporting queries related to aggregate statistics per locale and per project
Exposing Pontoon's data through an API will enable external consumers to build tools, extensions and reports about translations. In the future, the API will serve as the backend for Pontoon.NEXT's front-end. We chose an iterative approach to exposing the data: we start with a small number of clearly focused use cases and expand the scope in subsequent iterations. The API is based on [http://graphql.org/ GraphQL] (see [https://wiki.mozilla.org/index.php?title=L10n:Pontoon/API&oldid=1181890#Technology discussion]).


=Discussion=
==Overview==
{| class="wikitable"
! style="text-align: center;" | Milestone
! style="text-align: center;" | Theme
! style="text-align: center;" | Status
|-
| M1
| Projects and Locales
| ✓
|-
| M2
| User Notifications
|
|-
| M3
| Statistics over time
|
|-
| M4
| Contributors
|
|-
| M5
| Translation Memory
|
|-
| M6
| Translations
|
|-
|}


See https://groups.google.com/forum/#!topic/mozilla.tools.l10n/R1S7Pk-c6uU for more discussion on this topic.
==Roadmap==


=Roadmap=
===Milestone 1: Projects and Locales===


In Q3 2017, we'd like to make some data stored in Pontoon openly available to third parties. The main driver is the use case from {{bug|1302053}}:
''Complete, deployed on October 2, 2017.''
 
In the first iteration we'd like to make some data stored in Pontoon openly available to third parties. The goal is to create an API endpoint supporting queries related to aggregate statistics per locale and per project. The main driver is the use case from {{bug|1302053}}:


* Stats for a locale: supported projects, status of each project.
* Stats for a project: supported locales, incomplete locales, complete locales.


In future iterations, more use cases can be supported:
<bugzilla>
    {
        "id": "1302053",
        "include_fields": "id, summary, status, resolution, priority, assigned_to"
    }
</bugzilla>
 
===Milestone 2: User notifications===
 
Use-cases:
 
* [[L10n:Pontoon-Tools|Michal's Pontoon Tools extension]]
 
Queries:
 
* [[L10n:Pontoon-Tools#Data_sources|See Data sources section on the Pontoon Tools wiki]]
* Authentication (only return non-public data like notifications if the API consumer is authenticated)
 
===Milestone 3: Statistics over time===
 
Use-cases:
 
* Community Health, e.g.:
** Translation progress over time.
** Unreviewed suggestions progress over time.
 
Queries:
 
* [https://docs.google.com/document/d/1FlZLe8m2sFX6-AT9u_R8D4IODzcTNfx3d1QNaFzT1MQ/edit Get a number of strings per project within a period of time].
** Or a number of words.
 
===Milestone 4: Contributors===
 
Use-cases:


* Exposing data which can be fetched by a SPA front-end
* [https://docs.google.com/spreadsheets/d/1-QWPJovsag4eYghkK2MMo30wa7QzOOMQkZdIjXnfZGQ/edit#gid=532595279 Number of user groups per locale].
** This will likely require pagination
* Getting the stream of notifications per authorized user


=Technology=
Queries:


We'll be considering three solutions: REST, GraphQL and GraphQL with Relay.
* Query a single contributor (by email? unique key?)
** List contributor data: email, display name, permissions, settings
** List recent activity: date, string, action
** Aggregate counts of: translated, unreviewed, fuzzy strings across all projects
** List of projects they contribute to
*** Aggregate counts of: translated, unreviewed, fuzzy strings for each project
* List all contributors on Pontoon
* List all contributors for a locale
* List all contributors for a project
* List all contributors for a localization (ProjectLocale)


==REST==
===Milestone 5: Translation memory===


REST has been the ''de facto'' standard of API design for the last 10-15 years.
Use-cases:


====Pros====
* External services.
* Mozilla translators in non-Mozilla projects.


* Easy to implement thanks to the [http://www.django-rest-framework.org/ Django REST Framework] project
Queries:
* Browsable API: http://restframework.herokuapp.com/
* Familiar to the consumers of the API
* The developer has the exact control over which fields and relations are exposed


====Cons====
* For a given source string, locale and minimum Levenshtein ratio, return a maximum number of results.


* By default, all fields chosen by the developer are exposed and transferred, which increases bandwidth use
===Milestone 6: Translations===
** Workarounds exist, e.g. <code>&fields=foo,bar</code>
* Only the relations expected by the developer can be queried in a single query, e.g. <code>project/1/locales</code>
** Other relations require multiple requests, which can't be optimized
* Requires versioning and documentation
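The <code>&fields=foo,bar</code> workaround listed above can be illustrated with a minimal sketch. It assumes a record is a plain dictionary and that the parameter is called <code>fields</code>; the function name <code>select_fields</code> and the example URL are hypothetical and not part of any Pontoon or Django REST Framework API.

```python
from urllib.parse import parse_qs, urlparse


def select_fields(record, url):
    """Return only the keys of `record` named in the URL's `fields` parameter."""
    params = parse_qs(urlparse(url).query)
    if "fields" not in params:
        # Default REST behaviour: everything the developer exposed is returned.
        return dict(record)
    wanted = {
        name
        for value in params["fields"]
        for name in value.split(",")
        if name
    }
    return {key: value for key, value in record.items() if key in wanted}


# Hypothetical project record and request URL.
project = {"slug": "firefox", "name": "Firefox", "priority": 1}
print(select_fields(project, "https://example.org/api/project/1?fields=slug,name"))
```

The consumer still has to know which field names exist, which is one reason the REST option requires separate documentation.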


Use-cases:


==GraphQL==
* Report translation status of a single page on mozilla.org
* Read-only data required by Pontoon.Next's Translate app
* Editing translations via Pontoon.Next's Translate app


GraphQL is a query language in which the consumer describes the shape of the data they want back.
Queries:


====Pros====
* Establish a good practice for paginating results.
* Easy to learn syntax
* List all Resources for a Project.
* Documentation generated out-of-the-box
* List all Entities for a Resource.
* GUI tool for browsing the API with a docs explorer (GraphiQL)
* List all Translations into a given Locale for an Entity.
* The consumer specifies exactly which fields they're interested in
** Include status: approved, unreviewed, fuzzy.
* A single query can span multiple types as long as they are connected in the graph
** Allow filtering on status via params?
* <code>graphene_django</code> automates a lot of integration, including support for Enum types
** List all Translations for an Entity
* List all TranslatedResources for a ProjectLocale
** Include aggregate statistics.
** List all TranslatedResources for a Resource.
** List all TranslatedResources for a Locale.
* Add a translation for an Entity
** "approved" if the permissions are high enough
** "unreviewed" otherwise
* Approve/reject a suggestion.


====Cons====
<bugzilla>{
* Circular queries are possible (<code>{ projects { locales { projects } } }</code>)
    "f1":"blocked",
** To avoid them, we'd need to write code that inspects the query itself and checks that no field repeats deeper in the query tree
    "o1":"equals",
** See https://github.com/graphql-python/graphene/issues/348#issuecomment-267717809 and https://github.com/graphql-python/graphene/issues/462#issuecomment-298218524
    "v1":"1409704",
* Optimizations relying on <code>prefetch_related</code> and <code>select_related</code> can be brittle.
    "include_fields": "id, summary, status, resolution, priority, assigned_to"
** I'm still trying to understand exactly what happens.
}</bugzilla>
** The best place to optimize seems to be the top-level Query Type.
** For instance, when querying a list of projects, I can <code>ProjectModel.objects.prefetch_related('project_locale__locale')</code> in the top-level query in order to anticipate that the consumer will want to see the information about the related locales.  In Django terms, this implies <code>project.project_locale.all()</code> which means that I now have to use <code>all()</code> in <code>resolve_locales</code> in the Project GraphQL type.  Which in turn means that when asking for a single Project, I can't <code>prefetch_related</code> in its <code>resolve_locales</code>.  The work-around is to <code>prefetch_related</code> in the top-level query for the single Project too.
** The optimizations can be added dynamically depending on the exact query thanks to the introspection.  This is similar to the approach to preventing circular queries
*** See https://yacine.org/2017/02/27/graphqlgraphene-sqlalchemy-and-the-n1-problem/
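The check against circular queries mentioned in the Cons above could look roughly like the sketch below. It assumes the query has already been parsed into nested dictionaries that map each field name to its sub-selection; that representation and the name <code>find_repeated_field</code> are hypothetical. A real implementation would walk the parsed GraphQL query instead.

```python
def find_repeated_field(selection, path=()):
    """Return the first path on which a field name repeats, or None.

    `selection` maps a field name to the dict of fields selected under it.
    """
    for field, subselection in selection.items():
        current = path + (field,)
        if field in path:
            # The same field already appears higher up on this branch.
            return current
        found = find_repeated_field(subselection, current)
        if found is not None:
            return found
    return None


# { projects { locales { projects } } } expressed as nested dicts.
circular = {"projects": {"locales": {"projects": {}}}}
print(find_repeated_field(circular))
```

This naive version compares field names only, so it would also reject a legitimate query in which a nested field happens to share its name with one of its ancestors.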


==GraphQL with Relay==
==Ideas==
A list of ideas to consider for future milestones.


Relay is a specification for cursor-based pagination which solves the problem of omitting items when switching between pages if items are being added quickly in real time to the DB.  It works great for Facebook's use-case of showing a feed of news and updates.
==Contact==


====Pros====
{| class="wikitable"
* Pagination is guaranteed to not omit items which have been added to the DB while the user was looking at one page and then switched to another one
! style="text-align: center;" | Role
* Relay has good integration with React
! style="text-align: center;" | Name
* It's becoming a standard for pagination in GraphQL
! style="text-align: center;" | IRC
|-
| Feature Owner
| Staś Małolepszy
| stas
|-
| Product Owner
| Matjaž Horvat
| mathjazz
|-
| Reviewer
| Adrian Gaudebert
| adrian
|}


====Cons====
;Mailing list
* Pontoon's data (projects, locales, entities) doesn't change quickly enough to require a solution this powerful.
:[https://groups.google.com/forum/#!forum/mozilla.tools.l10n tools-l10n]
** Translations and suggestions may change more quickly, however.
;IRC
* <code>graphene_django</code> doesn't handle ManyToMany fields well with Relay enabled; by default the <code>through</code> table adds another layer of edges to the graph, which becomes verbose very quickly
:[irc://irc.mozilla.org/pontoon #pontoon]
** See https://github.com/graphql-python/graphene/issues/83
* Suffers from the N+1 queries problem for ForeignKeys and ManyToMany relationships
** See https://github.com/graphql-python/graphene-django/issues/57
* De-optimizes <code>prefetch_related</code> and <code>select_related</code>
** See https://github.com/graphql-python/graphene-django/issues/179
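The difference between offset-based and cursor-based pagination described in this section can be shown with a small sketch. It assumes a newest-first feed of integer ids and uses the id of the last item seen as the cursor. Relay itself uses opaque cursor strings, so this illustrates the idea rather than the specification; both function names are hypothetical.

```python
def page_by_offset(items, offset, size):
    """Classic pagination: a fixed window counted from the start of the list."""
    return items[offset:offset + size]


def page_by_cursor(items, after, size):
    """Return up to `size` items that come after the cursor in a newest-first feed."""
    older = [item for item in items if after is None or item < after]
    return older[:size]


feed = [5, 4, 3, 2, 1]                    # newest first
first_page = page_by_cursor(feed, None, 2)

feed = [6] + feed                         # a new item arrives between requests

# The offset window has shifted by one, so item 4 is served a second time.
print(page_by_offset(feed, 2, 2))
# The cursor continues from the last item the consumer actually saw.
print(page_by_cursor(feed, first_page[-1], 2))
```

With offsets the window moves whenever items are added between requests, whereas a cursor continues from the last item the consumer has seen.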

Latest revision as of 22:08, 5 July 2018

Description

Exposing Pontoon's data through an API will enable external consumers to build tools, extensions and reports about translations. In the future, the API will serve as the backend for Pontoon.NEXT's front-end. We chose an iterative approach to exposing the data: we start with a small number of clearly focused use cases and expand the scope in subsequent iterations. The API is based on GraphQL (see discussion).

Overview

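{| class="wikitable"
! style="text-align: center;" | Milestone
! style="text-align: center;" | Theme
! style="text-align: center;" | Status
|-
| M1
| Projects and Locales
| ✓
|-
| M2
| User Notifications
|
|-
| M3
| Statistics over time
|
|-
| M4
| Contributors
|
|-
| M5
| Translation Memory
|
|-
| M6
| Translations
|
|}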

Roadmap

Milestone 1: Projects and Locales

Complete, deployed on October 2, 2017.

In the first iteration we'd like to make some data stored in Pontoon openly available to third parties. The goal is to create an API endpoint supporting queries related to aggregate statistics per locale and per project. The main driver is the use case from bug 1302053:

  • Stats for a locale: supported projects, status of each project.
  • Stats for a project: supported locales, incomplete locales, complete locales.
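
A consumer could request the per-locale statistics listed above along the lines of the following sketch. The field names in the query (<code>locale</code>, <code>localizations</code>, <code>totalStrings</code>, <code>approvedStrings</code>) and both helper functions are assumptions made for illustration; the deployed schema may differ.

```python
import json

# Hypothetical field names; check the live schema before relying on them.
LOCALE_STATS_QUERY = """
query LocaleStats($code: String!) {
  locale(code: $code) {
    name
    localizations {
      project { name }
      totalStrings
      approvedStrings
    }
  }
}
"""


def build_request_body(code):
    """Serialize a GraphQL request (query plus variables) as a JSON string."""
    return json.dumps({"query": LOCALE_STATS_QUERY, "variables": {"code": code}})


def completion_by_project(response):
    """Map each project name to its fraction of approved strings."""
    result = {}
    for localization in response["data"]["locale"]["localizations"]:
        total = localization["totalStrings"]
        approved = localization["approvedStrings"]
        result[localization["project"]["name"]] = approved / total if total else 0.0
    return result
```

The body returned by <code>build_request_body</code> would be sent in a POST request to the API endpoint, and the decoded JSON response passed to <code>completion_by_project</code>.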
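{| class="wikitable"
! style="text-align: center;" | ID
! style="text-align: center;" | Summary
! style="text-align: center;" | Status
! style="text-align: center;" | Resolution
! style="text-align: center;" | Priority
! style="text-align: center;" | Assigned to
|-
| 1302053
| Expose project status and information through API
| RESOLVED
| FIXED
| P3
| Staś Małolepszy :stas
|}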

1 Total; 0 Open (0%); 1 Resolved (100%); 0 Verified (0%);


Milestone 2: User notifications

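Use-cases:

  • Michal's Pontoon Tools extension

Queries:

  • See Data sources section on the Pontoon Tools wiki
  • Authentication (only return non-public data like notifications if the API consumer is authenticated)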

Milestone 3: Statistics over time

Use-cases:

  • Community Health, e.g.:
    • Translation progress over time.
    • Unreviewed suggestions progress over time.

Queries:

  • Get a number of strings per project within a period of time.
    • Or a number of words.

Milestone 4: Contributors

Use-cases:

  • Number of user groups per locale.

Queries:

  • Query a single contributor (by email? unique key?)
    • List contributor data: email, display name, permissions, settings
    • List recent activity: date, string, action
    • Aggregate counts of: translated, unreviewed, fuzzy strings across all projects
    • List of projects they contribute to
      • Aggregate counts of: translated, unreviewed, fuzzy strings for each project
  • List all contributors on Pontoon
  • List all contributors for a locale
  • List all contributors for a project
  • List all contributors for a localization (ProjectLocale)

Milestone 5: Translation memory

Use-cases:

  • External services.
  • Mozilla translators in non-Mozilla projects.

Queries:

  • For a given source string, locale and minimum Levenshtein ratio, return a maximum number of results.
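
The translation memory query above can be sketched as follows. The ratio used here is <code>1 - distance / max(len(a), len(b))</code>, which is an assumption: the page does not say how the ratio is defined, and Levenshtein libraries may normalize differently. The in-memory list of <code>(source, locale, target)</code> tuples and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def ratio(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical (assumed definition)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tm_matches(source, memory, locale, min_ratio, max_results):
    """Return the best matches for `source` as (ratio, source, target) tuples."""
    scored = [
        (ratio(source, entry_source), entry_source, target)
        for entry_source, entry_locale, target in memory
        if entry_locale == locale
    ]
    matches = [item for item in scored if item[0] >= min_ratio]
    matches.sort(key=lambda item: item[0], reverse=True)
    return matches[:max_results]
```

Comparing the source string against every entry is only practical for a small memory; a production endpoint would need to narrow down the candidates first.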

Milestone 6: Translations

Use-cases:

  • Report translation status of a single page on mozilla.org
  • Read-only data required by Pontoon.Next's Translate app
  • Editing translations via Pontoon.Next's Translate app

Queries:

  • Establish a good practice for paginating results.
  • List all Resources for a Project.
  • List all Entities for a Resource.
  • List all Translations into a given Locale for an Entity.
    • Include status: approved, unreviewed, fuzzy.
    • Allow filtering on status via params?
    • List all Translations for an Entity
  • List all TranslatedResources for a ProjectLocale
    • Include aggregate statistics.
    • List all TranslatedResources for a Resource.
    • List all TranslatedResources for a Locale.
  • Add a translation for an Entity
    • "approved" if the permissions are high enough
    • "unreviewed" otherwise
  • Approve/reject a suggestion.
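{| class="wikitable"
! style="text-align: center;" | ID
! style="text-align: center;" | Summary
! style="text-align: center;" | Status
! style="text-align: center;" | Resolution
! style="text-align: center;" | Priority
! style="text-align: center;" | Assigned to
|-
| 1408625
| [API] Query for a project of a particular locale
| RESOLVED
| MOVED
| P3
|
|-
| 1409711
| [API] Establish a good practice for paginating results.
| RESOLVED
| MOVED
| P3
|
|-
| 1409723
| [API] Expose Resources and TranslatedResources
| RESOLVED
| MOVED
| P3
|
|-
| 1409724
| [API] Expose Entities and Translations
| RESOLVED
| MOVED
| P3
|
|}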

4 Total; 0 Open (0%); 4 Resolved (100%); 0 Verified (0%);


Ideas

A list of ideas to consider for future milestones.

Contact

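{| class="wikitable"
! style="text-align: center;" | Role
! style="text-align: center;" | Name
! style="text-align: center;" | IRC
|-
| Feature Owner
| Staś Małolepszy
| stas
|-
| Product Owner
| Matjaž Horvat
| mathjazz
|-
| Reviewer
| Adrian Gaudebert
| adrian
|}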
Mailing list
tools-l10n
IRC
#pontoon