L10n:Pontoon/API: Difference between revisions

From MozillaWiki
mNo edit summary
(Remove the Technology discussion)
Line 1: Line 1:
''For now, this page serves as a scratchpad for documenting the research into different API solutions for Pontoon. Once one solution is chosen and implemented, this page will feature the documentation about this solution.''
Exposing Pontoon's data through an API will enable external consumers to build tools, extensions and reports about translations. In the future, the API will serve as the backend for Pontoon.NEXT's SPA front-end. We chose an iterative approach to exposing the data: we start with a small number of clearly focused use cases and expand the scope in subsequent iterations. The API is based on [http://graphql.org/ GraphQL] (see [https://wiki.mozilla.org/index.php?title=L10n:Pontoon/API&oldid=1181890#Technology discussion]).


High-level Q3 2017 goal: Create an API endpoint supporting queries related to aggregate statistics per locale and per project
The tracking bug for all work related to the API for Pontoon is {{bug|1395273}}.


=Roadmap=
==Roadmap==


In Q3 2017, we'd like to make some data stored in Pontoon openly available to third parties. The main driver is the use case from {{bug|1302053}}:
Line 16: Line 16:
* Getting the stream of notifications per authorized user


=Tracking=
==Milestone 1==


The tracking bug for all work related to the API for Pontoon is {{bug|1395273}}. Please make sure new bugs block it.
''Complete, deployed on October 2, 2017.''
 
In the first iteration we'd like to make some data stored in Pontoon openly available to third parties. The goal is to create an API endpoint supporting queries related to aggregate statistics per locale and per project. The main driver is the use case from {{bug|1302053}}:
 
* Stats for a locale: supported projects, status of each project.
* Stats for a project: supported locales, incomplete locales, complete locales.


<bugzilla>{
Line 26: Line 31:
     "include_fields": "id, summary, status, resolution, assigned_to, depends_on, blocks"
}</bugzilla>
=Technology=
We'll be considering three solutions: REST, GraphQL and GraphQL with Relay.  See https://groups.google.com/forum/#!topic/mozilla.tools.l10n/R1S7Pk-c6uU for more discussion on this topic.
==REST==
REST has been the ''de facto'' standard of API design for the last 10-15 years.
====Pros====
* Easy to implement thanks to the [http://www.django-rest-framework.org/ Django REST Framework] project
* Browsable API: http://restframework.herokuapp.com/
* Familiar to the consumers of the API
* The developer has the exact control over which fields and relations are exposed
====Cons====
* By default, every field the developer chose to expose is transferred, which increases bandwidth use
** Workarounds exist, e.g. <code>&fields=foo,bar</code>
* Only the relations expected by the developer can be queried in a single query, e.g. <code>project/1/locales</code>
** Other relations require multiple requests, which can't be optimized
* Requires versioning and documentation
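
The <code>&fields=foo,bar</code> workaround listed above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Django REST Framework code; the helper name <code>select_fields</code> and the sample record are hypothetical.

```python
def select_fields(record, fields_param):
    """Return only the keys of `record` named in a comma-separated `fields` value.

    An empty or missing parameter returns the full record, which mirrors
    the "all fields by default" behaviour described above.
    """
    if not fields_param:
        return dict(record)
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    return {name: record[name] for name in wanted if name in record}


# Hypothetical serialized project, for illustration only.
project = {"id": 1, "name": "Firefox", "slug": "firefox", "info": "..."}
trimmed = select_fields(project, "name,slug")
```

Here <code>trimmed</code> keeps only <code>name</code> and <code>slug</code>; unknown field names are silently ignored, which is a design choice of this sketch rather than a convention of any framework.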
==GraphQL==
GraphQL is a query language in which the consumer describes the shape of the data they want back.
====Pros====
* Easy to learn syntax
* Documentation generated out-of-the-box
* GUI tool for browsing the API with a docs explorer (GraphiQL)
* The consumer specifies exactly which fields they're interested in
* A single query can span multiple types as long as they are connected in the graph
* <code>graphene_django</code> automates a lot of integration, including support for Enum types
====Cons====
* Circular queries are possible (<code>{ projects { locales { projects } } }</code>)
** To avoid them, we'd need to write code that inspects the query itself and checks that no field repeats deeper in the query tree
** See https://github.com/graphql-python/graphene/issues/348#issuecomment-267717809 and https://github.com/graphql-python/graphene/issues/462#issuecomment-298218524
* Optimizations relying on <code>prefetch_related</code> can be brittle.
** I'm still trying to understand exactly what happens.
** The best place to optimize seems to be the top-level Query Type.
** For instance, when querying a list of projects, I can <code>ProjectModel.objects.prefetch_related('project_locale__locale')</code> in the top-level query in order to anticipate that the consumer will want to see the information about the related locales.  In Django terms, this implies <code>project.project_locale.all()</code> which means that I now have to use <code>all()</code> in <code>resolve_locales</code> in the Project GraphQL type.  Which in turn means that when asking for a single Project, I can't <code>prefetch_related</code> in its <code>resolve_locales</code>.  The work-around is to <code>prefetch_related</code> in the top-level query for the single Project too.
** The optimizations can be added dynamically depending on the exact query thanks to the introspection.  This is similar to the approach to preventing circular queries
*** See https://yacine.org/2017/02/27/graphqlgraphene-sqlalchemy-and-the-n1-problem/
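
The check for circular queries described above could look roughly like the following. It is a minimal sketch that assumes the query has already been converted into nested dicts (field name mapped to sub-selection); extracting that shape from the actual graphene query AST is not shown, and <code>find_repeated_field</code> is a hypothetical helper name.

```python
def find_repeated_field(selection, ancestors=()):
    """Return the path to the first field that repeats one of its ancestors.

    `selection` is a nested dict: keys are field names, values are the
    sub-selections (an empty dict for leaf fields). Returns None when no
    field name appears twice on the same branch.
    """
    for name, children in selection.items():
        path = ancestors + (name,)
        if name in ancestors:
            return path
        found = find_repeated_field(children, path)
        if found is not None:
            return found
    return None


# { projects { locales { projects } } } from the example above.
circular = {"projects": {"locales": {"projects": {}}}}
# The same leaf name on separate branches is not a cycle.
flat = {"projects": {"name": {}}, "locales": {"name": {}}}

circular_path = find_repeated_field(circular)
flat_path = find_repeated_field(flat)
```

The comparison is purely name-based, so a schema in which a field legitimately shares a name with one of its ancestors would need a stricter test, for example one comparing types instead of names.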
==GraphQL with Relay==
Relay is a specification for cursor-based pagination which solves the problem of omitting items when switching between pages if items are being added quickly in real time to the DB.  It works great for Facebook's use-case of showing a feed of news and updates.
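
The difference between offset-based and cursor-based paging can be sketched with a plain list. This is an illustration of the idea only: Relay cursors are opaque strings, whereas this sketch simply uses the last item seen as the cursor, and both function names are hypothetical.

```python
def page_by_offset(items, offset, size):
    """Offset-based page: the result depends on the current state of the list."""
    return items[offset:offset + size]


def page_after_cursor(items, cursor, size):
    """Cursor-based page: return up to `size` items that follow `cursor`.

    `items` is assumed to be ordered newest first and to hold unique values;
    `cursor` is the last item the consumer has seen, or None for the first page.
    """
    start = 0 if cursor is None else items.index(cursor) + 1
    return items[start:start + size]


feed = [5, 4, 3, 2, 1]                      # newest first
first_page = page_after_cursor(feed, None, 2)

feed = [7, 6] + feed                        # two new items arrive before the next request

# Offset 2 now points at items that were already shown on the first page.
offset_page = page_by_offset(feed, 2, 2)
# The cursor continues right after the last item the consumer saw.
cursor_page = page_after_cursor(feed, first_page[-1], 2)
```

With offsets, every insertion shifts the page boundaries, so pages repeat or skip items depending on the direction of travel; the cursor version is anchored to an item and is unaffected.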
====Pros====
* Pagination is guaranteed to not omit items which have been added to the DB while the user was looking at one page and then switched to another one
* Relay has good integration with React
* It's becoming a standard for pagination in GraphQL
====Cons====
* Pontoon's data (projects, locales, entities) doesn't change quickly enough to require a solution this powerful.
** Translations and suggestions may change more quickly, however.
* <code>graphene_django</code> doesn't handle ManyToMany fields well with Relay enabled; by default the <code>through</code> table adds another layer of edges to the graph, which becomes verbose very quickly
** See https://github.com/graphql-python/graphene/issues/83
* Suffers from the N+1 queries problem for ForeignKeys and ManyToMany relationships
** See https://github.com/graphql-python/graphene-django/issues/57
* De-optimizes <code>prefetch_related</code> and <code>select_related</code>
** See https://github.com/graphql-python/graphene-django/issues/179

Revision as of 11:15, 6 October 2017

Exposing Pontoon's data through an API will enable external consumers to build tools, extensions and reports about translations. In the future, the API will serve as the backend for Pontoon.NEXT's SPA front-end. We chose an iterative approach to exposing the data: we start with a small number of clearly focused use cases and expand the scope in subsequent iterations. The API is based on GraphQL (see discussion).

The tracking bug for all work related to the API for Pontoon is bug 1395273.

Roadmap

In Q3 2017, we'd like to make some data stored in Pontoon openly available to third parties. The main driver is the use case from bug 1302053:

  • Stats for a locale: supported projects, status of each project.
  • Stats for a project: supported locales, incomplete locales, complete locales.

In future iterations, more use cases can be supported:

  • Exposing data which can be fetched by a SPA front-end
    • This will likely require pagination
  • Getting the stream of notifications per authorized user

Milestone 1

Complete, deployed on October 2, 2017.

In the first iteration we'd like to make some data stored in Pontoon openly available to third parties. The goal is to create an API endpoint supporting queries related to aggregate statistics per locale and per project. The main driver is the use case from bug 1302053:

  • Stats for a locale: supported projects, status of each project.
  • Stats for a project: supported locales, incomplete locales, complete locales.
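
As a rough sketch of the "stats for a project" use case, a consumer might send a GraphQL query and split the locales into complete and incomplete on its own side. Everything specific here is an assumption made for illustration: the field names (project, localizations, totalStrings, approvedStrings), the example.com endpoint and the sample response are not taken from the deployed schema.

```python
import json
from urllib.parse import urlencode

# Hypothetical query: the field names are assumptions, not the confirmed schema.
PROJECT_STATS_QUERY = """
{
  project(slug: "firefox") {
    name
    localizations {
      locale { code }
      totalStrings
      approvedStrings
    }
  }
}
"""


def build_query_url(endpoint, query):
    """Encode a GraphQL query as a GET URL; `endpoint` is supplied by the caller."""
    return endpoint + "?" + urlencode({"query": " ".join(query.split())})


def split_locales(localizations):
    """Partition locale codes into (complete, incomplete) by approved vs. total strings."""
    complete, incomplete = [], []
    for item in localizations:
        done = item["approvedStrings"] >= item["totalStrings"]
        (complete if done else incomplete).append(item["locale"]["code"])
    return complete, incomplete


# Made-up response payload in the shape the query above would return.
sample = json.loads("""
{"data": {"project": {"name": "Firefox", "localizations": [
  {"locale": {"code": "pl"}, "totalStrings": 100, "approvedStrings": 100},
  {"locale": {"code": "sl"}, "totalStrings": 100, "approvedStrings": 80}
]}}}
""")

url = build_query_url("https://example.com/graphql", PROJECT_STATS_QUERY)
complete, incomplete = split_locales(sample["data"]["project"]["localizations"])
```

The sketch only builds the request URL and processes a canned response; it performs no network request.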
ID | Summary | Status | Resolution | Assigned to | Depends on | Blocks
1302053 | Expose project status and information through API | RESOLVED | FIXED | Staś Małolepszy :stas | | 1395273
1403861 | [API] Hide aggregate statistics about Suggested strings | RESOLVED | FIXED | Staś Małolepszy :stas | 1377969 | 1395273
1407192 | [API] Enable GraphiQL IDE on production | RESOLVED | MOVED | | | 1395273
1409704 | [tracking] Pontoon API Milestone 2 | RESOLVED | WONTFIX | | 1408625, 1409711, 1409723, 1409724 | 1395273
1410387 | [API] Expose aggregate statistics about Suggestions | RESOLVED | MOVED | | 1377969 | 1395273

5 Total; 0 Open (0%); 5 Resolved (100%); 0 Verified (0%);