Architecture Proposal
This is a proposal for a new, unified application architecture for Gaia apps. This version of the proposed architecture is not finalized and is still under heavy development.
This document explains the core ideas in more detail, and how each piece works and interacts with the others.
If you are looking for a prettier and higher level introduction to the architecture goals, it can be found at: http://arcturus.github.io/v3-architecture/presentation/#/
Note: This proposal is not about using framework x, y or z, nor about using libraries x, y and z for the internals. Such discussions could and should happen in a separate proposal, or on https://etherpad.mozilla.org/fxos-engineering-most-wanted
Design Goals
- Offline experience (caching assets using Service Workers)
- Multi-threaded (leveraging workers)
- Guaranteed encapsulation (via multiple documents and workers to minimize regressions)
- Optimized loading speed (caching rendered content for fast subsequent page loads)
- Continuity (storing content in the cloud)
- Delta updates (small patches applied transparently)
- Strong memory management (shutting down parts of an application to free up resources)
High level overview
Web App
The architecture of Web Applications is based on the pattern described in this document.
The pattern is not tied to a specific version of a library or framework. It is intended to encapsulate some logical blocks of the application's 'blackbox' logic and expose them in a form that the browser can understand.
As a result it enforces platform-level encapsulation, with one compartment per logical piece of code. For more information on compartments, please see [1] [2] [3] [4]
In this document a 'logical piece of code' is often referred to as a Client or a Server. More specific examples in an application are a particular View, the application logic of a view, a main-thread-only WebAPI wrapper, etc. To avoid confusion with Modules and Web Components, they will be called capsules.
All applications are also hosted web applications, running offline through the use of Service Worker.
Lastly, various parts of the current proposal are directly managed by the application itself. This is a deliberate choice in order to prototype things, but one of the goals is to move some of them to the platform side.
Service Worker
Applications are no longer packaged. Instead they are web applications cached locally by Service Worker. For more information on Service Worker, see [5]
Applications are not glued together and have independent updates. As a result every application lives in its own repository.
Telemetry
Every capsule has its own set of telemetry reports, so the telemetry report for an application is a set of capsule reports. Those reports contain user data such as the time to load a specific capsule, the memory consumed by this capsule, and any capsule-specific data the developer has asked to be reported.
This data will be collected on a remote server, if the user has opted in, in order to provide tools for decision making.
For an idea of what telemetry is, please see [6]
Data Sync
Application data is now synchronized with a remote service. The storage back-end is not decided yet, but the idea is to ensure that user data is always available.
This data can be displayed using a mobile device, or any other front-end built for the desktop browser.
So one application may have multiple front-ends used to access its data.
Service Worker
As described previously, Service Worker is used as a replacement for our current packaging solution. This change implies a new Security Model that is currently under investigation.
Each application has its own Service Worker. An individual Service Worker usually offers one store for caching the application resources. In this proposal it is often referred to as the Offline store.
But a Service Worker can have any number of stores. The current proposal is to leverage this capability to add 2 new stores. More can be added in the future if needed.
The 2 additional stores are:
- Custom store
- Render store
And so, the proposal contains 3 stores:
- Offline store
- Custom store
- Render store
Each of those stores has a different purpose. See the individual section for each store.
Those stores are ordered in the following way:
Render store -> Custom store -> Offline store
So when an application fetches a resource (js, css, html, images, locales, etc.), the Service Worker iterates over the stores in that order to see if there is a match, and falls back on the network otherwise.
fetch -> Render store -> Custom store -> Offline store -> Network
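The lookup order above can be sketched as a small function. This is only an illustration, not the proposal's implementation: the store names and the `resolveResource` helper are hypothetical, and the function takes the cache storage and the network fetcher as parameters so that it can run outside a Service Worker.

```javascript
// Store names are hypothetical; the proposal only fixes their order.
const STORE_ORDER = ['render-store', 'custom-store', 'offline-store'];

// `cacheStorage` is assumed to behave like the Service Worker `caches` object:
// open(name) resolves to a cache whose match(request) resolves to a response
// or undefined. `network` is assumed to behave like fetch().
async function resolveResource(request, cacheStorage, network) {
  for (const name of STORE_ORDER) {
    const store = await cacheStorage.open(name);
    const hit = await store.match(request);
    if (hit) {
      return hit; // the first store with a match wins
    }
  }
  return network(request); // no store matched: fall back on the network
}

// Inside a Service Worker this could be wired up as:
//   self.addEventListener('fetch', function(event) {
//     event.respondWith(resolveResource(event.request, caches, fetch));
//   });
```

Because the Render store is consulted first, a serialized view shadows a customized resource, which in turn shadows the original source in the Offline store.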
Offline store
The Offline store is used to keep a local copy of the source code of the hosted application.
Note: For applications that are shipped by default, the local copy will be inserted at build time.
The application Service Worker can be woken up on a timer in order to check for updates. If there is one, instead of performing a raw fetch (the default for Service Worker), a client-side library will try to perform a delta update.
Application updates are independent of other applications' updates. So one application can ship an update when a new feature is fully finished and validated by QA, or in order to fix a security issue, etc.
If possible, and if the update does not affect one of the capsules visible to the user, the client-side library will try to perform a 'restartless' delta update.
As a more concrete example, if an update fixes the code of a view that is not directly visible to the user, that view can be updated in the background without having to restart the application. The next time the user accesses this view, it will be up to date.
Another example is an update that affects some view-specific logic. Even if this logic belongs to the current view, there are cases where it can be shut down at runtime and updated transparently for the user.
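As a rough sketch of the delta and 'restartless' ideas above, the following compares two manifests and checks whether an update touches a visible capsule. The manifest format (resource path mapped to a content hash) and the `computeDelta`, `canApplyWithoutRestart` and `capsuleOf` names are assumptions made for illustration; the proposal does not define them.

```javascript
// A manifest maps a resource path to a content hash (assumed format).
function computeDelta(localManifest, remoteManifest) {
  const changed = [];
  const removed = [];
  for (const path of Object.keys(remoteManifest)) {
    // New or modified resources must be fetched.
    if (localManifest[path] !== remoteManifest[path]) {
      changed.push(path);
    }
  }
  for (const path of Object.keys(localManifest)) {
    // Resources missing from the remote manifest can be dropped locally.
    if (!(path in remoteManifest)) {
      removed.push(path);
    }
  }
  return { changed, removed };
}

// `capsuleOf(path)` is a hypothetical helper returning the capsule that owns
// a resource. The check is conservative: any touched visible capsule means
// the update is not applied without a restart.
function canApplyWithoutRestart(delta, visibleCapsules, capsuleOf) {
  return delta.changed
    .concat(delta.removed)
    .every((path) => !visibleCapsules.includes(capsuleOf(path)));
}
```

This check is stricter than the text requires, since the proposal notes that even the logic of the current view can sometimes be shut down and updated at runtime.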
Custom store
As its name suggests, the Custom store is intended for customization purposes.
Because there might be multiple sources of customization, we can have multiple Custom stores. A Custom store can also be updated independently of the Offline store.
Some examples of how to use the Custom stores are:
- Partner customizations
- User customizations
  - Replacing one resource in order to fix a bug, or to fix a color you don't like
  - Locally fixing a bug if the fix has not been released yet
- A/B testing with telemetry reports
  - UX/UI concept
  - Framework comparison
  - Impact of a change
- ...
In order to fully leverage the Custom store, a remote infrastructure will be needed. With such an infrastructure, it should be possible to distribute changes to a group of users and observe the impact of those changes through telemetry reports.
Render store
The Render store is intended to save/restore a serialized version of a particular view, mostly for performance purposes.
As an example, if a view has been pre-translated at build time to en-US, and the user changes the locale to fr-FR, then the new serialized version of the html content can be saved into this store.
The next time the user accesses this view, it will be correctly localized by default (pre-translated) without having to run l10n.js during the view startup.
The Render store can also contain purely virtual files.
As an example, suppose that in the contact application the user looks at Fernando's contact details. The specific contact details page can be serialized into the Render store. The next time the user accesses this view, it will be served to the page as plain html/css through the regular fetch path, as if it were part of the original source code.
Sometimes the cached information will be out of date. The specific cache eviction strategy is up to the application. A save/restore API will be exposed to each view, and it is up to the view logic to manage its own cache.
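The save/restore API is not specified yet. A minimal sketch of what a view could rely on, assuming a hypothetical `RenderStore` class backed by an in-memory `Map` (a real one would sit on top of the Service Worker store), might look like this. The `version` token is an assumption: it stands for whatever the view uses to decide that a cached rendering is stale, such as the locale or a contact revision.

```javascript
// Hypothetical sketch: an in-memory stand-in for the Render store.
class RenderStore {
  constructor() {
    this.entries = new Map(); // url -> { html, version }
  }

  // Save the serialized markup of a view under its URL.
  save(url, html, version) {
    this.entries.set(url, { html: html, version: version });
  }

  // Return the cached markup, or null when there is none or it is stale.
  // Eviction is driven by the view: a version mismatch drops the entry.
  restore(url, currentVersion) {
    const entry = this.entries.get(url);
    if (!entry) {
      return null;
    }
    if (entry.version !== currentVersion) {
      this.entries.delete(url);
      return null;
    }
    return entry.html;
  }
}
```

For the localization example above, a view could save its markup with the version `'fr-FR'` and call `restore(url, currentLocale)` at startup, falling back to the normal rendering path when `null` is returned.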
Telemetry Overview
Telemetry is a remote service that collects data reports from real users.
Those reports can then be used for decision making.
As mentioned previously, each capsule has its own report. It offers a wide variety of opportunities to observe the application usage in a granular way.
Reports can contain:
- Startup time per panel
- about:memory per capsule
- performance.memory API (Chrome only). For now it does not report enough metrics for us: it reports only the JS heap size, while additional data such as DOM, CSS and image consumption would be valuable.
- Lags
  - Communication lags between capsules (see the Bridge section for more details)
  - Event loop lags (available in Gecko, need to find a way to expose them to our telemetry report)
- Various other data
  - Heatmap
  - How many times a capsule has been used
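To make the idea of per-capsule reports more concrete, here is a small sketch of a report collector. The field names and the `createCapsuleReport` and `buildAppReport` helpers are hypothetical; the proposal does not define a report format.

```javascript
// Hypothetical report shape: one collector per capsule.
function createCapsuleReport(capsuleName) {
  const data = {
    capsule: capsuleName,
    startupTimeMs: null,
    usageCount: 0,
    bridgeLagsMs: [],
    custom: {}
  };
  return {
    recordStartup(ms) { data.startupTimeMs = ms; },
    recordUse() { data.usageCount += 1; },
    recordBridgeLag(ms) { data.bridgeLagsMs.push(ms); },
    // Capsule-specific data requested by the developer.
    setCustom(key, value) { data.custom[key] = value; },
    // Return a copy so callers cannot mutate the live counters.
    snapshot() { return JSON.parse(JSON.stringify(data)); }
  };
}

// The application report is simply the set of its capsule reports.
function buildAppReport(appName, capsuleReports) {
  return {
    app: appName,
    capsules: capsuleReports.map((report) => report.snapshot())
  };
}
```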
Those reports, if formatted and exploited correctly, could help with various types of decision making:
- A/B testing results for marketing, UX, UI
- A/B testing results when investigating a new framework
- Blockers/Approvals decisions
- QA validation by releasing new feature to a small set of users first (via Custom store).
- ...
Data Sync Overview
Application data lives on the device, and is synchronized to a remote service. The content is first encrypted on the client side before being propagated remotely.
This remote service is accessible by any app using Firefox Accounts. The encryption token is derived from Firefox Accounts in order to be shareable between multiple devices.
The remote storage back-end is not yet defined. One suggestion that is currently under investigation is to have a proxy offering an HTTP API in order to abstract the specifics of the remote storage back-end.
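The encrypt-before-sync step can be illustrated with the following sketch. Everything specific here is an assumption, since the proposal names no algorithm: it uses Node's `crypto` module (a browser version would use Web Crypto instead), HKDF with SHA-256 to derive a key, and AES-256-GCM to encrypt. The `accountSecret` parameter stands for whatever secret Firefox Accounts would provide.

```javascript
const crypto = require('crypto');

// Derive a 32-byte key from an account secret (assumed input) and an app name.
function deriveKey(accountSecret, appName) {
  const info = 'data-sync:' + appName; // hypothetical context label
  return Buffer.from(
    crypto.hkdfSync('sha256', accountSecret, Buffer.alloc(0), info, 32)
  );
}

// Encrypt a record on the client before it is sent to the remote service.
function encryptRecord(key, record) {
  const iv = crypto.randomBytes(12); // fresh nonce for every record
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([
    cipher.update(JSON.stringify(record), 'utf8'),
    cipher.final()
  ]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    data: data.toString('base64')
  };
}

// Decrypt a record fetched from the remote service on any device that can
// derive the same key. Throws if the key is wrong or the data was altered.
function decryptRecord(key, box) {
  const decipher = crypto.createDecipheriv(
    'aes-256-gcm', key, Buffer.from(box.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(box.tag, 'base64'));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(box.data, 'base64')),
    decipher.final()
  ]);
  return JSON.parse(plain.toString('utf8'));
}
```

Because the key is derived deterministically from the account secret, a second device signed in to the same account can decrypt what the first one uploaded, while the remote service only ever sees the encrypted form.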
High Level App Overview
The application front-end and the application back-end are independent pieces of code, and so they live in different repositories.
The application back-end and the application front-end are both sets of capsules with strong encapsulation.
A single front-end team could then be created in order to own all front-ends. It should make it easier to unify the front-ends of all our applications, and to ensure the front-end and the back-end are not tied together in a way that makes it hard for the front-end to evolve.
As a result, the working version of an application will be the union of 3 changesets:
- Gecko revision
- App back-end revision
- App front-end revision
The back-end repository will not contain any html, css or localization files.
The front-end repository will contain html, css, localization and js files.
The front-end and the back-end are intended to run on separate threads. Both should also be able to run as independent standalone applications.
The integration of the front-end and the back-end is enforced by a strict contract established between capsules, following a Client/Server approach. This contract has a version in order to keep newer versions of the server compatible with their clients.
Basically the contract defines a set of APIs exposed over the bridge.
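One way to picture versioned contracts is a registry that keeps every published version of a service's contract, so that an older client can still ask for the version it was written against. This is a sketch under assumptions: the `contractVersions` registry, the `resolveContract` helper and the `applyUpdate` method are hypothetical, and the contract shape is borrowed from the example in the Bridge section.

```javascript
// Hypothetical registry: service name -> version -> contract.
const contractVersions = {
  update: {
    1: {
      methods: { checkForUpdate: { args: [] } },
      events: { updatefound: 'undefined' }
    },
    2: {
      // Version 2 adds a method but keeps everything version 1 offered.
      methods: { checkForUpdate: { args: [] }, applyUpdate: { args: [] } },
      events: { updatefound: 'undefined' }
    }
  }
};

// Return the contract a client asked for, or fail loudly when the server
// no longer (or does not yet) support that version.
function resolveContract(service, requestedVersion) {
  const versions = contractVersions[service];
  if (!versions || !versions[requestedVersion]) {
    throw new Error('No contract for ' + service + ' v' + requestedVersion);
  }
  return versions[requestedVersion];
}
```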
Front-End
As mentioned in the introduction, every view is an independent capsule, living in its own compartment.
Technically the compartment split is implemented using a separate <iframe> for each view. Those views are wrapped in a container responsible for the application navigation as well as transitions.
The front-end is the part responsible for perceived performance, so the main thread should be handled with care.
Disclaimers
Note: A fairly common mistake is to try to compare this high-level decoupling with existing frameworks. They solve orthogonal problems, and a direct comparison does not really make sense.
The main idea here is to expose some of the application logic to the web browser in order to benefit from the browser's internal machinery, and to get low-level monitoring for the exposed parts of the application.
This model is not about using x, y or z. It is technology agnostic and uses very basic primitives of the Web. Various technologies can be put on top of that (module UI, React, Web Components, etc.) and can actually be benchmarked with real data from users using Telemetry.
Note: There seems to be a common negative feeling about <iframe>s. Please note that <iframe>s are just a tool to achieve this compartment isolation, which results in exposing the app internals to the Web Browser. So are <iframe>s the future of the Web? Probably not, but high-level encapsulation is definitely needed, and the only thing that provides this level of encapsulation at the moment is an <iframe>.
Features
- High-level content encapsulation, aka no collisions between views for:
  - DOM
    - DOM per view. When the DOM needs to be traversed for any restyle/reflow/repaint operation, the traversal is cheaper.
  - CSS
    - CSS per view. When the CSS rules need to be traversed, the traversal is cheaper.
  - JavaScript
    - This high-level encapsulation is not a replacement for a module loader.
    - Smaller JS heap size. When the mark-and-sweep algorithm used for GC has to run, it does not need to iterate over the whole object tree.
  - Locales
  - DOM
- Contained regressions. A change in a view should not affect other views.
- Fully async UI (since the Bridge is async)
- Per-view instrumentation
  - Performance API
  - Visibility State (visible, hidden, prerendered)
  - about:memory
  - Telemetry reports
- Prioritization/de-prioritization of views on the event loop (since they all run on the UI thread)
- Safe load/unload mechanism for views
Back-End
As described previously, the back-end is a set of capsules, each specialized to address specific needs. Those needs can be related to a specific panel, or can be the needs of another back-end capsule.
None of the back-end code is allowed to touch the DOM. The DOM is owned purely by the front-end side, and since the back-end can run in a Worker, accessing the DOM directly is not an option.
DOM changes are driven by the front-end, which can remotely call methods on the back-end or subscribe to some events. For example the front-end can subscribe to any contact change and react in order to update its rendering, possibly calling some of the methods available in the back-end.
Back-end capsules are loaded on demand. Initially the back-end does not run, but the front-end can ask for part of the back-end to start, because it needs either to call a method or to subscribe to a specific event.
So back-end capsules are lazy-loaded, based on the UI needs.
The back-end is also intended to be shut down at any time if the UI does not need it anymore, or if the app is going into background mode. This is intended to save resources on low-end devices that may not be able to run too many apps at the same time.
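The lazy-load and shutdown behaviour described above can be sketched with a small manager object. The `BackEnd` class, its method names and the `stop()` hook on capsules are hypothetical; in the actual proposal, starting a capsule would mean spawning a Worker or another context rather than calling a factory function.

```javascript
// Hypothetical sketch: capsules are created on first use and can be
// shut down individually or all at once.
class BackEnd {
  // `factories` maps a capsule name to a function that starts it.
  constructor(factories) {
    this.factories = factories;
    this.running = new Map();
  }

  // Start the capsule on demand, then reuse the running instance.
  get(name) {
    if (!this.running.has(name)) {
      this.running.set(name, this.factories[name]());
    }
    return this.running.get(name);
  }

  isRunning(name) {
    return this.running.has(name);
  }

  // Stop one capsule and forget it; the next get() will restart it.
  shutdown(name) {
    const capsule = this.running.get(name);
    if (capsule) {
      if (typeof capsule.stop === 'function') {
        capsule.stop();
      }
      this.running.delete(name);
    }
  }

  // Typically called when the app goes into the background.
  shutdownAll() {
    for (const name of Array.from(this.running.keys())) {
      this.shutdown(name);
    }
  }
}
```

In a page, `shutdownAll()` could be called from a `visibilitychange` listener when `document.hidden` becomes true, and capsules would come back lazily as soon as the UI needs them again.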
Bridge
The bridge component is a helper that facilitates communication between capsules, for example between the views of the front-end and the various pieces of the back-end, or between back-end capsules.
The bridge is designed around a Client/Server architecture, where one server can have multiple clients.
Extra cautious note: both Client and Server are running on the device.
The clients can call remote methods on the server, and can subscribe to events. The server API is defined in a separate, strict contract file.
The client does not need to know who is going to resolve the contract. As a result servers can be Windows, Workers, SharedWorkers or even a ServiceWorker.
The bridge can also be used by a Worker (as a Client) to access main-thread-only WebAPIs via the bridge channel.
The contract between a client and a server defines the available methods and events and has strong types. It also offers a place to record the communication for debugging purposes, to measure the latency of responses, to fire telemetry reports, and so on.
The contract for a specific service can have multiple versions in order to allow older clients to work with newer server code.
Contracts are defined in JS, and live next to the code that resolves them. The type of context where this code runs can be decided at runtime, which allows a dynamic threading model (see the Threading model section).
Contract example:
contracts['update'] = {
  methods: {
    checkForUpdate: { args: [] }
  },
  events: {
    updatefound: 'undefined'
  }
};
Client side usage example:
var c = new Client('update');
c.checkForUpdate().then(function() { ... });
c.addEventListener('updatefound', function(e) { ... });
Note: The contract resolution is asynchronous since the server may not be running when the client asks for a service. The API abstracts that away, so you can call methods even if the server is not running yet.
Server side usage example:
// Implement the methods declared in the contract.
var s = new Server(contracts['update'], {
  checkForUpdate: function() {
    return lookForRemoteUpdate();
  }
});
// Notify every subscribed client.
s.broadcast('updatefound');
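To show how calls made before the server starts can still resolve, here is a minimal in-process stand-in for the bridge. It is a sketch, not the proposal's bridge: everything runs in one context instead of going through postMessage, and, unlike the example above, `Server` takes the service name rather than the contract object so that clients and servers can be matched by name.

```javascript
const contracts = {
  update: {
    methods: { checkForUpdate: { args: [] } },
    events: { updatefound: 'undefined' }
  }
};

const servers = new Map();   // service name -> running Server
const queued = new Map();    // service name -> calls waiting for a server
const listeners = new Map(); // service name -> array of { event, callback }

class Server {
  constructor(name, methods) {
    this.name = name;
    this.methods = methods;
    servers.set(name, this);
    // Run the calls that clients made before this server started.
    const waiting = queued.get(name) || [];
    queued.delete(name);
    waiting.forEach((run) => run(this));
  }

  broadcast(event, detail) {
    (listeners.get(this.name) || [])
      .filter((l) => l.event === event)
      .forEach((l) => l.callback({ type: event, detail: detail }));
  }
}

class Client {
  constructor(name) {
    this.name = name;
    // Expose one promise-returning method per contract entry.
    Object.keys(contracts[name].methods).forEach((method) => {
      this[method] = (...args) => this._call(method, args);
    });
  }

  _call(method, args) {
    return new Promise((resolve, reject) => {
      const run = (server) => {
        try {
          resolve(server.methods[method](...args));
        } catch (err) {
          reject(err);
        }
      };
      const server = servers.get(this.name);
      if (server) {
        run(server);
      } else {
        // No server yet: keep the call until one registers.
        if (!queued.has(this.name)) queued.set(this.name, []);
        queued.get(this.name).push(run);
      }
    });
  }

  addEventListener(event, callback) {
    if (!(event in contracts[this.name].events)) {
      throw new Error('Event not in contract: ' + event);
    }
    if (!listeners.has(this.name)) listeners.set(this.name, []);
    listeners.get(this.name).push({ event: event, callback: callback });
  }
}
```

With this sketch, a client can call `checkForUpdate()` first and only later have `new Server('update', { ... })` created; the pending promise resolves as soon as the server registers.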
Interactions
Front-End / Back-End
This schema represents how the front-end and the back-end collaborate.
Front-End / Back-End with main-thread-only WebAPIs
Sometimes a Worker needs to access a main-thread-only API. In such cases a server capsule is introduced in the front-end content wrapper, and the worker uses it as a server to access the main-thread-only API.
Front-End / Back-end. Multiple Windows
While on low-end devices most of the application will be shut down when the application is in the background, on high-end devices memory is no longer the big bottleneck, and so we can favor the user experience.
In such cases, if the application is already open in the background and the user opens a bookmark to a specific panel, starts a WebActivity resolving to the app, etc., there is no need to restart the whole application logic: the bridge will just connect the 2 windows in a transparent fashion.
Memory Management
While one of the goals of this architecture is to free the main thread (using it for UI-related tasks only), and to share the related logic for instant bookmarks, actions and activities, there may be times when the memory limitations of the device are the bottleneck.
For such devices, the model offers macro memory management: when the application goes into the background, most of the non-user-facing parts (in red here) can be shut down safely in order to recover memory.
Then, when the app comes back to the foreground, those can be restored to maximize the user experience.
Threading Model
One of the goals of the architecture is to provide:
- an easy way to create multi-threaded applications via bridge abstractions
- a workaround to make main-thread-only APIs available to worker threads
That said, it's hard to predict which threading model will fit best on which device. So the architecture is intended to be flexible and to run on different threading models based on runtime metrics such as the number of available cores and the available memory of the device.
This should let us use a different threading model for each class of hardware, based on a per-app configuration file.
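A selection function along these lines might look as follows. The thresholds, the configuration shape and the `chooseThreadingModel` name are all hypothetical; they only illustrate how a per-app configuration could map device metrics to one of the three models described below.

```javascript
// Hypothetical per-app configuration: minimum device capabilities
// required for each threading model.
const defaultConfig = {
  multi: { minCores: 4, minMemoryMB: 1024 },
  double: { minCores: 2, minMemoryMB: 512 }
};

// `device` is expected to look like { cores: 2, memoryMB: 512 }.
function chooseThreadingModel(device, config) {
  const thresholds = config || defaultConfig;
  const fits = (t) =>
    device.cores >= t.minCores && device.memoryMB >= t.minMemoryMB;
  if (fits(thresholds.multi)) {
    return 'multi-threads';
  }
  if (fits(thresholds.double)) {
    return 'double-threads';
  }
  // Everything runs on the main thread on the most constrained devices.
  return 'single-thread';
}
```

In a browser, the core count could come from `navigator.hardwareConcurrency`; where the memory figure would come from is left open here.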
Single-thread
Double-threads
Multi-threads
Back-End as Services
The hard split between the front-end and the back-end will let us explore alternative models, where both can run in separate processes, using the same bridge abstractions.
One front-end / One Service
One front-end / Multiple services
Multiple front-ends / Multiple services
Since the back-end is responsible for managing application data, the same set of data can be shared across multiple front-ends.