CloudServices/Sync/FxSync/StoreRedesign

Revision as of 07:06, 9 June 2011

There's currently an artificial and perverse separation between Store, Engine, and Tracker. It adds complexity that is both excessive and limiting. We can fix this.

Proposal

Reduce to two classes: Repository and Synchronizer:

  • Repositories are both sinks and sources of records
  • Synchronizers exist (from one perspective, at least) to reify the relationship between two Repositories (including tracking lastSync), which in practice means connecting a pair of repositories as source then sink in turn.

Repositories are entirely responsible for providing a timestamp- and record-centric API over a source of data. This interface abstracts the tracking of changed items and application and retrieval of records, and is uniform across both remote and local storage. For example, we would build a FxBookmarkRepository layer above Places, a ServerRepository layer in front of the v5/v1.1 Sync API, and connect the two with a simple Synchronizer. Both Repository implementations would implement exactly the same interface, which would allow us to trivially implement:

  • Direct sync between two devices, without a server intermediary
  • Sync to a backup file
  • Sync connectors to external data stores
  • Sync to multiple destinations
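A minimal sketch of how a Synchronizer might connect two Repositories as source then sink in turn. The method names fetchSince and storeRecords come from the API section of this page; everything else (the record shape, the string values of DONE and STOP, the MemoryRepository class, the single lastSync field, and the caller-supplied clock) is an assumption made for illustration.

```typescript
// Assumed sentinel values: the page names DONE and STOP but does not define them.
const DONE = "DONE";
const STOP = "STOP";

// Hypothetical shapes; only the .guid attribute on errors comes from the page.
interface SyncRecord { guid: string; modified: number; payload: string; }
interface SyncError { guid: string; message: string; }
type Rex = SyncRecord | SyncRecord[] | null;
type FetchCallback = (errs: SyncError | SyncError[] | typeof DONE | null, rex: Rex) => typeof STOP | void;
type StoreCallback = (errs: SyncError | SyncError[] | null) => void;

// Only the two methods this sketch needs; the full interface also has fetchRecords.
interface Repository {
  fetchSince(timestamp: number, callback: FetchCallback): void;
  storeRecords(rex: Rex, callback: StoreCallback): typeof STOP | void;
}

// Hypothetical in-memory repository standing in for Places, a server, or a file.
class MemoryRepository implements Repository {
  private records: { [guid: string]: SyncRecord } = {};

  fetchSince(timestamp: number, callback: FetchCallback): void {
    const changed = Object.keys(this.records)
      .map((guid) => this.records[guid])
      .filter((record) => record.modified > timestamp);
    if (changed.length > 0 && callback(null, changed) === STOP) return;
    callback(DONE, null);
  }

  storeRecords(rex: Rex, callback: StoreCallback): typeof STOP | void {
    const incoming: SyncRecord[] = rex === null ? [] : Array.isArray(rex) ? rex : [rex];
    for (const record of incoming) this.records[record.guid] = record;
    callback(null);
  }
}

class Synchronizer {
  private first: Repository;
  private second: Repository;
  private lastSync = 0;

  constructor(first: Repository, second: Repository) {
    this.first = first;
    this.second = second;
  }

  // Run one sync: each repository acts as source, then as sink.
  synchronize(now: number): SyncError[] {
    const errors: SyncError[] = [];
    this.flow(this.first, this.second, errors);
    this.flow(this.second, this.first, errors);
    if (errors.length === 0) this.lastSync = now; // advance only after a clean run
    return errors;
  }

  private flow(source: Repository, sink: Repository, errors: SyncError[]): void {
    const collect: StoreCallback = (errs) => {
      if (errs !== null) {
        for (const e of Array.isArray(errs) ? errs : [errs]) errors.push(e);
      }
    };
    source.fetchSince(this.lastSync, (errs, rex): typeof STOP | void => {
      if (errs === DONE) return;
      if (errs !== null) { collect(errs); return STOP; }
      // Hand the sink's return value straight back to the source.
      return sink.storeRecords(rex, collect);
    });
  }
}
```

Because both ends share one interface, the Synchronizer does not care whether it is given a local store, a server, or a backup file. The sketch does no reconciliation: records that flowed one way are echoed back on the return leg and simply overwrite themselves.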

Furthermore, middleware (classes that implement the Repository interface and wrap another Repository) can be used to implement:

  • Encryption: consume and emit encrypted WBOs; pass decrypted WBOs to the inner repository; implement crypto recovery (key refetches)
  • Archiving, logging, etc.
  • Version translation, giving Sync multiple version support
  • Storage item translation: e.g., define Repositories in terms of deltas, but maintain storage version compatibility by translation into full objects.
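The middleware idea can be sketched as a class that implements the Repository interface, wraps an inner Repository, and translates records in both directions. The names TranslatingRepository, MemoryRepository, wrap, and unwrap are hypothetical, as are the record shape and the string values of DONE and STOP. The prefix transform is only a placeholder for decryption/encryption or version translation; it is not cryptography.

```typescript
// Assumed sentinel values: the page names DONE and STOP but does not define them.
const DONE = "DONE";
const STOP = "STOP";

interface SyncRecord { guid: string; modified: number; payload: string; }
interface SyncError { guid: string; message: string; }
type Rex = SyncRecord | SyncRecord[] | null;
type FetchCallback = (errs: SyncError | SyncError[] | typeof DONE | null, rex: Rex) => typeof STOP | void;
type StoreCallback = (errs: SyncError | SyncError[] | null) => void;
type Translate = (record: SyncRecord) => SyncRecord;

// Reduced interface for this sketch; fetchRecords would be wrapped like fetchSince.
interface Repository {
  fetchSince(timestamp: number, callback: FetchCallback): void;
  storeRecords(rex: Rex, callback: StoreCallback): typeof STOP | void;
}

function asArray(rex: Rex): SyncRecord[] {
  if (rex === null) return [];
  return Array.isArray(rex) ? rex : [rex];
}

// Hypothetical in-memory repository used as the wrapped (inner) repository.
class MemoryRepository implements Repository {
  private records: { [guid: string]: SyncRecord } = {};

  fetchSince(timestamp: number, callback: FetchCallback): void {
    const changed = Object.keys(this.records)
      .map((guid) => this.records[guid])
      .filter((record) => record.modified > timestamp);
    if (changed.length > 0 && callback(null, changed) === STOP) return;
    callback(DONE, null);
  }

  storeRecords(rex: Rex, callback: StoreCallback): typeof STOP | void {
    for (const record of asArray(rex)) this.records[record.guid] = record;
    callback(null);
  }
}

// Middleware: same interface on the outside, another Repository on the inside.
class TranslatingRepository implements Repository {
  private inner: Repository;
  private toInner: Translate;
  private fromInner: Translate;

  constructor(inner: Repository, toInner: Translate, fromInner: Translate) {
    this.inner = inner;
    this.toInner = toInner;
    this.fromInner = fromInner;
  }

  fetchSince(timestamp: number, callback: FetchCallback): void {
    // Errors and DONE pass through untouched; records are translated on the way out.
    this.inner.fetchSince(timestamp, (errs, rex) =>
      callback(errs, rex === null ? null : asArray(rex).map(this.fromInner)));
  }

  storeRecords(rex: Rex, callback: StoreCallback): typeof STOP | void {
    // Records are translated on the way in; the inner STOP is handed back as-is.
    return this.inner.storeRecords(asArray(rex).map(this.toInner), callback);
  }
}

// Placeholder transforms. unwrap assumes its input carries the prefix.
const PREFIX = "wrapped:";
const wrap: Translate = (r) =>
  ({ guid: r.guid, modified: r.modified, payload: PREFIX + r.payload });
const unwrap: Translate = (r) =>
  ({ guid: r.guid, modified: r.modified, payload: r.payload.slice(PREFIX.length) });
```

Constructing `new TranslatingRepository(new MemoryRepository(), unwrap, wrap)` gives a repository that accepts and emits wrapped payloads while the inner repository only ever sees unwrapped ones. Wrappers of this kind can be stacked, since each layer sees only the Repository interface.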

API

The API for Repository is defined in terms of callbacks, each of which can be invoked multiple times (e.g., as batches of records arrive), and iterable sequences. A callback invocation takes errors or records as input; each may be null, a single item, or an iterable. An invocation can be passed the DONE constant to indicate completion, and the callback can return the STOP constant to (optionally) prevent further invocations.

The Repository API consists of three methods and two callbacks:

Callbacks

  • fetchCallback(errs, rex):
    • Invoke with errs = null, an error, or an iterable of errors; each error has a .guid attribute. rex = null, a record, or an iterable of records. errs and rex cannot both be provided in the same invocation.
    • Invoked one or more times as required.
    • Return STOP if an error has occurred that means no further records should be provided.
    • Pass DONE as the errs input on completion.
  • storeCallback(errs):
    • Invoke with errs = null, an error, or an iterable of errors. Each error has a .guid attribute.
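One way the two callback contracts might be written down as types, with a small fetchCallback that follows the rules above. This is a sketch under assumptions: the string values of DONE and STOP, the record fields, the use of arrays in place of general iterables, and the helper names (errorList, recordList, collectInto) are not from the proposal.

```typescript
// Assumed sentinel values: the page names DONE and STOP but does not define them.
const DONE = "DONE";
const STOP = "STOP";

// Hypothetical shapes; only the .guid attribute on errors comes from the page.
interface SyncRecord { guid: string; modified: number; }
interface SyncError { guid: string; message: string; }

// Arrays stand in for "iterable" to keep the sketch small.
type Errs = SyncError | SyncError[] | typeof DONE | null;
type Rex = SyncRecord | SyncRecord[] | null;
type FetchCallback = (errs: Errs, rex: Rex) => typeof STOP | void;
type StoreCallback = (errs: SyncError | SyncError[] | null) => void;

// Normalize "null, one item, or many items" into a plain array.
function errorList(errs: SyncError | SyncError[] | null): SyncError[] {
  if (errs === null) return [];
  return Array.isArray(errs) ? errs : [errs];
}

function recordList(rex: Rex): SyncRecord[] {
  if (rex === null) return [];
  return Array.isArray(rex) ? rex : [rex];
}

// Build a fetchCallback that gathers records and returns STOP on any error.
function collectInto(records: SyncRecord[], status: { done: boolean }): FetchCallback {
  return (errs, rex): typeof STOP | void => {
    if (errs === DONE) {
      status.done = true; // DONE arrives in the errs position
      return;
    }
    if (errorList(errs).length > 0) return STOP; // ask the caller to stop
    for (const record of recordList(rex)) records.push(record);
  };
}
```

A caller would pass the result of `collectInto(...)` as the fetchCallback and check each return value against STOP before delivering the next batch.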

Methods

  • fetchSince(timestamp, fetchCallback):
    • Retrieve a sequence of records that have been modified since timestamp. Invoke the callback with one or more records each time, or DONE when complete.
  • fetchRecords(guids, fetchCallback):
    • Retrieve a sequence of records by GUID. guids should be an iterable. Invoke the callback, as with fetchSince.
  • storeRecords(rex, storeCallback):
    • rex: as for fetchCallback. The output of fetchSince is likely piped into this method.
    • Optionally returns STOP. (Return rather than callback to allow synchronous handoff to fetchSince.)
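The three methods could be implemented over a plain in-memory map roughly as follows. This is an illustrative sketch, not the proposed implementation: MemoryRepository, the record fields, the string values of DONE and STOP, the "no such record" error, the strict greater-than comparison for "modified since", and the single-batch delivery are all assumptions.

```typescript
// Assumed sentinel values: the page names DONE and STOP but does not define them.
const DONE = "DONE";
const STOP = "STOP";

interface SyncRecord { guid: string; modified: number; payload: string; }
interface SyncError { guid: string; message: string; }
type Rex = SyncRecord | SyncRecord[] | null;
type FetchCallback = (errs: SyncError | SyncError[] | typeof DONE | null, rex: Rex) => typeof STOP | void;
type StoreCallback = (errs: SyncError | SyncError[] | null) => void;

interface Repository {
  fetchSince(timestamp: number, fetchCallback: FetchCallback): void;
  fetchRecords(guids: string[], fetchCallback: FetchCallback): void;
  storeRecords(rex: Rex, storeCallback: StoreCallback): typeof STOP | void;
}

class MemoryRepository implements Repository {
  private records: { [guid: string]: SyncRecord } = {};

  fetchSince(timestamp: number, fetchCallback: FetchCallback): void {
    const changed = Object.keys(this.records)
      .map((guid) => this.records[guid])
      .filter((record) => record.modified > timestamp);
    // One batch here; a real repository might invoke the callback many times.
    if (changed.length > 0 && fetchCallback(null, changed) === STOP) return;
    fetchCallback(DONE, null);
  }

  fetchRecords(guids: string[], fetchCallback: FetchCallback): void {
    for (const guid of guids) {
      const record = this.records[guid];
      // errs and rex are never both provided in the same invocation.
      const result = record
        ? fetchCallback(null, record)
        : fetchCallback({ guid: guid, message: "no such record" }, null);
      if (result === STOP) return; // no further invocations after STOP
    }
    fetchCallback(DONE, null);
  }

  storeRecords(rex: Rex, storeCallback: StoreCallback): typeof STOP | void {
    const incoming: SyncRecord[] = rex === null ? [] : Array.isArray(rex) ? rex : [rex];
    for (const record of incoming) this.records[record.guid] = record;
    storeCallback(null); // null errs taken to mean the batch was applied
    // This store never needs to return STOP; one with quotas or write failures might.
  }
}
```

Because storeRecords returns its result instead of reporting it through a callback, a fetchCallback can return that value directly, which lets the sink halt the source within the same call.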

Notifications

???

Questions and considerations

  • Where does batching happen?
  • What are the patterns around multiple invocation of callbacks?
  • How much can we unify/parameterize stores? I can easily imagine a single V5HTTPStore class that is simply constructed with a collection URL…
  • Does the callback/return combination work in practice, given that we'll be chaining callbacks and invoking async methods at each end?
  • As much storage as possible should be pushed down into the data store. That should allow these classes to be effectively stateless; the two query inputs are modified time and GUID. Tracking should be reduced to deleted items, and even that can be elided.
  • I'm very intrigued by deltas. It should be possible to stage rollout through an appropriate wrapper store.