User:Gal/SyncDataModel
Data model
Passwords
Each password is a doc. There are two kinds of password entries: one for form logins and one for HTTP authentication (the WWW-Authenticate header). We expect dozens of passwords for most users, and hundreds in extreme cases.
Key: sha1([hostname, formSubmitURL, usernameField, passwordField].join("|")) or sha1([hostname, httpRealm].join("|"))
Value: JSON of { hostname, username, password, formSubmitURL, usernameField, passwordField } or { hostname, username, password, httpRealm }
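The key and value layout above can be sketched as follows. This is a minimal illustration, not the actual client code: the function names are hypothetical, and the choice of UTF-8 encoding and a hex digest is an assumption, since the notes only say sha1(...).

```python
import hashlib
import json


def sha1_hex(text):
    """Hex SHA-1 digest of a string (UTF-8 encoding is an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def form_password_key(hostname, form_submit_url, username_field, password_field):
    """Key for a form login: sha1 of the four fields joined with '|'."""
    return sha1_hex("|".join([hostname, form_submit_url, username_field, password_field]))


def http_auth_password_key(hostname, http_realm):
    """Key for a WWW-Authenticate login: sha1 of hostname and realm joined with '|'."""
    return sha1_hex("|".join([hostname, http_realm]))


def form_password_doc(hostname, username, password, form_submit_url,
                      username_field, password_field):
    """Return (key, JSON value) for a form password entry."""
    key = form_password_key(hostname, form_submit_url, username_field, password_field)
    value = json.dumps({
        "hostname": hostname,
        "username": username,
        "password": password,
        "formSubmitURL": form_submit_url,
        "usernameField": username_field,
        "passwordField": password_field,
    })
    return key, value
```

Under this formula the username is not part of the key, so two logins for the same form on the same host would map to the same doc.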
Conflict resolution
Last write wins. Future iterations might do merging at read time in presence of _conflicts. Passwords are expected to change very rarely, and we immediately push updates to the server and notify other clients from there.
Initial import
Replicate from the server (overwriting local passwords with server values). Replicate to the server (adding any passwords that exist only locally). Retry in case of a race.
Bookmarks
The entire bookmark tree is represented as one doc. The tree consists of folders ({ description, title, children }), separators ({}), and bookmarks ({ title, uri, description, loadInSidebar, tags, keyword }). We expect dozens of bookmarks for most users, and many hundreds in the worst case.
Key: "bookmarks"
Value: JSON of tree as described above.
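As a rough illustration of the tree described above, the following sketch builds the three node types and serializes the tree as the single "bookmarks" doc. The helper names are hypothetical, and telling node types apart by which fields are present (node_kind) is an assumption; the notes do not say how a reader distinguishes them.

```python
import json


def folder(title, children, description=""):
    """A folder node: { description, title, children }."""
    return {"description": description, "title": title, "children": children}


def separator():
    """A separator node is an empty object."""
    return {}


def bookmark(title, uri, description="", load_in_sidebar=False, tags=None, keyword=None):
    """A bookmark node with the fields listed in the data model."""
    return {
        "title": title,
        "uri": uri,
        "description": description,
        "loadInSidebar": load_in_sidebar,
        "tags": list(tags or []),
        "keyword": keyword,
    }


def node_kind(node):
    """Classify a node by its fields (an assumed convention, not from the notes)."""
    if "children" in node:
        return "folder"
    if "uri" in node:
        return "bookmark"
    return "separator"


tree = folder("Bookmarks Menu", [
    bookmark("Mozilla", "https://www.mozilla.org/", tags=["web"]),
    separator(),
    folder("News", [bookmark("Example News", "https://news.example.com/")]),
])

doc_key = "bookmarks"
doc_value = json.dumps(tree)
```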
Conflict resolution
Last write wins. Future iterations might do merging at read time in presence of _conflicts. Bookmarks are expected to change rather rarely, and we immediately push updates to the server and notify other clients from there.
Initial import
Replicate from. For desktop, move all local bookmarks into a folder "Local Bookmarks" that is not replicated. The bookmarks we received from the server are now the bookmark tree. For mobile, add any local bookmarks that don't already exist in the Mobile folder to that folder. Replicate to. Retry in case of a race.
History
Each history entry is one doc. Long histories (thousands of URIs) are not uncommon.
Key: sha1(uri)
Value: JSON of { uri, title, visits }
Conflict resolution
Last write wins. History is expected to change frequently, and we group updates with a timer to avoid paying network transaction overhead for every history update. As a result, conflicts are possible, but also fairly inconsequential.
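The timer-based grouping mentioned above could look roughly like the sketch below. The class name, the flush callback, and the injected clock are all hypothetical; a real client would drive tick() from its own timer facility and send the batch over the network in flush.

```python
import time


class UpdateBatcher:
    """Collects history updates and hands them over in one batch after a delay."""

    def __init__(self, flush, delay_seconds, clock=time.monotonic):
        self._flush = flush            # called with {key: doc} when the delay expires
        self._delay = delay_seconds
        self._clock = clock            # injectable so the behavior can be tested
        self._pending = {}
        self._first_queued_at = None

    def add(self, key, doc):
        """Queue an update; a later update for the same key replaces the earlier one."""
        if not self._pending:
            self._first_queued_at = self._clock()
        self._pending[key] = doc

    def tick(self):
        """Flush if the delay has elapsed since the first queued update.

        Returns True when a batch was flushed, False otherwise.
        """
        if not self._pending:
            return False
        if self._clock() - self._first_queued_at < self._delay:
            return False
        batch, self._pending = self._pending, {}
        self._first_queued_at = None
        self._flush(batch)
        return True
```

Because the batch only keeps the latest doc per key, several visits to the same URI within one window cost a single write, which is also where a conflicting write from another client can slip in unnoticed.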
Initial import
Replicate from. Add any entries that don't already exist on the server. Replicate to. Retry in case of a race.
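The import steps above can be sketched against an in-memory stand-in for the server. Everything here other than the sha1(uri) key is an assumption made for illustration: FakeServer, RaceDetected, and the single sequence counter used to detect a race are hypothetical, whereas a real CouchDB server would report conflicts per document.

```python
import hashlib


class RaceDetected(Exception):
    """Raised when the server changed between 'replicate from' and 'replicate to'."""


class FakeServer:
    """In-memory stand-in for the server database (hypothetical)."""

    def __init__(self):
        self.docs = {}
        self.seq = 0

    def snapshot(self):
        """Return a copy of all docs plus the sequence number they correspond to."""
        return dict(self.docs), self.seq

    def push(self, docs, expected_seq):
        """Store docs, refusing if another client wrote since expected_seq."""
        if expected_seq != self.seq:
            raise RaceDetected()
        if docs:
            self.docs.update(docs)
            self.seq += 1


def history_key(uri):
    """Key for a history entry: sha1(uri), hex-encoded (encoding is an assumption)."""
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def initial_history_import(server, local_entries, max_attempts=3):
    """Replicate from, add entries missing on the server, replicate to, retry on race."""
    for _ in range(max_attempts):
        server_docs, seq = server.snapshot()                    # replicate from
        local_docs = {history_key(e["uri"]): e for e in local_entries}
        missing = {k: v for k, v in local_docs.items() if k not in server_docs}
        try:
            server.push(missing, seq)                           # replicate to
        except RaceDetected:
            continue                                            # start over
        merged = dict(server_docs)
        merged.update(missing)
        return merged
    raise RuntimeError("initial import kept racing with other clients")
```

Entries that already exist on the server keep the server's value; only entries whose key is absent there are uploaded.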
Revision purging
Clients keep only the current revision of a doc in a shadow CouchDB. Since clients don't directly replicate with each other, there is no need for them to keep any more history than that. The server keeps the revision known to the client that is furthest behind in replication.
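One way to read the server-side rule is sketched below. It assumes, purely for illustration, that each revision carries a sequence number and that the server tracks the last sequence each client has replicated; the notes do not specify how the server learns which revision a client knows. The sketch keeps the revision known to the slowest client and everything newer.

```python
def purge_revisions(revisions, client_checkpoints):
    """Drop revisions that no client can still need.

    revisions: list of (seq, body) for one doc, sorted by ascending seq.
    client_checkpoints: {client_id: last replicated seq} (hypothetical bookkeeping).
    """
    if not revisions:
        return []
    if not client_checkpoints:
        # No client is tracking this database: only the current revision matters.
        return revisions[-1:]
    floor = min(client_checkpoints.values())
    # The revision known to the slowest client is the newest one with seq <= floor.
    known = [i for i, (seq, _) in enumerate(revisions) if seq <= floor]
    keep_from = known[-1] if known else 0
    return revisions[keep_from:]
```

For example, with revisions at sequences 1, 4, 7, and 9 and the slowest client at sequence 5, the revision at sequence 1 can be purged while 4, 7, and 9 are kept.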
Notifications
Clients that have sync enabled should use our notifications protocol to listen to server changes. For bookmark and password changes, we push to the server after a brief delay. For history updates, we group updates with a certain (not necessarily very short) time delay.
Optional CouchDB protocol modifications
The CouchDB protocol is not particularly compact on the wire. This is especially painful for bookmark updates, because the whole tree is one doc and every update repeats a lot of data that is unchanged from the last state. Instead of sending the whole document again, we should compute a jsondiff and send only the diff. The server and the client both have the last _rev and can reconstruct the full version from it. Once the initial version of sync is operational, we should consider a fairly simple addition to the CouchDB protocol to make it more compact on the wire. On the server side, this can be implemented by the web heads in front of the CouchDB server. For uploads, such a "/db/doc_diff" method can do a GET to the database server to fetch the last version, apply the diff that came over the network, and then do a PUT. For downloads, it can similarly send a diff computed from a GET of the current version and the version known to the client. This makes most sense as a generic extension to the protocol, so that other data types can benefit from it.
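To make the diff idea concrete, here is a minimal sketch of a JSON diff and its application. It is not an existing jsondiff library or a defined wire format: the (op, path, value) operation shape and the function names are hypothetical, and lists are only descended into when their length is unchanged, so adding or removing a bookmark resends that folder's whole children list.

```python
import copy


def diff(old, new, path=()):
    """Return a list of (op, path, value) operations that turn old into new."""
    if isinstance(old, dict) and isinstance(new, dict):
        ops = []
        for key in old:
            if key not in new:
                ops.append(("remove", path + (key,), None))
        for key, value in new.items():
            if key not in old:
                ops.append(("set", path + (key,), value))
            else:
                ops.extend(diff(old[key], value, path + (key,)))
        return ops
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        ops = []
        for index, (a, b) in enumerate(zip(old, new)):
            ops.extend(diff(a, b, path + (index,)))
        return ops
    if type(old) is type(new) and old == new:
        return []
    return [("set", path, new)]


def apply_diff(doc, ops):
    """Apply operations produced by diff() to a copy of doc and return the copy."""
    result = copy.deepcopy(doc)
    for op, path, value in ops:
        if not path:
            result = copy.deepcopy(value)   # whole-document replacement
            continue
        parent = result
        for step in path[:-1]:
            parent = parent[step]
        if op == "remove":
            del parent[path[-1]]
        else:
            parent[path[-1]] = copy.deepcopy(value)
    return result
```

A web head implementing the proposed "/db/doc_diff" method would, under these assumptions, fetch the last version, call apply_diff with the operations received from the client, and store the result; renaming one bookmark then costs a single small operation rather than the whole tree.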
In addition, we should use SPDY to enable content compression.