Services/Sync/SimplifiedCrypto
The current state of Sync crypto
See Labs/Weave/Developer/Crypto for thorough details.
In short:
- Generate an RSA key pair
- Generate a symmetric key from the passphrase, using PBKDF2 and a random salt, and encrypt ("wrap") the private key using that symmetric key
- Upload public key, wrapped private key + IV + salt to server
- For each collection, generate a random symkey, encrypt it using the public key, and upload it to the server.
- Each encrypted object contains a relative URI pointing to the key that can decrypt it (in 99.99999% of cases this is the same key; the exceptions arise when WBO IDs contain slashes and clients get very confused).
- Fetching a decrypted object involves:
  - Fetching the encrypted object from the server
  - Looking at the key URI in that JSON blob to find the symkey URI
  - Fetching the symkey from the server (if necessary) and RSA-decrypting it
  - Using the symkey to AES-decrypt the object.
Goal and motivation
We want to drop the PKI layer. We don't use it (it existed for speculative sharing scenarios that never materialized), and it costs client computation, server storage, and network bandwidth (~16% of our bandwidth is spent on key fetches).
Proposal
tl;dr: replace the passphrase with an AES key, which will be schlepped around using J-PAKE (so typing it is likely unnecessary). Use this key to directly encrypt the 'bulk' symkeys. No RSA involved.
Passphrase
Rather than have a user enter a passphrase (which will likely be weak), we have already transitioned to having them generate a "sync key" (which they can replace if they so choose). This is 20 alphanumeric characters.
We propose to expand this to 25 characters, enough for a 128-bit AES key if case-insensitive (23 characters would suffice if case-sensitive, but that offers little advantage). This avoids routinely bootstrapping the sync key into an AES key via PBKDF2. Remove the ability for users to enter their own key: it is always generated (giving us more confidence in the amount of entropy), and can be regenerated if desired.
The length of this key is not a big issue: we intend to use J-PAKE for the (infrequent) migration of keys between devices. In any case, 25 is not significantly worse than 20 if typing it does enter the picture.
As before, the sync key is stored on the client.
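To illustrate why 25 case-insensitive characters suffice: a 36-symbol alphabet carries log2(36) ≈ 5.17 bits per character, so 25 characters carry ≈ 129 bits, enough to hold 128. A minimal sketch of the encoding (function names and alphabet ordering are illustrative, not the actual client API):

```python
import secrets

# 36 case-insensitive symbols: log2(36**25) ~= 129 bits, enough for 128.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def encode_sync_key(raw: bytes) -> str:
    """Encode a 16-byte (128-bit) key as 25 base-36 characters."""
    n = int.from_bytes(raw, "big")
    chars = []
    for _ in range(25):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def decode_sync_key(key: str) -> bytes:
    """Recover the 16 raw key bytes; case-insensitive on input."""
    n = 0
    for c in key.lower():
        n = n * 36 + ALPHABET.index(c)
    return n.to_bytes(16, "big")

def generate_sync_key() -> str:
    """Always generated, never user-chosen: full 128 bits of entropy."""
    return encode_sync_key(secrets.token_bytes(16))
```

Because the key is always machine-generated, its entropy is known exactly; the printable form exists only for J-PAKE transfer or, as a fallback, manual typing.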
Existing users will have their passphrase bootstrapped into an AES key using PBKDF2:
- Spot old version
- Get a salt (Proposal: use the Services.syncID from the meta/global object. Presumably the client will be bumping this…)
- Apply PBKDF2 to salt and passphrase to yield our new AES key
- Generate bulk keys, encrypt
- Attempt to store, using appropriate race-avoidance technique in case there are multiple clients attempting to upgrade.
- Wipe old key data.
So long as the salt is available, other clients can apply PBKDF2 to their stored passphrase and the salt to yield the new key without any re-entry or J-PAKE-style key distribution.
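The bootstrap step can be sketched with PBKDF2. The PRF (HMAC-SHA256) and iteration count below are assumptions for illustration; the actual parameters would need to be pinned down. Salting with Services.syncID follows the proposal above.

```python
import hashlib

def bootstrap_sync_key(passphrase: str, sync_id: str,
                       iterations: int = 4096) -> bytes:
    """Derive a 128-bit AES key from the old passphrase.

    The salt is the Services.syncID from meta/global, per the
    proposal; the PRF and iteration count here are assumptions.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        sync_id.encode("utf-8"),
        iterations,
        dklen=16,  # 128-bit AES key
    )
```

Determinism is the point: any client holding the same stored passphrase and fetching the same syncID derives an identical key, so no re-entry or J-PAKE exchange is needed for the upgrade.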
Bulk keys
The server stores one or more bulk keys: one default ("keys/default"), and an optional set of keys associated with specific collections. This allows rudimentary sharing scenarios (provide your bookmarks collection key to a web app, and your passwords remain secure). A single default key is simpler than mandatory per-engine/collection keys, for which there is no obvious need.
Bulk keys are encrypted using the sync key, and cached on the client. (TODO: per-session or persistent caching?)
The timestamp on the collections record allows clients to invalidate their key cache when a new key is associated with a collection: the 'keys' collection will appear to have changed.
Discussion topic: do we need to preserve (add?) the ability to do per-object encryption? Brian Warner suggested:
- While not really interesting here, this one-way property is really useful in other situations, like if you derived per-object encryption keys from a parent folder's key. You could then share the whole thing with someone by giving them the parent's key, or share just one object and *not* give them the ability to get at anything else in that folder. But the sharing discussion is for another day.
HMAC
It's a good practice to use separate keys for HMAC and for encryption. The simplest approach here is to store a single key ("keys/hmac") for HMAC. This key is treated exactly like a bulk key wrt storage.
Alternative proposal: use a single key as input (for each collection), and derive from it a pair of keys (one for encryption, one for HMAC) by SHA-256 hashing with different tags. TODO: thoughts?
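The alternative proposal might look like the following sketch. The tag strings are illustrative, not part of any spec; a real scheme should fix them precisely, and an HMAC-based derivation (HKDF-style, per RFC 5869) would be the more standard construction for the same goal.

```python
import hashlib

def derive_key_pair(master: bytes) -> tuple:
    """Split one per-collection key into (encryption key, HMAC key)
    by hashing with distinct tags. Tag strings are illustrative."""
    enc_key = hashlib.sha256(master + b"encrypt").digest()
    mac_key = hashlib.sha256(master + b"hmac").digest()
    return enc_key, mac_key
```

The upside is one stored key per collection instead of two, at the cost of making the derivation scheme part of the storage format.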
Objects
Now is a great time to partition the "storage" namespace into "keys" and "data", rather than "keys" and everything else (e.g., "tabs"). This makes deletion of just keys or just data much more straightforward.
Proposed flows
New user
- Generate a 128-bit sync key. Store it as we do now.
- Generate a random HMAC key and default key. Encrypt them with the sync key, upload them to the server. Store them as Identities.
- Encrypt and upload collections in the obvious way.
Existing user
(See above.)
Fetching objects
- (On startup: invalidate/refresh key cache if keys collection has changed. I believe we make this fetch anyway...)
- Retrieve object from collection.
- Verify HMAC using "keys/hmac". On failure, check for changed keys.
- Look up key for collection name (defaulting to "keys/default"). Fetch if necessary.
- Decrypt object.
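The HMAC check and key lookup in the flow above might look like this (helper names are hypothetical; the constant-time comparison avoids leaking where a mismatch occurs):

```python
import hashlib
import hmac

def verify_record(hmac_key: bytes, ciphertext: bytes,
                  claimed_mac: bytes) -> bool:
    """Verify the record's HMAC-SHA256 before any decryption.
    On failure, the caller should check for changed keys."""
    expected = hmac.new(hmac_key, ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, claimed_mac)

def key_for_collection(keys: dict, collection: str) -> bytes:
    """Use the per-collection key if one exists, else "keys/default"."""
    return keys.get(collection, keys["default"])
```

Note the ordering: the HMAC is verified with "keys/hmac" before any bulk-key lookup or decryption, so a tampered or stale record is rejected cheaply.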
Version bump
This change is incompatible with older clients: not only due to reorganizing the storage namespace, but also because existing clients will be unaware of the simpler encryption mechanism. That means a storage version bump (from 3 to 4).