CloudServices/Sync/ExtensionStorage Design Doc: Difference between revisions

Updated description to match bug 1333810
(This is why we don't encrypt record keys)
(Updated description to match bug 1333810)
Line 9: Line 9:
=== Crypto ===


When a user does a sync, we want the user's data to be stored securely, so we encrypt it. This encryption happens using the Kinto "remote transformer" feature. This means that data is encrypted on the client side just before it is sent, and decrypted just after it is received. This also means that all data is stored unencrypted locally. Storing the data encrypted at rest on the user's machine seems hard: it requires access to any encryption keys or hash salts when you're offline or not logged in, and everything has to be re-encrypted if those keys change. It also doesn't seem to provide much in the way of security, because an attacker who has access to a user's machine can probably already get access to the same encryption keys that Firefox uses.


Each collection (thus, extension) gets its own key. These keys are stored in a separate "keyring", which is itself stored as a record in a special "crypto" collection. This record is encrypted using a key that is derived from a user's kB. This two-tier crypto system was inherited from Firefox Sync, and it minimizes the data that we have to re-upload when a user's kB changes: only the keyring record needs to be re-encrypted, not every collection. Each collection also gets its own "salt", which is used to hash IDs related to that collection.
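
To make the two-tier layout concrete, here is a minimal sketch of what a keyring could look like. The class names, the record ID "keys", the 32-byte sizes, and the hex serialization are all assumptions made for illustration, and the step that encrypts the keyring record with the kB-derived key is left out.

```python
import os
from dataclasses import dataclass, field


@dataclass
class CollectionKeys:
    key: bytes   # per-collection (per-extension) encryption key
    salt: bytes  # per-collection salt used when hashing IDs


@dataclass
class Keyring:
    # Maps a local collection ID to its key material.
    collections: dict = field(default_factory=dict)

    def ensure_keys_for(self, collection_id: str) -> CollectionKeys:
        """Return the keys for a collection, generating them on first use."""
        if collection_id not in self.collections:
            self.collections[collection_id] = CollectionKeys(
                key=os.urandom(32), salt=os.urandom(32)
            )
        return self.collections[collection_id]

    def to_record(self) -> dict:
        """Serialize the keyring as one record for the "crypto" collection.

        The record ID "keys" is hypothetical. In the real system this record
        would be encrypted with a key derived from kB before upload.
        """
        return {
            "id": "keys",
            "keys": {
                cid: {"key": k.key.hex(), "salt": k.salt.hex()}
                for cid, k in self.collections.items()
            },
        }
```

With this shape, a kB change only requires re-encrypting and re-uploading the single keyring record; the per-collection keys, and therefore the records encrypted with them, stay as they are.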


When we sync, we map the local collection name to an "obfuscated" remote collection name. This is done so that metadata doesn't leak information about what extensions a user has installed. The "obfuscated" name is computed by hashing the collection ID using the collection's salt.
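
As a rough illustration of the mapping, the sketch below hashes a collection ID together with its salt. The choice of SHA-256, the salt-then-ID ordering, and the hex output are assumptions; this document only says that the collection ID is hashed using the collection's salt.

```python
import hashlib


def obfuscated_collection_name(collection_id: str, salt: bytes) -> str:
    """Map a local collection ID to an opaque remote name.

    Assumption: SHA-256 over salt + UTF-8 collection ID, hex encoded.
    """
    digest = hashlib.sha256(salt + collection_id.encode("utf-8"))
    return digest.hexdigest()
```

Because the salt lives in the encrypted keyring, every client of the same user computes the same remote name, while someone who only sees the server-side metadata cannot test a guess such as a known extension ID without the salt.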


=== Encrypting records ===
Line 20: Line 20:


When it's time to send this record to the server, it's encrypted using an EncryptionRemoteTransformer. The record is serialized to produce a plaintext. An IV is generated and is used in conjunction with the extension key (above) to produce a ciphertext. An HMAC is computed over the record ID, IV, and ciphertext. The ID and last_modified fields are copied over from the cleartext record so that syncing can work correctly. The encrypted record will then look like {"id": "key-I_20__2665__20_moz_3A__2F__2F_a", "ciphertext": "[some gibberish]", "IV": "[some gibberish]", "hmac": "[some gibberish]", "last_modified": 12345}.
Additionally, the record ID is hashed to try to avoid leaking information about the record or the extension being used. The hashed record ID has to be consistent across clients so that syncing works correctly, so we hash the ID using the collection's salt. For this to work, we always have to be able to go from a hashed record back to its original ID. This is normally tricky for deleted records, because Kinto doesn't store any data with the "tombstones" that it keeps for them. Storing unencrypted tombstones would also leak information about records being deleted. So before sending "delete" notifications to Kinto, we encrypt them the same way we do for normal records. (In the kinto.js documentation, this is described as "local deletes become remote keeps".)
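
The encoding steps above can be sketched as follows. This is not the EncryptionRemoteTransformer itself: `toy_cipher` is a deliberately insecure stand-in for the real cipher so that the example runs with only the standard library, and the separate HMAC key, the SHA-256 choices, the "id-" prefix, and the "deleted" field name are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import os


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Stand-in for the real cipher. NOT secure; XOR with a hash keystream.

    Applying it twice with the same key and IV returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def hash_record_id(record_id: str, salt: bytes) -> str:
    """Hash the record ID with the collection's salt (assumed: SHA-256)."""
    return "id-" + hashlib.sha256(salt + record_id.encode("utf-8")).hexdigest()


def encode_record(record: dict, enc_key: bytes, hmac_key: bytes, salt: bytes) -> dict:
    """Build the encrypted envelope that is sent to the server."""
    # The whole record, including its true ID, goes into the plaintext so
    # that the receiving client can recover the original ID.
    plaintext = json.dumps(record, sort_keys=True).encode("utf-8")
    iv = os.urandom(16)
    ciphertext = toy_cipher(enc_key, iv, plaintext)
    remote_id = hash_record_id(record["id"], salt)
    # The HMAC covers the (hashed) record ID, the IV, and the ciphertext.
    mac = hmac.new(
        hmac_key, remote_id.encode("utf-8") + iv + ciphertext, hashlib.sha256
    ).hexdigest()
    encoded = {
        "id": remote_id,
        "ciphertext": base64.b64encode(ciphertext).decode("ascii"),
        "IV": base64.b64encode(iv).decode("ascii"),
        "hmac": mac,
    }
    if "last_modified" in record:
        encoded["last_modified"] = record["last_modified"]
    return encoded
```

Under these assumptions, a local delete would be passed through the same function as a small record such as `{"id": "key-foo", "deleted": True}` and uploaded like any other record, so the server never sees a bare tombstone and the true ID stays recoverable from the ciphertext.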


When the server provides this record to a client, it decrypts it in the usual way -- verifying the HMAC first, and then using the IV and the extension key to decrypt the ciphertext, producing a serialized record, which is then used as the real record.
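
The order of operations on the receiving side matters: the HMAC is checked before anything is decrypted. A minimal sketch of that order, assuming the envelope fields shown earlier ("id", "ciphertext", "IV", "hmac"), base64-encoded binary fields, a hex HMAC-SHA256, and a caller-supplied `decrypt` function standing in for the real cipher:

```python
import base64
import hashlib
import hmac
import json
from typing import Callable


class HMACMismatch(Exception):
    """Raised when an incoming record fails verification."""


def decode_record(
    encoded: dict,
    hmac_key: bytes,
    decrypt: Callable[[bytes, bytes], bytes],
) -> dict:
    """Verify, then decrypt, a record received from the server.

    `decrypt(iv, ciphertext)` is a hypothetical hook for the real cipher.
    """
    iv = base64.b64decode(encoded["IV"])
    ciphertext = base64.b64decode(encoded["ciphertext"])

    # Step 1: verify the HMAC over record ID, IV, and ciphertext.
    expected = hmac.new(
        hmac_key, encoded["id"].encode("utf-8") + iv + ciphertext, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, encoded["hmac"]):
        raise HMACMismatch(f"record {encoded['id']} failed verification")

    # Step 2: only now decrypt and deserialize the real record.
    record = json.loads(decrypt(iv, ciphertext).decode("utf-8"))

    # The server-assigned timestamp is carried over so syncing keeps working.
    if "last_modified" in encoded:
        record["last_modified"] = encoded["last_modified"]
    return record
```

Using `hmac.compare_digest` rather than `==` avoids leaking, through timing, how much of a forged HMAC was correct.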
This approach currently leaks metadata -- specifically, information about record identity, which can itself be valuable or allow an attacker to infer what extension is being used. This information would be accessible to anyone who had access to the Kinto database or an FXA token for the user. (Only having an FXA token wouldn't be enough to decrypt the data itself, since you'd still need kB.)
Is it possible to hash the record IDs, so that we don't leak data in this way? If we do this, then we need to store the "true" record ID somewhere, for example in the ciphertext, so that when we get records from the server, we can store them under their true ID so that the extension can access them. However, if one client deletes the record from the server, the server stops serving the complete record body -- instead it just serves a "tombstone", which contains nothing but the (hashed) ID. When we get one of these from the server, it's impossible for us to figure out what local record to delete, so syncing will break. In order to hash record IDs, we would have to modify Kinto to store deleted records forever, and serve them, rather than just tombstones.
How about using encryption to encrypt the record IDs in a reversible way? If we do this, we have to decide what keys to use to encrypt record IDs. We have to be careful with these keys, since if they ever change, we have to rename every record in the collection. Once we have keys that we can use, we'll have to decide what to do with the IVs that we used for each record. Because we can't afford to lose them either, we'd have to embed them in the Kinto record ID somehow too. Finally, once we've surmounted these obstacles, we find that we've opened ourselves up to known-plaintext attacks. Since the universe of webextensions is relatively small (a few thousand), it isn't that difficult to figure out what keys are in use by which extensions, each of which is an attack vector. This seems like a lot of complexity for not a lot of security.


=== Password changes ===