Identity/AttachedServices/StorageServerProtocol: Difference between revisions

no edit summary
No edit summary
Line 5: Line 5:
== Delta-Sync Data Model ==
== Delta-Sync Data Model ==


The storage server hosts a number of independent named '''collections''' for each user.  Each collection is a key-value store whose contents can be atomically modified by the client.  Each modification of a collection creates a new '''version''' with corresponding version identifier, which is a signed hash of the contents of the collection at that version.
The storage server hosts a number of independent named '''collections''' for each user.  Each collection is a key-value store whose contents can be atomically modified by the client.


Each modification of a collection creates a new '''version''' with corresponding version identifier in the format <seqnum>:<hash>:<hmac>, giving a signed hash of the contents of the collection at that version.  The server ensures that versions can only be created with monotonically-increasing sequence numbers.
A collection can be marked as '''obsolete'''.  This will cause any further attempts to access it to return an error code.  Obsolete collections may be garbage-collected by the storage server after 24 hours.


More details at [[Identity/CryptoIdeas/04-Delta-Sync]].
More details at [[Identity/CryptoIdeas/04-Delta-Sync]].
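The `<seqnum>:<hash>:<hmac>` version identifier format described above can be illustrated with a small parsing sketch. This is only an illustration: the names `VersionId`, `parse_version_id` and `is_newer` are hypothetical, and it assumes that the sequence number is a non-negative decimal integer and that neither the hash nor the hmac field contains a colon, none of which is stated in the text.

```python
from typing import NamedTuple


class VersionId(NamedTuple):
    """Hypothetical parsed form of a '<seqnum>:<hash>:<hmac>' version id."""
    seqnum: int
    content_hash: str
    signature: str


def parse_version_id(text: str) -> VersionId:
    """Split a version id into its three fields.

    Assumes a decimal seqnum and colon-free hash/hmac fields.
    """
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected <seqnum>:<hash>:<hmac>, got %r" % (text,))
    seqnum_text, content_hash, signature = parts
    seqnum = int(seqnum_text)  # raises ValueError if not an integer
    if seqnum < 0:
        raise ValueError("sequence number must not be negative")
    return VersionId(seqnum, content_hash, signature)


def is_newer(candidate: VersionId, current: VersionId) -> bool:
    """Mirror the monotonic rule: a new version needs a strictly larger seqnum."""
    return candidate.seqnum > current.seqnum
```

A client could use `is_newer` to decide whether a version id reported by the server is ahead of the one it last synced. Checking the hmac itself would need the signing key, which this sketch does not cover.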
Line 15: Line 18:


* the current version number of each collection
* the current version number of each collection
* a short-lived id/key pair that can be used to authenticate subsequent requests with Hawk
* a short-lived id/key pair that can be used to authenticate subsequent requests using the Hawk request-signing scheme
* a URL to which further requests should be directed
* a URL to which further requests should be directed


Line 36: Line 39:
     <  "key": <hawk auth secret key>,
     <  "key": <hawk auth secret key>,
     <  "collections": {
     <  "collections": {
     <    "bookmarks": <version id for bookmarks collection>,
     <    "XXXXX": <version id for this collection>,
     <    "passwords": <version id for passwords collection>,
     <    "YYYYY": <version id for this collection>,
     <    <...etc...>
     <    <...etc...>
     <  }
     <  }
Line 61: Line 64:
     <  {
     <  {
     <  "collections": {
     <  "collections": {
     <    "bookmarks": <version id for bookmarks collection>,
     <    "XXXXX": <version id for this collection>,
     <    "passwords": <version id for passwords collection>,
     <    "YYYYY": <version id for this collection>,
     <    <...etc...>
     <    <...etc...>
     <  }
     <  }
Line 69: Line 72:
=== GET <base-url>/<collection> ===
=== GET <base-url>/<collection> ===


Get the current version id for a specific collection.  Example:
Get the current metadata for a specific collection: its version id, obsolete status, and last read/write times.  Example:


     >  GET <base-url>/<collection>
     >  GET <base-url>/<collection>
Line 77: Line 80:
     <  Content-Type: application/json
     <  Content-Type: application/json
     <  {
     <  {
     <  "version": <version id for this collection>
     <  "version": <version id for this collection>,
     <  "obsolete": false,
     <  "atime": <last-accessed timestamp for this collection>,
     <  "mtime": <last-modified timestamp for this collection>
     <  }
 
=== POST <base-url>/<collection> ===
 
Update the writeable metadata for a specific collection.  Currently the only piece of metadata that can be updated is the "obsolete" flag, which can be set to true:
 
     >  POST <base-url>/<collection>
     >  Authorization:  <hawk auth parameters>
     >  {
     >    "obsolete": true
     >  }
     .
     <  200 OK
     <  Content-Type: application/json
     <  {
     <  "version": <version id for this collection>,
     <  "obsolete": true,
     <  "atime": <last-accessed timestamp for this collection>,
     <  "mtime": <last-modified timestamp for this collection>
     <  }
     <  }
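As a rough client-side illustration of the collection metadata exchange above, the sketch below parses the JSON body returned for a collection and builds the body that marks a collection as obsolete. The names `CollectionMetadata`, `parse_metadata` and `obsolete_request_body` are hypothetical. The timestamp format is not specified in the text, so `atime` and `mtime` are kept exactly as received. Hawk signing and the HTTP transport are left out.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CollectionMetadata:
    """Hypothetical container for the four fields shown in the responses."""
    version: str
    obsolete: bool
    atime: Any  # timestamp format is unspecified in the text; kept as received
    mtime: Any  # same assumption as atime


def parse_metadata(body: str) -> CollectionMetadata:
    """Parse the JSON body of a collection metadata response."""
    data = json.loads(body)
    return CollectionMetadata(
        version=data["version"],
        obsolete=bool(data["obsolete"]),
        atime=data["atime"],
        mtime=data["mtime"],
    )


def obsolete_request_body() -> str:
    """Build the POST body that flips the "obsolete" flag to true."""
    return json.dumps({"obsolete": True})
```

Since "obsolete" is described as the only writeable field, the request builder takes no arguments. A real client would also need to handle the error code that is returned once a collection has been marked obsolete.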


Line 109: Line 135:
     <    "key2": null      // a key that was deleted
     <    "key2": null      // a key that was deleted
     <  }
     <  }
     <  }  
     <  }


To allow reliable transfer of a large number of items, both client and server may choose to paginate responses to this query.
To allow reliable transfer of a large number of items, both client and server may choose to paginate responses to this query.
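A minimal sketch of how a client might consume a paginated response follows. It relies on the "next" key that the discussion further down this page describes as the doneness indicator (present while more data is coming, absent on the last page). The `fetch_page` callable, the `"items"` key name, and the idea of passing the "next" value back as a cursor are all assumptions made for illustration, not part of the protocol text.

```python
from typing import Any, Callable, Dict, Optional

Page = Dict[str, Any]


def fetch_all_items(fetch_page: Callable[[Optional[str]], Page]) -> Dict[str, Any]:
    """Collect items from successive pages until no "next" key is present.

    fetch_page is a hypothetical callable: given None it returns the first
    page, and given a "next" value it returns the page that value points to.
    """
    items: Dict[str, Any] = {}
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # Later pages overwrite earlier values; a null (None) value is kept
        # so the caller can still tell that a key was deleted.
        items.update(page.get("items", {}))
        cursor = page.get("next")
        if cursor is None:
            return items
```

Taking the page fetcher as a parameter keeps the loop independent of the HTTP and Hawk details, and makes it easy to exercise with canned pages.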
Line 231: Line 257:
== Things To Think About ==
== Things To Think About ==


* How do people feel about the separate "login" step?  It provides value to the server since it lets us tunnel some state information, but maybe it's not very nice from the client side?
* Currently there's no explicit way for the server to track the current version held by each client.  We could add this in the initial handshake, or intuit it based on their activity.
* Currently there's no explicit way for the server to track the current version held by each client.  We could add this in the initial handshake, or intuit it based on their activity.
* Is JSON the best format for this transfer, or could we come up with a more efficient representation?
* Is JSON the best format for this transfer, or could we come up with a more efficient representation?
* Should we add a way to retrieve specific keys, for real-time updating of just the important bits?
* Should we add a way to retrieve specific keys, for real-time updating of just the important bits?
Feedback from warner:
  <warner> rfkelly: some random thoughts
  <rfkelly> please :-)
  <warner> there will be "shared collections" and "per-device collections", might be useful to have some metadata indicating which is which
  <warner> something to indicate whether data is stored as class-A or class-B, although we've talked (without conclusion) on how to prevent the storage server from getting to make a downgrade attack
  <warner> might be good to store a key ID with each collection, so clients can discover when a key has been changed (and thus they shouldn't be surprised to get MAC failures when they try to decrypt the records)
  <warner> garbage-collection when the password (and thus kB) is reset, pretty tricky
  <rfkelly> could the keyID also double as the classA/classB indicator?
  <warner> GET base/collection/version?limit= needs a response code to indicate "we're done" versus "more is coming"
  <warner> yeah, probably
  <warner> keyID probably = hash(key)
  <rfkelly> right
  <warner> although, if that, (encKey,hmacKey,keyID) = HKDF(key) would be better
  <rfkelly> is "garbage collection" essentially "delete everything that was created with the old key"
  <rfkelly> ?
  <warner> POSTing batches: first= and upto= sounds good, using "upto not in args" requires that we can always detect a missing message, which might not be the case if we memcache the inbound batch (or if we write it to SQL but then SQL rolls back). Might be worth thinking about that part more than I did in my docs.
  <rfkelly> GET base/collection/version?limit= currently indicates doneness by presence/absence of the "next" key in the body; a response code would be better
  <warner> yeah, GC is that, although we probably need some care to make sure an out-of-date client doesn't manage to delete everything, or get into a delete-fight with a less-out-of-date client
  <warner> (might require seqnums in the keyids)
  <warner> ah, next= is fine, unless REST prefers a response code
  * warner gets down to Things To Think About
  <warner> I think the login step is fine, you probably don't want to be doing pubkey verification with every message
  <warner> it adds one RTT (plus sign, plus verify) per hour, or per whatever lifetime we use on the certs (maybe 12 hours?), which seems pretty reasonable
  <warner> but removes the verify time on every single server message
  <warner> ok, time to chat with chris about native-data stuff
  <warner> rfkelly: looks good overall, I think your list of outstanding questions matches my own
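The suggestion in the chat above that `(encKey,hmacKey,keyID) = HKDF(key)` can be sketched as follows. This is an illustration only: it implements HKDF with SHA-256 as specified in RFC 5869 using the Python standard library, and the 32-byte output sizes, the `EXAMPLE_INFO` label, and the function names are assumptions rather than anything decided in the discussion.

```python
import hashlib
import hmac
from typing import Tuple

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes
EXAMPLE_INFO = b"storage-collection-keys-example"  # hypothetical context label


def hkdf_sha256(key: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, key, hashlib.sha256).digest()  # extract step
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_collection_keys(key: bytes) -> Tuple[bytes, bytes, str]:
    """Derive (enc_key, hmac_key, key_id) from one master key.

    The split into three 32-byte values is an assumption for illustration.
    """
    okm = hkdf_sha256(key, EXAMPLE_INFO, 3 * HASH_LEN)
    enc_key = okm[:HASH_LEN]
    hmac_key = okm[HASH_LEN:2 * HASH_LEN]
    key_id = okm[2 * HASH_LEN:].hex()
    return enc_key, hmac_key, key_id
```

Deriving the key id from the same HKDF output as the working keys means the id changes whenever the master key changes, while revealing nothing directly usable about the encryption or HMAC keys.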
And more:
  <warner> rfkelly: hm, so it might be useful to put the "which keyids do I have data for" list in the verify-signature/issue-token handshake, and then if it changes later, revoke that token, so they must do a new handshake
  <warner> rather than defining error responses for what happens when the data is moved from one class to another (or the class-B data is flushed) in between handshakes
  <rfkelly> interesting
  <rfkelly> basically tie your session to a set of metadata, and if the metadata changes you automatically get your session invalidated
  <warner> yeah
  <rfkelly> the keyids thing, would it be a distinct keyid per collection, or some additional top-level metadata?
  * warner loves to eliminate error pathways
  <rfkelly> warner++
  <warner> probably one keyid per collection
  <warner> something like, "if you can see this account, you can get/set data for the following collections:.." and "to get the plaintext for collection X, you'll need keyid Y"
  <warner> hm
  <warner> well, the main hope is to not confuse clients who try to use the wrong key
  <warner> basically the only time a client should ever see an HMAC failure is when the server manages to corrupt some data
  <rfkelly> right
  <warner> or if the server is being intentionally malicious
  <warner> so there must be some earlier mechanism to indicate A-vs-B-vs-new-B
  <rfkelly> so I'm thinking of doing a bit more explicit "collection metadata" API; currently the only piece of metadata is the version number, but now it might be (version, keyid, ...other stuff...?)
  <rfkelly> and let clients explicitly get/set/delete this blob to manage the collection state
  <warner> hm, yeah
  <warner> one thing I think we talked about a while ago was collection discovery
  <rfkelly> warner: I like the idea of using an opaque keyid to distinguish classA/classB, because it could prevent the server from learning what class the data is in
  <warner> yeah
  <rfkelly> the client just tries each key in turn until it finds the one that matches the keyid (like truecrypt does to discover the encryption parameters, IIRC)
  <warner> so, looking at this from the inside of the browser..
  <warner> some component or some plugin tells the PICL client "hey, I have data to sync. My data category is named "bookmarks" and this is a "one shared collection" kind of thing"
  <warner> vs one-collection-per-device
  <warner> also it says "this data is going into class-A" or B, probably according to what the user prefs asked for
  <warner> the data-category is unique to this component/plugin (maybe it's a domain name or URL, or GUID)
  <warner> then the PICL client derives some keys, and computes collection-id = hash(kA, category-name), or maybe hash(kB, category-name), for one-shared-collection types
  <warner> or hash(kA/kB, category-name, device-id) for one-collection-per-device
  <warner> so the server can't actually learn what category-name is, or device-id for that matter
  <warner> and then any one-collection-per-device category also needs a device-id-discovery mechanism
  <rfkelly> can it piggyback that from the devices list in the keyserver/idp/thingo
  <rfkelly> ?
  <warner> something like enc(key=hash(kA,category-name), data=device-id), and the server holds a set of the results
  <warner> hm
  <warner> yeah, that's probably better
  <rfkelly> ISTM that "these are my peer devices" is a higher-level concern than at this storage layer
  <warner> although when you add a device, the existing devices need to learn about it
  <rfkelly> it's not specific to a particular collection or datatype
  * warner nods
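The collection-id derivation floated in the chat above, `hash(kA, category-name)` for shared collections and `hash(kA/kB, category-name, device-id)` for per-device collections, might look something like the sketch below. The chat does not say which hash is meant or how the inputs are combined, so HMAC-SHA256 keyed with kA or kB and the length-prefixed encoding of the fields are assumptions, as is the function name `collection_id`.

```python
import hashlib
import hmac
from typing import Optional


def collection_id(key: bytes, category_name: str, device_id: Optional[str] = None) -> str:
    """Derive an opaque collection id from a key and a category name.

    key is kA or kB. Passing device_id gives the one-collection-per-device
    form; omitting it gives the one-shared-collection form.
    """
    parts = [category_name.encode("utf-8")]
    if device_id is not None:
        parts.append(device_id.encode("utf-8"))
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    message = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

Because the server only ever sees the resulting hex string, it cannot learn the category name or the device id, which is the property described in the chat.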
But [rfkelly] wonders, if the collection name is derived from a hash of its metadata, whether we need to include an explicit "keyid" at all on the server side.  Change the key?  Change the name of the collection.  Doesn't make garbage-collection any easier though...
Need last-written and last-read timestamps, to enable garbage collection in some to-be-defined clever system.