Line 1: | Line 1: | ||
This | == Summary == | ||
This is a working proposal for the PiCL Storage API, to implement the concepts described in [[Identity/CryptoIdeas/05-Queue-Sync]]. | |||
It's a work in progress that will eventually obsolete [[Identity/AttachedServices/StorageProtocolZero]]. | |||
== Queue-Sync Data Model == | |||
More details at [[Identity/CryptoIdeas/05-Queue-Sync]]. | |||
Data is stored in independent named '''collections'''. A collection is a key-value store mapping keys to '''records'''. Each | |||
collection has a monotonically-increasing '''sequence number''' which is incremented whenever a record is changed, and provides the | |||
ability to request all '''changes''' since a given sequence number. | |||
'''Collection''' objects have the following fields: | |||
<table> | |||
<tr><th>Parameter</th><th>Type</th><th>Description</th></tr> | |||
<tr><td>name</td><td>urlsafe string, 64 bytes</td><td>A unique identifier for this collection among all the user's data. Collection | |||
names may only contain characters from the urlsafe-base64 alphabet (i.e. alphanumerics, underscore and hyphen).</td></tr> | |||
<tr><td>seqnum</td><td>integer, 8 bytes</td><td>A monotonically-increasing integer that is incremented with each change to the | |||
contents of the collection.</td></tr> | |||
<tr><td>changeid</td><td>urlsafe string, XXX bytes</td><td>A hash that uniquely identifies the last change to this collection. It is | |||
derived from the new sequence number, the previous changeid, and the details of the change that was made.</td></tr> | |||
<tr><td>signature</td><td>urlsafe string, XXX bytes</td><td>A client-generated HMAC signature of the current changeid. Not used or | |||
verified by the server, since it doesn't have the secret key.</td></tr> | |||
</table> | |||
'''Record''' objects have the following fields: | |||
<table> | |||
<tr><th>Parameter</th><th>Type</th><th>Description</th></tr> | |||
<tr><td>key</td><td>urlsafe string, 64 bytes</td><td>A unique identifier for this record within the collection. Keys may only contain | |||
characters from the urlsafe-base64 alphabet (i.e. alphanumerics, underscore and hyphen).</td></tr> | |||
<tr><td>payload</td><td>urlsafe string, 256 KB</td><td>The value currently stored in this record. Typically this would be encrypted and | |||
signed by the client.</td></tr> | |||
<tr><td>seqnum</td><td>integer, 8 bytes</td><td>The collection-level sequence number at which this record was last modified.</td></tr> | |||
<tr><td>changeid</td><td>urlsafe string, XXX bytes</td><td>The collection-level changeid corresponding to the modification of this | |||
record. It is derived from the new sequence number, the previous changeid, the record key, and the new record payload.</td></tr> | |||
<tr><td>signature</td><td>urlsafe string, XXX bytes</td><td>A client-generated HMAC signature of the changeid for this record. Not | |||
used or verified by the server, since it doesn't have the secret key.</td></tr> | |||
</table> | |||
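The field limits in the tables above can be checked on the client before a record is sent. The following is a minimal sketch, not part of the proposal: the helper names are hypothetical, 256 KB is assumed to mean 256 * 1024 bytes, and empty keys are assumed to be invalid.

```python
import re

# Limits taken from the tables above. ASSUMPTION: 256 KB means 256 * 1024 bytes.
MAX_KEY_BYTES = 64
MAX_PAYLOAD_BYTES = 256 * 1024

# The urlsafe-base64 alphabet: alphanumerics, underscore and hyphen.
# ASSUMPTION: at least one character is required.
URLSAFE_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_key(key: str) -> bool:
    """Check a collection name or record key against the alphabet and size limit."""
    # Every allowed character is ASCII, so len(key) equals the size in bytes.
    return URLSAFE_RE.fullmatch(key) is not None and len(key) <= MAX_KEY_BYTES


def is_valid_payload(payload: str) -> bool:
    """Check only the size limit; the payload contents are opaque client data."""
    return len(payload.encode("utf-8")) <= MAX_PAYLOAD_BYTES
```

The same check applies to collection names, since they share the alphabet and the 64-byte limit.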
'''Change''' objects are identical to '''record''' objects, except their payload field may have the value NULL to indicate a deletion | |||
rather than an update: | |||
<table> | |||
<tr><th>Parameter</th><th>Type</th><th>Description</th></tr> | |||
<tr><td>key</td><td>urlsafe string, 64 bytes</td><td>A unique identifier for the changed record within the collection. Keys may only | |||
contain characters from the urlsafe-base64 alphabet (i.e. alphanumerics, underscore and hyphen).</td></tr> | |||
<tr><td>payload</td><td>urlsafe string or null, 256 KB</td><td>The new value to be stored in the record, or null if the record is to | |||
be deleted. Typically this would be encrypted and signed by the client.</td></tr> | |||
<tr><td>seqnum</td><td>integer, 8 bytes</td><td>The new collection-level sequence number after this change is applied.</td></tr> | |||
<tr><td>changeid</td><td>urlsafe string, XXX bytes</td><td>The new collection-level changeid corresponding to this change. It is | |||
derived from the new sequence number, the previous changeid, the record key, and the new record payload.</td></tr> | |||
<tr><td>signature</td><td>urlsafe string, XXX bytes</td><td>A client-generated HMAC signature of the changeid. Not used or verified | |||
by the server, since it doesn't have the secret key.</td></tr> | |||
</table> | |||
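To make the relationship between changes and records concrete, here is a rough sketch of how a client might replay a list of change objects onto a local copy of a collection. The function name and the use of plain dicts are illustrative assumptions; the field names follow the tables above.

```python
from typing import Dict, List


def apply_changes(records: Dict[str, dict], seqnum: int, changes: List[dict]) -> int:
    """Replay changes, in order, onto a local key -> record map.

    Returns the collection seqnum after the last change, or the seqnum
    passed in if the list is empty.
    """
    for change in changes:
        if change["payload"] is None:
            # A null payload marks a deletion rather than an update.
            records.pop(change["key"], None)
        else:
            # Otherwise the change object becomes the stored record.
            records[change["key"]] = change
        seqnum = change["seqnum"]
    return seqnum
```

This sketch trusts its input. A real client would check the seqnum and changeid chain before applying anything.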
== Authentication == | == Authentication == | ||
To access the storage service, a client device must authenticate by providing a BrowserID assertion and a Device ID. It will receive in exchange: | To access the storage service, a client device must authenticate by providing a BrowserID assertion and a Device ID. It will receive | ||
in exchange: | |||
* a short-lived id/key pair that can be used to authenticate subsequent requests using the Hawk request-signing scheme | * a short-lived id/key pair that can be used to authenticate subsequent requests using the Hawk request-signing scheme | ||
* a | * a mapping of collection names to access URLs | ||
You can think of this as establishing a "login session" with the server. Access requests for a specific collection should be directed | |||
to the appropriate URL. | |||
Example: | Example: | ||
Line 40: | Line 124: | ||
< Content-Type: application/json | < Content-Type: application/json | ||
< { | < { | ||
< "id": <hawk auth id>, | < "id": <hawk auth id>, | ||
< "key": <hawk auth secret key>, | < "key": <hawk auth secret key>, | ||
< "collections": { | < "collections": { | ||
< " | < "history": <access url for history collection>, | ||
< " | < "bookmarks": <access url for bookmarks collection>, | ||
< <...etc...> | < <...etc...> | ||
< } | < } | ||
< } | < } | ||
The user and device identity information is encoded in the hawk auth id, to avoid re-sending it on each request. The server may also include additional state in this value, depending on the implementation. It's opaque to the client. | The user and device identity information is encoded in the hawk auth id, to avoid re-sending it on each request. The server may also | ||
include additional state in this value, depending on the implementation. It's opaque to the client. | |||
The collection-specific access URLs may include a unique identifier for the user, in order to improve RESTful-icity of the API. Or | |||
they might point the client to a specific data-center which houses their write master for each collection. It's opaque to the client. | |||
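As a small illustration of the session response shape shown above, a client could look up the access URL for a collection like this. The function name is hypothetical, and the sketch treats the URL as an opaque string, as the text requires.

```python
import json


def collection_url(session_body: str, name: str) -> str:
    """Return the access URL that the session response assigns to a collection."""
    session = json.loads(session_body)
    try:
        return session["collections"][name]
    except KeyError:
        raise LookupError(f"no access URL for collection {name!r}") from None
```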
== Data Access == | == Data Access == | ||
The client now makes Hawk-authenticated requests to | The client now makes Hawk-authenticated requests to a specific collection at its assigned access url. | ||
The following operations are available on each collection. | |||
=== GET <collection-url> === | |||
Get the current metadata for a collection: its name, seqnum, changeid and signature. | |||
Example: | |||
> GET <collection-url> | |||
> Authorization: <hawk auth parameters> | |||
. | |||
< 200 OK | |||
< Content-Type: application/json | |||
< { | |||
< "name": "history", | |||
< "seqnum": 123, | |||
< "changeid": "HASH_OF_DETAILS_OF_THE_MOST_RECENT_CHANGE", | |||
< "signature": "HMAC_SIGNATURE_OF_CHANGEID" | |||
< } | |||
=== GET <collection-url>/records === | |||
Query parameters: start, end, limit. | |||
Request headers: If-Match, If-None-Match | |||
Response headers: ETag | |||
Get the | Get the set of records currently contained in the collection. For small collections, the full set | ||
of records will be returned like so: | |||
> GET < | > GET <collection-url>/records | ||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
Line 68: | Line 184: | ||
< Content-Type: application/json | < Content-Type: application/json | ||
< { | < { | ||
< " | < "records": { | ||
< | < "key1": { "payload": "payload1", "seqnum": 123, "changeid": "HASH1", "signature": "sig1" }, | ||
< | < "key2": { "payload": "payload2", "seqnum": 124, "changeid": "HASH2", "signature": "sig2" } | ||
< } | < } | ||
< } | < } | ||
If there are a large number of records in the collection then the server may choose to paginate the result, returning only some of the | |||
records in the initial response. It will include the key "next" in the output to indicate that more records are available: | |||
> GET < | > GET <collection-url>/records | ||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
Line 85: | Line 201: | ||
< Content-Type: application/json | < Content-Type: application/json | ||
< { | < { | ||
< " | < "next": "key3", | ||
< " | < "items": { | ||
< | < "key1": <record1>, | ||
< | < "key2": <record2> | ||
< } | |||
< } | < } | ||
Clients can request the next batch using the 'start' query parameter: | |||
> GET <collection-url>/records?start=key3 | |||
> | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
< 200 OK | < 200 OK | ||
< Content-Type: application/json | < Content-Type: application/json | ||
< { | < { | ||
< " | < "items": { | ||
< | < "key3": <record3>, | ||
< | < "key4": <record4> | ||
< | < } | ||
< } | < } | ||
When no "next" value is included in the response, the client knows that all available records have | |||
been fetched. | |||
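The "next"/"start" handshake described above could be wrapped in a small generator. This is a sketch under assumptions: fetch_page is a hypothetical callable standing in for an authenticated GET on <collection-url>/records (optionally with ?start=...), and the records are read from the "items" key used in the paginated examples.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical transport hook: takes the 'start' key (or None for the first
# page) and returns the decoded JSON response body.
FetchPage = Callable[[Optional[str]], dict]


def iter_records(fetch_page: FetchPage) -> Iterator[Tuple[str, dict]]:
    """Yield (key, record) pairs, following "next" until it is absent."""
    start: Optional[str] = None
    while True:
        body = fetch_page(start)
        for key, record in body["items"].items():
            yield key, record
        start = body.get("next")
        if start is None:
            # No "next" value: all available records have been fetched.
            return
```

The same loop works whether the server paginated on its own or the client asked for batches with 'limit', since both signal more data through "next".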
Records are always batched in lexicographic order of their keys, and clients are free to request an arbitrary key range using the | |||
'start' and 'end' parameters: | |||
> GET < | > GET <collection-url>/records?start=key2&end=key3 | ||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
Line 122: | Line 236: | ||
< { | < { | ||
< "items": { | < "items": { | ||
< | < "key2": <record2>, | ||
< | < "key3": <record3> | ||
< } | < } | ||
< } | < } | ||
Clients may also choose to batch their requests by using the 'limit' query parameter. As with server-driven batching, the output key | |||
> GET < | "next" will be used to indicate that more data is available: | ||
> GET <collection-url>/records?start=key2&limit=2 | |||
> Authorization: <hawk auth parameters> | |||
. | |||
< 200 OK | |||
< Content-Type: application/json | |||
< { | |||
< "next": "key4", | |||
< "items": { | |||
< "key2": <record2>, | |||
< "key3": <record3> | |||
< } | |||
< } | |||
. | |||
. | |||
> GET <collection-url>/records?start=key4&limit=2 | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
Line 137: | Line 266: | ||
< { | < { | ||
< "items": { | < "items": { | ||
< " | < "key4": <record4> | ||
< } | < } | ||
< } | < } | ||
Each server response will include an "ETag" header, formed from the combination of the current seqnum and changeid of the collection. | |||
If | Clients can use this in combination with standard If-Match and If-None-Match headers to ensure that they're getting a consistent view | ||
of the collection: | |||
> GET < | > GET <collection-url>/records?start=key2&limit=2 | ||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
< 200 OK | < 200 OK | ||
< Content-Type: application/json | < Content-Type: application/json | ||
< ETag: 124-HASH2 | |||
< { | < { | ||
< "next": " | < "next": "key4", | ||
< "items": { | < "items": { | ||
< " | < "key2": <record2>, | ||
< " | < "key3": <record3> | ||
< } | < } | ||
< } | < } | ||
. | . | ||
. | . | ||
> GET < | > GET <collection-url>/records?start=key4&limit=2 | ||
> Authorization: <hawk auth parameters> | |||
> If-Match: 124-HASH2 | |||
. | |||
< 412 Precondition Failed | |||
< ETag: 125-HASH3 | |||
XXX TODO: use of headers, versus returning seqnum/changeid in the response body? | |||
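The ETag values in the examples above look like "<seqnum>-<changeid>". Assuming that format (it is inferred from the examples, not specified), a client could build and parse them as sketched below. The helper names are hypothetical.

```python
from typing import Tuple


def make_etag(seqnum: int, changeid: str) -> str:
    """Combine a collection's seqnum and changeid into an ETag value."""
    return f"{seqnum}-{changeid}"


def parse_etag(etag: str) -> Tuple[int, str]:
    """Split an ETag back into (seqnum, changeid).

    Only the first hyphen is treated as the separator, because the
    urlsafe-base64 alphabet allows hyphens inside the changeid itself.
    """
    seqnum, _, changeid = etag.partition("-")
    return int(seqnum), changeid
```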
=== GET <collection-url>/records/<key> === | |||
Request headers: If-Match, If-None-Match | |||
Response headers: ETag | |||
Get the specific record stored under the given key: | |||
> GET <collection-url>/records/<key> | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
< 200 OK | < 200 OK | ||
< Content-Type: application/json | < Content-Type: application/json | ||
< ETag: 123-HASH1 | |||
< { | < { | ||
< " | < "key": <key>, | ||
< " | < "seqnum": 123, | ||
< "changeid": "HASH1", | |||
< "payload": "payload1" | |||
< } | < } | ||
This request supports standard etag behaviour to ensure that a consistent view of the collection is being read. | |||
=== GET <collection-url>/changes === | |||
Query parameters: since, limit. | |||
Get the sequence of changes that have been made to the collection. If the number of changes to be returned is small, they will be | |||
returned all at once like so: | |||
> | > GET <collection-url>/changes | ||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
< | < 200 OK | ||
< Content-Type: application/json | |||
< { | |||
< "changes": [ | |||
< { "seqnum": 0, "changeid": "HASH1", "signature": "sig1", "key": "key1", "payload": "payload1" }, | |||
< { "seqnum": 1, "changeid": "HASH2", "signature": "sig2", "key": "key2", "payload": "payload2" } | |||
< ] | |||
< } | |||
The changeids and signatures on these changes form a hash chain which can be verified by the client. | |||
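A rough sketch of that client-side verification follows. The proposal does not yet fix the hash function, the serialization of the inputs, or the output encoding (the field sizes are still "XXX bytes"), so everything concrete here is an assumption for illustration: SHA-256 over a compact JSON list, HMAC-SHA256 for signatures, hex output, and seqnums that increase by exactly one per change.

```python
import hashlib
import hmac
import json
from typing import List, Optional


def compute_changeid(seqnum: int, prev_changeid: str, key: str,
                     payload: Optional[str]) -> str:
    # ASSUMPTION: SHA-256 over a compact JSON list of the four inputs.
    material = json.dumps([seqnum, prev_changeid, key, payload], separators=(",", ":"))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def sign_changeid(secret: bytes, changeid: str) -> str:
    # ASSUMPTION: HMAC-SHA256 with the client's secret key, hex-encoded.
    return hmac.new(secret, changeid.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_chain(secret: bytes, prev_seqnum: int, prev_changeid: str,
                 changes: List[dict]) -> bool:
    """Check that each change extends the chain from the previous one."""
    for change in changes:
        if change["seqnum"] != prev_seqnum + 1:
            return False
        expected_id = compute_changeid(
            change["seqnum"], prev_changeid, change["key"], change["payload"])
        if change["changeid"] != expected_id:
            return False
        expected_sig = sign_changeid(secret, expected_id)
        # Constant-time comparison for the secret-derived signature.
        if not hmac.compare_digest(expected_sig.encode("utf-8"),
                                   change["signature"].encode("utf-8")):
            return False
        prev_seqnum, prev_changeid = change["seqnum"], expected_id
    return True
```

Because each changeid covers the previous one, tampering with any earlier change invalidates every later link in the chain.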
If there are a large number of changes to be fetched then the server may choose to paginate the result, returning only some of the | |||
changes in the initial request. It will include the key "next" in the output to indicate that more changes are available: | |||
> | > GET <collection-url>/changes | ||
> Authorization: <hawk auth parameters> | |||
. | |||
< 200 OK | |||
< Content-Type: application/json | |||
< { | |||
< "next": 3, | |||
< "changes": [ | |||
< <change1>, | |||
< <change2> | |||
< ] | |||
< } | |||
Clients can request the next batch using the 'since' query parameter: | |||
> GET <collection-url>/changes?since=3 | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
< | < 200 OK | ||
< Content-Type: application/json | |||
< { | |||
< "changes": [ | |||
< <change3>, | |||
< <change4> | |||
< ] | |||
< } | |||
Changes are always batched in sequence number order. Clients are free to request changes starting at an arbitrary sequence number, | |||
which is useful for pulling in just the things that have changed since a previous sync. | |||
As | Clients may also choose to batch their requests by using the 'limit' query parameter. As with server-driven batching, the output key | ||
> | "next" will be used to indicate that more data is available: | ||
> GET <collection-url>/changes?since=2&limit=2 | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
. | . | ||
< | < 200 OK | ||
< Content-Type: application/json | |||
< { | |||
< "next": 4, | |||
< "changes": [ | |||
< <change2>, | |||
< <change3> | |||
< ] | |||
< } | |||
. | |||
. | |||
> GET <collection-url>/changes?since=4&limit=2 | |||
> Authorization: <hawk auth parameters> | |||
. | . | ||
< 200 OK | |||
< Content-Type: application/json | |||
< { | |||
< "changes": [ | |||
< <change4> | |||
< ] | |||
< } | |||
The server is not required to keep the full change history from seqnum zero, and may periodically compact and garbage-collect the | |||
stored data. If the client requests changes since a seqnum that is no longer known to the server, it will receive an error: | |||
> GET <collection-url>/changes?since=1 | |||
> Authorization: <hawk auth parameters> | |||
. | . | ||
< 416 Requested Range Not Satisfiable | |||
XXX TODO: seriously, is there a good error code for this, or should we just tunnel errors in the body? | |||
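One way a client might cope with compacted history is sketched below. The names are hypothetical: fetch_changes stands in for an authenticated GET on <collection-url>/changes?since=N and is assumed to raise HistoryCompacted when the server reports that the requested seqnum is no longer known. What the client does next is not specified by this proposal. Returning None so the caller can re-read the full record set is just one plausible choice.

```python
from typing import Callable, List, Optional


class HistoryCompacted(Exception):
    """Hypothetical error: the server no longer has changes back to 'since'."""


# Hypothetical transport hook: takes a 'since' seqnum, returns the JSON body.
FetchChanges = Callable[[int], dict]


def pull_changes(fetch_changes: FetchChanges, since: int) -> Optional[List[dict]]:
    """Collect all changes from 'since', following "next" across batches.

    Returns None if the server has compacted away the requested history.
    """
    changes: List[dict] = []
    cursor: Optional[int] = since
    while cursor is not None:
        try:
            body = fetch_changes(cursor)
        except HistoryCompacted:
            return None
        changes.extend(body["changes"])
        cursor = body.get("next")
    return changes
```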
=== POST <collection-url>/records === | |||
Request headers: If-Match, If-None-Match | |||
Response headers: ETag | |||
Update or delete records in the collection. The request body must contain an array of change objects with properly-formed sequence | |||
numbers and changeids, and it must be preconditioned with an If-Match or If-None-Match header: | |||
> POST <collection-url>/records | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
> If-Match: 125-HASH1 | |||
> { | > { | ||
> | > "changes": [ | ||
> | > {"key": "key1", "payload": "newpayload1", "seqnum": 126, "changeid": "NEWHASH1", "signature": "newsig1"}, | ||
> | > {"key": "key2", "payload": null, "seqnum": 127, "changeid": "NEWHASH2", "signature": "newsig2"} | ||
> } | > ] | ||
> } | |||
. | . | ||
< | < 204 No Content | ||
The server will apply each change in turn, checking that the seqnum and changeid hash chains are properly formed. If they are not | |||
then an error will be reported: | |||
> POST <collection-url>/records | |||
> Authorization: <hawk auth parameters> | |||
> If-Match: 120-OLD-HASH | |||
> { | |||
> "changes": [ | |||
> {"key": "key1", "payload": "newpayload1", "seqnum": 121, "changeid": "NEWHASH1", "signature": "newsig1"}, | |||
> {"key": "key2", "payload": null, "seqnum": 122, "changeid": "NEWHASH2", "signature": "newsig2"} | |||
> ] | |||
> } | |||
. | . | ||
< 412 Precondition Failed | |||
< ETag: 125-HASH1 | |||
No content is returned in response to a POST. The client has already calculated the new seqnum and changeid for the collection, so | |||
there is no more useful information that the server can provide. | |||
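The server-side handling described above might look roughly like the following. This is a sketch under several assumptions that the proposal does not settle: only If-Match is handled, the ETag has the form "<seqnum>-<changeid>", seqnums must increase by exactly one, the batch is validated as a whole before anything is applied, and a malformed chain is reported as 400. Changeid verification is omitted because the hash construction is not yet specified.

```python
from typing import List, Tuple


def handle_post(collection: dict, if_match: str, changes: List[dict]) -> Tuple[int, str]:
    """Apply a batch of changes to an in-memory collection.

    'collection' is a dict with "seqnum", "changeid" and "records" entries.
    Returns (HTTP status, current ETag).
    """
    current_etag = f'{collection["seqnum"]}-{collection["changeid"]}'
    if if_match != current_etag:
        return 412, current_etag  # Precondition Failed: client view is stale.

    # Validate the whole batch first so that a bad change leaves no partial state.
    expected = collection["seqnum"] + 1
    for change in changes:
        if change["seqnum"] != expected:
            return 400, current_etag  # ASSUMPTION: status code for a bad chain.
        expected += 1

    for change in changes:
        if change["payload"] is None:
            collection["records"].pop(change["key"], None)
        else:
            collection["records"][change["key"]] = change
        collection["seqnum"] = change["seqnum"]
        collection["changeid"] = change["changeid"]
    return 204, f'{collection["seqnum"]}-{collection["changeid"]}'
```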
=== POST <collection-url>/records/<key> === | |||
Update or delete a specific record in the collection. The request body must contain a change object with properly-formed sequence | |||
number and changeid, and it must be preconditioned with an If-Match or If-None-Match header: | |||
> POST <collection-url>/records/<key> | |||
> Authorization: <hawk auth parameters> | > Authorization: <hawk auth parameters> | ||
> If-Match: 125-HASH1 | |||
> { | > { | ||
> | > "payload": "newpayload1", | ||
> " | > "seqnum": 126, | ||
> | > "changeid": "NEWHASH1", | ||
> } | > "signature": "newsig1" | ||
> } | |||
. | . | ||
< | < 204 No Content | ||
The server will check that the seqnum and changeid hash chains are properly formed before applying the change. If they are not then | |||
an error will be reported: | |||
> POST <collection-url>/records/<key> | |||
> Authorization: <hawk auth parameters> | |||
> If-Match: 120-OLD-HASH | |||
> { | |||
> "payload": "newpayload1", | |||
> "seqnum": 121, | |||
> "changeid": "NEWHASH1", | |||
> "signature": "newsig1" | |||
> } | |||
. | |||
< 412 Precondition Failed | |||
< ETag: 125-HASH1 | |||
No content is returned in response to a POST. The client has already calculated the new seqnum and changeid for the collection, so | |||
there is no more useful information that the server can provide. | |||