Labs/Bespin/DesignDocs/TimeMachine: Difference between revisions

Revision as of 12:44, 10 September 2009

1,700 bytes added , 10 September 2009

→‎Data Storage

JoeWalker

@@ Line 2: / Line 2: @@
 Time Machine is designed to solve 2 basic problems in Bespin:
-- Allow users to get the text back that someone else deleted
+* Allow users to get the text back that someone else deleted
-- Allow viewing the evolution of some text over time
+* Allow viewing the evolution of some text over time
 = API =
@@ Line 34: / Line 34: @@
 * date: We will need to specify a format for the time.
 * owner: VCS specified username for VCS originated change. Bespin specified for other types of change. To begin with, we do not expect to link VCS usernames to bespin usernames. This linkage could be useful over time.
-* size: The size field comes from something like 'diff $file | wc -l' / 'cat $file | wc -l' (clearly that won't work exactly, but the point is the same). From my investigations so far, it would seem that it's going to be a significant performance drain getting this to work.
+* <strike>size: The size field comes from something like 'diff $file | wc -l' / 'cat $file | wc -l' (clearly that won't work exactly, but the point is the same). From my investigations so far, it would seem that it's going to be a significant performance drain getting this to work.</strike> ''We're not planning on doing a churn graph right now''
 * description: Simple in the VCS case. For the 'save' source we might take the users status on save. For the 'undo' source we might take an editor provided description of the action e.g. 'typing', 'format source'.
 === Issues ===
-* How do we handle VCS rename/copy operations? Is it valid to include changes made under an old filename and ignore the fact that the file may no longer be valid in its new position.
+* How do we handle VCS rename/copy operations? Is it valid to include changes made under an old filename and ignore the fact that the file may no longer be valid in its new position? (Answer: we lose the data when you move the file.)
-* Can we tie bespin usernames to VCS usernames. Tying them together will helps colorization to be consistent.
+* Can we tie bespin usernames to VCS usernames? Tying them together will help colorization to be consistent. (Answer: we don't for now, but should later.)
 * Can we create descriptions (e.g. 'typing', 'format source') out the back of mobwrite?
 * Can we create descriptions for save commands?
 * Is what comes out of mobwrite going to be too granular?
 * How are we going to store all this data?
-* Can we create a useful size parameter? Is this going to be viable for SVN and other non-D VCSs?
+* Can we create a useful size parameter? Is this going to be viable for SVN and other non-D VCSs? (Answer: We're not attempting this for any VCS for now)
 * Is there any benefit in transmitting the diff with the rest of the history? We are currently assuming that there isn't because the files are easily accessible, and only a diff might be slow to make use of if you need to work from the start/end. Also diffs prevent us from interleaving sources.
-* For none-D VCSs, what's the performance going to be like? Can we use git svn to mitigate the issues?
+* For non-D VCSs, what's the performance going to be like? Can we use git svn to mitigate the issues? (Answer: Probably not. For now, we're not supporting the VCS portion on SVN.)
 It is expected that initially, for simplicity's sake, we will not be interleaving sources, so the diff queue would begin with undo, then save, then vcs info. The system ought to allow source interleaving though.
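The difference between the initial grouped queue and an interleaved one can be sketched as follows. This is only an illustration: the `Entry` class, the `grouped` and `interleaved` helpers, the integer `date` field and the newest-first ordering are all assumptions, not part of the design above.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass
class Entry:
    """Hypothetical history entry; field names follow the API list above."""
    source: str       # 'undo', 'save' or 'vcs'
    date: int         # assumed here: seconds since the epoch
    owner: str
    description: str


def grouped(undo, save, vcs):
    """Initial behaviour: all undo entries, then saves, then VCS info."""
    return list(undo) + list(save) + list(vcs)


def interleaved(undo, save, vcs):
    """Possible later behaviour: one queue ordered by date, newest first.

    Assumes each input list is already sorted newest first.
    """
    return list(merge(undo, save, vcs, key=lambda e: e.date, reverse=True))
```

The grouped form needs no comparison between sources, which is why it is the simpler starting point; the interleaved form only requires that every source supplies a comparable date.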
@@ Line 62: / Line 62: @@
 * We probably need to easily distinguish the source of a version id without needing to go to vcs / save history and mobwrite in turn.
 * What is non-D VCS performance going to be like? (See above)
+= Data Storage =
+Currently, undo data is stored in a mini repository, repoistory.py. The storage format is defined in the pydoc for that package and is summarized below.
+The features of this repository are:
+* lightweight: i.e. easy to code in the first instance
+* upgradable: so the disk format can evolve to be more efficient
+* potentially performant: file reads could be O(n) on history length
+It is not, however:
+* distributed
+* non-linear. There is no DAG
+The on-disk format is a series of lines, as follows:
+ hex(time):owner:method:urlencode(comment):data
+For example:
+4aa51a00:joewalker:int:example command:data
+4aa61e8f:joewalker:int:another example:more data
+Where:
+* hex(time): an 8-character string (for the next few years) following the Python convention of using seconds since the epoch, e.g. 4aa61e8f
+* owner: is the bespin username of the change creator
+* method: is one of [int|ext|delta|zint|zext]. Currently the only supported value is 'int'; however, the following are planned:
+** int: The contents are stored in the data field (at the end of the line)
+** ext: The contents are written to a file whose name is in the data field
+** delta: The contents are the value of the previous record with the change (the delta) held in the data field applied
+** zint|zext: As int|ext except the contents are compressed with zlib
+* comment: A comment (where possible) for the change
+* data: As interpreted by the 'method' field
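As an illustration of the line format described above, here is a minimal sketch of reading and writing one record. It is not the code in the repository module: the `Record` class and the `parse_record` and `format_record` helpers are hypothetical names, percent-encoding is assumed for `urlencode(comment)`, and it is assumed that only the data field may contain a ':' character and that data contains no newlines.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote


@dataclass
class Record:
    """Hypothetical in-memory form of one line of the on-disk format."""
    time: int       # seconds since the epoch
    owner: str      # bespin username of the change creator
    method: str     # only 'int' is handled in this sketch
    comment: str    # decoded comment text
    data: str       # for 'int', the contents themselves


def parse_record(line):
    """Parse 'hex(time):owner:method:urlencode(comment):data'."""
    # Split at most 4 times so any ':' inside the data field is preserved.
    time_hex, owner, method, comment, data = line.rstrip("\n").split(":", 4)
    if method != "int":
        raise ValueError("unsupported method: " + method)
    return Record(int(time_hex, 16), owner, method, unquote(comment), data)


def format_record(rec):
    """Serialize a Record back to a single line (without the newline)."""
    # safe="" makes quote() escape ':' and '/' too, so the comment
    # can never be mistaken for a field separator.
    fields = [format(rec.time, "x"), rec.owner, rec.method,
              quote(rec.comment, safe=""), rec.data]
    return ":".join(fields)
```

Limiting the split to the first four separators keeps the data field free-form, which matters once 'int' contents include colons of their own.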
 = User Interface =