Raindrop/Strawman

From MozillaWiki
Revision as of 17:03, 8 June 2010 by DavidA

Whereas

the CouchDB version of Raindrop scales very poorly, for a variety of reasons which we understand more or less well,

and

even if we could get an order-of-magnitude increase in the performance of CouchDB in our architecture, it would still be far too expensive to run,

and

our data model abstractions currently leak too much to the HTTP APIs

and

our "spread out" data model makes it too hard for newcomers to understand and work with Raindrop

It is resolved that

We will do a Raindrop reset.

This reset prioritizes:

a) APIs that make it easier to build front-ends
b) an architecture that takes hosted scaling into consideration
c) minimal deltas, so that we avoid regressing the accomplishments made so far


After talking to a bunch of people, I'm proposing the following strawman:

1) We stop using CouchDB as a queue, and use an actual queue instead. Specifically, we use a message queue (rabbit-mq or apache-mq). This would enable:

  • understanding the performance cost of message fetching, and allocating fetch work to specific processing units (processes, nodes, etc.)
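As a rough illustration of the work-queue shape described above, here is a minimal sketch in Python. It uses the standard library's in-process `queue.Queue` purely as a stand-in for a real broker, and `fetch_messages`, `run_fetch_jobs` and the account names are hypothetical. The point is only that each fetch job goes to one named worker and its cost is measured per job.

```python
import queue
import threading
import time


def fetch_messages(account):
    """Hypothetical fetcher; a real one would talk to IMAP, Twitter, etc."""
    time.sleep(0.01)  # simulate network latency
    return ["%s-msg-%d" % (account, i) for i in range(3)]


def run_fetch_jobs(accounts, n_workers=2):
    """Distribute fetch jobs over workers and record the cost of each job."""
    jobs = queue.Queue()  # stand-in for a broker queue, not a real broker
    timings = []          # (worker_name, account, seconds) tuples
    lock = threading.Lock()

    def worker(name):
        while True:
            account = jobs.get()
            if account is None:  # sentinel: no more work for this worker
                break
            start = time.perf_counter()
            fetch_messages(account)
            elapsed = time.perf_counter() - start
            with lock:
                timings.append((name, account, elapsed))

    threads = [
        threading.Thread(target=worker, args=("worker-%d" % i,))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for account in accounts:
        jobs.put(account)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return timings


for name, account, seconds in run_fetch_jobs(["imap:alice", "twitter:alice"]):
    print("%s fetched %s in %.3fs" % (name, account, seconds))
```

With a real broker the workers would be separate processes or nodes consuming from a shared queue, but the per-job timing and the explicit assignment of jobs to workers would look much the same.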

2) We define a clear HTTP API for use by inflow & other front-ends.

3) We use a fast key-value store to keep track of which remote messages we already have (by IMAP UIDs, Twitter timestamps, whatever)
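A minimal sketch of that bookkeeping, assuming the key is simply the account plus whatever remote identifier the source provides. Python's standard `dbm` module stands in for the fast key-value store here; the `SeenStore` class and the account naming scheme are hypothetical.

```python
import dbm
import os
import tempfile


class SeenStore:
    """Tracks which remote messages have already been fetched."""

    def __init__(self, path):
        self._db = dbm.open(path, "c")  # create the file if it is missing

    @staticmethod
    def _key(account, remote_id):
        # Assumed key layout: account and remote id joined by a tab.
        return ("%s\t%s" % (account, remote_id)).encode("utf-8")

    def mark_seen(self, account, remote_id):
        self._db[self._key(account, remote_id)] = b"1"

    def is_seen(self, account, remote_id):
        return self._key(account, remote_id) in self._db

    def unseen(self, account, remote_ids):
        """Return only the ids that still need to be fetched."""
        return [r for r in remote_ids if not self.is_seen(account, r)]

    def close(self):
        self._db.close()


with tempfile.TemporaryDirectory() as tmp:
    store = SeenStore(os.path.join(tmp, "seen"))
    store.mark_seen("imap:alice", "101")
    print(store.unseen("imap:alice", ["101", "102", "103"]))
    store.close()
```

The fetcher would call `unseen` with the ids listed by the remote server, fetch only those, and call `mark_seen` once each message is safely stored.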

4) We use a blob storage to keep raw messages and JSON-normalized messages.
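One way this could look, as a sketch only: both blobs are stored under a key derived from the raw bytes, so the raw message and its JSON-normalized form can be found from a single id. The local filesystem stands in for the blob storage, and `BlobStore`, `normalize`, `store_message` and the JSON field names are all assumptions made for illustration.

```python
import hashlib
import json
import os
import tempfile
from email import message_from_bytes


class BlobStore:
    """Named blobs on the local filesystem (stand-in for a real blob service)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, name, data):
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def get(self, name):
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()


def normalize(raw):
    """Reduce a raw RFC 822 message to a small JSON-friendly dict."""
    msg = message_from_bytes(raw)
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "body": body,
    }


def store_message(blobs, raw):
    """Store the raw message and its normalized form under one key."""
    key = hashlib.sha256(raw).hexdigest()
    blobs.put(key + ".raw", raw)
    doc = json.dumps(normalize(raw), sort_keys=True).encode("utf-8")
    blobs.put(key + ".json", doc)
    return key


with tempfile.TemporaryDirectory() as tmp:
    blobs = BlobStore(tmp)
    raw = b"From: a@example.com\nTo: b@example.com\nSubject: Hi\n\nHello\n"
    key = store_message(blobs, raw)
    print(json.loads(blobs.get(key + ".json").decode("utf-8"))["subject"])
```

Keeping the raw blob means the normalized form can be regenerated later if the normalization rules change.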

5) We evaluate merging bits of gloda and porting some of the existing Raindrop extensions to create a pipeline. This bit of code would likely be the most "custom" bit of the Raindrop backend (along w/ the API handling), with everything else quite "stock".

  • The primary reasons for using Gloda here are:
    • it's a message-aware ORM that knows about conversations, mailing lists, contacts vs. identities, etc.
    • it's got known worst-case performance characteristics which are reasonable (about 15 msgs/sec including full-text-search)
  • Note: I'm not implying a commitment to using Gloda in the long term for Raindrop, but just as a way to get us to the next stage in the architecture.
  • There are interesting complications due to using Gloda, including at least:
    • need to interface w/ the message queue
    • need to run on xulrunner/xpcshell+event loop
    • currently tied to sqlite
  • Note: It's likely that long term we'd move from sqlite to a more scalable DB.
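To make the "pipeline" idea concrete, here is a small Python sketch of extensions applied in order to a normalized message. It does not use Gloda or its API. The extension names, the document fields, and the naive subject-based conversation key are all placeholders chosen for illustration.

```python
import re


def extract_participants(doc):
    """Hypothetical extension: collect everyone named on the message."""
    people = {doc.get("from", "")} | set(doc.get("to", []))
    doc["participants"] = sorted(people - {""})
    return doc


def assign_conversation(doc):
    """Hypothetical extension: group by subject with reply prefixes removed."""
    subject = re.sub(
        r"^\s*((re|fwd?)\s*:\s*)+", "", doc.get("subject", ""),
        flags=re.IGNORECASE,
    )
    doc["conversation_key"] = subject.strip().lower()
    return doc


PIPELINE = [extract_participants, assign_conversation]


def run_pipeline(doc, extensions=PIPELINE):
    """Pass a copy of the document through each extension in turn."""
    for extension in extensions:
        doc = extension(dict(doc))
    return doc


message = {
    "from": "a@example.com",
    "to": ["b@example.com"],
    "subject": "Re: RE: Lunch?",
}
print(run_pipeline(message))
```

In the proposal as written, the message-aware parts (conversations, contacts vs. identities) would come from Gloda rather than from ad hoc extensions like these. The sketch only shows the shape of the extension chain.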