Raindrop/Strawman

From MozillaWiki
< Raindrop
Revision as of 17:54, 8 June 2010 by DavidA (talk | contribs)
Jump to navigation Jump to search

Whereas

  • the couchdb version of raindrop scales very poorly, for a variety of reasons which we understand more or less well.

and

  • even if we could get an order of magnitude increase in the performance of couchdb in our architecture, it'd still be very unaffordable.

and

  • our data model abstractions currently leak too much to the HTTP APIs

and

  • our "spread out" data model makes it too hard for newcomers to understand and work with raindrop

It is resolved that

  • We will do a Raindrop reset.

This reset prioritizes:

  • APIs that make it easier to do front-ends
  • an architecture that takes hosted scaling into consideration
  • minimal delta of work that make it possible to get the existing front-ends working w/ a back-end that's a) faster, b) closer to where we think we'll need to go.

After talking to a bunch of people, I'm proposing the follow strawman proposal:

1) We stop using couchdb as a queue, and use a queue instead. Specifically, we use a message queue (rabbit-mq or apache-mq). This would enable:

  • understanding the performance cost of message fetching, and allocating those to specific processing units (processes, nodes, etc.)

2) We define a clear HTTP API for use by inflow & other front-ends.

3) We use a fast key-value storage to keep track of which remote messages we already have (by IMAP UIDs, twitter timestamps, whatever)

4) We use a blob storage to keep raw messages and JSON-normalized messages.

5) We evaluate merging bits of gloda and porting some of the existing Raindrop extensions to create a pipeline. This bit of code would likely be the most "custom" bit of the Raindrop backend (along w/ the API handling), with everything else quite "stock".

  • The primary reason for using Gloda here are:
    • it's a message-aware ORM that knows about conversations, mailing lists, contacts vs. identities, etc.
    • it's got known worst-case performance characteristics which are reasonable (about 15 msgs/sec including full-text-search)
  • Note: I'm not implying a commitment to using Gloda in the long term for Raindrop, but just as a way to get us to the next stage in the architecture.
  • There are interesting complications due to using Gloda, including at least:
    • need to interface w/ the message queue
    • need to run on xulrunner/xpcshell+event loop
    • currently tied to sqlite
  • Note: It's likely that long term we'd move from sqlite to a more scalable DB.