Raindrop/Strawman

Whereas

the couchdb version of raindrop scales very poorly, for a variety of reasons which we understand more or less well.

and

even if we could get an order of magnitude increase in the performance of couchdb in our architecture, it'd still be very unaffordable.

and

our data model abstractions currently leak too much to the HTTP APIs

and

our "spread out" data model makes it too hard for newcomers to understand and work with raindrop

It is resolved that

We will do a Raindrop reset.

This reset prioritizes:

- APIs that make it easier to do front-ends
- an architecture that takes hosted scaling into consideration
- a minimal delta of work that makes it possible to get the existing front-ends working w/ a back-end that's a) faster, b) closer to where we think we'll need to go.

After talking to a bunch of people, I'm proposing the following strawman:

1) We stop using couchdb as a queue, and use a real message queue instead (rabbit-mq or apache-mq). This would enable:

understanding the performance cost of message fetching, and allocating that work to specific processing units (processes, nodes, etc.)

2) We define a clear HTTP API for use by inflow & other front-ends.

3) We use a fast key-value storage to keep track of which remote messages we already have (by IMAP UIDs, twitter timestamps, whatever)

4) We use a blob storage to keep raw messages and JSON-normalized messages.

5) We evaluate merging bits of gloda and porting some of the existing Raindrop extensions to create a pipeline. This bit of code would likely be the most "custom" bit of the Raindrop backend (along w/ the API handling), with everything else quite "stock".

The primary reasons for using Gloda here are:
- it's a message-aware ORM that knows about conversations, mailing lists, contacts vs. identities, etc.
- it's got known worst-case performance characteristics which are reasonable (about 15 msgs/sec including full-text-search)

Note: I'm not implying a commitment to using Gloda in the long term for Raindrop, but just as a way to get us to the next stage in the architecture.

There are interesting complications due to using Gloda, including at least:
- need to interface w/ the message queue
- need to run on xulrunner/xpcshell+event loop
- currently tied to sqlite

Note: It's likely that long term we'd move from sqlite to a more scalable DB.
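
To make the flow in points 1, 3 and 4 a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed implementation: the in-process queue and the two dicts are stand-ins for a real message queue, key-value store and blob store, and the names (fetch_queue, seen, blobs, put_blob, normalize, worker, run) as well as the remote id format are assumptions made up for this example.

```python
import hashlib
import json
import queue
import threading

# Hypothetical in-process stand-ins for the real components.
fetch_queue = queue.Queue()  # point 1: queue of fetched messages
seen = {}                    # point 3: remote id -> blob keys we already hold
blobs = {}                   # point 4: blob key -> stored bytes


def put_blob(data: bytes) -> str:
    """Store bytes under their SHA-256 hex digest and return that key."""
    key = hashlib.sha256(data).hexdigest()
    blobs[key] = data
    return key


def normalize(raw: bytes) -> dict:
    """Toy normalizer: split 'Name: value' header lines from the body."""
    head, _, body = raw.decode("utf-8").partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # ignore lines that are not 'Name: value'
            headers[name] = value
    return {"headers": headers, "body": body}


def worker() -> None:
    """Consume queue items until a None sentinel arrives."""
    while True:
        item = fetch_queue.get()
        if item is None:
            break
        remote_id, raw = item
        if remote_id in seen:  # we already have this remote message
            continue
        normalized = json.dumps(normalize(raw), sort_keys=True)
        seen[remote_id] = {
            "raw": put_blob(raw),
            "json": put_blob(normalized.encode("utf-8")),
        }


def run(messages) -> None:
    """Start one worker, enqueue the messages, then wait for it to finish."""
    thread = threading.Thread(target=worker)
    thread.start()
    for message in messages:
        fetch_queue.put(message)
    fetch_queue.put(None)  # sentinel: tells the worker to stop
    thread.join()


if __name__ == "__main__":
    run([
        ("imap:INBOX:101", b"Subject: hi\n\nhello"),
        ("twitter:12345", b"From: bob\n\nlunch?"),
        ("imap:INBOX:101", b"Subject: hi\n\nhello"),  # duplicate, skipped
    ])
    print(len(seen), "messages stored in", len(blobs), "blobs")
```

The point of the split is that each piece can be measured and scaled separately: more workers can be attached to the queue, while the lookup in seen stays a cheap key check that never touches the blob storage.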
