Raindrop/Strawman
Revision as of 18:03, 8 June 2010
Whereas
- the couchdb version of raindrop scales very poorly, for a variety of reasons that we understand to varying degrees.
and
- even if we could get an order of magnitude increase in the performance of couchdb in our architecture, it'd still be very unaffordable.
and
- our data model abstractions currently leak too much to the HTTP APIs
and
- our "spread out" data model makes it too hard for newcomers to understand and work with raindrop
It is resolved that
- We will do a Raindrop reset.
This reset prioritizes:
- APIs that make it easier to do front-ends
- an architecture that takes hosted scaling into consideration
- a minimal delta of work that makes it possible to get the existing front-ends working w/ a back-end that's a) faster, b) closer to where we think we'll need to go.
After talking to a bunch of people, I'm proposing the following strawman:
1) We stop using couchdb as a queue and use a proper message queue instead (RabbitMQ or Apache ActiveMQ). This would enable:
- understanding the performance cost of message fetching, and allocating those costs to specific processing units (processes, nodes, etc.)
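To make the fetcher/processor split concrete, here is a minimal sketch of the publish/consume boundary. A stdlib in-process queue stands in for the broker (RabbitMQ or ActiveMQ would play that role in the real deployment); the function names and message shape are illustrative assumptions, not a committed design.

```python
import json
import queue

# In-process stand-in for a broker queue; in production this would be a
# channel on the message broker, not a Python queue.
fetch_queue = queue.Queue()

def publish_raw_message(raw_message):
    """Fetcher side: serialize and enqueue one raw message."""
    fetch_queue.put(json.dumps(raw_message))

def consume_raw_message():
    """Processing-unit side: dequeue and deserialize one message."""
    payload = fetch_queue.get()
    fetch_queue.task_done()
    return json.loads(payload)

# A fetcher publishes; an independent processing unit consumes.
publish_raw_message({"source": "imap", "uid": 1234, "subject": "hello"})
msg = consume_raw_message()
```

The point of the boundary is that fetching and processing become separately measurable and separately scalable: each side is a unit you can pin to its own process or node.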
2) We define a clear HTTP API for use by inflow & other front-ends.
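As a sketch of what "a clear HTTP API" could look like, here is a hypothetical routing table for inflow-style front-ends. Every endpoint name here is an assumption for illustration only; the real API surface would be designed separately.

```python
# Hypothetical endpoint list -- illustrative, not a committed design.
API_ROUTES = {
    ("GET", "/conversations"): "list recent conversations",
    ("GET", "/conversations/{id}"): "fetch one conversation with its messages",
    ("GET", "/contacts"): "list contacts and their identities",
    ("POST", "/messages/{id}/seen"): "mark a message as seen",
}

def describe(method, path):
    """Return the documented behavior for an endpoint, if it exists."""
    return API_ROUTES.get((method, path), "404: no such endpoint")
```

The key property is that front-ends program against these routes alone, so the data model behind them can change without leaking through (addressing the "abstractions leak to the HTTP APIs" complaint above).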
3) We use a fast key-value storage to keep track of which remote messages we already have (by IMAP UIDs, twitter timestamps, whatever)
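The "which remote messages do we already have" check might look like the following, keyed the way the text suggests (account, folder, IMAP UID -- or a twitter timestamp in that slot). A plain dict stands in for the fast key-value store; the helper names are made up for illustration.

```python
# Dict stands in for the fast key-value store used in production.
seen = {}

def already_have(account, folder, remote_id):
    """True if we have already fetched this remote message."""
    return (account, folder, remote_id) in seen

def record_fetched(account, folder, remote_id):
    """Mark a remote message as fetched so we never re-fetch it."""
    seen[(account, folder, remote_id)] = True

# A fetcher consults the store before pulling a message down.
if not already_have("bob@example.com", "INBOX", 1234):
    record_fetched("bob@example.com", "INBOX", 1234)
```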
4) We use a blob storage to keep raw messages and JSON-normalized messages.
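One natural shape for that blob store is content-addressed storage: both the raw message and its JSON-normalized form are written under a hash of their bytes. The in-memory dict and content-addressing scheme below are assumptions for the sketch; any real blob service could sit behind the same two functions.

```python
import hashlib
import json

# In-memory stand-in for the blob store.
_blobs = {}

def put_blob(data: bytes) -> str:
    """Store bytes under their SHA-256 digest and return the key."""
    key = hashlib.sha256(data).hexdigest()
    _blobs[key] = data
    return key

def get_blob(key: str) -> bytes:
    """Fetch previously stored bytes by key."""
    return _blobs[key]

# Raw rfc822 bytes and the normalized JSON are separate blobs.
raw_key = put_blob(b"Subject: hello\r\n\r\nraw rfc822 bytes")
norm_key = put_blob(json.dumps({"subject": "hello"}).encode())
```

Content addressing gives deduplication for free and makes blobs immutable, which simplifies replication in a hosted deployment.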
5) We evaluate merging bits of gloda and porting some of the existing Raindrop extensions to create a pipeline. This would likely be the most "custom" part of the Raindrop backend (along w/ the API handling), with everything else quite "stock".
- The primary reasons for using Gloda here are:
- it's a message-aware ORM that knows about conversations, mailing lists, contacts vs. identities, etc.
- it's got known worst-case performance characteristics which are reasonable (about 15 msgs/sec including full-text-search)
- Note: I'm not implying a commitment to using Gloda in the long term for Raindrop, but just as a way to get us to the next stage in the architecture.
- There are interesting complications due to using Gloda, including at least:
- need to interface w/ the message queue
- need to run on xulrunner/xpcshell+event loop
- currently tied to sqlite
- Note: It's likely that long term we'd move from sqlite to a more scalable DB.
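The pipeline in item 5 can be sketched as a chain of small extension steps that each enrich a normalized document built from the raw message. The step names here are hypothetical; in the real system Gloda would supply the heavy lifting (conversation threading, contact vs. identity resolution, full-text indexing).

```python
# Each step takes (raw_message, doc) and returns the enriched doc.
def extract_subject(raw, doc):
    """Copy the Subject header into the normalized document."""
    doc["subject"] = raw.get("headers", {}).get("Subject", "")
    return doc

def resolve_identity(raw, doc):
    """Record the sender; a real step would map this to a known contact."""
    doc["from_identity"] = raw.get("headers", {}).get("From", "unknown")
    return doc

PIPELINE = [extract_subject, resolve_identity]

def process(raw_message):
    """Run a raw message through every pipeline step in order."""
    doc = {}
    for step in PIPELINE:
        doc = step(raw_message, doc)
    return doc

normalized = process({"headers": {"Subject": "lunch?",
                                  "From": "bob@example.com"}})
```

Porting an existing Raindrop extension then amounts to rewriting it as one such step, which keeps the "custom" surface of the backend small and testable.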