Raindrop/Strawman

* APIs that make it easier to do front-ends
* an architecture that takes hosted scaling into consideration
* use existing battle-tested technologies when possible


-----
After talking to a bunch of people, I'm proposing the following strawman:


1) We stop using couchdb as a queue, and use an actual message queue instead (rabbit-mq gets consensus).  This would enable:


* understanding the performance cost of message fetching, and allocating those to specific processing units (processes, nodes, etc.)
* better horizontal scaling
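To make the split concrete, here is a minimal sketch of the producer/consumer shape the queue gives us. It uses Python's in-process queue.Queue as a stand-in for rabbit-mq so it runs without a broker; the function names and message fields are purely illustrative.

```python
import json
import queue

# Stand-in for a message-queue channel: fetchers enqueue raw messages,
# worker processes pull them off and absorb the processing cost.
incoming = queue.Queue()

def fetcher_publish(raw_message):
    """The fetcher only does cheap I/O: grab the message, enqueue it."""
    incoming.put(json.dumps(raw_message))

def worker_consume():
    """A worker owns the processing cost; adding workers scales horizontally."""
    processed = []
    while not incoming.empty():
        msg = json.loads(incoming.get())
        msg["processed"] = True  # placeholder for the real pipeline
        processed.append(msg)
    return processed

fetcher_publish({"uid": 1, "body": "hello"})
fetcher_publish({"uid": 2, "body": "world"})
results = worker_consume()
```

Because the fetcher and worker only share the queue, the cost of message fetching can be measured and allocated per processing unit.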


2) We define a clear HTTP API for use by inflow & other front-ends.
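As a sketch of what "a clear HTTP API" might look like from the front-end's side, here is a tiny dispatcher mapping (method, path) pairs to JSON responses. The endpoint paths and payload shapes are hypothetical, not a committed design.

```python
import json

def handle_request(method, path):
    """Dispatch a (method, path) pair to a (status, json_body) response.
    Routes here are illustrative placeholders for the real API."""
    if method == "GET" and path == "/api/conversations":
        return 200, json.dumps({"conversations": []})
    if method == "GET" and path.startswith("/api/message/"):
        msg_id = path.rsplit("/", 1)[-1]
        return 200, json.dumps({"id": msg_id, "body": None})
    return 404, json.dumps({"error": "unknown endpoint"})

status, body = handle_request("GET", "/api/conversations")
```

The point is that inflow and other front-ends talk only to this surface, never to the storage layers directly.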


3) We use a blob storage to keep raw messages and JSON-normalized messages.  MogileFS is a candidate for the hosted version, but we'd probably want a trivial python equivalent for localhost dev.
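The "trivial python equivalent for localhost dev" could be as small as this: a content-addressed blob store on the local filesystem, standing in for MogileFS. The class name and hashing choice are assumptions for illustration.

```python
import hashlib
import os
import tempfile

class LocalBlobStore:
    """Trivial localhost stand-in for a hosted blob store like MogileFS:
    blobs are stored on disk, keyed by the SHA-1 of their content."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)
        return key

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()

store = LocalBlobStore(tempfile.mkdtemp())
key = store.put(b"raw rfc2822 message bytes")
```

Both the raw message and its JSON-normalized form would go in as separate blobs; the rest of the system only ever passes keys around.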


4) We optimize the pipeline to do all processing of messages in memory, only writing the final processed objects to disk at the end of processing a message, to save massively on DB use.
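A sketch of that in-memory pipeline shape: each step extends a plain dict, and nothing touches storage until the single final write. The step names and fields are invented for illustration.

```python
import json

def normalize(msg):
    # e.g. clean up the raw subject line
    msg["subject"] = msg["raw_subject"].strip()
    return msg

def tag_mailing_list(msg):
    # e.g. flag list mail based on headers
    msg["is_list_mail"] = "List-Id" in msg.get("headers", {})
    return msg

PIPELINE = [normalize, tag_mailing_list]
WRITTEN = []  # stand-in for the single final write to disk

def process(raw):
    msg = dict(raw)
    for step in PIPELINE:
        msg = step(msg)  # all intermediate state stays in memory
    WRITTEN.append(json.dumps(msg))  # exactly one write per message
    return msg

result = process({"raw_subject": "  Hello  ", "headers": {"List-Id": "dev"}})
```

Contrast with the current couchdb scheme, where each processing step can mean another round-trip to the DB.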


5) We use a mature ORM (specifically SQLAlchemy) as the Raindrop equivalent of Gloda.
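To give a feel for what the SQLAlchemy layer might look like, here is a sketch of message-aware models (messages grouped into conversations, as Gloda does). Table names, columns, and the sample data are assumptions, not a schema proposal; the sqlite URL is the localhost-dev choice, and swapping it is how you'd target a more scalable DB.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Conversation(Base):
    __tablename__ = "conversations"
    id = Column(Integer, primary_key=True)
    subject = Column(String)
    messages = relationship("Message", back_populates="conversation")

class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    conversation_id = Column(Integer, ForeignKey("conversations.id"))
    sender = Column(String)  # an identity, which would map to a contact
    body = Column(String)
    conversation = relationship("Conversation", back_populates="messages")

# in-memory sqlite for localhost dev; a different URL targets a bigger DB
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

conv = Conversation(subject="strawman")
conv.messages.append(Message(sender="someone@example.com", body="hi"))
session.add(conv)
session.commit()
```

The win over hand-rolled storage code is that conversation/message relationships, queries, and the eventual DB swap all come from a battle-tested library rather than custom code.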


[[File:Raindrop_reset_large.png|200px|thumb|left]]