Raindrop/Strawman

* APIs that make it easier to do front-ends
* an architecture that takes hosted scaling into consideration
* use existing battle-tested technologies when possible


-----
After talking to a bunch of people, I'm proposing the following strawman:


1) We stop using couchdb as a queue, and use an actual message queue instead (rabbit-mq gets consensus).  This would enable:


* understanding the performance cost of message fetching, and allocating those to specific processing units (processes, nodes, etc.)
* better horizontal scaling
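To make the split concrete, here is a minimal sketch of the producer/consumer shape the queue gives us. It uses Python's in-process queue.Queue as a stand-in for rabbit-mq so it runs without a broker; the function names and message fields are purely illustrative.

```python
import json
import queue

# Stand-in for a message-queue channel: fetchers enqueue raw messages,
# worker processes pull them off and absorb the processing cost.
incoming = queue.Queue()

def fetcher_publish(raw_message):
    """The fetcher only does cheap I/O: grab the message, enqueue it."""
    incoming.put(json.dumps(raw_message))

def worker_consume():
    """A worker owns the processing cost; adding workers scales horizontally."""
    processed = []
    while not incoming.empty():
        msg = json.loads(incoming.get())
        msg["processed"] = True  # placeholder for the real pipeline
        processed.append(msg)
    return processed

fetcher_publish({"uid": 1, "body": "hello"})
fetcher_publish({"uid": 2, "body": "world"})
results = worker_consume()
```

Because the fetcher and worker only share the queue, the cost of message fetching can be measured and allocated per processing unit.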


2) We define a clear HTTP API for use by inflow & other front-ends.
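As a sketch of what "a clear HTTP API" might look like from the front-end's side, here is a tiny dispatcher mapping (method, path) pairs to JSON responses. The endpoint paths and payload shapes are hypothetical, not a committed design.

```python
import json

def handle_request(method, path):
    """Dispatch a (method, path) pair to a (status, json_body) response.
    Routes here are illustrative placeholders for the real API."""
    if method == "GET" and path == "/api/conversations":
        return 200, json.dumps({"conversations": []})
    if method == "GET" and path.startswith("/api/message/"):
        msg_id = path.rsplit("/", 1)[-1]
        return 200, json.dumps({"id": msg_id, "body": None})
    return 404, json.dumps({"error": "unknown endpoint"})

status, body = handle_request("GET", "/api/conversations")
```

The point is that inflow and other front-ends talk only to this surface, never to the storage layers directly.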


3) We use a blob storage to keep raw messages and JSON-normalized messages.  MogileFS is a candidate for the hosted version, but we'd probably want a trivial python equivalent for localhost dev.
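The "trivial python equivalent for localhost dev" could be as small as this: a content-addressed blob store on the local filesystem, standing in for MogileFS. The class name and hashing choice are assumptions for illustration.

```python
import hashlib
import os
import tempfile

class LocalBlobStore:
    """Trivial localhost stand-in for a hosted blob store like MogileFS:
    blobs are stored on disk, keyed by the SHA-1 of their content."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)
        return key

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()

store = LocalBlobStore(tempfile.mkdtemp())
key = store.put(b"raw rfc2822 message bytes")
```

Both the raw message and its JSON-normalized form would go in as separate blobs; the rest of the system only ever passes keys around.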


4) We optimize the pipeline to do all processing of messages in memory, only writing the final processed objects to disk at the end of processing a message, to save massively on DB use.
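A sketch of that in-memory pipeline shape: each step extends a plain dict, and nothing touches storage until the single final write. The step names and fields are invented for illustration.

```python
import json

def normalize(msg):
    # e.g. clean up the raw subject line
    msg["subject"] = msg["raw_subject"].strip()
    return msg

def tag_mailing_list(msg):
    # e.g. flag list mail based on headers
    msg["is_list_mail"] = "List-Id" in msg.get("headers", {})
    return msg

PIPELINE = [normalize, tag_mailing_list]
WRITTEN = []  # stand-in for the single final write to disk

def process(raw):
    msg = dict(raw)
    for step in PIPELINE:
        msg = step(msg)  # all intermediate state stays in memory
    WRITTEN.append(json.dumps(msg))  # exactly one write per message
    return msg

result = process({"raw_subject": "  Hello  ", "headers": {"List-Id": "dev"}})
```

Contrast with the current couchdb scheme, where each processing step can mean another round-trip to the DB.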


5) We use a mature ORM (specifically SQLAlchemy) as the Raindrop equivalent of Gloda.
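To give a feel for what the SQLAlchemy layer might look like, here is a sketch of message-aware models (messages grouped into conversations, as Gloda does). Table names, columns, and the sample data are assumptions, not a schema proposal; the sqlite URL is the localhost-dev choice, and swapping it is how you'd target a more scalable DB.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Conversation(Base):
    __tablename__ = "conversations"
    id = Column(Integer, primary_key=True)
    subject = Column(String)
    messages = relationship("Message", back_populates="conversation")

class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    conversation_id = Column(Integer, ForeignKey("conversations.id"))
    sender = Column(String)  # an identity, which would map to a contact
    body = Column(String)
    conversation = relationship("Conversation", back_populates="messages")

# in-memory sqlite for localhost dev; a different URL targets a bigger DB
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

conv = Conversation(subject="strawman")
conv.messages.append(Message(sender="someone@example.com", body="hi"))
session.add(conv)
session.commit()
```

The win over hand-rolled storage code is that conversation/message relationships, queries, and the eventual DB swap all come from a battle-tested library rather than custom code.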


[[File:Raindrop_reset_large.png|200px|thumb|left]]