Whereas
* the CouchDB version of Raindrop scales very poorly, for a variety of reasons which we understand more or less well.
and
* even if we could get an order-of-magnitude increase in the performance of CouchDB in our architecture, it would still be unaffordable.
and
* our data model abstractions currently leak too much into the HTTP APIs
and
* our "spread out" data model makes it too hard for newcomers to understand and work with Raindrop
It is resolved that
* We will do a Raindrop reset.
This reset prioritizes:
* APIs that make it easier to build front-ends
* an architecture that takes hosted scaling into consideration
* using existing battle-tested technologies when possible
-----
After talking to a bunch of people, I'm proposing the following strawman:
1) We stop using CouchDB as a queue, and use a queue instead. Specifically, we use a message queue (RabbitMQ gets consensus). This would enable:
* understanding the performance cost of message fetching, and allocating that work to specific processing units (processes, nodes, etc.)
* better horizontal scaling
2) We define a clear HTTP API for use by inflow & other front-ends.
3) We use a blob store to keep raw messages and JSON-normalized messages. MogileFS is a candidate for the hosted version, but we'd probably want a trivial Python equivalent for localhost dev.
4) We optimize the pipeline to do all processing of a message in memory, only writing the final processed objects to disk at the end of processing that message, to save massively on DB use.
5) We use a mature ORM (specifically SQLAlchemy) as the Raindrop equivalent of Gloda.
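To make point 3 a bit more concrete, here is a rough sketch of what the "trivial Python equivalent" for localhost dev might look like. It assumes a content-addressed layout on local disk; the class name <code>LocalBlobStore</code>, its <code>put</code>/<code>get</code> methods and the <code>store_message</code> helper are all made up for illustration and are not an existing Raindrop or MogileFS API.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class LocalBlobStore:
    """Hypothetical localhost stand-in for a hosted blob store such as MogileFS."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Shard by the first two hex characters to keep directories small.
        return self.root / key[:2] / key

    def put(self, data: bytes) -> str:
        """Store bytes and return their SHA-256 hex digest as the key."""
        key = hashlib.sha256(data).hexdigest()
        path = self._path(key)
        if not path.exists():  # identical content is stored only once
            path.parent.mkdir(exist_ok=True)
            path.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        """Return the bytes previously stored under this key."""
        return self._path(key).read_bytes()


def store_message(store, raw: bytes, normalized: dict):
    """Keep both the raw message and its JSON-normalized form; return both keys."""
    raw_key = store.put(raw)
    json_key = store.put(json.dumps(normalized, sort_keys=True).encode("utf-8"))
    return raw_key, json_key


if __name__ == "__main__":
    store = LocalBlobStore(tempfile.mkdtemp())
    keys = store_message(store, b"Subject: hi\r\n\r\nhello", {"subject": "hi", "body": "hello"})
    print(keys)
```

Keying blobs by content hash is just one possible choice; it makes re-fetching the same message idempotent, but a hosted store could equally well use its own key scheme.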
[[File:Raindrop_reset_large.png|200px|thumb|left]]
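Points 1 and 4 of the strawman can be sketched together: a worker pulls messages off a queue, runs every processing step in memory, and touches the database once per message. This is only an illustration under loud assumptions: the standard library's <code>queue.Queue</code> stands in for RabbitMQ, <code>sqlite3</code> stands in for whatever database sits behind the ORM, and the step functions <code>parse_headers</code> and <code>summarize</code> are hypothetical, not actual Raindrop extensions.

```python
import json
import queue
import sqlite3


def parse_headers(doc):
    # Hypothetical step: split the raw text into headers and body.
    head, _, body = doc["raw"].partition("\n\n")
    headers = dict(line.split(": ", 1) for line in head.splitlines() if ": " in line)
    return {**doc, "headers": headers, "body": body}


def summarize(doc):
    # Hypothetical step: derive the fields a front-end would display.
    return {**doc, "subject": doc["headers"].get("Subject", ""), "preview": doc["body"][:40]}


PIPELINE = [parse_headers, summarize]


def process_all(inbox, db):
    """Drain the queue; each message costs exactly one database write."""
    db.execute("CREATE TABLE IF NOT EXISTS messages (id TEXT PRIMARY KEY, doc TEXT)")
    processed = 0
    while True:
        try:
            msg_id, raw = inbox.get_nowait()
        except queue.Empty:
            break
        doc = {"raw": raw}
        for step in PIPELINE:
            # Intermediate results live only in memory.
            doc = step(doc)
        # The raw text would live in blob storage, not in the database.
        doc.pop("raw")
        with db:
            # One transaction and one write for the final processed object.
            db.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?)",
                (msg_id, json.dumps(doc)),
            )
        processed += 1
    return processed


if __name__ == "__main__":
    inbox = queue.Queue()
    inbox.put(("m1", "Subject: hello\n\nfirst body"))
    db = sqlite3.connect(":memory:")
    print(process_all(inbox, db))
```

With a real broker, several such workers could consume from the same queue on different processes or nodes, which is where the horizontal scaling in point 1 would come from.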