Loop/Architecture/Redis

From MozillaWiki
< Loop‎ | Architecture
Revision as of 10:10, 17 September 2014 by Natim (talk | contribs)
Jump to navigation Jump to search

Note that the changes proposed here are preliminary, and will be updated as the result of conversations with interested parties.

Date Author Changes
Wed Sep 17 11:48:12 CDT 2014 Rémy Hubscher Braistorming about using Redis as a Key-Value data store

Architecting ElastiCache Redis Tier

What does ElastiCache gives us?

  • A Master with up to 5 read-only replica. If the master goes down, the data is still available as read-only.
  • It is possible to promote a read-only replica as the new master using an API call:
https://elasticache.us-east-1.amazonaws.com/
  ?Action=ModifyReplicationGroup
  &ReplicationGroupId=my-repgroup
  &PrimaryClusterId=my-replica-1  
  &Version=2013-06-15
  &SignatureVersion=4
  &SignatureMethod=HmacSHA256
  &Timestamp=20140421T220302Z
  &X-Amz-Algorithm=AWS4-HMAC-SHA256
  &X-Amz-Date=20140421T220302Z
  &X-Amz-SignedHeaders=Host
  &X-Amz-Expires=20140421T220302Z
  &X-Amz-Credential=<credential>
  &X-Amz-Signature=<signature>
  • The maximum ElastiCache size is 60GB

What are our options

First of all we need to think about sharding never the less 60GB is a too close constrain as we keep adding data that can stay for longer than before inside the database (rooms-url, sessionToken, call-urls)

We need sharding to make sure we will continue to scale up our cluster even if we need to store more than 60GB of data in it.

My plan is to build a cluster with 10 small elasticache redis instance (and there read only replicas) that we will be able to scale up using TwemProxy.

TwemProxy will be responsible for sharding the keys. I will also create a worker that will make sure each node of the master cluster doesn't turn to a read-only master. If that happen, an API call will be made on the cluster to promote a replica as a new master.

What do we need to do on redis client side?

  • First we need to add TwemProxy to our stack (deploy and configure it on each loop-server node).
  • Then we need to replace all multiple keys Redis call to async calls so that TwemProxy will be able to shard the request for each keys.

During the replication phase the loop-server will return a 503 with a Backoff header that will let the loop-client now how many times it should wait before retrying. This window is the needed time for the new master promotion.