Overview

The Message Queue project is part of Project Sagrada, providing a service for applications to queue messages for clients.

Project

Engineers

Ben Bangert

User Requirements

Phase 1

NOTE: This phase is currently being skipped in favor of Phase 2.

The first version of the MQ will focus on providing a useful service for the Notifications project to enqueue messages for many queues and for clients to poll for messages on a given queue.

No message aggregation is done in this version. If a client needs messages from more than one subscription, it must poll each queue and aggregate the results itself.

The first version could be considered a Message Store rather than a queue, as it supports a much richer set of query semantics and does not let public consumers remove messages. Messages are removed only when the App deletes the entire queue or when they expire.

Requirements:

Service App can create queues
Service App can add messages to queue
Messages on queues expire
Clients may read any queue they are aware of

Phase 2

The second version allows authenticated applications to queue messages and clients to consume them. This supports a worker model in which jobs are added to a queue that multiple clients may be watching, and each message is given to a single client.

Optionally, a client can 'reserve' a message to indicate that it intends to process it. If the client does not confirm that it handled the message successfully, the message is put back in the queue, or moved to a fail/retry queue if desired.

Requirements:

Service App can create queues
Service App can add messages to queue
Authenticated clients may consume messages from a queue
Authenticated clients may mark a message for consumption with a reservation TTL
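
The reservation requirement can be illustrated with a small in-memory sketch. This is not the service's implementation: the class name, method names, and the injectable clock are all hypothetical, and the fail/retry queue is left out. It only shows the intended behaviour: a reserved message is held for the TTL and returns to the queue if it is not confirmed in time.

```python
import time
from collections import deque


class ReservableQueue:
    """In-memory stand-in for a queue with reservation TTLs (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._ready = deque()   # message ids waiting for a consumer
        self._reserved = {}     # message id -> reservation deadline

    def add(self, message_id):
        self._ready.append(message_id)

    def reserve(self, ttl):
        """Hand out the next message and hold it for `ttl` seconds."""
        self._requeue_expired()
        if not self._ready:
            return None
        message_id = self._ready.popleft()
        self._reserved[message_id] = self._clock() + ttl
        return message_id

    def confirm(self, message_id):
        """Mark a reserved message as handled.

        Returns False if the reservation has already lapsed.
        """
        self._requeue_expired()
        return self._reserved.pop(message_id, None) is not None

    def _requeue_expired(self):
        # Put every message whose reservation deadline has passed back in line.
        now = self._clock()
        expired = [m for m, deadline in self._reserved.items() if deadline <= now]
        for message_id in expired:
            del self._reserved[message_id]
            self._ready.append(message_id)
```

A client that reserves a message with a TTL of 10 seconds and confirms after 11 seconds gets False back, and the next call to reserve() hands the same message out again.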

Architecture

For scalability purposes, since some messages may be retained for periods of time per Phase 1 requirements, the initial choice of a backend is Cassandra. However, for Phase 2 requirements, Cassandra is missing the ability to manage and coordinate queue consumers, so ZooKeeper is being used for distributed synchronization.

When used for Notifications, each user will have a single queue per Notification Application. This ensures that even when an individual user receives many messages, they are partitioned by Notification Application, giving a more level distribution rather than a single deep queue.

Under the Phase 2 use case, it is likely that massive numbers of messages will be intended for the same queue, which would normally result in a single extremely deep queue. Deep queues do not scale well horizontally, especially with Cassandra, which maximizes throughput across many row keys.

The other issue is that to consume messages, there are only two effective ways of marking a message as consumed:

1. The message may be deleted after it has been sent. The problem in this case is that there is still no guarantee it has been processed, and acquiring a lock to consume a message, plus the resulting write-back to delete it, is expensive.

2. One consumer per queue/partition. This requires some guesswork up front about how many consumers there will be. Since a consumer can consume from multiple queues/partitions at once, a queue needs at least as many partitions as desired consumers. The advantage of this approach is that locking is only necessary when adding or removing consumers, to ensure one consumer per partition.

The MessageQueue project goes with the second option, and only incurs the lock during consumer addition/removal. The MessageQueue also does not track the state or the last message read in each queue/partition; it is the consumer's responsibility to track how far it has read and processed successfully. There is an API call available to record with the MessageQueue how far a consumer has successfully processed in a queue/partition.
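
The one-consumer-per-partition rule can be sketched as a simple assignment function. The function name and the round-robin strategy are assumptions made for illustration; the source does not say how partitions are handed out, and the ZooKeeper locking that guards consumer addition/removal is not shown.

```python
def assign_partitions(partition_count, consumers):
    """Give every partition exactly one consumer, spread round-robin.

    A consumer may own several partitions, but a partition never has
    more than one owner. Partitions are numbered from 0 (an assumption).
    """
    if not consumers:
        return {}
    if len(consumers) > partition_count:
        # A queue needs at least as many partitions as desired consumers.
        raise ValueError("more consumers than partitions")
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(partition_count):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With four partitions and two consumers, each consumer ends up owning two partitions. Adding a third consumer would require recomputing the assignment, which is the only point at which a lock is needed.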

Consumers processing a workload should read messages in batches to reduce network latency overhead.

Since messages are stored and read by timestamp from Cassandra, and Cassandra only has eventual consistency, there is an enforced delay in how soon a message is available for reading to ensure a consistent picture of the queue. This is no less than 5 seconds after insertion, and at most about 15 seconds.
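
The effect of the enforced delay can be shown with a short filter. This is only an illustration of the rule using the 5 second lower bound; the constant name, function name, and the (timestamp, body) tuple layout are hypothetical, and the service applies the delay itself rather than leaving it to callers.

```python
import time

READ_DELAY_SECONDS = 5  # lower bound on the delay described above


def readable_messages(messages, now=None):
    """Return the (timestamp, body) pairs that are old enough to be read."""
    if now is None:
        now = time.time()
    cutoff = now - READ_DELAY_SECONDS
    # Anything inserted after the cutoff is hidden until it has "settled".
    return [(ts, body) for ts, body in messages if ts <= cutoff]
```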

Queues that are to be consumed must be declared up front as partitioned queues. The number of partitions should also be specified, and new messages will be randomly partitioned. If messages should be processed in order, they can be inserted into a single partition to enforce ordering. All messages that are randomly partitioned should be considered loosely ordered.
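
A minimal sketch of the partition choice follows. The function name, the `ordered_partition` parameter, and numbering partitions from 0 are assumptions for illustration; the source only states that messages are partitioned randomly unless they are pinned to one partition to preserve order.

```python
import random


def choose_partition(partition_count, ordered_partition=None, rng=random):
    """Pick the partition a new message is written to.

    Pass `ordered_partition` to pin messages to one partition when their
    order matters; otherwise a partition is chosen at random.
    """
    if partition_count < 1:
        raise ValueError("a partitioned queue needs at least one partition")
    if ordered_partition is not None:
        if not 0 <= ordered_partition < partition_count:
            raise ValueError("partition out of range")
        return ordered_partition
    return rng.randrange(partition_count)
```

Messages written with the same `ordered_partition` are read back in insertion order by that partition's single consumer, while randomly partitioned messages are only loosely ordered across the queue.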

Proposed API (Phase 1)

For the first version, applications allowed to use the Message Queue will be given an application key that must be sent with every request, and their IP must be on an accepted IP list for the given application key.

The application key must be sent as an HTTP header named 'ApplicationKey'.

Internal Apps

POST /queue

   Creates a new queue. Optionally takes the queue name as the POST param queue_name.

DELETE /queue/{queue_name}

   Deletes a given queue created by the App. When the param delete is set to false, then
   the queue contents will be deleted, but the queue will remain registered.

POST /queue/{queue_name}

   Creates a message on the given queue. The contents are expected to be
   a JSON object.
   
   Raises an error if the queue does not exist.
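
As a sketch of how an internal app might prepare such a request, the following builds (but does not send) a POST to /queue/{queue_name} with the ApplicationKey header. The host name, the function name, and sending the JSON object as the request body with a Content-Type header are assumptions; the proposal does not specify how the contents are transmitted.

```python
import json
import urllib.request
from urllib.parse import quote


def build_enqueue_request(base_url, application_key, queue_name, message):
    """Build a POST /queue/{queue_name} request carrying a JSON message."""
    url = f"{base_url}/queue/{quote(queue_name, safe='')}"
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # urllib stores header names capitalized ("Applicationkey");
            # HTTP header names are case-insensitive, so this is equivalent.
            "ApplicationKey": application_key,
            "Content-Type": "application/json",  # assumed, not in the proposal
        },
    )


# Hypothetical host and key, for illustration only.
request = build_enqueue_request(
    "https://mq.example.com", "my-app-key", "alerts", {"title": "hello"}
)
```

Passing the request to urllib.request.urlopen would send it. The server is expected to reject the call if the queue does not exist.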

Clients

GET /queue/{queue_name}

   Returns all messages for the queue. Can optionally be passed
   one of several query parameters:
       
       since_timestamp - Return only messages newer than this timestamp, formatted as seconds since epoch in GMT
       limit           - Return only N messages
   
   Messages are returned in order of newest to oldest.
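
A client-side sketch of building the read URL is shown below. The host and function name are hypothetical, and allowing both parameters in the same request is an assumption; the proposal only lists them as optional query parameters.

```python
from urllib.parse import quote, urlencode


def build_read_url(base_url, queue_name, since_timestamp=None, limit=None):
    """Build the GET /queue/{queue_name} URL with optional query parameters."""
    params = {}
    if since_timestamp is not None:
        params["since_timestamp"] = since_timestamp  # seconds since epoch, GMT
    if limit is not None:
        params["limit"] = limit
    url = f"{base_url}/queue/{quote(queue_name, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```

A polling client would typically remember the timestamp of the newest message it has seen and pass it as since_timestamp on the next request.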

Use Cases

Notifications

The new version of Notifications will use the MQ to store messages for users from various websites and allow users to poll the MQ directly for new notifications they have allowed.

Internal app adds messages to a queue by queue name
Internal app can delete queues by name
Devices can read a queue by name
Devices can read the X most recent messages
Devices can read messages since XXX
Messages are encoded as JSON
