Auto-tools/Projects/Pulse/PulseGuardian
Revision as of 22:42, 30 April 2014
Team
- mcote, dkl, akachkach
Problem
We use RabbitMQ as a pub/sub service which currently allows anyone to subscribe to any queue via a common user account. Some client applications use durable queues in case they crash; however, sometimes these queues are created by accident, and sometimes apps crash without admins noticing. In these cases, the queues continue to grow without bound, which can eventually result in the RabbitMQ host running out of memory. Our current solution is to have Nagios monitor the queues and send alerts when any queues exceed a certain number of unread or unacknowledged messages, at which point a RabbitMQ admin attempts to find the person responsible and/or delete the offending queue.
Goals & Considerations
We need an intelligent system to handle overgrowing queues. The system should have some way to automatically alert the queue's owner, eventually deleting the queue if no action has been taken.
A further improvement would be to automatically consume messages and write them to disk for later consumption, since this would at least free up memory. This system would also need a limit to avoid consuming too much disk space, after which (with a further alert) the queue would be killed. There would need to be a convenient way to consume archived messages.
Non-Goals
Design and Approach
PulseGuardian will need to know who owns a given queue in order to attempt to contact its owner. Since we currently use the same user for all consumers, we have no way to know which person to contact.
We need a web app that performs standard registration: accepts a username and password, emails a verification link/code, and creates the user in RabbitMQ when verified. It should also provide a method to reset a user's password and to delete the user. Finally, it should provide a method (REST API) to download archived messages (see below).
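Below is a minimal sketch of the registration helpers, written to be independent of Flask so a view function could call them. The HMAC-based verification code, the `SECRET_KEY` constant, and all helper names are hypothetical. This toy code never expires, so a real version would probably add a timestamp or use a signed, time-limited token. The two request builders target the management plugin's `PUT /api/users/{name}` and `PUT /api/permissions/{vhost}/{user}` endpoints. The permissive `.*` permission patterns are placeholders, not a decided policy.

```python
import hashlib
import hmac
import json
from urllib.parse import quote

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"change-me"


def make_verification_code(email: str) -> str:
    """Derive a verification code from the (case-folded) email with HMAC-SHA256."""
    return hmac.new(SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def check_verification_code(email: str, code: str) -> bool:
    """Constant-time comparison avoids leaking how much of the code matched."""
    return hmac.compare_digest(make_verification_code(email), code)


def rabbitmq_create_user_request(username: str, password: str) -> tuple[str, str, bytes]:
    """Build (method, path, body) for creating a user via the management API."""
    # Email addresses contain '@', so the name is percent-encoded in the path.
    path = "/api/users/" + quote(username, safe="")
    body = json.dumps({"password": password, "tags": ""}).encode("utf-8")
    return "PUT", path, body


def rabbitmq_set_permissions_request(username: str, vhost: str = "/") -> tuple[str, str, bytes]:
    """Build (method, path, body) granting the new user access to a vhost."""
    path = f"/api/permissions/{quote(vhost, safe='')}/{quote(username, safe='')}"
    # Placeholder regexes granting full access; the real policy is still open.
    body = json.dumps({"configure": ".*", "write": ".*", "read": ".*"}).encode("utf-8")
    return "PUT", path, body
```

Only once `check_verification_code` succeeds would the two requests be sent, so unverified addresses never become RabbitMQ users.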
The second part is a process that polls RabbitMQ, looking for queues above a set length (WARN_QUEUE_SIZE). If the queue belongs to a user with a properly formatted username (i.e. an email address), a warning email is sent containing the queue name and current queue length. After a second threshold is reached (DEL_QUEUE_SIZE), the queue is deleted, and another email is sent. If the username is not a proper email address (e.g. the public user), the queue is silently deleted when DEL_QUEUE_SIZE is reached (no action is performed at WARN_QUEUE_SIZE).
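The polling rules above reduce to a small decision function. This is a sketch. Several details are assumptions not specified in the design: the threshold values, the deliberately loose email regex, treating "reached" as "exceeded", and the `already_warned` flag, which keeps a single queue from triggering a warning email on every poll.

```python
import re
from enum import Enum

# Hypothetical threshold values; the design leaves the actual numbers open.
WARN_QUEUE_SIZE = 2000
DEL_QUEUE_SIZE = 8000

# Loose "looks like an email address" check: local@domain.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class Action(Enum):
    NONE = "none"
    WARN = "warn"                            # email the owner, keep the queue
    DELETE_AND_NOTIFY = "delete_and_notify"  # delete, then email the owner
    DELETE_SILENTLY = "delete_silently"      # no contactable owner (e.g. public user)


def decide_action(owner: str, queue_size: int, already_warned: bool) -> Action:
    """Map a queue's owner and current length to the action PulseGuardian takes."""
    has_email = bool(EMAIL_RE.match(owner))
    if queue_size > DEL_QUEUE_SIZE:
        return Action.DELETE_AND_NOTIFY if has_email else Action.DELETE_SILENTLY
    # Non-email owners get no action at the warning threshold.
    if queue_size > WARN_QUEUE_SIZE and has_email and not already_warned:
        return Action.WARN
    return Action.NONE
```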
We can also, optionally, add a threshold between WARN_QUEUE_SIZE and DEL_QUEUE_SIZE, call it ARCHIVE_QUEUE_SIZE, at which point PulseGuardian will start to consume messages from the queue and archive them to disk. This is advantageous because RabbitMQ keeps all queues in memory, so one rogue queue can eventually take down RabbitMQ. If the queue size falls below ARCHIVE_QUEUE_SIZE, presumably due to the client application resuming, no new messages will be archived unless ARCHIVE_QUEUE_SIZE is exceeded again. When MAX_ARCHIVE_SIZE messages are archived, messages are no longer consumed by PulseGuardian and thus, unless archived messages are consumed by the client, the queue will continue to grow until DEL_QUEUE_SIZE is hit and the queue deleted, as above.
We'll have to think this feature through further to determine the implications of a client consuming from a queue while PulseGuardian is also consuming from it (or trying to).
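Here is one possible way to decide how many messages to archive on each poll. It assumes that PulseGuardian drains only the excess above ARCHIVE_QUEUE_SIZE and tracks a per-queue count of archived messages. The threshold values and the `ArchiveState` and `messages_to_archive` names are hypothetical. The actual consuming and writing to disk are left out.

```python
from dataclasses import dataclass

# Hypothetical values; the design only fixes their ordering.
WARN_QUEUE_SIZE = 2000
ARCHIVE_QUEUE_SIZE = 4000
DEL_QUEUE_SIZE = 8000
MAX_ARCHIVE_SIZE = 10000

assert WARN_QUEUE_SIZE < ARCHIVE_QUEUE_SIZE < DEL_QUEUE_SIZE


@dataclass
class ArchiveState:
    archived: int = 0  # messages already written to disk for this queue


def messages_to_archive(queue_size: int, state: ArchiveState) -> int:
    """Return how many messages to consume and archive on this poll."""
    if queue_size <= ARCHIVE_QUEUE_SIZE:
        return 0  # at or below threshold: the client has presumably resumed
    remaining_budget = MAX_ARCHIVE_SIZE - state.archived
    if remaining_budget <= 0:
        return 0  # archive full: the queue grows toward DEL_QUEUE_SIZE
    # Drain only the excess above the threshold, capped by the archive budget.
    return min(queue_size - ARCHIVE_QUEUE_SIZE, remaining_budget)
```

Draining just the excess keeps the queue at the threshold rather than emptying it. The rest of the backlog stays in RabbitMQ, where the client can consume it directly if it resumes. Draining everything would be a reasonable alternative. The caller would add the number of messages actually written to `state.archived`.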
Notes
Since RabbitMQ's management plugin API doesn't tell us which user created a queue, we'll probably have to poll RabbitMQ to detect queues that aren't assigned to any user and assign each of them to the user of the consumer currently consuming from it (reminder: in our pattern, each queue should have at most one consumer).
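Below is a sketch of the ownership-assignment step. It assumes each consumer object has the shape returned by the management plugin's `/api/consumers` endpoint, with the queue under `"queue"` and the connecting user under `"channel_details"["user"]`. Check this against the RabbitMQ version in use. The in-memory `owners` dict stands in for the MySQL table, and `assign_owners` is a hypothetical name.

```python
from typing import Iterable


def assign_owners(
    consumers: Iterable[dict], owners: dict[tuple[str, str], str]
) -> dict[tuple[str, str], str]:
    """Record an owner for every consumed queue that has none yet.

    `owners` maps (vhost, queue name) -> username and is updated in place.
    """
    for consumer in consumers:
        queue = consumer["queue"]
        key = (queue["vhost"], queue["name"])
        user = consumer["channel_details"]["user"]
        # Only unowned queues are filled in; existing owners are never
        # overwritten, which is safe given at most one consumer per queue.
        owners.setdefault(key, user)
    return owners
```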
Implementation
PulseGuardian uses Flask for the user-management app and SQLAlchemy with MySQL to store user data.
Communication with RabbitMQ goes through the RabbitMQ management plugin's REST API.
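The following is a minimal client for the management plugin's HTTP API. It uses only the standard library and sketches how the poller might talk to RabbitMQ. `GET /api/queues` and `DELETE /api/queues/{vhost}/{name}` are real management API endpoints. The class and method names are hypothetical, and the Flask/SQLAlchemy side is not shown.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


class ManagementClient:
    """Tiny client for the RabbitMQ management plugin's HTTP API (sketch)."""

    def __init__(self, base_url: str, user: str, password: str) -> None:
        self.base_url = base_url.rstrip("/")
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        self.auth_header = "Basic " + token  # management API uses HTTP basic auth

    def _request(self, method: str, path: str) -> urllib.request.Request:
        req = urllib.request.Request(self.base_url + path, method=method)
        req.add_header("Authorization", self.auth_header)
        return req

    def list_queues_request(self) -> urllib.request.Request:
        # Each returned queue object includes "name", "vhost" and "messages".
        return self._request("GET", "/api/queues")

    def delete_queue_request(self, vhost: str, name: str) -> urllib.request.Request:
        # Both vhost ("/" -> "%2F") and queue name must be percent-encoded.
        return self._request(
            "DELETE", f"/api/queues/{quote(vhost, safe='')}/{quote(name, safe='')}"
        )

    def send(self, req: urllib.request.Request):
        """Perform the call; return parsed JSON, or None for an empty body."""
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
        return json.loads(data) if data else None
```

Building each `Request` separately from sending it keeps the URL and authentication logic testable without a running broker.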