Auto-tools/Projects/Pulse/PulseGuardian: Difference between revisions
Revision as of 21:54, 30 April 2014
Team
- mcote, dkl, akachkach
Problem
We use RabbitMQ as a pub/sub service which currently allows anyone to subscribe to any queue via a common user account. Some client applications use durable queues in case they crash; however, sometimes these queues are created by accident, and sometimes apps crash without admins noticing. In these cases, the queues continue to grow without bound, which can eventually result in the RabbitMQ host running out of memory. Our current solution is to have Nagios monitor the queues and send alerts when any queues exceed a certain number of unread or unacknowledged messages, at which point a RabbitMQ admin attempts to find the person responsible and/or delete the offending queue.
Goals & Considerations
We need an intelligent system to handle overgrowing queues. The system should have some way to automatically alert the queue's owner, eventually deleting the queue if no action has been taken.
A further improvement would be to automatically consume messages and write them to disk for later consumption, since this would at least free up memory. This system would also need a limit to avoid consuming too much disk space, after which (with a further alert) the queue would be killed. There would need to be a convenient way to consume archived messages.
Non-Goals
Design and Approach
PulseGuardian will need to know who owns a given queue in order to contact its owner. There are two good choices: the queue name and the username. The former is simple to set up, since it is entirely defined by the client. We could just use a convention, such as appname_email, where "appname" can be anything and "email" should be a valid email address. However, since Pulse is a public resource, this is open to abuse; anyone could provide someone else's email address, potentially deluging them with Pulse messages.
A more secure way is to provide email validation. Thus we will need a simple web client that performs standard registration: accepts a username and password, emails a verification link/code, and creates the user in RabbitMQ when verified. It should also provide a method to reset a user's password and to delete the user. Finally, it should provide a method (REST API) to download archived messages (see below).
The second part is a process that polls RabbitMQ, looking for queues above a set length (WARN_QUEUE_SIZE). If the queue belongs to a user with a properly formatted username (i.e. an email address), a warning email is sent containing the queue name and current queue length. After a second threshold is reached (DEL_QUEUE_SIZE), the queue is deleted, and another email is sent. If the username is not a proper email address (e.g. the public user), the queue is silently deleted when DEL_QUEUE_SIZE is reached (no action is performed at WARN_QUEUE_SIZE).
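The warning and deletion rules above can be sketched as a small decision function. This is only an illustration, not PulseGuardian's actual code: the threshold values, the `Action` enum, the `decide` function, and the simplistic email pattern are all assumptions, as is treating a threshold as reached when the size is greater than or equal to it.

```python
import re
from enum import Enum

# Hypothetical threshold values; the design does not specify numbers.
WARN_QUEUE_SIZE = 2000
DEL_QUEUE_SIZE = 8000

# Deliberately loose check for "looks like an email address" (an assumption).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class Action(Enum):
    NONE = "none"
    WARN = "warn"
    DELETE_AND_NOTIFY = "delete_and_notify"
    DELETE_SILENTLY = "delete_silently"


def is_email(username):
    """Return True if the username is formatted like an email address."""
    return EMAIL_RE.match(username) is not None


def decide(username, queue_size):
    """Pick the action for one queue on one polling pass."""
    if queue_size >= DEL_QUEUE_SIZE:
        # The queue is deleted either way; only email-style users are told.
        if is_email(username):
            return Action.DELETE_AND_NOTIFY
        return Action.DELETE_SILENTLY
    if queue_size >= WARN_QUEUE_SIZE and is_email(username):
        return Action.WARN
    # Below WARN_QUEUE_SIZE, or a non-email user below DEL_QUEUE_SIZE.
    return Action.NONE
```

For example, `decide("public", 3000)` returns `Action.NONE`, because no action is taken at WARN_QUEUE_SIZE for users without an email-style username.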
We can also, optionally, add a threshold between WARN_QUEUE_SIZE and DEL_QUEUE_SIZE, called ARCHIVE_QUEUE_SIZE, at which point PulseGuardian starts to consume messages from the queue and archive them to disk. This is advantageous because RabbitMQ keeps all queues in memory, so one rogue queue can eventually take down RabbitMQ. If the queue size falls below ARCHIVE_QUEUE_SIZE, presumably because the client application has resumed, no new messages are archived unless ARCHIVE_QUEUE_SIZE is exceeded again. Once MAX_ARCHIVE_SIZE messages have been archived, PulseGuardian stops consuming messages; unless the client consumes the archived messages, the queue then continues to grow until DEL_QUEUE_SIZE is hit and the queue is deleted, as above.
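The archiving condition described above might look like the following. This is a minimal sketch under stated assumptions: the numeric values and the function names `should_archive` and `archive_budget` are hypothetical, and the sketch does not address the open question of a client consuming at the same time.

```python
# Hypothetical values; the design only fixes their ordering relative to
# WARN_QUEUE_SIZE and DEL_QUEUE_SIZE.
ARCHIVE_QUEUE_SIZE = 4000
MAX_ARCHIVE_SIZE = 10000


def archive_budget(archived_count):
    """Number of messages that may still be archived for this queue."""
    return max(0, MAX_ARCHIVE_SIZE - archived_count)


def should_archive(queue_size, archived_count):
    """Archive only while the queue exceeds ARCHIVE_QUEUE_SIZE and the
    archive for this queue has not yet reached MAX_ARCHIVE_SIZE."""
    return queue_size > ARCHIVE_QUEUE_SIZE and archive_budget(archived_count) > 0
```

Because the check is re-evaluated on every poll, archiving stops on its own once the queue drops back below ARCHIVE_QUEUE_SIZE and resumes only if the threshold is exceeded again.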
We'll have to think through this feature a bit to determine the implications of a client trying to consume messages from a queue while PulseGuardian is also consuming them (or trying to).
Implementation
PulseGuardian uses Flask for the user management app and SQLAlchemy with MySQL to store user data.
Communication with RabbitMQ is done via the RabbitMQ management plugin's REST API.
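As a rough illustration of polling through the management API, the sketch below lists queues with `GET /api/queues` and builds the per-queue URL (`/api/queues/<vhost>/<name>`) that a `DELETE` request would target. The helper names, the use of `urllib` rather than any particular HTTP library, and the base URL in the example are assumptions; this is not taken from PulseGuardian's source.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def queues_url(base_url):
    """URL that lists all queues across all vhosts."""
    return base_url.rstrip("/") + "/api/queues"


def queue_url(base_url, vhost, name):
    """URL of a single queue; the vhost and name must be percent-encoded,
    so the default vhost "/" becomes "%2F"."""
    return "{}/api/queues/{}/{}".format(
        base_url.rstrip("/"), quote(vhost, safe=""), quote(name, safe="")
    )


def fetch_queues(base_url, username, password):
    """Fetch the queue list using HTTP basic auth (performs a network call)."""
    token = base64.b64encode("{}:{}".format(username, password).encode()).decode()
    request = urllib.request.Request(
        queues_url(base_url), headers={"Authorization": "Basic " + token}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def oversized(queues, limit):
    """Return (vhost, name, messages) for queues at or above the limit.
    A queue whose stats are not yet available is treated as empty."""
    return [
        (q["vhost"], q["name"], q["messages"])
        for q in queues
        if q.get("messages", 0) >= limit
    ]
```

A polling pass could then call `oversized(fetch_queues("http://localhost:15672", user, password), WARN_QUEUE_SIZE)`, where the host and port are placeholders for wherever the management plugin is listening.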