= Problem =
We use RabbitMQ as a pub/sub service which currently allows anyone to subscribe to any exchange via a common user account. Some client applications use durable queues in case they crash; however, sometimes these queues are created by accident, and sometimes apps crash without admins noticing. In these cases, the queues continue to grow without bound, which can eventually result in the RabbitMQ host running out of memory. Our current solution is to have Nagios monitor the queues and send alerts when any queues exceed a certain number of unread or unacknowledged messages, at which point a RabbitMQ admin attempts to find the person responsible and/or delete the offending queue.
= Goals & Considerations =
A further improvement would be to automatically consume messages and write them to disk for later consumption, since this would at least free up memory. This system would also need a limit to avoid consuming too much disk space, after which (with a further alert) the queue would be killed. There would need to be a convenient way to consume archived messages.
= Design and Approach =
We need a web app that performs standard registration: accepts a username and password, emails a verification link/code, and creates the user in RabbitMQ when verified. It should also provide a method to reset a user's password and to delete the user. Finally, it should provide a method (REST API) to download archived messages (see below).
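As a rough illustration of the "creates the user in RabbitMQ when verified" step, the sketch below builds (but does not send) the HTTP request that would create a user through the RabbitMQ management plugin's <code>PUT /api/users/&lt;name&gt;</code> endpoint. The base URL, the function name, and the admin credentials are assumptions for illustration only; the real app may create users differently.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed location of the management plugin; adjust for the real deployment.
MANAGEMENT_URL = "http://localhost:15672"


def build_create_user_request(username, password, admin_user, admin_password):
    """Build the PUT request that creates a RabbitMQ user (hypothetical helper).

    The username is an email address in this design, so it must be
    percent-encoded before being placed in the URL path.
    """
    path = "/api/users/" + urllib.parse.quote(username, safe="")
    # An empty "tags" value creates an ordinary user with no management role.
    body = json.dumps({"password": password, "tags": ""}).encode("utf-8")
    token = base64.b64encode(
        f"{admin_user}:{admin_password}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        MANAGEMENT_URL + path,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
```

Passing the returned object to <code>urllib.request.urlopen</code> would perform the call. The new user would still need permissions granted on a vhost before it could declare queues.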
The second part is a process that polls RabbitMQ, looking for queues above a set length (WARN_QUEUE_SIZE). If the queue belongs to a user with a properly formatted username (i.e. an email address), a warning email is sent containing the queue name and current queue length. After a second threshold is reached (DEL_QUEUE_SIZE), the queue is deleted, and another email is sent. If the username is not a proper email address (e.g. the public user), the queue is deleted without a user notification when DEL_QUEUE_SIZE is reached (no action is performed at WARN_QUEUE_SIZE).
Optionally, we can have admin email addresses that are also sent all notifications, including when there is no owner.
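The warn/delete rules above can be sketched as a pure decision function. This is only one possible reading of the design: the threshold values, the email regular expression, the <code>already_warned</code> flag (used to avoid repeating the warning on every poll), and the inclusive comparisons are all assumptions, not part of the specification.

```python
import re
from dataclasses import dataclass, field

# Hypothetical values; the design does not fix the thresholds.
WARN_QUEUE_SIZE = 2000
DEL_QUEUE_SIZE = 8000

# Deliberately loose check for "username looks like an email address".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email(username):
    return bool(username) and EMAIL_RE.match(username) is not None


@dataclass
class Decision:
    action: str  # "none", "warn", or "delete"
    recipients: list = field(default_factory=list)


def decide(queue_length, owner, already_warned=False, admins=()):
    """Return what the poller should do for one queue."""
    owner_ok = is_email(owner)
    if queue_length >= DEL_QUEUE_SIZE:
        # Delete regardless of owner; notify the owner only if reachable.
        recipients = ([owner] if owner_ok else []) + list(admins)
        return Decision("delete", recipients)
    if queue_length >= WARN_QUEUE_SIZE and owner_ok and not already_warned:
        return Decision("warn", [owner] + list(admins))
    # Below both thresholds, or a queue with no reachable owner at warn level.
    return Decision("none")
```

For example, <code>decide(2500, "public")</code> yields no action, while <code>decide(9000, "public", admins=["ops@example.com"])</code> yields a deletion with only the admin notified.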
Another optional feature is to add a threshold between WARN_QUEUE_SIZE and DEL_QUEUE_SIZE, call it ARCHIVE_QUEUE_SIZE, at which point PulseGuardian will start to consume messages from the queue and archive them to disk. This is advantageous because RabbitMQ keeps all queues in memory, so one rogue queue can eventually take down RabbitMQ. If the queue size falls below ARCHIVE_QUEUE_SIZE, presumably due to the client application resuming, no new messages will be archived unless ARCHIVE_QUEUE_SIZE is exceeded again. Once MAX_ARCHIVE_SIZE messages have been archived, PulseGuardian stops consuming from the queue. Unless the client consumes the archived messages, the queue will then continue to grow until DEL_QUEUE_SIZE is hit and the queue is deleted, as above.
We'll have to think through this feature a bit to determine the implications of a client trying to consume while PulseGuardian is also consuming them (or trying to).
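To make the interaction between ARCHIVE_QUEUE_SIZE and MAX_ARCHIVE_SIZE concrete, here is a minimal sketch of how many messages PulseGuardian might archive on one polling pass. It assumes that archiving drains the queue only down to ARCHIVE_QUEUE_SIZE and never exceeds the remaining archive capacity. Both the constants and that "drain to the threshold" policy are assumptions, and the sketch does not address the concurrent-consumer question raised above.

```python
# Hypothetical values chosen only for illustration.
ARCHIVE_QUEUE_SIZE = 4000
MAX_ARCHIVE_SIZE = 10000


def messages_to_archive(queue_length, archived_count):
    """Number of messages to move from the queue to disk on this pass.

    queue_length   -- current number of messages in the queue
    archived_count -- messages already archived for this queue and
                      not yet downloaded by the client
    """
    excess = queue_length - ARCHIVE_QUEUE_SIZE
    capacity = MAX_ARCHIVE_SIZE - archived_count
    if excess <= 0 or capacity <= 0:
        # Either the queue is at or below the threshold, or the archive is
        # full; in the latter case the queue keeps growing until
        # DEL_QUEUE_SIZE is reached.
        return 0
    return min(excess, capacity)
```

When the client downloads archived messages, <code>archived_count</code> drops and archiving can resume on the next pass.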
= Notes =
As RabbitMQ's management plugin API doesn't give us the user who created a queue, we'll probably have to poll RabbitMQ to detect queues that aren't assigned to any user and assign each of them to the user of the consumer currently consuming from it (reminder: in our pattern, we should have at most one consumer per queue). If we have not found an owner of a queue by the time it hits DEL_QUEUE_SIZE (because the consumer never stays connected long enough), the queue will be deleted.
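The owner-detection pass described above might look like the following sketch. It operates on plain dictionaries rather than on live API responses, because how the consumer's username is obtained from the management API is not settled here; the function name and both data shapes are hypothetical.

```python
def assign_owners(owners, consumers_by_queue):
    """Return a copy of `owners` with unowned queues assigned where possible.

    owners             -- dict mapping queue name to owner username, or None
    consumers_by_queue -- dict mapping queue name to the list of usernames
                          of consumers currently attached (expected to hold
                          at most one entry per queue in our pattern)
    """
    updated = dict(owners)
    for queue, users in consumers_by_queue.items():
        if updated.get(queue) is None and users:
            # First consumer seen becomes the owner; an existing owner is
            # never overwritten.
            updated[queue] = users[0]
    return updated
```

Queues that never have a consumer attached during a poll stay unowned, and are therefore deleted without a user notification once they reach DEL_QUEUE_SIZE.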
= Implementation =