BMO/ChangeNotificationSystem

Revision as of 20:41, 14 August 2013 by Mcote

Team

BMO team (dkl, glob, mcote), ebryn (contract developer on front end), peterbe (advisor from web tools)

Problem

With the focus on Bugzilla as a platform that facilitates responsive, JavaScript-based front ends, detecting changes to bugs in a timely fashion is increasingly important. At the moment, the only way to determine whether a bug has been recently updated is to poll; there is no push mechanism of any kind. This model has inherent problems, including but not limited to scalability (repeatedly opening and closing connections is costly) and performance (every poll must go through Bugzilla's permission system and other logic layers).

Goals & Considerations

Provide a push-based notification system to inform clients of changes to bugs. Plan for scalability by minimizing server load and time from change to notification.

The notification itself does not need to describe what has changed, although preferably this information would be available some other way, perhaps through a separate call.

Non-Goals

Re-implementing Bugzilla's permissions system is not an option. It is complex enough that changing the current model would be major surgery (and would further diverge BMO from upstream), and maintaining two parallel implementations would be costly and error-prone.

Notifications should not include changes to dependent/related bugs. These are harder to track with the current Bugzilla database schema, and a properly designed system should be able to track them independently at the client's discretion.

Design and Approach

We will implement a separate server that polls the database directly (or otherwise receives notifications from the database) for changes to bugs and passes on *only the ID* of the changed bug to its clients. The clients can then use the main Bugzilla REST API to determine the exact changes, which will enforce permissions as usual. For simplicity, every client is notified of every changed bug; there is no support for subscribing to particular bugs. Since the notification data is so small (a bug ID, currently 6 or fewer characters), this shouldn't be a problem for scalability.
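Because there are no per-bug subscriptions, any filtering has to happen on the client. The following is a minimal sketch of that client-side flow, not a definitive implementation: `BugChangeClient`, `watched_ids`, and `fetch_bug` are hypothetical names, and `fetch_bug` stands in for whatever authenticated call the client would make to the main Bugzilla REST API. It also assumes each notification is a bug ID sent as text.

```python
from typing import Callable, Dict, Iterable, Set


class BugChangeClient:
    """Reacts to bare bug-ID notifications by re-fetching the bugs it cares about."""

    def __init__(self, watched_ids: Iterable[int], fetch_bug: Callable[[int], dict]):
        # fetch_bug is a hypothetical callable wrapping a request to the
        # Bugzilla REST API, which is where permissions are enforced.
        self.watched: Set[int] = set(watched_ids)
        self.fetch_bug = fetch_bug
        self.cache: Dict[int, dict] = {}

    def on_notification(self, message: str) -> bool:
        """Handle one notification; return True if a bug was re-fetched."""
        bug_id = int(message.strip())
        if bug_id not in self.watched:
            # Every client sees every changed ID, so uninteresting ones
            # are simply dropped here.
            return False
        self.cache[bug_id] = self.fetch_bug(bug_id)
        return True
```

The notification server never sees bug contents or credentials in this arrangement; a client that is not allowed to view a bug learns only that its ID changed.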

Important note: this whole design relies upon the idea that knowing the ID of a changed bug is not a security risk. This should be reasonable, given that the only information conveyed is that some bug has changed. One could use this information to determine the frequency of changes to a particular bug over some time frame, and hence perhaps an increased interest in that bug, but the changes could be to anything: main bug fields, comments, tracking flags, dependencies, etc.

Conceptually, there are three parts:

  • Database poller/listener. There would be exactly one process that frequently polls the database for changes (period TBD, but on the order of seconds, not minutes). It would keep the time of the last poll in memory and would ask only for the IDs of all bugs changed since then.
    • Even better would be some sort of push notification from the database itself, if this is possible.
  • TCP servers. There would be one or more processes acting as servers that accept client connections and maintain them indefinitely. WebSockets is the preferred protocol for easy integration with browsers. These servers would listen for notifications from the database poller and fan out notification messages to all clients. For scalability, multiple server processes could be launched behind a load balancer (such as Zeus) that spreads connections amongst them.
  • Messaging middleware. Some sort of connection will be required between the database poller and the TCP servers. This could be as simple as standard POSIX communication channels (e.g. named pipes), a brokerless messaging library (e.g. ZeroMQ), or a full message broker such as an AMQP server (e.g. RabbitMQ), as needed. This is the main open question at the moment.
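The three parts above can be sketched end to end in a few lines. This is an illustration under stated assumptions rather than the chosen design: an in-memory SQLite database stands in for the production database, an in-process `queue.Queue` stands in for the still-undecided middleware, plain lists stand in for client connections, and timestamps are simple integers. The table and column names (`bugs`, `bug_id`, `delta_ts`) are assumed to match the Bugzilla schema; `DatabasePoller` and `fan_out` are hypothetical names.

```python
import queue
import sqlite3


class DatabasePoller:
    """Single process that asks only for the IDs of bugs changed since the last poll."""

    def __init__(self, conn, out_queue, start_ts):
        self.conn = conn
        self.out = out_queue
        self.last_poll = start_ts  # held in memory only, as in the design

    def poll_once(self, now):
        """Publish IDs changed in the window (last_poll, now]; return how many."""
        rows = self.conn.execute(
            "SELECT bug_id FROM bugs"
            " WHERE delta_ts > ? AND delta_ts <= ?"
            " ORDER BY bug_id",
            (self.last_poll, now),
        ).fetchall()
        self.last_poll = now
        for (bug_id,) in rows:
            self.out.put(str(bug_id))  # the message is just the ID
        return len(rows)


def fan_out(in_queue, clients):
    """Drain pending messages and copy each one to every client; return the count."""
    sent = 0
    while True:
        try:
            message = in_queue.get_nowait()
        except queue.Empty:
            return sent
        for client in clients:
            client.append(message)  # a real server would write to a socket
        sent += 1
```

Using a half-open time window means a bug is reported once per poll no matter how many times it changed within that window, which is consistent with sending only the ID.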

Implementation

Python is a reasonable choice for both the database poller and the TCP server, although it has been suggested that node.js might be better suited for at least the TCP server (need reasons why). Support for the middleware solution (TBD) will be crucial in the chosen language/framework.
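To give a feel for what the Python option might look like, here is a rough sketch of a fan-out server using only the standard library's `asyncio`. It is an assumption-laden stand-in, not a proposal: it speaks newline-delimited plain TCP rather than WebSockets (which would need a third-party library), and `NotificationServer`, `publish`, and the framing are hypothetical. In a real deployment `publish` would be driven by the middleware rather than called directly.

```python
import asyncio


class NotificationServer:
    """Keeps client connections open and sends every published bug ID to all of them."""

    def __init__(self):
        self.writers = set()
        self.server = None

    async def start(self, host="127.0.0.1", port=0):
        """Start listening; port 0 lets the OS pick a free port, which is returned."""
        self.server = await asyncio.start_server(self._on_connect, host, port)
        return self.server.sockets[0].getsockname()[1]

    async def _on_connect(self, reader, writer):
        self.writers.add(writer)
        try:
            # Clients are not expected to send anything; this simply waits
            # until the connection reaches end-of-file.
            await reader.read()
        except ConnectionError:
            pass
        finally:
            self.writers.discard(writer)
            writer.close()

    async def publish(self, bug_id):
        """Send one bug ID, newline-terminated, to every connected client."""
        data = f"{bug_id}\n".encode("ascii")
        targets = list(self.writers)
        for writer in targets:
            writer.write(data)
        # Flush all clients concurrently; one broken client should not
        # prevent delivery to the others.
        await asyncio.gather(*(w.drain() for w in targets), return_exceptions=True)

    async def stop(self):
        self.server.close()
        for writer in list(self.writers):
            writer.close()
        await self.server.wait_closed()
```

Several such processes could run side by side behind the load balancer, each subscribed to the same stream of IDs from the poller.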