Auto-tools/Projects/Pulse: Difference between revisions

Remove outdated sections, add some updates
(Added spec)
(Remove outdated sections, add some updates)
Line 127: Line 127:


=== Contributing ===
To set up a local system for development, see the [https://hg.mozilla.org/automation/mozillapulse/file/tip/HACKING.md HACKING.md] file included in the mozillapulse source.


Here is the list of open, unassigned mentored Pulse bugs to see how you can contribute!
<bugzilla>
     {
Line 149: Line 152:
     }
</bugzilla>
To set up a local system for development, see the [https://hg.mozilla.org/automation/mozillapulse/file/tip/HACKING.md HACKING.md] file included in the mozillapulse source.


For mentored bugs, we use the User Story to provide a link back to this page, as well as any extra information for contributors, such as required knowledge/learnings.  The basic text for mentored bugs should be "This is a mentored Pulse bug.  For general information on Pulse, see https://wiki.mozilla.org/Auto-tools/Projects/Pulse, which includes a section on Contributing."  An example of extra text is "This bug also requires you to have a working mail server."


=== Status ===
==== Consuming Buildbot messages ====
 
At the moment, only BuildBot messages (BuildMessage, TestMessage) and [[BMO/ChangeNotificationSystem|SimpleBugMessages]] are being published to Pulse.


There used to be two other publishers, which have been disabled:
There are two ways to consume messages published by Buildbot.  The most direct way, which requires the most knowledge about Buildbot, is using the BuildConsumer in [http://hg.mozilla.org/automation/mozillapulse mozillapulse].  This consumer has access to all the native Buildbot messages, and therefore offers the most flexibility.


* HgPublisher: the original shim "crashed on various occasions, in particular file additions/removals/renames and merges made it go funky."
The disadvantage of using the BuildConsumer is that you need to spend time understanding what messages Buildbot publishes to pulse, and how these can vary, and associate particular messages with what you're trying to accomplish.  The format of Buildbot messages is undocumented, and can change without warning, which makes services based on the BuildConsumer potentially fragile.
** {{bug|1022701}} on file to fix and re-enable.
* BugzillaPublisher: this produced too much traffic for the original prototype system, and for security reasons it could publish only changes to public bugs, making it of questionable value.  The [[BMO/ChangeNotificationSystem|SimpleBugzillaPublisher]] is a lightweight replacement that publishes only bug ID and change time, but for all bugs, public or otherwise.


==== Consuming buildbot messages ====
To address some of these disadvantages, a translator is run against the BuildConsumer (the [https://github.com/mozilla/pulsetranslator pulsetranslator]) which re-publishes a subset of Buildbot messages to a NormalizedBuild exchange, which are available using the NormalizedBuildConsumer of mozillapulse.  The content of these messages is simplified and normalized, making it easier to consume without the need to have a thorough understanding of how Buildbot publishes messages to pulse.  The re-published messages also protect consumers against some changes to the pulse stream, although significant enough changes will likely break the pulse translator as well as direct users of BuildConsumer.
 
There are two ways to consume messages published by buildbot.  The most direct way, which requires the most knowledge about buildbot, is using the BuildConsumer in [http://hg.mozilla.org/automation/mozillapulse mozillapulse].  This consumer has access to all the native buildbot messages, and therefore offers the most flexibility.
 
The disadvantage of using the BuildConsumer is that you need to spend time understanding what messages buildbot publishes to pulse, and how these can vary, and associate particular messages with what you're trying to accomplish.  The format of buildbot messages is undocumented, and can change without warning, which makes services based on the BuildConsumer potentially fragile.
 
To address some of these disadvantages, a translator is run against the BuildConsumer (the [https://github.com/mozilla/pulsetranslator pulsetranslator]) which re-publishes a subset of buildbot messages to a NormalizedBuild exchange, which are available using the NormalizedBuildConsumer of mozillapulse.  The content of these messages is simplified and normalized, making it easier to consume without the need to have a thorough understanding of how buildbot publishes messages to pulse.  The re-published messages also protect consumers against some changes to the pulse stream, although significant enough changes will likely break the pulse translator as well as direct users of BuildConsumer.


Another advantage of the NormalizedBuildConsumer is that it will only publish messages for a given build or test job after the logs for that job are available; using the BuildConsumer directly can result in the reception of messages for a build before the build artifacts are available, which can cause problems in consumers if they don't explicitly guard against this problem.
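
A consumer that uses the BuildConsumer directly can guard against this itself by checking for the artifacts before acting on a message. The following is a minimal sketch, not part of mozillapulse: <code>guard_until_ready</code>, <code>is_ready</code> and <code>handler</code> are hypothetical names, and how you test for artifact availability (for example, requesting the log URL) is left to the caller.

```python
import time


def guard_until_ready(is_ready, handler, retries=5, delay=1.0, sleep=time.sleep):
    """Return a callback that runs handler(message) only once is_ready(message) is true.

    is_ready and handler are hypothetical, caller-supplied functions;
    nothing here depends on the format of Buildbot messages.
    """
    def guarded(message):
        for attempt in range(retries):
            if is_ready(message):
                handler(message)
                return True
            if attempt < retries - 1:
                # Back off exponentially before checking again.
                sleep(delay * 2 ** attempt)
        # The artifacts never appeared; the caller decides what to do next.
        return False
    return guarded
```

The wrapped function could then be passed wherever a message callback is expected. Returning <code>False</code> instead of raising leaves the choice of retrying later or dropping the message to the consumer.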


Generally speaking, consumers that wish to be notified when specific build or test jobs are completed should use the NormalizedBuildConsumer; consumers that need direct access to the buildbot pulse stream or are looking for non-specific jobs (such as all jobs belonging to a particular commit) should probably use the BuildConsumer.
Generally speaking, consumers that wish to be notified when specific build or test jobs are completed should use the NormalizedBuildConsumer; consumers that need direct access to the Buildbot pulse stream or are looking for non-specific jobs (such as all jobs belonging to a particular commit) should probably use the BuildConsumer.
 
=== Technology used ===
 
* The message broker used is [http://www.rabbitmq.com RabbitMQ].
* Protocol used to talk to the broker is [http://en.wikipedia.org/wiki/AMQP AMQP].
* Messages are in JSON.
* For the Python mozillapulse package, the underlying library currently used to talk AMQP is [http://kombu.readthedocs.org/ Kombu].


=== Road Map ===


See the [http://mzl.la/1pc2F3M prioritized bug list] for all open issues.
See the [http://mzl.la/1pc2F3M prioritized bug list] for all open issues and feature requests.


==== Website ====
=== Security Model ===
* {{bug|1017957}} Merge above in with PulseGuardian; no point in having two websites.
* Indicate current Pulse status (at least just up/down).
* (Maybe) Display published messages on the Pulse website (mostly decorative but also an example of use in the browser).


==== Management ====
This is summarized in the formal Pulse specification above. What follows is the rationale and some technical implementation notes.
* (Almost done!) Intelligently handle queues that start filling up.
** See [[Auto-tools/Projects/Pulse/PulseGuardian|PulseGuardian]].
 
==== Security ====
* {{done|}} Enable SSL.
** {{bug|1013980}} Enable SSL by default in clients.
** Close non-SSL port eventually?
* Move to a tighter permission model. See the Security Model section below.
 
==== Shims ====
* Re-enable hg shim?
* Add git shim?
* Other shims?
 
==== Other ====
* Upgrade RabbitMQ to latest 3.x version (ideally with zero downtime).
* Enable STOMP or some other method of accessing Pulse via the browser.
* Create a JavaScript library along the lines of the mozillapulse Python package.
 
=== Security Model ===


In order to have a reliable, well-behaved system, the following assertions will need to be true.
Line 229: Line 190:
=== Admin Procedures ===


These should largely become obsolete when PulseGuardian is deployed.
* PulseGuardian should be deleting queues that are too long. If you need to manually delete a queue, use the Management UI. Try to ping the queue owner first before killing if possible.
 
* pulsetranslator service, which normalizes Buildbot messages, is currently running on pulsetranslator.ateam.phx1.mozilla.com and may need to be reset from time to time.
* When a queue becomes stuck, you can use the Admin UI to kill it. Try to ping the queue owner first before killing if possible.  
** More than half of the queues are QA related (whimboo)
* pulsetranslator service, which normalizes buildbot messages, is currently running on pulsetranslator.ateam.phx1.mozilla.com and may need to be reset from time to time.
* logparser service, used by [http://brasstacks.mozilla.com/orangefactor/ Orange Factor], runs on orangefactor1.dmz.phx1.mozilla.com
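
Before deleting queues by hand, it can help to list which ones are too long and who owns them, so the owners can be pinged first. The following is a rough sketch only, not PulseGuardian's actual logic: the record fields (<code>name</code>, <code>owner</code>, <code>messages</code>) and the size limit are assumptions made for illustration.

```python
def overgrown_queues(queues, max_size):
    """Return (name, owner) pairs for queues holding more than max_size messages.

    Each queue is a dict with hypothetical keys "name", "owner" and "messages".
    """
    return [(q["name"], q["owner"]) for q in queues if q["messages"] > max_size]


# Illustrative data only; real values would come from the Management UI.
example = [
    {"name": "queue/alice/builds", "owner": "alice", "messages": 12},
    {"name": "queue/bob/tests", "owner": "bob", "messages": 25000},
]
for name, owner in overgrown_queues(example, max_size=10000):
    print("ping %s about %s before deleting it" % (owner, name))
```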


=== More reading ===


LegNeato wrote several blog posts on Pulse as he was building it.  They contain some more background if you're really interested.  They are linked below, in chronological order.
* [http://slides.com/mcote/pulse Slides] from a presentation on Pulse.
* [https://mrcote.info/blog/2015/02/16/pulse-update/ Update] on Pulse from 2015/02/16.
 
LegNeato also wrote several blog posts on Pulse as he was building it.  They contain some more background if you're really interested.  They are linked below, in chronological order.


* [http://christian.legnitto.com/blog/2010/07/17/mozilla-pulse-and-rabbitmq/ Mozilla Pulse and RabbitMQ]