CloudServices/Sagrada/Metlog: Difference between revisions

The proposed Services Metrics architecture will consist of 3 layers:


; generator : The generator portion of the system is the actual service application that is generating the data that is to be sent into the system.  We will provide libraries (described below) that app authors can use to easily plug in.  The libraries will take messages generated by the applications, serialize them, and then send them out to the listener using ZeroMQ.  The metrics generating apps that need to be supported initially are based on the following platforms:
; generator : The generator portion of the system is the actual service application that is generating the data that is to be sent into the system.  We will provide libraries (described below) that app authors can use to easily plug in.  The libraries will take messages generated by the applications, serialize them, and then send them out "fire and forget" style over UDP (zeromq?).  The metrics generating apps that need to be supported initially are based on the following platforms:
* Mozilla Services team's Python app framework (sync, reg, sreg, message queue, etc.)
* Node.js (BrowserID).


; router : The router is what will be listening for the messages sent out by the provided libraries.  It will deserialize these messages and examine the metadata to determine the appropriate back end(s) to which each message should be delivered.  The format and protocol for delivering these messages to the endpoints will vary from back end to back end.  We plan on using [http://logstash.net/ logstash] as the message router, because it is already planned to be installed on every services server machine, and it is built specifically for this type of event-based message routing.
; router : The router is what will be listening for the UDP packets sent out by the provided libraries.  It will deserialize these messages and examine the metadata to determine the appropriate back end(s) to which each message should be delivered.  The format and protocol for delivering these messages to the endpoints will vary from back end to back end.  We plan on using [http://logstash.net/ logstash] as the message router, because it is already planned to be installed on every services server machine, and it is built specifically for this type of event-based message routing.


; endpoints : Different types of messages lend themselves to different types of presentation, processing, and analytics.  We will start with a small selection of back end destinations, but we will be able to add to this over time as we generate more types of metrics data and we spin up more presentation and query layers.  Proposed back ends are as follows:
* [https://github.com/etsy/statsd statsd]: '''(Phase 1)''' statsd is already in the pipeline to be running on every Services machine
* [https://github.com/mozilla-metrics/bagheera Bagheera]: '''(Phase 1)''' Bagheera is a REST service provided by the Mozilla Metrics team that will insert data into the Metrics team's Hadoop infrastructure, available for later processing.
* [https://github.com/dcramer/django-sentry Sentry]: '''(Phase 2)''' Sentry is an exception logging infrastructure that provides useful debugging tools to service app developers.  No Mozilla operations team currently plans to provide Sentry, so using it would require buy-in from and coordination with a Mozilla internal service provider (probably the Services Ops team).
* [https://github.com/dcramer/django-sentry Sentry]: '''(Phase 1)''' Sentry is an exception logging infrastructure that provides useful debugging tools to service app developers.  No Mozilla operations team currently plans to provide Sentry, so using it would require buy-in from and coordination with a Mozilla internal service provider (probably the Services Ops team).
* [http://esper.codehaus.org/ Esper]: '''(Phase 2)''' System for "complex event processing", i.e. one which will watch various statistic streams in real time looking for anomalous behavior.
* [http://www.arcsight.com/products/products-esm/ ArcSight ESM]: '''(Phase 2)''' Security risk analysis engine.
* [http://opentsdb.net/ OpenTSDB]: '''(Phase 2)''' A "Time Series Database" providing fine-grained real-time monitoring and graphing.
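
The router's job described above (deserialize, inspect, pick back ends) could be sketched roughly as follows. This is only an illustration: the actual routing is planned to be done by logstash, the wire format is assumed here to be JSON (the proposal does not specify one), and the type names and the <code>ROUTES</code> table are hypothetical.

```python
import json

# Hypothetical mapping from message type to back-end names. In the
# proposed architecture these rules would live in the router's
# (logstash) configuration, not in Python.
ROUTES = {
    "timer": ["statsd"],
    "counter": ["statsd"],
    "exception": ["sentry"],
}
# Assumed fallback for message types that have no explicit route.
DEFAULT_BACKENDS = ["bagheera"]


def route(raw_message):
    """Deserialize one message and return (message, back-end names)."""
    message = json.loads(raw_message)
    backends = ROUTES.get(message.get("type"), DEFAULT_BACKENDS)
    return message, backends


raw = json.dumps({"type": "timer", "logger": "sync", "payload": "42"})
message, backends = route(raw)
print(backends)
```

Each back end would then receive the message in whatever format and protocol it expects, which is the part that varies from back end to back end.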


== Proposed API ==
The atomic unit for the Services Metrics system is the "message".  The structure of a message is inspired by that of the well-known syslog message standard, with some slight extensions to allow for richer metadata.  Each message will consist of the following fields:


* timestamp: Time at which the message is generated.
* ''timestamp'': Time at which the message is generated.
* logger: String token identifying the message generator, such as the name of the service application in question.
* ''logger'': String token identifying the message generator, such as the name of the service application in question.
* severity: Numerical code from 0-7 indicating the severity of the message, as defined by [https://tools.ietf.org/html/rfc5424 RFC 5424].
* ''type'': String token identifying the type of message payload
* message: Message text payload.
* ''severity'': Numerical code from 0-7 indicating the severity of the message, as defined by [https://tools.ietf.org/html/rfc5424 RFC 5424].
* metadata: Arbitrary set of key/value pairs that indicates the type of message that is being generated and includes any additional data that may be useful for back end reporting or analysis.
* ''payload'': Actual message contents.
* ''tags'': Arbitrary set of key/value pairs that includes any additional data that may be useful for back end reporting or analysis.
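
As a rough illustration of the field list above, a message could be represented as a plain dictionary and serialized before sending. The helper name <code>build_message</code>, the use of JSON, and the epoch-seconds timestamp are all assumptions made for this sketch; the proposal does not fix a serialization or timestamp format.

```python
import json
import time


def build_message(msg_type, payload="", logger="", severity=6,
                  tags=None, timestamp=None):
    """Assemble one message with the six fields listed above."""
    # RFC 5424 severities run from 0 (emergency) to 7 (debug).
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return {
        # Assumed format: seconds since the epoch.
        "timestamp": time.time() if timestamp is None else timestamp,
        "logger": logger,
        "type": msg_type,
        "severity": severity,
        "payload": payload,
        "tags": {} if tags is None else dict(tags),
    }


msg = build_message("timer", payload="12", logger="sync",
                    tags={"name": "db.query"})
wire = json.dumps(msg)  # assumed wire format
print(sorted(json.loads(wire).keys()))
```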


We will provide a "metlog" library that will both ease generation of these messages and handle packaging them up and delivering them (via ZeroMQ) into the message processing infrastructure.  Implementations of this library will likely be available in both Python and JavaScript, but the Python library will be available first and this document will, for now, only describe the Python API.  The JavaScript API will be similar, modulo syntactic sugar that is available in Python but not in JS (e.g. decorators, context managers), and will be documented in detail in the future.  The proposed Python API is as follows:
We will provide a "metlog" library that will both ease generation of these messages and handle packaging them up and delivering them (via UDP) into the message processing infrastructure.  Implementations of this library will likely be available in both Python and JavaScript, but the Python library will be available first and this document will, for now, only describe the Python API.  The JavaScript API will be similar, modulo syntactic sugar that is available in Python but not in JS (e.g. decorators, context managers), and will be documented in detail in the future.  The proposed Python API is as follows:


; '''MetlogClient(host, port, logger="", severity=6)''' : Primary metlog client class which can accept metlog messages and will deliver them to the message processor listening at the specified ''host'' and ''port''.  The provided ''logger'' and ''severity'' values will be used by default for all subsequent ''metlog'' method calls which do not explicitly pass other values.
; '''MetlogClient(bindstrs, logger="", severity=6)''' : Primary metlog client class which can accept metlog messages and will deliver them to the message processor.


; '''MetlogClient.set_message_flavor(flavor_name, metadata)''' : The metadata for a given message can be used to label and categorize that message. This method expects a string value ''flavor_name'' and a dictionary ''metadata''. The flavor name value can be passed in as a ''flavor'' to subsequent ''metlog'' calls as shorthand for including the specified metadata in the outgoing message.
* ''bindstrs'': A string (or a sequence of strings) representing the location of the upstream message processor.  By default these should be ZeroMQ bind strings.
* ''logger'': Default for all subsequent ''metlog'' calls which do not explicitly pass this value.
* ''severity'': Default for all subsequent ''metlog'' calls which do not explicitly pass this value.
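
The "string or sequence of strings" behaviour of ''bindstrs'' might be handled as in the sketch below. This is not the real library: the class only stores its configuration and opens no sockets, and the endpoint strings in the usage lines are made-up examples in ZeroMQ's <code>transport://address</code> style.

```python
class MetlogClient(object):
    """Configuration-only sketch of the proposed constructor."""

    def __init__(self, bindstrs, logger="", severity=6):
        # Accept a single bind string or any sequence of them.
        if isinstance(bindstrs, str):
            bindstrs = [bindstrs]
        self.bindstrs = list(bindstrs)
        # Defaults used by later calls that do not pass these values.
        self.logger = logger
        self.severity = severity


# Hypothetical endpoints, for illustration only.
single = MetlogClient("tcp://127.0.0.1:5565", logger="sync")
several = MetlogClient(["tcp://10.0.0.1:5565", "tcp://10.0.0.2:5565"])
print(single.bindstrs, several.bindstrs)
```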


; '''MetlogClient.metlog(timestamp=None, logger=None, severity=None, message="", metadata=None, flavors=None)''' : Sends a single log message to the previously specified metlog listener. Most of the arguments correspond to the message fields described above. None of them are strictly required, but most of them will be populated by reasonable defaults if they aren't provided:
; '''MetlogClient.metlog(type, timestamp=None, logger=None, severity=None, message="", tags=None)''' : Sends a single log message along to the metlog listener(s). Most of the arguments correspond to the message fields described above. Only ''type'' is strictly required; the rest will be populated by reasonable defaults if they aren't provided:


* ''timestamp'': Defaults to current system time
* ''severity'': Defaults to the current value of MetlogClient.severity
* ''message'': Defaults to an empty string
* ''metadata'': Defaults to an empty dictionary
* ''tags'': Defaults to an empty dictionary
* ''flavors'': Any specified flavors will cause this message's metadata value to be updated to contain the flavor's metadata; defaults to an empty list
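
The default-filling rules above can be illustrated with a stand-in client that records messages in a list instead of sending them. <code>SketchClient</code> and its <code>sent</code> attribute are hypothetical names used only for this sketch, which follows the ''tags''-based signature.

```python
import time


class SketchClient(object):
    """Stand-in for MetlogClient that records messages locally."""

    def __init__(self, logger="", severity=6):
        self.logger = logger
        self.severity = severity
        self.sent = []  # where a real client would hand off to the sender

    # `type` mirrors the proposed argument name, although it shadows
    # the Python builtin inside this method.
    def metlog(self, type, timestamp=None, logger=None, severity=None,
               message="", tags=None):
        full = {
            "type": type,
            "timestamp": time.time() if timestamp is None else timestamp,
            "logger": self.logger if logger is None else logger,
            "severity": self.severity if severity is None else severity,
            "payload": message,
            "tags": {} if tags is None else dict(tags),
        }
        self.sent.append(full)
        return full


client = SketchClient(logger="sync", severity=6)
client.metlog("login", message="user authenticated")
client.metlog("login", severity=3, tags={"outcome": "failed"})
print(client.sent[1]["severity"])
```

Comparing against <code>None</code> rather than testing truthiness matters here: a severity of 0 (emergency) is a valid explicit value and should not be replaced by the client default.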


; '''MetlogClient.timer(name, timestamp=None, logger=None, severity=None, metadata=None, flavors=None, rate=1)''' : Can be used as either a context manager or a decorator.  Will calculate the time required to execute the enclosed code, and will generate and send a metlog message containing the timing information upon completion.  ''name'' is a required string label for the timer that will be added to the message metadata.  ''rate'' represents what fraction of these invocations should actually be timed, so a value of 0.3 would mean that the code would be timed and the results sent off approximately 30% of the time it was executed.
; '''MetlogClient.timer(name, timestamp=None, logger=None, severity=None, tags=None, rate=1)''' : Can be used as either a context manager or a decorator.  Will calculate the time required to execute the enclosed code, and will generate and send a metlog message (of type "timer") containing the timing information upon completion.


; '''MetlogClient.incr(name, timestamp=None, logger=None, severity=None, metadata=None, flavors=None)''' : Sends an "increment counter" message to metlog.  ''name'' is a required string label for the counter that will be added to the message metadata.
* ''name'': A required string label for the timer that will be added to the message tags
* ''timestamp'': Defaults to current system time
* ''logger'': Defaults to the current value of MetlogClient.logger
* ''severity'': Defaults to the current value of MetlogClient.severity
* ''tags'': Defaults to an empty dictionary
* ''rate'': Represents what fraction of these invocations should actually be timed; a value of 0.3 would mean that the code would be timed and the results sent off approximately 30% of the time it was executed
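
One way an object can serve as both a context manager and a decorator, with ''rate''-based sampling, is sketched below. The <code>Timer</code> class, its <code>send</code> callback, the millisecond payload, and the tag names are assumptions for illustration; the proposal only states that the message has type "timer" and carries the timing information.

```python
import functools
import random
import time


class Timer(object):
    """Times a block or function and reports via a `send` callable."""

    def __init__(self, send, name, rate=1):
        self.send = send
        self.name = name
        self.rate = rate

    def __enter__(self):
        # Sampling: only a `rate` fraction of invocations is timed.
        self.active = self.rate >= 1 or random.random() < self.rate
        if self.active:
            self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.active:
            elapsed_ms = (time.perf_counter() - self.start) * 1000.0
            self.send({
                "type": "timer",
                "payload": str(elapsed_ms),  # assumed payload format
                "tags": {"name": self.name, "rate": self.rate},
            })
        return False  # never swallow exceptions from the timed code

    def __call__(self, fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # A fresh Timer per call keeps the decorator re-entrant.
            with Timer(self.send, self.name, self.rate):
                return fn(*args, **kwargs)
        return wrapper


sent = []

with Timer(sent.append, "block"):
    sum(range(1000))


@Timer(sent.append, "add")
def add(a, b):
    return a + b


result = add(2, 3)
print(len(sent), result)
```

Including the rate in the message lets a back end such as statsd scale sampled measurements back up, though whether the real library does so is not stated in the proposal.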
 
; '''MetlogClient.incr(name, timestamp=None, logger=None, severity=None, tags=None)''' : Sends an "increment counter" message to metlog.  ''name'' is a required string label for the counter that will be added to the message tags.
 
* ''name'': A required string label for the counter that will be added to the message tags
* ''timestamp'': Defaults to current system time
* ''logger'': Defaults to the current value of MetlogClient.logger
* ''severity'': Defaults to the current value of MetlogClient.severity
* ''tags'': Defaults to an empty dictionary
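
To show why the counter ''name'' travels in the message tags, the sketch below builds "increment counter" messages and tallies them the way a statsd-like back end might. The <code>incr_message</code> helper, the "counter" type string, and the payload of "1" are assumptions; the proposal does not specify them.

```python
import collections
import time


def incr_message(name, logger="", severity=6, tags=None, timestamp=None):
    """Build a hypothetical 'increment counter' message."""
    merged = {} if tags is None else dict(tags)
    merged["name"] = name  # the counter label goes into the tags
    return {
        "type": "counter",  # assumed type string
        "timestamp": time.time() if timestamp is None else timestamp,
        "logger": logger,
        "severity": severity,
        "payload": "1",     # assumed: each message increments by one
        "tags": merged,
    }


messages = [
    incr_message("logins", logger="sync"),
    incr_message("logins", logger="sync"),
    incr_message("errors", logger="sync", tags={"code": "503"}),
]

# A back end can aggregate purely from the tags and payload.
totals = collections.Counter()
for m in messages:
    totals[m["tags"]["name"]] += int(m["payload"])
print(dict(totals))
```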


== Use Cases ==