== Mock Pulse Listener/Publisher ==
In taskcluster services we publish a lot of events to a RabbitMQ server called pulse. We do this so that anyone can hook into the stream of CI events. This is a powerful and important feature, so we test that messages are sent during our integration tests. However, this means that in order to run the tests you must have pulse credentials. That isn't a huge issue, as credentials can be made for anyone, but it limits our ability to test PRs from untrusted repositories and makes running tests harder. Hence, it would be useful to add a mock mode to our PulseListener and publisher. Then we could run tests without pulse credentials for PRs, and with credentials for pushes, as an integration step after merging. This is also an opportunity to clean up some of the older parts of the code base and improve overall stability.
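A rough sketch of what a mock mode could look like, assuming an in-memory "bus" stands in for RabbitMQ; the class and method names here are hypothetical, not the existing listener/publisher API:

<syntaxhighlight lang="python">
# Minimal in-memory sketch of a mock pulse publisher/listener pair.
# MockPulsePublisher, MockPulseListener, bind and publish are hypothetical
# names for illustration, not the actual taskcluster client API.
import fnmatch
from collections import defaultdict


class MockPulseBus:
    """Shared in-memory 'broker' wiring publishers to listeners."""

    def __init__(self):
        self.subscriptions = defaultdict(list)

    def subscribe(self, exchange, pattern, callback):
        self.subscriptions[exchange].append((pattern, callback))

    def deliver(self, exchange, routing_key, payload):
        for pattern, callback in self.subscriptions[exchange]:
            if fnmatch.fnmatch(routing_key, pattern):
                callback({'routing_key': routing_key, 'payload': payload})


class MockPulsePublisher:
    """Hands messages to in-process listeners instead of RabbitMQ."""

    def __init__(self, bus):
        self.bus = bus

    def publish(self, exchange, routing_key, payload):
        self.bus.deliver(exchange, routing_key, payload)


class MockPulseListener:
    """Records matching messages so tests can assert they were 'sent'."""

    def __init__(self, bus):
        self.bus = bus
        self.messages = []

    def bind(self, exchange, routing_key_pattern):
        # '#' and '*' are AMQP wildcards; fnmatch is a rough stand-in here.
        self.bus.subscribe(exchange, routing_key_pattern.replace('#', '*'),
                           self.messages.append)


# In a test: no pulse credentials needed.
bus = MockPulseBus()
listener = MockPulseListener(bus)
listener.bind('exchange/taskcluster-queue/v1/task-completed', 'route.#')
MockPulsePublisher(bus).publish(
    'exchange/taskcluster-queue/v1/task-completed',
    'route.index.gecko', {'status': 'ok'})
assert listener.messages[0]['payload'] == {'status': 'ok'}
</syntaxhighlight>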
== In-Task Metrics ==
Inside tasks running on taskcluster we perform a lot of steps that would be interesting to measure, such as "time to clone gecko", "firefox build time", or "how often we have a clobber build". It would be convenient for task-writers to record these metrics by printing special annotations in the log, like <code>### BEGIN my-metric-name</code> and <code>### END my-metric-name</code>.
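For illustration, a task could wrap a step in a small helper that prints the proposed markers; the helper name and the command being timed are made up:

<syntaxhighlight lang="python">
# Hypothetical helper a task could use to emit the proposed annotations.
import contextlib
import subprocess


@contextlib.contextmanager
def metric(name):
    # The worker only needs to see the marker lines; timing comes from
    # whenever they show up in the live log.
    print('### BEGIN %s' % name, flush=True)
    try:
        yield
    finally:
        print('### END %s' % name, flush=True)


# Example step; any expensive command a task runs could be wrapped this way.
with metric('clone-gecko'):
    subprocess.check_call(
        ['hg', 'clone', 'https://hg.mozilla.org/mozilla-unified', 'gecko'])
</syntaxhighlight>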
If workers extracted such annotations along with timestamps and reported them to a service that aggregated them, we would be able to easily build statistics on many different things. The service aggregating these metrics would have to index by when the metric was recorded, as well as by the <code>task.tags</code> of the task the metric was recorded from, so that we can slice and dice a metric by tags.
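On the worker side, extraction could be as simple as pairing BEGIN/END markers and subtracting timestamps. This sketch assumes the worker timestamps each log line as it streams by (the exact line format is made up); the resulting durations would then be reported to the aggregation service together with <code>task.tags</code>:

<syntaxhighlight lang="python">
# Sketch of worker-side extraction, assuming timestamped log lines.
import re
from datetime import datetime

MARKER = re.compile(r'^(\S+) ### (BEGIN|END) (\S+)')


def extract_metrics(log_lines):
    """Yield (metric_name, duration_seconds) for matched BEGIN/END pairs."""
    starts = {}
    for line in log_lines:
        match = MARKER.match(line)
        if not match:
            continue
        timestamp, kind, name = match.groups()
        when = datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%SZ')
        if kind == 'BEGIN':
            starts[name] = when
        elif name in starts:
            yield name, (when - starts.pop(name)).total_seconds()


log = [
    '2017-03-01T12:00:00Z ### BEGIN firefox-build-time',
    '2017-03-01T12:00:00Z compiling...',
    '2017-03-01T12:41:30Z ### END firefox-build-time',
]
print(dict(extract_metrics(log)))  # {'firefox-build-time': 2490.0}
</syntaxhighlight>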
As an example we might want to look at median, 95th percentile and mean for the <code>firefox-build-time</code> metric over all tasks with tags <code>level=*</code>, <code>kind=debug</code> and <code>platform=linux64</code>. | |||
Extracting metrics from logs is a bit of work, but the hard part would be to index and aggregate the metrics in a scalable manner.
Presumably, we would have to throw everything into a relational database, or perhaps a time series database like influxdb.
It might also be worthwhile to look at data warehouse solutions for inspiration, or to look into options for on-the-fly aggregation using t-digests, granted that probably won't work considering the explosive dimensionality of <code>task.tags</code>.
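As a rough sketch of the relational-database option, one table of metric values plus one table of (metric, tag) rows would already let us slice a metric by arbitrary tags; the schema and names below are made up for illustration, and the percentiles are computed client-side for simplicity:

<syntaxhighlight lang="python">
# Made-up schema: one row per metric value, one row per (metric, tag) pair.
import sqlite3
import statistics

db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE metrics (id INTEGER PRIMARY KEY, name TEXT, value REAL, recorded_at TEXT);
    CREATE TABLE metric_tags (metric_id INTEGER REFERENCES metrics(id), tag TEXT, value TEXT);
    CREATE INDEX metric_tags_idx ON metric_tags (tag, value, metric_id);
''')


def record(name, value, recorded_at, tags):
    cur = db.execute('INSERT INTO metrics (name, value, recorded_at) VALUES (?, ?, ?)',
                     (name, value, recorded_at))
    db.executemany('INSERT INTO metric_tags VALUES (?, ?, ?)',
                   [(cur.lastrowid, k, v) for k, v in tags.items()])


record('firefox-build-time', 2490, '2017-03-01T12:41:30Z',
       {'level': '3', 'kind': 'debug', 'platform': 'linux64'})
record('firefox-build-time', 2700, '2017-03-01T13:10:00Z',
       {'level': '1', 'kind': 'debug', 'platform': 'linux64'})

# Slice the metric by tags: all debug linux64 builds, any level.
rows = db.execute('''
    SELECT m.value FROM metrics m
    JOIN metric_tags k ON k.metric_id = m.id AND k.tag = 'kind' AND k.value = 'debug'
    JOIN metric_tags p ON p.metric_id = m.id AND p.tag = 'platform' AND p.value = 'linux64'
    WHERE m.name = 'firefox-build-time'
''').fetchall()
values = sorted(v for (v,) in rows)
print('median', statistics.median(values),
      'mean', statistics.mean(values),
      'p95', values[min(len(values) - 1, int(0.95 * len(values)))])
</syntaxhighlight>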
= Guidelines =