Taskcluster/Round Tuit Box
That's a great idea, let's do that when we get a round tuit!


All of these projects are long-term, in-depth work that is important for TaskCluster's future, but for which we don't have paid staff right now.  These are *not* for beginners!  We will be happy to help out, but if you want to get started on something here, please plan to be in it for the long haul.
We used to have a nice page here for ideas like this -- good ideas that we just don't have time to work on right now.  Those ideas are now kept and discussed in the [https://github.com/taskcluster/taskcluster-rfcs taskcluster-RFCs project on Github] instead.
 
= Ideas =
 
== Auth Service Refactoring ==
 
This includes a few updates to the authentication service:
 
* optimization of DFA generation (simpler? smarter?)
* have callers send scope sets to the auth service and get back a boolean, rather than auth.authenticateHawk returning the full set of the user's scopes (which are getting large)
* security policies (a way to assert that user or client __ does *not* have scope __)
* parameterized roles:
** 'project-admin:%': ['auth:create-client:project/%/*', ..]
** With the added hack that 'project-admin:*' -> ['auth:create-client:project/*', ...], i.e., drop the trailing slash and parameter
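The star-suffix scope semantics and the proposed parameterized expansion can be sketched as follows. This is a minimal illustration in Python; the function names and the exact expansion rule are assumptions for the sake of the example, not the shipped algorithm, and the trailing-star hack above is omitted:

```python
def satisfies(have, required):
    """True if scope `have` grants scope `required` under the star-suffix
    rule: 'a:b:*' grants any scope beginning with 'a:b:'."""
    if have == required:
        return True
    return have.endswith("*") and required.startswith(have[:-1])

def expand_parameterized(role_pattern, scopes, role_id):
    """Expand a parameterized role like 'project-admin:%' for a concrete
    role_id such as 'project-admin:foo', substituting the matched text
    for '%' in each of the role's scopes.  Hypothetical sketch only."""
    prefix = role_pattern.split("%")[0]
    param = role_id[len(prefix):]
    return [s.replace("%", param) for s in scopes]
```

For example, `expand_parameterized('project-admin:%', ['auth:create-client:project/%/*'], 'project-admin:foo')` would yield `['auth:create-client:project/foo/*']` under these assumed semantics.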
 
== Live-Log Proxy ==
 
Write a server that privileged clients can open an HTTPS connection to in order to expose a webhook that HTTP clients can call.
 
When normal HTTP clients access the exposed webhooks, the connection is reverse-proxied to the privileged clients over their outgoing connection.
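The core bookkeeping can be sketched with an in-memory broker, where a registered handler stands in for a privileged client's long-lived outgoing connection. All names and the URL scheme here are hypothetical; a real implementation would hold open WebSocket or HTTPS connections:

```python
class WebhookBroker:
    """Sketch of the proxy's core state: privileged clients register a
    handler for a hook id; public requests to the exposed URL are
    forwarded to that handler and its response is relayed back."""
    def __init__(self):
        self.handlers = {}

    def register(self, hook_id, handler):
        """Called over the privileged client's outgoing connection."""
        self.handlers[hook_id] = handler
        return f"https://livelog.example.com/hook/{hook_id}"  # hypothetical URL

    def dispatch(self, hook_id, request_body):
        """Called when a public HTTP client hits the exposed webhook."""
        handler = self.handlers.get(hook_id)
        if handler is None:
            return 404, "no such webhook"
        return 200, handler(request_body)

broker = WebhookBroker()
url = broker.register("task-123", lambda body: f"log line for {body}")
status, resp = broker.dispatch("task-123", "GET /log")
```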
 
== Login Refactoring ==
 
Currently, taskcluster-login and taskcluster-tools are two different projects.  Let's combine them into one site, with the login service invoked only via API calls.  In the process, integrate more tightly with the Mozilla-IAM team, including allowing users to authenticate with their Github credentials.
 
Also, provide a means for users to increase their privileges through [https://bugzilla.mozilla.org/show_bug.cgi?id=1312915 some sort of "sudo" mechanism].
 
Similarly, allow users who have access to certain github or hg repositories to gain the privileges associated with those repositories.
 
== TaskCluster To-Go ==
 
TaskCluster is pretty much a single-deployment service right now.  While each of the individual services can be run independently, there's really no practical method of deploying another TaskCluster instance, say at another company or organization.  This prevents certain users from adopting TaskCluster, as they are then beholden to Mozilla to continue to support the service.
 
Build a method to go from zero to a fully operational TaskCluster installation.
 
== Queue Datamining ==
 
TaskCluster provides very little information on tasks that are in the queue right now, or on tasks that have been completed.  It's difficult to even answer simple questions like "what portion of our tasks are gecko-related?"
 
Let's dump task information into a good database that would allow the right kind of flexible queries to answer questions like this.
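Once task metadata lives in a relational store, the "portion of gecko-related tasks" question becomes a one-line query. A sketch using SQLite with an invented table layout (the real dump would carry far more task fields):

```python
import sqlite3

# Hypothetical task table; real rows would be dumped from the queue.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (task_id TEXT, project TEXT, state TEXT)")
db.executemany("INSERT INTO tasks VALUES (?, ?, ?)", [
    ("t1", "gecko", "completed"),
    ("t2", "gecko", "running"),
    ("t3", "nss", "completed"),
    ("t4", "servo", "failed"),
])

# "What portion of our tasks are gecko-related?"
(fraction,) = db.execute(
    "SELECT 1.0 * SUM(project = 'gecko') / COUNT(*) FROM tasks"
).fetchone()
```

With the sample rows above, `fraction` comes out to 0.5.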
 
== Artifact Downloader ==
 
Downloading artifacts reliably is a constant problem.  Every time someone decides to use a TaskCluster artifact for some purpose, they do the obvious: <tt>curl https://queue.taskcluster.net/v1/...</tt>.  Then that download fails and their job explodes.  So they add <tt>--fail</tt>, which doesn't actually do what it promises.  So they rewrite in Python with urllib2.  Which raises unexpected exceptions, or fails to follow redirects, or times out too quickly.  And so on.
 
Let's build a simple, reliable, automatically-retrying artifact-downloading client that is easy to deploy everywhere we need it (so, in a compiled language like Rust or Go) that we can then encourage all users to take advantage of.
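The proposal calls for a compiled language, but the retry logic itself is small. Purely for illustration, a Python sketch where `fetch` stands in for the real HTTP GET (which would also need redirect handling and sensible timeouts):

```python
import time

def download_with_retries(fetch, url, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry `fetch(url)` with exponential backoff.  Any exception counts
    as a retryable failure until the attempts are exhausted, at which
    point the last error is re-raised."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as err:
            last_error = err
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise last_error
```

A real client would also distinguish retryable failures (timeouts, 5xx) from permanent ones (404), which this sketch glosses over.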
 
== Docker and Native Engines for TaskCluster-Worker ==
 
We have a bunch of worker implementations now.  In theory, we want to converge on TaskCluster-Worker, but we haven't found the time, so we are still running Generic-Worker and Docker-Worker.  So let's implement the functionality of those workers within TaskCluster-Worker, and then migrate all uses of the old workers and deprecate them.
 
== New Provisioner Architecture ==
 
The AWS provisioner is specific to AWS, and mixes the concerns of predicting required load with bidding for and monitoring running EC2 instances.  We would like to be able to support other clouds, as well as on-premises hardware, all of which have different behaviors.  And we would like to be able to use more sophisticated load-prediction algorithms.
 
== Portable Decision Tasks ==
 
In the Gecko tree, we have a sophisticated decision task that makes the task graph.
 
We should provide some support for building a similar system for other projects.  That should be some basic building blocks on which those projects can build, since their needs will be different from Gecko.  For example, perhaps a Github repo with some basic decision-task code, and a docker image that can run it.  Then projects can fork that repo and get some tasks running quickly, then modify it to suit their needs.
 
== Mock Pulse Listener/Publisher ==
In taskcluster services we publish a lot of events to a RabbitMQ server called pulse. We do this so that anyone can hook into the event stream and feed off CI events. This is a powerful and important feature, so we test that messages are sent during our integration tests. However, this means that in order to run tests you must have pulse credentials. This isn't a huge issue, as these can be made for anyone, but it limits our ability to test PRs from untrusted repositories and makes running tests harder. Hence, it would be useful to make a mock mode for our PulseListener and publisher. Then we could run tests without pulse credentials for PRs, and with credentials for pushes as an integration step after merging. This is also an opportunity to clean up some of the older parts of the code base to improve overall stability.
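One shape a mock mode might take is a test double that, instead of publishing to RabbitMQ, records what would have been sent so tests can assert on it. The interface below is assumed for illustration, not taken from the code base:

```python
class MockPulsePublisher:
    """Drop-in stand-in for a pulse publisher: records
    (exchange, routing_key, message) tuples instead of talking to
    RabbitMQ, so tests need no pulse credentials."""
    def __init__(self):
        self.published = []

    def publish(self, exchange, routing_key, message):
        self.published.append((exchange, routing_key, message))

pub = MockPulsePublisher()
pub.publish("exchange/taskcluster-queue/v1/task-completed",
            "primary.task-123", {"status": "completed"})
```

Tests would then assert on `pub.published` rather than listening on a real queue; the real publisher would be swapped in for post-merge integration runs.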
 
== In-Task Metrics ==
Tasks running on taskcluster involve a lot of steps that would be interesting to measure: things like "time to clone gecko", "firefox build time", or "how often do we have a clobber build". It would be convenient for task-writers to record these metrics by printing special annotations in the log, like <code>### BEGIN my-metric-name</code> and <code>### END my-metric-name</code>.
If workers extracted such annotations along with timestamps and reported them to a service that aggregated them, we could easily build statistics on many different things. The service aggregating these metrics would have to index by when the metric was recorded, as well as by the <code>task.tags</code> of the task the metric was recorded from, so that we can slice and dice a metric by tags.
As an example, we might want to look at the median, 95th percentile, and mean of the <code>firefox-build-time</code> metric over all tasks with tags <code>level=*</code>, <code>kind=debug</code>, and <code>platform=linux64</code>.
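The extraction step a worker might perform can be sketched as follows; the <code>&lt;seconds&gt; &lt;message&gt;</code> log format here is invented for illustration (real workers would use their own timestamped log lines):

```python
def extract_metrics(lines):
    """Parse '### BEGIN <name>' / '### END <name>' annotations out of a
    timestamped log and return {metric_name: duration_in_seconds}.
    Unmatched BEGINs are simply dropped."""
    starts, durations = {}, {}
    for line in lines:
        ts, _, msg = line.strip().partition(" ")
        if msg.startswith("### BEGIN "):
            starts[msg[len("### BEGIN "):].strip()] = float(ts)
        elif msg.startswith("### END "):
            name = msg[len("### END "):].strip()
            if name in starts:
                durations[name] = float(ts) - starts.pop(name)
    return durations

log = [
    "0.0 ### BEGIN clone-gecko",
    "42.5 ### END clone-gecko",
    "43.0 ### BEGIN firefox-build",
    "1643.0 ### END firefox-build",
]
metrics = extract_metrics(log)
```

For the sample log, `metrics` comes out as `{"clone-gecko": 42.5, "firefox-build": 1600.0}`; the aggregation service would then attach <code>task.tags</code> to each measurement.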
 
Extracting metrics from logs is a bit of work; the hard part would be to index and aggregate the metrics in a scalable manner.
Presumably, we would have to throw everything in a relational database, or perhaps a time-series database like influxdb.
It might also be worthwhile to look at data warehouse solutions for inspiration, or to look into options for on-the-fly aggregation using t-digests, granted that probably won't work considering the explosive dimensionality of <code>task.tags</code>.
 
= Guidelines =
 
The ideas listed here should be multi-month projects, but should not last forever (so, "build XYZ", but not "maintain ABC indefinitely").
 
They should be reasonably well-defined, and not require blue-sky design or unproven technologies (so, no quantum computing, sorry).
 
Avoid linking to discussions.  Link to (or just include here) succinct descriptions of the current thinking about the idea.