Buildbot <-> Taskcluster Bridge
Why?
Allows for a graceful transition from Buildbot to Taskcluster
Lets us reimplement Schedulers as Task Graphs
Lets us port individual Builds to Taskcluster Tasks if/when it makes sense to
Build Promotion requires intermixed Task Graphs, and is one of the driving forces behind the Bridge
High Level Architecture
The Bridge acts as both a Taskcluster Worker and a Provisioner, but delegates everything to Buildbot
Three services: Taskcluster Listener, Buildbot Listener, Reflector
Small database, specific to the Bridge, that associates BuildRequests with Task IDs and Run IDs
All multihomed
Dependent on Pulse, SchedulerDB, Self-serve
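As a rough illustration of the Bridge database, the sketch below uses an in-memory SQLite table. The table name, column names, and helper functions are all hypothetical; the slides only say that the database associates BuildRequests with Task IDs and Run IDs.

```python
import sqlite3

# Hypothetical schema: one row per BuildRequest, pointing at the Task run it mirrors.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tasks (
        build_request_id INTEGER PRIMARY KEY,
        task_id          TEXT    NOT NULL UNIQUE,
        run_id           INTEGER NOT NULL
    )
    """
)


def record_task(conn, build_request_id, task_id, run_id):
    """Remember which Task (and which run of it) a BuildRequest stands for."""
    conn.execute(
        "INSERT INTO tasks (build_request_id, task_id, run_id) VALUES (?, ?, ?)",
        (build_request_id, task_id, run_id),
    )


def lookup_by_build_request(conn, build_request_id):
    """Return (task_id, run_id) for a BuildRequest, or None if it is unknown."""
    return conn.execute(
        "SELECT task_id, run_id FROM tasks WHERE build_request_id = ?",
        (build_request_id,),
    ).fetchone()


record_task(conn, 101, "abc123", 0)
```

A lookup in the other direction (Task ID to BuildRequest) would work the same way, since both columns are unique in this sketch.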
Taskcluster Listener
Reacts to events on Taskcluster Pulse exchanges
Creates BuildRequests in response to Tasks becoming pending
Cancels Builds and BuildRequests when Tasks are cancelled
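The two reactions above can be sketched as a small dispatch function. The event shape, the `RecordingBuildbot` stand-in, and its method names are assumptions made for illustration, not the Bridge's actual interfaces. Treating every task-exception event as a cancellation is also a simplification.

```python
class RecordingBuildbot:
    """Hypothetical stand-in for SchedulerDB / Self-serve; it only records requests."""

    def __init__(self):
        self.created = []
        self.cancelled = []

    def create_build_request(self, task_id):
        self.created.append(task_id)

    def cancel(self, task_id):
        self.cancelled.append(task_id)


def handle_taskcluster_event(event, buildbot):
    """Route one Taskcluster event (assumed dict shape) to the matching Buildbot action."""
    kind = event["kind"]
    if kind == "task-pending":
        buildbot.create_build_request(event["task_id"])
    elif kind == "task-exception":
        # Simplification: assume the exception means the Task was cancelled.
        buildbot.cancel(event["task_id"])
    # Any other event kind is ignored in this sketch.


buildbot = RecordingBuildbot()
handle_taskcluster_event({"kind": "task-pending", "task_id": "t1"}, buildbot)
handle_taskcluster_event({"kind": "task-exception", "task_id": "t1"}, buildbot)
```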
Buildbot Listener
Reacts to events on Buildbot Pulse exchanges
Claims Tasks when Builds start
Attaches Buildbot Properties to Tasks as artifacts
Resolves Tasks when Builds complete
Build Result    Taskcluster Resolution
SUCCESS         Completed
WARNINGS        Failed
FAILURE         Failed
EXCEPTION       Exception (reason: malformed-payload)
RETRY           Exception (reason: malformed-payload)
CANCELLED       Exception (reason: canceled)
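The mapping above can be written as a lookup table. This is a minimal sketch: the names `RESOLUTIONS` and `resolve` are hypothetical, and the string values simply mirror the table rather than any confirmed constants in the Bridge.

```python
# Build result -> (Taskcluster resolution, exception reason or None), taken from the table.
RESOLUTIONS = {
    "SUCCESS": ("completed", None),
    "WARNINGS": ("failed", None),
    "FAILURE": ("failed", None),
    "EXCEPTION": ("exception", "malformed-payload"),
    "RETRY": ("exception", "malformed-payload"),
    "CANCELLED": ("exception", "canceled"),
}


def resolve(build_result):
    """Return the (resolution, reason) pair for a Buildbot result name."""
    return RESOLUTIONS[build_result]
```

An unknown result name raises `KeyError` here, which seems safer than silently picking a resolution.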
Reflector
Runs on a timer
Periodically reclaims Tasks
Cancels Tasks when a BuildRequest is cancelled
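One pass of the Reflector might look like the sketch below. The row shape, the `claimed` flag, and the `RecordingQueue` methods are hypothetical stand-ins, not the real Taskcluster client. A timer would call `reflector_tick` at a fixed interval.

```python
class RecordingQueue:
    """Hypothetical stand-in for the Taskcluster queue; it only records calls."""

    def __init__(self):
        self.calls = []

    def reclaim(self, task_id, run_id):
        self.calls.append(("reclaim", task_id, run_id))

    def cancel(self, task_id):
        self.calls.append(("cancel", task_id))


def reflector_tick(tracked, cancelled_request_ids, queue):
    """One timer pass: cancel Tasks whose BuildRequest is gone, reclaim the rest."""
    for row in tracked:
        if row["build_request_id"] in cancelled_request_ids:
            queue.cancel(row["task_id"])
        elif row["claimed"]:
            # Only Tasks that were already claimed (Build running) need reclaiming.
            queue.reclaim(row["task_id"], row["run_id"])


queue = RecordingQueue()
tracked = [
    {"build_request_id": 1, "task_id": "t1", "run_id": 0, "claimed": True},
    {"build_request_id": 2, "task_id": "t2", "run_id": 0, "claimed": False},
    {"build_request_id": 3, "task_id": "t3", "run_id": 1, "claimed": True},
]
reflector_tick(tracked, {3}, queue)
```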
Some scenarios
Simple, successful build
Task is created
TCListener receives task-pending event, creates BuildRequest
Buildbot creates a Build
BuildbotListener receives build started event, claims the Task
Reflector reclaims the Task while the Build is running
Build completes successfully
BuildbotListener receives log uploaded event, reports success to Taskcluster
Build fails initially, succeeds upon retry
Task is created
TCListener receives task-pending event, creates BuildRequest
Buildbot creates a Build
BuildbotListener receives build started event, claims the Task
Build fails, marked as RETRY
BuildbotListener receives log uploaded event, reports exception to Taskcluster and calls rerunTask
Buildbot has already started a new Build
TCListener receives task-pending event, updates runId, does _not_ create a new BuildRequest
Build completes successfully
BuildbotListener receives log uploaded event, reports success to Taskcluster
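The key step in this scenario is that a second task-pending event for a known Task only updates the runId. A minimal sketch of that check, assuming a plain dict in place of the Bridge database and a hypothetical `create_build_request` callback:

```python
def on_task_pending(db, task_id, run_id, create_build_request):
    """Handle task-pending; return True only if a new BuildRequest was created."""
    if task_id in db:
        # Rerun of a known Task: Buildbot already retried, so just track the new run.
        db[task_id]["run_id"] = run_id
        return False
    db[task_id] = {
        "build_request_id": create_build_request(task_id),
        "run_id": run_id,
    }
    return True


created = []


def fake_create_build_request(task_id):
    """Hypothetical callback: record the request and hand back a fake BuildRequest id."""
    created.append(task_id)
    return len(created)


db = {}
first = on_task_pending(db, "t1", 0, fake_create_build_request)
second = on_task_pending(db, "t1", 1, fake_create_build_request)
```

After both calls there is still one BuildRequest for `t1`, and the stored runId points at the rerun.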
Task exceeds deadline before Build starts
Task is created
TCListener receives task-pending event, creates BuildRequest
Nothing happens (Buildbot never starts a Build)
Task goes past deadline, Taskcluster cancels it
TCListener receives task-exception event, cancels BuildRequest through Self-serve
Roll-out plan
Deploy in production, restricted to the Alder branch (done)
Replace some Alder Schedulers with Task Graphs (in progress)
Furious testing/verification on Alder
???
TODO
Speed up some queries (the Buildbot Listener sometimes falls behind)
Don't reclaim Tasks so often
Support Task creation for Builds/BuildRequests started in Buildbot ("reverse bridge")
Handle Pulse message acking better