release-automation - Part 1: Bootstrap
One of the first tasks I had as a full-time employee of Mozilla was getting the Bootstrap Release framework working with Firefox 3.0 Beta releases. Now, just over 4 years later, our release-automation has changed dramatically in many ways: primary language, supported platforms, scope and extent, reliability, and versatility. I thought it might be interesting to trace the path from there to here, and talk about what's in store for the future, too.

Throughout all of this work there have been two overarching goals: 1) Lower the time it takes to go from "go to build" to "updates available for testing" - which we call "end2end time", and 2) Reduce the number of machines we have to log into, commands we have to run, and active time we have to spend on a release - known as "manual touchpoints". I'll be referencing these a lot throughout this series.

This post will talk about what I know of Bootstrap and my work porting it to Firefox 3.0.

In its earliest form Bootstrap was a simple scripted version of much of the previously manual release process. The processes for tagging VCS repositories, creating deliverables (source packages, en-US and localized builds, updates), and some verifications were encapsulated into its scripts. This was a big improvement over the 100% manual, cut+paste-from-a-wiki process. Instead of logging into many machines and running many commands, the release engineer had to log in to many machines and run a few, very simple commands.

The very first release that was Bootstrap-aided was Firefox 1.5.0.9, built on December 6th, 2006. This was before my time, but a former release engineer, Rob Helmer, told me that the end2end time back then could be multiple days, with countless touchpoints. Over time, more parts of the release process were automated with Bootstrap, further reducing the burden on the release engineer.
Even with these big improvements, some classes of things were still not codified: which machines to run which commands on, when and in what order to run things, and who to notify about what. Enter: Buildbot. Integrating Bootstrap into Buildbot was the next logical step in the process. Buildbot would handle scheduling and status, while Bootstrap would remain responsible for all of the implementation. With this, the release engineer only had to log in to a few machines and run a few, very simple commands. Another big improvement! The first release to benefit from this was Firefox 2.0.0.8, built on October 10th, 2007. This work was largely done by Rob Helmer.

Around this time we were gearing up to start shipping the first Firefox 3.0 Beta release and had never tested Bootstrap against that development branch. I was tasked with making whatever changes were necessary to Bootstrap and our Buildbot to make it work. The Buildbot side was largely simple, because Buildbot sits at such a high abstraction layer, but back in those days we still had single-purpose Buildbot masters, so it involved adding several hundred lines of config code.

The Bootstrap side was far more interesting. Until this point, there were a lot of built-in assumptions based on what the 1.8 branch looked like, including:
- Releases are done from CVS branches (explicitly _not_ trunk)
- Windows build machines run Cygwin
- Linux packages are in .gz format
- The crash reporting system Talkback is always shipped
So, in the early days there were tons of improvements, and they came quickly: Bootstrap itself sped things up and lowered the possibility of error by reducing manual touchpoints. Buildbot + Bootstrap did so again, through the same methods. We also had pure speed-ups through things such as fast patcher. Having these things allowed us to maintain the 2.0.0.x and 3.0.x branches more easily, and get chemspill releases out quickly and simultaneously. All of this work had to be done incrementally too, because we had to continue shipping releases while the work was happening.

It's hard to find good data for releases done with this version of the automation, but I guesstimate that the end2end time was around 12-14 hours and the number of manual touchpoints was still around 20 for a release without major issues.

Next up... release-automation on Mercurial, v1.