Experiments with smaller pools of build machines

Since the 3.0 days we’ve been using a pool of identical machines to build Firefox. It started off with a few machines per platform, and has since expanded into many, many more (close to 100 on Mac, close to 200 on Windows, and many more hundreds on Linux). This machine pooling is one of the main things that has enabled us to scale to support so many more branches, pushes, and developers. It means that we don’t need to close the trees when a single machine fails (anyone remember fx-win32-tbox?) and makes it easier to buy extra capacity like we’ve done with our use of Amazon’s EC2.

However, this doesn’t come without a price. On mozilla-inbound alone there are more than 30 different jobs that a Linux build machine can run. Multiply that by 20 or 30 branches and you get a Very Large Number. With so many different types of jobs a machine can do, it rarely ends up doing the same job twice in a row. This means that a very high percentage of our build jobs are clobbers. Even with ccache enabled, these take much more time than an incremental build.
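
To make that concrete, here is a rough, hypothetical simulation; the machine and job-type counts below are illustrative, not our production numbers. If job assignment is effectively random across a shared pool, the fraction of builds that land on a machine whose previous job was the same type (and could therefore build incrementally) works out to roughly one over the number of job types.

import random

def incremental_fraction(num_machines, num_job_types, num_jobs, seed=0):
    """Fraction of jobs that land on a machine whose previous job was the same type."""
    rng = random.Random(seed)
    last_job = {}  # machine index -> the last job type it ran
    repeats = 0
    for _ in range(num_jobs):
        machine = rng.randrange(num_machines)  # whichever machine happens to be free
        job = rng.randrange(num_job_types)     # whichever pending job gets assigned
        if last_job.get(machine) == job:
            repeats += 1
        last_job[machine] = job
    return repeats / num_jobs

# e.g. a few hundred Linux machines, ~30 job types on each of ~20 branches
print(incremental_fraction(num_machines=300, num_job_types=600, num_jobs=100_000))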

This week I’ve run a couple of experiments using a smaller pool of machines (“hot tubs”) to handle a subset of job types on mozilla-inbound. The results have been somewhat staggering. A hot tub with 3 machines returned results in an average of 60% of the time our production pool did, but coalesced 5x the number of pushes. A hot tub with 6 machines returned results in an average of 65% of the time our production pool did, and only coalesced 1.4x the number of pushes. For those interested, the raw numbers are available.

With more tweaks to the number of machines and job types in a hot tub I think we can make these numbers even better – maybe even to the point where we both give results sooner and reduce coalescing. We also have some technical hurdles to overcome in order to implement this in production. Stay tuned for further updates on this effort!

14 thoughts on “Experiments with smaller pools of build machines”

  1. I am not clear on the speed-up factor: if a build takes 100 minutes, does a 3-machine hot tub take 60 minutes and a 6-machine hot tub 65 minutes?

  2. So the speedup comes from less clobbering, not more coalescing? Have you tested hot tub sizes other than 3 or 6? The results are impressive, especially considering the wait times increased. :)

    1. In these tests it comes from a bit of both. If we turned off coalescing (and tested each push individually), I suspect the 3-machine tub would come out much slower. The 6-machine tub might still be faster, though.

      I haven’t tested any other tub sizes yet. We may decide to tackle the technical hurdles of making it possible to do in production, and then fiddle with the number of builders/machines per tub once that’s done.

  3. The hot tub breakdown seems a little artificial. What happens if a hot tub turns out to be too small for a build type? Seems like you’d be endlessly adjusting hot tub sizes or something.

    Wouldn’t it be better to use a single pool, but tweak the slave selection algorithm to use a builder-specific slave preference ordering? e.g. if you have K builder types and N total slaves, you could break them down into K hot tubs of N/K slaves each and have builder type i pull from the ith hot tub if available, but if none of those slaves are available just keep walking down the list into the (i+1)th hot tub. Or label each builder type with its own prime number, and walk through the slaves in strides of that prime. It’ll end up scheduling each builder type in MRU order, pulling from a different permutation of the full set of slaves. Some slaves will end up early on multiple lists, but maybe that’s ok; at least each of them will only be preferred by a small number of builder types.

    Or something. Maybe your hot tubs are already sloshy, and you fall back to another one if yours is all taken?

    1. We already try to choose a machine that recently did the same type of build (https://github.com/mozilla/build-buildbotcustom/blob/master/misc.py#L436). There are two problems with this:
      1) This code runs on every buildbot master, and the masters don’t coordinate with each other. This means that if master A has a machine that recently did build X, but master B sees the request for that build first, master B will take the job.
      2) In peak load times you’re forced to choose a machine that didn’t recently do that type of build, because every other machine is busy doing other work.

      Your idea is interesting nonetheless: assigning a hot tub id to each builder and slave (see the sketch below) might be a simpler way to implement this than the approaches we’ve talked about before. It might also make it easier to do rebalancing when the load on each tub changes.
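
      For illustration only, here is a minimal sketch of that hot-tub-id idea, not the production buildbotcustom code; the tub count, slave names, and helper functions below are made up for the example. Each builder and slave maps to a tub, a builder prefers idle slaves in its own tub, and it falls back to the rest of the pool when that tub is busy.

      import zlib

      NUM_TUBS = 4  # illustrative; the real number would be tuned per platform

      def tub_id(name, num_tubs=NUM_TUBS):
          """Map a builder or slave name to a deterministic hot tub id."""
          return zlib.crc32(name.encode()) % num_tubs

      def preference_order(builder, slaves, num_tubs=NUM_TUBS):
          """All slaves, ordered so the builder's own tub comes first,
          then the next tub, and so on, wrapping around the pool."""
          home = tub_id(builder, num_tubs)

          def distance(slave):
              return (tub_id(slave, num_tubs) - home) % num_tubs

          return sorted(slaves, key=distance)

      def pick_slave(builder, slaves, idle):
          """Return the first idle slave in the builder's preference order,
          or None if every slave is busy."""
          for slave in preference_order(builder, slaves):
              if slave in idle:
                  return slave
          return None

      # Example with made-up names: the builder keeps landing in its own tub while
      # the pool is mostly idle, but can still use any slave when the pool is busy.
      slaves = ["bld-linux64-%02d" % i for i in range(12)]
      print(pick_slave("mozilla-inbound-linux64", slaves, idle=set(slaves)))
      print(pick_slave("mozilla-inbound-linux64", slaves, idle=set(slaves[-2:])))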

  4. So I would assume that this is only beneficial to builds and not test runs, since whether a build was a clobber or not doesn’t affect the tests. But then you say there are over 30 different kinds of jobs on mozilla-inbound alone, and I don’t see that many types of builds there.

    1. You’re right, this is only beneficial to builds (and only to non-try builds, since try builds are always clobbers).

      The 30 that I’m counting include various types of 32 and 64-bit Linux builds, Android builds, and b2g builds, all of which build on the same machine type. The list that I counted is here: https://pastebin.mozilla.org/4206858. Some of these run on a 6 hour cycle rather than per push (pgo and non-unified, for example). It’s also worth noting that on any branch with nightlies there are even more builders.

  5. Relatedly, http://glandium.org/blog/?p=3170 :)
    What kinds of different jobs were your hot tubs handling?
    Note that there is another benefit to better slave selection, however it’s achieved, beyond fewer clobbers and faster build times: a smaller disk space requirement (we currently require 250GB on non-try build slaves).
    Also note that there’s something else that will have a huge impact on your experiment: bug 858621 (significantly reducing the time spent on make check).

    1. Thanks for all the good points! It would’ve been better if my numbers were comparing just the “compile” steps of every job, rather than including “make check” and other things that aren’t affected by slave selection. It’s great to see that we’ve converged on the same conclusion, too.

      One thing I’m curious about: what were your build times like for each rev? Were the incremental builds incredibly fast?
