Over the course of the past week or so I've been working on rolling out the Windows 7 SDK to our build machines. Doing so presented two challenges: getting the SDK to deploy silently and properly, and updating the appropriate build configurations to use it. Neither of these may sound very challenging, and indeed, they didn't to me either, but because of a combination of factors this ended up becoming a week-long ordeal. In this post I will attempt to untangle everything that happened.
Let's start with the actual SDK installation. Unlike most other reasonable packages, the Windows 7 SDK is not distributed as an MSI package, but rather as a collection of MSIs wrapped in an EXE. Unfortunately, this EXE doesn't let you do a customized, silent install - the precise thing we need. Vainly, I thought I could figure out the proper order and magic options to install the enclosed MSIs myself. Needless to say, this failed. To work around this I fell back on an AutoIt script that clicks through the interactive installer for me. It took some fussing, but not too much difficulty, to get that working.
Now, the fun part (of deployment). We use a piece of software called OPSI to schedule and perform software installations across our farm of 80 or so Windows VMs. OPSI runs very early in the Windows start-up process, and actually executes as the SYSTEM user. Well, it turns out that the Windows 7 SDK must be installed by a full user, not the SYSTEM account. This seems unnecessary, as we've deployed other SDKs through OPSI in the past without issue. After trying to fake it out by setting various environment variables, I turned to the OPSI forums for some help. (As an aside, the OPSI developers have been fantastic in their support of our installation. Many thanks to them.) It turns out that I'm not the first person to hit problems like this. They pointed me to a template for a script that works around such an issue. The solution ends up being:
- Copy installation files to the slave
- Create a new user in the Administrators group, set that user to automatically log in at next boot
- Reboot, and run the package installation at login
- Restore the original automatic login, reboot
- Clean up (delete installation files, remove the created user)
This is obviously quite hacky, but it gets the job done.
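To make the steps above a bit more concrete, here is a rough sketch of the Windows commands that the "create a user and set up automatic login" and "remove the created user" steps boil down to. This is not the OPSI template itself, which isn't reproduced here. The function names, the user name, and the password are hypothetical, and the sketch only builds the command lists rather than running anything. It relies on the standard `net user`, `net localgroup`, and `reg add` commands and the usual Winlogon auto-logon registry values.

```python
# Sketch only: builds the commands for a temporary auto-login admin user.
# Nothing is executed here; names and credentials are made up for illustration.

WINLOGON_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"


def _reg_add(name, value):
    """Build a 'reg add' command that sets one REG_SZ value under Winlogon."""
    return ["reg", "add", WINLOGON_KEY, "/v", name,
            "/t", "REG_SZ", "/d", value, "/f"]


def autologon_commands(user, password):
    """Commands to create an admin user and log it in at the next boot."""
    return [
        ["net", "user", user, password, "/add"],
        ["net", "localgroup", "Administrators", user, "/add"],
        _reg_add("AutoAdminLogon", "1"),
        _reg_add("DefaultUserName", user),
        _reg_add("DefaultPassword", password),
    ]


def cleanup_commands(temp_user, original_user, original_password):
    """Commands to restore the original auto-login and drop the temp user."""
    return [
        _reg_add("DefaultUserName", original_user),
        _reg_add("DefaultPassword", original_password),
        ["net", "user", temp_user, "/delete"],
    ]


if __name__ == "__main__":
    for cmd in autologon_commands("sdkinstall", "hypothetical-password"):
        print(" ".join(cmd))
```

A real deployment would still need the reboots in between, plus something to launch the installer at login, both of which the OPSI template takes care of.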
So! With that in hand (and in repo) we set the SDK to deploy over the course of Wednesday night and Thursday morning. Overall, this went smoothly. For a reason I haven't yet figured out, some of the slaves needed some kicking to do the installation properly.
Remember how I said part 2 of this was updating the build configurations? I had planned to do this on Friday, and even posted a patch in preparation. Well, it turns out that MozillaBuild likes to be smart and find the most recent SDK and compiler for you. This completely slipped my mind while I was doing the deployment, and as a result, all builds from Thursday (yesterday) morning to Friday (today) morning, including those on mozilla-1.9.1, were done with the Windows 7 SDK. This went unnoticed for most of Thursday, until I was doing a final test of my build configuration patch.
Here's where the fun starts for this part. After discovering I'd accidentally changed the SDK for everything, I went into a bit of a panic and rapidly started testing some fixes in our staging environment. In the course of this I discovered that things were worse than I thought. Most builds were using the Windows 7 SDK, but not the "unit test" ones. So we weren't even using the same SDK for all the builds for a given branch! Getting all of that sorted out was complicated by all of the iterations of path styles (c:/ vs. c: vs. /c/) I had to try before I found the magic combination. In the end, I discovered a few things:
- If you're specifying LIB/INCLUDE/SDKDIR in a mozconfig, you must use Windows-style paths
- If you're specifying PATH in a mozconfig, you CANNOT use Windows-style paths - you must use MSYS-style paths
- You can't test for these things properly without clobbering
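For anyone else juggling the two path styles from the first two points, here is a small sketch of converting between them. The helper names are hypothetical and are not part of MozillaBuild or MSYS. I'm also assuming that "Windows-style" means a drive letter followed by forward slashes (c:/...); the separator is a parameter in case backslashes are what's needed.

```python
import re


def to_msys(path):
    """Convert a Windows-style path (c:/foo or c:\\foo) to MSYS style (/c/foo).

    Paths without a drive letter are returned with forward slashes only.
    """
    match = re.match(r"^([A-Za-z]):[\\/]*(.*)$", path)
    if not match:
        return path.replace("\\", "/")
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    return "/" + drive.lower() + ("/" + rest if rest else "")


def to_windows(path, sep="/"):
    """Convert an MSYS-style path (/c/foo) to Windows style (c:/foo).

    Paths that don't start with a single-letter drive component, such as
    /usr/bin, are returned unchanged. The default separator is an assumption.
    """
    match = re.match(r"^/([A-Za-z])(?:/(.*))?$", path)
    if not match:
        return path
    drive, rest = match.group(1), match.group(2) or ""
    return drive + ":" + sep + rest.replace("/", sep)


if __name__ == "__main__":
    # MSYS style is what PATH wants; Windows style is for LIB/INCLUDE/SDKDIR.
    print(to_msys("C:/tools/sdk/bin"))
    print(to_windows("/c/tools/sdk/lib"))
```

The directory names in the example are made up. Whatever the conversion produces, the third point still applies: clobber before trusting the result.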
As I write this the first set of builds that all use the correct SDK are finishing up, and this deployment from hell appears to be nearly over. I want to express a special thanks to the OPSI developers, who were very helpful, and to Nick Thomas and Chris AtLee, for their patience with my countless iterations of build configuration patches. As a final note, let me state explicitly which SDK is being used where:
- Windows Vista SDK (6.0a): mozilla-1.9.1 builds
- Windows 7 SDK (7.0): mozilla-central, mozilla-1.9.2, TraceMonkey, Electrolysis, and Places builds
WinCE and WinMO builds are unaffected by this deployment.