Contribution opportunity: Release Engineering systems

Release Engineering runs a vast array of infrastructure and systems that do much of the continuous integration and releases for Mozilla. Many of our systems are small in their scope but must be able to scale up to support the incredible load that developers put on them. Other systems receive millions of requests every day from live Firefox, Fennec, and Thunderbird installations.

Do you want to improve developer productivity or get releases into users' hands more quickly and efficiently? Do you want to gain experience working on systems that must work at scale? If so, Release Engineering is a great place to look. Below are a few interesting bugs that could use some attention. If you're interested in working on any of them, I'm interested in mentoring you. You should be familiar with Python, but you don't need to be an expert. Have a look below and contact me directly if anything interests you.

  • Partial update generation service: Arguably, updates are the most important part of the release process. Partial updates in particular help us keep a good user experience by reducing the amount of data a user needs to download, which means they update more quickly. We generate many of these already, but creating this service would allow much more flexibility over what partial updates we generate and when. This project would involve writing the service from scratch, most likely in Python.
  • Update Balrog schema to support multiple partials: Balrog is the code name of our new update server (which I've previously blogged about). Its original design came about before we supported serving partial updates to users on multiple older versions of Firefox. In order to start using Balrog for Betas and Releases we need to add this feature. Balrog is written in Python and this will mostly involve server side changes to it.
  • Improve update verify output: "Update verify" is a very important test that we run as part of our release automation. Its job is to make sure that all users, regardless of where they're coming from, end up in the same state after updating to the latest release. Its output currently consists of thousands and thousands of lines of text, with test results interspersed. This bug is about finding and implementing a way to make the output easier for a human to make sense of and parse upon failure. The update verify scripts are written in bash, but this could be implemented either by modifying them or by post-processing the output.
  • Store history of machine actions requested through API: We recently deployed a new system that helps us manage our thousands of build and test machines. It aims to be a single entry point for information gathering and common operations on them. Currently, the data in it is volatile -- all history of operations is lost when the server is restarted. This bug will involve adding permanent storage (maybe SQL, maybe something else) to that server, which is written in Python.

Deer hunting, year two

WARNING: This post contains some graphic description of things that some people may find offensive. If you do not like hunting or are squeamish, you may not want to read this.

Last year I had my first hunting experience. I barely knew what I was doing and didn't even see a single deer, but it was still a great time in nature. This year was a lot more of the same, but I was much more prepared for the experience, and got to spend even more time hunting.

I've been looking forward to deer season for literally months in advance - I bought some new gear, did a lot of reading and other research, and was generally much more confident going in.

Preparation

Last year I bought all of the basics gear-wise except I was borrowing a friend's gun. Earlier this year I bought my own rifle (a Browning X-Bolt Hunter 30.06) and had a few opportunities to get some practice with it. Compared to the 7mm I was shooting last year, my gun is a bit lighter and has significantly less recoil - I've found it much easier to handle. A few days before going out I ended up at Silverdale Gun Club which, at 2h from Toronto, is the closest rifle range open to the public. Happily, I was able to verify the sights on my scope and shoot a 2" grouping at 100m without using an entire case of cartridges.

A few months back a friend gave me a large stack of old hunting magazines. In the week before going out I read through just about all of them to pick up tons of tips on deer behaviour, location, calling, etc. After reading some of those I picked up a couple of deer calls (a grunt and a bleat) as both seemed very commonly used. Reading these also reinforced for me the fact that wind direction and scent are extremely important when deer hunting. This is something that I've been told before, but I learned a lot more about how to use wind direction to your advantage by reading so much. For example, when you call a deer it's rare for it to come directly to you -- it will usually try to get downwind to catch your scent before it comes in close. If you're set up correctly, you can put yourself in a place where you can intercept it before it makes it downwind.

The Location

A friend of mine recently purchased a large property primarily for hunting purposes. He's spent a lot of time getting it ready for deer season - including building two tree stands - and I was fortunate enough to be invited to hunt there with him. I'm certainly no expert on this, but compared to last year's location it seemed like it had more potential - there was more varied terrain (hardwood forest, cornfields, swamp/marsh) and because much of the property is marshy, there's large amounts of land with absolutely no human traffic.

The weather was a little warm for this time of year. It started around freezing, went up over 10C for a couple of days, and then back down almost to freezing again, with rain on and off.

The Hunt

Like last year, we got up very early so we could be out in the field before legal shooting. Thankfully, this new property was closer to where we were staying and we got to sleep until 4:30am most days - what a luxury!

Monday and Tuesday

I spent the bulk of Monday and Tuesday in a tree stand on the edge of a cornfield. I didn't see a single deer these two days, but one of the largest rabbits I've ever seen poked its head out around dawn and hopped into the cornfield. It must've been hungry because it didn't return until dusk. I did take some time to walk about the property a bit and found numerous signs of deer: tracks all over the place, scat, and a series of buck scrapes on one of the wider trails. After finding these on the second day we set up some trail cams to try to get a better sense of where the deer were coming from, and when they moved about.

Wednesday and Thursday

I spent Wednesday in the stand again but gave it up pretty quickly after absolutely no activity in the first couple of hours after sunrise. That afternoon I spent some time looking for a spot to set up deeper in the hardwood part of the forest. Despite the forest being quite dense, there are very few places with a decent amount of cover - meaning it's hard to sit without being easily spotted. Eventually I found a spot that wasn't completely exposed and got set up. Because the forest is so dense I needed to bend and break quite a few branches to have a viable shooting path. With that done I sat down for the remainder of the day. I tried some calls and I tried complete silence - but I had no luck at all. No sightings and no sounds (other than the irritating red squirrel). Still no deer. On our way home we noticed that all of the corn in the fields had been harvested - this change in scenery gave me some hope for the next day.

Thursday morning I didn't go to the stand at all, and sat in a simple ground blind on the edge of a different field. Again I tried calling and again nothing came. Like Wednesday, I didn't last long before I gave up on the cornfield and went back into the bush. We also checked the trail cams again and found that there was a nice buck wandering around just 30 minutes before we got there - how frustrating! Once again, I was on the hunt for somewhere with good cover to sit - but this time I got as close to the marsh as I could, as we thought that it was the most likely place that the deer were bedding. After an hour or so of slow walking and searching I finally found somewhere to my liking:

It looked to me like this was something someone constructed a few years ago - the logs are arranged so that there's pretty good cover for small movements (like shifting of legs) but low enough that they can be seen over. There was also a nice big pile of dirt behind me to lean against. As sitting on the ground goes, it doesn't get much better than this. Once again I needed to cut shooting lanes. As I was moving branches directly ahead I looked down and saw a fresh looking deer track. I won't lie, I got pretty excited about seeing a deer track 30 yards from where I was sitting for the rest of the day. Again I sat down, again I called, and again nothing came. Heading out that night I was starting to feel a little dismayed. I know that even if you do all the right things the deer may not come, but it was pretty disappointing to be coming out of my second last day without so much as seeing a deer. On the way back my friend suggested that we come out even earlier the next day and get sat down long before legal shooting, to give things even more time to settle.

Friday

So, Friday morning we got up at 3:45am and left the house bleary eyed around 4:30. I decided to head back to my spot from the day before, which meant I had to walk close to 10 minutes into the woods in pitch black. The fact that there are coyotes and possibly a bear in these woods made this a little nerve wracking. I made it in safely and was set up, sitting quietly and motionless, by 5:30. With still nearly an hour before legal shooting I took a short nap. A minute before legal shooting I gave out a short grunt with my call hoping to entice something out. After having no success with it all week I didn't expect much, but 2 minutes later I caught a glimpse of something moving to my right. At this point my heart started to beat faster and I did my best to move slowly to get a better look. I quickly realized that it was in fact a deer! It was upwind of me and unfortunately not in either of my shooting lanes. I started to think about what I should do next but before I could decide the deer made the decision for me -- it started moving left, probably trying to get downwind of me to catch my scent. It moved behind a series of trees and I took this opportunity to shoulder my rifle, though I was unsure if I would get a shot at it. All of a sudden it appeared directly ahead of me and completely motionless. I looked down my scope, told myself where to aim, and pulled the trigger. Despite shooting my rifle many times before, this shot seemed louder than any other.

For the next 30 seconds I was in a state of shock. I looked ahead and saw nothing. Looking slowly in either direction was the same. Because of the recoil of the rifle I had no idea whether or not I hit my target. I wasn't sure what to do at first but soon realized that I had to get up and check things out. It's possible that I wounded it, I thought, and that I'd have to chase down a blood trail. Getting up, I grabbed my rifle and turned on my headlamp. As I walked through the dense bush my gaze was fixed on the spot I last saw the deer. At first I didn't see anything but then I saw something reflecting light. I approached slowly, and once I was within a few metres I saw it: a deer, lying motionless on the ground, with an exit wound clearly visible on the upturned side of the body, just below the neck. To make sure it was truly dead I poked it from a distance a few times (something I'd been told is always smart to do). No response came from the deer, which I could now tell was a doe. Then it really sank in: I'd done it. I killed my first deer, and my shot killed it instantly, ensuring it didn't suffer.


I worked towards this moment for more than a year but it still feels unreal. Killing this animal gave me no pleasure, but directly providing food for my family gives me immense satisfaction. Whether or not I bring home anything in the future I look forward to continuing this journey.

How to deal with timezone adjusted "epoch" timestamps in Python

Today I discovered that we have a system that returns "epoch" timestamps, but adjusted for Pacific time. This means that depending on whether daylight saving time is in effect, these timestamps are 7 or 8 hours ahead when interpreted by most tools. These are horribly difficult to deal with, as Unix timestamps are assumed to be in UTC. I spent a good deal of time banging my head against Python's datetime and pytz modules (as well as the wall). With some help from John Hopkins I found a solution:

In the following example we'll convert the Pacific "epoch" timestamp 1383000394 to a proper epoch timestamp (which is 1382975194).

First, we need a tzinfo object for Pacific time:

>>> import pytz
>>> pacific_time = pytz.timezone("America/Los_Angeles")

Next, we need to get the initial timestamp into a datetime object to work with it. Note that it's important to use utcfromtimestamp() here; otherwise you'll get a datetime object in the machine's local time - which will only be useful if the machine you run this on is in Pacific time:

>>> from datetime import datetime
>>> dt = datetime.utcfromtimestamp(1383000394)
>>> dt
datetime.datetime(2013, 10, 28, 22, 46, 34)

It gets a little weird from here. We need to subtract the Pacific offset from the datetime object in order to get it into an actual UTC time. To do that we can force the datetime object into UTC and then use its built-in astimezone() method to do the conversion. I think this still leaves a 7 or 8 hour window whenever DST starts or ends where this conversion is an hour off - but it's good enough for my usage:

>>> dt = dt.replace(tzinfo=pytz.utc)
>>> dt
datetime.datetime(2013, 10, 28, 22, 46, 34, tzinfo=<UTC>)
>>> dt = dt.astimezone(pacific_time)
>>> dt
datetime.datetime(2013, 10, 28, 15, 46, 34, tzinfo=<DstTzInfo 'America/Los_Angeles' PDT-1 day, 17:00:00 DST>)

Now we have a datetime object with the correct time, but claiming to be in Pacific. We can fix that by replacing the tzinfo again:

>>> dt = dt.replace(tzinfo=pytz.utc)
>>> dt
datetime.datetime(2013, 10, 28, 15, 46, 34, tzinfo=<UTC>)

The only thing left to do now is convert to epoch time!

>>> import calendar
>>> calendar.timegm(dt.utctimetuple())
1382975194

Voila, the timestamp we were looking for!
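For anyone on a newer Python, the steps above can be wrapped up in one function. This is only a sketch under a few assumptions, not the code I actually ran: it assumes Python 3.9 or later with time zone data available, it swaps pytz for the standard library's zoneinfo module, it uses fromtimestamp() with an explicit UTC tzinfo instead of utcfromtimestamp() followed by replace(), and the function name is made up for this example.

```python
import calendar
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_epoch_to_utc_epoch(pacific_ts):
    """Convert a Pacific-adjusted "epoch" timestamp to a real one.

    The name and signature are hypothetical; the steps mirror the
    interactive session above.
    """
    # Read the number as if it were a genuine UTC epoch timestamp.
    dt = datetime.fromtimestamp(pacific_ts, tz=timezone.utc)
    # Converting to Pacific subtracts the 7 or 8 hour offset.
    dt = dt.astimezone(PACIFIC)
    # The clock time is now right but labelled Pacific; relabel it as UTC.
    dt = dt.replace(tzinfo=timezone.utc)
    return calendar.timegm(dt.utctimetuple())


real_ts = pacific_epoch_to_utc_epoch(1383000394)
```

With the timestamp from this post, real_ts comes out as 1382975194, matching the session above. The same caveat about the hours around a DST change applies here too.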

Much credit to John Hopkins for his code that taught me how to use datetime.replace() and astimezone(). No credit at all goes to Python's datetime module, which is sorely in need of an overhaul.

Summit Takeaways

For me, events like the Mozilla Summit are much more about getting to know new people and building rapport. Even given my focus on that, there were lots of interesting and random things that made an impact on me over the past week:

  • Webmaker and Appmaker could be incredibly important. Someone in Santa Clara compared them to simple tools like hammers, screwdrivers, etc. that let anyone create things from scratch - whether it's for just themselves, a small group of people, or the wider world.
  • Matt Thompson made a great point about "Reinvest[ing] in mentorship" in his recent blog post. My original involvement with Mozilla probably wouldn't have gone anywhere if it wasn't for people mentoring me. I've done my share of teaching and helping within my own group at Mozilla, but Matt's post made me realize that I've never actually paid this forward elsewhere.
  • I had a conversation with someone about a request for input I got from someone on a complicated topic. I've grown hesitant to reply to these in recent times because I'm afraid of wrongly committing myself or others to work. The person I spoke with reinforced with me that I shouldn't ever be afraid to comment on technical things because of this. This seems like a simple thing upon reflection, but it really adjusted my perspective.
  • All the face to face time - both with the people I talk with daily and with those I barely know - is very useful. I got new perspective on people I've known for a long time, built rapport, and was able to thank a lot of people for things they've helped me with over the years. I also learned a lot of things I wouldn't have otherwise by talking with people I've never met. For this reason alone I hope we can find ways to have larger Mozilla events more often (that is, larger than a team's work week but smaller than a full Summit).

New AUS is live!

As previously announced, Nightly users of Firefox and Fennec were switched to the new update server yesterday. We've been doing manual testing, watching logs, and keeping an eye on metrics since then, and as far as we can tell everything is functioning as expected and users are being updated. If you are on Nightly and experience any problems updating, please stop by #releng or file a bug.

While I'm the one making this announcement, I'm far from the only person who's worked on it. Balrog, as we've been calling it internally, is the brainchild of Nick Thomas, who made the first commit back in December 2010. The two of us have been working on it off and on since then with lots of help from numerous other people:

There's still more work to do on Balrog, but this is a huge step for us, and validates all the work we've already done on it. Expect more posts from myself and others as we continue to make progress.

Upcoming changes to AUS for Nightly users

For a long time, we've known that we've outgrown the current version of the AUS software. Recently we've been working hard on a drop-in replacement for it - something that would give us easier and better controls over the updates we serve. This new software, codenamed Balrog, is nearly ready to start serving updates in production. We're still bringing the last parts of the new system online, but we intend to make this change before the end of September. We will announce an exact date at least a day before the transition.

Of course, accurately serving updates at scale is very important to Mozilla, so even after these preparations we'll be rolling out carefully. We've already tested Balrog internally for many months without issue, and we'll start by switching Nightly and Nightly-based (UX, elm, oak) users of Firefox and Fennec over. Aurora, Beta, Release and ESR builds will continue talking to the old AUS until we're confident enough to start moving them over. Thunderbird users will also be moved when we feel comfortable.

While we're confident that the transition will go smoothly, we ask that anyone who experiences problems after the switch file a bug or look for bhearsum or nthomas in #releng.

Mozconfig verification tests

It's important for us to make sure that we build Firefox as similarly as possible between nightly builds and release builds. However, because release builds need to be built with a few special flags (to enable the correct branding, update channel, etc.) we need to maintain separate mozconfigs for them. In the past we've sometimes forgotten to carry forward new compiler flags (eg, enabling PGO) from nightly -> release. To alleviate this problem, we built a tool that compared the mozconfigs of nightly and release builds, and complained about any differences that weren't whitelisted. This tool has worked very well for us for a few years now, but because it runs as part of pre-release checks, it has the downside of catching the problems in the critical path of the release -- leaving us scrambling to fix them quickly.
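To make the idea concrete, here is a rough sketch of that kind of comparison. It is not the actual tool: the function names, the whitelist layout (a dict with "nightly" and "release" lists), and the sample mozconfig contents are all made up for illustration.

```python
def significant_lines(mozconfig_text):
    """Return the set of non-blank, non-comment lines in a mozconfig."""
    stripped = (line.strip() for line in mozconfig_text.splitlines())
    return {line for line in stripped if line and not line.startswith("#")}


def unexpected_differences(nightly_text, release_text, whitelist):
    """Report lines found in only one mozconfig and not whitelisted for it."""
    nightly = significant_lines(nightly_text)
    release = significant_lines(release_text)
    return {
        "nightly": sorted(nightly - release - set(whitelist.get("nightly", ()))),
        "release": sorted(release - nightly - set(whitelist.get("release", ()))),
    }


# Made-up sample mozconfigs and whitelist, for illustration only.
nightly_mozconfig = """
. $topsrcdir/browser/config/mozconfigs/win32/common-opt
ac_add_options --enable-update-channel=nightly
ac_add_options --enable-profiling
"""
release_mozconfig = """
. $topsrcdir/browser/config/mozconfigs/win32/common-opt
ac_add_options --enable-update-channel=release
ac_add_options --enable-official-branding
"""
sample_whitelist = {
    "nightly": ["ac_add_options --enable-update-channel=nightly"],
    "release": [
        "ac_add_options --enable-update-channel=release",
        "ac_add_options --enable-official-branding",
    ],
}

problems = unexpected_differences(
    nightly_mozconfig, release_mozconfig, sample_whitelist
)
```

In this made-up example the only complaint is the profiling line in the nightly mozconfig, since it appears in neither the release mozconfig nor the nightly whitelist.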

This changes today. Jason Yeo, a RelEng intern, has just landed bug 763903, which moves these checks to run at build-time. This means that problems of this nature will now be caught whenever they land -- even on Try. This means that anyone who changes a "nightly" or "release" mozconfig now needs to make a few decisions. The following is intended to help decide what needs to happen when making such a change:

  • Should my change to a nightly mozconfig apply to releases too?
    • If yes, change the "common-opt" mozconfig (eg: https://mxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/win32/common-opt)
    • If no, add the line to the platform's "nightly" whitelist (https://mxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/whitelist)
  • Should my change to a release mozconfig apply to nightlies too?
    • If yes, change the "common-opt" mozconfig (eg: https://mxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/win32/common-opt)
    • If no, add the line to the platform's "release" whitelist (https://mxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/whitelist)

At the moment, these checks only run for Desktop Firefox. Firefox for Android and Thunderbird will soon run them too (bug 885154).

Buildbot Scheduler and Builder graphing

One of the most important systems I work on is the release automation for Firefox and Thunderbird. The process behind the automation long predates me, but I've been deeply involved in automating, refining, and optimizing it. It shouldn't come as any surprise that one of the biggest challenges of working on such a complex system is understanding how the smaller pieces fit together to make the whole system. For the release automation we have an advantage though: the smaller pieces are generally Buildbot Builders, and the things that fit them together are generally Buildbot Schedulers. A while ago I was improving parallelism for l10n repacks and found it extremely difficult to reason about whether or not my changes would actually create the desired Builders and string them together correctly. I threw together some (terrible) code that spat out a digraph of the release automation's Builders and Schedulers. By comparing the before and after graphs I was able to iterate on some parts of my code without spending hours and hours testing.

This week I finally got around to tidying up and packaging this code as a more general purpose tool. It's not nearly complete and has many rough edges, but as a very basic tool to help you understand non-trivial Buildbot installations, I think it's wonderful. It's pip installable ("buildbot-scheduler-graph") and available on GitHub. Once you've got it, try it out with "buildbot-scheduler-graph /path/to/your/master.cfg /path/to/output-dir". Here's what Mozilla's scheduler graphs look like. What do yours look like?
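If you're wondering what "a digraph of Builders and Schedulers" means in practice, here's a toy sketch that writes one out in Graphviz DOT format. It is not how buildbot-scheduler-graph is implemented and it doesn't read a master.cfg; the function name, the two input dicts, and the Builder and Scheduler names are all invented for illustration.

```python
def to_dot(scheduler_builders, builder_triggers):
    """Render Scheduler/Builder relationships as a Graphviz digraph.

    scheduler_builders: Scheduler name -> list of Builders it starts.
    builder_triggers: Builder name -> list of Schedulers it triggers.
    """
    lines = ["digraph buildbot {"]
    for scheduler, builders in sorted(scheduler_builders.items()):
        # Draw Schedulers as boxes so they stand out from Builders.
        lines.append(f'    "{scheduler}" [shape=box];')
        for builder in builders:
            lines.append(f'    "{scheduler}" -> "{builder}";')
    for builder, schedulers in sorted(builder_triggers.items()):
        for scheduler in schedulers:
            lines.append(f'    "{builder}" -> "{scheduler}";')
    lines.append("}")
    return "\n".join(lines)


# Invented names: a source Builder that triggers the build Scheduler.
dot = to_dot(
    {"release-source": ["source"], "release-build": ["linux_build", "win32_build"]},
    {"source": ["release-build"]},
)
```

Saving the output of two runs (before and after a config change) and diffing or rendering them is the kind of comparison described above.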

Loading Python modules from arbitrary files

tl;dr: Use imp.load_source.

I've been hacking on a tool on and off that needs to load Python code from badly named files (eg, "master.cfg"). To my surprise, there wasn't an obvious way to do this. My "go to" method of doing this is with execfile. For example, this will load the contents of master.cfg into "m", with each top level object as a key:

m = {}
execfile("master.cfg", m)

This works well enough for simple cases, but what happens when you try to load a module that loads other modules? It turns out that execfile has a nasty limitation of requiring modules that aren't in sys.path to be in the same directory as the file that calls execfile. You can't even chdir your way around this; you have to copy the files you need to the caller's directory. (We actually have some production code that does this.)

Someone in #python on Freenode suggested using importlib. That seemed like a fine idea, especially after recently watching Brett Cannon's "How Import Works" talk. Unfortunately, Python 2.7's importlib only has a single function, import_module, which can only load a module by name.

Eventually I came across a Stack Overflow post that pointed me at imp.load_source. This function is similar to execfile in that it loads Python code from a named file. However, it properly handles imports without the need to copy files around. It also has the nice added bonus of returning a module rather than throwing objects into a dict. I ended up with code like this, to load the contents of "foo/bar/master.cfg":

>>> import imp, os, sys
>>> os.chdir("foo/bar")
>>> sys.path.insert(0, "") # Needed to ensure that the current directory is looked at when importing
>>> m = imp.load_source("buildbot.master.cfg", "master.cfg")

Problem solved!
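A note for Python 3 users: the imp module was removed in Python 3.12, so imp.load_source isn't available there. The following is a sketch of one way to get similar behaviour with importlib; the load_source helper here is my own stand-in rather than a standard library function, and the temporary "master.cfg" and its contents are made up so the example is self-contained.

```python
import importlib.machinery
import importlib.util
import os
import tempfile


def load_source(name, path):
    """Load Python code from an arbitrarily named file as a module."""
    # An explicit SourceFileLoader is needed because the file does not
    # end in .py, so the suffix alone doesn't identify it as source.
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


# Demo with a throwaway file; the contents are invented for illustration.
with tempfile.TemporaryDirectory() as tmpdir:
    cfg_path = os.path.join(tmpdir, "master.cfg")
    with open(cfg_path, "w") as f:
        f.write("builders = ['linux_build', 'win32_build']\n")
    m = load_source("master_cfg", cfg_path)
```

As with imp.load_source, the result is a module object, so the file's top level names are available as attributes (m.builders here). If the file imports siblings from its own directory, that directory still needs to be on sys.path, as in the example above.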