Tag Archives: python

Redo 1.3 is released – now with more natural syntax!

We’ve been using the functions packaged in Redo for a few years now at Mozilla. One of the things we’ve been striving for with it is the ability to write the most natural code possible. In its simplest form, retry, everything is passed in as arguments: a callable that may raise, the exceptions to retry on, and the callable to run to clean up before another attempt. As a result, we have a number of code blocks like this, which don’t feel very Pythonic:

retry(self.session.request, sleeptime=5, max_sleeptime=15,
      retry_exceptions=(requests.HTTPError, 
                        requests.ConnectionError),
      attempts=self.retries,
      kwargs=dict(method=method, url=url, data=data,
                  config=self.config, timeout=self.timeout,
                  auth=self.auth, params=params)
)
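To make the shape of this style concrete, here is a rough sketch of how a retry-style function can work. It is not Redo's actual implementation: simple_retry and flaky are made-up names for illustration, and the real retry has more options (max_sleeptime, for example) that are left out here.

```python
import time


def simple_retry(action, attempts=5, sleeptime=0.0,
                 retry_exceptions=(Exception,), cleanup=None,
                 args=(), kwargs=None):
    """Hypothetical stand-in for a retry-style function (not Redo's code)."""
    kwargs = kwargs or {}
    for attempt in range(1, attempts + 1):
        try:
            return action(*args, **kwargs)
        except retry_exceptions:
            # The helper, not the caller, decides what happens on failure.
            if cleanup:
                cleanup()
            if attempt == attempts:
                raise  # out of attempts: let the last exception propagate
            time.sleep(sleeptime)


events = []


def flaky(x, y=0):
    """Fails twice, then succeeds."""
    events.append("call")
    if events.count("call") < 3:
        raise KeyError("transient failure")
    return x + y


value = simple_retry(flaky, retry_exceptions=(KeyError,),
                     cleanup=lambda: events.append("cleanup"),
                     args=(1,), kwargs={"y": 2})
print(value, events)
```

Everything the caller wants to happen on failure has to be squeezed through arguments such as cleanup, because the try/except lives inside the helper.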

It’s particularly unfortunate that you’re forced to let retry do your exception handling and cleanup – I find that it makes the code a lot less readable. It’s also not possible to do anything in a finally block, unless you wrap the retry in one.

Recently, Chris AtLee discovered a new method of doing retries that results in much cleaner and more readable code. With it, the above block can be rewritten as:

for attempt in retrier(attempts=self.retries):
    try:
        self.session.request(method=method, url=url, data=data,
                             config=self.config,
                             timeout=self.timeout, auth=self.auth,
                             params=params)
        break
    except (requests.HTTPError, requests.ConnectionError):
        pass

retrier simply handles the mechanics of tracking attempts and sleeping, leaving your code to do all of its own exception handling and cleanup – just as if you weren’t retrying at all. Note that the break at the end of the try block is essential; without it, self.session.request would run again on every remaining attempt, even after it succeeded.
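The idea can be sketched with a small generator. This is only an illustration under assumed names (simple_retrier and flaky are hypothetical), not the code that ships in Redo:

```python
import time


def simple_retrier(attempts=5, sleeptime=0.0):
    """Hypothetical stand-in for a retrier-style generator (not Redo's code)."""
    for attempt in range(1, attempts + 1):
        yield attempt
        # Execution resumes here only if the caller's loop body did not break,
        # i.e. the attempt failed and another one is wanted.
        if attempt < attempts:
            time.sleep(sleeptime)


calls = []


def flaky():
    """Fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "ok"


result = None
for attempt in simple_retrier(attempts=5, sleeptime=0):
    try:
        result = flaky()
        break  # success: stop retrying
    except ValueError:
        pass  # failure: fall through to the next attempt

print(result, len(calls))
```

Because the loop belongs to the caller, a for ... else clause can also be used to detect the case where every attempt failed.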

I released Redo 1.3 with this new functionality this morning – enjoy!

Redo – Utilities to retry Python callables

We deal with a lot of flaky things in RelEng. The network can drop. Code can have race conditions. Servers can go offline temporarily. Freak errors can happen (more often than you’d think). One of the ways we’ve learned to cope with this is to add “retry” behaviour to damn near everything that could fail intermittently. We use it so much that we’ve got a Python library and command line tool that are used all over the place.

Last week I finally got around to packaging and publishing ours, and I’m happy to present: Redo – Utilities to retry Python callables. Redo provides a decorator, context manager, plain old function, and even a command line tool to retry all sorts of things that may break. It’s very simple to use. Here are some examples from the docs:

The plain old function:

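import shutil

from redo import retry

def maybe_raises(foo, bar=1):
    ...
    return 1

def cleanup():
    # os has no rmtree; the recursive delete lives in shutil
    shutil.rmtree("/tmp/dirtydir")

# args must be a sequence of positional arguments, hence the one-element tuple
ret = retry(maybe_raises, retry_exceptions=(HTTPError,),
            cleanup=cleanup, args=(1,), kwargs={"bar": 2})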

The decorator:

from redo import retriable

@retriable()
def foo():
    ...

@retriable(attempts=100, sleeptime=10)
def bar():
    ...

The context manager:

from redo import retrying

def foo(a, b):
    ...

with retrying(foo, retry_exceptions=(HTTPError,)) as retrying_foo:
    r = retrying_foo(1, 3)

You can grab version 1.0 from PyPI, or find it on GitHub, where you can send issues or pull requests.

How to deal with timezone adjusted “epoch” timestamps in Python

Today I discovered that we have a system that returns “epoch” timestamps, but adjusted for Pacific time. This means that depending on whether daylight saving time is in effect, these timestamps are 7 or 8 hours ahead when interpreted by most tools. These are horribly difficult to deal with because Unix timestamps are assumed to be in UTC. I spent a good deal of time banging my head against Python’s datetime and pytz modules (as well as the wall). With some help from John Hopkins I found a solution:

In the following example we’ll convert the Pacific “epoch” timestamp 1383000394 to a proper epoch timestamp (which is 1382975194).

First, we need a tzinfo object for Pacific time:

>>> import pytz
>>> pacific_time = pytz.timezone("America/Los_Angeles")

Next, we need to get the initial timestamp into a datetime object to work with it. It’s important to use utcfromtimestamp() here; otherwise you’ll get a localized datetime object – which will only be useful if the machine you run this on is in Pacific time:

>>> from datetime import datetime
>>> dt = datetime.utcfromtimestamp(1383000394)
>>> dt
datetime.datetime(2013, 10, 28, 22, 46, 34)

It gets a little weird from here. We need to subtract the Pacific offset from the datetime object in order to get it into an actual UTC time. To do that we can force the datetime object into UTC and then use its built-in astimezone() method to do the conversion. I think this still leaves a 7 or 8 hour window whenever DST starts or ends where this conversion is an hour off – but it’s good enough for my usage:

>>> dt = dt.replace(tzinfo=pytz.utc)
>>> dt
datetime.datetime(2013, 10, 28, 22, 46, 34, tzinfo=<UTC>)
>>> dt = dt.astimezone(pacific_time)
>>> dt
datetime.datetime(2013, 10, 28, 15, 46, 34, tzinfo=<DstTzInfo 'America/Los_Angeles' PDT-1 day, 17:00:00 DST>)

Now we have a datetime object with the correct time, but claiming to be in Pacific. We can fix that by replacing the tzinfo again:

>>> dt = dt.replace(tzinfo=pytz.utc)
>>> dt
datetime.datetime(2013, 10, 28, 15, 46, 34, tzinfo=<UTC>)

The only thing left to do now is convert to epoch time!

>>> import calendar
>>> calendar.timegm(dt.utctimetuple())
1382975194

Voila, the timestamp we were looking for!
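The same steps can be folded into one helper. The sketch below makes a few assumptions: it targets Python 3.9 or later with a time zone database available, it swaps pytz for the standard library's zoneinfo module, and it avoids utcfromtimestamp(), which is deprecated as of Python 3.12. The function name is made up for illustration. It has the same blind spot around DST transitions as the walkthrough above, because the offset is looked up at the shifted instant rather than the true one.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_epoch_to_utc_epoch(shifted):
    """Convert a Pacific-adjusted "epoch" value to a real Unix timestamp.

    Hypothetical helper: the offset is evaluated at the shifted instant,
    so results can be an hour off within a few hours of a DST change.
    """
    # Read the shifted value as if it were a genuine UTC timestamp.
    as_utc = datetime.fromtimestamp(shifted, tz=timezone.utc)
    # Look up the Pacific offset in force at that instant (-7h or -8h).
    offset = as_utc.astimezone(PACIFIC).utcoffset()
    # Applying the (negative) offset removes the extra hours.
    return shifted + int(offset.total_seconds())


print(pacific_epoch_to_utc_epoch(1383000394))  # 1382975194
```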

Much credit to John Hopkins for his code that taught me how to use datetime.replace() and astimezone(). No credit at all goes to Python’s datetime module, which is sorely in need of an overhaul.

Loading Python modules from arbitrary files

tl;dr: Use imp.load_source.

I’ve been hacking on a tool on and off that needs to load Python code from badly named files (eg, “master.cfg”). To my surprise, there wasn’t an obvious way to do this. My “go to” method of doing this is with execfile. For example, this will load the contents of master.cfg into “m”, with each top level object as a key:

m = {}
execfile("master.cfg", m)
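As an aside, execfile no longer exists in Python 3. A rough equivalent, assuming Python 3 and using a made-up helper name (exec_file) plus a throwaway file standing in for master.cfg, looks like this:

```python
import os
import tempfile


def exec_file(path, namespace):
    """Hypothetical Python 3 replacement for execfile(path, namespace)."""
    with open(path) as f:
        source = f.read()
    # Compiling with the real path keeps file names in tracebacks accurate.
    exec(compile(source, path, "exec"), namespace)
    return namespace


# Demo: write a small config file and load it into a dict.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "master.cfg")
    with open(cfg_path, "w") as f:
        f.write("workers = ['a', 'b']\n")
    m = exec_file(cfg_path, {})

print(m["workers"])
```

As with execfile, the dict also picks up a __builtins__ entry alongside the file's own top level names.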

This works well enough for simple cases, but what happens when you try to load a module that loads other modules? It turns out that execfile has a nasty limitation of requiring modules that aren’t in sys.path to be in the same directory as the file that calls execfile. You can’t even chdir your way around this; you have to copy the files you need to the caller’s directory. (We actually have some production code that does this.)

Someone in #python on Freenode suggested using importlib. That seemed like a fine idea, especially after recently watching Brett Cannon’s “How Import Works” talk. Unfortunately, Python 2.7’s importlib only has a single function, import_module, which can only load a module by name.

Eventually I came across a Stack Overflow post that pointed me at imp.load_source. This function is similar to execfile in that it loads Python code from a named file. However, it properly handles imports without the need to copy files around. It also has the nice added bonus of returning a module rather than throwing objects into a dict. I ended up with code like this, to load the contents of “foo/bar/master.cfg”:

>>> import imp, os, sys
>>> os.chdir("foo/bar")
>>> sys.path.insert(0, "") # Needed to ensure that the current directory is looked at when importing
>>> m = imp.load_source("buildbot.master.cfg", "master.cfg")

Problem solved!
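The imp module is deprecated in Python 3 and was removed in Python 3.12. On those versions a similar result can be had from importlib. The following is a sketch rather than a drop-in replacement: load_source here is a made-up helper, and the demo uses a throwaway file standing in for master.cfg.

```python
import importlib.util
import os
import tempfile
from importlib.machinery import SourceFileLoader


def load_source(module_name, path):
    """Load Python source from path as a module, whatever the file extension."""
    # Passing the loader explicitly is what lets a non-.py name such as
    # master.cfg work; without it no loader is matched by file suffix.
    loader = SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_file_location(module_name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo: write a small config file and load it as a module.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "master.cfg")
    with open(cfg_path, "w") as f:
        f.write("builders = ['linux', 'mac']\n")
    m = load_source("master_cfg", cfg_path)

print(m.builders)
```

This sketch does not add the module to sys.modules; if other code needs to import it by name, register it there yourself before calling exec_module.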