Easily spawning parallel tasks with Python

Wow !

My 1-bit LBD reconstruction algorithm is working now :-) [1]

The problem…

My goal now is to produce enough reconstruction results for a journal paper. Hence, I wrote a small Python script that :

  • gets all the image files from a directory
  • generates some experiments by creating different sets of parameters
  • launches the reconstruction routine for each image with each set of parameters in a shell, using the os.system function

    my_command = str(…)
    os.system(my_command)
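To make the three steps above concrete, here is a minimal sketch of how such a script could build its list of commands. It is only an illustration: the executable name `./reconstruct`, the two flags and their values are hypothetical placeholders, not my actual reconstruction routine or its parameters.

```python
import itertools
import shlex
from pathlib import Path

# Hypothetical placeholders: the real executable and its options differ.
RECONSTRUCT = "./reconstruct"
PARAM_GRID = {"--patch-size": [16, 32], "--iterations": [100, 500]}


def parameter_sets(grid):
    """Return one dict per combination of the values listed in the grid."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*grid.values())]


def build_commands(image_dir, grid, extensions=(".png", ".jpg")):
    """Return one shell command per (image, parameter set) pair."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in extensions)
    commands = []
    for image, params in itertools.product(images, parameter_sets(grid)):
        args = [RECONSTRUCT, str(image)]
        for flag, value in params.items():
            args += [flag, str(value)]
        # shlex.join quotes each argument so the shell sees it unchanged.
        commands.append(shlex.join(args))
    return commands
```

The sequential version of the experiment loop is then just a `for` loop that calls `os.system(command)` on each entry returned by `build_commands`.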

There is an obvious parallelism in the big experiment loop, since each reconstruction process is independent from the others.

In C/C++, I would have used OpenMP on a carefully chosen loop, and voilà ! However, there is no such thing in Python.

… has a simple solution !

Since I’m new to Python, I had to spend some time googling before I found the multiprocessing module.

This module introduces the handy concept of a Pool, which groups together a bunch of worker processes. By default, it creates as many processes as there are CPUs on the host machine.

Then, you can add function calls to your pool either with Pool.apply(func), which blocks until the function returns (hence not really interesting…), or asynchronously with Pool.apply_async(func), which returns immediately. Once I have enqueued all my tasks, I close the pool with Pool.close() and wait for the worker processes to finish with Pool.join().
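The enqueue, close and wait pattern described here can be sketched as follows. This is not my actual script: `run_command` and `run_all` are names made up for the example, and the demo commands are stand-ins that merely start a Python interpreter instead of a reconstruction.

```python
import os
import sys
from multiprocessing import Pool


def run_command(command):
    """Run one command in a shell and return os.system's status (0 on success)."""
    return os.system(command)


def run_all(commands, processes=None):
    """Run every command in a pool of workers; return the statuses in order."""
    # processes=None keeps the default pool size (one worker per CPU).
    pool = Pool(processes=processes)
    # apply_async returns at once with a handle on the future result.
    pending = [pool.apply_async(run_command, (command,))
               for command in commands]
    pool.close()  # no more tasks will be submitted
    pool.join()   # block until all the workers are done
    return [result.get() for result in pending]


if __name__ == "__main__":
    # Stand-in commands: each one just prints a number from a new interpreter.
    demo = ['"%s" -c "print(%d)"' % (sys.executable, i) for i in range(4)]
    print(run_all(demo))
```

The worker function has to live at the top level of the module so that it can be sent to the worker processes, and the `if __name__ == "__main__":` guard keeps those processes from launching the whole experiment again on platforms where they re-import the script.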

Happy ending

Right now, after writing a few lines of code, I have 10 cores frenetically computing LBD reconstructions ! I’ll keep you updated about the results once the paper is submitted.

Post-scriptum: Joblib (10.09.12)

Gaël, in the comments section below, gives us a link to another Python module called joblib. As the title of joblib’s page reads “Embarrassingly parallel for loops”, it sure looks like a good candidate for the task I’ve described in this post ! I especially like that a negative n_jobs value tells the pool how many CPUs to leave alone (n_jobs=-2, for instance, uses all CPUs but one), so I can keep working on something else while the experiments run. This suits me well since I work on both an 8-CPU workstation and a 2-CPU laptop: specifying the number of idle CPUs is more useful to me than specifying the number of busy ones.
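For comparison, here is what the same experiment loop might look like with joblib. It is a sketch under the same assumptions as before: joblib is a third-party package that has to be installed separately, `run_command` and `run_all` are hypothetical names, and the demo commands only start a Python interpreter.

```python
import os
import sys

from joblib import Parallel, delayed  # third-party package


def run_command(command):
    """Run one command in a shell and return os.system's status (0 on success)."""
    return os.system(command)


def run_all(commands, n_jobs=-2):
    """Run the commands in parallel; n_jobs=-2 means all CPUs but one."""
    return Parallel(n_jobs=n_jobs)(
        delayed(run_command)(command) for command in commands
    )


if __name__ == "__main__":
    # Stand-in commands: each one just prints a number from a new interpreter.
    demo = ['"%s" -c "print(%d)"' % (sys.executable, i) for i in range(4)]
    print(run_all(demo))
```

With `n_jobs=-2` the same script would use 7 workers on the 8-CPU workstation and 1 on the 2-CPU laptop, without any change to the code.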


  1. I am saving the first look at the results for my ICPR talk. So, be there or be patient !