Concurrency (without bugs) is hard!

Story of a bug that I introduced in my codebase while making it parallel. Hopefully, I could figure it out quickly (thanks to some well placed assert’s), but it took me a little but more time to understand what what happening.

The context

The problem that I was working on could be summarized as follows:

  • let’s say that we have a procedure (a signal processing filter for example) that knows how to operate on some data
  • and now pretend that we want to re-use that procedure (because we like it and it’s well tested) but with different types of data (new measurement instrument, new data set…) that will need some specific pre- or post-processing.

Since our core routine is supposed to be well tested and doing its job right we don’t want to modify it in any way (not even by copying its code to adapt it to our new data). A solution to this problem is to design a class that will provide a public method that allows to call our wonderful routine and that can be subclassed (specialized) in order to take care of applying the correct pre/post-processing.

As Herb Sutter suggests here, it’s a good idea in this case to keep the public method that will be called by clients non-virtual, while the internal pre/post-processing methods are declared as private and virtual. The rationale behing is quite simple: the sequence of commands executed in the public function does not vary1, so it doesn’t have any reasons to be virtual.2

This yields us with the following base class:

    class Base
    {
    public:

        /**
         * Public interface to a complex algorithm.
         *
         * The complex algorithm can have variants (e.g.,
         * put the data in the correct shape, apply some
         * post-processing to remove outliers in the result...).
         * The virtual methdos are private so that clients of the class
         * do only see a stable interface and the sequence of execution
         * does also remain stable.
         *
         * The outline of the algorithm is always:
         * 1. pre-processing step (can be overriden)
         * 2. processing (fixed)
         * 3. result post-processing (can be overriden)
         */
        void performComplexAlgorithm() const
        {
            preprocessing();
            processing();
            postprocessing();
        }

    private:

        /**
         * Prepare the data (e.g., convert RGB images to GRAY).
         * Can be overriden by subclasses.
         */
        virtual void preprocessing() const
        {
            std::cout << "Base::" << __FUNCTION__ << std::endl;
        }

        /**
         * Really solve the problem here.
         */
        void processing() const
        {
            std::cout << "Base::" << __FUNCTION__ << std::endl;
        }

        /**
         * Post-process the data (remove outliers, put back to RGB...).
         * Can be overriden by subclasses.
         */
        virtual void postprocessing() const
        {
            std::cout << "Base::" << __FUNCTION__ << std::endl;
        }
    };

Some subclasses are used to specialize what happened inside the pre/post-processing methods, for example:

	/**
	 * Specialized class to deal with a certain type of data
	 */
	class Derived : public Base
	{
	private:

	    /**
	     * This special type of data needs to be put into a usable shape
	     * for our brilliant processing function.
	     */
	    virtual void preprocessing() const override
	    {
		std::cout << "Derived::" << __FUNCTION__ << std::endl;
	    }

	};

The issue

So far, so good.

In order to spend less time glaring at my screen and waiting for a result, I decided to run several threads in parallel, each of them applying a solver object on a line of an image. This was easy to set using the std::thread class from C++11. I thought I was all done (I did even think of copying my solver object so that each thread did hold its own copy to avoid conflicts between them) but… it turned out that the threads were always using solvers of the base class. The specialized subclasses seemed to be ignored :-/

However, when I checked, I was constantly passing one pf my specialized solver classes as arguments to the main routine…

The solution

Taking some time to think about it…

Actually, I’ve provided much of the solution in the previous section:

I did even think of copying my solver object so that each thread did hold its own copy

This is precisely the mistake:

  • my main routine (before dispatching the work to the threads) was inherited from my single-threaded implementation and expected a reference to an object of the base class;
  • when used directly from the main routine, the virtual function mechanisms work correctly and call the specialized implementations of Derived::preprocessing();
  • however, when copy-constructing solvers in the creation of the threads the compiler generated a call to the copy-constructor of the Base class, instead of the Derived class, because I was using a reference to a Baseobject inside this routine.

Reproducing the bug

This behaviour can be reproduced using these simple functions and the associated main:

	void wrongWorker(Base const solver)
	{
	    std::cout << "Inside " <<  __FUNCTION__ << std::endl;
	    solver.performComplexAlgorithm();
	}

	void correctWorker(Base const& solver)
	{
	    std::cout << "Inside " <<  __FUNCTION__ << std::endl;
	    solver.performComplexAlgorithm();
	}

	int main()
	{
	    std::cout << "Sanity check: call performComplexAlgorithm from Derived:"
                  << std::endl;
	    Derived D0;
	    D0.performComplexAlgorithm();

	    std::cout << std::endl;

	    std::unique_ptr<Base> D1 = std::unique_ptr<Derived>(new Derived);
	    std::cout << "Polymorphic access via a pointer:" << std::endl;
	    D1->performComplexAlgorithm();

	    std::cout << std::endl;

	    std::cout << "Wrong (calls methods from the BASE class):"
                  << std::endl;
	    wrongWorker(*D1);

	    std::cout << std::endl;

	    std::cout << "Correct (calls methods from the DERIVED class):"
                  << std::endl;
	    correctWorker(*D1);
	}

You can find the whole project at this gist.

Consequences

Hopefully, this bug did not take me long to figure out and fix (thanks to some well placed asserts in the code). The fix was easy: instead of creating the solver instances outside the threads, I created them inside using a small factory function. Since this factory takes only 1 parameter, it did not even cripple my thread creation calls with lots of additional arguments.

Lessons learned:

  • Beware of copy-constructed instances of object! They can be tricky to get right in the context of polymorphism…
  • Concurrency is always harder than you thought!
  • It’s good to have some debugging hooks/well placed assertions to enforce and check your assumptions… especially when moving from sequential to paralle execution ;)

  1. It’s always 1) pre-processing, 2) processing, 3) post-processing. ^
  2. This is also known as the Non-Virtual Interface idiom. ^

Related