CUDA weirdness

Just lost a few hours tearing my hair out [1].

I’m currently doing some tests compiling CUDA code as object files or as a C/C++ library, in order to call it from the Dreaded Matlab.

Wait a minute… did you say external CUDA lib + Matlab?

Yes, I did say that. I am aware of the existence of Matlab’s Parallel Computing Toolbox. However, two important things make me avoid it:

  1. the toolbox requires CUDA devices with compute capability 1.3 or higher. Apart from two or three cards, the devices in our lab have compute capability 1.1. This is because we are a Mac-only lab, and our workstations (Mac Pros) are either old or ship with Radeon cards [2];
  2. the code may eventually be released as an external (open-source) library, and it’s good to have it tied as loosely as possible to a specific, expensive piece of software, because not every lab will have it, or the right version of it, etc.

Back to the topic: nvcc on the Mac

Anyway, back to the topic.

I had previously run into an architecture mismatch when mixing C/C++ libraries and CUDA code on my Mac, which produced undefined symbols. The cause was quickly identified: by default (at least on my machine) nvcc compiles for a 32-bit architecture, while gcc [3] compiles for the native architecture, which is x86_64 on my 64-bit Core 2 Duo.

So, I was careful to invoke nvcc with the -m 64 option, and gcc with the -arch x86_64 option, to control exactly which architecture each compiler targeted. I compiled my files in three steps, following, for example, a procedure described on Stack Overflow.
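A quick way to check which word size a given set of compiler flags actually targets is to compile a tiny probe with each toolchain and compare the results. The sketch below is only an illustration: the function name pointer_bits is made up for this example and is not part of CUDA or of my project.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: width of a pointer, in bits, for the target this
 * file was compiled for (32 on i386, 64 on x86_64). */
int pointer_bits(void)
{
    int bits = (int)(sizeof(void *) * CHAR_BIT);

    /* This sketch only expects the two targets discussed in this post. */
    assert(bits == 32 || bits == 64);
    return bits;
}
```

Printing pointer_bits() from a small main() built once per toolchain shows at a glance whether both sides agree: 64 for an x86_64 build, 32 for a 32-bit one.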

Still undefined symbols

Furthermore, to avoid any linkage weirdness, and because CUDA was originally compatible with C only, I wrote everything in plain vanilla C, avoiding C++ features, and compiled my object files with gcc. However, I still got undefined symbols at the link stage (e.g. _main)!

I tried a quick fix: I compiled with g++ instead, and everything worked! I can’t really explain why this happened. Perhaps the CUDA compiler backend has moved to C++, and my include files need some extern "C" {} blocks.
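For what it’s worth, nvcc treats the host code in .cu files as C++, so function names declared without C linkage get mangled and no longer match what a plain C object file expects. A minimal sketch of the usual header guard follows. I have not confirmed that this is what fixed my build, and the names saxpy.h, saxpy.c and saxpy_host are hypothetical, chosen only for illustration.

```c
/* saxpy.h: hypothetical header shared by C, C++ and .cu sources */
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {   /* C linkage when the header is seen by g++ or nvcc */
#endif

/* Hypothetical host-side entry point: y[i] = a * x[i] + y[i]. */
void saxpy_host(size_t n, float a, const float *x, float *y);

#ifdef __cplusplus
}
#endif

/* saxpy.c: plain C reference implementation standing in for the code
 * that would normally launch a CUDA kernel. */
void saxpy_host(size_t n, float a, const float *x, float *y)
{
    size_t i;

    assert(x != NULL && y != NULL);
    for (i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With the guard in place, the same header can be included from C, C++ and .cu files, and every translation unit refers to the unmangled symbol saxpy_host.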

Anyway, I have something working now, and I can tackle the Next Big Step: linking my lib with a mex file…

  1. Which is why I like wearing it short: no grip.
  2. A list of CUDA compute capabilities can be found here.
  3. Actually, llvm-gcc.