A simple CUDA kernel + mex file (part 2)

So far, we have seen that using Matlab and CUDA together is not always straightforward, especially on a Mac, and I showed a simple project to demonstrate how to do it in practice. In this post, we’re going to see how to compile this project using a Makefile.

Final part : the Makefile!

Overview of the compilation process

Let’s recap the different steps needed to compile our mex file :

  • get the name and location of the various libraries, scripts and compilers involved
  • compile the CUDA kernels with the NVIDIA nvcc compiler
  • compile any additional C/C++ file (not required here)
  • make a library of these object files
  • use the mex script provided by Matlab to compile the mexFunction(), and link it with the library obtained above.
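The steps above can be sketched as flat shell commands. This is only a dry run: it prints one command per step instead of executing it, so it works even without nvcc or Matlab installed. The paths and file names are the ones used in this project, the flags are reduced to a minimum for readability, and the build_steps function is a hypothetical helper for illustration. The first step (locating the tools) corresponds to the two variables.

```shell
#!/bin/sh
# Dry run: print the command behind each build step instead of executing it.
# Paths and file names are the ones used in this project; adapt them to your setup.
CUDA=/usr/local/cuda
MEX=/Applications/MATLAB_R2012a.app/bin/mex

# build_steps is a hypothetical helper, not part of the project.
build_steps() {
    # 1. compile the CUDA kernels with nvcc
    echo "$CUDA/bin/nvcc dataloop.cu -c -o dataloop.cu.o -arch=sm_11 -m 64"
    # 2. make a static library from the object file
    echo "ar -r libdataloop.a dataloop.cu.o"
    # 3. compile mexFunction() with mex and link it with the library
    echo "$MEX -L. -ldataloop mex_dataloop.cpp -L$CUDA/lib -lcudart"
}

build_steps
```

The Makefile automates exactly this sequence, plus the dependency tracking between the steps.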

Anatomy of the Makefile

# Location of the CUDA toolkit
CUDA   := /usr/local/cuda

# Location of the NVIDIA GPU Computing SDK
SDK    := /Developer/GPU\ Computing/C

INC     := -I$(CUDA)/include -I$(SDK)/common/inc -I.
LIB     := -L$(CUDA)/lib   -L$(SDK)/lib

# Mex script installed by Matlab
MEX = /Applications/MATLAB_R2012a.app/bin/mex

# Flags for the CUDA nvcc compiler
NVCCFLAGS :=  -O=4 -arch=sm_11 --ptxas-options=-v -m 64

# IMPORTANT : don't forget the CUDA runtime (-lcudart) !
LIBS     := -lcudart -lcusparse -lcublas

# Regular C++ part ($(INC) is already included in CFLAGS)
CXX = g++
CFLAGS = -Wall -c -O2 -fPIC $(INC)
LFLAGS = -Wall

AR = ar

all: dataloop mex

# CUDA kernels, compiled with nvcc
kernels:
	$(CUDA)/bin/nvcc dataloop.cu -c -o dataloop.cu.o $(INC) $(NVCCFLAGS)

main.o:        main_dataloop.cpp
	${CXX} $(CFLAGS) -o main.o main_dataloop.cpp

# Standalone command line executable
dataloop:     kernels main.o
	${CXX} $(LFLAGS) -o demo_dataloop main.o dataloop.cu.o $(LIB) $(LIBS)

# Static library containing the kernels
dataloop.a:     kernels
	${AR} -r libdataloop.a dataloop.cu.o

# Mex file, with the CUDA library directory added to its rpath
mex:     dataloop.a
	${MEX} -L. -ldataloop -v mex_dataloop.cpp -L$(CUDA)/lib $(LIBS)
	install_name_tool -add_rpath $(CUDA)/lib mex_dataloop.mexmaci64

clean:
	rm -f *.o a.out *.a *.mexmaci* *~

(Be careful when copy/pasting: the recipe lines of a Makefile must start with a tab character, and the CMS may have replaced the tabs with spaces.)

Setting up the environment

In the first part, we specify the paths and names of the various tools, including of course nvcc and g++ [1]. Some additional files required by the software consultant’s code are installed with the NVIDIA GPGPU SDK, so we had to give its path too. Finally, we need to give the full path to Matlab’s mex script, because MacTeX (a TeX distribution for Mac) already installs an executable with the same name in /usr/local, so the shell finds that one first instead of Matlab’s script.

The various flags and options are standard and you should already know most of them (and if you don’t, don’t hesitate to ask in the comments below !) :

  • -O for the optimization level (code speed vs. debuggability)
  • -arch=sm_11 to force nvcc to compile for a compute capability of 1.1 (the CC of our target card). If you don’t specify it, nvcc picks a default architecture; as I understand the doc, code compiled for a lower CC than your card can still run, because the driver JIT-compiles the embedded PTX when the kernel is loaded, but code compiled for a higher CC than your card will not run
  • -m 64 to enforce the creation of a 64-bit object file instead of the default 32-bit one. This is mandatory if we then want to link it with a 64-bit C++/mex executable, otherwise we get an architecture mismatch error from the linker
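As a small illustration of the naming convention behind -arch, here is a sketch that turns a compute capability such as 1.1 into the corresponding flag value. The arch_flag function is a hypothetical helper written for this example, it is not part of nvcc or of the project.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "1.1"
# into the matching nvcc flag, by dropping the dot after "sm_".
arch_flag() {
    echo "-arch=sm_$(printf '%s' "$1" | tr -d '.')"
}

arch_flag 1.1   # prints -arch=sm_11
```

Once the kernels are compiled, running file dataloop.cu.o is a quick way to check which architecture the object file was actually built for.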

Compiler and linker options

The creation of object files with nvcc and g++ or llvm is simple, using the -c option. We keep the .cu part in the filename to keep in mind that we created dataloop.cu.o with nvcc, but it could be removed.

Since we want to perform static linking, we call the archive builder ar and tell it to add our object file to the library (here only dataloop.cu.o, but any additional .o file would go in the same archive).

Then, when calling mex, we don’t forget to indicate the path to our build results with -L. (we are in the same directory) and to link against them with the -l option. If you’ve never seen this before [2], you have to know that for historical reasons gcc (and the compilers that use its syntax) automatically prepends the lib prefix to the argument of -l, hence our call -ldataloop correctly picks up libdataloop.a.
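To make the -l naming rule concrete, the sketch below prints the file names the linker looks for in each -L directory for a given flag. The lib_candidates function is a hypothetical helper for illustration only; on a Mac the linker looks for a dynamic library (.dylib) as well as a static archive (.a).

```shell
#!/bin/sh
# Hypothetical helper: list the file names searched for a given -l flag.
lib_candidates() {
    name=${1#-l}                      # strip the leading "-l"
    echo "lib${name}.dylib lib${name}.a"
}

lib_candidates -ldataloop   # prints libdataloop.dylib libdataloop.a
```

Since only libdataloop.a exists in our directory, that is the file the mex file gets linked with.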

One more thing…

I’ve already explained before that you can’t safely rely on variables such as LD_LIBRARY_PATH / DYLD_LIBRARY_PATH to find the external libraries when working on a Mac. However, you still have to provide the location of the dynamically linked functions !

This can be achieved by modifying the final output (the compiled mex) with the install_name_tool command. We add /usr/local/cuda/lib to the list of directories searched when the mex file is loaded (its rpath entries) because it is a non-standard location used by our CUDA install.
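To check that the rpath was really added, you can inspect the load commands of the mex file with otool -l, which ships with the Apple developer tools. The sketch below filters that output to keep only the rpath entries. The two function names are hypothetical helpers, and the parsing assumes that the paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read the output of "otool -l" on stdin and
# print the path of every LC_RPATH load command (paths without spaces).
parse_rpaths() {
    awk '$1 == "cmd" { in_rpath = ($2 == "LC_RPATH") }
         in_rpath && $1 == "path" { print $2 }'
}

# Hypothetical wrapper: list the rpath entries of a binary (Mac only).
show_rpaths() {
    otool -l "$1" | parse_rpaths
}

# Only run the check when otool and the mex file are both available.
if command -v otool >/dev/null 2>&1 && [ -f mex_dataloop.mexmaci64 ]; then
    show_rpaths mex_dataloop.mexmaci64
fi
```

After make mex, the list should contain /usr/local/cuda/lib.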

Note that we could do it to the standalone executable (target dataloop of the Makefile), but it was unnecessary because it’s launched from a terminal, and I had already added /usr/local/cuda/lib to the DYLD_LIBRARY_PATH variable in the .profile.

How to use it ?

Once you are in the correct directory, just invoke the different targets :

  • make kernels allows you to build the CUDA code
  • make dataloop builds the kernels (if needed) and creates a command line executable to test them
  • make mex creates a Matlab-compatible command, that you can call with mex_dataloop(single(my_vector)).


I hope you enjoyed this mini-series ! It’s short, yet it gets you started correctly with Matlab-CUDA-Mac (joy and happiness) interactions. You can download an archive with this project in order to test and modify it.

Don’t hesitate to drop a word or ask questions in the comment form below, it helps keep me motivated and the updates coming !


  1. More precisely, on our system it’s a symlink to llvm-g++-4.2.
  2. This was the case of our Windows consultant.