Seeing a GPU from Jupyter

Warning: I wrote this a while back and it is incomplete. I would now recommend pybind11 rather than the Boost.Python described below; I'll update the page at some point to reflect that.

For many number-crunching workloads, GPUs are faster than CPUs. But they are limited to simple calculations running in parallel, so initialisation, post-processing and basically everything that isn't the kernel of the problem will still run on the CPU.

Woodcut of tipsy man swaying before two moons.

This page describes one way of using your own GPU code from a Jupyter notebook: a fairly easy path from Jupyter via Boost.Python to C++, and on to CUDA and your own code running on the GPU.1 This is not a description of installing TensorFlow, Theano or Numba in order to run an existing library's GPU-enabled code from Python; that's covered by their documentation already. Nor is it similar to PyCUDA, which exposes CUDA functions directly. PyCUDA is a good way to run “raw” CUDA code from Jupyter, and you should investigate it if you can skip the compiled-code layer described on this page and do all of your post-processing in Python.

Putting it all together

Start at the top, with the Python code as we would like to execute it.

"A simple example of running custom GPU code from Python."
import gpumodule

# run(nx, ny) is the C++ function exposed to Python below
output = gpumodule.run(32, 32)

The module gpumodule doesn't exist yet, but it is easy to create. While it is possible to write Python extensions directly in C++ against the Python C API, it is much easier to expose C++ interfaces to Python using Boost.Python.

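// Example of exposing a single C++ function to Python
#include <boost/python.hpp>

// Declared here; defined elsewhere, e.g. alongside the CUDA code it launches
int run(int nx, int ny);

// The module name must match the file name of the built library
BOOST_PYTHON_MODULE(gpumodule)
{
    boost::python::def("run", run, "Runs on GPU.");
}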

Notice that the parameters and return type of the exposed “run” function are inferred from the function declaration - Boost.Python doesn't need to be told about them again.2

To compile and link, you'll need to tell the build where your Python and Boost headers and libraries are. This example uses CMake, so it works on Linux and Windows, but other build tools would work too.

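cmake_minimum_required(VERSION 3.0)
project(gpumodule CXX)
find_package(PythonLibs REQUIRED)
# Newer Boost releases name this component after the Python version, e.g. python36
find_package(Boost COMPONENTS python REQUIRED)
include_directories(${PYTHON_INCLUDE_DIRS} ${Boost_INCLUDE_DIRS})
add_library(gpumodule SHARED gpumodule.cpp)
# Python expects gpumodule.so (Linux) or gpumodule.pyd (Windows), not libgpumodule.so
set_target_properties(gpumodule PROPERTIES PREFIX "")
if(WIN32)
  set_target_properties(gpumodule PROPERTIES SUFFIX ".pyd")
endif()
target_link_libraries(gpumodule ${Boost_LIBRARIES} ${PYTHON_LIBRARIES})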
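Once CMake has built the library, the notebook still has to find it. Python only imports a compiled module from a directory on `sys.path`, and only if its file name is the module name plus one of the suffixes in `importlib.machinery.EXTENSION_SUFFIXES` (which include `.so` on Linux and `.pyd` on Windows). The sketch below checks for the built file and puts the build directory on the path; the helper names and the `build` directory are my own assumptions, not part of the setup above.

```python
import importlib.machinery
import os
import sys


def find_extension(build_dir, module_name="gpumodule"):
    """Return the path of a compiled extension module in build_dir, or None.

    Python only imports a compiled module whose file name is the module name
    followed by one of this interpreter's extension suffixes, e.g. '.so' on
    Linux or '.pyd' on Windows.
    """
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = os.path.join(build_dir, module_name + suffix)
        if os.path.isfile(candidate):
            return candidate
    return None


def add_build_dir_to_path(build_dir):
    """Put build_dir at the front of sys.path so 'import gpumodule' can find it."""
    path = os.path.abspath(build_dir)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    build_dir = "build"  # hypothetical: wherever CMake wrote the library
    found = find_extension(build_dir)
    if found is None:
        print("No gpumodule extension found in", os.path.abspath(build_dir))
    else:
        add_build_dir_to_path(build_dir)
        print("Found", found)
```

After rebuilding the library, restart the Jupyter kernel: `importlib.reload` generally won't replace a compiled extension that has already been loaded.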


Running any of the above code is only possible on a properly configured system, but I've left that discussion until last: partly because it is boring, but mostly because it depends on your environment, including your OS and version of Python.3 Rather than leave you on your own, I'll give examples for my Linux and Windows environments. Do get in touch if you'd like to add a comment or correction to this page.

The requirements are the Python development headers and libraries (python-dev), Boost, CMake and a working CUDA installation. If you only have Python installed without the development package, compilation will fail with errors about a missing pyconfig.h. It is easier to build Boost.Python from source on your system than to use a prebuilt binary, since the library must match the version of Python you're using.
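One way to check for the missing-header problem before running CMake is to ask the interpreter where it expects `pyconfig.h` to be. This is a small sketch using only the standard `sysconfig` module; the function name is my own, and a `False` result is a hint to install the development package rather than a definitive diagnosis.

```python
import os
import sysconfig


def python_dev_headers_installed():
    """Return True if pyconfig.h exists where this interpreter expects it.

    A missing pyconfig.h usually means the development package
    (python3-dev on Debian/Ubuntu) is not installed.
    """
    return os.path.isfile(sysconfig.get_config_h_filename())


print("Python include directory:", sysconfig.get_paths()["include"])
print("pyconfig.h:", sysconfig.get_config_h_filename())
print("Development headers installed:", python_dev_headers_installed())
```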


On Ubuntu/Debian (for Red Hat-alikes, use yum to install the equivalents):

sudo apt install cmake python3-dev
export PYTHON_INCLUDE_DIR=/usr/include/python3.6
export PYTHON_LIBRARY=/usr/lib/python3.6/config/
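As far as I know, CMake's FindPythonLibs reads PYTHON_INCLUDE_DIR and PYTHON_LIBRARY as CMake cache variables rather than environment variables, so another option is to pass them on the cmake command line with -D. The sketch below asks the running interpreter for these values via `sysconfig`. The library search covers a few common locations (including Debian's multiarch directory) and is an assumption about your layout, so check the printed command before relying on it.

```python
import os
import shlex
import sysconfig


def cmake_python_flags():
    """Return -D options pointing CMake's FindPythonLibs at this interpreter.

    The library location varies between distributions, so a few likely
    directories are checked; PYTHON_LIBRARY is omitted if none contains it.
    """
    flags = ["-DPYTHON_INCLUDE_DIR=" + sysconfig.get_paths()["include"]]
    libname = sysconfig.get_config_var("LDLIBRARY")  # e.g. libpython3.6m.so
    libdir = sysconfig.get_config_var("LIBDIR")
    candidates = [sysconfig.get_config_var("LIBPL"), libdir]
    multiarch = sysconfig.get_config_var("MULTIARCH")
    if libdir and multiarch:
        candidates.append(os.path.join(libdir, multiarch))
    for directory in candidates:
        if libname and directory:
            path = os.path.join(directory, libname)
            if os.path.isfile(path):
                flags.append("-DPYTHON_LIBRARY=" + path)
                break
    return flags


# Print a configure command (POSIX shell quoting) for a build directory
# one level below the source directory.
print("cmake", " ".join(shlex.quote(flag) for flag in cmake_python_flags()), "..")
```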


On Windows, use whichever distribution of Python you prefer. The CUDA Toolkit works best with Visual Studio; I built the above with VS2017, CUDA 9.1 and CMake 3.9.


When there is a problem executing on the GPU, for example because the code runs on a machine without a suitable card or driver, exceptions carrying the error message are sent from C++ and received by the Python script. If the Python code doesn't catch them with try/except, you'll get a stack trace with the error.
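Boost.Python's default exception handling turns a C++ `std::exception` that escapes a wrapped function into a Python `RuntimeError` carrying the `what()` message. So, assuming the C++ side throws something like `std::runtime_error` when a CUDA call fails, the notebook can catch it and carry on. The sketch below falls back to a CPU path if the module can't be imported or the GPU run fails; `cpu_fallback` is a hypothetical stand-in for whatever CPU implementation you have.

```python
import importlib


def run_with_fallback(nx, ny, cpu_fallback):
    """Run gpumodule.run(nx, ny), falling back to cpu_fallback(nx, ny) on failure."""
    try:
        gpumodule = importlib.import_module("gpumodule")
    except ImportError as err:
        print(f"gpumodule could not be imported ({err}); using the CPU fallback")
        return cpu_fallback(nx, ny)
    try:
        return gpumodule.run(nx, ny)
    except RuntimeError as err:
        # A C++ exception thrown by run() arrives here with its message intact.
        print(f"GPU run failed: {err}; using the CPU fallback")
        return cpu_fallback(nx, ny)


if __name__ == "__main__":
    # Hypothetical CPU stand-in that just multiplies the grid dimensions.
    print(run_with_fallback(32, 32, lambda nx, ny: nx * ny))
```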


If you drive your own GPU code from Python, whether or not you go via Jupyter or follow the steps here, let me know about it. I'd like to learn from your experience and share any details here.

  1. It seemed a natural way of doing this sort of work, but I haven't seen it described elsewhere, perhaps because it needs technologies from different parts of the stack. In my case, I had some existing C++ code for running Lattice Boltzmann simulations. I converted the core code to run on the GPU (previously it ran on multiple CPUs using MPI), with setup, orchestration and analysis in C++. Intermediate results were written as files on disk and then processed with Python to create human-readable output: graphs, charts and summaries. This batch processing worked well, but it's much less interactive and fun than driving everything from Jupyter.

  2. It works for classes as well, but we'll keep it simple here. The module name declared in code must match the file name of the library: gpumodule.so on Linux, gpumodule.pyd on Windows.

  3. Just don't use Python 2 - it would work, but there's no reason to stay in the past.