Tinkering with CAPI

CAPI (Coherent Accelerator Processor Interface) is an exciting technology that should allow developers to more easily design applications that utilize an FPGA accelerator. This article documents my initial spelunking into this technology.

A little context

FPGAs

This is my first foray into tinkering with FPGAs and digital logic design in general. For those unfamiliar with this technology, an FPGA (Field-Programmable Gate Array) is a type of integrated circuit that essentially allows software-defined hardware. For the designer, it’s almost like a pile of digital logic gates plus some mechanisms that let you define how they are connected together. With this technology, and the appropriate skill set, the FPGA can be programmed to act like nearly any piece of digital hardware.

Accelerators

An accelerator works somewhat like a coprocessor: it implements computationally expensive algorithms in hardware. The idea is that instead of processing something on the general-purpose CPU in your computer, you delegate the processing to a piece of hardware designed specifically for the task at hand, somewhat similar to using a GPU for 3D rendering. In my first couple of projects, my goal is to make something more functional than practical.

CAPI

CAPI is a technology that should allow me to focus on the interesting parts of designing an accelerated application. Instead of worrying about how I’m going to communicate between code running in a Linux userspace application and a custom piece of hardware, I get to focus my efforts on the application and the hardware itself!

To run it on real gear you’ll need a POWER8-based server. My plan is to tinker with this on the Barreleye server that I work on for Rackspace. To make this more accessible to other developers, I will focus mostly on my design process and on simulation on my x86_64 workstation.



My simulation environment

For my initial testbed I’m using Xubuntu 15.10 on my laptop and the Quartus Prime design software.

If you want to set this up for yourself I recommend you grab the flavor of Ubuntu that you like the most and install the Quartus Prime software. I’m using the 30-day Evaluation of Quartus Prime Standard Edition, but I believe the free Lite Edition would suffice for this tinkering as well. Elect to install ModelSim Starter Edition as part of the Quartus installation process.

Moar CAPI talk and terminology

Nallatech offers a CAPI developer kit and provides a copy of the CAPI User’s Manual. This manual describes a lot of the core components that make up the CAPI systems.

[Figure: CAPI diagram, from the CAPI User’s Manual]

An important part of the CAPI system on the FPGA side of things is the POWER Service Layer (PSL), which helps create the bridge between your custom hardware and your userspace application. In the context of CAPI, the accelerator itself is referred to as an Accelerator Function Unit (AFU). This is the part I am most interested in designing.

[Figure: diagram of the PSL, from the CAPI User’s Manual]

On the userspace side of things, libcxl is the library you include in your application to communicate with the PSL and the AFU(s) behind it.

The Power Service Layer Simulation Engine (PSLSE) can be used to help design and test this technology without the need for the physical gear. In the next few sections I’ll outline the process I followed to set this up on my machine and run a sample project.

Building and setting up PSLSE

First, clone the pslse repo from GitHub:

git clone https://github.com/ibm-capi/pslse

Build the AFU driver

The AFU driver is used by ModelSim to transmit signals between a simulated design and a running instance of PSLSE. To build it you’ll need to find the vpi_user.h header included in your ModelSim installation. For me this is located in /home/$USER/altera/15.1/modelsim_ase/include/. You’ll also need to compile for 32-bit, as ModelSim is a 32-bit application.

cd pslse/afu_driver/src/
export VPI_USER_H_DIR="/home/$USER/altera/15.1/modelsim_ase/include/"
BIT32=y make
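
Since the build depends on that header path, a quick pre-flight check can save a confusing compiler error. The snippet below is only a sketch: has_vpi_header is a hypothetical helper name, and the default path is simply the one from my install, so yours may well differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of pslse): succeed only if the
# directory given as $1 contains the vpi_user.h header.
has_vpi_header() {
    [ -f "$1/vpi_user.h" ]
}

# Assumption: fall back to the path used in this article if the
# variable is not already set in the environment.
VPI_USER_H_DIR="${VPI_USER_H_DIR:-/home/$USER/altera/15.1/modelsim_ase/include}"

if has_vpi_header "$VPI_USER_H_DIR"; then
    echo "found vpi_user.h in $VPI_USER_H_DIR"
else
    echo "vpi_user.h not found in $VPI_USER_H_DIR" >&2
fi
```

If the check fails, point VPI_USER_H_DIR at wherever your ModelSim install keeps its include directory before running make.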

If you get an error about not finding a cdefs.h header, you’ll just need to install the libc6-dev-i386 package.
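
If you’d rather check for that package before building, something like the following should work on Debian-style systems. It’s a sketch under a couple of assumptions: pkg_installed is a hypothetical helper name, and it relies on dpkg-query being available (it is on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named Debian package is installed.
# Assumes dpkg-query exists (Debian/Ubuntu); on other systems the
# pipeline produces no output and the helper reports "not installed".
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'ok installed'
}

if pkg_installed libc6-dev-i386; then
    echo "libc6-dev-i386 is installed"
else
    echo "libc6-dev-i386 is missing; try: sudo apt-get install libc6-dev-i386"
fi
```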

You can run file veriuser.sl to verify that it generated an ELF 32-bit LSB shared object.
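
To make that check scriptable, the description printed by file can be matched with grep. This is a minimal sketch: is_elf32_shared is a hypothetical helper, and the exact wording of file's output can vary between versions, so treat the pattern as an assumption based on what I see on my machine.

```shell
#!/bin/sh
# Hypothetical helper: reads a `file` description on stdin and succeeds
# if it describes a 32-bit little-endian ELF shared object.
is_elf32_shared() {
    grep -q 'ELF 32-bit LSB shared object'
}

# Only inspect the driver if it has actually been built here.
if [ -e veriuser.sl ]; then
    if file veriuser.sl | is_elf32_shared; then
        echo "veriuser.sl looks like a 32-bit build"
    else
        echo "veriuser.sl is not a 32-bit shared object; rebuild with BIT32=y" >&2
    fi
fi
```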

Build pslse itself

PSLSE has a straightforward build process. Just make sure to build it for 32-bit use as well.

cd ../../pslse/
BIT32=y make

Build libcxl from pslse repo

There is a variant of libcxl inside the PSLSE repo that is modified for use in a simulated environment. It can be compiled for a 64-bit architecture, as it communicates with PSLSE over a socket.

cd ../libcxl/
make