This is part 4 of my Hello AFU tutorial. In the previous section we implemented the functionality to handle requests for the AFU descriptor. In this part we’ll shift focus to writing the C code that runs on the application side, and send our first bit of data to our AFU.
Getting the Code Started
I like to start a new C project by writing a basic Makefile. This one will just set up some variables to include and link the libCXL library from PSLSE.
LIBCXL_PATH=~/workprojects/pslse/libcxl
LIBCXL_INCLUDE=-I $(LIBCXL_PATH) -L $(LIBCXL_PATH) -lcxl -lpthread
LIBRARIES=$(LIBCXL_INCLUDE)
CC=gcc -Wall -o $@ $< $(LIBRARIES)
all: test_afu

test_afu: test_afu.c
	$(CC)

clean:
	rm -f test_afu
Next I’ll write a basic C file that will just open a handle to the AFU and clean up.
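#include <stdio.h>
#include "libcxl.h"

int main(int argc, char *argv[])
{
    struct cxl_afu_h *afu;

    /* Open a handle to the AFU device. */
    afu = cxl_afu_open_dev("/dev/cxl/afu0.0d");
    if(!afu)
    {
        printf("Failed to open AFU: %m\n");
        return 1;
    }

    /* The second argument is a 64-bit value that gets handed to the AFU. */
    cxl_afu_attach(afu, 0x0123456789abcdef);
    printf("Attached to AFU\n");

    /* Release the handle before exiting. */
    cxl_afu_free(afu);
    return 0;
}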
Next, just to make things a little faster: I’ve noticed my AFU typically becomes ready around 136ns, so I’ll modify my test.do to run for 136ns right at the start. At this point I can make my test_afu binary and run it, as long as I set my shared library search path via export LD_LIBRARY_PATH=~/workprojects/pslse/libcxl/ prior to running it.
The last thing to set up before running is a pslse_server.dat file that contains the host:port the simulated libCXL should connect to. I’ll point mine to localhost:16384, which is the default if you’re testing locally.
After kicking off my test_afu application and running the AFU for a few cycles, I’ll see my second argument to cxl_afu_attach show up on my ha_jea bus. This chunk of data is usually referred to as the Work Element Descriptor (WED).
I’ll commit my changes and we’ll start making a little better use of that WED.
Aligning data
Many of the requests we’ll soon make to read data from the application’s memory space will require that the data is aligned to 128-byte addresses. There are a few ways to accomplish this; my go-to is the aligned_alloc() function that is part of the C11 standard.
This function provides an interface that is very similar to the classic malloc() function, except that its first parameter lets you specify what memory alignment you want.
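One wrinkle worth knowing: C11 as published expects the size passed to aligned_alloc() to be a multiple of the alignment. Some C libraries are forgiving about this, but I wouldn't rely on it. Here's a minimal sketch of a wrapper that rounds the size up first. The names alloc_aligned() and is_aligned() are my own hypothetical helpers, not part of libCXL or the C standard library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate at least `size` bytes aligned to
 * `alignment`, which must be a power of two. The size is rounded up to a
 * multiple of the alignment before calling aligned_alloc(). */
static void *alloc_aligned(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round size up to the next multiple of alignment. */
    size_t padded = (size + alignment - 1) & ~(alignment - 1);

    return aligned_alloc(alignment, padded);
}

/* Hypothetical helper: returns 1 when ptr sits on an `alignment`-byte
 * boundary (alignment must be a power of two), 0 otherwise. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

A call like alloc_aligned(128, 40) would request 128 bytes on a 128-byte boundary, and the result is released with free() as usual.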
Now that we can align data, I’ll create my WED structure for this parity-generating AFU.
typedef struct
{
    __u64 size;
    void *stripe1;
    void *stripe2;
    void *parity;
    __u64 done;
} parity_request;
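Since the AFU only ever receives the address of this structure, it helps to know exactly where each field lives relative to that address. The sketch below pins the layout down with compile-time checks. It assumes a 64-bit system with 8-byte pointers, and it uses uint64_t in place of __u64 (and a hypothetical type name, parity_request_layout) purely so it builds without the Linux headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields, in the same order, as parity_request. uint64_t stands in
 * for __u64 so this sketch compiles on its own. */
typedef struct
{
    uint64_t size;
    void *stripe1;
    void *stripe2;
    void *parity;
    uint64_t done;
} parity_request_layout;

/* Assumption: 8-byte pointers. On other platforms these checks fail at
 * compile time instead of silently producing a different layout. */
static_assert(offsetof(parity_request_layout, size) == 0, "size at byte 0");
static_assert(offsetof(parity_request_layout, stripe1) == 8, "stripe1 at byte 8");
static_assert(offsetof(parity_request_layout, stripe2) == 16, "stripe2 at byte 16");
static_assert(offsetof(parity_request_layout, parity) == 24, "parity at byte 24");
static_assert(offsetof(parity_request_layout, done) == 32, "done at byte 32");
static_assert(sizeof(parity_request_layout) == 40, "five 8-byte fields");
```

With this layout the whole request is 40 bytes, so it fits comfortably inside a single 128-byte aligned block.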
Next I’ll create my example parity request, using aligned allocations for each block.
parity_request *example;
size_t size = 128, alignment = 128;
example = aligned_alloc(alignment, sizeof(*example));
example->size = size;
example->stripe1 = aligned_alloc(alignment, size);
example->stripe2 = aligned_alloc(alignment, size);
example->parity = aligned_alloc(alignment, size);
The intention here is that the data in the structure members stripe1 and stripe2 will be XOR’d together, and the results put in the parity member. Once the operation is complete, the AFU will set the done field to a non-zero value.
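To have something to compare the AFU's output against later, it can be handy to compute the same parity in software. This is just a reference sketch of the byte-wise XOR described above, not the AFU implementation; xor_parity() is a hypothetical name of my own.

```c
#include <assert.h>
#include <stddef.h>

/* Software reference for the parity operation:
 * parity[i] = stripe1[i] XOR stripe2[i] for each of the `size` bytes. */
static void xor_parity(const unsigned char *stripe1,
                       const unsigned char *stripe2,
                       unsigned char *parity,
                       size_t size)
{
    assert(stripe1 != NULL && stripe2 != NULL && parity != NULL);

    for (size_t i = 0; i < size; i++)
        parity[i] = stripe1[i] ^ stripe2[i];
}
```

Because XOR is its own inverse, XOR-ing the parity with either stripe gives back the other one, which is what makes this kind of parity useful for recovery.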
Before sending this request to the AFU, I’ll copy some data into both buffers and zero out the done field.
memcpy(example->stripe1,
"asfb190jwqsefx0amxAqa1nlkaf78sa0g&0ha8dngj3t21078fnajl38n32j3np2"
"x3t8wefiankxkfmgm ncmbqx8ehn2jkaeubgfbuapwnjxkg09f0w9es80872981",
size);
memcpy(example->stripe2,
"\x35\x1b\x07\x16\x11\x50\x43\x4a\x04\x1e\x1e\x00\x46\x08\x42\x0e"
"\x1d\x1d\x33\x51\x11\x50\x1c\x05\x1f\x18\x47\x17\x6c\x1b\x08\x43"
"\x47\x4f\x43\x48\x04\x40\x05\x0d\x13\x06\x4a\x54\x45\x59\x51\x43"
"\x18\x2f\x49\x0c\x4a\x09\x4b\x48\x0b\x50\x46\x03\x5d\x09\x50\x46"
"\x17\x13\x07\x5d\x12\x4b\x46\x20\x46\x0a\x4b\x19\x07\x15\x02\x47"
"\x01\x49\x05\x06\x4d\x16\x1e\x58\x4b\x00\x0d\x4e\x46\x02\x02\x12"
"\x45\x07\x17\x09\x08\x0b\x1b\x06\x50\x18\x00\x4a\x0b\x04\x0a\x55"
"\x19\x14\x55\x16\x55\x45\x14\x5d\x51\x4a\x17\x41\x56\x57\x5f",
size);
example->done = 0;
I’ll also add some print statements to show me these structure members.
printf("[example structure]\n");
printf(" example: %p\n", example);
printf(" example->size: %llu\n", (unsigned long long)example->size);
printf(" example->stripe1: %p\n", example->stripe1);
printf(" example->stripe2: %p\n", example->stripe2);
printf(" example->parity: %p\n", example->parity);
printf(" &(example->done): %p\n", &(example->done));
I’ll modify my cxl_afu_attach() call to send the pointer to this parity_request structure.
cxl_afu_attach(afu, (__u64)example);
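Lastly, I’ll add a while loop that waits until the AFU has completed its operation, then spits out the data in the parity member. Note that the snippets in this section also need <stdlib.h>, <string.h>, and <unistd.h> included for aligned_alloc(), memcpy(), and sleep().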
printf("Waiting for completion by AFU\n");
while(!example->done){
    sleep(1);
}
printf("PARITY:\n%s\n", (char *)example->parity);
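The loop above will wait forever if the AFU never sets done, which is easy to hit while the hardware side is still being developed. Here's a minimal sketch of a bounded wait. The wait_for_done() helper and its parameters are hypothetical, and reading the flag through a volatile pointer is a defensive choice on my part so that every check re-reads memory the AFU writes to.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: check *done up to max_polls times, sleeping
 * poll_interval_ns nanoseconds between checks.
 * Returns 0 once *done is non-zero, or -1 if it never was. */
static int wait_for_done(const volatile uint64_t *done,
                         unsigned max_polls,
                         long poll_interval_ns)
{
    struct timespec pause = { .tv_sec = 0, .tv_nsec = poll_interval_ns };

    /* nanosleep() requires tv_nsec to be below one second. */
    assert(poll_interval_ns >= 0 && poll_interval_ns < 1000000000L);

    for (unsigned i = 0; i < max_polls; i++) {
        if (*done)
            return 0;
        nanosleep(&pause, NULL);
    }

    /* One last look after the final sleep. */
    return *done ? 0 : -1;
}
```

In the test application this could stand in for the while loop, called with the address of example->done (cast to a pointer to volatile uint64_t if __u64 is a different underlying type on your platform) and a poll count generous enough for the simulator.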
At this point we can get the address of our WED structure in our AFU, but we’ll need to use the PSL’s Command and Buffer interfaces to request the data inside of that structure, which I’ll cover in the next post. Ending on this point I’ll commit my application code changes and see you in the next post!