Hello AFU – Part 4

This is part 4 of my Hello AFU tutorial. In the previous section we implemented the functionality to handle requests for the AFU descriptor. In this part we’ll shift focus to writing the C code that runs on the application side, and send our first bit of data to our AFU.

Getting the Code Started

I like to start a new C project by writing a basic Makefile. This one just sets up some variables to include the libCXL library from PSLSE. Note that the recipe lines under each target must be indented with a tab character, not spaces.

LIBCXL_PATH=~/workprojects/pslse/libcxl
LIBCXL_INCLUDE=-I $(LIBCXL_PATH) -L $(LIBCXL_PATH) -lcxl -lpthread
LIBRARIES=$(LIBCXL_INCLUDE)
CC=gcc -Wall -o $@ $< $(LIBRARIES)

all: test_afu

test_afu: test_afu.c
    $(CC)

clean:
    rm -f test_afu

Next I’ll write a basic C file that will just open a handle to the AFU and clean up.

#include <stdio.h>
#include "libcxl.h"


int main(int argc, char *argv[])
{
    struct cxl_afu_h *afu;

    afu = cxl_afu_open_dev("/dev/cxl/afu0.0d");
    if(!afu)
    {
        printf("Failed to open AFU: %m\n");
        return 1;
    }

    cxl_afu_attach(afu, 0x0123456789abcdef);
    printf("Attached to AFU\n");

    cxl_afu_free(afu);

    return 0;
}

Next, to make things a little faster: I’ve noticed my AFU typically becomes ready around 136ns, so I’ll modify my test.do to run for 136ns right at the start. At this point I can build my test_afu binary and run it, as long as I set my shared library search path via export LD_LIBRARY_PATH=~/workprojects/pslse/libcxl/ prior to running it.

The last thing to set up before running is a pslse_server.dat file that contains the host:port the simulated libCXL should connect to. I’ll point mine to localhost:16384, which is the default if you’re testing locally.

After kicking off my test_afu application and running the AFU for a few cycles, I’ll see the second argument to cxl_afu_attach show up on my ha_jea bus. This chunk of data is usually referred to as the Work Element Descriptor (WED).

[Figure: wed_signal (the WED value appearing on the ha_jea bus)]

I’ll commit my changes and we’ll start making a little better use of that WED.



Aligning data

Many of the requests we’ll soon make to read data from the application’s memory space will require that the data is aligned to 128-byte addresses. There are a few ways to accomplish this. My go-to is the aligned_alloc() function that is part of the C11 standard.

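This function provides an interface that is very similar to the classic malloc() function, except that its first parameter lets you specify what memory alignment you want. It is declared in <stdlib.h>, and the memory it returns is released with free() as usual.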
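One detail worth keeping in mind: C11 asks that the size passed to aligned_alloc() be an integral multiple of the alignment. Some C libraries are lenient about this, but rounding the size up first is a cheap way to stay on the safe side. Below is a minimal sketch of that idea. The helper names round_up(), alloc_aligned() and is_aligned() are my own for illustration and are not part of libcxl or the C library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round size up to the next multiple of alignment.
 * Assumes alignment is a non-zero power of two (128 in this tutorial). */
size_t round_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical wrapper: allocate at least `size` bytes aligned to
 * `alignment`, padding the request so it is a multiple of the alignment.
 * Returns NULL on failure; release the memory with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    return aligned_alloc(alignment, round_up(size, alignment));
}

/* Returns 1 if ptr sits on an `alignment`-byte boundary, 0 otherwise. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

With these helpers, a call such as alloc_aligned(128, 40) requests a full 128-byte block, and is_aligned(ptr, 128) can be used as a quick sanity check before handing a pointer to the AFU.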

Now that we can align data, I’ll create my WED structure for this parity-generating AFU.

typedef struct
{
    __u64 size;
    void *stripe1;
    void *stripe2;
    void *parity;
    __u64 done;
} parity_request;
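Since the AFU will eventually have to pick these fields out of raw memory, it helps to know the byte offset of each one. The sketch below mirrors the structure above and checks the layout at compile time. It assumes a 64-bit build where pointers are 8 bytes, and it uses uint64_t in place of __u64 so it compiles without kernel headers. The type name parity_request_layout is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields, in the same order, as parity_request.
 * uint64_t stands in for __u64 (assumption: both are 64-bit unsigned). */
typedef struct
{
    uint64_t size;
    void *stripe1;
    void *stripe2;
    void *parity;
    uint64_t done;
} parity_request_layout;

/* Compile-time layout checks, valid only under the 64-bit assumption. */
static_assert(sizeof(void *) == 8, "this sketch assumes 64-bit pointers");
static_assert(offsetof(parity_request_layout, size) == 0, "size at byte 0");
static_assert(offsetof(parity_request_layout, stripe1) == 8, "stripe1 at byte 8");
static_assert(offsetof(parity_request_layout, stripe2) == 16, "stripe2 at byte 16");
static_assert(offsetof(parity_request_layout, parity) == 24, "parity at byte 24");
static_assert(offsetof(parity_request_layout, done) == 32, "done at byte 32");
static_assert(sizeof(parity_request_layout) == 40, "40 bytes in total");
```

Under those assumptions the whole structure is 40 bytes, so it fits comfortably inside a single 128-byte aligned block.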

Next I’ll create my example parity request, using aligned allocations for each block.

parity_request *example;
size_t size = 128, alignment = 128;

example = aligned_alloc(alignment, sizeof(*example));
example->size = size;
example->stripe1 = aligned_alloc(alignment, size);
example->stripe2 = aligned_alloc(alignment, size);
example->parity = aligned_alloc(alignment, size);

The intention here is that the data in the structure members stripe1 and stripe2 will be XOR’d together, and the result put in the parity member. Once the operation is complete, the AFU will set the done field to a non-zero value.
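For comparison, here is what that operation looks like when done in software. This is only a reference sketch that could be used to check the AFU’s output later on, not the AFU implementation itself. The function name xor_parity() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Software reference for the parity operation: XOR two stripes
 * byte by byte and store the result in `parity`.
 * All three buffers must hold at least `size` bytes. */
void xor_parity(unsigned char *parity,
                const unsigned char *stripe1,
                const unsigned char *stripe2,
                size_t size)
{
    assert(parity != NULL && stripe1 != NULL && stripe2 != NULL);
    for (size_t i = 0; i < size; i++)
        parity[i] = stripe1[i] ^ stripe2[i];
}
```

A handy property of XOR parity is that running the same function on the parity and one stripe gives back the other stripe, which is what makes it useful for rebuilding lost data.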

Before sending this request to the AFU, I’ll copy some data into both buffers and zero out the done field.

memcpy(example->stripe1,
       "asfb190jwqsefx0amxAqa1nlkaf78sa0g&0ha8dngj3t21078fnajl38n32j3np2"
       "x3t8wefiankxkfmgm ncmbqx8ehn2jkaeubgfbuapwnjxkg09f0w9es80872981",
       size);
memcpy(example->stripe2,
       "\x35\x1b\x07\x16\x11\x50\x43\x4a\x04\x1e\x1e\x00\x46\x08\x42\x0e"
       "\x1d\x1d\x33\x51\x11\x50\x1c\x05\x1f\x18\x47\x17\x6c\x1b\x08\x43"
       "\x47\x4f\x43\x48\x04\x40\x05\x0d\x13\x06\x4a\x54\x45\x59\x51\x43"
       "\x18\x2f\x49\x0c\x4a\x09\x4b\x48\x0b\x50\x46\x03\x5d\x09\x50\x46"
       "\x17\x13\x07\x5d\x12\x4b\x46\x20\x46\x0a\x4b\x19\x07\x15\x02\x47"
       "\x01\x49\x05\x06\x4d\x16\x1e\x58\x4b\x00\x0d\x4e\x46\x02\x02\x12"
       "\x45\x07\x17\x09\x08\x0b\x1b\x06\x50\x18\x00\x4a\x0b\x04\x0a\x55"
       "\x19\x14\x55\x16\x55\x45\x14\x5d\x51\x4a\x17\x41\x56\x57\x5f",
       size);
example->done = 0;

I’ll also add some print statements to show me these structure members.

printf("[example structure]\n");
printf("  example: %p\n", example);
printf("  example->size: %llu\n", (unsigned long long)example->size);
printf("  example->stripe1: %p\n", example->stripe1);
printf("  example->stripe2: %p\n", example->stripe2);
printf("  example->parity: %p\n", example->parity);
printf("  &(example->done): %p\n", &(example->done));

I’ll modify my cxl_afu_attach() call to send the pointer to this parity_request structure.

cxl_afu_attach(afu, (__u64)example);

Lastly, I’ll add a while loop to wait until the AFU has completed its operation, then print out the data in the parity member.

printf("Waiting for completion by AFU\n");
while(!example->done){
  sleep(1);
}

printf("PARITY:\n%s\n", (char *)example->parity);
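While the AFU side is still under development, a loop like this will spin forever if the done field never gets set. One option is to bound the wait. The sketch below is one possible way to do that. The function wait_for_done() and its parameters are my own invention for illustration, and reading the flag through a volatile pointer is a conservative choice so the compiler reloads it on every pass.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Poll a completion flag up to `max_polls` times, sleeping
 * `interval_sec` seconds between checks.
 * Returns 0 once the flag is non-zero, or -1 if it is still zero
 * after the last poll. */
int wait_for_done(const volatile uint64_t *done,
                  unsigned max_polls,
                  unsigned interval_sec)
{
    assert(done != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (*done)
            return 0;
        sleep(interval_sec);
    }
    /* One final check after the last sleep. */
    return *done ? 0 : -1;
}
```

In the test application this could replace the bare while loop, for example by polling the address of example->done (cast to the pointer type above) once a second for a minute and printing an error on -1.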

At this point we can get the address of our WED structure in our AFU, but we’ll need to use the PSL’s Command and Buffer interfaces to request the data inside of that structure, which I’ll cover in the next post. Ending on this point I’ll commit my application code changes and see you in the next post!
