This is part 5 of my Hello AFU tutorial. In the last post, I built the C application that attaches to and uses the AFU that’s the focus of these posts. In this post I’ll start pulling data from the application’s memory space into the AFU and read the WED structure.
Keeping it Running
Before I start requesting data, some modifications are necessary to notify the underlying systems that the AFU is running. So far, I’m not managing the ah_jrunning signal, which should be set high while the AFU is performing a task. After a short time the PSL will stop driving the AFU’s clock if the AFU hasn’t raised ah_jrunning, so let’s quickly fix this and improve the parity_afu module a little bit.
I’ll refactor the always_ff block of the parity_afu module to use a case statement to handle commands, and add handling for the START command in addition to our existing RESET command.
always_ff @(posedge clock) begin
  if (job_in.valid) begin
    case (job_in.command)
      RESET: begin
        jdone <= 1;
        job_out.running <= 0;
      end
      START: begin
        jdone <= 0;
        job_out.running <= 1;
      end
    endcase
  end else begin
    jdone <= 0;
  end
end
Now that I’m setting job_out.running, I’ll also remove my static assignment of that signal. These changes are committed here.
Planning for the Work Element Submodule
The groundwork to actually deal with the issue at hand is almost completely laid out. The module that will do the real work will be considerably more complex than the components so far, so I’ll start planning a new module to isolate that functionality: parity_workelement.
First I’ll define the inputs and outputs of this module:
Direction | Name | Purpose |
---|---|---|
Input | clock | Clock signal to follow |
Input | enabled | High while AFU is in running state |
Input | reset | Signal triggering reset of internal state |
Input | wed | The WED pointer from userspace |
Input | buffer_in | For reading userspace buffer data |
Input | response | To check responses of commands |
Output | command_out | To request buffer reads and writes |
Output | buffer_out | For writing userspace buffer data |
We’ll also define a mostly linear finite state machine to describe the work to be done.
State | Purpose | Next State |
---|---|---|
START | Request data at WED | WAITING_FOR_REQUEST |
WAITING_FOR_REQUEST | Wait for WED data to be available | REQUEST_STRIPES |
REQUEST_STRIPES | Send commands to read stripe1 and stripe2 | WAITING_FOR_STRIPES |
WAITING_FOR_STRIPES | Wait for stripe data to be available | WRITE_PARITY |
WRITE_PARITY | Write XOR’d parity from stripes back to memory | REQUEST_STRIPES if more data to read; DONE otherwise |
DONE | Write done flag and halt | n/a |
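The table above can be read as a next-state function. Here is a minimal sketch of just the transitions, kept separate from any PSL signalling. The package name parity_fsm_sketch, the function name next_state, and the flags wed_ready, stripes_ready, and more_data are hypothetical placeholders for conditions the real module would have to derive; only the state names come from the table.

```systemverilog
package parity_fsm_sketch;

  // State names taken from the table above.
  typedef enum {
    START,
    WAITING_FOR_REQUEST,
    REQUEST_STRIPES,
    WAITING_FOR_STRIPES,
    WRITE_PARITY,
    DONE
  } state;

  // Hypothetical helper: returns the state that follows `current`.
  // The three flags are illustrative inputs, not real PSL signals.
  function automatic state next_state(
    input state current,
    input logic wed_ready,      // WED data has arrived
    input logic stripes_ready,  // both stripe reads have completed
    input logic more_data       // more stripe data remains to be read
  );
    case (current)
      START: return WAITING_FOR_REQUEST;
      WAITING_FOR_REQUEST: begin
        if (wed_ready) return REQUEST_STRIPES;
        return WAITING_FOR_REQUEST;
      end
      REQUEST_STRIPES: return WAITING_FOR_STRIPES;
      WAITING_FOR_STRIPES: begin
        if (stripes_ready) return WRITE_PARITY;
        return WAITING_FOR_STRIPES;
      end
      WRITE_PARITY: begin
        if (more_data) return REQUEST_STRIPES;
        return DONE;
      end
      default: return DONE;  // DONE halts and stays put
    endcase
  endfunction

endpackage
```

Keeping the transitions in one place like this makes the only branch in the machine (after WRITE_PARITY) easy to spot; everything else is a straight line.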
Now I’ll write the first couple of portions of this module. I’ll create an enumeration that contains the various states used by the module. In the module definition itself I’ll define the input/output ports and create an internal register for the current_state. While I’m in here I’ll set up some signals with assign, mostly some settings I don’t want to change and a few parity generators as well. Lastly I’ll start off the always_ff block that’ll contain the reset logic and the case statement that implements my state machine.
import CAPI::*;

typedef enum {
  START,
  WAITING_FOR_REQUEST,
  REQUEST_STRIPES,
  WAITING_FOR_STRIPES,
  WRITE_PARITY,
  DONE
} state;

module parity_workelement (
  input  logic clock,
  input  logic enabled,
  input  logic reset,
  input  pointer_t wed,
  input  BufferInterfaceInput buffer_in,
  input  ResponseInterface response,
  output CommandInterfaceOutput command_out,
  output BufferInterfaceOutput buffer_out
);

  state current_state;

  // Fixed settings, followed by parity bits generated with XNOR reduction (~^)
  assign command_out.abt = 0,
         command_out.context_handle = 0,
         buffer_out.read_latency = 1,
         command_out.command_parity = ~^command_out.command,
         command_out.address_parity = ~^command_out.address,
         command_out.tag_parity = ~^command_out.tag,
         buffer_out.read_parity = ~^buffer_out.read_data;

  always_ff @ (posedge clock) begin
    if (reset) begin
      current_state <= START;
    end else if (enabled) begin
      // State machine; only START is stubbed in for now
      case (current_state)
        START: begin
          $display("Started!");
        end
      endcase
    end
  end

endmodule
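The parity generators in the assign statement use ~^, which is SystemVerilog’s XNOR reduction operator. It produces a 1 when the operand has an even number of 1 bits, so the data and its parity bit together always contain an odd number of 1s. Here is a small standalone sketch to see that in simulation; the module name parity_demo and the example tag values are made up for illustration.

```systemverilog
module parity_demo;

  logic [0:7] tag;
  logic       tag_parity;

  // Same construct as the parity assigns in parity_workelement
  assign tag_parity = ~^tag;

  initial begin
    // For each value, print the parity bit and the XOR of data plus parity.
    // The last column should be 1 every time (odd number of 1s overall).
    tag = 8'h00;
    #1 $display("tag=%h parity=%b odd_overall=%b", tag, tag_parity, ^{tag, tag_parity});
    tag = 8'h01;
    #1 $display("tag=%h parity=%b odd_overall=%b", tag, tag_parity, ^{tag, tag_parity});
    tag = 8'h03;
    #1 $display("tag=%h parity=%b odd_overall=%b", tag, tag_parity, ^{tag, tag_parity});
  end

endmodule
```

Because these are continuous assignments, the parity bits follow the command, address, and tag automatically, and the state machine never has to touch them.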
With that defined, I’ll modify my parity_afu module to include an instance of my parity_workelement:
parity_workelement workelement(
  .clock(clock),
  .enabled(job_out.running),
  .reset(jdone),
  .wed(job_in.address),
  .buffer_in(buffer_in),
  .response(response),
  .command_out(command_out),
  .buffer_out(buffer_out));
To reduce how much I’m looking at during simulation, I’ll also modify my test.do to just show what’s going on in my workelement.
vsim work.top
add wave -position insertpoint sim:/top/a0/svAFU/workelement/*
run 136
Since this is a significant amount of code I’ll commit here before implementing the state machine.
Requesting Data
Requesting the WED data will be easy enough, but I first want a handy container to put it in, so I’ll define a new type in SystemVerilog that matches my WED structure in C. I’ll skip the done field, as I don’t need to look at what’s currently in there; I can set that later by its offset relative to the WED.
typedef struct {
  longint unsigned size;
  pointer_t stripe1;
  pointer_t stripe2;
  pointer_t parity;
} parity_request;
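To make “by its offset relative to the WED” concrete, here is a rough sketch of the byte offsets I’m assuming for the C structure: five 8-byte fields packed back to back, in the order size, stripe1, stripe2, parity, done. That assumption matches the application output shown later in this post, where example is at 0x1d91500 and &(example->done) is at 0x1d91520, a difference of 32 bytes. The package name wed_layout_sketch, the constant names, and done_address are hypothetical, and I use a plain 64-bit vector instead of pointer_t so the snippet stands on its own.

```systemverilog
package wed_layout_sketch;

  // Assumed byte offsets of each field from the start of the WED
  localparam int SIZE_OFFSET    = 0;
  localparam int STRIPE1_OFFSET = 8;
  localparam int STRIPE2_OFFSET = 16;
  localparam int PARITY_OFFSET  = 24;
  localparam int DONE_OFFSET    = 32;  // not stored in parity_request

  // Hypothetical helper: address of the done flag for a given WED pointer
  function automatic logic [0:63] done_address(input logic [0:63] wed);
    return wed + DONE_OFFSET;
  endfunction

endpackage
```

With a WED of 0x1d91500, done_address would give 0x1d91520, which is the address the application prints for the done field.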
Next I’ll add an internal register to the parity_workelement module that can hold this structure.
parity_request request;
To use the PSL’s Command Interface to request this data, the PSL requires that each active command has a unique tag ID. I’ll define another enum so that I automatically get a unique tag for each purpose.
typedef enum logic [0:7] {
  REQUEST_READ,
  STRIPE1_READ,
  STRIPE2_READ,
  PARITY_WRITE,
  DONE_WRITE
} request_tag;
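The reason an enum gives unique tags for free is that SystemVerilog numbers unassigned enum members sequentially, starting at 0. As a quick illustrative check (the module name tag_demo is made up, and the typedef is repeated so the snippet stands alone), this walks the enum with its built-in methods and prints each name with its value.

```systemverilog
module tag_demo;

  // Same enum as above, repeated so this sketch is self-contained
  typedef enum logic [0:7] {
    REQUEST_READ,
    STRIPE1_READ,
    STRIPE2_READ,
    PARITY_WRITE,
    DONE_WRITE
  } request_tag;

  initial begin
    request_tag t;
    t = t.first();
    // Visit every member once, printing its name and numeric tag value
    for (int i = 0; i < t.num(); i++) begin
      $display("%s = %0d", t.name(), t);
      t = t.next();
    end
  end

endmodule
```

Adding a new purpose later is then just a matter of adding a name to the list, with no risk of reusing a number by hand.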
The simplest way to request data from userspace is the READ_CL_NA, or “read cacheline, no allocate”, command. I’ll request a read size of 32 bytes, as I’m reading in four 64-bit values (the size and three pointers). I’ll set the tag to REQUEST_READ and use the wed as my address. As with the other interfaces, I need to set a valid signal high for one clock. I’ll do this by setting it high in the START state, transitioning to the WAITING_FOR_REQUEST state, and setting it back low there.
case(current_state)
  START: begin
    command_out.command <= READ_CL_NA;
    command_out.tag <= REQUEST_READ;
    command_out.size <= 32;
    command_out.address <= wed;
    command_out.valid <= 1;
    current_state <= WAITING_FOR_REQUEST;
  end
  WAITING_FOR_REQUEST: begin
    command_out.valid <= 0;
  end
endcase
When the data I’ve requested comes back, it’ll come via two writes on the buffer_in.write_data bus. This bus is 512 bits wide, but supports 128-byte (1024-bit) requests. As such, there are two writes that deliver the lower (address 0) and higher (address 1) halves. Since I’ve only requested 32 bytes, the data will be in the first 256 bits of the write to address 0 for the REQUEST_READ tag.
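The arithmetic for finding a field on the bus can be sketched as a pair of small helpers. This assumes that byte n of each 64-byte half sits in bits [8n : 8n+7] of write_data[0:511], which is how I slice the bus in this post; the package name buffer_layout_sketch and the function names are hypothetical.

```systemverilog
package buffer_layout_sketch;

  localparam int BUS_BYTES = 64;  // 512-bit bus = 64 bytes per write

  // Which write_address (0 = lower half, 1 = upper half) carries
  // the byte at this offset within the 128-byte request
  function automatic int half_of(input int byte_offset);
    return byte_offset / BUS_BYTES;
  endfunction

  // Index of the first bit of that byte within write_data[0:511]
  function automatic int first_bit_of(input int byte_offset);
    return (byte_offset % BUS_BYTES) * 8;
  endfunction

endpackage
```

For example, a 64-bit field at byte offset 8 lands in half 0 starting at bit 64, so it occupies write_data[64:127], and all 32 requested bytes fit in bits 0 through 255 of the write to address 0.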
One important thing to look out for is that you can get multiple cycles of data on this bus, so you need to capture that data until the response interface lets you know the last cycle was valid.
With this in mind, I’ll capture data from the buffer interface whenever its valid signal is high, the tag is mine, and the address is the one I’m looking for. It’s also important to remember that the terms read and write for the buffer interface are named from the PSL’s perspective, so even though I’m making a read request, the data comes to the AFU on the buses named write_data and such.
if (buffer_in.write_valid &&
    buffer_in.write_tag == REQUEST_READ &&
    buffer_in.write_address == 0) begin
  request.size <= buffer_in.write_data[0:63];
  request.stripe1 <= buffer_in.write_data[64:127];
  request.stripe2 <= buffer_in.write_data[128:191];
  request.parity <= buffer_in.write_data[192:255];
end
When the data comes back, it’s not quite as I’d like it to be.
My application code spits out what these values should be:
[example structure
example: 0x1d91500
example->size: 128
example->stripe1: 0x1d91600
example->stripe2: 0x1d91780
example->parity: 0x1d91880
&(example->done): 0x1d91520
The issue here is that I’m reading in data that is in a little-endian byte format, but is being interpreted as big-endian. To deal with this issue I wrote a SystemVerilog function that can swap the endianness of the bytes in a generic way.
function logic [0:63] swap_endianness(logic [0:63] in);
  return {in[56:63], in[48:55], in[40:47], in[32:39], in[24:31], in[16:23],
          in[8:15], in[0:7]};
endfunction
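To see the effect on one of the values from the application output, here is a small standalone sketch. The stripe1 pointer 0x1d91600, stored little-endian as a 64-bit value, is the byte sequence 00 16 d9 01 00 00 00 00, so read first-byte-first off the bus it looks like 0x0016d90100000000. The module name swap_endianness_demo and the signals raw and fixed are made up for illustration, and the function is copied in so the snippet runs on its own.

```systemverilog
module swap_endianness_demo;

  // Copy of the function from this post, so the sketch stands alone
  function automatic logic [0:63] swap_endianness(input logic [0:63] in);
    return {in[56:63], in[48:55], in[40:47], in[32:39],
            in[24:31], in[16:23], in[8:15],  in[0:7]};
  endfunction

  logic [0:63] raw;
  logic [0:63] fixed;

  initial begin
    // Little-endian bytes of 0x1d91600 as they appear on a [0:63] slice
    raw   = 64'h0016d90100000000;
    fixed = swap_endianness(raw);
    $display("raw   = %h", raw);
    $display("fixed = %h", fixed);  // byte-reversed: 0000000001d91600
  end

endmodule
```

Swapping simply reverses the order of the eight bytes, which turns the scrambled value back into the pointer the application printed.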
I’ll modify my assignments to make use of this function.
request.size <= swap_endianness(buffer_in.write_data[0:63]);
request.stripe1 <= swap_endianness(buffer_in.write_data[64:127]);
request.stripe2 <= swap_endianness(buffer_in.write_data[128:191]);
request.parity <= swap_endianness(buffer_in.write_data[192:255]);
Now that this is in the right byte order, my internal request register is being filled with the appropriate values.
I’ll add a touch of logic to catch when these values are set to something valid then move to the next state.
if (response.valid && response.tag == REQUEST_READ) begin
  current_state <= REQUEST_STRIPES;
end
With our WED data all the way into our AFU I’ll commit my changes and call it a wrap for this post. In the next post I’ll write the remaining states and write some data back to userspace memory, completing this AFU!