This is the 6th and final part of my Hello AFU tutorial. In the last post, I started building out a state machine for the AFU and read from the data structure that the WED points to. In this post, I’ll finish off the state machine: pull down the data in our stripes, XOR them together, and write the result back to userland.
Reading the Stripes
Since the largest read I can request via the PSL is 128 bytes, I’ll make requests for that amount. I need a scratch pad for this data, so I’ll add two 1024-bit internal registers for these chunks. I’ll also need to know when I’ve received both chunks, so I’ll set up a one-bit register for that as well.
logic [0:1023] stripe1_data;
logic [0:1023] stripe2_data;
logic stripe_received;
In my REQUEST_STRIPES state I’ll request data from stripe1 in one cycle, then stripe2 in the next, using the command’s tag to know where I am in that process. I’ll also set stripe_received to 0 to indicate I’ve not yet received either.
REQUEST_STRIPES: begin
command_out.valid <= 1;
command_out.size <= 128;
command_out.command <= READ_CL_NA;
if (command_out.tag == REQUEST_READ) begin
command_out.tag <= STRIPE1_READ;
command_out.address <= request.stripe1;
end else begin
command_out.tag <= STRIPE2_READ;
command_out.address <= request.stripe2;
current_state <= WAITING_FOR_STRIPES;
stripe_received <= 0;
end
end
With the requests for stripe data sent, I need to wait for the data to come back. This could happen in any order, so I need to be ready for either.
WAITING_FOR_STRIPES: begin
command_out.valid <= 0;
if (buffer_in.write_valid) begin
case(buffer_in.write_tag)
STRIPE1_READ: begin
if (buffer_in.write_address == 0) begin
stripe1_data[0:511] <= buffer_in.write_data;
end else begin
stripe1_data[512:1023] <= buffer_in.write_data;
end
end
STRIPE2_READ: begin
if (buffer_in.write_address == 0) begin
stripe2_data[0:511] <= buffer_in.write_data;
end else begin
stripe2_data[512:1023] <= buffer_in.write_data;
end
end
endcase
end
end
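To make the half selection concrete, here is a minimal software model in C of what the buffer write handling does with each 64-byte beat. This is only a sketch: store_half, HALF_BYTES and LINE_BYTES are names I made up for illustration, and it assumes that bits 0:7 of the vector correspond to byte 0 of the line.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_BYTES 64   /* one 512-bit beat on the buffer interface */
#define LINE_BYTES 128  /* one full 1024-bit stripe chunk */

/* Hypothetical model: copy one 64-byte beat into the 128-byte scratch pad.
 * write_address 0 selects the first half (bits 0:511); any other value
 * selects the second half (bits 512:1023), mirroring the if/else above. */
static void store_half(uint8_t line[LINE_BYTES], unsigned write_address,
                       const uint8_t beat[HALF_BYTES])
{
    size_t base = (write_address == 0) ? 0 : HALF_BYTES;
    memcpy(line + base, beat, HALF_BYTES);
}
```

The two beats can arrive in either order; each one lands in its own half, so the scratch pad is complete once both have been stored.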
In the same state, I’ll look for the tags to come in over the response interface. On the first response I set the stripe_received register; on the second, the state progresses to WRITE_PARITY.
if (response.valid) begin
if (response.tag == STRIPE1_READ ||
response.tag == STRIPE2_READ) begin
if (stripe_received) begin
current_state <= WRITE_PARITY;
end else begin
stripe_received <= 1;
end
end
end
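The response handling is easy to get wrong, so here is a small C model of it that can be checked on the host. The enum values and the tracker struct are assumptions for illustration; only the tag and state names come from the AFU code.

```c
#include <stdbool.h>

/* Tag and state names follow the AFU; the numeric encodings are made up. */
enum tag { REQUEST_READ, STRIPE1_READ, STRIPE2_READ, PARITY_WRITE, DONE_WRITE };
enum state { WAITING_FOR_STRIPES, WRITE_PARITY };

struct tracker {
    enum state current_state;
    bool stripe_received;   /* set once the first stripe response arrives */
};

/* Model of one valid response: the first stripe response sets the flag,
 * the second one advances the state. Other tags are ignored. */
static void on_response(struct tracker *t, enum tag tag)
{
    if (tag != STRIPE1_READ && tag != STRIPE2_READ)
        return;
    if (t->stripe_received)
        t->current_state = WRITE_PARITY;
    else
        t->stripe_received = true;
}
```

Note that the flag only counts responses and does not record which stripe answered. That is enough here because each stripe tag is issued exactly once per pass.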
Where is this Parity?
I decided to compute the parity with a continuous assign. One new internal variable, parity_data, can be referenced for the XOR’d value of stripe1 and stripe2.
logic [0:1023] parity_data;
assign parity_data = stripe1_data ^ stripe2_data;
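For checking the AFU’s result on the host side, the same XOR can be written as a few lines of C. This is a sketch rather than code from test_afu, and xor_parity is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise XOR of two equally sized stripes into a parity buffer. */
static void xor_parity(const uint8_t *stripe1, const uint8_t *stripe2,
                       uint8_t *parity, size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] = stripe1[i] ^ stripe2[i];
}
```

XOR parity is its own inverse: XORing the parity with either stripe gives back the other one, which is what makes it useful for recovering a lost stripe.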
Since I set the buffer latency to 1, the data being put on the buffer for writing to memory needs to be delayed by a cycle, so I’ll run it through a shift register.
logic [0:511] write_buffer;
shift_register #(512) write_shift (
.clock(clock),
.in(write_buffer),
.out(buffer_out.read_data));
Now I need to write the parity data to the memory at request.parity. This is pretty similar to reading memory. I’ll send a WRITE_NA command and align my data with buffer_out.read_data, returning the low half for address 0 and the high half for address 1.
WRITE_PARITY: begin
if (command_out.tag != PARITY_WRITE) begin
command_out.command <= WRITE_NA;
command_out.address <= request.parity;
command_out.tag <= PARITY_WRITE;
command_out.valid <= 1;
end else begin
command_out.valid <= 0;
// Read half depending on address
if (buffer_in.read_address == 0) begin
write_buffer <= parity_data[0:511];
end else begin
write_buffer <= parity_data[512:1023];
end
// Handle response
if (response.valid &&
response.tag == PARITY_WRITE) begin
current_state <= DONE;
end
end
end
After the parity is written, the job is complete. The state progresses to DONE when the write comes back on the response interface.
Aligned Writing
Writing the done flag is a little trickier, since it is not on a 128- or 64-byte boundary. The PSL can handle writing to any address, but the data must be aligned within the 128-byte read bus. If the data you’re writing is 64 bytes or less, you can let the same data sit on the buffer interface for both addresses.
In this case, the done field is 32 bytes past the WED address, and I’m doing a 1-byte write. I’ll align my data starting at bit 256 and write 8 bits. I’ll write a 1 in the first byte to set the little-endian unsigned 64-bit number to a non-zero value.
DONE: begin
if (command_out.tag != DONE_WRITE) begin
command_out.tag <= DONE_WRITE;
command_out.size <= 1;
command_out.address <= wed + 32;
command_out.valid <= 1;
write_buffer[256:263] <= 1;
end else begin
command_out.valid <= 0;
end
end
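The bit position can be sanity-checked with a little C. The struct below is my assumption about the layout, based only on the done field being 32 bytes past the WED, with pointers stored as 64-bit values. The names parity_request and buffer_bit_index are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the structure the WED points to (not copied from the
 * host application): four 8-byte fields, then done at byte offset 32. */
struct parity_request {
    uint64_t size;
    uint64_t stripe1;   /* pointer stored as a 64-bit value */
    uint64_t stripe2;
    uint64_t parity;
    uint64_t done;
};

/* Bit index inside the 512-bit write buffer for the byte at `address`.
 * Both 64-byte halves reuse indices 0..511, so only the offset within
 * the 64-byte half matters. */
static unsigned buffer_bit_index(uint64_t address)
{
    return (unsigned)(address % 64) * 8;
}
```

With the WED on a 128-byte boundary, offsetof(struct parity_request, done) is 32 and buffer_bit_index(wed + 32) comes out to 256, matching the slice used in the DONE state.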
With that, the parity is written and the userspace application can see when it completes. Here’s the output from the test_afu application.
INFO:Connecting to host 'localhost' port 16384
[example structure
example: 0x7fa500
example->size: 128
example->stripe1: 0x7fa600
example->stripe2: 0x7fa780
example->parity: 0x7fa880
&(example->done): 0x7fa520
Attached to AFU
Waiting for completion by AFU
done: 0
done: 0
done: 1
PARITY:
That is some proper parity! This is exactly what I'm expecting to see. I'd also like to see this running on some real gear soon
Releasing AFU
That completes the basic function of this AFU, so I’ll commit my changes here.
Larger Buffers
Now I’ll extend the design to support buffers larger than 128 bytes. This just requires an offset register that keeps track of the current position relative to the total size of the buffer to generate parity for.
I’ll start by adding a new variable for the offset with the same data type as size.
longint unsigned offset;
Then I’ll set it to 0 in the START state.
offset <= 0;
In the REQUEST_STRIPES state I’ll add the offset to the stripe pointers.
command_out.address <= request.stripe1 + offset;
In the WRITE_PARITY state I’ll add the offset to the parity pointer, and check to see if the operation is complete.
command_out.address <= request.parity + offset;
if (offset + 128 < request.size) begin
offset <= offset + 128;
current_state <= REQUEST_STRIPES;
end else begin
current_state <= DONE;
end
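To see how many 128-byte passes a given size produces, here is a C model of the offset check above. It is only a sketch of the loop condition, and count_chunks and CHUNK_BYTES are hypothetical names.

```c
#include <stdint.h>

#define CHUNK_BYTES 128   /* bytes handled per pass through the state machine */

/* Count the passes the state machine makes for a buffer of `size` bytes,
 * using the same `offset + 128 < size` test as the WRITE_PARITY state. */
static uint64_t count_chunks(uint64_t size)
{
    uint64_t offset = 0;
    uint64_t chunks = 1;   /* the first chunk is always processed */
    while (offset + CHUNK_BYTES < size) {
        offset += CHUNK_BYTES;
        chunks++;
    }
    return chunks;
}
```

A size of 128 gives one pass and 256 gives two. A size that is not a multiple of 128, such as 129, still ends with a full 128-byte transfer, so this model suggests the buffers should be padded out to a multiple of 128 bytes.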
With that I’d say this AFU is good enough for this tutorial. I’ll commit my changes and welcome pull requests if you find improvements to this tutorial. Hope this helps you hack on CAPI!