This is part 3 of my Hello AFU tutorial. In the last part we built components to handle the AFU reset. In this part we’ll look at the requests coming in for the AFU descriptor and build a mechanism to send this data back to the PSL.
svAFU Ports
Before we get started here, I noticed something odd that’s good to be aware of. In the first post I commented out the structured inputs and outputs because Quartus was throwing errors that they were not defined. I assumed it didn’t like that they weren’t being used, but after re-cloning the repo it started to give me those errors again.
It looks like this error might be related to the order the components are built in. If you comment out all the structured inputs, the project synthesizes successfully. After that, you can uncomment them all and it will still synthesize just fine. If anyone has insight into why this is happening or a better fix, I’d appreciate any feedback you have to offer.
AFU Descriptor Read Requests
After the AFU handles its initial reset signal, the next batch of signals are requests over the MMIO interface for the AFU descriptor. The AFU descriptor provides some details about the AFU’s function and setup.
To add the MMIO interfaces to the wave viewer, I’ll add a do watch_mmio_interface.do line to my test.do script before the run 40 command.
The MMIO operations are synchronous so the PSL will send a single request and wait until it gets a response. We can see the first signal coming in here:
Like with the job interface, the ha_mmval signal will be raised when a valid command is active. The ha_mmcfg signal being high lets us know this is a request for data in the AFU descriptor. ha_mmrnw is high for read requests and low for write requests. ha_mmdw is low for 32-bit requests and high for 64-bit. ha_mmad is the address of the data being requested, and ha_mmadpar is the odd parity bit for that address. ha_data and ha_datapar are only used for write requests, so we don’t need to look at those quite yet.
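Since odd parity comes up on both the address and the data, here is a minimal sketch of how it can be computed and checked. The package and function names are hypothetical and not part of the project; the only thing taken from the design is that the parity is odd, which makes it the XNOR reduction of the bits.

```systemverilog
// Sketch only: mmio_parity_sketch, odd_parity24 and address_parity_ok are
// hypothetical names used for illustration.
package mmio_parity_sketch;

  // Odd parity: the parity bit is chosen so that the value plus the parity
  // bit contain an odd number of ones. The XNOR reduction (~^) is 1 when
  // the value itself has an even number of ones.
  function automatic logic odd_parity24(logic [0:23] value);
    return ~^value;
  endfunction

  // Returns 1 when a received parity bit matches the 24-bit address.
  function automatic logic address_parity_ok(logic [0:23] address,
                                             logic parity);
    return parity == odd_parity24(address);
  endfunction

endpackage
```

An address of all zeros has an even number of ones, so its odd parity bit is 1. An address with a single bit set gets a parity bit of 0.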
Parameterizing the Shift Register
Before we can send data back, we need to make a modification to our shift register. Similar to the jdone signal we previously shifted back a clock cycle, we need to do the same here for ah_mmack and ah_mmdata. Our shift register as-is will work fine for ah_mmack, as it’s also a single-bit signal. For ah_mmdata we need it to be 64 bits wide to support the whole bus being shifted back a clock cycle.
SystemVerilog provides a construct to parameterize a module, allowing us to modify how it operates on a per-instance basis. In this case I want to add a width parameter that lets us set the width of the bus.
The logic in the always_ff block does not need to change for this; we just need to define the parameter and use it in the input and output port declarations:
module shift_register #(parameter width = 1) (
  input logic clock,
  input logic [0:width-1] in,
  output logic [0:width-1] out);
In this change, we now have a default width of 1, so that we don’t need to change the shift registers already in use. For the module we’re about to build we can now create an instance like this for a 64 bit wide shifter:
shift_register #(64) data_shift(
  .clock(clock),
  .in(data),
  .out(mmio_out.data));
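For reference, a complete parameterized module might look like the sketch below. The always_ff body is an assumption here (a single register stage that delays its input by one clock), and both the module and the small testbench use hypothetical names so they don’t clash with the real shift_register.

```systemverilog
// Sketch only: the always_ff body is assumed to be one register stage.
module shift_register_sketch #(parameter width = 1) (
  input  logic             clock,
  input  logic [0:width-1] in,
  output logic [0:width-1] out);

  // Delay the input by one clock cycle, whatever the bus width is.
  always_ff @(posedge clock) begin
    out <= in;
  end
endmodule

// Hypothetical self-checking testbench for the 64-bit case.
module shift_register_sketch_tb;
  logic        clock = 0;
  logic [0:63] data_in;
  logic [0:63] data_out;

  shift_register_sketch #(64) dut(
    .clock(clock),
    .in(data_in),
    .out(data_out));

  // Free-running clock with a period of 10 time units.
  always #5 clock = ~clock;

  initial begin
    data_in = 64'hDEAD_BEEF_0000_0001;
    @(posedge clock);  // the input is captured on this edge
    #1;                // let the nonblocking update settle
    assert (data_out == 64'hDEAD_BEEF_0000_0001)
      else $error("output did not follow input after one clock");
    $finish;
  end
endmodule
```

Because width defaults to 1, an instance written without #(...) behaves exactly like the original single-bit shifter.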
Handling AFU Descriptor requests
I will define and add a new file mmio.sv to the project that will be responsible for all MMIO request handling. It will have some internal variables ack and data to hold the values that will be shifted back. Additionally it will have some logic to set the ah_mmdatapar bit. That parity bit doesn’t need its own shift register because it is computed directly from the already-shifted data output, which saves a couple of logic gates.
import CAPI::*;

module mmio (
  input logic clock,
  input MMIOInterfaceInput mmio_in,
  output MMIOInterfaceOutput mmio_out);

  logic ack;
  logic [0:63] data;

  shift_register ack_shift(
    .clock(clock),
    .in(ack),
    .out(mmio_out.ack));

  shift_register #(64) data_shift(
    .clock(clock),
    .in(data),
    .out(mmio_out.data));

  // Set parity bit for MMIO output
  assign mmio_out.data_parity = ~^mmio_out.data;

  always_ff @(posedge clock) begin
    if(mmio_in.valid) begin
      if(mmio_in.cfg) begin
        if(mmio_in.read) begin
          ack <= 1;
          data <= 1;
        end
      end
    end else begin
      ack <= 0;
      data <= 0;
    end
  end
endmodule
For now, I’m not as worried about sending proper data as I am getting all the pieces laid out and working. I’ll add an instance of this new mmio module in my parity_afu module.
mmio mmio_handler(
  .clock(clock),
  .mmio_in(mmio_in),
  .mmio_out(mmio_out));
Looking at the waves now, we can see 7 MMIO requests coming in, and for each we’re sending back a simple 1 on the data bus.
Since we didn’t send a proper descriptor, PSLSE complains ERROR:AFU descriptor num_of_processes=0!
Either way it’s starting to come together so I’ll commit my changes and move on.
Defining a New Type
It took me a while to find a way to handle these AFU requests that I felt was functional and cleanly coded. Most AFU descriptor implementations I’ve seen so far use some Verilog implementation of a ROM, and this is how I first implemented it.
I found this method to be a bit cumbersome, so I decided to extend my capi.sv to include a new structure definition for an AFU descriptor. This format is modeled after what’s described in the CAPI User’s Manual.
typedef struct packed {
  bit [0:15] num_ints_per_process;
  bit [0:15] num_of_processes;
  bit [0:15] num_of_afu_crs;
  bit [0:15] req_prog_model;
  bit [0:199] reserved_1;
  bit [0:55] afu_cr_len;
  bit [0:63] afu_cr_offset;
  bit [0:5] reserved_2;
  bit psa_per_process_required;
  bit psa_required;
  bit [0:55] psa_length;
  bit [0:63] psa_offset;
  bit [0:7] reserved_3;
  bit [0:55] afu_eb_len;
  bit [0:63] afu_eb_offset;
} AFUDescriptor;
To support reading the right portions of the AFU descriptor, a SystemVerilog function felt like the best route. This initial implementation is built just to support the regions of the AFU descriptor that I’ve seen requests come in to so far.
function bit [0:63] read_afu_descriptor(AFUDescriptor descriptor,
                                        bit [0:23] address);
  case(address)
    'h0: begin
      return {descriptor.num_ints_per_process,
              descriptor.num_of_processes,
              descriptor.num_of_afu_crs,
              descriptor.req_prog_model};
    end
    default: begin
      return 0;
    end
  endcase
endfunction
With this new type and the function to help read it added to my CAPI package, I can create an instance of this type in my mmio module and set the values appropriately.
AFUDescriptor afu_desc;
assign afu_desc.num_ints_per_process = 0,
       afu_desc.num_of_processes = 1,
       afu_desc.num_of_afu_crs = 0,
       afu_desc.req_prog_model = 16'h8010,
       afu_desc.reserved_1 = 0,
       afu_desc.afu_cr_len = 0,
       afu_desc.afu_cr_offset = 0,
       afu_desc.reserved_2 = 0,
       afu_desc.psa_per_process_required = 0,
       afu_desc.psa_required = 0,
       afu_desc.psa_length = 0,
       afu_desc.psa_offset = 0,
       afu_desc.reserved_3 = 0,
       afu_desc.afu_eb_len = 0,
       afu_desc.afu_eb_offset = 0;
The last step is to replace our hard-coded response with the newly defined function.
data <= read_afu_descriptor(afu_desc, mmio_in.address);
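As a sanity check on what should appear on the data bus for a read of address 0, the concatenation from the 'h0 case can be worked through on its own. This is only a sketch: descriptor_word0_sketch and descriptor_word0 are hypothetical names, and the helper takes the four fields directly rather than the full AFUDescriptor so that it stands alone.

```systemverilog
// Sketch only: a standalone stand-in for the 'h0 case of
// read_afu_descriptor. The package and function names are hypothetical.
package descriptor_word0_sketch;

  // The four 16-bit fields are concatenated most significant first, so
  // num_ints_per_process lands in bits 0:15 and req_prog_model in 48:63.
  function automatic bit [0:63] descriptor_word0(
      bit [0:15] num_ints_per_process,
      bit [0:15] num_of_processes,
      bit [0:15] num_of_afu_crs,
      bit [0:15] req_prog_model);
    return {num_ints_per_process,
            num_of_processes,
            num_of_afu_crs,
            req_prog_model};
  endfunction

endpackage
```

With the values assigned to afu_desc above (0, 1, 0 and 16'h8010), the first word works out to 64'h0000_0001_0000_8010, which is the value to look for on ah_mmdata in the waves.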
With that completed, I’ll verify I’m getting the expected behavior during simulation.
Now that we’ve returned enough AFU descriptor data, the PSLSE output shows us we’re ready to connect a client!
INFO:PSLSE version 1.002 compiled @ Feb 5 2016 11:47:34
INFO:PSLSE parm values:
Seed = 13
Timeout = 10 seconds
Response = 16%
Paged = 3%
Reorder = 86%
Buffer = 82%
INFO:Attempting to connect AFU: afu0.0 @ localhost:32768
PSL_SOCKET: Using PSL protocol level : 0.9908.0
INFO:Clocking afu0.0
WARNING:ah_brlat must be either 1 or 3!
WARNING:ah_brlat must be either 1 or 3!
INFO:Started PSLSE server, listening on kbawx:16384
There are also a couple of warnings about the buffer read latency, but I’ll wait to address that when we look at using the buffer interface. With this bit implemented, I’ll commit my changes and in the next post we’ll look at communicating with our AFU from userspace.