Building custom Stream PE with Vivado HLS
[[OpenFC - an Open FPGA Cluster Toolkit]]
As described in [[OpenFC Architecture]], custom Stream PE...
* Input, Output and Timing considerations [#a8fd9723]
- Stream ports
-- For all FPGA cards, the base design has 2 router ports...
-- Width of each port is 64bit, and this width can't be c...
- Data frame headers
-- SPE always receives "Length" field of the data frame h...
-- SPE doesn't see routing header in the incoming data fr...
-- SPE must transmit "Length" field before transmitting a...
-- SPE can add (or elongate) routing header words by send...
- Currently, Router and SPE are driven at 250MHz. In futu...
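The header rules above can be modelled in plain C++ to make the word order concrete. This is only a minimal sketch, not OpenFC code: make_frame, is_well_formed and payload_of are hypothetical helpers, and the sketch assumes that the "Length" word counts the 64bit payload words that follow it, as the examples on this page suggest. Routing header words are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenFC): builds the word sequence an SPE
// would read from one input port. Assumption: the "Length" word counts the
// 64bit payload words that follow it.
std::vector<uint64_t> make_frame(const std::vector<uint64_t>& payload) {
    std::vector<uint64_t> frame;
    frame.reserve(payload.size() + 1);
    frame.push_back(static_cast<uint64_t>(payload.size()));    // "Length" word first
    frame.insert(frame.end(), payload.begin(), payload.end()); // then the payload
    return frame;
}

// Hypothetical helper: true if the sequence is exactly one complete frame.
bool is_well_formed(const std::vector<uint64_t>& frame) {
    return !frame.empty() && frame[0] == frame.size() - 1;
}

// Hypothetical helper: strips the "Length" word and returns the payload.
std::vector<uint64_t> payload_of(const std::vector<uint64_t>& frame) {
    assert(is_well_formed(frame)); // caller must pass exactly one frame
    return std::vector<uint64_t>(frame.begin() + 1, frame.end());
}
```

An SPE that wants to emit a three-word result would, under the same assumption, write the four words returned by make_frame for that payload, in that order.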
* Writing, Synthesizing and Wrapping a simple SPE [#qd96d...
Sample code for SPEs can be found in src/examples. Here's the...
#geshi(c++,number){{
#include <stdint.h>
#include "hls_stream.h"
void vec_accum(hls::stream<uint64_t>& in, hls::stream<uin...
#pragma HLS INTERFACE axis register both port=in
#pragma HLS INTERFACE axis register both port=out
uint64_t len, sum=0;
len = in.read();
for(uint64_t i=0; i<len; i++) sum += in.read();
out.write(1); // output length is always 1
out.write(sum);
}
}}
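Before synthesis, it can help to check the data flow of vec_accum() in ordinary software. The following is a minimal sketch under stated assumptions: std::queue stands in for hls::stream so that it builds without the Vivado HLS headers, and word_queue, pop_word and vec_accum_model are names made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Software stand-in for hls::stream<uint64_t>. This is an assumption made
// for illustration only; the real SPE uses the HLS stream class.
typedef std::queue<uint64_t> word_queue;

// Blocking read in hardware; here the word must already be queued.
static uint64_t pop_word(word_queue& q) {
    assert(!q.empty());
    uint64_t w = q.front();
    q.pop();
    return w;
}

// Same data flow as vec_accum(): read the length, sum that many payload
// words, then emit a frame whose length is 1 and whose payload is the sum.
void vec_accum_model(word_queue& in, word_queue& out) {
    uint64_t len = pop_word(in);
    uint64_t sum = 0;
    for (uint64_t i = 0; i < len; i++) sum += pop_word(in);
    out.push(1);   // output length is always 1
    out.push(sum);
}
```

Feeding the words 3, 10, 20, 30 into the model leaves the words 1 and 60 in the output queue, which is the frame the SPE is expected to send back.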
To synthesize this code as an SPE on a KC705 card:
+ Launch Vivado HLS and create a new project.
+ Add the file as source code and choose "vec_accum()" for top-l...
+ Select XC7K325T-FFG900-2 as the target device (if your ...
+ Run C synthesis and Export RTL, then you'll get the IP ...
The resulting, instantiated module interface should be li...
#geshi(verilog,number){{
module vec_accum_0
(
.ap_clk (),
.ap_rst_n(),
.ap_start(),
.ap_done (),
.ap_idle (),
.ap_ready(),
.in_V_TDATA (),
.in_V_TVALID(),
.in_V_TREADY(),
.out_V_TDATA (),
.out_V_TVALID(),
.out_V_TREADY()
);
}}
#ref("remove_pe.png",right,around);
This fits into src/pe-base/wrappers/axis-1r1w.v. To...
+ Set up the base project following [[Quickstart with Xilinx ...
+ In the module hierarchy, remove or disable "pe : pe (pe...
+ Press Alt+a (or File -> Add Sources) then add src/pe-ba...
+ Find your HLS-generated SPE core in the IP catalog and inst...
-- This procedure is described in [[Xilinx Vivado HLS tut...
+ Modify "CHANGE_ME" in axis-1r1w.v to your SPE instance ...
+ Now you can synthesize and generate bitstream. Write th...
#clear
* Handling non-integer data types [#ied87c99]
The stream interface is 64 bits wide, and the example abo...
** 64bit data types: adding double-precision vectors [#r1...
The first example uses double, the 64bit floating-point type. I...
- The payload arrives as double, but the length header is...
- Vivado HLS allows only basic data types for top-level mo...
So, the input/output streams are declared as double, and ...
vecadd-double.cc:
#geshi(c++,number){{
#include <stdint.h>
#include "hls_stream.h"
typedef hls::stream<double> my_str;
typedef union {
uint64_t u;
double d;
} ud_t;
void vecadd_double(my_str& in1, my_str& in2, my_str& out1){
#pragma HLS INTERFACE axis register both port=in1
#pragma HLS INTERFACE axis register both port=in2
#pragma HLS INTERFACE axis register both port=out1
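ud_t len; // union: view the same 64 bits as either uint64_t or double
len.d = in1.read(); // length header arrives as raw bits in a double
in2.read(); // discard in2's length header; in1 and in2 must have exactly the same length
out1.write(len.d); // output length = input length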
for(uint64_t i=0; i<len.u; i++){
#pragma HLS PIPELINE
double a, b, x;
a = in1.read(); b = in2.read();
x = a+b;
out1.write(x);
}
}
}}
Note: host code example will come soon.
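The reinterpretation that the union performs can also be written in plain C++, which may be useful when preparing or checking test data in software. This is a sketch rather than OpenFC host code; bits_to_double, double_to_bits and check_length_round_trip are hypothetical names. It uses std::memcpy because reading a union member other than the one last written is undefined behavior in standard C++, even though HLS code commonly relies on it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

// Reinterpret the 64 bits of an integer as a double without changing them,
// e.g. to pass a "Length" word through a stream declared as double.
double bits_to_double(uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Inverse: recover the integer bit pattern carried by a double.
uint64_t double_to_bits(double d) {
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Usage check: a length survives the trip through a double unchanged,
// whereas a numeric conversion of the same double would not return it.
void check_length_round_trip(uint64_t len) {
    double carried = bits_to_double(len);
    assert(double_to_bits(carried) == len);
}
```

Note that a numeric cast such as static_cast<uint64_t>(d) converts the value and does not recover the length; only the bit-level copy does.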
** Non-64bit data types: adding single-precision vectors ...
The next example is an addition that uses 32bit single precision...
For example, the type "ap_uint<64>" means unsigned 64bit inte...
vecadd-float.cc:
#geshi(c++,number){{
#include <stdint.h>
#include "hls_stream.h"
#include "ap_int.h"
typedef hls::stream<ap_uint<64> > my_str;
typedef union {
uint32_t u;
float f;
} uf_t;
void vecadd_float(my_str& in1, my_str& in2, my_str& out1){
#pragma HLS INTERFACE axis register both port=in1
#pragma HLS INTERFACE axis register both port=in2
#pragma HLS INTERFACE axis register both port=out1
uint64_t len;
len = in1.read(); // length header: number of 64bit words to read (two floats each)
in2.read(); // discard in2's length header; in1 and in2 must have exactly the same length
out1.write(len); // output length = input length
for(uint64_t i=0; i<len; i++){
#pragma HLS PIPELINE
ap_uint<64> au, bu, xu;
uf_t a[2], b[2], x[2];
au = in1.read();
bu = in2.read();
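// lower 32 bits hold element 0, upper 32 bits hold element 1
a[0].u = au.range(31,0); b[0].u = bu.range(31,0);
a[1].u = au.range(63,32); b[1].u = bu.range(63,32);
// add the two float lanes independently
x[0].f = a[0].f + b[0].f;
x[1].f = a[1].f + b[1].f;
// repack both results into one 64bit output word
xu.range(31,0) = x[0].u; xu.range(63,32) = x[1].u;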
out1.write(xu);
}
}
}}
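The lane layout used by vecadd_float() (element 0 in bits 31..0, element 1 in bits 63..32) can be reproduced in plain C++ to prepare input words or to check results in software. This is a minimal sketch with hypothetical helper names (pack_floats, unpack_floats, add_packed, check_lanes); it is not OpenFC host code. On a little-endian host this layout matches two consecutive floats in memory, but whether the host side actually sends data that way is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

// Pack two floats into one 64bit stream word: lo -> bits 31..0, hi -> bits 63..32.
uint64_t pack_floats(float lo, float hi) {
    uint32_t lo_bits, hi_bits;
    std::memcpy(&lo_bits, &lo, sizeof lo_bits);
    std::memcpy(&hi_bits, &hi, sizeof hi_bits);
    return (static_cast<uint64_t>(hi_bits) << 32) | lo_bits;
}

// Split one 64bit stream word back into its two float lanes.
void unpack_floats(uint64_t word, float& lo, float& hi) {
    uint32_t lo_bits = static_cast<uint32_t>(word & 0xFFFFFFFFu);
    uint32_t hi_bits = static_cast<uint32_t>(word >> 32);
    std::memcpy(&lo, &lo_bits, sizeof lo);
    std::memcpy(&hi, &hi_bits, sizeof hi);
}

// Software model of one loop iteration of vecadd_float().
uint64_t add_packed(uint64_t a, uint64_t b) {
    float a0, a1, b0, b1;
    unpack_floats(a, a0, a1);
    unpack_floats(b, b0, b1);
    return pack_floats(a0 + b0, a1 + b1);
}

// Usage check: the two lanes are added independently.
void check_lanes() {
    float lo, hi;
    unpack_floats(add_packed(pack_floats(1.0f, 2.0f), pack_floats(0.5f, 4.0f)), lo, hi);
    assert(lo == 1.5f);
    assert(hi == 6.0f);
}
```

Because each stream word carries two floats, a vector of N floats occupies N/2 payload words, and the loop in vecadd_float() runs once per word.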