OpenFC - an Open FPGA Cluster Toolkit

As described in OpenFC Architecture, custom Stream PEs can be written in C++ with Vivado HLS. This page describes how to design and implement a custom Stream PE (SPE).

Input, Output and Timing considerations

Writing, Synthesizing and Wrapping a simple SPE

Sample code for SPEs can be found in src/examples. Here is the code of "vec-accum.cc":

  #include <stdint.h>
  #include "hls_stream.h"

  void vec_accum(hls::stream<uint64_t>& in, hls::stream<uint64_t>& out){
  #pragma HLS INTERFACE axis register both port=in
  #pragma HLS INTERFACE axis register both port=out

    uint64_t len, sum=0;
    len = in.read();  // first word: number of data words that follow

    for(uint64_t i=0; i<len; i++) sum += in.read();

    out.write(1);  // output length is always 1
    out.write(sum);
  }
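
To make the stream format explicit, here is a minimal software model of the same behaviour in plain C++ (no HLS headers), which may be handy for checking expected results on the host. It is only a sketch: vec_accum_model and make_stream are hypothetical names that are not part of OpenFC, and a std::vector stands in for the hardware stream.

```cpp
#include <cstdint>
#include <vector>

// Software model of vec_accum. `in` holds the whole input stream:
// one length word followed by that many data words.
std::vector<uint64_t> vec_accum_model(const std::vector<uint64_t>& in) {
    std::vector<uint64_t> out;
    if (in.empty()) return out;           // no header word, nothing to do

    const uint64_t len = in[0];           // first word: number of data words
    if (in.size() - 1 < len) return out;  // truncated input; the real SPE would wait for more data

    uint64_t sum = 0;                     // unsigned arithmetic wraps modulo 2^64
    for (uint64_t i = 0; i < len; i++) sum += in[1 + i];

    out.push_back(1);                     // output length is always 1
    out.push_back(sum);
    return out;
}

// Hypothetical helper: prepend the length header to a payload.
std::vector<uint64_t> make_stream(const std::vector<uint64_t>& payload) {
    std::vector<uint64_t> s;
    s.push_back(payload.size());
    s.insert(s.end(), payload.begin(), payload.end());
    return s;
}
```

For the payload {1, 2, 3, 4}, the input stream is the five words 4, 1, 2, 3, 4 and the model returns the two words 1, 10.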

To synthesize this code as an SPE for the KC705 card:

  1. Launch Vivado HLS and create a new project.
  2. Add the file as source code and choose "vec_accum()" as the top-level function.
  3. Select XC7K325T-FFG900-2 as the target device (if your target card has a different FPGA, choose the right one). Set the target clock period to 4.0 ns, because the SPE runs at 250 MHz. I also recommend setting a larger clock uncertainty, such as 0.8 ns.
  4. Run C synthesis and Export RTL to get the IP core.

Once instantiated, the resulting module interface should look like this:

  module vec_accum_0
    (
     .ap_clk  (),
     .ap_rst_n(),
     .ap_start(),
     .ap_done (),
     .ap_idle (),
     .ap_ready(),

     .in_V_TDATA (),
     .in_V_TVALID(),
     .in_V_TREADY(),
     .out_V_TDATA (),
     .out_V_TVALID(),
     .out_V_TREADY()
     );
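
The in_V_* and out_V_* ports are AXI4-Stream ports, generated by the axis interface pragmas. As a reminder of how that handshake behaves, the following plain C++ sketch picks out the words that are actually transferred from a cycle-by-cycle trace. AxisCycle and transferred_words are hypothetical names used only for illustration; this is not a cycle-accurate model of the generated core.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One clock cycle as seen on an AXI4-Stream port.
struct AxisCycle {
    bool     tvalid;  // driven by the sender
    bool     tready;  // driven by the receiver
    uint64_t tdata;   // only meaningful while tvalid is high
};

// A word is transferred only in cycles where TVALID and TREADY are both high.
std::vector<uint64_t> transferred_words(const std::vector<AxisCycle>& trace) {
    std::vector<uint64_t> words;
    for (std::size_t i = 0; i < trace.size(); i++) {
        if (trace[i].tvalid && trace[i].tready) words.push_back(trace[i].tdata);
    }
    return words;
}
```

In other words, the SPE stalls on in.read() while in_V_TVALID is low, and on out.write() while out_V_TREADY is low.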

This interface fits the wrapper src/pe-base/wrappers/axis-1r1w.v. To make it work:

  1. Set up the base project by following Quickstart with Xilinx KC705 (use a different .tcl file if your target isn't the KC705).
  2. In the module hierarchy, remove or disable "pe : pe (pe-pass.v)".
  3. Press Alt+a (or File -> Add Sources), then add src/pe-base/wrappers/axis-1r1w.v to your project.
  4. Find your HLS-generated SPE core in the IP catalog and instantiate it.
  5. Change "CHANGE_ME" in axis-1r1w.v to your SPE instance name.
  6. Now you can synthesize and generate the bitstream. Write the generated bitstream to your card and reboot the FPGA host; you will then be able to access the SPE :)

Handling non-integer data types

The stream interface is 64 bits wide, and the example above handles unsigned 64-bit integers. However, data is not always integer, and there is often a need for types that are not 64 bits wide, such as float, char, or even structs. This section gives a brief programming guide for types other than uint64_t.

64bit data types: adding double-precision vectors

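The first example is double, the 64-bit floating-point type. In this case, there are two major problems.

To deal with them, the input/output streams are declared as double, and the length header is read as a double and then reinterpreted as an integer through a union named "ud_t" in the example. The union reuses the same 64 bits as they are, whereas an ordinary cast from double to uint64_t would convert the numeric value. This way of type conversion is useful for any 64-bit data type.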

vecadd-double.cc:

  #include <stdint.h>
  #include "hls_stream.h"

  typedef hls::stream<double> my_str;
  typedef union {
    uint64_t u;
    double d;
  } ud_t;

  void vecadd_double(my_str& in1, my_str& in2, my_str& out1){
  #pragma HLS INTERFACE axis register both port=in1
  #pragma HLS INTERFACE axis register both port=in2
  #pragma HLS INTERFACE axis register both port=out1

    ud_t len;
    len.d = in1.read();  // header arrives as double; len.u gives the same bits as an integer
    in2.read(); // in1 and in2 must have exactly same length
    out1.write(len.d);  // output length = input length;

    for(uint64_t i=0; i<len.u; i++){
  #pragma HLS PIPELINE
      double a, b, x;
      a = in1.read(); b = in2.read();
      x = a+b;
      out1.write(x);
    }
  }

Note: host code example will come soon.
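
Until then, here is a rough sketch of how the host side could lay out one input stream for vecadd_double as raw 64-bit words. It assumes only what the SPE code above shows: the first word is the length as a plain integer, followed by the bit patterns of the doubles. The function names are hypothetical, and the OpenFC host API call that actually sends the buffer is deliberately left out. std::memcpy is used in place of a union because it is the portable way to reinterpret bits in host C++.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

// Reinterpret the 64 bits of a double as an integer (no numeric conversion).
uint64_t double_to_bits(double d) {
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Reinterpret a 64-bit word as a double (no numeric conversion).
double bits_to_double(uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Hypothetical helper: length header followed by the raw bits of each element.
std::vector<uint64_t> make_double_stream(const std::vector<double>& v) {
    std::vector<uint64_t> words;
    words.push_back(v.size());  // header: plain integer, not a double value
    for (std::size_t i = 0; i < v.size(); i++) words.push_back(double_to_bits(v[i]));
    return words;
}
```

Both input streams would be built this way, and the words of the result stream after its header can be decoded with bits_to_double.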

Non-64bit data types: adding single-precision vectors

The next example adds vectors of 32-bit single-precision (float) values. In this case, each word of the input/output streams carries two values, so two independent scalar additions are executed in every loop iteration. To split one 64-bit word into two 32-bit words, the arbitrary-precision types of the Vivado HLS C++ library are quite convenient.

For example, the type "ap_uint<64>" is an unsigned 64-bit integer (the bit width is not limited to multiples of 8: any width is possible). A useful feature of the ap_uint class is the range() method, which gives access to a specific bit range of a variable. The integer <-> floating-point conversion is the same as in the previous double-precision example, but combined with the bit slicing it makes this code considerably trickier than the previous one.

vecadd-float.cc:

  #include <stdint.h>
  #include "hls_stream.h"
  #include "ap_int.h"

  typedef hls::stream<ap_uint<64> > my_str;
  typedef union {
    uint32_t u;
    float f;
  } uf_t;

  void vecadd_float(my_str& in1, my_str& in2, my_str& out1){
  #pragma HLS INTERFACE axis register both port=in1
  #pragma HLS INTERFACE axis register both port=in2
  #pragma HLS INTERFACE axis register both port=out1

    uint64_t len;
    len = in1.read();  // len counts 64-bit words, i.e. pairs of floats
    in2.read(); // in1 and in2 must have exactly same length
    out1.write(len);  // output length = input length;

    for(uint64_t i=0; i<len; i++){
  #pragma HLS PIPELINE
      ap_uint<64> au, bu, xu;
      uf_t a[2], b[2], x[2];

      au = in1.read();
      bu = in2.read();

      // split each 64-bit word into its lower and upper 32-bit halves
      a[0].u = au.range(31,0);  b[0].u = bu.range(31,0);
      a[1].u = au.range(63,32); b[1].u = bu.range(63,32);

      x[0].f = a[0].f + b[0].f;
      x[1].f = a[1].f + b[1].f;

      // pack the two results back into one 64-bit word
      xu.range(31,0) = x[0].u;  xu.range(63,32) = x[1].u;

      out1.write(xu);
    }
  }
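
The same packing can be expressed in plain C++ with shifts and masks, which may help when preparing or checking data on the host. This is only a sketch with hypothetical function names. It assumes that the even-indexed element goes into bits 31..0 (the layout a float array has in memory on a little-endian host) and that an odd-length vector is padded with 0.0f, because the length header counts 64-bit words rather than floats.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

uint32_t float_to_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
float bits_to_float(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// lo goes to bits 31..0 and hi to bits 63..32, matching range(31,0) and range(63,32).
uint64_t pack2(float lo, float hi) {
    return (static_cast<uint64_t>(float_to_bits(hi)) << 32) | float_to_bits(lo);
}

float unpack_lo(uint64_t w) { return bits_to_float(static_cast<uint32_t>(w & 0xFFFFFFFFu)); }
float unpack_hi(uint64_t w) { return bits_to_float(static_cast<uint32_t>(w >> 32)); }

// Hypothetical helper: header (number of 64-bit words) followed by packed pairs.
std::vector<uint64_t> make_float_stream(const std::vector<float>& v) {
    const std::size_t nwords = (v.size() + 1) / 2;  // round up for odd lengths
    std::vector<uint64_t> words;
    words.push_back(nwords);
    for (std::size_t i = 0; i < nwords; i++) {
        const float lo = v[2 * i];
        const float hi = (2 * i + 1 < v.size()) ? v[2 * i + 1] : 0.0f;  // pad the last word
        words.push_back(pack2(lo, hi));
    }
    return words;
}
```

With this layout, three floats occupy two data words, so the header is 2 and the upper half of the last word holds the padding value.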

Last-modified: 2020-06-22 (Mon) 04:54:21