As described in OpenFC Architecture, custom Stream PEs can be written in C++ with Vivado HLS. This page describes how to design and implement a custom Stream PE (SPE).
On all FPGA cards, the base design provides 2 router ports to the SPE, so an SPE can have up to 2 input and 2 output stream ports.
Each port is 64 bits wide, and this width cannot be changed.
Data frame headers
The SPE always receives the "Length" field of the data frame header before the payload.
The SPE does not see the routing header of the incoming data frame.
The SPE must transmit the "Length" field before transmitting any output frame payload. If routing header words remain, the router prepends them to the Length field.
The SPE can add routing header words (or extend the routing header) by sending them before the Length field.
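The header rules above can be sketched in plain C++ as a software model. This is only an illustration, not code from the OpenFC sources: the Stream64 type, read_word() and spe_increment_frame() are hypothetical names, a std::deque stands in for a 64-bit stream port, and the sketch assumes that Length counts 64-bit payload words (the unit is not stated on this page).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Software stand-in for one 64-bit stream port (hypothetical; a real SPE
// would use the stream type provided by the HLS tool).
using Stream64 = std::deque<std::uint64_t>;

// Pops the next 64-bit word from a stream.
std::uint64_t read_word(Stream64 &s) {
    assert(!s.empty());
    const std::uint64_t w = s.front();
    s.pop_front();
    return w;
}

// Processes one frame and adds 1 to every payload word.
// ASSUMPTION: Length counts 64-bit payload words.
// extra_route holds optional routing header words to send before Length.
void spe_increment_frame(Stream64 &in, Stream64 &out,
                         const std::vector<std::uint64_t> &extra_route = {}) {
    // Routing header words, if any, go out before the Length field.
    for (std::uint64_t w : extra_route) out.push_back(w);

    // The Length field is the first word the SPE sees; no routing header.
    const std::uint64_t length = read_word(in);

    // Length must be transmitted before any output payload.
    out.push_back(length);

    for (std::uint64_t i = 0; i < length; ++i) {
        out.push_back(read_word(in) + 1);
    }
}
```

Calling spe_increment_frame() with an input of {3, 10, 20, 30} and one extra routing word produces that routing word, then the length 3, then the three incremented payload words.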
Currently, the Router and the SPE are clocked at 250 MHz. In future versions, the frequency may change on faster FPGA cards.
Sample code for SPEs can be found in src/examples. Here is the code of "vec-accum.cc".
To synthesize this code as an SPE on a KC705 card:
Launch Vivado HLS and create a new project.
Add the file as source code and choose "vec_accum()" as the top-level function.
Select XC7K325T-FFG900-2 as the target device (if your target card has a different FPGA, choose the matching part), and set the target period to 4.0 ns because the SPE runs at 250 MHz. Setting a larger clock uncertainty, such as 0.8 ns, is also recommended.
Run C synthesis and Export RTL to obtain the IP core.
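The timing numbers in the steps above follow from simple arithmetic, sketched here in plain C++. The function names are hypothetical. The sketch relies on Vivado HLS scheduling against the clock period minus the clock uncertainty, so a 0.8 ns uncertainty on a 4.0 ns period leaves about 3.2 ns for logic.

```cpp
#include <cassert>

// Clock period in nanoseconds for a clock frequency given in MHz.
double period_ns(double freq_mhz) {
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

// Time budget left for logic after reserving the uncertainty margin.
double effective_period_ns(double period, double uncertainty) {
    assert(uncertainty >= 0.0 && uncertainty < period);
    return period - uncertainty;
}
```

For the 250 MHz clock, period_ns(250.0) gives the 4.0 ns target period used above.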
The interface of the resulting instantiated module should look like this:
It fits into src/pe-base/wrappers/axis-1r1w.v. To make it work:
Change "CHANGE_ME" in axis-1r1w.v to your SPE instance name.
Now you can synthesize the design and generate the bitstream. Write the generated bitstream to your card and reboot the FPGA host; you will then be able to access the SPE :)
The stream interface is 64 bits wide, and the example above handles unsigned 64-bit integers. However, data is not always integer, and there is often a need for non-64-bit data types such as float, char, or even structs. This section is a brief programming guide for types other than uint64_t.
64-bit data types: adding double-precision vectors
The first example is double, the 64-bit floating-point type. In this case, there are 2 major problems.
The payload consists of doubles, but the length header is a 64-bit integer, so both data types have to be handled in a single data stream.
Vivado HLS allows only basic data types in the top-level module interface (struct and union cannot be used).
So, the input/output streams are declared as double, and the length header is read through a union named "ud_t" in the example. This kind of type conversion is useful for 64-bit data types.
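The union-based conversion can be sketched as a plain C++ software model. This is not the example from the OpenFC sources: only the union name ud_t comes from this page, while its member names, the StreamD type and the function names are hypothetical. The sketch also assumes that each of two input ports carries one vector, that both frames have the same Length, and that Length counts 64-bit words. Reading a union member other than the one last written is documented as supported by GCC; strictly portable C++ would use std::memcpy instead.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// One 64-bit word viewed either as an unsigned integer or as a double.
// The member names u and d are assumptions.
union ud_t {
    std::uint64_t u;
    double d;
};

// Reinterprets the bits; no numeric conversion takes place.
double bits_to_double(std::uint64_t u) { ud_t x; x.u = u; return x.d; }
std::uint64_t double_to_bits(double d) { ud_t x; x.d = d; return x.u; }

// Software stand-in for a stream port declared as double (hypothetical).
using StreamD = std::deque<double>;

double pop(StreamD &s) {
    assert(!s.empty());
    const double v = s.front();
    s.pop_front();
    return v;
}

// Adds two double vectors arriving on two input streams.
// ASSUMPTIONS: equal Length on both inputs; Length counts 64-bit words.
void vec_add_double(StreamD &a, StreamD &b, StreamD &out) {
    // The Length header travels as a double-typed word whose bit
    // pattern is really a 64-bit integer.
    const std::uint64_t len = double_to_bits(pop(a));
    const std::uint64_t len_b = double_to_bits(pop(b));
    assert(len == len_b);
    (void)len_b;

    // Send Length first, again carried as raw bits in a double.
    out.push_back(bits_to_double(len));

    for (std::uint64_t i = 0; i < len; ++i) {
        out.push_back(pop(a) + pop(b));
    }
}
```

A test frame is built by pushing bits_to_double(n) as the first word, followed by n ordinary double values.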
Non-64-bit data types: adding single-precision vectors
The next example adds vectors of float, the 32-bit single-precision type. In this case, each word of the input/output streams carries 2 values, so 2 independent scalar additions are executed in every loop iteration. To split one 64-bit word into two 32-bit words, the arbitrary-precision types of the Vivado HLS C++ library are quite convenient.
For example, the type "ap_uint<64>" is an unsigned 64-bit integer (the bit width is not limited to multiples of 8: other widths are allowed too). A useful feature of the ap_uint class is the range() method, which gives access to a specific bit range of a variable. The integer <-> floating-point conversion is the same as in the previous double-precision example, and it is needed here in addition to the bit-range handling, so this code is trickier than the previous one.
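The splitting of one 64-bit word into two float lanes can be sketched in plain C++. This is not the example from the OpenFC sources, and it does not use the Vivado HLS headers: the hypothetical helper bit_range() stands in for range() with shifts and masks, and the union uf_t and the other function names are hypothetical as well. The sketch assumes that bits 31..0 hold one element and bits 63..32 the other; the actual lane order is not stated on this page.

```cpp
#include <cassert>
#include <cstdint>

// Bits [hi:lo] of a 64-bit word, a plain C++ counterpart of range(hi, lo).
std::uint64_t bit_range(std::uint64_t w, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi < 64);
    const unsigned width = hi - lo + 1;
    const std::uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (w >> lo) & mask;
}

// 32-bit word viewed as an integer or as a float (hypothetical union,
// used in the same way as the 64-bit union of the previous example).
union uf_t {
    std::uint32_t u;
    float f;
};

float bits_to_float(std::uint32_t u) { uf_t x; x.u = u; return x.f; }
std::uint32_t float_to_bits(float f) { uf_t x; x.f = f; return x.u; }

// Packs two floats into one 64-bit stream word.
// ASSUMPTION: lane 0 is bits 31..0, lane 1 is bits 63..32.
std::uint64_t pack_floats(float lane0, float lane1) {
    return (static_cast<std::uint64_t>(float_to_bits(lane1)) << 32) |
           float_to_bits(lane0);
}

// Extracts one float lane (0 or 1) from a 64-bit stream word.
float lane(std::uint64_t w, unsigned index) {
    const unsigned lo = index * 32;
    return bits_to_float(static_cast<std::uint32_t>(bit_range(w, lo + 31, lo)));
}

// One loop iteration: two independent float additions per 64-bit word.
std::uint64_t add_float_pairs(std::uint64_t a, std::uint64_t b) {
    return pack_floats(lane(a, 0) + lane(b, 0), lane(a, 1) + lane(b, 1));
}
```

In an SPE, add_float_pairs() would be applied to every payload word after the Length field has been forwarded.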