OpenFC - an Open FPGA Cluster framework

System overview

overview.png

OpenFC provides:

Users of OpenFC can design and load their own Stream PE accelerator modules, written in HLS or in RTL. Multiple FPGAs can be connected together to enable large-scale accelerated computing.

On the host, simple APIs let user programs communicate with the Stream PEs on the FPGA(s).

Routing basics

routing.png

To communicate with Stream PEs, the host program transmits one or more data frames via PCIe DMA. Each data frame is routed according to its routing header, which is normally generated by the host program. A Stream PE consumes the payload and then generates a new payload that contains the result of the computation.

A data frame is a stream of 64-bit words, composed of these 3 parts:

  1. Routing header: one or more words of the form 64'h0100_0000_xxxx_xxxx, where xxxx_xxxx is the router's port number.
    • On arrival at a router, the first word of the routing header is "consumed" to choose the destination port. To describe a route with multiple hops, multiple routing header words are placed in sequence. A routing header of any length is allowed.
    • Normally, the host program prepares the routing header for the whole route up to the final destination (i.e., the host itself). Alternatively, Stream PEs can add routing header words themselves for adaptiveness/flexibility.
  2. Length: the number of 64-bit words that follow as the payload. The minimum length is 1 word and the maximum length is 2^32-1 words.
  3. The payload.
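
The frame layout above can be sketched in plain C as follows. This is only an illustration, not part of the OpenFC API: frame_build and frame_route_hop are hypothetical helper names, and the ROUTING_HEADER value is assumed from the 64'h0100_0000_xxxx_xxxx pattern described above. The second function models how a single router hop might consume the first routing header word; the real router logic may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed from the frame format: upper 32 bits are 0x0100_0000,
   lower 32 bits carry the router port number. */
#define ROUTING_HEADER 0x0100000000000000ULL
#define ROUTING_MASK   0xFFFFFFFF00000000ULL

/* Hypothetical helper: writes routing header words, the length word and
   the payload into dst. Returns the number of 64-bit words written,
   or 0 if the length is out of range or dst is too small. */
size_t frame_build(uint64_t *dst, size_t dst_words,
                   const uint32_t *ports, size_t n_ports,
                   const uint64_t *payload, uint64_t payload_words)
{
    /* Length must be between 1 and 2^32-1 words. */
    if (payload_words < 1 || payload_words > 0xFFFFFFFFULL) return 0;
    /* Need room for n_ports header words, 1 length word and the payload. */
    if (n_ports >= dst_words) return 0;
    if (payload_words > dst_words - n_ports - 1) return 0;

    for (size_t i = 0; i < n_ports; i++)
        dst[i] = ROUTING_HEADER | ports[i];   /* one word per hop */
    dst[n_ports] = payload_words;             /* length word */
    memcpy(&dst[n_ports + 1], payload,
           (size_t)payload_words * sizeof(uint64_t));
    return n_ports + 1 + (size_t)payload_words;
}

/* Hypothetical model of one router hop: if the first word is a routing
   header word, consume it and return the port number. Returns -1 when
   the first word is not a routing header word (e.g. the length word). */
int64_t frame_route_hop(const uint64_t **frame, size_t *words)
{
    if (*words == 0 || ((*frame)[0] & ROUTING_MASK) != ROUTING_HEADER)
        return -1;
    int64_t port = (int64_t)((*frame)[0] & 0xFFFFFFFFULL);
    (*frame)++;   /* the consumed word is not forwarded */
    (*words)--;
    return port;
}
```

Because a valid length word is at most 2^32-1, its upper 32 bits are always zero, so in this sketch it can never be mistaken for a routing header word.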

Simple send/receive example

The following host C code generates a data frame and sends it to a Stream PE. The Stream PE's output is routed to PCIe, so the host receives the result from the Stream PE.

  // Buffer allocation
  uint64_t* out = (uint64_t*)buf_alloc(vec_len*8);
  uint64_t* in  = (uint64_t*)buf_alloc(vec_len*8);

  // Header: one routing word per hop, followed by the payload length
  uint64_t header[HEADER_MAX];

  header[0] = ROUTING_HEADER | 1;  // Stream PE
  header[1] = ROUTING_HEADER | 6;  // PCIe
  header[2] = vec_len;
  buf_set_header((uint64_t*)out, header, 3);

  // Generate payload
  for(int i=0; i<vec_len; i++) out[i] = i;

  // Send & receive
  buf_send_async(handles.fd_o1, (uint64_t*)out, vec_len*8);
  buf_recv(handles.fd_i, (uint64_t*)in, vec_len*8);

Simple 64-bit integer vector accumulate example with Vivado HLS

The following Vivado HLS C++ code implements a Stream PE that calculates the sum of an integer vector. The routing header is not sent to the Stream PE.

For details about hls::stream and #pragma HLS in this code, please refer to Xilinx UG902: High-Level Synthesis. For more on writing a custom Stream PE, proceed to Building custom Stream PE with Vivado HLS.

  #include <stdint.h>
  #include <hls_stream.h>

  void vec_accum(hls::stream<uint64_t>& in, hls::stream<uint64_t>& out){
  #pragma HLS INTERFACE axis register both port=in
  #pragma HLS INTERFACE axis register both port=out

    // The first word received is the payload length
    uint64_t len, sum=0;
    len = in.read();

    // Accumulate all payload words
    for(uint64_t i=0; i<len; i++) sum += in.read();

    out.write(1);  // output length is always 1
    out.write(sum);
  }
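
To check results on the host, the behavior of this Stream PE can be mirrored by a small software model in plain C. The sketch below is an assumption-laden illustration rather than part of OpenFC: vec_accum_model is a hypothetical name, and it works on in-memory arrays instead of hls::stream. It expects the same input the Stream PE sees (a length word followed by the payload) and produces the same two output words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of the vec_accum Stream PE.
   in[0] is the length word, in[1..len] is the payload.
   Writes the output frame body (length 1, then the sum) into out.
   Returns the number of words written (2), or 0 if the input holds
   fewer words than its length word claims. */
size_t vec_accum_model(const uint64_t *in, size_t in_words, uint64_t out[2])
{
    if (in_words == 0) return 0;

    uint64_t len = in[0];
    if (len > in_words - 1) return 0;   /* truncated input */

    uint64_t sum = 0;                   /* wraps modulo 2^64, as in hardware */
    for (uint64_t i = 0; i < len; i++)
        sum += in[1 + i];

    out[0] = 1;     /* output length is always 1 */
    out[1] = sum;
    return 2;
}
```

For example, the input words {4, 0, 1, 2, 3} yield the output words {1, 6}.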

Last-modified: 2019-09-03 (Tue) 08:57:12