Network Development Kit

Easy-to-use framework for hardware acceleration.

We provide the Network Development Kit (NDK), which allows users to quickly and easily develop new network appliances based on FPGA acceleration cards. The NDK is optimized for high throughput and scalable to support 10, 100 and 400 Gigabit Ethernet.

Key features:

  • Network module based on standard Ethernet Hard IPs with support for 10 GbE, 100 GbE, 400 GbE and other speeds.
  • Ultra-fast DMA module with 400 Gbps throughput based on a PCIe Gen5 x16 interface.
  • Easy-to-use memory interface for single reads and writes of data from and to the card.
  • Automated scripts for complete design synthesis: a single make command creates the complete FPGA bitstream.
  • Linux kernel driver, DPDK support, user space library, and tools for configuring components.
  • Easy creation of custom applications through a user-friendly API for component access and DMA transfers.

User Application

The NDK is designed for building new network applications with fast packet processing in a deep pipeline. The application core is user logic that relies on the NDK to capture packets from network interfaces and to send any data to the host CPU using ultra-fast DMA transfers. Receiving and sending network packets is handled by a network module that is part of the NDK. Received packets are passed to the application core over a data stream bus (compatible with AXI4-Stream/Avalon-ST). The same data bus is then used to transfer data to the host CPU.

The entire NDK is designed to scale from tens to hundreds of Gbps. To achieve this, it can transfer and process multiple packets per clock cycle: the standard data buses are optimized to carry several packets at once, which scales the throughput further. For this purpose we designed the MFB (Multi-Frame Bus) and MVB (Multi-Value Bus), which scale the standard buses beyond 100 Gbps. In terms of throughput, almost the only limitation is the available FPGA resources.
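
To illustrate why handling several packets per clock cycle matters at these speeds, here is a back-of-the-envelope sketch in Python. It uses the standard Ethernet figures (64-byte minimum frame plus 20 bytes of preamble, start-of-frame delimiter and inter-frame gap). The 200 MHz clock is purely an assumption chosen for illustration, not an NDK parameter, and the function names are hypothetical.

```python
import math

# Per-frame overhead on the wire: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

# Hypothetical FPGA clock used only for this illustration (not an NDK figure).
ASSUMED_CLOCK_MHZ = 200.0


def packets_per_second(line_rate_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum packet rate of an Ethernet link for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_gbps * 1e9 / bits_on_wire


def packets_per_cycle(line_rate_gbps: float, clock_mhz: float,
                      frame_bytes: int = 64) -> float:
    """Average number of packets arriving per clock cycle at full line rate."""
    return packets_per_second(line_rate_gbps, frame_bytes) / (clock_mhz * 1e6)


if __name__ == "__main__":
    for rate in (10, 100, 400):
        mpps = packets_per_second(rate) / 1e6
        ppc = packets_per_cycle(rate, ASSUMED_CLOCK_MHZ)
        print(f"{rate:>3} GbE: {mpps:7.2f} Mpps, "
              f"{ppc:.2f} packets/cycle (needs >= {math.ceil(ppc)} per cycle)")
```

Under these assumptions, a 400 GbE link with minimum-size frames delivers almost three packets per clock cycle on average, so a bus that carries only one packet per cycle could not keep up.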

Many networking applications need large data structures or buffers. The NDK therefore provides an easy-to-use interface for communication with external memories (typically DRAM). Users can use this interface for rapid development of a connection tracking table, flow cache or data buffers.

The user application implemented in the FPGA can be controlled by read/write requests to an assigned address range. These requests are transmitted from the SW to the application core via a CSR bus compatible with Avalon-MM. Read and write requests can be generated by the SW user application through a simple SW API.

Ultra Fast DMA transfers (DMA Medusa IP)

We provide a vendor-independent FPGA architecture and open-source Linux drivers for high-speed DMA transfers using a per-packet approach. The DMA is designed for 400 Gbps throughput and uses a multi-channel architecture to distribute data among CPU cores. The architecture is highly flexible and supports various high-end FPGA families and PCIe bus configurations (up to PCIe Gen5 x16). The DMA IP can use multiple PCIe Endpoint blocks to scale throughput to 100, 200, and 400 Gbps.
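
As a rough sanity check on these figures, the following Python sketch computes the raw per-direction bandwidth of a PCIe link from the standard per-lane signalling rates (8, 16 and 32 GT/s for Gen3, Gen4 and Gen5) and the 128b/130b line encoding. It ignores packet-level protocol overhead, so achievable DMA throughput is lower than these numbers. The function and constant names are hypothetical and not part of the NDK.

```python
# Per-lane signalling rate in GT/s for PCIe generations that use 128b/130b encoding.
LANE_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}

# 128 payload bits are carried in every 130 transmitted bits.
ENCODING_EFFICIENCY = 128 / 130


def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Raw link bandwidth per direction in Gbps, before protocol overhead."""
    return LANE_RATE_GT_S[gen] * lanes * ENCODING_EFFICIENCY


if __name__ == "__main__":
    for gen in (3, 4, 5):
        print(f"PCIe Gen{gen} x16: {pcie_raw_gbps(gen, 16):.1f} Gbps raw per direction")
```

By this estimate a single Gen4 x16 link offers about 252 Gbps and a Gen5 x16 link about 504 Gbps per direction, which shows why 400 Gbps requires either a Gen5 x16 interface or several endpoints working together.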

We have already demonstrated 400 Gbps throughput of the DMA architecture on the Intel Stratix 10 DX Development Kit, and the same DMA engine can also provide very high throughput on Xilinx UltraScale+ or Intel Agilex devices. The NDK Linux driver allows all DMA channels to be controlled separately. It also provides a user-friendly API for connecting your application core directly to the DMA IP. DMA transfers can also be handled through the DPDK driver.

Example NIC design

As part of the NDK, we provide an example design of a NIC application, which can easily be extended to provide hardware acceleration for a user application. The NIC example design is built on top of the NDK framework. Thanks to this architecture, the NIC application core consists only of the wiring needed to connect the network interfaces to the DMA module, so that packets can be transferred from the Ethernet ports to the PCIe bus. In addition, the NIC example design includes a unit that distributes data to the DMA channels.

Generate a bitstream with a single command

The primary goal of the NDK is to provide an easy-to-use framework for the fast development of hardware-accelerated appliances and systems. We therefore simplify the steps needed to build a new bitstream as much as possible. To start the synthesis process, you need to install the synthesis tools for your FPGA and download the NDK Git repository. Then you can run the synthesis and generate the bitstream with a single command:

# Go to the folder of the NIC design for the selected card.
$ cd apps/nic/dk-dev-1sdx-p/
# Run compilation in the synthesis tool by make command.
$ make

Once the synthesis process is completed, you have a bitstream file, which can be uploaded to the FPGA card using the NDK tools. Alternatively, you can use the FPGA programming tools provided by the FPGA vendor.

By default, the example NIC design is built for the fastest supported Ethernet standard (for example 100 GbE or 400 GbE). The NDK usually supports lower speeds as well. If you want to build a bitstream for another Ethernet standard (for example 10 or 25 GbE), you can specify the speed as the makefile target.

# Run compilation with support for eight 25 GbE channels.
$ make 25g8

The target name (for example 25g8) describes the selected Ethernet mode. The number before the letter “g” is the speed of each Ethernet channel in gigabits per second. The number after the letter “g” is the total number of Ethernet channels provided by the FPGA card.
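
The naming rule can be expressed as a small Python sketch that splits a target name into its channel speed and channel count. This is only an illustration of the convention described above and not part of the NDK build system. The names EthMode and parse_target are hypothetical, and only the 25g8 target is taken from this text.

```python
import re
from typing import NamedTuple


class EthMode(NamedTuple):
    """Ethernet mode encoded in a makefile target name such as '25g8'."""
    speed_gbps: int  # speed of each Ethernet channel in Gbps
    channels: int    # total number of Ethernet channels on the card

    @property
    def total_gbps(self) -> int:
        """Aggregate Ethernet bandwidth over all channels."""
        return self.speed_gbps * self.channels


# <speed>g<channels>, for example 25g8
_TARGET_RE = re.compile(r"(\d+)g(\d+)")


def parse_target(name: str) -> EthMode:
    """Split a target name into per-channel speed and channel count."""
    match = _TARGET_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a <speed>g<channels> target name: {name!r}")
    return EthMode(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    mode = parse_target("25g8")
    print(f"{mode.channels} x {mode.speed_gbps} GbE = {mode.total_gbps} Gbps in total")
```

For the 25g8 target this yields eight channels of 25 GbE each, or 200 Gbps of aggregate Ethernet bandwidth.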

Easy to use packet capture solution

The NDK also includes several useful tools for controlling DMA transfers and capturing packets. One of them, “ndp-receive”, receives packets from the DMA controller and stores them in one or more PCAP files. Because the example NIC design directly connects the Ethernet interfaces to the DMA controller, “ndp-receive” can be used to capture Ethernet traffic straight to PCAP files. All NDK tools have built-in help that describes all options and the command-line syntax:

$ ndp-receive -h
Usage: ndp-receive [-d path] [-i indexes] [-D dump] [-I interval] [-p packets] [-b bytes] [-B size] [-Rqh]
Common parameters:
   -d path       Path to device [default: /dev/nfb0]
   -i indexes    Queues numbers to use - list or range, e.g. "0-5,7" [default: all]
   -h            Show this text
   -p packets    Stop receiving or transmitting after <packets> packets
   -b bytes      Stop receiving or transmitting after <bytes> bytes
   -B size       Read and write packets in bursts of <size> [default: RX=64, TX=64]
Packet output parameters: (available for one queue only)
   -D dump       Dump packet content to stdout (char, all, header, data)
   -I interval   Sample each Nth packet
Statistic output parameters: (exclusive with -D argument)
   -R            Incremental mode (no counter reset on each output)
   -I interval   Print stats each N secs, 0 = don't print continuous stats [default: 1]
   -q            Quiet mode - don't print stats at end
Receive parameters:
   -f file       Write data to PCAP file <file> (<file>.<index> for multiple queues)
   -t timestamp  Timestamp source for PCAP packet header: (system, header:X)
                 (X is bit offset in NDP header of 64b timestamp value)
   -r trim       Maximum number of bytes per packet to save

For example, to capture 10,000 packets from all available DMA channels (queues) and store them in a separate PCAP file for each DMA channel, we can start the packet capture with a single command:

# Capture 10,000 packets from all DMA channels into the PCAP files.
$ ndp-receive -p 10000 -f my_capture.pcap

When the specified number of packets has been captured, ndp-receive exits and all captured network traffic is stored in PCAP files named “my_capture.<index>.pcap”, where <index> is the DMA channel number.
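
A quick way to check such a capture is to count the packets stored in each per-channel file. The Python sketch below does this with the standard library only. It assumes that the files use the classic libpcap file format (24-byte global header followed by 16-byte record headers), which this text does not state explicitly. The helper names queue_capture_path and count_packets are hypothetical.

```python
import struct
from pathlib import Path

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16

# Classic pcap magic numbers (microsecond and nanosecond variants) as stored on disk.
_LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
_BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def queue_capture_path(base: str, index: int) -> Path:
    """Per-channel file name as described above: my_capture.pcap -> my_capture.<index>.pcap."""
    path = Path(base)
    return path.with_name(f"{path.stem}.{index}{path.suffix}")


def count_packets(path) -> int:
    """Count complete packet records in a classic pcap file (assumed format)."""
    with open(path, "rb") as f:
        header = f.read(PCAP_GLOBAL_HEADER_LEN)
        if len(header) < PCAP_GLOBAL_HEADER_LEN:
            raise ValueError("file too short for a pcap global header")
        if header[:4] in _LITTLE_ENDIAN_MAGICS:
            endian = "<"
        elif header[:4] in _BIG_ENDIAN_MAGICS:
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        count = 0
        while True:
            record = f.read(PCAP_RECORD_HEADER_LEN)
            if len(record) < PCAP_RECORD_HEADER_LEN:
                break  # end of file
            # Record header: ts_sec, ts_frac, captured length, original length.
            _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
            if len(f.read(incl_len)) < incl_len:
                break  # truncated final record is not counted
            count += 1
        return count


if __name__ == "__main__":
    # Summarize every per-channel file of the capture in the current directory.
    for pcap in sorted(Path(".").glob("my_capture.*.pcap")):
        print(f"{pcap.name}: {count_packets(pcap)} packets")
```

Run in the directory that holds the capture, the script prints one line per DMA channel file with the number of packets it contains.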

Implement, verify and run

The example NIC design can serve as a source of inspiration when you create your target application. The application core can be easily extended with your acceleration engine or other unique functionality. We provide an NDK User Guide with a detailed description of the application interfaces. You can focus on the application core and utilize NDK to control network interfaces and PCIe with fast DMA transfers. 

All network applications need thorough verification and testing. The NDK therefore provides a UVM verification environment that allows you to check that the application works correctly. The verification environment includes a set of Bus Functional Models (BFMs) for all application interfaces. All BFMs and the entire verification environment are described in detail in the NDK User Guide. Once you have successfully verified the application core, you can prepare a new FPGA bitstream with a single make command and run the application on the FPGA card.

Supported FPGA cards

The NDK is currently available for two FPGA cards, but it can easily be extended to any other FPGA-based card with network interfaces and a PCIe connector. All cards share common source code so that as much of the code base as possible is verified and tested. The vast majority of NDK components are ready to support a wide range of high-end FPGAs, including Intel Agilex, Intel Stratix 10, Xilinx UltraScale+ and others. If you are interested in adding your FPGA card to those supported by the NDK, please contact us.

XpressSX AGI-FH400G card by CESNET & REFLEX CES

Supported FPGA accelerator cards: