Authors: Tomas Martinek, Jan Korenek
Network traffic monitoring is a typical application that requires high performance to process large amounts of data. For 10 Gbps technology, up to about 15 million packets (minimum-size frames on a single link) arrive in the system every second. For each packet, the control information in its header must be analysed, and the packet must be classified into one of several categories according to the requirements of the monitoring system. For some categories, statistical data are produced, and selected packets are sampled for more detailed analysis. One of the most useful and most performance-critical parts of such a monitoring system is so-called payload checking, in which the packet data are searched for occurrences of specific patterns. Together, these functions require a level of computational power that is difficult to achieve even with powerful conventional computers. FPGA technology has been identified as a suitable way of accelerating network monitoring applications: it offers high computational power and massive parallelism, and it allows reconfiguration or even dynamic reconfiguration.
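The per-packet time budget follows from simple line-rate arithmetic. The Python sketch below computes the theoretical maximum packet rate of a single 10 Gbps Ethernet link, assuming the standard per-frame overhead of an 8-byte preamble/SFD and a 12-byte minimum inter-frame gap on the wire.

```python
# Maximum Ethernet packet rate at a given line rate.
# Besides the frame itself, every frame occupies an 8-byte preamble/SFD
# and at least a 12-byte inter-frame gap on the wire.
PREAMBLE_BYTES = 8
MIN_IFG_BYTES = 12


def max_packet_rate(line_rate_bps: float, frame_bytes: int) -> float:
    """Return frames per second for back-to-back frames of frame_bytes."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + MIN_IFG_BYTES) * 8
    return line_rate_bps / wire_bits


if __name__ == "__main__":
    for size in (64, 512, 1518):
        rate = max_packet_rate(10e9, size)
        print(f"{size:5d} B frames: {rate / 1e6:6.2f} Mpps")
```

For 64-byte frames this gives about 14.88 million packets per second, i.e. one packet every 67.2 ns, which is the time budget every processing stage of the adaptor has to meet.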
In this project, we present the architecture of a monitoring adaptor dedicated to 10 Gbps technology. The adaptor provides input packet classification and supports three packet sampling techniques: deterministic, stochastic and byte-deterministic. It can also produce two types of statistical data: statistics based on packet length and statistics based on timestamps. Finally, it offers fast payload checking of classified packets using content addressable memory (CAM).
The architecture of the 10 Gbps monitoring adaptor is shown in Figure 1 and can be divided into three functional parts. The first part handles packet reception and the analysis of the control information contained in the packet header. The second part performs classification, which splits packets into several classes based on this analysis and on the configuration of the monitoring system. Finally, statistics, sampling and payload checking are provided for the appropriate classes of packets. A detailed description of the individual parts is given in the following subsections.
Figure 1: SCAMPI design block structure.
The SCAMPI Common block (SCOM) serves general design purposes. It contains an identification register, which the software driver uses to detect the design. The SCOM also handles design reset, IRQ enabling/disabling and input interface selection.
The Input Packet Buffer stores incoming packets before they are processed by the other parts of the design. Input packets arrive over the standard XGMII interface and are evenly distributed among four Input Buffers (IBUF), so that the 10 Gbps input data stream is split into four 2.5 Gbps streams. Along with the packet data, a timestamp and identification information are stored in the IBUF. The timestamp is generated by the Timestamp Unit (TSU); it gives the precise packet arrival time, synchronized with an external GPS module, and is used to produce statistical data. The IBUF also performs the necessary CRC computation and checking.
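A minimal Python model of the IBUF input side. It assumes round-robin distribution (the text only says packets are distributed evenly; the real design may balance on buffer occupancy instead) and a per-packet sequence number that lets later stages restore the arrival order. The CRC check is the standard Ethernet FCS check: CRC-32 over the frame, with the FCS stored least significant byte first; `zlib.crc32` computes the same CRC-32. The names `IbufEntry`, `distribute` and `with_fcs` are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List

NUM_IBUFS = 4  # four Input Buffers, each taking about 2.5 Gbps


@dataclass
class IbufEntry:
    seq: int        # arrival order, so later stages can restore it (assumption)
    timestamp: int  # arrival time from the TSU (format is an assumption)
    frame: bytes    # frame including its 4-byte FCS
    crc_ok: bool    # result of the FCS check


def fcs_ok(frame: bytes) -> bool:
    """Ethernet FCS check: CRC-32 of the frame body, stored LSB first."""
    if len(frame) < 4:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(body) == int.from_bytes(fcs, "little")


def distribute(frames: Iterable[bytes], timestamps: Iterable[int]) -> List[List[IbufEntry]]:
    """Distribute frames round-robin over NUM_IBUFS buffers."""
    ibufs: List[List[IbufEntry]] = [[] for _ in range(NUM_IBUFS)]
    for seq, (frame, ts) in enumerate(zip(frames, timestamps)):
        ibufs[seq % NUM_IBUFS].append(IbufEntry(seq, ts, frame, fcs_ok(frame)))
    return ibufs


def with_fcs(body: bytes) -> bytes:
    """Example helper: append a correct FCS to a frame body."""
    return body + zlib.crc32(body).to_bytes(4, "little")


if __name__ == "__main__":
    frames = [with_fcs(bytes([i]) * 60) for i in range(6)]
    frames[3] = frames[3][:-1] + bytes([frames[3][-1] ^ 0xFF])  # corrupt one FCS
    for n, buf in enumerate(distribute(frames, range(0, 600, 100))):
        print(n, [(e.seq, e.crc_ok) for e in buf])
```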
The TSU_COV design lets software control the TSU_ADD unit: software can request any of the three TSU modes (INIT, SHORT, FAST) and read the current timestamp value.
The Header Field Extractor (HFE) analyses input packets. It is a RISC-based processor controlled by a dedicated instruction set. The HFE reads packet data from the input buffer, analyses the control information in the packet headers and produces specific data structures.
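The kind of work the HFE performs can be illustrated with a small software header parser. This Python sketch handles only Ethernet II, IPv4 and TCP/UDP ports; the `UnifiedHeader` fields are hypothetical, since the real Unified Header layout is defined by the HFE program and is not given here.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnifiedHeader:
    # Hypothetical field set; the real UH layout is defined by the HFE program.
    ethertype: int
    length: int
    protocol: Optional[int] = None
    src_ip: Optional[int] = None
    dst_ip: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def extract(frame: bytes) -> UnifiedHeader:
    """Parse Ethernet II / IPv4 / TCP-UDP headers into a UnifiedHeader."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    uh = UnifiedHeader(ethertype=ethertype, length=len(frame))
    if ethertype != 0x0800 or len(frame) < 34:
        return uh  # not IPv4 (or truncated): only L2 fields are filled in
    ihl = (frame[14] & 0x0F) * 4          # IPv4 header length in bytes
    uh.protocol = frame[23]               # IPv4 protocol field
    uh.src_ip, uh.dst_ip = struct.unpack_from("!II", frame, 26)
    l4 = 14 + ihl                         # start of the transport header
    if uh.protocol in (6, 17) and len(frame) >= l4 + 4:
        uh.src_port, uh.dst_port = struct.unpack_from("!HH", frame, l4)
    return uh
```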
The UH FIFO is a memory organised as a FIFO that holds the Unified Headers (UH) generated by the HFE processor. It has 16 items and is implemented as a circular buffer. The LUP reads each UH and performs classification; when processing of a UH is finished, its item is released.
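A Python sketch of the UH FIFO behaviour: a 16-entry circular buffer in which the consumer (the LUP) first reads the oldest item and releases it only after processing, so the slot cannot be overwritten while it is still in use. The class and method names are illustrative, not taken from the design.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class CircularFifo(Generic[T]):
    """Fixed-depth FIFO on a ring buffer; an item stays occupied until released."""

    def __init__(self, depth: int = 16) -> None:
        self._slots: List[Optional[T]] = [None] * depth
        self._depth = depth
        self._head = 0   # oldest unreleased item
        self._tail = 0   # next free slot
        self._count = 0

    def full(self) -> bool:
        return self._count == self._depth

    def empty(self) -> bool:
        return self._count == 0

    def write(self, item: T) -> bool:
        """Producer side (HFE); returns False instead of overwriting when full."""
        if self.full():
            return False
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % self._depth
        self._count += 1
        return True

    def peek(self) -> Optional[T]:
        """Consumer side (LUP): read the oldest item without freeing it."""
        return None if self.empty() else self._slots[self._head]

    def release(self) -> None:
        """Free the oldest item once its processing is finished."""
        if self.empty():
            raise RuntimeError("release on empty FIFO")
        self._slots[self._head] = None
        self._head = (self._head + 1) % self._depth
        self._count -= 1
```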
The Packet FIFO (PFIFO) is divided into two parts, PFIFO_A and PFIFO_B, which allows it to provide an inter-design connection. It converts 16-bit input data into 32-bit data. PFIFO_A is connected to the four HFE interfaces and provides the DRAM scheduler functions. It converts the HFE data into a format that PFIFO_B can process and sends them to PFIFO_B over four streams. PFIFO_B receives the four data streams from PFIFO_A and stores them in FIFO16TO32 buffers. It is also connected to the Sort Unit (SU), the Dispatcher (DISP) and SFIFO. It uses the SU records to merge the packets from the four streams into a single stream in their original order (as they were received from the HFEs). It also separates the control data from the input streams, sorts them and stores them in SFIFO. The DISP can either read the sorted 32-bit packets from the PFIFO or free them directly without reading them.
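The 16-bit to 32-bit conversion can be sketched in Python as follows. The word order (earlier 16-bit word in the upper half) and the zero padding of an odd trailing word are assumptions; the text does not specify them.

```python
from typing import Iterable, List


def pack_16_to_32(words16: Iterable[int], pad: int = 0) -> List[int]:
    """Pack a stream of 16-bit words into 32-bit words.

    Assumption: the earlier word goes into the upper half of the 32-bit
    word; an odd trailing word is completed with `pad` in the lower half.
    """
    out: List[int] = []
    pending = None
    for w in words16:
        if not 0 <= w <= 0xFFFF:
            raise ValueError(f"not a 16-bit word: {w:#x}")
        if pending is None:
            pending = w                       # wait for the second half
        else:
            out.append((pending << 16) | w)   # emit one 32-bit word
            pending = None
    if pending is not None:
        out.append((pending << 16) | pad)
    return out
```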
The Lookup Processor (LUP) performs packet classification. Its input is a structure called the Unified Header, which is created by the Header Field Extractor and contains the important information from the packet headers. Its output is a record that controls packet processing in the following blocks. The LUP uses TCAM and SSRAM memories. Classification of a Unified Header starts in the TCAM, where a selected part of the Unified Header is matched. The TCAM result is an address that points to a program stored in the SSRAM. This program checks the remaining part of the Unified Header and ends with an EXE instruction, which contains the resulting LUP record.
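The two-stage lookup can be modelled in Python as a ternary TCAM search followed by a short SSRAM program. Only the flow (TCAM match, then SSRAM program, then EXE with the record) comes from the text; the instruction format (`CMP` with a failure jump address, `EXE` carrying the record), the key layout and the record contents are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TcamEntry:
    value: int
    mask: int          # 1 bits must match, 0 bits are "don't care"
    program_addr: int  # start of the SSRAM program for this entry


# Hypothetical SSRAM instructions:
#   ("CMP", shift, mask, expected, fail_addr): continue if the field matches, else jump
#   ("EXE", record): stop and return the LUP record
Instr = Tuple


def tcam_lookup(entries: List[TcamEntry], key: int) -> Optional[int]:
    """Return the program address of the first (highest-priority) matching entry."""
    for e in entries:
        if (key & e.mask) == (e.value & e.mask):
            return e.program_addr
    return None


def run_program(ssram: Dict[int, Instr], addr: int, rest: int, max_steps: int = 64) -> Optional[dict]:
    """Check the remaining Unified Header bits until an EXE instruction is reached."""
    for _ in range(max_steps):
        instr = ssram[addr]
        if instr[0] == "EXE":
            return instr[1]
        _, shift, mask, expected, fail_addr = instr
        addr = addr + 1 if ((rest >> shift) & mask) == expected else fail_addr
    return None  # guard against malformed programs that never reach EXE


def classify(entries: List[TcamEntry], ssram: Dict[int, Instr], tcam_key: int, rest: int) -> Optional[dict]:
    addr = tcam_lookup(entries, tcam_key)
    return None if addr is None else run_program(ssram, addr, rest)


# Example configuration: TCAM key = IP protocol, rest = destination port.
ENTRIES = [
    TcamEntry(value=17, mask=0xFF, program_addr=0),  # UDP
    TcamEntry(value=0, mask=0x00, program_addr=10),  # catch-all
]
SSRAM: Dict[int, Instr] = {
    0: ("CMP", 0, 0xFFFF, 53, 2),
    1: ("EXE", {"class": "dns", "stat_set": 1}),
    2: ("EXE", {"class": "udp-other", "stat_set": 2}),
    10: ("EXE", {"class": "default", "stat_set": 0}),
}
```

Putting the catch-all entry last mirrors the usual TCAM convention that lower addresses have higher priority.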
The Sort Unit (SU) is responsible for sorting input data records and sending them to the output in the right order. The input data are 16 bits wide, and the first word of each record contains the key, which gives the position of the record in the sorted sequence. (The number of words per data record is generic.) The output data are 32 bits wide and represent the sorted records without the first word containing the key.
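The Sort Unit's reordering can be sketched as a reorder buffer that releases records as soon as the next expected key has arrived. This Python model assumes keys start at 0 and are consecutive (wrap-around of a 16-bit key is ignored), and it pads odd-length records with a zero word before packing them into 32-bit output words; both are assumptions.

```python
from typing import Dict, Iterable, Iterator, List


def to_32(words: List[int]) -> List[int]:
    """Pack 16-bit words pairwise into 32-bit words (first word in the upper half)."""
    if len(words) % 2:
        words = words + [0]  # pad odd-length records (assumption)
    return [(words[i] << 16) | words[i + 1] for i in range(0, len(words), 2)]


def sort_records(records: Iterable[List[int]]) -> Iterator[List[int]]:
    """Yield records in key order, without the key word, as 32-bit words."""
    pending: Dict[int, List[int]] = {}
    next_key = 0
    for rec in records:
        key, body = rec[0], rec[1:]
        if key < next_key or key in pending:
            raise ValueError(f"duplicate key {key}")
        pending[key] = body
        while next_key in pending:  # release every record that is now in order
            yield to_32(pending.pop(next_key))
            next_key += 1
    if pending:
        raise ValueError(f"missing key {next_key}; stuck records: {sorted(pending)}")
```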
The Statistic Unit (STU) produces length and time statistics (number of packets, average packet length, minimum and maximum packet length, average inter-packet time, etc.). It contains 256 virtual statistic cores, implemented as 256 register sets that share a single processing unit. The register set used for the current packet is selected by the LUP record.
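The "many register sets, one processing unit" organisation maps naturally onto an array of records updated by a single function. In this Python sketch the register names are hypothetical, and the averages are computed on read from running sums and counts, which is one plausible way to realise them rather than the documented design.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_SETS = 256  # 256 virtual statistic cores, as in the text


@dataclass
class StatRegs:
    packets: int = 0
    octets: int = 0                 # sum of packet lengths
    min_len: Optional[int] = None
    max_len: Optional[int] = None
    last_ts: Optional[int] = None   # timestamp of the previous packet
    gap_sum: int = 0                # sum of inter-packet times

    def avg_len(self) -> float:
        return self.octets / self.packets if self.packets else 0.0

    def avg_gap(self) -> float:
        gaps = self.packets - 1
        return self.gap_sum / gaps if gaps > 0 else 0.0


class StatisticUnit:
    def __init__(self) -> None:
        self.sets: List[StatRegs] = [StatRegs() for _ in range(NUM_SETS)]

    def update(self, set_id: int, length: int, timestamp: int) -> None:
        """The single processing unit: update the register set chosen by the LUP record."""
        r = self.sets[set_id]
        r.packets += 1
        r.octets += length
        r.min_len = length if r.min_len is None else min(r.min_len, length)
        r.max_len = length if r.max_len is None else max(r.max_len, length)
        if r.last_ts is not None:
            r.gap_sum += timestamp - r.last_ts
        r.last_ts = timestamp
```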
The Sampling Unit (SAU) samples the packets that are to be passed to the application. It contains 16 Sampling Cores (SC). Each Sampling Core can be configured for deterministic, length-deterministic or probabilistic sampling, and a packet can be processed simultaneously by more than one Sampling Core. The SAU has two inputs: (1) the LUP record created by the Lookup Processor (LUP) and (2) the length of the current packet. Its output is a 16-bit sample vector in which each bit corresponds to the result of one Sampling Core.
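A Python sketch of the Sampling Unit. The exact semantics of each mode are assumptions: deterministic sampling takes every n-th packet, length-deterministic (byte-deterministic) sampling takes the packet in which each n-th byte falls (at most once per packet), and probabilistic sampling takes a packet with probability p. The LUP record is reduced here to a hypothetical 16-bit mask selecting which cores see the packet.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamplingCore:
    mode: str = "off"   # "deterministic", "length", "probabilistic" or "off"
    n: int = 1          # every n-th packet, or every n bytes (assumption)
    p: float = 0.0      # sampling probability for "probabilistic"
    counter: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def sample(self, length: int) -> bool:
        if self.mode == "deterministic":   # every n-th packet
            self.counter += 1
            if self.counter >= self.n:
                self.counter = 0
                return True
            return False
        if self.mode == "length":          # packet containing each n-th byte
            self.counter += length
            if self.counter >= self.n:
                self.counter %= self.n
                return True
            return False
        if self.mode == "probabilistic":
            return self.rng.random() < self.p
        return False


class SamplingUnit:
    def __init__(self) -> None:
        self.cores: List[SamplingCore] = [SamplingCore() for _ in range(16)]

    def process(self, core_mask: int, length: int) -> int:
        """Return the 16-bit sample vector; bit i is the result of core i."""
        vector = 0
        for i, core in enumerate(self.cores):
            if (core_mask >> i) & 1 and core.sample(length):
                vector |= 1 << i
        return vector
```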
Based on input control information, the Dispatcher (DISP) determines whether a packet from the input FIFO component is forwarded to the output or discarded. The input control information consists of two parts: (1) the Sampling Record, which specifies the bit vector of Sampling Core results; and (2) the Payload Checker Identifier, which specifies the group of strings to be checked in the packet payload. If none of the Sampling Cores has sampled the packet and the PCK Identifier is zero (no payload checking), the packet is discarded. Otherwise, it is forwarded to the output interface.
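The discard rule is simple enough to state directly in code. This Python sketch pairs each packet with its control record (sample vector, PCK identifier) and keeps only the packets the rule forwards; queue handling is simplified, and a discarded packet is just dropped, which corresponds to freeing it in the packet FIFO without reading its data.

```python
from collections import deque
from typing import Deque, List, Tuple


def should_forward(sample_vector: int, pck_id: int) -> bool:
    """Discard only if no Sampling Core sampled the packet and no payload check is requested."""
    return sample_vector != 0 or pck_id != 0


def run_dispatcher(fifo: Deque[bytes], control: Deque[Tuple[int, int]]) -> List[bytes]:
    """Take one control record per packet and forward or drop the packet."""
    out: List[bytes] = []
    while fifo and control:
        sample_vector, pck_id = control.popleft()
        packet = fifo.popleft()
        if should_forward(sample_vector, pck_id):
            out.append(packet)
    return out
```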
The Payload Checker (PCK) is a component for fast pattern matching using an FPGA and a TCAM: the FPGA handles data control and buffering, and the TCAM performs the pattern matching itself. The PCK can search for up to 512 patterns, each up to 16 B long. Each pattern is stored in several versions shifted against the TCAM word, so more than one TCAM row is used per pattern. This enables a throughput of up to 3.2 Gbps. Patterns can be divided into 256 groups, and matching can be done for any combination of these groups (the selection is determined by the LUP record).
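The shifted-pattern technique can be illustrated in Python with a toy ternary memory. The payload is examined in windows of `WORD` bytes that advance by `STEP` bytes per lookup, and each pattern is stored in `STEP` rows, shifted by 0 to `STEP - 1` bytes with don't-care cells around it, so every alignment of the pattern inside a window is covered. `WORD` and `STEP` are hypothetical small values, not the PCK's real parameters; unlike a real TCAM, which returns the highest-priority hit, this model collects all matching rows, and group selection is applied after matching.

```python
from typing import List, Optional, Sequence, Set, Tuple

WORD = 8  # hypothetical TCAM word width in bytes
STEP = 4  # hypothetical payload bytes consumed per lookup

Cell = Optional[int]          # None means "don't care"
Row = Tuple[List[Cell], int]  # (ternary cells, pattern index)


def build_rows(patterns: Sequence[bytes]) -> List[Row]:
    """Store every pattern STEP times, shifted by 0..STEP-1 bytes."""
    rows: List[Row] = []
    for idx, pat in enumerate(patterns):
        if not 1 <= len(pat) <= WORD - STEP + 1:
            raise ValueError(f"pattern {pat!r} does not fit with STEP={STEP}")
        for shift in range(STEP):
            cells: List[Cell] = [None] * shift + list(pat)
            cells += [None] * (WORD - len(cells))
            rows.append((cells, idx))
    return rows


def tcam_match(rows: List[Row], window: List[Cell]) -> Set[int]:
    """Ternary compare of one window against all rows (done in parallel by a TCAM)."""
    return {
        idx for cells, idx in rows
        if all(c is None or c == w for c, w in zip(cells, window))
    }


def scan(rows: List[Row], groups: Sequence[int], enabled: Set[int], payload: bytes) -> Set[int]:
    """Return indices of patterns from enabled groups that occur in payload."""
    hits: Set[int] = set()
    for start in range(0, len(payload), STEP):
        window: List[Cell] = list(payload[start:start + WORD])
        window += [None] * (WORD - len(window))  # past the end: matches no pattern byte
        hits |= tcam_match(rows, window)
    return {idx for idx in hits if groups[idx] in enabled}
```

The price of consuming `STEP` bytes per lookup is memory: each pattern occupies `STEP` rows instead of one, and it may be at most `WORD - STEP + 1` bytes long.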
The Output Buffer stores outgoing packets destined for the software driver. The input packets arrive in the "command protocol" format, which separates packet data from control data. The Output Buffer decapsulates this protocol and stores the packet data and the control data separately, at different positions in the output memory. The control data are typically stored at the beginning of the memory buffer, and the packet data from a specified memory offset. The Output Buffer supports a generic number of memory blocks and is also responsible for generating an interrupt (IRQ) when a packet arrives in the buffer.
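A Python sketch of one output memory block, assuming fixed-size control records at the start of the block and packet data appended from `data_offset`. The "command protocol" format itself is not modelled (control and packet data are passed in already separated), and the class name `OutputBlock`, the sizes and the `on_irq` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OutputBlock:
    """One output memory block: control records first, packet data from data_offset."""
    size: int = 4096                           # hypothetical block size in bytes
    data_offset: int = 1024                    # hypothetical start of the packet-data area
    ctrl_size: int = 16                        # hypothetical fixed control-record size
    on_irq: Callable[[], None] = lambda: None  # stands in for raising the IRQ line

    def __post_init__(self) -> None:
        self.mem = bytearray(self.size)
        self.ctrl_ptr = 0                  # next free control slot
        self.data_ptr = self.data_offset   # next free byte in the data area

    def store(self, control: bytes, packet: bytes) -> bool:
        """Store one decapsulated packet; return False if the block is full."""
        if len(control) > self.ctrl_size:
            raise ValueError("control record too large")
        if (self.ctrl_ptr + self.ctrl_size > self.data_offset
                or self.data_ptr + len(packet) > self.size):
            return False
        self.mem[self.ctrl_ptr:self.ctrl_ptr + len(control)] = control
        self.ctrl_ptr += self.ctrl_size
        self.mem[self.data_ptr:self.data_ptr + len(packet)] = packet
        self.data_ptr += len(packet)
        self.on_irq()  # notify the driver that a packet has arrived
        return True
```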


