High-Speed Multithreaded IP Flow Exporter for Machine Learning

Our team has been developing hardware-accelerated technologies for monitoring high-speed network links for many years, and in May 2021 we reached a new milestone. Combined with an FPGA-based hardware card capable of 200Gb/s, the monitoring probe can process over 175Gb/s and export extended IPFIX data.

The entire measurement of the prototype was based on the new version of the ipfixprobe software. This IP flow exporter is an open source project that exports IPFIX data enriched with additional information by extension plugins. The tool was originally known as flow_meter, a component of the NEMEA system (its development started as a series of bachelor's and master's theses at Czech universities in cooperation with CESNET). However, as the IPFIX ecosystem evolved and our other open source project, IPFIXcol2, became more flexible and efficient, we decided to make ipfixprobe more independent as well.

The main improvement in the latest version is a reworked architecture: unlike the previous version, ipfixprobe is now multithreaded and supports high-throughput data transfers from the FPGA card. This change allows users to start the exporter with multiple inputs (separate DMA channels) and process high-speed traffic in parallel. Modern NICs support so-called Receive-Side Scaling (RSS), which distributes incoming packets into independent queues for parallel processing. The output part of the flow exporter was also significantly improved: the parallel threads share a minimal set of IPFIX templates, and their flow data are merged before being sent to the IPFIX collector. This approach reduces the load on the collector.
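The parallel design described above can be illustrated with a small Python sketch. It is not ipfixprobe's implementation: the function names, the queue count, and the use of CRC32 as a stand-in for the NIC's RSS hash are all assumptions made for illustration. It only shows the idea that packets of one flow always land in the same queue, each worker aggregates its own flows, and a single output stage merges the results.

```python
import queue
import threading
import zlib
from collections import defaultdict

NUM_QUEUES = 4  # hypothetical value; the measured setup used 32 DMA channels


def rss_queue(five_tuple, num_queues=NUM_QUEUES):
    """Map a flow key to a queue index (CRC32 stands in for the NIC's RSS hash)."""
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % num_queues


def worker(in_q, out_q):
    """Aggregate packets from one input queue into flows, then emit them."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    while True:
        pkt = in_q.get()
        if pkt is None:  # sentinel: this input is exhausted
            break
        five_tuple, size = pkt
        flows[five_tuple]["packets"] += 1
        flows[five_tuple]["bytes"] += size
    for key, stats in flows.items():
        out_q.put((key, stats))


def run(packets, num_queues=NUM_QUEUES):
    """Distribute packets to per-queue workers and merge their flow records."""
    in_qs = [queue.Queue() for _ in range(num_queues)]
    out_q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, out_q)) for q in in_qs]
    for t in threads:
        t.start()
    for five_tuple, size in packets:
        in_qs[rss_queue(five_tuple, num_queues)].put((five_tuple, size))
    for q in in_qs:
        q.put(None)
    for t in threads:
        t.join()
    # Single output stage: one merged stream of flow records for the collector.
    exported = {}
    while not out_q.empty():
        key, stats = out_q.get()
        exported[key] = stats
    return exported
```

Because the queue is chosen from a hash of the flow key, no flow is split across workers, so the merge step never has to reconcile partial records for the same flow.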

Naturally, we evaluated the new version of ipfixprobe. Our measurements used an NFB-200G2QL card equipped with an FPGA chip and generated network traffic at 200Gb/s. The flow exporter ran on a commodity Dell server with 96GB of RAM and two Intel Xeon Gold 5218 CPUs, with CentOS 7 as the operating system. The generated traffic reached 200Gb/s in order to saturate the whole system, and about 90% of the packets were processed by the software. In total, ipfixprobe running on 32 independent DMA channels (which allows it to utilize 32 CPU cores) was able to process about 175Gb/s, which is a very promising result. During the tests, multiple plugins were enabled: pstats, TLS, OVPN, and idpcontent.
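As a rough back-of-the-envelope check of these figures, the snippet below computes the processed share of the offered load and the average throughput per DMA channel. The even split across channels is an assumption; no per-channel numbers were reported. The share is computed in bits, whereas the figure of about 90% above refers to packets, so the two need not match exactly.

```python
offered_gbps = 200.0    # generated traffic
processed_gbps = 175.0  # traffic processed by ipfixprobe
channels = 32           # independent DMA channels, one CPU core each

# Fraction of the offered bit rate that was processed.
share = processed_gbps / offered_gbps

# Average load per channel, assuming an even distribution (an assumption).
per_channel_gbps = processed_gbps / channels

print(f"processed share: {share:.1%}")
print(f"average per channel: {per_channel_gbps:.2f} Gb/s")
```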

We use ipfixprobe as the primary source of data for our research activities. As the share of encrypted traffic rises, it is no longer possible to rely only on traditional monitoring and analysis tools that depend on readable, unencrypted information. Therefore, we are investigating the feasibility of machine learning techniques for classifying encrypted traffic and detecting suspicious activities that may indicate security incidents. For these experiments, we study flow-based data extended with packet-level information, which appears to be well suited to this research challenge, as presented in several recent scientific papers (listed below).
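To make "flow-based data extended with packet-level information" more concrete, here is a minimal sketch of how such a record might be turned into features for a classifier. The record layout and field names are hypothetical and do not mirror ipfixprobe's actual IPFIX elements, and the chosen features are only examples, not the feature sets used in the papers.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class ExtendedFlow:
    """Hypothetical flow record: basic counters plus per-packet details."""
    packets: int
    bytes: int
    pkt_lengths: List[int] = field(default_factory=list)
    # 1 = client to server, -1 = server to client (assumed encoding)
    pkt_directions: List[int] = field(default_factory=list)


def features(flow: ExtendedFlow) -> Dict[str, float]:
    """Derive simple numeric features that need no payload inspection."""
    if not flow.pkt_lengths or flow.packets == 0:
        raise ValueError("flow carries no packet-level information")
    forward = sum(1 for d in flow.pkt_directions if d == 1)
    return {
        "mean_len": mean(flow.pkt_lengths),
        "max_len": max(flow.pkt_lengths),
        "fwd_ratio": forward / len(flow.pkt_lengths),
        "bytes_per_packet": flow.bytes / flow.packets,
    }
```

Features of this kind depend only on packet sizes and directions, which remain observable even when the payload is encrypted.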

Selected scientific papers related to machine learning that benefited from ipfixprobe (if you have trouble downloading the papers, please contact us):

  1. D. Vekshin, K. Hynek, and T. Čejka: DoH Insight: Detecting DNS over HTTPS by Machine Learning. In Proceedings of the 15th International Conference on Availability, Reliability and Security (ARES), New York, NY, USA, 2020.
  2. K. Hynek and T. Čejka: Privacy Illusion: Beware of Unpadded DoH. In 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2020, pp. 621–628.
  3. J. Luxemburk, K. Hynek, and T. Čejka: Detection of HTTPS Brute-Force Attacks with Packet-Level Feature Set. In 11th Annual Computing and Communication Workshop and Conference (CCWC2021), Piscataway (New Jersey): IEEE, 2021, pp. 0115–0123. ISBN 978-0-7381-4394-1.