A framework for understanding the performance of the BlueField-3 Datapath Accelerator (DPA). This is the source code for our ICNP'24 paper.
- BlueField-3 SmartNIC (any series), running in DPU mode or NIC mode
- NVIDIA DOCA Framework >= 2.5.0
- DPDK >= 22.11
- Required libraries: cmake, gflags, numa, lz4, z
- Huge pages: at least 2048 huge pages on the NUMA node used for the benchmarks
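The huge page requirement can be checked before running anything. The sketch below is a minimal example, assuming 2 MiB huge pages and the standard Linux sysfs layout; neither is stated in this README. The function name `hugepages_ok` and the `SYSFS_NODE_ROOT` override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: check that a NUMA node has enough huge pages reserved.
# Assumptions (not from this README): 2 MiB huge pages and the standard
# Linux sysfs layout. SYSFS_NODE_ROOT is a hypothetical override for dry runs.
hugepages_ok() {
    hp_node="$1"                 # NUMA node index, e.g. 0
    hp_required="${2:-2048}"     # minimum number of pages (default 2048)
    hp_root="${SYSFS_NODE_ROOT:-/sys/devices/system/node}"
    hp_file="$hp_root/node$hp_node/hugepages/hugepages-2048kB/nr_hugepages"

    if [ ! -r "$hp_file" ]; then
        echo "missing"
        return 2
    fi

    hp_have=$(cat "$hp_file")
    if [ "$hp_have" -ge "$hp_required" ]; then
        echo "ok $hp_have"
    else
        echo "short $hp_have"
        return 1
    fi
}

# Example: hugepages_ok 0 2048
```

If the check reports `short`, writing the desired count into the same `nr_hugepages` file as root is the usual way to reserve more pages on that node.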
All benchmarks require the following setup:
- Two BF3 SmartNICs connected over 400 Gbps Ethernet (back-to-back or through a switch)
- Link aggregation that combines the two network interfaces into a single interface (mlx5_bond_0); here is a guideline for how to set it up
- OVS hardware offload enabled
- PCI_WR_ORDERING set to force_relax
- CPUs isolated for testing (add isolcpus=0-11 nohz_full=0-11 to /etc/default/grub and reboot)
- PFC disabled
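After the reboot, it is worth confirming that the CPU isolation parameters actually reached the kernel. The following is a minimal sketch; the function name `check_isolation` is hypothetical, and it only matches the exact `isolcpus=<range>` and `nohz_full=<range>` forms shown above (variants with extra flags would need a looser match).

```shell
#!/bin/sh
# Sketch: verify that a kernel command line isolates the given CPU range.
# $1 = kernel command line text, $2 = CPU range such as 0-11.
check_isolation() {
    ci_cmdline="$1"
    ci_range="$2"

    # Pad with spaces so each parameter is matched as a whole word.
    case " $ci_cmdline " in
        *" isolcpus=$ci_range "*) ;;
        *) echo "isolcpus=$ci_range missing"; return 1 ;;
    esac
    case " $ci_cmdline " in
        *" nohz_full=$ci_range "*) ;;
        *) echo "nohz_full=$ci_range missing"; return 1 ;;
    esac
    echo "isolated"
}

# Example: check_isolation "$(cat /proc/cmdline)" 0-11
```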
sudo apt-get install -y --no-install-recommends libgflags-dev libz-dev liblz4-dev
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.20.3/protobuf-cpp-3.20.3.zip && unzip protobuf-cpp-3.20.3.zip && cd protobuf-3.20.3 && ./configure && make -j && sudo make install && sudo ldconfig
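Once the build finishes, a quick sanity check is to confirm that the `protoc` found on `PATH` is the version just installed. This is a minimal sketch; the function name `protoc_version_ok` is hypothetical, and it assumes `protoc --version` prints a line of the form `libprotoc 3.20.3`.

```shell
#!/bin/sh
# Sketch: compare the text printed by `protoc --version` with the version
# built above. $1 = version output, $2 = expected version (default 3.20.3).
protoc_version_ok() {
    pv_output="$1"
    pv_expected="${2:-3.20.3}"

    if [ "$pv_output" = "libprotoc $pv_expected" ]; then
        echo "match"
    else
        echo "mismatch: got '$pv_output'"
        return 1
    fi
}

# Example: protoc_version_ok "$(protoc --version)"
```

A mismatch usually means an older system-wide protobuf is shadowing the freshly installed one.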
Due to a DOCA driver limitation (as of DOCA 2.8.0), the DPA can only be invoked from Arm and access Arm memory in DPU mode, and can only be invoked from the host and access host memory in NIC mode. Switch to the correct mode before running a benchmark. The following commands show how to change the mode:
# Switch to NIC mode
mlxconfig -d /dev/mst/mt41692_pciconf0 s INTERNAL_CPU_OFFLOAD_ENGINE=1
# Switch to DPU mode
mlxconfig -d /dev/mst/mt41692_pciconf0 s INTERNAL_CPU_OFFLOAD_ENGINE=0
# Then power cycle for the change to take effect
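To see which mode is configured, the same setting can be queried with `mlxconfig ... q` and mapped back to a mode name. The sketch below is illustrative only: the function name `current_mode` is hypothetical, and it assumes the query prints the value in the form `ENABLED(0)` or `DISABLED(1)`, which may differ between firmware tool versions.

```shell
#!/bin/sh
# Sketch: map `mlxconfig q` output (read from stdin) to a mode name.
# Assumption: the value appears as "...(0)" for DPU mode and "...(1)"
# for NIC mode, matching the values used in the commands above.
current_mode() {
    cm_line=$(grep 'INTERNAL_CPU_OFFLOAD_ENGINE' || true)
    case "$cm_line" in
        *'(1)'*) echo "NIC" ;;
        *'(0)'*) echo "DPU" ;;
        *)       echo "unknown" ;;
    esac
}

# Example:
#   sudo mlxconfig -d /dev/mst/mt41692_pciconf0 q INTERNAL_CPU_OFFLOAD_ENGINE | current_mode
```

Note that the queried value reflects the stored configuration, so it may not match the running mode until the power cycle has happened.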
All DPA benchmark code is located in the /flexio folder; the benchmarks used in the paper are described below. This repository also contains other benchmarks (for the Arm processor or the hardware engines).
| Benchmark | Description |
|---|---|
| bench/computer | Tests the DPA roofline |
| bench/memory | Tests the DPA cache and memory |
| dpa_kv_aggregation | Key-value aggregation case study |
| dpa_network_function | NFV case study |
| dpa_refactor | Sample code for an L2 reflector |
| dpa_refactor_ddio | Tests DPA DDIO |
| dpa_refactor_mt | Multi-threaded DPA version of the L2 reflector |
| dpa_refactor_random_access | Tests random memory access while handling packets |
| dpa_refactor_workingset | Tests the influence of working set size on the DPA |
| dpa_send | Sample code for sending packets from the DPA |
| dpa_send_mt | Multi-threaded DPA version of dpa_send |
| dpa_send_ntp | NTP time synchronization case study |
| dpa_small_bank | DPA version of the SmallBank test |
| roofline | Tests the DPA roofline |
- The /test_suite folder contains other BF3 benchmark code, e.g., dma_copy_bench, which benchmarks DOCA DMA performance.
- The /scripts folder contains scripts that process data and automatically run some benchmarks with different parameters.
The DPDK code is still being cleaned up and formatted, but part of it is available at https://github.com/cxz66666/dpdk_cs/tree/dpdk22.11/benchmark. You can also write your own version on top of a different DPDK implementation.
Email: chenxuz@zju.edu.cn
// WIP