Performance Tuning for Mellanox Adapters
-
Mellanox aims to provide the best possible out-of-box performance. In some cases, however, achieving optimal performance requires additional system or network adapter configuration.
-
As a starting point, it is always recommended to download and install the latest MLNX_OFED drivers for your OS.
-
If you are using an Ethernet fabric, be sure to configure flow control correctly, as described on the page "Flow Control and QoS configuration for RoCE fabrics".
-
Before running any SparkRDMA jobs, test the point-to-point performance between nodes to make sure you are getting the expected results. This can be done with the ib_send_bw utility provided in the MLNX_OFED package.
ib_send_bw is a client/server utility, so you must start the server instance first. By default, the utility uses the first adapter it finds in the system. If the system has multiple devices, use the -d flag to specify the correct RDMA device. The ibdev2netdev command gives you the mapping from RDMA device name to Ethernet device name:
    $ ibdev2netdev
    mlx5_0 port 1 ==> ens2 (Up)
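When a node has several adapters, picking the device by hand is error-prone. The following is a minimal sketch, assuming ibdev2netdev prints one line per port in the format shown above; the function names `parse_ibdev2netdev` and `rdma_device_for` are hypothetical helpers, not part of MLNX_OFED.

```python
import re

# Matches lines such as: "mlx5_0 port 1 ==> ens2 (Up)"
# (format assumed from the sample output above)
_LINE = re.compile(r"^(\S+)\s+port\s+(\d+)\s+==>\s+(\S+)\s+\((\w+)\)")


def parse_ibdev2netdev(text):
    """Return a list of (rdma_dev, port, netdev, state) tuples."""
    rows = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(2)), m.group(3), m.group(4)))
    return rows


def rdma_device_for(netdev, text):
    """Return the RDMA device behind `netdev` if its port is Up, else None."""
    for dev, _port, name, state in parse_ibdev2netdev(text):
        if name == netdev and state == "Up":
            return dev
    return None


sample = "mlx5_0 port 1 ==> ens2 (Up)"
print(rdma_device_for("ens2", sample))
```

The returned name is the value to pass to ib_send_bw with -d.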
Start the server side:
    $ ib_send_bw -d mlx5_0

    ************************************
    * Waiting for client to connect... *
    ************************************
Run the client to connect to the IP address of the server:
    $ ib_send_bw -d mlx5_0 192.168.1.14
    ---------------------------------------------------------------------------------------
                        Send BW Test
     Dual-port       : OFF          Device         : mlx5_0
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     TX depth        : 128
     CQ Moderation   : 100
     Mtu             : 1024[B]
     Link type       : Ethernet
     GID index       : 5
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x01b6 PSN 0x29626e
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:15
     remote address: LID 0000 QPN 0x02be PSN 0x1c5dfe
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:14
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
     65536      1000           8125.36            8125.06              0.130001
    ---------------------------------------------------------------------------------------
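To compare the measured bandwidth with the link speed, it helps to convert the MB/sec figure to Gbit/s. The sketch below assumes that one MB in this report is 2^20 bytes, which is consistent with the sample above (0.130001 Mpps × 65536 bytes ≈ 8125.06 × 2^20 bytes/sec). The helpers `parse_send_bw` and `to_gbits` are hypothetical names used only for illustration.

```python
def parse_send_bw(text):
    """Extract the result row that follows the '#bytes' header line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines[:-1]):
        if line.startswith("#bytes"):
            f = lines[i + 1].split()
            return {
                "msg_bytes": int(f[0]),
                "iterations": int(f[1]),
                "bw_peak_mb": float(f[2]),
                "bw_avg_mb": float(f[3]),
                "msg_rate_mpps": float(f[4]),
            }
    raise ValueError("no result row found after a '#bytes' header")


def to_gbits(mb_per_sec, bytes_per_mb=2**20):
    """Convert MB/sec to Gbit/s. bytes_per_mb=2**20 is an assumption;
    pass 10**6 if your build reports decimal megabytes."""
    return mb_per_sec * bytes_per_mb * 8 / 1e9


sample = """\
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      1000           8125.36            8125.06              0.130001
"""

result = parse_send_bw(sample)
print(round(to_gbits(result["bw_avg_mb"]), 2), "Gbit/s average")
```

For the run shown here, the converted figure is noticeably below the 100GbE link speed, which is the kind of gap the tuning steps on this page are meant to investigate.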
-
It is also recommended to run the mlnx_tune utility, which runs several system checks and reports any settings that may degrade performance. Running mlnx_tune requires superuser privileges.
    $ sudo mlnx_tune
    2017-08-16 14:47:17,023 INFO Collecting node information
    2017-08-16 14:47:17,023 INFO Collecting OS information
    2017-08-16 14:47:17,026 INFO Collecting CPU information
    2017-08-16 14:47:17,104 INFO Collecting IRQ Balancer information
    2017-08-16 14:47:17,107 INFO Collecting Firewall information
    2017-08-16 14:47:17,111 INFO Collecting IP table information
    2017-08-16 14:47:17,115 INFO Collecting IPv6 table information
    2017-08-16 14:47:17,118 INFO Collecting IP forwarding information
    2017-08-16 14:47:17,122 INFO Collecting hyper threading information
    2017-08-16 14:47:17,122 INFO Collecting IOMMU information
    2017-08-16 14:47:17,124 INFO Collecting driver information
    2017-08-16 14:47:18,281 INFO Collecting Mellanox devices information

    Mellanox Technologies - System Report

    ConnectX-5 Device Status on PCI 03:00.0
    FW version 12.18.2000
    OK: PCI Width x16
    OK: PCI Speed 8GT/s
    PCI Max Payload Size 256
    PCI Max Read Request 512
    Local CPUs list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

    ens2 (Port 1) Status
    Link Type eth
    OK: Link status Up
    Speed 100GbE
    MTU 1500
    OK: TX nocache copy 'off'

    2017-08-16 14:47:18,777 INFO System info file: /tmp/mlnx_tune_170816_144716.log
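The fields of this report that matter most for SparkRDMA can be pulled out with a short script. This is a rough sketch, assuming the report places "Local CPUs list [...]" and "MTU <n>" on lines of their own as in the sample above; `local_cpus` and `mtu` are hypothetical helper names, and other mlnx_tune versions may format the report differently.

```python
import re


def local_cpus(report):
    """Return the CPU ids from the 'Local CPUs list [...]' line, or []."""
    m = re.search(r"Local CPUs list\s*\[([^\]]*)\]", report)
    if not m:
        return []
    return [int(tok) for tok in m.group(1).split(",") if tok.strip()]


def mtu(report):
    """Return the port MTU from a line of the form 'MTU <n>', or None."""
    m = re.search(r"^\s*MTU\s+(\d+)\s*$", report, re.MULTILINE)
    return int(m.group(1)) if m else None


# Excerpt of the report shown above.
sample = """\
PCI Max Read Request 512
Local CPUs list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
ens2 (Port 1) Status
Speed 100GbE
MTU 1500
"""

print(local_cpus(sample))
print(mtu(sample))
```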
-
After running mlnx_tune, it is highly recommended to set the cpuList parameter (described in Configuration Properties) in the spark.conf file to the CPU cores on the NUMA node local to the Mellanox device, which mlnx_tune reports as "Local CPUs list":
    Local CPUs list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

    spark.shuffle.rdma.cpuList 0-15
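Turning a CPU list into the compact form used for cpuList can be sketched as follows. The hypothetical helper `to_ranges` collapses consecutive ids into ranges. The single-range form (0-15) is taken from the example above; joining several ranges with commas is an assumption that should be checked against Configuration Properties before use.

```python
def to_ranges(cpus):
    """Collapse CPU ids into a string like '0-15' or '0-2,8-9,11'.

    Comma-separated multi-range output is an assumed format; only the
    single-range form appears in the example above.
    """
    cpus = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(cpus):
        j = i
        # Extend the run while the next id is consecutive.
        while j + 1 < len(cpus) and cpus[j + 1] == cpus[j] + 1:
            j += 1
        parts.append(str(cpus[i]) if i == j else f"{cpus[i]}-{cpus[j]}")
        i = j + 1
    return ",".join(parts)


print("spark.shuffle.rdma.cpuList", to_ranges(range(16)))
```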
-
More in-depth performance resources can be found in the Mellanox Community post "Performance Tuning for Mellanox Adapters".