
LD_PRELOAD fails while application samples work #923

Open
miharulidze opened this issue Jul 1, 2024 · 5 comments

@miharulidze

Dear MTL developers,

I want to test the MTL user-space UDP implementation with NVIDIA BlueField-2 NICs.

I was able to successfully build MTL with DPDK v23.11 (with some minor changes to disable AVX-related code so it compiles on ARM).
The built-in UfdServerSample application starts and waits for data without any problems (e.g., EAL init passes successfully), but when I try to run my own UDP "helloworld" example with libmtl_udp_preload.so loaded via LD_PRELOAD, the mlx5 EAL fails to initialise.
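
For context, the "helloworld" is essentially just a blocking UDP receiver run under the preload library, roughly like the sketch below (the port and the library path here are placeholders, not my exact code):

```c
/* Rough placeholder for my UDP "helloworld" receiver (port/path are made up).
 * Run roughly as:
 *   LD_PRELOAD=/path/to/libmtl_udp_preload.so ./helloworld
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(20000);             /* placeholder port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return 1;
    }

    char buf[2048];
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    if (n >= 0)
        printf("received %zd bytes\n", n);

    close(fd);
    return 0;
}
```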

Is it possible that some code around MUFD in the LD_PRELOAD wrapper breaks things? Do you have any idea where to look?

I'm attaching two logs with the maximum EAL/PMD logging level, collected as the root user, for you to analyse, together with the UFD JSON config:

  1. ufdserversample.log - initialises without any problems
  2. ldpreload.log - fails to initialise with the same MUFD config
  3. ufd_server.json - the MUFD config that I use to set up the NIC

Thank you very much in advance!

@frankdjx (Collaborator) commented Jul 2, 2024

Yes, it seems one of the override functions in the MTL preload breaks the mlx5_common init process. An MTL override function may give an unexpected return value compared to the default libc function.

mlx5_common: Failed to open IB device "mlx5_1".

The override functions list can be found at https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/ld_preload/udp/udp_preload.h#L80.

I guess we need to find which override function breaks the mlx5_common init and then check what the cause is.
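
For reference, each wrapper follows the usual LD_PRELOAD interception pattern, roughly like the simplified sketch below (this is only an illustration, not the actual MTL code); any place where such a wrapper diverges from libc semantics (return value, errno, flag handling) is a candidate for breaking mlx5_common:

```c
/* Simplified illustration of the generic LD_PRELOAD interception pattern,
 * not the actual MTL code. If the override returns something different from
 * glibc, or handles errno/flags differently, callers like mlx5_common can
 * misbehave. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

typedef int (*socket_fn_t)(int, int, int);

int socket(int domain, int type, int protocol) {
    static socket_fn_t libc_socket;
    if (!libc_socket)
        libc_socket = (socket_fn_t)dlsym(RTLD_NEXT, "socket");

    /* A preload library would decide here whether to take over this socket;
     * everything it does not own should pass through to libc unchanged. */
    return libc_socket(domain, type, protocol);
}
```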

@miharulidze (Author) commented Jul 2, 2024

Hi @frankdjx ,

Thank you very much for the reply and the reference to the code!

I traced the mlx5_common code and it fails during the IB device init here: https://github.com/DPDK/dpdk/blob/c15902587b538ff02cfb0fbb4dd481f1503d936b/drivers/common/mlx5/linux/mlx5_glue.c#L82

So if I understand correctly, you expect that it fails because internally ibv_open_device(..) can call one of the socket functions intercepted by MTL, thus changing the standard/expected POSIX behaviour. I'll try to change the LD_PRELOAD init code so that the mlx5 PMD is initialised before the sockets are overridden, and also add debug prints to each intercepted call to check this.
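
The debug prints would be along these lines; a hypothetical wrapper is shown here for close() just to illustrate what I plan to add to each intercepted call:

```c
/* Hypothetical debug print I plan to add at the top of every intercepted
 * call (shown here for close()), to see which wrappers ibv_open_device()
 * actually goes through. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

typedef int (*close_fn_t)(int);

int close(int fd) {
    static close_fn_t libc_close;
    if (!libc_close)
        libc_close = (close_fn_t)dlsym(RTLD_NEXT, "close");

    fprintf(stderr, "[preload-debug] close(fd=%d)\n", fd);
    return libc_close(fd);
}
```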

I will keep you posted.

@frankdjx (Collaborator) commented Jul 2, 2024

Yes. All MTL-intercepted socket functions first check whether the file descriptor (fd) was created by an MTL socket function. If the fd was not created by MTL, the function redirects to the standard libc path.

We have tested this ld_preload on the Intel E810 NIC, and it performs well.

However, the mlx5 PMD appears to be more complex, potentially involving scenarios that the current flow does not adequately handle.
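
Roughly, the flow each wrapper is expected to follow looks like the sketch below (simplified, with a made-up helper standing in for the MTL fd bookkeeping; this is not the actual source):

```c
/* Simplified sketch of the flow described above; preload_fd_is_mtl() is a
 * made-up stand-in for the real MTL fd bookkeeping. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical check: was this fd created through an MTL socket call? */
static int preload_fd_is_mtl(int fd) { (void)fd; return 0; }

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest_addr, socklen_t addrlen) {
    typedef ssize_t (*sendto_fn_t)(int, const void *, size_t, int,
                                   const struct sockaddr *, socklen_t);
    static sendto_fn_t libc_sendto;
    if (!libc_sendto)
        libc_sendto = (sendto_fn_t)dlsym(RTLD_NEXT, "sendto");

    if (preload_fd_is_mtl(sockfd)) {
        /* fd was created through MTL: hand off to the user-space UDP data
         * path here instead of libc. */
    }

    /* Any fd not owned by MTL must behave exactly like plain libc; a
     * difference here is where consumers such as mlx5_common could break. */
    return libc_sendto(sockfd, buf, len, flags, dest_addr, addrlen);
}
```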

@miharulidze (Author)

> We have tested this ld_preload on the Intel E810 NIC, and it performs well.

Regarding the E810 NIC, did you test the UDP stack with a 100 Gbit/s port?
Were you able to achieve 100 Gbit/s? What MTU did you use, and how many cores did you need to drive TX/RX at full speed? Also, did you test the packet rate with very small (e.g., 64 B) datagrams?

@frankdjx (Collaborator) commented Jul 3, 2024

> We have tested this ld_preload on the Intel E810 NIC, and it performs well.
>
> Regarding the E810 NIC, did you test the UDP stack with a 100 Gbit/s port? Were you able to achieve 100 Gbit/s? What MTU did you use, and how many cores did you need to drive TX/RX at full speed? Also, did you test the packet rate with very small (e.g., 64 B) datagrams?

Below is the TX data we have; the RX improvement is similar.

| UDP pkt size (bytes) | Threads | MTL throughput (Gb/s) | Kernel throughput (Gb/s) | MTL vs kernel |
| --- | --- | --- | --- | --- |
| 1460 | 1 | 43.879977 | 7.641692 | 5.742180789 |
| 1024 | 1 | 35.105923 | 5.624864 | 6.241203876 |
| 512 | 1 | 19.953871 | 2.861991 | 6.97202437 |
| 256 | 1 | 10.354163 | 1.439964 | 7.190570736 |
| 128 | 1 | 5.443459 | 0.7284 | 7.473172707 |
