Positive RDMA / smb-direct „taskman test“ between Win-client and linux-Server? #488

Open
besterino opened this issue Nov 2, 2024 · 25 comments

Comments

@besterino

Hi!

Since I am struggling and cannot get it to work: has anyone successfully tested RDMA/smb-direct between a Linux ksmbd server and a Windows client, such that the Windows Task Manager on the client shows no corresponding network load while copying large files (e.g. 30 GB) at much higher speeds of 1-2.5 GB/s? Between Windows machines the displayed load is zero or at least very low (kB/s).

If so, could you please give some guidance on hardware, distro, specific steps, settings etc. used?

I have also described my attempts in another thread, but I am willing to start from scratch if there is a setup or route that is more likely or proven to succeed:

#466 (comment)

@namjaejeon
Owner

Have you searched the issues on the cifsd-team GitHub for RDMA or smb-direct?

cifsd-team#542
cifsd-team#604
and more in the issues of the cifsd-team GitHub.

> If so, could you please give some guidance on hardware, distro, specific steps, settings etc. used?

You should add the "server multi channel support = yes" parameter to the [global] section of your ksmbd.conf
and build ksmbd.ko with the CONFIG_SMB_SERVER_SMBDIRECT config option turned on.
With smb-direct (RDMA), the server only responds to requests from the client, so please check your client settings or guide.

If it still doesn't work, please let me know.
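
A minimal shell sketch for checking the two server-side prerequisites named above. The function names are hypothetical, and the default paths are assumptions: `/boot/config-$(uname -r)` only reflects a distro kernel (not a separately built ksmbd.ko), and `/etc/ksmbd/ksmbd.conf` depends on how ksmbd-tools was installed. Pass your own paths as arguments where they differ.

```shell
#!/bin/sh
# Sketch: verify the two server-side prerequisites for SMB Direct.
# Default paths are assumptions; pass explicit paths if yours differ.

# Succeeds if the kernel config enables ksmbd's SMB Direct transport.
has_smbdirect_config() {
    config_file="${1:-/boot/config-$(uname -r)}"
    grep -q '^CONFIG_SMB_SERVER_SMBDIRECT=y' "$config_file"
}

# Succeeds if ksmbd.conf contains "server multi channel support = yes".
# Note: this does not verify that the line sits in the [global] section.
has_multichannel_enabled() {
    conf_file="${1:-/etc/ksmbd/ksmbd.conf}"
    grep -Eiq '^[[:space:]]*server multi channel support[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$conf_file"
}

# Typical use:
#   has_smbdirect_config && echo "SMB Direct compiled in"
#   has_multichannel_enabled && echo "multichannel enabled"
```

Both checks passing only confirms the configuration; it does not prove that a client actually negotiates an RDMA connection.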

@besterino
Author

Thank you for the input.

Yes, I had a look at various posts but still could not get it to work. Another one I found interesting but also without success: https://forum.level1techs.com/t/how-can-i-help-with-the-new-truenas-100g-testing/179052/8

As to my ksmbd.conf, it already has/had "server multi channel support = yes".
My kernel was built with CONFIG_SMB_SERVER_SMBDIRECT enabled, at least according to /boot/config-6.11.0-9-generic.

On the windows client I tracked RDMA activity with Perfmon. It apparently tries to establish RDMA connections, but they fail.

The only time I see any smb_direct messages in dmesg is immediately after start of the service:
[ +7.774154] ksmbd: selected SMB3_11 dialect idx = 3
[ +0.000009] ksmbd: selected SMB3_11 dialect idx = 3
[ +0.000179] ksmbd: smb_direct: ib device added: name rocep33s0f0
[ +0.000002] ksmbd: smb_direct: ib device added: name rocep33s0f1
[ +0.000354] ksmbd: smb_direct: init RDMA listener. cm_id=0000000084cd3fdd

@namjaejeon
Owner

> The only time I see any smb_direct messages in dmesg is immediately after start of the service:

Are there any error messages from ksmbd: smb_direct:? Is this message ("ksmbd: smb_direct: init RDMA listener. cm_id=0000000084cd3fdd") the last one?

@besterino
Author

Apologies for the late reply, I did not have enough time for testing recently. As to your question: yes, that is the last message with smb_direct.

I did another test of RDMA functionality between Windows and Linux by setting up NVMe over Fabrics following this guide: https://www.reddit.com/r/truenas/comments/1fh3rfl/an_idiots_walkthrough_to_setting_up_nvmeofroce/?rdt=60944

That works like a charm, including RDMA performance counters being triggered when accessing the nvme-of target (please excuse the German OS):

[image: NVMEoF_win_proxmox]

@namjaejeon
Owner

@besterino Could you test ksmbd RDMA after applying the following change ?

diff --git a/transport_rdma.c b/transport_rdma.c
index 29b2b43..d2ca328 100644
--- a/transport_rdma.c
+++ b/transport_rdma.c
@@ -2310,6 +2310,7 @@ out:
                }
        }
 
+       rdma_capable = true;
        return rdma_capable;
 }

@besterino
Author

Apologies, not a coder here. How do I apply that change?
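
One way to apply a diff like the one above, sketched in shell under stated assumptions: the diff has been saved to a file, the ksmbd source tree is already checked out, and the `patch` utility is installed. The function name `apply_ksmbd_patch` and the example paths are hypothetical. The rebuild commands in the trailing comment are assumed from the usual out-of-tree module workflow and may differ on your distro.

```shell
#!/bin/sh
# Sketch: apply a unified diff to a checked-out ksmbd source tree.
# Usage (hypothetical paths): apply_ksmbd_patch ~/ksmbd ~/rdma-capable.diff
apply_ksmbd_patch() {
    tree="$1"
    patch_file="$2"
    # Dry run first so a mismatching patch leaves the tree untouched.
    patch -p1 -d "$tree" --dry-run < "$patch_file" >/dev/null || return 1
    # -p1 strips the leading "a/" and "b/" from the paths in the diff.
    patch -p1 -d "$tree" < "$patch_file"
}

# Afterwards, rebuild and reload the module from inside the tree
# (assumed steps; check the repository README for your setup):
#   make && sudo make install
#   sudo modprobe ksmbd
```

Because the change is a single added line, editing transport_rdma.c by hand and inserting `rdma_capable = true;` directly above `return rdma_capable;` gives the same result.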

@namjaejeon
Owner

Sigh... Okay. Can you dump the packets using Wireshark?
You only need to capture while the Windows client connects to the ksmbd server. There is no need to capture while copying or reading files on the ksmbd share; that would make the dump file too large.

@besterino
Author

besterino commented Nov 17, 2024

[image]

Does this contain what you are looking for? The dump itself was 3.5GB within seconds...

EDIT: attached is a smaller one without initiating a file copy, only initial access to the smb share (rename from txt to pcap)
capture2.txt

@namjaejeon
Owner

You need to check whether smb-direct is enabled on your Windows 11 client. When I checked the packets, your Windows client did not send the smb2 ioctl request that asks ksmbd whether the SMB server has RDMA NICs.
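
A capture can be checked for that request from the command line. The sketch below assumes the ioctl in question is FSCTL_QUERY_NETWORK_INTERFACE_INFO (control code 0x001401fc), which is the SMB2 call a client uses to learn a server's interfaces and their RDMA capability, and that `tshark` is installed. The function and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: count SMB2 IOCTL requests for FSCTL_QUERY_NETWORK_INTERFACE_INFO
# in a capture file. Assumes tshark (Wireshark's CLI) is installed.

# smb2.cmd 11 is IOCTL; the response flag is 0 on requests.
NIC_INFO_FILTER='smb2.cmd == 11 && smb2.ioctl.function == 0x001401fc && smb2.flags.response == 0'

count_nic_info_ioctls() {
    pcap="$1"
    # Refuse to run on a missing or unreadable capture file.
    [ -r "$pcap" ] || { echo "cannot read capture: $pcap" >&2; return 2; }
    # One output line per matching packet.
    tshark -r "$pcap" -Y "$NIC_INFO_FILTER" | wc -l
}

# Typical use (hypothetical file name):
#   count_nic_info_ioctls capture2.pcap
```

A count of zero on a connect-only capture would match the observation above that the client never asks the server about its interfaces.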

@besterino
Author

It is definitely enabled. SMB Direct works with three different (windows) servers without any issues and all relevant powershell commands confirm it is enabled. However, the windows client does not show that multichannel connections are established (get-smbmultichannelconnection returns nothing) with the Linux server.

@namjaejeon
Owner

I need to test ksmbd RDMA with Windows 11 Pro for Workstations. Do you have Windows 10 Pro for Workstations? Other users and I have verified in the past that ksmbd's smb-direct works with it.

@besterino
Author

Hmmm. I could check with win10 pro. Will do as soon as I can.

@namjaejeon
Owner

Okay, Thanks!

@besterino
Author

Ok, learned the hard way that a freshly installed 22H2 Win 10 Pro (without "for Workstations") does NOT support SMB Direct, even though various MS sites claim otherwise. Switched to Enterprise, and with that RDMA does work (between Windows 10 and Windows Server).

The problem with ksmbd remains the same though. No RDMA counters are triggered by a file copy.

Here is another capture (connect only)
capture2.txt

Dmesg at least reported SMB dialect 3.1.1 (a few lines below the red ones):

[image]

A copy with a Windows server within the same network works with the same client (immediately after the copy with the ksmbd server). So from my perspective, this is probably not a client issue.

@namjaejeon
Owner

@besterino Okay, Windows 10 also doesn't send the smb2 ioctl request to ksmbd, so the client doesn't know whether the server supports smb-direct. I will test it on my setup.

@besterino
Author

besterino commented Nov 21, 2024

When you check, ideally use the performance monitor for RDMA activity:

[image]

[image]

@namjaejeon
Owner

If RDMA works, I can see debug logs after running echo "rdma" > /sys/class/ksmbd-control/debug
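
The debug switch above can be combined with a filter on the kernel log, sketched here in shell. The control path and the `ksmbd: smb_direct:` prefix are taken from this thread; the function names are hypothetical, and writing to the control file needs root.

```shell
#!/bin/sh
# Sketch: enable ksmbd's RDMA debug output and filter the kernel log.

# Writes "rdma" to the debug control file (needs root on a real system).
# An alternative path can be passed as the first argument.
enable_rdma_debug() {
    echo "rdma" > "${1:-/sys/class/ksmbd-control/debug}"
}

# Keeps only ksmbd's smb_direct lines from a log read on stdin.
smb_direct_lines() {
    grep 'ksmbd: smb_direct:'
}

# Typical use, as root:
#   enable_rdma_debug
#   dmesg | smb_direct_lines
```

If only the "ib device added" and "init RDMA listener" lines show up after a client connects, the client most likely never attempted an RDMA connection to the server.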

@besterino
Author

Ah ok. Would still be interesting though to confirm that also windows recognizes RDMA activity with ksmbd on the client side.

@namjaejeon
Owner

[image: rdma]

I have tested smb-direct (RDMA) with a ConnectX-3 and Windows 10 Pro for Workstations. It works fine, as you can see in the performance monitor.

@namjaejeon
Owner

@leehaoun @hcbwiz Is anyone having trouble with rdma connection to ksmbd on the latest windows 10 or 11 build?

@namjaejeon
Owner

@besterino Would you like to test it with ksmbd on github? I don't think there will be any difference...

git clone https://github.com/namjaejeon/ksmbd --branch=next

@leehaoun

I used the latest version of Windows 10 Enterprise with a Mellanox ConnectX-5 adapter (Product ID: MCX555A-ECAT). The adapter's port mode was set to ETH(2), and it worked well. However, to achieve the fastest bandwidth with SMB Direct, you need to use fast SSDs. Additionally, it is recommended to check the driver and firmware versions on both sides of the adapter.

> @leehaoun @hcbwiz Is anyone having trouble with rdma connection to ksmbd on the latest windows 10 or 11 build?

@namjaejeon
Owner

@leehaoun Thanks for your check and answer!

@besterino
Author

besterino commented Nov 22, 2024

Thank you for checking! @leehaoun: what Linux distro did you use on the server side, and did you install anything "special" from outside the repository or set any specific settings?

I'm about to give up. Today I again spent several hours on this, this time with Fedora (41) and Red Hat (9.5), and cannot get it to work. In particular, the install instructions here or there (https://github.com/cifsd-team/ksmbd-tools/blob/master/README.md#building-and-installing) are outdated for current RHEL or Fedora. Since it works for you, I am sure I am doing something wrong or missing something, but trial and error gets me nowhere without knowing which server environment is known to work (ideally out of the box or with minimal changes).

@leehaoun

leehaoun commented Nov 24, 2024

> Thank you for checking! @leehaoun: what Linux distro did you use on the server side, and did you install anything "special" from outside the repository or set any specific settings?
>
> I'm about to give up. Today I again spent several hours on this, this time with Fedora (41) and Red Hat (9.5), and cannot get it to work. In particular, the install instructions here or there (https://github.com/cifsd-team/ksmbd-tools/blob/master/README.md#building-and-installing) are outdated for current RHEL or Fedora. Since it works for you, I am sure I am doing something wrong or missing something, but trial and error gets me nowhere without knowing which server environment is known to work (ideally out of the box or with minimal changes).

I used Ubuntu Desktop 22.04 with a back-to-back cable connection.
You can check whether your adapter is RDMA-capable with this code.

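static bool rdma_frwr_is_supported(struct ib_device_attr *attrs)
{
	/* Fast registration needs the memory management extensions. */
	if (!(attrs->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)) {
		/* Cast so the format matches whatever width the flags have. */
		ksmbd_debug(RDMA, "cap_flags : %#llx\n",
			    (unsigned long long)attrs->device_cap_flags);
		ksmbd_debug(RDMA, "MGT_EXTENSION : %#llx\n",
			    (unsigned long long)IB_DEVICE_MEM_MGT_EXTENSIONS);
		return false;
	}
	/* The device must also accept a non-empty fast-reg page list. */
	if (attrs->max_fast_reg_page_list_len == 0) {
		ksmbd_debug(RDMA, "reg_page_list_len : %u\n",
			    attrs->max_fast_reg_page_list_len);
		return false;
	}
	return true;
}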
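
The first condition in that function can also be approximated from userspace without rebuilding the module. The sketch below assumes rdma-core's `ibv_devinfo` is installed and that its verbose output lists the capability as `MEM_MGT_EXTENSIONS`; the function name is hypothetical, and the device name in the usage comment is the one from the dmesg output earlier in this thread. It does not cover the `max_fast_reg_page_list_len` check.

```shell
#!/bin/sh
# Sketch: look for the memory management extensions capability in
# verbose ibv_devinfo output supplied on stdin.

# Succeeds if the output lists MEM_MGT_EXTENSIONS among the device flags.
has_mem_mgt_extensions() {
    grep -q 'MEM_MGT_EXTENSIONS'
}

# Typical use:
#   ibv_devinfo -v -d rocep33s0f0 | has_mem_mgt_extensions \
#       && echo "memory management extensions reported"
```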
