Added check for Infiniband PCIe link width and speed. #90

Open
wants to merge 1 commit into
base: master

@yqin yqin commented Mar 23, 2020

This PR should address the difference between what the PCI-e link layer reports and what the driver reports.
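
For illustration only, here is a minimal sketch of how the PCI-e side of that comparison could be read from sysfs. It is not the diff from this PR. The function name `ib_pcie_link` and the `SYSFS_IB_ROOT` override are hypothetical, introduced so the sketch can be run against a fake directory tree; the sysfs paths are the ones quoted later in this thread, and the exact text in `current_link_speed` may vary by kernel version.

```shell
#!/bin/bash
# Hypothetical helper, not the code from this PR: print the negotiated PCI-e
# link width and speed for an IB device, as exposed under sysfs.
# SYSFS_IB_ROOT is an assumed override used only for testing.
ib_pcie_link() {
    local dev="$1"
    local root="${SYSFS_IB_ROOT:-/sys/class/infiniband}"
    local speed width

    # Link speed, for example "8 GT/s" (format may differ between kernels).
    read -r speed < "$root/$dev/device/current_link_speed" || return 1
    # Link width as a lane count, for example "16".
    read -r width < "$root/$dev/device/current_link_width" || return 1

    echo "x${width} ${speed}"
}
```

Calling `ib_pcie_link mlx5_0` on a node would print something like `x16 8 GT/s`, which a check could then compare against the expected values for the card.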

@Obihoernchen

Works very well!

kcgthb commented Apr 8, 2021

It works very well for HCAs using the mlx5_core module, indeed.

But AFAICT, mlx4_ib doesn't provide the /sys/class/infiniband/<device>/device/current_link_{speed,width} files, which makes the check fail:

# nhc -d -e 'check_hw_ib 56'
DEBUG:  Debugging activated via -d option.
DEBUG:  Evaluating single check line:  check_hw_ib 56
[...]
/etc/nhc/scripts/lbnl_hw.nhc: line 134: /sys/class/infiniband/mlx4_0/device/current_link_speed: No such file or directory
/etc/nhc/scripts/lbnl_hw.nhc: line 137: /sys/class/infiniband/mlx4_0/device/current_link_width: No such file or directory
[1617908392] - DEBUG:  Found ACTIVE (LinkUp) IB Port mlx4_0:1 (56 Gb/sec) with PCI-e link 56x 56 GT/s

This is with a ConnectX-3 card: "Mellanox Technologies MT27500 Family [ConnectX-3]"

@mej mej self-assigned this Apr 18, 2021
@mej mej added this to the 1.5 Release milestone Apr 18, 2021
@mej
Owner

mej commented Apr 18, 2021

This looks great; thanks, Yong!

I will address @kcgthb's comments by checking for the existence of the file prior to merge. So thanks to you both! 😃
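
A rough sketch of what such an existence check might look like follows. It is an assumption about the shape of the fix, not the code that was merged. The function name `ib_pcie_link_if_present` and the `SYSFS_IB_ROOT` override are hypothetical; only the two sysfs file names come from the report above.

```shell
#!/bin/bash
# Hypothetical sketch of the guard described above; not the merged NHC code.
# Prints "x<width> <speed>" when both sysfs files are readable. Prints nothing
# and returns 1 otherwise, so the caller can skip the PCI-e comparison.
# SYSFS_IB_ROOT is an assumed override used only for testing.
ib_pcie_link_if_present() {
    local dev="$1"
    local dir="${SYSFS_IB_ROOT:-/sys/class/infiniband}/$dev/device"
    local speed width

    if [[ -r "$dir/current_link_speed" && -r "$dir/current_link_width" ]]; then
        read -r speed < "$dir/current_link_speed"
        read -r width < "$dir/current_link_width"
        echo "x${width} ${speed}"
        return 0
    fi
    return 1
}

# Usage: report the PCI-e link only when the driver exposes it.
if link=$(ib_pcie_link_if_present mlx4_0); then
    echo "mlx4_0: PCI-e link $link"
else
    echo "mlx4_0: PCI-e link information not available, skipping PCI-e check"
fi
```

With a guard like this, a device without those files would take the second branch instead of producing the "No such file or directory" errors and a misleading link value in the debug line.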
