Which AVX512 instruction sets are supported? #1

Open
TheIronBorn opened this issue Jul 28, 2023 · 6 comments

@TheIronBorn

See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512.
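Some instructions are only available with avx512vl: _mm512_rol_epi64 itself only needs avx512f, but its 128- and 256-bit forms (_mm_rol_epi64, _mm256_rol_epi64) also need avx512vl. Most AVX-512 CPUs have avx512vl (Xeon Phi is the notable exception), but it would be good to report which instruction sets are assumed/supported.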
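To make "which instruction sets are supported" concrete for a given machine, here is a minimal sketch that asks the running CPU about the extensions mentioned in this thread, using the standard library's `is_x86_feature_detected!` macro. `detected_features` is a hypothetical helper, not part of simd-rand.

```rust
/// Hypothetical helper (not part of simd-rand): which of the x86 extensions
/// discussed in this issue does the CPU running this program support?
/// Returns (name, supported) pairs in a fixed order; on non-x86_64 targets
/// every entry is `false`.
pub fn detected_features() -> Vec<(&'static str, bool)> {
    const NAMES: [&str; 5] = ["sse2", "avx2", "avx512f", "avx512dq", "avx512vl"];
    NAMES.iter().map(|&name| (name, is_supported(name))).collect()
}

#[cfg(target_arch = "x86_64")]
fn is_supported(name: &str) -> bool {
    // `is_x86_feature_detected!` only accepts string literals, hence one arm per feature.
    match name {
        "sse2" => is_x86_feature_detected!("sse2"),
        "avx2" => is_x86_feature_detected!("avx2"),
        "avx512f" => is_x86_feature_detected!("avx512f"),
        "avx512dq" => is_x86_feature_detected!("avx512dq"),
        "avx512vl" => is_x86_feature_detected!("avx512vl"),
        _ => false,
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn is_supported(_name: &str) -> bool {
    false
}
```

Printing these pairs, for example from a test, is one way to record which sets a particular machine actually has.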

@martinothamar
Owner

martinothamar commented Jul 29, 2023

Hi, very good point, I've updated the README

From simd-rand/README.md, lines 5 to 10 at commit 20ddb7b:

- [`portable`] - portable implementations using `std::simd` (nightly required)
- [`specific`] - implementations using architecture-specific hardware intrinsics
  - [`specific::avx2`] - AVX2 for x86_64 architecture (4 lanes for 64bit)
    - Requires `avx2` CPU flag, but has additional optimization if you have `avx512dq` and `avx512vl`
  - [`specific::avx512`] - AVX512 for x86_64 architecture (8 lanes for 64bit)
    - Requires `avx512f`, `avx512dq` CPU flags

The `specific` module looks like this:

```rust
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub mod avx2;
#[cfg(all(target_arch = "x86_64", target_feature = "avx512f", target_feature = "avx512dq"))]
pub mod avx512;
```
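Because these modules are gated with `#[cfg(target_feature = ...)]`, they only exist when the features are enabled at compile time, not merely present on the CPU at run time. A minimal sketch that evaluates the same conditions with `cfg!` (the name `compiled_backends` is hypothetical):

```rust
/// Hypothetical helper mirroring the `#[cfg]` attributes on the `specific`
/// module: lists the backends that would be compiled into this build.
/// The answer depends on compile-time target features (for example set via
/// RUSTFLAGS="-C target-cpu=native"), not on the CPU that runs the binary.
pub fn compiled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        backends.push("avx2");
    }
    if cfg!(all(
        target_arch = "x86_64",
        target_feature = "avx512f",
        target_feature = "avx512dq"
    )) {
        backends.push("avx512");
    }
    backends
}
```

With a default `cargo build` for a generic target such as x86_64-unknown-linux-gnu, only baseline features (which include SSE2) are enabled, so this returns an empty list; building with `RUSTFLAGS="-C target-cpu=native"` on an AVX2 machine adds `"avx2"`.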

One thing I'm a bit unsure about, now that you mention it, is _mm_cvtsi32_si128, which is used in basically all of the bit-shifting code for the Xoshiro variants. It's an SSE2 instruction, but if I've understood correctly, AVX2 always implies SSE2 as well, at least in practice. I see that numpy makes the same assumption for both its AVX2 and AVX512 code paths:

https://github.com/numpy/numpy/blob/e41180d3fe0cd054d57ce446b20b22b95f206d85/numpy/core/src/common/simd/avx512/operators.h#L34
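For context, the shift-by-register pattern mentioned above looks roughly like this: `_mm256_sll_epi64`/`_mm256_srl_epi64` take their count in the low 64 bits of an `__m128i`, which `_mm_cvtsi32_si128` builds from an `i32`. This is a sketch for the 4-lane AVX2 case under that assumption, not the simd-rand source; `rotl_var_epi64` and `rotl_u64x4` are hypothetical names.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: rotates each 64-bit lane left by `k` (0..=63) using
/// shift-by-register intrinsics. The count lives in an __m128i built by
/// `_mm_cvtsi32_si128` (SSE2).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn rotl_var_epi64(x: __m256i, k: i32) -> __m256i {
    let left = _mm256_sll_epi64(x, _mm_cvtsi32_si128(k));
    // For k == 0 the right-shift count is 64, which zeroes every lane,
    // so the result is just `x`, as a rotation by 0 should be.
    let right = _mm256_srl_epi64(x, _mm_cvtsi32_si128(64 - k));
    _mm256_or_si256(left, right)
}

/// Hypothetical safe wrapper: uses AVX2 when the CPU has it, scalar code otherwise.
pub fn rotl_u64x4(v: [u64; 4], k: u32) -> [u64; 4] {
    let k = k % 64; // match u64::rotate_left, which rotates modulo 64
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked above; loads/stores are unaligned.
            unsafe {
                let x = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
                let mut out = [0u64; 4];
                _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, rotl_var_epi64(x, k as i32));
                return out;
            }
        }
    }
    v.map(|lane| lane.rotate_left(k))
}
```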

@TheIronBorn
Author

Yeah, instruction subsets are confusing and annoying. To the best of my knowledge, every CPU with AVX2 or AVX-512 also has SSE2. (SSE2 is part of the x86-64 baseline and Rust assumes it for every x86_64 target, so even if such hardware existed, I'm not sure it could run Rust code.)
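The SSE2 point can be checked at compile time. Assuming a standard x86_64 target built without flags that explicitly disable SSE2, the const assertion below compiles, because SSE2 is part of the baseline feature set. `SSE2_ENABLED` is a hypothetical name.

```rust
// Fails to compile if an x86_64 build has SSE2 disabled; otherwise SSE2
// intrinsics such as _mm_cvtsi32_si128 are usable without a runtime check.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(cfg!(target_feature = "sse2"), "x86_64 targets enable SSE2 by default");

/// Hypothetical flag: whether SSE2 is enabled for this build
/// (true on x86_64 unless it was explicitly disabled).
pub const SSE2_ENABLED: bool = cfg!(target_feature = "sse2");
```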

@TheIronBorn
Author

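Actually, you can just use _mm_slli_epi64 and its variants (_mm256_slli_epi64, _mm512_slli_epi64, plus the srli right shifts), which take the shift count as an immediate, and thus skip _mm_cvtsi32_si128.
Note also that you could provide variants which use the AVX-512 rotate intrinsics: _mm512_rol_epi64 (avx512f), or _mm256_rol_epi64 in the 256-bit code when avx512vl is available.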
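The immediate-count suggestion can be sketched like this for a fixed rotation, here by 45 bits (the rotation xoshiro256 applies to its last state word). Because both counts are literals, no `_mm_cvtsi32_si128` is needed. The comment marks where a CPU with AVX512F and AVX512VL could use a single `_mm256_rol_epi64` instead; that path is left out because AVX-512 intrinsics were nightly-only in Rust for a long time. `rotl45_epi64` and `rotl45_u64x4` are hypothetical names.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: rotates each 64-bit lane left by 45 with immediate shifts.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn rotl45_epi64(x: __m256i) -> __m256i {
    // With AVX512F + AVX512VL this whole body could be `_mm256_rol_epi64::<45>(x)`.
    _mm256_or_si256(_mm256_slli_epi64::<45>(x), _mm256_srli_epi64::<19>(x))
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn rotl45_u64x4(v: [u64; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked above; loads/stores are unaligned.
            unsafe {
                let x = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
                let mut out = [0u64; 4];
                _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, rotl45_epi64(x));
                return out;
            }
        }
    }
    v.map(|lane| lane.rotate_left(45))
}
```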

@martinothamar
Owner

martinothamar commented Jul 30, 2023

True! The only place where I can't use the immediate (i-suffixed) variants yet is the generic rotate_left function, since it takes the rotation amount as a const generic parameter. The compiler told me to enable a nightly feature, and that didn't seem to work at all.
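One possible way around the const-generic problem, sketched under the assumption that the nightly feature in question is `generic_const_exprs` (which `_mm256_srli_epi64::<{ 64 - K }>` would need): pass both shift counts as separate const parameters and check at compile time that they add up to 64. `rotate_left_imm` and `rotl_u64x4_imm` are hypothetical helpers, and the inline `const { ... }` block needs Rust 1.79 or newer.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: rotates each 64-bit lane left by `L`. The caller also
/// supplies `R = 64 - L`, because computing `64 - L` inside the turbofish
/// would require generic const expressions.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn rotate_left_imm<const L: i32, const R: i32>(x: __m256i) -> __m256i {
    // Evaluated at compile time for each instantiation; rejects bad counts.
    const { assert!(L > 0 && L < 64 && L + R == 64) };
    _mm256_or_si256(_mm256_slli_epi64::<L>(x), _mm256_srli_epi64::<R>(x))
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn rotl_u64x4_imm<const L: i32, const R: i32>(v: [u64; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked above; loads/stores are unaligned.
            unsafe {
                let x = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
                let mut out = [0u64; 4];
                _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, rotate_left_imm::<L, R>(x));
                return out;
            }
        }
    }
    v.map(|lane| lane.rotate_left(L as u32))
}
```

A call site then spells out both counts, e.g. `rotl_u64x4_imm::<7, 57>(state)`, which is a little redundant but works on stable.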

What do you mean regarding _mm512_rol_epi64? There are both _mm256_slli_epi64 and _mm512_slli_epi64. Are they functionally equivalent to the rol variants? It looks like it to me, and the latency and throughput numbers seem equal as well.

@martinothamar
Owner

Ooh, sorry for being slow: you're saying that my rotate_left (shift left, shift right, OR) does exactly what _mm512_rol_epi64 does in a single instruction. Awesome, thanks 😄
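The distinction behind this exchange, shown on scalars: a left shift discards the bits pushed out of the top, while a rotate wraps them around to the bottom. That is why a shift-based rotate needs the extra right shift and OR, and why a single rotate instruction can replace the whole sequence. `rotl_via_shifts` is a hypothetical helper.

```rust
/// Hypothetical helper: a rotate built from two shifts and an OR, the same
/// shape as a shift-based SIMD rotate_left. Valid for 0 < k < 64, since a
/// shift by 64 would overflow.
pub fn rotl_via_shifts(x: u64, k: u32) -> u64 {
    assert!(k > 0 && k < 64);
    (x << k) | (x >> (64 - k))
}
```

For example, with `x = 0x8000_0000_0000_0001`, `x << 1` is `0x2` because the top bit falls off, whereas `x.rotate_left(1)` and `rotl_via_shifts(x, 1)` are both `0x3`.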

@martinothamar
Owner
