Which AVX512 instruction sets are supported? #1

TheIronBorn · 2023-07-28T20:31:16Z

See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512.
Some instructions like _mm512_rol_epi64 are only available in avx512vl. Most CPUs with AVX-512 have avx512vl, but it would be good to report which instruction sets are assumed/supported.

martinothamar · 2023-07-29T11:17:27Z

Hi, very good point. I've updated the README:

simd-rand/README.md, lines 5 to 10 in 20ddb7b:

- [`portable`] - portable implementations using `std::simd` (nightly required)
- [`specific`] - implementations using architecture-specific hardware intrinsics
  - [`specific::avx2`] - AVX2 for x86_64 architecture (4 lanes for 64bit)
    - Requires `avx2` CPU flag, but has additional optimization if you have `avx512dq` and `avx512vl`
  - [`specific::avx512`] - AVX512 for x86_64 architecture (8 lanes for 64bit)
    - Requires `avx512f`, `avx512dq` CPU flags

The specific module looks like this:

simd-rand/src/specific/mod.rs, lines 1 to 5 in 20ddb7b:

    #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
    pub mod avx2;
    #[cfg(all(target_arch = "x86_64", target_feature = "avx512f", target_feature = "avx512dq"))]
    pub mod avx512;

One thing I'm a bit unsure about, now that you mention it, is _mm_cvtsi32_si128. It's used in basically all of the bit-shifting code for the Xoshiro variants. This is an sse2 instruction, but if I've understood correctly, avx2 always implies sse2 as well, at least in practice. I see that numpy makes the same assumption for both its AVX2 and AVX512 code paths:

https://github.com/numpy/numpy/blob/e41180d3fe0cd054d57ce446b20b22b95f206d85/numpy/core/src/common/simd/avx512/operators.h#L34
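Since the `cfg` gates above are resolved at compile time, a runtime check can help when reporting which of the discussed features a given machine actually has. The following is only a minimal sketch using the standard library's `is_x86_feature_detected!` macro; the function name `detected_features` is hypothetical and is not part of simd-rand.

```rust
/// Returns the names of the CPU features discussed in this thread that the
/// current machine reports at runtime. `detected_features` is a hypothetical
/// helper for illustration, not part of simd-rand.
#[allow(unused_mut)]
pub fn detected_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    // The detection macro only exists on x86 targets, so gate it.
    #[cfg(target_arch = "x86_64")]
    {
        // The macro requires a string literal, hence one check per feature.
        if std::arch::is_x86_feature_detected!("sse2") {
            found.push("sse2");
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
        if std::arch::is_x86_feature_detected!("avx512f") {
            found.push("avx512f");
        }
        if std::arch::is_x86_feature_detected!("avx512dq") {
            found.push("avx512dq");
        }
        if std::arch::is_x86_feature_detected!("avx512vl") {
            found.push("avx512vl");
        }
    }
    found
}
```

Printing the result with `println!("{:?}", detected_features())` shows which code paths the machine could run. On non-x86_64 targets the list is simply empty.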

TheIronBorn · 2023-07-30T03:21:49Z

Yeah, instruction subsets are confusing and annoying. To the best of my knowledge, all AVX2/512 hardware also has sse2. (Also, Rust assumes sse2 for any 64-bit x86 hardware, so even if such hardware existed, I'm not sure it could run Rust.)

TheIronBorn · 2023-07-30T08:02:43Z

Actually, you can just use _mm_slli_epi64 and its variants and thus skip _mm_cvtsi32_si128.
Note also that you could provide variants which use _mm512_rol_epi64 if avx512vl is available.
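To illustrate the suggestion, here is a small sketch comparing the two ways of shifting with `std::arch` intrinsics: the immediate form (`_mm_slli_epi64`, count as a const generic) and the register-count form (`_mm_sll_epi64` fed by `_mm_cvtsi32_si128`). The function name `shl13_two_ways` and the shift amount 13 are arbitrary choices for the example, and the code assumes a standard x86_64 target where sse2 is part of the baseline.

```rust
/// Shifts `x` left by 13 bits twice: once with the immediate-count intrinsic
/// and once with the count passed in a vector register.
/// Hypothetical helper for illustration only.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn shl13_two_ways(x: u64) -> (u64, u64) {
    use std::arch::x86_64::*;
    // SAFETY: assumes a standard x86_64 target, where SSE2 is always available.
    unsafe {
        let v = _mm_set1_epi64x(x as i64);
        // Immediate form: the count is a compile-time constant.
        let imm = _mm_slli_epi64::<13>(v);
        // Register form: the count is first moved into a vector register.
        let cnt = _mm_sll_epi64(v, _mm_cvtsi32_si128(13));
        // Extract the low 64-bit lane of each result.
        (_mm_cvtsi128_si64(imm) as u64, _mm_cvtsi128_si64(cnt) as u64)
    }
}

/// Scalar fallback so the example also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn shl13_two_ways(x: u64) -> (u64, u64) {
    let s = x << 13;
    (s, s)
}
```

Both results should be identical. The immediate form just avoids the extra instruction that moves the count into a register.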

martinothamar · 2023-07-30T09:58:15Z

True! The only place where I currently can't use the i-suffix variant is the generic rotate_left function, since it has a const generic parameter. The compiler was telling me to enable some nightly feature, and it didn't seem functional at all.

What do you mean regarding _mm512_rol_epi64? There are both _mm256_slli_epi64 and _mm512_slli_epi64. Are they functionally equivalent to the rol variants? It looks like it to me. Latency and throughput numbers seem equal as well.

martinothamar · 2023-07-30T12:42:32Z

ooo sorry for being slow, you're saying that rotate_left == _mm512_rol_epi64. Awesome, thanks 😄
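For readers following along, a rotate is the OR of a left shift by `k` and a right shift by `64 - k`. The sketch below shows that pattern for four 64-bit lanes on AVX2, with a scalar fallback. It uses the register-count shifts so that no `64 - K` const expression is needed; the names `rotl_lanes` and `rotl_avx2` are hypothetical, and this is not simd-rand's actual implementation.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 rotate-left of four u64 lanes, built from two shifts and an OR.
/// Hypothetical helper; callers must ensure AVX2 is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
#[allow(unused_unsafe)]
unsafe fn rotl_avx2(v: &[u64; 4], k: u32) -> [u64; 4] {
    unsafe {
        let x = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
        // Shift counts live in the low 64 bits of a 128-bit register.
        let left = _mm256_sll_epi64(x, _mm_cvtsi32_si128(k as i32));
        let right = _mm256_srl_epi64(x, _mm_cvtsi32_si128(64 - k as i32));
        let r = _mm256_or_si256(left, right);
        let mut out = [0u64; 4];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
        out
    }
}

/// Rotates each lane left by `k` bits (0 <= k < 64), using AVX2 when the
/// CPU reports it at runtime and scalar `rotate_left` otherwise.
pub fn rotl_lanes(v: [u64; 4], k: u32) -> [u64; 4] {
    assert!(k < 64, "rotation count must be below 64");
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at runtime.
            return unsafe { rotl_avx2(&v, k) };
        }
    }
    v.map(|x| x.rotate_left(k))
}
```

With AVX-512, a single rotate instruction replaces this shift pair. For reference, Intel's Intrinsics Guide lists the 512-bit `_mm512_rol_epi64` under AVX512F, and the 128-bit and 256-bit forms (`_mm_rol_epi64`, `_mm256_rol_epi64`) under AVX512F plus AVX512VL.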

martinothamar · 2023-07-30T13:06:26Z

The compiler was already doing it, apparently.
