Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on gather_from #95

Open
danieltowner opened this issue Jun 22, 2023 · 4 comments
Open

Thoughts on gather_from #95

danieltowner opened this issue Jun 22, 2023 · 4 comments

Comments

@danieltowner
Copy link
Collaborator

At Varna a poll agreed that gather and scatter should be made member functions called gather_from and scatter_to, to match their copy_from and copy_to counterparts. The original free function gather looked like this:

template<std::contiguous_iterator Iter, std::integral Idx, typename AbiIdx>
constexpr simd<std::iter_value_t<Iter>, basic_simd<Idx, AbiIdx>::size()>
gather(Iter in, const simd<Idx, AbiIdx>& indexes);

This takes a set of indexes and an iterator, and generates a simd of the same size as there are indexes, with the ith element set to in[indexes[i]]. Following the direction of the Varna poll this would change to:

template<std::contiguous_iterator Iter, std::integral Idx,, class... Flags>
constexpr simd<std::iter_value_t<Iter>, size()>
simd<T, Abi> basic_simd::gather_from(Iter in, const basic_simd<Idx, Abi>& indexes, simd_flags<Flags...> f = {});

This new version gains the ability to do conversions as part of the gather (controlled by the flags and the type of Iter in comparison to the value_type of the object invoking the gather. This version also enforces that the number of indexes matches the simd into which the values are being gathered. Agreed?

In addition, it was agreed that gather_from should take a mask parameter as well, to match copy_from:

template<std::contiguous_iterator Iter, std::integral Idx, class... Flags>
constexpr simd<std::iter_value_t<Iter>, size()>
simd<T, Abi> basic_simd::gather_from(Iter in, const mask_type& mask, const mask_type& mask, 
                                     const basic_simd<Idx, Abi>& indexes, simd_flags<Flags...> f = {});

Note that the mask is the object's own mask_type since it is the values of the object's own elements which are being masked. Flags have been left as the last parameter where they can default. The first two parameters are left as the iterator and the mask so that they match the order of copy_from, and the indexes are made to be an extra parameter passed after the basic it/mask information.

A disadvantage of a member function gather_from is that there needs to be an existing object from which to call the method. This is also true of copy_from and in that case this is solved by providing a constructor with the same arguments. Should we therefore extend basic_simd::basic_simd overloads to include the two following constructors as well:

template<std::contiguous_iterator Iter, std::integral Idx, class... Flags>
constexpr basic_simd<T, Abi>::basic_simd(Iter in,
                                         const basic_simd<Idx, Abi>& indexes, simd_flags<Flags...> f = {});

template<std::contiguous_iterator Iter, std::integral Idx, class... Flags>
constexpr basic_simd<T, Abi>::basic_simd(Iter in, const mask_type& mask,
                                         const basic_simd<Idx, Abi>& indexes, simd_flags<Flags...> f = {});

The reason for calling the free function gather, or gather_from was to make it explicit in the source code what operation was happening, but I think that the most common use case would be to gather directly into a new basic_simd object using a constructor rather than overwriting something already existing, in which case the gather_from name rarely gets used, and the purpose of the code becomes more obscure. It also mean that the set of overloads on the basic_simd constructor is getting ever bigger.

I like the new name, I like the addition of the mask parameter, and I like allowing flags for future expansion (e.g., streaming). But I am now wondering whether it is better to have it as a free function after all. If we consider permute:

auto p = permute(values, indexes);

This returns a new simd of the same elements as value, with the size of indexes. If we took that as a pattern, then

auto g = gather_from(iter, indexes);

has a similar pattern, generating the same elements as the iter with the size of indexes. Perhaps it is better to ignore conversions for gather in the same way as we ignored conversions for permute, and require the user to do an extra conversions if they need to?

Or perhaps we keep gather as a member of the indexes instead:

simd<int> indexes;
const auto gatheredValues = indexes.gather_from(array);

I think this is less intuitive and it doesn't fit alongside permute, so I don't really like it.

Some of the same thinking applies to scatter_to as well, but I think that would be easy to make a member function. Doing so would make it inconsistent with gather_from though, so I think we need to think about gather first and then how scatter would work in the same way. Having mask and flags in scatter is much easier.

Thoughts @mattkretz @rarutyun

@mattkretz
Copy link
Owner

To be fully general replace basic_simd<Idx, Abi> with basic_simd<Idx, IdxAbi>, adding a constraint that the sizes match. (I know your implementation doesn't need it...)

I agree that the ctor would be useful but cryptic. More alternatives:

  1. No ctor but enable & document the following pattern:
const std::basic_simd x = std::simd<float>().gather_from(ptr, idx);

I.e. make gather_from return *this. If we do this we should consider adjusting copy_from accordingly.

  1. Named constructor:
const std::basic_simd x = std::simd<float>::gather_construct(ptr, idx);

@mattkretz
Copy link
Owner

Just noticed: If there's going to be gather_from (and copy_from) member functions that return *this then we need to overload on lvalue vs. rvalue:

simd gather_from(...) &&;
simd& gather_from(...) &;

The rvalue overload could even be const &&: we don't need to modify a temporary since its lifetime basically ends with the end of the gather_from function.

@danieltowner
Copy link
Collaborator Author

I don't particularly like having gather_from have to be so careful with its return types - to me that feels like we're trying to patch up problems that arise from going about this in the wrong way.

I like the named constructor better but would we then have to do the same with some of the other constructors? - That is, why would gather_from be special? Would you also want copy_from_construct and generator_construct too (and maybe more)? To me, having a constructor which accepts a pointer seems okay as that seems a fairly obvious thing to want to do to generate a SIMD value, but having special constructors to deal with the other scenarios seems less reasonable and introduces inconsistencies if we don't apply the same idea to the other available constructors.

Should we go back to the original plan of having them as free functions? After all, we aren't proposing that permute, compress, expand and others should also be members or constructors, so why is gather_from different? We can still let it have a mask and flags if we wanted, but it wouldn't do any embedded convert and the conversion would have to be done separately if the user wanted it.

@mattkretz
Copy link
Owner

Actually, your question whether permute, compress, expand, ... should be members ties in with mattkretz/std-simd-feedback#80. Did you consider members? What's the "generic code story"? I.e. is there one?

x = x.compress(x > 0).permute([](int i) { return i ^ 1; });

That's easier to read than

x = simd_permute(simd_compress(x, x > 0), [](int i) { return i ^ 1; });

I don't think the argument that rvalue and lvalue simd need to consider lifetime carries much weight here. This is nothing the user would need to know/understand. It just works correctly instead of introducing silent object lifetime bugs. However, we might want to have different names for member functions modifying the object (lvalue) and those returning a new object. Analogy: x += 1 modifies an lvalue and doesn't accept an rvalue on the left, x + 1 returns a new object and works with lvalue and rvalue on the left.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Undecided
Development

No branches or pull requests

2 participants