move to modern indexing style #790
check #722 too
Generally, I think there's no reason to drop `@inbounds` if we use `eachindex` and hence ensure that indexing is correct. However, it seems here we could also just use `zip` without explicit `eachindex`?
As in #722, it would be good to add tests with OffsetArrays.
As an alternative that would also fix the promotion to `Float64` and allow us to use pairwise summation, we could use e.g.
function sqL2dist(a::AbstractArray{<:Number}, b::AbstractArray{<:Number})
    length(a) == length(b) || throw(DimensionMismatch("length of inputs incompatible"))
    r = sum(Broadcast.instantiate(Broadcast.broadcasted(vec(a), vec(b)) do ai, bi
        return abs2(ai - bi)
    end))
    return r
end
The compiler should infer inbounds when using `eachindex`.
after some testing, I suggest keep using
on the master branch of Julia, the and I don't understand the CI error on nightly
Co-authored-by: Kristoffer Carlsson <kcarlsson89@gmail.com>
Seems like the PR is mainly missing tests, e.g., with OffsetArrays? (Of course, there are other possible improvements discussed in some comments above but in my opinion they could go into separate PRs since this PR is already a clear improvement for arrays with non-standard indices.)
src/deviation.jl
Outdated
    @inbounds if a[i] == b[i]
        c += 1
    end
    for (ai, bi) = zip(a, b)
Using `zip` here is incorrect given the contract of `counteq` (same comments below).
The point is that the contract states "Count the number of indices", and `zip` will ignore indices but instead lead to comparing values using their iteration order.
IMO that's consistent with the current behaviour though, regardless of the docstring. The current implementation already only cares about whether the first, second, third, etc elements match but doesn't care about shapes or cartesian indices of the compared arrays.
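To make the difference between the two approaches concrete, here is a minimal sketch using only Base arrays. The helper names `counteq_zip` and `counteq_index` are hypothetical and are not StatsBase's implementation; they only illustrate how `zip` follows iteration order while `eachindex(a, b)` requires matching linear indices.

```julia
# Hypothetical helpers for illustration only; not StatsBase's actual code.

# Compares elements in iteration order and ignores indices entirely.
counteq_zip(a, b) = count(((ai, bi),) -> ai == bi, zip(a, b))

# Compares elements at shared indices; eachindex(a, b) throws a
# DimensionMismatch when the indices of a and b do not match.
function counteq_index(a, b)
    c = 0
    for i in eachindex(a, b)
        @inbounds c += (a[i] == b[i])
    end
    return c
end

v = [1, 2, 3, 4]
m = [1 3; 2 4]            # iterates in column-major order: 1, 2, 3, 4

counteq_zip(v, m)         # 4
counteq_index(v, m)       # 4 (a vector and a matrix share linear indices 1:4)
counteq_zip(v, [1, 2])    # 2 (zip silently stops at the shorter input)

# The index-based version rejects inputs of different lengths.
index_mismatch_throws = try
    counteq_index(v, [1, 2])
    false
catch err
    err isa DimensionMismatch
end
```

Without an explicit length check, the `zip` version silently truncates, so any implementation along these lines would still need the existing `length(a) == length(b)` guard.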
Another nice consequence of using `zip` is that it is consistent with how Distances handles input arrays.
I agree with @bkamins that the docstring implies that values at the same indices are compared. So we need to check that indices are the same (or that starting indices are equal, in addition to the current check that lengths are equal). We can keep a tolerance when mixing vectors and matrices by only checking that linear indices are equal, to avoid breakage.
What do you mean about Distances? Doesn't it check that axes are the same or that at least linear indices are the same? EDIT: just saw the link you posted above to https://github.com/JuliaStats/Distances.jl/blob/91f51b543ea6c54936d3e6183acdf7da50bf1f9e/src/metrics.jl#L251. I guess we could use a similar approach, but I would suggest being stricter for arrays with non-1-based indices, and requiring that they start at the same index. Note that this logic in Distances predates `OffsetArray` support since it was already present at JuliaStats/Distances.jl#164, so there's no reason to think this behavior makes sense for `OffsetArray`s.
Since the package neither supports nor tests arrays with non-standard or non-linear indices, my interpretation is that the docstring just refers to iteration order and was not intended as a more general design decision. Maybe there's some information in the original PR.
Yes, we're free to reinterpret and adapt the docstring as we deem appropriate. But when passing arrays with different axes, I would find it dangerous to silently discard indices. If people use `OffsetArray`s there must be a reason. That's also a safer approach: if people find it too inconvenient, we can relax this requirement and allow mismatched indices. On the contrary, if we allow them now we won't be able to change this later.
I think it is clear from the source code that the original design did not take into account non-1-based indices. Then - naturally - docstrings do not reflect this case.
I think we need to make a general decision for the whole package:
- whether we allow non-1-based indexing (I feel from the discussions that we want to allow it)
- if yes, how such arrays should be treated (and this should be made consistent across all methods in the package)
- what our position is on mixing arrays of different dimensionalities (this point is related, as higher-dimensional arrays usually support linear indexing, which is 1-based), and when we want to accept only vectors.
After these decisions are made and documented, the respective PRs should be done to reflect them. Otherwise we risk a situation where different methods in the package make different assumptions.
A simple solution could be to adopt the semantics of `broadcast`, as discussed above at #790 (comment). This would avoid forcing users to understand yet another rule and it would be easier to implement for us (in practice we could use a different implementation under the hood if needed for performance).
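As a rough illustration of what adopting `broadcast` semantics might look like for an equality count, here is a sketch that reuses the lazy `Broadcast.broadcasted` pattern from the `sqL2dist` suggestion earlier in the thread. The name `counteq_bc` is hypothetical and the code is only an assumption about how such a method could be written, not a proposal for the final implementation.

```julia
# Hypothetical sketch: count equal elements using broadcast semantics.
# Broadcast.instantiate checks that the axes of a and b are compatible
# and throws a DimensionMismatch otherwise.
function counteq_bc(a::AbstractArray, b::AbstractArray)
    bc = Broadcast.instantiate(Broadcast.broadcasted(==, a, b))
    return sum(bc)   # sums the Bool results lazily, without a temporary array
end

counteq_bc([1, 2, 3], [1, 0, 3])   # 2

# Caveat: broadcasting expands size-1 dimensions, so this is accepted.
counteq_bc([1, 2, 3], [1])         # 1

# Incompatible shapes are rejected.
bc_mismatch_throws = try
    counteq_bc([1, 2, 3], [1, 2])
    false
catch err
    err isa DimensionMismatch
end
```

One design consequence worth noting: under these rules a length-1 input is silently expanded, so an explicit check would still be needed if that behaviour is not wanted.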
@Moelf could we wrap up this PR? It looks almost ready to merge and I want to end all the negative press from StatsBase's frequent (By the way, does
what's the desired action? do we want to use
I'm happy with
oh, but then it's already fixed on master, closing now