move to modern indexing style #790

Closed
Moelf wants to merge 5 commits

@Moelf Moelf commented May 16, 2022

No description provided.

Contributor Author

Moelf commented May 16, 2022

check #722 too

Member

@devmotion devmotion left a comment

Generally, I think there's no reason to drop @inbounds if we use eachindex and hence ensure that indexing is correct. However, it seems here we could also just use zip without explicit eachindex?

As in #722, it would be good to add tests with OffsetArrays.

As an alternative that would also fix the promotion to Float64 and let us use pairwise summation, we could use e.g.

function sqL2dist(a::AbstractArray{<:Number}, b::AbstractArray{<:Number})
    length(a) == length(b) || throw(DimensionMismatch("length of inputs incompatible"))
    r = sum(Broadcast.instantiate(Broadcast.broadcasted(vec(a), vec(b)) do ai, bi
        return abs2(ai - bi)
    end))
    return r
end

Contributor Author

Moelf commented May 16, 2022

The compiler should be able to elide bounds checks on its own when iterating with eachindex.

Contributor Author

Moelf commented May 16, 2022

After some testing, zip() is as slow as eachindex(a, b) without @inbounds.

I suggest we keep using eachindex(a, b) but also add @inbounds.
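
For reference, the pattern suggested here could look roughly like the sketch below. The name counteq_sketch is hypothetical and this is not the actual StatsBase code; it only illustrates pairing eachindex(a, b) with @inbounds.

```julia
# Hypothetical helper for illustration, not the StatsBase implementation.
function counteq_sketch(a::AbstractArray, b::AbstractArray)
    c = 0
    # eachindex(a, b) throws a DimensionMismatch when the indices of a and b
    # differ, so every i it yields is valid for both arrays.
    for i in eachindex(a, b)
        # Skipping bounds checks is safe here because i comes from eachindex(a, b).
        @inbounds if a[i] == b[i]
            c += 1
        end
    end
    return c
end

counteq_sketch([1, 2, 3], [1, 0, 3])  # positions 1 and 3 match
```

The design point is that @inbounds is only justified because the loop range is derived from both arrays; with a hand-written range such as 1:length(a) the same annotation would be unsafe for offset arrays.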

@Moelf Moelf requested a review from devmotion May 16, 2022 22:29
Contributor Author

Moelf commented May 16, 2022

On the master branch of Julia, bounds checks are elided for `for i in eachindex(a, b)` as expected, so the explicit @inbounds is not needed there.

I don't understand the CI error on nightly, though.

Co-authored-by: Kristoffer Carlsson <kcarlsson89@gmail.com>
@devmotion
Member

Seems like the PR is mainly missing tests, e.g., with OffsetArrays? (Of course, there are other possible improvements discussed in some comments above but in my opinion they could go into separate PRs since this PR is already a clear improvement for arrays with non-standard indices.)

src/deviation.jl Outdated
@inbounds if a[i] == b[i]
c += 1
end
for (ai, bi) = zip(a, b)
Contributor

Using zip here is incorrect given the contract of counteq (same comments below).

The point is that the contract (the docstring) states:

"Count the number of indices"

whereas zip ignores indices and instead compares values in their iteration order.
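
To make the difference concrete, here is a small sketch. It uses Base.IdentityUnitRange, a non-exported Base type, purely as a stand-in for an offset-indexed vector so that no extra package is needed; an OffsetArray would be the more realistic input. The variable names are illustrative only.

```julia
# Stand-in for an offset vector: the values 2, 3, 4 live at indices 2, 3, 4.
# Base.IdentityUnitRange is non-exported and used here only for illustration.
a = Base.IdentityUnitRange(2:4)
# Ordinary vector: the values 2, 3, 4 live at indices 1, 2, 3.
b = [2, 3, 4]

# zip pairs elements by iteration order and never looks at the indices,
# so every pair compares equal even though no index is shared with equal values.
matches_by_order = count(((ai, bi),) -> ai == bi, zip(a, b))

# eachindex(a, b) refuses to pair arrays whose indices differ.
index_check_failed = try
    eachindex(a, b)
    false
catch err
    err isa DimensionMismatch
end
```

Under these assumptions the zip version reports three matches, while the index-based version rejects the inputs outright, which is the behavioural gap being discussed.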

Member

IMO that's consistent with the current behaviour though, regardless of the docstring. The current implementation already only cares about whether the first, second, third, etc elements match but doesn't care about shapes or cartesian indices of the compared arrays.

Member

Another nice consequence of using zip is that it is consistent with how Distances handles input arrays.

Member

@nalimilan nalimilan May 18, 2022

I agree with @bkamins that the docstring implies that values at the same indices are compared. So we need to check that indices are the same (or that starting indices are equal, in addition to the current check that lengths are equal). We can keep some tolerance when mixing vectors and matrices by only checking that linear indices are equal, to avoid breakage.

What do you mean about Distances? Doesn't it check that axes are the same or that at least linear indices are the same? EDIT: just saw the link you posted above to https://github.com/JuliaStats/Distances.jl/blob/91f51b543ea6c54936d3e6183acdf7da50bf1f9e/src/metrics.jl#L251. I guess we could use a similar approach, but I would suggest being stricter for arrays with non-1-based indices, and requiring that they start at the same index. Note that this logic in Distances predates OffsetArray support since it was already present at JuliaStats/Distances.jl#164, so there's no reason to think this behavior makes sense for OffsetArrays.

Member

Since the package neither supports nor tests arrays with non-standard or non-linear indices, my interpretation is that the docstring just refers to iteration order and was not intended as a more general design decision. Maybe there's some information in the original PR.

Member

Yes, we're free to reinterpret and adapt the docstring as we deem appropriate. But when passing arrays with different axes, I would find it dangerous to silently discard indices. If people use OffsetArrays there must be a reason. Requiring matching indices is also the safer approach: if people find it too inconvenient, we can relax this requirement later and allow mismatched indices. On the contrary, if we allow them now we won't be able to change this later.

Contributor

I think it is clear from the source code that the original design did not take non-1-based indices into account. Naturally, the docstrings do not reflect this case either.

I think we need to make a general decision for the whole package:

  • whether we allow non-1-based indexing (I feel from the discussions that we want to allow it)
  • if yes, how such arrays should be treated (and this should be made consistent across all methods in the package)
  • what our position is on mixing arrays of different dimensionalities (this point is related, as higher-dimensional arrays usually support linear indexing, which is 1-based), and when we want to accept only vectors.

After these decisions are made and documented, the respective PRs should be made to reflect them. Otherwise we risk a situation where different methods in the package make different assumptions.

Member

A simple solution could be to adopt the semantics of broadcast, as discussed above at #790 (comment). This would avoid forcing users to understand yet another rule and it would be easier to implement for us (in practice we could use a different implementation under the hood if needed for performance).
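
As a rough sketch of what broadcast semantics could mean for counteq, assuming the same lazy-broadcast approach as the sqL2dist suggestion earlier in this thread (the name counteq_bc is hypothetical and this is not the package's code):

```julia
# Hypothetical helper: count equal elements using broadcast semantics.
function counteq_bc(a::AbstractArray, b::AbstractArray)
    # Build the lazy elementwise comparison; instantiate checks that the axes
    # of a and b are broadcast-compatible and throws DimensionMismatch otherwise.
    bc = Broadcast.instantiate(Broadcast.broadcasted(==, a, b))
    # Summing the Bool results counts the matches without allocating a[i] == b[i].
    return sum(bc)
end

counteq_bc([1, 2, 3], [1, 0, 3])  # positions 1 and 3 match
```

One consequence worth keeping in mind: broadcast also expands size-1 dimensions, so a one-element array would be compared against every element of the other input rather than rejected.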

Contributor

ParadaCarleton commented Aug 22, 2023

@Moelf could we wrap up this PR? It looks almost ready to merge and I want to end all the negative press from StatsBase's frequent @inbounds errors.

(By the way, does eachindex still not automatically elide bounds checking?)

Contributor Author

Moelf commented Aug 22, 2023

what's the desired action? do we want to use zip or @inbounds?

@Moelf Moelf closed this Aug 22, 2023
@Moelf Moelf reopened this Aug 22, 2023
Contributor

ParadaCarleton commented Aug 22, 2023

I'm happy with @inbounds, as long as eachindex is being used. If we want to fix that later that's fine, but for now we just need to fix the bugs.

Contributor Author

Moelf commented Aug 22, 2023

oh, but then it's already fixed on master, closing now

@Moelf Moelf closed this Aug 22, 2023
7 participants