move to modern indexing style #790

Closed
wants to merge 5 commits into from
Changes from 4 commits
43 changes: 16 additions & 27 deletions src/deviation.jl
@@ -9,13 +9,10 @@ Count the number of indices at which the elements of the arrays
 `a` and `b` are equal.
 """
 function counteq(a::AbstractArray, b::AbstractArray)
-    n = length(a)
-    length(b) == n || throw(DimensionMismatch("Inconsistent lengths."))
+    length(a) == length(b) || throw(DimensionMismatch("Inconsistent lengths."))
     c = 0
-    for i = 1:n
-        @inbounds if a[i] == b[i]
-            c += 1
-        end
+    for (ai, bi) = zip(a, b)
Contributor

Using zip here is incorrect given the contract of counteq (same comments below).

The point is that the contract states:

Count the number of indices

and zip ignores indices: it compares values in iteration order instead.
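To make the concern concrete, here is a minimal sketch using only Base arrays. The variable names are illustrative and not taken from the package. It shows that `zip` pairs elements purely by iteration order, so a vector and a matrix with different axes can still be reported as matching everywhere.

```julia
v = [10, 20, 30, 40]        # axes: (Base.OneTo(4),)
m = [10 30; 20 40]          # axes: (Base.OneTo(2), Base.OneTo(2))

# The two arrays do not share axes.
same_axes = axes(v) == axes(m)

# zip pairs elements by iteration order (column-major for a Matrix),
# without consulting the indices of either array.
all_equal = all(x == y for (x, y) in zip(v, m))

# zip also stops at the shorter input, which is why the explicit
# length check in the diff is still needed.
n_pairs = sum(1 for _ in zip([1, 2, 3], [10, 20]))
```

Here `same_axes` is `false` while `all_equal` is `true`, and `n_pairs` is 2 rather than 3.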

Member

IMO that's consistent with the current behaviour though, regardless of the docstring. The current implementation already only cares about whether the first, second, third, etc. elements match; it doesn't care about the shapes or Cartesian indices of the compared arrays.
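The claim can be checked with a small sketch that puts the two loops from this diff side by side. The names `counteq_old` and `counteq_new` are hypothetical labels for the pre-PR and proposed bodies, and the comparison only covers ordinary 1-based arrays.

```julia
# Index-based loop as it was before this PR (assumes 1-based linear indexing).
function counteq_old(a::AbstractArray, b::AbstractArray)
    n = length(a)
    length(b) == n || throw(DimensionMismatch("Inconsistent lengths."))
    c = 0
    for i = 1:n
        @inbounds if a[i] == b[i]
            c += 1
        end
    end
    return c
end

# zip-based loop as proposed in this PR.
function counteq_new(a::AbstractArray, b::AbstractArray)
    length(a) == length(b) || throw(DimensionMismatch("Inconsistent lengths."))
    c = 0
    for (ai, bi) = zip(a, b)
        c += (ai == bi)
    end
    return c
end

v = [1, 2, 3, 4]
m = [1 0; 2 4]     # iteration order: 1, 2, 0, 4

old_count = counteq_old(v, m)
new_count = counteq_new(v, m)
```

Both calls return 3 for these inputs, since neither loop looks at the shape of `m`.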

Member

Another nice consequence of using zip is that it is consistent with how Distances handles input arrays.

Member

nalimilan, May 18, 2022

I agree with @bkamins that the docstring implies that values at the same indices are compared. So we need to check that the indices are the same (or that the starting indices are equal, in addition to the current check that the lengths are equal). We can keep some tolerance when mixing vectors and matrices by only checking that linear indices are equal, to avoid breakage.

What do you mean about Distances? Doesn't it check that axes are the same, or at least that linear indices are the same? EDIT: just saw the link you posted above to https://github.com/JuliaStats/Distances.jl/blob/91f51b543ea6c54936d3e6183acdf7da50bf1f9e/src/metrics.jl#L251. I guess we could use a similar approach, but I would suggest being stricter for arrays with non-1-based indices and requiring that they start at the same index. Note that this logic in Distances predates OffsetArray support, since it was already present at JuliaStats/Distances.jl#164, so there's no reason to think this behavior makes sense for OffsetArrays.
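One possible reading of this suggestion, sketched under assumptions: compare the linear index ranges of both arrays before looping. `counteq_linear` and its error message are hypothetical and not part of the package. For a vector the linear index range is its own axis, so my understanding is that vectors starting at different indices would be rejected, while a vector and a matrix of equal length would still be accepted.

```julia
function counteq_linear(a::AbstractArray, b::AbstractArray)
    # Linear index range of each array; for vectors this is the first axis.
    inds = eachindex(IndexLinear(), a)
    inds == eachindex(IndexLinear(), b) ||
        throw(DimensionMismatch("Inconsistent linear indices."))
    c = 0
    for i in inds
        c += (a[i] == b[i])
    end
    return c
end

# Vector and matrix of the same length share the linear indices 1:4.
mixed = counteq_linear([1, 2, 3, 4], [1 0; 2 4])

# Different lengths give different linear index ranges and are rejected.
rejected = try
    counteq_linear([1, 2], [1, 2, 3])
    false
catch err
    err isa DimensionMismatch
end
```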

Member

Since the package neither supports nor tests arrays with non-standard or non-linear indices, my interpretation is that the docstring just refers to iteration order and was not intended as a more general design decision. Maybe there's some information in the original PR.

Member

Yes, we're free to reinterpret and adapt the docstring as we deem appropriate. But when passing arrays with different axes, I would find it dangerous to silently discard indices. If people use OffsetArrays there must be a reason. That's also the safer approach: if people find it too inconvenient, we can relax this requirement and allow mismatched indices. On the contrary, if we allow them now, we won't be able to change this later.
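The stricter policy described here could look roughly like the following sketch, which refuses any pair of arrays whose axes differ. `counteq_strict` and its error message are hypothetical, and this variant also rejects vector/matrix mixes, which goes further than the tolerance mentioned earlier in the thread.

```julia
function counteq_strict(a::AbstractArray, b::AbstractArray)
    # Require identical axes so that no index information is discarded.
    axes(a) == axes(b) ||
        throw(DimensionMismatch("a and b must have the same axes."))
    c = 0
    for i in eachindex(a, b)
        c += (a[i] == b[i])
    end
    return c
end

same_shape = counteq_strict([1 0; 2 4], [1 3; 2 4])

# A vector and a matrix of equal length are rejected under this policy.
mixed_rejected = try
    counteq_strict([1, 2, 3, 4], [1 3; 2 4])
    false
catch err
    err isa DimensionMismatch
end
```

Starting strict leaves room to relax the check later without breaking callers, which is the trade-off argued for above.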

Contributor

I think it is clear from the source code that the original design did not take non-1-based indices into account. Naturally, the docstrings do not reflect this case either.

I think we need to make a general decision for the whole package:

  • whether we allow non-1-based indexing (I feel from the discussions that we want to allow it)
  • if yes, how it should be treated (and this should be made consistent across all methods in the package)
  • what our position is on mixing arrays of different dimensionalities (this point is related, as higher-dimensional arrays usually support linear indexing, which is 1-based), and when we want to accept only vectors.

After these decisions are made and documented, the respective PRs should be made to reflect them. Otherwise we risk a situation where different methods in the package make different assumptions.

Member

A simple solution could be to adopt the semantics of broadcast, as discussed above at #790 (comment). This would avoid forcing users to understand yet another rule and it would be easier to implement for us (in practice we could use a different implementation under the hood if needed for performance).
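As a rough illustration of what broadcast semantics would imply, the sketch below defines `counteq_bc`, a hypothetical one-liner that is not the package's implementation. Broadcasting rejects a vector and a matrix with incompatible axes, but it also expands singleton dimensions, so some mixed-shape inputs would be accepted and counted over the expanded shape.

```julia
# Hypothetical broadcast-based variant: count positions where a .== b holds.
counteq_bc(a::AbstractArray, b::AbstractArray) = count(a .== b)

same_shape = counteq_bc([1, 2, 3], [1, 0, 3])

# A length-4 vector cannot be broadcast against a 2x2 matrix.
rejected = try
    counteq_bc([1, 2, 3, 4], [1 3; 2 4])
    false
catch err
    err isa DimensionMismatch
end

# A length-2 vector is expanded across both columns of a 2x2 matrix,
# so four comparisons are made instead of two.
expanded = counteq_bc([1, 2], [1 3; 2 4])
```

Whether the expansion in the last call is desirable for these functions is one of the decisions the thread leaves open.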

+        c += (ai == bi)
     end
     return c
 end
@@ -28,13 +25,10 @@ Count the number of indices at which the elements of the arrays
 `a` and `b` are not equal.
 """
 function countne(a::AbstractArray, b::AbstractArray)
-    n = length(a)
-    length(b) == n || throw(DimensionMismatch("Inconsistent lengths."))
+    length(a) == length(b) || throw(DimensionMismatch("Inconsistent lengths."))
     c = 0
-    for i = 1:n
-        @inbounds if a[i] != b[i]
-            c += 1
-        end
+    for (ai, bi) = zip(a, b)
+        c += (ai != bi)
     end
     return c
 end
@@ -47,11 +41,10 @@ Compute the squared L2 distance between two arrays: ``\\sum_{i=1}^n |a_i - b_i|^
 Efficient equivalent of `sumabs2(a - b)`.
 """
 function sqL2dist(a::AbstractArray{T}, b::AbstractArray{T}) where T<:Number
-    n = length(a)
-    length(b) == n || throw(DimensionMismatch("Input dimension mismatch"))
+    length(a) == length(b) || throw(DimensionMismatch("Input dimension mismatch"))
     r = 0.0
-    for i = 1:n
-        @inbounds r += abs2(a[i] - b[i])
+    for (ai, bi) = zip(a, b)
+        r += abs2(ai - bi)
     end
     return r
 end
@@ -75,11 +68,10 @@ Compute the L1 distance between two arrays: ``\\sum_{i=1}^n |a_i - b_i|``.
 Efficient equivalent of `sum(abs, a - b)`.
 """
 function L1dist(a::AbstractArray{T}, b::AbstractArray{T}) where T<:Number
-    n = length(a)
-    length(b) == n || throw(DimensionMismatch("Input dimension mismatch"))
+    length(a) == length(b) || throw(DimensionMismatch("Input dimension mismatch"))
     r = 0.0
Moelf marked this conversation as resolved.
-    for i = 1:n
-        @inbounds r += abs(a[i] - b[i])
+    for (ai, bi) = zip(a, b)
+        r += abs(ai - bi)
     end
     return r
 end
@@ -94,11 +86,10 @@ two arrays: ``\\max_{i\\in1:n} |a_i - b_i|``.
 Efficient equivalent of `maxabs(a - b)`.
 """
 function Linfdist(a::AbstractArray{T}, b::AbstractArray{T}) where T<:Number
-    n = length(a)
-    length(b) == n || throw(DimensionMismatch("Input dimension mismatch"))
+    length(a) == length(b) || throw(DimensionMismatch("Input dimension mismatch"))
     r = 0.0
-    for i = 1:n
-        @inbounds v = abs(a[i] - b[i])
+    for (ai, bi) = zip(a, b)
+        v = abs(ai - bi)
         if r < v
             r = v
         end
@@ -118,9 +109,7 @@ Efficient equivalent of `sum(a*log(a/b)-a+b)`.
 function gkldiv(a::AbstractArray{T}, b::AbstractArray{T}) where T<:AbstractFloat
     n = length(a)
     r = 0.0
-    for i = 1:n
-        @inbounds ai = a[i]
-        @inbounds bi = b[i]
+    for (ai, bi) = zip(a, b)
         if ai > 0
             r += (ai * log(ai / bi) - ai + bi)
         else