Make `pairwise` multi-threaded and make `corkendall` and `corspearman` wrap `pairwise`. #923

PGS62 · 2024-03-31T11:38:24Z

This pull request follows up on #849. In that discussion I envisaged amending corkendall to be multi-threaded. Instead I have done the following:

Made pairwise and pairwise! multi-threaded, and in the case of skipmissing = :pairwise, greatly reduced allocations which is needed for good multi-threaded performance. A nice illustration of the performance improvements this leads to is obtained with f = LinearAlgebra.dot. The example here shows a 31.5 times speedup on a 12 core PC.
corkendall now simply wraps pairwise but with added methods of _pairwise! for typeof(corkendall) to retain the speedups to corkendall that were a result of Speeding up Kendall Correlation corkendall #634.
corspearman also wraps pairwise with added methods of _pairwise! for typeof(corspearman). The purpose of these added methods is to reduce the number of calls to tiedrank from 2nm to 2(n+m) when n and m are respectively the number of columns in the arguments x and y to the call to corspearman(x,y....

Since corkendall and corspearman now wrap pairwise they have a skipmissing argument, which introduces an additional inconsistency between corkandall/corspearman and cov/cor.

I developed the new code in my package KendallTau. Please see the README file for further examples of the performance improvements achieved, which for all but the smallest input data exceed the number of cores on the PC (thanks to reduced allocations).

I have amended the tests as needed, and I believe that test coverage is as close to 100% as possible.

I apologise that there is quite a lot for maintainers to review, and I'll be happy to answer questions.

andreasnoack

This is really great. Thanks for contributing such a PR. I'm happy with it but lets keep if open for a couple of days to see if anybody else would like to comment.

test/pairwise.jl

andreasnoack · 2024-03-31T18:19:23Z

src/rankcorr.jl

-            Xjrank = tiedrank(Xj)
-            C[j,1] = cor(Xjrank, yrank)
+function corspearman(x::AbstractMatrix, y::AbstractVector; skipmissing::Symbol=:none)
+    return corspearman(x, reshape(y, (length(y), 1)); skipmissing)


I think the 1.0 tests are failing because of this way of passing the keyword argument. I'd be okay with bumping the minimum version for StatsBase since 1.0 is ancient but maybe it's better to just make it skipmissing=skipmissing here for now.

When writing and testing the code (in KendallTau) I was unaware that support for Julia 1.0.5 was necessary, so the test failures when I submitted the PR came as a surprise. Changing how keyword arguments are handled would be easy, as would avoiding trailing backslashes within string literals. But the fact that 1.0.5 does not support eachcol is a bit more difficult. eachcol was added in 1.1. For the time being I will work on responding to @devmotion's comments and wait for your guidance as to whether bumping the minimum version might happen.

I've done some more investigation:
Other popular packages (the compat section of Project.toml) have the following minimum versions:
1.9.4 for Statistics
1.6 for Plots, DataFrames, Revise, Optim, ForwardDiff, OhMyREPL, CSV
1.3 for Distributions
1 for StatsBase
So upping the minimum version for StatsBase seems sensible.

But as I remarked above, eachcol is crucial to how corkendall and corspearman are mere wrappers to pairwise. eachcol was introduced in Julia 1.1, so I imagined that upping the minimum Julia version for StatsBase to 1.1 would suffice. But no - the code appears to require Julia 1.9.4, and as of now does not work with Julia 1.8.5. For example the method check_vectors fails because:

# Julia 1.9.4 julia> keys((eachcol([1 2;3 4]))[2]) 2-element LinearIndices{1, Tuple{Base.OneTo{Int64}}}: 1 2

# Julia 1.8.5 julia> keys((eachcol([1 2;3 4]))[2]) ERROR: MethodError: no method matching getindex(::Base.Generator{Base.OneTo{Int64}, Base.var"#242#243"{Matrix{Int64}}}, ::Int64) Stacktrace: [1] top-level scope @ REPL[1]:1

So the alternatives are:

A. We increase the minimum version to 1.9.4, which happens to be that for Statistics, but is more modern than the current LTS.

B. I find a workaround so that this PR requires either no increase in the minimum version number or an increase to 1.6 which is more in line with other popular packages.

It's now the case that increasing the minimum version to 1.6 would work. See here.

.vscode/settings.json

devmotion · 2024-03-31T22:04:54Z

src/pairwise.jl

+    #cor and friends are special-cased.
+    iscor = (f in (corkendall, corspearman, cor))


It is a bit unfortunate that this means other functions can't benefit from the improvements (?) since it is not extendable. Maybe an additional argument would be better?

Can I check that you've understood the purpose of the iscor variable (which might be better named diag_is_1)?

It's a Boolean that answers the question "Does x === y imply f(x,y) == 1.0, irrespective of NaN, missing and Inf entries, and that therefore there is a speedup to be had by not bothering to call f when x[i] === y[i]".

The upshot is that if f was some new correlation estimator for which the answer to the question above was "yes" but where we failed to update line 15 then we'd miss out on the "diagonal elements are near-zero-cost to calculate" but still have the benefit of multithreading. Of course, an additional argument is possible, but it makes the interface to pairwise a bit more complicated.

There's also the fact that the author of funky_cor could simply include the line x===y && return 1.0 in its definition, which remark has got me wondering whether I should do that for corspearman and corkendall...

src/pairwise.jl

devmotion · 2024-03-31T22:59:23Z

src/pairwise.jl

+ [26, 25, 16, 15, 6, 5]
+```
+"""
+function equal_sum_subsets(n::Int, num_subsets::Int)::Vector{Vector{Int}}


In case this function is kept (see comments above), is it necessary to return a vector of vectors or could one return an iterator of vectors instead?

I'm reasonably sure that equal_sum_subsets works well in the sense of doing a good job of load balancing. The only reason it currently returns a vector of vectors is that's what I know how to do, so I would need to read up on iterators.

In commit a1d9ae9 I have replaced the function equal_sum_subsets with an iterator EqualSumSubsets whose element type is another iterator TwoStepRange. It seems to work well and leads to a small reduction in allocations. But please review as this is the first time I've written code to create an iterator.

Ah, didn't work on 32 bit. Maybe I need to replace use of Int64 with Int?

Commit 951b391 fixes EqualSumSubsets on 32-bit.

…ll be dropped.

No longer use eachcol when VERSION < v"1.9"

… version

PGS62 · 2024-04-05T22:29:28Z

I've now made changes so that pairwise, corkendall and corspearman work with all tests passing on Julia 1.6 or later. I've also changed ci.yml to test against 1.6 instead of 1.0 and the compat section of Project.toml now puts the minimum Julia version as 1.6.

Would this meet with approval from the StatsBase maintainers?

PGS62 · 2024-04-10T15:48:54Z

The failed checks after commit [46d5655] are all in the codecov step, with error:

Commit creating failed: {"detail":"Tokenless has reached GitHub rate limit. Please upload using a token: https://docs.codecov.com/docs/adding-the-codecov-token. Expected available in 1549 seconds."}

IIUC (this page) I need to wait and try again.

PGS62 · 2024-05-28T08:57:46Z

It's been a while... May I ask why this PR has not been merged? Is it:
a) An oversight?
b) Concerns about upping the minimum Julia version to 1.6?
c) Something else?

frankier · 2024-08-27T09:43:45Z

I know people have many obligations and PRs/emails get ignored sometimes, but this is a bit of an unfortunate situation. @PGS62 was rebuffed from registering a new package, but now this PR sits languishing.

nalimilan

Sorry for the delay! This is a significant refactoring so there are many things to discuss. My main remarks are (see details in inline comments):

I think the threading strategy can be simplified a lot. Task local storage shouldn't be needed (it's almost never the case), using @spawn is the recommended pattern and it composes with nested threaded calls contrary to @threads. Also I wonder whether we can find a simpler solution of splitting indices across tasks than EqualSumSubsets (see here).
I wouldn't add a skipmissing argument to corkendall and corspeaman, as this makes the code significantly more verbose and the signature inconsistent with cor. We have pairwise for that. (And anyway it's better to make separate PRs for distinct features as in my experience large PRs are more likely to become unmanageable and stall.)
There are lots of unrelated formatting changes, please revert these so that the diff is cleaner and easier to review. Also in the changed code the style isn't always consistent with StatsBase's (in particular, arguments should be aligned in method definitions, no empty initial lines, space after # in comments).

nalimilan · 2024-10-13T13:55:58Z

src/pairwise.jl

+
+    #cor and friends are special-cased.
+    diag_is_1 = (f in (corkendall, corspearman, cor))
+    (diag_is_1 || f == cov) && (symmetric = x === y)


Suggested change

(diag_is_1 || f == cov) && (symmetric = x === y)

if (diag_is_1 || f === cov) && x === y

symmetric = true

end

nalimilan · 2024-10-13T13:56:23Z

src/pairwise.jl

+    diag_is_1 = (f in (corkendall, corspearman, cor))
+    (diag_is_1 || f == cov) && (symmetric = x === y)
+    #cov(x) is faster than cov(x, x)
+    (f == cov) && (f = ((x, y) -> x === y ? cov(x) : cov(x, y)))


Suggested change

(f == cov) && (f = ((x, y) -> x === y ? cov(x) : cov(x, y)))

if f === cov

f = (x, y) -> x === y ? cov(x) : cov(x, y)

end

nalimilan · 2024-10-13T14:00:33Z

src/pairwise.jl

        end
    end
+    symmetric && LinearAlgebra.copytri!(dest, 'L')


AFAICT this is an internal function, better copy using an explicit loop as before. Anyway that function just does that, without any particular optimization.

nalimilan · 2024-10-13T14:00:56Z

src/pairwise.jl

+    (f == cov) && (f = ((x, y) -> x === y ? cov(x) : cov(x, y)))
+
+    Threads.@threads for subset in EqualSumSubsets(nc, Threads.nthreads())
+        for j in subset


Suggested change

for j in subset

@inbounds for j in subset

nalimilan · 2024-10-13T14:02:47Z

src/pairwise.jl

+function _pairwise!(::Val{:none}, f, dest::AbstractMatrix{V}, x, y,
+    symmetric::Bool) where {V}


Suggested change

function _pairwise!(::Val{:none}, f, dest::AbstractMatrix{V}, x, y,

symmetric::Bool) where {V}

function _pairwise!(::Val{:none}, f, dest::AbstractMatrix{V}, x, y,

symmetric::Bool) where {V}

src/pairwise.jl

nalimilan · 2024-10-17T21:18:21Z

src/rankcorr.jl

+# eachcol was added in Julia 1.1 but for Julia 1.8, keys(eachcol(x)) fails for any Matrix x
+# which causes `check_vectors` to fail. Solution is to use the comprehension below.
+_eachcol(x) = VERSION < v"1.9" ? [view(x, :, i) for i in axes(x, 2)] : eachcol(x)


I though check_vectors allowed any iterator, as the docstring for pairwise indicates. It doesn't seem to call keys on the input. Things like pairwise(cor, eachcol(rand(4, 5)), eachcol(rand(4, 5))) work on Julia 1.6.

Anyway even if it works around an error, this function doesn't fix it when users pass eachcol(...) so we need a better solution.

nalimilan · 2024-10-18T06:57:17Z

src/rankcorr.jl

+# _isnan required so that corkendall and corspearman have correct handling of NaNs and
+# can also accept arguments with element type for which isnan is not defined but isless is
+# is defined, so that rank correlation makes sense.
+_isnan(x::T) where {T<:Number} = isnan(x)


Suggested change

_isnan(x::T) where {T<:Number} = isnan(x)

_isnan(x::Number) = isnan(x)

nalimilan · 2024-10-18T07:04:23Z

src/rankcorr.jl

+    end
+
+    if missing isa T || missing isa V
+        sortedx, permutedy = handle_pairwise(sortedx, scratch_py; scratch_fx=scratch_fx, scratch_fy=scratch_fy)


This assumes pairwise deletion but what about skipmissing=:listwise?

nalimilan · 2024-10-18T07:25:13Z

src/pairwise.jl

+    #cov(x) is faster than cov(x, x)
+    (f == cov) && (f = ((x, y) -> x === y ? cov(x) : cov(x, y)))
+
+    Threads.@threads for subset in EqualSumSubsets(nc, Threads.nthreads())


(Continuing #923 (comment))

This approach is clever, but I also wonder whether it wouldn't be possible to achieve the same result with less code. If you identify a cell in the upper triangle using an integer index, you can just split these equally across threads. Then you can compute back what row and column the integer index corresponds to (e.g. via i, j = Tuple(CartesianIndices(dest))). If that's simpler you could even attribute indices from the full table to tasks, and skip those from the lower triangle from inside tasks. Am I missing something?

It could also make sense to spawn more tasks than Threads.nthreads when the input matrix is really large. Indeed, we don't have any guarantee that there are Threads.nthreads() idle threads. We may be operating within a context which is already threaded, and for example one thread, or two thirds of the threads may already be busy; another app may also be using cores. So when we're sure that the overhead of spawning threads is small compared to the total cost of the operation it's worth spawning lots of tasks that can then be attributed to idle threads as appropriate. This is what we do in DataFrames.jl, with a threshold so that we spawn a new task for each batch of N rows.

PGS62 added 2 commits March 30, 2024 10:52

Rewrite pairwise, corkendall and corspearman

c85116a

Delete rankcorrspeedtest.jl

8ced336

andreasnoack reviewed Mar 31, 2024

View reviewed changes

devmotion reviewed Mar 31, 2024

View reviewed changes

PGS62 and others added 18 commits April 2, 2024 09:00

Partial solution to test failures on Julia 1.0.5

6a8fecd

Reverted to using eachcol in the hope that support for Julia 1.0.5 wi…

eff32e4

…ll be dropped.

Deleted commented out tests

1a4d064

Delete settings.json

893d4b6

Update pairwise.jl in response to some of devmotion's comments

e2c9ef5

Was calling copytry! too soon. Fixed that.

4c2e7a0

Update pairwise.jl

2119804

nested loops more friendly to Julia being column-major

5da5943

flipped i and j loop variables. reshape -> similar

baf193b

Added iterators EqualSumSubsets and TwoStepRange

a1d9ae9

Int64 -> Int to work on 32bit

951b391

Update tests

75d206a

Update tests

56d27cc

Tests now pass on Julia 1.8.5

682859a

No longer use eachcol when VERSION < v"1.9"

Compatibility with Julia 1.6

9c039dd

Test against Julia 1.6, 1.7, 1.8 and 1.9 to find possible new minimum…

f53b7b5

… version

Test on Julia 1.1, 1.2, 1.3. 1.4 and 1.5

88bf8f6

Demonstrate that 1.6 could be new minimum version

ddb1b51

Corrected call to keyword argument

f67e867

devmotion mentioned this pull request Apr 8, 2024

Optimize cov and cor with identical arguments JuliaStats/Statistics.jl#166

Merged

PGS62 added 2 commits April 10, 2024 15:23

Improved test coverage (src/pairwise.jl#L84-L87)

246f32d

Improved test coverage (StatsBase._cor)

46d5655

Trivial change to re-run tests

e1395b8

nalimilan reviewed Oct 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `pairwise` multi-threaded and make `corkendall` and `corspearman` wrap `pairwise`. #923

Make `pairwise` multi-threaded and make `corkendall` and `corspearman` wrap `pairwise`. #923

PGS62 commented Mar 31, 2024

andreasnoack left a comment

andreasnoack Mar 31, 2024

PGS62 Apr 2, 2024

PGS62 Apr 5, 2024

PGS62 Apr 7, 2024

devmotion Mar 31, 2024

PGS62 Apr 2, 2024 •

edited

Loading

devmotion Mar 31, 2024

PGS62 Apr 2, 2024

PGS62 Apr 4, 2024 •

edited

Loading

PGS62 Apr 5, 2024

PGS62 Apr 5, 2024

PGS62 commented Apr 5, 2024

PGS62 commented Apr 10, 2024

PGS62 commented May 28, 2024

frankier commented Aug 27, 2024

nalimilan left a comment

nalimilan Oct 13, 2024 •

edited

Loading

nalimilan Oct 13, 2024

nalimilan Oct 13, 2024

nalimilan Oct 13, 2024

nalimilan Oct 13, 2024

nalimilan Oct 17, 2024

nalimilan Oct 18, 2024

nalimilan Oct 18, 2024

nalimilan Oct 18, 2024

		#cor and friends are special-cased.
		iscor = (f in (corkendall, corspearman, cor))

-    (diag_is_1 || f == cov) && (symmetric = x === y)
+    if (diag_is_1 || f === cov) && x === y
+        symmetric = true
+    end

-    (f == cov) && (f = ((x, y) -> x === y ? cov(x) : cov(x, y)))
+    if f === cov
+        f = (x, y) -> x === y ? cov(x) : cov(x, y)
+    end

		function _pairwise!(::Val{:none}, f, dest::AbstractMatrix{V}, x, y,
		symmetric::Bool) where {V}

	_isnan(x::T) where {T<:Number} = isnan(x)
	_isnan(x::Number) = isnan(x)

Make pairwise multi-threaded and make corkendall and corspearman wrap pairwise. #923

Are you sure you want to change the base?

Make pairwise multi-threaded and make corkendall and corspearman wrap pairwise. #923

Conversation

PGS62 commented Mar 31, 2024

andreasnoack left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PGS62 Apr 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PGS62 Apr 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PGS62 commented Apr 5, 2024

PGS62 commented Apr 10, 2024

PGS62 commented May 28, 2024

frankier commented Aug 27, 2024

nalimilan left a comment

Choose a reason for hiding this comment

nalimilan Oct 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Make `pairwise` multi-threaded and make `corkendall` and `corspearman` wrap `pairwise`. #923

Make `pairwise` multi-threaded and make `corkendall` and `corspearman` wrap `pairwise`. #923

PGS62 Apr 2, 2024 •

edited

Loading

PGS62 Apr 4, 2024 •

edited

Loading

nalimilan Oct 13, 2024 •

edited

Loading