Skip to content

Commit

Permalink
Allow LocalPermutationTest to be used with TEShannon estimated us…
Browse files Browse the repository at this point in the history
…ing dedicated TE estimators (#350)

* Fix issue #348

* More effective estimation for `Lindner` when doing e.g. surrogate tests

partially addresses #344

* Typos

* Add tests

* Up patch version

* Correctly scale

* Make sure we have enough samples for tests

* Better test organization

* It is the estimator that controls what happens, not the measure

* Add note to `LocalPermutationTest` docstring about transfer entropy

* Error should occur only for `TransferEntropyEstimator`s

* More tests

* Improve test comments.

* Fix #349

And also mention conditioning in `Zhu1` docs
  • Loading branch information
kahaaga authored Oct 4, 2023
1 parent 5909d72 commit 477bd4c
Show file tree
Hide file tree
Showing 14 changed files with 250 additions and 76 deletions.
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ name = "CausalityTools"
uuid = "5520caf5-2dd7-5c5d-bfcb-a00e56ac49f7"
authors = ["Kristian Agasøster Haaga <kahaaga@gmail.com>", "Tor Einar Møller <temolle@gmail.com>", "George Datseris <datseris.george@gmail.com>"]
repo = "https://github.com/kahaaga/CausalityTools.jl.git"
version = "2.9.1"
version = "2.9.2"

[deps]
Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"
Expand Down
2 changes: 2 additions & 0 deletions changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
### Bug fixes

- Fixed bug in `transferentropy` function which yielded identical results in both directions for the bivariate case.
- Fixed bug that occurred when using `LocalPermutationTest` with `TEShannon` as the measure and a dedicated `TransferEntropyEstimator` (e.g. `Zhu1` or `Lindner`). This occurred because the `LocalPermutationTest` is, strictly speaking, a test using conditional mutual information as the measure. Therefore, naively applying a `TransferEntropy` measure such as `TEShannon` would error. This is fixed by performing a similar procedure where the source marginal is shuffled according to local neighborhoods in the conditional marginal. This is similar, but not identical to the CMI-based `LocalPermutationTest`, and adapts to the specific case of transfer entropy estimation using dedicated transfer entropy estimators instead of some lower-level estimator.
- Fixed bug in `Zhu1` transfer entropy estimator where when box volumes were extremely small, taking the logarithm of volume ratios resulted in `Inf` values. This was solved by simply ignoring these volumes.

## 2.8.0

Expand Down
14 changes: 7 additions & 7 deletions src/independence_tests/local_permutation/LocalPermutationTest.jl
Original file line number Diff line number Diff line change
Expand Up @@ -72,13 +72,13 @@ instead of `Z` and we `I(X; Y)` and `Iₖ(X̂; Y)` instead of `I(X; Y | Z)` and
## Compatible measures
| Measure | Pairwise | Conditional | Requires `est` |
| ----------------------------- | :------: | :---------: | :------------: |
| [`PartialCorrelation`](@ref) | ✖ | ✓ | No |
| [`DistanceCorrelation`](@ref) | ✖ | ✓ | No |
| [`CMIShannon`](@ref) | ✖ | ✓ | Yes |
| [`TEShannon`](@ref) | ✓ | ✓ | Yes |
| [`PMI`](@ref) | ✖ | ✓ | Yes |
| Measure | Pairwise | Conditional | Requires `est` | Note |
| ----------------------------- | :------: | :---------: | :------------: | :-------------------------------------------------------------------------------------------------------------------------------: |
| [`PartialCorrelation`](@ref) | ✖ | ✓ | No | |
| [`DistanceCorrelation`](@ref) | ✖ | ✓ | No | |
| [`CMIShannon`](@ref) | ✖ | ✓ | Yes | |
| [`TEShannon`](@ref) | ✓ | ✓ | Yes | Pairwise tests not possible with `TransferEntropyEstimator`s, only lower-level estimators, e.g. `FPVP`, `GaussianMI` or `Kraskov` |
| [`PMI`](@ref) | ✖ | ✓ | Yes | |
The `LocalPermutationTest` is only defined for conditional independence testing.
Exceptions are for measures like [`TEShannon`](@ref), which use conditional
Expand Down
54 changes: 49 additions & 5 deletions src/independence_tests/local_permutation/transferentropy.jl
Original file line number Diff line number Diff line change
Expand Up @@ -10,17 +10,61 @@ end

function independence(test::LocalPermutationTest{<:TransferEntropy{<:E}}, x::AbstractVector...) where E
measure, est, nshuffles = test.measure, test.est, test.nshuffles

if !(length(x) == 3) && est isa TransferEntropyEstimator
msg = "`LocalPermutationTest` is not defined for pairwise transfer entropy with " *
" `TransferEntropyEstimators`. " *
"Either provide a third timeseries to condition on, or use some other estimator."
throw(ArgumentError(msg))
end
# Below, the T variable also includes any conditional variables.
S, T, T⁺, C = individual_marginals_te(measure.embedding, x...)
TC = StateSpaceSet(T, C)
@assert length(T⁺) == length(S) == length(TC)
N = length(x)

X, Y = T⁺, S
Z = TC # The conditional variable
cmi = te_to_cmi(measure)
= estimate(cmi, est, X, Y, Z)
Îs = permuted_Îs(X, Y, Z, cmi, est, test)
if est isa TransferEntropyEstimator
= estimate(measure, est, S, T, T⁺, C)
Îs = permuted_Îs_te(S, T, T⁺, C, measure, est, test)
else
X, Y = S, T⁺ # The source marginal `S` is the one being shuffled.
Z = TC # The conditional variable
cmi = te_to_cmi(measure)
= estimate(cmi, est, X, Y, Z)
Îs = permuted_Îs(X, Y, Z, cmi, est, test)
end

p = count(Î .<= Îs) / nshuffles
return LocalPermutationTestResult(length(x), Î, Îs, p, nshuffles)
end

# Runge's local permutation test can't be directly translated to transfer entropy specific
# estimators like `Lindner`. However, but we can use a similar principle where
# the source marginal `S` is shuffled according to local closeness in the
# conditional marginal `C`. The `T` and `T⁺` marginals (i.e. all information)
# about the target variable is left untouched.
function permuted_Îs_te(S, T, T⁺, C, measure::TransferEntropy, est, test)
rng, kperm, nshuffles, replace, w = test.rng, test.kperm, test.nshuffles, test.replace, test.w

N = length(S)
test.kperm < N || throw(ArgumentError("kperm must be smaller than input data length"))

# Search for neighbors in the conditional marginal
tree_C = KDTree(C, Chebyshev())
idxs_C = bulkisearch(tree_C, C, NeighborNumber(kperm), Theiler(w))

# Shuffle source marginal `S` based on local closeness in C.
= deepcopy(S)
Nᵢ = MVector{kperm, Int}(zeros(kperm)) # A statically sized copy
πs = shuffle(rng, 1:N)
Îs = zeros(nshuffles)
for n in 1:nshuffles
if replace
shuffle_with_replacement!(Ŝ, S, idxs_C, rng)
else
shuffle_without_replacement!(Ŝ, S, idxs_C, kperm, rng, Nᵢ, πs)
end
Îs[n] = estimate(measure, est, Ŝ, T, T⁺, C)
end
return Îs
end
46 changes: 30 additions & 16 deletions src/methods/infomeasures/transferentropy/estimators/Lindner.jl
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ also used in the Trentool MATLAB toolbox, and is based on nearest neighbor searc
during neighbor searches (defaults to `0`, meaning that only the point itself is excluded
when searching for neighbours).
The estimator can be used both for pairwise and conditional transfer entropy estimation.
## Description
For a given points in the joint embedding space `jᵢ`, this estimator first computes the
Expand All @@ -32,7 +34,8 @@ TE(X \\to Y) =
```
where the index `k` references the three marginal subspaces `T`, `TTf` and `ST` for which
neighbor searches are performed.
neighbor searches are performed. Here this estimator has been modified to allow for
conditioning too (a simple modification to [Lindner2011](@citet)'s equation 5 and 6).
"""
Base.@kwdef struct Lindner{B} <: TransferEntropyEstimator
k::Int = 2 # number of neighbors in joint space.
Expand Down Expand Up @@ -61,11 +64,22 @@ function estimate(measure::TEShannon, est::Lindner,
C::AbstractStateSpaceSet)
(; k, w, base) = est

joint = StateSpaceSet(S, T, T⁺, C)
ST = StateSpaceSet(S, T, C)
TT⁺ = StateSpaceSet(T, T⁺, C)
T = StateSpaceSet(T, C)
# This layer ensures that the number of `StateSpaceSet`s that must be
# constructed is minimal when doing e.g. surrogate testing (then,
# `S` is the only marginal changing).
TT⁺C = StateSpaceSet(T, T⁺, C)
TC = StateSpaceSet(T, C)
return estimate_with_premade_embeddings(measure, est, S, TT⁺C, TC)
end

function estimate_with_premade_embeddings(measure::TEShannon, est::Lindner,
S::AbstractStateSpaceSet,
TT⁺C::AbstractStateSpaceSet,
TC::AbstractStateSpaceSet)
(; k, w, base) = est

joint = StateSpaceSet(S, TT⁺C)
STC = StateSpaceSet(S, TC)
N = length(joint)
W = Theiler(w)
metric = Chebyshev()
Expand All @@ -75,19 +89,19 @@ function estimate(measure::TEShannon, est::Lindner,
# points within distance `ds[i]` from the point. Then count, for each point in each
# of the marginals, how many neighbors each `xᵢ` has given `ds[i]`.
ds = last.(ds_joint) # only care about distance to the k-th neighbor
tree_ST = KDTree(ST, metric)
tree_TT⁺ = KDTree(TT⁺, metric)
tree_T = KDTree(T, metric)
nns_ST = [isearch(tree_ST, pᵢ, WithinRange(ds[i])) for (i, pᵢ) in enumerate(ST)]
nns_TT⁺ = [isearch(tree_TT⁺, pᵢ, WithinRange(ds[i])) for (i, pᵢ) in enumerate(TT⁺)]
nns_T = [isearch(tree_T, pᵢ, WithinRange(ds[i])) for (i, pᵢ) in enumerate(T)]

n_ST = length.(nns_ST)
n_TT⁺ = length.(nns_TT⁺)
n_T = length.(nns_T)
tree_STC = KDTree(STC, metric)
tree_TT⁺C = KDTree(TT⁺C, metric)
tree_TC = KDTree(TC, metric)
nns_STC = [isearch(tree_STC, pᵢ, WithinRange(ds[i])) for (i, pᵢ) in enumerate(STC)]
nns_TT⁺C = [isearch(tree_TT⁺C, pᵢ, WithinRange(ds[i])) for (i, pᵢ) in enumerate(TT⁺C)]
nns_TC = [isearch(tree_TC, pᵢ, WithinRange(ds[i])) for (i, pᵢ) in enumerate(TC)]

n_STC = length.(nns_STC)
n_TT⁺C = length.(nns_TT⁺C)
n_TC = length.(nns_TC)
te = 0.0
for i = 1:N
te += digamma(n_T[i] + 1) - digamma(n_TT⁺[i] + 1) - digamma(n_ST[i])
te += digamma(n_TC[i] + 1) - digamma(n_TT⁺C[i] + 1) - digamma(n_STC[i])
end
te /= N
# The "unit" is nats
Expand Down
16 changes: 13 additions & 3 deletions src/methods/infomeasures/transferentropy/estimators/Zhu1.jl
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,10 @@ export Zhu1
The `Zhu1` transfer entropy estimator [Zhu2015](@cite).
Assumes that the input data have been normalized as described in (Zhu et al., 2015).
Assumes that the input data have been normalized as described in [Zhu2015](@citet).
The estimator can be used both for pairwise and conditional transfer entropy.
## Description
This estimator approximates probabilities within hyperrectangles
surrounding each point `xᵢ ∈ x` using using `k` nearest neighbor searches. However,
Expand Down Expand Up @@ -102,10 +105,17 @@ end

function mean_volumes(vols_joint, vols_ST, vols_TT⁺, vols_T, N::Int)
vol = 0.0
n_ignore = 0
for i = 1:N
vol += log((vols_TT⁺[i] * vols_ST[i]) / (vols_joint[i] * vols_T[i]))
num = vols_TT⁺[i] * vols_ST[i]
den = vols_joint[i] * vols_T[i]
if den != 0
vol += log(num / den)
else
n_ignore += 1
end
end
return vol / N
return vol / (N - n_ignore)
end

function mean_digamma(ks_ST, ks_TT⁺, ks_T, k::Int, N::Int,
Expand Down
32 changes: 32 additions & 0 deletions test/independence/LocalPermutationTest/api.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
using Test
using CausalityTools
using StableRNGs

rng = StableRNG(123)
x, y, z = rand(rng, 30), rand(rng, 30), rand(rng, 30)

independence_test = LocalPermutationTest(CMIShannon(), FPVP())
# We should get back a convenience wrapper containing the result.
res = independence(independence_test, x, z, y)
@test res isa LocalPermutationTestResult

# We should be able to compute p-values for the result.
@test pvalue(res) isa Real
@test pvalue(res) 0

# Only conditional analyses are possible, meaning that we need three inputs.
# Pairwise analyses won't work, because only two inputs are given.
@test_throws ArgumentError independence(independence_test, x, y)

# Sampling with/without replacement
test_cmi_replace = LocalPermutationTest(CMIShannon(), FPVP(), replace = true)
test_cmi_nonreplace = LocalPermutationTest(CMIShannon(), FPVP(), replace = false)
@test independence(test_cmi_replace, x, y, z) isa LocalPermutationTestResult
@test independence(test_cmi_nonreplace, x, y, z) isa LocalPermutationTestResult

# Measure definition AND estimator must be provided for info measures
@test_throws ArgumentError LocalPermutationTest(TEShannon()) # estimator needed

# The number of local neighbors can't exceed the number of input datapoints
test_kperm_toolarge = LocalPermutationTest(CMIShannon(), FPVP(); kperm = 200, rng)
@test_throws ArgumentError independence(test_kperm_toolarge, x, y, z)
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
using Test
using CausalityTools
using StableRNGs

rng = StableRNG(123)
x, y, z = rand(rng, 30), rand(rng, 30), rand(rng, 30)

X = StateSpaceSet(x)
Y = StateSpaceSet(y)
Z = StateSpaceSet(z)

nshuffles = 5
lptest_sp = LocalPermutationTest(CMIShannon(), SymbolicPermutation(); nshuffles, rng)
lptest_vh = LocalPermutationTest(CMIShannon(), ValueHistogram(4); nshuffles, rng)
lptest_dp = LocalPermutationTest(CMIShannon(), Dispersion(); nshuffles, rng)
@test independence(lptest_sp, x, y, z) isa LocalPermutationTestResult
@test independence(lptest_vh, x, y, z) isa LocalPermutationTestResult
@test independence(lptest_dp, x, y, z) isa LocalPermutationTestResult
@test independence(lptest_sp, X, Y, Z) isa LocalPermutationTestResult
@test independence(lptest_vh, X, Y, Z) isa LocalPermutationTestResult
@test independence(lptest_dp, X, Y, Z) isa LocalPermutationTestResult
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
using Test
using CausalityTools
using StableRNGs

rng = StableRNG(123)
x, y, z = rand(rng, 30), rand(rng, 30), rand(rng, 30)

independence_test = LocalPermutationTest(DistanceCorrelation())
@test independence(independence_test, x, y, z) isa LocalPermutationTestResult
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
include("api.jl")

# Measure-specific implementations. One file per method that is listed in the docstring
# of `LocalPermutationTest`.
include("conditional_mutual_information.jl")
include("part_mutual_information.jl")
include("transferentropy.jl")
include("partial_correlation.jl")
include("distance_correlation.jl")
Original file line number Diff line number Diff line change
@@ -1,49 +1,10 @@

using Test
using CausalityTools
using StableRNGs
rng = StableRNG(123)
x, y, z = rand(rng, 100), rand(rng, 100), rand(rng, 100)

test_cmi_replace = LocalPermutationTest(CMIShannon(), FPVP())
test_cmi_nonreplace = LocalPermutationTest(CMIShannon(), FPVP())

test_teshannon = LocalPermutationTest(TEShannon(), FPVP())
@test_throws ArgumentError LocalPermutationTest(TEShannon()) # estimator needed

@test independence(test_cmi_replace, x, y, z) isa LocalPermutationTestResult
@test independence(test_cmi_nonreplace, x, y, z) isa LocalPermutationTestResult

@test independence(test_teshannon, x, y, z) isa LocalPermutationTestResult

test_kperm_toolarge = LocalPermutationTest(CMIShannon(), FPVP(); kperm = 200, rng)
@test_throws ArgumentError independence(test_kperm_toolarge, x, y, z)

# CMI
# ------------------------
# Independence tests
x = rand(rng, 50)
y = rand(rng, 50)
z = rand(rng, 50)
X = StateSpaceSet(x)
Y = StateSpaceSet(y)
Z = StateSpaceSet(z)

nshuffles = 5
lptest_sp = LocalPermutationTest(CMIShannon(), SymbolicPermutation(); nshuffles, rng)
lptest_vh = LocalPermutationTest(CMIShannon(), ValueHistogram(4); nshuffles, rng)
lptest_dp = LocalPermutationTest(CMIShannon(), Dispersion(); nshuffles, rng)
@test independence(lptest_sp, x, y, z) isa LocalPermutationTestResult
@test independence(lptest_vh, x, y, z) isa LocalPermutationTestResult
@test independence(lptest_dp, x, y, z) isa LocalPermutationTestResult
@test independence(lptest_sp, X, Y, Z) isa LocalPermutationTestResult
@test independence(lptest_vh, X, Y, Z) isa LocalPermutationTestResult
@test independence(lptest_dp, X, Y, Z) isa LocalPermutationTestResult
rng = StableRNG(123)
x, y, z = rand(rng, 30), rand(rng, 30), rand(rng, 30)

# Part mutual information
# ------------------------
# Independence tests
x = rand(rng, 50)
y = rand(rng, 50)
z = rand(rng, 50)
X = StateSpaceSet(x)
Y = StateSpaceSet(y)
Z = StateSpaceSet(z)
Expand Down
9 changes: 9 additions & 0 deletions test/independence/LocalPermutationTest/partial_correlation.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
using Test
using CausalityTools
using StableRNGs

rng = StableRNG(123)
x, y, z = rand(rng, 30), rand(rng, 30), rand(rng, 30)

independence_test = LocalPermutationTest(PartialCorrelation())
@test independence(independence_test, x, y, z) isa LocalPermutationTestResult
Loading

2 comments on commit 477bd4c

@kahaaga
Copy link
Member Author

@kahaaga kahaaga commented on 477bd4c Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuliaRegistrator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registration pull request created: JuliaRegistries/General/92761

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v2.9.2 -m "<description of version>" 477bd4c5106f59d592f7a7338d0f11a6cf294269
git push origin v2.9.2

Please sign in to comment.