Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Compression complexity causality #162

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from
Draft

Conversation

kahaaga
Copy link
Member

@kahaaga kahaaga commented Aug 19, 2022

What is this PR?

This PR implements the effort-to-compress algorithm that underlies the Compression-Complexity Causality (CCC) in #161.

It also implements the CCC algorithm in #161.

TODO

  • Uni- and multivariate effort-to-compress.
  • Joint effort-to-compress.
  • CCC univariate
  • CCC multivariate

Interface for compression complexity

I've decided on a dispatch-on-algorithm approach to allow a cleaner interface in the case that future compression complexity measures (e.g. Lempel-Siv complexity) are added.

The syntax is compression_complexity(x, algorithm::CompressionComplexityAlgorithm).

Input time series x (and y, for joint estimation) must always be pre-symbolized.

Effort-to-compress (ETC)

I've done an implementation of the ETC algorithm from scratch, based on the Nagaraj (2013; for single time series) and Kathpalia & Nagaraj (2019; for joint estimation). ETC is the only compression complexity measure included in this PR.

The syntax is:

  • Regular ETC: compression_complexity(x, EffortToCompress()). Computed on a single univariate (AbstractVector{Integer}) or multivariate (AbstractDataset{D, Integer}) pre-symbolized time series x.
  • Joint ETC: compression_complexity(x, y, EffortToCompress()). Also computed on pre-binned time series x and y.

See the tests for some examples.

Performance

My initial implementation of ETC is pretty naive - it uses dictionaries for histogram estimation. The algorithm can probably be optimised substantially by using pre-allocated arrays and clever indexing during the recursion step. However, at this stage I'm keeping it simple, just to have a working algorithm. I haven't looked at the MATLAB/Python implementations yet, so not sure what the speed-ups could be. This will have to be tested at some point.

Testing

Tests are based on examples from the papers linked in #161, and by comparison to results from the MATLAB scripts provided in the references given in the original papers. Additional tests for obvious edge cases and examples have also been added to the test suite.

All tests pass, so I'm fairly sure the implementation is correct.

Below is a reproduction of figure 3 in Nagaraj (2013), which compares the (approximate) Lyapunov exponent and compression complexity (ETC) of 200-pt long logistic map time series with varying bifurcation parameters. For ETC, the time series was symbolized to a binary time series by splitting above and below x = 0.5. Lyapunov exponents are estimated directly on the time series.

Screenshot 2022-08-20 at 02 17 28

@codecov-commenter
Copy link

codecov-commenter commented Aug 19, 2022

Codecov Report

Merging #162 (c0fd2b9) into master (f1bc2b7) will increase coverage by 21.85%.
The diff coverage is 93.63%.

❗ Current head c0fd2b9 differs from pull request most recent head 9418012. Consider uploading reports for the commit 9418012 to get more accurate results

@@             Coverage Diff             @@
##           master     #162       +/-   ##
===========================================
+ Coverage   53.75%   75.60%   +21.85%     
===========================================
  Files         154       48      -106     
  Lines        4757     1025     -3732     
===========================================
- Hits         2557      775     -1782     
+ Misses       2200      250     -1950     
Impacted Files Coverage Δ
src/CausalityTools.jl 0.00% <ø> (ø)
...sures/dynamical_complexity/dynamical_complexity.jl 58.62% <58.62%> (ø)
...yCausality/ccc/compression_complexity_causality.jl 88.23% <88.23%> (ø)
...on_complexity/effort_to_compress/etc_algorithms.jl 100.00% <100.00%> (ø)
...ression_complexity/effort_to_compress/etc_joint.jl 100.00% <100.00%> (ø)
..._complexity/effort_to_compress/etc_multivariate.jl 100.00% <100.00%> (ø)
...on_complexity/effort_to_compress/etc_univariate.jl 100.00% <100.00%> (ø)
...ression_complexity/effort_to_compress/etc_utils.jl 100.00% <100.00%> (ø)
...ventionalComplexityCausality/ccc/ccc_estimators.jl 100.00% <100.00%> (ø)
src/utils/sliding_window.jl 100.00% <100.00%> (ø)

... and 190 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Datseris
Copy link
Member

perhaps @AditiKathpalia is interested to review this PR?

@Datseris
Copy link
Member

(I'm super swamped at the moment, but will return here in a couple of months)

@kahaaga
Copy link
Member Author

kahaaga commented Aug 19, 2022

perhaps @AditiKathpalia is interested to review this PR?

That would be awesome!

If you're interested @AditiKathpalia, let me know, and I'll tag you here once it's ready for review.

* Fix argument name


Doc wording


Simplify univariate ETC


Print statements commented out


Remove unneeded tests


Fix tests


Remove tests from scripts

* Simplify, and add joint ETC


Clean code and tests


Function signature

* Update docs

* Sliding window joint ETC
@kahaaga kahaaga marked this pull request as ready for review August 20, 2022 00:28
@kahaaga kahaaga changed the title WIP: Compression complexity for univariate and multivariate time series WIP: Compression complexity causality Aug 20, 2022
@kahaaga kahaaga marked this pull request as draft August 20, 2022 14:38
@AditiKathpalia
Copy link

Yes, sure, I will review it. Thanks for implementing this.

@kahaaga
Copy link
Member Author

kahaaga commented Aug 23, 2022

Hey @AditiKathpalia!

I'm trying to implement the full CCC algorithm, but when trying to compare my results with the implementation in the MATLAB toolbox from the supplement of the paper, I'm not able to run the examples in the code.

The documentation for the conditional_CCC function references a Main.m file, which is missing, so I can't run the given examples.

I also tried to follow the instructions in the function documentation and run an example myself:

N = 1000
INFO_CCC = struct('L',150,'w',15,'delta',80,'B',2,'N',N)
[y1, y2] = coupled_AR(N);
[CCC_value, k] = conditional_CCC([y1; y2], 1, 2, INFO_CCC)

but get the error

Index in position 2 exceeds array bounds. Index must not exceed 1.

Error in conditional_CCC (line 61)
        ETC_ini(k)=ETC(data_reduced(:,i:i+LEN_past-1),Num_bins);

I could spend more time deciphering the MATLAB code and figuring out what's wrong (or what I'm misunderstanding in the documentation string). However, given that there are files missing from the toolbox, I think it would be much faster if you provided an up-to-date, working example that I can use as a basis for comparison with my implementation.

Do you have an example of a simple CCC(x -> y) computation between two univariate time series x and y, using the code in MATLAB toolbox (as dowloaded from the paper)?

You can just post code snippet in a comment here, and I'll figure out the rest.

EDIT: By the way, I'm running MATLAB version 2021b - if that should matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants