Latent Multimodal Functional Graphical Model Estimation

This repository implements the method developed in Latent Multimodal Functional Graphical Model Estimation.

This form documents the artifacts associated with the article (i.e., the data and code supporting the computational findings) and describes how to reproduce the findings.

Part 1: Data

  • This paper does not involve analysis of external data (i.e., no data are used or the only data are generated by the authors via simulation in their code).
  • I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

Abstract

The data consist of simulated data and real concurrent EEG-fMRI data. The simulated data and the code that generates them are available. We provide generation code for four types of simulated graphs under two noise models, as detailed in Appendix K. We will provide a download link for a simulated dataset covering these four graph types, with sample size 100 and dimensions 50, 100, and 150. The real data are available upon request. Since we do not own the original dataset of concurrent EEG-fMRI measurements, we kindly ask that requests be sent to the original authors referenced in the manuscript.

Availability

  • Data are publicly available.
  • Data cannot be made publicly available.

If the data are publicly available, see the Publicly available data section. Otherwise, see the Non-publicly available data section, below.

Publicly available data

  • Data are available online at: here

  • Data are available as part of the paper’s supplementary material.

  • Data are publicly available by request, following the process described here:

  • Data are or will be made available through some other mechanism, described here:

The data we use are originally from Morillon et al. (2010). We have contacted the data owner, Anne-Lise Giraud, about data sharing, and she has agreed to share the data with individuals who request it. Please contact Anne-Lise Giraud (email: anne-lise.giraud-mamessier@pasteur.fr) to request the data. The simulated data are available at the link above. Part of the simulated data, i.e., the data used for the sample complexity experiments in Section 7.2 of the manuscript, is not available online because it is too large (~1TB). The data generation code is provided, however, so practitioners can generate these data on their own.

Reference

Morillon, B., Lehongre, K., Frackowiak, R. S., Ducorps, A., Kleinschmidt, A., Poeppel, D., & Giraud, A. L. (2010). Neurophysiological origin of human brain asymmetry for speech and language. Proceedings of the National Academy of Sciences, 107(43), 18688-18693.

Non-publicly available data

Description

File format(s)

  • CSV or other plain text.
  • Software-specific binary format (.Rda, Python pickle, etc.): pickle
  • Standardized binary format (e.g., netCDF, HDF5, etc.):
  • Other (please specify):
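
The simulated data are stored as Python pickle files. As a minimal sketch, the snippet below shows one way to open such a file and get a quick summary of what it holds before consulting the data dictionary. The helper name `describe_pickle`, the file name `example_data.pkl`, and the dictionary layout in the demo are hypothetical and do not describe the actual files.

```python
import pickle
import tempfile
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return (object, one-line summary)."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)  # only unpickle files from a source you trust
    if isinstance(obj, dict):
        summary = "dict with keys: " + ", ".join(sorted(map(str, obj)))
    else:
        summary = type(obj).__name__
    return obj, summary


# Demo on a throwaway file with a made-up layout (an assumption, not the real format).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "example_data.pkl"
    with open(demo_path, "wb") as fh:
        pickle.dump({"samples": [[0.1, 0.2], [0.3, 0.4]], "n": 2}, fh)
    loaded, summary = describe_pickle(demo_path)

print(summary)
```

To inspect a real file, pass its path to `describe_pickle` instead of the demo path.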

Data dictionary

  • Provided by authors in the following file(s): Under the directory data/README.md
  • Data file(s) is(are) self-describing (e.g., netCDF files)
  • Available at the following URL:

Additional Information (optional)

Part 2: Code

Abstract

The code contains source files and testing files. We briefly outline the content of each directory. The code/synth_data directory contains files to generate synthetic data and script files to generate synthetic data in batches. The code/src directory contains all the source code. We do not provide code for the competing methods, as we do not own it. Under code/tests, the notebook directory contains step-by-step instructions and visualization code, and the script directory contains the execution scripts. The code/experiments directory contains the data preprocessing code and graph estimation code for the real data.

Description

Code format(s)

  • Script files
    • R
    • Python
    • Matlab
    • Other:
  • Package
    • R
    • Python
    • MATLAB toolbox
    • Other:
  • Reproducible report
    • R Markdown
    • Jupyter notebook
    • Other:
  • Shell script
  • Other (please specify):

Supporting software requirements

Version of primary software used

R version 3.6.0; Python version 3.7.3

Libraries and dependencies used by the code

  • R-packages
    • wordspace_0.2-6
    • fields_12.5
    • viridis_0.6.1
    • viridisLite_0.4.0
    • spam_2.7-0
    • dotCall64_1.0-1
    • plotly_4.10.0
    • ggplot2_3.3.5
    • pracma_2.3.3
    • R.matlab_3.6.2
    • far_0.6-5
    • nlme_3.1-139
    • matrixcalc_1.0-5
    • poweRlaw_0.70.6
    • fgm_1.0
    • mvtnorm_1.1-2
    • fda_5.4.0
    • deSolve_1.30
    • fds_1.8
    • RCurl_1.98-1.5
    • rainbow_3.6
    • pcaPP_1.9-74
    • MASS_7.3-51.3
    • Matrix_1.2-17
    • RSpectra_0.16-0
    • doParallel_1.0.16
    • iterators_1.0.10
    • foreach_1.4.4
  • Python packages
    • numpy_1.19.1
    • scipy_1.5.2
    • pathos_0.2.8
    • matplotlib_3.4.3
    • multiprocessing_0.70.12.2
    • nilearn_0.9.0
    • rpy2_2.9.4
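
The Python entries above are written as `name_version`. As a convenience, the sketch below converts such entries into pip-style `name==version` pins, which could then be saved to a requirements file. The helper name `to_pip_pin` is hypothetical, and only a few entries from the list are shown.

```python
def to_pip_pin(entry):
    """Convert 'name_version' (as listed above) into 'name==version'."""
    name, sep, version = entry.rpartition("_")  # split on the last underscore
    if not sep or not name or not version:
        raise ValueError(f"not in name_version form: {entry!r}")
    return f"{name}=={version}"


# A few entries copied from the list above; extend as needed.
listed = ["numpy_1.19.1", "scipy_1.5.2", "pathos_0.2.8",
          "matplotlib_3.4.3", "nilearn_0.9.0", "rpy2_2.9.4"]
pins = [to_pip_pin(entry) for entry in listed]
print("\n".join(pins))
```

Writing these lines to a file and installing it with `pip install -r` inside a Python 3.7 environment is one way to approximate the original setup.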

Supporting system/hardware requirements (optional)

Platform: x86_64-conda_cos6-linux-gnu (64-bit). Running under: CentOS Linux 7 (Core).

All experiments are run on a cluster and no GPUs are required. Any single run of the experiment can be run on a standalone desktop. However, if practitioners want to test different variables to generate ROC curves, it is highly recommended to use a cluster.

Parallelization used

  • No parallel code used
  • Multi-core parallelization on a single machine/node
    • Number of cores used: 8-160
  • Multi-machine/multi-node parallelization
    • Number of nodes and cores used:

License

  • MIT License (default)
  • BSD
  • GPL v3.0
  • Creative Commons
  • Other: (please specify)

Additional information (optional)

Part 3: Reproducibility workflow

Scope

The provided workflow reproduces:

  • Any numbers provided in text in the paper
  • The computational method(s) presented in the paper (i.e., code is provided that implements the method(s))
  • All tables and figures in the paper
  • Selected tables and figures in the paper, as explained and justified below:

Workflow

Location

The workflow is available:

  • As part of the paper’s supplementary material.
  • In this Git repository: The Git repository will be made public if the paper is accepted. For now, we include the contents under the directory code/
  • Other (please specify):

Format(s)

  • Single master code file
  • Wrapper (shell) script(s)
  • Self-contained R Markdown file, Jupyter notebook, or other literate programming approach
  • Text file (e.g., a readme-style file) that documents workflow
  • Makefile
  • Other (more detail in Instructions below)

Instructions

Each simulated experiment consists of three steps: (i) generate the simulated data, (ii) run the proposed algorithm (with variable selection), and (iii) visualize the results. The code for the first step is under the directory code/synth_data. The files for the second step are under code/tests; alternatively, the proposed algorithm can be run in batch using the script files in code/tests. The tools to visualize the results are under code/tests/notebook. Each directory also contains a README.md file with more detailed instructions.

Figure 2/Table 3
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N2.sh
    • Data download link. Store the files under data_batch_N2
  • Estimation
    • Script file: ./code/tests/script/noise_model_2/*N100.sh. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method. The directory /comparison/ contains the results of the other comparison methods.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instruction to generate table: to print the AUC and AUC15, set verbose=True
Figure 3
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_sample.sh
  • Estimation
    • Script file: ./code/tests/script/sample_sample/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. Please download the directories ./p50, ./p100, and ./p150
    • Visualization notebook: /code/notebook/plot_SampleComplexity.ipynb
Figure 5/Table 2
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N1.sh
    • Data download link. Store the files under data_batch_N1
  • Estimation
    • Script file: ./code/tests/script/noise_model_1/*N100.sh. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method. The directory /comparison/ contains the results of the other comparison methods.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instruction to generate table: to print the AUC and AUC15, set verbose=True
Figure 6/Table 4
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N2.sh
    • Data download link. Store the files under data_batch_N2
  • Estimation
    • Script file: ./code/tests/script/noise_model_2/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instruction to generate table: to print the AUC and AUC15, set verbose=True
Figure 7/Table 5
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N1.sh
    • Data download link. Store the files under data_batch_N1
  • Estimation
    • Script file: ./code/tests/script/noise_model_1/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instruction to generate table: to print the AUC and AUC15, set verbose=True
Figure 8
  • Data preparation
    • Data generation script: /code/synth_data/run_dgp_k.sh
    • Data download link
  • Estimation
    • Script file: ./code/tests/script/sample_k/.
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_SampleComplexity_2.ipynb.
Figure 9
  • Estimation
    • Run /code/tests/notebook/plot_elbo.ipynb and save the result
  • Visualization
    • Visualization notebook: /code/tests/notebook/plot_elbo2.ipynb
Figure 10
  • Estimation
    • Run /code/tests/notebook/plot_elbo.ipynb and save the result
  • Visualization
    • Visualization notebook: /code/tests/notebook/plot_elbo2.ipynb
Figure 11
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N2.sh
    • Data download link. Store the files under data_batch_N2
  • Estimation
    • Script file: ./code/tests/script/noise_model_2/. Modify the script file to specify the conda environment, file path, and save path
  • Visualization
    • Result download link
    • Visualization notebook: /code/tests/notebook/plot_VariableSelection.ipynb
Figure 12
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_sample.sh
  • Estimation
    • Script file: ./code/tests/script/sample_alpha/.
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_SampleComplexity_2.ipynb
Figure 13/Table 6
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_kmk.sh
    • Data download link
  • Estimation
    • Script file: ./code/tests/script/noise_model1_varykmk/
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instruction to generate table: to print the AUC and AUC15, set verbose=True
Figure 14/Table 7
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_kmk.sh
    • Data download link
  • Estimation
    • Script file: ./code/tests/script/noise_model1_varykmk/
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instruction to generate table: to print the AUC and AUC15, set verbose=True
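
Several of the notebooks above print AUC and AUC15 when `verbose=True`. For readers unfamiliar with these summaries, here is a minimal sketch of how they can be computed from ROC points with the trapezoid rule. It assumes that AUC15 denotes the area under the ROC curve restricted to false positive rates in [0, 0.15], rescaled so that its maximum is 1; the definition in the manuscript takes precedence. The function name `partial_auc` is hypothetical and is not part of this repository.

```python
def partial_auc(fpr, tpr, max_fpr=1.0):
    """Trapezoid-rule area under ROC points for FPR in [0, max_fpr],
    divided by max_fpr so that a perfect curve scores 1.

    Assumption: this normalization is one common convention for AUC15.
    If the curve stops before max_fpr, only the observed part is integrated.
    """
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cutoff and stop there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


# Chance-level ROC curve (the diagonal) as a sanity check.
chance_fpr, chance_tpr = [0.0, 1.0], [0.0, 1.0]
auc = partial_auc(chance_fpr, chance_tpr)
auc15 = partial_auc(chance_fpr, chance_tpr, max_fpr=0.15)
print(f"AUC = {auc:.3f}, AUC15 = {auc15:.3f}")
```

Under this convention a perfect ROC curve gives 1 for both quantities, while the diagonal gives 0.5 for AUC and a much smaller value for AUC15, which is why AUC15 emphasizes performance at low false positive rates.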

Expected run-time

Approximate time needed to reproduce the analyses on a standard desktop machine:

  • < 1 minute
  • 1-10 minutes
  • 10-60 minutes
  • 1-8 hours
  • > 8 hours
  • Not feasible to run on a desktop machine, as described here: It is safest to run on a cluster as original tests are implemented with parallelization. One can modify the number of cores in the test file to make it suitable for a desktop machine.
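
As a rough illustration of adapting a run to a desktop, the sketch below caps a requested core count at what the machine reports and splits a list of independent runs into one batch per worker. The names `pick_workers` and `split_batches`, and the example grid, are hypothetical; the repository's own test files define how the number of cores is actually set.

```python
import os


def pick_workers(requested, available=None):
    """Return a worker count no larger than the cores available."""
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


def split_batches(items, n_workers):
    """Distribute items round-robin into n_workers batches."""
    return [items[i::n_workers] for i in range(n_workers)]


# Hypothetical case: 10 independent runs, 160 cores requested, 8 available.
grid = list(range(10))
workers = pick_workers(requested=160, available=8)
batches = split_batches(grid, workers)
print(workers, batches)
```

Each batch could then be handed to one worker process; with fewer cores the total run time grows roughly in proportion.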

Additional information (optional)

We provide a demo example that can be run on a standard desktop. Please see /code/README.md for further instructions.

Notes (optional)