This repository documentation is used to explain the model in the papar by Kang, Hakmook, et al. "A bayesian double fusion model for resting-state brain connectivity using joint functional and structural data." Brain connectivity 7.4 (2017): 219-227. Since GitHub doest not render the equation in Markdown, you can read the Readme in HTML or slides.
Our brain network, as a complex integrative system, consists of many different regions that have each own task and function and simultaneously share structural and functional information. With the developed imaging techniques such as functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI), researchers can investigate the underlying brain functions related to human behaviors and some diseases or disorders in the nervous system such as major depressive disorder (MDD).
We developed a Bayesian hierarchical spatiotemporal model that combined fMRI and DTI data jointly to enhance the estimation of resting-state functional connectivity. Structural connectivity from DTI data was utilized to construct an informative prior for functional connectivity based on resting-state fMRI data through the Cholesky decomposition in a mixture model. The analysis took the advantages of probabilistic programming package as PyMC3 and next-generation Markov Chain Monte Carlo (MCMC) sampling algorithm as No-U-Turn Sampler (NUTS). PyMC3 is new, open-source framework with a readable but powerful syntax close to the natural syntax statisticians will use to describe models. NUTS avoids the random walk behavior and sensitivity to correlated parameters by taking a series of steps informed by first-order gradient information. In other words, the NUTS method is a self-tuning variant of Hamiltonian Monte Carlo (HMC).
To install the Python packages for the project, clone the repository and run:
conda env create -f environment.yml
from inside the cloned directory. This assumes that Anaconda Python is installed. And theanorc
file can follows as:
# cpu
[global]
floatX = float 32
config.compile.timeout = 1000
# gpu
[global]
floatX = float32
config.compile.timeout = 1000
device = cuda0
For Windows users, environmental variables need to be added in the system setting to use conda
command in the terminal and theano
package in Python environment. C++ compiler such as Cygwin needs to be installed before running pymc3
and theano
in Windows if it does not exist.
To run the model:
# activate the environment
source activate double-fusion
# run the model
python model.py
We also add some examples using CPU/GPU on Vanderbilt Advanced Computing Center for Research and Education (ACCRE).
In a resting-state fMRI study, we define the time-series data at voxel
In the formula above,
Note that the voxel-specific random effect
This kernel function can be any valid spatial covariance function like the following:
In our model, we apply the exponential function to represent the covariance structure between voxels:
Finally,
where
Our goal is to estimate each functional connectivity through its corresponding posterior distribution. To obtain the posterior distribution, each component in the spatiotemporal structure from last section can be rewritten as a hierarchical structure in one ROI level:
$\boldsymbol{Y}{cv}$ denotes a vector ($(1 \times V)$) of signals at each voxel as $[Y{c1}(t), Y_{c2}(t),...,Y_{cv}(t)]^T$. We use
In details, each term
We combine the structural connectivity and naive functional connectivity together because the effect of direct structural connectivity is different from that of indirect structural connectivity. For example, relatively lower values of structural connectivity imply no direct correlated pathways between two ROIs. However, it is likely that there exist indirect structural connections between two ROIs, resulting in high functional coupling. And the low structural connectivity will indicate very low functional connectivity if no structural connection exists. Therefore, we should treat the indirect structural connectivity differently from direct structural connectivity in the fusion step.
Then the prior distribution of the covariance matrix
The elements in the upper triangular part can be vectorized as:
$$[\rho_{12},...,\rho_{1n}, \rho_{23},..., \rho_{2n},..., \rho_{(n-1)n}]{n{vec}}$$
The total number of estimation is
Because we have no prior information about the values of each parameter, we decide to apply uninformative priors. In the exponential function, we assume the corresponding parameters follow: