ncats/RD-Clust

RDClust: Clustering of rare diseases on knowledge graphs

Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing and/or platform-based therapeutic development. Toward that aim, we utilize an integrative knowledge graph-based approach to constructing clusters of rare diseases.

Workflow:

Note: The workflow is designed for, and was executed in, an HPC Slurm cluster environment. For more information, see the example notebooks provided.

The steps to reproduce the workflow are outlined below:

    1. Set up environment and directory:
       bash 00_setup_data.sh
       conda env create -f rdclust.yml
       conda activate rdclust
       pip install -r requirements.txt
    2. Get data: 01_get_public_data.sh
       Note - The GARD data is currently NOT publicly accessible via API; therefore, we provide the necessary datasets in this repository (RD-Clust/data/raw/). When an API is publicly available, the workflow and 01_get_gard_data.sh will be updated.
    3. Pre-process the data: 02_process_ontologies.sh
    4. Generate random walks: 03_walks_array.sh
    5. Generate node embeddings: 04_embeddings_array.sh
    6. Create clustering models: 05_cluster_array.sh
    7. Post-hoc summaries
       • Gene enrichment: 06_calculate_enrichment.sh
       • Clustering metrics: 06_summarize_clusters.sh
       • Walk annotation counts: 06_summarize_walks.sh
       • Semantic similarity: 06_calculate_semantic_similarity.sh
    8. Detailed analysis and visualization in the notebooks directory
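
To make the random-walk step more concrete, here is a minimal sketch of uniform random walks over a small graph. It is an illustration only, not the code in 03_walks_array.sh: the function name `random_walks`, its parameters, and the toy node identifiers are all assumptions, and the repository's walks may be biased or typed differently. Walks like these are typically passed to a skip-gram style model to produce node embeddings, which are then clustered.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node.

    adjacency maps each node to a list of its neighbors.
    (Hypothetical helper for illustration; not part of the repository.)
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy graph with made-up identifiers (not taken from the repository data).
toy_graph = {
    "disease_A": ["gene_1", "phenotype_1"],
    "disease_B": ["gene_1"],
    "gene_1": ["disease_A", "disease_B"],
    "phenotype_1": ["disease_A"],
}

walks = random_walks(toy_graph, walk_length=5, walks_per_node=3)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Each walk is a list of node identifiers, so a collection of walks can be treated like a corpus of sentences when training embeddings.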


  • As a quality check, we randomized the graphs to assess how well disease nodes cluster when their relationships are not based on real knowledge. See the QC directory for the quality control pipeline.
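
One common way to randomize a graph for this kind of check is degree-preserving rewiring with double-edge swaps, sketched below. This is an assumption about how such a randomization could be done, not a description of the pipeline in the QC directory; the function `rewire_edges`, the helper `degrees`, and the toy edges are hypothetical, and the sketch ignores node and edge types.

```python
import random
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node."""
    return Counter(node for edge in edges for node in edge)


def rewire_edges(edges, n_swaps, seed=0):
    """Randomize an undirected graph with double-edge swaps.

    Each swap replaces (a, b), (c, d) with (a, d), (c, b), so every node
    keeps its degree while its neighbors change.
    (Hypothetical helper for illustration; not part of the repository.)
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared node: swap would create a self-loop or change nothing
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Toy edge list with made-up identifiers (not taken from the repository data).
toy_edges = [
    ("d1", "g1"), ("d1", "p1"), ("d2", "g1"),
    ("d2", "g2"), ("d3", "p2"), ("d3", "g2"),
]
randomized = rewire_edges(toy_edges, n_swaps=10)
print(degrees(randomized) == degrees(toy_edges))
```

Running the same walk, embedding, and clustering steps on a rewired graph gives a baseline: clusters that look equally good on the randomized graph are unlikely to reflect real biological knowledge.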
