Automatic pipeline for processing Hi-C data with different methods.

dulcirena/TAD-triforce

TRIFORCE

TRIFORCE is an automated pipeline for processing Hi-C data with different methods.

Developed by: Dulce I. Valdivia + Luis Delaye + Kasia Oktaba

Installation

Clone this repository in your working environment:

git clone https://github.com/dulcirena/TAD-triforce.git

Software Requirements

  1. Juicer
  2. HiCExplorer
  3. R
  4. R packages:
    • tidyverse
    • dplyr
    • strucchangeRcpp
    • plotly
    • ggpubr
    • scico
  5. bedtools
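
Before running the pipeline, it can help to confirm that the required command-line tools are on your PATH. The following is a minimal sketch, not part of the repository: the function name `missing_tools` is hypothetical, and the command names in the example (`Rscript`, `bedtools`, `hicFindTADs`) are assumptions about how the tools are exposed in your installation.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when all of them are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (command names are assumptions; adjust to your setup):
# missing_tools Rscript bedtools hicFindTADs
```

Juicer and the R packages are not covered by this check, since how they are installed and invoked varies between environments.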

Usage:

From fastq raw data to topologically associated domains using Arrowhead and HiCExplorer

Work in progress

Compute consensus TADs using the TRIFORCE

Once you have run Arrowhead and HiCExplorer to obtain their corresponding TAD annotation files, use TRIFORCE to obtain a consensus set of high-confidence TADs.

To run the complete workflow for TAD annotation:

./triforce.sh  <WORK_DIR> \
		<FILES_FASTQ_DIR> <TAD_SEP_SCORE> \
		<RESOLUTION_KB> <PROJECT_NAME> <DISMISS_CHR> \
		<FILE_ARROWHEAD_TADS> <FILE_HICEXPLORER_TADS>    

Description:

  • WORK_DIR: Directory where the out/ directory will be created.
  • FILES_FASTQ_DIR: Directory where the fastq files are located (this step is still in progress, so any path can be given).
  • TAD_SEP_SCORE: TAD separation score file computed with HiCExplorer. Should end with tad_score.bm
  • RESOLUTION_KB: Resolution of the matrix in kb.
  • PROJECT_NAME: ID for the project (do not use spaces).
  • DISMISS_CHR: The name of the chromosome you want to dismiss during the analysis.
  • FILE_ARROWHEAD_TADS: Arrowhead's TAD calling file (e.g. 10000_blocks).
  • FILE_HICEXPLORER_TADS: HiCExplorer's TAD calling file. Should end with domains.bed
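
The naming rules in the list above can be checked before launching the pipeline. This is a minimal sketch under stated assumptions: `check_triforce_args` is a hypothetical helper that is not shipped with the repository, and it only tests the file-name endings and the no-spaces rule, not whether the files exist or are valid.

```shell
# Return 0 when the arguments follow the naming rules listed above;
# otherwise print the problem to stderr and return 1.
# Arguments: <TAD_SEP_SCORE> <PROJECT_NAME> <FILE_HICEXPLORER_TADS>
check_triforce_args() {
    case "$1" in
        *tad_score.bm) ;;
        *) echo "TAD_SEP_SCORE should end with tad_score.bm: $1" >&2; return 1 ;;
    esac
    case "$2" in
        ""|*" "*) echo "PROJECT_NAME must be non-empty and contain no spaces: $2" >&2; return 1 ;;
    esac
    case "$3" in
        *domains.bed) ;;
        *) echo "FILE_HICEXPLORER_TADS should end with domains.bed: $3" >&2; return 1 ;;
    esac
    return 0
}

# Example call (all file names here are placeholders):
# check_triforce_args sample_tad_score.bm my_project sample_domains.bed && echo "arguments look fine"
```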

See the file src/run_test.sh for a working example.

Output:

All the outputs are stored in the directory WORK_DIR/out/

  • structure_CHR.html: An interactive file per chromosome (CHR) showing the breakpoints of the TAD separation score (TAD-SS) according to the structural change analysis (SCA). SCA splits the TAD-SS into genomic regions that exhibit similar contact trends. The width of each breakpoint (light green) represents its confidence interval.
  • confidenceIntervalCHR.tsv: The coordinates of the 5% and 95% CI of each breakpoint on each chromosome (CHR). The coordinate of the 50% CI is used in the downstream analysis.
  • avgSCregion_boxplot.html: Boxplot showing the distribution of the average TAD separation score in each SCA-region. Only the regions above the overall median are kept for downstream analysis.
  • domainSizes_files: Distribution of TAD length at the different steps of the analysis. The TADs produced by the majority vote script are usually longer because it merges the regions of consecutive TADs.
  • wd_mountains.bed: High-confidence TADs
  • wd_valleys.bed: Out of TAD regions
  • wd_interestRegions.bed: All regions classified as high-confidence TADs or out of TADs (basically a union of wd_mountains and wd_valleys). Includes a color column for visualization in HiCExplorer.
  • region_type_count.pdf: A plot showing the number of regions in each class (Majority Vote, Fuzzy or Out of TAD). This plot is generated before the majority vote areas are refined into high-confidence TADs.
  • size_class.pdf: A plot showing the total length of the genome classified as High-confidence TADs, TADs between fuzzy regions, Fuzzy regions and Out of TAD regions.
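
For a quick look at the BED outputs listed above, the number of regions and their total length can be computed with awk. This is a minimal sketch that assumes the files follow the standard BED layout (tab-separated, start in column 2, end in column 3); the helper name `bed_summary` is hypothetical and not part of the pipeline.

```shell
# Print the number of regions and their total length in bp for a BED file.
# Assumes tab-separated columns with start in column 2 and end in column 3;
# comment, track and browser lines are skipped.
bed_summary() {
    awk -F '\t' '
        /^(#|track|browser)/ { next }
        NF >= 3 { n++; total += $3 - $2 }
        END { printf "%d regions, %.0f bp\n", n, total }
    ' "$1"
}

# Example call (WORK_DIR is the directory passed to triforce.sh):
# bed_summary "$WORK_DIR/out/wd_mountains.bed"
```

Running the same helper on wd_valleys.bed gives the corresponding figures for out of TAD regions.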
