TRIFORCE

TRIFORCE is an automated pipeline for processing Hi-C data with different methods.

Developed by: Dulce I. Valdivia + Luis Delaye + Kasia Oktaba

Installation

Clone this repository in your working environment:

git clone https://github.com/dulcirena/TAD-triforce.git

Software Requirements

Juicer
HiCExplorer
R
R packages:
- tidyverse
- dplyr
- strucchangeRcpp
- plotly
- ggpubr
- scico
bedtools
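
Before running the pipeline, it can help to confirm that the requirements above are reachable from your shell. The following is a minimal sketch, not part of TRIFORCE: `missing_tools` is a hypothetical helper, and `hicFindTADs` is used as a stand-in for a HiCExplorer installation. Juicer is not checked here because how it is invoked depends on your setup.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TRIFORCE):
# print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Command-line requirements; hicFindTADs is one of the HiCExplorer commands.
missing_tools bedtools Rscript hicFindTADs

# If R is available, report which of the listed R packages cannot be loaded.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'pkgs <- c("tidyverse", "dplyr", "strucchangeRcpp", "plotly", "ggpubr", "scico"); missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]; if (length(missing)) cat("Missing R packages:", missing, "\n")'
fi
```

An empty output means every checked command and package was found.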

Usage:

From raw fastq data to topologically associating domains (TADs) using Arrowhead and HiCExplorer

Work in progress

Compute consensus TADs using the TRIFORCE

Once you have run Arrowhead and HiCExplorer to obtain their corresponding TAD annotation files, use TRIFORCE to obtain a consensus set of high-confidence TADs.

To run the complete workflow for TAD annotation:

./triforce.sh  <WORK_DIR> \
		<FILES_FASTQ_DIR> <TAD_SEP_SCORE> \
		<RESOLUTION_KB> <PROJECT_NAME> <DISMISS_CHR> \
		<FILE_ARROWHEAD_TADS> <FILE_HICEXPLORER_TADS>

Description:

WORK_DIR: Directory where the out/ directory will be created.
FILES_FASTQ_DIR: Directory containing the fastq files (this step is still in progress, so any path can be given).
TAD_SEP_SCORE: TAD separation score file computed with HiCExplorer. Should end with tad_score.bm
RESOLUTION_KB: Resolution of the matrix in kb.
PROJECT_NAME: ID for the project (do not use spaces)
DISMISS_CHR: The name of the chromosome you want to dismiss during the analysis.
FILE_ARROWHEAD_TADS: Arrowhead's TAD calling file (e.g. 10000_blocks)
FILE_HICEXPLORER_TADS: HiCExplorer's TAD calling file. Should end with domains.bed

See the file src/run_test.sh for a working example.
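
The naming rules in the description above can be checked before launching the pipeline. This is a minimal sketch under stated assumptions: `check_triforce_args` is a hypothetical helper that is not part of the repository, and the file names in the example call are placeholders.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of TRIFORCE).
# Arguments: TAD_SEP_SCORE, PROJECT_NAME, FILE_HICEXPLORER_TADS.
check_triforce_args() {
    tad_sep_score=$1
    project_name=$2
    hicexplorer_tads=$3

    # TAD_SEP_SCORE should end with tad_score.bm
    case "$tad_sep_score" in
        *tad_score.bm) ;;
        *) echo "TAD_SEP_SCORE should end with tad_score.bm" >&2; return 1 ;;
    esac

    # PROJECT_NAME must not contain spaces
    case "$project_name" in
        *" "*) echo "PROJECT_NAME must not contain spaces" >&2; return 1 ;;
    esac

    # The HiCExplorer TAD file should end with domains.bed
    case "$hicexplorer_tads" in
        *domains.bed) ;;
        *) echo "HiCExplorer TAD file should end with domains.bed" >&2; return 1 ;;
    esac

    return 0
}

# Placeholder file names, for illustration only.
check_triforce_args "sample_tad_score.bm" "my_project" "sample_domains.bed" \
    && echo "arguments look fine"
```

The function only inspects the strings it is given; it does not check that the files exist.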

Output:

All the outputs are stored in the directory WORK_DIR/out/

structure_CHR.html: An interactive file per chromosome (CHR) showing the breakpoints of the TAD separation score (TAD-SS) according to the structural change analysis (SCA). SCA splits the TAD-SS into genomic regions that exhibit similar contact trends. The width of each breakpoint (light green) represents its confidence interval.
confidenceIntervalCHR.tsv: The coordinates of the 5% and 95% CI for each breakpoint for each CHR. The coordinate of the 50% CI is used in the downstream analysis.
avgSCregion_boxplot.html: Boxplot showing the distribution of the average TAD separation score in each SCA-region. Only the regions above the overall median are kept for downstream analysis.
domainSizes_files: Distribution of TAD length at the different steps of the analysis. The TADs produced by the majority vote script are usually longer because the script merges the regions of consecutive TADs.
wd_mountains.bed: High-confidence TADs
wd_valleys.bed: Out of TAD regions
wd_interestRegions.bed: All regions classified as high-confidence TADs or out of TADs (basically a union of wd_mountains and wd_valleys). Includes a color column for visualization in HiCExplorer.
region_type_count.pdf: A plot showing the number of regions in each class (Majority Vote, Fuzzy or Out of TAD). This plot is made before the refinement of majority vote areas as high-confidence TADs.
size_class.pdf: A plot showing the total length of the genome classified as High-confidence TADs, TADs between fuzzy regions, Fuzzy regions and Out of TAD regions.
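
To get a quick numeric view of domain sizes from one of the BED outputs listed above, the region lengths can be summarised on the command line. This is a minimal sketch, not part of the pipeline: `bed_length_summary` is a hypothetical helper, and it assumes a plain BED layout (chromosome, start, end in the first three columns) with no header lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRIFORCE).
# Summarise region lengths (end - start) of a BED file given as an
# argument or on stdin: number of regions, minimum, median and maximum.
bed_length_summary() {
    awk '{ print $3 - $2 }' "$@" | sort -n | awk '
        { len[NR] = $1 }
        END {
            if (NR == 0) { print "no regions"; exit }
            if (NR % 2) median = len[(NR + 1) / 2]
            else median = (len[NR / 2] + len[NR / 2 + 1]) / 2
            printf "n=%d min=%d median=%.1f max=%d\n", NR, len[1], median, len[NR]
        }'
}

# Toy input with three made-up regions, for illustration only.
printf 'chr1\t0\t10000\nchr1\t10000\t40000\nchr2\t5000\t25000\n' | bed_length_summary
# prints: n=3 min=10000 median=20000.0 max=30000
```

On real results the call would look like `bed_length_summary WORK_DIR/out/wd_mountains.bed`, with WORK_DIR replaced by your working directory.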
