Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: call peaks for each replicate (diffbind), not only the merged data #367

Open
lindenb opened this issue Jun 28, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@lindenb
Copy link

lindenb commented Jun 28, 2024

Description of feature

Hi all, I've been asked to run diffbind on our ATACseq data; Unless I'm wrong diffbind requires a samplesheet with each replicate and, for each replicate a BAM and a peaks/bed: https://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

> samples
SampleID Tissue Factor Condition Treatment Replicate
1 BT4741 BT474 ER Resistant Full-Media 1
2 BT4742 BT474 ER Resistant Full-Media 2
3 MCF71 MCF7 ER Responsive Full-Media 1
4 MCF72 MCF7 ER Responsive Full-Media 2
5 MCF73 MCF7 ER Responsive Full-Media 3
6 T47D1 T47D ER Responsive Full-Media 1
7 T47D2 T47D ER Responsive Full-Media 2
8 MCF7r1 MCF7 ER Resistant Full-Media 1
9 MCF7r2 MCF7 ER Resistant Full-Media 2
10 ZR751 ZR75 ER Responsive Full-Media 1
11 ZR752 ZR75 ER Responsive Full-Media 2

bamReads ControlID bamControl
1 reads/Chr18_BT474_ER_1.bam BT474c reads/Chr18_BT474_input.bam
2 reads/Chr18_BT474_ER_2.bam BT474c reads/Chr18_BT474_input.bam
3 reads/Chr18_MCF7_ER_1.bam MCF7c reads/Chr18_MCF7_input.bam
4 reads/Chr18_MCF7_ER_2.bam MCF7c reads/Chr18_MCF7_input.bam
5 reads/Chr18_MCF7_ER_3.bam MCF7c reads/Chr18_MCF7_input.bam
6 reads/Chr18_T47D_ER_1.bam T47Dc reads/Chr18_T47D_input.bam
7 reads/Chr18_T47D_ER_2.bam T47Dc reads/Chr18_T47D_input.bam
8 reads/Chr18_TAMR_ER_1.bam TAMRc reads/Chr18_TAMR_input.bam
9 reads/Chr18_TAMR_ER_2.bam TAMRc reads/Chr18_TAMR_input.bam
10 reads/Chr18_ZR75_ER_1.bam ZR75c reads/Chr18_ZR75_input.bam
11 reads/Chr18_ZR75_ER_2.bam ZR75c reads/Chr18_ZR75_input.bam

Peaks PeakCaller
1 peaks/BT474_ER_1.bed.gz bed
2 peaks/BT474_ER_2.bed.gz bed
3 peaks/MCF7_ER_1.bed.gz bed
4 peaks/MCF7_ER_2.bed.gz bed
5 peaks/MCF7_ER_3.bed.gz bed
6 peaks/T47D_ER_1.bed.gz bed
7 peaks/T47D_ER_2.bed.gz bed
8 peaks/TAMR_ER_1.bed.gz bed
9 peaks/TAMR_ER_2.bed.gz bed
10 peaks/ZR75_ER_1.bed.gz bed
11 peaks/ZR75_ER_2.bed.gz bed

but, as far as I understand (Am i wrong ?) the peak calling is only generated for the merged data, so i would be nice to have an option to generate the peaks for each replicate. (Unless there is a cool nf-project that is able to find the differential peaks between two ATACSeq conditions, but I have not found it )

@lindenb lindenb added the enhancement New feature or request label Jun 28, 2024
@JoseEspinosa
Copy link
Member

In principle, https://github.com/nf-core/differentialabundance is what you are looking for

@lindenb
Copy link
Author

lindenb commented Jul 3, 2024

@JoseEspinosa unless I'm wrong differentalabundance doesn't work with the output of nf-atacseq . see https://nfcore.slack.com/archives/C045UNCS5R9/p1707383810019459?thread_ts=1707349504.512029&cid=C045UNCS5R9

We had ambitions in this regard, but it hasn't been optimised for it yet, so I'd say it's unsupported right now and I don't know if it's an appropriate solution.

@JoseEspinosa
Copy link
Member

The peaks generated for each replicate can be found under <ALIGNER>/merged_library/macs2/<PEAK_TYPE>, see the example of the full test here. Likewise, the bam files can be found in <ALIGNER>/merged_library/, see again the example here. Hope this helps.

@csmith02-ms
Copy link

Hello,

I would like to follow up on this inquiry. I am also interested in running diffbind but utilizing merged replicate peaks rather than peaks for each individual replicate. There are other downstream analyses that I have performed without realizing that the pipeline only integrates specified parameters for merged_library MACS2 peak calling as opposed to incorporating them for merged_replicate peak calling as well. Is it possible to incorporate within the params documentation to allow the user to specify peak calling parameters for both sets of peak calling? Or make it more clear in the documentation that these parameters are only integrated into one part.

Here is the blog post where I first posed the issue I was having. [https://community.seqera.io/t/custom-paramaters-in-yaml-params-file-not-reflected-in-peak-calling-output/1458]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants