Allow discontinuing replicate numbers #397

aghr · 2024-12-19T10:49:30Z

Description of feature

Dear nf-core/atacseq team, the current pipeline (v2.1.2) checks that, in the replicate column of the samplesheet, the replicates of each condition have consecutive ids starting at 1. If they do not, it throws this error:

Process: NFCORE_ATACSEQ:ATACSEQ:INPUT_CHECK:SAMPLESHEET_CHECK
ERROR: Please check samplesheet -> Replicate ids must start with 1..<num_replicates>!
  Sample: 'd0, replicate ids: 1,4,6,7,9,10'

I do not see why replicate ids have to start at 1 and run up to <num_replicates>; this constraint is inconvenient for users. I also do not see why users should not be able to use any string to identify replicates. I sometimes prefer the characters a, b, c, and so on. The only reason I can see is convenience for the pipeline's developers: if any string were allowed as a replicate id, the pipeline would need an extra step that maps the arbitrary replicate ids to the numbers 1, 2, 3, ... or 0, 1, 2, 3, ...
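To illustrate how small that mapping step could be, here is a minimal sketch. It is not taken from the pipeline's code; the function name `map_replicate_ids` and the input format (a list of sample/replicate pairs) are my own assumptions for illustration.

```python
from collections import defaultdict


def _sort_key(rep_id):
    # Numeric ids sort by value and come first; other ids sort alphabetically.
    return (0, int(rep_id), "") if rep_id.isdecimal() else (1, 0, rep_id)


def map_replicate_ids(rows):
    """Map arbitrary replicate ids to 1..n within each sample.

    rows: iterable of (sample, replicate_id) string pairs (hypothetical input format).
    Returns {sample: {original_id: consecutive_number}}.
    """
    ids_per_sample = defaultdict(set)
    for sample, rep_id in rows:
        ids_per_sample[sample].add(rep_id.strip())
    return {
        sample: {
            rep_id: number
            for number, rep_id in enumerate(sorted(ids, key=_sort_key), start=1)
        }
        for sample, ids in ids_per_sample.items()
    }


# The replicate ids from the error message above, plus a sample using letters.
rows = [("d0", r) for r in ["1", "4", "6", "7", "9", "10"]]
rows += [("d1", r) for r in ["a", "b", "c"]]
mapping = map_replicate_ids(rows)
print(mapping["d0"])
print(mapping["d1"])
```

The pipeline could keep using the consecutive numbers internally and use the returned dictionary to report results under the user's original ids.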

Three arguments speak in favor of arbitrary replicate ids:

1. A user may start with replicate ids from 1 to num_replicates according to their experimental design but later want to exclude certain samples from the analysis because of issues with those samples. The user is then forced to re-assign replicate ids for the pipeline, and these no longer match the replicate ids of the experimental design. For example, with replicates 1,2,3,4,5,6 in both the experimental design and the pipeline samplesheet, all is fine. But if replicate 2 has to be excluded, it should be possible to use replicate ids 1,3,4,5,6. Instead the pipeline requires 1,2,3,4,5, and the user has to keep track of how the pipeline's replicate ids map back to those of the experimental design: replicate 2 in the pipeline is actually 3 in the experimental design, 3->4, 4->5, 5->6.
2. If the replicate ids must start at 1 and run up to num_replicates, why does the user need to fill in the replicate column of the samplesheet at all? It is meaningless extra work, because the pipeline could assign these replicate ids automatically.
3. What the sample column of the samplesheet actually encodes is the CONDITION. The nf-core rnaseq pipeline used the same samplesheet in the past but decided to change it by adding a CONDITION column and making the REPLICATE column optional. Users can put sample names that encode the replicate into the SAMPLE column, e.g. CTR_1, CTR_2, or TREAT_a, TREAT_b, and no REPLICATE column is necessary. This solved the flaws mentioned above.
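Regarding the automatic assignment of replicate ids: a rough sketch of what I mean, assuming the replicate column is absent and replicates are numbered in the order in which rows of the same sample appear. The function name `assign_replicates` is hypothetical and not part of the pipeline.

```python
from collections import defaultdict


def assign_replicates(samples):
    """Number the rows of each sample 1, 2, 3, ... in order of appearance.

    samples: list of values from the sample column, one per samplesheet row.
    Returns a list of (sample, replicate_number) pairs in the same row order.
    """
    counts = defaultdict(int)
    assigned = []
    for sample in samples:
        counts[sample] += 1
        assigned.append((sample, counts[sample]))
    return assigned


assigned = assign_replicates(["CTR", "CTR", "TREAT", "CTR", "TREAT"])
print(assigned)
```

With this approach the numbering depends on row order, so the pipeline would have to report which samplesheet row received which replicate number.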

Allowing arbitrary replicate ids would make the pipeline easier and more flexible to use and would thus improve the user experience.

aghr added the enhancement (New feature or request) label on Dec 19, 2024
