Allow discontinuing replicate numbers #397

Open
aghr opened this issue Dec 19, 2024 · 0 comments
Labels
enhancement New feature or request

aghr commented Dec 19, 2024

Description of feature

Dear nf-core/atacseq team, the current pipeline (v2.1.2) checks that, in the replicate column of the samplesheet, the replicates of each condition have consecutive ids starting at 1. If they do not, it throws this error:

Process: NFCORE_ATACSEQ:ATACSEQ:INPUT_CHECK:SAMPLESHEET_CHECK
ERROR: Please check samplesheet -> Replicate ids must start with 1..<num_replicates>!
  Sample: 'd0, replicate ids: 1,4,6,7,9,10'

I do not see why replicate ids have to start at 1 and run consecutively to <num_replicates>; this constraint is inconvenient for users. I also do not see why users should not be able to use any string to identify replicates; I sometimes prefer the characters a, b, c, and so on. The only reason I can see is convenience for the programmers of this pipeline: if any string were allowed as a replicate id, they would need to extend the pipeline with a mapping from the arbitrary replicate ids to the numbers 1,2,3... or 0,1,2,3...
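The mapping mentioned above could be quite small. Below is a minimal sketch in Python, not the pipeline's actual code: the function name map_replicate_ids is hypothetical, and it assumes that labels are compared as stripped strings and numbered in order of first appearance.

```python
def map_replicate_ids(replicate_ids):
    """Map arbitrary replicate labels to 1..n in order of first appearance.

    Hypothetical helper for illustration only; not part of nf-core/atacseq.
    """
    mapping = {}
    for rep_id in replicate_ids:
        label = str(rep_id).strip()  # treat 4 and "4" as the same label
        if label not in mapping:
            mapping[label] = len(mapping) + 1
    return mapping


# Replicate ids from the error message above
print(map_replicate_ids([1, 4, 6, 7, 9, 10]))
# {'1': 1, '4': 2, '6': 3, '7': 4, '9': 5, '10': 6}

# Letter labels work the same way
print(map_replicate_ids(["a", "b", "c"]))
# {'a': 1, 'b': 2, 'c': 3}
```

With such a mapping the pipeline could keep using consecutive numbers internally while reporting the user's original labels.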

Three arguments in favor of arbitrary replicate ids:

  1. When a user starts with replicate ids from 1 to num_replicates according to their experimental design but wants to exclude certain samples from the analysis because of issues with those samples, the user is forced to re-assign replicate ids in the pipeline, which then no longer match the replicate ids of the experimental design. E.g. if we have replicates 1,2,3,4,5,6 in both the experimental design and the pipeline samplesheet, all is fine. But if it is necessary to exclude replicate 2, it should be possible to use rep ids 1,3,4,5,6 instead of 1,2,3,4,5. With the latter, the user needs to keep track of how rep ids in the pipeline map back to the rep ids of the experimental design: rep 2 in the pipeline is actually 3 in the experimental design, 3->4, 4->5, 5->6.
  2. If the rep ids need to start at 1 and run to num_replicates, why does the user need to fill in the replicate column of the samplesheet at all? It is meaningless extra work, because the pipeline could assign these rep ids automatically.
  3. What the sample column of the samplesheet actually encodes is CONDITION. The nf-core rnaseq pipeline used the same samplesheet in the past but changed it, adding a CONDITION column and making the REPLICATE column optional. Users can put sample names that encode replicates into the SAMPLE column, e.g. CTR_1, CTR_2, or TREAT_a, TREAT_b, and no REPLICATE column is necessary. This solved the flaws mentioned above.
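To illustrate points 1 and 2, here is a rough sketch of how replicate numbers could be assigned automatically per sample while the user's own labels are kept. It is an assumption-laden example, not the pipeline's implementation: the function name assign_internal_replicates and the output key replicate_internal are hypothetical, only the sample and replicate columns are considered, and rows with the same sample and the same replicate label are assumed to belong to the same replicate.

```python
import csv
import io
from collections import defaultdict


def assign_internal_replicates(samplesheet_text):
    """Add a consecutive per-sample replicate number to each samplesheet row.

    Hypothetical helper for illustration only; not part of nf-core/atacseq.
    """
    next_index = defaultdict(int)  # per-sample counter
    known = {}                     # (sample, user label) -> internal number
    rows = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        sample = row["sample"].strip()
        label = (row.get("replicate") or "").strip()
        key = (sample, label) if label else None
        if key is not None and key in known:
            index = known[key]         # same replicate seen again
        else:
            next_index[sample] += 1    # new replicate for this sample
            index = next_index[sample]
            if key is not None:
                known[key] = index
        rows.append({**row, "replicate_internal": index})
    return rows


# Replicate 2 of d0 was excluded; d3 uses letters as labels.
sheet = "sample,replicate\nd0,1\nd0,3\nd0,3\nd3,a\nd3,b\n"
for row in assign_internal_replicates(sheet):
    print(row["sample"], row["replicate"], row["replicate_internal"])
```

If the replicate column is missing or empty, this sketch simply numbers the rows of each sample in order, which corresponds to the automatic assignment suggested in point 2.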

Allowing arbitrary rep ids would make this pipeline easier and more flexible to use and thus improve the user experience.

@aghr aghr added the enhancement New feature or request label Dec 19, 2024