VCF files: separated or merged #105

Open
hsymoon opened this issue Jul 19, 2024 · 2 comments

Comments

hsymoon commented Jul 19, 2024

Hi developers,
Thanks for developing this helpful tool. I ran into two questions while using vireo.
I have 10x scRNA-seq data from different time points (batch1 for D1, batch2 for D2, ...).
I used cellsnp-lite to call common SNPs for each batch, followed by vireo to demultiplex.
We also have a donor genotype file, donor.vcf.gz.

Q1: I wonder which approach gives more reasonable results:
1) Separated:
CELL_FILE: the cellSNP.cells.vcf.gz for each batch from cellsnp-lite (e.g. batch1.cellSNP.cells.vcf.gz)
DONOR_FILE: bcftools view donor.vcf.gz -R batch1.cellSNP.cells.vcf.gz -Oz -o donors.sub_Batch1.vcf.gz
~/miniconda3/bin/vireo -c batch1.cellSNP.cells.vcf.gz -d donors.sub_Batch1.vcf.gz -o ${re} -N $n --randSeed 2

2) Merged:
CELL_FILE: "bcftools merge" was used to merge the per-batch cellSNP.cells.vcf.gz files from cellsnp-lite into all.cellSNP.cells.vcf.gz
DONOR_FILE: bcftools view donor.vcf.gz -R all.cellSNP.cells.vcf.gz -Oz -o donors.sub_All.vcf.gz
~/miniconda3/bin/vireo -c all.cellSNP.cells.vcf.gz -d donors.sub_All.vcf.gz -o ${re} -N $n --randSeed 2

When I tried both, cells in batch1 were assigned to different donors in the separated and merged runs, even though --randSeed was set to the same value.
Could you tell me which approach gives more reasonable results, and why? Many thanks.
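
To put a number on how much the two runs disagree, the per-cell calls can be compared directly. The following is only a sketch: it assumes each vireo output directory contains a tab-separated donor_ids.tsv with a header line, the cell barcode in column 1 and the donor call in column 2, and that barcodes are spelled identically in both runs (if the merge step renamed any barcodes, they would need to be mapped back first). The function name compare_donor_ids is made up for illustration.

```shell
# Compare per-cell donor calls between two vireo runs.
# Assumption: donor_ids.tsv is tab-separated, has one header line,
# barcode in column 1 and donor call in column 2.
compare_donor_ids() {
    # $1: donor_ids.tsv from the "separated" run
    # $2: donor_ids.tsv from the "merged" run
    awk -F'\t' '
        FNR == 1 { next }                  # skip the header of each file
        NR == FNR { sep[$1] = $2; next }   # first file: remember call per barcode
        ($1 in sep) {                      # second file: barcodes present in both
            total++
            if (sep[$1] == $2) same++
            else printf "%s\t%s\t%s\n", $1, sep[$1], $2   # barcode, separated, merged
        }
        END { printf "# shared=%d concordant=%d\n", total, same }
    ' "$1" "$2"
}
```

For example, `compare_donor_ids separated_out/donor_ids.tsv merged_out/donor_ids.tsv` (directory names are placeholders) lists each discordant barcode with its two calls, followed by a summary line.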

Q2: Mode 4 in vireo is described as applicable when genotypes are available but not confident (or only available for a subset of SNPs).
The command is: vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR --forceLearnGT
Could you give some examples for this mode? Sorry for all the questions.

Thank you very much.

huangyh09 (Collaborator) commented

Hi,

Thanks for sharing your experience. For Q1, I would expect "separated" and "merged" to give very similar results if the configuration (coverage, number of cells per donor, balance of donors, etc.) is within a reasonable range. In that case "separated" is good enough, as the number of SNPs is usually sufficient. However, if the number of cells for a donor (e.g., a minor donor) is very limited (e.g., <100 cells), merging multiple time points may help by increasing the number of cells per donor. Note that batches called separately may cover different sets of SNPs, so for the "merged" option I would run cellsnp-lite on all batches together and then run vireo.
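
A quick way to check whether any donor falls under the rough 100-cell mark mentioned above is to count calls per donor in a batch's output. This is a sketch under the assumption that donor_ids.tsv is tab-separated with a header line and the donor call in column 2; cells_per_donor is a hypothetical helper name, and labels such as doublet or unassigned are counted like any other value in that column.

```shell
# Count cells per donor call and flag calls below a minimum cell count.
# Assumption: donor_ids.tsv is tab-separated, one header line,
# donor call in column 2.
cells_per_donor() {
    # $1: donor_ids.tsv
    # $2: minimum cell count (defaults to 100)
    awk -F'\t' -v min_cells="${2:-100}" '
        FNR == 1 { next }          # skip header
        { n[$2]++ }                # tally cells per donor call
        END {
            for (d in n)
                printf "%s\t%d\t%s\n", d, n[d], (n[d] < min_cells ? "LOW" : "ok")
        }
    ' "$1" | LC_ALL=C sort
}
```

Running `cells_per_donor batch1_out/donor_ids.tsv` (placeholder path) prints one line per donor call with its cell count and a LOW/ok flag.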

Alternatively, for the problematic batch, you can simply run vireo without the reference genotype and see whether the result agrees better with "separated" or "merged".
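
A sketch of that check: run vireo on the batch with only the cell data and the number of donors, then cross-tabulate its calls against one of the genotype-based runs. Donor labels from a genotype-free run are generic and will not match the names in donor.vcf.gz, so a contingency table is more useful than a direct equality test. The helper names are hypothetical, and the donor_ids.tsv layout (tab-separated, header line, barcode in column 1, donor call in column 2) is an assumption.

```shell
# Run vireo without donor genotypes (cell data plus number of donors only).
run_without_genotype() {
    # $1: cell VCF, $2: number of donors, $3: output directory
    vireo -c "$1" -N "$2" -o "$3"
}

# Cross-tabulate donor calls from two runs over their shared barcodes.
# Assumption: both files are tab-separated with one header line,
# barcode in column 1 and donor call in column 2.
crosstab_donors() {
    # $1: donor_ids.tsv from the genotype-free run
    # $2: donor_ids.tsv from the "separated" or "merged" run
    awk -F'\t' '
        FNR == 1 { next }                          # skip headers
        NR == FNR { free[$1] = $2; next }          # genotype-free call per barcode
        ($1 in free) { pair[free[$1] "\t" $2]++ }  # count each label pairing
        END { for (p in pair) printf "%s\t%d\n", p, pair[p] }
    ' "$1" "$2" | LC_ALL=C sort
}
```

If each generic label maps almost entirely onto one named donor in, say, the "separated" table but is spread across several donors in the "merged" table, that suggests the "separated" result is the more consistent one for this batch.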

For Q2, this is a less commonly used option. It is similar to mode 1 (without genotype), except that the donor genotype is used as a prior that can be updated during estimation. If you feel your genotype is noisy (e.g., from very shallow bulk RNA-seq), you may consider trying it.
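
One way to try this, sketched here rather than prescribed: run the same inputs once with the donor genotype used as given and once with --forceLearnGT, writing to two sibling directories so the outputs can be compared afterwards. The wrapper name and the fixed_gt / learn_gt directory names are made up for illustration; the vireo flags are the ones quoted in this thread.

```shell
# Run vireo twice on the same inputs: genotype used as given vs. genotype
# used only as a prior (--forceLearnGT). Output directory names are arbitrary.
run_fixed_and_learned() {
    # $1: cell data, $2: donor genotype VCF, $3: base output directory
    vireo -c "$1" -d "$2" -o "$3/fixed_gt"
    vireo -c "$1" -d "$2" -o "$3/learn_gt" --forceLearnGT
}
```

For example, `run_fixed_and_learned batch1.cellSNP.cells.vcf.gz donors.sub_Batch1.vcf.gz batch1_modes` would leave the two result sets side by side under batch1_modes.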

Yuanhua

hsymoon commented Jul 26, 2024

Thanks very much for your valuable reply. It helps me a lot.
