Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when running vireo mode 2 with option -N 1 #83

Open
calebthomas259 opened this issue Apr 11, 2023 · 1 comment
Open

Error when running vireo mode 2 with option -N 1 #83

calebthomas259 opened this issue Apr 11, 2023 · 1 comment

Comments

@calebthomas259
Copy link

Hello vireo developers,

I have a small issue to report, please see the details below:

Background
We're performing pooled single nucleus RNA-seq experiments, and are planning to demultiplex the samples with vireo mode 2 using donor-specific SNPs (obtained with shallow WGS for each donor). For testing purposes, we also sequenced some of the samples as individual snRNA-seq libraries (i.e. no pooling).

As a bioinformatics sanity check, we wanted to try running vireo in mode 2 on one of the unpooled libraries, using a VCF file that has genotypes for multiple donors, while specifying -N 1 in the vireo command. E.g. the library has donor A nuclei only, but the VCF file has genotypes for donors A, B, and C.

We were hoping that vireo would correctly identify that the sample is from donor A, and assign all of the nuclei to donor A (or at least the majority of nuclei assigned to donor A, and maybe a few unassigned). But unfortunately we get some error messages instead.

How to reproduce the error
When we run this command on our data:

vireo --randSeed=1234 --nproc=14 \
-d "$three_donor_genotypes_vcf" -c "$dir_cellsnp_pool" \
-N 1 \
-o 'temp_vireo'

It gives the following error message (I've shortened some of the filepaths):

[vireo] Loading cell folder ...
[vireo] Loading donor VCF file ...
[vireo] 158953 out 197549 variants matched to donor VCF
[vireo] Demultiplex 27219 cells to 1 donors with 158953 variants.
[vireo] lower bound ranges [-7665415.0, -7665415.0, -7665415.0]
[vireo] allelic rate mean and concentrations:
[[0.4   0.451 0.534]]
[[1981897.3 5929198.2 4157211.5]]
[vireo] donor size before removing doublets:
donor0
27219
Traceback (most recent call last):
  File ".../envs/cellsnp_vireo/bin/vireo", line 8, in <module>
    sys.exit(main())
  File ".../vireoSNP/vireo.py", line 204, in main
    res_vireo = vireo_wrap(cell_dat['AD'], cell_dat['DP'], n_donor=n_donor,
  File ".../vireoSNP/utils/vireo_wrap.py", line 152, in vireo_wrap
    doublet_prob, ID_prob, doublet_LLR = predict_doublet(modelCA, AD, DP)
  File ".../vireoSNP/utils/vireo_doublet.py", line 39, in predict_doublet
    GT_both = add_doublet_GT(vobj.GT_prob)
  File ".../vireoSNP/utils/vireo_doublet.py", line 118, in add_doublet_GT
    s_idx1 = sp_idx[:, 0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

As a workaround, I tried running with doublet detection disabled:

vireo --randSeed=1234 --nproc=14 \
-d "$three_donor_genotypes_vcf" -c "$dir_cellsnp_pool" \
-N 1 \
--noDoublet \
-o 'temp_vireo2'

But this results in a new error message:

[vireo] Loading cell folder ...
[vireo] Loading donor VCF file ...
[vireo] 158953 out 197549 variants matched to donor VCF
[vireo] Demultiplex 27219 cells to 1 donors with 158953 variants.
[vireo] lower bound ranges [-7665415.0, -7665415.0, -7665415.0]
[vireo] allelic rate mean and concentrations:
[[0.4   0.451 0.534]]
[[1981897.3 5929198.2 4157211.5]]
[vireo] donor size before removing doublets:
donor0
27219
Traceback (most recent call last):
  File ".../envs/cellsnp_vireo/bin/vireo", line 8, in <module>
    sys.exit(main())
  File ".../envs/cellsnp_vireo/lib/python3.9/site-packages/vireoSNP/vireo.py", line 217, in main
    write_donor_id(out_dir, donor_names, cell_dat['samples'], n_vars, res_vireo)
  File ".../envs/cellsnp_vireo/lib/python3.9/site-packages/vireoSNP/utils/io_utils.py", line 98, in write_donor_id
    prob_doublet_out = np.max(doublet_prob, axis=1)
  File "<__array_function__ internals>", line 5, in amax
  File ".../envs/cellsnp_vireo/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2754, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File ".../envs/cellsnp_vireo/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

Expected result
The command runs successfully without errors, and assigns the cells to donor A

Software versions (installed via conda/pip)

  • vireo: 0.5.8
  • cellsnp-lite: 1.2.2
  • python: 3.9.7
  • numpy: 1.21.4
  • matplotlib: 3.5.1
  • scipy: 1.7.3

Is it possible to do this with vireo? I was hoping you could please advise of a potential solution or a workaround, if any exist?

Thank you very much for the help!

@ghuls
Copy link
Contributor

ghuls commented Aug 30, 2023

Giving -N 1 doesn't make much sense as it would be the same as making no groups. Better use -N 3 (or omit it at all). If you have 1 sample only and you have 3 genotypes in your VCF, vireo should hopefully assign all/most cells to only one of your samples.

vireo --randSeed=1234 --nproc=14 \
-d "$three_donor_genotypes_vcf" -c "$dir_cellsnp_pool" \
-N 3 \
-o 'temp_vireo'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants