
mini_assemble empty target sequences error #33

Open
kautto opened this issue Jun 13, 2019 · 3 comments

Comments

@kautto

kautto commented Jun 13, 2019

Dear nanopore devs,

I'm having issues getting mini_assemble to run on human (HG001/NA12878) data. I've successfully run it on smaller assemblies before, but a ~30x human genome seems to be causing issues. I'm running on a 96-core/768 GB RAM AWS instance. After running minimap2 for a while, it eventually gets to:

[M::worker_pipeline::751.688*6.16] mapped 18095 sequences
[M::worker_pipeline::754.787*6.16] mapped 19902 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 96 NA12878.gfa.fa.gz NA12878.fa.gz
[M::main] Real time: 754.791 sec; CPU: 4648.586 sec; Peak RSS: 1.928 GB
[racon::Polisher::initialize] error: empty target sequences set!
[M::mm_idx_gen::0.014*0.17] collected minimizers
[M::mm_idx_gen::0.019*9.13] sorted minimizers
[M::main::0.019*9.12] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.019*9.10] mid_occ = 718917417
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.019*9.08] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::4.056*6.97] mapped 82521 sequences
[M::worker_pipeline::7.981*7.89] mapped 91941 sequences
[M::worker_pipeline::11.426*7.27] mapped 83333 sequences

This then results in the same "empty target sequences set" error propagating until the whole thing fails:

[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 96 racon_1_1.fa.gz NA12878.fa.gz
[M::main] Real time: 768.219 sec; CPU: 4632.751 sec; Peak RSS: 1.931 GB
[racon::Polisher::initialize] error: empty target sequences set!
[M::mm_idx_gen::0.010*0.35] collected minimizers
[M::mm_idx_gen::0.015*12.78] sorted minimizers
[M::main::0.015*12.77] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.015*12.73] mid_occ = 0
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.015*12.70] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 -K 500M -t 96 racon_1_3.fa.gz NA12878.fa.gz
[M::main] Real time: 740.812 sec; CPU: 4347.913 sec; Peak RSS: 1.894 GB
[racon::Polisher::initialize] error: empty target sequences set!
rm: cannot remove 'shuffled*': No such file or directory
rm: cannot remove '*paf*': No such file or directory

Any ideas where to start troubleshooting this?

Edit: The input definitely isn't empty when it starts the run.
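One place to start is checking which file is actually empty. The reads file being non-empty does not rule out an empty assembly: in the log above, the minimap2 runs that precede racon appear to index 0 target sequences, and racon's "empty target sequences set" error refers to the target (assembly) file rather than the reads. The following is a minimal Python sketch, not part of mini_assemble, assuming the intermediate files are plain or gzip-compressed FASTA with the names shown in the log (e.g. `NA12878.gfa.fa.gz`, `racon_1_1.fa.gz`); the script name `fasta_stats.py` is hypothetical.

```python
import gzip
import sys


def open_maybe_gzip(path):
    """Open a FASTA file as text, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_stats(path):
    """Return (number of records, total bases) for a FASTA file."""
    n_records = 0
    n_bases = 0
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1
            else:
                n_bases += len(line)
    return n_records, n_bases


if __name__ == "__main__":
    # Hypothetical usage: python fasta_stats.py NA12878.fa.gz NA12878.gfa.fa.gz racon_1_1.fa.gz
    for path in sys.argv[1:]:
        records, bases = fasta_stats(path)
        flag = "  <-- EMPTY" if records == 0 else ""
        print(f"{path}\t{records} records\t{bases} bp{flag}")
```

If the reads file has records but the `.gfa.fa.gz` file has none, the problem lies upstream of racon, in the miniasm layout step rather than in polishing.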

@cjw85
Member

cjw85 commented Oct 2, 2019

Hi @kautto,

We would not recommend running mini_assemble on a human genome; this is not a use case considered in its design. You would be better served by a purpose-built, robust assembler such as canu, flye, or shasta.

@glf20

glf20 commented Apr 7, 2020

Hi, I have been getting a similar error running mini_assemble on a small 3.5 kb amplicon. It had been running OK, but when I filter the data to include shorter fragments, it fails with the same racon error.

[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -s 100 -e 3 -f AS2k_denovo_trimmed.fa.gz AS2k_denovo.paf.gz
[M::main] Real time: 3170.456 sec; CPU: 3167.859 sec
Running racon read shuffle 1...
Running round 1 consensus...
[M::mm_idx_gen::0.000*4.33] collected minimizers
[M::mm_idx_gen::0.002*12.06] sorted minimizers
[M::main::0.002*11.89] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.002*11.66] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.002*11.43] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::1.929*4.83] mapped 67078 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -K 500M -t 48 AS2k_denovo.gfa.fa.gz AS2k_denovo_trimmed.fa.gz
[M::main] Real time: 1.930 sec; CPU: 9.322 sec; Peak RSS: 0.186 GB
[racon::Polisher::initialize] error: empty target sequences set!

The command I have been running on our HPC is:
mini_assemble -i /data/freimanis/analysis_files/nanopore/Run1/Asia/2kb/filtered/filtered.fq -o denovo -p AS2k_denovo -t 48 -c

Can anyone help? I have rerun it several times and keep getting the same result.
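Since minimap2 maps all 67,078 reads here but indexes 0 target sequences from `AS2k_denovo.gfa.fa.gz` right after miniasm's "Step 5: generating unitigs", one hypothesis worth checking is that miniasm emitted a graph with no unitigs, leaving the GFA-derived FASTA empty. Below is a minimal Python sketch that counts segment lines and converts them to FASTA. It assumes the graph is GFA text in which each segment is a tab-separated `S <name> <sequence> ...` line, with `*` meaning no stored sequence; the file names in the usage comment (`AS2k_denovo.gfa`, `gfa_check.py`) are hypothetical and may not match what mini_assemble writes.

```python
import sys


def gfa_segments(path):
    """Yield (name, sequence) for every segment ('S') line in a GFA file."""
    with open(path, "rt") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "S" and len(fields) >= 3:
                yield fields[1], fields[2]


def gfa_to_fasta(gfa_path, fasta_path):
    """Write GFA segments as FASTA records; return how many were written."""
    n = 0
    with open(fasta_path, "wt") as out:
        for name, seq in gfa_segments(gfa_path):
            # '*' means the sequence is not stored in the GFA (e.g. miniasm run without -f)
            if seq == "*":
                continue
            out.write(f">{name}\n{seq}\n")
            n += 1
    return n


if __name__ == "__main__":
    # Hypothetical usage: python gfa_check.py AS2k_denovo.gfa AS2k_denovo.gfa.fa
    if len(sys.argv) == 3:
        count = gfa_to_fasta(sys.argv[1], sys.argv[2])
        print(f"{count} unitig(s) written")
        if count == 0:
            print("no unitigs: racon would have no target sequences to polish")
```

A count of zero would point at the layout step (read selection and graph cleaning in miniasm) rather than at racon itself.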

@glf20

glf20 commented Apr 7, 2020

The full log is below:
Skipped pre-assembly correction.
Overlapping reads...
[M::mm_idx_gen::6.112*1.68] collected minimizers
[M::mm_idx_gen::6.751*2.87] sorted minimizers
[M::main::6.751*2.87] loaded/built the index for 67078 target sequence(s)
[M::mm_mapopt_update::6.875*2.84] mid_occ = 18774
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 67078
[M::mm_idx_stat::6.950*2.82] distinct minimizers: 4409671 (68.07% are singletons); average occurrences: 14.201; average spacing: 2.979
[M::worker_pipeline::4217.177*5.55] mapped 67078 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x ava-ont -K 500M -t 48 AS2k_denovo_trimmed.fa.gz AS2k_denovo_trimmed.fa.gz
[M::main] Real time: 4217.266 sec; CPU: 23396.592 sec; Peak RSS: 100.374 GB
Assembling graph...
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::2278.666*1.00] read 1251922617 hits; stored 2503845219 hits and 67077 sequences (186573504 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::2712.484*1.00] 67077 query sequences remain after sub
[M::ma_hit_cut::2766.599*1.00] 2503589437 hits remain after cut
[M::ma_hit_flt::2851.754*1.00] 2468450488 hits remain after filtering; crude coverage after filtering: 22268.75
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::3023.405*1.00] 67075 query sequences remain after sub
[M::ma_hit_cut::3076.853*1.00] 2468279139 hits remain after cut
[M::ma_hit_contained::3165.616*1.00] 21 sequences and 26 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 10 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 17 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -s 100 -e 3 -f AS2k_denovo_trimmed.fa.gz AS2k_denovo.paf.gz
[M::main] Real time: 3170.456 sec; CPU: 3167.859 sec
Running racon read shuffle 1...
Running round 1 consensus...
[M::mm_idx_gen::0.000*4.33] collected minimizers
[M::mm_idx_gen::0.002*12.06] sorted minimizers
[M::main::0.002*11.89] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.002*11.66] mid_occ = 1
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.002*11.43] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::1.929*4.83] mapped 67078 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -K 500M -t 48 AS2k_denovo.gfa.fa.gz AS2k_denovo_trimmed.fa.gz
[M::main] Real time: 1.930 sec; CPU: 9.322 sec; Peak RSS: 0.186 GB
[racon::Polisher::initialize] error: empty target sequences set!
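In this log, miniasm's fine read selection reports only 21 sequences and 26 hits remaining after containment removal (out of 67,075), and graph cleaning then cuts 17 tips before unitigs are generated. That is consistent with the graph collapsing to nothing; a plausible but unconfirmed explanation is that with a single short amplicon at very high depth (crude coverage 22268.75), nearly every read is contained in a longer one. Below is a small Python sketch for pulling these counts out of a saved mini_assemble log so that runs with different length filters can be compared. It assumes only that the log lines keep the format shown above; `summarize_log` is a hypothetical helper, not part of mini_assemble or miniasm.

```python
import re
import sys

# Patterns match the miniasm/minimap2 log lines shown in this thread.
CONTAINED_RE = re.compile(
    r"ma_hit_contained::[^\]]*\]\s+(\d+) sequences and (\d+) hits remain after containment removal"
)
INDEX_RE = re.compile(r"loaded/built the index for (\d+) target sequence\(s\)")
CUT_TIP_RE = re.compile(r"asg_cut_tip\]\s+cut (\d+) tips")


def summarize_log(text):
    """Extract a few diagnostic counts from a mini_assemble log."""
    contained = CONTAINED_RE.search(text)
    return {
        "seqs_after_containment": int(contained.group(1)) if contained else None,
        "hits_after_containment": int(contained.group(2)) if contained else None,
        "tips_cut": [int(n) for n in CUT_TIP_RE.findall(text)],
        "index_target_counts": [int(n) for n in INDEX_RE.findall(text)],
    }


if __name__ == "__main__":
    # Hypothetical usage: python summarize_log.py mini_assemble.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            print(summarize_log(fh.read()))
```

Comparing `seqs_after_containment` between the run that works and the run that fails would show whether the shorter-fragment filter is what drives read selection down to almost nothing.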


3 participants