Bypassing ordering step #32
Comments
Hi @sumit-walia, at the moment we do not have an option to pass a custom ordering, but it's an interesting suggestion. It's something we might consider adding in a later iteration.
In the meantime I can try to help with the mash problem and see if this speeds things up. Would you mind sharing the error message you get? One possibility is that the
Aside from that, I have never tried building graphs with many (>1k) short (<1 Mbp) genomes. I'd be curious to know how pangraph behaves in this regime. Depending on what your question is, it might be important to tune the
Best,
Hi @mmolari, I have installed mash and it is available in my path.
Thanks,
Hi @sumit-walia, from the error message it looks like the code is failing on this line. This is where the output of
I tried reproducing the issue on my end. I downloaded ~3000 SARS-CoV-2 sequences from NCBI and ran pangraph with the command: pangraph build -d mash sequences.fa
I could not reproduce the error: pangraph executed successfully. However, when inspecting the output graph I noticed that a single genome was missing from it. After some debugging I found that this was caused by an input-output stream not being flushed between a write and a read operation. The problem should now be fixed. Could you try pulling the master branch again and testing whether this solves the problem in your case?
If not, here are some further checks that we could do to try to figure out where the error comes from:
Thanks for your feedback, it's much appreciated! Marco
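For readers unfamiliar with this class of bug, here is a minimal Python sketch of a write that is not flushed before a read. It is not pangraph's code (the actual fix is in the pangraph repository); the function name `write_then_read` and the sample genome names are hypothetical and chosen only for illustration. It shows that data still sitting in a writer's buffer may be invisible to a reader until the writer is flushed.

```python
import os
import tempfile


def write_then_read(lines, flush_before_read):
    """Write lines through one handle, then read the file through another.

    Hypothetical helper for illustration only; not part of pangraph.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as writer:
            for line in lines:
                writer.write(line + "\n")
            if flush_before_read:
                # Push buffered data to the file so the reader can see it.
                writer.flush()
            # The writer is still open here, as in a write-then-read pipeline.
            with open(path) as reader:
                return reader.read().splitlines()
    finally:
        os.remove(path)


if __name__ == "__main__":
    names = ["genome_1", "genome_2", "genome_3"]
    print("with flush:   ", write_then_read(names, True))
    print("without flush:", write_then_read(names, False))
```

Without the flush, some or all of the records can be missing from what the reader sees, which is consistent with the symptom described above of a genome silently absent from the output.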
Hi, I realized that the error was caused by an outdated version of Mash that I was using. Although I can now use
Thanks,
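Since the root cause here was an outdated Mash, a quick check of which executable is picked up from the PATH and which version it reports can save some debugging. The following is a small sketch under stated assumptions: the helper name `tool_version` is hypothetical, and it assumes the tool prints its version when called with `--version` (adjust the flag if your installation differs).

```python
import shutil
import subprocess


def tool_version(name, version_flag="--version"):
    """Return the version string reported by an executable on PATH, or None.

    Hypothetical helper; assumes the tool accepts `version_flag`.
    """
    path = shutil.which(name)
    if path is None:
        # The executable is not on PATH at all.
        return None
    result = subprocess.run(
        [path, version_flag], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = tool_version("mash")
    if version is None:
        print("mash was not found on PATH")
    else:
        print("mash reports version:", version)
```

Comparing the reported version against the one required by the pangraph documentation is left to the reader, since the minimum supported Mash version is not stated in this thread.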
I am building a PanGraph for SARS-CoV-2 sequences (around 50k). I observed that the ordering step is a bottleneck here. Is it possible to pass a tree as an argument and bypass the ordering step in PanGraph? I have also tried adding "-d mash" as an argument to check the speedup, but it throws an error instead.
Any help would be appreciated.
Thanks