Question about genome data installation: invalid miRBase download link #3748

Open
lovebaboon1989 opened this issue Aug 14, 2024 · 2 comments


@lovebaboon1989

lovebaboon1989 commented Aug 14, 2024

Hi there,
I'm trying to install the bcbio mm10 genome data for RNA-seq and ATAC-seq analysis, but I noticed that several previous issues mention errors when downloading mirbase. So I downloaded the most recent 'mirbase.yaml' file from GitHub: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/mm10/mirbase.yaml, and copied it into the folder /home/xz202/tmpbcbio-install/cloudbiolinux/ggd-recipes/mm10/ to overwrite the older one.
I also tried using the 'datatarget' parameter to tell bcbio_nextgen not to download any miRNA-related databases, but both strategies failed.
Here is the command I used:
bcbio_nextgen.py upgrade -u skip --genomes mm10 --aligners bowtie2 --datatarget rnaseq

I found that the mirbase download fails because the links for hairpin.fa and mature.fa are invalid, even in the most recent version of 'mirbase.yaml'. So I have the following question:
How can I skip installing only the small RNA-seq annotation files (i.e. skip mirbase but install everything else)? Since I only want to perform RNA-seq and ATAC-seq analysis on the mouse genome, I don't think I need the small RNA-seq reference or index files.

Thanks a lot for your help!

Version info

  • bcbio version (bcbio_nextgen.py --version): 1.2.9
  • OS name and version (lsb_release -ds): CentOS Linux release 7.9.2009

To Reproduce
Exact bcbio command you have used:

bcbio_nextgen.py upgrade -u skip --genomes mm10 --aligners bowtie2 --datatarget rnaseq

Log files (can be found in work/log)
--2024-08-14 16:47:08-- https://github.com/chapmanb/cloudbiolinux/archive/master.tar.gz
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/refs/heads/master [following]
--2024-08-14 16:47:08-- https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.113.9
Connecting to codeload.github.com (codeload.github.com)|140.82.113.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘STDOUT’

 0K ........ ........ ........ ........ ........ ........  562K

3072K ........ ........ ........ ....... 530K=9.2s

2024-08-14 16:47:18 (549 KB/s) - written to stdout [5195515]

Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'mm10', 'name': 'Mouse (mm10)', 'indexes': ['seq', 'twobit'], 'annotations': ['transcripts', 'rmsk', 'problem_regions', 'prioritize', 'dbsnp', 'vcfanno', 'mirbase']}], 'genome_indexes': ['bowtie2', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Mouse (mm10)
Running GGD recipe: mm10 srnaseq 20211203
Traceback (most recent call last):
  File "/home/xz202/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/home/xz202/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 109, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/home/xz202/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 361, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/home/xz202/bcbio/anaconda/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/xz202/bcbio/anaconda/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/home/xz202/bcbio/genomes/Mmusculus/mm10/txtmp/ggd-run.sh']' returned non-zero exit status 8.
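The traceback ends with `ggd-run.sh` returning exit status 8. A minimal sketch for interpreting that number, assuming the failing step inside `ggd-run.sh` is a `wget` download (the download output earlier in the log has wget's format) and that the script passes wget's status through unchanged. The table follows the exit codes listed in the GNU Wget manual, under which 8 means the server returned an error response, which would fit a dead hairpin.fa or mature.fa link. `explain_wget_exit` is a hypothetical helper, not part of bcbio or cloudbiolinux.

```python
import subprocess

# Exit statuses as listed in the GNU Wget manual ("Exit Status" section).
WGET_EXIT_CODES = {
    0: "No problems occurred.",
    1: "Generic error code.",
    2: "Parse error, e.g. when parsing command-line options.",
    3: "File I/O error.",
    4: "Network failure.",
    5: "SSL verification failure.",
    6: "Username/password authentication failure.",
    7: "Protocol errors.",
    8: "Server issued an error response.",
}


def explain_wget_exit(code):
    """Map a wget exit status to its documented meaning (hypothetical helper)."""
    return WGET_EXIT_CODES.get(code, "Unknown exit code")


# Rebuild the error from the traceback: `bash ggd-run.sh` returned status 8.
err = subprocess.CalledProcessError(
    8, ["bash", "/home/xz202/bcbio/genomes/Mmusculus/mm10/txtmp/ggd-run.sh"]
)
print(explain_wget_exit(err.returncode))
# → Server issued an error response.
```

If the recipe script uses a different downloader, or runs several commands without propagating the first failure, the status may mean something else.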

@lpantano
Collaborator

Hi, sorry about the issue. I think miRBase is sometimes down, which leads to this issue. I would try again.
Another option is to modify that yaml file and remove the lines referring to mirbase to skip that download. You can remove the entries under command and output files so they are neither downloaded nor checked for.
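The config printed in the install log lists `'mirbase'` under `annotations`. As a hedged illustration of the same idea as removing the mirbase lines, the sketch below drops one annotation from a dict with that structure. `drop_annotation` is a hypothetical helper; which YAML file on disk produces this config, and whether editing it (rather than the recipe's command and output files, as suggested above) is the right fix for your bcbio version, are assumptions to verify. Reading and writing the real file would need a YAML library such as PyYAML.

```python
import copy


def drop_annotation(config, annotation="mirbase"):
    """Return a copy of a genome config with `annotation` removed from each
    genome's 'annotations' list; the input dict is left unchanged."""
    cleaned = copy.deepcopy(config)
    for genome in cleaned.get("genomes", []):
        # Only touch genomes that actually declare annotations.
        if "annotations" in genome:
            genome["annotations"] = [
                name for name in genome["annotations"] if name != annotation
            ]
    return cleaned


# Structure copied from the config printed in the install log.
config = {
    "genomes": [
        {
            "dbkey": "mm10",
            "name": "Mouse (mm10)",
            "indexes": ["seq", "twobit"],
            "annotations": ["transcripts", "rmsk", "problem_regions",
                            "prioritize", "dbsnp", "vcfanno", "mirbase"],
        }
    ],
    "genome_indexes": ["bowtie2", "rtg"],
    "install_liftover": False,
    "install_uniref": False,
}

print(drop_annotation(config)["genomes"][0]["annotations"])
# → ['transcripts', 'rmsk', 'problem_regions', 'prioritize', 'dbsnp', 'vcfanno']
```

Copying the dict first keeps the original config intact, so the change is easy to compare or revert.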

@lovebaboon1989
Author

Hi, thanks for the reply. I deleted the line about 'miRBase' in the yaml file and it seems to work well. Thank you!
