Question about genome data installation because of invalid download link of mirbase #3748

lovebaboon1989 · 2024-08-14T20:56:45Z

Hi there,
I'm trying to install the bcbio mm10 genome data for RNAseq and ATACseq analysis. Several previous issues mention errors when downloading mirbase, so I downloaded the most recent 'mirbase.yaml' file from GitHub: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/mm10/mirbase.yaml, and copied it into the folder /home/xz202/tmpbcbio-install/cloudbiolinux/ggd-recipes/mm10/, overwriting the older copy.
I also tried the 'datatarget' parameter to tell bcbio_nextgen not to download any miRNA-related databases, but both strategies failed.
Here is the command I use:
bcbio_nextgen.py upgrade -u skip --genomes mm10 --aligners bowtie2 --datatarget rnaseq

I found that downloading mirbase fails because the links for hairpin.fa and mature.fa are invalid, even in the most recent version of 'mirbase.yaml'. So I have the following question:
How can I skip installing the small RNA-seq annotation files (i.e. skip mirbase and install everything else)? Since I only want to run RNAseq and ATACseq analysis on the mouse genome, I don't think I need the small RNA-seq reference or index files.

Thanks a lot for your help!

Version info

bcbio version (bcbio_nextgen.py --version): 1.2.9
OS name and version (lsb_release -ds): CentOS Linux release 7.9.2009

To Reproduce
Exact bcbio command you have used:

bcbio_nextgen.py upgrade -u skip --genomes mm10 --aligners bowtie2 --datatarget rnaseq

Log files (could be found in work/log)
--2024-08-14 16:47:08-- https://github.com/chapmanb/cloudbiolinux/archive/master.tar.gz
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/refs/heads/master [following]
--2024-08-14 16:47:08-- https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.113.9
Connecting to codeload.github.com (codeload.github.com)|140.82.113.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘STDOUT’

 0K ........ ........ ........ ........ ........ ........  562K

3072K ........ ........ ........ ....... 530K=9.2s

2024-08-14 16:47:18 (549 KB/s) - written to stdout [5195515]

Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'mm10', 'name': 'Mouse (mm10)', 'indexes': ['seq', 'twobit'], 'annotations': ['transcripts', 'rmsk', 'problem_regions', 'prioritize', 'dbsnp', 'vcfanno', 'mirbase']}], 'genome_indexes': ['bowtie2', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Mouse (mm10)
Running GGD recipe: mm10 srnaseq 20211203
Traceback (most recent call last):
File "/home/xz202/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/home/xz202/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 109, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/home/xz202/bcbio/anaconda/lib/python3.7/site-packages/bcbio/install.py", line 361, in upgrade_bcbio_data
args.cores, ["ggd", "s3", "raw"])
File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
_prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 875, in _install_with_ggd
ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
recipe["recipe"]["full"]["recipe_type"], system_install)
File "/home/xz202/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
subprocess.check_output(["bash", run_file])
File "/home/xz202/bcbio/anaconda/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/home/xz202/bcbio/anaconda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/home/xz202/bcbio/genomes/Mmusculus/mm10/txtmp/ggd-run.sh']' returned non-zero exit status 8.
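
A note on the exit status in the traceback above: if the failing command inside ggd-run.sh is a wget call (an assumption, since the script contents are not shown here), then status 8 corresponds to "Server issued an error response" in the GNU Wget manual, which would be consistent with a dead download link rather than a local problem. A minimal lookup sketch; the helper name `explain_status` is made up for illustration:

```python
# Exit statuses as documented in the GNU Wget manual.
# Assumption: the non-zero status reported by bash came from wget.
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error (command-line options, .wgetrc or .netrc)",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def explain_status(returncode):
    """Return a readable meaning for an exit status, assuming it came from wget."""
    return WGET_EXIT_STATUS.get(
        returncode, "Unknown status (possibly not from wget)"
    )


if __name__ == "__main__":
    print(explain_status(8))
```

If the status came from a different tool in the script, this mapping does not apply.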


lpantano · 2024-08-16T18:50:48Z

Hi, sorry about the issue. I think miRBase is sometimes down, which leads to this kind of error. I would try again.
Another option is to modify that yaml file and remove the lines that mention mirbase, so the download is skipped. You can remove the entries under both the commands and the output files, so that bcbio neither downloads those files nor checks for them.
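
The edit suggested above can be sketched as a small line filter. This is only an illustration under assumptions: the function name `drop_lines_mentioning`, the `SAMPLE` fragment, and the example.org URLs are all hypothetical and do not reproduce the real recipe; the keywords come from the file names mentioned in this thread (hairpin.fa, mature.fa).

```python
def drop_lines_mentioning(text, keywords=("mirbase", "hairpin", "mature")):
    """Return text without any line containing one of the keywords.

    Matching is case-insensitive and purely textual; YAML is not parsed.
    """
    lowered = [k.lower() for k in keywords]
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not any(k in line.lower() for k in lowered)
    ]
    return "".join(kept)


# Hypothetical recipe fragment for illustration only; the real file differs.
SAMPLE = """\
recipe_cmds:
  - |
    wget https://example.org/mirbase/hairpin.fa.gz
    wget https://example.org/other/annotation.gtf.gz
recipe_outfiles:
  - srnaseq/hairpin.fa
  - srnaseq/annotation.gtf
"""

if __name__ == "__main__":
    print(drop_lines_mentioning(SAMPLE), end="")
```

A plain text filter like this can break a command that spans several lines, so it is safer to keep a backup of the yaml file and review the result by hand before rerunning the upgrade.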

lovebaboon1989 · 2024-08-17T01:53:29Z

Hi, thanks for the reply. I deleted the line about 'miRBase' in the yaml file and it seems to work well. Thank you!
