maxtico/ncbi_single_use_genome

 
 

ncbi_single_use_genome

Download a NCBI genome, process it with your own code, then trash it

This script lets you analyze as many NCBI genomes as you want without having to allocate storage space for them. In a single run, ncbi_single_use_genome downloads a genome, processes it according to the options you provide, then deletes it.

This script takes a single NCBI genome accession as input. To search for accessions (e.g. for a certain species or lineage), use the NCBI command-line tool 'datasets'.

For example, to obtain a table describing every available genome assembly under the taxon "Drosophila" (the accession is in the column "assminfo-accession"):

datasets download genome taxon "Drosophila" --dehydrated --filename my_search.zip
dataformat tsv genome --package my_search.zip > my_search.tsv
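
Once my_search.tsv exists, the accessions can be extracted with a few lines of Python. The sketch below is a minimal example, assuming the table is tab-separated with a header row that contains the "assminfo-accession" column mentioned above. The helper name read_accessions is hypothetical and not part of this repository, and the column name may differ between versions of dataformat.

```python
import csv


def read_accessions(tsv_path, column="assminfo-accession"):
    """Return the unique accessions found in `column`, in file order."""
    seen = set()
    accessions = []
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Rows with a missing or empty accession field are skipped.
            acc = (row.get(column) or "").strip()
            if acc and acc not in seen:
                seen.add(acc)
                accessions.append(acc)
    return accessions


# Usage (assumes my_search.tsv was produced by the commands above):
# accessions = read_accessions("my_search.tsv")
```

Duplicates are dropped because a table with one row per sequence or per report field could list the same assembly more than once.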

Requirements:

Help message / usage:

(run 'ncbi_single_use_genome.py -h' to inspect at any time)

This program downloads one specific NCBI assembly, executes certain operations, then cleans up data

Input/Output:

  • -a genome NCBI accession
  • -o folder to download to

Actions:

  • -c bash command template
  • -cf bash command template read from this file
  • -p python command template
  • -pf python command template read from this file

In all templates above, these placeholders can be used:

  • {accession} genome NCBI accession, e.g. GCA_000209535.1
  • {genomefile} path to genome fasta file
  • {taxid} taxonomy id
  • {species} species name, e.g. "Drosophila melanogaster"
  • {mspecies} masked species, e.g. "Drosophila_melanogaster"
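
To illustrate how such a template expands, here is a minimal sketch using Python's str.format. It is not the script's actual implementation, only an assumption about how the placeholders could be filled. The function fill_template, the accession GCA_000000000.0, and the genome file path are all hypothetical. The masked species name is assumed to be the species name with spaces replaced by underscores, as in the example above.

```python
def fill_template(template, accession, genomefile, taxid, species):
    """Fill the documented placeholders in a command template."""
    values = {
        "accession": accession,
        "genomefile": genomefile,
        "taxid": taxid,
        "species": species,
        # Assumption: masking replaces spaces with underscores,
        # matching the "Drosophila_melanogaster" example.
        "mspecies": species.replace(" ", "_"),
    }
    return template.format(**values)


example = fill_template(
    "wc -l {genomefile} > {mspecies}.{accession}.txt",
    accession="GCA_000000000.0",                # placeholder accession
    genomefile="/tmp/genomes/GCA_000000000.0.fa",  # hypothetical path
    taxid=7227,
    species="Drosophila melanogaster",
)
print(example)
```

With str.format, any literal curly braces in a template (for instance in an awk program) would have to be doubled; whether the script itself requires this is not stated here.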

Other options:

  • -k keep files instead of cleaning them up at the end

  • -w max workers for downloads at once

  • -sh open shells for bash commands. Required for complex commands (e.g. sequential commands, or using redirections)

  • -print_opt print currently active options

  • -h | --help print this help and exit
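
Because each run handles a single accession, processing many genomes means calling the script once per accession. The sketch below shows one way to do that from Python, using only the -a, -o and -c options listed above. It assumes ncbi_single_use_genome.py is executable and on your PATH; the helper names build_command and run_all are hypothetical.

```python
import subprocess


def build_command(accession, out_dir, bash_template,
                  script="ncbi_single_use_genome.py"):
    """Build the argument list for one run, using the -a, -o and -c options."""
    return [script, "-a", accession, "-o", out_dir, "-c", bash_template]


def run_all(accessions, out_dir, bash_template):
    """Run the script once per accession, stopping at the first failure."""
    for accession in accessions:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(build_command(accession, out_dir, bash_template),
                       check=True)


# Usage (the accession list and output folder are illustrative):
# run_all(["GCA_000209535.1"], "tmp_genomes", "wc -l {genomefile}")
```

The template is passed as a single argument so that the placeholders reach the script unexpanded. A template with redirections or several sequential commands would also need the -sh option described above.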
