- portable deployment across machines
- the environment your application needs to run ships with the container
- 'if it works on your machine, it will work on others too'
- Docker is optimized for deployment of applications, not machines or systems
- use `docker build` to build an image based on a `Dockerfile`
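a minimal sketch of the build step (image name, base image and tool file are illustrative, not from the talk):

```shell
# write a minimal Dockerfile describing the environment
cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3
COPY mytool.py /usr/local/bin/mytool
EOF

# then build and tag an image from it (requires the docker daemon):
# docker build -t myuser/mytool:0.1 .
```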
- automated builds on Docker Hub after pushing to a git repository
- docker tracks all changes between successive versions of a container
- it is possible to commit, diff and roll back
- like git, docker uses incremental uploads and downloads, so only the diff is sent
- any image can be used as a base for another image
- you can build one generic image and then multiple specialized versions of that one
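the commit/diff/layering workflow above might look like this (container and image names are illustrative; requires a running docker daemon, so the commands are shown as a transcript):

```shell
# list files added/changed/deleted inside a running container
docker diff my_container

# snapshot those changes as a new image layer
docker commit my_container myuser/mytool:0.2

# show the stack of layers the image is built from
docker history myuser/mytool:0.2

# any image can serve as a base for a specialized one:
# a new Dockerfile simply starts with "FROM myuser/mytool:0.2"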
- public registry Docker Hub with images uploaded by other people
- you can see the `Dockerfile` for each image
- official images maintained by the Docker team
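finding and fetching images from the public registry, as a sketch (search term and image name are illustrative):

```shell
# search Docker Hub for community and official images
docker search samtools

# pull an image so it is available locally
docker pull biocontainers/samtools
```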
- deployment: Dokku, Deis, Flynn ...
- multinode orchestration: Maestro, Salt, Mesos, OpenStack Nova ...
- dashboards: docker-ui, Openstack Horizon, Shipyard ...
- configuration management: Chef, Puppet, Ansible ...
- continuous integration: Jenkins, Strider, Travis ...
you start developing a tool on your laptop, then you move it to the cloud (Ubuntu) for some more serious analysis, but after a month you want to put it on the local cluster (CentOS) for production use
you already use git (and maybe GitHub) for version control and collaboration in your team; you would like to do the same with deployment, to know that if it worked on your machine when you pushed the changes, it will work on your colleague's machine when they pull them
it's great if you can point readers of a paper to the git commit relevant to the publication, but that does not guarantee they can install and run it; with docker you can point them to a particular commit of the image and be sure they can run it anywhere they need to. you can even make the whole paper reproducible
Bremges, A., Maus, I., Belmann, P., Eikmeyer, F., Winkler, A., Albersmeier, A., Pühler, A., Schlüter, A., Sczyrba, A. (2015) Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant. GigaScience 4:33 doi:10.1186/s13742-015-0073-6 (Docker-accessible version of the study)
if you're installing a new aligner that you want to try, put it in a container, so that when you need it in the cloud or on the cluster you don't have to install it again
- tool in a box
- analysis environment in a box
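the "tool in a box" idea as a sketch: run a containerized aligner on local data by mounting a host directory into the container (image name, tool and paths are illustrative; requires the docker daemon, so shown as a transcript):

```shell
# mount ./data into the container at /data, run the tool, remove
# the container afterwards (--rm); output lands on the host
docker run --rm -v "$PWD/data:/data" myuser/bwa \
    bwa mem /data/ref.fa /data/reads.fq > aln.sam
```

the same command works unchanged on the laptop, the cloud VM and the cluster, because the tool and its dependencies travel inside the image.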
Docker containerization has a negligible impact on the execution performance of common genomic pipelines where tasks are generally very time consuming. The minimal performance loss introduced by the Docker engine is offset by the advantages of running an analysis in a self-contained and precisely controlled runtime environment. Docker makes it easy to precisely prototype an environment, maintain all its variations over time and rapidly reproduce any former configuration one may need to re-use. These capacities guarantee consistent results over time and across different computing platforms.
The impact of Docker containers on the performance of genomic pipelines https://dx.doi.org/10.7287/peerj.preprints.1171v2
- Brad Chapman Improving reproducibility and installation of genomic analysis pipelines with docker
- Titus Brown Adventures in replicable scientific papers: Docker
- Heng Li A few hours with docker
- Carl Boettiger An introduction to Docker for reproducible research, with examples from the R environment
- Docker helps biofuels research
- Reproducible research for biofuels and biogas
- Docker-based solutions to reproducibility in science
- BioDocker and BioBoxes: the containerization of bioinformatics
- Using docker for reproducible computational publications
- Updates on docker and bioinformatics