Repository to hold wrapper apps for running bcbio analyses on DNAnexus using the Common Workflow Language.
This app encodes the steps of creating bcbio CWL and running a bcbio workflow on the platform. It also makes use of assets for efficient caching of the bcbio software and Docker image.
This documents the process of building the assets and apps. For usage, see the high level app documentation and detailed usage documentation.
The public bcbio_resources project on DNAnexus contains reference genomes, assets and applets used in bcbio analysis.
dx login
dx select bcbio_resources
On a Linux machine with the dx-toolkit and Docker installed:
# bcbio_resources
PROJECT=project-F541fX00f5v9vKJjJ34gvgbv
dx select $PROJECT
docker pull quay.io/bcbio/bcbio-vc
dx-docker create-asset quay.io/bcbio/bcbio-vc --output_path $PROJECT:/containers/
dx ls -l /containers | head -6
dx describe `dx ls containers/ | head -1 | cut -d ' ' -f 3`
From the last two commands you need to identify the latest Docker build file reference (record-NNN) and the Docker file reference from this (file-NNN).
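Picking the IDs out of the command output by eye is error-prone, so a small filter can help. The following is a minimal sketch, not part of this repository: `extract_ids` is a hypothetical helper name, and it assumes only that DNAnexus IDs appear in the output as a class prefix (`record`, `file`) followed by a hyphen and alphanumeric characters.

```shell
# Hypothetical helper (not part of dx-toolkit): print every ID of the given
# class found on stdin, in the order it appears.
# $1 is the ID class prefix, for example "record" or "file".
extract_ids() {
  # -o prints only the matching text, -w requires whole-word matches so that
  # a longer word ending in the prefix is not picked up by accident.
  grep -owE "$1-[A-Za-z0-9]+"
}

# Example usage (needs dx-toolkit and a selected project, so left commented):
#   dx ls -l /containers | head -6 | extract_ids record
#   dx describe record-NNN | extract_ids file
```

The helper only filters text; you still need to check which of the printed IDs belongs to the latest build.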
The bcbio-vm asset includes installation of the requirements for running, using bioconda recipes. If bcbio, bcbio-vm or dx-cwl have changed, update the recipes before rebuilding this asset so that it picks up the latest changes.
dx select bcbio_resources
dx build_asset bcbio-dnanexus-wrapper/bcbio-vm-asset -d bcbio_resources:/bcbio_assets
dx mv bcbio-vm-asset /bcbio_assets
dx ls -l bcbio_resources:/bcbio_assets
From the last command note the latest bcbio-vm-asset (record-NNN).
- Update the version in dxapp.json
- Add the bcbio-vm record asset ID and Docker record asset ID from above to assetDepends in dxapp.json
- Add the Docker file ID to BCBIO_CONTAINER_FILE in bcbio-run-workflow.sh
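The last step above can be scripted. This is a rough sketch under a loud assumption: it expects `BCBIO_CONTAINER_FILE` to be set in the script as a plain `BCBIO_CONTAINER_FILE=...` assignment at the start of a line, which has not been checked against the actual bcbio-run-workflow.sh. `set_container_file` is a hypothetical helper name.

```shell
# Hypothetical helper: replace the value assigned to BCBIO_CONTAINER_FILE.
# $1: new Docker file ID (file-NNN), $2: path to the script to edit.
# ASSUMPTION: the script contains a line starting with "BCBIO_CONTAINER_FILE=".
set_container_file() {
  # -i.bak edits in place and keeps a backup copy next to the original;
  # this form works with both GNU and BSD sed.
  sed -i.bak -E "s|^(BCBIO_CONTAINER_FILE=).*|\1$1|" "$2"
}

# Example usage (path and ID are placeholders):
#   set_container_file file-NNN path/to/bcbio-run-workflow.sh
```

Review the result with `diff` against the `.bak` copy before building, since a differently formatted assignment would be left unchanged.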
Build the bcbio applet in bcbio_resources:/applets with:
dx select bcbio_resources
dx build -a -d bcbio_resources:/applets/ bcbio-dnanexus-wrapper/bcbio-run-workflow
You can run this directly for testing in the same way as versioned apps.
To release a new version of the app, publish it and make it public:
dx build --publish --app bcbio-dnanexus-wrapper/bcbio-run-workflow -b org-az_cgr_services
dx add users app-bcbio-run-workflow PUBLIC
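Before publishing, it can be worth confirming that the version in dxapp.json was actually bumped. The following is a minimal sketch: `dxapp_version` is a hypothetical helper, and it assumes the app's top-level version is the first `"version"` key in the file, which may not hold if a nested entry with its own version comes earlier.

```shell
# Hypothetical helper: print the first "version" value found in a dxapp.json.
# $1: path to the dxapp.json file.
# ASSUMPTION: the first "version" key in the file is the app's version.
dxapp_version() {
  grep -oE '"version"[[:space:]]*:[[:space:]]*"[^"]+"' "$1" \
    | head -n 1 \
    | cut -d '"' -f 4
}

# Example usage (assumes dxapp.json sits in the directory passed to dx build):
#   dxapp_version bcbio-dnanexus-wrapper/bcbio-run-workflow/dxapp.json
```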
Generally, in an R&D or pre-production mode you want to test a pipeline on anywhere from a handful to hundreds of samples. There may still be bugs or issues that require changes to the underlying bcbio Docker image. In this case, you'd like to reuse results up to the point of failure but still use a modified Docker image.
Rather than use a cached asset on the platform (as described above), when running the workflow for the first time, provide this option to the app:
dx run bcbio-run-workflow -ipull_from_docker_registry=true ...
where '...' are the remaining options you would typically supply to the app. This option ensures that the compiled workflow pulls directly from the Docker registry instead of using a cached asset. This is a little less efficient and robust than using a cached asset, but for tens to hundreds of runs it may be preferable because it accelerates iteration for R&D purposes.
Now, if you notice a bug and subsequently modify the Docker image, you can reuse this workflow instead of compiling a new one:
dx run bcbio-run-workflow -ireuse_workflow=workflow-XXXX ...
OR
dx run bcbio-run-workflow -ireuse_workflow=path/to/workflow-name ...
This run reuses the results already computed for the workflow and uses the modified Docker image for any remaining jobs.