PCAWG BWA-Mem Workflow

Overview

This is the SeqWare workflow for the TCGA/ICGC PanCancer project that aligns whole genome sequences with BWA-Mem. It uses version 0.7.8-r455 of bwa mem from version 1.1.1 of PCAP-core. It also uses 0.6.2-r126 of bwa for "bwa aln". It also reads/writes to GNOS (optionally), the metadata/data repository system used in the project. It can now be used in a standalone mode using the Dockstore CLI. For more information about the workflow see the CHANGELOG.

For more information about the project overall see the PanCancer wiki space.

For Users

Running via Dockstore

See the Dockstore page for this workflow. The idea is Dockstore links together the Dockstore.cwl descriptor (which tells you how to launch this workflow) and the Docker image for this workflow (which contains the compiled workflow). You can then use the dockstore command line to easily find parameters, parameterize, and then run the workflow. For this workflow, the Docker image is hosted and built on quay.io and the Dockstore.cwl is in GitHub. We also provide a sample Dockstore.json which includes links for sample reference files and input but you need to update the output to write to an S3 bucket where you have access or a full, local file path.

dockstore launch --entry quay.io/pancancer/pcawg-bwa-mem-workflow --json Dockstore.json

This downloads the sample inputs and reference files from HTTP URLs to a local datastore working directory, runs the workflow, and uploads the results back to the location you specified in S3 (or local file path). You can of course make a JSON that parameterizes everything with full, local file paths which will skip the file provisioning.

The last successful test was with the following dependencies and command-line:

$ git branch
  develop
* feature/cwl1
$ dockstore tool launch  --entry Dockstore.cwl --local-entry --json Dockstore_cwl.json

Versions that we tested with are the following

avro (1.8.1)
cwl-runner (1.0)
cwl-upgrader (0.1.1)
cwltool (1.0.20160712154127)
schema-salad (1.14.20160708181155)
setuptools (25.1.6)

For Developers

Building the Workflow

This workflow (like other SeqWare workflows) is built using Maven. In the workflow directory (workflow-bwa-pancancer), execute the following:

mvn clean install

This will take a long time on first build since it download dependencies from Maven and the reference genome which is 5GB+ in size.

Building the Workflow Docker Image

You can also build a Docker image that has the workflow ready to run in it.

docker build --no-cache -t pancancer/pcawg-bwa-mem-workflow:2.6.8 .

Testing Locally via Docker

You can run the image and then run the workflow

docker run -ti --rm pancancer/pcawg-bwa-mem-workflow:2.6.8 /bin/bash
seqware bundle launch --dir target/Workflow_Bundle_BWA_2.6.8_SeqWare_1.1.1/ --engine whitestar --no-metadata

To save time, if you have reference data handy, you can mount it into the container to save time. You can grab it from the links directory when running the test execution of the workflow like above.

docker run -ti --rm -v /<your custom location>/links.bak:/home/seqware/Seqware-BWA-Workflow/links  pancancer/pcawg-bwa-mem-workflow:2.6.8 /bin/bash

Installation & Running

This is beyond the scope of this README. However, we recommend using the Docker-based workflow and run it using the Dockstore command line.

Sample Data

Some synthetic sample data.

Reference Data

We use a specific reference based on GRCh37.

Authors

Brian O'Connor boconnor@oicr.on.ca
Junjun Zhang Junjun.Zhang@oicr.on.ca
Adam Wright adam.wright@oicr.on.ca

Contributors

Keiran Raine: PCAP-Core and BWA-Mem workflow design
Roshaan Tahir: Original BWA-Align workflow design

Workflow Authors' Release Checklist

Make sure you:

update the workflow version in:
- workflow.properties
- pom.xml
- gnos_upload_data.pl
- make sure you grep for any other files
update the description of the workflow in workflow.properties and gnos_upload_data.pl, this includes differences with the previous release. Update the CHANGELOG which should contain the bulk of documentation about changes and links to our Bug system.
test the workflow locally (VM) and at clouds
do not package your gnostest.pem key!
release in Github using HubFlow
upload the workflow zip to S3

Name		Name	Last commit message	Last commit date
Latest commit History 440 Commits
checker		checker
scripts		scripts
src/main/java/com/github/seqware		src/main/java/com/github/seqware
workflow		workflow
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
Dockstore.cwl		Dockstore.cwl
Dockstore.local.json		Dockstore.local.json
Dockstore.wdl		Dockstore.wdl
Dockstore_cwl.json		Dockstore_cwl.json
Dockstore_cwl2.json		Dockstore_cwl2.json
Dockstore_wdl.json		Dockstore_wdl.json
LICENSE.txt		LICENSE.txt
README.md		README.md
checker-input-cwl.json		checker-input-cwl.json
checker-workflow-wrapping-workflow.cwl		checker-workflow-wrapping-workflow.cwl
pom.xml		pom.xml
requirements.txt		requirements.txt
workflow.properties		workflow.properties

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PCAWG BWA-Mem Workflow

Overview

For Users

Running via Dockstore

For Developers

Building the Workflow

Building the Workflow Docker Image

Testing Locally via Docker

Installation & Running

Sample Data

Reference Data

Authors

Contributors

Workflow Authors' Release Checklist

About

Releases 9

Packages

Contributors 11

Languages

License

ICGC-TCGA-PanCancer/Seqware-BWA-Workflow

Folders and files

Latest commit

History

Repository files navigation

PCAWG BWA-Mem Workflow

Overview

For Users

Running via Dockstore

For Developers

Building the Workflow

Building the Workflow Docker Image

Testing Locally via Docker

Installation & Running

Sample Data

Reference Data

Authors

Contributors

Workflow Authors' Release Checklist

About

Resources

License

Stars

Watchers

Forks

Releases 9

Packages 0

Contributors 11

Languages

Packages