-
Notifications
You must be signed in to change notification settings - Fork 0
bergmanlab/contemplate
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Requirements: - UCSC tools installed on local machine, not too old (make sure that your liftUp command supports the bed8 format for -type), at least the tools: faSize, pslToBed, liftUp, - UCSC source code on local machine - variable KENTSRC which points to the base directory of the UCSC srccode e.g. if you are using bourne-like shells with the command export KENTSRC=/usr/local/src/kent/src - an SGE compute cluster - python (at least version 2.5), gawk, cut How to run the pipeline (e.g. for Drosophila pseudoobscura): - edit config.mk and edit username, name of your compute cluster and the target directory on it. - copy your ssh public key to the cluster (if it's not there yet) make clusterPublicKey this avoids passwords. Run ssh-keygen and retry if you do not have a public key yet. - put the file dm3.dp3.synteny.chain from the supplemental file 2 into the directory engstromPipeline/dm3dp3Example/ - download the genomes from the UCSC download server: cd engstromPipeline make getGenomes - split the genome into pieces according to the chain file: cd ../fastaFragments make (this can take several minutes) You can inspect the output of this step with less dm3dp3Example/fragments.bed less dm3dp3Example/chr2L-25*.fa - run contemplate on the pieces: cd ../contemplate (edit the makefile to point to the target SGE compute cluster directory, variables CLUSTERHOST and CLUSTERDIR) make toCluster ( this copies all files to the cluster) log into your cluster, go to the target directory defined in config.mk and run: make submitJobs When the jobs are finished, copy them back with: make convert - Inspect the result with less dm3dp3Example/constrainedElements.filtered.minBlocks3.bed Description of files and directories: config.mk: central config file which specifies the name of the current dataset to use and its parameters. The name of the dataset will be used to create subdirectories in all of the following directories. All contemplate- related data will be stored in these subdirectories. This scheme allows us to switch between contemplate runs by modifying only config.mk engstromPipeline: modified sourcecode of Per Engstrom's synteny pipeline. cd into this directory, run make in all subdirectories of src/, then run "make" in this directory to generate a list of syntenic regions between dm3 and droVir3. I have modified the chainNetSynteny program to output the native chain format instead of the processed bed file. fastaFragments: a C program based on the UCSC toolchain which gets FASTA fragments for chains and cuts them into suitable fragments. Ouput is fasta, bed (to inspect where it cut) and a liftUp file (to lift fragment annotations to the chromosome level). cd into this, run "make compile", then "make run" contemplate: The contemplate software (biojava) written by David Huen with input from Thomas Down and Max Haeussler. Adapt the makefile with the name of your cluster and the directory where you want to store the files there. Change into the "contemplate" directory, run "make toCluster", ssh onto your cluster, run "make submitJobs", go back to your machine, run "make fromCluster" and "make liftUp" If you have a local UCSC mirror, you can load the results into it with "make loadTracks" (assuming that your ~/.hg.conf is working) check the log directory on your cluster for "Exception" messages.
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published