forked from VertebrateResequencing/vr-pipe
-
Notifications
You must be signed in to change notification settings - Fork 0
Generic pipeline system
License
mp15/vr-pipe
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
[You should read IMPORTANT_NOTES which notes version-specific issues] VRPipe is a generic pipeline system designed for use by the Vertebrate Resequencing team at the Sanger Instititue, but also for broad use for anyone. For an overview of why it was created, read our vision document: https://github.com/VertebrateResequencing/vr-pipe/wiki/Vision It comprises mostly self-documented Perl code. There are both scripts and modules within subfolders. Help on each module is currently of limited availability, but eventually all will be self-documented with POD, which can be read with eg. perldoc. Each script has a --help opion, eg: vrpipe-status --help INSTALLATION ------------ You will need the source version of samtools compiled with -fPIC and -m64 in the CFLAGS, and the environment variable SAMTOOLS pointing to that source directory (which should now contain bam.h and libbam.a). Samtools can be downloaded from here: http://sourceforge.net/projects/samtools/files/samtools/ Furthermore, it is required that a samtools executable be in your PATH. (Note that tests use the samtools in your PATH, not samtools in your SAMTOOlS dir; production pipelines will use the samtools at the absolute path you configure them with during setup - the default will be the first one in your PATH.) It is also recommended that you set PERL_INLINE_DIRECTORY to ~/.Inline and create that directory. You require Module::Build in order for the Build.PL script to work. It is recommended you 'install Bundle::CPAN' using 'cpan' prior to attempting setup. First, do the initial setup and dependency installation: $ perl Build.PL If this says you have "ERRORS/WARNINGS FOUND IN PREREQUISITES" try: $ ./Build installdeps to install missing prerequisites from CPAN. You may find that you have to unset your LANG environment variable temporarily to install all CPAN prerequisites. You will likely find that Inline::Filters fails its preprocess.t test; this is ok and you can manually use 'cpan' to: 'force install Inline::Filters'. You may also find that MooseX::Types::Parameterizable fails its tests. In that case you can manually use 'cpan' to install an earlier version that should work: 'install JJNAPIORK/MooseX-Types-Parameterizable-0.07.tar.gz'. Once the prerequisites are installed you will be asked a number of setup questions such as the details of your production and testing databases. It is recommended to use a stand-alone relational database such as MySQL. The pipeline functionality requires such a database, but if you're only using code such as a parser or the bas function you can use sqlite, supplying a file path for the database name. Once setup is complete modules/VRPipe/SiteConfig.pm will have been created. It is this file that holds your site-wide configuration of VRPipe. If you choose to use the local scheduler, you will need to add the scripts directory to your PATH prior to running tests. Note that the local scheduler currently doesn't work well with an sqlite database, so some tests are likely to fail with that combination. To test the code prior to using it: $ ./Build test A number of environment variables affect which tests run and how. In most cases it is fine to not worry about these extra variables. They are described here for the sake of completeness. VRPIPE_TEST_PIPELINES, when true, will fully test all the pipelines and steps instead of just the core functionality of the system. These tests take a very long time - potentially hours - so this is off by default. If you're interested in running a particular pipeline, you can test just that pipeline by setting this variable to true and then running: $ ./Build test --test_files t/VRPipe/Pipelines/[name].t --verbose GATK, pointing to a directory containing GATK jar files, will enable tests that require GATK VRPIPE_VRTRACK_TESTDB will enable testing of the VRTrack DataSource, using the value supplied as the database name (other database details will come from the standard VRTrack environment variables) Additionally, tests that require certain executables will not run unless you have those executables in your PATH. To install: $ ./Build install (or just include the modules subdirectory in your PERL5LIB, and include the scripts subdirectory in your PATH) To create your production database (testing database is created automatically): $ vrpipe-db_deploy If in the future the VRPipe code is updated and there is a change to the schema, you will need to stop VRPipe, update your code, then run: $ vrpipe-db_upgrade To use: The daemons to run the system have not yet been written. In the mean time you can run the scripts vrpipe-trigger_pipelines and vrpipe-dispatch_pipelines. Set up a pipeline using vrpipe-setup and then it will execute if you have the trigger and dispatch scripts running in the background. As a general point, any perl script you write that wants to use some VRPipe code should typically "use VRPipe::Persistent::Schema;". After that most things should work without further "use" statements. EXTERNAL SOFTWARE ----------------- Some software need environment variables setup (using setenv in csh or export in bash). The following list shows the name of the software, the environment variable you need to set, and the value you should set it to, separated by commas. samtools,SAMTOOLS,/path/to/samtools/source_directory GATK,GATK,/path/to/gatk_jar_files picard,PICARD,/path/to/picard_jar_files eg. to have GATK work properly in the pipelines you might do: setenv GATK /path/to/gatk_jar_files or export GATK=/path/to/gatk_jar_files depending on what shell you are using. COPYRIGHT & LICENSE ------------------- Copyright (c) 2011-2012 Genome Research Limited. This file is part of VRPipe. VRPipe is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see L<http://www.gnu.org/licenses/>. The usage of a range of years within a copyright statement contained within this distribution should be interpreted as being equivalent to a list of years including the first and last year specified and all consecutive years between them. For example, a copyright statement that reads 'Copyright (c) 2005, 2007- 2009, 2011-2012' should be interpreted as being identical to a statement that reads 'Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012' and a copyright statement that reads "Copyright (c) 2005-2012' should be interpreted as being identical to a statement that reads 'Copyright (c) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012'."
About
Generic pipeline system
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- Perl 99.9%
- Visual Basic .NET 0.1%