Petasearch enables searching through the largest sets of proteins.
Petasearch
depends on block-aligner
for fast computation
of Smith-Waterman alignments in the blockalign
module. Thus, the Rust Programming
Langugage needs be installed on the user's machine.
Clone this repository to your local machine. After cloning, navigate to the project folder and run the following commands.
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j 16
Afterwards, the built binary srasearch
can be found in ./build/src/
. You can add the path to the binary to your $PATH
in order to user it easily.
To achieve space efficiency, srasearch
will store the target databases in a specific highly-compressed format. You can use convert2sradb
to convert a FASTA/FASTQ file or a MMseqs2 database into a srasearch database.
Example usage:
srasearch convert2sradb target.fasta targetDB
srasearch convert2sradb mmseqsDB targetDB
srasearch
requires pre-indexing the target database by calling the module
createkmertable
first. You can use the following command to generate the kmer table and ID table for a target
database called targetDB
.
srasearch createkmertable targetDB target_kmertable
Petasearch
provides a combined workflow that will produce only one output file alignments.m8
that contain all the
search results from searching queryDB
against all the target databases listed in targetlist
.
srasearch petasearch queryDB targetlist resultlist alignments.m8 tmp
Petasearch
also provides an easy workflow that will accept fasta
file as the input query dataset. The user also do
not need to provide resultlist
as an input.
sraserach easy-petasearch query.fasta targetlist alignments.m8 tmp
To search a MMseqs2 database against a list of Petasearch databases, simply run:
srasearch comparekmertables queryDB targetlist resultlist
targetlist
should be a file containing all target databases and kmer tables. An example targetlist
would look
like this:
target_kmertable1 targetDB1
target_kmertable2 targetDB2
resultlist
should be a file containing all file names for output files to store the prefiltering result. An example
resultlist
would look like this:
compkmer_res_1
compkmer_res_2
srasearch blockalign queryDB targetDB1 compkmer_res_1 compali_res_1
srasearch convertsraalis queryDB targetDB1 compali_res_1 alignments_1.m8
resultlist
should be a file with the same amount of entries of targetlist
. Those are the file names