Python implementation of ARGs_OAP
```
git clone https://github.com/xinehc/pysarg
cd pysarg
## use python3 if needed
python setup.py install
```
- the source code is also uploaded to PyPI; try:
```
## may take a while to build the wheel, as 2 different versions of DIAMOND need to be compiled
## use pip3 if needed
pip install pysarg
```
- pre-compiled conda packages (`osx-64` or `linux-64`, Python 3.6):
```
conda install -c xinehc pysarg
```
- only a Python 3.6 package has been uploaded; if your Python version is not 3.6, create a new conda environment:
```
conda create -n pysarg -c xinehc pysarg python=3.6
source activate pysarg
```
Two toy examples (100k paired-end reads, 100 bp each) are provided in `example/inputdir`:
```
# git clone https://github.com/xinehc/pysarg
# mkdir -p pysarg/example/outputdir
pysarg stage_one -i pysarg/example/inputdir -o pysarg/example/outputdir
pysarg stage_two -i pysarg/example/outputdir/extracted.fasta -m pysarg/example/outputdir/metadata.txt -o pysarg/example/outputdir
## LINUX only: add the flag --original to reproduce Ublastx_stageone's results (this requires the pre-compiled binaries from Ublastx_stageone)
pysarg stage_one -i pysarg/example/inputdir -o pysarg/example/outputdir --original
pysarg stage_two -i pysarg/example/outputdir/extracted.fasta -m pysarg/example/outputdir/metadata.txt -o pysarg/example/outputdir --original
```
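If you prefer to drive both stages from Python, a thin wrapper around the command line is enough. The sketch below is hypothetical (`build_commands` and `run_pipeline` are illustrative names, not part of pysarg); it only uses the flags shown above (`-i`, `-m`, `-o`, `--original`) and shells out to the installed `pysarg` command via `subprocess`.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(inputdir: str, outputdir: str, original: bool = False) -> List[List[str]]:
    """Return the stage_one and stage_two command lines (hypothetical helper)."""
    out = Path(outputdir)
    stage_one = ["pysarg", "stage_one", "-i", inputdir, "-o", outputdir]
    # stage_two consumes the two files written by stage_one
    stage_two = [
        "pysarg", "stage_two",
        "-i", str(out / "extracted.fasta"),
        "-m", str(out / "metadata.txt"),
        "-o", outputdir,
    ]
    if original:
        # LINUX only: reproduce Ublastx_stageone's results
        stage_one.append("--original")
        stage_two.append("--original")
    return [stage_one, stage_two]


def run_pipeline(inputdir: str, outputdir: str, original: bool = False) -> None:
    """Run both stages in order; raise CalledProcessError if either one fails."""
    # Equivalent of `mkdir -p <outputdir>`
    Path(outputdir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(inputdir, outputdir, original):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("pysarg/example/inputdir", "pysarg/example/outputdir")` should then be equivalent to the first two commands above. Passing the arguments as a list (rather than one shell string) avoids quoting problems with unusual paths.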
If everything runs successfully, there should be four output files in `example/outputdir`:
`metadata.txt`
sample | read_length | read_number | 16s_number | cell_number |
---|---|---|---|---|
STAS | 100 | 200000 | 9.776536312849162 | 3.05292019025543 |
SWHAS104 | 100 | 200000 | 9.35754189944134 | 3.3635174193105737 |
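The `16s_number` column is not a raw read count. ARGs-OAP length-normalizes 16S copies using an average 16S rRNA gene length of 1432 bp; assuming pysarg follows the same convention (this README does not say so), the values above are consistent with 140 and 134 16S-like reads of 100 bp for STAS and SWHAS104. The helper names and the `L16S_BP` constant below are hypothetical.

```python
# Assumed average 16S rRNA gene length in bp (ARGs-OAP convention).
# Hypothetical constant: check the pysarg source for the value it actually uses.
L16S_BP = 1432


def sixteen_s_copies(n_16s_reads: int, read_length: int, ref_length: int = L16S_BP) -> float:
    """Length-normalized 16S copies: reads x read length / 16S gene length."""
    return n_16s_reads * read_length / ref_length


def implied_16s_reads(sixteen_s_number: float, read_length: int, ref_length: int = L16S_BP) -> float:
    """Invert the formula to estimate how many 16S-like reads were found."""
    return sixteen_s_number * ref_length / read_length
```

For example, `sixteen_s_copies(140, 100)` evaluates to about 9.7765, matching the STAS row.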
`output.txt`

sample | sequence | gene | gene_length | gene_type | gene_subtype | covered_length |
---|---|---|---|---|---|---|
STAS | STAS_30 | gi\|671541568\|ref\|WP_031525212.1\| | 648 | macrolide-lincosamide-streptogramin | macrolide-lincosamide-streptogramin__macB | 31 |
STAS | STAS_61 | NP_840140 | 273 | bacitracin | bacitracin__bacA | 32 |
STAS | STAS_70 | gi\|764440891\|ref\|WP_044366757.1\| | 439 | multidrug | | |
... | ... | ... | ... | ... | ... | ... |
- the two tables above can be merged on the column `sample` and then used to normalize the ARG counts and draw PCA plots. An example is provided in `notebook/normalize_sarg.ipynb`.
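A minimal, standard-library sketch of that merge-and-normalize step follows. It assumes both files are tab-separated with the header rows shown above, and that `gene_length` is in amino acids (SARG references are protein sequences), so each ARG-like read contributes `read_length / (3 * gene_length)` gene copies, in the same length-normalized spirit as `16s_number`. The function names and the exact formula are illustrative assumptions; `notebook/normalize_sarg.ipynb` remains the authors' reference.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple


def read_table(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-separated table with a header row (assumed file format)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def normalize_subtypes(
    metadata: Iterable[Dict[str, str]],
    hits: Iterable[Dict[str, str]],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Join ARG hits to per-sample metadata on `sample`, then normalize per subtype."""
    meta = {row["sample"]: row for row in metadata}

    reads: Dict[Tuple[str, str], int] = defaultdict(int)       # raw read count
    copies: Dict[Tuple[str, str], float] = defaultdict(float)  # length-normalized copies
    for hit in hits:
        key = (hit["sample"], hit["gene_subtype"])
        read_length = float(meta[hit["sample"]]["read_length"])
        # Assumption: gene_length is in amino acids, hence x3 to get nucleotides.
        copies[key] += read_length / (3 * float(hit["gene_length"]))
        reads[key] += 1

    table = {}
    for (sample, subtype), n_copies in copies.items():
        m = meta[sample]
        n_reads = reads[(sample, subtype)]
        table[(sample, subtype)] = {
            "reads": n_reads,
            # ARG-like reads per million sequenced reads (not length-normalized)
            "ppm": n_reads / float(m["read_number"]) * 1e6,
            "copies_per_16s": n_copies / float(m["16s_number"]),
            "copies_per_cell": n_copies / float(m["cell_number"]),
        }
    return table
```

Open the real files with `newline=""` (as the `csv` module recommends) and pass them through `read_table`. The result is keyed by `(sample, gene_subtype)`, which is convenient to pivot into a sample-by-subtype matrix for PCA.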
`extracted.fasta` contains the pre-filtered ARG-like sequences from stage one, and `extracted.blast` is the blastx result of `extracted.fasta` from stage two.
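These intermediate files can be inspected with a few lines of Python. The sketch below assumes that `extracted.fasta` uses the same IDs as the `sequence` column of `output.txt` (e.g. `STAS_30`, i.e. `<sample>_<index>`) and that `extracted.blast` is in the standard 12-column tabular BLAST format (`-outfmt 6`, which DIAMOND also writes by default). Neither format is documented in this README, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Column names of the standard 12-column tabular BLAST format (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_reads_per_sample(fasta_lines: Iterable[str]) -> Counter:
    """Count FASTA records per sample, assuming IDs of the form <sample>_<index>."""
    counts: Counter = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            # rsplit keeps underscores inside the sample name intact
            counts[seq_id.rsplit("_", 1)[0]] += 1
    return counts


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Keep the highest-bitscore hit for each query in a tabular BLAST file."""
    best: Dict[str, Dict[str, str]] = {}
    for line in blast_lines:
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(BLAST6_COLUMNS, line.rstrip("\n").split("\t")))
        query = row["qseqid"]
        if query not in best or float(row["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = row
    return best
```

Both functions accept any iterable of lines, so an open file handle can be passed directly.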