pysarg

Python implementation of ARGs_OAP

Installation

Build from source (cmake, zlib and libpthread are required for building diamond and minimap2):

git clone https://github.com/xinehc/pysarg
cd pysarg
## use python3 if needed
python setup.py install

The source code is also available on PyPI:

## may take a while to build the wheel, as 2 different versions of diamond need to be compiled
## use pip3 if needed
pip install pysarg

Pre-compiled conda packages (osx-64 or linux-64, Python 3.6):

conda install -c xinehc pysarg

Only a Python 3.6 package has been uploaded. If your Python version is not 3.6, create a new conda environment:

conda create -n pysarg -c xinehc pysarg python=3.6
source activate pysarg

Example

Two toy examples (100k paired-end reads, 100bp each) are provided in example/inputdir:

# git clone https://github.com/xinehc/pysarg
# mkdir -p pysarg/example/outputdir
pysarg stage_one -i pysarg/example/inputdir -o pysarg/example/outputdir
pysarg stage_two -i pysarg/example/outputdir/extracted.fasta -m pysarg/example/outputdir/metadata.txt -o pysarg/example/outputdir 

## LINUX only: add flag --original to get Ublastx_stageone's results (required to use the pre-compiled binaries in Ublastx_stageone)
pysarg stage_one -i pysarg/example/inputdir -o pysarg/example/outputdir --original
pysarg stage_two -i pysarg/example/outputdir/extracted.fasta -m pysarg/example/outputdir/metadata.txt -o pysarg/example/outputdir --original
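The two stages can also be chained from a script. The following is a minimal sketch, assuming pysarg is installed and on PATH; build_commands and run_pipeline are hypothetical helper names, not part of pysarg, and the only flags used are the ones shown in the commands above.

```python
import subprocess
from pathlib import Path


def build_commands(input_dir, output_dir, original=False):
    """Return the stage_one and stage_two command lines as argument lists."""
    out = Path(output_dir)
    # --original is the Linux-only flag described above
    extra = ["--original"] if original else []
    stage_one = ["pysarg", "stage_one", "-i", str(input_dir), "-o", str(out)] + extra
    stage_two = [
        "pysarg", "stage_two",
        "-i", str(out / "extracted.fasta"),  # written by stage_one
        "-m", str(out / "metadata.txt"),     # written by stage_one
        "-o", str(out),
    ] + extra
    return [stage_one, stage_two]


def run_pipeline(input_dir, output_dir, original=False):
    """Run both stages in order; stop at the first non-zero exit code."""
    # Creating the output directory up front mirrors the optional mkdir -p step.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(input_dir, output_dir, original):
        subprocess.run(cmd, check=True)


# Print the commands without executing them.
for cmd in build_commands("pysarg/example/inputdir", "pysarg/example/outputdir"):
    print(" ".join(cmd))
```

Calling run_pipeline("pysarg/example/inputdir", "pysarg/example/outputdir") would execute the same two commands as the first example above.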

If everything works, example/outputdir should contain four output files (metadata.txt, output.txt, extracted.fasta and extracted.blast).

metadata.txt

sample	read_length	read_number	16s_number	cell_number
STAS	100	200000	9.776536312849162	3.05292019025543
SWHAS104	100	200000	9.35754189944134	3.3635174193105737

output.txt

sample	sequence	gene	gene_length	gene_type	gene_subtype	covered_length
STAS	STAS_30	gi|671541568|ref|WP_031525212.1|	648	macrolide-lincosamide-streptogramin	macrolide-lincosamide-streptogramin__macB	31
STAS	STAS_61	NP_840140	273	bacitracin	bacitracin__bacA	32
STAS	STAS_70	gi|764440891|ref|WP_044366757.1|	439	multidrug
...	...	...	...	...	...	...

The two tables above can be merged on the sample column and then used to normalize ARG counts and draw PCA plots. An example is provided in notebook/normalize_sarg.ipynb.
extracted.fasta contains the pre-filtered ARG-like sequences from stage one; extracted.blast is the blastx result for extracted.fasta from stage two.
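As a rough illustration of the merge step, the sketch below joins the two tables on sample using only the standard library and divides the number of ARG-like hits per gene_type by cell_number. This is an assumption made for illustration: the notebook may use a different normalization formula, and read_tsv and hits_per_cell are hypothetical helper names. The rows are copied from the example tables above; in practice, open metadata.txt and output.txt instead.

```python
import csv
import io
from collections import Counter

# Rows copied from the example tables above (tab-separated).
METADATA_TXT = (
    "sample\tread_length\tread_number\t16s_number\tcell_number\n"
    "STAS\t100\t200000\t9.776536312849162\t3.05292019025543\n"
    "SWHAS104\t100\t200000\t9.35754189944134\t3.3635174193105737\n"
)
OUTPUT_TXT = (
    "sample\tsequence\tgene\tgene_length\tgene_type\tgene_subtype\tcovered_length\n"
    "STAS\tSTAS_30\tgi|671541568|ref|WP_031525212.1|\t648\t"
    "macrolide-lincosamide-streptogramin\t"
    "macrolide-lincosamide-streptogramin__macB\t31\n"
    "STAS\tSTAS_61\tNP_840140\t273\tbacitracin\tbacitracin__bacA\t32\n"
)


def read_tsv(handle):
    """Read a tab-separated table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def hits_per_cell(metadata_rows, output_rows):
    """Join on 'sample' and return hits per cell for each (sample, gene_type).

    Assumption: hit count / cell_number is used as the normalization here.
    """
    cells = {row["sample"]: float(row["cell_number"]) for row in metadata_rows}
    counts = Counter((row["sample"], row["gene_type"]) for row in output_rows)
    return {key: n / cells[key[0]] for key, n in counts.items()}


metadata = read_tsv(io.StringIO(METADATA_TXT))
output = read_tsv(io.StringIO(OUTPUT_TXT))
for (sample, gene_type), value in sorted(hits_per_cell(metadata, output).items()):
    print(sample, gene_type, round(value, 4))
```

Replacing cell_number with 16s_number in hits_per_cell would give hits per 16S copy instead.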
