AminoX

AminoX is a Recurrent Neural Network (RNN) with a Long Short-Term Memory (LSTM) architecture that applies Natural Language Processing (NLP) techniques to predict a single missing amino acid in a given primary sequence:

'SPSSLSTNTTSA ? PTLTSEPR' → 'SPSSLSTNTTSA S PTLTSEPR'
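
As a rough illustration of this kind of model, the following PyTorch sketch maps a masked sequence to per-position scores over the 20 standard amino acids. The class name `MissingResidueLSTM`, the helper `encode`, the layer sizes, the vocabulary ordering, and the use of a bidirectional LSTM are all assumptions made for this example; they are not taken from AminoX.py.

```python
import torch
import torch.nn as nn

# Assumed vocabulary: 20 standard amino acids plus '?' for the missing residue.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY?"
CHAR_TO_IDX = {ch: i for i, ch in enumerate(ALPHABET)}


class MissingResidueLSTM(nn.Module):
    """Hypothetical model: embedding -> bidirectional LSTM -> per-position logits."""

    def __init__(self, vocab_size=len(ALPHABET), embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional (an assumption) so the masked position sees both sides.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Only the 20 real amino acids are valid outputs, never '?'.
        self.out = nn.Linear(2 * hidden_dim, vocab_size - 1)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer indices
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # (batch, seq_len, 20)


def encode(seq):
    """Turn a string such as 'SPSS?STN' into a (1, seq_len) index tensor."""
    return torch.tensor([[CHAR_TO_IDX[ch] for ch in seq]], dtype=torch.long)


model = MissingResidueLSTM()
# Same example as above, written without the display spaces around '?'.
logits = model(encode("SPSSLSTNTTSA?PTLTSEPR"))
```

With an untrained model the scores are meaningless; the point is only the shape of the data flowing through the network, one vector of 20 scores per input position.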

The input dataset is generated with ProtGPT2, a language model trained on the protein space (https://huggingface.co/nferruz/ProtGPT2). Protgpt2_seq_gen.py generates N different amino acid sequences whose lengths fall between a settable minimum (100) and a settable maximum (300).
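
The generation step could look roughly like the sketch below, which uses the `transformers` text-generation pipeline. It is not the content of Protgpt2_seq_gen.py: the function names are hypothetical, and the sampling settings (`top_k`, `repetition_penalty`, `eos_token_id`) follow the usage example on the ProtGPT2 model card rather than this repository.

```python
def clean_sequence(generated_text):
    """Strip the end-of-text marker and line breaks from one generated text."""
    return generated_text.replace("<|endoftext|>", "").replace("\n", "").strip()


def filter_by_length(sequences, min_len=100, max_len=300):
    """Keep only sequences whose length lies in [min_len, max_len]."""
    return [s for s in sequences if min_len <= len(s) <= max_len]


def generate_sequences(n, min_len=100, max_len=300):
    """Sample n candidates from ProtGPT2 and keep those within the length range."""
    # Imported lazily: the first call downloads the model weights.
    from transformers import pipeline

    generator = pipeline("text-generation", model="nferruz/ProtGPT2")
    outputs = generator(
        "<|endoftext|>",
        max_length=max_len,  # counted in tokens, not residues
        do_sample=True,
        top_k=950,
        repetition_penalty=1.2,
        num_return_sequences=n,
        eos_token_id=0,
    )
    cleaned = [clean_sequence(o["generated_text"]) for o in outputs]
    return filter_by_length(cleaned, min_len, max_len)
```

Because `max_length` is measured in tokens and a ProtGPT2 token usually spans several residues, some candidates will fall outside the range and be dropped, so fewer than `n` sequences may be returned.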

The input data is then organised into minibatches of length 100, which corresponds to the minimum sequence length. Each minibatch corresponds to one sequence in which each amino acid is replaced, in turn, with the character '?' to represent a missing amino acid; the target output is the unmodified sequence.
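
The masking scheme described above can be sketched in a few lines of plain Python. The function name `make_masked_minibatch` is hypothetical, and truncating each sequence to its first `length` residues is an assumption; the repository may select the window differently.

```python
def make_masked_minibatch(sequence, length=100, mask="?"):
    """Return `length` (masked, target) pairs, one per masked position."""
    # Assumption: keep only the first `length` residues of the sequence.
    target = sequence[:length]
    if len(target) < length:
        raise ValueError("sequence is shorter than the minibatch length")
    # Replace each position in turn with the mask character.
    return [(target[:i] + mask + target[i + 1:], target) for i in range(length)]


# Small demonstration with a 4-residue window instead of 100.
pairs = make_masked_minibatch("SPSSLSTN", length=4)
```

Here `pairs` holds four inputs ('?PSS', 'S?SS', 'SP?S', 'SPS?'), each paired with the same target 'SPSS'.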

After the training epochs, predictions are made over the test set. For each amino acid in this dataset, the full list of amino acids is shown in decreasing order of predicted likelihood. A confusion matrix is also plotted to show which amino acids are predicted correctly and which are confused with others.
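
A minimal sketch of both outputs, assuming the model returns one score per amino acid for the masked position. The names `rank_amino_acids` and `confusion_matrix`, and the alphabetical ordering of the 20 residues, are choices made for this example rather than details of AminoX.py.

```python
import numpy as np

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rank_amino_acids(logits):
    """Return (amino acid, probability) pairs sorted by decreasing probability."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())  # shift for numerical stability
    probs = exp / exp.sum()  # softmax over the 20 residues
    order = np.argsort(-probs)
    return [(AMINO_ACIDS[i], float(probs[i])) for i in order]


def confusion_matrix(true_residues, predicted_residues):
    """Count predictions: rows are true residues, columns are predicted ones."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = np.zeros((len(AMINO_ACIDS), len(AMINO_ACIDS)), dtype=int)
    for t, p in zip(true_residues, predicted_residues):
        matrix[index[t], index[p]] += 1
    return matrix
```

The resulting matrix can then be drawn with, for example, matplotlib's `imshow`; correct predictions accumulate on the diagonal.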

AminoX is adapted from Unit 6/7 of this NLP tutorial: https://learn.microsoft.com/en-us/training/modules/intro-natural-language-processing-pytorch/1-introduction

Required Libraries

Python modules required:

numpy >= 1.22.3
torch >= 1.12.1+cu116 (pip install torch==1.12.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html)
torchtext >= 0.13.1
matplotlib >= 3.4.3
transformers >= 4.22.2 (required by Protgpt2_seq_gen.py)
