./download_example_data.sh
This script will download the following files for training, mutation prediction, and sequence generation:
datasets/sequences/BLAT_ECOLX_1_b0.5_lc_weights.fa
datasets/nanobodies/Manglik_filt_seq_id80_id90.fa
datasets/nanobodies/Manglik_labelled_nanobodies.txt
calc_logprobs/input/BLAT_ECOLX_r24-286_Ranganathan2015.fa
sess/BLAT_ECOLX_v2_channels-48_rseed-11_19Aug16_0626PM.ckpt-250000*
sess/nanobody.ckpt-250000*
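After the download finishes, it can be useful to confirm that every file in the list above is present. The following is a minimal sketch, not part of the repository: has_match and check_downloads are hypothetical helper names, and the sketch assumes it is run from the repository root (or is given the root as its first argument).

```shell
# Hypothetical helpers (not part of the repository): report which of the
# files listed above exist under a given root directory.

# Succeeds if at least one existing path matches the glob pattern in $1.
has_match() {
    local f
    for f in $1; do              # unquoted on purpose so the glob expands
        [ -e "$f" ] && return 0
    done
    return 1
}

# Prints one status line per expected file; succeeds only if none are missing.
check_downloads() {
    local root="${1:-.}" missing=0 pattern
    # A trailing * mirrors the listing: each checkpoint is stored as several files.
    for pattern in \
        "datasets/sequences/BLAT_ECOLX_1_b0.5_lc_weights.fa" \
        "datasets/nanobodies/Manglik_filt_seq_id80_id90.fa" \
        "datasets/nanobodies/Manglik_labelled_nanobodies.txt" \
        "calc_logprobs/input/BLAT_ECOLX_r24-286_Ranganathan2015.fa" \
        "sess/BLAT_ECOLX_v2_channels-48_rseed-11_19Aug16_0626PM.ckpt-250000*" \
        "sess/nanobody.ckpt-250000*"
    do
        if has_match "$root/$pattern"; then
            echo "ok       $pattern"
        else
            echo "MISSING  $pattern"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Calling check_downloads with no argument checks the current directory. The function returns a non-zero status if any file is missing, so it can gate later steps.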
./download_sequences.sh
This script will download all training sequences from the paper to
datasets/sequences/
and
datasets/nanobodies/
./download_effect_predictions.sh
This script will download all mutation effect predictions
from the paper to calc_logprobs/output/
./download_generated_nanobodies.sh
This script will download the designed nanobody library to generated/
./demo_train.sh
This script will run 100 training iterations on the β-Lactamase sequence dataset
(the full model is trained for 250,000 iterations).
The final model checkpoint will appear as three files in
sess/BLAT_ECOLX_elu_channels-48_rseed-11_<timestamp>.ckpt-100*
On an AWS p2.xlarge instance, this demonstration took 2 minutes.
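To confirm that the demo wrote its checkpoint, the files matching the pattern above can be counted. This is a sketch under stated assumptions: count_matches is a hypothetical helper rather than a repository script, and the pattern uses a wildcard in place of <timestamp> because the timestamp depends on when the demo was run.

```shell
# Hypothetical helper (not part of the repository): print how many existing
# paths match the glob pattern given as $1.
count_matches() {
    local n=0 f
    for f in $1; do              # unquoted on purpose so the glob expands
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Example call, run from the repository root after ./demo_train.sh:
#   count_matches 'sess/BLAT_ECOLX_elu_channels-48_rseed-11_*.ckpt-100*'
```

A single demo run should report 3, matching the three checkpoint files described above. Repeated runs with different timestamps would each add their own files to the count.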
./demo_calc_logprobs.sh
This script will use the pretrained model weights in
sess/BLAT_ECOLX_v2_channels-48_rseed-11_19Aug16_0626PM.ckpt-250000*
to make mutation effect predictions for the β-Lactamase mutational scan from
Stiffler et al., Cell, 2015.
Each final prediction is the average of 10 individual predictions
(the full test averages 500).
These predictions will appear in
calc_logprobs/output/demo_BLAT_ECOLX_r24-286_Ranganathan2015_rseed-11_channels-48_dropoutp-0.5.csv
On an AWS p2.xlarge instance, this demonstration took 3.5 minutes.
./demo_generate.sh
./demo_generate_fast.sh
Either script will generate nanobody CDR3 and FRA4 sequences given a preceding VH sequence.
The full nanobody sequences will be output in
generated/nanobody.ckpt-250000_temp-1.0_rseed-42.fa
On an AWS p2.xlarge instance, demo_generate.sh took 2.5 minutes and demo_generate_fast.sh took 30 seconds.
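A quick way to inspect the result is to count the sequences in the output file. The sketch below assumes the output is standard FASTA, where each record begins with a header line starting with ">". count_fasta_records is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print the number of
# records in a FASTA file by counting header lines, which start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example call, run from the repository root after either demo script:
#   count_fasta_records generated/nanobody.ckpt-250000_temp-1.0_rseed-42.fa
```

Note that grep exits with a non-zero status when the count is zero, which is convenient for detecting an empty output file in a larger script.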