The dms_variants.globalepistasis
module is based off the "global epistasis" concepts of Otwinoski et
al and Sailer and
Harms -- that there exists an underlying latent phenotype that mutations affect additively, and
then an observed (measured) phenotype that is a non-linear
function of the latent phenotype.
Import Python modules / packages:
import collections
import os
import itertools
import random
import tempfile
import time
import warnings
import pandas as pd
from plotnine import *
import dms_variants.binarymap
import dms_variants.codonvarianttable
import dms_variants.globalepistasis
import dms_variants.plotnine_themes
import dms_variants.simulate
from dms_variants.constants import CBPALETTE, CODONS_NOSTOP
import yaml
Set the gray grid plotnine theme defined in dms_variants.plotnine_themes:
theme_set(dms_variants.plotnine_themes.theme_graygrid())
Versions of key software:
print(f"Using dms_variants version {dms_variants.__version__}")
Using dms_variants version 0.6.0
Set pandas display options to show large chunks of Data Frames in this example:
pd.set_option('display.max_columns', 30)
pd.set_option('display.width', 500)
Hide warnings that clutter output:
warnings.simplefilter('ignore')
Read the configuration file:
with open('config.yaml') as f:
config = yaml.safe_load(f)
Make output directory if needed:
os.makedirs(config['global_epistasis_binding_dir'], exist_ok=True)
os.makedirs(config['figs_dir'], exist_ok=True)
Read in log10Ka measurements. Rename log10Ka column to be func_score, and calculate variance from the ddG_log10Ka column. Remove rows with NaN for func_score. For empty aa_substitutions (wildtype), it is being replaced with NA. Make back to an empty string.
df = pd.read_csv(config['Titeseq_Kds_file'])
df.rename(columns={'log10Ka':'func_score'},inplace=True)
df['func_score_var'] = df['log10SE']**2
func_scores = df[pd.notnull(df['func_score'])]
func_scores.fillna('',inplace=True)
func_scores.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
library | target | barcode | variant_call_support | avgcount | func_score | delta_log10Ka | log10SE | response | baseline | nMSR | variant_class | aa_substitutions | n_aa_substitutions | func_score_var | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | lib1 | SARS-CoV-2 | AAAAAAAAACTTCAAT | 5 | 74.50 | 8.72 | -2.05 | 0.12 | 1.64 | 1.13 | 0.01 | >1 nonsynonymous | A22C R127G E141D L188V | 4 | 0.0144 |
3 | lib1 | SARS-CoV-2 | AAAAAAAACAAGCAGA | 6 | 146.32 | 10.35 | -0.42 | 0.05 | 2.87 | 1.01 | 0.00 | 1 nonsynonymous | N13F | 1 | 0.0025 |
5 | lib1 | SARS-CoV-2 | AAAAAAAACAGGTTGC | 4 | 47.48 | 6.00 | -4.76 | 2.00 | 1.50 | 1.15 | 0.01 | >1 nonsynonymous | V71K P149L N157T | 3 | 4.0000 |
6 | lib1 | SARS-CoV-2 | AAAAAAAACATTAAAT | 6 | 47.44 | 10.15 | -0.61 | 0.09 | 2.64 | 1.09 | 0.00 | >1 nonsynonymous | A18V T148S H189Y | 3 | 0.0081 |
8 | lib1 | SARS-CoV-2 | AAAAAAAACCTTACAA | 2 | 18.78 | 9.61 | -1.15 | 0.19 | 2.00 | 1.11 | 0.01 | >1 nonsynonymous | T63D A89N | 2 | 0.0361 |
We now fit global epistasis models to the functional scores. For background on these models, see Otwinoski et al (2018). The models we fits are implemented in dms_variants.globalepistasis, which has extensive documentation.
The primary model of interest is MonotonicSplineEpistasis, which assumes that the observed phenotype is a monotonic non-linear function of an underlying additive latent phenotype. As a control, we also fit a NoEpistasis model, which assumes that mutations simply contribute additively to the observed phenotype:
For the fitting, we first convert the set of functional scores into a binary representation using a BinaryMap. Then we create the model, fit it, and store it.
# NBVAL_IGNORE_OUTPUT
models = {} # store models, keyed by `(epistasistype, likelihoodtype, lib)`
for (lib), scores in func_scores.groupby(['library']):
bmap = dms_variants.binarymap.BinaryMap(scores)
for epistasistype, likelihoodtype, Model in [
('global epistasis', 'Gaussian', dms_variants.globalepistasis.MonotonicSplineEpistasisGaussianLikelihood),
('no epistasis', 'Gaussian', dms_variants.globalepistasis.NoEpistasisGaussianLikelihood),
('global epistasis', 'Cauchy', dms_variants.globalepistasis.MonotonicSplineEpistasisCauchyLikelihood),
('no epistasis', 'Cauchy', dms_variants.globalepistasis.NoEpistasisCauchyLikelihood),
]:
print(f"Fitting {epistasistype} with {likelihoodtype} likelihood model to {lib}...", end=' ')
start = time.time()
model = Model(bmap)
model.fit() # do NOT change ftol in normal use, this is just for test
print(f"fitting took {time.time() - start:.1f} sec.")
models[(epistasistype, likelihoodtype, lib)] = model
Fitting global epistasis with Gaussian likelihood model to lib1... fitting took 87.7 sec.
Fitting no epistasis with Gaussian likelihood model to lib1... fitting took 3.2 sec.
Fitting global epistasis with Cauchy likelihood model to lib1... fitting took 218.3 sec.
Fitting no epistasis with Cauchy likelihood model to lib1... fitting took 16.3 sec.
Fitting global epistasis with Gaussian likelihood model to lib2... fitting took 83.2 sec.
Fitting no epistasis with Gaussian likelihood model to lib2... fitting took 3.3 sec.
Fitting global epistasis with Cauchy likelihood model to lib2... fitting took 240.6 sec.
Fitting no epistasis with Cauchy likelihood model to lib2... fitting took 20.5 sec.
Now we want to see which model fits the data better. To do this, we get the log likelihood of each model along with the number of model parameters and use it to calculate the AIC. Models with lower AIC are better, and below we see that the MonotonicSplineEpistasis global epistasis model always fits the data much better. (Note, I previous tried alterning number of meshpoints, and the default 4 was the lowest AIC):
# NBVAL_IGNORE_OUTPUT
logliks_df = (
pd.DataFrame.from_records(
[(epistasistype, likelihoodtype, lib, model.nparams, model.loglik) for
(epistasistype, likelihoodtype, lib), model in models.items()],
columns=['model', 'likelihood type', 'library',
'n_parameters', 'log_likelihood']
)
.assign(AIC=lambda x: 2 * x['n_parameters'] - 2 * x['log_likelihood'])
.set_index(['library'])
)
logliks_df.round(1)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
model | likelihood type | n_parameters | log_likelihood | AIC | |
---|---|---|---|---|---|
library | |||||
lib1 | global epistasis | Gaussian | 3801 | -53065.8 | 113733.5 |
lib1 | no epistasis | Gaussian | 3795 | -80441.6 | 168473.3 |
lib1 | global epistasis | Cauchy | 3801 | -39036.0 | 85674.0 |
lib1 | no epistasis | Cauchy | 3795 | -60021.4 | 127632.9 |
lib2 | global epistasis | Gaussian | 3798 | -49768.0 | 107132.0 |
lib2 | no epistasis | Gaussian | 3792 | -73422.4 | 154428.8 |
lib2 | global epistasis | Cauchy | 3798 | -32464.9 | 72525.8 |
lib2 | no epistasis | Cauchy | 3792 | -54067.1 | 115718.3 |
# check to confirm the global epistasis model is better in all cases
assert (logliks_df
.reset_index()
.pivot_table(index=['library'],
values='AIC',
columns='model'
)
.assign(global_better=lambda x: x['global epistasis'] < x['no epistasis'])
['global_better']
.all()
)
Because the MonotonicSplineEpistasis global epistasis models fit so much better than the additive NoEpistasis models, below we are just going to analyze the results from the global epistasis model.
First, we will examine how the model looks on all the actual variants used to fit the model. We use AbstractEpistasis.phenotypes_df to get a data frame of all the variants used to fit each global epistasis model along with their functional scores and the latent and observed phenotypes predicted by each model.
# NBVAL_IGNORE_OUTPUT
variants_df = pd.concat(
[model.phenotypes_df
.assign(library=lib,
likelihoodtype=likelihoodtype,
)
for (epistasistype, likelihoodtype, lib), model in models.items()
if (epistasistype == 'global epistasis')],
ignore_index=True, sort=False)
#predictionsfile = os.path.join(config['global_epistasis_binding_dir'], 'globalepistasis_binding_predictions.csv')
#variants_df.to_csv(predictionsfile, index=False)
#print(f"Writing predictions to {predictionsfile}")
variants_df.head().round(2)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
aa_substitutions | func_score | func_score_var | latent_phenotype | observed_phenotype | library | likelihoodtype | |
---|---|---|---|---|---|---|---|
0 | A22C R127G E141D L188V | 8.72 | 0.01 | -1.54 | 9.06 | lib1 | Gaussian |
1 | N13F | 10.35 | 0.00 | -0.62 | 10.46 | lib1 | Gaussian |
2 | V71K P149L N157T | 6.00 | 4.00 | -3.99 | 6.64 | lib1 | Gaussian |
3 | A18V T148S H189Y | 10.15 | 0.01 | -0.89 | 10.14 | lib1 | Gaussian |
4 | T63D A89N | 9.61 | 0.04 | -1.33 | 9.43 | lib1 | Gaussian |
Below we plot the relationships among the latent phenotype from the model, the observed phenotype from the model, and the measured functional score for all variants used to fit the model:
for x, y in itertools.combinations(['latent_phenotype',
'observed_phenotype',
'func_score'],
2):
p = (
ggplot(variants_df, aes(x, y)) +
geom_point(alpha=0.05, size=0.5) +
facet_grid('library ~ likelihoodtype', scales='free_y') +
theme(figure_size=(2 * variants_df['likelihoodtype'].nunique(),
2 * variants_df['library'].nunique()),
)
)
# plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
# print(f"Saving to {plotfile}")
# p.save(plotfile)
_ = p.draw()
The two Cauchy fits looks good -- clearly picking up on the complete censoring of KA,app values >106 to the 106 boundary condition. The Gaussian likelihood model fails to find this boundary in the lib2 data, and in the lib1 data it seems to fit a boundary closer to 7 than to 6. I am curious to see how these curves change when I remove the SE/variance estimates, which get wonky in these values at this boundary (and are of uncertain value even for points within the dynamic range).
The Cauchy likelihood fits also have fewer points that are observed at this boundary condition in the experimental functional scores but given latent phenotypes near neutral (aka, likely "false positives"). Overall, these models that use the Cauchy liklelihood look better, which is consistent with what I've seen from previous Tite-seq results. We will confirm this later on when we evaluate the correlation in coefficients between replicates from these different models.
To get a better view into the "shape" of global epistasis, let's re-make the plots above but only showing single mutant barcodes.
mask = (variants_df['aa_substitutions'].str.len() < 6) & (variants_df['aa_substitutions'].str.len() > 0)
single_variants_df = variants_df.loc[mask]
single_variants_df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
aa_substitutions | func_score | func_score_var | latent_phenotype | observed_phenotype | library | likelihoodtype | |
---|---|---|---|---|---|---|---|
1 | N13F | 10.35 | 0.0025 | -0.620878 | 10.456437 | lib1 | Gaussian |
10 | P7S | 10.67 | 0.0081 | -0.082381 | 10.731078 | lib1 | Gaussian |
21 | P149Q | 10.72 | 0.0025 | -0.234197 | 10.714161 | lib1 | Gaussian |
24 | D90Y | 10.57 | 0.0049 | -0.622221 | 10.455074 | lib1 | Gaussian |
26 | L5C | 10.69 | 0.0036 | -0.257591 | 10.706989 | lib1 | Gaussian |
for x, y in itertools.combinations(['latent_phenotype',
'observed_phenotype',
'func_score'],
2):
p = (
ggplot(single_variants_df, aes(x, y)) +
geom_point(alpha=0.05, size=0.5) +
facet_grid('library ~ likelihoodtype', scales='free_y') +
theme(figure_size=(2 * variants_df['likelihoodtype'].nunique(),
2 * variants_df['library'].nunique()),
)
)
# plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
# print(f"Saving to {plotfile}")
# p.save(plotfile)
_ = p.draw()
#add predicted and latent phenotypes to table by barcode with additional columns used for interpretation, save output file
dt = pd.read_csv(config['Titeseq_Kds_file'])
dt[['aa_substitutions']] = dt[['aa_substitutions']].fillna(value='')
dt = models.get(('global epistasis', 'Gaussian', 'lib1')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Gaussian_1',observed_phenotype_col='predicted_phenotype_Gaussian_1',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Cauchy', 'lib1')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Cauchy_1',observed_phenotype_col='predicted_phenotype_Cauchy_1',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Gaussian', 'lib2')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Gaussian_2',observed_phenotype_col='predicted_phenotype_Gaussian_2',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Cauchy', 'lib2')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Cauchy_2',observed_phenotype_col='predicted_phenotype_Cauchy_2',unknown_as_nan=True)
dt.to_csv(config['global_epistasis_binding_file'], index=False)
print(f"Writing predictions to {config['global_epistasis_binding_file']}")
dt.head().round(2)
Writing predictions to results/global_epistasis_binding/global_epistasis_binding_predictions.csv
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
library | target | barcode | variant_call_support | avgcount | log10Ka | delta_log10Ka | log10SE | response | baseline | nMSR | variant_class | aa_substitutions | n_aa_substitutions | latent_phenotype_Gaussian_1 | predicted_phenotype_Gaussian_1 | latent_phenotype_Cauchy_1 | predicted_phenotype_Cauchy_1 | latent_phenotype_Gaussian_2 | predicted_phenotype_Gaussian_2 | latent_phenotype_Cauchy_2 | predicted_phenotype_Cauchy_2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | lib1 | SARS-CoV-2 | AAAAAAAAAATTGTAA | 1 | 0.00 | NaN | NaN | NaN | NaN | NaN | NaN | >1 nonsynonymous | Y91L K199Y | 2 | -0.34 | 10.67 | -0.33 | 10.66 | -0.41 | 10.66 | -0.33 | 10.69 |
1 | lib1 | SARS-CoV-2 | AAAAAAAAACTTAAAT | 2 | 4.30 | NaN | NaN | NaN | NaN | NaN | NaN | >1 nonsynonymous | N13S L60P K94N S147T C150Y | 5 | -4.56 | 6.61 | -4.76 | 6.14 | -4.56 | 6.54 | -4.82 | 6.06 |
2 | lib1 | SARS-CoV-2 | AAAAAAAAACTTCAAT | 5 | 74.50 | 8.72 | -2.05 | 0.12 | 1.64 | 1.13 | 0.01 | >1 nonsynonymous | A22C R127G E141D L188V | 4 | -1.54 | 9.06 | -1.38 | 9.43 | -1.57 | 9.07 | -1.23 | 9.67 |
3 | lib1 | SARS-CoV-2 | AAAAAAAACAAGCAGA | 6 | 146.32 | 10.35 | -0.42 | 0.05 | 2.87 | 1.01 | 0.00 | 1 nonsynonymous | N13F | 1 | -0.62 | 10.46 | -0.67 | 10.38 | -0.63 | 10.46 | -0.70 | 10.36 |
4 | lib1 | SARS-CoV-2 | AAAAAAAACAATATAA | 1 | 2.48 | NaN | NaN | NaN | NaN | NaN | NaN | >1 nonsynonymous | C6K T15W K94Y V103W | 4 | -7.48 | 6.61 | -6.45 | 6.12 | -5.29 | 6.53 | -7.16 | 6.03 |
Repeat the fits for all barcodes pooled together. There is slight variation in the average KD,app ascribed to wildtype genotypes in each library. To account for this, express the functional score in each library as the delta_log10Ka relative to the average of wildtype measurements in that library, thereby standardizing the small difference in mean WT between the two replicates and avoiding consequential artefacts in a joint fit.
df_joint = pd.read_csv(config['Titeseq_Kds_file'])
df_joint.rename(columns={'delta_log10Ka':'func_score'},inplace=True)
df_joint['func_score_var'] = df_joint['log10SE']**2
func_scores_joint = df_joint[pd.notnull(df_joint['func_score'])]
func_scores_joint.fillna('',inplace=True)
func_scores_joint.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
library | target | barcode | variant_call_support | avgcount | log10Ka | func_score | log10SE | response | baseline | nMSR | variant_class | aa_substitutions | n_aa_substitutions | func_score_var | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | lib1 | SARS-CoV-2 | AAAAAAAAACTTCAAT | 5 | 74.50 | 8.72 | -2.05 | 0.12 | 1.64 | 1.13 | 0.01 | >1 nonsynonymous | A22C R127G E141D L188V | 4 | 0.0144 |
3 | lib1 | SARS-CoV-2 | AAAAAAAACAAGCAGA | 6 | 146.32 | 10.35 | -0.42 | 0.05 | 2.87 | 1.01 | 0.00 | 1 nonsynonymous | N13F | 1 | 0.0025 |
5 | lib1 | SARS-CoV-2 | AAAAAAAACAGGTTGC | 4 | 47.48 | 6.00 | -4.76 | 2.00 | 1.50 | 1.15 | 0.01 | >1 nonsynonymous | V71K P149L N157T | 3 | 4.0000 |
6 | lib1 | SARS-CoV-2 | AAAAAAAACATTAAAT | 6 | 47.44 | 10.15 | -0.61 | 0.09 | 2.64 | 1.09 | 0.00 | >1 nonsynonymous | A18V T148S H189Y | 3 | 0.0081 |
8 | lib1 | SARS-CoV-2 | AAAAAAAACCTTACAA | 2 | 18.78 | 9.61 | -1.15 | 0.19 | 2.00 | 1.11 | 0.01 | >1 nonsynonymous | T63D A89N | 2 | 0.0361 |
# NBVAL_IGNORE_OUTPUT
models_joint = {} # store models, keyed by `(epistasistype, likelihoodtype)`
for (target), scores in func_scores_joint.groupby(['target']):
bmap = dms_variants.binarymap.BinaryMap(scores)
for epistasistype, likelihoodtype, Model in [
('global epistasis', 'Gaussian', dms_variants.globalepistasis.MonotonicSplineEpistasisGaussianLikelihood),
('no epistasis', 'Gaussian', dms_variants.globalepistasis.NoEpistasisGaussianLikelihood),
('global epistasis', 'Cauchy', dms_variants.globalepistasis.MonotonicSplineEpistasisCauchyLikelihood),
('no epistasis', 'Cauchy', dms_variants.globalepistasis.NoEpistasisCauchyLikelihood),
]:
print(f"Fitting {epistasistype} with {likelihoodtype} likelihood model...", end=' ')
start = time.time()
model = Model(bmap)
model.fit() # do NOT change ftol in normal use, this is just for test
print(f"fitting took {time.time() - start:.1f} sec.")
models_joint[(epistasistype, likelihoodtype)] = model
Fitting global epistasis with Gaussian likelihood model... fitting took 150.1 sec.
Fitting no epistasis with Gaussian likelihood model... fitting took 6.1 sec.
Fitting global epistasis with Cauchy likelihood model... fitting took 125.0 sec.
Fitting no epistasis with Cauchy likelihood model... fitting took 29.6 sec.
# NBVAL_IGNORE_OUTPUT
variants_df_joint = pd.concat(
[model.phenotypes_df
.assign(likelihoodtype=likelihoodtype)
for (epistasistype, likelihoodtype), model in models_joint.items()
if (epistasistype == 'global epistasis')],
ignore_index=True, sort=False)
variants_df_joint.head().round(2)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
aa_substitutions | func_score | func_score_var | latent_phenotype | observed_phenotype | likelihoodtype | |
---|---|---|---|---|---|---|
0 | A22C R127G E141D L188V | -2.05 | 0.01 | -1.60 | -1.75 | Gaussian |
1 | N13F | -0.42 | 0.00 | -0.64 | -0.33 | Gaussian |
2 | V71K P149L N157T | -4.76 | 4.00 | -6.19 | -4.19 | Gaussian |
3 | A18V T148S H189Y | -0.61 | 0.01 | -0.92 | -0.67 | Gaussian |
4 | T63D A89N | -1.15 | 0.04 | -1.31 | -1.27 | Gaussian |
for x, y in itertools.combinations(['latent_phenotype',
'observed_phenotype',
'func_score'],
2):
p = (
ggplot(variants_df_joint, aes(x, y)) +
geom_point(alpha=0.05, size=0.5))
# facet_grid('likelihoodtype', scales='free_y') +
# theme(figure_size=(2 * variants_df_novar['likelihoodtype'].nunique()),
# )
# )
# plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
# print(f"Saving to {plotfile}")
# p.save(plotfile)
_ = p.draw()
We will look more at each of these models (Cauchy and Gaussian likelihoods, observed and latent scale measurements) in the next notebook, in which we evaluate coefficients, process our final phenotype map, and compare our measurements to some validation datasets.
#lib1 models
models.get(('global epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_1.csv',index=False)
models.get(('no epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_1.csv',index=False)
models.get(('no epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_1.csv',index=False)
#lib2 models
models.get(('global epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_2.csv',index=False)
models.get(('no epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_2.csv',index=False)
models.get(('no epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_2.csv',index=False)
#joint models
models_joint.get(('global epistasis', 'Gaussian')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Gaussian')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Cauchy')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Cauchy')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('no epistasis', 'Gaussian')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('no epistasis', 'Cauchy')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_joint.csv',index=False)
#! jupyter nbconvert --to markdown global_epistasis_binding.ipynb --output-dir ./results/summary/ --output global_epistasis_binding.md