Skip to content

Latest commit

 

History

History
1249 lines (1057 loc) · 36.1 KB

global_epistasis_binding.md

File metadata and controls

1249 lines (1057 loc) · 36.1 KB

Fit global epistasis models to DMS data

The dms_variants.globalepistasis module is based off the "global epistasis" concepts of Otwinoski et al and Sailer and Harms -- that there exists an underlying latent phenotype that mutations affect additively, and then an observed (measured) phenotype that is a non-linear function of the latent phenotype.

Setup for analysis

Import Python modules / packages:

import collections
import os
import itertools
import random
import tempfile
import time
import warnings

import pandas as pd

from plotnine import *

import dms_variants.binarymap
import dms_variants.codonvarianttable
import dms_variants.globalepistasis
import dms_variants.plotnine_themes
import dms_variants.simulate
from dms_variants.constants import CBPALETTE, CODONS_NOSTOP

import yaml

Set the gray grid plotnine theme defined in dms_variants.plotnine_themes:

theme_set(dms_variants.plotnine_themes.theme_graygrid())

Versions of key software:

print(f"Using dms_variants version {dms_variants.__version__}")
Using dms_variants version 0.6.0

Set pandas display options to show large chunks of Data Frames in this example:

pd.set_option('display.max_columns', 30)
pd.set_option('display.width', 500)

Hide warnings that clutter output:

warnings.simplefilter('ignore')

Parameters for notebook

Read the configuration file:

with open('config.yaml') as f:
    config = yaml.safe_load(f)

Make output directory if needed:

os.makedirs(config['global_epistasis_binding_dir'], exist_ok=True)
os.makedirs(config['figs_dir'], exist_ok=True)

Read in Tite-seq affinity scores by barcode

Read in log10Ka measurements. Rename log10Ka column to be func_score, and calculate variance from the ddG_log10Ka column. Remove rows with NaN for func_score. For empty aa_substitutions (wildtype), it is being replaced with NA. Make back to an empty string.

df = pd.read_csv(config['Titeseq_Kds_file'])
df.rename(columns={'log10Ka':'func_score'},inplace=True)
df['func_score_var'] = df['log10SE']**2
func_scores = df[pd.notnull(df['func_score'])]
func_scores.fillna('',inplace=True)
func_scores.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
library target barcode variant_call_support avgcount func_score delta_log10Ka log10SE response baseline nMSR variant_class aa_substitutions n_aa_substitutions func_score_var
2 lib1 SARS-CoV-2 AAAAAAAAACTTCAAT 5 74.50 8.72 -2.05 0.12 1.64 1.13 0.01 >1 nonsynonymous A22C R127G E141D L188V 4 0.0144
3 lib1 SARS-CoV-2 AAAAAAAACAAGCAGA 6 146.32 10.35 -0.42 0.05 2.87 1.01 0.00 1 nonsynonymous N13F 1 0.0025
5 lib1 SARS-CoV-2 AAAAAAAACAGGTTGC 4 47.48 6.00 -4.76 2.00 1.50 1.15 0.01 >1 nonsynonymous V71K P149L N157T 3 4.0000
6 lib1 SARS-CoV-2 AAAAAAAACATTAAAT 6 47.44 10.15 -0.61 0.09 2.64 1.09 0.00 >1 nonsynonymous A18V T148S H189Y 3 0.0081
8 lib1 SARS-CoV-2 AAAAAAAACCTTACAA 2 18.78 9.61 -1.15 0.19 2.00 1.11 0.01 >1 nonsynonymous T63D A89N 2 0.0361

Fit global epistasis models

We now fit global epistasis models to the functional scores. For background on these models, see Otwinoski et al (2018). The models we fits are implemented in dms_variants.globalepistasis, which has extensive documentation.

The primary model of interest is MonotonicSplineEpistasis, which assumes that the observed phenotype is a monotonic non-linear function of an underlying additive latent phenotype. As a control, we also fit a NoEpistasis model, which assumes that mutations simply contribute additively to the observed phenotype:

For the fitting, we first convert the set of functional scores into a binary representation using a BinaryMap. Then we create the model, fit it, and store it.

# NBVAL_IGNORE_OUTPUT

models = {}  # store models, keyed by `(epistasistype, likelihoodtype, lib)`

for (lib), scores in func_scores.groupby(['library']):
   
    bmap = dms_variants.binarymap.BinaryMap(scores)
    
    for epistasistype, likelihoodtype, Model in [
            ('global epistasis', 'Gaussian', dms_variants.globalepistasis.MonotonicSplineEpistasisGaussianLikelihood),
            ('no epistasis', 'Gaussian', dms_variants.globalepistasis.NoEpistasisGaussianLikelihood),
            ('global epistasis', 'Cauchy', dms_variants.globalepistasis.MonotonicSplineEpistasisCauchyLikelihood),
            ('no epistasis', 'Cauchy', dms_variants.globalepistasis.NoEpistasisCauchyLikelihood),
            ]:
        print(f"Fitting {epistasistype} with {likelihoodtype} likelihood model to {lib}...", end=' ')
    
        start = time.time()
        model = Model(bmap)
        model.fit()  # do NOT change ftol in normal use, this is just for test
        print(f"fitting took {time.time() - start:.1f} sec.")
        models[(epistasistype, likelihoodtype, lib)] = model
Fitting global epistasis with Gaussian likelihood model to lib1... fitting took 87.7 sec.
Fitting no epistasis with Gaussian likelihood model to lib1... fitting took 3.2 sec.
Fitting global epistasis with Cauchy likelihood model to lib1... fitting took 218.3 sec.
Fitting no epistasis with Cauchy likelihood model to lib1... fitting took 16.3 sec.
Fitting global epistasis with Gaussian likelihood model to lib2... fitting took 83.2 sec.
Fitting no epistasis with Gaussian likelihood model to lib2... fitting took 3.3 sec.
Fitting global epistasis with Cauchy likelihood model to lib2... fitting took 240.6 sec.
Fitting no epistasis with Cauchy likelihood model to lib2... fitting took 20.5 sec.

Now we want to see which model fits the data better. To do this, we get the log likelihood of each model along with the number of model parameters and use it to calculate the AIC. Models with lower AIC are better, and below we see that the MonotonicSplineEpistasis global epistasis model always fits the data much better. (Note, I previous tried alterning number of meshpoints, and the default 4 was the lowest AIC):

# NBVAL_IGNORE_OUTPUT

logliks_df = (
    pd.DataFrame.from_records(
            [(epistasistype, likelihoodtype, lib, model.nparams, model.loglik) for
             (epistasistype, likelihoodtype, lib), model in models.items()],
            columns=['model', 'likelihood type', 'library',
                     'n_parameters', 'log_likelihood']
            )
    .assign(AIC=lambda x: 2 * x['n_parameters'] - 2 * x['log_likelihood'])
    .set_index(['library'])
    )

logliks_df.round(1)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
model likelihood type n_parameters log_likelihood AIC
library
lib1 global epistasis Gaussian 3801 -53065.8 113733.5
lib1 no epistasis Gaussian 3795 -80441.6 168473.3
lib1 global epistasis Cauchy 3801 -39036.0 85674.0
lib1 no epistasis Cauchy 3795 -60021.4 127632.9
lib2 global epistasis Gaussian 3798 -49768.0 107132.0
lib2 no epistasis Gaussian 3792 -73422.4 154428.8
lib2 global epistasis Cauchy 3798 -32464.9 72525.8
lib2 no epistasis Cauchy 3792 -54067.1 115718.3
# check to confirm the global epistasis model is better in all cases
assert (logliks_df
        .reset_index()
        .pivot_table(index=['library'],
                     values='AIC',
                     columns='model'
                     )
        .assign(global_better=lambda x: x['global epistasis'] < x['no epistasis'])
        ['global_better']
        .all()
        )

Because the MonotonicSplineEpistasis global epistasis models fit so much better than the additive NoEpistasis models, below we are just going to analyze the results from the global epistasis model.

First, we will examine how the model looks on all the actual variants used to fit the model. We use AbstractEpistasis.phenotypes_df to get a data frame of all the variants used to fit each global epistasis model along with their functional scores and the latent and observed phenotypes predicted by each model.

# NBVAL_IGNORE_OUTPUT

variants_df = pd.concat(
        [model.phenotypes_df
         .assign(library=lib,
                 likelihoodtype=likelihoodtype,
                 )
         for (epistasistype, likelihoodtype, lib), model in models.items()
         if (epistasistype == 'global epistasis')],
        ignore_index=True, sort=False)

#predictionsfile = os.path.join(config['global_epistasis_binding_dir'], 'globalepistasis_binding_predictions.csv')
#variants_df.to_csv(predictionsfile, index=False)
#print(f"Writing predictions to {predictionsfile}")

variants_df.head().round(2)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
aa_substitutions func_score func_score_var latent_phenotype observed_phenotype library likelihoodtype
0 A22C R127G E141D L188V 8.72 0.01 -1.54 9.06 lib1 Gaussian
1 N13F 10.35 0.00 -0.62 10.46 lib1 Gaussian
2 V71K P149L N157T 6.00 4.00 -3.99 6.64 lib1 Gaussian
3 A18V T148S H189Y 10.15 0.01 -0.89 10.14 lib1 Gaussian
4 T63D A89N 9.61 0.04 -1.33 9.43 lib1 Gaussian

Below we plot the relationships among the latent phenotype from the model, the observed phenotype from the model, and the measured functional score for all variants used to fit the model:

for x, y in itertools.combinations(['latent_phenotype',
                                    'observed_phenotype',
                                    'func_score'],
                                   2):
    p = (
        ggplot(variants_df, aes(x, y)) +
        geom_point(alpha=0.05, size=0.5) +
        facet_grid('library ~ likelihoodtype', scales='free_y') +
        theme(figure_size=(2 * variants_df['likelihoodtype'].nunique(),
                           2 * variants_df['library'].nunique()),
              )
        )
#         plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
#         print(f"Saving to {plotfile}")
#         p.save(plotfile)
    _ = p.draw()

png

png

png

The two Cauchy fits looks good -- clearly picking up on the complete censoring of KA,app values >106 to the 106 boundary condition. The Gaussian likelihood model fails to find this boundary in the lib2 data, and in the lib1 data it seems to fit a boundary closer to 7 than to 6. I am curious to see how these curves change when I remove the SE/variance estimates, which get wonky in these values at this boundary (and are of uncertain value even for points within the dynamic range).

The Cauchy likelihood fits also have fewer points that are observed at this boundary condition in the experimental functional scores but given latent phenotypes near neutral (aka, likely "false positives"). Overall, these models that use the Cauchy liklelihood look better, which is consistent with what I've seen from previous Tite-seq results. We will confirm this later on when we evaluate the correlation in coefficients between replicates from these different models.

To get a better view into the "shape" of global epistasis, let's re-make the plots above but only showing single mutant barcodes.

mask = (variants_df['aa_substitutions'].str.len() < 6) & (variants_df['aa_substitutions'].str.len() > 0)
single_variants_df = variants_df.loc[mask]
single_variants_df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
aa_substitutions func_score func_score_var latent_phenotype observed_phenotype library likelihoodtype
1 N13F 10.35 0.0025 -0.620878 10.456437 lib1 Gaussian
10 P7S 10.67 0.0081 -0.082381 10.731078 lib1 Gaussian
21 P149Q 10.72 0.0025 -0.234197 10.714161 lib1 Gaussian
24 D90Y 10.57 0.0049 -0.622221 10.455074 lib1 Gaussian
26 L5C 10.69 0.0036 -0.257591 10.706989 lib1 Gaussian
for x, y in itertools.combinations(['latent_phenotype',
                                    'observed_phenotype',
                                    'func_score'],
                                   2):
    p = (
        ggplot(single_variants_df, aes(x, y)) +
        geom_point(alpha=0.05, size=0.5) +
        facet_grid('library ~ likelihoodtype', scales='free_y') +
        theme(figure_size=(2 * variants_df['likelihoodtype'].nunique(),
                           2 * variants_df['library'].nunique()),
              )
        )
#         plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
#         print(f"Saving to {plotfile}")
#         p.save(plotfile)
    _ = p.draw()

png

png

png

#add predicted and latent phenotypes to table by barcode with additional columns used for interpretation, save output file
dt = pd.read_csv(config['Titeseq_Kds_file'])
dt[['aa_substitutions']] = dt[['aa_substitutions']].fillna(value='')

dt = models.get(('global epistasis', 'Gaussian', 'lib1')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Gaussian_1',observed_phenotype_col='predicted_phenotype_Gaussian_1',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Cauchy', 'lib1')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Cauchy_1',observed_phenotype_col='predicted_phenotype_Cauchy_1',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Gaussian', 'lib2')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Gaussian_2',observed_phenotype_col='predicted_phenotype_Gaussian_2',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Cauchy', 'lib2')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Cauchy_2',observed_phenotype_col='predicted_phenotype_Cauchy_2',unknown_as_nan=True)

dt.to_csv(config['global_epistasis_binding_file'], index=False)
print(f"Writing predictions to {config['global_epistasis_binding_file']}")

dt.head().round(2)
Writing predictions to results/global_epistasis_binding/global_epistasis_binding_predictions.csv
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
library target barcode variant_call_support avgcount log10Ka delta_log10Ka log10SE response baseline nMSR variant_class aa_substitutions n_aa_substitutions latent_phenotype_Gaussian_1 predicted_phenotype_Gaussian_1 latent_phenotype_Cauchy_1 predicted_phenotype_Cauchy_1 latent_phenotype_Gaussian_2 predicted_phenotype_Gaussian_2 latent_phenotype_Cauchy_2 predicted_phenotype_Cauchy_2
0 lib1 SARS-CoV-2 AAAAAAAAAATTGTAA 1 0.00 NaN NaN NaN NaN NaN NaN >1 nonsynonymous Y91L K199Y 2 -0.34 10.67 -0.33 10.66 -0.41 10.66 -0.33 10.69
1 lib1 SARS-CoV-2 AAAAAAAAACTTAAAT 2 4.30 NaN NaN NaN NaN NaN NaN >1 nonsynonymous N13S L60P K94N S147T C150Y 5 -4.56 6.61 -4.76 6.14 -4.56 6.54 -4.82 6.06
2 lib1 SARS-CoV-2 AAAAAAAAACTTCAAT 5 74.50 8.72 -2.05 0.12 1.64 1.13 0.01 >1 nonsynonymous A22C R127G E141D L188V 4 -1.54 9.06 -1.38 9.43 -1.57 9.07 -1.23 9.67
3 lib1 SARS-CoV-2 AAAAAAAACAAGCAGA 6 146.32 10.35 -0.42 0.05 2.87 1.01 0.00 1 nonsynonymous N13F 1 -0.62 10.46 -0.67 10.38 -0.63 10.46 -0.70 10.36
4 lib1 SARS-CoV-2 AAAAAAAACAATATAA 1 2.48 NaN NaN NaN NaN NaN NaN >1 nonsynonymous C6K T15W K94Y V103W 4 -7.48 6.61 -6.45 6.12 -5.29 6.53 -7.16 6.03

Repeat fits for pooled library measurements

Repeat the fits for all barcodes pooled together. There is slight variation in the average KD,app ascribed to wildtype genotypes in each library. To account for this, express the functional score in each library as the delta_log10Ka relative to the average of wildtype measurements in that library, thereby standardizing the small difference in mean WT between the two replicates and avoiding consequential artefacts in a joint fit.

df_joint = pd.read_csv(config['Titeseq_Kds_file'])
df_joint.rename(columns={'delta_log10Ka':'func_score'},inplace=True)
df_joint['func_score_var'] = df_joint['log10SE']**2
func_scores_joint = df_joint[pd.notnull(df_joint['func_score'])]
func_scores_joint.fillna('',inplace=True)
func_scores_joint.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
library target barcode variant_call_support avgcount log10Ka func_score log10SE response baseline nMSR variant_class aa_substitutions n_aa_substitutions func_score_var
2 lib1 SARS-CoV-2 AAAAAAAAACTTCAAT 5 74.50 8.72 -2.05 0.12 1.64 1.13 0.01 >1 nonsynonymous A22C R127G E141D L188V 4 0.0144
3 lib1 SARS-CoV-2 AAAAAAAACAAGCAGA 6 146.32 10.35 -0.42 0.05 2.87 1.01 0.00 1 nonsynonymous N13F 1 0.0025
5 lib1 SARS-CoV-2 AAAAAAAACAGGTTGC 4 47.48 6.00 -4.76 2.00 1.50 1.15 0.01 >1 nonsynonymous V71K P149L N157T 3 4.0000
6 lib1 SARS-CoV-2 AAAAAAAACATTAAAT 6 47.44 10.15 -0.61 0.09 2.64 1.09 0.00 >1 nonsynonymous A18V T148S H189Y 3 0.0081
8 lib1 SARS-CoV-2 AAAAAAAACCTTACAA 2 18.78 9.61 -1.15 0.19 2.00 1.11 0.01 >1 nonsynonymous T63D A89N 2 0.0361
# NBVAL_IGNORE_OUTPUT

models_joint = {}  # store models, keyed by `(epistasistype, likelihoodtype)`

for (target), scores in func_scores_joint.groupby(['target']):
   
    bmap = dms_variants.binarymap.BinaryMap(scores)
    
    for epistasistype, likelihoodtype, Model in [
            ('global epistasis', 'Gaussian', dms_variants.globalepistasis.MonotonicSplineEpistasisGaussianLikelihood),
            ('no epistasis', 'Gaussian', dms_variants.globalepistasis.NoEpistasisGaussianLikelihood),
            ('global epistasis', 'Cauchy', dms_variants.globalepistasis.MonotonicSplineEpistasisCauchyLikelihood),
            ('no epistasis', 'Cauchy', dms_variants.globalepistasis.NoEpistasisCauchyLikelihood),
            ]:
        print(f"Fitting {epistasistype} with {likelihoodtype} likelihood model...", end=' ')
    
        start = time.time()
        model = Model(bmap)
        model.fit()  # do NOT change ftol in normal use, this is just for test
        print(f"fitting took {time.time() - start:.1f} sec.")
        models_joint[(epistasistype, likelihoodtype)] = model
Fitting global epistasis with Gaussian likelihood model... fitting took 150.1 sec.
Fitting no epistasis with Gaussian likelihood model... fitting took 6.1 sec.
Fitting global epistasis with Cauchy likelihood model... fitting took 125.0 sec.
Fitting no epistasis with Cauchy likelihood model... fitting took 29.6 sec.
# NBVAL_IGNORE_OUTPUT

variants_df_joint = pd.concat(
        [model.phenotypes_df
         .assign(likelihoodtype=likelihoodtype)
         for (epistasistype, likelihoodtype), model in models_joint.items()
         if (epistasistype == 'global epistasis')],
        ignore_index=True, sort=False)

variants_df_joint.head().round(2)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
aa_substitutions func_score func_score_var latent_phenotype observed_phenotype likelihoodtype
0 A22C R127G E141D L188V -2.05 0.01 -1.60 -1.75 Gaussian
1 N13F -0.42 0.00 -0.64 -0.33 Gaussian
2 V71K P149L N157T -4.76 4.00 -6.19 -4.19 Gaussian
3 A18V T148S H189Y -0.61 0.01 -0.92 -0.67 Gaussian
4 T63D A89N -1.15 0.04 -1.31 -1.27 Gaussian
for x, y in itertools.combinations(['latent_phenotype',
                                    'observed_phenotype',
                                    'func_score'],
                                   2):
    p = (
        ggplot(variants_df_joint, aes(x, y)) +
        geom_point(alpha=0.05, size=0.5))
#        facet_grid('likelihoodtype', scales='free_y') +
#        theme(figure_size=(2 * variants_df_novar['likelihoodtype'].nunique()),
#              )
#        )
#         plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
#         print(f"Saving to {plotfile}")
#         p.save(plotfile)
    _ = p.draw()

png

png

png

We will look more at each of these models (Cauchy and Gaussian likelihoods, observed and latent scale measurements) in the next notebook, in which we evaluate coefficients, process our final phenotype map, and compare our measurements to some validation datasets.

Output epistasis model parameters

#lib1 models
models.get(('global epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_1.csv',index=False)
models.get(('no epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_1.csv',index=False)
models.get(('no epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_1.csv',index=False)
#lib2 models
models.get(('global epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_2.csv',index=False)
models.get(('no epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_2.csv',index=False)
models.get(('no epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_2.csv',index=False)
#joint models
models_joint.get(('global epistasis', 'Gaussian')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Gaussian')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Cauchy')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Cauchy')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('no epistasis', 'Gaussian')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('no epistasis', 'Cauchy')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_joint.csv',index=False)

#! jupyter nbconvert --to markdown global_epistasis_binding.ipynb --output-dir ./results/summary/ --output global_epistasis_binding.md