Fit global epistasis models to DMS data

The dms_variants.globalepistasis module is based off the "global epistasis" concepts of Otwinoski et al and Sailer and Harms -- that there exists an underlying latent phenotype that mutations affect additively, and then an observed (measured) phenotype that is a non-linear function of the latent phenotype.

Setup for analysis

Import Python modules / packages:

import collections
import os
import itertools
import random
import tempfile
import time
import warnings

import pandas as pd

from plotnine import *

import dms_variants.binarymap
import dms_variants.codonvarianttable
import dms_variants.globalepistasis
import dms_variants.plotnine_themes
import dms_variants.simulate
from dms_variants.constants import CBPALETTE, CODONS_NOSTOP

import yaml

Set the gray grid plotnine theme defined in dms_variants.plotnine_themes:

theme_set(dms_variants.plotnine_themes.theme_graygrid())

Versions of key software:

print(f"Using dms_variants version {dms_variants.__version__}")

Using dms_variants version 0.6.0

Set pandas display options to show large chunks of Data Frames in this example:

pd.set_option('display.max_columns', 30)
pd.set_option('display.width', 500)

Hide warnings that clutter output:

warnings.simplefilter('ignore')

Parameters for notebook

Read the configuration file:

with open('config.yaml') as f:
    config = yaml.safe_load(f)

Make output directory if needed:

os.makedirs(config['global_epistasis_binding_dir'], exist_ok=True)
os.makedirs(config['figs_dir'], exist_ok=True)

Read in Tite-seq affinity scores by barcode

Read in log10Ka measurements. Rename log10Ka column to be func_score, and calculate variance from the ddG_log10Ka column. Remove rows with NaN for func_score. For empty aa_substitutions (wildtype), it is being replaced with NA. Make back to an empty string.

df = pd.read_csv(config['Titeseq_Kds_file'])
df.rename(columns={'log10Ka':'func_score'},inplace=True)
df['func_score_var'] = df['log10SE']**2
func_scores = df[pd.notnull(df['func_score'])]
func_scores.fillna('',inplace=True)
func_scores.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	library	target	barcode	variant_call_support	avgcount	func_score	delta_log10Ka	log10SE	response	baseline	nMSR	variant_class	aa_substitutions	n_aa_substitutions	func_score_var
2	lib1	SARS-CoV-2	AAAAAAAAACTTCAAT	5	74.50	8.72	-2.05	0.12	1.64	1.13	0.01	>1 nonsynonymous	A22C R127G E141D L188V	4	0.0144
3	lib1	SARS-CoV-2	AAAAAAAACAAGCAGA	6	146.32	10.35	-0.42	0.05	2.87	1.01	0.00	1 nonsynonymous	N13F	1	0.0025
5	lib1	SARS-CoV-2	AAAAAAAACAGGTTGC	4	47.48	6.00	-4.76	2.00	1.50	1.15	0.01	>1 nonsynonymous	V71K P149L N157T	3	4.0000
6	lib1	SARS-CoV-2	AAAAAAAACATTAAAT	6	47.44	10.15	-0.61	0.09	2.64	1.09	0.00	>1 nonsynonymous	A18V T148S H189Y	3	0.0081
8	lib1	SARS-CoV-2	AAAAAAAACCTTACAA	2	18.78	9.61	-1.15	0.19	2.00	1.11	0.01	>1 nonsynonymous	T63D A89N	2	0.0361

Fit global epistasis models

We now fit global epistasis models to the functional scores. For background on these models, see Otwinoski et al (2018). The models we fits are implemented in dms_variants.globalepistasis, which has extensive documentation.

The primary model of interest is MonotonicSplineEpistasis, which assumes that the observed phenotype is a monotonic non-linear function of an underlying additive latent phenotype. As a control, we also fit a NoEpistasis model, which assumes that mutations simply contribute additively to the observed phenotype:

For the fitting, we first convert the set of functional scores into a binary representation using a BinaryMap. Then we create the model, fit it, and store it.

# NBVAL_IGNORE_OUTPUT

models = {}  # store models, keyed by `(epistasistype, likelihoodtype, lib)`

for (lib), scores in func_scores.groupby(['library']):
   
    bmap = dms_variants.binarymap.BinaryMap(scores)
    
    for epistasistype, likelihoodtype, Model in [
            ('global epistasis', 'Gaussian', dms_variants.globalepistasis.MonotonicSplineEpistasisGaussianLikelihood),
            ('no epistasis', 'Gaussian', dms_variants.globalepistasis.NoEpistasisGaussianLikelihood),
            ('global epistasis', 'Cauchy', dms_variants.globalepistasis.MonotonicSplineEpistasisCauchyLikelihood),
            ('no epistasis', 'Cauchy', dms_variants.globalepistasis.NoEpistasisCauchyLikelihood),
            ]:
        print(f"Fitting {epistasistype} with {likelihoodtype} likelihood model to {lib}...", end=' ')
    
        start = time.time()
        model = Model(bmap)
        model.fit()  # do NOT change ftol in normal use, this is just for test
        print(f"fitting took {time.time() - start:.1f} sec.")
        models[(epistasistype, likelihoodtype, lib)] = model

Fitting global epistasis with Gaussian likelihood model to lib1... fitting took 87.7 sec.
Fitting no epistasis with Gaussian likelihood model to lib1... fitting took 3.2 sec.
Fitting global epistasis with Cauchy likelihood model to lib1... fitting took 218.3 sec.
Fitting no epistasis with Cauchy likelihood model to lib1... fitting took 16.3 sec.
Fitting global epistasis with Gaussian likelihood model to lib2... fitting took 83.2 sec.
Fitting no epistasis with Gaussian likelihood model to lib2... fitting took 3.3 sec.
Fitting global epistasis with Cauchy likelihood model to lib2... fitting took 240.6 sec.
Fitting no epistasis with Cauchy likelihood model to lib2... fitting took 20.5 sec.

Now we want to see which model fits the data better. To do this, we get the log likelihood of each model along with the number of model parameters and use it to calculate the AIC. Models with lower AIC are better, and below we see that the MonotonicSplineEpistasis global epistasis model always fits the data much better. (Note, I previous tried alterning number of meshpoints, and the default 4 was the lowest AIC):

# NBVAL_IGNORE_OUTPUT

logliks_df = (
    pd.DataFrame.from_records(
            [(epistasistype, likelihoodtype, lib, model.nparams, model.loglik) for
             (epistasistype, likelihoodtype, lib), model in models.items()],
            columns=['model', 'likelihood type', 'library',
                     'n_parameters', 'log_likelihood']
            )
    .assign(AIC=lambda x: 2 * x['n_parameters'] - 2 * x['log_likelihood'])
    .set_index(['library'])
    )

logliks_df.round(1)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	model	likelihood type	n_parameters	log_likelihood	AIC
library
lib1	global epistasis	Gaussian	3801	-53065.8	113733.5
lib1	no epistasis	Gaussian	3795	-80441.6	168473.3
lib1	global epistasis	Cauchy	3801	-39036.0	85674.0
lib1	no epistasis	Cauchy	3795	-60021.4	127632.9
lib2	global epistasis	Gaussian	3798	-49768.0	107132.0
lib2	no epistasis	Gaussian	3792	-73422.4	154428.8
lib2	global epistasis	Cauchy	3798	-32464.9	72525.8
lib2	no epistasis	Cauchy	3792	-54067.1	115718.3

# check to confirm the global epistasis model is better in all cases
assert (logliks_df
        .reset_index()
        .pivot_table(index=['library'],
                     values='AIC',
                     columns='model'
                     )
        .assign(global_better=lambda x: x['global epistasis'] < x['no epistasis'])
        ['global_better']
        .all()
        )

Because the MonotonicSplineEpistasis global epistasis models fit so much better than the additive NoEpistasis models, below we are just going to analyze the results from the global epistasis model.

First, we will examine how the model looks on all the actual variants used to fit the model. We use AbstractEpistasis.phenotypes_df to get a data frame of all the variants used to fit each global epistasis model along with their functional scores and the latent and observed phenotypes predicted by each model.

# NBVAL_IGNORE_OUTPUT

variants_df = pd.concat(
        [model.phenotypes_df
         .assign(library=lib,
                 likelihoodtype=likelihoodtype,
                 )
         for (epistasistype, likelihoodtype, lib), model in models.items()
         if (epistasistype == 'global epistasis')],
        ignore_index=True, sort=False)

#predictionsfile = os.path.join(config['global_epistasis_binding_dir'], 'globalepistasis_binding_predictions.csv')
#variants_df.to_csv(predictionsfile, index=False)
#print(f"Writing predictions to {predictionsfile}")

variants_df.head().round(2)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	aa_substitutions	func_score	func_score_var	latent_phenotype	observed_phenotype	library	likelihoodtype
0	A22C R127G E141D L188V	8.72	0.01	-1.54	9.06	lib1	Gaussian
1	N13F	10.35	0.00	-0.62	10.46	lib1	Gaussian
2	V71K P149L N157T	6.00	4.00	-3.99	6.64	lib1	Gaussian
3	A18V T148S H189Y	10.15	0.01	-0.89	10.14	lib1	Gaussian
4	T63D A89N	9.61	0.04	-1.33	9.43	lib1	Gaussian

Below we plot the relationships among the latent phenotype from the model, the observed phenotype from the model, and the measured functional score for all variants used to fit the model:

for x, y in itertools.combinations(['latent_phenotype',
                                    'observed_phenotype',
                                    'func_score'],
                                   2):
    p = (
        ggplot(variants_df, aes(x, y)) +
        geom_point(alpha=0.05, size=0.5) +
        facet_grid('library ~ likelihoodtype', scales='free_y') +
        theme(figure_size=(2 * variants_df['likelihoodtype'].nunique(),
                           2 * variants_df['library'].nunique()),
              )
        )
#         plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
#         print(f"Saving to {plotfile}")
#         p.save(plotfile)
    _ = p.draw()

The two Cauchy fits looks good -- clearly picking up on the complete censoring of K_A,app values >10⁶ to the 10⁶ boundary condition. The Gaussian likelihood model fails to find this boundary in the lib2 data, and in the lib1 data it seems to fit a boundary closer to 7 than to 6. I am curious to see how these curves change when I remove the SE/variance estimates, which get wonky in these values at this boundary (and are of uncertain value even for points within the dynamic range).

The Cauchy likelihood fits also have fewer points that are observed at this boundary condition in the experimental functional scores but given latent phenotypes near neutral (aka, likely "false positives"). Overall, these models that use the Cauchy liklelihood look better, which is consistent with what I've seen from previous Tite-seq results. We will confirm this later on when we evaluate the correlation in coefficients between replicates from these different models.

To get a better view into the "shape" of global epistasis, let's re-make the plots above but only showing single mutant barcodes.

mask = (variants_df['aa_substitutions'].str.len() < 6) & (variants_df['aa_substitutions'].str.len() > 0)
single_variants_df = variants_df.loc[mask]
single_variants_df.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	aa_substitutions	func_score	func_score_var	latent_phenotype	observed_phenotype	library	likelihoodtype
1	N13F	10.35	0.0025	-0.620878	10.456437	lib1	Gaussian
10	P7S	10.67	0.0081	-0.082381	10.731078	lib1	Gaussian
21	P149Q	10.72	0.0025	-0.234197	10.714161	lib1	Gaussian
24	D90Y	10.57	0.0049	-0.622221	10.455074	lib1	Gaussian
26	L5C	10.69	0.0036	-0.257591	10.706989	lib1	Gaussian

for x, y in itertools.combinations(['latent_phenotype',
                                    'observed_phenotype',
                                    'func_score'],
                                   2):
    p = (
        ggplot(single_variants_df, aes(x, y)) +
        geom_point(alpha=0.05, size=0.5) +
        facet_grid('library ~ likelihoodtype', scales='free_y') +
        theme(figure_size=(2 * variants_df['likelihoodtype'].nunique(),
                           2 * variants_df['library'].nunique()),
              )
        )
#         plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
#         print(f"Saving to {plotfile}")
#         p.save(plotfile)
    _ = p.draw()

#add predicted and latent phenotypes to table by barcode with additional columns used for interpretation, save output file
dt = pd.read_csv(config['Titeseq_Kds_file'])
dt[['aa_substitutions']] = dt[['aa_substitutions']].fillna(value='')

dt = models.get(('global epistasis', 'Gaussian', 'lib1')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Gaussian_1',observed_phenotype_col='predicted_phenotype_Gaussian_1',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Cauchy', 'lib1')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Cauchy_1',observed_phenotype_col='predicted_phenotype_Cauchy_1',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Gaussian', 'lib2')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Gaussian_2',observed_phenotype_col='predicted_phenotype_Gaussian_2',unknown_as_nan=True)
dt = models.get(('global epistasis', 'Cauchy', 'lib2')).add_phenotypes_to_df(df=dt, substitutions_col='aa_substitutions',latent_phenotype_col='latent_phenotype_Cauchy_2',observed_phenotype_col='predicted_phenotype_Cauchy_2',unknown_as_nan=True)

dt.to_csv(config['global_epistasis_binding_file'], index=False)
print(f"Writing predictions to {config['global_epistasis_binding_file']}")

dt.head().round(2)

Writing predictions to results/global_epistasis_binding/global_epistasis_binding_predictions.csv

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	library	target	barcode	variant_call_support	avgcount	log10Ka	delta_log10Ka	log10SE	response	baseline	nMSR	variant_class	aa_substitutions	n_aa_substitutions	latent_phenotype_Gaussian_1	predicted_phenotype_Gaussian_1	latent_phenotype_Cauchy_1	predicted_phenotype_Cauchy_1	latent_phenotype_Gaussian_2	predicted_phenotype_Gaussian_2	latent_phenotype_Cauchy_2	predicted_phenotype_Cauchy_2
0	lib1	SARS-CoV-2	AAAAAAAAAATTGTAA	1	0.00	NaN	NaN	NaN	NaN	NaN	NaN	>1 nonsynonymous	Y91L K199Y	2	-0.34	10.67	-0.33	10.66	-0.41	10.66	-0.33	10.69
1	lib1	SARS-CoV-2	AAAAAAAAACTTAAAT	2	4.30	NaN	NaN	NaN	NaN	NaN	NaN	>1 nonsynonymous	N13S L60P K94N S147T C150Y	5	-4.56	6.61	-4.76	6.14	-4.56	6.54	-4.82	6.06
2	lib1	SARS-CoV-2	AAAAAAAAACTTCAAT	5	74.50	8.72	-2.05	0.12	1.64	1.13	0.01	>1 nonsynonymous	A22C R127G E141D L188V	4	-1.54	9.06	-1.38	9.43	-1.57	9.07	-1.23	9.67
3	lib1	SARS-CoV-2	AAAAAAAACAAGCAGA	6	146.32	10.35	-0.42	0.05	2.87	1.01	0.00	1 nonsynonymous	N13F	1	-0.62	10.46	-0.67	10.38	-0.63	10.46	-0.70	10.36
4	lib1	SARS-CoV-2	AAAAAAAACAATATAA	1	2.48	NaN	NaN	NaN	NaN	NaN	NaN	>1 nonsynonymous	C6K T15W K94Y V103W	4	-7.48	6.61	-6.45	6.12	-5.29	6.53	-7.16	6.03

Repeat fits for pooled library measurements

Repeat the fits for all barcodes pooled together. There is slight variation in the average K_D,app ascribed to wildtype genotypes in each library. To account for this, express the functional score in each library as the delta_log10Ka relative to the average of wildtype measurements in that library, thereby standardizing the small difference in mean WT between the two replicates and avoiding consequential artefacts in a joint fit.

df_joint = pd.read_csv(config['Titeseq_Kds_file'])
df_joint.rename(columns={'delta_log10Ka':'func_score'},inplace=True)
df_joint['func_score_var'] = df_joint['log10SE']**2
func_scores_joint = df_joint[pd.notnull(df_joint['func_score'])]
func_scores_joint.fillna('',inplace=True)
func_scores_joint.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	library	target	barcode	variant_call_support	avgcount	log10Ka	func_score	log10SE	response	baseline	nMSR	variant_class	aa_substitutions	n_aa_substitutions	func_score_var
2	lib1	SARS-CoV-2	AAAAAAAAACTTCAAT	5	74.50	8.72	-2.05	0.12	1.64	1.13	0.01	>1 nonsynonymous	A22C R127G E141D L188V	4	0.0144
3	lib1	SARS-CoV-2	AAAAAAAACAAGCAGA	6	146.32	10.35	-0.42	0.05	2.87	1.01	0.00	1 nonsynonymous	N13F	1	0.0025
5	lib1	SARS-CoV-2	AAAAAAAACAGGTTGC	4	47.48	6.00	-4.76	2.00	1.50	1.15	0.01	>1 nonsynonymous	V71K P149L N157T	3	4.0000
6	lib1	SARS-CoV-2	AAAAAAAACATTAAAT	6	47.44	10.15	-0.61	0.09	2.64	1.09	0.00	>1 nonsynonymous	A18V T148S H189Y	3	0.0081
8	lib1	SARS-CoV-2	AAAAAAAACCTTACAA	2	18.78	9.61	-1.15	0.19	2.00	1.11	0.01	>1 nonsynonymous	T63D A89N	2	0.0361

# NBVAL_IGNORE_OUTPUT

models_joint = {}  # store models, keyed by `(epistasistype, likelihoodtype)`

for (target), scores in func_scores_joint.groupby(['target']):
   
    bmap = dms_variants.binarymap.BinaryMap(scores)
    
    for epistasistype, likelihoodtype, Model in [
            ('global epistasis', 'Gaussian', dms_variants.globalepistasis.MonotonicSplineEpistasisGaussianLikelihood),
            ('no epistasis', 'Gaussian', dms_variants.globalepistasis.NoEpistasisGaussianLikelihood),
            ('global epistasis', 'Cauchy', dms_variants.globalepistasis.MonotonicSplineEpistasisCauchyLikelihood),
            ('no epistasis', 'Cauchy', dms_variants.globalepistasis.NoEpistasisCauchyLikelihood),
            ]:
        print(f"Fitting {epistasistype} with {likelihoodtype} likelihood model...", end=' ')
    
        start = time.time()
        model = Model(bmap)
        model.fit()  # do NOT change ftol in normal use, this is just for test
        print(f"fitting took {time.time() - start:.1f} sec.")
        models_joint[(epistasistype, likelihoodtype)] = model

Fitting global epistasis with Gaussian likelihood model... fitting took 150.1 sec.
Fitting no epistasis with Gaussian likelihood model... fitting took 6.1 sec.
Fitting global epistasis with Cauchy likelihood model... fitting took 125.0 sec.
Fitting no epistasis with Cauchy likelihood model... fitting took 29.6 sec.

# NBVAL_IGNORE_OUTPUT

variants_df_joint = pd.concat(
        [model.phenotypes_df
         .assign(likelihoodtype=likelihoodtype)
         for (epistasistype, likelihoodtype), model in models_joint.items()
         if (epistasistype == 'global epistasis')],
        ignore_index=True, sort=False)

variants_df_joint.head().round(2)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	aa_substitutions	func_score	func_score_var	latent_phenotype	observed_phenotype	likelihoodtype
0	A22C R127G E141D L188V	-2.05	0.01	-1.60	-1.75	Gaussian
1	N13F	-0.42	0.00	-0.64	-0.33	Gaussian
2	V71K P149L N157T	-4.76	4.00	-6.19	-4.19	Gaussian
3	A18V T148S H189Y	-0.61	0.01	-0.92	-0.67	Gaussian
4	T63D A89N	-1.15	0.04	-1.31	-1.27	Gaussian

for x, y in itertools.combinations(['latent_phenotype',
                                    'observed_phenotype',
                                    'func_score'],
                                   2):
    p = (
        ggplot(variants_df_joint, aes(x, y)) +
        geom_point(alpha=0.05, size=0.5))
#        facet_grid('likelihoodtype', scales='free_y') +
#        theme(figure_size=(2 * variants_df_novar['likelihoodtype'].nunique()),
#              )
#        )
#         plotfile = os.path.join(config['figs_dir'], f'{y}-v-{x}_by_{epistasistype}.pdf')
#         print(f"Saving to {plotfile}")
#         p.save(plotfile)
    _ = p.draw()

We will look more at each of these models (Cauchy and Gaussian likelihoods, observed and latent scale measurements) in the next notebook, in which we evaluate coefficients, process our final phenotype map, and compare our measurements to some validation datasets.

Output epistasis model parameters

#lib1 models
models.get(('global epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_1.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_1.csv',index=False)
models.get(('no epistasis', 'Gaussian', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_1.csv',index=False)
models.get(('no epistasis', 'Cauchy', 'lib1')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_1.csv',index=False)
#lib2 models
models.get(('global epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_2.csv',index=False)
models.get(('global epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_2.csv',index=False)
models.get(('no epistasis', 'Gaussian', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_2.csv',index=False)
models.get(('no epistasis', 'Cauchy', 'lib2')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_2.csv',index=False)
#joint models
models_joint.get(('global epistasis', 'Gaussian')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-latent-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Gaussian')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Gaussian-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Cauchy')).single_mut_effects(phenotype='latent',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-latent-effects_binding_joint.csv',index=False)
models_joint.get(('global epistasis', 'Cauchy')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/Cauchy-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('no epistasis', 'Gaussian')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Gaussian-predicted-effects_binding_joint.csv',index=False)
models_joint.get(('no epistasis', 'Cauchy')).single_mut_effects(phenotype='observed',standardize_range=False).to_csv('results/global_epistasis_binding/nonepistatic-Cauchy-predicted-effects_binding_joint.csv',index=False)

#! jupyter nbconvert --to markdown global_epistasis_binding.ipynb --output-dir ./results/summary/ --output global_epistasis_binding.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

global_epistasis_binding.md

global_epistasis_binding.md

Fit global epistasis models to DMS data

Setup for analysis

Parameters for notebook

Read in Tite-seq affinity scores by barcode

Fit global epistasis models

Repeat fits for pooled library measurements

Output epistasis model parameters

Files

global_epistasis_binding.md

Latest commit

History

global_epistasis_binding.md

File metadata and controls

Fit global epistasis models to DMS data

Setup for analysis

Parameters for notebook

Read in Tite-seq affinity scores by barcode

Fit global epistasis models

Repeat fits for pooled library measurements

Output epistasis model parameters