How to obtain y and d residuals for plotting #161

PhilipSpechler · 2022-09-27T22:33:31Z

PhilipSpechler
Sep 27, 2022

Hello, I am using DoubleMLPLR to estimate the effect of 8 candidate treatment variables (d1:d8, with use_other_treat_as_covariate=TRUE) on an outcome measure (y) while adjusting for a set of other covariates. After fitting, I found a significant effect of d1, and now I would like to plot the residualized outcome measure (y' ~ covariates) against the residualized d1 measure (d1' ~ covariates). Are the residuals for y' and d1' available somewhere in the fitted object? While I could plot the original y vs. d1, it seems like plotting the residuals might be a more accurate representation of the results from double ML. Thanks!

Answered by SvenKlaassen

Apr 11, 2023


PhilippBach · 2022-09-28T10:18:32Z

PhilippBach
Sep 28, 2022
Maintainer

Hi @PhilipSpechler ,

thanks for your question. I think we do not directly export the residuals of the nuisance parts, but you can compute them on your own. To do this, you can export the predictions from the fitting stage and construct the residuals accordingly. See below an example based on the code example from the section on simultaneous inference in our user guide. To store the nuisance predictions, make sure that you specify the option store_predictions = True when calling .fit().

import doubleml as dml
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LassoCV

np.random.seed(1234)
n_obs = 500
n_vars = 100
X = np.random.normal(size=(n_obs, n_vars))
theta = np.array([3., 3., 3.])
y = np.dot(X[:, :3], theta) + np.random.standard_normal(size=(n_obs,))
dml_data = dml.DoubleMLData.from_arrays(X[:, 10:], y, X[:, :10])

learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

dml_plr = dml.DoubleMLPLR(dml_data, ml_l, ml_m)

dml_plr.fit(store_predictions = True)
print(dml_plr.summary)

The predictions are then stored in a NumPy array with dimensions (number of observations x number of cross-fitting repetitions x number of treatment variables). You can access these arrays to calculate and plot the residuals:

# Predictions for nuisance part 'ml_l' and 'ml_m' stored in an array with dimensions (n_obs x n_rep x n_treat)
print(dml_plr.predictions['ml_l'].shape)
print(dml_plr.predictions['ml_m'].shape)

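# Compute residuals for ml_l = E[Y|X]
# Caution: the slice [:,:,0] has shape (n_obs, n_rep), so with n_rep = 1 this subtraction
# broadcasts to (n_obs, n_obs); flatten the slice first (see the correction in the reply below)
residuals_ml_l_d1 = dml_data.y - dml_plr.predictions['ml_l'][:,:,0]

# Compute residuals for ml_m = E[D_1 | X] (for first treatment variable); the same caution applies
residuals_ml_m_d1 = dml_data.data[dml_data.d_cols[0]].values - dml_plr.predictions['ml_m'][:,:,0]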

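# Generate a scatter plot of the residuals
import matplotlib.pyplot as plt

# Treatment residuals on the x-axis, outcome residuals on the y-axis
plt.scatter(residuals_ml_m_d1, residuals_ml_l_d1)
plt.xlabel("Residuals of d1 (ml_m)")
plt.ylabel("Residuals of y (ml_l)")
plt.show()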

I hope this helps you a bit. @MalteKurz - if you'd like to add anything here, feel free to edit/comment.

Best,

Philipp


SvenKlaassen · 2023-04-11T07:06:57Z

SvenKlaassen
Apr 11, 2023
Maintainer

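Just a small correction: to calculate the correct residuals, the predictions have to be reshaped to a 1-D array. Otherwise, subtracting the (n_obs x n_rep) slice from the 1-D target values broadcasts to an (n_obs x n_obs) array instead of one residual per observation: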

# Predictions for nuisance part 'ml_l' and 'ml_m' stored in an array with dimensions (n_obs x n_rep x n_treat)
print(dml_plr.predictions['ml_l'].shape)
print(dml_plr.predictions['ml_m'].shape)

# Compute residuals for ml_l = E[Y|X]
residuals_ml_l_d1 = dml_data.y - dml_plr.predictions['ml_l'][:,:,0].reshape(-1)

# Compute residuals for ml_m = E[D_1 | X] (for first treatment variable)
residuals_ml_m_d1 = dml_data.data[dml_data.d_cols[0]].values - dml_plr.predictions['ml_m'][:,:,0].reshape(-1)

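# Generate a scatter plot of the residuals
import matplotlib.pyplot as plt

# Treatment residuals on the x-axis, outcome residuals on the y-axis
plt.scatter(residuals_ml_m_d1, residuals_ml_l_d1)
plt.xlabel("Residuals of d1 (ml_m)")
plt.ylabel("Residuals of y (ml_l)")
plt.show()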
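
The effect of the reshape can be seen with plain NumPy. The sketch below does not use DoubleML at all: `y` and `pred` are made-up stand-ins that only mimic the shapes of dml_data.y and dml_plr.predictions['ml_l'], and the sizes are chosen arbitrarily for illustration.

```python
import numpy as np

n_obs, n_rep, n_treat = 5, 1, 3
rng = np.random.default_rng(0)

# Stand-ins with the same shapes as the target and the stored predictions
y = rng.normal(size=n_obs)                       # shape (n_obs,)
pred = rng.normal(size=(n_obs, n_rep, n_treat))  # shape (n_obs, n_rep, n_treat)

# Without reshaping: (n_obs,) - (n_obs, 1) broadcasts to (n_obs, n_obs)
wrong = y - pred[:, :, 0]
print(wrong.shape)

# Flattening the slice first gives one residual per observation
right = y - pred[:, :, 0].reshape(-1)
print(right.shape)

# Selecting the repetition by index also yields a 1-D slice
also_right = y - pred[:, 0, 0]
print(np.allclose(right, also_right))
```

Note that reshape(-1) only matches the length of the target when n_rep is 1; with several repetitions, picking one repetition by index as in the last lines is probably the safer pattern.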

Since version 0.6.0 it is also possible to obtain the target values for the nuisance elements via the nuisance_targets property. In this example, we could calculate all residuals via

# Target values for nuisance part 'ml_l' and 'ml_m' stored in an array with dimensions (n_obs x n_rep x n_treat)
print(dml_plr.nuisance_targets['ml_l'].shape)
print(dml_plr.nuisance_targets['ml_m'].shape)

# Compute residuals for ml_l = E[Y|X] for all treatments
residuals_ml_l = dml_plr.nuisance_targets['ml_l'] - dml_plr.predictions['ml_l']

# Compute residuals for ml_m = E[D | X] (for all treatment variables)
residuals_ml_m = dml_plr.nuisance_targets['ml_m'] - dml_plr.predictions['ml_m']

And the corresponding plot would look something like this:

import pandas as pd
import seaborn as sns

df = pd.melt(pd.DataFrame(residuals_ml_l[:, 0, :]), var_name="Treatment", value_name="Residual ml_l")
df["Residual ml_m"] = pd.melt(pd.DataFrame(residuals_ml_m[:, 0, :]))["value"]

g = sns.FacetGrid(df, col="Treatment", col_wrap=3)
g.map(sns.scatterplot, "Residual ml_m", "Residual ml_l")
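
On the original question of whether the residual plot represents the double ML result: with the partialling-out score of the PLR, the coefficient corresponds to the slope of a least-squares line through the origin in exactly this kind of scatter (outcome residuals against treatment residuals). The following is a minimal NumPy-only sketch of that idea, not DoubleML's implementation. The simulated data, the two-fold split, and the OLS nuisance learners are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars, theta = 2000, 5, 0.5

# Simulated partially linear model (illustrative assumption)
X = rng.normal(size=(n_obs, n_vars))
d = X @ np.full(n_vars, 0.3) + rng.normal(size=n_obs)
y = theta * d + X @ np.full(n_vars, 1.0) + rng.normal(size=n_obs)


def ols_predict(X_train, t_train, X_test):
    """Fit OLS with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    beta, *_ = np.linalg.lstsq(A_train, t_train, rcond=None)
    return A_test @ beta


# Two-fold cross-fitting: each observation is predicted by a model
# that was trained on the other fold
folds = np.array_split(rng.permutation(n_obs), 2)
l_hat = np.empty(n_obs)  # out-of-fold prediction of E[Y|X]
m_hat = np.empty(n_obs)  # out-of-fold prediction of E[D|X]
for k, test_idx in enumerate(folds):
    train_idx = folds[1 - k]
    l_hat[test_idx] = ols_predict(X[train_idx], y[train_idx], X[test_idx])
    m_hat[test_idx] = ols_predict(X[train_idx], d[train_idx], X[test_idx])

res_y = y - l_hat  # residualized outcome
res_d = d - m_hat  # residualized treatment

# Slope of the no-intercept least-squares line through the residual scatter
theta_hat = np.sum(res_d * res_y) / np.sum(res_d ** 2)
print(f"true theta = {theta}, residual-on-residual slope = {theta_hat:.3f}")
```

The slope should land close to the true coefficient used in the simulation, which is why a scatter of the two residual series is a natural visual companion to the estimate.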

Best,
Sven
