Add effect modifier #177

juandavidgutier · 2022-11-22T10:32:16Z

juandavidgutier
Nov 22, 2022

I have reviewed the documentation of the DoubleML package and found it very useful; congratulations on the work! However, I would like to know whether it is possible to add a variable as an effect modifier to the causal model. I am referring to direct effect modifiers in the taxonomy of effect modifiers by VanderWeele and Robins: “Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology. 2007.”

Thanks a lot for your answer.


PhilippBach · 2022-11-24T14:16:51Z

PhilippBach
Nov 24, 2022
Maintainer

Hi @juandavidgutier ,

thanks for your question. Yes, it is possible to add effect modifiers by including interaction terms. I hope the example below, which is based on Section 12.5 of Hernán and Robins (2020), illustrates how to do this with DoubleML.

Effect modification example based on Section 12.5. / Program 12.6 from Hernán and Robins (2020)

Load data from What-if book

Find the code for downloading the data from the What-if book at the end of this post.

import numpy as np
import pandas as pd
import doubleml as dml
from sklearn.base import clone
from sklearn.linear_model import LassoCV

nhefs_all = pd.read_excel('data/NHEFS.xls')

# consider only a subset of variables; copy so that new columns can be added
# later without pandas warning about writing to a slice of nhefs_all
select_cols = ['wt82_71', 'sex', 'qsmk']
nhefs = nhefs_all[select_cols].dropna().copy()

Regression Model with Effect Modification

We want to estimate the coefficients $\beta_1$, $\beta_2$, and $\beta_3$ of the following regression model with effect modification:

$wt = \beta_{1} \cdot qsmk + \beta_2 \cdot sex+ \beta_3 \cdot (qsmk \cdot sex) + \varepsilon$

We set up a data backend for this regression model.

# create new column for interaction term 
nhefs['qsmk_and_female'] = nhefs.qsmk * nhefs.sex

# create a data backend
dml_data_eff_mod = dml.DoubleMLData(nhefs,
                    y_col = 'wt82_71',
                    d_cols = ['qsmk', 'sex', 'qsmk_and_female'])

Next, we specify the learners and initialize a partially linear regression (PLR) model.

# learners
learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

np.random.seed(1)
dml_eff_mod = dml.DoubleMLPLR(dml_data_eff_mod,
                            ml_l, ml_m)

dml_eff_mod.fit()
dml_eff_mod.summary

                     coef   std err         t     P>|t|     2.5 %    97.5 %
qsmk             2.854826  0.642493  4.443359  0.000009  1.595563  4.114088
sex             -0.035241  0.433580 -0.081278  0.935221 -0.885041  0.814560
qsmk_and_female -0.656232  0.980216 -0.669477  0.503191 -2.577420  1.264955
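
To read this table in terms of effect modification: under the model above, the effect of qsmk is $\beta_1$ when sex = 0 and $\beta_1 + \beta_3$ when sex = 1. Here is a minimal sketch using the point estimates printed above. The variable names are only illustrative, and the column name qsmk_and_female suggests that sex = 1 codes women, which is an assumption worth checking against the data documentation.

```python
# Point estimates copied from the summary table above.
beta_qsmk = 2.854826          # coefficient on qsmk
beta_interaction = -0.656232  # coefficient on qsmk_and_female

# Effect of quitting smoking within each level of the modifier.
effect_sex0 = beta_qsmk                     # sex == 0
effect_sex1 = beta_qsmk + beta_interaction  # sex == 1

print(f"effect of qsmk when sex = 0: {effect_sex0:.3f}")
print(f"effect of qsmk when sex = 1: {effect_sex1:.3f}")
```

Note that a standard error for the sum $\beta_1 + \beta_3$ would also need the covariance between the two estimates, which the summary table does not show.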

If you consider multiple interaction terms, correcting for multiple
testing might become relevant, as shown below.

dml_eff_mod.bootstrap()
dml_eff_mod.p_adjust()

                     coef   pval
qsmk             2.854826  0.000
sex             -0.035241  0.824
qsmk_and_female -0.656232  0.824
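
For comparison, a simpler correction can be computed by hand from the unadjusted p-values. The sketch below implements the Holm step-down procedure in plain NumPy. It is a swapped-in technique for illustration only and is not what produced the adjusted p-values shown above, which rely on the bootstrap; the function name holm_adjust is hypothetical.

```python
import numpy as np

def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), k = 0, ..., m - 1.
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity along the sorted order and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Unadjusted p-values from the summary table (qsmk, sex, qsmk_and_female).
p_unadjusted = [0.000009, 0.935221, 0.503191]
p_holm = holm_adjust(p_unadjusted)
print(dict(zip(["qsmk", "sex", "qsmk_and_female"], p_holm.round(6))))
```

As with the bootstrap-based adjustment, only the main effect of qsmk remains clearly significant in this example.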

Comment

A caveat of the current implementation is that DoubleML supports only one learner for all treatment variables; hence, it is not possible to specify classification learners for some treatment variables and regression learners for others. However, it is possible to provide specific parameters for the learners of the corresponding treatment variables; see the learners chapter in the user guide.

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Code based on:

Program 12.1 in
https://github.com/jrfiedler/causal_inference_python_code/blob/master/chapter12.ipynb
https://remlapmot.github.io/cibookex-r/ip-weighting-and-marginal-structural-models.html#program-12.6
Commented R code to download data below

# Execute the following code in R to download the data sets and save them in a directory
# In this directory, create a new subdirectory called 'data'
# source: https://remlapmot.github.io/cibookex-r/index.html#downloading-the-datasets

library(here)
dataurls <- list()
stub <- "https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/"
dataurls[[1]] <- paste0(stub, "2012/10/nhefs_sas.zip")
dataurls[[2]] <- paste0(stub, "2012/10/nhefs_stata.zip")
dataurls[[3]] <- paste0(stub, "2017/01/nhefs_excel.zip")
dataurls[[4]] <- paste0(stub, "1268/20/nhefs.csv")

temp <- tempfile()
for (i in 1:3) {
  download.file(dataurls[[i]], temp)
  unzip(temp, exdir = "data")
}

download.file(dataurls[[4]], here("data", "nhefs.csv"))

2 replies

juandavidgutier Nov 24, 2022
Author

Hi PhilippBach

Thanks a lot for your answer. However, I have two additional questions about the script you shared. First: is 'qsmk' the treatment variable and female sex the effect modifier? If so, can I incorporate a non-binary effect modifier by including interaction terms, or is there another way? Second: in the script you included a response variable 'wt82_71', a treatment 'qsmk', and an effect modifier 'sex'; is it also possible to include a set of confounder variables (Xs)?

PhilippBach Nov 25, 2022
Maintainer

Hi @juandavidgutier

thanks! I'm not 100% sure I understand your questions correctly...

can I incorporate a non-binary effect modifier by including interaction terms, or is there another way?

Yes, I think that's possible. If you have categorical variables, you should be able to include the interactions with the one-hot levels (i.e., the corresponding dummies generated from the levels of the variable). Continuous variables should be possible, too.
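
A minimal pandas sketch of how such interaction columns could be built, assuming a binary treatment d, a three-level categorical modifier group, and a continuous modifier age. All column names and values here are hypothetical toy data and are not part of the NHEFS example.

```python
import pandas as pd

# Toy data: binary treatment 'd', categorical modifier 'group',
# continuous modifier 'age' (all hypothetical).
df = pd.DataFrame({
    "d":     [0, 1, 1, 0, 1, 0],
    "group": ["a", "b", "c", "a", "c", "b"],
    "age":   [30.0, 45.0, 52.0, 38.0, 61.0, 44.0],
})

# One-hot encode the modifier, dropping the first level as the reference.
dummies = pd.get_dummies(df["group"], prefix="group", drop_first=True).astype(int)
df = pd.concat([df, dummies], axis=1)

# One interaction column per non-reference level.
interaction_cols = []
for col in dummies.columns:
    name = f"d_x_{col}"
    df[name] = df["d"] * df[col]
    interaction_cols.append(name)

# Continuous modifier: centre it so the main effect of 'd' refers to the mean age.
df["age_c"] = df["age"] - df["age"].mean()
df["d_x_age_c"] = df["d"] * df["age_c"]

# Candidate list to pass as d_cols when creating the data backend.
d_cols = ["d"] + list(dummies.columns) + interaction_cols + ["age_c", "d_x_age_c"]
print(d_cols)
```

Dropping the reference level avoids perfectly collinear dummy columns; which level serves as the reference is a modelling choice.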

Second: in the script you included a response variable 'wt82_71', a treatment 'qsmk', and an effect modifier 'sex'; is it also possible to include a set of confounder variables (Xs)?

Yes, that's possible. You can simply provide them via x_cols when you create the data backend. See the toy example below, where I add a constructed (standard normal) variable; you could use basically any variable in your data set.

# create new column for interaction term 
nhefs['qsmk_and_female'] = nhefs.qsmk * nhefs.sex

# simulate a variable and include as a confounder / control variable
nhefs['x_var'] = np.random.normal(size = nhefs.shape[0])

# create a data backend
dml_data_eff_mod = dml.DoubleMLData(nhefs,
                    y_col = 'wt82_71',
                    d_cols = ['qsmk', 'sex', 'qsmk_and_female'],
                    x_cols = ['x_var'])

# learners
learner = LassoCV()
ml_l = clone(learner)
ml_m = clone(learner)

np.random.seed(1)
dml_eff_mod = dml.DoubleMLPLR(dml_data_eff_mod,
                            ml_l, ml_m)

dml_eff_mod.fit()
dml_eff_mod.summary

                     coef   std err         t     P>|t|     2.5 %    97.5 %
qsmk             2.899548  0.642639  4.511942  0.000006  1.640000  4.159097
sex             -0.022673  0.432595 -0.052412  0.958200 -0.870545  0.825198
qsmk_and_female -0.677453  0.980858 -0.690674  0.489770 -2.599899  1.244993

Answer selected by SvenKlaassen

PhilippBach · 2022-11-24T14:20:39Z

PhilippBach
Nov 24, 2022
Maintainer

If you consider the answer as addressing your question, I'd encourage you to mark it as an answer by pressing the "Mark as answer" button under the reply.

0 replies

juandavidgutier · 2022-11-25T20:22:27Z

juandavidgutier
Nov 25, 2022
Author

Hi PhilippBach

I am trying to run an interactive regression model (IRM) with an epidemiological dataset, where the effect modifier is a continuous variable (NBI), the treatment (NeutralNina) and outcome (excess_cases1) are binary variables, and the confounders are continuous (qbo, wpac, zwnd). Here is the dataset: top50.csv

However, I get the following error: "Error in private$check_data(self$data) : Incompatible data.
To fit an IRM model with DoubleML exactly one binary variable with values 0 and 1 needs to be specified as treatment variable."

Here is my code in R

library(DoubleML)
library(dplyr)
library(na.tools)
library(mlr3)
library(mlr3learners)

top50 <- read.csv("D:/top50.csv")

# NeutralNiña dataset
data_NeutralNina <- select(top50, excess_cases1, NeutralNina, qbo, wpac, zwnd, NBI)
data_NeutralNina <- na.omit(data_NeutralNina)

# interaction term
data_NeutralNina$NeutralNina_and_NBI <- data_NeutralNina$NeutralNina * data_NeutralNina$NBI

# data
obj_dml_data = double_ml_data_from_data_frame(data_NeutralNina,
                                              y_col = "excess_cases1",
                                              d_cols = c("NeutralNina", "NBI", "NeutralNina_and_NBI"),
                                              x_cols = c("qbo", "wpac", "zwnd"),
                                              use_other_treat_as_covariate = TRUE)

# ML methods
ml_g = lrn("classif.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_m = lrn("classif.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)

# DML specifications
dml_irm_obj = DoubleMLIRM$new(obj_dml_data, ml_g, ml_m) # HERE I GET THE ERROR

# estimation
dml_irm_obj$fit()

print(dml_irm_obj)

I would greatly appreciate your help.

2 replies

PhilippBach Nov 26, 2022
Maintainer

I see. At the moment, the IRM supports only one treatment variable, which is required to be binary. We are planning to extend this, but it will take some time to get this done. If you only had multiple binary treatment variables, we could try to fix this soon, but the continuous treatment case requires a bit more effort...

Alternatively, you could try to cast your analysis as a PLR model, which is more flexible in both regards.
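
To make the requirement concrete, here is a small sketch of a check that mirrors the wording of the error message (exactly one treatment column, with values 0 and 1 only). The helper irm_compatible_treatments is hypothetical and is not DoubleML's internal check; the toy values are made up.

```python
import pandas as pd

def irm_compatible_treatments(df, d_cols):
    """Return True if d_cols holds exactly one column with values in {0, 1}.

    Hypothetical helper mirroring the requirement quoted in the error
    message; it is not DoubleML's internal check.
    """
    if len(d_cols) != 1:
        return False
    values = set(df[d_cols[0]].dropna().unique())
    return values <= {0, 1}

# Toy data with the same column names as in the question (values are made up).
toy = pd.DataFrame({
    "NeutralNina": [0, 1, 1, 0],
    "NBI": [12.5, 40.1, 33.3, 8.9],
})
toy["NeutralNina_and_NBI"] = toy["NeutralNina"] * toy["NBI"]

print(irm_compatible_treatments(toy, ["NeutralNina"]))
print(irm_compatible_treatments(toy, ["NeutralNina", "NBI", "NeutralNina_and_NBI"]))
```

The second call corresponds to the d_cols specification in the question, which combines a binary treatment with two continuous columns.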

Best,

Philipp

juandavidgutier Nov 26, 2022
Author

Hi PhilippBach

Thanks for your cooperation

zhoujiahongsysu · 2022-11-30T12:43:56Z

zhoujiahongsysu
Nov 30, 2022

Hi PhilippBach,

I have an event study using a difference-in-difference model (with time and firm fixed effects), comparing the outcome of treated and control samples before and after the event. The formula looks like this:

Without fixed effects:

$$
y_{i,t} = \alpha_1 \, \text{treated}_{i,t} + \alpha_2 \, \text{post}_{i,t} + \alpha_3 \, \text{treated}_{i,t} \cdot \text{post}_{i,t} + \beta \, \text{controls}_{i,t} + \varepsilon_{i,t}
$$

And with fixed effects:

$$
y_{i,t} = \alpha \, \text{treated}_{i,t} \cdot \text{post}_{i,t} + \beta \, \text{controls}_{i,t} + \gamma_i + \delta_t + \varepsilon_{i,t}
$$

I want to implement these two models with double machine learning using the DoubleML package. I have three questions:

(1) If the event is mostly exogenous imposed by political issues, is it still sensible to use DML? In that case, E[D│X] is already expected to be zero.

(2) For the model without fixed effects, should ['treated', 'post', 'treated_post'] be passed to the DoubleMLData function via “d_cols” (the treatment variables) or “x_cols” (the control covariates)? The effect modifier example you show above put it at “x_cols”, but I just wonder whether it would be the same for a DiD model with treated and post interaction terms.

Specifically, slightly changing that example, which of the two versions below is correct?

dml_data_eff_mod = dml.DoubleMLData(nhefs,
                    y_col = 'wt82_71',
                    d_cols = ['treated', 'post', 'treated_post'],
                    x_cols = ['x_var'])

Or

dml_data_eff_mod = dml.DoubleMLData(nhefs,
                    y_col = 'wt82_71',
                    d_cols = ['treated_post'],
                    x_cols = ['x_var'] + ['treated', 'post'])

(3) For the fixed effects model, is the implementation below correct?

# time_and_firm_dummies: list with the names of the time and firm dummy columns
dml_data_eff_mod = dml.DoubleMLData(nhefs,
                    y_col = 'wt82_71',
                    d_cols = ['treated_post'],
                    x_cols = ['x_var'] + time_and_firm_dummies)

Thank you very much in advance. The APIs, documentation, and DoubleML Tutorial are excellent.

2 replies

PhilippBach Nov 30, 2022
Maintainer

Hi @zhoujiahongsysu

(1) If the event is mostly exogenous imposed by political issues, is it still sensible to use DML? In that case, E[D│X] is already expected to be zero.

In case the treatment is truly randomized in an unconditional way (i.e., unconfoundedness holds), you wouldn't have to adjust for any confounders in your estimation framework in order to identify the causal parameter of interest. However, it could make sense to include covariates for efficiency reasons, as they might help to reduce the residual variance. I'd recommend you to try it out and compare the results...
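
The efficiency argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data-generating process is made up, the treatment is randomized independently of the covariate, and plain OLS stands in for the DoubleML estimators. The helper ols_coef_and_se is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                         # prognostic covariate
d = rng.integers(0, 2, size=n).astype(float)   # randomized treatment, independent of x
y = 1.0 * d + 2.0 * x + rng.normal(size=n)     # true effect of d is 1.0

def ols_coef_and_se(design, y, idx):
    """OLS estimate and classical standard error of coefficient `idx`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[idx], np.sqrt(cov[idx, idx])

ones = np.ones(n)
coef_raw, se_raw = ols_coef_and_se(np.column_stack([ones, d]), y, 1)
coef_adj, se_adj = ols_coef_and_se(np.column_stack([ones, d, x]), y, 1)

print(f"without covariate: coef = {coef_raw:.3f}, se = {se_raw:.3f}")
print(f"with covariate:    coef = {coef_adj:.3f}, se = {se_adj:.3f}")
```

Both regressions target the same effect, but the adjusted one has the smaller standard error because the covariate absorbs part of the residual variance.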

(2) For the model without fixed effects, should ['treated', 'post', 'treated_post'] be passed to the DoubleMLData function via “d_cols” (the treatment variables) or “x_cols” (the control covariates)? The effect modifier example you show above put it at “x_cols”, but I just wonder whether it would be the same for a DiD model with treated and post interaction terms. [...]

It depends on what you are after. If you want to make inference (i.e., obtain a coefficient estimate, standard errors, etc.) for only one of the variables, you would include that variable in d_cols and the remaining confounding variables in x_cols. If you want estimates and standard errors for all of these variables, you should provide all of them via d_cols and the remaining confounding variables via x_cols. If you provide multiple treatment variables in d_cols, the algorithm iterates through them: it starts with the first one, internally adds the other treatment variables to x_cols, and proceeds like this for each treatment variable. Whether the other treatment variables are added to the list of confounders during estimation is controlled via the option use_other_treat_as_covariate; see the documentation.
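
The iteration over treatment variables can be sketched in plain NumPy. This is a simplified illustration and not DoubleML's implementation: linear regressions stand in for the ML learners, there is no cross-fitting, and the simulated data and function names are hypothetical. Only the option name use_other_treat_as_covariate is taken from the package.

```python
import numpy as np

def residualize(target, controls):
    """Residual of `target` after a linear regression on `controls` plus intercept."""
    design = np.column_stack([np.ones(len(target)), controls])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ beta

def partialling_out_loop(y, D, X, use_other_treat_as_covariate=True):
    """Estimate one coefficient per column of D by partialling out."""
    coefs = []
    for j in range(D.shape[1]):
        d_j = D[:, j]
        if use_other_treat_as_covariate:
            # The other treatment columns join the controls for this iteration.
            controls = np.column_stack([X, np.delete(D, j, axis=1)])
        else:
            controls = X
        y_res = residualize(y, controls)
        d_res = residualize(d_j, controls)
        coefs.append(d_res @ y_res / (d_res @ d_res))
    return np.array(coefs)

# Simulated DiD-style data: treated, post, and their interaction as treatments.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
treated = (rng.random(n) < 0.5).astype(float)
post = (rng.random(n) < 0.5).astype(float)
D = np.column_stack([treated, post, treated * post])
y = D @ np.array([0.5, -1.0, 2.0]) + X @ np.array([1.0, -0.5]) + rng.normal(size=n)

est = partialling_out_loop(y, D, X)
print(est.round(2))
```

With linear learners and the other treatments used as covariates, each step reproduces the corresponding coefficient of one joint OLS regression (the Frisch-Waugh-Lovell result), which is a useful sanity check for the loop.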

I hope this helps!

zhoujiahongsysu Nov 30, 2022

Thank you very much Philipp, that's extremely helpful! And thanks for your awesome work in developing this package.
