fix bibtex issues in ref; catch redundant labeling
nhejazi committed Jul 6, 2023
1 parent efdc3d2 commit f273201
Showing 2 changed files with 115 additions and 112 deletions.
79 changes: 40 additions & 39 deletions 04-roadmap.Rmd
@@ -27,19 +27,18 @@ the choice of statistical model, selecting a statistical target parameter that
represents an answer to the scientific question of interest, and developing
efficient estimators of the statistical estimand.

## The Roadmap {#roadmap}
## The Roadmap {#roadmap-steps}

The roadmap is a six-stage process:

1. Define the data as a random variable with a probability distribution, $O \sim
P_0$
2. Specify the statistical model $\M$ realistically, such that $P_0 \in \M$
3. Translate the scientific question of interest into a statistical target
parameter $\Psi$ and establish the target population
parameter $\Psi$ and establish the target population
4. Choose an estimator $\hat{\Psi}$ for $\Psi$ under realistic $\M$
5. Construct a measure of uncertainty for the estimate $\hat{\Psi}(P_n)$
6. Make substantive conclusion

6. Make substantive conclusions

### (1) Data: A random variable with a probability distribution, $O \sim P_0$ {-}

@@ -502,35 +501,37 @@ well known causal parameter that is most often called the "average treatment
effect" (ATE) and is denoted

\begin{equation}
ATE = \E_X(Y(1) - Y(0)),
ATE = \E_X[Y(1) - Y(0)],
(\#eq:ate)
\end{equation}
where $\E_X$ is the mean under the theoretical (unobservable) full data
$X = (W, Y(1), Y(0))$. Note that the full data structure $X$ is, by its very
definition, unobservable since one can never observe both of $Y(1)$ and $Y(0)$
for the same observational unit.

We can define much more complicated interventions on SCMs, such as
interventions based upon dynamic rules (which assign particular interventions
based on a function of the covariates $W$), stochastic rules (which can even
account for the natural value of $A$ observed in the absence of the
intervention), and much more. Each results in a different target causal
parameter and entails different identifiability assumptions discussed below.
where $\E_X[\cdot]$ is the expectation under $P_X$, the distribution of the
theoretical (unobservable) full data $X = (W, Y(1), Y(0))$. Note that the full
data structure $X$ is, by its very definition, unobservable since one can never
observe both $Y(1)$ and $Y(0)$ for the same observational unit.

We can define much more complicated interventions on SCMs, such as interventions
based upon dynamic rules (which assign particular interventions based on a
function of the covariates $W$), stochastic rules (which can even account for
the natural value of $A$ observed in the absence of the intervention), and much
more. Each results in a different target causal parameter and entails different
identifiability assumptions discussed below.

### Identifiability {-}

Since we can never observe both $Y(0)$ (the counterfactual outcome when $A=0$)
and $Y(1)$ (similarly, the counterfactual outcome when $A=1$), we cannot
estimate the quantity in Equation \@ref(eq:ate) directly. This is called the
_Fundamental Problem of Causal Inference_ [@holland1986statistics]. Thus, one of
the primary activities in causal inference is to _identify_ the assumptions
necessary to express causal quantities of interest as functions of the
data-generating distribution of the observed data. To do this, we must make
assumptions under which such quantities may be estimated from the observed data
$O \sim P_0$ and its corresponding data-generating distribution $P_0$.
Fortunately, given the causal model specified in the SCM above, we can, with a
handful of untestable assumptions, estimate the ATE from observational data.
These assumptions may be summarized as follows.
Since we can never simultaneously observe $Y(0)$, the counterfactual outcome
when $A=0$, and $Y(1)$, the counterfactual outcome when $A=1$, we cannot
estimate their difference $Y(1) - Y(0)$ (the individual treatment effect), which
appears in Equation \@ref(eq:ate) (inside the expectation $\E_X(\cdot)$ that
defines ATE). This is called the _Fundamental Problem of Causal Inference_
[@holland1986statistics]. Thus, one of the primary activities in causal
inference is to _identify_ the assumptions necessary to express causal
quantities of interest as functions of the data-generating distribution of the
observed data. To do this, we must make assumptions under which such quantities
may be estimated from the observed data $O \sim P_0$ and its corresponding
data-generating distribution $P_0$. Fortunately, given the causal model
specified in the SCM above, we can, with a handful of untestable assumptions,
estimate the ATE from observational data. These assumptions may be summarized as
follows.

::: {#consist-ass .definition name="Consistency"}
The outcome for unit $i$ is $Y_i(a)$ whenever $A_i = a$, which may be thought of
@@ -551,27 +552,27 @@ experiments, ensuring that the effect of $A$ on $Y$ can be disentangled from
that of $W$ on $Y$, even though $W$ affects both.
:::

::: {#posit-ass .definition name="Positivity (or Overlap)"}
::: {#posit-ass .definition name="Positivity/Overlap"}
All observed units, across strata defined by $W$, must have a bounded
(non-deterministic) probability of receiving treatment -- that is,
$\epsilon < \P(A = a \mid W) < 1 - \epsilon$ for all $a$ and $W$ and for some
$\epsilon > 0$) \ .
probability of receiving treatment -- that is, $\epsilon < \P(A = a \mid W) <
1 - \epsilon$ for all $a$ and $W$ and for some $\epsilon > 0$.
:::
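
When $W$ is discrete, the positivity condition can be probed empirically. The following is a minimal Python sketch and is not part of the handbook: the function name `positivity_violations`, the toy data, and the cutoff `epsilon = 0.05` are illustrative assumptions. It estimates $\P(A = 1 \mid W)$ within each stratum of $W$ and reports strata whose estimate falls outside $(\epsilon, 1 - \epsilon)$.

```python
from collections import Counter


def positivity_violations(w, a, epsilon=0.05):
    """Return strata of W whose empirical P(A = 1 | W) lies outside (epsilon, 1 - epsilon).

    Assumes a binary treatment coded 0/1 and a discrete covariate W.
    """
    n_per_stratum = Counter(w)
    n_treated = Counter(wi for wi, ai in zip(w, a) if ai == 1)
    violations = {}
    for stratum, n in n_per_stratum.items():
        p_hat = n_treated[stratum] / n  # empirical treatment probability
        if not (epsilon < p_hat < 1 - epsilon):
            violations[stratum] = p_hat
    return violations


# Hypothetical toy data: every unit in the "old" stratum is treated.
w = ["young", "young", "young", "young", "old", "old", "old", "old"]
a = [1, 0, 1, 0, 1, 1, 1, 1]
violations = positivity_violations(w, a)
print(violations)
```

For binary $A$, bounding the estimated probability of $A = 1$ also bounds that of $A = 0$, so one check per stratum suffices. A flagged stratum only suggests a practical violation in the sample at hand; the assumption itself concerns the true treatment mechanism.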

Technically speaking, only the latter two of these assumptions are necessary
when working within the SCM framework, as the first two are implied properties
of an SCM for i.i.d. data (if you're really curious, see this commentary of
@pearl2010brief for an extended philosophical discussion). We introduce all four
@pearl2010brief for an extended discussion). We introduce all four
identification assumptions because they are most often considered together, and
all four are necessary when working within the potential outcomes framework.
all four are necessary when working within the potential outcomes framework
[@rubin2005causal; @imbens2015causal].

Given these assumptions, the ATE may be re-written as a function of $P_0$ --
specifically
Under these assumptions, the ATE may be re-written as a function of $P_0$, the
distribution of the observed data:

\begin{align}
\psi_{\text{ATE}} &= \E_0(Y(1) - Y(0)) \\ \nonumber
&= \E_0 \left(\E_0[Y \mid A = 1, W] -
\E_0[Y \mid A = 0, W]\right) \ .
\psi_{\text{ATE}} &= \E_0[Y(1) - Y(0)] \\ \nonumber
&= \E_0 [\E_0[Y \mid A = 1, W] -
\E_0[Y \mid A = 0, W]] \ .
(\#eq:estimand)
\end{align}
In words, the ATE is the mean difference in the predicted outcome values for
Expand Down
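
The identification result in Equation \@ref(eq:estimand) suggests a simple plug-in estimator: estimate $\E_0[Y \mid A = a, W]$, evaluate it at $a = 1$ and $a = 0$ for every unit, and average the difference. The Python sketch below is illustrative only and is not the estimator developed in the handbook. It assumes a discrete $W$ and a binary $A$, and uses stratum-specific sample means as a stand-in for the outcome regression. The names `plugin_ate` and `q_bar`, and the toy data, are hypothetical.

```python
from collections import defaultdict


def plugin_ate(w, a, y):
    """Plug-in estimate of E_0[ E_0[Y | A=1, W] - E_0[Y | A=0, W] ].

    The outcome regression is estimated by the sample mean of Y within each
    (A, W) cell, which is only sensible when W is discrete.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wi, ai, yi in zip(w, a, y):
        sums[(ai, wi)] += yi
        counts[(ai, wi)] += 1

    def q_bar(a_val, w_val):
        # Raises ZeroDivisionError if a cell is empty, i.e. if positivity
        # fails in the sample.
        return sums[(a_val, w_val)] / counts[(a_val, w_val)]

    # Average the predicted difference over the empirical distribution of W.
    return sum(q_bar(1, wi) - q_bar(0, wi) for wi in w) / len(w)


# Hypothetical toy data in which W affects both treatment and outcome.
w = [0, 0, 0, 0, 1, 1, 1, 1]
a = [1, 1, 0, 0, 1, 0, 0, 0]
y = [3, 5, 1, 3, 10, 4, 6, 8]

psi_hat = plugin_ate(w, a, y)  # equals 3.0 on this toy data

# Unadjusted contrast of treated and control means, for comparison.
treated = [yi for ai, yi in zip(a, y) if ai == 1]
control = [yi for ai, yi in zip(a, y) if ai == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
print(psi_hat, naive)
```

On these toy data the adjusted estimate differs from the unadjusted difference in means because treatment is less common in the stratum with higher outcomes. With continuous or high-dimensional $W$, the cell means would have to be replaced by a fitted regression.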
148 changes: 75 additions & 73 deletions book.bib
@@ -19,7 +19,7 @@ @article{holland1986statistics
@book{fisher1946statistical,
title={Statistical Methods for Research Workers},
author={Fisher, Ronald Aylmer},
number={10\textsuperscript{th} ed.},
edition={10\textsuperscript{th}},
year={1946},
publisher={Oliver and Boyd}
}
@@ -139,17 +139,6 @@ @book{pearl2009causality
publisher={Cambridge University Press}
}

@article{holland1986statistics,
title={Statistics and causal inference},
author={Holland, Paul W},
journal={Journal of the American statistical Association},
volume={81},
number={396},
pages={945--960},
year={1986},
publisher={Taylor \& Francis}
}

@article{rosenbaum1983central,
title={The central role of the propensity score in observational studies for
causal effects},
@@ -765,14 +754,13 @@ @article{hejazi2021nonparametric
author = {Hejazi, Nima S and Rudolph, Kara E and {van der Laan}, Mark J and
D{\'\i}az, Iv{\'a}n},
year = {2022},
doi = {10.1093/biostatistics/kxac002},
url = {https://arxiv.org/abs/2009.06203},
year = {2022},
publisher = {Oxford University Press},
journal = {Biostatistics},
volume = {(in press)},
number = {},
pages = {}
pages = {},
publisher = {Oxford University Press},
url = {https://arxiv.org/abs/2009.06203},
doi = {10.1093/biostatistics/kxac002}
}

@article{tchetgen2013inverse,
@@ -1009,47 +997,52 @@ @Article{bembom2007realistic
}

@article{montoya2021optimal,
title={The Optimal Dynamic Treatment Rule {SuperLearner}: Considerations,
Performance, and Application},
author={Montoya, Lina and {van der Laan}, Mark J and Luedtke, Alexander and
Skeem, Jennifer and Coyle, Jeremy and Petersen, Maya},
year={2021},
eprint={2101.12326},
archivePrefix={arXiv},
primaryClass={stat.AP}
title={The optimal dynamic treatment rule superlearner: considerations,
performance, and application to criminal justice interventions},
author={Montoya, Lina M and {van der Laan}, Mark J and Luedtke, Alexander R
and Skeem, Jennifer L and Coyle, Jeremy R and Petersen, Maya L},
journal={The International Journal of Biostatistics},
volume={19},
number={1},
pages={217--238},
year={2023},
publisher={De Gruyter},
doi={10.1515/ijb-2020-0127}
}

@article{montoya2021performance,
title={Performance and Application of Estimators for the Value of an
Optimal Dynamic Treatment Rule},
author={Montoya, Lina and Skeem, Jennifer and {van der Laan}, Mark and
Petersen, Maya},
year={2021},
eprint={2101.12333},
archivePrefix={arXiv},
primaryClass={stat.ME}
title={Estimators for the value of the optimal dynamic treatment rule with
application to criminal justice interventions},
author={Montoya, Lina M and {van der Laan}, Mark J and Skeem, Jennifer L and
Petersen, Maya L},
journal={The International Journal of Biostatistics},
volume={19},
number={1},
pages={239--259},
year={2023},
publisher={De Gruyter},
doi={10.1515/ijb-2020-0128}
}

@Article{luedtke2016resource,
Author={Luedtke, Alexander R and {van der Laan}, Mark J},
Title={Optimal individualized treatments in resource-limited settings},
Journal={International Journal of Biostatisics},
Year={2016},
Volume={12},
Number={1},
Pages={283--303},
Month={05}
Title={Optimal individualized treatments in resource-limited settings},
Author={Luedtke, Alexander R and {van der Laan}, Mark J},
Journal={The International Journal of Biostatistics},
Volume={12},
Number={1},
Pages={283--303},
Year={2016},
publisher={De Gruyter},
doi={10.1515/ijb-2015-0007}
}

@phdthesis{hejazi2021semiparametric,
title = {Semiparametric statistical methods for causal inference with
stochastic treatment regimes},
school = {University of California, Berkeley},
author = {Hejazi, Nima S},
author+an = {1=highlight},
year = {2021},
url = {https://www.stat.berkeley.edu/~nhejazi/publications/thesis-phd-biostat.pdf},
keywords = {theses}
}

@article{stock1989nonparametric,
@@ -1265,66 +1258,75 @@ @article{naimi2018stacked
}

@article{rvp2022super,
doi = {10.48550/ARXIV.2204.06139},
url = {https://arxiv.org/abs/2204.06139},
author = {Phillips, Rachael V and {van der Laan}, Mark J and Lee, Hana and
title={Practical considerations for specifying a super learner},
author={Phillips, Rachael V and {van der Laan}, Mark J and Lee, Hana and
Gruber, Susan},
title = {Practical considerations for specifying a super learner},
publisher = {arXiv},
year = {2022}
journal={International Journal of Epidemiology},
volume={},
pages={},
year={2023},
publisher={Oxford University Press},
doi={10.1093/ije/dyad023}
}

@Manual{SuperLearner,
title = {SuperLearner: Super Learner Prediction},
author = {Eric Polley and Erin LeDell and Chris Kennedy and Mark
{van der Laan}},
@manual{SuperLearner,
title = {{\texttt{SuperLearner}}: Super Learner Prediction},
author = {Polley, Eric and LeDell, Erin and Kennedy, Chris and {van der Laan},
Mark},
year = {2021},
note = {R package version 2.0-28},
note = {\texttt{R} package version 2.0-28},
url = {https://CRAN.R-project.org/package=SuperLearner},
}

@software{coyle-cran-origami,
doi = {10.5281/zenodo.835602},
url = {https://CRAN.R-project.org/package=origami},
note = {{\texttt{R}} package with \input{./metrics/downloads_origami.txt}},
version = {1.0.5},
@manual{coyle-cran-origami,
title = {{\texttt{origami}}: Generalized framework for cross-validation},
author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
Phillips, Rachael V},
author+an = {2=highlight},
title = {{\texttt{origami}}: Generalized framework for cross-validation},
keywords = {software-pkg}
note = {\texttt{R} package version 1.0.5},
doi = {10.5281/zenodo.835602},
url = {https://CRAN.R-project.org/package=origami}
}

@incollection{kennedy2016semiparametric,
title={Semiparametric theory and empirical processes in causal inference},
author={Kennedy, Edward H},
booktitle={Statistical causal inferences and their applications in public health research},
booktitle={Statistical Causal Inferences and Their Applications in Public
Health Research},
pages={141--167},
year={2016},
publisher={Springer}
}

@article{diaz2013sensitivity,
title={Sensitivity analysis for causal inference under unmeasured confounding and measurement error problems},
author={D{\'\i}az, Iv{\'a}n and van der Laan, Mark J},
journal={The international journal of biostatistics},
title={Sensitivity analysis for causal inference under unmeasured confounding
and measurement error problems},
author={D{\'\i}az, Iv{\'a}n and {van der Laan}, Mark J},
journal={The International Journal of Biostatistics},
volume={9},
number={2},
pages={149--160},
year={2013},
publisher={De Gruyter}
publisher={De Gruyter},
doi={10.1515/ijb-2013-0004}
}

@article{gruber2022targeted,
title={Targeted learning: Towards a future informed by real-world evidence},
author={Gruber, Susan and Phillips, Rachael V and Lee, Hana and Ho, Martin and Concato, John and van der Laan, Mark J},
journal={arXiv preprint arXiv:2205.08643},
year={2022}
title={{Targeted Learning}: Toward a Future Informed by Real-World Evidence},
author={Gruber, Susan and Phillips, Rachael V and Lee, Hana and Ho, Martin
and Concato, John and {van der Laan}, Mark J},
journal={Statistics in Biopharmaceutical Research},
volume={},
number={},
pages={},
year={2023},
publisher={Taylor \& Francis},
doi={10.1080/19466315.2023.2182356}
}

@article{gruber2022evaluating,
title={Evaluating and improving real-world evidence with Targeted Learning},
author={Gruber, Susan and Phillips, Rachael V and Lee, Hana and Concato, John and van der Laan, Mark},
author={Gruber, Susan and Phillips, Rachael V and Lee, Hana and Concato, John
and {van der Laan}, Mark},
journal={arXiv preprint arXiv:2208.07283},
year={2022}
}
}