
Commit

Few more edits
Ivana Malenica authored and Ivana Malenica committed Sep 15, 2024
1 parent 48a232a commit 9026e40
Showing 1 changed file with 40 additions and 38 deletions.
78 changes: 40 additions & 38 deletions 06-sl3.Rmd
@@ -16,9 +16,9 @@ Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, and Oleg Sofrygin_.
<!--
IM:
Since we introduced the concept of loss & risk in the previous chapter, it
should be ok to use it instead of "performance metric". It might be a bit confusing
to change notation many times. Define base learner. Shouldn't objective 4 be
before talking about screeners etc?
-->

By the end of this chapter you will be able to:
@@ -528,7 +528,7 @@ Our first option to get CV predictions, `cv_preds_option1`, used the
This function only exists for learner fits that are cross-validated in `sl3`,
like those in `Lrnr_sl`. In addition to supplying `fold_number = "validation"`
in `predict_fold`, we can set `fold_number = "full"` to obtain predictions from
learners fit to the entire dataset (i.e., all of the data supplied to
`make_sl3_Task`). For instance, below we show that `glm_preds` we calculated
above can also be obtained by setting `fold_number = "full"`.
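The two `predict_fold` calls just described can be sketched as follows (a minimal, not-run sketch; it assumes a cross-validated fit object `sl_fit` and the chapter's `task`):

```{r predict-fold-sketch, eval = FALSE}
# CV predictions: each observation is predicted by the learner fit that
# did NOT see it during training (i.e., from its validation fold)
cv_preds <- sl_fit$predict_fold(task, fold_number = "validation")

# Predictions from learners fit to the entire dataset
full_preds <- sl_fit$predict_fold(task, fold_number = "full")
```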

@@ -554,7 +554,7 @@ training).

<!--
IM:
Is this part necessary? Perhaps we can have something like an appendix?
-->

```{r cv-predictions-long}
@@ -677,6 +677,12 @@ tr_intervention_task <- make_sl3_Task(
counterfactual_pred <- sl_fit$predict(tr_intervention_task)
```

<!--
IM:
Perhaps this is also not necessary as we talk about dynamic interventions in the
next chapter.
-->

Note that this type of intervention, where every subject receives the same
intervention, is referred to as "static". Interventions that vary depending on
the characteristics of the subject are referred to as "dynamic". For instance,
@@ -929,6 +935,11 @@ if (knitr::is_latex_output()) {
```
### Revere-cross-validated predictive performance of Super Learner

<!--
IM:
This should be very optional, maybe in the "appendix"
-->

We can also use so-called "revere" cross-validation to obtain a partial CV risk for the SL,
where the SL candidate learner fits are cross-validated but the meta-learner fit
is not. It takes essentially no extra time to calculate a revere-CV
@@ -1030,21 +1041,20 @@ forest) is used as the meta-learner, then the revere-CV risk estimate of the
resulting SL will be a worse approximation of the CV risk estimate. This is
because more flexible learners are more likely to overfit. When simple
parametric regressions are used as a meta-learner, like what we considered in
our SL (NNLS with `Lrnr_nnls`, the default meta-learner), then the revere-CV risk is
a quick way to examine an approximation of
the CV risk estimate of the SL. It can be thought of as a ballpark lower bound
on the CV risk estimate. This notion holds in our example; that is, with the simple NNLS
meta-learner the revere risk estimate of the SL (`r round(sl_revere_risk, 4)`)
is very close to, and slightly lower than, the CV risk estimate for the SL
(`r round(cv_sl_fit$cv_risk[nrow(cv_sl_fit$cv_risk),2], 4)`).

## Discrete Super Learner

Discrete SL (dSL) is a SL that uses a winner-take-all meta-learner called
the cross-validated selector. The dSL is therefore identical to the candidate
with the best cross-validated performance; its predictions will be the same as
this candidate’s predictions. The cross-validated selector is
`Lrnr_cv_selector` in `sl3` (see `Lrnr_cv_selector` documentation for more
detail) and a dSL is instantiated in `sl3` by using `Lrnr_cv_selector` as the
meta-learner in `Lrnr_sl`.
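As a minimal sketch of this instantiation (not run; it assumes a learner `Stack` named `stack` and the `task` from earlier in the chapter, and uses squared-error loss for the selector):

```{r dsl-sketch, eval = FALSE}
# A discrete SL: Lrnr_cv_selector picks the single candidate with the
# lowest CV risk, so the fitted SL is identical to that candidate
dSL <- Lrnr_sl$new(
  learners = stack,
  metalearner = Lrnr_cv_selector$new(eval_function = loss_squared_error)
)
dSL_fit <- dSL$train(task)
```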
@@ -1101,10 +1111,6 @@ earth_pred <- dSL_fit$learner_fits$Lrnr_earth_2_3_backward_0_1_0_0$predict(task)
identical(dSL_pred, earth_pred)
```


### Including ensemble Super Learner(s) as candidate(s) in discrete Super Learner

@@ -1113,17 +1119,18 @@ showed how to do this with `cv_sl` above. We have also seen that when we
include a learner as a candidate in the SL (in `sl3` terms, when we include a
learner in the `Stack` passed to `Lrnr_sl` as `learners`), we are able to
examine its CV risk. Also, when we use the dSL, the candidate that achieved the
lowest CV risk defines the resulting SL. We therefore can use the dSL to automate
a procedure for obtaining a final SL that represents the candidate with the
best cross-validated predictive performance.

The ensemble SL (eSL) is a SL that uses any parametric or non-parametric algorithm as its
meta-learner. Therefore, the eSL is defined by a combination of multiple
candidates; its predictions are defined by a combination of multiple candidates’
predictions. When the eSL and
its candidate learners are considered in a dSL as candidates, the eSL’s CV
performance can be compared to that from the learners from which it was
constructed, and the final SL will be the candidate that achieved the lowest CV
risk. In the following, we show how to include the eSL, and multiple
eSLs, as candidates in the dSL.

Recall the SL object, `sl`, defined in section 2:
@@ -1163,10 +1170,10 @@ between including the eSL as a candidate in the dSL and calling `cv_sl` is that
the former automates a procedure for the final SL to be the learner that
achieved the best CV predictive performance, i.e., lowest CV risk. If the eSL
outperforms any other candidate, the dSL will end up selecting it and the
resulting SL will be the eSL. Another advantage
of this approach is that multiple eSLs that use more flexible meta-learner
methods (e.g., non-parametric machine learning algorithms like HAL) can be
evaluated simultaneously.

Below, we show how multiple eSLs can be included as candidates in a dSL:
```{r make-sl-discrete-multi-esl}
@@ -1363,7 +1370,7 @@ quantification.

### Character and categorical covariates

First, any character covariates are converted to factors. Then all factor
covariates are one-hot encoded, i.e., the levels of a factor become a set of
binary indicators. For example, the factor `cats` and its one-hot encoding are
shown below:
@@ -1466,7 +1473,7 @@ stack_pretty_names

Customized learners can be created over a grid of tuning parameters. For
highly flexible learners that require careful tuning, it is oftentimes
helpful to consider different tuning parameter specifications. However,
this is time-consuming, so computational feasibility should be considered.
Also, when the effective sample size is small, highly flexible learners
will likely not perform well since they typically require a lot of data to fit
@@ -1475,8 +1482,8 @@ and step-by-step guidelines for tailoring the SL specification to perform well
for the prediction task at hand.

<!--
IM:
Some general wisdom would be nice here too
-->

We show two ways to customize learners over a grid of tuning parameters. The
Expand Down Expand Up @@ -1535,17 +1542,12 @@ lrnr_nnet_autotune <- Lrnr_caret$new(method = "nnet", name = "NNET_autotune")

## Learners with Interactions and `formula` Interface

If it’s known/possible that there are
interactions among covariates, then we can include learners that pick up on that
explicitly (e.g., by including in the library a parametric regression learner
with interactions specified in a formula) or implicitly (e.g., by including in
the library tree-based algorithms that learn interactions empirically).
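To make the two routes concrete, here is a hedged, not-run sketch (the covariate names `age` and `tx` are hypothetical, chosen only for illustration):

```{r interaction-learners-sketch, eval = FALSE}
# Explicit: a parametric regression with an interaction term specified
# in a formula (hypothetical covariates `age` and `tx`)
lrnr_glm_interaction <- Lrnr_glm$new(formula = "~ age + tx + age:tx")

# Implicit: a tree-based learner that can learn interactions empirically
lrnr_ranger <- Lrnr_ranger$new()
```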

One way to define interaction terms among covariates in `sl3` is with a
`formula`. The argument exists in `Lrnr_base`, which is inherited by every
learner in `sl3`; even though `formula` does not explicitly appear as a
@@ -1579,11 +1581,11 @@ IM:
...
-->

Covariate screening is essential when the
dimensionality of the data is very large, and it can be practically useful in
any SL or machine learning application. Screening of covariates that considers
associations with the outcome must be cross validated to avoid biasing the
estimate of an algorithm’s predictive performance. By including
screener-learner couplings as additional candidates in the SL library, we are
cross validating the screening of covariates. Covariates retained in each CV
fold may vary.
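A hedged, not-run sketch of such a screener-learner coupling (it assumes the correlation-based screener available in `sl3` and a GLM as the downstream learner; `num_screen = 5` is an arbitrary illustrative choice):

```{r screener-pipeline-sketch, eval = FALSE}
# Couple a screener with a learner: the downstream learner only sees
# the covariates the screener retains, and the whole pipeline is
# cross-validated when added to the SL library as a candidate
screener_cor <- Lrnr_screener_correlation$new(num_screen = 5)
glm_screened <- Pipeline$new(screener_cor, Lrnr_glm$new())
```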
