-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
144 lines (114 loc) · 4.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
---
title: "Assessing trends in vaccine efficacy by pathogen genetic distance"
author: "David Benkeser, Michal Juraska, and Peter Gilbert"
date: "September 28, 2017"
output:
md_document:
variant: markdown_github
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Installation of `sievetrend`
The `sievetrend` repository is available for download as an `R` package and may be
downloaded directly from GitHub as follows.
```{r, messages = FALSE, warnings = FALSE}
# install sievetrend from GitHub
devtools::install_github("benkeser/sievetrend")
library(sievetrend)
# also will require package survtmle
if(!("survtmle" %in% row.names(installed.packages()))){
install.packages("survtmle")
}
library(survtmle)
```
Below, we include the code that was used to perform the simulation study
and to analyze the RTS,S data for the manuscript.
## Simulation
Here, we demonstrate how to obtain results for a simulated data set.
```{r}
# sample size
n <- 1000
# set random seed
set.seed(1234)
# simulate data set
dat <- makeData(n = n)
# get the formula for the empirical estimate of the
# censoring distribution needed for input to survtmle
glm.ctime <- get.ctimeForm(trt = dat$trt, site = dat$adjustVars$site,
ftime = dat$ftime, ftype = dat$ftype)
# call survtmle to estimate cumulative incidence
object <- survtmle(ftime = dat$ftime,
ftype = dat$ftype,
adjustVars = dat$adjustVars,
trt = dat$trt,
glm.trt = "1",
glm.ftime = "trt*factor(site)",
glm.ctime = glm.ctime,
method = "mean",
t0=6)
# call trend_test to estimate projection
trend <- trend_test(object)
trend
```
A numerical approximation of the true value of $\beta_{0,n}$ may be obtained as follows.
```{r}
# get covariance matrix estimate
nabla_g <- sievetrend:::grad_g(object$est)
Upsilon_n <- nabla_g %*% cov(Reduce(cbind, object$ic)) %*% t(nabla_g)
# call getTruth function with this estimate several times
# and average (to increase accuracy)
beta_0n <- rowMeans(replicate(20, getTruth(Upsilon = Upsilon_n, n = 1e6)))[2]
# compare estimate to truth
c(est = trend$beta, truth = beta_0n)
```
The full code used to execute the simulation study is included in the simulation subdirectory. This includes the script `cent.R`, which is batched to a slurm via
the script `sce.sh`. The function `makeIllustrationPlot` was used to produce Figure 1.
## RTS,S Analysis
Due to existing privacy agreements, it is difficult to obtain access to the real RTS,S data. Nevertheless, a mock data set is distributed with the `survtmle` package that will serve as illustration for the outputation methodology. See `?rtss` for further description of the data set.
The `rtss` data set is a list of ten data sets each representing an outputed data set. They are formatted for analysis of a binary genetic mark, so we first replace the failure type column with a simulated genetic distance.
```{r}
# replace the ftype column in each data set
rtss_mod <- lapply(rtss, function(data){
# which were observed failures
fail_idx <- which(data$ftype > 0)
# number of observed failures
fail_n <- length(fail_idx)
# replace with a simulated version
data$ftype[fail_idx] <- rbinom(fail_n, 4, plogis(data$vaccine)) + 1
# collapse site variable into a single column
data$site <- as.numeric(1*(data$site1==1) + 2*(data$site2==1) +
3*(data$site3==1) + 4*(data$site4==1) +
5*(data$site5==1))
return(data)
})
# check out new ftype distribution for first
# outputed data set
table(rtss_mod[[1]]$ftype)
```
Now we use `survtmle` to estimate the cumulative incidence for each data set.
```{r}
rslt <- lapply(rtss_mod, function(data){
glm.ctime <- get.ctimeForm(trt = data$trt, site = data$site,
ftime = data$ftime, ftype = data$ftype)
object <- survtmle(ftime = data$ftime,
ftype = data$ftype,
adjustVars = data[,"site",drop=FALSE],
trt = data$vaccine,
glm.trt = "1",
glm.ftime = "trt*factor(site)",
glm.ctime = glm.ctime,
method = "mean",
t0=6)
return(object)
})
```
We can use the `getMO` function to obtain the averaged cumulative incidence
results and apply the trend_test on this scale.
```{r}
# obtain averaged results
mo_rslt <- getMO(rslt)
# apply trend test
mo_trend <- trend_test(mo_rslt)
mo_trend
```