diff --git a/Makefile b/Makefile
index 909baaa..24ebcc6 100644
--- a/Makefile
+++ b/Makefile
@@ -27,8 +27,6 @@ preview: $(vignettes)
 .PHONY: pkgdown
 pkgdown:
-	@mkdir -p docs/inst/bib/; cp inst/bib/$(pkg).bib docs/inst/bib/;
-	@echo "added/updated the package bib file"
 	Rscript -e "library(methods); pkgdown::build_site();"

 $(tar): $(objects)

diff --git a/docs/404.html b/docs/404.html
new file mode 100644
index 0000000..9958645
--- /dev/null
+++ b/docs/404.html
@@ -0,0 +1,156 @@
Source: vignettes/reda-Recur.Rmd
library(reda)
+packageVersion("reda")
## [1] '0.5.0'
The Recur() function provides a flexible and widely applicable formula response interface for modeling recurrent event data with considerate data checking procedures. It combines the flexible interface of reSurv() (deprecated in reReg version 1.1.7) and the effective checking procedures embedded in Survr() (deprecated in reda version 0.5.0).
The function interface of Recur() is given below.
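(What follows is a sketch inferred from the argument descriptions below, not copied from the package source, so the defaults may differ slightly from the actual definition.)

## inferred sketch of the Recur() interface
Recur(time, id = seq_along(time), event = NULL,
      terminal = 0, origin = 0,
      check = c("hard", "soft", "none"))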
A high-level introduction to each argument is as follows:
- time: event and censoring times
- id: subject's id
- event: recurrent event indicator, cost, or type
- terminal: event indicator of terminal events
- origin: time origin of subjects
- check: how to run the data checking procedure
  - "hard": throw errors if check_Recur() finds any issue with the data structure
  - "soft": throw warnings instead
  - "none": do not run the checking procedure

More details on the arguments are provided in the function documentation; see ?Recur.
Recur Object

The function Recur() returns an S4-class Recur object representing the model response for recurrent event data. The Recur object mainly contains a numeric matrix (in the .Data slot) that serves as the model response matrix. The other slots are

- ID: a factor storing the subjects' original IDs, which can originally be a character vector, a numeric vector, or a factor. It is needed to pinpoint data issues for particular subjects by their original IDs.
- ord: indices that sort the response matrix (by rows) increasingly by id, time2, and - event. Sorting is often done in the model-fitting steps, where the indices stored in this slot can be used directly.
- rev_ord: indices that revert the increasingly sorted response matrix by ord to its original ordering. This slot is provided to easily revert the sorting.
- first_idx: indices of the first record of each subject in the sorted matrix. They help in the data checking procedure and may be helpful in the model-fitting step, such as getting the origin times.
- last_idx: indices of the last record of each subject in the sorted matrix. Similar to first_idx, they help in the data checking procedure and may be helpful in the model-fitting step, such as locating the terminal events.
- check: a character string recording the check argument specified by the user for data checking.

Among all the arguments, only the argument time does not have a default value and thus has to be specified by users.
time is given

- id takes its default value: seq_along(time).
- event takes its default values: 0 (censoring) at the last record of each subject, and 1 (event) before censoring.
- terminal and origin take zero for all subjects by default.

      time1 time2 id event terminal origin
+[1,] 0 3 1 0 0 0
+[2,] 0 4 2 0 0 0
+[3,] 0 5 3 0 0 0
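The matrix above can be reproduced by a call along the following lines (a sketch; the original code chunk is not part of this diff):

ex1 <- Recur(time = 3:5)  # only `time` is given
ex1@.Data                 # the underlying response matrix in the .Data slot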
time and id are given

- event takes its default values: 0 (censoring) at the last record of each subject, and 1 (event) before censoring.
- terminal and origin take zero for all subjects by default.

      time1 time2 id event terminal origin
+[1,] 4 6 1 0 0 0
+[2,] 3 5 2 0 0 0
+[3,] 2 4 1 1 0 0
+[4,] 1 3 2 1 0 0
+[5,] 0 2 1 1 0 0
+[6,] 0 1 2 1 0 0
+## sort by id, time2, and - event
+head(ex2[ex2@ord, ])
time1 time2 id event terminal origin
+[1,] 0 2 1 1 0 0
+[2,] 2 4 1 1 0 0
+[3,] 4 6 1 0 0 0
+[4,] 0 1 2 1 0 0
+[5,] 1 3 2 1 0 0
+[6,] 3 5 2 0 0 0
The slot ord stores the indices that sort the response matrix by id, time2, and - event.

%to% for recurrent episodes

The function Recur() allows users to input recurrent episodes through time1 and time2, which can be specified with the help of %to% (or its alias %2%) in Recur(). For example,
left <- c(1, 5, 7)
+right <- c(3, 7, 9)
+ex3 <- Recur(left %to% right, id = c("A1", "A1", "A2"))
+head(ex3)
time1 time2 id event terminal origin
+[1,] 1 3 1 1 0 1
+[2,] 5 7 1 0 0 1
+[3,] 7 9 2 0 0 7
Internally, the function %to% returns a list with elements named "time1" and "time2". Therefore, specifying such a list directly is equivalent.
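For example, the following call should be equivalent to the %to% example above, since such a list can be passed directly (a sketch based on this description, not taken from the original vignette):

## equivalent to Recur(left %to% right, id = c("A1", "A1", "A2"))
ex3b <- Recur(list(time1 = c(1, 5, 7), time2 = c(3, 7, 9)),
              id = c("A1", "A1", "A2"))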
origin and terminal

The arguments origin and terminal take a numeric vector of length one or of the same length as time. Some simple examples are given below.

      time1 time2 id event terminal origin
+[1,] 1 3 1 0 1 1
+[2,] 1 4 2 0 1 1
+[3,] 1 5 3 0 1 1
+
+ time1 time2 id event terminal origin
+[1,] 1 3 1 1 0 1
+[2,] 3 4 1 0 0 1
+[3,] 2 5 2 0 1 2
+ex7 <- Recur(3:5, id = c("A1", "A1", "A2"),
+ origin = c(1, 1, 2), terminal = c(0, 0, 1))
+stopifnot(all.equal(ex6, ex7))
Error : Invalid length for `origin`. See '?Recur' for details.
+
+Error : Invalid length for 'terminal'. See '?Recur' for details.
The Recur() function (which internally calls check_Recur()) checks whether the specified data fits into the recurrent event data framework by several rules if check = "hard" or check = "soft". The existing rules and the corresponding examples are given below.
Error : Subjects having events at or after censoring: A1.
+Error : Subjects having multiple terminal events: A1.
+Error : Missing times! Please check subject: A1.
+Error : Event times must be >= origin. Please check subject: A1.
+
+Error : Event times must be >= origin. Please check subject: A1.
+Error : Recurrent episodes cannot be overlapped. Please check subject: A1.
+[1] A1: (0, 1+], (2, 3], (6, 8+]
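The code chunks that trigger the errors above are not included in this diff. For illustration, a hypothetical call along these lines should violate the first rule under check = "hard", since subject "A1" would have an event recorded after its censoring time:

## subject "A1" is censored at time 2 but has an event at time 3
Recur(time = c(2, 3), id = c("A1", "A1"), event = c(0, 1))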
Show() Method

A show() method is added for the Recur object in a similar fashion to the output of the function survival:::print.Surv(). It internally converts the input Recur object into character strings representing the recurrent episodes through a dedicated as.character() method. For each recurrent episode,

- a censoring time is marked by a "+" sign;
- a terminal event time is marked by a "*" sign.

For a concise printing, the show() method uses getOption("reda.Recur.maxPrint") to limit the maximum number of recurrent episodes printed for each process. By default, options(reda.Recur.maxPrint = 3) is set.
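For example, the limit could be raised as follows (illustration only; the output below still uses the default of three episodes per process):

options(reda.Recur.maxPrint = 10)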
We may illustrate the results of the show()
method by the example valve seats data, where terminal events are artificially added.
set.seed(123)
+term_events <- rbinom(length(unique(valveSeats$ID)), 1, 0.5)
+with(valveSeats, Recur(Days, ID, No., term_events))
[1] 251: (0, 761+]
+ [2] 328: (0, 759*]
+ [3] 329: (0, 98], (98, 667*]
+ [4] 390: (0, 326], (326, 653], ..., (653, 667*]
+ [5] 395: (0, 665+]
+ [6] 397: (0, 84], (84, 667*]
+ [7] 400: (0, 87], (87, 663*]
+ [8] 404: (0, 646], (646, 653*]
+ [9] 407: (0, 92], (92, 653*]
+[10] 409: (0, 651*]
+[11] 411: (0, 258], (258, 328], ..., (621, 650+]
+[12] 252: (0, 61], (61, 539], (539, 648*]
+[13] 389: (0, 254], (254, 276], ..., (640, 644*]
+[14] 395: (0, 76], (76, 538], (538, 642*]
+[15] 401: (0, 635], (635, 641*]
+[16] 404: (0, 349], (349, 404], ..., (561, 649+]
+[17] 409: (0, 631*]
+[18] 411: (0, 596*]
+[19] 415: (0, 120], (120, 479], (479, 614+]
+[20] 327: (0, 323], (323, 449], (449, 582+]
+[21] 389: (0, 139], (139, 139], (139, 589*]
+[22] 394: (0, 593+]
+[23] 394: (0, 573], (573, 589*]
+[24] 397: (0, 165], (165, 408], ..., (604, 606+]
+[25] 405: (0, 249], (249, 594*]
+[26] 407: (0, 344], (344, 497], (497, 613*]
+[27] 412: (0, 265], (265, 586], (586, 595+]
+[28] 420: (0, 166], (166, 206], ..., (348, 389+]
+[29] 390: (0, 601+]
+[30] 392: (0, 410], (410, 581], (581, 601+]
+[31] 395: (0, 611+]
+[32] 396: (0, 608+]
+[33] 397: (0, 587*]
+[34] 400: (0, 367], (367, 603+]
+[35] 403: (0, 202], (202, 563], ..., (570, 585*]
+[36] 409: (0, 587+]
+[37] 411: (0, 578*]
+[38] 413: (0, 578*]
+[39] 416: (0, 586*]
+[40] 417: (0, 585+]
+[41] 421: (0, 582*]
The updated show() method preserves NA's when check = "none". However, the NA's will always appear last within each process because the times are sorted internally.
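Output like the one shown below could come from a call along these lines (a hypothetical sketch; the original chunk is not included in this diff):

## NA times are kept when the checking procedure is skipped
Recur(time = c(4, 6, NA, 3, 5, NA), id = rep(1:2, each = 3), check = "none")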
[1] 1: (0, 4], (4, 6], (6, NA+]
[2] 2: (0, 3], (3, 5], (5, NA+]
Source: vignettes/reda-intro.Rmd
In this vignette, we introduce how to explore recurrent event data by the nonparametric sample mean cumulative function (also called the Nelson-Aalen estimator) and how to model the event counts of interest by the gamma frailty model with the reda package through examples. Most functions in the package are S4 methods that produce S4 class objects. The details of function syntax and the produced objects are available in the package manual, which will thus not be covered in this vignette.
-In this vignette, we introduce how to explore recurrent event data by mean cumulative function, and modeling the event counts of interest by gamma frailty model with the reda package through examples. Most functions in the package are S4 methods that produce S4 class objects. The details of function syntax and the produced objects are available in the package manual, which will thus not be covered in this vignette.
+library(reda)
+packageVersion("reda")
## [1] '0.5.0'
First of all, the sample recurrent event data we are going to use in the following examples is called simuDat, which contains 500 observations of 6 variables in total.
head(simuDat)
## ID time event group x1 gender
## 1 1 1 1 Contr -1.93 female
## 2 1 22 1 Contr -1.93 female
@@ -97,7 +119,7 @@ 2018-02-08
## 4 1 57 1 Contr -1.93 female
## 5 1 112 0 Contr -1.93 female
## 6 2 140 0 Treat -0.11 female
-
+str(simuDat)
## 'data.frame': 500 obs. of 6 variables:
## $ ID : num 1 1 1 1 1 2 3 3 4 4 ...
## $ time : num 1 22 23 57 112 140 40 168 14 112 ...
@@ -120,68 +142,74 @@ 2018-02-08
-
gender: Gender of subjects.
-The dataset was simulated by thinning method (Lewis and Shedler 1979) and further processed for a better demonstration purpose. (Note that reda also provides functions for simulating survival data, recurrent event data, or multiple state data. See another package vignette for details.)
-Data checking
-The subjects’ ID, event or censoring times, event indicators or costs, and time origins of the follow-up is specified through the corresponding arguments, ID
, time
, event
, and origin
in function Survr
, where the argument event
takes negative values as censoring, and positive values as costs or event indicators. The function Survr
serves as the formula response and contains a considerate data checking procedure for recurrent event data modeled by methods based on counts and rate function. The observations of the covariates specified in the formula will be checked first, where the checking rules include
-
-- Subjects’ ID, event times or censoring times cannot be missing;
-- For every subject, the event time cannot not be later than censoring time;
-- Each subject must have one and only one censoring time;
-- Subjects may have different time origins. But one subject must has only one time origin.
-
-The subject’s ID will be pinpointed (in most cases) if its observation violates any given checking rule.
-Exploratory analysis
-Computing sample MCF
-The nonparametric sample mean cumulative function (MCF) is also called Nelson-Aalen Estimator (Nelson 2003). The point estimate of MCF at each time point does not assume any particular underlying model. The variance estimates at each time point is computed by the Lawless and Nadeau method (Lawless and Nadeau 1995), Poisson process method or based on the well-known bootstrap method (Efron 1979) with subjects as resampling units. The approximate confidence intervals are provided as well, which are constructed based on the asymptotic normality of the MCF estimates itself (the default) or logarithm of the MCF estimates.
-The function mcf
is a generic function for the MCF estimates from a sample data or a fitted gamma frailty model (as demonstrated later). If a formula with Survr
as response is specified in function mcf
, the formula method for estimating the sample MCF will be called. The covariate specified at the right hand side of the formula should be either 1
or any “linear” combination of factor variables in the data. The former computes the overall sample MCF. The latter computes the sample MCF for each level of the combination of the factor variable(s) specified, respectively.
-The valve-seat dataset in Nelson (1995) and the simulated sample data are used for demonstration as follows:
-## Example 1. valve-seat data
-valveMcf0 <- mcf(Survr(ID, Days, No.) ~ 1, data = valveSeats)
-
-## Example 2. the simulated data
-simuMcf <- mcf(Survr(ID, time, event) ~ group + gender,
- data = simuDat, subset = ID %in% 1 : 50)
-After estimation, we may plot the sample MCF by function plot
, which actually returns a ggplot
object so that the plot produced can be easily further customized by ggplot2. The legendname
and legendLevels
can be specified to easily customize the legend in the plot. Two examples are given as follows:
-## overall sample MCF for valve-seat data in Nelson (1995)
-plot(valveMcf0, conf.int = TRUE, mark.time = TRUE, addOrigin = TRUE, col = 2) +
- ggplot2::xlab("Days") + ggplot2::theme_bw()
The dataset was simulated by the thinning method (Lewis and Shedler 1979) and further processed for demonstration purposes. (Note that reda also provides functions for simulating survival data and recurrent event data. See vignette("reda-simulate") for details.)

The process's ID, event times, event indicators or costs, time origins, and possible terminal events of the follow-up are specified in the function Recur(), which serves as the formula response and contains considerate data checking procedures for recurrent event data. See vignette("reda-Recur") for details.
The nonparametric mean cumulative function (MCF) estimates are widely utilized in exploring the trend of recurrent event data. The MCF is also called the cumulative mean function (CMF) in the literature (see, e.g., Lawless and Nadeau 1995). Let \(N_i(t)\) denote the number of events that occurred up to time \(t\) of process \(i\). The MCF of \(N_i(t)\), denoted by \(M_i(t)\), is defined as follows: \[M_i(t)=\mathbb{E}\{N_i(t)\}.\] For \(k\) independent processes having the same MCF, the Nelson-Aalen estimator (Nelson 2003) is often used, which is defined as follows: \[ \hat{M}(t) = \int_0^t \frac{dN(s)}{\delta(s)}, \] where \(dN(s)=\sum_{i=1}^k dN_i(s)\), \(dN_i(s)\) is the jump size of process \(i\) at time \(s\), \(\delta(s) = \sum_{i=1}^k \delta_i(s)\), and \(\delta_i(s)\) is the at-risk indicator of process \(i\) at time \(s\). One variant is the cumulative sample mean (CSM) function introduced by Cook and Lawless (2007), which assumes that \(\delta_i(s) = 1\) for \(s \ge 0\).
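To make the estimator concrete, the following is a minimal hand computation of \(\hat{M}(t)\) on a small made-up dataset (not part of the original vignette); in practice, the mcf() function introduced below handles both the point estimates and the variance estimation.

## hand computation of the Nelson-Aalen (sample MCF) estimate on toy data
toy <- data.frame(
    id    = c(1, 1, 1, 2, 2, 3),
    time  = c(2, 5, 7, 3, 6, 4),
    event = c(1, 1, 0, 1, 0, 0)  # 0 marks each subject's censoring time
)
cens <- with(toy, tapply(time, id, max))              # follow-up time per subject
event_times <- sort(unique(toy$time[toy$event > 0]))  # observed event times
dN <- sapply(event_times, function(s) sum(toy$time == s & toy$event > 0))
at_risk <- sapply(event_times, function(s) sum(cens >= s))  # delta(s)
data.frame(time = event_times, MCF = cumsum(dN / at_risk))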
The nonparametric estimate of the MCF at each time point does not assume any particular underlying model. The variance estimates at each time point can be computed by the Lawless and Nadeau method (Lawless and Nadeau 1995), the Poisson process method, or the bootstrap method (Efron 1979) with subjects as resampling units. For the CSM, the cumulative sample variance (CSV) method can be used instead. Approximate confidence intervals are provided as well, which are constructed based on the asymptotic normality of the MCF estimates themselves or of the logarithm of the MCF estimates.
The function mcf() is a generic function for the MCF estimates from sample data or from a fitted gamma frailty model (as demonstrated later). If a formula with Recur() as the response is specified in the function mcf(), the formula method for estimating the sample MCF will be called. The covariate specified on the right-hand side of the formula should be either 1 or a "linear" combination of factor variables in the data. The former computes the overall sample MCF, while the latter computes the sample MCF for each level of the combination of the specified factor variable(s).
The valve-seat dataset in Nelson (1995) and the simulated sample data are used for demonstration as follows:
+## Example 1. valve-seat data
+valveMcf0 <- mcf(Recur(Days, ID, No.) ~ 1, data = valveSeats)
+
+## Example 2. the simulated data
+simuMcf <- mcf(Recur(time, ID, event) ~ group + gender,
+ data = simuDat, subset = ID %in% seq_len(50))
After estimation, we may plot the sample MCF with the function plot(), which returns a ggplot object so that the produced plot can be further customized with ggplot2. The arguments legendName and legendLevels can be specified to customize the legend of the plot. Two examples are given as follows:
## overall sample MCF for valve-seat data in Nelson (1995)
+plot(valveMcf0, conf.int = TRUE, mark.time = TRUE, addOrigin = TRUE, col = 2) +
+ ggplot2::xlab("Days") + ggplot2::theme_bw()
## sample MCF for different groups (the default theme)
-plot(simuMcf, conf.int = TRUE, lty = 1 : 4, legendName = "Treatment & Gender")
## sample MCF for different groups (the default theme)
+plot(simuMcf, conf.int = TRUE, lty = 1:4, legendName = "Treatment & Gender")
Notice that in the first plot, the censoring times was marked on the step curve by specifying mark.time = TRUE
and the time origins was included in the curve by specifying addOrigin = TRUE
. In addition, the type and color of the line can be specified through lty
and col
, respectively.
Note that in the first plot, the censoring times were marked on the step curve by specifying mark.time = TRUE, and the time origins were included in the curve by specifying addOrigin = TRUE. In addition, the type and color of the lines can be specified through lty and col, respectively.
As for the variance estimates, the Poisson process method assumes that the underlying counting process is a Poisson process and may underestimate the variance if the assumption cannot be justified, while the Lawless and Nadeau method is more robust to departures from the Poisson process assumption. The nonparametric bootstrap method can be considered as well if the extra computational burden is not a concern. We may perform a quick comparison among the standard error estimates and the confidence intervals from these methods for the valve seats data as follows:
-## Poisson process method
-valveMcf1 <- mcf(Survr(ID, Days, No.) ~ 1, valveSeats, variance = "Poisson")
-
-## bootstrap method (with 1,000 bootstrap samples)
-set.seed(123)
-valveMcf2 <- mcf(Survr(ID, Days, No.) ~ 1, valveSeats,
- variance = "bootstrap", control = list(B = 1e3))
-
-## comparing the standard error estimates
-library(ggplot2)
-ciDat <- rbind(cbind(valveMcf0@MCF, Method = "Lawless & Nadeau"),
- cbind(valveMcf1@MCF, Method = "Poisson"),
- cbind(valveMcf2@MCF, Method = "Bootstrap"))
-ggplot(ciDat, aes(x = time, y = se)) +
- geom_step(aes(color = Method, linetype = Method)) +
- xlab("Days") + ylab("SE estimates") + theme_bw()
## Poisson process method
+valveMcf1 <- mcf(Recur(Days, ID, No.) ~ 1, valveSeats, variance = "Poisson")
+
+## bootstrap method (with 1,000 bootstrap samples)
+set.seed(123)
+valveMcf2 <- mcf(Recur(Days, ID, No.) ~ 1, valveSeats,
+ variance = "bootstrap", control = list(B = 1e3))
+
+## comparing the standard error estimates
+library(ggplot2)
+ciDat <- rbind(cbind(valveMcf0@MCF, Method = "Lawless & Nadeau"),
+ cbind(valveMcf1@MCF, Method = "Poisson"),
+ cbind(valveMcf2@MCF, Method = "Bootstrap"))
+ggplot(ciDat, aes(x = time, y = se)) +
+ geom_step(aes(color = Method, linetype = Method)) +
+ xlab("Days") + ylab("SE estimates") + theme_bw()
## comparing the confidence intervals
-ggplot(ciDat, aes(x = time)) +
- geom_step(aes(y = MCF), color = "grey") +
- geom_step(aes(y = lower, color = Method, linetype = Method)) +
- geom_step(aes(y = upper, color = Method, linetype = Method)) +
- xlab("Days") + ylab("Confidence intervals") + theme_bw()
## comparing the confidence intervals
+ggplot(ciDat, aes(x = time)) +
+ geom_step(aes(y = MCF), color = "grey") +
+ geom_step(aes(y = lower, color = Method, linetype = Method)) +
+ geom_step(aes(y = upper, color = Method, linetype = Method)) +
+ xlab("Days") + ylab("Confidence intervals") + theme_bw()
From the comparison, we may find that the SE estimates and the confidence intervals from the Lawless and Nadaeu method and the bootstrap method have good agreement, while the Poisson process method gives slightly smaller SE estimates and a narrower confidence band. In practice, the Lawless and Nadeau method is suggested if it is hard to justify the Poisson process assumption.
-The function mcfDiff.test
is an implementation of the pseudo-score tests for comparing two-sample MCFs proposed by Cook, Lawless, and Nadeau (1996), while the function mcfDiff
gives the difference estimates and wraps the pseudo-score testing results from mcfDiff.test
(by default).
Suppose we are interested in comparing the two-sample MCFs between the treatment and control group in the simulated data. We may simply feed the mcf.formula
object returned from the function mcf
to function mcfDiff
as follows:
## one sample MCF object of two groups
-mcf0 <- mcf(Survr(ID, time, event) ~ group, data = simuDat)
-(mcf_diff0 <- mcfDiff(mcf0))
From the comparison, we find that the SE estimates and the confidence intervals from the Lawless and Nadeau method and the bootstrap method are in good agreement, while the Poisson process method gives slightly smaller SE estimates and a narrower confidence band. In practice, the Lawless and Nadeau method is suggested if it is hard to justify the Poisson process assumption.
+The function mcfDiff.test()
is an implementation of the pseudo-score tests for comparing two-sample MCFs proposed by Cook, Lawless, and Nadeau (1996), while the function mcfDiff()
gives the difference estimates and wraps the pseudo-score testing results from mcfDiff.test()
(by default).
Suppose we are interested in comparing the two-sample MCFs between the treatment and control group in the simulated data. We may simply feed the mcf.formula
object returned from the function mcf()
to function mcfDiff()
as follows:
## one sample MCF object of two groups
+mcf0 <- mcf(Recur(time, ID, event) ~ group, data = simuDat)
+(mcf_diff0 <- mcfDiff(mcf0))
## Call:
## mcfDiff(mcf1 = mcf0)
##
@@ -193,11 +221,11 @@ 2018-02-08
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Variance Estimator: robust
-Then what if the covariate group
contains more than two levels? In that case, we may compute the sample MCF for each group, respectively, and feed both of the generated mcf.formula
objects to mcfDiff
as the first two arguments. Alternatively, an intuitive -
method is available for comparing the difference between two mcf.formual
objects, mcf1
and mcf2
, returned from the mcf
formula method, which means that the function call mcf1 - mcf2
is equivalent to mcfDiff(mcf1, mcf2)
. A simple example is given below.
## explicitly ask for the difference of two sample MCF
-mcf1 <- mcf(Survr(ID, time, event) ~ 1, simuDat, group %in% "Contr")
-mcf2 <- mcf(Survr(ID, time, event) ~ 1, simuDat, group %in% "Treat")
-mcf1 - mcf2
What if the covariate group contains more than two levels? In that case, we may compute the sample MCF for each group separately and feed the two resulting mcf.formula objects to mcfDiff() as the first two arguments. Alternatively, an intuitive "-" method is available for comparing two mcf.formula objects, mcf1 and mcf2, returned from the mcf() formula method, which means that the function call mcf1 - mcf2 is equivalent to mcfDiff(mcf1, mcf2). A simple example is given below.
## explicitly ask for the difference of two sample MCF
+mcf1 <- mcf(Recur(time, ID, event) ~ 1, simuDat, group %in% "Contr")
+mcf2 <- mcf(Recur(time, ID, event) ~ 1, simuDat, group %in% "Treat")
+mcf1 - mcf2
## Call:
## mcfDiff(mcf1 = mcf1, mcf2 = mcf2)
##
@@ -209,15 +237,21 @@ 2018-02-08
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Variance Estimator: robust
-Similarly, a plot
method based on ggplot2 is available for visual comparison.
Similarly, a plot()
method based on ggplot2 is available for visual comparison.
plot(mcf_diff0)
The default model, when the arguments df, knots, and degree are not specified, is the gamma frailty model with a (one-piece) constant rate function, which is equivalent to a negative binomial regression with the same shape and rate parameter in the gamma prior.
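The fitted object printed below is referred to as constFit later (by summary(), AIC(), and BIC()). Its fitting chunk is not included in this diff, but judging from the printed call it would be along these lines:

(constFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat))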
## Call:
-## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat)
+## rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat)
##
## Coefficients of covariates:
## groupTreat x1
@@ -231,16 +265,19 @@ 2018-02-08
## Coefficients of pieces:
## B-spline1
## 0.03041865
-The function rateReg
returns rateReg
object, which can be printed out by calling the object. (Internally, show
method for rateReg
object is called.)
When argument df
or knots
(at least one internal knot) is specified, the model becomes gamma frailty model with piecewise constant rate function or so-called HEART model (Fu, Luo, and Qu 2016) if argument degree is specified to be zero as default.
The function rateReg() returns a rateReg object, which can be printed out by calling the object. (Internally, the show() method for the rateReg object is called.)
When the argument df or knots (with at least one internal knot) is specified, the model becomes the gamma frailty model with a piecewise constant rate function, the so-called HEART model (Fu, Luo, and Qu 2016), provided the argument degree is kept at its default of zero.
We may specify df and leave knots and degree at their defaults. Then a piecewise constant rate function will be used and the number of pieces will equal df. The internal knots will be automatically placed at suitable quantiles of the covariate representing the event and censoring times.
For example, a two-piece constant rate function can be specified simply by setting df = 2. The internal knot will then be the median of all the event and censoring times. Also, we can fit the models on the first 50 subjects by specifying the argument subset.
# two pieces' constant rate function
-(twoPiecesFit <- rateReg(Survr(ID, time, event) ~ group + x1, df = 2,
- data = simuDat, subset = ID %in% 1:50))
# two pieces' constant rate function
+(twoPiecesFit <- rateReg(Recur(time, ID, event) ~ group + x1, df = 2,
+ data = simuDat, subset = ID %in% 1:50))
## Call:
-## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat,
+## rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
## subset = ID %in% 1:50, df = 2)
##
## Coefficients of covariates:
@@ -260,10 +297,10 @@ 2018-02-08
## 0.03396500 0.04889013
In the example shown above, the internal knot is automatically set to 102 and the baseline rate function is a two-piece constant.
If internal knots are specified explicitly, df will be ignored even if it is specified. An example of a model with a six-piece constant rate function is given as follows:
(piecesFit <- rateReg(Survr(ID, time, event) ~ group + x1, data = simuDat,
- knots = seq(from = 28, to = 140, by = 28)))
(piecesFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat,
+ knots = seq(from = 28, to = 140, by = 28)))
## Call:
-## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat,
+## rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
## knots = seq(from = 28, to = 140, by = 28))
##
## Coefficients of covariates:
@@ -281,14 +318,17 @@ 2018-02-08
## Coefficients of pieces:
## B-spline1 B-spline2 B-spline3 B-spline4 B-spline5 B-spline6
## 0.02578590 0.03051400 0.02063051 0.03059808 0.03774148 0.04792728
When the argument degree is specified to be a positive integer, the baseline rate function is fitted by splines. The type or flavor of the splines can be specified by the argument spline. The available options for spline are bSplines for B-splines and mSplines for M-splines. (See the R package splines2 for details about the spline functions used internally.) Partial matching on the names is allowed.
For example, one may want to fit the baseline rate function by a cubic spline with two internal knots. Then we may explicitly specify degree = 3 and knots to be a length-two numeric vector. Or we may simply specify degree = 3 and df = 6. Then the internal knots will be automatically placed at suitable quantiles of the covariate representing the event and censoring times. Generally speaking, the degrees of freedom of the spline (or the number of spline bases) equal the number of internal knots plus the degree of each spline base, plus one if an intercept is included in the spline bases. For instance, a cubic spline (degree 3) with two internal knots and an intercept gives 2 + 3 + 1 = 6 spline bases, which matches df = 6 here.
## internal knots are set as 33% and 67% quantiles of time variable
-(splineFit <- rateReg(Survr(ID, time, event) ~ group + x1, data = simuDat,
- df = 6, degree = 3, spline = "mSplines"))
## internal knots are set as 33% and 67% quantiles of time variable
+(splineFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat,
+ df = 6, degree = 3, spline = "mSplines"))
## Call:
-## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat,
+## rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
## df = 6, degree = 3, spline = "mSplines")
##
## Coefficients of covariates:
@@ -306,11 +346,11 @@ 2018-02-08
## Coefficients of spline bases:
## M-spline1 M-spline2 M-spline3 M-spline4 M-spline5 M-spline6
## 0.3431193 1.5542060 0.1478426 1.7829521 1.3690238 0.1828846
-## or internal knots are expicitly specified
-(splineFit <- rateReg(Survr(ID, time, event) ~ group + x1, data = simuDat,
- spline = "bSp", degree = 3L, knots = c(56, 112)))
## or internal knots are expicitly specified
+(splineFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat,
+ spline = "bSp", degree = 3L, knots = c(56, 112)))
## Call:
-## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat,
+## rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
## knots = c(56, 112), degree = 3L, spline = "bSp")
##
## Coefficients of covariates:
@@ -328,11 +368,14 @@ 2018-02-08
## Coefficients of spline bases:
## B-spline1 B-spline2 B-spline3 B-spline4 B-spline5 B-spline6
## 0.01809635 0.04107572 0.01770835 0.02399031 0.06318482 0.03172925
-A brief summary of the fitted model is given by show
method as shown in the previous examples. Further, summary
method for rateReg
object provides a more specific summary of the model fitted. For instance, the summary of the models fitted in section of model fitting can be called as follows:
A brief summary of the fitted model is given by the show() method, as shown in the previous examples. Further, the summary() method for rateReg objects provides a more detailed summary of the fitted model. For instance, the summaries of the models fitted in the model fitting section can be called as follows:
summary(constFit)
## Call:
-## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat)
+## rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat)
##
## Coefficients of covariates:
## coef exp(coef) se(coef) z Pr(>|z|)
@@ -355,7 +398,7 @@ 2018-02-08
## B-spline1 0.030419 0.0057
##
## Loglikelihood: -1676.422
-
+summary(piecesFit, showCall = FALSE)
##
## Coefficients of covariates:
## coef exp(coef) se(coef) z Pr(>|z|)
@@ -386,7 +429,7 @@ 2018-02-08
## B-spline6 0.047927 0.0105
##
## Loglikelihood: -1663.792
-
+summary(splineFit, showCall = FALSE, showKnots = FALSE)
##
## Coefficients of covariates:
## coef exp(coef) se(coef) z Pr(>|z|)
@@ -411,83 +454,99 @@ 2018-02-08
## B-spline6 0.031729 0.0130
##
## Loglikelihood: -1663.285
-The summary includes the function call, estimated covariate coefficients, estimated parameter of frailty variable, internal knots (if exist), boundary knots, degree of spline bases if splines are applied, coefficients of rate function bases (pieces), and log-likelihood of the model fitted. Outputs of function call or knots, may be suppressed by specifying argument showCall
or showKnots
to be FALSE
, respectively, in summary
method, which would be especially useful for a relatively concise summary in a reproducible report using Rmarkdown, etc.
What’s more, the corresponding coef
and confint
method for point estimates and confidence interval for covariate coefficients are provided as well. Let’s take the fitted model with spline rate function as an example.
The summary includes the function call, the estimated covariate coefficients, the estimated parameter of the frailty variable, the internal knots (if any), the boundary knots, the degree of the spline bases if splines are applied, the coefficients of the rate function bases (pieces), and the log-likelihood of the fitted model. Output of the function call or the knots may be suppressed by setting the argument showCall or showKnots to FALSE, respectively, in the summary() method, which is especially useful for a relatively concise summary in a reproducible report using R Markdown, etc.
What's more, the corresponding coef() and confint() methods for point estimates and confidence intervals of the covariate coefficients are provided as well. Let's take the fitted model with the spline rate function as an example.
## point estimates of covariate coefficients
+coef(splineFit)
## groupTreat x1
## -0.6352812 0.3061369
-
+## confidence interval for covariate coefficients
+confint(splineFit, level = 0.95)
## 2.5% 97.5%
## groupTreat -1.19445835 -0.0761041
## x1 -0.02020186 0.6324757
-Two handy functions are provided for model selection. We may compare and select the models with different baseline rate function based on Akaike Information Criterion (AIC) by function AIC
or Bayesian Information Criterion (BIC) by function BIC
. A friendly warning will be thrown out if the numbers of observation were different in the model comparison by AIC.
Two handy functions are provided for model selection. We may compare and select models with different baseline rate functions based on the Akaike information criterion (AIC) via the function AIC() or the Bayesian information criterion (BIC) via the function BIC(). A friendly warning will be thrown if the numbers of observations differ in the model comparison by AIC.
AIC(constFit, piecesFit, splineFit)
## df AIC
## constFit 4 3360.843
## piecesFit 9 3345.585
## splineFit 9 3344.570
-
+BIC(constFit, piecesFit, splineFit)
## df BIC
## constFit 4 3377.702
## piecesFit 9 3383.516
## splineFit 9 3382.501
-Function baseRate
produces baseRate.rateReg
object representing the estimated baseline rate function for a fitter model. An associated plot
method is available. For example, the baseline rate function and its confidence band estimated by cubic splines can be plotted as follows:
The function baseRate() produces a baseRate.rateReg object representing the estimated baseline rate function of a fitted model. An associated plot() method is available. For example, the baseline rate function and its confidence band estimated by cubic splines can be plotted as follows:
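The plotting chunk itself is not shown in this diff; a sketch along these lines should reproduce it (the conf.int argument is assumed to behave as for the MCF plots):

## estimated baseline rate function from the cubic spline fit
plot(baseRate(splineFit), conf.int = TRUE)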
If rateReg
object is supplied to function mcf
, the method for rateReg
is called, which returns the estimated baseline MCF from the fitted model if newdata
is not specified in the function. The example estimating and plotting the baseline MCF from the fitted model with piecewise constant rate function is shown as follows:
If a rateReg object is supplied to the function mcf(), the method for rateReg is called, which returns the estimated baseline MCF from the fitted model when newdata is not specified. The example of estimating and plotting the baseline MCF from the fitted model with the piecewise constant rate function is shown as follows:
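Again, the chunk itself is not included in this diff; a sketch consistent with the surrounding text would be:

## estimated baseline MCF from the piecewise constant rate function fit
piecesMcf <- mcf(piecesFit)
plot(piecesMcf, conf.int = TRUE) + ggplot2::xlab("Days")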
The argument newdata allows one to estimate the MCF for a given dataset instead of the baseline MCF. If newdata is specified, the data frame should have the same column names as the covariate names appearing in the formula of the original fit. The MCF will be estimated for each unique row in the data frame, and its confidence intervals are constructed based on the delta method.
In addition, we may specify the name for grouping each unique row and the levels of each group through groupName and groupLevels, respectively. For example, we may specify groupName = "Gender" and groupLevels = c("Male", "Female") for estimation of different gender groups.
As the last two examples in this vignette, we estimate the MCF from the fitted model with the spline rate function for the different treatment groups, and plot the estimated MCFs and their confidence intervals accordingly.
-newDat <- data.frame(x1 = c(0, 0), group = c("Treat", "Contr"))
-estmcf <- mcf(splineFit, newdata = newDat, groupName = "Group",
- groupLevels = c("Treatment", "Control"))
-plot(estmcf, conf.int = TRUE, col = c("royalblue", "red"), lty = c(1, 5)) +
- ggtitle("Control vs. Treatment") + xlab("Days")
newDat <- data.frame(x1 = c(0, 0), group = c("Treat", "Contr"))
+estmcf <- mcf(splineFit, newdata = newDat, groupName = "Group",
+ groupLevels = c("Treatment", "Control"))
+plot(estmcf, conf.int = TRUE, col = c("royalblue", "red"), lty = c(1, 5)) +
+ ggtitle("Control vs. Treatment") + xlab("Days")
The data frame containing the MCF estimates is stored in the slot named MCF, so it is not hard to further customize the MCF plot.
plot(estmcf) +
- geom_ribbon(data = estmcf@MCF, alpha = 0.2,
- aes(x = time, ymin = lower, ymax = upper, fill = Group)) +
- ggtitle("Control vs. Treatment") + xlab("Days")
plot(estmcf) +
+ geom_ribbon(data = estmcf@MCF, alpha = 0.2,
+ aes(x = time, ymin = lower, ymax = upper, fill = Group)) +
+ ggtitle("Control vs. Treatment") + xlab("Days")
Cook, Richard J, and Jerald Lawless. 2007. The Statistical Analysis of Recurrent Events. Springer Science & Business Media.
Cook, Richard J, Jerald F Lawless, and Claude Nadeau. 1996. "Robust Tests for Treatment Comparisons Based on Recurrent Event Responses." Biometrics 52 (2): 557–71.
Efron, B. 1979. "Bootstrap Methods: Another Look at the Jackknife." The Annals of Statistics 7 (1): 1–26.
Fu, Haoda, Junxiang Luo, and Yongming Qu. 2016. "Hypoglycemic Events Analysis via Recurrent Time-to-Event (HEART) Models." Journal of Biopharmaceutical Statistics 26 (2): 280–98.
Lawless, Jerald F, and Claude Nadeau. 1995. "Some Simple Robust Methods for the Analysis of Recurrent Events." Technometrics 37 (2): 158–68.
Lewis, P A, and G S Shedler. 1979. "Simulation of Nonhomogeneous Poisson Processes by Thinning." Naval Research Logistics Quarterly 26 (3): 403–13.
Nelson, Wayne B. 1995. "Confidence Limits for Recurrence Data-Applied to Cost or Number of Product Repairs." Technometrics 37 (2): 147–57.
———. 2003. Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications. Vol. 10. SIAM.
Source: vignettes/reda-simulate.Rmd
In this vignette, we briefly introduce how to simulate survival and recurrent event data from stochastic process point of view with the reda package. The core function named simEvent
provides an intuitive and flexible interface for simulating survival and recurrent event times from one stochastic process. Another function named simEventData
is simply a wrapper that calls simEvent
internally and collects the event times and the covariates of a given number of processes into a data frame. Examples of generating random samples from common survival and recurrent models are provided. In fact, the function simEvent
and simEventData
may serve as the building blocks for simulating multitype event data including multiple type event data, recurrent events with termination, and competing risk data. The details of function syntax and the objects produced are available in the package manual and thus not covered in this vignette.
We introduce the general form of hazard rate function considered and implemented in the function simEvent
, which can be generalized to two classes called the relative risk model, and the accelerated failure time model. The introduction is based on Section 2.2 and Section 2.3 of Kalbfleisch and Prentice (2002). Other helpful references include Aalen, Borgan, and Gjessing (2008), Kleinbaum and Klein (2011), among others.
In this vignette, we briefly introduce how to simulate survival and recurrent event data from a stochastic process point of view with the reda package. The core function named simEvent() provides an intuitive and flexible interface for simulating survival and recurrent event times from one stochastic process. Another function named simEventData() is simply a wrapper that calls simEvent() internally and collects the event times and the covariates of a given number of processes into a data frame. Examples of generating random samples from common survival and recurrent event models are provided. In fact, the functions simEvent() and simEventData() may serve as building blocks for simulating more complex event data, including multi-type event data, recurrent events with termination, and competing risks data. The details of the function syntax and the objects produced are available in the package manual and thus not covered in this vignette.
We introduce the general form of the hazard rate function considered and implemented in the function simEvent(), which covers two model classes: the relative risk model and the accelerated failure time model. The introduction is based on Section 2.2 and Section 2.3 of Kalbfleisch and Prentice (2002). Other helpful references include Aalen, Borgan, and Gjessing (2008) and Kleinbaum and Klein (2011), among others.
Let’s consider \(n\) stochastic processes with the baseline hazard rate function \(\rho(t)\) of time \(t\). For the stochastic process \(i\), \(i\in\{1,\ldots,n\}\), let \(\mathbf{z}_i=(z_{i1},\ldots,z_{ip})^{\top}\) denote the covariate vector of length \(p\), and \(\boldsymbol{\beta}=(\beta_1,\ldots,\beta_p)^{\top}\) denote the covariate coefficients.
-Given the covariates \(\mathbf{z}_i\) (not including the intercept term), the intensity function of time \(t\) for the relative risk regression model can be specified as follows: \[\lambda(t \mid \mathbf{z}_i) = \rho(t)\,r(\mathbf{z}_i, -\boldsymbol{\beta}),\] where \(r(\mathbf{z}_i, \boldsymbol{\beta})\) is the relative risk function. For the Cox model (Cox 1972) and the Andersen-Gill model (Andersen and Gill 1982), \(r(\mathbf{z}_i, \boldsymbol{\beta})= \exp\{\boldsymbol{\beta}^{\top}\mathbf{z}_i\}\). Other common choices include the linear relative risk function: \(r(\mathbf{z}_i, \boldsymbol{\beta}) = 1 + \boldsymbol{\beta}^{\top}\mathbf{z}_i\), and the excess relative risk function: \(r(\mathbf{z}_i, \boldsymbol{\beta}) = \prod_{j=1}^p(1 + \beta_j z_{ij})\), both of which, however, suffer from the drawback that the \(r(\mathbf{z}_i, \boldsymbol{\beta})\) is not necessarily positive. Therefore, the coefficient estimates must be restricted to guarantee that the relative risk function is positive for all possible covariates. The restriction disappears for the exponential relative risk function since it always returns positive values.
Given the covariates \(\mathbf{z}_i\) (not including the intercept term), the intensity function of time \(t\) for the relative risk regression model can be specified as follows: \[\lambda(t \mid \mathbf{z}_i) = \rho(t)\,r(\mathbf{z}_i, \boldsymbol{\beta}),\] where \(r(\mathbf{z}_i, \boldsymbol{\beta})\) is the relative risk function. For the Cox model (Cox 1972) and the Andersen-Gill model (Andersen and Gill 1982), \(r(\mathbf{z}_i, \boldsymbol{\beta})= \exp\{\boldsymbol{\beta}^{\top}\mathbf{z}_i\}\). Other common choices include the linear relative risk function, \(r(\mathbf{z}_i, \boldsymbol{\beta}) = 1 + \boldsymbol{\beta}^{\top}\mathbf{z}_i\), and the excess relative risk function, \(r(\mathbf{z}_i, \boldsymbol{\beta}) = \prod_{j=1}^p(1 + \beta_j z_{ij})\), both of which, however, suffer from the drawback that \(r(\mathbf{z}_i, \boldsymbol{\beta})\) is not necessarily positive. Therefore, the coefficient estimates must be restricted to guarantee that the relative risk function is positive for all possible covariates. The restriction disappears for the exponential relative risk function since it always returns positive values. We may extend the model by considering a random frailty effect \(u_i\) (of expectation one) to account for heterogeneity between different processes or between processes from different clusters. The intensity function becomes \[\lambda(t \mid \mathbf{z}_i, u_i) = u_i\,\rho(t)\,r(\mathbf{z}_i, \boldsymbol{\beta}).\] The common choices for the distribution of the random frailty effect include a Gamma distribution with mean one and a lognormal distribution with mean zero on the logarithm scale.
Furthermore, both covariates and coefficients may be time-varying. So a general form of the intensity function for the relative risk model is given as follows: \[\lambda\bigl(t \mid \mathbf{z}_i(t), u_i\bigr) = u_i\,\rho(t)\,r\bigl(\mathbf{z}_i(t), \boldsymbol{\beta}(t)\bigr).\]
The relative risk models incorporate the covariates and their coefficients through a relative risk function multiplied by the baseline hazard rate, which provides an intuitive interpretation. However, we may instead consider a direct relationship between the covariates \(\mathbf{z}\) (including the intercept term) and the time to failure \(T>0\), \(\log(T) = \alpha + \sigma W\), where \(W\) is a standardized random error variable with density function \(f_W(w)\), survival function \(S_W(w)\), and hazard function \(\rho(w) = f_W(w) / S_W(w)\), \(\sigma\) represents the standard error, and \(\alpha = \boldsymbol{\beta}^{\top}\mathbf{z}\). We assume that \(W\) is independent of \(\boldsymbol{\beta}\) given the covariates \(\mathbf{z}\). Taking the exponential gives \(T=\exp(\boldsymbol{\beta}^{\top}\mathbf{z})\exp(\sigma W)\) with density \(f_T(t) = \frac{1}{\sigma t}f_W\bigl((\log(t) - \alpha)/\sigma\bigr)\), survival function \(S_T(t) = S_W\bigl((\log(t) - \alpha)/\sigma\bigr)\), and hazard function \[\lambda(t \mid \mathbf{z}_i) = \frac{1}{\sigma t}\rho\bigl((\log(t) - \alpha)/\sigma\bigr).\] Letting \(\lambda_{\mathbf{z}}=\exp(- \alpha)\) and \(p = 1 / \sigma\), we may rewrite the hazard function as follows: \[\lambda(t \mid \mathbf{z}_i) = \frac{p}{t}\rho\bigl(p\log(\lambda_{\mathbf{z}} t)\bigr).\] The resulting model is called the accelerated failure time (AFT) model.
@@ -105,43 +130,59 @@
Similarly, we may further consider a frailty factor, time-varying covariates, and time-varying covariate coefficients. So a general form of the intensity function may be given as follows: \[\lambda\bigl(t \mid \mathbf{z}_i(t), u_i\bigr) = \frac{u_i\,p}{t}\,\rho\bigl( p \log(t) - p \boldsymbol{\beta}(t)^{\top}\mathbf{z}_i(t)\bigr).\]
-The function simEvent
and simEventData
allow users to specify an intensity function in an even more general form given below. \[\lambda\bigl(t \mid
+
The function simEvent()
and simEventData()
allow users to specify an intensity function in an even more general form given below. \[\lambda\bigl(t \mid
\mathbf{z}_i(t), u_i\bigr) = u_i\,\rho\bigl(t, \mathbf{z}_i(t),
\boldsymbol{\beta}(t)\bigr) \,r\bigl(\mathbf{z}_i(t),
\boldsymbol{\beta}(t)\bigr),\] where \(u_i\), \(\rho(\cdot)\), and \(r(\cdot)\) corresponds to the argument frailty
, rho
, and relativeRisk
, respectively.
The thinning method (Lewis and Shedler 1979) and the inversion method (Çinlar 1975) are implemented for sampling event times. It can be shown that both methods achieve the given hazard rate. For function simEvent
, the thinning method is the default method when the hazard rate function is bounded within follow-ups. Otherwise, the inversion method will be used. We may specify the sampling method via the argument method
in function simEvent
and simEventData
.
library(reda) # attach reda package to the search path
-packageVersion("reda") # check the package version
## [1] '0.4.1'
-
-A homogeneous/stationary Poisson process (HPP) has a constant hazard rate over time with the interarrival times (between two successive arrivals/events) following exponential distribution. Two simple examples of simulating a homogeneous Poisson process using simEvent
are given as follows:
The thinning method (Lewis and Shedler 1979) and the inversion method (Çınlar 1975) are implemented for sampling event times. It can be shown that both methods achieve the given hazard rate. For function simEvent()
, the thinning method is the default method when the hazard rate function is bounded within follow-ups. Otherwise, the inversion method will be used. We may specify the sampling method via the argument method
in function simEvent()
and simEventData()
.
library(reda) # attach reda package to the search path
+packageVersion("reda") # check the package version
## [1] '0.5.0'
+
+A homogeneous/stationary Poisson process (HPP) has a constant hazard rate over time with the interarrival times (between two successive arrivals/events) following exponential distribution. Two simple examples of simulating a homogeneous Poisson process using simEvent()
are given as follows:
## HPP from time 1 to 5 of intensity 1 without covariates
+simEvent(rho = 1, origin = 1, endTime = 5)
## 'simEvent' S4 class object:
## [1] 1.84 2.42 3.75 3.78 3.84 4.15 4.47 4.61
-## HPP from 0 to 10 of baseline hazard rate 0.5 with two covariates
-simEvent(z = c(0.2, 0.5), zCoef = c(0.5, - 0.1), rho = 0.5, endTime = 10)
## HPP from 0 to 10 of baseline hazard rate 0.5 with two covariates
+simEvent(z = c(0.2, 0.5), zCoef = c(0.5, - 0.1), rho = 0.5, endTime = 10)
## 'simEvent' S4 class object:
## [1] 0.717 1.076 2.692 5.666 6.577 7.701
-The function simEventData
enable us to simulate multiple processes and collect the simulated event times into a survival or recurrent event data format.
## recurrent events from two processes with same covariates
-simEventData(2, z = c(0.2, 0.5), zCoef = c(1, - 0.5), rho = 0.5, endTime = 5)
The function simEventData() enables us to simulate multiple processes and collect the simulated event times into a survival or recurrent event data format.
## recurrent events from two processes with same covariates
+simEventData(2, z = c(0.2, 0.5), zCoef = c(1, - 0.5), rho = 0.5, endTime = 5)
## ID time event origin X.1 X.2
## 1 1 3.38 1 0 0.2 0.5
## 2 1 5.00 0 0 0.2 0.5
## 3 2 1.84 1 0 0.2 0.5
## 4 2 5.00 0 0 0.2 0.5
In the example given above, the number of processes is explicitly specified to be two. However, if it is not specified, it defaults to the number of rows of the covariate matrix. See the example given below.
-## recurrent events from two processes
-## with different time-invariant covariates and time origins
-simEventData(z = cbind(rnorm(2), 0.5), zCoef = c(1, - 0.5),
- rho = 0.2, origin = c(1, 0), endTime = c(10, 9))
## recurrent events from two processes
+## with different time-invariant covariates and time origins
+simEventData(z = cbind(rnorm(2), 0.5), zCoef = c(1, - 0.5),
+ rho = 0.2, origin = c(1, 0), endTime = c(10, 9))
## ID time event origin X.1 X.2
## 1 1 4.91 1 1 0.838 0.5
## 2 1 6.31 1 1 0.838 0.5
@@ -150,11 +191,11 @@ 2018-02-08
## 5 2 3.44 1 0 0.153 0.5
## 6 2 6.68 1 0 0.153 0.5
## 7 2 9.00 0 0 0.153 0.5
-We can also simulate survival data by taking the first event of each process. Setting recurrent = FALSE
in function simEvent
(or function simEventData
) gives us the survival time(s) or the right censoring time(s). In the example given below, we specified endTime = "rnorm"
and arguments = list(endTime = list(mean = 10))
for generating a random censoring times from normal distribution with mean ten and unit standard deviation. Also note that the specified origin
is recycled for these ten processes.
## survival data by set 'recurrent = FALSE'
-simEventData(z = cbind(rnorm(10), 1), zCoef = c(0.2, - 0.5), rho = 0.1,
- origin = c(0, 1), endTime = stats::rnorm, recurrent = FALSE,
- arguments = list(endTime = list(mean = 10)))
We can also simulate survival data by taking the first event of each process. Setting recurrent = FALSE in the function simEvent() (or simEventData()) gives us the survival time(s) or the right-censoring time(s). In the example given below, we specified endTime = stats::rnorm and arguments = list(endTime = list(mean = 10)) to generate random censoring times from a normal distribution with mean ten and unit standard deviation. Also note that the specified origin is recycled for these ten processes.
## survival data by set 'recurrent = FALSE'
+simEventData(z = cbind(rnorm(10), 1), zCoef = c(0.2, - 0.5), rho = 0.1,
+ origin = c(0, 1), endTime = stats::rnorm, recurrent = FALSE,
+ arguments = list(endTime = list(mean = 10)))
## ID time event origin X.1 X.2
## 1 1 9.11 0 0 -0.772 1
## 2 2 10.38 0 1 0.287 1
@@ -166,39 +207,51 @@ 2018-02-08
## 8 8 10.96 0 1 -0.934 1
## 9 9 10.08 0 0 0.394 1
## 10 10 11.13 0 1 0.404 1
-In contrast to HPP, a nonhomogeneous Poisson process (NHPP) has a time-varying hazard rate function. In that case, we may specify the baseline hazard rate function rho
to be a function object whose first argument represents the time variable. A quick example is given below, where the baseline hazard rate function \(\rho(t) = \sin(t) + 1\).
In contrast to an HPP, a nonhomogeneous Poisson process (NHPP) has a time-varying hazard rate function. In that case, we may specify the argument rho to be a function object whose first argument represents the time variable. A quick example is given below, where the baseline hazard rate function is \(\rho(t) = \sin(t) + 1\).
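The chunk defining the rate function and simulating from it is not shown in this diff; a sketch consistent with the surrounding text (the function and argument names here are assumptions) might look like:

rhoFun <- function(x, b = 1) b * (sin(x) + 1)  # rho(t) = b * (sin(t) + 1)
simEvent(rho = rhoFun)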
## 'simEvent' S4 class object:
## [1] 1.15
As demonstrated in the last example for the HPP, other possible arguments of the function objects can be specified via the argument arguments. For example, arguments = list(rho = list(b = 0.5)) specifies the baseline hazard rate function to be \(\rho(t) = 0.5(\sin(t) + 1)\).
simEventData(z = cbind(rexp(2), c(0, 1)), zCoef = c(0.1, - 0.5),
- rho = rhoFun, arguments = list(rho = list(b = 0.5)))
simEventData(z = cbind(rexp(2), c(0, 1)), zCoef = c(0.1, - 0.5),
+ rho = rhoFun, arguments = list(rho = list(b = 0.5)))
## ID time event origin X.1 X.2
## 1 1 3.000 0 0 0.0687 0
## 2 2 0.715 1 0 0.5692 1
## 3 2 1.120 1 0 0.5692 1
## 4 2 2.208 1 0 0.5692 1
## 5 2 3.000 0 0 0.5692 1
-In the Poisson process, the interarrival times between two successive arrivals (or events) follow exponential distribution independently. We may generalize the distribution of interarrival times and consider more general renewal processes.
-In function simEvent
(and simEventData
), we may specify the distribution of the interarrival times via the argument interarrival
, which takes the function stats::rexp
for generating interarrival times following exponential distribution by default. In general, the argument interarrival
takes a function with at least one argument named rate
for generating random (nonnegative) interarrival times from a certain distribution at the given arrival rate. A quick example of generating the interarrival times following Gamma distribution of scale one is given as follows:
In the function simEvent() (and simEventData()), we may specify the distribution of the interarrival times via the argument interarrival, which by default takes the function stats::rexp() for generating interarrival times following an exponential distribution. In general, the argument interarrival takes a function with at least one argument named rate that generates random (nonnegative) interarrival times from a certain distribution at the given arrival rate. A quick example of generating interarrival times following a Gamma distribution with scale one is given as follows:
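The corresponding chunk is not included in this diff; a hypothetical sketch of such an interarrival function is given below, where the shape parameter is chosen so that the mean interarrival time stays at 1 / rate with scale one:

simEvent(rho = 0.5,
         interarrival = function(rate) rgamma(1, shape = 1 / rate, scale = 1))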
## 'simEvent' S4 class object:
## [1] 0.182 1.878
-If the specified function has an argument named n
, the function simEvent
will assume that the function can generate n
number of random interarrival times at one time and take advantage of the vectorization for a potentially better performance. However, it is optional. The example given below produces an equivalent result.
If the specified function has an argument named n, the function simEvent() will assume that the function can generate n random interarrival times at once and will take advantage of the vectorization for potentially better performance. However, this is optional. The example given below produces an equivalent result.
## 'simEvent' S4 class object:
## [1] 0.182 1.878
In practice, some covariates, such as a patient's age or an automobile's mileage, may vary over time. The argument z in the functions simEvent() and simEventData() may take a function of time that returns a vector of covariates, which allows generating event times with time-varying covariates. Let's consider an example of generating recurrent event times with three covariates, two of which are time-variant.
set.seed(123)
zFun1 <- function(time) cbind(time / 10 + 1, as.numeric(time > 1), 0.5)
simEventData(z = zFun1, zCoef = c(0.1, 0.5, - 0.5))
## ID time event origin X.1 X.2 X.3
## 1 1 0.971 1 0 1.10 0 0.5
## 2 1 1.880 1 0 1.19 1 0.5
## 7 1 2.471 1 0 1.25 1 0.5
## 8 1 3.000 0 0 1.30 1 0.5
In the example given above, the covariate vector is \(\mathbf{z}(t)=(0.1t+1, \boldsymbol{1}(t > 1), 0.5)^{\top}\). If the covariate function has more arguments, we may specify them by a named list in arguments
. The example given below produces the equivalent results.
set.seed(123)
zFun2 <- function(x, a, b) cbind(x / 10 + a, as.numeric(x > b), 0.5)
simEventData(z = zFun2, zCoef = c(0.1, 0.5, - 0.5),
             arguments = list(z = list(a = 1, b = 1)))
## ID time event origin X.1 X.2 X.3
## 1 1 0.971 1 0 1.10 0 0.5
## 2 1 1.880 1 0 1.19 1 0.5
## 6 1 2.371 1 0 1.24 1 0.5
## 7 1 2.471 1 0 1.25 1 0.5
## 8 1 3.000 0 0 1.30 1 0.5
Notice that in the examples given above, if we generate event times for more than one process, the time-varying covariate function will remain the same for all processes, which may not be the case in practice. A more realistic situation is that the time-variant covariate functions differ among processes but come from a common function family. Let's consider the Stanford heart transplant data (Crowley and Hu 1977) as an example (the heart data available in the survival package). The covariate transplant, indicating whether the patient has already received a heart transplant before time t, is time-dependent and can be represented by an indicator function family, \(\boldsymbol{1}(t > b)\), where \(b\) is a known parameter that may differ among patients. For a particular patient \(i\), \(b = b_i\) is a known constant. In that case, we specify the function parameters inside the quote() function as follows:

zFun3 <- function(time, a, b) cbind(time / 10 + a, as.numeric(time > b))
(simDat <- simEventData(nProcess = 3, z = zFun3, zCoef = c(- 0.1, 0.5),
                        arguments = list(z = list(a = quote(rpois(1, 10) / 10),
                                                  b = quote(runif(1, 1, 3))))))
## ID time event origin X.1 X.2
## 1 1 0.104 1 0 1.710 0
## 2 1 0.328 1 0 1.733 0
## 10 3 1.472 1 0 0.747 0
## 11 3 3.000 0 0 0.900 1
In the example given above, the covariate X.2 is simulated from the indicator function family, \(\boldsymbol{1}(t > b)\), where the parameter \(b\) follows a uniform distribution between 1 and 3. Internally, the parameters specified in arguments are evaluated for each process. We may check the values of the parameter a from the generated covariate X.1 for different processes as follows:
## check the values of parameter `a` for different processes
with(simDat, unique(cbind(ID, a = X.1 - time / 10)))
## ID a
## [1,] 1 1.7
## [2,] 2 1.2
-## [3,] 3 0.6
The assumption of time-invariance of the covariate coefficients can be hard to justify in practice. We may simulate event times with time-varying covariate coefficients by specifying the argument zCoef to be a function of time that returns a vector of coefficients at the input time point. The way we specify the argument zCoef for time-varying coefficients is very similar to the way we specify the argument z for time-varying covariates introduced in the last section. For example, we may generate event times with both the covariates and their coefficients being time-variant as follows:
zCoefFun <- function(time, shift) cbind(sqrt(time / 10), sin(time + shift), 0.1)
simEventData(z = zFun1, zCoef = zCoefFun,
             arguments = list(zCoef = list(shift = 1)))
## ID time event origin X.1 X.2 X.3
## 1 1 1.12 1 0 1.11 1 0.5
## 2 1 1.34 1 0 1.13 1 0.5
## 4 1 2.43 1 0 1.24 1 0.5
## 5 1 3.00 0 0 1.30 1 0.5
As demonstrated in the example given above, we may similarly specify the other arguments of the time-varying coefficient function via a named list in arguments
.
Let's consider frailty factors for individual processes first, where each process \(i\) has its own frailty effect \(w_i\). A popular choice for the frailty distribution is the one-parameter gamma distribution with mean one, which often leads to an explicit marginal likelihood in a relatively simple expression. Similar to the arguments z, zCoef, and rho, the argument frailty may take a function as input for simulating the frailty effect. For example, we may simulate the recurrent event times for one process with a frailty factor following gamma(2, 0.5) via frailty = stats::rgamma and arguments = list(frailty = list(shape = 2, scale = 0.5)) as follows:
set.seed(123)
simEventData(z = zFun1, zCoef = c(0.1, 0.5, - 0.5), frailty = stats::rgamma,
             arguments = list(frailty = list(shape = 2, scale = 0.5)))
## ID time event origin X.1 X.2 X.3
## 1 1 0.135 1 0 1.01 0 0.5
## 2 1 0.620 1 0 1.06 0 0.5
## 3 1 1.102 1 0 1.11 1 0.5
## 4 1 1.324 1 0 1.13 1 0.5
## 5 1 3.000 0 0 1.30 1 0.5
The named list list(shape = 2, scale = 0.5) was passed to the function rgamma() (from the stats package). The random number seed was reset so that we can compare the results with the first example of time-variant covariates. Note that it is the user's responsibility to make sure that the specified distribution of the frailty factor has mean one (or otherwise makes sense). The functions simEvent() and simEventData() only check the sign of the simulated frailty effect: an error will be thrown if the generated frailty effect is not positive.
To demonstrate how to specify other distributions of the frailty factor, we simulate the recurrent event times for one process by three slightly different but equivalent approaches in the following examples, where the frailty effect follows a log-normal distribution.
set.seed(123)
## use function `rlnorm` from the stats package
simEvent(z = zFun1, zCoef = c(0.1, 0.5, - 0.5), frailty = stats::rlnorm,
         arguments = list(frailty = list(sdlog = 1)))
## 'simEvent' S4 class object:
## [1] 1.59 1.63 1.70 2.08 2.45 2.63
set.seed(123)
## use a customized function with argument `n` and `sdlog`
logNorm1 <- function(n, sdlog) exp(rnorm(n = n, mean = 0, sd = sdlog))
simEvent(z = zFun1, zCoef = c(0.1, 0.5, - 0.5), frailty = logNorm1,
         arguments = list(frailty = list(sdlog = 1)))
## 'simEvent' S4 class object:
## [1] 1.59 1.63 1.70 2.08 2.45 2.63
set.seed(123)
## use a customized function with argument `sdlog` only
logNorm2 <- function(sdlog) exp(rnorm(n = 1, mean = 0, sd = sdlog))
simEvent(z = zFun1, zCoef = c(0.1, 0.5, - 0.5), frailty = logNorm2,
         arguments = list(frailty = list(sdlog = 1)))
## 'simEvent' S4 class object:
## [1] 1.59 1.63 1.70 2.08 2.45 2.63
If the function specified for frailty has an argument named n, then n = 1 will be set internally by the function simEvent(), which is designed for directly using random number generating functions, such as rgamma() and rlnorm() from the stats package.
When different processes come from several clusters, we may consider the same frailty effect shared among processes within a cluster. The case considered in the last section, where the frailty factors differ among individual processes, is a special case where the cluster size is one.
In the function simEvent() (and simEventData()), the argument frailty may take a numeric value (or vector) as input to specify the shared frailty effects for clusters. For instance, we may simulate the recurrent event times for four processes coming from two clusters with a shared gamma frailty within each cluster, where the first two processes come from one cluster and the remaining two come from another cluster.

## shared gamma frailty for processes from two clusters
frailtyEffect <- rgamma(2, shape = 2, scale = 0.5)
simEventData(nProcess = 4, z = zFun1, zCoef = c(0.1, 0.5, - 0.5),
             frailty = rep(frailtyEffect, each = 2))
## ID time event origin X.1 X.2 X.3
## 1 1 0.572 1 0 1.06 0 0.5
## 2 1 1.625 1 0 1.16 1 0.5
## 13 4 2.443 1 0 1.24 1 0.5
## 14 4 3.000 0 0 1.30 1 0.5
If the length of the specified frailty vector is less than the number of processes, the vector will be recycled internally. In the example given below, process 1 and process 3 share the same frailty effect (frailtyEffect[1L]). Similarly, process 2 and process 4 share the same frailty effect (frailtyEffect[2L]).
set.seed(123)
simEventData(nProcess = 4, z = zFun1, zCoef = c(0.1, 0.5, - 0.5),
             frailty = frailtyEffect)
## ID time event origin X.1 X.2 X.3
## 1 1 0.956 1 0 1.10 0 0.5
## 2 1 1.851 1 0 1.19 1 0.5
## 21 4 2.731 1 0 1.27 1 0.5
## 22 4 2.948 1 0 1.29 1 0.5
## 23 4 3.000 0 0 1.30 1 0.5
In this section, we present examples for generating event times with time-invariant covariates from several common parametric survival models.
The Weibull model is one of the most widely used parametric survival models. Assume the event times of process \(i\), \(i\in\{1,\ldots,n\}\), follow the Weibull model with hazard function \(h_i(t) = \lambda_i p t^{p-1}\), where \(p>0\) is the shape parameter and \(\lambda_i\) can be reparametrized with regression coefficients. When \(p=1\), the Weibull model reduces to the exponential model, whose hazard rate is constant over time.
One common reparametrization is \(\lambda_i = \exp(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{z}_i)\), which results in the Weibull proportional hazards (PH) model. Let \(\lambda_0 = \exp(\beta_0)\); then we may rewrite the hazard function as \(h_i(t) = \rho(t) \exp(\boldsymbol{\beta}^{\top}\mathbf{z}_i)\), where \(\rho(t) = \lambda_0 p t^{p-1}\) is the baseline hazard function. For example, we may simulate the survival data of ten processes from the Weibull PH model as follows:
nProcess <- 10
rho_weibull_ph <- function(x, lambda, p) lambda * p * x ^ (p - 1)
simEventData(z = cbind(rnorm(nProcess), rbinom(nProcess, 1, 0.5)),
             zCoef = c(0.5, 0.2), endTime = rnorm(nProcess, 10),
             recurrent = FALSE, rho = rho_weibull_ph,
             arguments = list(rho = list(lambda = 0.01, p = 2)))
## ID time event origin X.1 X.2
## 1 1 7.48 1 0 0.887 1
## 2 2 4.24 1 0 -0.151 1
## 8 8 4.35 1 0 0.435 0
## 9 9 3.78 1 0 0.800 0
## 10 10 8.93 0 0 -0.164 1
The baseline hazard function of the Gompertz model is \(\rho(t) = \lambda\exp(\alpha t)\), where \(\lambda > 0\) is the scale parameter and \(\alpha\) is the shape parameter. So the logarithm of the baseline hazard function is linear in time \(t\). Similar to the example for the Weibull model given in the last section, we may simulate the survival data of ten processes as follows:
rho_gompertz <- function(time, lambda, alpha) lambda * exp(alpha * time)
simEventData(z = cbind(rnorm(nProcess), rbinom(nProcess, 1, 0.5)),
             zCoef = c(0.5, 0.2), endTime = rnorm(nProcess, 10),
             recurrent = FALSE, rho = rho_gompertz,
             arguments = list(rho = list(lambda = 0.1, alpha = 0.1)))
## ID time event origin X.1 X.2
## 1 1 8.95 0 0 -1.462 1
## 2 2 8.74 0 0 0.688 0
## 8 8 4.48 1 0 -1.008 1
## 9 9 7.40 1 0 -0.119 1
## 10 10 9.78 0 0 -0.280 0
As discussed in the section on the accelerated failure time model, the hazard function of the log-logistic model is \[\lambda(t \mid \mathbf{z}_i) = \frac{p (t\lambda_{\mathbf{z}})^p/t}{1 + (t\lambda_{\mathbf{z}})^p}.\] So we may simulate the survival data of ten processes from the log-logistic model as follows:
rho_loglogistic <- function(time, z, zCoef, p) {
    lambda <- 1 / parametrize(z, zCoef, FUN = "exponential")
    lambda * p * (lambda * time) ^ (p - 1) / (1 + (lambda * time) ^ p)
}
simEventData(z = cbind(1, rnorm(nProcess), rbinom(nProcess, 1, 0.5)),
             zCoef = c(0.3, 0.5, 0.2), end = rnorm(nProcess, 10),
             recurrent = FALSE, relativeRisk = "none", rho = rho_loglogistic,
             arguments = list(rho = list(p = 1.5)))
## ID time event origin X.1 X.2 X.3
## 1 1 0.206 1 0 1 0.9625 1
## 2 2 1.350 1 0 1 0.6843 0
## 8 8 1.664 1 0 1 0.4282 1
## 9 9 3.238 1 0 1 0.0247 1
## 10 10 0.279 1 0 1 -1.6675 1
Notice that in the function rho_loglogistic() for the hazard function of the log-logistic model, we wrapped the parametrization of the covariates and covariate coefficients with the function parametrize() and specified FUN = "exponential". In addition, we specified relativeRisk = "none" when calling simEventData() for AFT models.
By following the discussion given in the section on the accelerated failure time model, it is not hard to obtain the hazard function of the log-normal model, \[\lambda(t \mid \mathbf{z}_i) = \frac{p}{t} \rho\bigl(p(\log(t) - \boldsymbol{\beta}^{\top}\mathbf{z}_i)\bigr),\] where \(\rho(w) = \phi(w) / \bigl(1 - \Phi(w)\bigr)\), and \(\phi(w)\) and \(\Phi(w)\) are the density and cumulative distribution functions of a standard normal random variable.
An example of simulating the survival times of ten processes from the log-normal model is given as follows:
rho_lognormal <- function(time, z, zCoef, p) {
    foo <- function(x) dnorm(x) / pnorm(x, lower.tail = FALSE)
    alpha <- parametrize(z, zCoef, FUN = "linear") - 1
    w <- p * (log(time) - alpha)
    foo(w) * p / time
}
simEventData(z = cbind(1, rnorm(nProcess), rbinom(nProcess, 1, 0.5)),
             zCoef = c(0.3, 0.5, 0.2), end = rnorm(nProcess, 10),
             recurrent = FALSE, relativeRisk = "none",
             rho = rho_lognormal, method = "inversion",
             arguments = list(rho = list(p = 0.5)))
## ID time event origin X.1 X.2 X.3
## 1 1 0.0907 1 0 1 0.433 1
## 2 2 3.7646 1 0 1 0.510 0
## 9 9 0.1656 1 0 1 -1.434 1
## 10 10 0.2125 1 0 1 -1.611 0
Notice that the time origin was set to zero by default. So the time variable \(t\) in the denominator of the hazard function \(\lambda(t \mid \mathbf{z}_i)\) may result in an undefined value when \(t=0\). Therefore, we specified method = "inversion" to use the inversion sampling method for the log-normal model.
Aalen, Odd, Ornulf Borgan, and Hakon Gjessing. 2008. Survival and Event History Analysis: A Process Point of View. Springer Science & Business Media.
Andersen, Per Kragh, and Richard David Gill. 1982. “Cox’s Regression Model for Counting Processes: A Large Sample Study.” The Annals of Statistics 10 (4): 1100–1120.
Çınlar, Erhan. 1975. Introduction to Stochastic Processes. Englewood Cliffs, NJ: Prentice-Hall.
Cox, David R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society. Series B (Methodological) 34 (2): 187–220.
Crowley, John, and Marie Hu. 1977. “Covariance Analysis of Heart Transplant Survival Data.” Journal of the American Statistical Association 72 (357): 27–36.
Kalbfleisch, John D, and Ross L Prentice. 2002. The Statistical Analysis of Failure Time Data. Vol. 360. John Wiley & Sons.
Kleinbaum, D G, and M Klein. 2011. Survival Analysis: A Self-Learning Text. New York: Springer.
Lewis, P A, and G S Shedler. 1979. “Simulation of Nonhomogeneous Poisson Processes by Thinning.” Naval Research Logistics Quarterly 26 (3): 403–13.
inst/CITATION
Wang W, Fu H, Yan J (2019). reda: Recurrent Event Data Analysis. R package version 0.5.0.9002, https://github.com/wenjie2wang/reda.

@Manual{reda-package,
  title = {{reda}: {R}ecurrent Event Data Analysis},
  author = {Wenjie Wang and Haoda Fu and Jun Yan},
  year = {2019},
  url = {https://github.com/wenjie2wang/reda},
  note = {{R} package version 0.5.0.9002},
}

Authors
Wenjie Wang. Author, maintainer.
Haoda Fu. Author.
Jun Yan. Contributor.
The R package reda provides functions for
The online documentation provides the function documentation and the package vignettes, both for the latest CRAN version and for the latest version under development.
The latest version of the package is under development at GitHub. If it is able to pass the building check by Travis CI, you may consider installing it with the help of remotes by
if (! require(remotes)) install.packages("remotes")
remotes::install_github("wenjie2wang/reda")
The R package reda is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version (at your option). See the GNU General Public License for details.
The R package reda is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
GPL (>= 3)
Added the function Recur() as a successor of the function Survr() for the model formula response.
Added a new package vignette introducing the function Recur
.
Added a new argument adjustRiskset
to the method mcf.formula
for specifying whether to adjust the size of risk set. The cumulative sample mean function estimates will be computed by setting adjustRiskset = FALSE
.
Added a new option of "CSV"
for cumulative sample variance estimates to the argument variance
of the method mcf.formula
.
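As an illustration, a minimal sketch combining the two new options (using the valveSeats data and the Recur() interface that appear in the examples later in this document):

## cumulative sample mean (CSM) estimates with cumulative sample variance (CSV)
csmFit <- mcf(Recur(Days, ID, No.) ~ 1, data = valveSeats,
              adjustRiskset = FALSE, variance = "CSV")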
The function Survr
is deprecated since this version and will be removed in future.
Added an implementation of the nonparametric MCF estimates in C++ with the help of Rcpp and replaced the original R implementation for better computational performance.
Added the functions simEvent() and simEventData() for simulating survival, recurrent event, and multiple-event data from a stochastic process point of view.
Added function mcfDiff
and mcfDiff.test
for comparing two-sample MCFs by difference estimates over time and the pseudo-score tests.
Added variance estimates of sample MCF from bootstrap methods.
Updated the checking rule for the argument event of the function Survr() so that the sample MCF can be computed for costs in addition to the number of events.
Updated the Lawless and Nadeau (1995) variance estimates in the method mcf.formula for the sample MCF.
Renamed class sampleMcf
to mcf.formula
, rateRegMcf
to mcf.rateReg
, baseRateReg
to baseRate.rateReg
, summaryRateReg
to summary.rateReg
.
Added support for the formula response Survr(ID, time, event) ~ 1 in the function rateReg() for modeling the baseline rate function using the gamma frailty model without specifying any covariate.
Added baseRate for the estimated baseline rate function instead of the estimated coefficients of spline bases.
Added M-spline for modeling the baseline rate to the function rateReg().
Added argument check
to function rateReg
so that it is possible to skip the data checking step to slightly speed up the model fitting for cleaned data.
Added sample valve-seat dataset from Nelson (1995) for demonstration.
Borrowed the power of the R package splines2 for piecewise constant and spline-based baseline rate functions, and thus boosted the model-fitting performance of reda.
Updated object class of fitted model, rateReg
.
Added the variable “gender” to the sample simulated dataset simuDat for a better demonstration of the sample MCF function.
AIC,rateReg-method
is an S4 class method calculating Akaike
information criterion (AIC) for one or several rateReg
objects,
according to the formula - 2 * log-likelihood + 2 * nPar, where nPar
represents the number of parameters in the fitted model.
# S4 method for rateReg
AIC(object, ..., k = 2)
If just one object is provided, a numeric value representing the calculated AIC is returned; if several objects are provided, a data frame with columns df and AIC is returned, where df means the degrees of freedom, which is the number of parameters in the fitted model.
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC, the better the fit. A friendly warning will be thrown if the numbers of observations differ in the model comparison.
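For instance, a minimal sketch of such a comparison (the two fits below are illustrative; simuDat and its variables appear in the examples elsewhere in this document):

fit1 <- rateReg(Recur(time, ID, event) ~ 1, data = simuDat)
fit2 <- rateReg(Recur(time, ID, event) ~ group, data = simuDat)
## the smaller AIC indicates the better fit
AIC(fit1, fit2)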
help(AIC, stats) for other details.
rateReg for model fitting;
summary,rateReg-method for summary of a fitted model;
BIC,rateReg-method for BIC.

BIC,rateReg-method
is an S4 class method calculating
Bayesian information criterion (BIC) or so-called
Schwarz's Bayesian criterion (SBC)
# S4 method for rateReg
BIC(object, ...)

...: More fitted model objects.
If just one object is provided, a numeric value representing the calculated BIC is returned; if several objects are provided, a data frame with columns df and BIC is returned, where df means the degrees of freedom, which is the number of parameters in the fitted model.
When comparing models fitted by maximum likelihood to the same
data, the smaller the BIC, the better the fit.
help(BIC, stats) for other details.
rateReg for model fitting;
summary,rateReg-method for summary of a fitted model;
AIC,rateReg-method for AIC.

R/class.R
Recur-class.Rd
The class Recur is an S4 class that represents a formula response for the recurrent event data model. The function Recur() produces objects of this class. See ``Slots'' for details.
.Data
A numeric matrix that consists of the following columns:
time1: the beginning of the time segments;
time2: the end of the time segments;
id: identifiers of subjects;
event: event indicators;
terminal: indicators of terminal events.
ID
A character vector for the original identifiers of subjects.
ord
An integer vector for increasingly ordering the data by id, time2, and - event. Sorting is often done in the model-fitting steps, where the indices stored in this slot can be used directly.
rev_ord
An integer vector for reverting the ordering of the sorted data (by ord) to its original ordering. This slot is provided to easily revert the sorting.
first_idx
An integer vector indicating the first record of each subject in the sorted matrix. It helps in the data checking procedure and may be helpful in the model-fitting step, such as getting the origin time.
last_idx
An integer vector indicating the last record of each subject in the sorted data. Similar to first_idx, it helps in the data checking procedure and may be helpful in the model-fitting step, such as locating the terminal events.
check
A character string indicating how the data checking is performed. It simply records the option that the user specified for data checking.
Specify time segments or recurrent episodes by endpoints.
time1 %to% time2

time1 %2% time2
time1: The left end-points of the recurrent episodes.
time2: The right end-points of the recurrent episodes.
A list that consists of two elements named "time1" and "time2".
This function is intended to be used for specifying the argument time in the function Recur().
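For example, a minimal sketch (the data frame dat and its column names are hypothetical):

## dat is assumed to have columns id, time1, time2, and event
with(dat, Recur(time1 %to% time2, id = id, event = event))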
Create an S4 class object that represents the formula response for recurrent event data, with optional checking procedures embedded.
Recur(time, id, event, terminal, origin,
      check = c("hard", "soft", "none"), ...)
time: A numerical vector representing the times of recurrent events and censoring, or a list with elements named "time1" and "time2" giving the endpoints of the recurrent episodes.
id: Subject identifiers. It can be a numeric vector, a character vector, or a factor vector.
event: A numeric vector that may represent the status, costs, or types of the recurrent events. A logical vector is allowed and converted to a numeric vector. Non-positive values are internally converted to zero, indicating censoring status.
terminal: A numeric vector that may represent the status, costs, or types of the terminal events. A logical vector is allowed and converted to a numeric vector. Non-positive values are internally converted to zero, indicating censoring status. If a scalar value is specified, all subjects will have the same status of terminal events at their last recurrent episodes.
origin: The time origin of each subject. If a scalar value is specified, all subjects will have the same origin at the specified value.
check: A character value specifying how to perform the checks for recurrent event data. Errors or warnings will be thrown, respectively, if check = "hard" or check = "soft" is specified.
...: Other arguments for future usage. A warning will be thrown if any invalid argument is specified.
A Recur object.
This is a successor function of the deprecated function Survr(). See the vignette by `vignette("reda-Recur")` for details.
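The calls that produced the printed episodes below are not shown in this extract; a minimal sketch using the bundled valveSeats data (its Days, ID, and No. columns also appear in the mcf() examples later in this document) would be:

library(reda)
## recurrent episodes of valve-seat replacements per engine
with(valveSeats, Recur(Days, ID, No.))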
+#> [1] 251: (0, 761+] +#> [2] 328: (0, 759+] +#> [3] 329: (0, 98], (98, 667+] +#> [4] 390: (0, 326], (326, 653], ..., (653, 667+] +#> [5] 395: (0, 665+] +#> [6] 397: (0, 84], (84, 667+] +#> [7] 400: (0, 87], (87, 663+] +#> [8] 404: (0, 646], (646, 653+] +#> [9] 407: (0, 92], (92, 653+] +#> [10] 409: (0, 651+] +#> [11] 411: (0, 258], (258, 328], ..., (621, 650+] +#> [12] 252: (0, 61], (61, 539], (539, 648+] +#> [13] 389: (0, 254], (254, 276], ..., (640, 644+] +#> [14] 395: (0, 76], (76, 538], (538, 642+] +#> [15] 401: (0, 635], (635, 641+] +#> [16] 404: (0, 349], (349, 404], ..., (561, 649+] +#> [17] 409: (0, 631+] +#> [18] 411: (0, 596+] +#> [19] 415: (0, 120], (120, 479], (479, 614+] +#> [20] 327: (0, 323], (323, 449], (449, 582+] +#> [21] 389: (0, 139], (139, 139], (139, 589+] +#> [22] 394: (0, 593+] +#> [23] 394: (0, 573], (573, 589+] +#> [24] 397: (0, 165], (165, 408], ..., (604, 606+] +#> [25] 405: (0, 249], (249, 594+] +#> [26] 407: (0, 344], (344, 497], (497, 613+] +#> [27] 412: (0, 265], (265, 586], (586, 595+] +#> [28] 420: (0, 166], (166, 206], ..., (348, 389+] +#> [29] 390: (0, 601+] +#> [30] 392: (0, 410], (410, 581], (581, 601+] +#> [31] 395: (0, 611+] +#> [32] 396: (0, 608+] +#> [33] 397: (0, 587+] +#> [34] 400: (0, 367], (367, 603+] +#> [35] 403: (0, 202], (202, 563], ..., (570, 585+] +#> [36] 409: (0, 587+] +#> [37] 411: (0, 578+] +#> [38] 413: (0, 578+] +#> [39] 416: (0, 586+] +#> [40] 417: (0, 585+] +#> [41] 421: (0, 582+]#> [1] 251: (0, 761+] +#> [2] 328: (0, 759+] +#> [3] 329: (0, 98], (98, 667+] +#> [4] 390: (0, 326], (326, 653], ..., (653, 667+] +#> [5] 395: (0, 665+] +#> [6] 397: (0, 84], (84, 667+] +#> [7] 400: (0, 87], (87, 663+] +#> [8] 404: (0, 646], (646, 653+] +#> [9] 407: (0, 92], (92, 653+] +#> [10] 409: (0, 651+] +#> [11] 411: (0, 258], (258, 328], ..., (621, 650+] +#> [12] 252: (0, 61], (61, 539], (539, 648+] +#> [13] 389: (0, 254], (254, 276], ..., (640, 644+] +#> [14] 395: (0, 76], (76, 538], (538, 642+] +#> [15] 401: (0, 635], (635, 641+] +#> [16] 404: (0, 349], (349, 404], ..., (561, 649+] +#> [17] 409: (0, 631+] +#> [18] 411: (0, 596+] +#> [19] 415: (0, 120], (120, 479], (479, 614+] +#> [20] 327: (0, 323], (323, 449], (449, 582+] +#> [21] 389: (0, 139], (139, 139], (139, 589+] +#> [22] 394: (0, 593+] +#> [23] 394: (0, 573], (573, 589+] +#> [24] 397: (0, 165], (165, 408], ..., (604, 606+] +#> [25] 405: (0, 249], (249, 594+] +#> [26] 407: (0, 344], (344, 497], (497, 613+] +#> [27] 412: (0, 265], (265, 586], (586, 595+] +#> [28] 420: (0, 166], (166, 206], ..., (348, 389+] +#> [29] 390: (0, 601+] +#> [30] 392: (0, 410], (410, 581], (581, 601+] +#> [31] 395: (0, 611+] +#> [32] 396: (0, 608+] +#> [33] 397: (0, 587+] +#> [34] 400: (0, 367], (367, 603+] +#> [35] 403: (0, 202], (202, 563], ..., (570, 585+] +#> [36] 409: (0, 587+] +#> [37] 411: (0, 578+] +#> [38] 413: (0, 578+] +#> [39] 416: (0, 586+] +#> [40] 417: (0, 585+] +#> [41] 421: (0, 582+]#> Warning: Invalid argument 'death' went into `...` of Recur()#> [1] 251: (0, 761+] +#> [2] 328: (0, 759+] +#> [3] 329: (0, 98], (98, 667+] +#> [4] 390: (0, 326], (326, 653], ..., (653, 667+] +#> [5] 395: (0, 665+] +#> [6] 397: (0, 84], (84, 667+] +#> [7] 400: (0, 87], (87, 663+] +#> [8] 404: (0, 646], (646, 653+] +#> [9] 407: (0, 92], (92, 653+] +#> [10] 409: (0, 651+] +#> [11] 411: (0, 258], (258, 328], ..., (621, 650+] +#> [12] 252: (0, 61], (61, 539], (539, 648+] +#> [13] 389: (0, 254], (254, 276], ..., (640, 644+] +#> [14] 395: (0, 76], (76, 538], (538, 642+] +#> [15] 401: (0, 635], (635, 641+] +#> [16] 404: (0, 
349], (349, 404], ..., (561, 649+] +#> [17] 409: (0, 631+] +#> [18] 411: (0, 596+] +#> [19] 415: (0, 120], (120, 479], (479, 614+] +#> [20] 327: (0, 323], (323, 449], (449, 582+] +#> [21] 389: (0, 139], (139, 139], (139, 589+] +#> [22] 394: (0, 593+] +#> [23] 394: (0, 573], (573, 589+] +#> [24] 397: (0, 165], (165, 408], ..., (604, 606+] +#> [25] 405: (0, 249], (249, 594+] +#> [26] 407: (0, 344], (344, 497], (497, 613+] +#> [27] 412: (0, 265], (265, 586], (586, 595+] +#> [28] 420: (0, 166], (166, 206], ..., (348, 389+] +#> [29] 390: (0, 601+] +#> [30] 392: (0, 410], (410, 581], (581, 601+] +#> [31] 395: (0, 611+] +#> [32] 396: (0, 608+] +#> [33] 397: (0, 587+] +#> [34] 400: (0, 367], (367, 603+] +#> [35] 403: (0, 202], (202, 563], ..., (570, 585+] +#> [36] 409: (0, 587+] +#> [37] 411: (0, 578+] +#> [38] 413: (0, 578+] +#> [39] 416: (0, 586+] +#> [40] 417: (0, 585+] +#> [41] 421: (0, 582+]#> [1] 251: (10, 761+] +#> [2] 328: (10, 759+] +#> [3] 329: (10, 98], (98, 667+] +#> [4] 390: (10, 326], (326, 653], ..., (653, 667+] +#> [5] 395: (10, 665+] +#> [6] 397: (10, 84], (84, 667+] +#> [7] 400: (10, 87], (87, 663+] +#> [8] 404: (10, 646], (646, 653+] +#> [9] 407: (10, 92], (92, 653+] +#> [10] 409: (10, 651+] +#> [11] 411: (10, 258], (258, 328], ..., (621, 650+] +#> [12] 252: (10, 61], (61, 539], (539, 648+] +#> [13] 389: (10, 254], (254, 276], ..., (640, 644+] +#> [14] 395: (10, 76], (76, 538], (538, 642+] +#> [15] 401: (10, 635], (635, 641+] +#> [16] 404: (10, 349], (349, 404], ..., (561, 649+] +#> [17] 409: (10, 631+] +#> [18] 411: (10, 596+] +#> [19] 415: (10, 120], (120, 479], (479, 614+] +#> [20] 327: (10, 323], (323, 449], (449, 582+] +#> [21] 389: (10, 139], (139, 139], (139, 589+] +#> [22] 394: (10, 593+] +#> [23] 394: (10, 573], (573, 589+] +#> [24] 397: (10, 165], (165, 408], ..., (604, 606+] +#> [25] 405: (10, 249], (249, 594+] +#> [26] 407: (10, 344], (344, 497], (497, 613+] +#> [27] 412: (10, 265], (265, 586], (586, 595+] +#> [28] 420: (10, 166], (166, 206], ..., (348, 389+] +#> [29] 390: (10, 601+] +#> [30] 392: (10, 410], (410, 581], (581, 601+] +#> [31] 395: (10, 611+] +#> [32] 396: (10, 608+] +#> [33] 397: (10, 587+] +#> [34] 400: (10, 367], (367, 603+] +#> [35] 403: (10, 202], (202, 563], ..., (570, 585+] +#> [36] 409: (10, 587+] +#> [37] 411: (10, 578+] +#> [38] 413: (10, 578+] +#> [39] 416: (10, 586+] +#> [40] 417: (10, 585+] +#> [41] 421: (10, 582+]
The class Survr
is an S4 that represents a formula response for
recurrent event data model. The function Survr
produces
objects of this class. See ``Slots'' for details.
.Data
A numeric matrix object.
ID
A charactrer vector for original subject identificator.
check
A logical value indicating whether to performance data checking.
ord
An integer vector for increasingly ordering data by ID
,
+
+
ID
A charactrer vector for original subject identificator.
check
A logical value indicating whether to performance data checking.
ord
An integer vector for increasingly ordering data by ID
,
time
, and 1 - event
.
Survr
is an S4 class that represents
-formula response for recurrent event data
-modeled by methods based on counts and rate function.
Create an S4 class that represents the formula response for recurrent event data modeled by methods based on counts and rate function. Note that the function is deprecated since version 0.5.0 and will be removed in the future.
Survr(ID, time, event, origin = 0, check = TRUE, ...)
Other arguments for future usage. |
This is a similar function to Survr
in package
The time origin has to be the same and not later than any event time.
rateReg
for model fitting.
R/Recur.R
+ as.character-Recur-method.Rd
Summarize and convert the recurrent episodes for each subject into character strings.
# S4 method for Recur
as.character(x, ...)
x: A Recur object.
...: Other arguments for future usage.
This function is intended to be a helper function for the `show()` method of `Recur` objects. To be precise, the function sets the maximum number of recurrent episodes for each subject to `max(2L, as.integer(getOption("reda.Recur.maxPrint")))`. By default, at most three recurrent episodes will be summarized for printing. When a subject has more than three recurrent episodes, the first `getOption("reda.Recur.maxPrint") - 1` recurrent episodes and the last one will be summarized. One may use `options()` to adjust the setting. For example, the default value is equivalent to `options(reda.Recur.maxPrint = 3)`.
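For instance, a small sketch of adjusting the option before printing (the Recur object below is illustrative):

options(reda.Recur.maxPrint = 5)
with(valveSeats, Recur(Days, ID, No.))  # summarizes up to five episodes per subject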
An S4 class generic function that returns the estimated baseline rate function.
baseRate(object, ...)

# S4 method for rateReg
baseRate(object, level = 0.95, control = list(), ...)
A baseRate
object.
rateReg
: Estimated baseline rate from a fitted model.
rateReg for model fitting;
summary,rateReg-method for summary of a fitted model;
plot,baseRate.rateReg-method for the plotting method.

## See examples given in function rateReg.
R/class.R
+ baseRate.rateReg-class.Rd
An S4 class that represents the estimated baseline rate function from model.
The function baseRate
produces objects of this class.
Perform several checks for recurrent event data and update the object attributes if some rows of the contained data (in the .Data slot) have been removed, for example, by na.action.
check_Recur(x, check = c("hard", "soft", "none"))
x: A Recur object.
check: A character value specifying how to perform the checks for recurrent event data. Errors or warnings will be thrown, respectively, if check = "hard" or check = "soft" is specified.
A Recur object.
coef,rateReg-method
is an S4 class method that extracts estimated
coefficients of covariates from rateReg
object produced by function
rateReg
.
# S4 method for rateReg
coef(object, ...)
Other arguments for future usage. |
A named numeric vector.
rateReg for model fitting;
confint,rateReg-method for confidence intervals for covariate coefficients;
summary,rateReg-method for summary of a fitted model.

confint,rateReg-method
is an S4 class method for
rateReg
object, which returns approximate confidence intervals
for all or specified covariates.
# S4 method for rateReg
confint(object, parm, level = 0.95, ...)
Other arguments for future usage. |
A numeric matrix with row names and column names.
Under regularity conditions (Shao 2003, Theorem 4.16 and Theorem 4.17, pages 287 and 290), the approximate confidence intervals are constructed loosely based on the Fisher information matrix and the estimates of the coefficients.
Shao, J. (2003). Mathematical Statistics. Springer Texts in Statistics. New York: Springer, 2nd edition.
rateReg for model fitting;
coef,rateReg-method for point estimates of covariate coefficients;
summary,rateReg-method for summary of a fitted model.
All functions:
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
An S4 Class Representing Formula Response for Recurrent Event Data
Recurrent Episodes
Formula Response for Recurrent Event Data
An S4 Class Representing Formula Response
Formula Response for Recurrent Event Data
Convert a Recur Object to a Character Vector
Estimated Baseline Rate Function
An S4 Class Representing Estimated Baseline Rate Function
Checks for Recurrent Event Data
Estimated Coefficients of Covariates
Confidence Intervals for Covariate Coefficients
Is the object from the Recur class?
Mean Cumulative Function (MCF)
An S4 Class Representing Sample MCF
An S4 Class Representing Estimated MCF from a Fitted Model
An S4 Class Representing Sample MCF Difference
Comparing Two-Sample MCFs
An S4 Class Representing the Two-Sample Pseudo-Score Test Results
Parametrizations of Covariates and Covariate Coefficients
Plot Baseline Rate or Mean Cumulative Function (MCF)
An S4 Class Representing a Fitted Model
Recurrent Events Regression Based on Counts and Rate Function
Recurrent Event Data Analysis
Show an object.
An S4 Class for Simulated Recurrent Event or Survival Times
Simulated Survival Times or Recurrent Events
Simulated Sample Dataset for Demonstration
Summarizing a Fitted Model
An S4 Class Representing Summary of a Fitted Model
Valve Seats Dataset
An S4 class that represents sample mean cumulative function (MCF) from data.
The function mcf
produces objects of this class.
formula
Formula.
data
A data frame.
MCF
A data frame.
origin
A named numeric vector.
multiGroup
A logical value.
variance
A character vector.
logConfInt
A logical value.
level
A numeric value.
An S4 class generic function that returns the mean cumulative function (MCF) estimates from a fitted model or returns the nonparametric MCF estimates (by the Nelson-Aalen estimator or the Cook-Lawless cumulative sample mean estimator) from the sample data.

mcf(object, ...)

# S4 method for formula
mcf(object, data, subset, na.action,
    variance = c("LawlessNadeau", "Poisson", "bootstrap", "CSV"),
    logConfInt = FALSE, adjustRiskset = TRUE, level = 0.95,
    control = list(), ...)

# S4 method for rateReg
mcf(object, newdata, groupName, groupLevels, level = 0.95,
    na.action, control = list(), ...)
data | A data frame, list or environment containing the variables in
the model. If not found in data, the variables are taken from
- |
---|---|
na.action | A function that indicates what should the procedure do if
the data contains |
+setting of
variance | A character specifying the method for variance estimates.
The available options are |
+method,
logConfInt | @@ -164,6 +201,15 @@|
adjustRiskset | +A logical value indicating whether to adjust the size
+of risk-set. If |
level | @@ -185,7 +231,7 @@
A mcf.formula
or mcf.rateReg
object.
For a formula object with a Recur object as the response, the covariate specified on the right-hand side of the formula should be either 1 or any "linear" combination of categorical variables in the data.
The former computes the overall sample MCF. The latter computes the sample
MCF for each level of the combination of the categorical variable(s)
specified, respectively. The MCF estimates are computed at each unique time point of the sample data. By default, the size of the risk set is adjusted over time based on the at-risk indicators, which results in the Nelson-Aalen nonparametric estimator (Nelson 2003). If the size of the risk set remains a constant (the total number of processes) over time (specified by adjustRiskset = FALSE), the cumulative sample mean (CSM) function introduced in Chapter 1 of Cook and Lawless (2007) will be computed instead. The point estimate of the sample MCF at each time point does not assume any particular underlying model. The variance estimates at each time point are computed following the Lawless and Nadeau method (Lawless and Nadeau 1995), the Poisson process method, or the bootstrap methods. The approximate confidence intervals are provided as well, which are constructed based on the asymptotic normality of the MCF itself (by default) or of the logarithm of the MCF.

For a rateReg object, mcf estimates the baseline MCF and its confidence interval at each time grid if the argument newdata is not specified. Otherwise, mcf estimates the MCF and its confidence interval for the given newdata based on the Delta-method.
formula
: Sample MCF from data.
rateReg
: Estimated MCF from a fitted model.
Cook, R. J., and Lawless, J. (2007). The Statistical Analysis of Recurrent Events. Springer Science & Business Media.
Lawless, J. F. and Nadeau, C. (1995). Some Simple Robust Methods for the Analysis of Recurrent Events. Technometrics, 37, 158--168.
Nelson, W. B. (2003). Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications (Vol. 10). SIAM.
rateReg for model fitting;
mcfDiff for comparing two-sample MCFs;
plot-method
for plotting MCF.

library(reda)
### sample MCF
## Example 1. valve-seat data
## the default variance estimates by Lawless and Nadeau (1995) method
valveMcf0 <- mcf(Recur(Days, ID, No.) ~ 1, data = valveSeats)
plot(valveMcf0, conf.int = TRUE, mark.time = TRUE, addOrigin = TRUE) +
    ggplot2::xlab("Days") + ggplot2::theme_bw()

## variance estimates following Poisson process model
valveMcf1 <- mcf(Recur(Days, ID, No.) ~ 1,
                 data = valveSeats, variance = "Poisson")
## variance estimates by bootstrap method (with 1,000 bootstrap samples)
set.seed(123)
valveMcf2 <- mcf(Recur(Days, ID, No.) ~ 1,
                 data = valveSeats, variance = "bootstrap",
                 control = list(B = 1e3))

## comparing the variance estimates from different methods
library(ggplot2)
ciDat <- rbind(cbind(valveMcf0@MCF, Method = "Lawless & Nadeau"),
               cbind(valveMcf1@MCF, Method = "Poisson"),
               cbind(valveMcf2@MCF, Method = "Bootstrap"))
ggplot(ciDat, aes(x = time, y = se)) +
    geom_step(aes(color = Method, linetype = Method)) +
    xlab("Days") + ylab("SE estimates") + theme_bw()

## comparing the confidence interval estimates from different methods
ggplot(ciDat, aes(x = time)) +
    geom_step(aes(y = MCF)) +
    geom_step(aes(y = lower, color = Method, linetype = Method)) +
    geom_step(aes(y = upper, color = Method, linetype = Method)) +
    xlab("Days") + ylab("Confidence intervals") + theme_bw()

## Example 2. the simulated data
simuMcf <- mcf(Recur(time, ID, event) ~ group + gender,
               data = simuDat, ID %in% 1 : 50)
plot(simuMcf, conf.int = TRUE, lty = 1 : 4,
     legendName = "Treatment & Gender")

### estimate MCF difference between two groups
## one sample MCF object of two groups
mcf0 <- mcf(Recur(time, ID, event) ~ group, data = simuDat)
## two-sample pseudo-score tests
mcfDiff.test(mcf0)
#> Two-Sample Pseudo-Score Tests:
#>   Statistic Variance Chisq DF Pr(>Chisq)
#> ...
#> Variance Estimator: robust

## difference estimates over time
mcf0_diff <- mcfDiff(mcf0, testVariance = "none")
plot(mcf0_diff)

## or explicitly ask for the difference of two sample MCF
mcf1 <- mcf(Recur(time, ID, event) ~ 1, data = simuDat,
            subset = group %in% "Contr")
mcf2 <- mcf(Recur(time, ID, event) ~ 1, data = simuDat,
            subset = group %in% "Treat")
## perform two-sample tests and estimate difference at the same time
mcf12_diff1 <- mcfDiff(mcf1, mcf2)
mcf12_diff2 <- mcf1 - mcf2 # or equivalently using the `-` method
stopifnot(all.equal(mcf12_diff1, mcf12_diff2))
mcf12_diff1
#> Call:
#> mcfDiff(mcf1 = mcf1, mcf2 = mcf2)
#> ...
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Variance Estimator: robust
plot(mcf12_diff1)

### For estimated MCF from a fitted model,
### see examples given in function rateReg.
R/class.R
+ mcf.rateReg-class.Rd
An S4 class that represents estimated mean cumulative function (MCF) from
Models. The function mcf
produces objects of this class.
call
Function call.
formula
Formula.
spline
A character.
knots
A numeric vector.
degree
A nonnegative integer.
Boundary.knots
A numeric vector.
newdata
A numeric matrix.
MCF
A data frame.
level
A numeric value between 0 and 1.
na.action
A length-one character vector.
control
A list.
multiGroup
A logical value.
An S4 class that represents the difference between two sample mean
cumulative functions from data. The function mcfDiff
produces objects of this class.
call
A function call.
MCF
A data frame.
origin
A named numeric vector.
variance
A character vector.
logConfInt
A logical value.
level
A numeric value.
test
A mcfDiff.test
class object.
This function estimates the sample MCF difference between two groups. Both the point estimates and the confidence intervals are computed (Lawless and Nadeau 1995). The two-sample pseudo-score test proposed by Cook, Lawless, and Nadeau (1996) is also performed by default.
mcfDiff(mcf1, mcf2 = NULL, level = 0.95, ...)
mcfDiff.test(mcf1, mcf2 = NULL, testVariance = c("robust", "Poisson", "none"), ...)
The function mcfDiff
returns a mcfDiff
object (of S4 class)
origin
: Time origins of the two groups.
variance
: The method used for variance estimates.
logConfInt
: A logical value indicating whether normality is
-assumed for log(MCF)
instead of MCF itself. For mcfDiff
+assumed for log(MCF)
instead of MCF itself. For mcfDiff
object, it is always FALSE
.
level
: Confidence level specified.
test
: A mcfDiff.test
object for the hypothesis test
@@ -182,7 +215,6 @@
The function mcfDiff
estimates the two-sample MCFs' difference and
@@ -212,7 +244,6 @@
Lawless, J. F., & Nadeau, C. (1995). Some Simple Robust Methods for the @@ -222,7 +253,6 @@
Cook, R. J., & Lawless, J. (2007). The Statistical Analysis of Recurrent Events. Springer Science & Business Media.
-## See examples given for function mcf. @@ -232,30 +262,32 @@Examp
R/class.R
+ mcfDiff.test-class.Rd
An S4 class that represents the results of the two-sample pseudo-score tests
between two sample mean cumulative functions. The function
mcfDiff.test
produces objects of this class.
This function helps the parametrizations of covariates and covariate
coefficients when users specify a general hazard rate function in the functions
simEvent
and simEventData
. It applies the specified function
@@ -109,11 +145,11 @@
z
and the \(i_{th}\) row of the coefficient matrix,
iteratively, for \(i\) from one to the number of rows of the covariate
matrix z
.
-
parametrize(z, zCoef, FUN = c("exponential", "linear", "excess"), ...)
Other arguments that can be passed to the function |
A numeric vector.
-simEvent
simEvent
## time points
timeVec <- c(0.5, 2)
## time-variant covariates
zMat <- cbind(0.5, ifelse(timeVec > 1, 1, 0))
## time-varying coefficients
zCoefMat <- cbind(sin(timeVec), timeVec)
## the following three ways are equivalent for the exponential form,
## where the first one (using the built-in option) has the best performance
parametrize(zMat, zCoefMat, FUN = "exponential")
#> [1] 1.270884 11.642343
parametrize(zMat, zCoefMat, function(z, zCoef) exp(z %*% zCoef))
#> [1] 1.270884 11.642343
sapply(1 : 2, function(i) as.numeric(exp(zMat[i, ] %*% zCoefMat[i, ])))
#> [1] 1.270884 11.642343
S4 class methods for plotting the sample MCF from data, the estimated MCF, or the estimated baseline hazard rate function from a fitted model, using the ggplot2 plotting system. The returned plots can therefore be further customized with ggplot2 functions.
# S4 method for mcf.formula,missing
plot(x, y, lty, col, legendName, legendLevels,
     conf.int = FALSE, mark.time = FALSE, addOrigin = FALSE, ...)

# S4 method for mcf.rateReg,missing
plot(x, y, conf.int = FALSE, lty, col, ...)

# S4 method for baseRate.rateReg,missing
plot(x, y, conf.int = FALSE, lty, col, ...)

# S4 method for mcfDiff,missing
plot(x, y, lty, col, legendName, legendLevels,
     conf.int = TRUE, addOrigin = FALSE, ...)
Other arguments for further usage.
A ggplot
object.
mcf
for estimation of MCF;
rateReg
for model fitting.
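A small sketch (assuming the bundled simuDat data) of how the returned ggplot object can be customized with ordinary ggplot2 layers:

library(reda)
library(ggplot2)
## sample MCF stratified by treatment group
sampleMcf <- mcf(Recur(time, ID, event) ~ group, data = simuDat)
plot(sampleMcf, conf.int = TRUE, mark.time = TRUE, legendName = "Group") +
    xlab("Days") + theme_bw()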
The class rateReg
is an S4 class that represents a fitted model. The
function rateReg
produces objects of this class. See the Slots section for details.
call
Function call.
formula
Formula.
nObs
A positive integer.
spline
A list.
estimates
A list.
control
A list.
start
A list.
na.action
A character vector (of length one).
xlevels
A list.
contrasts
A list.
convergCode
A nonnegative integer.
logL
A numeric value.
fisher
A numeric matrix.
This function fits recurrent event data (event counts) with a gamma frailty model and a spline rate function. The default model is the gamma frailty model with a one-piece constant baseline rate function, which is equivalent to negative binomial regression with the same shape and rate parameter in the gamma prior. A spline (including piecewise constant) baseline hazard rate function can be specified for the model fitting.
rateReg(formula, data, subset, df = NULL, knots = NULL,
        degree = 0L, na.action, spline = c("bSplines", "mSplines"),
        start = list(), control = list(), contrasts = NULL, ...)
formula
A formula object whose response is a Recur object produced by function Recur.
data
An optional data frame, list or environment containing the variables in the model. If not found in data, the variables are taken from environment(formula).
na.action
A function that indicates what the procedure should do if the data contains NAs. The default is set by the na.action setting of options.
spline
A character value specifying the type of spline bases: "bSplines" or "mSplines".
...
Other arguments for future usage.
A rateReg
object, whose slots include
contrasts
: Contrasts specified and used for each
factor variable.
convergCode
: code returned by function optim, which is an integer indicating why the optimization process terminated. See help(optim) for details.
logL
: Log likelihood of the fitted model.
fisher
: Observed Fisher information matrix.
Function Recur in the formula response by default first checks the dataset and will report an error if the dataset does not fall into the recurrent event data framework. A subject's ID will be pinpointed if its observation violates any checking rule. See Recur for all the checking rules.
Function rateReg first constructs the design matrix from the specified arguments: formula, data, subset, na.action, and contrasts before model fitting.
The model fitting process involves minimization of the negative log-likelihood function, which calls function constrOptim internally. See help(constrOptim) for more details.
The argument start is an optional list that allows users to specify the initial guess for the parameter values for the minimization of the negative log-likelihood function. The argument control is an optional list of control parameters, one logical element of which can be set to FALSE to suppress any possible message from this function.
Fu, H., Luo, J., & Qu, Y. (2016). Hypoglycemic events analysis via recurrent time-to-event (HEART) models. Journal Of Biopharmaceutical Statistics, 26(2), 280--298.
summary,rateReg-method
for summary of fitted model;
coef,rateReg-method
for estimated covariate coefficients;
confint,rateReg-method
for confidence interval of
covariate coefficients;
baseRate,rateReg-method
for estimated coefficients of baseline
rate function;
mcf,rateReg-method
for estimated MCF from a fitted model;
plot,mcf.rateReg-method
for plotting estimated MCF.

library(reda)

## constant rate function
(constFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat))
#> Call:
#> rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat)
#>
#> Coefficients of covariates:
#>  groupTreat          x1
#>
#>  B-spline1
#> 0.03041865
## six pieces' piecewise constant rate function
(piecesFit <- rateReg(Recur(time, ID, event) ~ group + x1,
                      data = simuDat, subset = ID %in% 1:50,
                      knots = seq.int(28, 140, by = 28)))
#> Call:
#> rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
#>     subset = ID %in% 1:50, knots = seq.int(28, 140, by = 28))
#>
#> Coefficients of covariates:
#>  groupTreat          x1
#>  -0.8030986   0.3361229
#>
#> Frailty parameter: 0.6869743
#>
#> Internal knots:
#> 28, 56, 84, 112, 140
#>
#> Coefficients of pieces:
#>  B-spline1  B-spline2  B-spline3  B-spline4  B-spline5  B-spline6
#> 0.03698068 0.03698067 0.02521395 0.04159804 0.04252515 0.06264047

## fit rate function with cubic spline
(splineFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat,
                      knots = c(56, 84, 112), degree = 3))
#> Call:
#> rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
#>     knots = c(56, 84, 112), degree = 3)
#>
#> Coefficients of covariates:
#>
#>  B-spline1  B-spline2  B-spline3  B-spline4  B-spline5  B-spline6  B-spline7
#> 0.01871840 0.03902177 0.02619108 0.01875500 0.03723467 0.06003499 0.03301292
## more specific summary
summary(constFit)
#> Call:
#> rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat)
#>
#> Coefficients of covariates:
#>            coef exp(coef) se(coef) z Pr(>|z|)
#>
#>               coef se(coef)
#> B-spline1 0.030419   0.0057
#>
#> Loglikelihood: -1676.422

summary(piecesFit)
#> Call:
#> rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
#>     subset = ID %in% 1:50, knots = seq.int(28, 140, by = 28))
#>
#> Coefficients of covariates:
#>
#> Parameter of frailty:
#>         parameter        se
#> Frailty 0.6869743 0.1736384
#>
#> Internal knots:
#> 28, 56, 84, 112, 140
#>
#> B-spline3 0.025214 0.0072
#> B-spline4 0.041598 0.0111
#> B-spline5 0.042525 0.0116
#> B-spline6 0.062640 0.0175
#>
#> Loglikelihood: -989.3347

summary(splineFit)
#> Call:
#> rateReg(formula = Recur(time, ID, event) ~ group + x1, data = simuDat,
#>     knots = c(56, 84, 112), degree = 3)
#>
#> Coefficients of covariates:
#>
#> Loglikelihood: -1663.231
## model selection based on AIC or BIC
AIC(constFit, piecesFit, splineFit)
#> Warning: Models are not all fitted to the same number of observations. Consider
#> BIC instead?
#>           df      AIC
#> constFit   4 3360.843
#> piecesFit  9 1996.669
#> splineFit 10 3346.462
BIC(constFit, piecesFit, splineFit)
#>           df      BIC
#> constFit   4 3377.702
#> piecesFit  9 2030.003
#> splineFit 10 3388.608

## estimated covariate coefficients
coef(piecesFit)
#> groupTreat         x1
#> -0.8030986  0.3361229
coef(splineFit)
#> groupTreat         x1
#> -0.6357103  0.3061451

## confidence intervals for covariate coefficients
confint(piecesFit)
#>                  2.5%       97.5%
#> groupTreat -1.5567675 -0.04942976
#> x1         -0.1193827  0.79162861
confint(splineFit, "x1", 0.9)
#>            5%       95%
#> x1 0.03228081 0.5800093
confint(splineFit, 1, 0.975)
#>                1.25%      98.75%
#> groupTreat -1.275143 0.003722945

## estimated baseline rate function
splinesBase <- baseRate(splineFit)
plot(splinesBase, conf.int = TRUE)

## estimated baseline mean cumulative function (MCF) from a fitted model
piecesMcf <- mcf(piecesFit)
plot(piecesMcf, conf.int = TRUE, col = "blueviolet")

## estimated MCF for given new data
newDat <- data.frame(x1 = rep(0, 2), group = c("Treat", "Contr"))
splineMcf <- mcf(splineFit, newdata = newDat, groupName = "Group",
                 groupLevels = c("Treatment", "Control"))
plot(splineMcf, conf.int = TRUE, lty = c(1, 5))

## example of further customization by ggplot2
library(ggplot2)
plot(splineMcf) +
    geom_ribbon(aes(x = time, ymin = lower,
                    ymax = upper, fill = Group),
                data = splineMcf@MCF, alpha = 0.2) +
    xlab("Days")
The R package reda provides functions for simulating, exploring and modeling recurrent event data.
The main functions are summarized as follows:
rateReg
: Fitting gamma frailty model with spline baseline rate function.
See the package vignettes for more introduction and demonstration.
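A short sketch of a typical workflow with the bundled simuDat data (the calls mirror the examples elsewhere on this site):

library(reda)
## explore: nonparametric sample MCF by treatment group
sampleMcf <- mcf(Recur(time, ID, event) ~ group, data = simuDat)
plot(sampleMcf, conf.int = TRUE)
## model: gamma frailty model with a piecewise constant baseline rate function
fit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat,
               knots = seq.int(28, 140, by = 28))
summary(fit)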
S4 class methods that display objects produced from this package (similar to
S3 class print
methods).
# S4 method for Recur
show(object)

# S4 method for rateReg
show(object)

# S4 method for summary.rateReg
show(object)

# S4 method for mcfDiff.test
show(object)

object
An object used to dispatch a method.
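A brief sketch: auto-printing an object at the console dispatches the corresponding show() method, so the explicit call is rarely needed.

library(reda)
fit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat)
fit          # implicit printing dispatches show,rateReg-method
show(fit)    # equivalent explicit call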
An S4 class that represents the simulated recurrent event or survival time
from one stochastic process. The function simEvent
produces
objects of this class.
.Data
A numerical vector, possibly of length zero.
call
A function call.
z
A list.
zCoef
A list.
rho
A list.
rhoCoef
A numerical vector.
frailty
A list.
origin
A list.
endTime
A list.
censoring
A list.
recurrent
A logical vector.
interarrival
A list.
relativeRisk
A list.
method
A character vector.
The function simEvent
generates simulated recurrent events or
survival time (the first event time) from one stochastic process. The
function simEventData
provides a simple wrapper that calls
simEvent
internally and collects the generated survival data or
recurrent events into a data frame. More examples are available in one of
the package vignettes in addition to the function documentation.
simEvent(z = 0, zCoef = 1, rho = 1, rhoCoef = 1, origin = 0, endTime = 3,
         frailty = 1, recurrent = TRUE, interarrival = "rexp",
         relativeRisk = c("exponential", "linear", "excess", "none"),
         method = c("thinning", "inversion"), arguments = list(), ...)

simEventData(nProcess, z = 0, origin = 0, endTime = 3, frailty = 1, ...)
interarrival
A function object for randomly generating (positive) interarrival times between two successive arrivals/events. The default value is "rexp". A positive number should be specified.
The function simEvent
returns a simEvent
S4 class object and
the function simEventData
returns a data.frame
.
For each process, a time-invariant or time-varying (baseline) hazard rate function can be specified through the argument rho.
simEvent
similar to simEventData
.
Andersen, P. K., & Gill, R. D. (1982). Cox's Regression Model for Counting Processes: A Large Sample Study. The Annals of Statistics, 10(4), 1100--1120.
library(reda)
set.seed(123)

### time-invariant covariates and coefficients
## one process
simEvent(z = c(0.5, 1), zCoef = c(1, 0))
#> 'simEvent' S4 class object:
#> [1] 0.5115827 0.8613145 1.6674270 1.6865797 1.7206733 1.9126410 2.1032295
#> [8] 2.1913383
simEvent(z = 1, zCoef = 0.5, recurrent = FALSE)
#> 'simEvent' S4 class object:
#> [1] 0.2903829

## simulated data
simEventData(z = c(0.5, 1), zCoef = c(1, 0), endTime = 2)
#>   ID      time event origin X.1 X.2
#> 1  1 0.4591666     1      0 0.5   1
#> 2  1 0.4784347     1      0 0.5   1
#> 3  1 0.8410489     1      0 0.5   1
#> 4  1 2.0000000     0      0 0.5   1
simEventData(z = cbind(rnorm(3), 1), zCoef = c(1, 0))
#>    ID      time event origin       X.1 X.2
#> 1   1 0.2079348     1      0 0.8951257   1
#> 2   1 0.3135230     1      0 0.8951257   1
#> 3   1 0.4852657     1      0 0.8951257   1
#> 20  3 1.7458978     1      0 0.8215811   1
#> 21  3 2.1522257     1      0 0.8215811   1
#> 22  3 2.8744498     1      0 0.8215811   1
#> 23  3 3.0000000     0      0 0.8215811   1

simEventData(z = matrix(rnorm(5)), zCoef = 0.5, recurrent = FALSE)
#>   ID      time event origin          X
#> 1  1 0.8635377     1      0 -0.2288958
#> 2  2 2.9133870     1      0 -0.9007918
#> 3  3 1.0878381     1      0 -0.7350262

### time-varying covariates and time-varying coefficients
zFun <- function(time, intercept) {
    cbind(time / 10 + intercept, as.numeric(time > 1))
}
zCoefFun <- function(x, shift) {
    cbind(sqrt(x + shift), 1)
}
simEvent(z = zFun, zCoef = zCoefFun,
         arguments = list(z = list(intercept = 0.1),
                          zCoef = list(shift = 0.1)))
#> 'simEvent' S4 class object:
#>  [1] 0.01703579 0.69742521 1.25021917 1.26152287 1.36531462 1.68015169
#>  [7] 1.91858209 2.15412550 2.23913149 2.24511429 2.27295262 2.46185550
#> [13] 2.65875740

## same function of time for all processes
simEventData(3, z = zFun, zCoef = zCoefFun,
             arguments = list(z = list(intercept = 0.1),
                              zCoef = list(shift = 0.1)))
#>   ID      time event origin       X.1 X.2
#> 1  1 1.4109295     1      0 0.2410929   1
#> 2  1 1.5959107     1      0 0.2595911   1
#> 3  1 1.6432538     1      0 0.2643254   1

## same function within one process but different between processes
## use quote function in the arguments
simDat <- simEventData(3, z = zFun, zCoef = zCoefFun,
                       arguments = list(
                           z = list(intercept = quote(rnorm(1) / 10)),
                           zCoef = list(shift = 0.1)
                       ))
## check the intercept randomly generated,
## which should be the same within each ID but different between IDs.
unique(with(simDat, cbind(ID, intercept = round(X.1 - time / 10, 6))))
#>      ID intercept
#> [1,]  1  0.087936
#> [2,]  2  0.046494
#> [3,]  3  0.131681

### non-negative time-varying baseline hazard rate function
simEvent(rho = function(timeVec) { sin(timeVec) + 1 })
#> 'simEvent' S4 class object:
#> [1] 1.336428 1.655383 2.337511 2.431350 2.887220
simEventData(3, origin = rnorm(3), endTime = rnorm(3, 5),
             rho = function(timeVec) { sin(timeVec) + 1 })
#>    ID      time event      origin X
#> 1   1 1.8825263     1  0.00729009 0
#> 2   1 1.9038402     1  0.00729009 0
#> 3   1 3.1337841     1  0.00729009 0
#> 18  3 2.4938328     1 -1.18843404 0
#> 19  3 2.7181862     1 -1.18843404 0
#> 20  3 5.3773880     0 -1.18843404 0
## specify other arguments
simEvent(z = c(rnorm(1), rbinom(1, 1, 0.5)) / 10,
         rho = function(a, b) { sin(a + b) + 1 },
         arguments = list(rho = list(b = 0.5)))
#> 'simEvent' S4 class object:
#> [1] 0.3575603 1.0572763 1.6624542 1.8274776 2.1770491
simEventData(z = cbind(rnorm(3), rbinom(3, 1, 0.5)) / 10,
             rho = function(a, b) { sin(a + b) + 1 },
             arguments = list(rho = list(b = 0.5)))
#>    ID      time event origin         X.1 X.2
#> 1   1 0.3168247     1      0  0.08842508 0.1
#> 2   1 0.6545106     1      0  0.08842508 0.1
#> 3   1 1.1262112     1      0  0.08842508 0.1
#> 20  3 3.0000000     0      0 -0.05735603 0.0
## quadratic B-splines with one internal knot at "time = 1"
## (using function 'bSpline' from splines2 package)
simEvent(rho = splines2::bSpline, rhoCoef = c(0.8, 0.5, 1, 0.6),
         arguments = list(rho = list(degree = 2, knots = 1,
                                     intercept = TRUE,
                                     Boundary.knots = c(0, 3))))
#> 'simEvent' S4 class object:
#> [1] 0.09089787 0.21678348 1.06873299 1.96199882

### frailty effect
## Gamma distribution with mean one
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = rgamma,
         arguments = list(frailty = list(shape = 2, scale = 0.5)))
#> 'simEvent' S4 class object:
#> [1] 0.3961255 0.5721497 0.9267213 0.9505554 1.5040362 1.5561853

## lognormal with mean zero (on the log scale)
set.seed(123)
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = "rlnorm",
         arguments = list(frailty = list(sdlog = 1)))
#> 'simEvent' S4 class object:
#> [1] 1.411910 1.445456 1.505172 1.841404 2.175221 2.329544
## or equivalently
set.seed(123)
logNorm <- function(a) exp(rnorm(n = 1, mean = 0, sd = a))
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = logNorm,
         arguments = list(frailty = list(a = 1)))
#> 'simEvent' S4 class object:
#> [1] 1.411910 1.445456 1.505172 1.841404 2.175221 2.329544

### renewal process
## interarrival times following uniform distribution
rUnif <- function(n, rate, min) runif(n, min, max = 2 / rate)
simEvent(interarrival = rUnif,
         arguments = list(interarrival = list(min = 0)))
#> 'simEvent' S4 class object:
#> [1] 1.311412 2.728473

## interarrival times following Gamma distribution with scale one
set.seed(123)
simEvent(interarrival = function(n, rate) rgamma(n, shape = 1 / rate))
#> 'simEvent' S4 class object:
#> [1] 0.1822171 1.8779682
## or equivalently
set.seed(123)
simEvent(interarrival = function(rate) rgamma(n = 1, shape = 1 / rate))
#> 'simEvent' S4 class object:
#> [1] 0.1822171 1.8779682

#> 'simEvent' S4 class object:
#> [1] 0.8434573 1.4200675 2.7491224 2.7806998 2.8369107
## or equivalently
rriskFun <- function(z, zCoef, intercept) {
    as.numeric(z %*% zCoef) + intercept
}
set.seed(123)
simEvent(relativeRisk = rriskFun,
         arguments = list(relativeRisk = list(intercept = 1)))
#> 'simEvent' S4 class object:
#> [1] 0.8434573 1.4200675 2.7491224 2.7806998 2.8369107
A simulated data frame with covariates named
ID
, time
, event
, group
, x1
,
and gender
, where
x1
: Continuous variable.
gender
: Gender of subjects.
A data frame with 500 rows and 6 variables.
The sample dataset was originally simulated by the thinning method developed by Lewis and Shedler (1979) and further processed for demonstration purposes. See also Fu et al. (2016) for details.
Lewis, P. A., & Shedler, G. S. (1979). Simulation of Nonhomogeneous Poisson Processes by Thinning. Naval Research Logistics Quarterly, 26(3), 403--413.
Fu, H., Luo, J., & Qu, Y. (2016). Hypoglycemic events analysis via recurrent time-to-event (HEART) models. Journal Of Biopharmaceutical Statistics, 26(2), 280--298.
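A quick look at the data (a sketch, not part of the shipped examples):

library(reda)
data(simuDat, package = "reda")
str(simuDat)
## nonparametric sample MCF stratified by treatment group
plot(mcf(Recur(time, ID, event) ~ group, data = simuDat), conf.int = TRUE)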
Summary of estimated coefficients of covariates, rate function bases, and estimated rate parameter of frailty random variable, etc.
# S4 method for rateReg
summary(object, showCall = TRUE, showKnots = TRUE, ...)
Other arguments for future usage.
The function summary,rateReg-method returns a summary.rateReg object, whose slots include
baseRateCoef
: Estimated coefficients of baseline rate function.
For the meaning of other slots, see rateReg
.
rateReg
for model fitting;
coef,rateReg-method
for point estimates of
covariate coefficients;
confint,rateReg-method
for confidence intervals
of covariate coeffcients;
baseRate,rateReg-method
for coefficients of baseline
rate function.
The class summary.rateReg
is an S4 class with selected slots of a rateReg object. See the Slots section for details. The function
summary,rateReg-method
produces objects of this class.
call
Function call.
spline
A character.
knots
A numeric vector.
Boundary.knots
A numeric vector.
covarCoef
A numeric matrix.
frailtyPar
A numeric matrix.
degree
A nonnegative integer.
baseRateCoef
A numeric matrix.
logL
A numeric value.
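A sketch of pulling individual slots (listed above) out of a summary, assuming a fitted model such as the constFit object from the rateReg examples:

library(reda)
constFit <- rateReg(Recur(time, ID, event) ~ group + x1, data = simuDat)
constSummary <- summary(constFit, showCall = FALSE)
constSummary@covarCoef     # estimated covariate coefficients
constSummary@frailtyPar    # estimated frailty parameter
constSummary@logL          # log-likelihood of the fitted model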
Valve seats wear out in certain diesel engines, each of which has 16 valve seats. The dataset, which served as an example of recurrence data in Nelson (1995), consists of valve-seat replacements on 41 engines in a fleet.
No.
: Event indicator, '1' for a valve-seat replacement and '0' for the censoring age of an engine.
A data frame with 89 rows and 3 variables.
Nelson, W. (1995). Confidence Limits for Recurrence Data-Applied to Cost or Number of Product Repairs. Technometrics, 37, 147--157.
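A sketch of exploring the data (the engine identifier and age columns are assumed to be named ID and Days; only No. is documented above):

library(reda)
data(valveSeats, package = "reda")
head(valveSeats)
## nonparametric MCF of valve-seat replacements
valveMcf <- mcf(Recur(Days, ID, No.) ~ 1, data = valveSeats)
plot(valveMcf, conf.int = TRUE, mark.time = TRUE)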