Skip to content

Commit

Permalink
Closes #73 (#86)
Browse files Browse the repository at this point in the history
* #73 initial setup of data and imputation post

* #73 date functions and imputations blog post

* #73 updates following review

* chore: #73 date stamp

---------

Co-authored-by: Ben Straub <ben.x.straub@gsk.com>
  • Loading branch information
manciniedoardo and bms63 authored Sep 26, 2023
1 parent 0d30076 commit 8d7f46f
Show file tree
Hide file tree
Showing 4 changed files with 358 additions and 1 deletion.
32 changes: 31 additions & 1 deletion inst/WORDLIST.txt
Original file line number Diff line number Diff line change
Expand Up @@ -336,6 +336,37 @@ siloed
stefanthoma
useR
admiralroche
ADTM
ADY
ae
AEENDTC
AEN
AESTDTC
args
AST
ASTDTM
datetime
Datetime
DCUTDT
dt
dtc
DTC
dtf
DTHDT
dtm
hms
lubridate
mh
MHSTDTC
mmThh
ss
tmf
TRTSDT
TRTSDTM
VSDTC
VSTPT
ymd
yyyy
AENDT
AENDY
ASTDTM
Expand All @@ -350,7 +381,6 @@ DTM
dy
lubridate
TRTSDTM
ymd
Ari
eclinical
Knoph
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1,327 @@
---
title: "Date/Time Functions and Imputation in {admiral} "
author:
- name: Edoardo Mancini
description: "Dates, times and imputation can be a frustrating facet of any programming language. Here's how {admiral} makes all of this easy!"
date: "2023-09-26"
# please do not use any non-default categories.
# You can find the default categories in the repository README.md
categories: [admiral, ADaM, Date/Time]
# feel free to change the image
image: "admiral.png"

---

<!--------------- typical setup ----------------->

```{r setup, include=FALSE}
long_slug <- "2023-08-21_date_functions_and_imputation"
# renv::use(lockfile = "renv.lock")
library(admiraldev)
```

<!--------------- post begins here ----------------->

# Introduction

Date and time is collected in SDTM as character values using the extended [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format `yyyy-dd-mmThh:mm:ss`. This universal format allows missing parts date or time - e.g. the string`"2019-10"` represents a date where the day and the time are unknown. In contrast, ADaM timing variables like `ADTM` (Analysis Datetime) or `ADY` (Analysis Relative Day) are numeric variables, which can be derived only if the date or datetime is complete.

Most ADaM programmers have, at one point or another, encountered situations where missing dates, unexpected formats or confusing imputation functions rendered derivations of timing variables frustrating and time consuming. `{admiral}` aims to mitigate this (where possible!) by providing functions which automatically derive date/datetime variables for you, and fill in missing date or time parts according to well-defined imputation rules.

In this article, we first examine the arsenal of functions provided by`{admiral}` to aid in datetime imputation and timing variable derivation. We then observe everything in action through a number of selected typical examples.

# Date/Datetime Derivation and Imputation Functions

`{admiral}` provides the following functions for date/datetime imputation:

- Derivations for adding variables
- [derive_vars_dt()](https://pharmaverse.github.io/admiral/cran-release/reference/derive_vars_dt.html): Adds a date variable and a date imputation flag variable (optional) based on a --DTC variable and imputation rules.
- [derive_vars_dtm()](https://pharmaverse.github.io/admiral/cran-release/reference/derive_vars_dtm.html): Adds a datetime variable, a date imputation flag variable, and a time imputation flag variable (both optional) based on a --DTC variable and imputation rules.
- Computation functions
- [impute_dtc_dtm()](https://pharmaverse.github.io/admiral/cran-release/reference/impute_dtc_dtm.html): Returns a complete ISO 8601 datetime or `NA` based on a partial ISO 8601 datetime and imputation rules.
- [impute_dtc_dt()](https://pharmaverse.github.io/admiral/cran-release/reference/impute_dtc_dt.html): Returns a complete ISO 8601 date (without time) or `NA` based on a partial ISO 8601 date(time) and imputation rules.
- [convert_dtc_to_dt()](https://pharmaverse.github.io/admiral/cran-release/reference/convert_dtc_to_dt.html): Returns a date if the input ISO 8601 date is complete. Otherwise, `NA` is returned.
- [convert_dtc_to_dtm()](https://pharmaverse.github.io/admiral/cran-release/reference/convert_dtc_to_dtm.html): Returns a datetime if the input ISO 8601 date is complete (with missing time replaced by `"00:00:00"` as default). Otherwise, NA is returned.
- [compute_dtf()](https://pharmaverse.github.io/admiral/cran-release/reference/compute_dtf.html): Returns the date imputation flag.
- [compute_tmf()](https://pharmaverse.github.io/admiral/cran-release/reference/compute_tmf.html): Returns the time imputation flag.

From the point of view of a typical ADaM programmer, the functions `impute_*`, `convert_*` and `compute_*` above can be viewed as utilities for treating dates and/or imputation within any custom code. In contrast, their `derive_*` find their use in directly deriving new timing variables and/or carrying out imputation at an ADaM dataset scale.

For a detailed look at the Imputation rules applied by these `{admiral}` functions, please visit [this vignette](https://pharmaverse.github.io/admiral/cran-release/articles/imputation.html#imputation-rules) on the documentation website.

# Simple Examples with Vectors

In the examples below, one can observe how some members of the class of utilities `impute_*()` and `convert_*()` can be employed to do the date-related heavy lifting.

## Imputing a Partial Date Portion

It is easy impute dates to the first day/month if they are partial just by using the `highest_imputation` argument:

```{r warning = FALSE, message = FALSE}
library(admiral)
library(lubridate)
library(tibble)
library(dplyr, warn.conflicts = FALSE)
dates <- c(
"2019-07-18T15:25:40",
"2019-07-18T15:25",
"2019-07-18T15",
"2019-07-18",
"2019-02",
"2019",
"2019",
"2019---07",
""
)
impute_dtc_dt(
dtc = dates,
highest_imputation = "M"
)
```

A simple modification using `date_imputation = "mid"` or `date_imputation = "last"` or enables the imputation to be made using the middle or last day/month instead:

```{r}
# Impute to last day/month if date is partial
impute_dtc_dt(
dtc = dates,
highest_imputation = "M",
date_imputation = "last",
)
# Impute to mid day/month if date is partial
impute_dtc_dt(
dtc = dates,
highest_imputation = "M",
date_imputation = "mid"
)
```

But what if there exist minimum dates that the imputed date cannot exceed? Here, the `min_date` argument comes to the rescue:

```{r}
impute_dtc_dt(
"2020-12",
min_dates = list(
ymd("2020-12-06"),
ymd("2020-11-11")
),
highest_imputation = "M"
)
```

## Computing Date Imputation Flags

When it comes to carrying out an imputation, the twin task is to flag the type of imputation that was executed. Here, functions like `compute_dtf()` make this straightforward. For this function, all that needs to be done is to pass a date character date to the `dtc` argument, and the resulting imputed date to the `dt` argument. This will then return the right date imputation flag - see the examples below for some possible behaviors:

```{r}
compute_dtf(dtc = "2019-07", dt = as.Date("2019-07-18"))
compute_dtf(dtc = "2019", dt = as.Date("2019-07-18"))
compute_dtf(dtc = "--06-01T00:00", dt = as.Date("2022-06-01"))
```


# Action Examples

The `derive_*()` functions are essentially wrappers around the aforementioned `impute_*()` and `compute_*()` functions. In the following section, we explore examples where ADaM variables can be derived using this class of functions.

## Creating an Imputed Datetime and Date Variable and Imputation Flag Variables

As described previously, `derive_vars_dtm()` derives an imputed datetime variable and the corresponding date and time imputation flags. The imputed date variable can then be derived by using `derive_vars_dtm_to_dt()`. It is not necessary and advisable to perform the imputation for the date variable if it was already done for the datetime variable. CDISC considers the datetime and the date variable as two representations of the same date. Thus the imputation must be the same and the imputation flags are valid for both the datetime and the date variable.

```{r}
ae <- tribble(
~AESTDTC,
"2019-08-09T12:34:56",
"2019-04-12",
"2010-09",
NA_character_
) %>%
derive_vars_dtm(
dtc = AESTDTC,
new_vars_prefix = "AST",
highest_imputation = "M",
date_imputation = "first",
time_imputation = "first"
) %>%
derive_vars_dtm_to_dt(exprs(ASTDTM))
```
```{r, echo=FALSE}
dataset_vignette(ae)
```

## Creating an Imputed Date Variable and Imputation Flag Variable

If an imputed date variable without a corresponding datetime variable is required, it can be derived by the `derive_vars_dt()` function.

```{r}
ae <- tribble(
~AESTDTC,
"2019-08-09T12:34:56",
"2019-04-12",
"2010-09",
NA_character_
) %>%
derive_vars_dt(
dtc = AESTDTC,
new_vars_prefix = "AST",
highest_imputation = "M",
date_imputation = "first"
)
```
```{r, echo=FALSE}
dataset_vignette(ae)
```

## Imputing Time Without Imputing Date

If the time should be imputed but not the date, the `highest_imputation` argument should be set to `"h"`. This results in `NA` if the date is partial. As no date is imputed the date imputation flag is not created.

```{r}
ae <- tribble(
~AESTDTC,
"2019-08-09T12:34:56",
"2019-04-12",
"2010-09",
NA_character_
) %>%
derive_vars_dtm(
dtc = AESTDTC,
new_vars_prefix = "AST",
highest_imputation = "h",
time_imputation = "first"
)
```
```{r, echo=FALSE}
dataset_vignette(ae)
```

## Avoiding Imputed Dates Before a Particular Date
Usually an adverse event start date is imputed as the earliest date of all possible dates when filling the missing parts. The result may be a date before treatment start date. This is not desirable because the adverse event would not be considered as treatment emergent and excluded from the adverse event summaries. This can be avoided by specifying the treatment start date variable (`TRTSDTM`) for the `min_dates` argument.

Importantly, `TRTSDTM` is used as imputed date only if the non missing date and time parts of `AESTDTC` coincide with those of `TRTSDTM`. Therefore `2019-10` is not imputed as `2019-11-11 12:34:56`. This ensures that collected information is not changed by the imputation.

```{r}
ae <- tribble(
~AESTDTC, ~TRTSDTM,
"2019-08-09T12:34:56", ymd_hms("2019-11-11T12:34:56"),
"2019-10", ymd_hms("2019-11-11T12:34:56"),
"2019-11", ymd_hms("2019-11-11T12:34:56"),
"2019-12-04", ymd_hms("2019-11-11T12:34:56")
) %>%
derive_vars_dtm(
dtc = AESTDTC,
new_vars_prefix = "AST",
highest_imputation = "M",
date_imputation = "first",
time_imputation = "first",
min_dates = exprs(TRTSDTM)
)
```
```{r, echo=FALSE}
dataset_vignette(ae)
```

## Avoiding Imputed Dates After a Particular Date

If a date is imputed as the latest date of all possible dates when filling the missing parts, it should not result in dates after data cut off or death. This can be achieved by specifying the dates for the `max_dates` argument.

Importantly, non missing date parts are not changed. Thus `2019-12-04` is imputed as `2019-12-04 23:59:59` although it is after the data cut off date. It may make sense to replace it by the data cut off date but this is not part of the imputation. It should be done in a separate data cleaning or data cut off step.
```{r}
ae <- tribble(
~AEENDTC, ~DTHDT, ~DCUTDT,
"2019-08-09T12:34:56", ymd("2019-11-11"), ymd("2019-12-02"),
"2019-11", ymd("2019-11-11"), ymd("2019-12-02"),
"2019-12", NA, ymd("2019-12-02"),
"2019-12-04", NA, ymd("2019-12-02")
) %>%
derive_vars_dtm(
dtc = AEENDTC,
new_vars_prefix = "AEN",
highest_imputation = "M",
date_imputation = "last",
time_imputation = "last",
max_dates = exprs(DTHDT, DCUTDT)
)
```
```{r, echo=FALSE}
dataset_vignette(ae)
```

## Imputation Without Creating a New Variable

If imputation is required without creating a new variable the `convert_dtc_to_dt()` function can be called to obtain a vector of imputed dates. It can be used for example here:

```{r}
mh <- tribble(
~MHSTDTC, ~TRTSDT,
"2019-04", ymd("2019-04-15"),
"2019-04-01", ymd("2019-04-15"),
"2019-05", ymd("2019-04-15"),
"2019-06-21", ymd("2019-04-15")
) %>%
filter(
convert_dtc_to_dt(
MHSTDTC,
highest_imputation = "M",
date_imputation = "first"
) < TRTSDT
)
```
```{r, echo=FALSE}
dataset_vignette(mh)
```

## Using More Than One Imputation Rule for a Variable

Using different imputation rules depending on the observation can be done by using the higher-order function `slice_derivation()`, which applies a derivation function differently (by varying its arguments) in different subsections of a dataset. For example, consider this Vital Signs case where pre-dose records require a different treatment to other records:

```{r}
vs <- tribble(
~VSDTC, ~VSTPT,
"2019-08-09T12:34:56", NA,
"2019-10-12", "PRE-DOSE",
"2019-11-10", NA,
"2019-12-04", NA
) %>%
slice_derivation(
derivation = derive_vars_dtm,
args = params(
dtc = VSDTC,
new_vars_prefix = "A"
),
derivation_slice(
filter = VSTPT == "PRE-DOSE",
args = params(time_imputation = "first")
),
derivation_slice(
filter = TRUE,
args = params(time_imputation = "last")
)
)
```
```{r, echo=FALSE}
dataset_vignette(vs)
```

# Conclusion

Deriving timing variables and carrying out imputations is tricky at the best of times, but hopefully this blog post can shed some light on how make this all easier using the `{admiral}` package! As `{admiral}` developers we are always interested in knowing how users are employing the package for their ADaM needs, so if you have any comments or feedback related to this topic, don't be afraid to leave a comment on our [Slack channel](https://app.slack.com/client/T028PB489D3/C02M8KN8269) or on the [Github repository](https://github.com/pharmaverse/admiral/), either as an issue or as a discussion.

For an even more detailed treatment of this topic, users are once again invited to read the corresponding [vignette](https://pharmaverse.github.io/admiral/cran-release/articles/imputation.html) on the documentation website, from which this article was adapted.

<!--------------- appendices go here ----------------->

```{r, echo=FALSE}
#| eval: false
source("appendix.R")
insert_appendix(
repo_spec = "pharmaverse/blog",
name = long_slug
)
```

0 comments on commit 8d7f46f

Please sign in to comment.