-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
176 lines (138 loc) · 4.89 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
---
output: github_document
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = TRUE,
echo = TRUE,
message = TRUE,
warning = TRUE,
fig.width = 8,
fig.height = 6,
dpi = 200,
fig.align = "center",
fig.path = "man/figures/README-"
)
knitr::opts_chunk$set()
library(magrittr)
library(cohortBuilder)
set.seed(123)
options(tibble.width = Inf)
iris <- tibble::as.tibble(iris)
options("tibble.print_max" = 5)
options("tibble.print_min" = 5)
pkg_version <- read.dcf("DESCRIPTION", fields = "Version")[1, 1]
```
# cohortBuilder <img src="man/figures/logo.png" align="right" width="120" />
[![version](https://img.shields.io/static/v1.svg?label=github.com&message=v.`r I(pkg_version)`&color=ff69b4)](https://r-world-devs.github.io/cohortBuilder/)
[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
## Overview
`cohortBuilder` provides common API for creating cohorts on multiple data sources,
such as local data frame, database schema or external data api.
With only two steps:
1. Configuring data source with `set_source`.
2. Initializing cohort with `cohort`.
You can operate on data using common methods, such as:
- `filter` - to define and `run` to apply filtering rules,
- `step` - to perform multi-stage filtering,
- `get_data`, `stat`, `attrition`, `plot_data` - to extract, sum up or visualize your cohort data.
With `cohortBuilder` you can share the cohort easier with useful methods:
- `code` - to get reproducible cohort creation code,
- `get_state` - to get cohort state (e.g. in JSON) that can be then easily restored with `restore`.
Or modify the cohort configuration with:
- `add_filter`, `rm_filter`, `update_filter` - to manage filters definition
- `add_step`, `rm_step` - to manage filtering steps,
- `update_source` - to manage the cohort source.
## Data sources and extensions
The goal of `cohortBuilder` is to provide common API for creating data cohorts,
but also to be easily extendable for working on different data sources (and interactive dashboards).
`cohortBuilder` allows to operate on local data frames (or list of data frames),
yet you may easily switch to a database source by loading `cohortBuilder.db` layer.
As a standalone R package, you use `cohortBuilder` to perform all the operations in non-interactive
R script, but its shiny layer `shinyCohortBuilder` package helps you to easily switch to intuitive gui mode.
More to that you may integrate `cohortBuilder` with your custom Shiny application.
If you want to learn how to write custom source extension, please check `vignette("custom-extensions")`.
## Installation
```{r, eval = FALSE}
# CRAN version
install.packages("cohortBuilder")
# Latest development version
remotes::install_github("https://github.com/r-world-devs/cohortBuilder")
```
## Usage
```{r}
librarian_source <- set_source(
as.tblist(librarian)
)
coh <- librarian_source %>%
cohort(
filter(
"discrete", id = "author", dataset = "books",
variable = "author", value = "Dan Brown"
),
filter(
"range", id = "copies", dataset = "books",
variable = "copies", range = c(5, 10)
),
filter(
"date_range", id = "registered", dataset = "borrowers",
variable = "registered", range = c(as.Date("2010-01-01"), Inf)
)
) %>%
run()
get_data(coh)
```
```{r}
coh <- librarian_source %>%
cohort() %->%
step(
filter(
"discrete", id = "author", dataset = "books",
variable = "author", value = "Dan Brown"
),
filter(
"date_range", id = "registered", dataset = "borrowers",
variable = "registered", range = c(as.Date("2010-01-01"), Inf)
)
) %->%
step(
filter(
"range", id = "copies", dataset = "books",
variable = "copies", range = c(5, 10)
)
) %>%
run()
```
```{r}
get_data(coh, step_id = 1)
```
```{r}
get_data(coh, step_id = 2)
```
```{r}
update_filter(
coh, step_id = 1, filter_id = "author",
range = c(5, 6)
)
run(coh)
get_data(coh, step_id = 2)
```
```{r}
code(coh)
```
```{r}
attrition(coh, dataset = "books")
```
```{r}
get_state(coh, json = TRUE)
```
## Acknowledgement
Special thanks to:
- [Kamil Wais](mailto:kamil.wais@gmail.com) for highlighting the need for the package and its relevance to real-world applications.
- [Adam Foryś](mailto:adam.forys@gmail.com) for technical support, numerous suggestions for the current and future implementation of the package.
- [Paweł Kawski](mailto:pawel.kawski@gmail.com) for indication of initial assumptions about the package based on real-world medical data.
## Getting help
In a case you found any bugs, have feature request or general question please file an issue at the package [Github](https://github.com/r-world-devs/cohortBuilder/issues).
You may also contact the package author directly via email at [krystian8207@gmail.com](mailto:krystian8207@gmail.com).