---
title: ""
output: github_document
---
```{r, setup, include = FALSE}
knitr::opts_chunk$set(
  comment = '.', message = FALSE, warning = FALSE,
  fig.path = "man/figures/README-"
)
```
# mrgsim.parallel
<!-- badges: start -->
<!-- badges: end -->
## Overview
mrgsim.parallel facilitates parallel simulation with
mrgsolve in R. The future and parallel packages provide the parallelization.
There are two main workflows:
1. Split a `data_set` into chunks by ID, simulate the chunks in parallel, then
assemble the results back into a single data frame.
1. Split an `idata_set` (individual-level parameters) into chunks by row,
simulate the chunks in parallel, then assemble the results back into a single
data frame.
The parallel backend requires some overhead to get the simulation done, so it
takes a reasonably sized job to see a speed increase; small jobs will likely
take *longer* with parallelization. Jobs taking more than a handful of seconds,
though, could benefit from this type of parallelization (see the dry-run
check below for a way to measure the overhead).
```{r,include = FALSE}
options(mrgsolve.soloc = "build")
```
## Backend
```{r}
library(dplyr)
library(future)
library(mrgsim.parallel)
# Allow forked R processes and set the default number of cores
options(future.fork.enable = TRUE, parallelly.fork.enable = TRUE, mc.cores = 4L)
```
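Note that forked processing is not available on all platforms (e.g., Windows).
In that case, a multisession plan, which launches separate R sessions rather
than forked processes, can be used instead:
```{r, eval = FALSE}
# Forking is unavailable on Windows; a multisession plan works everywhere
plan(multisession, workers = 4L)
```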
## First workflow: split and simulate a data set
```{r}
# Two-compartment model from the model library; simulate 8 weeks hourly
mod <- modlib("pk2cmt", end = 168*8, delta = 1)
# A data set of 2000 subjects, each dosed every 24 hours
data <- expand.ev(amt = 100*seq(1,2000), ii = 24, addl = 27*2+2)
data <- mutate(data, CL = runif(n(), 0.7, 1.3))
head(data)
dim(data)
```
We can simulate in parallel with the future package or the parallel package like this:
```{r}
# future backend: separate R sessions
plan(multisession, workers = 4L)
system.time(ans1 <- future_mrgsim_d(mod, data, nchunk = 4L))
# future backend: forked R processes
plan(multicore, workers = 4L)
system.time(ans1b <- future_mrgsim_d(mod, data, nchunk = 4L))
# the parallel package
system.time(ans2 <- mc_mrgsim_d(mod, data, nchunk = 4L))
```
For comparison, run the identical simulation without parallelization:
```{r}
system.time(ans3 <- mrgsim_d(mod,data))
```
```{r}
identical(ans2,as.data.frame(ans3))
```
## Second workflow: split and simulate a batch of parameters
Set up the backend and the model:
```{r}
plan(multisession, workers = 6)
mod <- modlib("pk1cmt", end = 168*4, delta = 1)
```
For this workflow, we have a set of parameters (`idata`) along with an
event object that is applied to each parameter set:
```{r}
idata <- tibble(CL = runif(4000, 0.5, 1.5), ID = seq_along(CL))
head(idata)
```
```{r}
dose <- ev(amt = 100, ii = 24, addl = 27)
dose
```
Run it in parallel
```{r}
system.time(ans1 <- mc_mrgsim_ei(mod, dose, idata, nchunk = 6))
```
And without parallelization
```{r}
system.time(ans2 <- mrgsim_ei(mod, dose, idata, output = "df"))
identical(ans1,ans2)
```
## Utility functions
You can use the chunking functions in your own parallel workflows:
```{r}
dose <- ev_seq(ev(amt = 100), ev(amt = 50, ii = 12, addl = 2))
dose <- ev_rep(dose, 1:5)
dose
chunk_by_id(dose, nchunk = 2)
```
See also: `chunk_by_row()`, which chunks a data set by row rather than by `ID`.
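For example, a hand-rolled version of the first workflow could look like the
following (a minimal sketch; the package's own `future_mrgsim_d()` and
`mc_mrgsim_d()` handle this for you):
```{r, eval = FALSE}
# Split the data set by ID, simulate each chunk in a forked process,
# then bind the results back into a single data frame
chunks <- chunk_by_id(data, nchunk = 4L)
sims <- parallel::mclapply(chunks, function(chunk) {
  mrgsim_d(mod, chunk, output = "df")
})
ans <- bind_rows(sims)
```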
## Do a dry run to check the overhead of parallelization
A dry run (`.dry = TRUE`) pushes the chunks through the parallel machinery
without simulating them, so the timing reflects the parallelization overhead alone.
```{r}
plan(transparent)
system.time(x <- fu_mrgsim_d(mod, data, nchunk = 8, .dry = TRUE))
plan(multisession, workers = 8L)
system.time(x <- fu_mrgsim_d(mod, data, nchunk = 8, .dry = TRUE))
```
## Pass a function to post-process on the worker
First, check the range of times from the previous example:
```{r}
summary(ans1$time)
```
The post-processing function takes the simulated data and the model object
as its arguments:
```{r}
post <- function(sims, mod) {
  filter(sims, time > 600)
}
dose <- ev(amt = 100, ii = 24, addl = 27)
ans3 <- mc_mrgsim_ei(mod, dose, idata, nchunk = 6, .p = post)
```
```{r}
summary(ans3$time)
```
The main use case here is to summarize or otherwise reduce the volume of data
before the combined simulations are returned. If memory can handle the full
simulation volume, the post-processing could instead be done on the combined
data.
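For example, a post-processor might reduce each chunk to one row per subject.
This is a sketch; it assumes the concentration output from the `pk1cmt` model
is captured as `CP`:
```{r, eval = FALSE}
# Summarize each chunk down to one row per ID before results are combined
post_cmax <- function(sims, mod) {
  sims %>% group_by(ID) %>% summarise(Cmax = max(CP), .groups = "drop")
}
ans4 <- mc_mrgsim_ei(mod, dose, idata, nchunk = 6, .p = post_cmax)
```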
<hr>
## More info
See [inst/docs/stories.md (on GitHub only)](inst/docs/stories.md) for more details.