vignettes/apply_any_r_function.Rmd
runner::runner applies any R function on running
windows. Windows are controlled by size (k), shift
(lag), index (idx), and evaluation points
(at). See
vignette("getting_started_with_runner") for an introduction
to window types and parameters.
Without k, windows are cumulative. With a constant
k, windows slide along the data. The function
f receives each window as its argument.
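To make the two window shapes concrete, here is a minimal base-R sketch that builds the windows by hand. It only illustrates the semantics described above and is not how runner is implemented; the variable names are illustrative.

```r
x <- 1:15
k <- 4

# Cumulative windows: everything from the first element up to position i
cumulative <- lapply(seq_along(x), function(i) x[1:i])

# Sliding windows: at most the last k elements up to position i
sliding <- lapply(seq_along(x), function(i) x[max(1, i - k + 1):i])

# f (here sum) is called once per window
cum_sums <- sapply(cumulative, sum)
slide_sums <- sapply(sliding, sum)
```

In this sketch the first k - 1 sliding windows are shorter than k because fewer elements are available at the start of the data.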
library(runner)
# cumulative sum
runner(1:15, f = sum)
# 4-element sliding sum
runner(1:15, k = 4, f = sum)
# rolling slope from lm on a data.frame
df <- data.frame(a = 1:15, b = 3 * 1:15 + rnorm(15))
runner(df, k = 5, f = function(x) coefficients(lm(b ~ a, data = x))["a"])
When data is irregularly spaced (e.g. missing weekends or holidays),
set idx so that k and lag refer
to index distance rather than element count. Both accept integers or
time-interval strings ("5 days", "week",
"2 months").
By default, output has the same length as x. Set
at to evaluate only at selected index positions — useful
when you need results at specific dates rather than at every
observation.
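The difference between counting elements and measuring index distance can be sketched in base R. This is an illustration of the idea under simple assumptions (a closed window ending at the current date, no lag), not runner's own code; the dates and the helper `sum_last_3_days` are made up for the example.

```r
# Ten daily observations with a two-day gap after the fifth one
idx <- as.Date("2024-01-01") + c(0:4, 7:11)
x <- seq_along(idx)

# k = 3 counted in elements: always the last three observations
by_count <- sapply(seq_along(x), function(i) sum(x[max(1, i - 2):i]))

# k = "3 days" measured on the index: observations dated within the last 3 days
sum_last_3_days <- function(d) sum(x[idx > d - 3 & idx <= d])
by_index <- sapply(seq_along(idx), function(i) sum_last_3_days(idx[i]))

# Evaluating only at selected dates instead of at every observation
at <- as.Date(c("2024-01-08", "2024-01-10"))
at_only <- sapply(seq_along(at), function(j) sum_last_3_days(at[j]))
```

Right after the gap the two approaches disagree: the element-count window still reaches back across the gap, while the index-based window only sees the observations that actually fall inside the three-day span.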
A single time-interval string generates a regular sequence over the
idx range:
idx_date <- seq(Sys.Date(), Sys.Date() + 365, by = "1 month")
# evaluate every 4 months
runner(0:12, idx = idx_date, at = "4 months")
# correlation over cumulative windows (no k), evaluated every 6 months
runner(
  x = data.frame(a = 1:13, b = 1:13 + rnorm(13, sd = 5), idx_date),
  idx = "idx_date", at = "6 months",
  f = function(x) cor(x$a, x$b)
)
Both k and lag can be vectors of
length(x) (or length(at)), allowing different
window sizes and shifts at each position. lag can be
negative (shifts forward) or positive (shifts backward); k
must be non-negative.
# varying k and lag
runner(1:10,
  lag = c(-1, 2, -1, -2, 0, 0, 5, -5, -2, -3),
  k = c(0, 1, 1, 1, 1, 5, 5, 5, 5, 5),
  f = paste, collapse = ","
)
# varying k with time-interval strings
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner(1:15,
  k = sample(c("5 days", "10 days", "15 days"), 15, replace = TRUE),
  lag = sample(c("-2 days", "-1 days", "1 days", "2 days"), 15, replace = TRUE),
  idx = Sys.Date() + idx, f = mean
)
By default (na_pad = FALSE), incomplete windows (those
extending beyond the data range) are computed with whatever data is
available. Set na_pad = TRUE to return NA for
such windows instead.
k = 5, lag = 1, at = c(4, 18, 48, 51), na_pad = TRUE
idx: 4 6 7 13 17 18 18 21 27 31 37 42 44 47 48
at =  4: window [-2, 3]   extends before the data -> NA (na_pad)
at = 18: window [13, 17]  contains idx {13, 17}
at = 48: window [43, 47]  contains idx {44, 47}
at = 51: window [46, 50]  extends beyond the data -> NA (na_pad)
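The window bounds in the diagram can be reproduced with a few lines of base R. The helper `window_at` below is hypothetical and written only for illustration; in particular, treating a window as incomplete whenever it reaches outside `range(idx)` is an assumption read off the diagram, not a statement about runner's internals.

```r
# Hypothetical helper: apply f over index-based windows at the points in `at`
window_at <- function(x, idx, at, k, lag = 0, f = sum, na_pad = FALSE) {
  sapply(at, function(a) {
    upper <- a - lag          # last index value covered by the window
    lower <- upper - k + 1    # first index value covered by the window
    # Assumption: a window is incomplete when it reaches outside range(idx)
    if (na_pad && (lower < min(idx) || upper > max(idx))) return(NA)
    f(x[idx >= lower & idx <= upper])
  })
}

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
at <- c(4, 18, 48, 51)

padded <- window_at(1:15, idx, at = at, k = 5, lag = 1, na_pad = TRUE)
unpadded <- window_at(1:15, idx, at = at, k = 5, lag = 1)
```

With `na_pad = TRUE` the first and last evaluation points give `NA`, as in the diagram. Without it, the sketch simply applies `f` to whatever observations fall inside the window; how an entirely empty window is reported is a detail of this sketch (base R's `sum` of nothing) and may differ from runner.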
Pass a data.frame or matrix as
x to apply functions involving multiple columns. Each
window is a row-subset of the original data.
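What "row-subset" means can be sketched without runner: for each position, take the rows of the current window and hand that smaller data.frame to the function. The following base-R loop is an illustrative approximation with element-count windows of size `k`; the guard for one-row windows is a choice made for this sketch.

```r
set.seed(1)
df <- data.frame(a = 1:15, b = 3 * 1:15 + rnorm(15))
k <- 5

slopes <- sapply(seq_len(nrow(df)), function(i) {
  rows <- max(1, i - k + 1):i            # rows belonging to the current window
  window <- df[rows, , drop = FALSE]     # the window is itself a data.frame
  if (nrow(window) < 2) return(NA_real_) # a slope needs at least two rows
  unname(coef(lm(b ~ a, data = window))["a"])
})
```

Because every window is a full data.frame, the function can refer to several columns at once, which is what makes rolling regressions and correlations possible.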
runner works inside dplyr::mutate,
including with group_by. Column names can be passed as
strings for idx, which is convenient in pipelines.
library(dplyr)
# df is assumed here to have columns group, date, x and y
df %>%
  group_by(group) %>%
  mutate(
    slope = runner(x = ., k = "20 days", idx = "date",
                   f = function(x) coefficients(lm(y ~ x, data = x))[2])
  )
When making multiple runner calls with the same window
parameters, use run_by to set shared arguments once:
df %>%
  group_by(group) %>%
  run_by(idx = "date", k = "20 days", na_pad = FALSE) %>%
  mutate(
    mse = runner(x = ., f = function(x) mean(residuals(lm(y ~ x, data = x))^2)),
    intercept = runner(x = ., f = function(x) coefficients(lm(y ~ x, data = x))[1]),
    slope = runner(x = ., f = function(x) coefficients(lm(y ~ x, data = x))[2])
  )
Pass a parallel::makeCluster() cluster to cl for
parallel computation. Objects referenced inside f (beyond
its arguments) must be exported to the cluster beforehand with
clusterExport. Note that parallel execution adds overhead
and is only beneficial for computationally expensive functions.
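The export step can be tried in isolation with the parallel package alone. In this sketch `threshold` and `f` are hypothetical names, the windows are built by hand, and `parSapply` stands in for the per-window evaluation; the same prepared cluster object is what would be passed as cl.

```r
library(parallel)

cl <- makeCluster(2)             # PSOCK cluster with two workers

threshold <- 10                  # hypothetical object referenced inside f
f <- function(x) sum(x[x > threshold])

# Workers start with an empty workspace, so export what f refers to
clusterExport(cl, "threshold")

# Hand-built sliding windows of up to four elements over 1:15
windows <- lapply(1:15, function(i) (1:15)[max(1, i - 3):i])
res <- parSapply(cl, windows, f)

stopCluster(cl)
```

Without the `clusterExport` call the workers would typically fail with an "object 'threshold' not found" error, because only the function and its arguments are sent to them.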
library(parallel)
# makeForkCluster() is not available on Windows; use makeCluster() there
cl <- makeForkCluster(detectCores())
runner(df, k = 10, idx = "date", f = function(x) sum(x$x), cl = cl)
stopCluster(cl)
For common aggregations, built-in C++ functions are much faster than
runner(f = ...):
sum_run, mean_run, min_run, max_run, minmax_run, length_run,
streak_run, fill_run, lag_run, which_run
All accept the same k, lag,
idx, at, and na_pad arguments.
See vignette("built-in_functions") for details and
examples.