runner::runner applies any R function on running windows. Windows are controlled by size (k), shift (lag), index (idx), and evaluation points (at). See vignette("getting_started_with_runner") for an introduction to window types and parameters.

Basic usage

Without k, windows are cumulative. With a constant k, windows slide along the data. The function f receives each window as its argument.

library(runner)

# cumulative sum
runner(1:15, f = sum)

# 4-element sliding sum
runner(1:15, k = 4, f = sum)

# rolling slope from lm on a data.frame
df <- data.frame(a = 1:15, b = 3 * 1:15 + rnorm(15))
runner(df, k = 5, f = function(x) coefficients(lm(b ~ a, data = x))["a"])
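
To see which elements each window holds, a minimal sketch can paste the window contents together instead of aggregating them. The helper name show is illustrative and not part of runner.

```r
library(runner)

# Illustrative helper: print the elements of a window as one string
show <- function(w) paste(w, collapse = ",")

# no k: every window starts at the first element and grows
cumulative <- runner(1:5, f = show)
# "1" "1,2" "1,2,3" "1,2,3,4" "1,2,3,4,5"

# k = 3: at most the 3 most recent elements; the first windows are shorter
sliding <- runner(1:5, k = 3, f = show)
# "1" "1,2" "1,2,3" "2,3,4" "3,4,5"
```

The first k - 1 sliding windows are incomplete because there is not yet enough data to fill them.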

Index-based windows

When data is irregularly spaced (e.g. missing weekends or holidays), set idx so that k and lag refer to index distance rather than element count. Both k and lag accept integers or, when idx is a Date or date-time vector, time-interval strings ("5 days", "week", "2 months").

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)

# 5-unit window with lag of 1
runner(idx, k = 5, lag = 1, idx = idx, f = mean)

# same with Date objects and time-interval strings
runner(idx, k = "5 days", lag = "day", idx = Sys.Date() + idx, f = mean)
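
As a rough cross-check, the index window can be written out in base R and compared with runner. The sketch below assumes that, for integer-valued idx, the window evaluated at index i covers [i - lag - k + 1, i - lag]. The helper manual_window_mean is hypothetical and not part of the package.

```r
library(runner)

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
x <- seq_along(idx)

# Hypothetical helper: mean of x over the index window
# [i - lag - k + 1, i - lag] (assumed window definition, see lead-in)
manual_window_mean <- function(i, k, lag) {
  in_window <- idx >= i - lag - k + 1 & idx <= i - lag
  mean(x[in_window])
}

eval_points <- c(18, 31, 48)
by_hand <- vapply(eval_points, manual_window_mean, numeric(1), k = 5, lag = 1)
by_runner <- runner(x, k = 5, lag = 1, idx = idx, at = eval_points, f = mean)

# TRUE if both approaches agree
all.equal(as.numeric(by_runner), by_hand)
```

For example, evaluating at 18 with k = 5 and lag = 1 covers index values 13 to 17, which contain the 4th and 5th observations.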

Evaluation at specific points

By default, output has the same length as x. Set at to evaluate only at selected index positions, which is useful when you need results at specific dates rather than at every observation. The output then has one element per value of at.

runner(1:15, k = 5, lag = 1, idx = idx, at = c(18, 27, 48, 31), f = mean)

A single time-interval string generates a regular sequence over the idx range:

# 13 monthly dates; length.out keeps the length fixed even across a leap year
idx_date <- seq(Sys.Date(), by = "1 month", length.out = 13)

# evaluate every 4 months
runner(0:12, idx = idx_date, at = "4 months")

# cumulative correlation (no k), evaluated every 6 months
runner(
  x = data.frame(a = 1:13, b = 1:13 + rnorm(13, sd = 5), idx_date),
  idx = "idx_date", at = "6 months",
  f = function(x) cor(x$a, x$b)
)

Varying window size and lag

Both k and lag can be vectors of length(x) (or length(at)), allowing different window sizes and shifts at each position. lag can be negative (shifts forward) or positive (shifts backward); k must be non-negative.

# varying k and lag
runner(1:10,
  lag = c(-1, 2, -1, -2, 0, 0, 5, -5, -2, -3),
  k   = c( 0, 1,  1,  1, 1, 5, 5,  5,  5,  5),
  f = paste, collapse = ","
)

# varying k with time-interval strings
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner(1:15,
  k = sample(c("5 days", "10 days", "15 days"), 15, replace = TRUE),
  lag = sample(c("-2 days", "-1 days", "1 days", "2 days"), 15, replace = TRUE),
  idx = Sys.Date() + idx, f = mean
)
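
A smaller sketch with easily checked values may make the effect of lag and of a vector-valued k more visible. The helper name show is illustrative only.

```r
library(runner)

# Illustrative helper: print the elements of a window as one string
show <- function(w) paste(w, collapse = ",")

# negative lag shifts each 2-element window one position forward
ahead <- runner(1:5, k = 2, lag = -1, f = show)
# "1,2" "2,3" "3,4" "4,5" "5"

# a different window size at each position
varying_k <- runner(1:6, k = c(1, 2, 3, 1, 2, 3), f = sum)
# 1 3 6 4 9 15
```

The last window of ahead is incomplete because it would extend past the end of the data.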

NA padding

By default (na_pad = FALSE), incomplete windows (those extending beyond the data range) are computed with whatever data is available. Set na_pad = TRUE to return NA for such windows instead.

k = 5, lag = 1, at = c(4, 18, 48, 51), na_pad = TRUE
idx:  4   6   7  13  17  18  18  21  27  31  37  42  44  47  48

at= 4:  [-1,  3]  NA   window extends before data -> NA (na_pad)
at=18:  [13, 17]  ==   {13, 17}
at=48:  [43, 47]  ===  {44, 47}
at=51:  [46, 50]  NA   window extends beyond data -> NA (na_pad)

runner(1:15, k = 5, lag = 1, idx = idx, at = c(4, 18, 48, 51),
       na_pad = TRUE, f = mean)
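
The difference between the two settings can be seen by running the same call twice. This is a small sketch using the idx vector from the earlier examples, with evaluation points chosen so that only the last window extends beyond the data.

```r
library(runner)

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
eval_points <- c(18, 48, 51)

# incomplete windows are computed from the data that is available
unpadded <- runner(1:15, k = 5, lag = 1, idx = idx, at = eval_points,
                   na_pad = FALSE, f = mean)

# incomplete windows return NA
padded <- runner(1:15, k = 5, lag = 1, idx = idx, at = eval_points,
                 na_pad = TRUE, f = mean)

# side-by-side comparison
data.frame(at = eval_points,
           unpadded = as.numeric(unpadded),
           padded = as.numeric(padded))
```

At 51 the window [46, 50] still contains the observations at index 47 and 48, so the unpadded call averages those two while the padded call returns NA.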

Data frames and matrices

Pass a data.frame or matrix as x to apply functions involving multiple columns. Each window is a row-subset of the original data.

x <- cumsum(rnorm(40))
y <- 3 * x + rnorm(40)
date <- Sys.Date() + cumsum(sample(1:3, 40, replace = TRUE))
group <- rep(c("a", "b"), 20)
df <- data.frame(date, group, y, x)

# cumulative slope of y on x (no k, so each window grows from the first row)
runner(df, f = function(x) coefficients(lm(y ~ x, data = x))[2])

dplyr integration

runner works inside dplyr::mutate, including with group_by. Column names can be passed as strings for idx, which is convenient in pipelines.

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(
    slope = runner(x = ., k = "20 days", idx = "date",
                   f = function(x) coefficients(lm(y ~ x, data = x))[2])
  )

When making multiple runner calls with the same window parameters, use run_by to set shared arguments once:

df %>%
  group_by(group) %>%
  run_by(idx = "date", k = "20 days", na_pad = FALSE) %>%
  mutate(
    mse       = runner(x = ., f = function(x) mean(residuals(lm(y ~ x, data = x))^2)),
    intercept = runner(x = ., f = function(x) coefficients(lm(y ~ x, data = x))[1]),
    slope     = runner(x = ., f = function(x) coefficients(lm(y ~ x, data = x))[2])
  )

Parallel execution

Pass a cluster created with parallel::makeCluster() to cl for parallel computation. Objects referenced inside f (beyond its arguments) must be exported to the cluster beforehand with clusterExport. Parallel execution adds overhead, so it pays off only for computationally expensive functions.

library(parallel)

# fork clusters are not available on Windows; use makeCluster() there
cl <- makeForkCluster(detectCores())
runner(df, k = 10, idx = "date", f = function(x) sum(x$x), cl = cl)
stopCluster(cl)  # release the workers
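
When f refers to an object outside its arguments, the export step could look like the sketch below. The names offset and add_offset are made up for illustration, and the worker count of 2 is arbitrary.

```r
library(runner)
library(parallel)

offset <- 100                              # free variable used inside f
add_offset <- function(w) sum(w) + offset  # illustrative function

cl <- makeCluster(2)                       # socket cluster with 2 workers
# make offset visible on the workers before calling runner
clusterExport(cl, "offset", envir = environment())
res_parallel <- runner(1:20, k = 5, f = add_offset, cl = cl)
stopCluster(cl)

# same computation without a cluster, for comparison
res_serial <- runner(1:20, k = 5, f = add_offset)
all.equal(as.numeric(res_parallel), as.numeric(res_serial))
```

For a function this cheap the parallel version is likely slower than the serial one. The sketch only demonstrates the export step.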

Built-in functions

For common aggregations, built-in C++ functions are much faster than runner(f = ...):

  • Aggregation: sum_run, mean_run, min_run, max_run, minmax_run, length_run, streak_run
  • Utility: fill_run, lag_run, which_run

Most of these accept the same k, lag, idx, at, and na_pad arguments as runner; check each function's help page for its exact signature. See vignette("built-in_functions") for details and examples.
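
As a quick sanity check, the built-in functions can be compared with the equivalent runner call. This is a small sketch on made-up data.

```r
library(runner)

x <- c(2, 4, 1, 7, 3, 5)

# built-in 3-element sliding sum and mean
fast_sum <- sum_run(x, k = 3)
fast_mean <- mean_run(x, k = 3)

# same windows through the generic interface
generic_sum <- runner(x, k = 3, f = sum)
generic_mean <- runner(x, k = 3, f = mean)

all.equal(as.numeric(fast_sum), as.numeric(generic_sum))
all.equal(as.numeric(fast_mean), as.numeric(generic_mean))
```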