
marimekko (mosaic) plots for ggplot2.

A one-sided formula controls the variable hierarchy and split directions. Column widths and segment heights encode marginal and conditional proportions of categorical variables.

Installation

# Install from CRAN
install.packages("marimekko")

# Or install the development version from GitHub
devtools::install_github("gogonzo/marimekko")

Quick start

library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)

ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  labs(title = "Titanic survival by class", y = "Proportion") +
  theme_marimekko()

Formula syntax

A one-sided formula (~ ...) controls which variables are used, their nesting order, and how each level splits the plot area. The plot starts as a single rectangle (the unit square) and each variable subdivides it further.

Split directions

There are two split directions:

  • Horizontal split — divides the area into side-by-side columns along the x-axis. Column widths are proportional to the variable’s distribution. All columns share the same vertical extent.
  • Vertical split — divides the area into stacked rows along the y-axis. Row heights are proportional to the variable’s distribution. All rows share the same horizontal extent.

The first variable always splits horizontally (columns). Each | switches direction, so the second variable splits vertically (rows within each column), the third switches back to horizontal, and so on.
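Here is a small base-R sketch of the width and height arithmetic for the two-level split `~ Class | Survived` on the Titanic data. It does not call marimekko. It assumes that column widths are the marginal proportions of `Class` and that heights within each column are the conditional proportions of `Survived` given `Class`, as described above. The variable names are illustrative only.

```r
# Class x Survived counts, summed over Sex and Age
counts <- unclass(margin.table(Titanic, c(1, 4)))

# Horizontal split: column widths are the marginal proportions of Class
widths <- rowSums(counts) / sum(counts)
xmax <- cumsum(widths)
xmin <- xmax - widths

# Vertical split: heights within each column are P(Survived | Class)
heights <- counts / rowSums(counts)

# One rectangle per (Class, Survived) cell of the unit square
tiles <- do.call(rbind, lapply(rownames(counts), function(cls) {
  h <- heights[cls, ]
  lev <- names(h)
  h <- unname(h)
  ytop <- cumsum(h)
  data.frame(
    Class = cls, Survived = lev,
    xmin = xmin[[cls]], xmax = xmax[[cls]],
    ymin = ytop - h, ymax = ytop
  )
}))
tiles
```

The area of each tile is width × height = P(Class) · P(Survived | Class), which is the joint proportion of that cell. This is why the tiles cover the unit square exactly once.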

Operators

Operator Meaning
| Separates groups of variables. Each | flips the split direction (horizontal ↔︎ vertical).
+ Combines variables within the same group — they split in the same direction, one after another.

Examples

Formula      1st split                          2nd split                               3rd split        Layout
~ a | b      a → columns (horizontal)           b → rows within each column (vertical)  –                Standard mosaic
~ a | b | c  a → columns                        b → rows                                c → sub-columns  Alternating 3-level
~ a + b | c  a → columns, then b → sub-columns  c → rows                                –                Double decker
~ a | b + c  a → columns                        b → rows, then c → sub-rows             –                Multiple vertical
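The following base-R sketch illustrates the `+` rule using the double-decker row `~ a + b | c`, with a = Class, b = Sex, and c = Survived from the Titanic data. It assumes sub-columns are computed as described above: the width of each `a` column is split by the conditional distribution of `b` within it. The variable names are illustrative and are not marimekko internals.

```r
# Class x Sex counts, summed over Age and Survived
cs <- unclass(margin.table(Titanic, c(1, 2)))
n <- sum(cs)

# `a` = Class: first horizontal split, column width = P(Class)
class_w <- rowSums(cs) / n

# `+ b` = Sex: same direction, so each Class column is cut into Sex
# sub-columns of width P(Class) * P(Sex | Class)
sub_w <- class_w * (cs / rowSums(cs))

# The nested widths reduce to the joint proportions P(Class, Sex)
all.equal(sub_w, cs / n)

# `| c` = Survived: direction flips, heights = P(Survived | Class, Sex)
heights <- prop.table(margin.table(Titanic, c(1, 2, 4)), margin = c(1, 2))
```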

Expressions in formulas

Variables in the formula can be arbitrary R expressions:

~ factor(cyl) | cut(mpg, breaks = 3)
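Because the terms are ordinary R expressions, you can preview what they produce in base R before plotting. This sketch assumes the formula is evaluated against the `mtcars` data, which the names `cyl` and `mpg` suggest but the text above does not state. It tabulates the two derived factors to produce the counts that this mosaic would encode.

```r
# Evaluate each formula term against the data columns
cyl_f <- with(mtcars, factor(cyl))              # 3 levels: 4, 6, 8
mpg_bins <- with(mtcars, cut(mpg, breaks = 3))  # 3 equal-width mpg intervals

# Counts per (column, row) cell of the mosaic
tab <- table(cyl = cyl_f, mpg = mpg_bins)
tab

# Column widths, P(cyl), and within-column heights, P(mpg bin | cyl)
prop.table(margin.table(tab, 1))
prop.table(tab, margin = 1)
```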

Features

Feature Function / Parameter
Core marimekko plot geom_marimekko()
Text labels on tiles geom_marimekko_text()
Labels with background box geom_marimekko_label()
Marginal percentages on x-axis show_percentages = TRUE
Compute tiles without plotting fortify_marimekko()
Minimal mosaic theme theme_marimekko()
Pearson residual shading after_stat(.residuals)
Conditional proportion shading after_stat(.proportion)
Independent x/y gaps gap_x / gap_y
Plotly interactivity plotly::ggplotly()

Examples

Marginal percentages on x-axis

ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived,
    show_percentages = TRUE
  ) +
  theme_marimekko()

Count labels

ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  geom_marimekko_text(aes(label = after_stat(weight)))
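If `after_stat(weight)` is the summed `Freq` within each tile (an assumption, since the section does not define it), then the labels should match a plain cross-tabulation. This base-R check makes a quick sanity test:

```r
titanic <- as.data.frame(Titanic)

# Sum Freq over Sex and Age for every Class x Survived tile
tile_counts <- xtabs(Freq ~ Class + Survived, data = titanic)
tile_counts

# The same numbers taken directly from the original 4-way table
stopifnot(all(tile_counts == margin.table(Titanic, c(1, 4))))
```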

Residual shading

ggplot(titanic) +
  geom_marimekko(
    aes(
      fill = Survived, weight = Freq,
      alpha = after_stat(abs(.residuals))
    ),
    formula = ~ Class | Survived
  ) +
  scale_alpha_continuous(range = c(0.3, 1), guide = "none")
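Suppose `.residuals` are the usual Pearson residuals of the independence model for the plotted table, (observed - expected) / sqrt(expected). This is an assumption based on the feature name. Under that assumption, you can reproduce the shading values in base R and cross-check them against `chisq.test()`:

```r
titanic <- as.data.frame(Titanic)
obs <- xtabs(Freq ~ Class + Survived, data = titanic)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson residuals: positive = more passengers than independence predicts
pearson <- (obs - expected) / sqrt(expected)
round(pearson, 2)

# Cross-check against the residuals reported by chisq.test()
all.equal(as.vector(pearson), as.vector(chisq.test(obs)$residuals))
```

Wrapping the residuals in `abs()`, as in the example above, gives stronger opacity to both over- and under-represented tiles.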

Three-variable nested mosaic

ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Sex | Survived
  )

Faceting

ggplot(as.data.frame(Titanic)) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  facet_wrap(~Sex)
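ggplot2 normally computes a layer's statistic separately for each panel. If StatMarimekko follows that convention (an assumption), each facet partitions its own unit square. In that case, the column widths in the Male and Female panels are the Class distributions within each Sex, as this base-R sketch shows:

```r
# Class x Sex counts, summed over Age and Survived
cls_sex <- margin.table(Titanic, c(1, 2))

# Column widths inside each facet panel: the Class distribution within each Sex
panel_widths <- prop.table(cls_sex, margin = 2)
round(panel_widths, 3)
```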

Independent x/y gaps

ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap_x = 0.04, gap_y = 0
  )

Plotly interactivity

library(plotly)

p <- ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  )
ggplotly(p)

Data extraction with fortify

tiles <- fortify_marimekko(titanic,
  formula = ~ Class | Survived, weight = Freq
)
head(tiles)

How it works

marimekko extends ggplot2 through the ggproto system:

  • StatMarimekko parses the formula, recursively partitions the unit square, and returns tile rectangles (xmin, xmax, ymin, ymax) with computed variables (.residuals, .proportion, .marginal).
  • Tiles are rendered via GeomRect with sensible defaults (white borders, slight transparency).
  • Axis labels are automatically placed by the geom at tile midpoints.
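You can sketch the recursive partitioning step in a few lines of base R. This is an illustrative re-implementation, not the code inside StatMarimekko. The `partition()` helper, its `dirs` argument, and the output column names are all hypothetical. The sketch takes split directions as an explicit vector rather than parsing a formula, and it ignores gaps and the computed `.residuals`, `.proportion`, and `.marginal` variables.

```r
# Hypothetical helper (not marimekko's code): recursively split a rectangle.
# `vars`: variables in nesting order; `dirs`: "h" (columns) or "v" (rows) per
# variable, e.g. ~ a | b | c -> c("h", "v", "h"), ~ a + b | c -> c("h", "h", "v").
partition <- function(df, vars, dirs, weight,
                      box = c(xmin = 0, xmax = 1, ymin = 0, ymax = 1)) {
  # Leaf: no variables left, emit one tile with its total weight
  if (length(vars) == 0) {
    return(data.frame(xmin = box[["xmin"]], xmax = box[["xmax"]],
                      ymin = box[["ymin"]], ymax = box[["ymax"]],
                      weight = sum(df[[weight]])))
  }
  v <- vars[[1]]
  totals <- tapply(df[[weight]], df[[v]], sum)  # one entry per factor level
  totals[is.na(totals)] <- 0                    # empty levels get zero weight
  props <- if (sum(totals) > 0) totals / sum(totals) else totals
  upper <- cumsum(props)
  lower <- upper - props
  pieces <- lapply(seq_along(props), function(i) {
    sub_box <- box
    if (dirs[[1]] == "h") {                     # side-by-side columns
      w <- box[["xmax"]] - box[["xmin"]]
      sub_box[["xmin"]] <- box[["xmin"]] + lower[[i]] * w
      sub_box[["xmax"]] <- box[["xmin"]] + upper[[i]] * w
    } else {                                    # stacked rows
      h <- box[["ymax"]] - box[["ymin"]]
      sub_box[["ymin"]] <- box[["ymin"]] + lower[[i]] * h
      sub_box[["ymax"]] <- box[["ymin"]] + upper[[i]] * h
    }
    level <- names(props)[i]
    rows <- df[df[[v]] == level, , drop = FALSE]
    tiles <- partition(rows, vars[-1], dirs[-1], weight, sub_box)
    tiles[[v]] <- rep(level, nrow(tiles))
    tiles
  })
  do.call(rbind, pieces)
}

titanic <- as.data.frame(Titanic)
tiles <- partition(titanic, c("Class", "Sex", "Survived"), c("h", "v", "h"), "Freq")
head(tiles)
```

Each level only rescales its parent's box. As a result, a tile's area telescopes into the joint proportion of its cell, regardless of how the directions alternate.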

Why use marimekko?

marimekko was designed to avoid common pain points of existing mosaic-plot packages:

  • Minimal dependencies – ggplot2 is the only dependency
  • No internal ggplot2 API usage – won’t break on ggplot2 updates
  • Easily extendable – StatMarimekkoTiles exposes tile data so you can pair it with any ggplot2 geom to build custom companion layers (bubbles, residual markers, etc.)
  • Formula-based API – ~ a | b | c encodes both variables and directions
  • Works without library() – marimekko::geom_marimekko() just works
  • Respects factor levels – user-set levels() are honored
  • In-formula expressions – ~ factor(cyl) | cut(mpg, breaks = 3) works
  • Plotly compatible – ggplotly() works out of the box