marimekko (mosaic) plots for ggplot2.
A one-sided formula controls the variable hierarchy and split directions. Column widths and segment heights encode marginal and conditional proportions of categorical variables.
Installation
# Install from CRAN
install.packages("marimekko")
# Or install the development version from GitHub
devtools::install_github("gogonzo/marimekko")
Quick start
library(ggplot2)
library(marimekko)
titanic <- as.data.frame(Titanic)
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  labs(title = "Titanic survival by class", y = "Proportion") +
  theme_marimekko()
Formula syntax
A one-sided formula (~ ...) controls which variables are used, their nesting order, and how each level splits the plot area. The plot starts as a single rectangle (the unit square) and each variable subdivides it further.
Split directions
There are two split directions:
- Horizontal split — divides the area into side-by-side columns along the x-axis. Column widths are proportional to the variable’s distribution. All columns share the same vertical extent.
- Vertical split — divides the area into stacked rows along the y-axis. Row heights are proportional to the variable’s distribution. All rows share the same horizontal extent.
The first variable always splits horizontally (columns). Each | switches direction, so the second variable splits vertically (rows within each column), the third switches back to horizontal, and so on.
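To make the two split directions concrete, the sketch below recomputes the geometry implied by ~ Class | Survived using base R only. It does not call marimekko, and the object names (counts, widths, heights) are illustrative assumptions rather than anything the package exposes.

```r
# Base R only: the proportions behind a two-variable mosaic of Titanic.
titanic <- as.data.frame(Titanic)
counts <- xtabs(Freq ~ Class + Survived, data = titanic)

# Horizontal split: column widths are the marginal proportions of Class.
widths <- rowSums(counts) / sum(counts)
xmax <- cumsum(widths)   # right edge of each column
xmin <- xmax - widths    # left edge of each column

# Vertical split: within each column, row heights are P(Survived | Class).
heights <- prop.table(counts, margin = 1)

round(cbind(xmin, xmax), 3)
round(heights, 3)
```

The widths sum to 1 across the columns, and the heights sum to 1 within each column, which is why every column spans the full vertical extent.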
Operators
| Operator | Meaning |
|---|---|
| \| | Separates groups of variables. Each \| flips the split direction (horizontal ↔ vertical). |
| + | Combines variables within the same group: they split in the same direction, one after another. |
Examples
| Formula | 1st split | 2nd split | 3rd split | Layout |
|---|---|---|---|---|
| ~ a \| b | a → columns (horizontal) | b → rows within each column (vertical) | - | Standard mosaic |
| ~ a \| b \| c | a → columns | b → rows | c → sub-columns | Alternating 3-level |
| ~ a + b \| c | a → columns, then b → sub-columns | c → rows | - | Double decker |
| ~ a \| b + c | a → columns | b → rows, then c → sub-rows | - | Multiple vertical |
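The rules in the tables above follow from how R itself parses a formula: + binds more tightly than |, so ~ a + b | c is read as (a + b) | c. The helper below, split_plan(), is a hypothetical illustration written for this README and is not part of marimekko. It walks the right-hand side of a one-sided formula and lists each variable with the direction it would split in under those rules.

```r
# Hypothetical helper (not a marimekko function): list each variable in a
# one-sided formula together with its split direction.
split_plan <- function(formula) {
  # Flatten nested `|` calls into groups, left to right.
  groups_of <- function(expr) {
    if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
      c(groups_of(expr[[2]]), groups_of(expr[[3]]))
    } else {
      list(expr)
    }
  }
  # Flatten nested binary `+` calls into the terms of one group.
  terms_of <- function(expr) {
    if (is.call(expr) && identical(expr[[1]], as.name("+")) && length(expr) == 3) {
      c(terms_of(expr[[2]]), terms_of(expr[[3]]))
    } else {
      list(expr)
    }
  }
  groups <- groups_of(formula[[2]])  # formula[[2]] is the right-hand side
  directions <- c("horizontal", "vertical")
  rows <- lapply(seq_along(groups), function(i) {
    vars <- vapply(terms_of(groups[[i]]), deparse, character(1))
    data.frame(
      variable = vars,
      direction = directions[(i - 1) %% 2 + 1],  # odd groups: horizontal
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

split_plan(~ a + b | c)
split_plan(~ a | b | c)
```

For ~ a + b | c this returns a and b as horizontal and c as vertical, matching the "Double decker" row.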
Features
| Feature | Function / Parameter |
|---|---|
| Core marimekko plot | geom_marimekko() |
| Text labels on tiles | geom_marimekko_text() |
| Labels with background box | geom_marimekko_label() |
| Marginal percentages on x-axis | show_percentages = TRUE |
| Compute tiles without plotting | fortify_marimekko() |
| Minimal mosaic theme | theme_marimekko() |
| Pearson residual shading | after_stat(.residuals) |
| Conditional proportion shading | after_stat(.proportion) |
| Independent x/y gaps | gap_x / gap_y |
| Plotly interactivity | plotly::ggplotly() |
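For readers unfamiliar with Pearson residuals, the sketch below computes them by hand for the Class by Survived table using the textbook definition, (observed - expected) / sqrt(expected), with expected counts taken under independence. That after_stat(.residuals) uses exactly this definition is an assumption; the code uses base R only and does not call marimekko.

```r
# Base R only: textbook Pearson residuals for a two-way table.
titanic <- as.data.frame(Titanic)
observed <- xtabs(Freq ~ Class + Survived, data = titanic)

# Expected counts if Class and Survived were independent.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Positive values: more cases than independence predicts; negative: fewer.
pearson <- (observed - expected) / sqrt(expected)
round(pearson, 2)
```

The same values are available from chisq.test(observed)$residuals, which is a convenient cross-check.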
Examples
Marginal percentages on x-axis
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived,
    show_percentages = TRUE
  ) +
  theme_marimekko()
Count labels
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  geom_marimekko_text(aes(label = after_stat(weight)))
Residual shading
ggplot(titanic) +
  geom_marimekko(
    aes(
      fill = Survived, weight = Freq,
      alpha = after_stat(abs(.residuals))
    ),
    formula = ~ Class | Survived
  ) +
  scale_alpha_continuous(range = c(0.3, 1), guide = "none")
Three-variable nested mosaic
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Sex | Survived
  )
Faceting
ggplot(as.data.frame(Titanic)) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  facet_wrap(~Sex)
Independent x/y gaps
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived,
    gap_x = 0.04, gap_y = 0
  )
Data extraction with fortify
tiles <- fortify_marimekko(titanic,
  formula = ~ Class | Survived, weight = Freq
)
head(tiles)
How it works
marimekko extends ggplot2 through the ggproto system:
- StatMarimekko parses the formula, recursively partitions the unit square, and returns tile rectangles (xmin, xmax, ymin, ymax) with computed variables (.residuals, .proportion, .marginal).
- Tiles are rendered via GeomRect with sensible defaults (white borders, slight transparency).
- Axis labels are automatically placed by the geom at tile midpoints.
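The recursive partitioning step can be sketched in a few lines of base R. The partition() function below is a hypothetical illustration of the idea, not marimekko's actual implementation: it ignores gaps, flips direction after every variable (so it covers only the | operator, not +), and assumes every level reached has a positive total weight.

```r
# Hypothetical sketch of recursive partitioning (not marimekko internals).
# `vars` is a character vector of column names, `weight` a column name.
partition <- function(data, vars, weight,
                      rect = c(xmin = 0, xmax = 1, ymin = 0, ymax = 1),
                      horizontal = TRUE) {
  # Base case: no variables left, so the current rectangle is one tile.
  if (length(vars) == 0) {
    return(data.frame(as.list(rect), weight = sum(data[[weight]])))
  }
  # Weighted share of each level of the current variable.
  totals <- vapply(split(data[[weight]], data[[vars[1]]]), sum, numeric(1))
  share <- totals / sum(totals)
  upper <- cumsum(share)
  lower <- upper - share
  # Split along x when horizontal, along y otherwise.
  lo <- if (horizontal) "xmin" else "ymin"
  hi <- if (horizontal) "xmax" else "ymax"
  span <- rect[[hi]] - rect[[lo]]
  pieces <- lapply(names(totals), function(level) {
    child <- rect
    child[[lo]] <- rect[[lo]] + lower[[level]] * span
    child[[hi]] <- rect[[lo]] + upper[[level]] * span
    rows <- data[data[[vars[1]]] == level, , drop = FALSE]
    # Recurse on the remaining variables with the direction flipped.
    out <- partition(rows, vars[-1], weight, child, !horizontal)
    out[[vars[1]]] <- level
    out
  })
  do.call(rbind, pieces)
}

titanic <- as.data.frame(Titanic)
tiles <- partition(titanic, c("Class", "Survived"), "Freq")
tiles
```

Each row of the result is one tile, and each tile's area equals its share of the total weight, which is the property that makes a mosaic plot readable as a picture of the joint distribution.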
Why use marimekko?
marimekko was designed to avoid pain points in existing packages.
- Minimal dependencies – ggplot2 is the only dependency
- No internal ggplot2 API usage – won’t break on ggplot2 updates
- Easily extendable – StatMarimekkoTiles exposes tile data so you can pair it with any ggplot2 geom to build custom companion layers (bubbles, residual markers, etc.)
- Formula-based API – ~ a | b | c encodes both variables and directions
- Works without library() – marimekko::geom_marimekko() just works
- Respects factor levels – user-set levels() are honored
- In-formula expressions – ~ factor(cyl) | cut(mpg, breaks = 3) works
- Plotly compatible – ggplotly() works out of the box