University of Münster
2026-01-29
A ggplot graphic has at least three key components:
- displ displacement by hwy highway miles per gallon
- displ (displacement) and hwy (highway miles per gallon)
- drv: f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
- cyl: number of cylinders
ggplot() function
The main function is ggplot(). It takes two arguments:
- data : A data frame
- mapping : Aesthetic mappings provided with the aes() function.
Additional layers are added with a + sign.
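As a minimal sketch of the two arguments and the + sign, assuming the ggplot2 package is installed and using its built-in mpg data frame. The choice of colour = drv is only an illustration and not prescribed by the slides.

```r
library(ggplot2)

# data:    the mpg data frame that ships with ggplot2
# mapping: displ on the x-axis, hwy on the y-axis, drv as point colour
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = drv)) +
  geom_point()  # a layer is added with the + sign

p  # printing the object draws the plot
```

Without the geom_point() layer, ggplot() alone would draw only the empty coordinate system, because no layer tells it how to render the observations.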
- mpg data frame.
- cty (city miles per gallon) and hwy (highway miles per gallon) displayed on the axis and
- class and
- (shape = drv) is mapped on the variable drv.
- geom_point() layer.
- geom function.
- geom_point() : Dots for each data point.
- geom_line() : Lines connecting each x-axis data point
- geom_bar() : Bars
- geom_text() : Text at x and y positions
- geom_smooth() : Smoothed conditional means
- economics data frame
- date and unemployment. (geom_line())
- (geom_point())
geom_bar()
- geom_bar() draws bars
- x variable
- mpg data frame.
- drv variable.
- red with the fill argument.
- width = 0.8 to resize the bar width.
geom_col()
- geom_col() function, bar heights and bar categories are taken from the x and y variables:
- starwars database.
- mutate(bmi = mass / (height / 100)^2)
- bmi < 100.
- theme(axis.text.x = element_text(angle = 40, hjust=1)) What does this layer do?
- summarise(mean_bmi = median(bmi, na.rm = TRUE))
geom_smooth()
geom_smooth() is used to add smoothed conditional means in scatterplots.
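A small sketch of a smoothing layer on top of a scatterplot, again assuming ggplot2 and its mpg data frame. The arguments method = "lm" and se = FALSE are illustrative choices that are not taken from the slides; if they are left out, geom_smooth() picks a smoothing method on its own and draws a confidence band.

```r
library(ggplot2)

p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                           # raw data points
  geom_smooth(method = "lm", se = FALSE)   # straight regression line, no confidence band

p_smooth
```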
- economics data frame.
- unemploy by population pop.
- geom_smooth layer.
- dslabs.
- gapminder.
- year and continent.
- summarize() function to calculate the mean of infant_mortality.
- year on x-axis, mean of infant_mortality on y-axis, and continent as line/dot colours.
- smooth layer.
- group_by(year, continent)
When you have multiple values ordered in a categorical variable, simple plots become messy:
Solutions
- geom_jitter() : Adds a little random jitter to each data point
- geom_boxplot() : Draws a boxplot
- geom_violin() : Draws a violin plot
- mpg dataset
- ggtitle() (e.g. ggtitle("My first plot"))
- labs(x = NULL, y = NULL) (e.g. labs(x = "Categories", y = "Mean"))
- ylim(min = 0, max = 10) ; xlim(min = 0, max = 10)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
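The three geoms listed under Solutions can be compared on the same data. The following is a sketch under the assumption that class and hwy from the mpg data frame are a reasonable example of a categorical x variable with many values per category; the jitter width, the title and the axis labels are made up for illustration.

```r
library(ggplot2)

# Shared data and aesthetic mapping; only the geom differs between the plots
base <- ggplot(mpg, aes(x = class, y = hwy))

p_point   <- base + geom_point()                          # identical values overlap
p_jitter  <- base + geom_jitter(width = 0.2, height = 0)  # random horizontal offset only
p_boxplot <- base + geom_boxplot()                        # median, quartiles, outliers
p_violin  <- base + geom_violin()                         # density shape per category

# Title and axis labels are added like any other layer
p_boxplot +
  ggtitle("Highway mileage by vehicle class") +
  labs(x = "Vehicle class", y = "Highway miles per gallon")
```

Setting height = 0 keeps the hwy values exact, so the jitter only spreads the points sideways within each class.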
Facets are another basic component. The plot is drawn multiple times, once for each level of a categorical variable:
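A minimal faceting sketch, assuming ggplot2 and the mpg data frame; using drv as the faceting variable is an arbitrary choice for illustration.

```r
library(ggplot2)

p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~drv)  # one panel per level of drv (4, f, r)

p_facet
```

All panels share the same axes by default, which makes the groups directly comparable.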
- year by extracting the year from the date variable.
- psavert into a categorical variable saving_rate with three levels
- saving_rate.
- year = format(date, format = "%Y")
- saving_rate = cut(psavert, breaks = 3, labels = c("Low","Medium","High"))
- unemploy / pop * 100
library(dplyr)    # mutate(), %>%
library(ggplot2)  # economics data frame and plotting functions
economics %>%
  mutate(
    year = format(date, format = "%Y"),
    saving_rate = cut(psavert, breaks = 3, labels = c("Low","Medium","High")),
    unemploy_rate = unemploy / pop * 100
  ) %>%
  ggplot(aes(x = year, y = unemploy_rate)) +
  geom_point(size = 0.5) +
  facet_wrap(~saving_rate) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 5))

stats <- mpg %>%
  group_by(class) %>%
  # 235.2 / (miles per gallon) converts to liters per 100 kilometers
  summarise(n = n(), lpk = 235.2 / mean(hwy))

ggplot(mpg, aes(x = class, y = 235.2 / hwy)) +
  geom_jitter(width = 0.2) +
  geom_point(data = stats, mapping = aes(x = class, y = lpk), colour = "red", size = 6) +
  geom_text(data = stats, aes(x = class, y = 5, label = paste0("n=", n))) +
  ylim(5, 20) +
  ylab("Liters per 100 kilometers on a highway")
Jürgen Wilbert - Introduction to R