dyplyr

  • dplyr is a package tailed towards data manipulation.
  • filter() picks cases based on their values.
  • select() picks variables based on their names.
  • mutate() adds and creates new variables
  • summarise() reduces multiple values down to a single summary.
  • group_by() allows to perform any operation by grouping variables.
  • arrange() changes the ordering of the rows.
  • rellocate() changes the ordering of the columns.

filter() picks cases based on their values

filter(.data, ...)

# Without %>%:
filter(mtcars, disp > 350 & mpg < 11)

# with %>%
mtcars %>%
  filter(disp > 350 & mpg < 11)

Note: You don’t have to provide the name of a data object within the dplyr function!

not: filter(mtcars, mtcars$disp > 350 & mtcars$mpg < 11)

but: filter(mtcars, disp > 350 & mpg < 11)

filter is an extension of logical subsetting in base R:

mtcars[mtcars$disp > 350 & mtcars$mpg < 11, ]
# or: 
subset(mtcars, disp > 350 & mpg < 11)

Task

  • Take the starwars database.
  • Filter all cases with brown hair color and female gender (Hint: variables hair_color and sex)

library(tidyverse)
starwars %>%
  filter(hair_color == "brown" & sex == "female")
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles starships
Leia Organa 150 49 brown light brown 19 female feminine Alderaan Human A New Hope, The Empire Strikes Back, Return of the Jedi, Revenge of the Sith, The Force Awakens Imperial Speeder Bike
Beru Whitesun Lars 165 75 brown light blue 47 female feminine Tatooine Human A New Hope, Attack of the Clones, Revenge of the Sith
Padmé Amidala 185 45 brown light brown 46 female feminine Naboo Human The Phantom Menace, Attack of the Clones, Revenge of the Sith Naboo fighter, H-type Nubian yacht, Naboo star skiff
Dormé 165 NA brown light brown NA female feminine Naboo Human Attack of the Clones
Rey NA NA brown light hazel NA female feminine NA Human The Force Awakens

select() picks variables based on their names

select(.data, ...)

# Either column numbers or column names
mtcars %>% select(1, 3, carb)

# Use the `:` symbol for ranges
mtcars %>% select(1:3, am:carb)

# use `-` to drop variables and keep the others
mtcars %>% select(-am, -(disp:drat))

# use `!` to select all variables not the ones defined
mtcars %>% select(!(1:3))

Note: selectis an extension of variable subsetting in base R:

mtcars[, c("mpg", "cyl", "carb")]

Task

  • Take the starwars database.
  • Filter for Human and select the variables name, height, and mass
  • Hint: variable species
starwars %>% 
  filter(species == "Human") %>% 
  select(name:mass)
name height mass
Luke Skywalker 172 77.0
Darth Vader 202 136.0
Leia Organa 150 49.0
Owen Lars 178 120.0
Beru Whitesun Lars 165 75.0
Biggs Darklighter 183 84.0
Obi-Wan Kenobi 182 77.0
Anakin Skywalker 188 84.0
Wilhuff Tarkin 180 NA
Han Solo 180 80.0
Wedge Antilles 170 77.0
Palpatine 170 75.0
Boba Fett 183 78.2
Lando Calrissian 177 79.0
Lobot 175 79.0
Mon Mothma 150 NA
Arvel Crynyd NA NA
Qui-Gon Jinn 193 89.0
Finis Valorum 170 NA
Padmé Amidala 185 45.0
Ric Olié 183 NA
Quarsh Panaka 183 NA
Shmi Skywalker 163 NA
Mace Windu 188 84.0
Cliegg Lars 183 NA
Dormé 165 NA
Dooku 193 80.0
Bail Prestor Organa 191 NA
Jango Fett 183 79.0
Jocasta Nu 167 NA
Raymus Antilles 188 79.0
Finn NA NA
Rey NA NA
Poe Dameron NA NA
Captain Phasma NA NA

Help functions to select variables

  • contains(): selects all variables which names contain a certain string.
  • starts_with(): selects all variables which start with a certain string.
  • ends_with(): selects all variables which end with a certain string.
  • num_range(): selects variables with a certain prefix within a certain range
mtcars %>% select(starts_with("d"))
mtcars %>% select(!ends_with("b"))
mtcars %>% select(contains("ra"))
billboard %>% select(num_range("wk", 10:15))

relocate() to rearrange columns

The relocate() function helps to rearrange the columns of a data frame

starwars %>% 
  relocate(name, birth_year, sex)
name birth_year sex height mass hair_color skin_color eye_color
Luke Skywalker 19.0 male 172 77 blond fair blue
C-3PO 112.0 none 167 75 NA gold yellow
R2-D2 33.0 none 96 32 NA white, blue red
Darth Vader 41.9 male 202 136 none white yellow
Leia Organa 19.0 female 150 49 brown light brown
Owen Lars 52.0 male 178 120 brown, grey light blue
Beru Whitesun Lars 47.0 female 165 75 brown light blue
R5-D4 NA none 97 32 NA white, red red
Biggs Darklighter 24.0 male 183 84 black light brown
Obi-Wan Kenobi 57.0 male 182 77 auburn, white fair blue-gray

mutate()

adds new variables that are functions of existing variables

mutate(.data, ...)

mtcars %>%
  mutate(
    efficiency = mpg / disp,
    effect = round(hp / mpg * 100)
  ) %>%
  relocate(efficiency, effect)
mtcars %>%
  mutate(
    efficiency = mpg / disp,
    effect = round(hp / mpg * 100)
  ) %>%
  relocate(efficiency, effect)
efficiency effect mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 0.1312500 524 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 0.1312500 524 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 0.2111111 408 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 0.0829457 514 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 0.0519444 936 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 0.0804444 580 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 0.0397222 1713 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 0.1663258 254 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 0.1619318 417 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 0.1145585 641 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 0.1062053 691 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 0.0594634 1098 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 0.0627266 1040 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 0.0551124 1184 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 0.0220339 1971 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 0.0226087 2067 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 0.0334091 1565 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 0.4116900 204 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 0.4015852 171 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 0.4767932 192 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 0.1790175 451 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 0.0487421 968 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 0.0500000 987 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 0.0380000 1842 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 0.0480000 911 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 0.3455696 242 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 0.2161264 350 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 0.3196635 372 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 0.0450142 1671 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 0.1358621 888 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 0.0498339 2233 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 0.1768595 509 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

Task

  • Take the starwars database.
  • Create a new variable with the body mass index bmi. Hint: bmi = kg / m²
  • Filter all humans with an bmi >= 26.
  • Select name, bmi, height, and mass
starwars %>%
  mutate(bmi = mass/ (height/100)^2) %>%
  filter(bmi >= 26 & species == "Human") %>%
  select(name, bmi, height, mass)
name bmi height mass
Luke Skywalker 26.02758 172 77
Darth Vader 33.33007 202 136
Owen Lars 37.87401 178 120
Beru Whitesun Lars 27.54821 165 75
Wedge Antilles 26.64360 170 77

summarise()

Reduces multiple values down to a single summary.

mtcars %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n()
  )
mean_mpg sd_mpg n
20.09062 6.026948 32

Task

  • Take the starwars database.
  • Filter Humans
  • Calculate the mean and the median for height and mass (Note: don’t forget to remove NA values: na.rm = TRUE)
starwars %>%
  filter(species == "Human") %>%
  summarise(
    mean_height = mean(height, na.rm = TRUE),
    median_height = median(height, na.rm = TRUE),
    mean_mass = mean(mass, na.rm = TRUE),
    median_mass = median(mass, na.rm = TRUE)
  )
mean_height median_height mean_mass median_mass
178 181 81.31 79

group_by() group a dataset

group_by() allows to perform a summarise() operation by grouping variables.

mtcars %>% group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n()
  )
mtcars %>% group_by(cyl, am) %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n()
  )
cyl am mean_mpg sd_mpg n
4 0 22.90000 1.4525839 3
4 1 28.07500 4.4838599 8
6 0 19.12500 1.6317169 4
6 1 20.56667 0.7505553 3
8 0 15.05000 2.7743959 12
8 1 15.40000 0.5656854 2

Task

  • Take the starwars database.
  • Group by sex
  • Calculate the median for height and mass
  • And the number of cases per group (Hint: n())
starwars %>%
  group_by(sex) %>%
  summarise(
    median_height = median(height, na.rm = TRUE),
    median_mass = median(mass, na.rm = TRUE),
    n = n()
  )
sex median_height median_mass n
female 167.5 55.0 16
hermaphroditic 175.0 1358.0 1
male 183.0 80.0 60
none 97.0 53.5 6
NA 179.0 85.0 4

arrange()

Orders the rows by the values of selected columns.

arrange(.data, ...)

dat <- data.frame(age = c(5,5,6,7,6), sen = c(0, 1, 1, 0, 0), points = c(34, 55, 22, 11, 9))

dat %>% arrange(sen, age) 
age sen points
5 0 34
6 0 9
7 0 11
5 1 55
6 1 22

desc()

Specifies a variable to be arranged descending.

dat %>% arrange(desc(sen), desc(points))
age sen points
5 1 55
6 1 22
5 0 34
7 0 11
6 0 9

Task

  • Take the starwars database.
  • Calculate the bmi.
  • Summarize the median of the bmi grouped by species.
  • round the bmi to one decimal.
  • Also calculate the n for each group.
  • Only show cases with n > 2.
  • Arrange the resulting table. Sort by n in descending order.
starwars %>% 
  mutate(bmi = mass / (height / 100)^2) %>% 
  group_by(species) %>% 
  summarise(
    median_bmi = median(bmi, na.rm = TRUE) %>% round(1), 
    n = n()
  ) %>%
  filter(n >2) %>%
  arrange(desc(n))
species median_bmi n
Human 24.3 35
Droid 34.4 6
NA 24.8 4
Gungan 16.8 3