Create a new variable in a dataset with missing values

Copy the following code into RStudio and run it:

study <- data.frame(
  sen    = factor(c(0, 1, 0, 1, 0, 1), labels = c("no sen", "sen"), levels = 0:1),
  gender = factor(c(1,2,2,1,2,1), labels = c("Male", "Female"), levels = 1:2),
  age    = c(150, 156, 138, 126, 136, 162),  # age in months
  IQ     = c(90, 85, 90, 87, 99, 89),
  Q1     = c(1, 5, 3, 4, 2, 3),
  Q2     = c(2, 5, 3, 4, 5, 2),
  Q3     = c(1, 5, 3, 5, 2, 1)
)

Create a new variable in a dataset

With missing values:

study$Q4 <- NA
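
A column created with a plain NA has type logical. The following is a minimal sketch, using a hypothetical toy data frame `d` that is not part of the example data, of how the typed constant `NA_real_` creates a numeric column instead:

```r
# Hypothetical toy data frame, used only for this illustration
d <- data.frame(id = 1:3)

d$a <- NA        # recycled to all rows; the column type is logical
d$b <- NA_real_  # typed missing value; the column type is numeric

class(d$a)       # "logical"
class(d$b)       # "numeric"
```

In practice R converts the logical column as soon as numbers are assigned to it, so `study$Q4 <- NA` is usually fine; the typed constant only makes the intended type explicit.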

With specific values:

study$Q5 <- c(5, 1, 4, 2, 1, 3)
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5
1 no sen   Male 150 90  1  2  1 NA  5
2    sen Female 156 85  5  5  5 NA  1
3 no sen Female 138 90  3  3  3 NA  4
4    sen   Male 126 87  4  4  5 NA  2
5 no sen Female 136 99  2  5  2 NA  1
6    sen   Male 162 89  3  2  1 NA  3

Create new variables from existing ones

study$age2 <- study$age / 12
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5     age2
1 no sen   Male 150 90  1  2  1 NA  5 12.50000
2    sen Female 156 85  5  5  5 NA  1 13.00000
3 no sen Female 138 90  3  3  3 NA  4 11.50000
4    sen   Male 126 87  4  4  5 NA  2 10.50000
5 no sen Female 136 99  2  5  2 NA  1 11.33333
6    sen   Male 162 89  3  2  1 NA  3 13.50000

Drop a variable

study$age2 <- NULL
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5
1 no sen   Male 150 90  1  2  1 NA  5
2    sen Female 156 85  5  5  5 NA  1
3 no sen Female 138 90  3  3  3 NA  4
4    sen   Male 126 87  4  4  5 NA  2
5 no sen Female 136 99  2  5  2 NA  1
6    sen   Male 162 89  3  2  1 NA  3
# or: study <- subset(study, select = -age2)
# or: study <- study[, -10]   # age2 is the 10th column

Sum several existing variables:

study$Q_sum_1_3 <- with(study, Q1 + Q2 + Q3)

# or:

study$Q_sum_1_3 <- study$Q1 + study$Q2 + study$Q3

study$Q_sum_1_3
[1]  4 15  9 13  9  6

Split age (in months) into whole years and remaining months:

study$age_year <- trunc(study$age / 12)
study$age_month <- study$age - (study$age_year * 12)

# or

study$age_year <- study$age %/% 12
study$age_month <- study$age %% 12

study[, c("age", "age_year", "age_month")]
  age age_year age_month
1 150       12         6
2 156       13         0
3 138       11         6
4 126       10         6
5 136       11         4
6 162       13         6
# or: study[, startsWith(names(study), "age")]
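
The two operators used above can be tried in isolation. This is a minimal sketch with a hypothetical vector `months` (not taken from the example data), showing integer division and modulo and how the two parts recombine:

```r
# Hypothetical ages in months, chosen only for illustration
months <- c(150, 136, 11)

years     <- months %/% 12   # integer division: whole years
remainder <- months %% 12    # modulo: months left over

# The two parts recombine to the original values
years * 12 + remainder
```

For non-negative ages the trunc() variant and the %/% variant give the same result. They differ only for negative inputs, where trunc() rounds toward zero and %/% rounds down.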

The apply function

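The apply function calls a function on every row or every column of a data frame (or matrix) and returns the results. A data frame is converted to a matrix first, so the selected columns should all have the same type.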

  1. The first argument is the data frame object.

  2. The second argument is the margin (1 for rows and 2 for columns).

  3. The third argument is a function name (e.g. mean).

  4. Further arguments are passed on to the function given in 3 (e.g. na.rm = TRUE).
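
The four arguments can be sketched on a small hypothetical data frame `d` (an assumption for illustration, not part of the example data):

```r
# Hypothetical 2 x 3 data frame with one missing value
d <- data.frame(a = c(1, 2), b = c(3, NA), c = c(5, 6))

row_means    <- apply(d, 1, mean)                 # margin 1: one value per row
col_means    <- apply(d, 2, mean)                 # margin 2: one value per column
col_means_rm <- apply(d, 2, mean, na.rm = TRUE)   # na.rm is passed on to mean()
```

Without na.rm = TRUE, any row or column that contains an NA yields NA; with it, the mean is taken over the remaining values.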

# An example:
study$Q_sum_1_2 <- apply(study[, c("Q1", "Q2")], 1, sum)
study$Q_sum_1_2
[1]  3 10  6  8  7  5

Task

Create a new variable Q_sum containing the sum of Q1 to Q5. The sum should still be computed when the data contain NAs.

vars <- c("Q1", "Q2", "Q3", "Q4", "Q5")

study$Q_sum <- apply(study[, vars], 1, sum, na.rm = TRUE)
study$Q_sum
[1]  9 16 13 15 10  9
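
For row sums specifically, base R also offers rowSums(), which could be used in place of apply() here. This is a minimal sketch on a hypothetical toy data frame `d`; it also shows that with na.rm = TRUE a row consisting only of NAs sums to 0:

```r
# Hypothetical toy data; the last row contains only missing values
d <- data.frame(Q1 = c(1, 5, NA), Q2 = c(2, NA, NA))

s_apply   <- apply(d, 1, sum, na.rm = TRUE)
s_rowsums <- rowSums(d, na.rm = TRUE)

# Both approaches agree element by element
all(s_apply == s_rowsums)
```

If an all-NA row should stay NA rather than become 0, that case would need to be handled separately.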