Create a new variable in a dataset with missing values

Please copy this code into RStudio and run it:

study <- data.frame(
  sen    = factor(c(0, 1, 0, 1, 0, 1), labels = c("no sen", "sen"), levels = 0:1),
  gender = factor(c(1, 2, 2, 1, 2, 1), labels = c("Male", "Female"), levels = 1:2),
  age    = c(150, 156, 138, 126, 136, 162),  # age in months
  IQ     = c(90, 85, 90, 87, 99, 89),
  Q1     = c(1, 5, 3, 4, 2, 3),
  Q2     = c(2, 5, 3, 4, 5, 2),
  Q3     = c(1, 5, 3, 5, 2, 1)
)

Create a new variable in a dataset

With missing values:

study$Q4 <- NA

With specific values:

study$Q5 <- c(5, 1, 4, 2, 1, 3)
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5
1 no sen   Male 150 90  1  2  1 NA  5
2    sen Female 156 85  5  5  5 NA  1
3 no sen Female 138 90  3  3  3 NA  4
4    sen   Male 126 87  4  4  5 NA  2
5 no sen Female 136 99  2  5  2 NA  1
6    sen   Male 162 89  3  2  1 NA  3
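
A side note on the NA column: a single NA is recycled to every row, and the resulting column has type logical. If a numeric column is wanted from the start, a typed missing value such as NA_real_ can be used instead. A minimal sketch, using a small hypothetical data frame toy (not part of study):

```r
# Hypothetical three-row data frame, used only for illustration
toy <- data.frame(id = 1:3)

toy$x <- NA        # a single NA is recycled to all rows; the column is logical
toy$y <- NA_real_  # typed missing value; the column is numeric

class(toy$x)
class(toy$y)
```

In practice the logical column is usually converted automatically as soon as numeric values are assigned to it, so the plain NA version is fine for most purposes.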

Create new variables from existing ones

study$age2 <- study$age / 12
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5     age2
1 no sen   Male 150 90  1  2  1 NA  5 12.50000
2    sen Female 156 85  5  5  5 NA  1 13.00000
3 no sen Female 138 90  3  3  3 NA  4 11.50000
4    sen   Male 126 87  4  4  5 NA  2 10.50000
5 no sen Female 136 99  2  5  2 NA  1 11.33333
6    sen   Male 162 89  3  2  1 NA  3 13.50000
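
The same kind of derived variable can also be written with transform(), which saves repeating the data frame name. A short sketch on a hypothetical data frame toy with ages in months:

```r
# Hypothetical data frame with ages in months
toy <- data.frame(age = c(150, 156, 138))

# transform() evaluates the expression inside the data frame,
# so columns can be referred to without the toy$ prefix
toy <- transform(toy, age2 = age / 12)
toy
```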

Drop a variable

study$age2 <- NULL
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5
1 no sen   Male 150 90  1  2  1 NA  5
2    sen Female 156 85  5  5  5 NA  1
3 no sen Female 138 90  3  3  3 NA  4
4    sen   Male 126 87  4  4  5 NA  2
5 no sen Female 136 99  2  5  2 NA  1
6    sen   Male 162 89  3  2  1 NA  3
# or: study <- subset(study, select = -age2)
# or: study <- study[, -10]
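
When several variables should be dropped at once, selecting by name tends to be safer than by position, because it does not depend on the column order. One possible way, sketched on a hypothetical data frame toy (the names a, b, c and to_drop are made up for this example):

```r
# Hypothetical data frame with three columns
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Keep every column whose name is not in the list of names to drop;
# drop = FALSE keeps the result a data frame even if one column remains
to_drop <- c("a", "c")
toy <- toy[, !(names(toy) %in% to_drop), drop = FALSE]
names(toy)
```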

Task

Create a new variable Q_sum_1_3 as the sum of Q1 to Q3.

:-)

study$Q_sum_1_3 <- with(study, Q1 + Q2 + Q3)

# or:

study$Q_sum_1_3 <- study$Q1 + study$Q2 + study$Q3
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5 Q_sum_1_3
1 no sen   Male 150 90  1  2  1 NA  5         4
2    sen Female 156 85  5  5  5 NA  1        15
3 no sen Female 138 90  3  3  3 NA  4         9
4    sen   Male 126 87  4  4  5 NA  2        13
5 no sen Female 136 99  2  5  2 NA  1         9
6    sen   Male 162 89  3  2  1 NA  3         6
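
Another option for the same task is rowSums(), which adds up all selected columns row by row. A small sketch on a hypothetical data frame toy holding the first three rows of Q1 to Q3:

```r
# Hypothetical data frame: first three rows of Q1 to Q3 from study
toy <- data.frame(Q1 = c(1, 5, 3), Q2 = c(2, 5, 3), Q3 = c(1, 5, 3))

# Row-wise sum over the selected columns
toy$Q_sum_1_3 <- rowSums(toy[, c("Q1", "Q2", "Q3")])
toy$Q_sum_1_3
```

This scales better than writing out Q1 + Q2 + Q3 when many items are involved.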

Task

Create the variables age_year and age_month from age (which is given in months), so that both new variables contain whole numbers only.

(Tip: use the trunc function, or the modulo operator %% together with the integer division operator %/%. Use the help function if needed.)

:-)

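# Whole years: drop the fractional part of the age in years
study$age_year <- trunc(study$age / 12)
# Months left over after removing the whole years
study$age_month <- study$age - (study$age_year * 12)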

# or

study$age_year <- study$age %/% 12
study$age_month <- study$age %% 12

study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5 Q_sum_1_3 age_year age_month
1 no sen   Male 150 90  1  2  1 NA  5         4       12         6
2    sen Female 156 85  5  5  5 NA  1        15       13         0
3 no sen Female 138 90  3  3  3 NA  4         9       11         6
4    sen   Male 126 87  4  4  5 NA  2        13       10         6
5 no sen Female 136 99  2  5  2 NA  1         9       11         4
6    sen   Male 162 89  3  2  1 NA  3         6       13         6
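
To see how the two operators fit together: %/% gives the whole-number quotient and %% the remainder, so multiplying the quotient by 12 and adding the remainder gives back the original value. A short check on three of the ages from study (the vector names age, years and months are chosen for this sketch only):

```r
# Ages in months, taken from rows 1, 2 and 5 of study
age <- c(150, 156, 136)

years  <- age %/% 12  # integer division: whole years
months <- age %% 12   # modulo: remaining months

# Recombining quotient and remainder recovers the original ages
all(years * 12 + months == age)
```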

The apply function

The apply function applies a function to every row or every column of a matrix or data frame (a data frame is converted to a matrix first).

  1. The first argument is the data frame (or matrix) object.

  2. The second argument is the margin (1 for rows, 2 for columns).

  3. The third argument is the function to apply (e.g. mean).

  4. Any further arguments are passed on to the function given in 3 (e.g. na.rm = TRUE).

# An example:
study$Q_sum_1_2 <- apply(study[, c("Q1", "Q2")], 1, sum)
study
     sen gender age IQ Q1 Q2 Q3 Q4 Q5 Q_sum_1_3 age_year age_month Q_sum_1_2
1 no sen   Male 150 90  1  2  1 NA  5         4       12         6         3
2    sen Female 156 85  5  5  5 NA  1        15       13         0        10
3 no sen Female 138 90  3  3  3 NA  4         9       11         6         6
4    sen   Male 126 87  4  4  5 NA  2        13       10         6         8
5 no sen Female 136 99  2  5  2 NA  1         9       11         4         7
6    sen   Male 162 89  3  2  1 NA  3         6       13         6         5
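
The margin and the further arguments can be illustrated with a second small sketch. It uses a hypothetical data frame toy that contains one missing value; the names col_means and row_sums are made up for this example:

```r
# Hypothetical data frame with one missing value in Q1
toy <- data.frame(Q1 = c(1, 5, NA), Q2 = c(2, 5, 3))

# Margin 2: one result per column; na.rm = TRUE is passed on to mean()
col_means <- apply(toy, 2, mean, na.rm = TRUE)

# Margin 1: one result per row; without na.rm the third row gives NA
row_sums <- apply(toy, 1, sum)

col_means
row_sums
```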

Task

Create a new variable Q_sum with the sum of Q1 to Q5 (the sums should still be computed when the data contain NAs).

study$Q_sum <- apply(subset(study, select = Q1:Q5), 1, sum, na.rm = TRUE)
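
As an alternative sketch, rowSums() also accepts na.rm = TRUE. The example below uses a hypothetical data frame toy with an all-NA column, similar to Q4 in study:

```r
# Hypothetical data frame; Q4 is recycled to NA in every row
toy <- data.frame(Q1 = c(1, 5), Q2 = c(2, 5), Q4 = NA)

# Missing values are skipped, so only Q1 and Q2 contribute to the sum
toy$Q_sum <- rowSums(toy[, c("Q1", "Q2", "Q4")], na.rm = TRUE)
toy$Q_sum
```

Note that with na.rm = TRUE a row consisting only of NAs would get the sum 0, not NA, which may or may not be what is wanted.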