Summarizes missing values across user-defined scales in a data frame.

analysis_missing(dat, scale)

Arguments

dat

A data frame containing the data to be analyzed.

scale

A named list of character vectors. Each element defines one scale by listing the names of the columns in dat that belong to it; the element names label the rows of the result.

Value

A data frame with one row per scale (row names are the scale names) and the following columns:

missing

Total number of missing values across variables in the scale.

total

Total number of values expected (number of cases × number of variables in the scale).

p

Proportion of missing values (missing / total).

n cases

Number of cases with at least one missing value in the scale.

p cases

Proportion of cases with at least one missing value in the scale (n cases / total cases).

n all cases

Number of cases where all scale variables are missing.

p all cases

Proportion of cases where all scale variables are missing (n all cases / total cases).

Examples

dat <- data.frame(
  scale1_var1 = c(1, 2, NA, 4),
  scale1_var2 = c(NA, 2, 3, 4),
  scale2_var1 = c(1, NA, 3, 4),
  scale2_var2 = c(NA, NA, NA, 4)
)

scales <- list(
  scale1 = c("scale1_var1", "scale1_var2"),
  scale2 = c("scale2_var1", "scale2_var2")
)

analysis_missing(dat, scales)
#>        missing total    p n cases p cases n all cases p all cases
#> scale1       2     8 0.25       2    0.50           0        0.00
#> scale2       4     8 0.50       3    0.75           1        0.25
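
As a cross-check on the column definitions, the scale2 row can be reproduced with base R alone. This is a minimal sketch and not necessarily how analysis_missing is implemented; the intermediate names (vars, na_mat, na_per_case) and the underscore-style result names are illustrative assumptions.

```r
# Same example data as above, repeated so the snippet runs on its own
dat <- data.frame(
  scale1_var1 = c(1, 2, NA, 4),
  scale1_var2 = c(NA, 2, 3, 4),
  scale2_var1 = c(1, NA, 3, 4),
  scale2_var2 = c(NA, NA, NA, 4)
)

vars <- c("scale2_var1", "scale2_var2")      # variables belonging to scale2
na_mat <- is.na(dat[, vars, drop = FALSE])   # logical matrix, TRUE where a value is missing
na_per_case <- rowSums(na_mat)               # number of missing values per case (row)

missing <- sum(na_mat)                           # 4 missing values in total
total <- nrow(dat) * length(vars)                # 4 cases x 2 variables = 8
n_cases <- sum(na_per_case > 0)                  # 3 cases with at least one missing value
n_all_cases <- sum(na_per_case == length(vars))  # 1 case with every scale variable missing

c(
  missing = missing, total = total, p = missing / total,
  n_cases = n_cases, p_cases = n_cases / nrow(dat),
  n_all_cases = n_all_cases, p_all_cases = n_all_cases / nrow(dat)
)
```

The resulting values (4, 8, 0.5, 3, 0.75, 1, 0.25) match the scale2 row of the output above. Swapping vars for the scale1 variables reproduces the scale1 row in the same way.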