add_group_aggregate
Computes summary statistics for selected variables within subgroups defined
by the joint combinations of one or more grouping variables (e.g., age within
sex) and merges the aggregated values back into the original data.
dat: A data.frame containing the columns listed in grouping and vars.
grouping: A character vector of one or more column names in dat whose joint
combinations define the subgroups. For example, c("sex", "age") yields
aggregates for each sex-by-age subgroup.
vars: A character vector of column names in dat to be aggregated.
func: A list of functions, each applied to every variable in vars within each
subgroup; the list names are used as suffixes for the new column names. The
default computes the mean with missing values removed.
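To make the grouping and func arguments concrete, the base-R snippet below lists the subgroups that grouping = c("sex", "age") defines on the toy data from the examples, then computes the subgroup means that the default func is documented to produce. The list default_func is an assumed equivalent of the default, inferred from the description and the score_mean column name in the first example; it is not taken from the package source.

```r
# Toy data from the examples
dat <- data.frame(
  sex = c("f", "f", "m", "m", "m"),
  age = c(10, 10, 10, 12, 12),
  score = c(1, NA, 3, 5, 7),
  other = 1:5
)
grouping <- c("sex", "age")

# Each distinct combination of the grouping columns is one subgroup:
# here f/10, m/10 and m/12
subgroups <- unique(dat[, grouping, drop = FALSE])
print(subgroups)

# ASSUMED equivalent of the default `func` (hypothetical, not the package source)
default_func <- list(mean = function(x) mean(x, na.rm = TRUE))

# Subgroup means of score; the NA in the f/10 subgroup is ignored
agg <- aggregate(dat["score"], by = dat[grouping], FUN = default_func$mean)
print(agg)
```

The three means (1 for f/10, 3 for m/10, 6 for m/12) match the score_mean values in the first example.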
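Returns a data.frame with the same observations as dat, plus one additional
column per combination of a variable in vars and a function in func, holding
the subgroup-level aggregated value. Because base::merge() sorts by the
grouping columns by default, rows may come back in a different order than in
dat; rows with missing values in a grouping column may be dropped, since
stats::aggregate() omits them from its result.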
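Because the aggregated values are joined back with base::merge(), which sorts the result on the key columns by default, the output need not follow the input row order. The sketch below shows this on a small made-up data frame and one possible way to restore the original order. The temporary .row column is a hypothetical, illustrative addition and is not something add_group_aggregate() is documented to do.

```r
# Small made-up data: group "b" appears before group "a"
dat <- data.frame(g = c("b", "a", "b"), x = c(1, 5, 3))

# Per-group mean of x, computed with stats::aggregate()
agg <- aggregate(dat["x"], by = dat["g"], FUN = mean)

# merge() sorts on the key column by default, so rows come back as a, b, b
out <- merge(dat, agg, by = "g", suffixes = c("", "_mean"))
print(out)

# Hypothetical temporary row index (not part of add_group_aggregate())
# used to restore the input order after merging
dat$.row <- seq_len(nrow(dat))
out2 <- merge(dat, agg, by = "g", suffixes = c("", "_mean"))
out2 <- out2[order(out2$.row), setdiff(names(out2), ".row")]
print(out2)
```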
Aggregation is performed using stats::aggregate() with by = dat[, grouping],
so each unique combination of the grouping variables defines a subgroup. The
results are joined back to dat with base::merge(), using all grouping columns
as keys. Each new column is named by appending the name of the corresponding
function in func to the variable name (e.g., score_mean for the default); if
func is an unnamed list, the suffixes stat1, stat2, etc. are used instead.
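The aggregate-then-merge approach described above can be sketched in base R as follows. This is a hypothetical re-implementation written only from the description on this page (add_group_aggregate_sketch is an illustrative name), not the package's actual source. One deliberate choice: it selects the grouping columns with drop = FALSE, so that a single grouping variable still yields a data.frame (a list), which is what the by argument of aggregate() expects.

```r
# HYPOTHETICAL sketch of the aggregate-then-merge approach; not the package source
add_group_aggregate_sketch <- function(dat, grouping, vars,
                                       func = list(mean = function(x) mean(x, na.rm = TRUE))) {
  # Unnamed function lists get the suffixes stat1, stat2, ...
  if (is.null(names(func))) {
    names(func) <- paste0("stat", seq_along(func))
  }
  # drop = FALSE keeps a data.frame (a list) even for a single grouping column
  by_cols <- dat[, grouping, drop = FALSE]
  result <- NULL
  for (fn in names(func)) {
    # One row per subgroup: grouping columns first, then one column per var
    agg <- aggregate(dat[, vars, drop = FALSE], by = by_cols, FUN = func[[fn]])
    names(agg)[-seq_along(grouping)] <- paste(vars, fn, sep = "_")
    result <- if (is.null(result)) agg else merge(result, agg, by = grouping)
  }
  # Join the subgroup-level values back onto every row of dat
  merge(dat, result, by = grouping)
}

dat <- data.frame(
  sex = c("f", "f", "m", "m", "m"),
  age = c(10, 10, 10, 12, 12),
  score = c(1, NA, 3, 5, 7),
  other = 1:5
)

# Default func: adds score_mean
out <- add_group_aggregate_sketch(dat, grouping = c("sex", "age"), vars = "score")
print(out)

# Unnamed func list: columns are suffixed with stat1 and stat2
out2 <- add_group_aggregate_sketch(dat, c("sex", "age"), "other",
                                   func = list(min, max))
print(names(out2))
```

On the toy data, the score_mean column of the sketch agrees with the first example's output.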
dat <- data.frame(
  sex = c("f", "f", "m", "m", "m"),
  age = c(10, 10, 10, 12, 12),
  score = c(1, NA, 3, 5, 7),
  other = 1:5
)
# Mean score per subgroup (sex x age), added back to each row
add_group_aggregate(dat, grouping = c("sex", "age"), vars = "score")
#>   sex age score other score_mean
#> 1   f  10     1     1          1
#> 2   f  10    NA     2          1
#> 3   m  10     3     3          3
#> 4   m  12     5     4          6
#> 5   m  12     7     5          6
# Maximum and median per subgroup
add_group_aggregate(
  dat,
  grouping = c("sex", "age"),
  vars = c("score", "other"),
  func = list(
    max = function(x) max(x, na.rm = TRUE),
    median = function(x) median(x, na.rm = TRUE)
  )
)
#>   sex age score other score_max other_max score_median other_median
#> 1   f  10     1     1         1         2            1          1.5
#> 2   f  10    NA     2         1         2            1          1.5
#> 3   m  10     3     3         3         3            3          3.0
#> 4   m  12     5     4         7         5            6          4.5
#> 5   m  12     7     5         7         5            6          4.5