Splits a numeric vector into groups at specified percentiles. Depending on type, values equal to a percentile threshold are assigned either to the group above the threshold or to the group below it. Optionally, missing values can be assigned to a separate factor level.
split_at_percentile(x, frac, labels, type = "higher", explicit_na = NA)
x: A numeric vector.
frac: A numeric vector with percentiles (between 0 and 1) at which to split the vector. Alternatively, use one of the character strings "median", "tertile", "quartile", "quintile", or "decile" for common splits.
labels: Vector with factor labels, one per resulting group.
type: "higher" assigns values equal to a percentile threshold to the higher group, so the last group contains values equal to or above the last threshold. "lower" assigns values equal to a threshold to the lower group.
explicit_na: If not NA, NAs will be recoded as a factor level of the provided name. If TRUE, the name will default to '(Missing)'.
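A vector of type factor with one level per group (the number of percentiles in frac plus one), plus an additional level for missing values if explicit_na is set.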
This function computes the specified percentiles of the input vector and
assigns each value to a group based on these percentiles. The resulting
groups are returned as a factor with the specified labels. The
type parameter determines whether values equal to the percentile
thresholds are included in the lower or higher group.
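The behaviour described above can be approximated in base R with quantile() and cut(). This is only a minimal sketch under stated assumptions, not the package's implementation: split_sketch is a hypothetical helper, it uses the default quantile() algorithm, and it assumes the thresholds are distinct.

```r
# Hypothetical helper for illustration; not the package's own code.
split_sketch <- function(x, frac, labels, type = c("higher", "lower")) {
  type <- match.arg(type)
  # Percentile thresholds of the non-missing values (default quantile type)
  thresholds <- quantile(x, probs = frac, na.rm = TRUE, names = FALSE)
  # Open-ended outer breaks so every finite value falls into some group
  breaks <- c(-Inf, thresholds, Inf)
  # right = FALSE builds intervals [a, b): ties go to the higher group.
  # right = TRUE builds intervals (a, b]: ties go to the lower group.
  cut(x, breaks = breaks, labels = labels, right = (type == "lower"))
}

x_demo <- c(1, 2, 3, 4, 5)  # the median is 3
hi <- split_sketch(x_demo, 0.5, c("low", "high"), type = "higher")
lo <- split_sketch(x_demo, 0.5, c("low", "high"), type = "lower")
```

In this sketch the value 3, which equals the median, lands in "high" for type = "higher" and in "low" for type = "lower".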
Common splits can be specified using character strings for the frac
parameter:
"median": splits at the 50th percentile
"tertile": splits at the 33.3rd and 66.6th percentiles
"quartile": splits at the 25th, 50th and 75th percentiles
"quintile": splits at the 20th, 40th, 60th and 80th percentiles
"decile": splits at the 10th, 20th, ..., 90th percentiles
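The shortcuts listed above can be thought of as a lookup from a name to equally spaced fractions. The following sketch illustrates that idea; named_fracs is a hypothetical helper, and exact equal spacing (for example 1/3 and 2/3 for tertiles) is an assumption, since the package may round these values differently.

```r
# Hypothetical lookup for illustration; not part of the package.
named_fracs <- function(name) {
  # Number of groups each shortcut produces
  n_groups <- c(median = 2, tertile = 3, quartile = 4, quintile = 5, decile = 10)
  if (!name %in% names(n_groups)) {
    stop("Unknown split: ", name)
  }
  n <- n_groups[[name]]
  # n groups need n - 1 equally spaced thresholds
  seq_len(n - 1) / n
}

named_fracs("quartile")
#> [1] 0.25 0.50 0.75
```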
The labels parameter should contain one more label than the number of
percentiles specified in frac, as it defines the labels for each
resulting group.
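The length rule follows from the fact that k thresholds cut a range into k + 1 groups. A caller could verify this before splitting, as in the sketch below. The helper labels_match is hypothetical, and whether the function itself raises an error on a mismatch is not stated here.

```r
# Hypothetical pre-check for illustration; not part of the package.
labels_match <- function(frac, labels) {
  # k thresholds produce k + 1 groups, so k + 1 labels are needed
  length(labels) == length(frac) + 1
}

ok  <- labels_match(c(0.25, 0.5, 0.75), c("Q1", "Q2", "Q3", "Q4"))  # four labels
bad <- labels_match(c(0.25, 0.5, 0.75), c("low", "high"))           # too few labels
```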
If explicit_na is provided (not NA), missing values in the input
vector will be recoded as a separate factor level with the specified name.
If explicit_na is set to TRUE, the name will default to '(Missing)'.
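The recoding of missing values can be sketched in base R by adding a level and assigning it to the NA entries. This is an illustrative assumption about the mechanics, not the package's code; recode_na_sketch is a hypothetical helper that handles only the documented cases (NA, TRUE, or a level name).

```r
# Hypothetical helper for illustration; not the package's own code.
recode_na_sketch <- function(f, explicit_na = NA) {
  # TRUE is shorthand for the default level name
  if (isTRUE(explicit_na)) explicit_na <- "(Missing)"
  # NA means: leave missing values untouched
  if (is.na(explicit_na)) return(f)
  # Append the new level, then move all NAs into it
  levels(f) <- c(levels(f), explicit_na)
  f[is.na(f)] <- explicit_na
  f
}

f <- factor(c("low", NA, "high"), levels = c("low", "high"))
out <- recode_na_sketch(f, explicit_na = TRUE)
kept <- recode_na_sketch(f)  # default NA: missing values stay NA
```

Appending the new level last keeps the original group order intact, which matches the position of (Missing) in the tabulated example output.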
## Generate sample data
x <- sample(c(1:100, NA), 1000, replace = TRUE)
## Tertile split
split_at_percentile(x, "tertile", explicit_na = TRUE) |> table()
#>
#> low middle high (Missing)
#> 328 327 332 13
## Quartile split with custom labels
split_at_percentile(
x,
frac = c(0.25, 0.5, 0.75),
labels = c("0-24", "25-49", "50-74", "75-100")
) |> table()
#>
#> 0-24 25-49 50-74 75-100
#> 239 241 257 250
## Quintile split
split_at_percentile(x, frac = "quintile") |> table() |> prop.table() |> round(2)
#>
#> quintile 1 quintile 2 quintile 3 quintile 4 quintile 5
#> 0.20 0.19 0.20 0.20 0.21