Generates a concise textual description of a dataset, summarizing each column's type, number of missing values, and a short overview of its content (e.g., the value range or the factor levels). Optionally, the description can be appended to a README.md file. This help file was generated with AI.

create_data_description(dat, readme = FALSE, tab = "   ", max_char = 60)

Arguments

dat

A data.frame, or a character string giving the path to an .rds file. A path ending in .rds is loaded with readRDS().
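
The two accepted forms of dat can be illustrated with a minimal sketch. The helper resolve_dat() is hypothetical and only mirrors the behaviour described above; the actual function may handle its input differently.

```r
# Hypothetical helper (not part of the documented function): returns a
# data.frame whether it is given one directly or a path to an .rds file.
resolve_dat <- function(dat) {
  if (is.character(dat) && length(dat) == 1 && grepl("\\.rds$", dat)) {
    dat <- readRDS(dat)
  }
  stopifnot(is.data.frame(dat))
  dat
}

# Round trip through a temporary .rds file
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), path)
loaded <- resolve_dat(path)
```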

readme

Logical. If TRUE, the output is appended to a file named "README.md". Default is FALSE.
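
Appending, as opposed to overwriting, keeps any existing README content intact. The sketch below shows one way this could be done in base R; it is an assumption about the mechanism, not the function's actual code, and it writes to a temporary file rather than a real README.md.

```r
# Stand-in for README.md; a temporary file avoids touching the working directory.
readme_path <- tempfile(fileext = ".md")
writeLines("# My project", readme_path)   # pre-existing content

# Illustrative description lines (hypothetical content)
description <- c("", "id (integer, 0 NA): 1 to 10")

# Open the file in append mode so the existing lines are preserved
con <- file(readme_path, open = "a")
writeLines(description, con)
close(con)

readme_lines <- readLines(readme_path)
```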

max_char

Integer. Maximum number of characters to show per variable summary. Longer content will be truncated. Default is 60.

tab

Character string used as spacing when formatting the output. Default is "   " (three spaces).

Value

Invisibly returns a character vector with one entry per column, each describing that column's contents. As a side effect, the description is printed to the console and, if readme = TRUE, appended to README.md.
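
Because the value is returned invisibly, it is not auto-printed a second time at the console but can still be captured by assignment. The stand-in function below is hypothetical and only demonstrates this return convention.

```r
# Hypothetical stand-in: prints its input, then returns it invisibly.
describe_invisibly <- function(lines) {
  cat(lines, sep = "\n")   # side effect: print to the console
  invisible(lines)         # return value is not auto-printed
}

out <- describe_invisibly(c("id (integer, 0 NA): 1 to 10",
                            "group (factor, 0 NA): A, B"))
length(out)   # the captured value has one entry per described column
```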

Details

For each column in the dataset, the function provides:

  • The column name

  • The type (typeof)

  • The number of missing values

  • A brief summary:

    • For factors: list of levels (possibly truncated)

    • For numeric variables: value range

    • For character variables: coerced to factor and listed as above
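
The per-column logic listed above can be sketched roughly as follows. The helper describe_column() and its output format are assumptions made for illustration, and the real implementation may differ. In particular, the sketch reports "factor" for factor columns (as in the example output) rather than the result of typeof().

```r
# Hypothetical helper (not the package's code): one description line per column.
describe_column <- function(x, name, max_char = 60) {
  n_na <- sum(is.na(x))
  type <- if (is.factor(x)) "factor" else typeof(x)

  # Character columns are coerced to factor and summarized by their levels.
  if (is.character(x)) x <- factor(x)

  if (is.factor(x)) {
    overview <- paste(levels(x), collapse = ", ")
  } else if (is.numeric(x)) {
    overview <- if (all(is.na(x))) {
      "all NA"
    } else {
      paste(range(x, na.rm = TRUE), collapse = " to ")
    }
  } else {
    overview <- ""
  }

  # Truncate overviews longer than max_char characters.
  if (nchar(overview) > max_char) {
    overview <- paste0(substr(overview, 1, max_char - 3), "...")
  }

  sprintf("%s (%s, %d NA): %s", name, type, n_na, overview)
}

df <- data.frame(
  id = 1:10,
  group = factor(rep(c("A", "B"), 5)),
  score = c(NA, 2:10)
)
desc <- vapply(names(df), function(nm) describe_column(df[[nm]], nm), character(1))
```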

Examples

df <- data.frame(
  id = 1:10,
  group = factor(c("A", "B")),
  score = c(NA, 2:10)
)
create_data_description(df)
#> # Discription of datafile `df`
#> 
#> Columns: 3 | Rows: 10
#> 
#> id      (integer, 0 NA):   1 to 10
#> group   (factor, 0 NA):    A, B
#> score   (integer, 1 NA):   2 to 10