Create a description of a dataset — create_data_description

Generates a concise textual description of a dataset, summarizing each column's type, number of missing values, and a short overview of its content (e.g., range or levels). Optionally, this description can be written to a README.md file. This help file was generated with AI.

create_data_description(dat, readme = FALSE, tab = "   ", max_char = 60)

Arguments

dat: A data.frame or a character string pointing to an .rds file to load. If a character string is provided and ends with .rds, the file will be loaded using readRDS().
readme: Logical. If TRUE, the output is appended to a file named "README.md". Default is FALSE.
max_char: Integer. Maximum number of characters to show per variable summary. Longer content will be truncated. Default is 60.
tab: Character string. String used as the separator when formatting the output. Default is three spaces ("   ").
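
The handling of dat described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual code; resolve_dat() is a hypothetical helper name, and the error message is made up for the example.

```r
# Hypothetical helper (not part of the package): turn `dat` into a data.frame.
resolve_dat <- function(dat) {
  # A single character string ending in ".rds" is treated as a file path
  if (is.character(dat) && length(dat) == 1 && grepl("\\.rds$", dat)) {
    dat <- readRDS(dat)
  }
  # Assumption: anything that is not a data.frame at this point is rejected
  if (!is.data.frame(dat)) {
    stop("`dat` must be a data.frame or a path to an .rds file")
  }
  dat
}

# Usage: round-trip a small data.frame through a temporary .rds file
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), path)
from_file <- resolve_dat(path)               # loaded via readRDS()
from_df   <- resolve_dat(data.frame(x = 1:3)) # passed through unchanged
```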

Value

Invisibly returns a character vector with one entry per column, each describing that column's contents. As a side effect, the description is printed to the console and, if readme = TRUE, appended to README.md.

Details

For each column in the dataset, the function provides:

The column name
The type (typeof)
The number of missing values
A brief summary:
- For factors: list of levels (possibly truncated)
- For numeric variables: value range
- For character variables: coerced to factor and listed as above
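
The per-column logic listed above could look roughly like the sketch below. It is an illustration only, not the function's real implementation: describe_column() is a hypothetical helper, the output format is simplified, and two details are assumptions: factors are reported as "factor" (as in the example output, although typeof() of a factor is "integer"), and truncation simply cuts the summary at max_char characters.

```r
# Hypothetical helper (not the package's code): describe a single column.
describe_column <- function(x, max_char = 60) {
  n_na <- sum(is.na(x))
  # Assumption: factors are labelled "factor"; everything else uses typeof()
  type <- if (is.factor(x)) "factor" else typeof(x)

  # Character columns are coerced to factor so their levels can be listed
  if (is.character(x)) x <- factor(x)

  if (is.factor(x)) {
    overview <- paste(levels(x), collapse = ", ")
  } else if (is.numeric(x) && !all(is.na(x))) {
    rng <- range(x, na.rm = TRUE)
    overview <- paste(rng[1], "to", rng[2])
  } else {
    overview <- ""  # no summary for other types or all-NA numeric columns
  }

  # Assumption: overly long summaries are cut off at max_char characters
  overview <- substr(overview, 1, max_char)
  sprintf("(%s, %d NA): %s", type, n_na, overview)
}

# Usage: one description per column, as a named character vector
df <- data.frame(
  id = 1:10,
  group = factor(c("A", "B")),
  score = c(NA, 2:10)
)
desc <- vapply(df, describe_column, character(1))
```

Using vapply() with character(1) keeps the result a plain character vector with one element per column, which matches the shape of the documented return value.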

Examples

df <- data.frame(
  id = 1:10,
  group = factor(c("A", "B")),
  score = c(NA, 2:10)
)
create_data_description(df)
#> # Description of datafile `df`
#> 
#> Columns: 3 | Rows: 10
#> 
#> id      (integer, 0 NA):   1 to 10
#> group   (factor, 0 NA):    A, B
#> score   (integer, 1 NA):   2 to 10