How to build a dictionary file

In the following examples I will lay out the construction of a dictionary file.

First, we need an example data.frame to work with (i.e. the ex_scaledic_data example dataset):

ex_scaledic_data
rel_1	rel_2	rel_3	rel_4	rel_5	sui_1	sui_2	sui_3	sui_4	sui_5	gender	age
1	6	1	2	1	2	2	2	3	0	f	10.5
3	1	1	2	1	4	1	4	1	0	d	6.5
1	66	2	3	4	3	2	1	3	3	d	10.5
2	2	5	3	1	0	2	1	2	3	f	13.0
5	5	4	2	3	1	3	1	1	3	f	8.0
3	4	5	5	1	3	2	3	1	1	d	6.0
3	2	2	2	3	0	4	3	4	2	f	6.0
4	3	4	3	3	0	0	4	3	3	f	10.0
2	6	2	11	1	1	-999	4	2	1	f	8.5
6	6	4	5	3	55	4	0	3	1	f	8.5
6	1	3	1	66	4	0	2	3	4	m	7.5
6	1	2	5	3	1	3	1	3	0	m	-999.0
3	1	5	5	5	2	2	1	3	0	m	7.0
6	1	-999	1	1	4	2	3	3	0	m	7.5
2	2	1	5	3	2	3	2	2	0	d	8.5
3	5	5	4	2	1	0	0	1	2	d	13.0
4	4	5	5	1	3	1	4	66	0	-999	10.5
5	5	4	-999	4	1	4	2	0	1	m	10.0
6	5	4	2	4	0	0	2	3	3	m	8.0
2	3	5	2	3	1	3	2	4	3	m	6.0

Second, we need a dictionary file, which is also a data.frame. Each row of a dictionary file describes a variable (also called an item in this context), and each column holds a specific attribute. These attributes fall into two classes: predefined attributes, which are identified by their specific names, and additional attributes, which can have any name. The predefined attributes are: item_name, item_label, values, value_labels, missing, weight, and type¹.

Additional attributes are mainly used for selecting items or displaying additional information about an item (e.g., the scale and subscale an item belongs to; the reference and the author of an item; the translation of an item).

Building a dictionary step by step

We start the example with a very simple dictionary file containing only item labels and the corresponding item names (i.e., the corresponding variable names in our data frame)². The dictionary file has two columns, item_name and item_label:

Dictionary file
item_name	item_label
rel_1	How often do you attend church or other religious meetings?
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?
rel_3	In my life, I experience the presence of the Divine (i.e., God)
rel_4	My religious beliefs are what really lie behind my whole approach to life
rel_5	I try hard to carry my religion over into all other dealings in life
sui_1	Did you feel tense in the last week?
sui_2	Did you feel blue in the last week?
sui_3	Did you feel irritated in the last week?
sui_4	Did you feel inferior in the last week?
sui_5	Did you have problems falling asleep in the last week?
gender	gender
age	age
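
The dictionary shown above has to exist as a data.frame before it can be joined with the data. The following is a minimal sketch of one way to build it in base R. The object name dic_file matches the name used in the calls below, but constructing it with data.frame() is an assumption made for illustration; a dictionary could just as well be kept in a spreadsheet or CSV file and read in.

```r
# Sketch: build the two-column dictionary by hand (base R only).
dic_file <- data.frame(
  item_name = c(
    "rel_1", "rel_2", "rel_3", "rel_4", "rel_5",
    "sui_1", "sui_2", "sui_3", "sui_4", "sui_5",
    "gender", "age"
  ),
  item_label = c(
    "How often do you attend church or other religious meetings?",
    "How often do you spend time in private religious activities, such as prayer, meditation or Bible study?",
    "In my life, I experience the presence of the Divine (i.e., God)",
    "My religious beliefs are what really lie behind my whole approach to life",
    "I try hard to carry my religion over into all other dealings in life",
    "Did you feel tense in the last week?",
    "Did you feel blue in the last week?",
    "Did you feel irritated in the last week?",
    "Did you feel inferior in the last week?",
    "Did you have problems falling asleep in the last week?",
    "gender",
    "age"
  ),
  stringsAsFactors = FALSE
)

# One row per variable, one column per attribute.
str(dic_file)
```

Further attributes (type, weight, values, and so on) can then be added as additional columns of the same data.frame.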

We combine the data file and the dic file with the apply_dic() function:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

1: 'type' attribute missing and replaced with an estimation (12x)

We get a message that the dic file does not contain information about the data type (e.g. integer, factor, character). The type attribute is used in various scaledic functions, for example when the data is checked for invalid values, when missing values are imputed, or when scales are scored. A variable can be one of the following types: integer for numbers without decimals, float for numbers with decimals, character for variables with text, and factor for variables with text or numbers that are levels of a factor. Scaledic estimates the data type from the given data when the type attribute is not provided in the dic file.
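
To make the four types more concrete, here is a rough sketch of how a type could be guessed from a column in base R. The function guess_type() is hypothetical and written only for this illustration; it is not the rule scaledic applies internally, which may differ.

```r
# Hypothetical helper: guess one of the four dictionary types from a vector.
guess_type <- function(x) {
  if (is.factor(x)) return("factor")
  if (is.numeric(x)) {
    # Whole numbers only -> integer, otherwise -> float
    if (all(x == round(x), na.rm = TRUE)) return("integer")
    return("float")
  }
  # Everything else is treated as text in this sketch
  "character"
}

guess_type(c(1, 3, 1, 2, 5))       # "integer"
guess_type(c(10.5, 6.5, 13.0))     # "float"
guess_type(c("f", "d", "m"))       # "character"
guess_type(factor(c("f", "d")))    # "factor"
```

Note that a text column such as gender would be guessed as character here, which is one reason to state the type explicitly in the dic file.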

Let us add type information to the dic file:

Dictionary file
item_name	item_label	type
rel_1	How often do you attend church or other religious meetings?	integer
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	integer
rel_3	In my life, I experience the presence of the Divine (i.e., God)	integer
rel_4	My religious beliefs are what really lie behind my whole approach to life	integer
rel_5	I try hard to carry my religion over into all other dealings in life	integer
sui_1	Did you feel tense in the last week?	integer
sui_2	Did you feel blue in the last week?	integer
sui_3	Did you feel irritated in the last week?	integer
sui_4	Did you feel inferior in the last week?	integer
sui_5	Did you have problems falling asleep in the last week?	integer
gender	gender	factor
age	age	float

Most items are of type integer (the item has only whole numbers). The exceptions are gender, which is of type factor (i.e. it has a nominal scale with several levels), and age, which is of type float (it contains decimal numbers).

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we can add weight information to the dic file. The weight attribute tells various scaledic functions a) whether the item is inverted or not, and b) how the values of an item are weighted. If a weight value is unsigned (e.g. 1), the item is not inverted. If a weight value has a negative sign (e.g. -1), the item is inverted. If the (absolute) value is not 1, the item will be weighted when calculating the item scores (e.g. 1.5 will give an item a weight of 1.5).

Dictionary file
item_name	item_label	weight	type
rel_1	How often do you attend church or other religious meetings?	1	integer
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	1	integer
rel_3	In my life, I experience the presence of the Divine (i.e., God)	1	integer
rel_4	My religious beliefs are what really lie behind my whole approach to life	1	integer
rel_5	I try hard to carry my religion over into all other dealings in life	1	integer
sui_1	Did you feel tense in the last week?	1	integer
sui_2	Did you feel blue in the last week?	1	integer
sui_3	Did you feel irritated in the last week?	1	integer
sui_4	Did you feel inferior in the last week?	1	integer
sui_5	Did you have problems falling asleep in the last week?	1	integer
gender	gender	1	factor
age	age	1	float

In this example, all weights are 1 and no item is inverted.
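
The effect of a weight can be sketched in a few lines of base R. The function score_item() below is hypothetical. It assumes the common convention of reversing an item as minimum + maximum - value; whether scaledic uses exactly this formula is not stated here.

```r
# Hypothetical helper: apply inversion (negative sign) and weighting
# (absolute value) to the responses of a single item.
score_item <- function(x, weight, values) {
  if (weight < 0) {
    # Reverse scoring needs the valid values to know both poles
    x <- min(values) + max(values) - x
  }
  abs(weight) * x
}

score_item(c(0, 1, 4), weight = 1, values = 0:4)     # 0 1 4 (unchanged)
score_item(c(0, 1, 4), weight = -1, values = 0:4)    # 4 3 0 (inverted)
score_item(c(1, 3, 5), weight = 1.5, values = 1:5)   # 1.5 4.5 7.5 (weighted)
```

The sketch also shows why the valid values of an item must be known before an inverted item can be reverse scored.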

Again, we join dic and data file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we add the values attribute to the dic file. values defines the valid values that a variable can take. Explicitly defining the values makes it possible to automatically identify invalid values in a data frame (e.g. typos). values are also necessary to reverse score an inverted item. To code the values, provide the possible values separated by commas (e.g., 1, 2, 3, 4 for the integers 1 to 4). It is also possible to use a colon to define a range of integers (e.g., 5:11 indicates all integers from 5 to 11). For type float, values represent the minimum and the maximum of the valid values (e.g. 5, 11 for all values from 5 to 11, including decimal numbers). For variables of type factor or character you may want to provide text values. In that case, put these values within quotes (e.g., 'm', 'f', 'd' indicates three valid text entries).
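
The coding rules for values can be illustrated with a small base R sketch. Both parse_values() and is_valid() are hypothetical helpers written for this example; they mimic the rules described above and are not scaledic's own parser.

```r
# Hypothetical helper: turn a values string into a vector of valid values.
parse_values <- function(spec) {
  parts <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  # Quoted entries are text values: strip the surrounding quotes
  if (all(grepl("^'.*'$", parts))) {
    return(gsub("^'|'$", "", parts))
  }
  unlist(lapply(parts, function(p) {
    if (grepl(":", p, fixed = TRUE)) {
      # "a:b" is a range of integers
      bounds <- as.integer(strsplit(p, ":", fixed = TRUE)[[1]])
      seq(bounds[1], bounds[2])
    } else {
      as.numeric(p)
    }
  }))
}

# Hypothetical helper: flag which entries of x are valid for a given type.
is_valid <- function(x, spec, type) {
  v <- parse_values(spec)
  if (type == "float") {
    # For float, the two values are the minimum and the maximum
    return(!is.na(x) & x >= min(v) & x <= max(v))
  }
  x %in% v
}

parse_values("1:6")                               # 1 2 3 4 5 6
parse_values("'m', 'f', 'd'")                     # "m" "f" "d"
is_valid(c(6, 1, 66), "1:6", "integer")           # TRUE TRUE FALSE
is_valid(c(10.5, 13, -999), "5, 11", "float")     # TRUE FALSE FALSE
```

In this sketch a missing entry (NA) is simply reported as not valid; a real implementation would probably treat NA separately.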

Here is the dic file with added values:

Dictionary file
item_name	item_label	values	weight	type
rel_1	How often do you attend church or other religious meetings?	1:6	1	integer
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	1:6	1	integer
rel_3	In my life, I experience the presence of the Divine (i.e., God)	1:5	1	integer
rel_4	My religious beliefs are what really lie behind my whole approach to life	1:5	1	integer
rel_5	I try hard to carry my religion over into all other dealings in life	1:5	1	integer
sui_1	Did you feel tense in the last week?	0:4	1	integer
sui_2	Did you feel blue in the last week?	0:4	1	integer
sui_3	Did you feel irritated in the last week?	0:4	1	integer
sui_4	Did you feel inferior in the last week?	0:4	1	integer
sui_5	Did you have problems falling asleep in the last week?	0:4	1	integer
gender	gender	'm', 'f', 'd'	1	factor
age	age	5, 11	1	float

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Invalid values can be replaced automatically with the check_values() function.

dat_dic <- check_values(dat_dic, report = TRUE, replace = NA) 
Found the following invalid values:

'rel_2'
  Row:      3 
Value:     66 

'rel_3'
  Row:     14 
Value:   -999 

'rel_4'
  Row:      9     18 
Value:     11   -999 

'rel_5'
  Row:     11 
Value:     66 

'sui_1'
  Row:     10 
Value:     55 

'sui_2'
  Row:      9 
Value:   -999 

'sui_4'
  Row:     17 
Value:     66 

'age'
  Row:      4     12     16 
Value:     13   -999     13

dat_dic |> kable(caption = "data frame with replaced invalid values")

data frame with replaced invalid values
rel_1	rel_2	rel_3	rel_4	rel_5	sui_1	sui_2	sui_3	sui_4	sui_5	gender	age
1	6	1	2	1	2	2	2	3	0	f	10.5
3	1	1	2	1	4	1	4	1	0	d	6.5
1	NA	2	3	4	3	2	1	3	3	d	10.5
2	2	5	3	1	0	2	1	2	3	f	NA
5	5	4	2	3	1	3	1	1	3	f	8.0
3	4	5	5	1	3	2	3	1	1	d	6.0
3	2	2	2	3	0	4	3	4	2	f	6.0
4	3	4	3	3	0	0	4	3	3	f	10.0
2	6	2	NA	1	1	NA	4	2	1	f	8.5
6	6	4	5	3	NA	4	0	3	1	f	8.5
6	1	3	1	NA	4	0	2	3	4	m	7.5
6	1	2	5	3	1	3	1	3	0	m	NA
3	1	5	5	5	2	2	1	3	0	m	7.0
6	1	NA	1	1	4	2	3	3	0	m	7.5
2	2	1	5	3	2	3	2	2	0	d	8.5
3	5	5	4	2	1	0	0	1	2	d	NA
4	4	5	5	1	3	1	4	NA	0	NA	10.5
5	5	4	NA	4	1	4	2	0	1	m	10.0
6	5	4	2	4	0	0	2	3	3	m	8.0
2	3	5	2	3	1	3	2	4	3	m	6.0

Next, we add value_labels. Value labels give longer labels for all or some of the values. Value labels are coded in the form value = label, with a semicolon separating the entries. For example, m = male; f = female; d = diverse codes three value labels (quotes are not necessary).
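
As an illustration of this format, the following base R sketch splits a value_labels string into a named character vector. The function parse_value_labels() is a hypothetical helper and not part of scaledic.

```r
# Hypothetical helper: "value = label; value = label" -> named character vector.
parse_value_labels <- function(spec) {
  entries <- trimws(strsplit(spec, ";", fixed = TRUE)[[1]])
  pairs <- strsplit(entries, "=", fixed = TRUE)
  # Everything after the first "=" is the label
  labels <- vapply(
    pairs,
    function(p) trimws(paste(p[-1], collapse = "=")),
    character(1)
  )
  # The part before the first "=" is the value
  names(labels) <- vapply(pairs, function(p) trimws(p[1]), character(1))
  labels
}

gender_labels <- parse_value_labels("m = male; f = female; d = diverse")
gender_labels[["f"]]                                  # "female"

# Only the poles are labelled here; values 1 to 3 have no label.
parse_value_labels("0 = not at all; 4 = extremely")
```

The names of the resulting vector are the values as text, so a numeric value has to be converted with as.character() before it is used for a lookup.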

Here is the dic file with added value_labels:

Dictionary file
item_name	item_label	values	value_labels	weight	type
rel_1	How often do you attend church or other religious meetings?	1:6	1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week	1	integer
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	1:6	1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day	1	integer
rel_3	In my life, I experience the presence of the Divine (i.e., God)	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	1	integer
rel_4	My religious beliefs are what really lie behind my whole approach to life	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	1	integer
rel_5	I try hard to carry my religion over into all other dealings in life	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	1	integer
sui_1	Did you feel tense in the last week?	0:4	0 = not at all; 4 = extremely	1	integer
sui_2	Did you feel blue in the last week?	0:4	0 = not at all; 4 = extremely	1	integer
sui_3	Did you feel irritated in the last week?	0:4	0 = not at all; 4 = extremely	1	integer
sui_4	Did you feel inferior in the last week?	0:4	0 = not at all; 4 = extremely	1	integer
sui_5	Did you have problems falling asleep in the last week?	0:4	0 = not at all; 4 = extremely	1	integer
gender	gender	'm', 'f', 'd'	m = male; f = female; d = diverse	1	factor
age	age	5, 11	5 = min; 11 = max	1	float

dat_dic <- apply_dic(ex_scaledic_data, dic_file) |> check_values(replace = NA)
Found the following invalid values:

'rel_2'
  Row:      3 
Value:     66 

'rel_3'
  Row:     14 
Value:   -999 

'rel_4'
  Row:      9     18 
Value:     11   -999 

'rel_5'
  Row:     11 
Value:     66 

'sui_1'
  Row:     10 
Value:     55 

'sui_2'
  Row:      9 
Value:   -999 

'sui_4'
  Row:     17 
Value:     66 

'age'
  Row:      4     12     16 
Value:     13   -999     13

Now let's look at the coding of some of the variables:

dat_dic$rel_1
# How often do you attend church or other religious meetings? 
# Data type is integer
# Valid values: 1:6
# Value labels:
#   1 = Never
#   2 = Once a year or less
#   3 = A few times a year
#   4 = A few times a month
#   5 = Once a week
#   6 = More than once/week
# Length: 20
 [1] 1 3 1 2 5 3 3 4 2 6 6 6 3 6 2 3 4 5 6 2

Valid values are all integers from 1 to 6 and each value has a label.

dat_dic$sui_1
# Did you feel tense in the last week? 
# Data type is integer
# Valid values: 0:4
# Value labels:
#   0 = not at all
#   4 = extremely
# Length: 20
 [1]  2  4  3  0  1  3  0  0  1 NA  4  1  2  4  2  1  3  1  0  1

Here, the valid values are 0, 1, 2, 3, 4 (0:4 for short), but only the poles (0 and 4) have labels.

dat_dic$gender
# gender 
# Data type is factor
# Value labels:
#   m = male
#   f = female
#   d = diverse
# Length: 20
 [1] f    d    d    f    f    d    f    f    f    f    m    m    m    m    d   
[16] d    <NA> m    m    m   
Levels: m f d

gender is of type factor. The valid values 'm', 'f', 'd' have been turned into three factor levels with the corresponding labels.

Caution: The labels provided in the value_labels attribute are not automatically turned into the levels of a factor. This is because R would internally turn the values (here 'm', 'f', 'd') into integers (here 1, 2, 3).
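
The integer coding of factors can be seen directly in base R. The sketch below uses a short made-up vector rather than the example data, and it shows plain R behaviour, not a scaledic function.

```r
# A factor stores integer codes that point to its levels.
gender <- factor(c("f", "d", "m", NA), levels = c("m", "f", "d"))

levels(gender)       # "m" "f" "d"
as.integer(gender)   # 2 3 1 NA

# If the long labels are wanted as levels, they have to be set explicitly.
gender_labelled <- factor(
  gender,
  levels = c("m", "f", "d"),
  labels = c("male", "female", "diverse")
)
as.character(gender_labelled)   # "female" "diverse" "male" NA
```

After relabelling, the original values 'm', 'f', 'd' are no longer present in the factor, only the integer codes and the new level names.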

dat_dic$age
# age 
# Data type is float
# Valid values: From 5 to 11
# Value labels:
#   5 = min
#   11 = max
# Length: 20
 [1] 10.5  6.5 10.5   NA  8.0  6.0  6.0 10.0  8.5  8.5  7.5   NA  7.0  7.5  8.5
[16]   NA 10.5 10.0  8.0  6.0

age can take all numbers from 5 to 11 (minimum is 5 and maximum is 11).

It is a common practice to code missing values with a specific number (rather than just leaving an empty entry on a datasheet). In the example dataset used here, -999 is used as a missing value. To account for this, we add the missing attribute to the dic file containing the missing number (e.g., -999). Multiple missing values are separated by commas (e.g. -99, -77).
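
What the missing attribute expresses can be sketched in base R as follows. The function replace_missing_codes() is a hypothetical helper for numeric variables, written only to illustrate the idea; it is not the scaledic implementation.

```r
# Hypothetical helper: replace the codes listed in a missing string with NA.
replace_missing_codes <- function(x, missing_spec) {
  codes <- as.numeric(trimws(strsplit(missing_spec, ",", fixed = TRUE)[[1]]))
  x[x %in% codes] <- NA
  x
}

# A single missing code
replace_missing_codes(c(2, -999, 4, 66), "-999")      # 2 NA 4 66

# Several missing codes separated by commas
replace_missing_codes(c(2, -99, 4, -77), "-99, -77")  # 2 NA 4 NA
```

Note that the value 66 is left untouched in the first call: it is not a declared missing code, so it has to be caught by a separate check for invalid values.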

Dictionary file
item_name	item_label	values	value_labels	weight	type	missing
rel_1	How often do you attend church or other religious meetings?	1:6	1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week	1	integer	-999
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	1:6	1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day	1	integer	-999
rel_3	In my life, I experience the presence of the Divine (i.e., God)	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	1	integer	-999
rel_4	My religious beliefs are what really lie behind my whole approach to life	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	1	integer	-999
rel_5	I try hard to carry my religion over into all other dealings in life	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	1	integer	-999
sui_1	Did you feel tense in the last week?	0:4	0 = not at all; 4 = extremely	1	integer	-999
sui_2	Did you feel blue in the last week?	0:4	0 = not at all; 4 = extremely	1	integer	-999
sui_3	Did you feel irritated in the last week?	0:4	0 = not at all; 4 = extremely	1	integer	-999
sui_4	Did you feel inferior in the last week?	0:4	0 = not at all; 4 = extremely	1	integer	-999
sui_5	Did you have problems falling asleep in the last week?	0:4	0 = not at all; 4 = extremely	1	integer	-999
gender	gender	'm', 'f', 'd'	m = male; f = female; d = diverse	1	factor
age	age	5, 11	5 = min; 11 = max	1	float	-999

The values listed in the missing attribute are automatically replaced with NA when joining a data file with a dic file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

1: Missing values replaced with NA

To turn off this behavior, set the argument replace_missing = FALSE. By default, values are not checked for invalid entries. To check them, either set check_values = TRUE or apply the check_values() function.

dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)
Found the following invalid values:

'rel_2'
  Row:      3 
Value:     66 

'rel_4'
  Row:      9 
Value:     11 

'rel_5'
  Row:     11 
Value:     66 

'sui_1'
  Row:     10 
Value:     55 

'sui_4'
  Row:     17 
Value:     66 

'age'
  Row:      4     16 
Value:     13     13 

1: Invalid values replaced with NA
2: Missing values replaced with NA

dat_dic |>
  slice(9:20) |>
  kable(caption = "Extract from the data frame with replaced missing values and checked values")

Extract from the data frame with replaced missing values and checked values
rel_1	rel_2	rel_3	rel_4	rel_5	sui_1	sui_2	sui_3	sui_4	sui_5	gender	age
2	6	2	NA	1	1	NA	4	2	1	f	8.5
6	6	4	5	3	NA	4	0	3	1	f	8.5
6	1	3	1	NA	4	0	2	3	4	m	7.5
6	1	2	5	3	1	3	1	3	0	m	NA
3	1	5	5	5	2	2	1	3	0	m	7.0
6	1	NA	1	1	4	2	3	3	0	m	7.5
2	2	1	5	3	2	3	2	2	0	d	8.5
3	5	5	4	2	1	0	0	1	2	d	NA
4	4	5	5	1	3	1	4	NA	0	NA	10.5
5	5	4	NA	4	1	4	2	0	1	m	10.0
6	5	4	2	4	0	0	2	3	3	m	8.0
2	3	5	2	3	1	3	2	4	3	m	6.0

In the last step of this tutorial, we add information about the scales that the items belong to. To do so, we add a new attribute called scale to the dic file and another attribute, scale_label, for a longer description (we could have chosen any other names for these attributes, as they are not predefined):

Dictionary file
item_name	item_label	scale	scale_label	values	value_labels	type	weight	missing
rel_1	How often do you attend church or other religious meetings?	rel	Religious beliefs	1:6	1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week	integer	1	-999
rel_2	How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	rel	Religious beliefs	1:6	1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day	integer	1	-999
rel_3	In my life, I experience the presence of the Divine (i.e., God)	rel	Religious beliefs	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	integer	1	-999
rel_4	My religious beliefs are what really lie behind my whole approach to life	rel	Religious beliefs	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	integer	1	-999
rel_5	I try hard to carry my religion over into all other dealings in life	rel	Religious beliefs	1:5	1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me	integer	1	-999
sui_1	Did you feel tense in the last week?	sui	Suicide tendency	0:4	0 = not at all; 4 = extremely	integer	1	-999
sui_2	Did you feel blue in the last week?	sui	Suicide tendency	0:4	0 = not at all; 4 = extremely	integer	1	-999
sui_3	Did you feel irritated in the last week?	sui	Suicide tendency	0:4	0 = not at all; 4 = extremely	integer	1	-999
sui_4	Did you feel inferior in the last week?	sui	Suicide tendency	0:4	0 = not at all; 4 = extremely	integer	1	-999
sui_5	Did you have problems falling asleep in the last week?	sui	Suicide tendency	0:4	0 = not at all; 4 = extremely	integer	1	-999
gender	gender	misc	Miscellaneous	'm', 'f', 'd'	m = male; f = female; d = diverse	factor	1
age	age	misc	Miscellaneous	5, 11	5 = min; 11 = max	float	1	-999

dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)
Found the following invalid values:

'rel_2'
  Row:      3 
Value:     66 

'rel_4'
  Row:      9 
Value:     11 

'rel_5'
  Row:     11 
Value:     66 

'sui_1'
  Row:     10 
Value:     55 

'sui_4'
  Row:     17 
Value:     66 

'age'
  Row:      4     16 
Value:     13     13 

1: Invalid values replaced with NA
2: Missing values replaced with NA

We can use the scale attribute to select items:

dat_dic |>
  select_items(scale == "rel") |>
  descriptives()
   name valid missing mean   sd min max range median  mad
1 rel_1    20       0 3.65 1.76   1   6     5      3 1.48
2 rel_2    19       1 3.32 1.92   1   6     5      3 2.97
3 rel_3    19       1 3.37 1.54   1   5     4      4 1.48
4 rel_4    18       2 3.17 1.50   1   5     4      3 1.48
5 rel_5    19       1 2.47 1.31   1   5     4      3 1.48

We can also use the rename_items() function to show the item labels instead of the item names:

dat_dic |>
  select_items(scale == "rel") |>
  rename_items() |>
  descriptives() |>
  kable()

name	valid	missing	mean	sd	min	max	range	median	mad
How often do you attend church or other religious meetings?	20	0	3.65	1.76	1	6	5	3	1.48
How often do you spend time in private religious activities, such as prayer, meditation or Bible study?	19	1	3.32	1.92	1	6	5	3	2.97
In my life, I experience the presence of the Divine (i.e., God)	19	1	3.37	1.54	1	5	4	4	1.48
My religious beliefs are what really lie behind my whole approach to life	18	2	3.17	1.50	1	5	4	3	1.48
I try hard to carry my religion over into all other dealings in life	19	1	2.47	1.31	1	5	4	3	1.48
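
The idea behind selecting items by an additional attribute can be sketched in base R with a plain dictionary data.frame. The objects dic and dat below are small made-up extracts (four variables, the first three rows of the example data after replacing invalid values); this is not how select_items() is implemented.

```r
# Miniature dictionary with an additional "scale" attribute.
dic <- data.frame(
  item_name = c("rel_1", "rel_2", "sui_1", "age"),
  scale = c("rel", "rel", "sui", "misc"),
  stringsAsFactors = FALSE
)

# First three rows of the example data for these variables.
dat <- data.frame(
  rel_1 = c(1, 3, 1),
  rel_2 = c(6, 1, NA),      # the invalid 66 in row 3 is already NA
  sui_1 = c(2, 4, 3),
  age = c(10.5, 6.5, 10.5)
)

# Look up the item names that belong to the scale, then subset the data.
rel_items <- dic$item_name[dic$scale == "rel"]
rel_data <- dat[, rel_items, drop = FALSE]

names(rel_data)                   # "rel_1" "rel_2"
colMeans(rel_data, na.rm = TRUE)  # column means, ignoring NA
```

Because the lookup only relies on a column of the dictionary, any other additional attribute (for example a subscale or an author column) could be used for selection in the same way.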

Jürgen Wilbert

2025-04-24
