
In the following examples, I lay out how to construct a dictionary file.

First, we need an example data frame to work with (here, the ex_scaledic_data example dataset):

Table
ex_scaledic_data
rel_1 rel_2 rel_3 rel_4 rel_5 sui_1 sui_2 sui_3 sui_4 sui_5 gender age
1 6 1 2 1 2 2 2 3 0 f 10.5
3 1 1 2 1 4 1 4 1 0 d 6.5
1 66 2 3 4 3 2 1 3 3 d 10.5
2 2 5 3 1 0 2 1 2 3 f 13.0
5 5 4 2 3 1 3 1 1 3 f 8.0
3 4 5 5 1 3 2 3 1 1 d 6.0
3 2 2 2 3 0 4 3 4 2 f 6.0
4 3 4 3 3 0 0 4 3 3 f 10.0
2 6 2 11 1 1 -999 4 2 1 f 8.5
6 6 4 5 3 55 4 0 3 1 f 8.5
6 1 3 1 66 4 0 2 3 4 m 7.5
6 1 2 5 3 1 3 1 3 0 m -999.0
3 1 5 5 5 2 2 1 3 0 m 7.0
6 1 -999 1 1 4 2 3 3 0 m 7.5
2 2 1 5 3 2 3 2 2 0 d 8.5
3 5 5 4 2 1 0 0 1 2 d 13.0
4 4 5 5 1 3 1 4 66 0 -999 10.5
5 5 4 -999 4 1 4 2 0 1 m 10.0
6 5 4 2 4 0 0 2 3 3 m 8.0
2 3 5 2 3 1 3 2 4 3 m 6.0

Second, we need a dictionary file, which is also a data frame. Each row of a dictionary file describes a variable (also called an item in this context), and each column holds a specific attribute. These attributes fall into two classes: predefined attributes, which are identified by their specific names, and additional attributes, which can have any name. The predefined attributes are item_name, item_label, values, value_labels, missing, weight, and type.

Additional attributes can have any name and are mainly used to select items or to display additional information about an item (e.g., the scale and subscale an item belongs to, the reference and the author of an item, or a translation of the item).

Building a dictionary step by step

We start the example with a very simple dictionary file that contains only the item labels and the corresponding item names (i.e., the corresponding variable names in our data frame)¹. The dictionary file has two columns, item_name and item_label:

Table
Dictionary file
item_name | item_label
rel_1 | How often do you attend church or other religious meetings?
rel_2 | How often do you spend time in private religious activities, such as prayer, meditation or Bible study?
rel_3 | In my life, I experience the presence of the Divine (i.e., God)
rel_4 | My religious beliefs are what really lie behind my whole approach to life
rel_5 | I try hard to carry my religion over into all other dealings in life
sui_1 | Did you feel tense in the last week?
sui_2 | Did you feel blue in the last week?
sui_3 | Did you feel irritated in the last week?
sui_4 | Did you feel inferior in the last week?
sui_5 | Did you have problems falling asleep in the last week?
gender | gender
age | age
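
The tutorial does not show how dic_file itself is created. Assuming you build it directly in R (instead of, for example, importing it from a spreadsheet), a minimal base R sketch could look like this. The object name dic_file simply matches the one passed to apply_dic() in the next step.

```r
# Two-column dictionary: one row per item, columns named after the
# predefined attributes item_name and item_label.
dic_file <- data.frame(
  item_name = c(paste0("rel_", 1:5), paste0("sui_", 1:5), "gender", "age"),
  item_label = c(
    "How often do you attend church or other religious meetings?",
    "How often do you spend time in private religious activities, such as prayer, meditation or Bible study?",
    "In my life, I experience the presence of the Divine (i.e., God)",
    "My religious beliefs are what really lie behind my whole approach to life",
    "I try hard to carry my religion over into all other dealings in life",
    "Did you feel tense in the last week?",
    "Did you feel blue in the last week?",
    "Did you feel irritated in the last week?",
    "Did you feel inferior in the last week?",
    "Did you have problems falling asleep in the last week?",
    "gender",
    "age"
  ),
  stringsAsFactors = FALSE
)
```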

We combine the data file and the dic file with the apply_dic() function:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

! Type of rel_1 is missing and is estimated as 'numeric'.
! Type of rel_2 is missing and is estimated as 'numeric'.
! Type of rel_3 is missing and is estimated as 'numeric'.
! Type of rel_4 is missing and is estimated as 'numeric'.
! Type of rel_5 is missing and is estimated as 'numeric'.
! Type of sui_1 is missing and is estimated as 'numeric'.
! Type of sui_2 is missing and is estimated as 'numeric'.
! Type of sui_3 is missing and is estimated as 'numeric'.
! Type of sui_4 is missing and is estimated as 'numeric'.
! Type of sui_5 is missing and is estimated as 'numeric'.
! Type of gender is missing and is estimated as 'character'.
! Type of age is missing and is estimated as 'numeric'.

The messages tell us that the dic file does not contain any information about the data type of the items (e.g., numeric, factor, character). The type attribute is used by various scaledic functions, for example when the data are checked for invalid values, when missing values are imputed, or when scales are scored. A variable can have one of the following types: integer for numbers without decimals, numeric for numbers with or without decimals, character for text, and factor for text or numbers that represent the levels of a factor. When the type attribute is not provided in the dic file, scaledic estimates the data type from the data.

Let us add type information to the dic file:

Table
Dictionary file
item_name | item_label | type
rel_1 | How often do you attend church or other religious meetings? | numeric
rel_2 | How often do you spend time in private religious activities, such as prayer, meditation or Bible study? | numeric
rel_3 | In my life, I experience the presence of the Divine (i.e., God) | numeric
rel_4 | My religious beliefs are what really lie behind my whole approach to life | numeric
rel_5 | I try hard to carry my religion over into all other dealings in life | numeric
sui_1 | Did you feel tense in the last week? | numeric
sui_2 | Did you feel blue in the last week? | numeric
sui_3 | Did you feel irritated in the last week? | numeric
sui_4 | Did you feel inferior in the last week? | numeric
sui_5 | Did you have problems falling asleep in the last week? | numeric
gender | gender | factor
age | age | numeric
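
If the dictionary is kept in R, the type column can be added in a single assignment. The sketch below rebuilds a label-free stand-in for dic_file so that it runs on its own; in practice you would add the column to the full dictionary.

```r
# Label-free stand-in for the dictionary from the previous step.
dic_file <- data.frame(
  item_name = c(paste0("rel_", 1:5), paste0("sui_", 1:5), "gender", "age"),
  stringsAsFactors = FALSE
)

# Every item is numeric except gender, which is a factor.
dic_file$type <- ifelse(dic_file$item_name == "gender", "factor", "numeric")
```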

All items are of type numeric except gender, which is of type factor (i.e., it is measured on a nominal scale with several levels).

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we can add weight information to the dic file. The weight attribute tells various scaledic functions a) whether the item is inverted or not, and b) how the values of an item are weighted. If a weight value is unsigned (e.g. 1), the item is not inverted. If a weight value has a negative sign (e.g. -1), the item is inverted. If the (absolute) value is not 1, the item will be weighted when calculating the item scores (e.g. 1.5 will give an item a weight of 1.5).
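
To make these rules concrete, here is a hypothetical helper, weight_item(). It is not a scaledic function. It assumes that reverse scoring maps a value x to min + max - x within the item's valid values, which is a common convention. scaledic's own scoring functions may differ in detail.

```r
# Hypothetical helper (not part of scaledic): apply a dictionary weight to
# item responses. A negative weight reverse-scores the item within its valid
# range; the absolute weight then multiplies the (possibly reversed) values.
weight_item <- function(x, weight, values) {
  lo <- min(values)
  hi <- max(values)
  if (weight < 0) {
    x <- lo + hi - x  # reverse scoring, e.g. 1 <-> 5 on a 1:5 item
  }
  x * abs(weight)
}

weight_item(c(1, 2, 5), weight = -1, values = 1:5)   # inverted: 5 4 1
weight_item(c(1, 2, 5), weight = 1.5, values = 1:5)  # weighted: 1.5 3.0 7.5
```

NA responses simply propagate through both operations, so missing values stay missing.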

Table
Dictionary file
item_name item_label weight type
rel_1 How often do you attend church or other religious meetings? 1 numeric
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 1 numeric
rel_3 In my life, I experience the presence of the Divine (i.e., God) 1 numeric
rel_4 My religious beliefs are what really lie behind my whole approach to life 1 numeric
rel_5 I try hard to carry my religion over into all other dealings in life 1 numeric
sui_1 Did you feel tense in the last week? 1 numeric
sui_2 Did you feel blue in the last week? 1 numeric
sui_3 Did you feel irritated in the last week? 1 numeric
sui_4 Did you feel inferior in the last week? 1 numeric
sui_5 Did you have problems falling asleep in the last week? 1 numeric
gender gender 1 factor
age age 1 numeric

In this example, all weights are 1 and no item is inverted.

Again, we join the dic file and the data file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we add the values attribute to the dic file. values defines the valid values a variable can take. Explicitly defining the values makes it possible to automatically identify invalid values in a data frame (e.g., typos). values are also necessary to reverse score an inverted item. To code the values, list the possible values separated by commas (e.g., 1, 2, 3, 4 for the integers 1 to 4). You can also use a colon to define a range of integers (e.g., 5:11 indicates all integers from 5 to 11). For variables of type numeric, values represent the minimum and the maximum of the valid values (e.g., 5, 11 for all values between 5 and 11, including decimal numbers). For variables of type factor or character, you may want to provide text values. In that case, put these values in quotes (e.g., 'm', 'f', 'd' indicates three valid text entries).
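
The values notation happens to be valid R syntax once it is wrapped in c( ). The hypothetical parse_values() below exploits this to show how the three examples above translate into R vectors. It only illustrates the notation and is not how scaledic reads the attribute.

```r
# Hypothetical parser (not scaledic's internal code): wrap a values string
# in c(...) and evaluate it as R code.
parse_values <- function(s) {
  eval(parse(text = paste0("c(", s, ")")))
}

parse_values("1:6")            # 1 2 3 4 5 6
parse_values("5, 11")          # 5 11 (minimum and maximum for a numeric item)
parse_values("'m', 'f', 'd'")  # "m" "f" "d"
```

Because eval(parse()) executes arbitrary code, this trick is only appropriate for dictionary files you trust.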

Here is the dic file with added values:

Table
Dictionary file
item_name | item_label | values | weight | type
rel_1 | How often do you attend church or other religious meetings? | 1:6 | 1 | numeric
rel_2 | How often do you spend time in private religious activities, such as prayer, meditation or Bible study? | 1:6 | 1 | numeric
rel_3 | In my life, I experience the presence of the Divine (i.e., God) | 1:5 | 1 | numeric
rel_4 | My religious beliefs are what really lie behind my whole approach to life | 1:5 | 1 | numeric
rel_5 | I try hard to carry my religion over into all other dealings in life | 1:5 | 1 | numeric
sui_1 | Did you feel tense in the last week? | 0:4 | 1 | numeric
sui_2 | Did you feel blue in the last week? | 0:4 | 1 | numeric
sui_3 | Did you feel irritated in the last week? | 0:4 | 1 | numeric
sui_4 | Did you feel inferior in the last week? | 0:4 | 1 | numeric
sui_5 | Did you have problems falling asleep in the last week? | 0:4 | 1 | numeric
gender | gender | 'm', 'f', 'd' | 1 | factor
age | age | 5, 11 | 1 | numeric
dat_dic <- apply_dic(ex_scaledic_data, dic_file)

! Set 1 NA in factor 'gender' for value -999

The factor gender contains a value (-999) that is not listed in its values attribute. For factors, such values are replaced with NA.
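
This matches how factors behave in base R: any value that is not among the declared levels becomes NA when the factor is created. A minimal illustration, assuming the levels are taken from the values attribute ('m', 'f', 'd'); the input vector is illustrative.

```r
# Values outside the declared levels (here "-999") turn into NA.
gender <- c("f", "d", "-999", "m")
gender_f <- factor(gender, levels = c("m", "f", "d"))
gender_f
is.na(gender_f)  # FALSE FALSE TRUE FALSE
```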

Invalid values for all other data types can be replaced automatically with the check_values() function.
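
For a numeric item whose values attribute gives a minimum and a maximum, the check amounts to a range test. The hypothetical replace_invalid() below sketches that idea for age (valid values 5, 11), using the first four ages of ex_scaledic_data plus a -999 code. It is not the implementation of check_values().

```r
# Hypothetical helper (not scaledic's check_values()): set non-missing values
# outside the range [min(values), max(values)] to a replacement value.
replace_invalid <- function(x, values, replace = NA) {
  invalid <- !is.na(x) & (x < min(values) | x > max(values))
  x[invalid] <- replace
  x
}

age <- c(10.5, 6.5, 10.5, 13.0, -999)
replace_invalid(age, values = c(5, 11))  # 10.5 6.5 10.5 NA NA
```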

dat_dic <- check_values(dat_dic, replace = NA) 

! rel_2' invalid at row 3 (is 66) -> set as NA
! rel_3' invalid at row 14 (is -999) -> set as NA
! rel_4' invalid at rows 9, 18 (is 11, -999) -> set as NA
! rel_5' invalid at row 11 (is 66) -> set as NA
! sui_1' invalid at row 10 (is 55) -> set as NA
! sui_2' invalid at row 9 (is -999) -> set as NA
! sui_4' invalid at row 17 (is 66) -> set as NA
! age' invalid at rows 4, 12, 16 (is 13, -999, 13) -> set as NA

Table
data frame with replaced invalid values
rel_1 rel_2 rel_3 rel_4 rel_5 sui_1 sui_2 sui_3 sui_4 sui_5 gender age
1 6 1 2 1 2 2 2 3 0 f 10.5
3 1 1 2 1 4 1 4 1 0 d 6.5
1 NA 2 3 4 3 2 1 3 3 d 10.5
2 2 5 3 1 0 2 1 2 3 f NA
5 5 4 2 3 1 3 1 1 3 f 8.0
3 4 5 5 1 3 2 3 1 1 d 6.0
3 2 2 2 3 0 4 3 4 2 f 6.0
4 3 4 3 3 0 0 4 3 3 f 10.0
2 6 2 NA 1 1 NA 4 2 1 f 8.5
6 6 4 5 3 NA 4 0 3 1 f 8.5
6 1 3 1 NA 4 0 2 3 4 m 7.5
6 1 2 5 3 1 3 1 3 0 m NA
3 1 5 5 5 2 2 1 3 0 m 7.0
6 1 NA 1 1 4 2 3 3 0 m 7.5
2 2 1 5 3 2 3 2 2 0 d 8.5
3 5 5 4 2 1 0 0 1 2 d NA
4 4 5 5 1 3 1 4 NA 0 NA 10.5
5 5 4 NA 4 1 4 2 0 1 m 10.0
6 5 4 2 4 0 0 2 3 3 m 8.0
2 3 5 2 3 1 3 2 4 3 m 6.0

Next, we add value_labels. Value labels provide longer descriptions for all or some of the values. They are coded in the form value = label, with a semicolon separating the entries. For example, m = male; f = female; d = diverse codes three value labels (quotes are not necessary).
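
The value_labels notation is easy to split with base R string functions. The hypothetical parse_value_labels() below turns such a string into a named character vector (names are the values, elements are the labels). It illustrates the format and is not scaledic's parser.

```r
# Hypothetical parser: "value = label; value = label" -> named character vector.
parse_value_labels <- function(s) {
  entries <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])
  parts <- strsplit(entries, "=", fixed = TRUE)
  values <- trimws(vapply(parts, `[`, character(1), 1))
  # Re-join anything after the first "=" in case a label itself contains "=".
  labels <- trimws(vapply(parts, function(p) paste(p[-1], collapse = "="), character(1)))
  setNames(labels, values)
}

parse_value_labels("m = male; f = female; d = diverse")
parse_value_labels("0 = not at all; 4 = extremely")["4"]
```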

Here is the dic file with added value_labels:

item_name | item_label | values | value_labels | weight | type
rel_1 | How often do you attend church or other religious meetings? | 1:6 | 1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week | 1 | numeric
rel_2 | How often do you spend time in private religious activities, such as prayer, meditation or Bible study? | 1:6 | 1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day | 1 | numeric
rel_3 | In my life, I experience the presence of the Divine (i.e., God) | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | 1 | numeric
rel_4 | My religious beliefs are what really lie behind my whole approach to life | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | 1 | numeric
rel_5 | I try hard to carry my religion over into all other dealings in life | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | 1 | numeric
sui_1 | Did you feel tense in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric
sui_2 | Did you feel blue in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric
sui_3 | Did you feel irritated in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric
sui_4 | Did you feel inferior in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric
sui_5 | Did you have problems falling asleep in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric
gender | gender | 'm', 'f', 'd' | m = male; f = female; d = diverse | 1 | factor
age | age | 5, 11 | 5 = min; 11 = max | 1 | numeric
dat_dic <- ex_scaledic_data |> 
  apply_dic(dic_file) |>
  check_values(replace = NA)

! Set 1 NA in factor 'gender' for value -999
! rel_2' invalid at row 3 (is 66) -> set as NA
! rel_3' invalid at row 14 (is -999) -> set as NA
! rel_4' invalid at rows 9, 18 (is 11, -999) -> set as NA
! rel_5' invalid at row 11 (is 66) -> set as NA
! sui_1' invalid at row 10 (is 55) -> set as NA
! sui_2' invalid at row 9 (is -999) -> set as NA
! sui_4' invalid at row 17 (is 66) -> set as NA
! age' invalid at rows 4, 12, 16 (is 13, -999, 13) -> set as NA

Now let's look at the coding of some of the variables:

dat_dic$rel_1
║How often do you attend church or other religious meetings? 
║Data type is numeric
║Valid values: From 1 to 6
║Value labels:
1 = Never
2 = Once a year or less
3 = A few times a year
4 = A few times a month
5 = Once a week
6 = More than once/week

║Length is 20 (0 NA; 0 invalid)
║ [1] 1 3 1 2 5 3 3 4 2 6 6 6 3 6 2 3 4 5 6 2

Valid values are all integers from 1 to 6 and each value has a label.

dat_dic$sui_1
║Did you feel tense in the last week? 
║Data type is numeric
║Valid values: From 0 to 4
║Value labels:
0 = not at all
4 = extremely

║Length is 20 (1 NA; 0 invalid)
║ [1]  2  4  3  0  1  3  0  0  1 NA  4  1  2  4  2  1  3  1  0  1

Here, the valid values are 0, 1, 2, 3, and 4 (0:4 for short), but only the poles (0 and 4) have labels.

dat_dic$gender
║gender 
║Data type is factor
║Value labels:
║  m = male
║  f = female
║  d = diverse

║Length is 20 (1 NA; 0 invalid)
║ [1] female  diverse diverse female  female  diverse female  female  female 
║[10] female  male    male    male    male    diverse diverse <NA>    male   
║[19] male    male   
║Levels: male female diverse

gender is of type factor. The valid values 'm', 'f', 'd' have been turned into three factor levels with the corresponding labels.

Caution: The labels provided in the value_labels attribute are not automatically turned into the levels of a factor. This is because R would internally turn the values (here 'm', 'f', 'd') into integers (here 1, 2, 3).
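
The integer coding mentioned above is easy to see in base R. Once a factor has been created with labels, as.integer() returns its internal codes, and the original values 'm', 'f', 'd' are no longer stored in the object.

```r
gender <- c("f", "d", "m")
gender_f <- factor(gender, levels = c("m", "f", "d"),
                   labels = c("male", "female", "diverse"))
levels(gender_f)      # "male" "female" "diverse"
as.integer(gender_f)  # 2 3 1: the internal codes, not 'f', 'd', 'm'
```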

dat_dic$age
║age 
║Data type is numeric
║Valid values: From 5 to 11
║Value labels:
5 = min
11 = max

║Length is 20 (3 NA; 0 invalid)
║ [1] 10.5  6.5 10.5   NA  8.0  6.0  6.0 10.0  8.5  8.5  7.5   NA  7.0  7.5  8.5
║[16]   NA 10.5 10.0  8.0  6.0

age can take all numbers from 5 to 11 (minimum is 5 and maximum is 11).

It is common practice to code missing values with a specific number rather than leaving the entry on a datasheet empty. In the example dataset used here, -999 codes a missing value. To account for this, we add the missing attribute to the dic file, containing the missing-value code (here, -999). Multiple missing values are separated by commas (e.g., -99, -77).
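
Conceptually, replacing missing-value codes is a membership test. The hypothetical replace_missing_codes() below is a base R sketch of that idea (not scaledic's code) and also handles several codes at once, such as -99, -77. The input vectors are illustrative.

```r
# Hypothetical helper: replace every declared missing-value code with NA.
replace_missing_codes <- function(x, missing) {
  x[x %in% missing] <- NA
  x
}

replace_missing_codes(c(1, 1, 2, 5, -999, 5), missing = -999)  # 1 1 2 5 NA 5
replace_missing_codes(c(3, -99, -77), missing = c(-99, -77))   # 3 NA NA
```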

Table
Dictionary file
item_name | item_label | values | value_labels | weight | type | missing
rel_1 | How often do you attend church or other religious meetings? | 1:6 | 1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week | 1 | numeric | -999
rel_2 | How often do you spend time in private religious activities, such as prayer, meditation or Bible study? | 1:6 | 1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day | 1 | numeric | -999
rel_3 | In my life, I experience the presence of the Divine (i.e., God) | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | 1 | numeric | -999
rel_4 | My religious beliefs are what really lie behind my whole approach to life | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | 1 | numeric | -999
rel_5 | I try hard to carry my religion over into all other dealings in life | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | 1 | numeric | -999
sui_1 | Did you feel tense in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric | -999
sui_2 | Did you feel blue in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric | -999
sui_3 | Did you feel irritated in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric | -999
sui_4 | Did you feel inferior in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric | -999
sui_5 | Did you have problems falling asleep in the last week? | 0:4 | 0 = not at all; 4 = extremely | 1 | numeric | -999
gender | gender | 'm', 'f', 'd' | m = male; f = female; d = diverse | 1 | factor |
age | age | 5, 11 | 5 = min; 11 = max | 1 | numeric | -999

The values listed in the missing attribute are automatically replaced with NA when a data frame is joined with a dic file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

! Set 1 NA in factor 'gender' for value -999
! Replaced 1 missing value in 'rel_3' with NA
! Replaced 1 missing value in 'rel_4' with NA
! Replaced 1 missing value in 'sui_2' with NA
! Replaced 1 missing value in 'age' with NA

To turn off this behavior, set the argument replace_missing = FALSE.

In the last step of this tutorial, we add information about the scales the items belong to. To do so, we add a new attribute called scale to the dic file and another attribute, scale_label, for a longer description (we could have given these attributes any other names, as they are not predefined attributes):
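
If the dictionary is kept in R, both new columns can be derived from the item names. The sketch below assumes, as holds in this example, that the scale abbreviation is the item-name prefix before the underscore. It uses a label-free stand-in for dic_file so that it runs on its own.

```r
# Label-free stand-in for the dictionary.
dic_file <- data.frame(
  item_name = c(paste0("rel_", 1:5), paste0("sui_", 1:5), "gender", "age"),
  stringsAsFactors = FALSE
)

# The scale is the prefix before "_"; gender and age are grouped as "misc".
dic_file$scale <- sub("_.*$", "", dic_file$item_name)
dic_file$scale[dic_file$item_name %in% c("gender", "age")] <- "misc"

# Longer descriptions, looked up by scale abbreviation.
scale_labels <- c(rel = "Religious beliefs", sui = "Suicide tendency",
                  misc = "Miscellaneous")
dic_file$scale_label <- unname(scale_labels[dic_file$scale])
```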

Table
Dictionary file
item_name | item_label | scale | scale_label | values | value_labels | type | weight | missing
rel_1 | How often do you attend church or other religious meetings? | rel | Religious beliefs | 1:6 | 1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week | numeric | 1 | -999
rel_2 | How often do you spend time in private religious activities, such as prayer, meditation or Bible study? | rel | Religious beliefs | 1:6 | 1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day | numeric | 1 | -999
rel_3 | In my life, I experience the presence of the Divine (i.e., God) | rel | Religious beliefs | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | numeric | 1 | -999
rel_4 | My religious beliefs are what really lie behind my whole approach to life | rel | Religious beliefs | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | numeric | 1 | -999
rel_5 | I try hard to carry my religion over into all other dealings in life | rel | Religious beliefs | 1:5 | 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me | numeric | 1 | -999
sui_1 | Did you feel tense in the last week? | sui | Suicide tendency | 0:4 | 0 = not at all; 4 = extremely | numeric | 1 | -999
sui_2 | Did you feel blue in the last week? | sui | Suicide tendency | 0:4 | 0 = not at all; 4 = extremely | numeric | 1 | -999
sui_3 | Did you feel irritated in the last week? | sui | Suicide tendency | 0:4 | 0 = not at all; 4 = extremely | numeric | 1 | -999
sui_4 | Did you feel inferior in the last week? | sui | Suicide tendency | 0:4 | 0 = not at all; 4 = extremely | numeric | 1 | -999
sui_5 | Did you have problems falling asleep in the last week? | sui | Suicide tendency | 0:4 | 0 = not at all; 4 = extremely | numeric | 1 | -999
gender | gender | misc | Miscellaneous | 'm', 'f', 'd' | m = male; f = female; d = diverse | factor | 1 |
age | age | misc | Miscellaneous | 5, 11 | 5 = min; 11 = max | numeric | 1 | -999
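
This time, we set check_values = TRUE, so apply_dic() also checks for invalid values in the same step. Because the -999 codes are now replaced as missing values first, they no longer appear among the invalid values:

dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)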

! Set 1 NA in factor 'gender' for value -999
! Replaced 1 missing value in 'rel_3' with NA
! Replaced 1 missing value in 'rel_4' with NA
! Replaced 1 missing value in 'sui_2' with NA
! Replaced 1 missing value in 'age' with NA
! rel_2' invalid at row 3 (is 66) -> set as NA
! rel_4' invalid at row 9 (is 11) -> set as NA
! rel_5' invalid at row 11 (is 66) -> set as NA
! sui_1' invalid at row 10 (is 55) -> set as NA
! sui_4' invalid at row 17 (is 66) -> set as NA
! age' invalid at rows 4, 16 (is 13, 13) -> set as NA

You can use the scale attribute to select items:

dat_dic |>
  select_items(scale == "rel") |>
  wmisc::nice_descriptives(round = 2)
Table
Descriptive statistics
Variable Valid Missing Mean SD Min Max Range Median MAD
How often do you attend church or other religious meetings? 20 0 3.65 1.76 1 6 5 3 1.48
How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 19 1 3.32 1.92 1 6 5 3 2.97
In my life, I experience the presence of the Divine (i.e., God) 19 1 3.37 1.54 1 5 4 4 1.48
My religious beliefs are what really lie behind my whole approach to life 18 2 3.17 1.50 1 5 4 3 1.48
I try hard to carry my religion over into all other dealings in life 19 1 2.47 1.31 1 5 4 3 1.48
Note. MAD is the median absolute deviation with a consistency adjustment.
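
Conceptually, selecting items by scale means looking up the matching item names in the dictionary and keeping those columns. Below is a base R sketch with small stand-ins for the data (the first two rows of ex_scaledic_data) and the dictionary. select_items() itself works on the dic attributes stored in dat_dic and may do more than this.

```r
# Small stand-ins for a data frame and its dictionary.
dat <- data.frame(rel_1 = c(1, 3), rel_2 = c(6, 1), sui_1 = c(2, 4), age = c(10.5, 6.5))
dic <- data.frame(
  item_name = c("rel_1", "rel_2", "sui_1", "age"),
  scale = c("rel", "rel", "sui", "misc"),
  stringsAsFactors = FALSE
)

# Keep only the columns whose dictionary entry has scale == "rel".
rel_items <- dic$item_name[dic$scale == "rel"]
dat_rel <- dat[, rel_items, drop = FALSE]
names(dat_rel)  # "rel_1" "rel_2"
```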

Use the rename_items() function to replace the item names with the item labels:

dat_dic |>
  select_items(scale == "rel") |>
  rename_items() |>
  wmisc::nice_descriptives(round = 2)
Table
Descriptive statistics
Variable Valid Missing Mean SD Min Max Range Median MAD
How often do you attend church or other religious meetings? 20 0 3.65 1.76 1 6 5 3 1.48
How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 19 1 3.32 1.92 1 6 5 3 2.97
In my life, I experience the presence of the Divine (i.e., God) 19 1 3.37 1.54 1 5 4 4 1.48
My religious beliefs are what really lie behind my whole approach to life 18 2 3.17 1.50 1 5 4 3 1.48
I try hard to carry my religion over into all other dealings in life 19 1 2.47 1.31 1 5 4 3 1.48
Note. MAD is the median absolute deviation with a consistency adjustment.
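
Similarly, renaming items amounts to matching the column names against item_name and substituting item_label. A base R sketch with small stand-in objects (not the rename_items() implementation):

```r
# Small stand-ins for a data frame and its dictionary.
dat <- data.frame(rel_1 = c(1, 3), rel_2 = c(6, 1))
dic <- data.frame(
  item_name = c("rel_1", "rel_2"),
  item_label = c(
    "How often do you attend church or other religious meetings?",
    "How often do you spend time in private religious activities, such as prayer, meditation or Bible study?"
  ),
  stringsAsFactors = FALSE
)

# Replace each column name with the item_label of the matching dictionary row.
names(dat) <- dic$item_label[match(names(dat), dic$item_name)]
names(dat)
```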