
In the following examples I will lay out the construction of a dictionary file.

First, we need an example data.frame to work with (i.e. the ex_scaledic_data example dataset):

ex_scaledic_data
rel_1 rel_2 rel_3 rel_4 rel_5 sui_1 sui_2 sui_3 sui_4 sui_5 gender age
1 6 1 2 1 2 2 2 3 0 f 10.5
3 1 1 2 1 4 1 4 1 0 d 6.5
1 66 2 3 4 3 2 1 3 3 d 10.5
2 2 5 3 1 0 2 1 2 3 f 13.0
5 5 4 2 3 1 3 1 1 3 f 8.0
3 4 5 5 1 3 2 3 1 1 d 6.0
3 2 2 2 3 0 4 3 4 2 f 6.0
4 3 4 3 3 0 0 4 3 3 f 10.0
2 6 2 11 1 1 -999 4 2 1 f 8.5
6 6 4 5 3 55 4 0 3 1 f 8.5
6 1 3 1 66 4 0 2 3 4 m 7.5
6 1 2 5 3 1 3 1 3 0 m -999.0
3 1 5 5 5 2 2 1 3 0 m 7.0
6 1 -999 1 1 4 2 3 3 0 m 7.5
2 2 1 5 3 2 3 2 2 0 d 8.5
3 5 5 4 2 1 0 0 1 2 d 13.0
4 4 5 5 1 3 1 4 66 0 -999 10.5
5 5 4 -999 4 1 4 2 0 1 m 10.0
6 5 4 2 4 0 0 2 3 3 m 8.0
2 3 5 2 3 1 3 2 4 3 m 6.0

Second, we need a dictionary file that is also a data.frame. Each row of a dictionary file addresses a variable (also called item in this context), and each column holds a specific attribute. These attributes can be divided into two classes: predefined attributes with a specific name, and additional attributes that can have any name. The predefined attributes are identified by their names. These are: item_name, item_label, values, value_labels, missing, weight, and type.

Additional attributes can have any name and are mainly used for selecting items or displaying additional information of an item (e.g., the scale and subscale an item belongs to; the reference and the author of an item; the translation of an item).

Building a dictionary step by step

We start the example with a very simple dictionary file containing only the item labels and the corresponding item names (i.e., the corresponding variable names in our data frame). The dictionary file has two columns, item_name and item_label:

Dictionary file
item_name item_label
rel_1 How often do you attend church or other religious meetings?
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study?
rel_3 In my life, I experience the presence of the Divine (i.e., God)
rel_4 My religious beliefs are what really lie behind my whole approach to life
rel_5 I try hard to carry my religion over into all other dealings in life
sui_1 Did you feel tense in the last week?
sui_2 Did you feel blue in the last week?
sui_3 Did you feel irritated in the last week?
sui_4 Did you feel inferior in the last week?
sui_5 Did you have problems falling asleep in the last week?
gender gender
age age
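The tutorial does not show how dic_file is created (it could, for example, be read from a spreadsheet). As a minimal sketch, the same two-column dictionary can be built directly in base R as a data.frame; the object name dic_file is simply reused from the apply_dic() call in this tutorial:

```r
# Build the two-column dictionary as a plain data.frame (base R only).
# One row per variable; item_name must match the column names of the data.
dic_file <- data.frame(
  item_name = c(paste0("rel_", 1:5), paste0("sui_", 1:5), "gender", "age"),
  item_label = c(
    "How often do you attend church or other religious meetings?",
    "How often do you spend time in private religious activities, such as prayer, meditation or Bible study?",
    "In my life, I experience the presence of the Divine (i.e., God)",
    "My religious beliefs are what really lie behind my whole approach to life",
    "I try hard to carry my religion over into all other dealings in life",
    "Did you feel tense in the last week?",
    "Did you feel blue in the last week?",
    "Did you feel irritated in the last week?",
    "Did you feel inferior in the last week?",
    "Did you have problems falling asleep in the last week?",
    "gender",
    "age"
  ),
  stringsAsFactors = FALSE
)
```

Further attributes are just additional columns, so they can be added later in the same way (e.g., dic_file$weight <- 1).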

We combine the data file and the dic file with the apply_dic() function:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

1: 'type' attribute missing and replaced with an estimation (12x)

We get a message that the dic file does not contain information about the data type (e.g., integer, factor, character). The type attribute is used by various scaledic functions, for example when data are checked for invalid values, when missing values are imputed, or when scales are scored. A variable can be of one of the following types: integer for numbers without decimals, float for numbers with decimals, character for variables with text, and factor for variables with text or numbers that are levels of a factor. When the type attribute is not provided in the dic file, scaledic estimates the data type from the data.
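To illustrate what such an estimation can look like, here is a hypothetical heuristic. It is not scaledic's internal implementation; guess_type() is a name made up for this sketch. It maps an R vector onto the four types described above:

```r
# Hypothetical heuristic for illustration only (NOT scaledic's implementation).
guess_type <- function(x) {
  if (is.factor(x)) return("factor")
  if (is.character(x)) return("character")
  x <- x[!is.na(x)]                      # ignore NA when inspecting numbers
  if (all(x == round(x))) "integer" else "float"
}

guess_type(c(1, 3, 1, 2))        # "integer"
guess_type(c(10.5, 6.5, -999))   # "float"
guess_type(c("f", "d", "m"))     # "character"
```

A heuristic like this cannot tell whether a text column such as gender is a nominal factor or free text, which is one reason to state the type explicitly in the dic file.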

Let us add type information to the dic file:

Dictionary file
item_name item_label type
rel_1 How often do you attend church or other religious meetings? integer
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? integer
rel_3 In my life, I experience the presence of the Divine (i.e., God) integer
rel_4 My religious beliefs are what really lie behind my whole approach to life integer
rel_5 I try hard to carry my religion over into all other dealings in life integer
sui_1 Did you feel tense in the last week? integer
sui_2 Did you feel blue in the last week? integer
sui_3 Did you feel irritated in the last week? integer
sui_4 Did you feel inferior in the last week? integer
sui_5 Did you have problems falling asleep in the last week? integer
gender gender factor
age age float

Most items are of type integer (they take only whole numbers). The exceptions are gender, which is of type factor (i.e., a nominal variable with several levels), and age, which is of type float (it contains decimal numbers).

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we can add weight information to the dic file. The weight attribute tells various scaledic functions a) whether an item is inverted, and b) how the values of an item are weighted. If a weight is positive (e.g., 1), the item is not inverted. If a weight has a negative sign (e.g., -1), the item is inverted. If the absolute value of a weight is not 1, the item is weighted accordingly when scores are calculated (e.g., 1.5 gives an item a weight of 1.5).
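The effect of a weight on a single item can be sketched as follows. This assumes that an inverted item is reverse scored as min + max - x (the minimum and maximum come from the item's valid values) and that the absolute weight then multiplies the value; weight_item() is a hypothetical helper, and scaledic's scoring functions may handle details differently:

```r
# Hypothetical helper (not part of scaledic): apply a weight to item values.
# A negative weight inverts the item, |weight| scales it.
weight_item <- function(x, weight, min_value, max_value) {
  if (weight < 0) {
    x <- min_value + max_value - x   # reverse score, e.g. 1..5 becomes 5..1
  }
  abs(weight) * x
}

weight_item(c(1, 3, 5), weight = -1,  min_value = 1, max_value = 5)  # 5 3 1
weight_item(c(1, 3, 5), weight = 1.5, min_value = 1, max_value = 5)  # 1.5 4.5 7.5
```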

Dictionary file
item_name item_label weight type
rel_1 How often do you attend church or other religious meetings? 1 integer
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 1 integer
rel_3 In my life, I experience the presence of the Divine (i.e., God) 1 integer
rel_4 My religious beliefs are what really lie behind my whole approach to life 1 integer
rel_5 I try hard to carry my religion over into all other dealings in life 1 integer
sui_1 Did you feel tense in the last week? 1 integer
sui_2 Did you feel blue in the last week? 1 integer
sui_3 Did you feel irritated in the last week? 1 integer
sui_4 Did you feel inferior in the last week? 1 integer
sui_5 Did you have problems falling asleep in the last week? 1 integer
gender gender 1 factor
age age 1 float

In this example, all weights are 1 and no item is inverted.

Again, we join the dic file and the data file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we add the values attribute to the dic file. values defines the valid values a variable can take. Explicitly defining the values makes it possible to automatically identify invalid values in a data frame (e.g., typos). values are also necessary to reverse score an inverted item. To code the values, provide the possible values separated by commas (e.g., 1, 2, 3, 4 for the integers 1 to 4). You can also use a colon to define a range of integers (e.g., 5:11 indicates all integers from 5 to 11). For type float, values give the minimum and the maximum of the valid values (e.g., 5, 11 for all values between 5 and 11, including decimal numbers). For variables of type factor or character, you may want to provide text values. In that case, put these values within quotes (e.g., 'm', 'f', 'd' indicates three valid text entries).
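The values notation happens to be valid R vector syntax once it is wrapped in c(). The following sketch uses that to check values against the rules just described. parse_values() and is_valid() are hypothetical helpers, not scaledic functions; and because eval(parse()) executes the text, this is only appropriate for dictionary files you trust:

```r
# Hypothetical helpers (not part of scaledic).
# Interpret a `values` string such as "1:6", "5, 11" or "'m', 'f', 'd'".
parse_values <- function(s) eval(parse(text = paste0("c(", s, ")")))

# float: values give min and max; all other types: an explicit set of values.
is_valid <- function(x, values, type) {
  allowed <- parse_values(values)
  if (type == "float") {
    x >= min(allowed) & x <= max(allowed)
  } else {
    x %in% allowed
  }
}

is_valid(c(1, 66, 6), "1:6", "integer")               # TRUE FALSE TRUE
is_valid(c(10.5, 13, -999), "5, 11", "float")         # TRUE FALSE FALSE
is_valid(c("f", "m", "x"), "'m', 'f', 'd'", "factor") # TRUE TRUE FALSE
```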

Here is the dic file with added values:

Dictionary file
item_name item_label values weight type
rel_1 How often do you attend church or other religious meetings? 1:6 1 integer
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 1:6 1 integer
rel_3 In my life, I experience the presence of the Divine (i.e., God) 1:5 1 integer
rel_4 My religious beliefs are what really lie behind my whole approach to life 1:5 1 integer
rel_5 I try hard to carry my religion over into all other dealings in life 1:5 1 integer
sui_1 Did you feel tense in the last week? 0:4 1 integer
sui_2 Did you feel blue in the last week? 0:4 1 integer
sui_3 Did you feel irritated in the last week? 0:4 1 integer
sui_4 Did you feel inferior in the last week? 0:4 1 integer
sui_5 Did you have problems falling asleep in the last week? 0:4 1 integer
gender gender 'm', 'f', 'd' 1 factor
age age 5, 11 1 float
dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Invalid values can be replaced automatically with the check_values() function.

dat_dic <- check_values(dat_dic, report = TRUE, replace = NA) 
Found the following invalid values:

'rel_2'
  Row:      3 
Value:     66 

'rel_3'
  Row:     14 
Value:   -999 

'rel_4'
  Row:      9     18 
Value:     11   -999 

'rel_5'
  Row:     11 
Value:     66 

'sui_1'
  Row:     10 
Value:     55 

'sui_2'
  Row:      9 
Value:   -999 

'sui_4'
  Row:     17 
Value:     66 

'age'
  Row:      4     12     16 
Value:     13   -999     13 
dat_dic %>% kable(caption = "data frame with replaced invalid values")
data frame with replaced invalid values
rel_1 rel_2 rel_3 rel_4 rel_5 sui_1 sui_2 sui_3 sui_4 sui_5 gender age
1 6 1 2 1 2 2 2 3 0 f 10.5
3 1 1 2 1 4 1 4 1 0 d 6.5
1 NA 2 3 4 3 2 1 3 3 d 10.5
2 2 5 3 1 0 2 1 2 3 f NA
5 5 4 2 3 1 3 1 1 3 f 8.0
3 4 5 5 1 3 2 3 1 1 d 6.0
3 2 2 2 3 0 4 3 4 2 f 6.0
4 3 4 3 3 0 0 4 3 3 f 10.0
2 6 2 NA 1 1 NA 4 2 1 f 8.5
6 6 4 5 3 NA 4 0 3 1 f 8.5
6 1 3 1 NA 4 0 2 3 4 m 7.5
6 1 2 5 3 1 3 1 3 0 m NA
3 1 5 5 5 2 2 1 3 0 m 7.0
6 1 NA 1 1 4 2 3 3 0 m 7.5
2 2 1 5 3 2 3 2 2 0 d 8.5
3 5 5 4 2 1 0 0 1 2 d NA
4 4 5 5 1 3 1 4 NA 0 NA 10.5
5 5 4 NA 4 1 4 2 0 1 m 10.0
6 5 4 2 4 0 0 2 3 3 m 8.0
2 3 5 2 3 1 3 2 4 3 m 6.0
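For a single variable, the replacement can be reproduced in base R. The sketch below uses the rel_4 column of ex_scaledic_data shown at the top (valid values 1:5) and finds the same rows 9 and 18 as the report. It only mirrors the result; it does not claim to show how check_values() works internally:

```r
# rel_4 column of ex_scaledic_data (rows 1 to 20)
rel_4 <- c(2, 2, 3, 3, 2, 5, 2, 3, 11, 5, 1, 5, 5, 1, 5, 4, 5, -999, 2, 2)

invalid <- !(rel_4 %in% 1:5)     # valid values for rel_4 are 1:5
bad_rows <- which(invalid)       # 9 18
bad_values <- rel_4[invalid]     # 11 -999
rel_4[invalid] <- NA             # same effect as replace = NA
```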

Next, we add value_labels. Value labels give longer labels for all or some of the values. They are coded in the form value = label, with a semicolon separating the entries. For example, m = male; f = female; d = diverse codes three value labels (quotes are not necessary).
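To make the value = label notation concrete, here is a small parser that turns such a string into a named character vector (names are the values, elements are the labels). parse_value_labels() is a hypothetical helper for illustration; it is not the parser scaledic uses:

```r
# Hypothetical helper (not part of scaledic): parse "value = label; ..." strings.
parse_value_labels <- function(s) {
  entries <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])
  parts <- strsplit(entries, "=", fixed = TRUE)
  values <- trimws(vapply(parts, `[`, character(1), 1))
  # re-join in case a label itself contains "="
  labels <- trimws(vapply(parts, function(p) paste(p[-1], collapse = "="),
                          character(1)))
  names(labels) <- values
  labels
}

parse_value_labels("m = male; f = female; d = diverse")
#        m        f        d
#   "male" "female" "diverse"
```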

Here is the dic file with added value_labels:

Dictionary file
item_name item_label values value_labels weight type
rel_1 How often do you attend church or other religious meetings? 1:6 1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week 1 integer
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 1:6 1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day 1 integer
rel_3 In my life, I experience the presence of the Divine (i.e., God) 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me 1 integer
rel_4 My religious beliefs are what really lie behind my whole approach to life 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me 1 integer
rel_5 I try hard to carry my religion over into all other dealings in life 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me 1 integer
sui_1 Did you feel tense in the last week? 0:4 0 = not at all; 4 = extremely 1 integer
sui_2 Did you feel blue in the last week? 0:4 0 = not at all; 4 = extremely 1 integer
sui_3 Did you feel irritated in the last week? 0:4 0 = not at all; 4 = extremely 1 integer
sui_4 Did you feel inferior in the last week? 0:4 0 = not at all; 4 = extremely 1 integer
sui_5 Did you have problems falling asleep in the last week? 0:4 0 = not at all; 4 = extremely 1 integer
gender gender 'm', 'f', 'd' m = male; f = female; d = diverse 1 factor
age age 5, 11 5 = min; 11 = max 1 float
dat_dic <- apply_dic(ex_scaledic_data, dic_file) %>% check_values(replace = NA)

Now let's look at the coding of some of the variables:

dat_dic$rel_1
# How often do you attend church or other religious meetings? 
# Data type is integer
# Valid values: 1:6
# Value labels:
# 1 = Never
# 2 = Once a year or less
# 3 = A few times a year
# 4 = A few times a month
# 5 = Once a week
# 6 = More than once/week
 [1] 1 3 1 2 5 3 3 4 2 6 6 6 3 6 2 3 4 5 6 2

Valid values are all integers from 1 to 6 and each value has a label.

dat_dic$sui_1
# Did you feel tense in the last week? 
# Data type is integer
# Valid values: 0:4
# Value labels:
# 0 = not at all
# 4 = extremely
 [1]  2  4  3  0  1  3  0  0  1 NA  4  1  2  4  2  1  3  1  0  1

Here, the valid values are 0, 1, 2, 3, 4 (short: 0:4), but only the poles (0 and 4) have labels.

dat_dic$gender
# gender 
# Data type is factor
# Value labels:
# m = male
# f = female
# d = diverse
 [1] f    d    d    f    f    d    f    f    f    f    m    m    m    m    d   
[16] d    <NA> m    m    m   
Levels: m f d

gender is of type factor. The valid values 'm', 'f', 'd' have been turned into three factor levels, and the corresponding value labels are shown alongside them.

Caution: The labels provided in the value_labels attribute are not automatically turned into the levels of a factor. This is because R internally turns the values (here 'm', 'f', 'd') into integer codes (here 1, 2, 3).
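The integer coding mentioned in the caution can be seen with plain base R factors. If you do want the labels as factor levels, base R's factor() accepts a labels argument; this is ordinary R, not a scaledic function:

```r
gender <- c("f", "d", "d", "m")

# Levels in the order given by the values attribute: m, f, d
g <- factor(gender, levels = c("m", "f", "d"))
as.integer(g)   # 2 3 3 1: R stores a factor as integer codes
levels(g)       # "m" "f" "d"

# Explicitly replacing the levels with the labels
g_labelled <- factor(gender, levels = c("m", "f", "d"),
                     labels = c("male", "female", "diverse"))
as.character(g_labelled)   # "female" "diverse" "diverse" "male"
```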

dat_dic$age
# age 
# Data type is float
# Valid values: From 5 to 11
# Value labels:
# 5 = min
# 11 = max
 [1] 10.5  6.5 10.5   NA  8.0  6.0  6.0 10.0  8.5  8.5  7.5   NA  7.0  7.5  8.5
[16]   NA 10.5 10.0  8.0  6.0

age can take all numbers from 5 to 11 (minimum is 5 and maximum is 11).

It is common practice to code missing values with a specific number (rather than just leaving an entry empty on a datasheet). In the example dataset used here, -999 codes a missing value. To account for this, we add the missing attribute to the dic file, containing the missing-value code (e.g., -999). Multiple missing-value codes are separated by commas (e.g., -99, -77).
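In base R, replacing missing-value codes amounts to matching against the codes and setting the matches to NA. A sketch for the age column of ex_scaledic_data shown at the top; with several codes, missing_codes would simply hold more values (e.g., c(-99, -77)):

```r
missing_codes <- c(-999)

# age column of ex_scaledic_data (rows 1 to 20)
age <- c(10.5, 6.5, 10.5, 13.0, 8.0, 6.0, 6.0, 10.0, 8.5, 8.5,
         7.5, -999.0, 7.0, 7.5, 8.5, 13.0, 10.5, 10.0, 8.0, 6.0)

age[age %in% missing_codes] <- NA
which(is.na(age))   # 12
```

Note that the two ages of 13 are not missing codes. They stay in the data until they are flagged as invalid by a values check.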

Dictionary file
item_name item_label values value_labels weight type missing
rel_1 How often do you attend church or other religious meetings? 1:6 1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week 1 integer -999
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 1:6 1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day 1 integer -999
rel_3 In my life, I experience the presence of the Divine (i.e., God) 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me 1 integer -999
rel_4 My religious beliefs are what really lie behind my whole approach to life 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me 1 integer -999
rel_5 I try hard to carry my religion over into all other dealings in life 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me 1 integer -999
sui_1 Did you feel tense in the last week? 0:4 0 = not at all; 4 = extremely 1 integer -999
sui_2 Did you feel blue in the last week? 0:4 0 = not at all; 4 = extremely 1 integer -999
sui_3 Did you feel irritated in the last week? 0:4 0 = not at all; 4 = extremely 1 integer -999
sui_4 Did you feel inferior in the last week? 0:4 0 = not at all; 4 = extremely 1 integer -999
sui_5 Did you have problems falling asleep in the last week? 0:4 0 = not at all; 4 = extremely 1 integer -999
gender gender 'm', 'f', 'd' m = male; f = female; d = diverse 1 factor
age age 5, 11 5 = min; 11 = max 1 float -999

The values given in the missing attribute are automatically replaced with NA when a data file is joined with a dic file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

1: Missing values replaced with NA

To turn off this behavior, set the argument replace_missing = FALSE. By default, values are not checked for invalid entries. To check them, either set check_values = TRUE or apply the check_values() function afterwards.

dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)

1: Invalid values replaced with NA
2: Missing values replaced with NA
dat_dic %>%
  slice(9:20) %>% 
  kable(caption = "Extract from the data frame with replaced missing values and checked values")
Extract from the data frame with replaced missing values and checked values
rel_1 rel_2 rel_3 rel_4 rel_5 sui_1 sui_2 sui_3 sui_4 sui_5 gender age
2 6 2 NA 1 1 NA 4 2 1 f 8.5
6 6 4 5 3 NA 4 0 3 1 f 8.5
6 1 3 1 NA 4 0 2 3 4 m 7.5
6 1 2 5 3 1 3 1 3 0 m NA
3 1 5 5 5 2 2 1 3 0 m 7.0
6 1 NA 1 1 4 2 3 3 0 m 7.5
2 2 1 5 3 2 3 2 2 0 d 8.5
3 5 5 4 2 1 0 0 1 2 d NA
4 4 5 5 1 3 1 4 NA 0 NA 10.5
5 5 4 NA 4 1 4 2 0 1 m 10.0
6 5 4 2 4 0 0 2 3 3 m 8.0
2 3 5 2 3 1 3 2 4 3 m 6.0

In the last step of this tutorial, we add information about the scales that the items belong to. To do so, we add a new attribute called scale to the dic file, and another attribute scale_label for a longer description (we could have named these attributes differently, as they are not predefined attributes):

Dictionary file
item_name item_label scale scale_label values value_labels type weight missing
rel_1 How often do you attend church or other religious meetings? rel Religious beliefs 1:6 1 = Never; 2 = Once a year or less; 3 = A few times a year; 4 = A few times a month; 5 = Once a week; 6 = More than once/week integer 1 -999
rel_2 How often do you spend time in private religious activities, such as prayer, meditation or Bible study? rel Religious beliefs 1:6 1 = Rarely or never; 2 = A few times a month; 3 = Once a week; 4 = Two or more times/week; 5 = Daily; 6 = More than once a day integer 1 -999
rel_3 In my life, I experience the presence of the Divine (i.e., God) rel Religious beliefs 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me integer 1 -999
rel_4 My religious beliefs are what really lie behind my whole approach to life rel Religious beliefs 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me integer 1 -999
rel_5 I try hard to carry my religion over into all other dealings in life rel Religious beliefs 1:5 1 = Definitely not true; 2 = Tends not to be true; 3 = Unsure; 4 = Tends to be true; 5 = Definitely true of me integer 1 -999
sui_1 Did you feel tense in the last week? sui Suicide tendency 0:4 0 = not at all; 4 = extremely integer 1 -999
sui_2 Did you feel blue in the last week? sui Suicide tendency 0:4 0 = not at all; 4 = extremely integer 1 -999
sui_3 Did you feel irritated in the last week? sui Suicide tendency 0:4 0 = not at all; 4 = extremely integer 1 -999
sui_4 Did you feel inferior in the last week? sui Suicide tendency 0:4 0 = not at all; 4 = extremely integer 1 -999
sui_5 Did you have problems falling asleep in the last week? sui Suicide tendency 0:4 0 = not at all; 4 = extremely integer 1 -999
gender gender misc Miscellaneous 'm', 'f', 'd' m = male; f = female; d = diverse factor 1
age age misc Miscellaneous 5, 11 5 = min; 11 = max float 1 -999
dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)

1: Invalid values replaced with NA
2: Missing values replaced with NA

We can use the scale attribute to select items:

dat_dic %>% 
  select_items(scale == "rel") %>% 
  descriptives()
   name valid missing mean   sd min max range median  mad
1 rel_1    20       0 3.65 1.76   1   6     5      3 1.48
2 rel_2    19       1 3.32 1.92   1   6     5      3 2.97
3 rel_3    19       1 3.37 1.54   1   5     4      4 1.48
4 rel_4    18       2 3.17 1.50   1   5     4      3 1.48
5 rel_5    19       1 2.47 1.31   1   5     4      3 1.48

and the rename_items() function to get the item labels:

dat_dic %>% 
  select_items(scale == "rel") %>% 
  rename_items() %>%
  descriptives() %>%
  kable()
name valid missing mean sd min max range median mad
How often do you attend church or other religious meetings? 20 0 3.65 1.76 1 6 5 3 1.48
How often do you spend time in private religious activities, such as prayer, meditation or Bible study? 19 1 3.32 1.92 1 6 5 3 2.97
In my life, I experience the presence of the Divine (i.e., God) 19 1 3.37 1.54 1 5 4 4 1.48
My religious beliefs are what really lie behind my whole approach to life 18 2 3.17 1.50 1 5 4 3 1.48
I try hard to carry my religion over into all other dealings in life 19 1 2.47 1.31 1 5 4 3 1.48
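Conceptually, selecting items by an additional attribute and renaming them by their labels are lookups in the dictionary. The following base-R sketch shows the idea on a small excerpt of the dictionary and the first three rows of the cleaned data. It is an assumption-based illustration and not necessarily how select_items() and rename_items() are implemented:

```r
# Small excerpt of the dictionary (plain data.frame)
dic <- data.frame(
  item_name  = c("rel_1", "rel_2", "sui_1", "age"),
  item_label = c("How often do you attend church or other religious meetings?",
                 "How often do you spend time in private religious activities, such as prayer, meditation or Bible study?",
                 "Did you feel tense in the last week?",
                 "age"),
  scale      = c("rel", "rel", "sui", "misc"),
  stringsAsFactors = FALSE
)
# First three rows of the cleaned data
dat <- data.frame(rel_1 = c(1, 3, 1), rel_2 = c(6, 1, NA),
                  sui_1 = c(2, 4, 3), age = c(10.5, 6.5, 10.5))

# "Select items": look up the item names that belong to scale "rel"
rel_items <- dic$item_name[dic$scale == "rel"]
dat_rel <- dat[, rel_items, drop = FALSE]

# "Rename items": replace each column name with its item label
names(dat_rel) <- dic$item_label[match(names(dat_rel), dic$item_name)]
```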