What is a dictionary file?

When you conduct research based on questionnaires or psychometric tests (and you are working in R), you typically create a data.frame with one column (variable) for each item of the questionnaire and one row for each participant. Within a data.frame (or tibble), you can store only a limited amount of additional information about each item: you can give a variable a name and define it as a factor with appropriate levels, but that is basically it. You cannot, at least not conveniently, include a longer label for each item, the name of the scale to which the item belongs, information about reverse coding, etc.

I call the collection of this additional information about items an item dictionary. A dictionary contains a short label, a longer description, scale affiliation, and more for each item.

A dictionary file

A dictionary file is a table with one row for each variable and one column for each attribute of those variables. The most convenient way to create one is in a spreadsheet program, so that it can later be combined with data sets.

Here is an extract from an example dic-file:

| item_name | item_label | scale | scale_label | subscale | subscale_label | values | value_labels | missing | type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| itrf_I_1 | Verbringt zu viel Zeit alleine | ITRF | Integrated teacher report form | Int | Internalizing | 0:3 | 0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic | -99 | integer |
| itrf_I_2 | Beschwert sich über Krankheit oder Schmerzen | ITRF | Integrated teacher report form | Int | Internalizing | 0:3 | 0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic | -99 | integer |
| itrf_I_4 | Vermeidet soziale Interaktionen | ITRF | Integrated teacher report form | Int | Internalizing | 0:3 | 0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic | -99 | integer |
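
As a minimal sketch, the three rows above could also be created directly in R as a plain data.frame and saved as a CSV file that can be edited in a spreadsheet program. The object name `dic_example` and the file name `dic_itrf_example.csv` are hypothetical; the columns are simply those shown in the extract.

```r
# Build a small dictionary as a plain data.frame (base R only).
# Values are copied from the extract above; object and file names are hypothetical.
labels <- paste(
  "0 = not problematic; 1 = slightly problematic;",
  "2 = problematic; 3 = strongly problematic"
)

dic_example <- data.frame(
  item_name      = c("itrf_I_1", "itrf_I_2", "itrf_I_4"),
  item_label     = c(
    "Verbringt zu viel Zeit alleine",
    "Beschwert sich über Krankheit oder Schmerzen",
    "Vermeidet soziale Interaktionen"
  ),
  scale          = "ITRF",
  scale_label    = "Integrated teacher report form",
  subscale       = "Int",
  subscale_label = "Internalizing",
  values         = "0:3",
  value_labels   = labels,
  missing        = "-99",
  type           = "integer",
  stringsAsFactors = FALSE
)

# One row per item, one column per attribute
dim(dic_example)

# Save the dictionary so that it can be edited in a spreadsheet program
dic_path <- file.path(tempdir(), "dic_itrf_example.csv")
write.csv(dic_example, dic_path, row.names = FALSE, fileEncoding = "UTF-8")
```

Keeping `missing` as text leaves room for several missing codes in one cell, as in the `-888, -999` example of the attribute table.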

A dictionary file can contain any additional attributes. This means that you can add a column with any name to store relevant information (e.g. the scale and scale label to which an item belongs, a translation of the item name). However, there are some predefined attributes with a specific meaning. The table below shows these attributes:

Basic columns of a dictionary file
| Parameter | Meaning | Example |
| --- | --- | --- |
| item_name | A short item name | itrf_1 |
| item_label | Full text of the item | Vermeidet die Teilnahme an Diskussionen im Unterricht |
| values | Valid response values, written in R syntax | 1:5 (for integers 1 to 5); 1,2,3 (for integers 1, 2, 3) |
| value_labels | Labels for each response value | 0 = nicht; 1 = leicht; 2 = mäßig; 3 = stark |
| missing | Missing values | -888, -999 |
| type | Data type (factor, integer, float, real) | integer |
| weight | Reverse coding of the item and its weight | 1 (positive), -1 (reverse), 1.5 (positive, weighted 1.5 times) |
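
To make the `values` and `value_labels` notation concrete, here is a small base R sketch that turns such strings into R objects. The helpers `parse_values()` and `parse_value_labels()` are hypothetical illustrations and are not part of scaledic.

```r
# Hypothetical helpers (not part of scaledic) that illustrate how the
# `values` and `value_labels` columns can be read as R objects.

# "1:5" or "1,2,3" -> vector of valid values.
# eval(parse()) executes code, so only use it on dictionaries you trust.
parse_values <- function(x) {
  eval(parse(text = paste0("c(", x, ")")))
}

# "0 = nicht; 1 = leicht" -> named character vector (names are the values)
parse_value_labels <- function(x) {
  entries <- strsplit(x, ";", fixed = TRUE)[[1]]
  pairs <- strsplit(entries, "=", fixed = TRUE)
  labels <- vapply(pairs, function(p) trimws(p[2]), character(1))
  names(labels) <- vapply(pairs, function(p) trimws(p[1]), character(1))
  labels
}

parse_values("1:5")
parse_values("1,2,3")
parse_value_labels("0 = nicht; 1 = leicht; 2 = mäßig; 3 = stark")
```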

Apply a dictionary file

When you combine a dataset with a dictionary file, each variable in the dataset that corresponds to a variable described in the dictionary is completed with the given dictionary information.
The resulting dataset is now ready for use with all other scaledic functions.

The apply_dic function takes the name of the dataset and the dictionary file and combines them. Missing values are replaced by NAs:

# Here we use the example dataset "dat_itrf" and the example dic file "dic_itrf"
dat <- apply_dic(dat_itrf, dic_itrf)
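
Conceptually, combining a dataset with a dictionary means attaching the dictionary row of each item to the matching column and recoding the declared missing codes as NA. The following base R sketch illustrates that idea only; `attach_dic()`, the toy data, and the attribute name `"dic"` are hypothetical and do not reflect how scaledic stores this information internally.

```r
# Hypothetical illustration, not the scaledic implementation
attach_dic <- function(data, dic) {
  for (i in seq_len(nrow(dic))) {
    item <- dic$item_name[i]
    if (!item %in% names(data)) next      # skip items that are not in the data
    x <- data[[item]]
    x[x %in% dic$missing[i]] <- NA        # declared missing code -> NA
    attr(x, "dic") <- as.list(dic[i, ])   # keep the dictionary row with the column
    data[[item]] <- x
  }
  data
}

toy_dat <- data.frame(itrf_I_1 = c(0, 3, -99), itrf_I_2 = c(1, -99, 2))
toy_dic <- data.frame(
  item_name  = c("itrf_I_1", "itrf_I_2"),
  item_label = c("Verbringt zu viel Zeit alleine",
                 "Beschwert sich über Krankheit oder Schmerzen"),
  missing    = -99,
  stringsAsFactors = FALSE
)

toy <- attach_dic(toy_dat, toy_dic)
toy$itrf_I_1                          # the -99 has become NA
attr(toy$itrf_I_1, "dic")$item_label  # the label travels with the column
```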

Let us take a look at all the scales in the dataset:

list_scales(dat, paste0(c("scale", "subscale", "subscale_2"), "_label")) %>% kable()
|  | scale_label | subscale_label | subscale_2_label |
| --- | --- | --- | --- |
| itrf_I_1 | Integrated teacher report form | Internalizing | Socially Withdrawn |
| itrf_I_2 | Integrated teacher report form | Internalizing | Anxious/Depressed |
| itrf_I_20 | Integrated teacher report form | Externalizing | Oppositional/Disruptive |
| itrf_E_1 | Integrated teacher report form | Externalizing | Academic Productivity/Disorganization |

Clean raw data

First, we check for invalid values in the dataset (e.g., typos) and replace them with NA:

dat <- check_values(dat, replace = NA)
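
The underlying idea can be sketched in base R: every response that is not among the valid values of an item is set to NA. The helper `replace_invalid()` and the toy responses are hypothetical and only illustrate the principle.

```r
# Hypothetical helper: set every value outside the valid set to NA
replace_invalid <- function(x, valid) {
  x[!is.na(x) & !(x %in% valid)] <- NA
  x
}

responses <- c(0, 1, 3, 33, 2, NA, 11)   # 33 and 11 look like typos
replace_invalid(responses, valid = 0:3)
```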

Now we impute missing values:

# Imputation for items of the subscale Ext
dat <- impute_missing(dat, subscale == "Ext")

# Imputation for items of the subscale Int
dat <- impute_missing(dat, subscale == "Int")

Select scales for analysis

Let us look at the descriptive statistics for the internalizing subscale:

dat %>% 
  select_items(subscale == "Int") %>%
  descriptives(round = 1)
#>         name valid missing mean  sd min max range median mad
#> 1   itrf_I_1  4772       4  0.4 0.7   0   3     3      0   0
#> 2   itrf_I_2  4772       4  0.3 0.7   0   3     3      0   0
#> 3   itrf_I_4  4772       4  0.3 0.6   0   3     3      0   0
#> 4   itrf_I_5  4772       4  0.2 0.6   0   3     3      0   0
#> 5   itrf_I_6  4772       4  0.2 0.5   0   3     3      0   0
#> 6   itrf_I_7  4772       4  0.4 0.7   0   3     3      0   0
#> 7   itrf_I_8  4772       4  0.3 0.7   0   3     3      0   0
#> 8   itrf_I_9  4772       4  0.5 0.8   0   3     3      0   0
#> 9  itrf_I_10  4772       4  0.3 0.7   0   3     3      0   0
#> 10 itrf_I_11  4772       4  0.3 0.7   0   3     3      0   0
#> 11 itrf_I_12  4772       4  0.4 0.7   0   3     3      0   0
#> 12 itrf_I_13  4772       4  0.4 0.7   0   3     3      0   0
#> 13 itrf_I_14  4772       4  0.3 0.7   0   3     3      0   0
#> 14 itrf_I_15  4772       4  0.4 0.7   0   3     3      0   0
#> 15 itrf_I_16  4772       4  0.4 0.8   0   3     3      0   0
#> 16 itrf_I_17  4772       4  0.4 0.7   0   3     3      0   0
#> 17 itrf_I_19  4772       4  0.2 0.6   0   3     3      0   0
#> 18 itrf_I_23  4772       4  0.4 0.7   0   3     3      0   0
#> 19 itrf_I_24  4772       4  0.4 0.7   0   3     3      0   0

See item texts instead of item names

It is often more convenient to see the full item text rather than the short item names:

dat %>% 
  select_items(subscale == "Int") %>%
  rename_items() %>%
  descriptives(round = 1)  %>% 
  kable()
| name | valid | missing | mean | sd | min | max | range | median | mad |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Verbringt zu viel Zeit alleine | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Beschwert sich über Krankheit oder Schmerzen | 4772 | 4 | 0.3 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Vermeidet soziale Interaktionen | 4772 | 4 | 0.3 | 0.6 | 0 | 3 | 3 | 0 | 0 |
| Spielt bevorzugt alleine | 4772 | 4 | 0.2 | 0.6 | 0 | 3 | 3 | 0 | 0 |
| Geht nicht auf Kontaktversuche der Mitschülerinnen und Mitschüler ein | 4772 | 4 | 0.2 | 0.5 | 0 | 3 | 3 | 0 | 0 |
| Macht sich Sorgen über unwichtige Details | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Beschwert sich über Kopfschmerzen oder Bauchschmerzen | 4772 | 4 | 0.3 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Wirkt unglücklich oder traurig | 4772 | 4 | 0.5 | 0.8 | 0 | 3 | 3 | 0 | 0 |
| Klammert sich an Erwachsene | 4772 | 4 | 0.3 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Verhält sich nervös | 4772 | 4 | 0.3 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Verhält sich ängstlich | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Behauptet sich nicht gegenüber anderen | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Verhält sich übermäßig schüchtern | 4772 | 4 | 0.3 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Beklagt sich oder jammert | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Beteiligt sich nicht an Gruppenaktionen | 4772 | 4 | 0.4 | 0.8 | 0 | 3 | 3 | 0 | 0 |
| Macht sich selbst schlecht | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Weint oder ist weinerlich | 4772 | 4 | 0.2 | 0.6 | 0 | 3 | 3 | 0 | 0 |
| Macht sich ständig Sorgen | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |
| Lässt sich langsam auf neue Personen ein | 4772 | 4 | 0.4 | 0.7 | 0 | 3 | 3 | 0 | 0 |

Next, we analyse the factor structure. Here we use the rename_items() function with a pattern to get more informative item descriptions.

dat %>%
  select_items(scale == "ITRF") %>%
  rename_items(pattern = "({reverse}){subscale}_{subscale_2}: {label}", max_chars = 70) %>%
  exploratory_fa(nfactors = 4, cut = 0.4) %>% kable()
MR1 MR3 MR2 MR4
(+)Ext_OPP: Verliert die Beherrschung 0.85
(+)Ext_OPP: Macht unangebrachte Bemerkungen 0.83
(+)Ext_OPP: Streitet und zankt mit Lehrkräften 0.8
(+)Ext_OPP: Hat Konflikte mit Mitschülerinnen und Mitschülern 0.8
(+)Ext_OPP: Kommandiert rum 0.78
(+)Ext_OPP: Verwendet unangemessene Sprache 0.78
(+)Ext_OPP: Ist schnell verärgert 0.76
(+)Ext_OPP: Stört andere 0.65
(+)Ext_OPP: Respektiert nicht die Privatsphäre anderer 0.64
(+)Ext_APD: Erledigt Hausaufgaben unvollständig 0.82
(+)Ext_APD: Zeigt Unterrichtsaufgaben nicht selbstständig vor 0.81
(+)Ext_APD: Stellt Unterrichtsaufgaben nicht rechtzeitig fertig 0.76
(+)Ext_APD: Kommt unvorbereitet zum Unterricht 0.73
(+)Ext_APD: Kontrolliert seine eigene Arbeit nicht 0.73
(+)Ext_APD: Nimmt Materialien, die zu Hause benötigt werden, nicht mit 0.72
(+)Ext_APD: Beginnt mit der Aufgabenbearbeitung nicht selbstständig 0.72
(+)Ext_APD: Beteiligt sich nicht am Unterricht 0.53
(+)Int_SW: Vermeidet soziale Interaktionen 0.86
(+)Int_SW: Geht nicht auf Kontaktversuche der Mitschülerinnen und Mits 0.79
(+)Int_SW: Spielt bevorzugt alleine 0.78
(+)Int_SW: Verbringt zu viel Zeit alleine 0.66
(+)Int_SW: Beteiligt sich nicht an Gruppenaktionen 0.66
(+)Int_SW: Verhält sich übermäßig schüchtern 0.55
(+)Int_SW: Behauptet sich nicht gegenüber anderen 0.5
(+)Int_SW: Lässt sich langsam auf neue Personen ein 0.47
(+)Int_AD: Beschwert sich über Kopfschmerzen oder Bauchschmerzen 0.73
(+)Int_AD: Macht sich ständig Sorgen 0.71
(+)Int_AD: Beschwert sich über Krankheit oder Schmerzen 0.7
(+)Int_AD: Beklagt sich oder jammert 0.65
(+)Int_AD: Macht sich Sorgen über unwichtige Details 0.65
(+)Int_AD: Weint oder ist weinerlich 0.59
(+)Int_AD: Verhält sich ängstlich 0.47
(+)Int_AD: Wirkt unglücklich oder traurig 0.47
(+)Int_AD: Macht sich selbst schlecht 0.44
(+)Int_AD: Klammert sich an Erwachsene 0.4
(+)Int_AD: Verhält sich nervös 0.4
SS loadings 6 4.79 4.53 4.35
Proportion Var 0.17 0.13 0.13 0.12
Cumulative Var 0.17 0.3 0.43 0.55
Proportion Explained 0.31 0.24 0.23 0.22
Cumulative Proportion 0.31 0.55 0.78 1

Next, we run item analyses for the four subscales:

# Define the four subscales by their subscale_2 attribute
scales <- dat %>% get_scales(
  "APD" = subscale_2 == "APD",
  "OPP" = subscale_2 == "OPP",
  "SW" = subscale_2 == "SW",
  "AD" = subscale_2 == "AD"
)
alpha_table(dat, scales = scales) %>% kable()
| Scale | n | n items | Alpha | CI95% | Std.Alph | CI95% | Homogeneity | Discriminations | Means | SDs | \|Loadings\| |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| APD | 4776 | 8 | .91 | [.90, .91] | .91 | [.91, .91] | .56 | [.53, .78] | [0.38, 0.95] | [0.75, 1.04] | [.55, .82] |
| OPP | 4776 | 9 | .94 | [.93, .94] | .94 | [.94, .94] | .63 | [.68, .81] | [0.35, 0.83] | [0.72, 0.96] | [.70, .85] |
| SW | 4772 | 8 | .88 | [.87, .88] | .88 | [.88, .89] | .48 | [.53, .78] | [0.21, 0.43] | [0.51, 0.76] | [.56, .86] |
| AD | 4772 | 11 | .88 | [.88, .89] | .88 | [.88, .89] | .41 | [.52, .69] | [0.23, 0.48] | [0.59, 0.77] | [.55, .74] |
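
For orientation, and assuming the Alpha column reports Cronbach's alpha, the coefficient is alpha = k / (k - 1) * (1 - sum of the item variances / variance of the total score), where k is the number of items. The base R sketch below computes it for made-up toy data; `cronbach_alpha()` is a hypothetical helper and does not reproduce the confidence intervals or the other columns of the table.

```r
# Hypothetical helper: Cronbach's alpha from a data.frame of item responses
cronbach_alpha <- function(items) {
  items <- na.omit(as.matrix(items))    # use complete cases only
  k <- ncol(items)                      # number of items
  item_var  <- apply(items, 2, var)     # variance of each item
  total_var <- var(rowSums(items))      # variance of the sum score
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

# Made-up responses of six persons to three items (0 to 3)
toy_items <- data.frame(
  a = c(0, 1, 2, 3, 1, 2),
  b = c(0, 1, 3, 3, 1, 1),
  c = c(1, 1, 2, 3, 0, 2)
)
cronbach_alpha(toy_items)
```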

Build scale scores

Now we will create scores for the internalizing and externalizing scales.

dat$itrf_ext <- score_scale(dat, scale == "ITRF" & subscale == "Ext", label = "Externalizing")
dat$itrf_int <- score_scale(dat, scale == "ITRF" & subscale == "Int", label = "Internalizing")

Then we get descriptive statistics for those scores:

dat %>%
  select_scores() %>%
  rename_items() %>%
  descriptives(round = 1)
#>            name valid missing mean  sd min max range median mad
#> 1 Internalizing  4772       4  0.3 0.4   0 2.6   2.6    0.2 0.3
#> 2 Externalizing  4776       0  0.6 0.6   0 3.0   3.0    0.4 0.5

Look up norms from a norm table

Many scales come with norm tables to convert raw scores to t-scores, percentile ranks, etc.

The lookup_norms function helps with this conversion.

First, you need a data frame (or an Excel table, etc.) that contains the raw scores and the corresponding norm scores.

Here is an example of such a table:

ex_normtable_int %>% slice(1:10) %>% kable()
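| group | raw | T | PR | T_from_PR |
| --- | --- | --- | --- | --- |
| all | 0 | 42 | 26 | 43 |
| all | 1 | 43 | 35 | 46 |
| all | 2 | 44 | 42 | 48 |
| all | 3 | 46 | 49 | 50 |
| all | 4 | 47 | 55 | 51 |
| all | 5 | 48 | 60 | 52 |
| all | 6 | 49 | 64 | 54 |
| all | 7 | 50 | 68 | 55 |
| all | 8 | 52 | 72 | 56 |
| all | 9 | 53 | 75 | 57 |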

Then we need raw scores from a scale. If they do not exist yet, you can use the score_scale function to add sum scores. To do so, set the sum argument to TRUE. By setting max_na = 0, we do not allow missing values in any scale item:

dat$raw_int <- score_scale(dat, subscale == "Int", sum = TRUE, max_na = 0)
dat$raw_ext <- score_scale(dat, subscale == "Ext", sum = TRUE, max_na = 0)
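
To illustrate what mean scores, sum scores, and a limit on missing items amount to, here is a rough base R sketch. The helper `score_items()` and its arguments `weights`, `limits`, and `as_sum` are hypothetical and are not the scaledic implementation; the sketch also shows how a weight of -1 could reverse an item.

```r
# Hypothetical helper, not the scaledic implementation.
# items:   data.frame of item responses
# weights: 1 for regular items, -1 for reverse-coded items
# limits:  c(min, max) of the response scale, needed for reversing
# as_sum:  TRUE for sum scores, FALSE for mean scores
# max_na:  maximum number of missing items per person (NA = no limit)
score_items <- function(items, weights = rep(1, ncol(items)),
                        limits = c(0, 3), as_sum = FALSE, max_na = NA) {
  items <- as.matrix(items)
  reversed <- weights < 0
  if (any(reversed)) {
    # Reverse coding: min + max - x
    items[, reversed] <- sum(limits) - items[, reversed]
  }
  n_na <- rowSums(is.na(items))
  score <- if (as_sum) {
    rowSums(items, na.rm = TRUE)
  } else {
    rowMeans(items, na.rm = TRUE)
  }
  # Persons with too many missing items get no score
  if (!is.na(max_na)) score[n_na > max_na] <- NA
  unname(score)
}

toy_items <- data.frame(i1 = c(0, 3, NA), i2 = c(1, 2, 2), i3 = c(3, 0, 1))

# Mean scores, third item reverse coded
score_items(toy_items, weights = c(1, 1, -1))

# Sum scores without tolerating any missing item
score_items(toy_items, weights = c(1, 1, -1), as_sum = TRUE, max_na = 0)
```

In the second call, the third person has one missing response and therefore receives NA instead of a sum score.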

By default, lookup_norms looks for T values:

dat$T_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int)
dat$T_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext)

But this can easily be changed to percentile ranks, if included:

dat$PR_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int, to = "PR")
dat$PR_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext, to = "PR")
dat[1:10, c("T_int", "T_ext", "PR_int", "PR_ext")] %>% kable()
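| T_int | T_ext | PR_int | PR_ext |
| --- | --- | --- | --- |
| 43 | 40 | 35 | 14 |
| 43 | 46 | 35 | 49 |
| 47 | 47 | 55 | 53 |
| 71 | 87 | 95 | 100 |
| 47 | 64 | 55 | 89 |
| 44 | 50 | 42 | 64 |
| 61 | 40 | 88 | 14 |
| 46 | 46 | 49 | 49 |
| 53 | 62 | 75 | 87 |
| 47 | 77 | 55 | 98 |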
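
At its core, such a lookup matches each raw score against the raw column of the norm table and returns the value from the requested norm column. The base R sketch below illustrates this with the first ten rows of the internalizing norm table shown earlier (the group column is omitted); the helper `lookup()` is hypothetical and is not the package function.

```r
# First ten rows of the example norm table shown earlier
normtable <- data.frame(
  raw = 0:9,
  T   = c(42, 43, 44, 46, 47, 48, 49, 50, 52, 53),
  PR  = c(26, 35, 42, 49, 55, 60, 64, 68, 72, 75)
)

# Hypothetical helper: return the norm score that belongs to each raw score
lookup <- function(raw, normtable, to = "T") {
  normtable[[to]][match(raw, normtable$raw)]
}

raw_scores <- c(1, 4, 9, NA, 40)   # 40 is not covered by these ten rows
lookup(raw_scores, normtable)              # 43 47 53 NA NA
lookup(raw_scores, normtable, to = "PR")   # 35 55 75 NA NA
```

Raw scores that are missing or not listed in the table yield NA in this sketch.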