Introduction to scaledic

What is a dictionary file?

When you conduct research based on questionnaires or psychometric tests (and you are working in R), you typically create a data.frame with one column (variable) for each item on the questionnaire and one row for each person who participated. A data.frame (or tibble) can store only a limited amount of additional information about each item. You can give a variable a name and define it as a factor with appropriate levels, but that is basically it. You cannot, at least not conveniently, include a longer label for each item, the name of the scale to which the item belongs, information about reverse coding, etc.

I call the collection of this additional information about items an item dictionary. A dictionary contains a short label, a longer description, scale affiliation, and more for each item.

A dictionary file

A dictionary file is a table with one row for each variable and one column for each attribute of those variables. The most convenient way to create one is in a spreadsheet program; the file can then be applied to data sets later.

Here is an extract from an example dic-file:

item_name	item_label	scale	scale_label	subscale	subscale_label	values	value_labels	missing	type
itrf_I_1	Verbringt zu viel Zeit alleine	ITRF	Integrated teacher report form	Int	Internalizing	0:3	0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic	-99	integer
itrf_I_2	Beschwert sich über Krankheit oder Schmerzen	ITRF	Integrated teacher report form	Int	Internalizing	0:3	0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic	-99	integer
itrf_I_4	Vermeidet soziale Interaktionen	ITRF	Integrated teacher report form	Int	Internalizing	0:3	0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic	-99	integer

A dictionary file can contain any additional attributes. This means that you can add a column with any name to store relevant information (e.g. the scale and scale label to which an item belongs, a translation of the item name). However, there are some predefined attributes with a specific meaning. The table below shows these attributes:

Basic columns of a dictionary file
Parameter	Meaning	Example
item_name	A short item name	itrf_1
item_label	Full text of the item	Vermeidet die Teilnahme an Diskussionen im Unterricht
values	Valid response values in an R manner	1:5 (for integers 1 to 5); 1,2,3 (for integers 1, 2, 3)
value_labels	Labels for each response value	0 = nicht; 1 = leicht; 2 = mäßig; 3 = stark
missing	Missing values	-888, -999
type	Data type (factor, integer, float, real)	integer
weight	Whether the item is reverse-scored, and its weight	1 (positive), -1 (reverse), 1.5 (positive, weighted 1.5 times)
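
As a rough illustration of the structure described above, a dictionary can also be written directly in R as a data frame. The sketch below uses base R only and rebuilds two rows of the example extract. The helpers parse_values() and parse_value_labels() are hypothetical and written only for this illustration; they are not scaledic functions and merely show one way the values and value_labels columns could be read.

```r
# A dictionary is an ordinary table: one row per item, one column per attribute.
dic <- data.frame(
  item_name    = c("itrf_I_1", "itrf_I_4"),
  item_label   = c("Verbringt zu viel Zeit alleine",
                   "Vermeidet soziale Interaktionen"),
  scale        = "ITRF",
  subscale     = "Int",
  values       = "0:3",
  value_labels = "0 = not problematic; 1 = slightly problematic; 2 = problematic; 3 = strongly problematic",
  missing      = "-99",
  type         = "integer",
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "0:3" or "1,2,3" into a numeric vector.
parse_values <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  unlist(lapply(parts, function(p) {
    if (grepl(":", p, fixed = TRUE)) {
      bounds <- as.numeric(strsplit(p, ":", fixed = TRUE)[[1]])
      seq(bounds[1], bounds[2])
    } else {
      as.numeric(p)
    }
  }))
}

# Hypothetical helper: turn "0 = a; 1 = b" into a named character vector.
parse_value_labels <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  labels <- vapply(pairs, function(p) trimws(p[2]), character(1))
  names(labels) <- vapply(pairs, function(p) trimws(p[1]), character(1))
  labels
}

parse_values(dic$values[1])
parse_value_labels(dic$value_labels[1])
```

Keeping the attributes as plain text columns is what makes the dictionary easy to edit in a spreadsheet; the parsing step only happens once the file is read into R.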

Apply a dictionary file

When you combine a dataset with a dictionary file, each variable in the dataset that corresponds to a variable described in the dictionary is completed with the given dictionary information.
The resulting dataset is now ready for use with all other scaledic functions.

The apply_dic function takes the dataset and the dictionary file and combines them. Values that the dictionary declares as missing are replaced by NA:

# Here we use the example dataset "dat_itrf" and the example dic file "dic_itrf"
dat <- apply_dic(dat_itrf, dic_itrf)
#> Found the following invalid values:
#> 
#> 'itrf_I_1'
#>   Row:   3192 
#> Value:      7 
#> 
#> 'itrf_I_2'
#>   Row:   4651 
#> Value:      9 
#> 
#> 'itrf_I_13'
#>   Row:   3699 
#> Value:      4 
#> 
#> 'itrf_I_20'
#>   Row:   2799 
#> Value:      4 
#> 
#> 'itrf_E_4'
#>   Row:   2621 
#> Value:      6 
#> 
#> 'itrf_E_6'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_7'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_8'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_9'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_10'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_11'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_12'
#>   Row:   2599 
#> Value:      9 
#> 
#> 'itrf_E_13'
#>   Row:   2599   4146 
#> Value:      9     11 
#> 
#> 'itrf_E_14'
#>   Row:   2599 
#> Value:      9
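
Conceptually, applying a dictionary means matching dictionary rows to data columns by name, recoding the declared missing codes to NA, and storing the item information alongside each column. The following base R sketch illustrates that idea on a toy data frame. The function attach_dic() and the attribute name "dic" are assumptions made for this example and do not necessarily reflect how apply_dic works internally.

```r
# Toy data: -99 is the declared missing code for itrf_I_1.
raw <- data.frame(itrf_I_1 = c(0, 3, -99, 1), other = c(10, 20, 30, 40))

toy_dic <- data.frame(
  item_name  = "itrf_I_1",
  item_label = "Verbringt zu viel Zeit alleine",
  missing    = -99,
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of scaledic.
attach_dic <- function(data, dic) {
  for (i in seq_len(nrow(dic))) {
    name <- dic$item_name[i]
    # Only columns that are described in the dictionary are touched.
    if (!name %in% names(data)) next
    column <- data[[name]]
    column[column %in% dic$missing[i]] <- NA   # declared missing codes become NA
    attr(column, "dic") <- as.list(dic[i, ])   # keep the item information with the column
    data[[name]] <- column
  }
  data
}

toy <- attach_dic(raw, toy_dic)
toy$itrf_I_1
attr(toy$itrf_I_1, "dic")$item_label
```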

Let us take a look at all the scales in the dataset:

list_scales(dat, paste0(c("scale", "subscale", "subscale_2"), "_label")) |> kable()

scale_label	subscale_label	subscale_2_label
Integrated teacher report form	Internalizing	Socially Withdrawn
Integrated teacher report form	Internalizing	Anxious/Depressed
Integrated teacher report form	Externalizing	Oppositional/Disruptive
Integrated teacher report form	Externalizing	Academic Productivity/Disorganization

Clean raw data

First, we check the dataset for invalid values (e.g., typos) and replace them with NA:

dat <- check_values(dat, replace = NA)
#> No errors found.
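
The idea behind such a check can be sketched in base R: every observed value that is neither NA nor among the valid values declared in the dictionary is reported and replaced. find_invalid() is a hypothetical helper for this illustration and is not part of scaledic.

```r
# Hypothetical helper: positions of values outside the declared valid set.
find_invalid <- function(x, valid) {
  which(!is.na(x) & !(x %in% valid))
}

responses <- c(0, 1, 7, 3, NA, 2)   # 7 is a typo for an item coded 0:3
invalid_rows <- find_invalid(responses, valid = 0:3)
invalid_rows                        # position of the invalid entry

# Replace the invalid entries with NA, leaving valid values untouched.
responses[invalid_rows] <- NA
responses
```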

Now we impute missing values:

# Imputation for items of the subscale Ext
dat <- impute_missing(dat, subscale == "Ext")

# Imputation for items of the subscale Int
dat <- impute_missing(dat, subscale == "Int")

Select scales for analysis

Let us look at the descriptive statistics for the internalizing subscale:

dat |>  
  select_items(subscale == "Int") |> 
  descriptives(round = 1) |> 
  kable()

name	valid	missing	mean	sd	max	range
itrf_I_1	4772	4	0.4	0.7	3	3
itrf_I_2	4772	4	0.3	0.7	3	3
itrf_I_4	4772	4	0.3	0.6	3	3
itrf_I_5	4772	4	0.2	0.6	3	3
itrf_I_6	4772	4	0.2	0.5	3	3
itrf_I_7	4772	4	0.4	0.7	3	3
itrf_I_8	4772	4	0.3	0.7	3	3
itrf_I_9	4772	4	0.5	0.8	3	3
itrf_I_10	4772	4	0.3	0.7	3	3
itrf_I_11	4772	4	0.3	0.7	3	3
itrf_I_12	4772	4	0.4	0.7	3	3
itrf_I_13	4772	4	0.4	0.7	3	3
itrf_I_14	4772	4	0.3	0.7	3	3
itrf_I_15	4772	4	0.4	0.7	3	3
itrf_I_16	4772	4	0.4	0.8	3	3
itrf_I_17	4772	4	0.4	0.7	3	3
itrf_I_19	4772	4	0.2	0.6	3	3
itrf_I_23	4772	4	0.4	0.7	3	3
itrf_I_24	4772	4	0.4	0.7	3	3

See item labels instead of item names

It is often more convenient to see the full item text rather than the short item names:

dat |> 
  select_items(subscale == "Int") |> 
  rename_items() |> 
  descriptives(round = 1) |> 
  kable()

name	valid	missing	mean	sd	max	range
Verbringt zu viel Zeit alleine	4772	4	0.4	0.7	3	3
Beschwert sich über Krankheit oder Schmerzen	4772	4	0.3	0.7	3	3
Vermeidet soziale Interaktionen	4772	4	0.3	0.6	3	3
Spielt bevorzugt alleine	4772	4	0.2	0.6	3	3
Geht nicht auf Kontaktversuche der Mitschülerinnen und Mitschüler ein	4772	4	0.2	0.5	3	3
Macht sich Sorgen über unwichtige Details	4772	4	0.4	0.7	3	3
Beschwert sich über Kopfschmerzen oder Bauchschmerzen	4772	4	0.3	0.7	3	3
Wirkt unglücklich oder traurig	4772	4	0.5	0.8	3	3
Klammert sich an Erwachsene	4772	4	0.3	0.7	3	3
Verhält sich nervös	4772	4	0.3	0.7	3	3
Verhält sich ängstlich	4772	4	0.4	0.7	3	3
Behauptet sich nicht gegenüber anderen	4772	4	0.4	0.7	3	3
Verhält sich übermäßig schüchtern	4772	4	0.3	0.7	3	3
Beklagt sich oder jammert	4772	4	0.4	0.7	3	3
Beteiligt sich nicht an Gruppenaktionen	4772	4	0.4	0.8	3	3
Macht sich selbst schlecht	4772	4	0.4	0.7	3	3
Weint oder ist weinerlich	4772	4	0.2	0.6	3	3
Macht sich ständig Sorgen	4772	4	0.4	0.7	3	3
Lässt sich langsam auf neue Personen ein	4772	4	0.4	0.7	3	3

Next, we analyse the factor structure. Here we use the rename_items() function to build more informative item descriptions.

dat |> 
  select_items(scale == "ITRF") |>
  rename_items(pattern = "({reverse}){subscale}_{subscale_2}: {label}", max_chars = 70) |> 
  exploratory_fa(nfactors = 4, cut = 0.4) |> kable()

	MR1	MR3	MR2	MR4
(+)Ext_OPP: Verliert die Beherrschung	0.85
(+)Ext_OPP: Macht unangebrachte Bemerkungen	0.82
(+)Ext_OPP: Streitet und zankt mit Lehrkräften	0.8
(+)Ext_OPP: Hat Konflikte mit Mitschülerinnen und Mitschülern	0.8
(+)Ext_OPP: Kommandiert rum	0.78
(+)Ext_OPP: Verwendet unangemessene Sprache	0.78
(+)Ext_OPP: Ist schnell verärgert	0.77
(+)Ext_OPP: Stört andere	0.65
(+)Ext_OPP: Respektiert nicht die Privatsphäre anderer	0.64
(+)Ext_APD: Erledigt Hausaufgaben unvollständig		0.82
(+)Ext_APD: Zeigt Unterrichtsaufgaben nicht selbstständig vor		0.8
(+)Ext_APD: Stellt Unterrichtsaufgaben nicht rechtzeitig fertig		0.76
(+)Ext_APD: Kommt unvorbereitet zum Unterricht		0.73
(+)Ext_APD: Kontrolliert seine eigene Arbeit nicht		0.73
(+)Ext_APD: Nimmt Materialien, die zu Hause benötigt werden, nicht mit		0.73
(+)Ext_APD: Beginnt mit der Aufgabenbearbeitung nicht selbstständig		0.72
(+)Ext_APD: Beteiligt sich nicht am Unterricht		0.53
(+)Int_SW: Vermeidet soziale Interaktionen			0.86
(+)Int_SW: Geht nicht auf Kontaktversuche der Mitschülerinnen und Mits			0.79
(+)Int_SW: Spielt bevorzugt alleine			0.78
(+)Int_SW: Verbringt zu viel Zeit alleine			0.66
(+)Int_SW: Beteiligt sich nicht an Gruppenaktionen			0.66
(+)Int_SW: Verhält sich übermäßig schüchtern			0.55
(+)Int_SW: Behauptet sich nicht gegenüber anderen			0.5
(+)Int_SW: Lässt sich langsam auf neue Personen ein			0.48
(+)Int_AD: Beschwert sich über Kopfschmerzen oder Bauchschmerzen				0.73
(+)Int_AD: Macht sich ständig Sorgen				0.71
(+)Int_AD: Beschwert sich über Krankheit oder Schmerzen				0.7
(+)Int_AD: Beklagt sich oder jammert				0.65
(+)Int_AD: Macht sich Sorgen über unwichtige Details				0.65
(+)Int_AD: Weint oder ist weinerlich				0.59
(+)Int_AD: Verhält sich ängstlich				0.47
(+)Int_AD: Wirkt unglücklich oder traurig				0.47
(+)Int_AD: Macht sich selbst schlecht				0.44
(+)Int_AD: Klammert sich an Erwachsene				0.4
(+)Int_AD: Verhält sich nervös
SS loadings	6	4.79	4.54	4.34
Proportion Var	0.17	0.13	0.13	0.12
Cumulative Var	0.17	0.3	0.43	0.55
Proportion Explained	0.31	0.24	0.23	0.22
Cumulative Proportion	0.31	0.55	0.78	1

We can also run item analyses for the four subscales:

scales <- ex_itrf |> get_scales(
  "APD" = subscale_2 == "APD",
  "OPP" = subscale_2 == "OPP",
  "SW" = subscale_2 == "SW",
  "AD" = subscale_2 == "AD"
)
alpha_table(dat, scales = scales) |> kable()

Scale	n	n items	Alpha CI95%	Std.Alph CI95%	Homogeneity	Discriminations	Means	SDs	|Loadings|
APD	4776	8	.91 [.90, .91]	.91 [.91, .91]	.56	[.53, .78]	[0.38, 0.95]	[0.75, 1.04]	[.55, .82]
OPP	4776	9	.94 [.93, .94]	.94 [.94, .94]	.63	[.68, .81]	[0.35, 0.83]	[0.72, 0.96]	[.70, .85]
SW	4772	8	.88 [.87, .88]	.88 [.88, .89]	.49	[.53, .78]	[0.21, 0.43]	[0.51, 0.76]	[.56, .86]
AD	4772	11	.88 [.88, .89]	.88 [.88, .89]	.41	[.52, .69]	[0.23, 0.48]	[0.59, 0.77]	[.55, .74]
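
For orientation, the Alpha column presumably reports Cronbach's alpha, which can be computed from the item variances and the variance of the sum score. The base R sketch below applies the standard formula to a small made-up data set. cronbach_alpha() is a hypothetical helper, and the numbers are illustrative only; they are not taken from the ITRF data, and the sketch does not reproduce the confidence intervals shown in the table.

```r
# Cronbach's alpha:
#   k / (k - 1) * (1 - sum of item variances / variance of the sum score)
cronbach_alpha <- function(items) {
  items <- na.omit(as.matrix(items))   # listwise deletion of incomplete rows
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  k / (k - 1) * (1 - sum(item_vars) / total_var)
}

# Made-up responses of six persons to three items (coded 0 to 3)
toy_items <- data.frame(
  a = c(0, 1, 1, 2, 3, 3),
  b = c(0, 0, 1, 2, 2, 3),
  c = c(1, 1, 0, 2, 3, 2)
)
cronbach_alpha(toy_items)
```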

Build scale scores

Now we will create scores for the internalizing and externalizing scales.

dat$itrf_ext <- score_scale(dat, scale == "ITRF" & subscale == "Ext", label = "Externalizing")
dat$itrf_int <- score_scale(dat, scale == "ITRF" & subscale == "Int", label = "Internalizing")

And get descriptives for those scores:

dat[, c("itrf_ext", "itrf_int")] |> 
  rename_items() |> 
  descriptives(round = 1)
#>            name valid missing mean  sd min max range median mad
#> 1 Externalizing  4776       0  0.6 0.6   0 3.0   3.0    0.4 0.5
#> 2 Internalizing  4772       4  0.3 0.4   0 2.6   2.6    0.2 0.3

Look up norms from a norm table

Many scales come with norm tables to convert raw scores to t-scores, percentile ranks, etc.

The lookup_norms function helps with this conversion.

First, you need a data frame (or an Excel table, etc.) that contains raw scores and the corresponding norm scores.

Here is an example of such a table:

ex_normtable_int |> slice(1:10) |> kable()

group	raw	T	PR	T_from_PR
all	0	42	26	43
all	1	43	35	46
all	2	44	42	48
all	3	46	49	50
all	4	47	55	51
all	5	48	60	52
all	6	49	64	54
all	7	50	68	55
all	8	52	72	56
all	9	53	75	57

Then we need raw scores from a scale. If they do not exist yet, you can use the score_scale function to add sum scores. To do so, set the sum argument to TRUE. By setting max_na = 0, we do not allow missing values in any scale item:

dat$raw_int <- score_scale(dat, subscale == "Int", sum = TRUE, max_na = 0)
dat$raw_ext <- score_scale(dat, subscale == "Ext", sum = TRUE, max_na = 0)
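
To make the roles of sum and max_na concrete, here is a minimal base R sketch of a sum score that returns NA for any person with more than max_na missing item responses. sum_score() is a hypothetical stand-in written for this example; how score_scale treats the remaining missing values when max_na is larger than 0 is not covered here, and simply skipping them, as this sketch does, is an assumption.

```r
# Hypothetical helper, not part of scaledic.
sum_score <- function(items, max_na = 0) {
  items <- as.matrix(items)
  n_missing <- rowSums(is.na(items))
  score <- rowSums(items, na.rm = TRUE)
  score[n_missing > max_na] <- NA   # too many missing responses: no score
  score
}

toy_items <- data.frame(
  i1 = c(0, 1, 3),
  i2 = c(1, NA, 3),
  i3 = c(2, 1, 3)
)

# The second person has one missing response and therefore gets no score.
sum_score(toy_items, max_na = 0)
```

With max_na = 0 a raw score is only produced for complete response patterns, which matters for norm lookups because a sum over fewer items is not comparable to the tabulated raw scores.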

Look up T values:

dat$T_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int, to = "T")
dat$T_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext, to = "T")

Or percentile ranks:

dat$PR_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int, to = "PR")
dat$PR_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext, to = "PR")

dat[1:10, c("T_int", "T_ext", "PR_int", "PR_ext")] |> kable()

T_int	T_ext	PR_int	PR_ext
43	40	35	14
43	46	35	49
47	47	55	53
71	87	95	100
47	64	55	89
44	50	42	64
61	40	88	14
46	46	49	49
53	60	75	85
47	77	55	98
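
At its core, such a lookup is a match between raw scores and the raw column of the norm table. The base R sketch below reproduces the first rows of the example norm table shown above and looks up T values and percentile ranks. lookup() is a hypothetical helper for illustration; it ignores features such as the group column and returns NA for raw scores that are not tabulated.

```r
# First rows of the example norm table
normtable <- data.frame(
  raw = 0:9,
  T   = c(42, 43, 44, 46, 47, 48, 49, 50, 52, 53),
  PR  = c(26, 35, 42, 49, 55, 60, 64, 68, 72, 75)
)

# Hypothetical helper, not part of scaledic.
lookup <- function(raw, normtable, to) {
  # match() gives the table row of each raw score, or NA if it is not tabulated
  normtable[[to]][match(raw, normtable$raw)]
}

raw_scores <- c(1, 4, 9, NA, 25)
lookup(raw_scores, normtable, to = "T")
lookup(raw_scores, normtable, to = "PR")
```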
