4 Working with single-case data frames
4.1 Select cases
You can extract one or more cases from an scdf that contains multiple cases in two ways.
The first method follows the basic rules of R syntax. If the case has a name, you can address it with the $ operator:
Huber2014$David
or you can use square brackets to select a case by its number (its position):
Huber2014[1] #extracts case 1
Huber2014[2:3] #extracts cases 2 and 3
new.huber2014 <- Huber2014[c(1, 4)] #extracts cases 1 and 4
new.huber2014
#A single-case data frame with two cases
Adam: compliance mt phase | David: compliance mt phase |
25 1 A | 65.6 1 A |
20.8 2 A | 37.5 2 A |
39.6 3 A | 58.3 3 A |
75 4 A | 72.9 4 A |
45 5 A | 33.3 5 A |
39.6 6 A | 59.4 6 A |
54.2 7 A | 77.1 7 A |
50 8 A | 54.2 8 A |
28.1 9 A | 68.8 9 A |
40 10 A | 43.8 10 A |
52.1 11 B | 62.5 11 B |
31.3 12 B | 64.6 12 B |
15.6 13 B | 60.4 13 B |
29.2 14 B | 81.3 14 B |
43.8 15 B | 79.2 15 B |
# ... up to 61 more rows
The second method is to use the select_cases function.
select_cases(scdf, …)
Since version 0.53, scan includes functions designed to work with pipe operators. scan imports the pipe operator %>% from the magrittr package. Alternatively, you can use R’s native pipe operator |>.
The select_cases() function takes case names and/or numbers for selecting cases:
# With pipes:
Huber2014 %>%
  select_cases(Adam, Berta, 4) %>%
  summary()
#A single-case data frame with three cases
Measurements Design
Adam 37 A-B
Berta 29 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
# 1. Take the scdf Huber2014,
# 2. select the cases Adam, Berta and case number four,
# 3. show a summary of the remaining cases in the study.
Case names can also be given as a range with the colon operator:
Huber2014 %>%
  select_cases(Berta:David) %>%
  summary()
#A single-case data frame with three cases
Measurements Design
Berta 29 A-B
Christian 76 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
4.2 Select measurements
subset(x, …)
The subset function is a method for the generic subset function. To call the help file you have to add the class to the function name: ?subset.scdf
The subset() function helps to extract measurements (or rows) from an scdf according to specific criteria.
subset() takes an scdf as the first argument and a logical expression (filter) as the second argument. Only measurements for which the logical expression is true are included in the returned scdf object.
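As a rough illustration of this filtering logic, the sketch below mimics it in base R on a plain list of data frames. The list `cases`, its values, and the helper `filter_cases()` are made up for illustration; this is not how scan stores or filters an scdf internally.

```r
# Hypothetical stand-in for an scdf: a named list with one data frame per case.
cases <- list(
  Anna = data.frame(values = c(5, 12, 30, 8), mt = 1:4,
                    phase = c("A", "A", "B", "B")),
  Ben  = data.frame(values = c(15, 9, 22, 40), mt = 1:4,
                    phase = c("A", "A", "B", "B"))
)

# Keep, within each case, only the rows for which the filter is TRUE.
# The filter is passed as a function so it is evaluated separately per case.
filter_cases <- function(cases, keep) {
  lapply(cases, function(case) case[keep(case), , drop = FALSE])
}

filtered <- filter_cases(cases, function(case) case$values > 10)

# Number of measurements that remain in each case
sapply(filtered, nrow)
```

In this toy data, two measurements remain for Anna and three for Ben; the number of cases stays the same, only the rows within each case are reduced.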
For example, the scdf Huber2014 has a variable compliance and we want to keep measurements where compliance is greater than 10 because we assume the others are outliers:
Huber2014 %>%
  subset(compliance > 10) %>%
  summary()
#A single-case data frame with four cases
Measurements Design
Adam 37 A-B
Berta 20 A-B
Christian 76 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
In a more complex example, we want to keep only values less than 60 when they are in phase A, or values equal to or greater than 60 when they are in phase B:
exampleAB %>%
  subset((values < 60 & phase == "A") | (values >= 60 & phase == "B")) %>%
  summary()
#A single-case data frame with three cases
Measurements Design
Johanna 20 A-B
Karolina 18 A-B
Anja 19 A-B
Variable names:
values <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Randomly created data with normal distributed dependent variable.
4.3 Change and create variables
transform(_data, …)
With the help of the transform() function, you can add new variables or change existing variables for each case of an scdf. This can be useful if you want to
- z-standardize a variable,
- calculate a new variable as the sum of two existing variables,
- convert a frequency to a percentage,
or in many other cases.
The transform function is a method for the generic transform function. To call the help file you have to add the class to the function name: ?transform.scdf
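To picture what "for each case" means here, the following base-R sketch applies a transformation to every element of a plain list of data frames. The list `cases` and its values are hypothetical, and the code only imitates the per-case behaviour described above; it is not scan's implementation.

```r
# Hypothetical stand-in for an scdf: one data frame per case.
cases <- list(
  Anna = data.frame(values = c(2, 4, 6), mt = 1:3),
  Ben  = data.frame(values = c(10, 20, 30), mt = 1:3)
)

# Apply base R's transform() to every case in turn, so that mean() and sd()
# are computed from that case's own measurements only.
cases_z <- lapply(cases, function(case) {
  transform(case, values = (values - mean(values)) / sd(values))
})

cases_z$Anna$values
cases_z$Ben$values
```

Both cases end up with the values -1, 0 and 1 although their raw scales differ, because each case is standardized against its own mean and standard deviation.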
Here is an example of standardizing the dependent variable “values”:
exampleAB_z <- transform(
  exampleAB, values = (values - mean(values)) / sd(values)
)
# note: alternatively for the same result:
# exampleAB_z <- transform(exampleAB, values = scale(values))
Here is an example where a new percentage variable is added and the measurement times are shifted to start at 0:
exampleAB_score %>%
  transform(
    percentage = values / trials * 100,
    mt = mt - mt[1]
  )
#A single-case data frame with three cases
Christiano: values trials mt phase percentage
1 20 0 A 5
3 20 1 A 15
3 20 2 A 15
3 20 3 A 15
5 20 4 A 25
3 20 5 A 15
0 20 6 A 0
2 20 7 A 10
4 20 8 A 20
3 20 9 A 15
12 20 10 B 60
13 20 11 B 65
15 20 12 B 75
11 20 13 B 55
15 20 14 B 75
# ... up to 15 more rows
# two more cases
4.3.1 all_cases
The all_cases helper function returns the values of a variable across all cases. This allows for calculations where you need values within a case and values across cases, for example when you want to standardize a variable based on all cases:
exampleAB %>%
  transform(
    values = (values - mean(all_cases(values))) / sd(all_cases(values))
  ) %>%
  setNames(paste0(names(exampleAB), "_z")) %>%
  c(exampleAB) %>%
  smd()
Standardized mean differences
Johanna_z Karolina_z Anja_z Johanna Karolina Anja
mA -1.194 -1.431 -1.279 54.60 51.80 53.60
mB 0.454 0.398 0.449 74.13 73.47 74.07
sdA 0.203 0.577 0.257 2.41 6.83 3.05
sdB 0.755 0.824 0.639 8.94 9.76 7.57
sd cohen 0.553 0.711 0.487 6.55 8.43 5.77
sd hedges 0.673 0.776 0.577 7.97 9.19 6.83
Glass' delta 8.111 3.171 6.711 8.11 3.17 6.71
Hedges' g 2.451 2.357 2.996 2.45 2.36 3.00
Hedges' g correction 2.348 2.258 2.869 2.35 2.26 2.87
Hedges' g durlak correction 2.227 2.142 2.722 2.23 2.14 2.72
Cohen's d 2.983 2.572 3.545 2.98 2.57 3.54
# 1. Take the exampleAB scdf,
# 2. Z-standardise the values of each case based on all measurements,
# 3. rename the cases by adding a "_z" suffix,
# 4. add the original untransformed cases,
# 5. analyse the data by calculating measures of standardized mean differences.
4.3.2 Smoothing
To smooth the dependent variable, transform() provides a number of helper functions:
- moving_mean calculates the moving mean of a series of values. The lag argument specifies how many values on either side are included in the mean (the default is 1, where the mean is calculated from a value and the measurements immediately before and after it),
- moving_median works in the same way, but calculates the median instead of the mean,
- local_regression regresses each value on the surrounding values. The argument f defines the fraction of the values to consider (the default f = 0.2 considers the surrounding 20% of the values). You must also provide the measurement-time variable with the argument mt.
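To make the windowing concrete, here is a minimal base-R sketch of a centered moving statistic. The function `moving_stat()` and the example vector are hypothetical and not part of scan; the sketch assumes a window of `lag` values on either side and leaves the first and last `lag` values unchanged, which may differ from how scan treats the edges of a series.

```r
# Hypothetical helper, not part of scan: a centered moving statistic.
# Assumption: the window spans `lag` values before and after each position,
# and positions without a full window keep their original value.
moving_stat <- function(x, lag = 1, fun = mean) {
  out <- x
  n <- length(x)
  if (n < 2 * lag + 1) return(out)  # series too short for a full window
  for (i in (lag + 1):(n - lag)) {
    out[i] <- fun(x[(i - lag):(i + lag)])
  }
  out
}

x <- c(2, 4, 9, 1, 5)
smoothed_mean   <- moving_stat(x, lag = 1, fun = mean)
smoothed_median <- moving_stat(x, lag = 1, fun = median)

smoothed_mean
smoothed_median
```

The single high value 9 pulls the moving mean up at its own position and at both neighbouring positions, whereas the moving median ignores it; this is the usual reason to prefer the median when a series contains isolated outliers.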
transform(Huber2014,
  "compliance (moving median)" = moving_median(compliance),
  "compliance (moving mean)" = moving_mean(compliance),
  "compliance (local regression)" = local_regression(compliance, mt = mt)
)
#A single-case data frame with four cases
Adam: compliance mt phase compliance (moving median) compliance (moving mean)
25 1 A 25 25
20.8 2 A 25 28.47
39.6 3 A 39.6 47.69
75 4 A 45 55.9
45 5 A 45 46.83
39.6 6 A 45 46.88
54.2 7 A 50 50.36
50 8 A 50 42.82
28.1 9 A 40 36.97
40 10 A 40 43.02
52.1 11 B 40 42.14
31.3 12 B 31.3 29.68
15.6 13 B 29.2 24.83
29.2 14 B 29.2 32.61
43.8 15 B 29.2 33.8
compliance (local regression)
22.51
28.81
34.49
40.55
44.56
46.31
46.14
43.98
42.21
40.06
37.24
33.11
29.56
29.11
28.94
# ... up to 61 more rows
# three more cases
4.3.3 Transform values at the beginning of a phase
The first_of helper function is specifically designed to replace values at or around the beginning of a phase. The first argument is a logical vector defining a selection criterion. The positions argument is a vector of positions to be addressed. Negative numbers refer to positions before, and positive numbers to positions after, the point at which the selection criterion is met. This is useful, for example, if you want to discard the first two measurements of a phase.
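The position arithmetic can be pictured with a small base-R sketch. The function `first_position()` below is a hypothetical stand-in, not scan's implementation of first_of; it assumes that the reference point is the first measurement for which the criterion is TRUE, and it does not guard against positions that fall outside the series.

```r
# Hypothetical stand-in for illustration, not scan's first_of().
# Returns the index of the first TRUE in `criterion`, shifted by `positions`.
first_position <- function(criterion, positions = 0) {
  match(TRUE, criterion) + positions
}

# Made-up series with five phase A and three phase B measurements
phase  <- c("A", "A", "A", "A", "A", "B", "B", "B")
values <- c(3, 1, 0, 2, 4, 8, 9, 7)

idx_a <- first_position(phase == "A", 0:1)   # first two measurements of phase A
idx_b <- first_position(phase == "B", -1:0)  # last of phase A, first of phase B

# Set those measurements to missing with replace()
values <- replace(values, idx_a, NA)
values <- replace(values, idx_b, NA)
values
```

With this toy series, measurements 1, 2, 5 and 6 are set to NA, and the remaining values are left untouched.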
Here is an example that sets the first measurement of phase A and the one after it to missing (NA), and also sets the first measurement of phase B and the one before it to NA:
byHeart2011 %>%
  transform(
    values = replace(values, first_of(phase == "A", 0:1), NA),
    values = replace(values, first_of(phase == "B", -1:0), NA)
  )
#A single-case data frame with 11 cases
Lisa (Turkish): values mt phase | Patrick (Spanish): values mt phase |
<NA> 1 A | <NA> 1 A |
<NA> 2 A | <NA> 2 A |
0 3 A | 3 3 A |
0 4 A | 0 4 A |
<NA> 5 A | <NA> 5 A |
<NA> 6 B | <NA> 6 B |
5 7 B | 8 7 B |
6 8 B | 8 8 B |
7 9 B | 8 9 B |
10 10 B | 12 10 B |
10 11 B | 13 11 B |
15 12 B | 13 12 B |
16 13 B | 15 13 B |
14 14 B | 14 14 B |
17 15 B | 15 15 B |
# ... up to 11 more rows
# nine more cases