4 Working with single-case data frames
4.1 Select cases
You can extract one or more cases from an scdf with multiple cases in two ways.
The first method follows the basic rules of R syntax. If the case has a name, you can address it with the $ operator:
Huber2014$David
or you can use square brackets to select cases by their number (position):
Huber2014[1] # extracts case 1
Huber2014[2:3] # extracts cases 2 and 3
new.huber2014 <- Huber2014[c(1, 4)] # extracts cases 1 and 4
new.huber2014
#A single-case data frame with two cases
Adam: mt compliance phase | David: mt compliance phase |
1 25 A | 1 65.6 A |
2 20.8 A | 2 37.5 A |
3 39.6 A | 3 58.3 A |
4 75 A | 4 72.9 A |
5 45 A | 5 33.3 A |
6 39.6 A | 6 59.4 A |
7 54.2 A | 7 77.1 A |
8 50 A | 8 54.2 A |
9 28.1 A | 9 68.8 A |
10 40 A | 10 43.8 A |
11 52.1 B | 11 62.5 B |
12 31.3 B | 12 64.6 B |
13 15.6 B | 13 60.4 B |
14 29.2 B | 14 81.3 B |
15 43.8 B | 15 79.2 B |
# ... up to 61 more rows
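An scdf is stored as a list with one data frame per case, and scan adds its own class and attributes on top. The two operators therefore follow R's list rules: single brackets return a shorter list of cases, while $ or double brackets return the data frame of a single case. The base R sketch below shows this on a toy named list with made-up values. It does not load scan, so the scdf class itself is not part of the example.

```r
# Toy stand-in for an scdf: a named list with one data frame per case.
# All values are made up; a real scdf also carries scan's class and attributes.
cases <- list(
  Adam  = data.frame(mt = 1:3, compliance = c(25, 20.8, 39.6), phase = "A"),
  Berta = data.frame(mt = 1:3, compliance = c(30, 35, 40), phase = "A"),
  David = data.frame(mt = 1:3, compliance = c(65.6, 37.5, 58.3), phase = "A")
)

one_case_list <- cases[1]           # single brackets: still a list, now with one case
two_cases     <- cases[c(1, 3)]     # several cases selected by position
david_df      <- cases$David        # $ returns the data frame of one case
same_df       <- cases[["David"]]   # double brackets are equivalent to $

names(two_cases)
```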
The second method is to use the select_cases() function.
select_cases(scdf, …)
Since version 0.53, scan includes functions that work with pipe operators. scan imports the pipe operator %>% from the magrittr package. Alternatively, you can use R’s native pipe operator |> (available since R 4.1.0).
The select_cases() function takes case names and/or numbers for selecting cases:
# With pipes:
Huber2014 %>%
  select_cases(Adam, Berta, 4) %>%
  summary()
# 1. Take the scdf Huber2014,
# 2. select the cases Adam, Berta and case number four,
# 3. show a summary of the selected cases.
#A single-case data frame with three cases
Measurements Design
Adam 37 A-B
Berta 29 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
A range of cases can also be selected by placing the colon operator between two case names:
Huber2014 %>%
  select_cases(Berta:David) %>%
  summary()
#A single-case data frame with three cases
Measurements Design
Berta 29 A-B
Christian 76 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
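One way to picture how names, numbers and name ranges combine in select_cases() is to translate every case name into its position and then select by position. The base R sketch below does this with a hypothetical helper case_positions(); it illustrates the idea with the four case names of Huber2014 and is not scan's implementation.

```r
# Hypothetical helper (not part of scan): convert case names to their positions
case_positions <- function(case_names, selection) {
  pos <- match(selection, case_names)
  if (anyNA(pos)) {
    stop("unknown case name: ", paste(selection[is.na(pos)], collapse = ", "))
  }
  pos
}

case_names <- c("Adam", "Berta", "Christian", "David")

# Like select_cases(Adam, Berta, 4): the names become positions 1 and 2, plus case 4
mixed <- c(case_positions(case_names, c("Adam", "Berta")), 4)

# Like select_cases(Berta:David): a range between the positions of two names
range_sel <- case_positions(case_names, "Berta"):case_positions(case_names, "David")

case_names[mixed]      # "Adam" "Berta" "David"
case_names[range_sel]  # "Berta" "Christian" "David"
```

Both selections match the cases listed in the two summaries above.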
4.2 Select measurements
subset(x, …)
The subset function is a method for the generic subset() function. To open its help file, you have to add the class to the function name: ?subset.scdf
The subset() function extracts measurements (rows) from an scdf according to specific criteria. It takes an scdf as the first argument and a logical expression (filter) as the second argument. Only measurements for which the logical expression is TRUE are included in the returned scdf object.
For example, the scdf Huber2014 has a variable compliance, and we want to keep measurements where compliance is greater than 10 because we assume the others are outliers:
Huber2014 %>%
  subset(compliance > 10) %>%
  summary()
#A single-case data frame with four cases
Measurements Design
Adam 37 A-B
Berta 20 A-B
Christian 76 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
In a more complex example, we want to keep only values less than 60 when they are in phase A, or values equal to or greater than 60 when they are in phase B:
exampleAB %>%
  subset((values < 60 & phase == "A") | (values >= 60 & phase == "B")) %>%
  summary()
#A single-case data frame with three cases
Measurements Design
Johanna 20 A-B
Karolina 18 A-B
Anja 19 A-B
Variable names:
values <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Randomly created data with normal distributed dependent variable.
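Conceptually, the filter is evaluated separately within each case, and only the rows where it is TRUE are kept, so cases can end up with different numbers of measurements (as Berta did in the first example). The base R sketch below mimics this on a toy list of data frames with made-up values, applying base R's subset() to every case. It does not reproduce how scan handles phases or attributes.

```r
# Toy cases with made-up values, one data frame per case
cases <- list(
  Case1 = data.frame(values = c(55, 62, 58, 61, 70, 59), phase = rep(c("A", "B"), each = 3)),
  Case2 = data.frame(values = c(65, 50, 70, 75, 80, 55), phase = rep(c("A", "B"), each = 3))
)

# Apply the same logical filter to every case
filtered <- lapply(cases, function(d) {
  subset(d, (values < 60 & phase == "A") | (values >= 60 & phase == "B"))
})

sapply(filtered, nrow)  # number of measurements kept per case
```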
4.3 Change and create variables
transform(_data, …)
With the help of the transform() function, you can add new variables or change existing variables for each case of an scdf. This can be useful if you want to
- z-standardize a variable,
- calculate a new variable as the sum of two existing variables,
- convert a frequency to a percentage,
or in many other cases.
The transform function is a method for the generic transform() function. To open its help file, you have to add the class to the function name: ?transform.scdf
Here is an example of standardizing the dependent variable “values”:
exampleAB_z <- transform(
  exampleAB,
  values = (values - mean(values)) / sd(values)
)
# note: alternatively for the same result:
# exampleAB_z <- transform(exampleAB, values = scale(values))
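Because transform() works case by case, mean(values) and sd(values) in this example refer to the values of each single case, so afterwards every case has mean 0 and standard deviation 1. The base R sketch below shows the same per-case logic on a toy list of data frames with made-up values, using base R's transform() inside lapply(). It illustrates the idea and is not scan's code.

```r
# Toy cases with made-up values
cases <- list(
  Case1 = data.frame(values = c(50, 55, 53, 70, 75, 72)),
  Case2 = data.frame(values = c(40, 48, 44, 60, 66, 62))
)

# z-standardize within each case: every case uses its own mean and sd
cases_z <- lapply(cases, function(d) {
  transform(d, values = (values - mean(values)) / sd(values))
})

# Each case now has mean 0 and sd 1 (up to floating-point error)
sapply(cases_z, function(d) round(c(mean = mean(d$values), sd = sd(d$values)), 10))
```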
Here is an example where a new percentage variable is added and the measurement times are shifted to start with 0:
exampleAB_score %>%
  transform(
    percentage = values / trials * 100,
    mt = mt - mt[1]
  )
#A single-case data frame with three cases
Christiano: values trials mt phase percentage
1 20 0 A 5
3 20 1 A 15
3 20 2 A 15
3 20 3 A 15
5 20 4 A 25
3 20 5 A 15
0 20 6 A 0
2 20 7 A 10
4 20 8 A 20
3 20 9 A 15
12 20 10 B 60
13 20 11 B 65
15 20 12 B 75
11 20 13 B 55
15 20 14 B 75
# ... up to 15 more rows
# two more cases
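The arithmetic of the two expressions can be checked with base R's transform() on an ordinary data frame. The sketch below uses Christiano's first five measurements as printed above (values out of 20 trials). That the original measurement times start at 1 is an assumption, since the output only shows the shifted times.

```r
# First five measurements of Christiano, as printed above
christiano <- data.frame(
  values = c(1, 3, 3, 3, 5),
  trials = 20,
  mt     = 1:5,   # assumption: original measurement times start at 1
  phase  = "A"
)

# Same expressions as in the scan example, applied to one data frame
christiano_score <- transform(
  christiano,
  percentage = values / trials * 100,  # successful trials in percent
  mt         = mt - mt[1]              # shift so the first measurement time is 0
)
christiano_score
```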
4.3.1 all_cases
The all_cases() helper function returns the values of a variable across all cases. This allows for calculations where you need both values within a case and values across cases, for example when you want to standardize a variable based on all cases:
exampleAB %>%
  transform(
    values = (values - mean(all_cases(values))) / sd(all_cases(values))
  ) %>%
  setNames(paste0(names(exampleAB), "_z")) %>%
  c(exampleAB) %>%
  smd()
# 1. Take the exampleAB scdf,
# 2. z-standardize the values of each case based on all measurements,
# 3. rename the cases by adding a "_z" suffix,
# 4. add the original untransformed cases,
# 5. analyze the data by calculating measures of standardized mean differences.
Standardized mean differences
Johanna_z Karolina_z Anja_z Johanna Karolina Anja
mA -1.194 -1.431 -1.279 54.60 51.80 53.60
mB 0.454 0.398 0.449 74.13 73.47 74.07
sdA 0.203 0.577 0.257 2.41 6.83 3.05
sdB 0.755 0.824 0.639 8.94 9.76 7.57
sd cohen 0.553 0.711 0.487 6.55 8.43 5.77
sd hedges 0.673 0.776 0.577 7.97 9.19 6.83
Glass' delta 8.111 3.171 6.711 8.11 3.17 6.71
Hedges' g 2.451 2.357 2.996 2.45 2.36 3.00
Hedges' g correction 2.348 2.258 2.869 2.35 2.26 2.87
Hedges' g durlak correction 2.227 2.142 2.722 2.23 2.14 2.72
Cohen's d 2.983 2.572 3.545 2.98 2.57 3.54
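The only difference from the within-case version is where the mean and standard deviation come from: here they are computed from the variable pooled over every case. The base R sketch below shows that pooling on a toy list with made-up values; the vector pooled is a hypothetical stand-in for what all_cases(values) provides inside scan's transform().

```r
# Toy cases with made-up values
cases <- list(
  Case1 = data.frame(values = c(50, 55, 53, 70, 75, 72)),
  Case2 = data.frame(values = c(40, 48, 44, 60, 66, 62))
)

# Pool the variable over all cases (the role all_cases() plays in scan)
pooled <- unlist(lapply(cases, function(d) d$values), use.names = FALSE)

# Standardize every case with the pooled mean and sd
cases_z <- lapply(cases, function(d) {
  transform(d, values = (values - mean(pooled)) / sd(pooled))
})

# Case means are no longer 0; only the pooled z-values have mean 0 and sd 1
sapply(cases_z, function(d) mean(d$values))
```

This is why, in the output above, the phase A and B means of the "_z" cases differ from case to case instead of centering on 0 within each case.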
4.3.2 Smoothing
For smoothing the dependent variable, transform() provides several helper functions:
- moving_mean() calculates the moving mean of a series of values. The lag argument specifies how many neighboring values on each side are included (the default is 1, where the mean is calculated from a value and the measurements immediately before and after it).
- moving_median() works the same way but calculates the median instead of the mean.
- local_regression() regresses each value on the surrounding values. The argument f defines the fraction of the values used (the default f = 0.2 considers the surrounding 20% of the values). You must also provide the measurement-time variable with the argument mt.
transform(Huber2014,
"compliance (moving median)" = moving_median(compliance),
"compliance (moving mean)" = moving_mean(compliance),
"compliance (local regression)" = local_regression(compliance, mt = mt)
)
#A single-case data frame with four cases
Adam: mt compliance phase compliance (moving median) compliance (moving mean) compliance (local regression)
1 25 A 25 25 22.51
2 20.8 A 25 28.47 28.81
3 39.6 A 39.6 47.69 34.49
4 75 A 45 55.9 40.55
5 45 A 45 46.83 44.56
6 39.6 A 45 46.88 46.31
7 54.2 A 50 50.36 46.14
8 50 A 50 42.82 43.98
9 28.1 A 40 36.97 42.21
10 40 A 40 43.02 40.06
11 52.1 B 40 42.14 37.24
12 31.3 B 31.3 29.68 33.11
13 15.6 B 29.2 24.83 29.56
14 29.2 B 29.2 32.61 29.11
15 43.8 B 29.2 33.8 28.94
# ... up to 61 more rows
# three more cases
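The moving-mean and moving-median columns printed for Adam can be reproduced for the first 14 measurements with the loop below. The printed moving means are only matched if each window uses the already smoothed previous value: measurement 3 is (28.47 + 39.6 + 75) / 3 = 47.69, not (20.8 + 39.6 + 75) / 3 = 45.13. This in-place updating is inferred from the output above, not taken from scan's documentation, and the function name smooth_inplace() is hypothetical. Measurement 15 is left out of the comparison because its right-hand neighbor is not among the printed rows.

```r
# Hypothetical re-implementation inferred from the printed output (not scan's code):
# each value is replaced by FUN over a window of `lag` values on each side,
# and later windows already see the smoothed earlier values.
smooth_inplace <- function(x, FUN = mean, lag = 1) {
  n <- length(x)
  if (n <= 2 * lag) return(x)  # too short: nothing to smooth
  for (i in (lag + 1):(n - lag)) {
    x[i] <- FUN(x[(i - lag):(i + lag)])
  }
  x  # the first and last `lag` values stay unchanged
}

# Adam's first 15 compliance values from Huber2014, as printed above
adam <- c(25, 20.8, 39.6, 75, 45, 39.6, 54.2, 50, 28.1, 40,
          52.1, 31.3, 15.6, 29.2, 43.8)

round(smooth_inplace(adam, mean)[1:14], 2)
smooth_inplace(adam, median)[1:14]
```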
4.3.3 Transform values at the beginning of a phase
The first_of() helper function is specifically designed to replace values at or around the beginning of a phase. The first argument is a logical vector defining a selection criterion. The positions argument is a vector of positions relative to the first measurement that meets the criterion: 0 addresses that measurement itself, negative numbers refer to positions before it, and positive numbers to positions after it. This is useful, for example, if you want to discard the first two measurements of a phase (positions 0:1).
Here is an example that sets the first value of phase A and the value after it to missing (NA), and also sets the first value of phase B and the value before it to NA:
byHeart2011 %>%
  transform(
    values = replace(values, first_of(phase == "A", 0:1), NA),
    values = replace(values, first_of(phase == "B", -1:0), NA)
  )
#A single-case data frame with 11 cases
Lisa (Turkish): values mt phase | Patrick (Spanish): values mt phase |
<NA> 1 A | <NA> 1 A |
<NA> 2 A | <NA> 2 A |
0 3 A | 3 3 A |
0 4 A | 0 4 A |
<NA> 5 A | <NA> 5 A |
<NA> 6 B | <NA> 6 B |
5 7 B | 8 7 B |
6 8 B | 8 8 B |
7 9 B | 8 9 B |
10 10 B | 12 10 B |
10 11 B | 13 11 B |
15 12 B | 13 12 B |
16 13 B | 15 13 B |
14 14 B | 14 14 B |
17 15 B | 15 15 B |
# ... up to 11 more rows
# nine more cases
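The positions logic can be pictured in three steps: find the first measurement where the criterion is TRUE, add the offsets given in positions, and flag the resulting measurements. The base R sketch below uses a hypothetical function first_of_sketch() and placeholder values; with phase A covering five and phase B ten measurements, as in Lisa's printed rows, it flags the same measurements that became NA above (1, 2, 5 and 6). It illustrates the idea and is not scan's implementation.

```r
# Hypothetical helper illustrating the idea behind first_of() (not scan's code)
first_of_sketch <- function(criterion, positions = 0) {
  start <- which(criterion)[1]                     # first measurement meeting the criterion
  if (is.na(start)) return(rep(FALSE, length(criterion)))
  idx <- start + positions                         # offsets relative to that measurement
  idx <- idx[idx >= 1 & idx <= length(criterion)]  # drop positions outside the series
  seq_along(criterion) %in% idx                    # logical vector usable in replace()
}

phase  <- rep(c("A", "B"), c(5, 10))  # phase layout of Lisa's first 15 rows, as printed
values <- 1:15                        # placeholder values (made up)

values <- replace(values, first_of_sketch(phase == "A", 0:1), NA)
values <- replace(values, first_of_sketch(phase == "B", -1:0), NA)
which(is.na(values))  # 1 2 5 6
```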