# 4 Working with *single-case data frames*

## 4.1 Select cases

You can extract one or more cases from an *scdf* containing multiple cases in two ways.

The first method follows the basic rules of R syntax. If the case has a name, you can address it with the `$` operator:

```
Huber2014$David
```

Alternatively, you can use square brackets to select cases by their number (their position in the scdf):

```
Huber2014[1] #extracts case 1
Huber2014[2:3] #extracts cases 2 and 3
```

```
new.huber2014 <- Huber2014[c(1, 4)] #extracts cases 1 and 4
new.huber2014
```

```
#A single-case data frame with two cases
Adam: mt compliance phase ｜ David: mt compliance phase ｜
1 25 A ｜ 1 65.6 A ｜
2 20.8 A ｜ 2 37.5 A ｜
3 39.6 A ｜ 3 58.3 A ｜
4 75 A ｜ 4 72.9 A ｜
5 45 A ｜ 5 33.3 A ｜
6 39.6 A ｜ 6 59.4 A ｜
7 54.2 A ｜ 7 77.1 A ｜
8 50 A ｜ 8 54.2 A ｜
9 28.1 A ｜ 9 68.8 A ｜
10 40 A ｜ 10 43.8 A ｜
11 52.1 B ｜ 11 62.5 B ｜
12 31.3 B ｜ 12 64.6 B ｜
13 15.6 B ｜ 13 60.4 B ｜
14 29.2 B ｜ 14 81.3 B ｜
15 43.8 B ｜ 15 79.2 B ｜
# ... up to 61 more rows
```
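Because an *scdf* is, at its core, a list of data frames (one per case), the two operators behave as they do for any named list in R. The base-R sketch below uses a toy list with made-up values in place of `Huber2014`; it does not need scan and only shows plain-list behaviour, not whatever scan's own methods add on top:

```r
# Toy stand-in for an scdf: a named list holding one data frame per case.
# The values are made up; scan's scdf objects add a class and attributes on top.
cases <- list(
  Adam      = data.frame(mt = 1:3, compliance = c(10, 20, 30)),
  Berta     = data.frame(mt = 1:3, compliance = c(40, 50, 60)),
  Christian = data.frame(mt = 1:3, compliance = c(70, 80, 90)),
  David     = data.frame(mt = 1:3, compliance = c(15, 25, 35))
)

# `$` returns the element itself, here a single data frame
david <- cases$David
class(david)             # "data.frame"

# `[` returns a shorter list that keeps the case names
first_and_fourth <- cases[c(1, 4)]
names(first_and_fourth)  # "Adam" "David"
```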

The second method is to use the `select_cases()` function:

`select_cases(scdf, ...)`

Since version 0.53, scan includes functions that work with pipe operators. `scan` imports the pipe operator `%>%` from the magrittr package. Alternatively, you can use R’s native pipe operator `|>`.
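A pipe passes the value on its left as the first argument to the call on its right, so `x |> f() |> g()` is the same as `g(f(x))`. A minimal base-R illustration with the native pipe (R 4.1.0 or later), independent of scan:

```r
# The native pipe `|>` needs R 4.1.0 or later.
x <- c(1, 4, 9)

nested <- sum(sqrt(x))          # read inside-out
piped  <- x |> sqrt() |> sum()  # read left to right: x, then sqrt(), then sum()

identical(nested, piped)        # TRUE, both are 6
```

magrittr's `%>%` gives the same result for simple calls like these.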

The `select_cases()` function takes case names and/or case numbers to select cases:

```
# With pipes:
Huber2014 %>%
  select_cases(Adam, Berta, 4) %>%
  summary()
```

```
#A single-case data frame with three cases
Measurements Design
Adam 37 A-B
Berta 29 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
```

```
# 1. Take the scdf Huber2014,
# 2. select the cases Adam, Berta and case number four,
# 3. show a summary of the remaining cases in the study.
```

A range of cases can also be selected by joining two case names with the colon operator:

```
Huber2014 %>%
  select_cases(Berta:David) %>%
  summary()
```

```
#A single-case data frame with three cases
Measurements Design
Berta 29 A-B
Christian 76 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
```
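The output shows that `Berta:David` selects every case positioned between Berta and David, including Christian. One way to picture this is to translate the two names into positions and build the range between them. The helper `name_range()` below is hypothetical and written only to illustrate the idea; it is not scan's implementation:

```r
# Hypothetical helper (not part of scan): turn a name range such as
# Berta:David into the positions of all cases between the two names.
name_range <- function(case_names, from, to) {
  start <- match(from, case_names)
  end   <- match(to, case_names)
  if (anyNA(c(start, end))) stop("unknown case name")
  seq(start, end)
}

case_names <- c("Adam", "Berta", "Christian", "David")
pos <- name_range(case_names, "Berta", "David")
pos              # 2 3 4
case_names[pos]  # "Berta" "Christian" "David"
```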

## 4.2 Select measurements

`subset(x, ...)`

The `subset()` function for *scdf* objects is a method for the generic `subset()` function. To open its help file, add the class to the function name: `?subset.scdf`.

The `subset()` function extracts measurements (rows) from an scdf according to specific criteria. It takes an scdf as the first argument and a logical expression (`filter`) as the second argument. Only measurements for which the logical expression is true are included in the returned scdf object.

For example, the scdf `Huber2014` has a variable `compliance`, and we want to keep only the measurements where `compliance` is greater than 10 because we assume the others are outliers:

```
Huber2014 %>%
  subset(compliance > 10) %>%
  summary()
```

```
#A single-case data frame with four cases
Measurements Design
Adam 37 A-B
Berta 20 A-B
Christian 76 A-B
David 76 A-B
Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Behavioral data (compliance in percent).
Author of data: Christian Huber
```
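Conceptually, the filter is evaluated row by row within each case, and each case keeps only its own matching rows, which is why the number of measurements per case can differ afterwards (Berta goes from 29 to 20 measurements in the two summaries above, while Adam keeps all 37). A base-R sketch of that idea on a toy list with made-up values; it is not how scan implements `subset()` internally:

```r
# Toy stand-in for an scdf (made-up values): one data frame per case
cases <- list(
  Adam  = data.frame(mt = 1:4, compliance = c(5, 25, 40, 8)),
  Berta = data.frame(mt = 1:4, compliance = c(30, 2, 45, 50))
)

# Apply the same logical filter to each case separately;
# rows where the condition is FALSE are dropped
filtered <- lapply(cases, function(d) d[d$compliance > 10, , drop = FALSE])

sapply(filtered, nrow)  # Adam 2, Berta 3
```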

In a more complex example, we want to keep only values less than 60 when they are in phase A, or values equal to or greater than 60 when they are in phase B:

```
exampleAB %>%
  subset((values < 60 & phase == "A") | (values >= 60 & phase == "B")) %>%
  summary()
```

```
#A single-case data frame with three cases
Measurements Design
Johanna 20 A-B
Karolina 18 A-B
Anja 19 A-B
Variable names:
values <dependent variable>
phase <phase variable>
mt <measurement-time variable>
Note: Randomly created data with normal distributed dependent variable.
```
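How the compound expression evaluates can be checked on plain vectors: `&` combines the two conditions inside each pair of parentheses, and `|` keeps a row if either pair is true. The four values below are made up:

```r
values <- c(50, 70, 55, 65)
phase  <- c("A", "A", "B", "B")

# TRUE for phase-A values below 60 and for phase-B values of 60 or more
keep <- (values < 60 & phase == "A") | (values >= 60 & phase == "B")
keep          # TRUE FALSE FALSE TRUE
values[keep]  # 50 65
```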

## 4.3 Change and create variables

`transform(_data, ...)`

With the `transform()` function, you can add new variables or change existing variables for each case of an *scdf*. This is useful in many situations, for example if you want to

- z-standardize a variable,
- calculate a new variable as the sum of two existing variables, or
- convert a frequency to a percentage.

The `transform()` function for *scdf* objects is a method for the generic `transform()` function. To open its help file, add the class to the function name: `?transform.scdf`.
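The `.scdf` suffix comes from R's S3 dispatch: calling a generic such as `transform()` looks at the class of the first argument and forwards the call to the function named `generic.class`. A made-up class `toy` with a hypothetical method `transform.toy()` makes this visible:

```r
# A made-up class "toy" and a method for the generic transform()
toy <- structure(list(values = c(2, 4, 6)), class = "toy")

transform.toy <- function(`_data`, ...) {
  "transform.toy() was called"
}

transform(toy)  # dispatches on class "toy": "transform.toy() was called"

# A data frame has class "data.frame", so base R's transform.data.frame() runs
transform(data.frame(a = 1:2), b = a * 10)$b  # 10 20
```

In the same way, `?transform` opens the documentation of the base R generic, while `?transform.scdf` opens scan's method for the *scdf* class.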

Here is an example that z-standardizes the dependent variable `values` within each case:

```
exampleAB_z <- transform(
  exampleAB,
  values = (values - mean(values)) / sd(values)
)
# note: alternatively for the same result:
# exampleAB_z <- transform(exampleAB, values = scale(values))
```
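A quick base-R check of the formula above on made-up numbers: the manual z-score and `scale()` agree, although `scale()` returns a one-column matrix rather than a plain vector, so wrapping it in `as.vector()` may be useful when a plain numeric column is wanted:

```r
values <- c(54, 52, 57, 70, 75, 77)  # made-up example values

z_manual <- (values - mean(values)) / sd(values)
z_scale  <- scale(values)            # one-column matrix with centering/scaling attributes

all.equal(z_manual, as.vector(z_scale))     # TRUE
round(c(mean(z_manual), sd(z_manual)), 10)  # 0 1
```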

Here is an example that adds a new percentage variable and shifts the measurement times so that they start at `0`:

```
exampleAB_score %>%
  transform(
    percentage = values / trials * 100,
    mt = mt - mt[1]
  )
```

```
#A single-case data frame with three cases
Christiano: values trials mt phase percentage
1 20 0 A 5
3 20 1 A 15
3 20 2 A 15
3 20 3 A 15
5 20 4 A 25
3 20 5 A 15
0 20 6 A 0
2 20 7 A 10
4 20 8 A 20
3 20 9 A 15
12 20 10 B 60
13 20 11 B 65
15 20 12 B 75
11 20 13 B 55
15 20 14 B 75
# ... up to 15 more rows
# two more cases
```
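For a single case, the same two expressions can be tried with base R's `transform()` on an ordinary data frame. The values and trials below are the first five rows of the Christiano case printed above; the original measurement times are assumed to be `1:5`, since the output only shows the shifted times:

```r
# One case as a plain data frame; mt = 1:5 is an assumption
case <- data.frame(
  values = c(1, 3, 3, 3, 5),
  trials = c(20, 20, 20, 20, 20),
  mt     = 1:5,
  phase  = "A"
)

# base R's transform() evaluates all expressions against the original
# columns, so mt[1] is the first measurement time before shifting
case2 <- transform(case,
  percentage = values / trials * 100,
  mt = mt - mt[1]
)
case2$mt          # 0 1 2 3 4
case2$percentage  # 5 15 15 15 25
```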

### 4.3.1 all_cases

The `all_cases()` helper function returns the values of a variable across all cases. This allows calculations that combine values within a case with values across cases, for example when you want to standardize a variable based on all cases:

```
exampleAB %>%
  transform(
    values = (values - mean(all_cases(values))) / sd(all_cases(values))
  ) %>%
  setNames(paste0(names(exampleAB), "_z")) %>%
  c(exampleAB) %>%
  smd()
```

```
Standardized mean differences
Johanna_z Karolina_z Anja_z Johanna Karolina Anja
mA -1.194 -1.431 -1.279 54.60 51.80 53.60
mB 0.454 0.398 0.449 74.13 73.47 74.07
sdA 0.203 0.577 0.257 2.41 6.83 3.05
sdB 0.755 0.824 0.639 8.94 9.76 7.57
sd cohen 0.553 0.711 0.487 6.55 8.43 5.77
sd hedges 0.673 0.776 0.577 7.97 9.19 6.83
Glass' delta 8.111 3.171 6.711 8.11 3.17 6.71
Hedges' g 2.451 2.357 2.996 2.45 2.36 3.00
Hedges' g correction 2.348 2.258 2.869 2.35 2.26 2.87
Hedges' g durlak correction 2.227 2.142 2.722 2.23 2.14 2.72
Cohen's d 2.983 2.572 3.545 2.98 2.57 3.54
```

```
# 1. Take the exampleAB scdf,
# 2. Z-standardise the values of each case based on all measurements,
# 3. rename the cases by adding a "_z" suffix,
# 4. add the original untransformed cases,
# 5. analyse the data by calculating measures of standardized mean differences.
```
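The difference from case-wise standardization is that the mean and standard deviation come from the pooled values of all cases. A base-R sketch with a toy list of three cases (made-up values); here `unlist()` stands in for `all_cases()`, which is an assumption about what the helper returns, not its implementation:

```r
# Toy stand-in for three cases (made-up values)
cases <- list(
  A = c(50, 55, 60),
  B = c(70, 75, 80),
  C = c(40, 45, 50)
)

pooled <- unlist(cases)  # every value of every case, playing the role of all_cases()

# standardize each case with the pooled mean and sd
z <- lapply(cases, function(v) (v - mean(pooled)) / sd(pooled))

mean(z$B)                          # positive: one case alone need not be centered at 0
all.equal(mean(unlist(z)), 0)      # TRUE: across all cases the mean is 0
```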

### 4.3.2 Smoothing

For smoothing the dependent variable, `transform()` provides a number of helper functions:

- `moving_mean()` calculates the moving mean of a series of values. The `lag` argument specifies how many values before and after each value are included in the calculation (the default, `lag = 1`, calculates the mean from a value and the measurements immediately before and after it).
- `moving_median()` works in the same way but calculates the median instead of the mean.
- `local_regression()` regresses each value on the surrounding values. The argument `f` defines the fraction of the values to consider (the default `f = 0.2` considers the surrounding 20% of the values). You must also provide the measurement-time variable with the argument `mt`.

```
transform(Huber2014,
"compliance (moving median)" = moving_median(compliance),
"compliance (moving mean)" = moving_mean(compliance),
"compliance (local regression)" = local_regression(compliance, mt = mt)
)
```

```
#A single-case data frame with four cases
Adam: mt compliance phase compliance (moving median) compliance (moving mean)
1 25 A 25 25
2 20.8 A 25 28.47
3 39.6 A 39.6 47.69
4 75 A 45 55.9
5 45 A 45 46.83
6 39.6 A 45 46.88
7 54.2 A 50 50.36
8 50 A 50 42.82
9 28.1 A 40 36.97
10 40 A 40 43.02
11 52.1 B 40 42.14
12 31.3 B 31.3 29.68
13 15.6 B 29.2 24.83
14 29.2 B 29.2 32.61
15 43.8 B 29.2 33.8
compliance (local regression)
22.51
28.81
34.49
40.55
44.56
46.31
46.14
43.98
42.21
40.06
37.24
33.11
29.56
29.11
28.94
# ... up to 61 more rows
# three more cases
```
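The moving-mean and moving-median columns above can be reproduced with a short base-R loop, which helps to see what a window of `lag = 1` means. Comparing the printed numbers suggests that each window already contains the smoothed value of the previous measurement (for example, 47.69 is the mean of 28.47, 39.6 and 75); this is an inference from the output, not from scan's source code. `sliding_smooth()` is a hypothetical name used only here, and the first and last `lag` values are assumed to stay unchanged:

```r
# Reconstructed from the printed output, not taken from scan's source:
# each window uses the already-smoothed value of the previous measurement,
# and the first and last `lag` values are left unchanged.
sliding_smooth <- function(x, lag = 1, fun = mean) {
  n <- length(x)
  if (n <= 2 * lag) return(x)
  for (i in (lag + 1):(n - lag)) {
    x[i] <- fun(x[(i - lag):(i + lag)])  # overwrite in place, so x[i - 1] is already smoothed
  }
  x
}

# First 15 compliance values of Adam (from the output above)
adam <- c(25, 20.8, 39.6, 75, 45, 39.6, 54.2, 50, 28.1, 40,
          52.1, 31.3, 15.6, 29.2, 43.8)

round(sliding_smooth(adam, fun = mean), 2)[1:5]  # 25.00 28.47 47.69 55.90 46.83
sliding_smooth(adam, fun = median)[1:5]          # 25.0 25.0 39.6 45.0 45.0
```

Only the first 14 values can be compared with the table, because the smoothed 15th value also depends on Adam's 16th measurement, which is not printed.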

### 4.3.3 Transform values at the beginning of a phase

The `first_of()` helper function is designed to replace values at or around the beginning of a phase. The first argument is a logical vector defining a selection criterion. The `positions` argument is a vector of positions relative to the first measurement that meets the criterion: `0` is that measurement, negative numbers refer to positions before it, and positive numbers to positions after it. This is useful, for example, if you want to discard the first two measurements of a phase.

Here is an example that sets the first value of phase A and the value after it to missing (`NA`), and also sets the first value of phase B and the value before it to `NA`:

```
byHeart2011 %>%
  transform(
    values = replace(values, first_of(phase == "A", 0:1), NA),
    values = replace(values, first_of(phase == "B", -1:0), NA)
  )
```

```
#A single-case data frame with 11 cases
Lisa (Turkish): values mt phase ｜ Patrick (Spanish): values mt phase ｜
<NA> 1 A ｜ <NA> 1 A ｜
<NA> 2 A ｜ <NA> 2 A ｜
0 3 A ｜ 3 3 A ｜
0 4 A ｜ 0 4 A ｜
<NA> 5 A ｜ <NA> 5 A ｜
<NA> 6 B ｜ <NA> 6 B ｜
5 7 B ｜ 8 7 B ｜
6 8 B ｜ 8 8 B ｜
7 9 B ｜ 8 9 B ｜
10 10 B ｜ 12 10 B ｜
10 11 B ｜ 13 11 B ｜
15 12 B ｜ 13 12 B ｜
16 13 B ｜ 15 13 B ｜
14 14 B ｜ 14 14 B ｜
17 15 B ｜ 15 15 B ｜
# ... up to 11 more rows
# nine more cases
```
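To see how the `positions` argument maps onto rows, here is a hypothetical base-R re-implementation, `first_of_sketch()`, inferred from the description and the output above rather than taken from scan. It assumes a phase sequence with five A measurements, as in the cases shown, followed by an arbitrary number of B measurements; the values are placeholders:

```r
# Hypothetical re-implementation for illustration, not scan's code:
# find the first position where `condition` is TRUE and return a logical
# vector that is TRUE at that position shifted by each value in `positions`.
first_of_sketch <- function(condition, positions = 0) {
  out <- rep(FALSE, length(condition))
  first <- which(condition)[1]          # NA if condition is never TRUE
  if (is.na(first)) return(out)
  idx <- first + positions
  idx <- idx[idx >= 1 & idx <= length(condition)]  # drop positions outside the series
  out[idx] <- TRUE
  out
}

phase  <- rep(c("A", "B"), times = c(5, 10))  # 5 A and (arbitrarily) 10 B measurements
values <- seq_along(phase)                     # placeholder values

values <- replace(values, first_of_sketch(phase == "A", 0:1), NA)
values <- replace(values, first_of_sketch(phase == "B", -1:0), NA)
which(is.na(values))  # 1 2 5 6
```

The missing values land at measurement times 1, 2, 5 and 6, the same pattern as in the output above.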