4 Working with single-case data frames

4.1 Select cases

You can extract one or more cases from an scdf that contains multiple cases in two ways.

The first method uses basic R syntax. If a case has a name, you can address it with the $ operator:

Huber2014$David

or you can use square brackets to select a case by its number (its position):

Huber2014[1] #extracts case 1
Huber2014[2:3] #extracts cases 2 and 3
new.huber2014 <- Huber2014[c(1, 4)] #extracts cases 1 and 4
new.huber2014
#A single-case data frame with two cases

 Adam: mt compliance phase | David: mt compliance phase |
        1         25     A |         1       65.6     A |
        2       20.8     A |         2       37.5     A |
        3       39.6     A |         3       58.3     A |
        4         75     A |         4       72.9     A |
        5         45     A |         5       33.3     A |
        6       39.6     A |         6       59.4     A |
        7       54.2     A |         7       77.1     A |
        8         50     A |         8       54.2     A |
        9       28.1     A |         9       68.8     A |
       10         40     A |        10       43.8     A |
       11       52.1     B |        11       62.5     B |
       12       31.3     B |        12       64.6     B |
       13       15.6     B |        13       60.4     B |
       14       29.2     B |        14       81.3     B |
       15       43.8     B |        15       79.2     B |
# ... up to 61 more rows
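
The difference between $ and square brackets can be pictured with plain R lists. The following is a minimal sketch, assuming an scdf behaves like a named list of data frames. The list cases and its values are made up for illustration and are not an scdf object:

```r
# Hypothetical stand-in for an scdf: a named list of data frames (made-up values).
cases <- list(
  Anna = data.frame(mt = 1:3, values = c(2, 4, 3)),
  Ben  = data.frame(mt = 1:3, values = c(5, 5, 6)),
  Cleo = data.frame(mt = 1:3, values = c(1, 0, 2)),
  Dora = data.frame(mt = 1:3, values = c(7, 8, 9))
)

one_case  <- cases$Ben        # `$` returns the element itself (a data frame)
by_pos    <- cases[1]         # `[` keeps the list structure, here with one case
by_range  <- cases[2:3]       # cases 2 and 3
by_vector <- cases[c(1, 4)]   # cases 1 and 4

names(by_range)   # "Ben" "Cleo"
names(by_vector)  # "Anna" "Dora"
```

Selecting with square brackets therefore keeps a collection of cases, which is why the result can be stored and used like the original object.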

The second method is to use the select_cases function.

The select_cases function call

select_cases(scdf, …)

Note

Since version 0.53, scan includes functions that work with pipe operators. scan imports the pipe operator %>% from the magrittr package. Alternatively, you can use R’s native pipe operator |>.

The select_cases() function takes case names and/or numbers for selecting cases:

# With pipes:
Huber2014 %>%
  select_cases(Adam, Berta, 4) %>%
  summary()
#A single-case data frame with three cases

       Measurements Design
 Adam            37    A-B
 Berta           29    A-B
 David           76    A-B

Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>

Note: Behavioral data (compliance in percent). 

Author of data: Christian Huber 
# 1. Take the scdf Huber2014,
# 2. select the cases Adam, Berta and case number four,
# 3. show a summary of the remaining cases in the study. 

A range of cases can also be selected by name with the colon operator:

Huber2014 %>%
  select_cases(Berta:David) %>%
  summary()
#A single-case data frame with three cases

           Measurements Design
 Berta               29    A-B
 Christian           76    A-B
 David               76    A-B

Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>

Note: Behavioral data (compliance in percent). 

Author of data: Christian Huber 

4.2 Select measurements

The subset function call

subset(x, …)

Note

The subset function is a method for the generic subset function. To open its help file, add the class to the function name: ?subset.scdf

The subset() function helps to extract measurements (or rows) from an scdf according to specific criteria.

subset() takes an scdf as the first argument and a logical expression (filter) as the second argument. Only measurements for which the logical expression is true are included in the returned scdf object.

For example, the scdf Huber2014 has a variable compliance and we want to keep measurements where compliance is greater than 10 because we assume the others are outliers:

Huber2014 %>%
  subset(compliance > 10) %>%
  summary()
#A single-case data frame with four cases

           Measurements Design
 Adam                37    A-B
 Berta               20    A-B
 Christian           76    A-B
 David               76    A-B

Variable names:
compliance <dependent variable>
phase <phase variable>
mt <measurement-time variable>

Note: Behavioral data (compliance in percent). 

Author of data: Christian Huber 

In a more complex example, we want to keep only values less than 60 when they are in phase A, or values equal to or greater than 60 when they are in phase B:

exampleAB %>%
  subset((values < 60 & phase == "A") | (values >= 60 & phase == "B")) %>%
  summary()
#A single-case data frame with three cases

          Measurements Design
 Johanna            20    A-B
 Karolina           18    A-B
 Anja               19    A-B

Variable names:
values <dependent variable>
phase <phase variable>
mt <measurement-time variable>

Note: Randomly created data with normal distributed dependent variable. 
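
The filtering described in this section can be pictured as applying one logical expression to the rows of every case. The following base R sketch only illustrates that idea and is not scan's implementation. The list cases, its values, and the helper keep_rows are hypothetical:

```r
# Hypothetical stand-in for an scdf: a named list of data frames (made-up values).
cases <- list(
  Anna = data.frame(values = c(50, 62, 58, 70, 55),
                    phase  = c("A", "A", "A", "B", "B")),
  Ben  = data.frame(values = c(59, 61, 65, 60, 75),
                    phase  = c("A", "A", "B", "B", "B"))
)

# Keep a row if it is below 60 in phase A, or at least 60 in phase B.
keep_rows <- function(df) {
  keep <- (df$values < 60 & df$phase == "A") |
          (df$values >= 60 & df$phase == "B")
  df[keep, , drop = FALSE]
}

# Apply the same filter to each case separately.
filtered <- lapply(cases, keep_rows)
sapply(filtered, nrow)  # Anna keeps 3 rows, Ben keeps 4
```

Because the filter is evaluated case by case, cases can end up with different numbers of measurements, as in the summaries above.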

4.3 Change and create variables

The transform function call

transform(_data, …)

With the help of the transform() function, you can add new variables or change existing variables for each case of an scdf. This can be useful if you want to

  • z-standardize a variable,
  • calculate a new variable as the sum of two existing variables,
  • convert a frequency to a percentage,

or in many other cases.

Note

The transform function is a method for the generic transform function. To open its help file, add the class to the function name: ?transform.scdf

Here is an example of standardizing the dependent variable “values”:

exampleAB_z <- transform(
  exampleAB, values = (values - mean(values)) / sd(values)
)

# note: alternatively for the same result:
# exampleAB_z <- transform(exampleAB, values = scale(values))
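
The equivalence of the two formulations can be checked on a plain numeric vector. This is a small sketch with made-up numbers, independent of scan. Note that base R's scale() returns a one-column matrix, so it is converted back to a vector for the comparison:

```r
# Made-up values for illustration.
x <- c(12, 15, 9, 20, 14)

# Manual z-standardization: subtract the mean, divide by the standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() does the same, but returns a one-column matrix with attributes.
z_scale <- scale(x)

all.equal(z_manual, as.numeric(z_scale))  # TRUE

# After standardization the mean is (numerically) zero and the sd is one.
c(mean = mean(z_manual), sd = sd(z_manual))
```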

Here is an example where a new percentage variable is added and the measurement times are shifted to start at 0:

exampleAB_score %>%
  transform(
    percentage = values / trials * 100,
    mt = mt - mt[1]
  )
#A single-case data frame with three cases

 Christiano: values trials mt phase percentage
                  1     20  0     A          5
                  3     20  1     A         15
                  3     20  2     A         15
                  3     20  3     A         15
                  5     20  4     A         25
                  3     20  5     A         15
                  0     20  6     A          0
                  2     20  7     A         10
                  4     20  8     A         20
                  3     20  9     A         15
                 12     20 10     B         60
                 13     20 11     B         65
                 15     20 12     B         75
                 11     20 13     B         55
                 15     20 14     B         75
# ... up to 15 more rows
#  two more cases

4.3.1 all_cases

The all_cases helper function returns the values of a variable across all cases. This allows for calculations where you need values within a case and values across cases, for example when you want to standardize a variable based on all cases:

exampleAB %>%
  transform(
    values = (values - mean(all_cases(values))) / sd(all_cases(values))
  ) %>%
  setNames(paste0(names(exampleAB), "_z")) %>%
  c(exampleAB) %>%
  smd()
Standardized mean differences

                            Johanna_z Karolina_z Anja_z Johanna Karolina  Anja
mA                             -1.194     -1.431 -1.279   54.60    51.80 53.60
mB                              0.454      0.398  0.449   74.13    73.47 74.07
sdA                             0.203      0.577  0.257    2.41     6.83  3.05
sdB                             0.755      0.824  0.639    8.94     9.76  7.57
sd cohen                        0.553      0.711  0.487    6.55     8.43  5.77
sd hedges                       0.673      0.776  0.577    7.97     9.19  6.83
Glass' delta                    8.111      3.171  6.711    8.11     3.17  6.71
Hedges' g                       2.451      2.357  2.996    2.45     2.36  3.00
Hedges' g correction            2.348      2.258  2.869    2.35     2.26  2.87
Hedges' g durlak correction     2.227      2.142  2.722    2.23     2.14  2.72
Cohen's d                       2.983      2.572  3.545    2.98     2.57  3.54
# 1. Take the exampleAB scdf,
# 2. Z-standardise the values of each case based on all measurements,
# 3. rename the cases by adding a "_z" suffix,
# 4. add the original untransformed cases,
# 5. analyse the data by calculating measures of standardized mean differences.
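
To see why it matters whether the mean and standard deviation come from one case or from all cases, here is a minimal base R sketch with made-up values. It does not use scan; the pooled vector simply plays the role that all_cases(values) plays in the call above:

```r
# Made-up values for two hypothetical cases.
cases <- list(
  Anna = c(2, 4, 6),
  Ben  = c(10, 12, 14)
)

# Within-case standardization: each case is centred on its own mean.
z_within <- lapply(cases, function(v) (v - mean(v)) / sd(v))

# Standardization across cases: every case uses the pooled mean and sd.
pooled   <- unlist(cases)
z_across <- lapply(cases, function(v) (v - mean(pooled)) / sd(pooled))

sapply(z_within, mean)  # both means are 0, level differences are removed
sapply(z_across, mean)  # Anna is below 0, Ben is above 0
```

Standardizing across cases puts all cases on a common scale while preserving differences in level between them.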

4.3.2 Smoothing

For smoothing the dependent variable, transform has a number of helper functions:

  • moving_mean calculates the moving mean of a series of values. The lag argument specifies how many measurements before and after each value enter the calculation (the default is 1, where the mean is calculated from a value and the measurements directly before and after it),
  • moving_median works the same way, but calculates the median instead of the mean,
  • local_regression regresses each value on the surrounding values. The argument f defines the fraction of the values to include (the default f = 0.2 considers the surrounding 20% of the values). You must also provide the measurement-time variable with the argument mt.

transform(Huber2014,
  "compliance (moving median)" = moving_median(compliance),
  "compliance (moving mean)" = moving_mean(compliance),
  "compliance (local regression)" = local_regression(compliance, mt = mt)
)
#A single-case data frame with four cases

 Adam: mt compliance phase compliance (moving median) compliance (moving mean)
        1         25     A                         25                       25
        2       20.8     A                         25                    28.47
        3       39.6     A                       39.6                    47.69
        4         75     A                         45                     55.9
        5         45     A                         45                    46.83
        6       39.6     A                         45                    46.88
        7       54.2     A                         50                    50.36
        8         50     A                         50                    42.82
        9       28.1     A                         40                    36.97
       10         40     A                         40                    43.02
       11       52.1     B                         40                    42.14
       12       31.3     B                       31.3                    29.68
       13       15.6     B                       29.2                    24.83
       14       29.2     B                       29.2                    32.61
       15       43.8     B                       29.2                     33.8
 compliance (local regression)
                         22.51
                         28.81
                         34.49
                         40.55
                         44.56
                         46.31
                         46.14
                         43.98
                         42.21
                         40.06
                         37.24
                         33.11
                         29.56
                         29.11
                         28.94
# ... up to 61 more rows
#  three more cases
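
The general idea of a centred moving window can be sketched in a few lines of base R. The function moving_stat_sketch below is hypothetical and is not scan's implementation. In particular, leaving the first and last lag values unchanged is an assumption made for this sketch:

```r
# Hypothetical helper: apply `stat` to a window of `lag` values on each side.
moving_stat_sketch <- function(x, lag = 1, stat = mean) {
  out <- x
  n <- length(x)
  # Too short for a single full window: return the input unchanged.
  if (n < 2 * lag + 1) return(out)
  for (i in (lag + 1):(n - lag)) {
    out[i] <- stat(x[(i - lag):(i + lag)])
  }
  out  # edge values (first and last `lag` positions) stay as they were
}

x <- c(1, 2, 6, 4, 5)  # made-up values
moving_stat_sketch(x, stat = mean)    # 1 3 4 5 5
moving_stat_sketch(x, stat = median)  # 1 2 4 5 5
```

The median version is less affected by the single high value 6, which is the usual reason to prefer a moving median when a series contains outliers.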

4.3.3 Transform values at the beginning of a phase

The first_of helper function is specifically designed to replace values at or around the beginning of a phase. The first argument is a logical vector defining a selection criterion. The positions argument is a vector of positions to be addressed relative to the first measurement that meets this criterion: negative numbers refer to positions before it and positive numbers to positions after it. This is useful, for example, if you want to discard the first two measurements of a phase.

Here is an example that replaces the value at the beginning of phase A and the value after it with missing values (NA), and also replaces the value at the beginning of phase B and the value before it with NA:

byHeart2011 %>%
  transform(
    values = replace(values, first_of(phase == "A", 0:1), NA),
    values = replace(values, first_of(phase == "B", -1:0), NA)
  )
#A single-case data frame with 11 cases

 Lisa (Turkish): values mt phase | Patrick (Spanish): values mt phase |
                   <NA>  1     A |                      <NA>  1     A |
                   <NA>  2     A |                      <NA>  2     A |
                      0  3     A |                         3  3     A |
                      0  4     A |                         0  4     A |
                   <NA>  5     A |                      <NA>  5     A |
                   <NA>  6     B |                      <NA>  6     B |
                      5  7     B |                         8  7     B |
                      6  8     B |                         8  8     B |
                      7  9     B |                         8  9     B |
                     10 10     B |                        12 10     B |
                     10 11     B |                        13 11     B |
                     15 12     B |                        13 12     B |
                     16 13     B |                        15 13     B |
                     14 14     B |                        14 14     B |
                     17 15     B |                        15 15     B |
# ... up to 11 more rows
#  nine more cases
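
The index arithmetic behind this example can be sketched in base R. The function first_of_sketch below is hypothetical and only mimics the behaviour visible in the output above (offsets counted from the first measurement that meets the criterion). It is not scan's implementation, and how scan treats designs with repeated phases is not covered here:

```r
# Hypothetical helper: positions relative to the first TRUE in `condition`.
first_of_sketch <- function(condition, positions = 0) {
  start <- which(condition)[1]
  if (is.na(start)) return(integer(0))  # criterion never met
  idx <- start + positions
  # Drop positions that fall outside the series.
  idx[idx >= 1 & idx <= length(condition)]
}

# Made-up single case with five measurements per phase.
phase  <- rep(c("A", "B"), each = 5)
values <- 1:10

values <- replace(values, first_of_sketch(phase == "A", 0:1), NA)
values <- replace(values, first_of_sketch(phase == "B", -1:0), NA)
values  # NA NA 3 4 NA NA 7 8 9 10
```

As in the output above, measurements 1 and 2 (start of phase A and the one after) and measurements 5 and 6 (the one before phase B and its start) are set to NA.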