# 9 Piecewise linear regressions

In a piecewise-regression analysis (sometimes called segmented regression), a dataset is split at a particular break point and the regression parameters (intercept and slopes) are calculated separately for the data before and after the break point. This is done because we assume that there is a qualitative change at the break point that affects the intercept and slope. This approach is well suited to the analysis of single-case data, which are, from a statistical point of view, time-series data segmented into phases. A general model for single-case data based on the piecewise regression approach has been proposed by Huitema and McKean (2000). They refer to two-phase single-case designs with a pre-intervention phase containing some measurements before the start of the intervention (A-phase) and an intervention phase containing measurements starting at the beginning of the intervention and continuing throughout the intervention (B-phase).

In this model, four parameters predict the outcome at a specific measurement point:

- the performance at the beginning of the study (**intercept**),
- a developmental effect leading to a continuous increase throughout all measurements (**trend effect**),
- an intervention effect leading to an immediate and constant increase in performance (**level effect**), and
- a second intervention effect that evolves continuously with the beginning of the intervention (**slope effect**).
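Written as a single regression equation, the four parameters combine as follows. This is a sketch in notation chosen here for illustration, not taken from Huitema and McKean: \(mt_i\) is the measurement time, \(phase_i\) is a dummy variable that is 0 in the A-phase and 1 in the B-phase, and \(inter_i\) counts the measurements within the B-phase (and is 0 throughout the A-phase).

\[ y_i = \beta_0 + \beta_1 \, mt_i + \beta_2 \, phase_i + \beta_3 \, inter_i + \varepsilon_i \]

Here \(\beta_0\) is the intercept, \(\beta_1\) the trend effect, \(\beta_2\) the level effect, and \(\beta_3\) the slope effect.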

*scan* provides an implementation based on this piecewise-regression approach and extends the original model in several ways:

- multiple phase designs
- additional (control) variables
- autoregression modeling
- logistic, binomial, and Poisson distributed dependent variables and error terms
- multivariate analyses for analyzing the effect of an intervention on more than one outcome variable (see Chapter 11)
- multilevel analyses for multiple cases (see Chapter 10)

## 9.1 The basic plm function

The basic function for applying a regression analysis to a single-case dataset is `plm`. This function analyzes a single case. In its simplest form, `plm` takes one argument, an *scdf* object, and returns a full piecewise-regression analysis.

`plm(exampleAB$Johanna)`

```
Piecewise Regression Analysis
Contrast model: W / level = first, slope = first
Fitted a gaussian distribution.
F(3, 16) = 28.69; p = 0.000; R² = 0.843; Adjusted R² = 0.814
B 2.5% 97.5% SE t p delta R²
Intercept 54.400 46.776 62.024 3.890 13.986 0.000
Trend mt 0.100 -3.012 3.212 1.588 0.063 0.951 0.0000
Level phase B 7.858 -3.542 19.258 5.816 1.351 0.195 0.0179
Slope phase B 1.525 -1.642 4.692 1.616 0.944 0.359 0.0087
Autocorrelations of the residuals
lag cr
1 -0.32
2 -0.13
3 -0.01
Formula: values ~ 1 + mt + phaseB + interB
```

## 9.2 Adjusting the model

The plm model is a complex model specifically suited for single-case studies. It entails a series of important parameters. Nevertheless, we often have specific theoretical assumptions that do not include some of these parameters. We might, for example, expect only an immediate but not a continuous change from a medical intervention. In that case, it would not be useful to include the slope-effect in our model. Conversely, we could investigate an intervention whose effect develops across time without an immediate change at the start of the intervention. Here we should drop the level-effect from our model. Even the assumption of a trend-effect can be dropped in cases where we do not expect a serial dependency of the data and do not assume intervention-independent changes within the time frame of the study.

It is important to keep in mind that an overly complex model might reduce the test power of an analysis, that is, the probability of detecting an actually existing effect is diminished (see Wilbert, Lüke, & Börnert-Ringleb, 2022).

### 9.2.1 The `slope`, `level`, and `trend` arguments

The `plm` function comes with three arguments (`slope`, `level`, and `trend`) to include or drop the respective predictors from the plm model. By default, all arguments are set to `TRUE` and a full plm model is applied to the data.

Consider the following data example:

```
library(scan)  # provides scdf() and plm()

# Phase A: 5 measurements, phase B: 15 measurements
example <- scdf(
  values = c(A = 55, 58, 53, 50, 52,
             B = 55, 68, 68, 81, 67, 78, 73, 72, 78, 81, 78, 71, 85, 80, 76)
)
plm(example)
```

```
Piecewise Regression Analysis
Contrast model: W / level = first, slope = first
Fitted a gaussian distribution.
F(3, 16) = 21.36; p = 0.000; R² = 0.800; Adjusted R² = 0.763
B 2.5% 97.5% SE t p delta R²
Intercept 56.400 48.070 64.730 4.250 13.270 0.000
Trend mt -1.400 -4.801 2.001 1.735 -0.807 0.432 0.0081
Level phase B 16.967 4.510 29.424 6.356 2.670 0.017 0.0890
Slope phase B 2.500 -0.961 5.961 1.766 1.416 0.176 0.0250
Autocorrelations of the residuals
lag cr
1 -0.28
2 0.05
3 -0.11
Formula: values ~ 1 + mt + phaseB + interB
```
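As a cross-check of the output above, the same coefficients can be obtained with base R's `lm()` by building the three predictors by hand. This is only a sketch: the variable names `mt`, `phaseB`, and `interB` mirror the formula line of the output, and the coding (measurement time starting at 0, slope variable starting at 0 in phase B) is an assumption that follows the default *W* model.

```r
# Data from the example above: 5 A-phase and 15 B-phase measurements
values <- c(55, 58, 53, 50, 52,
            55, 68, 68, 81, 67, 78, 73, 72, 78, 81, 78, 71, 85, 80, 76)
n_a <- 5
n_b <- 15

# Hand-built predictors (assumed coding, following the default "W" model)
mt     <- seq_along(values) - 1               # measurement time starting at 0
phaseB <- rep(c(0, 1), times = c(n_a, n_b))   # level dummy: 1 in phase B
interB <- c(rep(0, n_a), seq_len(n_b) - 1)    # slope dummy: 0, 1, 2, ... in phase B

# Ordinary least squares with the same formula that plm reports
fit <- lm(values ~ 1 + mt + phaseB + interB)
round(coef(fit), 3)
```

The four coefficients should agree with the `B` column of the output (intercept, trend, level, and slope). The sketch does not reproduce the delta R² column or the autocorrelations.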

The piecewise regression reveals a significant level effect and two non-significant effects for trend and slope. In a further analysis, we would like to take the slope effect out of the equation. The easiest way to do this is to set the `slope` argument to `FALSE`.

`plm(example, slope = FALSE)`

```
Piecewise Regression Analysis
Contrast model: W / level = first, slope = first
Fitted a gaussian distribution.
F(2, 17) = 29.30; p = 0.000; R² = 0.775; Adjusted R² = 0.749
B 2.5% 97.5% SE t p delta R²
Intercept 51.572 46.455 56.690 2.611 19.752 0.000
Trend mt 1.014 0.364 1.664 0.332 3.057 0.007 0.1236
Level phase B 10.329 1.674 18.983 4.416 2.339 0.032 0.0724
Autocorrelations of the residuals
lag cr
1 -0.07
2 0.06
3 -0.17
Formula: values ~ 1 + mt + phaseB
```

In the resulting estimates, the trend and level effects are now significant. The model estimates a trend effect of 1.01 points per measurement time and a level effect of 10.33 points. That is, at the first measurement of the intervention (the B-phase), the predicted score is 15.38 points higher than at the first measurement of the study (5 × 1.01 + 10.33).
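The arithmetic can be retraced with a reduced model fitted in base R. This is a sketch under the assumption that measurement time is coded from 0 (the default *W* model), so the first B-phase measurement has `mt = 5`; the variable names are chosen to match the formula in the output.

```r
# Same data as in the example: 5 A-phase and 15 B-phase measurements
values <- c(55, 58, 53, 50, 52,
            55, 68, 68, 81, 67, 78, 73, 72, 78, 81, 78, 71, 85, 80, 76)
mt     <- 0:19                           # measurement time starting at 0
phaseB <- rep(c(0, 1), times = c(5, 15)) # level dummy: 1 in phase B

# Model without the slope predictor (trend and level only)
fit <- lm(values ~ 1 + mt + phaseB)
b <- coef(fit)

# Predicted difference between the first B-phase measurement (mt = 5)
# and the first measurement of the study (mt = 0)
step <- 5 * b[["mt"]] + b[["phaseB"]]
round(step, 2)
```

At full precision the difference is about 15.40; the 15.38 given in the text results from using the coefficients rounded to two decimals.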

## 9.3 Adding additional predictors

In more complex analyses, additional predictors can be included in the piecewise regression model.

To do this, we have to change the regression formula ‘manually’ by applying the `update` argument. The `update` argument allows you to change the underlying regression formula. To add a new variable named, for example, `newVar`, set `update = .~. + newVar`. The `.~.` part takes the internally built model formula (based on the number of phases in your model and the setting of the `slope`, `level`, and `trend` arguments) and `+ newVar` adds a variable called `newVar` to the equation.
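The `.~.` notation is the same one that base R's `update()` uses for formulas, so its effect can be tried out in isolation. The sketch below only illustrates the notation; whether `plm` calls `update()` internally is an assumption, and `newVar` is a hypothetical variable name.

```r
# Formula as plm reports it for a full two-phase model
base_formula <- values ~ 1 + mt + phaseB + interB

# `. ~ .` keeps both sides of the existing formula;
# `+ newVar` appends the (hypothetical) additional predictor
new_formula <- update(base_formula, . ~ . + newVar)
new_formula

# The predictors of the updated formula, in order
attr(terms(new_formula), "term.labels")
```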

Here is an example adding the control variable `cigarrets` to the model:

`plm(exampleAB_add, update = .~. + cigarrets)`

```
Piecewise Regression Analysis
Contrast model: W / level = first, slope = first
Fitted a gaussian distribution.
F(4, 35) = 5.87; p = 0.001; R² = 0.402; Adjusted R² = 0.333
B 2.5% 97.5% SE t p delta R²
Intercept 48.971 43.387 54.555 2.849 17.189 0.000
Trend day 0.392 -0.221 1.005 0.313 1.253 0.218 0.0269
Level phase Medication 3.459 -3.382 10.301 3.490 0.991 0.328 0.0168
Slope phase Medication -0.294 -0.972 0.384 0.346 -0.850 0.401 0.0124
cigarrets -0.221 -1.197 0.755 0.498 -0.443 0.660 0.0034
Autocorrelations of the residuals
lag cr
1 0.20
2 -0.19
3 -0.16
Formula: wellbeing ~ day + phaseMedication + interMedication + cigarrets
```

The output of the `plm` function shows the resulting formula for the regression model, which now includes the `cigarrets` variable:

`Formula: wellbeing ~ day + phaseMedication + interMedication + cigarrets`

## 9.4 Dummy models

The `model` argument is used to code the *dummy variables*. These *dummy variables* are used to compute the slope and level effects of the *phase* variable.

The *phase* variable is categorical, identifying the phase of each measurement. Typically, categorical variables are implemented by means of dummy variables. In a piecewise regression model, two phase effects have to be estimated: a level effect and a slope effect. The level effect is implemented in a straightforward way: for each phase, beginning with the second phase, a new dummy variable is created with values of zero for all measurements except those of the phase in focus, where the values are set to one.

| values | phase | mt | level B |
|---|---|---|---|
| 3 | A | 1 | 0 |
| 6 | A | 2 | 0 |
| 4 | A | 3 | 0 |
| 7 | A | 4 | 0 |
| 5 | B | 5 | 1 |
| 3 | B | 6 | 1 |
| 4 | B | 7 | 1 |
| 6 | B | 8 | 1 |
| 3 | B | 9 | 1 |

For estimating the *slope effect* of each phase, another kind of dummy variable has to be created. As with the dummy variables for level effects, the values are set to zero for all measurements except those of the phase in focus. Here, the values increase with every measurement until the end of the phase.

Various suggestions have been made regarding the way in which these values increase (see Huitema & McKean, 2000). The *B&L-B* model starts with a one at the first measurement of the phase and increases with every measurement, while the *H-M* model starts with a zero.

| values | phase | mt | level B | slope B (model B&L-B) | slope B (model H-M) |
|---|---|---|---|---|---|
| 3 | A | 1 | 0 | 0 | 0 |
| 6 | A | 2 | 0 | 0 | 0 |
| 4 | A | 3 | 0 | 0 | 0 |
| 7 | A | 4 | 0 | 0 | 0 |
| 5 | B | 5 | 1 | 1 | 0 |
| 3 | B | 6 | 1 | 2 | 1 |
| 4 | B | 7 | 1 | 3 | 2 |
| 6 | B | 8 | 1 | 4 | 3 |
| 3 | B | 9 | 1 | 5 | 4 |
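The dummy variables in the table can be generated from the phase variable with a few lines of base R. This is an illustrative sketch, not the code *scan* uses internally; the object names are chosen here.

```r
# Phase structure of the table: 4 A-phase and 5 B-phase measurements
phase <- rep(c("A", "B"), times = c(4, 5))
mt    <- seq_along(phase)

# Level dummy: 1 for every measurement of phase B, 0 otherwise
level_B <- as.integer(phase == "B")

# Running position within phase B (0 throughout phase A)
pos_in_B <- cumsum(level_B)

# Slope dummy, B&L-B coding: starts with 1 at the first B measurement
slope_BLB <- pos_in_B

# Slope dummy, H-M coding: starts with 0 at the first B measurement
slope_HM <- pmax(pos_in_B - 1, 0)

data.frame(phase, mt, level_B, slope_BLB, slope_HM)
```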

Applying the *H-M* model will give you a “pure” level-effect, that is, the step at the first measurement of the B-phase. The *B&L-B* model provides a level estimate that is offset from it by one times the slope-effect: because the *B&L-B* slope variable is already *1* at the first measurement of the B-phase, the *H-M* level-effect equals the *B&L-B* level-effect plus the slope-effect. For most studies, the *H-M* model is more appropriate.
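The relation between the two level estimates can be verified numerically. The following sketch fits both codings with base R's `lm()` to the illustrative values from the table; it is a stand-in for the corresponding `plm` models, not a call to *scan*. Since the *B&L-B* slope variable equals the *H-M* slope variable plus the level dummy, the *H-M* level-effect should equal the *B&L-B* level-effect plus the slope-effect.

```r
# Illustrative values from the table: 4 A-phase and 5 B-phase measurements
values    <- c(3, 6, 4, 7, 5, 3, 4, 6, 3)
mt        <- 1:9
level_B   <- rep(c(0, 1), times = c(4, 5))
slope_BLB <- c(0, 0, 0, 0, 1, 2, 3, 4, 5)  # B&L-B coding
slope_HM  <- c(0, 0, 0, 0, 0, 1, 2, 3, 4)  # H-M coding

fit_blb <- lm(values ~ mt + level_B + slope_BLB)
fit_hm  <- lm(values ~ mt + level_B + slope_HM)

# Level-effect under H-M ...
coef(fit_hm)[["level_B"]]
# ... equals level-effect plus slope-effect under B&L-B
coef(fit_blb)[["level_B"]] + coef(fit_blb)[["slope_BLB"]]
```

Trend and slope estimates are the same under both codings; only the level estimate changes.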

Still, we have to be aware of another aspect. Usually, measurement times in single-case designs are coded as starting with *1* and increasing in integers (e.g., 1, 2, 3, …). At the same time, the estimation of the trend-effect is based on the measurement-time variable. In that case, the model intercept (usually interpreted as the value at the start of the study) is actually the estimate for a measurement time of *0*, one step before the first measurement, and therefore differs from the estimated start value by one times the trend-effect. Therefore, I implemented the *W* model (since scan version `0.54.4`). Here, the trend-effect is estimated for a measurement-time variable that starts with *0*. As a result, the intercept represents the estimated value at the first measurement of the study. The *W* model handles the slope estimation the same way as the *H-M* model. Since scan version `0.54.4`, the *W* model is the default.

| values | phase | level | mt (B&L-B and H-M) | mt (W) | slope (B&L-B) | slope (H-M and W) |
|---|---|---|---|---|---|---|
| 3 | A | 0 | 1 | 0 | 0 | 0 |
| 6 | A | 0 | 2 | 1 | 0 | 0 |
| 4 | A | 0 | 3 | 2 | 0 | 0 |
| 7 | A | 0 | 4 | 3 | 0 | 0 |
| 5 | B | 1 | 5 | 4 | 1 | 0 |
| 3 | B | 1 | 6 | 5 | 2 | 1 |
| 4 | B | 1 | 7 | 6 | 3 | 2 |
| 6 | B | 1 | 8 | 7 | 4 | 3 |
| 3 | B | 1 | 9 | 8 | 5 | 4 |
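The effect of shifting the measurement-time variable can be checked with base R. The sketch below fits the *H-M* and *W* codings to the illustrative values from the table using `lm()` as a stand-in for `plm`; the object names are chosen here. The *W* intercept should equal the *H-M* intercept plus one times the trend-effect, which is the predicted value at the first measurement.

```r
# Illustrative values from the table: 4 A-phase and 5 B-phase measurements
values  <- c(3, 6, 4, 7, 5, 3, 4, 6, 3)
level_B <- rep(c(0, 1), times = c(4, 5))
slope_B <- c(0, 0, 0, 0, 0, 1, 2, 3, 4)  # same slope coding in H-M and W

mt_hm <- 1:9        # H-M: measurement time starts at 1
mt_w  <- mt_hm - 1  # W:   measurement time starts at 0

fit_hm <- lm(values ~ mt_hm + level_B + slope_B)
fit_w  <- lm(values ~ mt_w + level_B + slope_B)

# Intercept under W ...
coef(fit_w)[["(Intercept)"]]
# ... equals the H-M prediction for the first measurement (mt = 1)
coef(fit_hm)[["(Intercept)"]] + 1 * coef(fit_hm)[["mt_hm"]]
```

Trend, level, and slope estimates are unaffected by the shift; only the intercept changes.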

## 9.5 Designs with more than two phases: Setting the right contrasts

With single-case studies that have more than two phases, things get a bit more complicated. Applying the models described above to three phases results in a comparison between each phase and the first phase (usually phase A). That is, the regression weights and significance tests indicate the differences between each phase and the phase A values. Another common approach is to compare each phase with the preceding phase.

Since scan version `0.54.4`, `plm` allows you to set a `contrast` argument. `contrast = "first"` (the default) compares all slope and level-effects to the values in the first phase. `contrast = "preceding"` compares the slope and level-effects to the preceding phase.

For the *preceding contrast*, the dummy variable for the level-effect is set to zero for all phases preceding the phase in focus and set to one for all remaining measurements. Similarly, the dummy variable for the slope-effect is set to zero for all phases preceding the one in focus, starts with one at the first measurement of the target phase, and increases until the last measurement of the case.

You can set the contrast differently for the level and slope effects with the arguments `contrast_level` and `contrast_slope`. Both can be either `"first"` or `"preceding"`.

(Note: Prior to scan version `0.54.4`, the option `model = "JW"` was identical to `model = "B&L-B", contrast = "preceding"`.)

| values | phase | mt | level B (first) | level C (first) | slope B (first) | slope C (first) | level B (preceding) | level C (preceding) | slope B (preceding) | slope C (preceding) |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | A | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | A | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | A | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | A | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | B | 5 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 3 | B | 6 | 1 | 0 | 2 | 0 | 1 | 0 | 2 | 0 |
| 4 | B | 7 | 1 | 0 | 3 | 0 | 1 | 0 | 3 | 0 |
| 6 | B | 8 | 1 | 0 | 4 | 0 | 1 | 0 | 4 | 0 |
| 3 | B | 9 | 1 | 0 | 5 | 0 | 1 | 0 | 5 | 0 |
| 7 | C | 10 | 0 | 1 | 0 | 1 | 1 | 1 | 6 | 1 |
| 5 | C | 11 | 0 | 1 | 0 | 2 | 1 | 1 | 7 | 2 |
| 6 | C | 12 | 0 | 1 | 0 | 3 | 1 | 1 | 8 | 3 |
| 4 | C | 13 | 0 | 1 | 0 | 4 | 1 | 1 | 9 | 4 |
| 8 | C | 14 | 0 | 1 | 0 | 5 | 1 | 1 | 10 | 5 |
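The two sets of dummy variables in the table can be generated as follows. This is a base R sketch for illustration, not the code *scan* uses internally; `count_up` is a hypothetical helper defined here, and the slope variables start with one, as in the table.

```r
# Phase structure of the table: 4 A-, 5 B-, and 5 C-phase measurements
phase <- rep(c("A", "B", "C"), times = c(4, 5, 5))
mt    <- seq_along(phase)

# Hypothetical helper: counts 1, 2, 3, ... over the TRUE entries, 0 elsewhere
count_up <- function(in_focus) ifelse(in_focus, cumsum(in_focus), 0)

later_phases <- c(B = "B", C = "C")

# Contrast "first": dummies are active only within the phase in focus
level_first <- sapply(later_phases, function(p) as.integer(phase == p))
slope_first <- sapply(later_phases, function(p) count_up(phase == p))

# Contrast "preceding": dummies are active from the first measurement
# of the phase in focus until the last measurement of the case
starts     <- sapply(later_phases, function(p) match(p, phase))
level_prec <- sapply(starts, function(s) as.integer(mt >= s))
slope_prec <- sapply(starts, function(s) count_up(mt >= s))

cbind(mt, level_first, slope_first, level_prec, slope_prec)
```

Each result is a matrix with one column per phase (`B` and `C`), corresponding to the level and slope columns of the table.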

## 9.6 Understanding and interpreting contrasts

In this section, we will calculate four plm models with different contrast settings for the same single-case data.

The example scdf is the case ‘Marie’ from the *exampleABC* scdf (`exampleABC$Marie`).

The dark-red lines indicate the intercept and slopes when calculated separately for each phase. They are (Table 9.1):

| | intercept | slope | n |
|---|---|---|---|
| phase A | 60.618 | -1.915 | 10 |
| phase B | 74.855 | -0.612 | 10 |
| phase C | 68.873 | -0.194 | 10 |

Now we estimate a plm model with four contrast settings (Table 9.2):

| Contrast level | Contrast slope | intercept | trend | level B | level C | slope B | slope C |
|---|---|---|---|---|---|---|---|
| first | first | 60.618 | -1.915 | 33.388 | 46.558 | 1.303 | 1.721 |
| preceding | preceding | 60.618 | -1.915 | 33.388 | 0.139 | 1.303 | 0.418 |
| first | preceding | 60.618 | -1.915 | 33.388 | 33.527 | 1.303 | 0.418 |
| preceding | first | 60.618 | -1.915 | 33.388 | 13.170 | 1.303 | 1.721 |

### 9.6.1 Phase B estimates

All regression models in Table 9.2 have the same estimates for `intercept` and `trend`. These are not affected by the contrasts and are identical to those for phase A in Table 9.1. In addition, in Table 9.2, the estimates for `levelB` and `slopeB` are identical across models, since all models contrast phase B with the same phase (the first and the preceding phase are both phase A). The values can be calculated from Table 9.1^{1}:

\[ levelB = intercept_{phaseB} - (intercept_{phaseA} + n_{phaseA} * slope_{phaseA}) \tag{9.1}\]

\[ 33.388 \approx 74.855 - (60.618 + 10 * (-1.915)) \]

\[ slopeB = slope_{phaseB} - slope_{phaseA} \tag{9.2}\]

\[ 1.303 \approx -0.612 - (-1.915) \]
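The two calculations can be retraced in R from the per-phase estimates in Table 9.1. The object names are chosen here for illustration. Because the table values are rounded to three decimals, the results agree with the plm estimates only up to rounding.

```r
# Per-phase regression estimates from Table 9.1
est <- data.frame(
  phase     = c("A", "B", "C"),
  intercept = c(60.618, 74.855, 68.873),
  slope     = c(-1.915, -0.612, -0.194),
  n         = c(10, 10, 10)
)
a <- est[est$phase == "A", ]
b <- est[est$phase == "B", ]

# Equation 9.1: phase B intercept minus phase A extrapolated to the start of B
level_B <- b$intercept - (a$intercept + a$n * a$slope)

# Equation 9.2: difference between the slopes of phase B and phase A
slope_B <- b$slope - a$slope

round(c(level_B = level_B, slope_B = slope_B), 3)
```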

### 9.6.2 Phase C estimates

The `levelC` and `slopeC` estimates of the regression models in Table 9.2 differ between the contrast models. Depending on the contrast setting, the estimates “answer” a different question. Table 9.3 provides help with the interpretation.

| Contrast level | Contrast slope | Interpretation of level C effect | Interpretation of slope C effect |
|---|---|---|---|
| first | first | What would be the value if phase A had continued until the start of phase C and what is the difference to the actual value at the first measurement of phase C? | What is the difference between the slopes of phase C and A^{2}? |
| preceding | preceding | What would be the value if phase B had continued until the start of phase C and what is the difference to the actual value at the first measurement of phase C? | What is the difference between the slopes of phase C and B? |
| first | preceding | What would be the value if phase A had continued until the start of phase C (assuming a slope effect but no level effect in phase B)? And what is the difference to the actual value at the first measurement of phase C? | What is the difference between the slopes of phase C and B? |
| preceding | first | What would be the value if phase B had continued until the start of phase C (assuming a level but no slope effect in phase B)? And what is the difference to the actual value at the first measurement of phase C? | What is the difference between the slopes of phase C and A? |

All four models are mathematically equivalent, i.e., they produce the same estimates of the dependent variable. Below I will show how the estimates from the piecewise regression models relate to the simple regression estimates from Table 9.1. These are \(intercept_{phaseC} = 68.873\) and \(slope_{phaseC} = -0.194\).

*Level first and slope first contrasts*

Table 9.2 estimates a `levelC` increase of 46.558 compared to phase A (the intercept) and a `slopeC` increase of 1.721.

\[ levelC = intercept_{phaseC} - (intercept_{phaseA} + n_{phaseA+B} * slope_{phaseA}) \tag{9.3}\]

\[ 46.558 \approx 68.873 - (60.618 + 20 * (-1.915)) \]

\[ slopeC = slope_{phaseC} - slope_{phaseA} \tag{9.4}\]

\[1.721 \approx -0.194 - (-1.915)\]

*Level preceding and slope preceding contrasts*

Table 9.2 estimates a `levelC` increase of 0.139 compared to phase B and a `slopeC` increase of 0.418.

\[ levelC = intercept_{phaseC} - (intercept_{phaseB} + n_{phaseB} * slope_{phaseB}) \tag{9.5}\]

\[ 0.139 \approx 68.873 - (74.855 + 10 * (-0.612)) \]

\[ slopeC = slope_{phaseC} - slope_{phaseB} \tag{9.6}\]

\[0.418 \approx -0.194 - (-0.612)\]

*Level first and slope preceding contrasts*

Table 9.2 estimates a `levelC` increase of 33.527 compared to phase A and a `slopeC` increase of 0.418.

\[ levelC = intercept_{phaseC} - (intercept_{phaseA} + n_{phaseA} * slope_{phaseA} + n_{phaseB} * slope_{phaseB}) \tag{9.7}\]

\[ 33.527 \approx 68.873 - (60.618 + 10 * (-1.915) + 10 * (-0.612)) \]

\[ slopeC = slope_{phaseC} - slope_{phaseB} \tag{9.8}\]

\[0.418 \approx -0.194 - (-0.612)\]

*Level preceding and slope first contrasts*

Table 9.2 estimates a `levelC` increase of 13.170 compared to phase B and a `slopeC` increase of 1.721.

\[ levelC = intercept_{phaseC} - (intercept_{phaseB} + n_{phaseB} * slope_{phaseA}) \tag{9.9}\]

\[ 13.170 \approx 68.873 - (74.855 + 10 * (-1.915)) \]

\[ slopeC = slope_{phaseC} - slope_{phaseA} \tag{9.10}\]

\[ 1.721 \approx -0.194 - (-1.915) \]
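All four `levelC` variants and both `slopeC` variants can be recomputed side by side from the per-phase estimates in Table 9.1. The sketch below uses object names chosen here for illustration. Since it starts from rounded table values, the results match the estimates in Table 9.2 only up to rounding.

```r
# Per-phase regression estimates from Table 9.1
ic <- c(A = 60.618, B = 74.855, C = 68.873)  # intercepts
sl <- c(A = -1.915, B = -0.612, C = -0.194)  # slopes
n  <- c(A = 10, B = 10, C = 10)              # measurements per phase

# levelC under the four contrast settings (named level_slope)
level_C <- c(
  first_first         = ic[["C"]] - (ic[["A"]] + (n[["A"]] + n[["B"]]) * sl[["A"]]),
  preceding_preceding = ic[["C"]] - (ic[["B"]] + n[["B"]] * sl[["B"]]),
  first_preceding     = ic[["C"]] - (ic[["A"]] + n[["A"]] * sl[["A"]] + n[["B"]] * sl[["B"]]),
  preceding_first     = ic[["C"]] - (ic[["B"]] + n[["B"]] * sl[["A"]])
)

# slopeC under the two slope contrasts
slope_C <- c(
  first     = sl[["C"]] - sl[["A"]],
  preceding = sl[["C"]] - sl[["B"]]
)

round(level_C, 3)
round(slope_C, 3)
```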