# 9  Piecewise linear regressions

In a piecewise-regression analysis (sometimes called segmented regression) a dataset is split at a particular break point and the regression parameters (intercept and slopes) are calculated separately for the data before and after the break point. This is done because we assume that there is a qualitative change at the break point that affects the intercept and slope. This approach is well suited to the analysis of single-case data which are from a statistical point of view time-series data segmented into phases. A general model for single-case data based on the piecewise regression approach has been proposed by Huitema and McKean Huitema & Mckean (2000). They refer to two-phase single-case designs with a pre-intervention phase containing some measurements before the start of the intervention (A-phase) and an intervention phase containing measurements starting at the beginning of the intervention and continuing throughout intervention (B-phase).

In this model, four parameters predict the outcome at a specific measurement point:

1. The performance at the beginning of the study (intercept),

2. a developmental effect leading to a continuous increase throughout all measurements (trend effect),

3. an intervention effect leading to an immediate and constant increase in performance (level effect), and

4. a second intervention effect that evolves continuously with the beginning of the intervention (slope effect). scan provides an implementation based on this piecewise-regression approach. Though the original model is extended by several factors:

• multiple phase designs
• autoregression modeling
• logistic, binomial, and poisson distributed dependent variables and error terms
• multivariate analyzes for analyzing the effect of an intervention on more than one outcome variable (see Chapter 11).
• multilevel analyzes for multiple cases (see Chapter 10).

## 9.1 The basic plm function

The plm function call

plm(data, dvar, pvar, mvar, AR = 0, model = c(“W”, “H-M”, “B&L-B”, “JW”), family = “gaussian”, trend = TRUE, level = TRUE, slope = TRUE, contrast = c(“first”, “preceding”), contrast_level = c(NA, “first”, “preceding”), contrast_slope = c(NA, “first”, “preceding”), formula = NULL, update = NULL, na.action = na.omit, r_squared = TRUE, var_trials = NULL, dvar_percentage = FALSE, …)

The basic function for applying a regression analyzes to a single-case dataset is plm. This function analyzes one single-case. In its simplest way, plm takes one argument with an scdf object and it returns a full piecewise-regression analyzes.

plm(exampleAB$Johanna) Piecewise Regression Analysis Contrast model: W / level = first, slope = first Fitted a gaussian distribution. F(3, 16) = 28.69; p = 0.000; R² = 0.843; Adjusted R² = 0.814 B 2.5% 97.5% SE t p delta R² Intercept 54.400 46.776 62.024 3.890 13.986 0.000 Trend mt 0.100 -3.012 3.212 1.588 0.063 0.951 0.0000 Level phase B 7.858 -3.542 19.258 5.816 1.351 0.195 0.0179 Slope phase B 1.525 -1.642 4.692 1.616 0.944 0.359 0.0087 Autocorrelations of the residuals lag cr 1 -0.32 2 -0.13 3 -0.01 Formula: values ~ 1 + mt + phaseB + interB ## 9.2 Adjusting the model The plm model is a complex model specifically suited for single-case studies. It entails a series of important parameters. Nevertheless, often we have specific theoretical assumption that do no include some of these parameters. We might, for example, only expect an immediate but not a continuous change from a medical intervention. Therefore, it would not be useful to include the slope-effect into our modelling. Vice versa, we could investigate an intervention that will just develop across time without an immediate change with the intervention start. Here we should drop the level-effect from out model. Even the assumption of a trend-effect can be dropped in cases where we do not expect a serial dependency of the data and we do not assume intervention independent changes within the time-frame of the study. It is important to keep in mind, that an overly complex model might have negative effects on the test power of an analyses (that is, the probability of detecting an actually existing effect is diminished) ### 9.2.1 The slope, level, and trend arguments The plm function comes with three arguments (slope, level, and trend) to include or drop the respective predictors from the plm model. Buy default, all arguments are set TRUE and a full plm model is applied to the data. Consider the following data example: example <- scdf( values = c(A = 55, 58, 53, 50, 52, B = 55, 68, 68, 81, 67, 78, 73, 72, 78, 81, 78, 71, 85, 80, 76) ) plm(example) Piecewise Regression Analysis Contrast model: W / level = first, slope = first Fitted a gaussian distribution. F(3, 16) = 21.36; p = 0.000; R² = 0.800; Adjusted R² = 0.763 B 2.5% 97.5% SE t p delta R² Intercept 56.400 48.070 64.730 4.250 13.270 0.000 Trend mt -1.400 -4.801 2.001 1.735 -0.807 0.432 0.0081 Level phase B 16.967 4.510 29.424 6.356 2.670 0.017 0.0890 Slope phase B 2.500 -0.961 5.961 1.766 1.416 0.176 0.0250 Autocorrelations of the residuals lag cr 1 -0.28 2 0.05 3 -0.11 Formula: values ~ 1 + mt + phaseB + interB The piecewise regression reveals a significant level effect and two non significant effects for trend and slope. In a further analyses we would like to put the slope effect out of the equation. The easiest way to do this is to set the slope argument to FALSE. plm(example, slope = FALSE) Piecewise Regression Analysis Contrast model: W / level = first, slope = first Fitted a gaussian distribution. F(2, 17) = 29.30; p = 0.000; R² = 0.775; Adjusted R² = 0.749 B 2.5% 97.5% SE t p delta R² Intercept 51.572 46.455 56.690 2.611 19.752 0.000 Trend mt 1.014 0.364 1.664 0.332 3.057 0.007 0.1236 Level phase B 10.329 1.674 18.983 4.416 2.339 0.032 0.0724 Autocorrelations of the residuals lag cr 1 -0.07 2 0.06 3 -0.17 Formula: values ~ 1 + mt + phaseB In the resulting estimations the trend and level effects are now significant. The model estimated a trend effect of 1.01 points per measurement time and a level effect of 10.33 points. That is, with the beginning of the intervention (the B-phase) the score increases by 15.38 points (5 x 1.01 + 10.33). ## 9.3 Adding additional predictors In more complex analyses, additional predictors can be included in the piecewise regression model. To do this, we have to change the regression formula ‘manually’ by applying the update argument. The update argument allows to change the underlying regression formula. To add a new variable named for example newVar, set update = .~. + newVar. The .~. part takes the internally build model formula (based on the number of phases in your model and the setting of the slope, level, and trend arguments) and + newVar adds a variable called newVar to the equation. Here is an example adding the control variable cigarrets to the model: plm(exampleAB_add, update = .~. + cigarrets) Piecewise Regression Analysis Contrast model: W / level = first, slope = first Fitted a gaussian distribution. F(4, 35) = 5.87; p = 0.001; R² = 0.402; Adjusted R² = 0.333 B 2.5% 97.5% SE t p delta R² Intercept 48.971 43.387 54.555 2.849 17.189 0.000 Trend day 0.392 -0.221 1.005 0.313 1.253 0.218 0.0269 Level phase Medication 3.459 -3.382 10.301 3.490 0.991 0.328 0.0168 Slope phase Medication -0.294 -0.972 0.384 0.346 -0.850 0.401 0.0124 cigarrets -0.221 -1.197 0.755 0.498 -0.443 0.660 0.0034 Autocorrelations of the residuals lag cr 1 0.20 2 -0.19 3 -0.16 Formula: wellbeing ~ day + phaseMedication + interMedication + cigarrets The output of the plm-function shows the resulting formula for the regression model that includes the cigarrets variable: Formula: wellbeing ~ day + phaseMedication + interMedication + cigarrets ## 9.4 Dummy models The model argument is used to code the dummy variables. These dummy variables are used to compute the slope and level effects of the phase variable. The phase variable is categorical, identifying the phase of each measurement. Typically, categorical variables are implemented by means of dummy variables. In a piecewise regression model two phase effects have to be estimated: a level effect and a slope effect. The level effect is implemented quite straight forward: for each phase beginning with the second phase a new dummy variable is created with values of zero for all measurements except the measurements of the phase in focus where values of one are set. values phase mt level B 3 A 1 0 6 A 2 0 4 A 3 0 7 A 4 0 5 B 5 1 3 B 6 1 4 B 7 1 6 B 8 1 3 B 9 1 For estimating the slope effect of each phase, another kind of dummy variables have to be created. Like the dummy variables for level effects the values are set to zero for all measurements except the ones of the phase in focus. Here, values start to increase with every measurement until the end of the phase. Various suggestions have been made regarding the way in which these values increase . The B&L-B model starts with a one at the first measurement of the phase and increases with every measurement while the H-M model starts with a zero. slope B values phase mt level B model B&L-M model H-M 3 A 1 0 0 0 6 A 2 0 0 0 4 A 3 0 0 0 7 A 4 0 0 0 5 B 5 1 1 0 3 B 6 1 2 1 4 B 7 1 3 2 6 B 8 1 4 3 3 B 9 1 5 4 Applying the H-M model will give you a “pure” level-effect while the B&L-B model will provide an estimation of the level-effect that is actually the level-effect plus on times the slope-effect (as the the model assumes that the slope variable is 1 at the first measurement of the B-phase). For most studies, the H-M model is more appropriate. Still, we have to be aware of another aspect. Usually, measurement-times in single-case designs are coded as starting with 1 and increasing in integers (e.g., 1, 2, 3, …). At the same time, the estimation of the trend-effect is based on the measurement-time variable. In that case, the estimation of the model intercept (usually interpreted as the value at the start of the study) actually depicts the estimation of the start value plus one times the trend-effect. Therefore, I implemented the W model (since scan version 0.54.4). Here, the trend-effect is estimated for a measurement-time variable that starts with 0. As a result the intercept will then represent the estimated value at the first measurement fo the study. The W model handles the slope estimation the same way as the H-M model. Since scan version 0.54.4 the W model is the default. mt slope values phase level B&L-M and H-M W B&L-M H-M and W 3 A 0 1 0 0 0 6 A 0 2 1 0 0 4 A 0 3 2 0 0 7 A 0 4 3 0 0 5 B 1 5 4 1 0 3 B 1 6 5 2 1 4 B 1 7 6 3 2 6 B 1 8 7 4 3 3 B 1 9 8 5 4 ## 9.5 Designs with more than two phases: Setting the right contrasts With single-case studies with more than two phases it gets a bit more complicated. Applying the models described above to three phases would result in a comparison between each phase and the first phase (usually phase A). That is, the regression weights and significance tests indicate the differences between each phase and the phase A values. Another common use is to compare the effects of one phase with the preceding phase. Since scan version 0.54.4 plm allows to set a contrast argument. contrast = "first" (the default) will compare all slope and level-effects to the values in the first phase. contrast = "preceding" will compare the slope and level-effects to the preceding phase. For the preceding contrast, the dummy variable for the level-effect is set to zero for all phases preceding the phase in focus and set to one for all remaining measurements. Similar, the dummy variable for the slope-effect is set to zero for all phases preceding the one in focus and starts with one for the first measurement of the target phase and increases until the last measurement of the case. You can set the contrast differently for the level and slope effects with the arguments constrast_level and contrast_slope. Both can be either "first" or "preceding". (Note: Prior to scan version 0.54.4, the option model = "JW" was identical to model = "B&L-B", contrast = "preceding"). contrast first contrast preceeding level slope level slope values phase mt B C B C B C B C 3 A 1 0 0 0 0 0 0 0 0 6 A 2 0 0 0 0 0 0 0 0 4 A 3 0 0 0 0 0 0 0 0 7 A 4 0 0 0 0 0 0 0 0 5 B 5 1 0 1 0 1 0 1 0 3 B 6 1 0 2 0 1 0 2 0 4 B 7 1 0 3 0 1 0 3 0 6 B 8 1 0 4 0 1 0 4 0 3 B 9 1 0 5 0 1 0 5 0 7 C 10 0 1 0 1 1 1 6 1 5 C 11 0 1 0 2 1 1 7 2 6 C 12 0 1 0 3 1 1 8 3 4 C 13 0 1 0 4 1 1 9 4 8 C 14 0 1 0 5 1 1 10 5 ## 9.6 Understanding and interpreting contrasts In this section, we will calculate four plm models with different contrast settings for the same single-case data. The example scdf is the case ‘Marie’ from the exampleABC scdf (exampleABC$Marie)

The dark-red lines indicate the intercept and slopes when calculated separately for each phase. They are:

Table 9.1: Intercept, slope, and number of measurements calculated separately for each phase
intercept slope n
phase A 60.618 -1.915 10
phase B 74.855 -0.612 10
phase C 68.873 -0.194 10

Now we estimate a plm model with four contrast settings:

Table 9.2: Estimates of a piecewise-linear regression with contrast models “first” and “preceding”.
Contrast level Contrast slope intercept trend level B level C slope B slope C
first first 60.618 -1.915 33.388 46.558 1.303 1.721
preceding preceding 60.618 -1.915 33.388 0.139 1.303 0.418
first preceding 60.618 -1.915 33.388 33.527 1.303 0.418
preceding first 60.618 -1.915 33.388 13.170 1.303 1.721

### 9.6.1 Phase B estimates

All regression models in Table 9.2 have the same estimates for intercept and trend. These are not affected by the contrasts and are identical to those for phase A in Table 9.1. In addition, in Table 9.2, the estimates for levelB and slopeC are identical since all models contrast the same phase (the first and the preceding phase are both phase A). The values here can be calculated from Table 9.11:

$levelB = intercept_{phaseB} - (intercept_{phaseA} + n_{PhaseA} * slope_{phaseA}) \tag{9.1}$

$33.388 \approx 74.855 - (60.618 + 10*-1.915)$

$slopeB = slope_{phaseB} - slope_{phaseA} \tag{9.2}$

$1.303 \approx -1.915 - (-0.612)$

### 9.6.2 Phase C estimates

The levelC and slopeC estimates of the regression models in Table 9.2 are different for the various contrast models. Depending on the contrast setting, the estimates “answer” a different question. Table 9.3 provides interpretation help.

Table 9.3: Interpretations of the effect estimates in various contrast conditions
Contrast level Contrast slope Interpretation of level C effect Interpretation slope C effect
first first What would be the value if phase A had continued until to the start of phase C and what is the difference to the actual value at the first measurement of phase C? What is the difference between the slopes of phase C and A2?
preceding preceding What would be the value if phase B had continued to the start of phase C and what is the difference to the actual value at the first measurement of phase C? What is the difference between the slopes of phase C and B?
first preceding What would be the value if phase A had continued until the start of phase C (assuming a slope effect but no level effect in phase B)? And what is the difference to the actual value at the first measurement of phase C? What is the difference between the slopes of phase C and B?
preceding first What would be the value if phase B had continued until the start of phase C (assuming a level but no slope effect in phase B)? And what is the difference to the actual value at the first measurement of phase C? What is the difference between the slopes of phase C and A?

All four models are mathematically equivalent, i.e. they produce the same estimates of the dependent variable. Bellow I will show how the estimates from the piecewise regression models relate to the simple regression estimates from Table 9.1. These are $$intercept_{phaseC} = 68.873$$ and $$slope_{phaseC} = -0.194$$.

Level first and slope first contrasts

Table 9.2 estimates a levelC increase of 46.558 compared to phase A (the intercept) and a slopeC increase of 1.721.

$levelC = intercept_{phaseC} - (Intercept_{phaseA} + n_{phaseA+B} * slope_{phaseA}) \tag{9.3}$

$46.558 \approx 68.873 - (60.618 + 20*-1.915)$

$slopeC = slope_{phaseC} - slope_{phaseA} \tag{9.4}$

$1.721 \approx -0.194 - (-1.915)$

Level preceding and slope preceding contrasts

Table 9.2 estimates a levelC increase of 0.139 compared to phase B and a slopeC increase of 0.418.

$levelC = intercept_{phaseC} - (intercept_{phaseB} + n_{phaseB} * slope_{phaseB}) \tag{9.5}$

$0.139 \approx 68.873 - (74.855 + 10*-0.612)$

$slopeC = slope_{phaseC} - slope_{phaseB} \tag{9.6}$

$0.418 \approx -0.194 - (-0.612)$

Level first and slope preceding contrasts

Table 9.2 estimates a levelC increase of 33.388 compared to phase A and a slopeC increase of 0.418.

$levelC = intercept_{phaseC} - (intercept_{phaseA} + n_{phaseA} * slope_{phaseA} + n_{phaseB} * slope_{phaseB}) \tag{9.7}$

$33.527 \approx 68.873 - (60.618 + 10 * -1.915 + 10 * -0.612)$

$slopeC = slope_{phaseC} - slope_{phaseB} \tag{9.8}$

$0.418 \approx -0.194 - (-0.612)$

Level preceding and slope first contrasts

Table 9.2 estimates a levelC increase of 13.170 compared to phase B and a slopeC increase of 1.721.

$levelC = intercept_{phaseC} - (intercept_{phaseB} + n_{phaseB} * slope_{phaseA}) \tag{9.9}$

$13.170\approx 68.873 - (74.855 + 10*-1.915)$

$slopeC = slope_{phaseC} - slope_{phaseA} \tag{9.10}$

$1.721 \approx -0.194 - (-1.915)$

1. Differences here and in the following calculations are due to rounding errors.↩︎

2. The slope of phase A is the trend effect.↩︎