12  Randomization tests

The rand_test function call

rand_test(data, dvar, pvar, statistic = c("Mean B-A", "Mean A-B", "Median B-A", "Median A-B", "Mean |A-B|"), number = 500, complete = FALSE, limit = 5, startpoints = NA, exclude.equal = FALSE, phases = c(1, 2), graph = FALSE, output = NULL, seed = NULL)

The rand_test function computes a randomization test for single or multiple baseline single-case data. The function is based on an algorithm from the SCRT package (Bulté & Onghena, 2008, 2009), but rewritten and extended.

The basic idea of a randomization test is to think counterfactually: “Assuming the phase had no influence on the measured data, what would the difference between the phases of my case be if I had started phase B at a different time?” Considering the possible differences between the phases under the assumption that the phase had no influence, how likely are the actual phase differences of the original case?

To answer this question, a number of new cases are generated, each with a randomly chosen start point for each phase. These new cases contain the same data as the original case, but their phases begin at different measurements. A specific statistic (e.g., the difference between the means of the phase A and phase B data) is then calculated for each new case. With enough random cases, this yields a distribution of statistic values (e.g., mean differences) under the assumption that the phase had no influence. The statistic of the original case is compared to this distribution. The position of the original statistic within the distribution, that is, the proportion of generated values that are as large as or larger than it, is the probability of the original statistic under the assumption of a random distribution of the phase start points. This proportion is returned as the p-value of the randomization test.
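The procedure described above can be sketched in a few lines of base R for a single AB case. This is only an illustration and not the implementation used by rand_test(): the helper name rand_test_sketch, its arguments, and the data are invented for this example, and the p-value is simply taken as the share of random statistics that are at least as large as the observed one.

```r
# Minimal base-R sketch of a randomization test for one AB case.
# Hypothetical helper for illustration, not the scan implementation.
rand_test_sketch <- function(y, b_start, limit = 5, number = 500, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(y)

  # "Mean B-A" for a given first measurement of phase B
  mean_b_minus_a <- function(start) {
    mean(y[start:n]) - mean(y[seq_len(start - 1)])
  }

  # Admissible start points leave at least `limit` measurements per phase
  candidates <- (limit + 1):(n - limit + 1)

  observed <- mean_b_minus_a(b_start)

  # Draw random start points and compute the statistic for each new case
  starts <- sample(candidates, number, replace = TRUE)
  distribution <- vapply(starts, mean_b_minus_a, numeric(1))

  list(
    observed = observed,
    distribution = distribution,
    p = mean(distribution >= observed)
  )
}

# Invented data: 20 measurements with a level shift at measurement 8
y <- c(5, 6, 4, 5, 6, 5, 4, 9, 10, 9, 11, 10, 9, 10, 11, 10, 9, 10, 11, 10)
res <- rand_test_sketch(y, b_start = 8, seed = 1)
res$observed
res$p
```

With a single case and only eleven admissible start points, the smallest reachable p-value is roughly 1/11, which is one reason why the test is usually applied to multiple baseline data.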

Figure 12.1: Illustration of a randomization test

12.1 Arguments of the rand_test() function

The statistic argument defines the statistic on which the comparison of the phases is based. The following comparisons are possible:

  • “Mean A-B”: Uses the difference between the mean of phase A and the mean of phase B. This is appropriate if a decrease of scores is expected for phase B.
  • “Mean B-A”: Uses the difference between the mean of phase B and the mean of phase A. This is appropriate if an increase of scores is expected for phase B.
  • “Mean |A-B|”: Uses the absolute value of the difference between the means of phases A and B.
  • “Median A-B”: The same as “Mean A-B”, but based on the median.
  • “Median B-A”: The same as “Mean B-A”, but based on the median.
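As a rough illustration of what the five options compute, the following base R sketch evaluates each statistic for two vectors of phase data. The helper phase_statistic and the example values are hypothetical and are not part of the package.

```r
# Hypothetical helper: compute one of the five statistics for the
# measurements of phase A (a) and phase B (b).
phase_statistic <- function(a, b, statistic = "Mean B-A") {
  switch(statistic,
    "Mean A-B"   = mean(a) - mean(b),
    "Mean B-A"   = mean(b) - mean(a),
    "Mean |A-B|" = abs(mean(a) - mean(b)),
    "Median A-B" = median(a) - median(b),
    "Median B-A" = median(b) - median(a),
    stop("Unknown statistic: ", statistic)
  )
}

# Invented data with one outlier in phase B
a <- c(4, 5, 6, 5)
b <- c(8, 9, 10, 9, 30)

phase_statistic(a, b, "Mean B-A")    # 8.2
phase_statistic(a, b, "Median B-A")  # 4
```

The outlier in phase B inflates the mean-based statistic but leaves the median-based one unaffected, which is the usual reason for choosing a median option.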

The number argument sets the sample size of the randomization distribution. The resolution of the p-value cannot be finer than 1/number (e.g., number = 100 yields p-values in steps of one percent). Default is number = 500. For faster processing use number = 100. For more precise p-values set number = 1000.

If complete is set to TRUE, the distribution is based on a complete permutation of all possible combinations of starting points. This setting overrides the number argument. The default setting is FALSE.

The limit argument sets the minimal number of data points per phase in the sample. The first number refers to the A-phase and the second to the B-phase (e.g., limit = c(5, 3)). If only one number is given, it is applied to both phases. Default is limit = 5.
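To see how the minimal phase length restricts the possible start points of phase B, consider the following base R sketch. The helper b_start_points is hypothetical, and the series length of 20 measurements is an assumption made for illustration. It would be consistent with the 1331 combinations reported for three cases in the example output of this chapter, since 11 start points per case give 11^3 = 1331 combinations.

```r
# Hypothetical helper: admissible phase B start points for a series of
# n measurements, given the minimal number of points per phase.
b_start_points <- function(n, limit = 5) {
  limit <- rep_len(limit, 2)   # a single number applies to both phases
  first <- limit[1] + 1        # phase A keeps at least limit[1] points
  last  <- n - limit[2] + 1    # phase B keeps at least limit[2] points
  if (first > last) integer(0) else first:last
}

starts <- b_start_points(20, limit = 5)
starts            # measurements 6 to 16
length(starts)    # 11 possible start points
length(starts)^3  # 1331 combinations for three such cases

length(b_start_points(20, limit = c(5, 3)))  # 13
```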

As an alternative to the limit parameter, startpoints defines exactly the possible start points of phase B (e.g., startpoints = 4:9 restricts the phase B start points to measurements 4 to 9). startpoints overrides the limit parameter.

If exclude.equal is set to FALSE, which is the default, values of the random distribution that are equal to the observed statistic are counted as conforming to the null hypothesis. That is, they decrease the probability of rejecting the null hypothesis (they increase the p-value). exclude.equal should be set to TRUE if you analyse a single case (not a multiple baseline data set) to reach sufficient power. Be aware, however, that this increases the chance of an alpha error.
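The effect of this setting can be illustrated with a small base R sketch. The distribution values, the observed statistic, and the helper p_value are invented for this example. The sketch assumes that the p-value is the share of random statistics that are greater than or equal to (or strictly greater than) the observed one, which follows the description above but is not taken from the package source.

```r
# Invented randomization distribution and observed statistic
distribution <- c(1.2, 2.5, 3.1, 3.1, 0.4, 2.0, 3.1, 1.8, 2.9, 0.7)
observed <- 3.1

# Hypothetical helper: share of random statistics counted against
# rejecting the null hypothesis
p_value <- function(distribution, observed, exclude.equal = FALSE) {
  if (exclude.equal) {
    mean(distribution > observed)   # ties are ignored
  } else {
    mean(distribution >= observed)  # ties count as null-hypothesis conform
  }
}

p_value(distribution, observed)                        # 0.3
p_value(distribution, observed, exclude.equal = TRUE)  # 0
```

Ties are frequent when a single case is analysed, because the observed start point is itself one of the few possible start points.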

If graph is set to TRUE, a histogram of the resulting distribution is plotted.

The phases argument takes a vector of two characters or numbers indicating the two phases that should be compared, e.g., phases = c("A", "C") or phases = c(2, 4) for comparing the second and the fourth phase. Phases can be combined by providing a list with two elements, e.g., phases = list(A = c(1, 3), B = c(2, 4)) will compare phases 1 and 3 (as A) against phases 2 and 4 (as B). Default is phases = c(1, 2).

12.2 Example

rand_test(exampleAB, graph = TRUE)

Randomization Test

Test for 3 cases.

Comparing phase 1 against phase 2 
Statistic:  Mean B-A 

Minimal length of each phase: A = 5 , B = 5 
Observed statistic =  20.55556 

Distribution based on a random sample of all 1331 possible combinations.
n   =  500 
M   =  18.54851 
SD  =  1.112389 
Min =  16.12222 
Max =  21.02897 

Probability of observed statistic based on distribution:
p   =  0.04 

Shapiro-Wilk Normality Test: W = 0.977; p = 0.000  (Hypothesis of normality rejected)

Probabilty of observed statistic based on the assumption of normality:
z = 1.8043, p = 0.0356 (single sided)
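The last two lines of the output can be retraced by hand from the values printed above. The following base R sketch assumes that z is the observed statistic standardized by the mean and standard deviation of the randomization distribution and that the single-sided p-value is the upper tail of the standard normal distribution. The variable names are chosen for this example.

```r
# Values copied from the output above
observed <- 20.55556
m <- 18.54851   # mean of the randomization distribution
s <- 1.112389   # standard deviation of the randomization distribution

z <- (observed - m) / s
p <- pnorm(z, lower.tail = FALSE)  # single-sided, upper tail

round(z, 4)  # 1.8043
round(p, 4)  # 0.0356
```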