Material SEM

sem

statistics

Author

Jürgen Wilbert

Published

July 16, 2023

Abstract

Material for a brief explanation of sem.

Create example data set

# Load the required library
library(dplyr)
library(lavaan)

# Create the example data frame
set.seed(122)  # For reproducibility
n <- 2000      # Number of samples

example_data <- data.frame(
  intercept = rep(3, n)) %>%
  mutate(
  X1 = rnorm(n, mean = 10, sd = 1),
  X2 = X1 + rnorm(n, mean = 15, sd = 1),
  X3 = X2 + rnorm(n, mean = 5, sd = 1),
  residual = rnorm(n, mean = 0, sd = 1)
)
# Let's add a linear relationship to create criteria variable Y

example_data <- example_data %>% 
  mutate(Y = intercept + 0.5 * X1 + 0.3 * X2 + 0.2 * X3 + residual) %>%
  relocate(Y)

example_data %>% slice(1:10)

Y	intercept	X1	X2	X3	residual
21.80215	3	11.310701	25.79129	31.81429	-0.9534433
21.97488	3	9.124147	24.77909	30.94406	0.7902634
21.18578	3	10.199524	24.39014	30.08077	-0.2471751
21.33084	3	10.465954	26.04352	31.80822	-1.0768354
19.58122	3	8.197943	23.33933	29.38427	-0.3964034
22.48321	3	11.448744	27.12427	32.35725	-0.8498953
23.04190	3	10.298854	25.69187	30.66430	1.0520537
23.15183	3	10.361812	26.26141	29.45456	1.2015870
20.93370	3	8.993060	25.73242	31.11644	-0.5058392
21.74071	3	9.743001	25.59073	31.35714	-0.0794387

Do a regression with the lm() function:

Code

fit_lm <- lm(Y ~ X1 + X2 + X3, data = example_data)
summary(fit_lm)


Call:
lm(formula = Y ~ X1 + X2 + X3, data = example_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3822 -0.6897  0.0199  0.7067  3.2161 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.96843    0.43415   6.837 1.07e-11 ***
X1           0.50581    0.03166  15.977  < 2e-16 ***
X2           0.29510    0.03150   9.369  < 2e-16 ***
X3           0.20252    0.02246   9.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.013 on 1996 degrees of freedom
Multiple R-squared:   0.56, Adjusted R-squared:  0.5593 
F-statistic: 846.8 on 3 and 1996 DF,  p-value: < 2.2e-16

Do the same regression with lavaan

Code

# Define the model
model <- '
  # Regress Y on X1, X2, and X3
  Y ~ X1 + X2 + X3
'

# Fit the model
fit <- sem(model, data = example_data)

# View the summary of the model
summary(fit)

lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         4

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Y ~                                                 
    X1                0.506    0.032   15.993    0.000
    X2                0.295    0.031    9.378    0.000
    X3                0.203    0.022    9.028    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y                 1.023    0.032   31.623    0.000

… add estimations for the intercept and standardized weights:

Code

fit <- sem(model, data = example_data, meanstructure = TRUE)
summary(fit, standardized = TRUE)

lavaan 0.6.15 ended normally after 16 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Y ~                                                                   
    X1                0.506    0.032   15.993    0.000    0.506    0.338
    X2                0.295    0.031    9.378    0.000    0.295    0.271
    X3                0.203    0.022    9.028    0.000    0.203    0.227

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                 2.968    0.434    6.844    0.000    2.968    1.946

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                 1.023    0.032   31.623    0.000    1.023    0.440

Adding a measurement structure to all variables (endogeneous and exogeneous):

Code

# Define the model
model <- '
  # Regress Y on X1, X2, and X3
  LY =~ Y
  LX1 =~ X1
  LX2 =~ X2
  LX3 =~ X3
  LY ~ LX1 + LX2 + LX3
'

# Fit the model
fit <- sem(model, data = example_data, meanstructure = TRUE)

# View the summary of the model
summary(fit, standardize = TRUE)

lavaan 0.6.15 ended normally after 36 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  LY =~                                                                 
    Y                 1.000                               1.525    1.000
  LX1 =~                                                                
    X1                1.000                               1.020    1.000
  LX2 =~                                                                
    X2                1.000                               1.398    1.000
  LX3 =~                                                                
    X3                1.000                               1.711    1.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  LY ~                                                                  
    LX1               0.506    0.032   15.993    0.000    0.338    0.338
    LX2               0.295    0.031    9.378    0.000    0.271    0.271
    LX3               0.203    0.022    9.028    0.000    0.227    0.227

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  LX1 ~~                                                                
    LX2               1.015    0.039   25.940    0.000    0.712    0.712
    LX3               1.035    0.045   22.821    0.000    0.593    0.593
  LX2 ~~                                                                
    LX3               1.932    0.069   28.095    0.000    0.807    0.807

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                21.477    0.034  629.821    0.000   21.477   14.083
   .X1                9.996    0.023  438.423    0.000    9.996    9.803
   .X2               25.007    0.031  799.771    0.000   25.007   17.883
   .X3               29.987    0.038  783.830    0.000   29.987   17.527
   .LY                0.000                               0.000    0.000
    LX1               0.000                               0.000    0.000
    LX2               0.000                               0.000    0.000
    LX3               0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                 0.000                               0.000    0.000
   .X1                0.000                               0.000    0.000
   .X2                0.000                               0.000    0.000
   .X3                0.000                               0.000    0.000
   .LY                1.023    0.032   31.623    0.000    0.440    0.440
    LX1               1.040    0.033   31.623    0.000    1.000    1.000
    LX2               1.955    0.062   31.623    0.000    1.000    1.000
    LX3               2.927    0.093   31.623    0.000    1.000    1.000

Model a new latent variable latent_y

Code

model <- '
  latent =~ X1 + X2 + X3
'

fit <- sem(model, data = example_data, std.lv = TRUE)

# View the summary of the model
summary(fit, standardize = TRUE)

lavaan 0.6.15 ended normally after 15 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  latent =~                                                             
    X1                0.738    0.020   36.119    0.000    0.738    0.723
    X2                1.377    0.025   55.303    0.000    1.377    0.984
    X3                1.403    0.033   42.483    0.000    1.403    0.820

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .X1                0.496    0.018   27.802    0.000    0.496    0.477
   .X2                0.061    0.030    2.045    0.041    0.061    0.031
   .X3                0.958    0.043   22.186    0.000    0.958    0.327
    latent            1.000                               1.000    1.000

A more complex example from the lavaan example data:

y1 Expert ratings of the freedom of the press in 1960

y2 The freedom of political opposition in 1960

y3 The fairness of elections in 1960

y4 The effectiveness of the elected legislature in 1960

y5 Expert ratings of the freedom of the press in 1965

y6 The freedom of political opposition in 1965

y7 The fairness of elections in 1965

y8 The effectiveness of the elected legislature in 1965

x1 The gross national product (GNP) per capita in 1960

x2 The inanimate energy consumption per capita in 1960

x3 The percentage of the labor force in industry in 1960

Code

myModel <- '
   # latent variables
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + y2 + y3 + y4
     dem65 =~ y5 + y6 + y7 + y8
   # regressions
     dem60 ~ ind60
     dem65 ~ ind60 + dem60
   # residual covariances
     y1 ~~ y5
     y2 ~~ y4 + y6
     y3 ~~ y7
     y4 ~~ y8
     y6 ~~ y8
'
fit <- sem(model = myModel, data  = PoliticalDemocracy)
summary(fit, standardize = TRUE)

lavaan 0.6.15 ended normally after 68 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        31

  Number of observations                            75

Model Test User Model:
                                                      
  Test statistic                                38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ind60 =~                                                              
    x1                1.000                               0.670    0.920
    x2                2.180    0.139   15.742    0.000    1.460    0.973
    x3                1.819    0.152   11.967    0.000    1.218    0.872
  dem60 =~                                                              
    y1                1.000                               2.223    0.850
    y2                1.257    0.182    6.889    0.000    2.794    0.717
    y3                1.058    0.151    6.987    0.000    2.351    0.722
    y4                1.265    0.145    8.722    0.000    2.812    0.846
  dem65 =~                                                              
    y5                1.000                               2.103    0.808
    y6                1.186    0.169    7.024    0.000    2.493    0.746
    y7                1.280    0.160    8.002    0.000    2.691    0.824
    y8                1.266    0.158    8.007    0.000    2.662    0.828

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  dem60 ~                                                               
    ind60             1.483    0.399    3.715    0.000    0.447    0.447
  dem65 ~                                                               
    ind60             0.572    0.221    2.586    0.010    0.182    0.182
    dem60             0.837    0.098    8.514    0.000    0.885    0.885

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .y1 ~~                                                                 
   .y5                0.624    0.358    1.741    0.082    0.624    0.296
 .y2 ~~                                                                 
   .y4                1.313    0.702    1.871    0.061    1.313    0.273
   .y6                2.153    0.734    2.934    0.003    2.153    0.356
 .y3 ~~                                                                 
   .y7                0.795    0.608    1.308    0.191    0.795    0.191
 .y4 ~~                                                                 
   .y8                0.348    0.442    0.787    0.431    0.348    0.109
 .y6 ~~                                                                 
   .y8                1.356    0.568    2.386    0.017    1.356    0.338

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.082    0.019    4.184    0.000    0.082    0.154
   .x2                0.120    0.070    1.718    0.086    0.120    0.053
   .x3                0.467    0.090    5.177    0.000    0.467    0.239
   .y1                1.891    0.444    4.256    0.000    1.891    0.277
   .y2                7.373    1.374    5.366    0.000    7.373    0.486
   .y3                5.067    0.952    5.324    0.000    5.067    0.478
   .y4                3.148    0.739    4.261    0.000    3.148    0.285
   .y5                2.351    0.480    4.895    0.000    2.351    0.347
   .y6                4.954    0.914    5.419    0.000    4.954    0.443
   .y7                3.431    0.713    4.814    0.000    3.431    0.322
   .y8                3.254    0.695    4.685    0.000    3.254    0.315
    ind60             0.448    0.087    5.173    0.000    1.000    1.000
   .dem60             3.956    0.921    4.295    0.000    0.800    0.800
   .dem65             0.172    0.215    0.803    0.422    0.039    0.039