• Home
  • Projects
  • Publications
  • Software
  • Graduate Students
  • Private Blog

Material SEM

sem
statistics
Author

Jürgen Wilbert

Published

July 16, 2023

Abstract
Material for a brief explanation of sem.
Create example data set
# Load the required library
library(dplyr)
library(lavaan)

# Create the example data frame
set.seed(122)  # For reproducibility
n <- 2000      # Number of samples

example_data <- data.frame(
  intercept = rep(3, n)) %>%
  mutate(
  X1 = rnorm(n, mean = 10, sd = 1),
  X2 = X1 + rnorm(n, mean = 15, sd = 1),
  X3 = X2 + rnorm(n, mean = 5, sd = 1),
  residual = rnorm(n, mean = 0, sd = 1)
)
# Let's add a linear relationship to create criteria variable Y

example_data <- example_data %>% 
  mutate(Y = intercept + 0.5 * X1 + 0.3 * X2 + 0.2 * X3 + residual) %>%
  relocate(Y)

example_data %>% slice(1:10)
Y intercept X1 X2 X3 residual
21.80215 3 11.310701 25.79129 31.81429 -0.9534433
21.97488 3 9.124147 24.77909 30.94406 0.7902634
21.18578 3 10.199524 24.39014 30.08077 -0.2471751
21.33084 3 10.465954 26.04352 31.80822 -1.0768354
19.58122 3 8.197943 23.33933 29.38427 -0.3964034
22.48321 3 11.448744 27.12427 32.35725 -0.8498953
23.04190 3 10.298854 25.69187 30.66430 1.0520537
23.15183 3 10.361812 26.26141 29.45456 1.2015870
20.93370 3 8.993060 25.73242 31.11644 -0.5058392
21.74071 3 9.743001 25.59073 31.35714 -0.0794387

Do a regression with the lm() function:

Code
fit_lm <- lm(Y ~ X1 + X2 + X3, data = example_data)
summary(fit_lm)

Call:
lm(formula = Y ~ X1 + X2 + X3, data = example_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3822 -0.6897  0.0199  0.7067  3.2161 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.96843    0.43415   6.837 1.07e-11 ***
X1           0.50581    0.03166  15.977  < 2e-16 ***
X2           0.29510    0.03150   9.369  < 2e-16 ***
X3           0.20252    0.02246   9.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.013 on 1996 degrees of freedom
Multiple R-squared:   0.56, Adjusted R-squared:  0.5593 
F-statistic: 846.8 on 3 and 1996 DF,  p-value: < 2.2e-16

Do the same regression with lavaan

Code
# Define the model
model <- '
  # Regress Y on X1, X2, and X3
  Y ~ X1 + X2 + X3
'

# Fit the model
fit <- sem(model, data = example_data)

# View the summary of the model
summary(fit)
lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         4

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Y ~                                                 
    X1                0.506    0.032   15.993    0.000
    X2                0.295    0.031    9.378    0.000
    X3                0.203    0.022    9.028    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y                 1.023    0.032   31.623    0.000

… add estimations for the intercept and standardized weights:

Code
fit <- sem(model, data = example_data, meanstructure = TRUE)
summary(fit, standardized = TRUE)
lavaan 0.6.15 ended normally after 16 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Y ~                                                                   
    X1                0.506    0.032   15.993    0.000    0.506    0.338
    X2                0.295    0.031    9.378    0.000    0.295    0.271
    X3                0.203    0.022    9.028    0.000    0.203    0.227

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                 2.968    0.434    6.844    0.000    2.968    1.946

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                 1.023    0.032   31.623    0.000    1.023    0.440

Adding a measurement structure to all variables (endogeneous and exogeneous):

Code
# Define the model
model <- '
  # Regress Y on X1, X2, and X3
  LY =~ Y
  LX1 =~ X1
  LX2 =~ X2
  LX3 =~ X3
  LY ~ LX1 + LX2 + LX3
'

# Fit the model
fit <- sem(model, data = example_data, meanstructure = TRUE)

# View the summary of the model
summary(fit, standardize = TRUE)
lavaan 0.6.15 ended normally after 36 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  LY =~                                                                 
    Y                 1.000                               1.525    1.000
  LX1 =~                                                                
    X1                1.000                               1.020    1.000
  LX2 =~                                                                
    X2                1.000                               1.398    1.000
  LX3 =~                                                                
    X3                1.000                               1.711    1.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  LY ~                                                                  
    LX1               0.506    0.032   15.993    0.000    0.338    0.338
    LX2               0.295    0.031    9.378    0.000    0.271    0.271
    LX3               0.203    0.022    9.028    0.000    0.227    0.227

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  LX1 ~~                                                                
    LX2               1.015    0.039   25.940    0.000    0.712    0.712
    LX3               1.035    0.045   22.821    0.000    0.593    0.593
  LX2 ~~                                                                
    LX3               1.932    0.069   28.095    0.000    0.807    0.807

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                21.477    0.034  629.821    0.000   21.477   14.083
   .X1                9.996    0.023  438.423    0.000    9.996    9.803
   .X2               25.007    0.031  799.771    0.000   25.007   17.883
   .X3               29.987    0.038  783.830    0.000   29.987   17.527
   .LY                0.000                               0.000    0.000
    LX1               0.000                               0.000    0.000
    LX2               0.000                               0.000    0.000
    LX3               0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Y                 0.000                               0.000    0.000
   .X1                0.000                               0.000    0.000
   .X2                0.000                               0.000    0.000
   .X3                0.000                               0.000    0.000
   .LY                1.023    0.032   31.623    0.000    0.440    0.440
    LX1               1.040    0.033   31.623    0.000    1.000    1.000
    LX2               1.955    0.062   31.623    0.000    1.000    1.000
    LX3               2.927    0.093   31.623    0.000    1.000    1.000

Model a new latent variable latent_y

Code
model <- '
  latent =~ X1 + X2 + X3
'

fit <- sem(model, data = example_data, std.lv = TRUE)

# View the summary of the model
summary(fit, standardize = TRUE)
lavaan 0.6.15 ended normally after 15 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

  Number of observations                          2000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  latent =~                                                             
    X1                0.738    0.020   36.119    0.000    0.738    0.723
    X2                1.377    0.025   55.303    0.000    1.377    0.984
    X3                1.403    0.033   42.483    0.000    1.403    0.820

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .X1                0.496    0.018   27.802    0.000    0.496    0.477
   .X2                0.061    0.030    2.045    0.041    0.061    0.031
   .X3                0.958    0.043   22.186    0.000    0.958    0.327
    latent            1.000                               1.000    1.000

A more complex example from the lavaan example data:

y1 Expert ratings of the freedom of the press in 1960

y2 The freedom of political opposition in 1960

y3 The fairness of elections in 1960

y4 The effectiveness of the elected legislature in 1960

y5 Expert ratings of the freedom of the press in 1965

y6 The freedom of political opposition in 1965

y7 The fairness of elections in 1965

y8 The effectiveness of the elected legislature in 1965

x1 The gross national product (GNP) per capita in 1960

x2 The inanimate energy consumption per capita in 1960

x3 The percentage of the labor force in industry in 1960

Code
myModel <- '
   # latent variables
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + y2 + y3 + y4
     dem65 =~ y5 + y6 + y7 + y8
   # regressions
     dem60 ~ ind60
     dem65 ~ ind60 + dem60
   # residual covariances
     y1 ~~ y5
     y2 ~~ y4 + y6
     y3 ~~ y7
     y4 ~~ y8
     y6 ~~ y8
'
fit <- sem(model = myModel, data  = PoliticalDemocracy)
summary(fit, standardize = TRUE)
lavaan 0.6.15 ended normally after 68 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        31

  Number of observations                            75

Model Test User Model:
                                                      
  Test statistic                                38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ind60 =~                                                              
    x1                1.000                               0.670    0.920
    x2                2.180    0.139   15.742    0.000    1.460    0.973
    x3                1.819    0.152   11.967    0.000    1.218    0.872
  dem60 =~                                                              
    y1                1.000                               2.223    0.850
    y2                1.257    0.182    6.889    0.000    2.794    0.717
    y3                1.058    0.151    6.987    0.000    2.351    0.722
    y4                1.265    0.145    8.722    0.000    2.812    0.846
  dem65 =~                                                              
    y5                1.000                               2.103    0.808
    y6                1.186    0.169    7.024    0.000    2.493    0.746
    y7                1.280    0.160    8.002    0.000    2.691    0.824
    y8                1.266    0.158    8.007    0.000    2.662    0.828

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  dem60 ~                                                               
    ind60             1.483    0.399    3.715    0.000    0.447    0.447
  dem65 ~                                                               
    ind60             0.572    0.221    2.586    0.010    0.182    0.182
    dem60             0.837    0.098    8.514    0.000    0.885    0.885

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .y1 ~~                                                                 
   .y5                0.624    0.358    1.741    0.082    0.624    0.296
 .y2 ~~                                                                 
   .y4                1.313    0.702    1.871    0.061    1.313    0.273
   .y6                2.153    0.734    2.934    0.003    2.153    0.356
 .y3 ~~                                                                 
   .y7                0.795    0.608    1.308    0.191    0.795    0.191
 .y4 ~~                                                                 
   .y8                0.348    0.442    0.787    0.431    0.348    0.109
 .y6 ~~                                                                 
   .y8                1.356    0.568    2.386    0.017    1.356    0.338

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.082    0.019    4.184    0.000    0.082    0.154
   .x2                0.120    0.070    1.718    0.086    0.120    0.053
   .x3                0.467    0.090    5.177    0.000    0.467    0.239
   .y1                1.891    0.444    4.256    0.000    1.891    0.277
   .y2                7.373    1.374    5.366    0.000    7.373    0.486
   .y3                5.067    0.952    5.324    0.000    5.067    0.478
   .y4                3.148    0.739    4.261    0.000    3.148    0.285
   .y5                2.351    0.480    4.895    0.000    2.351    0.347
   .y6                4.954    0.914    5.419    0.000    4.954    0.443
   .y7                3.431    0.713    4.814    0.000    3.431    0.322
   .y8                3.254    0.695    4.685    0.000    3.254    0.315
    ind60             0.448    0.087    5.173    0.000    1.000    1.000
   .dem60             3.956    0.921    4.295    0.000    0.800    0.800
   .dem65             0.172    0.215    0.803    0.422    0.039    0.039
 

© Jürgen Wilbert, last updated 2025-05-16