This document provides an overview of structural equation modeling (SEM) using AMOS. It defines key SEM concepts like latent variables, observed variables, path analysis, and model identification. It also explains how to specify and estimate a SEM model in AMOS, including how to draw path diagrams, name variables, set regression weights, and view output. Model fit is discussed along with potential issues like sample size. Confirmatory factor analysis and other SEM models like path analysis and latent growth models are also introduced.
2. Basic Concept
Partial vs. Part Correlation:
The correlation between Y and X1 after the effects of X2 have been removed from both Y and X1 is the partial correlation. If the effects of X2 are removed from X1 but not from Y, the result is the part (semipartial) correlation.
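To make the distinction concrete, here is a minimal numpy sketch (the data are simulated and the coefficients and variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)            # X1 correlated with X2
y = 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

def residuals(v, w):
    """Residuals of v after regressing it on w (with intercept)."""
    W = np.column_stack([np.ones_like(w), w])
    beta, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ beta

# Part (semipartial): X2 removed from X1 only.
r_part = np.corrcoef(y, residuals(x1, x2))[0, 1]
# Partial: X2 removed from both Y and X1.
r_partial = np.corrcoef(residuals(y, x2), residuals(x1, x2))[0, 1]
print(f"part r = {r_part:.3f}, partial r = {r_partial:.3f}")
```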
3. Interpretation of Part Correlation
The squared part correlation is the unique proportion of total variance explained by that predictor.
The sum of squared part correlations does NOT equal R-squared, because the predictors share overlapping variance.
Multicollinearity: the existence of substantial correlation among a set of independent variables, which can be screened for as sketched below.
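One common screen for multicollinearity is the variance inflation factor (VIF); a small sketch using statsmodels, with simulated data as an assumption:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = pd.DataFrame({"const": 1.0, "x1": x1, "x2": x2, "x3": x3})

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on the remaining predictors; values far above ~10 are a common red flag.
for j, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, j), 1))
```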
5. Structural Equation Modeling
Structural equation modeling (SEM) is, conceptually, a combination of statistical techniques such as factor analysis and multiple regression.
The purpose of SEM is to examine a set of relationships between one or more Independent Variables (IVs) and one or more Dependent Variables (DVs).
6. Goals of SEM
To understand the patterns of
correlation/covariance among a set of variables.
To explain as much of their variance as possible with the specified model.
8. How is SEM different from the traditional approach?
Multiple equations can be estimated
simultaneously
Non-recursive models are possible
Correlations among disturbances are possible
Formal specification of a model is required
Measurement and structural relations are separated, with relations specified among latent variables rather than measured variables (see the sketch below)
Assessing model fit is not as straightforward
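As a contrast with fitting separate regressions, the sketch below shows how a full SEM is specified in a single step. It uses the semopy package for Python as a stand-in for AMOS; the package choice, model, and variable names are assumptions for illustration:

```python
import semopy

# Hypothetical model: two latent variables, each with three indicators.
# The measurement part (=~) and the structural part (~) are declared
# together and estimated simultaneously, and a residual covariance (~~)
# allows two disturbances to correlate.
desc = """
# measurement relations
Support =~ s1 + s2 + s3
Burnout =~ b1 + b2 + b3
# structural relation among latent variables
Burnout ~ Support
# correlated disturbances
s1 ~~ s2
"""

model = semopy.Model(desc)
# model.fit(df) would estimate every equation in one pass, given a
# pandas DataFrame df with columns s1..s3 and b1..b3.
```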
9. Types of SEM models
Path analysis
Confirmatory factor analysis
Structural regression model
Latent change model
10. Approach to SEM analysis
Review the relevant theory and research literature to support model
specification
Specify a model (e.g., diagram, equations)
Determine model identification
Collect data
Conduct preliminary descriptive statistical analysis
Estimate parameters in the model (Model Estimation)
Assess model fit
Respecify the model if needed
Interpret and present results (a minimal script covering these steps is sketched below).
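A minimal script mirroring these steps, again assuming semopy and a hypothetical data file:

```python
import pandas as pd
import semopy

# Steps 1-3: specify a theory-based model (hypothetical specification)
# and check its identification before collecting data.
desc = """
F1 =~ y1 + y2 + y3
F2 =~ y4 + y5 + y6
F2 ~ F1
"""

# Steps 4-5: collect data and run preliminary descriptives.
df = pd.read_csv("data.csv")   # hypothetical file with columns y1..y6
print(df.describe())

# Step 6: estimate the parameters.
model = semopy.Model(desc)
model.fit(df)

# Steps 7-9: assess fit (see the sketch under slide 51), respecify if
# justified, then interpret and present the estimates.
print(model.inspect())
```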
11. Components of SEM
Latent variables, factors, constructs
Observed variables, measures, indicators,
manifest variables
Direction of influence, relationship from one variable
to another
Association not explained within the model
12. Important Definitions
A measured variable (MV) is a variable that is directly measured.
A latent variable can be defined as whatever its multiple indicators have in common with each other. It is not measured directly.
Relationships between variables are of three types: association (correlation, covariance), direct effect, and indirect effect.
13. Path Analysis
An extension of multiple regression, allowing us to consider more than one dependent variable at a time and, more importantly, allowing variables to be both dependent and independent variables.
B is both a dependent and an independent variable (a mediating variable).
14. Path Analysis
Once the data are available, conducting a path analysis is straightforward:
Draw a path diagram according to the theory.
Conduct one or more regression analyses.
Compare the regression estimates (B or Beta) to the theoretical assumptions or to other studies (see the sketch below).
If needed, modify the model by removing or adding connecting paths between the variables and redo stages 2 and 3.
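A sketch of stages 2 and 3 using statsmodels, with the A → B → C mediation structure from slide 13 (the data are simulated and the path values are assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
A = rng.normal(size=n)
B = 0.5 * A + rng.normal(size=n)            # B is dependent here...
C = 0.6 * B + 0.2 * A + rng.normal(size=n)

# One regression per endogenous variable:
fit_B = sm.OLS(B, sm.add_constant(A)).fit()                        # ...and independent here
fit_C = sm.OLS(C, sm.add_constant(np.column_stack([A, B]))).fit()

# Compare the estimated path coefficients with the theoretical values.
print(fit_B.params)
print(fit_C.params)
```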
24. Drawing in AMOS (Draw observed variable)
Move the cursor to the place where you want to place an observed variable and click your mouse. Drag the box to adjust its size.
To draw a latent variable, click the Draw unobserved variable icon and proceed the same way.
25. Direct effect
Click the direct-effect path icon, then click the independent variable and drag to the dependent variable.
26. Touch up a variable
This gives the model a neat look. Click the Touch up icon, then click the observed or unobserved variable you want to tidy.
27. Add unique variable
Click the icon, then click the box or circle to which you want to add an error or unique variable. (When you use the Unique Variable button, the path coefficient is automatically constrained to 1.)
28. Naming the variables in AMOS
Click the icon to list all the variables, then drag a variable from the list and drop it directly onto an observed variable.
29. Naming the variables in AMOS
Double-click an object in the path diagram; the Object Properties dialog box appears. Click the Text tab and enter the name of the variable in the Variable name field.
31. Performing the analysis in AMOS
For our example, check the Minimization history, Standardized estimates, and Squared multiple correlations boxes.
To run AMOS, click the Calculate estimates icon on the toolbar.
AMOS will prompt you to save the problem to a file.
32. Results
When AMOS has completed the calculations, you have two options for viewing the output: text output and graphics output.
For text output, click the View Text icon on the toolbar.
34. Viewing the graphics output in AMOS
To view the graphics output, click the View output icon next to the drawing area.
Choose to view either unstandardized or standardized estimates by clicking one or the other in the Parameter Formats panel next to the drawing area.
35. Standardized vs. Unstandardized
Standardized coefficients can be compared across variables within a model.
Standardized coefficients reflect not only the strength of the relationship but also the variances and covariances of variables included in the model, as well as the variance of variables not included in the model and subsumed under the error term.
Standardized parameter estimates are transformations of unstandardized estimates that remove scaling and can be used for informal comparisons of parameters throughout the model.
36. Standardized vs. Unstandardized
Unstandardized parameter estimates retain the scaling information of the variables and can only be interpreted with reference to the scales of those variables.
A correlation matrix standardizes values and loses the metric of the scales.
Therefore, when a model is fit to a correlation matrix, the standardized and unstandardized estimates are the same.
39. Improving the appearance
of the path diagram
You can change the appearance of your path diagram by
moving objects around
To move an object, click on the Move icon on the toolbar.
You will notice that the picture of a little moving truck
appears below your mouse pointer when you move into the
drawing area. This lets you know the Move function is
active.
Then click and hold down your left mouse button on the
object you wish to move. With the mouse button still
depressed, move the object to where you want it, and let go
of your mouse button. Amos Graphics will automatically
redraw all connecting arrows.
40. Improving the appearance of the path diagram
If you make a mistake, there are three icons on the toolbar to quickly bail you out: the Erase, Undo, and Redo functions.
To erase an object, simply click the Erase icon and then click the object you wish to erase.
To undo your last drawing activity, click the Undo icon and your last activity disappears. Each time you click Undo, a further previous activity is removed.
If you change your mind, click Redo to restore a change.
41. SEM can be affected by:
The requirement of a sufficient sample size. A desirable goal is a 20:1 ratio of subjects to model parameters (e.g., a model with 13 free parameters would call for roughly 260 cases). A 10:1 ratio may be a more realistic target. If the ratio is less than 5:1, the estimates may be unstable.
Measurement instruments
Multivariate normality
Parameter identification
Outliers
Missing data
Interpretation of model fit indices
42. Model Identification
A model is identified if:
It is theoretically possible to derive a unique estimate
of each parameter
The number of equations is equal to the number of
parameters to be estimated
It is fully recursive (no feedback loops)
43. Model identification
A model is over-identified if:
It has fewer parameters than observations.
There are more equations than are necessary for the purpose of estimating the parameters.
44. Model identification
A model is under-identified (not identified) if:
It is not theoretically possible to derive a unique estimate of each parameter.
There is insufficient information to obtain a determinate solution for the parameters; an infinite number of solutions may be obtained.
45. Model identification
Determine the number of observations your data provide (the distinct variances and covariances).
Formula: v(v+1)/2, where v = the number of observed variables.
Using this formula shows whether you are trying to estimate more parameters than the existing data allow (see the sketch below).
You do not want the model to be JUST identified (this precludes fit indices) or UNDER-identified; you are looking for it to be OVER-identified.
Being over-identified essentially means that the data supply more observations than the number of parameters you are trying to estimate.
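A small worked example of the counting rule (the parameter count of 13 describes a hypothetical model, not one from the slides):

```python
def n_observations(v: int) -> int:
    """Distinct variances and covariances among v observed variables."""
    return v * (v + 1) // 2

v = 6            # hypothetical: six observed variables
n_params = 13    # hypothetical count of free parameters
df_model = n_observations(v) - n_params
print(n_observations(v), df_model)   # 21 observations, df = 8
# df > 0: over-identified; df = 0: just identified; df < 0: under-identified
```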
50. Model Estimation
Maximum likelihood
Generalized and unweighted least squares
Two-stage and three-stage least squares
51. Model fit
Model fit means the sample data are consistent with the implied model.
The smaller the discrepancy between the implied model and the sample data, the better the fit.
There are many fit indices; none is infallible (though some are better than others). A sketch for computing several of them follows.
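A hedged sketch of obtaining common fit indices with semopy's calc_stats (the package choice is an assumption, and the simulated one-factor data are purely illustrative):

```python
import numpy as np
import pandas as pd
import semopy

# Simulate one-factor data with four indicators so the model is
# over-identified and the example runs end to end.
rng = np.random.default_rng(3)
f = rng.normal(size=300)
df = pd.DataFrame({f"y{i}": 0.7 * f + rng.normal(scale=0.5, size=300)
                   for i in range(1, 5)})

model = semopy.Model("F1 =~ y1 + y2 + y3 + y4")
model.fit(df)

# calc_stats reports chi-square, RMSEA, CFI, TLI, AIC/BIC, and more;
# consult several indices together, since none is infallible.
print(semopy.calc_stats(model).T)
```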
54. Model Respecification
What if the model does NOT fit?
Model trimming and building:
Lagrange multiplier test (add parameters)
Wald test (drop parameters)
Empirical vs. theoretical respecification: what justification do you have to respecify?
Consider equivalent models.
55. Confirmatory factor analysis
How it differs from the more commonly encountered forms of factor analysis.
What is exploratory factor analysis (FA)? You have many variables and want to examine whether they can be explained by a smaller number of factors.
There is no a priori hypothesis (it is impossible to even indicate a hunch to the program) as to which variables will cluster together on which factor.
56. Confirmatory factor analysis
The major difference is that an a priori hypothesis is essential: you specify which variables group together as manifestations of an underlying construct, and then test whether the model fits.
As with path analysis, it can be helpful to draw the hypothesized relations in a diagram.
57. CFA is not model building
With CFA, you stipulate where you think the variables should load. The program then simply tells you whether your model fits the data.
If it does not fit, there are few clues as to how to shuffle the variables around to make the model fit better.
Note: even if the model does fit, that does not guarantee that a new arrangement of variables would not fit even better. Therefore, one must use theory, knowledge, or previous research to guide the model, rather than relying on statistical criteria.
58. Scaling
Two ways to give a factor a scale (see the sketch below):
Constrain one of the factor loadings to 1 (that variable is called the reference variable; the factor then takes on a scale related to the explained variance of the reference variable).
Fix the factor variance to a constant (e.g., 1), so that all factor loadings are free parameters.
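A minimal CFA sketch in semopy's lavaan-style syntax (factor and indicator names are hypothetical). By default, semopy, like lavaan, applies the first scaling strategy and fixes each factor's first loading to 1:

```python
import semopy

# Two-factor CFA: the a priori hypothesis fixes which indicators load
# on which factor; omitted cross-loadings are constrained to zero.
desc = """
Verbal =~ v1 + v2 + v3
Spatial =~ s1 + s2 + s3
Verbal ~~ Spatial
"""

# Strategy 1 (semopy's default, like lavaan): the first loading of each
# factor (v1, s1) is fixed to 1, making it the reference variable.
model = semopy.Model(desc)
# Strategy 2 would instead fix each factor variance to 1 and free all
# loadings. After model.fit(df), model.inspect(std_est=True) adds
# standardized estimates alongside the unstandardized ones (semopy 2
# API; an assumption).
```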
64. Unstandardized solution
Factor loadings = unstandardized regression coefficients.
Unanalyzed associations between factors or errors = covariances.
Standardized solution
Unanalyzed associations between factors or errors = correlations.
Factor loadings = standardized regression coefficients (structure coefficients).
The square of a factor loading = the proportion of explained (common) indicator variance, R² (squared multiple correlation).
65. Structural regression model
Includes both observed and latent variables.
Assesses relationships among both observed and latent variables.
66. Latent Growth Analysis
Can change in responses be tracked over time? Latent growth curve analysis addresses this question (a minimal sketch follows).
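A hedged sketch of a linear latent growth specification in lavaan-style syntax; whether semopy accepts fixed loadings written this way is an assumption, and the repeated measures t1–t4 are hypothetical:

```python
# Lavaan-style description of a linear latent growth model for four
# repeated measures t1..t4. The intercept factor loads 1 on every
# occasion; the slope loadings 0, 1, 2, 3 encode linear time.
desc = """
icept =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
slope =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
"""
# Assuming semopy parses fixed loadings written this way,
# semopy.Model(desc).fit(df) would estimate average growth (the slope
# mean) and individual differences around it (the factor variances).
```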