Slides presented to the Greater Cleveland R User Group on the statistical concept of mediation, using the lavaan package for structural equation modeling.
1. Path Analysis and Mediation in lavaan
George Mount - george@georgejmount.com - @gjmount
2. LAtent VAriable ANalysis
lavaan is an R package for structural equation modeling, currently distributed as beta software.
In this example we will examine the mediating effect of self-esteem on the relationship between grades and happiness.
#call lavaan
library(lavaan)
## This is lavaan 0.5-23.1097
## lavaan is BETA software! Please report any bugs.
#read file, name columns
myData <- read.csv('http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv') #URL truncated in the transcript; filename completed from the UVA Library StatLab mediation tutorial this example follows
colnames(myData) <- c("grades","selfesteem","happiness")
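A quick sanity check, not in the original slides, to confirm the data loaded as expected:
#inspect the first rows and the sample size (the summary output later reports 100 observations)
head(myData)
nrow(myData)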
3. In classical statistics, a mediation model is one in which the independent variable influences the mediator (path A), which in turn influences the dependent variable (path B). Under full mediation, the direct effect from the independent to the dependent variable (path C) becomes insignificant once the mediator is accounted for; under partial mediation it is merely reduced.
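In equation form (a sketch using the same a, b, c labels as the lavaan model on the next slide):
selfesteem = a*grades + e1
happiness = c*grades + b*selfesteem + e2
The indirect effect of grades on happiness is a*b, and the total effect is c + a*b.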
4. First we must specify the model. In lavaan the model syntax is written as a character string (in quotation marks). The usual ~ operator denotes a regression, and coefficients are labeled (a, b, c) so that derived quantities such as the indirect effect can be defined with the := operator.
set.seed(1234)
med.model <- '#direct effect
happiness ~ c * grades
#mediators
happiness ~ b * selfesteem
selfesteem ~ a * grades
#indirect effects
indirect := a*b
#direct effects
direct := c
#total effects
total := c + (a*b)'
5. To run the model, we use the sem function.
fit <- sem(med.model, data = myData)
Then we look at the results. We will choose to look at the standardized estimates and the model fit statistics.
6. summary(fit, standardized=T, fit.measures = TRUE, rsquare = TRUE)
## lavaan (0.5-23.1097) converged normally after 14 iterations
##
## Number of observations 100
##
## Estimator ML
## Minimum Function Test Statistic 0.000
## Degrees of freedom 0
## Minimum Function Value 0.0000000000000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 77.413
## Degrees of freedom 3
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -576.601
## Loglikelihood unrestricted model (H1) -576.601
##
## Number of free parameters 5
## Akaike (AIC) 1163.203
## Bayesian (BIC) 1176.229
## Sample-size adjusted Bayesian (BIC) 1160.438
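A common robustness check for the indirect effect, not shown in the original deck, is to bootstrap the standard errors rather than rely on the default delta-method (Sobel-type) standard error. A minimal sketch using lavaan's built-in bootstrap:
#refit with bootstrapped standard errors (1000 draws) and request
#bias-corrected bootstrap confidence intervals for all parameters
fit.boot <- sem(med.model, data = myData, se = "bootstrap", bootstrap = 1000)
parameterEstimates(fit.boot, boot.ci.type = "bca.simple")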
9. Compare this to the results of regressing happiness on grades alone.
set.seed(1234)
med.model.b <- '#direct effect
happiness ~ c * grades
#direct effects
direct := c'
fit.b <- sem(med.model.b, data = myData)
10. set.seed(1234)
med.model.b <- '#direct effect
happiness ~ c * grades
#direct effects
direct := c'
fit.b <- sem(med.model.b, data = myData)
summary(fit.b, standardized=T, fit.measures = TRUE, rsquare = TRUE)
## lavaan (0.5-23.1097) converged normally after 11 iterations
##
## Number of observations 100
##
## Estimator ML
## Minimum Function Test Statistic 0.000
## Degrees of freedom 0
## Minimum Function Value 0.0000000000000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 12.185
## Degrees of freedom 1
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 1.000
11. ## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -403.548
## Loglikelihood unrestricted model (H1) -403.548
##
## Number of free parameters 2
## Akaike (AIC) 811.096
## Bayesian (BIC) 816.307
## Sample-size adjusted Bayesian (BIC) 809.990
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent Confidence Interval 0.000 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## happiness ~