This document provides an overview of structural equation modeling (SEM) techniques. It discusses key SEM concepts like latent constructs, measurement models, structural models, and model estimation. The document also covers advanced SEM topics such as comparing covariance-based and variance-based approaches, handling formative versus reflective measures, and analyzing moderators and mediators. Overall, the document serves as a guide for researchers to understand and apply quantitative data analysis methods using SEM.
5. When should you use…
1. Descriptive statistics?
2. Regression models?
3. Analysis of variance models?
4. Time-series models?
5. Structural equation models?
Recap: Which statistical approach for which question?
6. A so-called second generation data analysis method
First generation data analysis methods include techniques such as regression and (multivariate) analysis of (co-)variance (ANOVA).
They are characterized by their shared limitation of being able to analyse only one layer of linkages between independent and dependent variables at a time.
Second generation methods allow for the simultaneous analysis of multiple independent and dependent variables and encourage confirmatory rather than exploratory analysis.
Structural Equation Modelling
7. Complex research models:
Multiple associations between multiple independent and multiple dependent variables
Usually also mediating and/or moderating variables present
▪ Latent concepts: multi-dimensional constructs with several underlying dimensions
Satisfaction, usefulness, attitude, etc.
Constructs that have multiple measures
Often measured with perceptual (self-report) data
Often: survey research, but also experiments and other designs
When do we use SEM?
8. Abstractions about a phenomenon (e.g. usefulness, time, satisfaction, enjoyment) that are latent in that they relate to a real thing but do not have a tangible existence:
Thus they cannot be measured directly.
They have indicators associated with them:
Measures are our approximations to latent constructs, our empirical indicators that allow us to ‘grasp’ the latent construct.
1+ measure required per construct dimension (also called substratum)
Typically multiple items, because most constructs are indeed complex concepts that have multiple domains of meaning.
Latent constructs
13. SEM Overview
Four phases of analysis
1. (Descriptive statistics)
2. Measurement model estimation
3. Structural model estimation
4. Mediation/moderation/supplementary analyses
14. Demographic variable p-value
Type of organisation .206
Size of organisation .436
Size of modelling team .305
Country of origin .100
Years of experience in process modelling overall .346
Months of experience in process modelling with BPMN .639
Number of BPMN models created .345
Type of training .784
Use of modelling tool .060
Use of modelling guidelines .311
Use of BPMN constructs .542
Descriptive Statistics
e.g., assessment of non-response error
• Chi-square test of early versus late survey respondents
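As a concrete illustration of the non-response check above, the sketch below runs a chi-square test comparing early and late respondents on one categorical demographic variable. The file name and column names are hypothetical; pandas and scipy are assumed to be available.

```python
# Minimal sketch of a non-response bias check: compare early vs. late respondents
# on a categorical demographic variable. Column names ("wave", "org_type") are
# hypothetical; any categorical demographic from the survey could be used.
import pandas as pd
from scipy.stats import chi2_contingency

survey = pd.read_csv("survey_responses.csv")                  # hypothetical data file
crosstab = pd.crosstab(survey["wave"], survey["org_type"])    # early vs. late by organisation type
chi2, p_value, dof, expected = chi2_contingency(crosstab)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")
# A non-significant p-value (as in the table above) suggests early and late
# respondents do not differ systematically on this variable.
```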
15. Model specification
Specification of an a-priori research model with theoretical constructs and hypothesized relationships between them.
Model identification
Estimation of unknown parameters (such as factor loadings, path coefficients or explained variance) based on observed correlations or covariances.
Model estimation
Finding the one set of model parameters that best fits the data.
Model fit testing
Assessment of how well a model fits the data.
Model re-specification
Improvement of either model parsimony or fit.
Structural Model Estimation: 5-stage process
17. Examines whether the model of what we measure fits the properties of the data we collected.
Often confused with confirmatory factor analysis.
The actual test criterion (for reflective models) is goodness of fit.
Measurement Model Estimation
Goodness-of-fit statistics for the measurement model (GFI = 0.91, NFI = 0.97, NNFI = 0.98, CFI = 0.98, SRMR = 0.05, RMSEA = 0.06, χ² = 436.71, df = 155) suggest good fit of the measurement model to the data set, considering the approximate benchmarks suggested by Im and Grover (2004).
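For readers who prefer a code-level view, a rough sketch of specifying, estimating, and fit-testing such a measurement-plus-structural model is shown below. It uses the third-party semopy package and lavaan-style model syntax, neither of which is mentioned on the slides, and the data file and item names are hypothetical.

```python
# Sketch of model specification, estimation, and fit assessment for a simple
# TAM-style model. Assumes the third-party 'semopy' package (pip install semopy);
# the CSV file and item names are hypothetical.
import pandas as pd
import semopy

model_desc = """
# measurement model (reflective indicators)
PU   =~ PU1 + PU2 + PU3
PEOU =~ PEOU1 + PEOU2 + PEOU3
ITU  =~ ITU1 + ITU2 + ITU3
# structural model (hypothesized paths)
PU  ~ PEOU
ITU ~ PU + PEOU
"""

data = pd.read_csv("tam_survey.csv")   # one row per respondent, one column per item
model = semopy.Model(model_desc)
model.fit(data)

print(model.inspect())            # loadings and path coefficients
print(semopy.calc_stats(model))   # fit indices (chi-square, CFI, RMSEA, ...); names vary by version
```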
18. Assessment of the reliability and validity of the scales used.
Tests
Uni-dimensionality: a construct is uni-dimensional if its constituent items represent one underlying trait.
Reliability and composite reliability: reliability is defined as the degree to which scale items are free from error and, therefore, yield consistent results.
Convergent validity: convergent validity tests if measures that should be related are in fact related.
Discriminant validity: discriminant validity refers to the degree to which items of different constructs are unique from each other.
Measurement Model Estimation for Reflective Measures
19. Validation via a standard set of indices (e.g., Fornell and Larcker, 1981)
Uni-dimensionality:
Cronbach’s Alpha (α) > 0.7
Reliability:
Cronbach’s Alpha (α) > 0.8
Composite reliability (ρc) > 0.5
Convergent validity:
Average variance extracted (AVE) > 0.5
Indicator factor loadings (λ) > 0.6
Indicator factor loadings significant at p < 0.05
Composite reliability (ρc) > 0.8
Discriminant validity:
AVE should exceed the squared correlations between each of the constructs
Measurement Model Estimation
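The indices above are simple functions of item scores and standardized loadings. The sketch below (plain numpy/pandas, with simulated items and illustrative loadings that are not from the slides) shows how each is typically computed.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items measuring one construct."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def composite_reliability(loadings: np.ndarray) -> float:
    """Composite reliability (rho_c) from standardized indicator loadings."""
    return loadings.sum() ** 2 / (loadings.sum() ** 2 + (1 - loadings ** 2).sum())

def average_variance_extracted(loadings: np.ndarray) -> float:
    """AVE: mean squared standardized loading of a construct's indicators."""
    return (loadings ** 2).mean()

# Hypothetical example: three indicators of one reflective construct.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
items = pd.DataFrame({f"x{i}": 0.8 * latent + rng.normal(scale=0.6, size=200) for i in (1, 2, 3)})
loadings = np.array([0.82, 0.79, 0.85])   # e.g., standardized loadings from the measurement model

print(f"alpha = {cronbach_alpha(items):.2f}, "
      f"CR = {composite_reliability(loadings):.2f}, "
      f"AVE = {average_variance_extracted(loadings):.2f}")
```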
23. Structural Model Estimation: Results Reporting
[Path diagram among Perceived Ease of Use, Perceived Usefulness, and Intention to Continue to Use, with two estimates reported per value: R² = 0.282 / 0.252 for Perceived Usefulness, R² = 0.310 / 0.317 for Intention to Continue to Use, and path coefficients 0.531, 0.253, 0.572, 0.281, 0.502, 0.563, all marked ***; legend: *** p < 0.001, ** p < 0.01, * p < 0.05, ns = non-significant.]
24. SEM Exercise based on
Recker, J. (2010): Explaining Usage of Process Modeling Grammars: Comparing Three Theoretical Models in the Study of Two Grammars. Information & Management, Vol. 47, No. 5-6, pp. 316-324.
Freely available from http://eprints.qut.edu.au/34162/
Let’s do it.
26. PLS versus LISREL (or AMOS)
Reflective vs formative models
Moderator analysis & mediation analysis
Missing data
SEM: Advanced matters for discussion
27. Or: correlation- versus covariance-based SEM
What are current beliefs?
What are strengths?
What are weaknesses?
1. PLS versus LISREL
28. Non-normal data (50%), small sample size (46%), formative measures (33%), prediction = research objective (28%), complex models (13%), categorical variables (13%).
Average PLS sample size is 211, compared to 246 for CB-SEM. But 25% had less than 100 observations, and 9% did not meet recommended sample size criteria.
No studies report skewness or kurtosis.
42% reflective only; 6% formative only; 40% mixed; 12% no indication.
Reported Reasons for using PLS in Marketing:
29. Reported Reasons for using PLS in IS:
Ringle, C.M., M. Sarstedt and D.W. Straub, “Editor's Comments: A Critical Look at the Use of PLS-SEM in MIS Quarterly,” MIS Quarterly, 2012, 36:1, pp. iii-xiv.
30. Reported PLS-SEM Model Types in IS:
Ringle, C.M., M. Sarstedt and D.W. Straub, “Editor's Comments: A Critical Look at the Use of PLS-SEM in MIS Quarterly,” MIS Quarterly, 2012, 36:1, pp. iii-xiv.
31. Criteria | Variance-Based Modeling (e.g. SmartPLS, PLS Graph) | Covariance-Based Modeling (e.g. LISREL, AMOS, Mplus)
Objective | Prediction oriented | Parameter oriented
Distribution assumptions | Non-parametric | Normal distribution (parametric)
Required sample size | Small (min. 30–100) | High (min. 100–800)
Model complexity | Large models OK | Large models problematic (50+ indicator variables)
Parameter estimates | Potential bias | Stable, if assumptions met
Indicators per construct | One to two OK; large number OK | Typically 3–4 minimum to meet identification requirements
Statistical tests for parameter estimates | Inference requires jackknifing or bootstrapping | Assumptions must be met
Measurement model | Formative and reflective indicators OK | Typically only reflective indicators
Goodness-of-fit measures | None | Many
32. 2. Formative vs reflective models
A construct could be measured reflectively or formatively.
Constructs are not necessarily (inherently) reflective or formative.
33. Formative vs reflective: Illustration
[Graphic courtesy of Robert Sainsbury, Mississippi State University]
34. Formative vs. Reflective
Let’s take firm performance as an example.
1. We can create a reflective scale that measures top managers’ views of how well the firm is performing.
These scale items can be interchangeable, and in this way let the researcher assess the reliability of the measures in reflecting the construct.
2. Or we can create a set of metrics for firm performance that measure disparate elements such as ROI, profitability, return on equity, market share, etc.
These items are not interchangeable and, thus, are formative.
Formative vs reflective: Example
35. Reliability or internal consistency, as traditionally measured, is not a factor in formative constructs. We want multicollinearity and unidimensionality in reflective constructs; formative constructs are destabilized when this occurs. Items may need to be eliminated if they are redundant.
Decomposed models or indices can change the meaning of the theoretical relationship.
Consider the theoretical implications (not just the empirical ones).
Even if empirical results are weak, certain (formative) measures may theoretically be (very/not) important.
Unclear how to handle.
Debates about validation and measurement remain ongoing.
Some issues with formative measures
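One concrete way to examine the multicollinearity concern raised above for formative indicators is to compute variance inflation factors (VIFs). VIF is not prescribed on the slides, so treat this as an assumed, common diagnostic; the indicator names echo the firm-performance example and the data are simulated.

```python
# Sketch of a multicollinearity check for formative indicators via VIF.
# Data and column names are hypothetical (firm-performance indicators).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
base = rng.normal(size=100)
indicators = pd.DataFrame({
    "roi":           base + rng.normal(scale=0.5, size=100),
    "profitability": base + rng.normal(scale=0.5, size=100),
    "market_share":  rng.normal(size=100),
})

X = sm.add_constant(indicators)
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)   # values well above roughly 3-5 flag redundant (collinear) formative items
```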
36. 3. Mediation vs Moderation
[Two path diagrams involving Perceived Usefulness, Perceived Ease of Use, and Intention to Continue to Use, contrasting a mediation structure with a moderation structure.]
40. Numbers needed
a = raw (unstandardized) regression coefficient for the association between the IV and the mediator.
sa = standard error of a.
b = raw coefficient for the association between the mediator and the DV (when the IV is also a predictor of the DV).
sb = standard error of b.
The Sobel test works well only in large samples. A better alternative is bootstrapping of the raw data:
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36, 717-731.
Sobel Mediation Test
Sobel, M. E. (1982). Asymptotic intervals for indirect effects in structural equations models. In S. Leinhart (Ed.), Sociological methodology 1982 (pp. 290-312). San Francisco: Jossey-Bass.
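From these four numbers the Sobel statistic is z = a*b / sqrt(b^2*sa^2 + a^2*sb^2), compared against the standard normal distribution. A minimal sketch, with made-up coefficients:

```python
# Sobel test for an indirect (mediated) effect a*b; the function name and
# example values are illustrative, not taken from the slides.
import math
from scipy.stats import norm

def sobel_test(a, sa, b, sb):
    """Sobel z-statistic and two-tailed p-value for the indirect effect a*b."""
    z = (a * b) / math.sqrt(b**2 * sa**2 + a**2 * sb**2)
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p

# Hypothetical coefficients: IV -> mediator (a, sa) and mediator -> DV (b, sb)
z, p = sobel_test(a=0.42, sa=0.10, b=0.35, sb=0.08)
print(f"Sobel z = {z:.3f}, p = {p:.4f}")
```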
42. Procedure: Zhao et al. (2010)
Zhao, X., Lynch Jr., J.G., and Chen, Q. “Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis,” The Journal of Consumer Research (37:2) 2010, pp. 197-206.
45. Moderation usually involves testing two SEMs: one for the sub-sample where the values of the moderator are low, against one for the sub-sample where the values of the moderator are high.
Example: PU → ITU in the group where PEOU ratings are low versus the group where they are high.
Problem:
An SEM is more complex and involves more than one relationship between two variables (or constructs).
If it did not, we could simply run a (M)AN(C)OVA.
Conceptually, this is tricky theorizing: what should/would change exactly, and why?
The problem with moderation analysis
47. For each moderator, split the data sample into two groups (high/low) and compare the SEMs across the two sub-samples.
Once model 1 is identified, the parameters from this model are used to constrain model 2:
Model 2a has factor loadings and error variances constrained to the parameters of model 1.
Model 2b has factor loadings free but error variances constrained to the parameters of model 1.
Model 2c has factor loadings and error variances free.
Model 2d has factor loadings constrained to the parameters of model 1 but error variances free.
Multi-sample path analysis in LISREL
48. Two tests of Δχ² / Δdf:
Compare model 2b to 2c: if significantly different, this is caused by error variances, i.e. measurement error, not a moderation effect.
Compare model 2a to 2b: if significantly different, this is caused by different factor loadings and path coefficients, i.e. a moderation effect.
If both tests are significantly different, there is moderation and error variance. In this case, shared error correlations (φ and ψ) must be set to invariant to extract ‘true’ moderation effects.
Moderation Test
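The two comparisons above are chi-square difference tests. Given the chi-square and degrees of freedom of a constrained and a free model (the values below are made up), the p-value can be computed as follows.

```python
# Chi-square difference test between two nested models (hypothetical fit values).
from scipy.stats import chi2

chi2_constrained, df_constrained = 512.4, 160   # e.g., model 2a or 2b
chi2_free, df_free = 488.9, 152                 # e.g., model 2b or 2c

delta_chi2 = chi2_constrained - chi2_free
delta_df = df_constrained - df_free
p_value = chi2.sf(delta_chi2, delta_df)

print(f"delta chi2 = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.4f}")
# p < 0.05 indicates the constrained and free models differ significantly.
```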
50. Multi-group analysis (MGA) in PLS is done by comparing bootstrap parameters across models estimated for two groups.
Essentially tests whether β(1) ≠ β(2).
Much simpler, but less accurate, than LISREL.
Multi-group analysis in PLS
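A bare-bones sketch of the idea, using simulated data and a simple regression slope as a stand-in for a PLS path coefficient; the bootstrap p-value rule shown is one common approximation, not the only one.

```python
# Bootstrap comparison of a path coefficient across two groups, in the spirit of PLS-MGA.
# An OLS slope stands in for the PLS path estimate; data are simulated.
import numpy as np

rng = np.random.default_rng(42)

def slope(x, y):
    """OLS slope as a stand-in for a path coefficient."""
    return np.polyfit(x, y, 1)[0]

def bootstrap_slopes(x, y, n_boot=2000):
    n = len(x)
    samples = rng.integers(0, n, size=(n_boot, n))
    return np.array([slope(x[idx], y[idx]) for idx in samples])

# Hypothetical sub-samples, e.g. low vs. high values of the moderator
x1 = rng.normal(size=120)
y1 = 0.6 * x1 + rng.normal(scale=0.8, size=120)
x2 = rng.normal(size=110)
y2 = 0.3 * x2 + rng.normal(scale=0.8, size=110)

diff = bootstrap_slopes(x1, y1) - bootstrap_slopes(x2, y2)
# One common approximation: two-sided p from the share of bootstrap differences crossing zero
p_value = 2 * min((diff <= 0).mean(), (diff >= 0).mean())
print(f"mean(beta1 - beta2) = {diff.mean():.3f}, bootstrap p approx. {p_value:.3f}")
```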
52. Example
Recker, Jan C. & La Rosa, Marcello (2012) Understanding user differences in open-source workflow management system usage intentions. Information Systems, 37(3), pp. 200-212.
53. Third approach to moderation analysis: interaction terms
[Path diagram illustrating the interaction-term approach with Perceived Usefulness, Perceived Ease of Use, and Intention to Continue to Use.]
55. PU: PU1, PU2, PU3
PEOU: PEOU1, PEOU2, PEOU3
PU x PEOU: PU1 x PEOU1, PU1 x PEOU2, PU1 x PEOU3, PU2 x PEOU1, PU2 x PEOU2, PU2 x PEOU3, PU3 x PEOU1, PU3 x PEOU2, PU3 x PEOU3
ITU: ITU1, ITU2, ITU3
The corresponding construct table
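In code, building those nine product indicators is a few lines. The sketch below assumes a hypothetical survey data frame whose item columns are named as in the table, and standardizes items before multiplying (a common recommendation, though not stated on the slide).

```python
# Build the interaction construct's product indicators (PU x PEOU) from standardized items.
# The data frame and its columns (PU1..PU3, PEOU1..PEOU3, ITU1..ITU3) are hypothetical.
import pandas as pd

df = pd.read_csv("tam_survey.csv")   # hypothetical survey data

pu_items = ["PU1", "PU2", "PU3"]
peou_items = ["PEOU1", "PEOU2", "PEOU3"]

# Standardize the indicators of the two main-effect constructs
z = (df[pu_items + peou_items] - df[pu_items + peou_items].mean()) / df[pu_items + peou_items].std()

# Nine product indicators for the PU x PEOU interaction construct
for pu in pu_items:
    for peou in peou_items:
        df[f"{pu}x{peou}"] = z[pu] * z[peou]

print(df.filter(like="xPEOU").columns.tolist())
```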
56. Gefen, D., D.W. Straub and M.-C. Boudreau, “Structural Equation Modeling and Regression: Guidelines for Research Practice,” Communications of the Association for Information Systems, 2000, 4:7.
Straub, D.W., M.-C. Boudreau and D. Gefen, “Validation Guidelines for IS Positivist Research,” Communications of the Association for Information Systems, 2004, 13:24, pp. 380-427.
MacKenzie, S.B., P.M. Podsakoff and N.P. Podsakoff, “Construct Measurement and Validation Procedures in MIS and Behavioral Research: Integrating New and Existing Techniques,” MIS Quarterly, 2011, 35:2, pp. 293-334.
Im, K.S. and V. Grover, “The Use of Structural Equation Modeling in IS Research: Review and Recommendations,” in Whitman, M.E. and A.B. Woszczynski (eds.), The Handbook of Information Systems Research, Idea Group, Hershey, Pennsylvania, 2004, pp. 44-65.
Vinzi, V.E., W.W. Chin, J. Henseler and H. Wang (eds.), Handbook of Partial Least Squares: Concepts, Methods and Applications, Springer, New York, New York, 2010.
Jöreskog, K.G. and D. Sörbom, LISREL 8: User's Reference Guide, Lincolnwood, Illinois: Scientific Software International, 2001.
References - basic
57. Moderation/Multi-group analysis: example
Im, I., Y. Kim and H.-J. Han, “The Effects of Perceived Risk and Technology Type on Users’ Acceptance of Technologies,” Information & Management, 2008, 45:1, pp. 1-9.
Recker, J. and M. La Rosa, “Understanding User Differences in Open-Source Workflow Management System Usage Intentions,” Information Systems, 2012, 37, pp. 200-212.
Mediation tests: example
Polites, G.L. and E. Karahanna, “Shackled to the Status Quo: The Inhibiting Effects of Incumbent System Habit, Switching Costs, and Inertia on New System Acceptance,” MIS Quarterly, 2012, 36:1, pp. 21-42.
LISREL with missing data
Recker, J., M. Rosemann, P. Green and M. Indulska, “Do Ontological Deficiencies in Modeling Grammars Matter?,” MIS Quarterly, 2011, 35:1, pp. 57-79.
LISREL vs PLS
Goodhue, D.L., W. Lewis and R.L. Thompson, “Statistical Power in Analyzing Interaction Effects: Questioning the Advantage of PLS With Product Indicators,” Information Systems Research, 2007, 18:2, pp. 211-227.
Ringle, C.M., M. Sarstedt and D.W. Straub, “Editor's Comments: A Critical Look at the Use of PLS-SEM in MIS Quarterly,” MIS Quarterly, 2012, 36:1, pp. iii-xiv.
References – good examples
58. Straub, D.W., M.-C. Boudreau and D. Gefen, “Validation Guidelines for IS Positivist Research,” Communications of the Association for Information Systems, 2004, 13:24, pp. 380-427.
MacKenzie, S.B., P.M. Podsakoff and N.P. Podsakoff, “Construct Measurement and Validation Procedures in MIS and Behavioral Research: Integrating New and Existing Techniques,” MIS Quarterly, 2011, 35:2, pp. 293-334.
Wetzels, M., G. Odekerken-Schröder and C. Van Oppen, “Using PLS Path Modeling for Assessing Hierarchical Construct Models: Guidelines and Empirical Illustration,” MIS Quarterly, 2009, 33:1, pp. 177-195.
Ringle, C.M., M. Sarstedt and D.W. Straub, “Editor's Comments: A Critical Look at the Use of PLS-SEM in MIS Quarterly,” MIS Quarterly, 2012, 36:1, pp. iii-xiv.
Evermann, J. and M. Tate, “Fitting Covariance Models for Theory Generation,” Journal of the Association for Information Systems, 2011, 12:9, pp. 632-661.
Evermann, J. and M. Tate, “Bayesian Structural Equation Models for Cumulative Theory Building in Information Systems: A Brief Tutorial Using BUGS and R,” Communications of the Association for Information Systems, 2014, 34:77, pp. 1481-1514.
References – further reading
59. Online Statistics Textbook
http://www.statsoft.com/Textbook
Statistical tables calculator
http://vassarstats.net/tabs.html
Distribution tables for χ², t, F
http://www.medcalc.org/manual/chi-square-table.php
http://vassarstats.net/textbook/apx_d.html
http://www.psychstat.missouristate.edu/introbook/tdist.htm
Interactive mediation tests
http://quantpsy.org/sobel/sobel.htm
Websites I use
59