Overview of Multivariate
Statistical Methods
Thomas Uttaro, Ph.D., M.S.
Deputy Director and CIO,
South Beach Psychiatric Center
11th Annual NYS-OMH Institute on Mental
Health Management Information
Introduction
 Concerned with data collected on several dimensions of
the same individual or units of analyses such as
geographic regions.
 Common in social, behavioral, life, and medical
sciences; medical and mental health outcomes,
economic indicators, demography.
 Extension of univariate statistics (analysis of variation for
a single random variable): t tests, correlation,
regression, ANOVA, ANCOVA, survival analysis.
 Multivariate techniques account for correlation of
measures due to common source for each individual or
other unit of analysis. These techniques also control
type I error rates with overall (experimentwise)
significance tests.
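To illustrate why an overall (experimentwise) test matters, here is a minimal sketch of how the chance of at least one type I error grows with the number of separate univariate tests. It assumes, for simplicity, that the tests are independent, which correlated measures on the same individual generally are not; the function name is hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Assumption: each test is run at alpha = .05 and the tests are independent.
for m in (1, 4, 10):
    print(f"{m} tests: experimentwise error rate = {familywise_error_rate(0.05, m):.3f}")
```

With four outcome measures each tested at .05, the experimentwise rate under this assumption is already about .185.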
Preview of Multivariate Methods
 Multivariate General Linear Model-the
extension of ANOVA, ANCOVA, and regression
to a family of methods for multivariate outcomes.
 Principal Components Analysis-accounts for
variation in multivariate observations with a
smaller number of observed indices that are
linear combinations of the original variables.
 Factor Analysis -accounts for variation in
multiple outcomes with a linear combination of
unobserved factors and a variable specific term.
Preview of Multivariate Methods (cont.)
 Discriminant Analysis-concerned with
separating observations into known groups
based on multivariate observations.
 Cluster Analysis-concerned with identification
of unknown but interpretable groups and placing
individual observations within them.
 Canonical Correlation-individual variables are
divided into two groups, concerned with
describing the relationship between the two sets
through multivariate correlations.
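As a rough illustration of the last method in this preview, canonical correlations can be computed as the singular values of the cross-covariance matrix after each variable set has been whitened. This is a sketch with NumPy on simulated data, not output from any package used in this talk; all names are hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two sets of variables (rows = individuals)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(X)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each set with its Cholesky factor: M = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    # The singular values of M are the canonical correlations, largest first.
    return np.linalg.svd(M, compute_uv=False)


# Simulated data (assumption): Y is a noisy linear function of X.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
Y = X @ np.array([[1.0, 0.5], [0.0, 1.0]]) + rng.normal(scale=0.5, size=(40, 2))
cc = canonical_correlations(X, Y)
print(cc)
```

With one variable in each set, the single canonical correlation reduces to the absolute value of the ordinary Pearson correlation.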
General Linear Model: ANOVA, ANCOVA
and Multiple Regression
 General Linear Model unified framework relates ANOVA,
ANCOVA, and Regression methods.
 ANOVA predicts or relates factor or categorical
predictors to a single predicted continuous variable.
 Multiple Regression predicts or relates continuous
variables to a single predicted continuous variable.
 ANCOVA predicts or relates factor and continuous
variables to a single predicted continuous variable.
 A regression approach can be used with dummy
variables to perform ANOVA, hence the GLM model.
 Variations on these models exist to predict binary or
categorical outcomes (logistic regression, multinomial
regression).
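The dummy-variable point can be checked numerically. The sketch below, using NumPy and made-up data, fits a regression on 0/1 dummy variables and compares its F statistic with the classic one-way ANOVA computation; function names and data are illustrative assumptions.

```python
import numpy as np


def anova_f_via_regression(y, groups):
    """One-way ANOVA F statistic obtained from a dummy-variable regression."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    # Intercept plus one 0/1 dummy per level, omitting the first (reference) level.
    dummies = [(groups == lv).astype(float) for lv in levels[1:]]
    X = np.column_stack([np.ones(len(y))] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_error = resid @ resid
    ss_total = ((y - y.mean()) ** 2).sum()
    df_model = len(levels) - 1
    df_error = len(y) - len(levels)
    return ((ss_total - ss_error) / df_model) / (ss_error / df_error)


def anova_f_classic(y, groups):
    """One-way ANOVA F statistic from between- and within-group sums of squares."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    grand = y.mean()
    ss_between = sum((groups == lv).sum() * (y[groups == lv].mean() - grand) ** 2
                     for lv in levels)
    ss_within = sum(((y[groups == lv] - y[groups == lv].mean()) ** 2).sum()
                    for lv in levels)
    return (ss_between / (len(levels) - 1)) / (ss_within / (len(y) - len(levels)))


# Made-up data: three groups of three observations each.
y = [4, 5, 6, 7, 8, 9, 10, 11, 12]
g = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(anova_f_via_regression(y, g), anova_f_classic(y, g))
```

Both routes give the same F because the dummy-variable design matrix spans the same space as the group means.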
ANOVA and Multiple Regression
Examples using SPSS 10.5
 ANOVA example relates FACA (2 levels, perhaps gender), FACB (3
levels, perhaps region) and the interaction FACA x FACB to a single
dependent variable (perhaps annual income). steve296anova.SPS
 F test indicates that the main effects FACA and FACB are significant
at the p<.001 level, FACA x FACB interaction is non-significant.
 Multiple regression example predicts instructor evaluation from 5
predictors: clarity, stimulation, knowledge, interest, and course
evaluation. Variables are continuous. Accounts for correlation
among predictor variables and determines which are most
important. stevep84MBAreg.SPS
 t tests of regression coefficients indicate that all variables except
interest are significant in predicting the level of instructor evaluation.
 Several diagnostics are output, including residuals, leverages, and
Cook's distance values (influential data points). The regression
standardized residuals should be approximately normal when plotted.
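The diagnostics named above can be sketched directly from their textbook definitions. This is a NumPy illustration on made-up data, not SPSS output; the function name and the data are assumptions.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Residuals, leverages (hat values), and Cook's distances for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])   # design matrix with intercept
    p = Xd.shape[1]                          # number of estimated coefficients
    H = Xd @ np.linalg.pinv(Xd)              # hat matrix X (X'X)^-1 X'
    leverage = np.diag(H)
    resid = y - H @ y
    mse = resid @ resid / (n - p)
    # Cook's distance: D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
    cooks = resid ** 2 / (p * mse) * leverage / (1.0 - leverage) ** 2
    return resid, leverage, cooks


# Made-up data: the last x value lies far from the others, so it has high leverage.
x = [1, 2, 3, 4, 10]
y = [1.1, 1.9, 3.2, 3.9, 6.0]
resid, leverage, cooks = regression_diagnostics(x, y)
print(leverage.round(3), cooks.round(3))
```

The leverages always sum to the number of coefficients, which gives a quick sanity check on the computation.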
Multivariate General Linear Model
 Extensions of single dependent variable procedures
such as ANOVA, ANCOVA, and multiple regression.
 Statistical framework includes MANOVA (factor
predictors), MANCOVA (factors and continuous
predictors), and multivariate multiple regression
(continuous predictors).
 Prevents inflated overall type I error rate, accounts for
correlations among the predictors, can detect the joint
significance of a set of variables, even when univariate
analyses would not be significant.
 Hotelling's T² is the multivariate generalization of the
univariate t. It tests the H0 that the population mean vectors
are equal for two groups; with more than two groups, the same
hypothesis is tested with statistics such as Wilks' Λ.
Multivariate General Linear Model (cont.)
 Multivariate significance implies that there is a linear
combination of dependent variables (the discriminant
function) that is separating the k groups.
 Multivariate test statistics are a function of eigenvalues
which are fundamental to all multivariate analyses.
 Four multivariate test statistics are commonly used,
Wilks' Λ, Roy's largest root, the Hotelling-Lawley trace,
and the Pillai-Bartlett trace. Wilks' Λ is most common.
 Following a significant finding, post hoc or planned
comparisons are then used to determine which variables
are driving the significance between groups.
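To make the link between eigenvalues and the four test statistics concrete, the following sketch computes them from the eigenvalues of E⁻¹H, where H and E are the between-group (hypothesis) and within-group (error) sums-of-squares-and-cross-products matrices. It uses NumPy on simulated data; function names are hypothetical, and Roy's root is given as the largest eigenvalue itself, although some programs report λ/(1+λ) instead.

```python
import numpy as np


def sscp_matrices(Y, groups):
    """Between-group (H) and within-group (E) SSCP matrices for a one-way layout."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand = Y.mean(axis=0)
    p = Y.shape[1]
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for lv in np.unique(groups):
        Yg = Y[groups == lv]
        d = (Yg.mean(axis=0) - grand)[:, None]
        H += len(Yg) * (d @ d.T)
        C = Yg - Yg.mean(axis=0)
        E += C.T @ C
    return H, E


def manova_statistics(Y, groups):
    """Four multivariate test statistics as functions of the eigenvalues of E^-1 H."""
    H, E = sscp_matrices(Y, groups)
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eig = np.clip(eig, 0.0, None)            # guard against tiny negative round-off
    return {
        "wilks_lambda": float(np.prod(1.0 / (1.0 + eig))),
        "pillai_bartlett_trace": float(np.sum(eig / (1.0 + eig))),
        "hotelling_lawley_trace": float(np.sum(eig)),
        "roy_largest_root": float(eig.max()),
    }


# Simulated data (assumption): 3 groups of 10, 2 dependent variables, shifted means.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 10)
Y = rng.normal(size=(30, 2))
Y[groups == 1] += [1.0, 0.5]
Y[groups == 2] += [2.0, 1.0]
stats = manova_statistics(Y, groups)
print(stats)
```

Converting these statistics to F approximations and p-values is left to the statistical package; only the eigenvalue relationships are shown here.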
Multivariate Regression Example using
SPSS 10.5
 Timm data on differences in cognitive tests due to
learning tasks. Scores on Raven's Progressive Matrices
and Peabody Picture Vocabulary regressed on 3
learning tasks. steve132multivreg.SPS
 Multivariate test statistic Wilks' Λ is significant, indicating
a significant relationship between the dependent
variables and the 3 predictors beyond the .01 level.
 Univariate F tests examine the regression on each
variable separately. In particular, NA (named action) is
related to PEVOCAB at t=2.68, p<.011.
 Univariate prediction equations do not take into account
correlations among dependent variables.
MANOVA Example with Tukey
post-hoc tests using SAS V8
 Novince data on improving social skills among college
women. 3 groups: control, behavioral rehearsal and
cognitive restructuring, 4 variables: anxiety, social
interaction skills, appropriateness, and assertiveness.
 SAS program used for 3 treatment group MANOVA with
4 measures to determine treatment effectiveness.
SASstevep204.sas
 Overall significant multivariate tests indicate true
differences between groups on one or more variables
and their linear combinations. Excellent optional output.
 Tukey post hoc tests generate significance levels and
confidence intervals to examine effects of variables.
Crisis Residence Treatment and the
BASIS-32 at South Beach PC
 Treatment, gender, and GAF covariate effects on
BASIS-32 subscale scores, n=73 paired admissions and
discharges.
 Highly significant pre/post treatment effect, F=4.216, 5
df, p<.001. CR effective in terms of all BASIS-32
subscales.
 Significant GAF covariate effect F=5.271, 5 df, p<.001
strong relationship between clinician GAF and self-report
BASIS-32 on relationship to self/others and depression
subscales.
 Gender by treatment interaction non-significant. CR
equally effective for both genders in terms of subscale
scores.
 Statistical diagnostics indicate excellent power for all
tests.
Principal Components Analysis
 Analysis based on a large number of original variables
can be simplified to a smaller number of standardized
linear combinations of original variables.
 x → y = Γ'(x − μ), where Γ is orthogonal and Γ'ΣΓ = Λ is
diagonal. The ith principal component of x may be defined as
the ith element of the vector y, y_i = φ_i'(x − μ), where φ_i is
the ith column of Γ, leading to uncorrelated principal
components. Principal components analysis essentially involves
finding the eigenvalues and eigenvectors of the covariance matrix Σ.
 The first principal component has the largest variance of
all standardized linear combinations of x.
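The transformation above can be sketched with an eigendecomposition of the sample covariance matrix. This is a NumPy illustration on simulated data, not the S-Plus princomp function; names are hypothetical.

```python
import numpy as np


def principal_components(X):
    """Eigenvalues, orthogonal loading matrix Gamma, and scores y = Gamma'(x - mu)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    eigvals, Gamma = eigvals[order], eigvecs[:, order]
    scores = (X - mu) @ Gamma                # each row is y' for one observation
    return eigvals, Gamma, scores


# Simulated correlated data (assumption): 50 observations on 3 variables.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.3]])
X = rng.normal(size=(50, 3)) @ mixing
eigvals, Gamma, scores = principal_components(X)
print("proportion of variance:", (eigvals / eigvals.sum()).round(3))
```

The covariance matrix of the scores is diagonal with the eigenvalues on the diagonal, which is the sense in which the components are uncorrelated and the first has the largest variance.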
Principal Components Example using
S-Plus V6 PrinComp.ssc
 Ph.D. qualifying examinations in five areas of
mathematics for 25 students.
 Analysis carried out using S-Plus princomp function
which returns object of mode princomp.
 A large coefficient (absolute value) corresponds to a high
loading, while a coefficient near zero has a low loading.
 First principal component loadings are of moderate size
in the same direction representing an average score.
 Second principal component contrasts two closed book
exams with three open book exams, with the first and
last exams weighted most heavily.
 Plots of the principal component loadings and the biplot of
original and transformed test scores in two dimensional
principal component space.
Factor Analysis
 Factor Analysis explains correlations between observed
variables with underlying factors.
 x = μ + Λf + u, where Λ = {λij} is the matrix of factor loadings and f and u
represent the common and unique factors respectively. Equivalently,
Σ = ΛΛ' + Ψ, a decomposition into factor and error covariances.
 The diagonal of the factor covariance matrix ΛΛ' is the vector of
communalities h², the variation in each xi shared through the common
factors, and the diagonal of Ψ is the vector of uniquenesses ψii, the
variation in xi not shared with the other variables. For standardized
variables these sum to 1 for each variable.
 Factor solution is not unique. Factors can be rotated to ease
interpretation via Σ = (ΛG)(G'Λ') + Ψ for an orthogonal matrix G.
Δ = ΛG is the matrix of rotated factor loadings. Analyst seeks simple
structure in the rotation: each variable should load highly on one factor,
and all factor loadings should have large absolute value or be near zero.
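A small numerical check of the decomposition and of rotation invariance may help. The loadings below are made up for illustration and the variables are assumed standardized; this is a NumPy sketch, not output from factanal.

```python
import numpy as np

# Hypothetical loadings of 4 standardized variables on 2 common factors.
Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.3],
                   [0.2, 0.9],
                   [0.1, 0.6]])

communality = (Lambda ** 2).sum(axis=1)      # diagonal of Lambda Lambda'
Psi = np.diag(1.0 - communality)             # uniquenesses on the diagonal
Sigma = Lambda @ Lambda.T + Psi              # implied correlation matrix

# Rotate the factors with an orthogonal matrix G (here a 30 degree rotation).
theta = np.deg2rad(30.0)
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Delta = Lambda @ G                           # rotated loadings
Sigma_rotated = Delta @ Delta.T + Psi

print("communalities:", communality.round(3))
print("same implied Sigma after rotation:", np.allclose(Sigma, Sigma_rotated))
```

Because G G' = I, the rotated loadings reproduce exactly the same Σ and the same communalities, which is why the choice of rotation rests on interpretability rather than fit.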
Factor Analysis Example using
S-Plus V6 FactorAnal.ssc
 S-Plus uses factanal, a weighted covariance estimation
function to perform factor analysis.
 Using testscores we analyze whether a two factor
model, overall ability and closed or open book, explains
the overall variation in the scores.
 The two factor model explains about 80% of the variation in
the original data, with the first factor accounting for 45%.
 The rotated factor loadings indicate the importance of the
first overall ability factor and the relative effects of closed
and open book exams.
 Plots of the factor loadings and the biplot of test scores in
two dimensional factor space.
Discriminant Function Analysis
 Concerned with allocating observations to one or another
a priori defined classes.
 Calibrated on a training sample in which membership is
known and then applied to test cases which are
unknown.
 In medicine, post-mortem information (classes based on
survival) can be used to classify at-risk patients for mortality
or morbidity.
 An observation is classified into one of two groups on a
series of measurements x1, x2, x3,… xp using a linear
function z of the variables: z=a1x1+a2x2+…+apxp
Discriminant Function Analysis (cont.)
 Coefficients maximize ratio of between groups variance of z to
within groups variance. V=a'Ba/a'Sa
 Assumes that the data in both groups have a multivariate normal
distribution and that the covariance matrices of the groups are the same.
 Function evaluates z. Assign to group 1 if zi-zc<0, assign to group
2 if zi-zc≥0.
 Performance can be assessed through misclassification rate on
known cases through the training set data.
 Significance tests are available including 1) Wilks' Λ or others
previously mentioned and also Hotelling's multivariate T², 2) φ to
see whether the discriminant function differs between groups,
and 3) a chi-square test of Mahalanobis distances from
observations to their group centers; if the chi-square is large, it is
unlikely that an observation came from a particular group.
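The two-group rule can be sketched as follows. The coefficients are taken as a = S⁻¹(x̄₂ − x̄₁) with S the pooled within-group covariance matrix, which maximizes the variance ratio V, and the cutoff zc is assumed to be the midpoint of the two projected group means. This is a NumPy illustration on made-up training data, not SAS proc discrim output.

```python
import numpy as np


def fit_two_group_discriminant(X1, X2):
    """Coefficients a and cutoff zc of Fisher's linear discriminant for two groups."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-group covariance (equal covariance matrices are assumed).
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S, m2 - m1)          # direction oriented so group 2 has larger z
    zc = a @ (m1 + m2) / 2.0                 # midpoint of the projected group means
    return a, zc


def classify(X, a, zc):
    """Assign group 1 if z - zc < 0, otherwise group 2."""
    z = np.asarray(X, dtype=float) @ a
    return np.where(z - zc < 0, 1, 2)


# Made-up training set: two well-separated groups measured on two variables.
X1 = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5]]
X2 = [[6.0, 7.0], [7.0, 6.0], [6.5, 7.5], [7.0, 7.0]]
a, zc = fit_two_group_discriminant(X1, X2)
predicted = np.concatenate([classify(X1, a, zc), classify(X2, a, zc)])
actual = np.array([1] * 4 + [2] * 4)
print("resubstitution misclassification rate:", np.mean(predicted != actual))
```

The resubstitution rate computed on the training set tends to be optimistic, which is one reason to cross-validate.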
Discriminant Function Analysis Example
using SAS V8 SASHandDA.sas
 Archeological study of two types of skulls from the Tibetan areas
of Sikkim or Kharis (fundamental human type). 5 dimensional
variables measured on 32 skulls.
 This can be considered a training set for future classification, the
analysis will also identify the most important variables in
discrimination.
 Proc discrim output generates within group and between group
covariance matrices, covariance diagnostics, generalized
pairwise distances between groups, discrimination function
coefficients, and misclassification (resubstitution) rate.
 Proc stepdisc finds that faceheight is the most important variable
for classifying the members into groups. Crossvalidated with
another proc discrim using only faceheight.
Cluster Analysis
 Concerned with allocating observations to discrete groups or
clusters of observations which are unknown.
 A hierarchy of solutions from single observation clusters to a single
group cluster containing all observations is displayed in a
dendrogram. A particular clustering partition will be considered
optimal based on statistical and practical criteria.
 Clustering methods operate on the inter-individual Euclidean
distance matrix calculated from the raw data.
 Single Linkage or Nearest Neighbors - groups are merged at a given
distance if the closest individuals from each group are within the
specified distance.
 Complete Linkage or Furthest Neighbors- two groups merge only if
the most distant members are close enough together.
 Average Linkage- two groups merge if the average distance
between them is close enough.
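The three linkage rules differ only in how the distance between two groups is summarized. The following is a deliberately naive sketch of agglomerative clustering on the Euclidean distance matrix, written with NumPy for illustration; it is not the SAS proc cluster algorithm, and the function name and data are assumptions.

```python
import numpy as np


def agglomerative(X, n_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    X = np.asarray(X, dtype=float)
    # Inter-individual Euclidean distance matrix.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Group distance: nearest pair, furthest pair, or average over all pairs.
    summarize = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = summarize(D[np.ix_(clusters[i], clusters[j])])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge j into i (j > i)
        del clusters[j]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Made-up data: two obvious groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for method in ("single", "complete", "average"):
    print(method, agglomerative(points, 2, linkage=method))
```

On well-separated data like this all three rules agree; they tend to diverge on elongated or overlapping groups, where single linkage is prone to chaining.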
Cluster Analysis Example
using SAS V8 SASHandCA.sas
 Analysis of quality of air of U.S. cities. Object is to
identify groups of cities that are similar for policy
intervention.
 Clustering variables include SO2, temperature, factories,
population, windspeed, rain, rainydays.
 First step is to look for outliers using proc univariate.
Chicago is an outlier on manufacturing and population, and
Phoenix has the lowest value on all three climate
variables; these cities are excluded from the analysis.
 Results from several runs each based on a different
clustering method are complex and require interpretation
and a feel for the technique.
Cluster Analysis Example
using SAS V8 (cont.)
 Cluster history indicates the stages at which various cities and
clusters are joined at particular distances along with other
diagnostics.
 Bimodality index of at least .55 suggests clustering on a
particular variable. Factories and population are at .55.
 The value of the cubic clustering criterion (ccc) is a guide to
the number of clusters in the data. It peaks at 4 clusters for
the single and complete linkage runs. The number of
eigenvalues of the correlation matrix may also suggest
dimensionality in the data. Four clusters is only an
approximation as the evidence is not that clear.
 Dendrograms may also suggest evidence of structure but
generally do not make the optimal number of groups obvious.
Cluster Analysis Example
using SAS V8 (cont.)
 Means for clustering variables can be examined to
understand how clusters differ on the variables. Mean
differences on these variables can be tested.
 Clustering solutions can be displayed by plotting the data
in principal component space since they are linear
transformations of the clustering variables.
 In this example the first two principal components are
derived and the individual cluster observations are
graphed. They are distinct in the location of the
observations although the solution is not optimal.
 A box plot was created and means tested for differences
on the SO2 level.

Vision and reflection on Mining Software Repositories research in 2024
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 

Overview of Multivariate Statistical Methods

  • 1. Overview of Multivariate Statistical Methods Thomas Uttaro, Ph.D., M.S. Deputy Director and CIO, South Beach Psychiatric Center 11th Annual NYS-OMH Institute on Mental Health Management Information
  • 2. Introduction
      - Concerned with data collected on several dimensions of the same individual or unit of analysis, such as a geographic region.
      - Common in the social, behavioral, life, and medical sciences: medical and mental health outcomes, economic indicators, demography.
      - Extends univariate statistics, the analysis of variation in a single random variable: t tests, correlation, regression, ANOVA, ANCOVA, survival analysis.
      - Multivariate techniques account for the correlation among measures that arises because they share a common source, the same individual or other unit of analysis. They also control type I error rates through overall (experimentwise) significance tests.
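To illustrate why an overall test matters, the sketch below computes the experimentwise type I error rate that results from running k separate univariate tests. It assumes the tests are independent, which is a simplification (correlated measures change the exact figure), and the function name is hypothetical.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one false rejection among k independent
    tests, each run at significance level alpha (independence assumed)."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # The error rate grows quickly with the number of separate tests.
    for k in (1, 5, 10):
        print(k, round(familywise_error(0.05, k), 3))
```

With five outcome measures each tested at .05, the chance of at least one spurious finding is already above .22 under this assumption.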
  • 3. Preview of Multivariate Methods
      - Multivariate General Linear Model: the extension of ANOVA, ANCOVA, and regression to a family of methods for multivariate outcomes.
      - Principal Components Analysis: accounts for variation in multivariate observations with a smaller number of observed indices that are linear combinations of the original variables.
      - Factor Analysis: accounts for variation in multiple outcomes with a linear combination of unobserved factors and a variable-specific term.
  • 4. Preview of Multivariate Methods (cont.)
      - Discriminant Analysis: concerned with separating observations into known groups based on multivariate observations.
      - Cluster Analysis: concerned with identifying unknown but interpretable groups and placing individual observations within them.
      - Canonical Correlation: the variables are divided into two sets, and the relationship between the two sets is described through multivariate correlations.
  • 5. General Linear Model: ANOVA, ANCOVA and Multiple Regression
      - The General Linear Model is a unified framework that relates ANOVA, ANCOVA, and regression methods.
      - ANOVA relates factor (categorical) predictors to a single continuous predicted variable.
      - Multiple regression relates continuous predictors to a single continuous predicted variable.
      - ANCOVA relates both factor and continuous predictors to a single continuous predicted variable.
      - A regression approach with dummy variables can be used to perform ANOVA, hence the GLM.
      - Variations on these models exist to predict binary or categorical outcomes (logistic regression, multinomial regression).
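The dummy-variable point can be made concrete with a small sketch. It fits an ordinary least-squares regression on 0/1 indicators for a three-level factor and recovers the group means that a one-way ANOVA compares. The data and the function name are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def dummy_regression(groups, y):
    """Regress y on an intercept plus 0/1 indicators for every level
    of the factor except the first (the reference level)."""
    levels = sorted(set(groups))
    indicators = [[1.0 if g == lev else 0.0 for g in groups] for lev in levels[1:]]
    X = np.column_stack([np.ones(len(y))] + indicators)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return levels, beta


if __name__ == "__main__":
    # Hypothetical data: three groups with means 2, 5, and 9.
    groups = ["a", "a", "b", "b", "c", "c"]
    y = [1.0, 3.0, 4.0, 6.0, 8.0, 10.0]
    levels, beta = dummy_regression(groups, y)
    # beta[0] is the mean of the reference group; the remaining
    # coefficients are differences from that mean.
    print(levels, beta)
```

The intercept estimates the reference-group mean and each dummy coefficient estimates a difference between group means, which is the sense in which ANOVA is a special case of regression.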
  • 6. ANOVA and Multiple Regression Examples using SPSS 10.5
      - The ANOVA example relates FACA (2 levels, perhaps gender), FACB (3 levels, perhaps region), and the interaction FACA x FACB to a single dependent variable (perhaps annual income). File: steve296anova.SPS
      - F tests indicate that the main effects FACA and FACB are significant at the p<.001 level; the FACA x FACB interaction is non-significant.
      - The multiple regression example predicts instructor evaluation from 5 continuous predictors: clarity, stimulation, knowledge, interest, and course evaluation. The analysis accounts for correlation among the predictors and determines which are most important. File: stevep84MBAreg.SPS
      - t tests of the regression coefficients indicate that all variables except interest are significant in predicting the level of instructor evaluation.
      - Several diagnostics are output, including residuals, leverages, and Cook's distance (influential data points). The regression standardized residuals, when plotted, should be approximately normal.
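The diagnostics named above can be sketched directly from the hat matrix. This is a minimal illustration on hypothetical data, not a reproduction of the SPSS output; it assumes NumPy and uses the standard formulas for leverage (the diagonal of the hat matrix) and Cook's distance.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Return residuals, leverages, and Cook's distances for an
    ordinary least-squares fit of y on X (an intercept is added)."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
    leverage = np.diag(H)                      # h_ii
    resid = y - H @ y                          # raw residuals e_i
    mse = resid @ resid / (n - p)              # residual mean square
    # Cook's distance: D_i = e_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)
    cooks = resid**2 * leverage / (p * mse * (1.0 - leverage) ** 2)
    return resid, leverage, cooks


if __name__ == "__main__":
    # Hypothetical data; the last x value lies far from the others.
    x = [1.0, 2.0, 3.0, 4.0, 10.0]
    y = [1.1, 1.9, 3.2, 3.9, 6.0]
    resid, leverage, cooks = regression_diagnostics(x, y)
    print(np.round(leverage, 3), np.round(cooks, 3))
```

In this toy data set the observation at x = 10 has by far the highest leverage, which is the kind of point these diagnostics are meant to flag.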
  • 7. Multivariate General Linear Model
      - Extends single dependent variable procedures such as ANOVA, ANCOVA, and multiple regression.
      - The statistical framework includes MANOVA (factor predictors), MANCOVA (factor and continuous predictors), and multivariate multiple regression (continuous predictors).
      - Prevents an inflated overall type I error rate, accounts for correlations among the dependent variables, and can detect the joint significance of a set of variables even when univariate analyses would not be significant.
      - Hotelling's T² is the multivariate generalization of the univariate t. It tests the H0 that the population mean vectors of two groups are equal; the multivariate test statistics on the next slide extend the test to more than two groups.
  • 8. Multivariate General Linear Model (cont.)
      - Multivariate significance implies that there is a linear combination of the dependent variables (the discriminant function) that separates the k groups.
      - Multivariate test statistics are functions of eigenvalues, which are fundamental to all multivariate analyses.
      - Four multivariate test statistics are commonly used: Wilks' Λ, Roy's largest root, the Hotelling-Lawley trace, and the Pillai-Bartlett trace. Wilks' Λ is the most common.
      - Following a significant finding, post hoc or planned comparisons are used to determine which variables are driving the differences between groups.
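The link between the test statistics and eigenvalues can be sketched for Wilks' Λ. The example below builds the within-groups (E) and between-groups (H) sums-of-squares-and-cross-products matrices for a one-way MANOVA and computes Λ = det(E) / det(E + H), which equals the product of 1 / (1 + λi) over the eigenvalues λi of E⁻¹H. The function name and the simulated data are hypothetical, NumPy is assumed, and no significance test is attempted.

```python
import numpy as np


def wilks_lambda(groups):
    """groups: list of (n_g x p) arrays, one per group.
    Returns Wilks' Lambda and the eigenvalues of inv(E) @ H."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    p = groups[0].shape[1]
    E = np.zeros((p, p))  # within-groups SSCP matrix
    H = np.zeros((p, p))  # between-groups SSCP matrix
    for g in groups:
        m = g.mean(axis=0)
        centered = g - m
        E += centered.T @ centered
        d = (m - grand_mean)[:, None]
        H += len(g) * (d @ d.T)
    eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    return lam, eigvals


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # simulated, hypothetical data
    groups = [rng.normal(loc=shift, size=(10, 3)) for shift in (0.0, 0.5, 1.5)]
    lam, eigvals = wilks_lambda(groups)
    print(lam, np.prod(1.0 / (1.0 + eigvals)))  # the two values agree
```

Values of Λ near 0 indicate strong group separation and values near 1 indicate little; the other three statistics are different functions of the same eigenvalues.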
  • 9. Multivariate Regression Example using SPSS 10.5
      - Timm data on differences in cognitive tests due to learning tasks. Scores on Raven's Progressive Matrices and Peabody Picture Vocabulary are regressed on 3 learning tasks. File: steve132multivreg.SPS
      - The multivariate test statistic Wilks' Λ is significant beyond the .01 level, indicating a significant relationship between the dependent variables and the 3 predictors.
      - Univariate F tests examine the regression on each dependent variable separately. In particular, NA (named action) is related to PEVOCAB at t=2.68, p<.011.
      - Univariate prediction equations do not take into account correlations among the dependent variables.
  • 10. MANOVA Example with Tukey post-hoc tests using SAS V8
      - Novince data on improving social skills among college women. 3 groups: control, behavioral rehearsal, and cognitive restructuring. 4 variables: anxiety, social interaction skills, appropriateness, and assertiveness.
      - SAS program used for a 3 treatment group MANOVA with 4 measures to determine treatment effectiveness. File: SASstevep204.sas
      - Overall significant multivariate tests indicate true differences between groups on one or more variables and their linear combinations. Excellent optional output.
      - Tukey post hoc tests generate significance levels and confidence intervals to examine effects of variables.
  • 11. Crisis Residence Treatment and the BASIS-32 at South Beach PC
      - Treatment, gender, and GAF covariate effects on BASIS-32 subscale scores, n=73 paired admissions and discharges.
      - Highly significant pre/post treatment effect, F=4.216, 5 df, p<.001. CR is effective in terms of all BASIS-32 subscales.
      - Significant GAF covariate effect, F=5.271, 5 df, p<.001: a strong relationship between clinician GAF and self-report BASIS-32 on the relationship to self/others and depression subscales.
      - Gender by treatment interaction is non-significant. CR is equally effective for both genders in terms of subscale scores.
      - Statistical diagnostics indicate excellent power for all tests.
  • 12. Principal Components Analysis
      - An analysis based on a large number of original variables can be simplified to a smaller number of standardized linear combinations of the original variables.
      - x → y = Γ'(x − μ), where Γ is orthogonal and Γ'ΣΓ = Λ is diagonal. The ith principal component of x is the ith element of the vector y, yi = γi'(x − μ), where γi is the ith column of Γ. The resulting principal components are uncorrelated. Principal components analysis essentially involves finding the eigenvalues and eigenvectors of the covariance matrix Σ.
      - The first principal component has the largest variance of all standardized linear combinations of x.
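The transformation y = Γ'(x − μ) can be sketched with an eigendecomposition of the sample covariance matrix. The function name and the simulated data are hypothetical (this is not the S-Plus princomp function or the examination data), and NumPy is assumed.

```python
import numpy as np


def principal_components(X):
    """Return eigenvalues (descending), the orthogonal matrix Gamma whose
    columns are the eigenvectors of the sample covariance matrix, and
    the component scores Y = (X - mean) @ Gamma."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, Gamma = eigvals[order], eigvecs[:, order]
    Y = (X - mu) @ Gamma                     # each row is y' = (x - mu)' Gamma
    return eigvals, Gamma, Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # simulated, hypothetical data
    X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.5, 0.0],
                                             [0.0, 1.0, 0.3],
                                             [0.0, 0.0, 0.5]])
    eigvals, Gamma, Y = principal_components(X)
    print(np.round(eigvals / eigvals.sum(), 3))  # proportion of variance
```

The covariance matrix of the scores Y is diagonal with the eigenvalues on the diagonal, which is the sense in which the components are uncorrelated and ordered by variance.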
  • 13. Principal Components Example using S-Plus V6
      - File: PrinComp.ssc
      - Ph.D. qualifying examinations in five areas of mathematics for 25 students.
      - Analysis carried out using the S-Plus princomp function, which returns an object of mode princomp.
      - A large coefficient (in absolute value) corresponds to a high loading, while a coefficient near zero corresponds to a low loading.
      - The first principal component loadings are of moderate size and in the same direction, representing an average score.
      - The second principal component contrasts the two closed book exams with the three open book exams, with the first and last exams weighted most heavily.
      - Plots of the principal component loadings and the biplot of original and transformed test scores in two dimensional principal component space.
  • 14. Factor Analysis
      - Factor analysis explains correlations between observed variables in terms of underlying factors.
      - x = μ + Λf + u, where Λ = {λij} is the matrix of factor loadings and f and u represent the common and unique factors respectively. Equivalently, Σ = ΛΛ' + Ψ, a decomposition into factor and error covariances.
      - The diagonal of the factor covariance matrix ΛΛ' is the vector of communalities h², the variation in each variable shared through the common factors. The diagonal elements ψii of Ψ are the uniquenesses, the variation in xi not shared with the other variables. For standardized variables, the communality and uniqueness sum to 1 for each variable.
      - The factor solution is not unique. Factors can be rotated to ease interpretation, since Σ = (ΛG)(G'Λ') + Ψ for any orthogonal matrix G. Δ = ΛG is the matrix of rotated factor loadings. The analyst seeks simple structure in the rotation: each variable should load highly on one factor, and every factor loading should be either large in absolute value or near zero.
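The non-uniqueness of the factor solution can be checked numerically. The sketch below uses a hypothetical loading matrix for five standardized variables and two factors, builds Σ = ΛΛ' + Ψ, and confirms that an orthogonal rotation G leaves both Σ and the communalities unchanged. The loadings and the 30 degree rotation angle are illustrative assumptions, and NumPy is assumed.

```python
import numpy as np

# Hypothetical loadings: 5 standardized variables on 2 common factors.
Lambda = np.array([[0.8, 0.3],
                   [0.7, 0.4],
                   [0.6, 0.5],
                   [0.3, 0.8],
                   [0.2, 0.7]])

communalities = (Lambda ** 2).sum(axis=1)   # h^2 = diag(Lambda Lambda')
Psi = np.diag(1.0 - communalities)          # uniquenesses on the diagonal
Sigma = Lambda @ Lambda.T + Psi             # implied correlation matrix

theta = np.deg2rad(30.0)                    # arbitrary rotation angle
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: G G' = I

Delta = Lambda @ G                          # rotated loadings
Sigma_rot = Delta @ Delta.T + Psi           # (Lambda G)(G' Lambda') + Psi

print(np.allclose(Sigma, Sigma_rot))                         # same covariance
print(np.allclose(communalities, (Delta ** 2).sum(axis=1)))  # same h^2
```

Because every orthogonal G reproduces the same Σ, the data alone cannot choose among rotations; criteria such as simple structure are what single one out.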
  • 15. Factor Analysis Example using S-Plus V6
      - File: FactorAnal.ssc
      - S-Plus uses factanal, a weighted covariance estimation function, to perform factor analysis.
      - Using testscores, we analyze whether a two factor model, overall ability and closed or open book, explains the overall variation in the scores.
      - The two factor model explains about 80% of the variation in the original data, with the first factor accounting for 45%.
      - The rotated factor loadings indicate the importance of the first overall ability factor and the relative effects of closed and open book exams.
      - Plots of the factor loadings and the biplot of test scores in two dimensional factor space.
  • 16. Discriminant Function Analysis
      - Concerned with allocating observations to one of several a priori defined classes.
      - Calibrated on a training sample in which membership is known and then applied to test cases in which it is unknown.
      - In medicine, post-mortem information (classes based on survival) can be used to classify at-risk patients for mortality or morbidity.
      - An observation is classified into one of two groups on a series of measurements x1, x2, x3, …, xp using a linear function z of the variables: z = a1x1 + a2x2 + … + apxp
  • 17. Discriminant Function Analysis (cont.)
      - The coefficients maximize the ratio of the between groups variance of z to the within groups variance, V = a'Ba / a'Sa.
      - Assumes that the data in both groups have a multivariate normal distribution and that the covariance matrices of the groups are the same.
      - The function evaluates z. Assign to group 1 if zi − zc < 0; assign to group 2 if zi − zc ≥ 0, where zc is the cutoff value.
      - Performance can be assessed through the misclassification rate on known cases in the training set.
      - Significance tests are available, including 1) Wilks' Λ or the others previously mentioned, and also Hotelling's multivariate T², 2) an F test of whether the discriminant function differs between groups, and 3) a chi-square test of the Mahalanobis distances from observations to their group centers; a large chi-square makes it unlikely that an observation came from a particular group.
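The two-group rule can be sketched with Fisher's linear discriminant. Under the equal-covariance assumption the maximizing coefficients are proportional to S⁻¹(m2 − m1), where S is the pooled within-groups covariance matrix; the cutoff zc is taken here as the midpoint between the projected group means, which is one common choice and an assumption of this sketch. The data and function names are hypothetical (not the SAS proc discrim example), and NumPy is assumed.

```python
import numpy as np


def fisher_discriminant(X1, X2):
    """Return coefficients a and cutoff zc for a two-group linear
    discriminant z = a'x, using the pooled within-groups covariance."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S, m2 - m1)   # direction maximizing a'Ba / a'Sa
    zc = a @ (m1 + m2) / 2.0          # midpoint cutoff (an assumption)
    return a, zc


def classify(x, a, zc):
    """Group 1 if z - zc < 0, otherwise group 2."""
    return 1 if a @ np.asarray(x, dtype=float) - zc < 0 else 2


if __name__ == "__main__":
    # Hypothetical training sample with known membership.
    X1 = [[1, 2], [2, 1], [2, 3], [3, 2]]
    X2 = [[6, 7], [7, 6], [7, 8], [8, 7]]
    a, zc = fisher_discriminant(X1, X2)
    wrong = sum(classify(x, a, zc) != 1 for x in X1) + \
            sum(classify(x, a, zc) != 2 for x in X2)
    print("resubstitution misclassification rate:", wrong / 8)
```

The resubstitution rate printed at the end is computed on the training cases themselves, so it tends to be optimistic; cross-validation on held-out cases gives a fairer estimate.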
  • 18. Discriminant Function Analysis Example using SAS V8
      - File: SASHandDA.sas
      - Archeological study of two types of skulls from the Tibetan areas of Sikkim or Kharis (fundamental human type). 5 dimensional variables measured on 32 skulls.
      - This can be considered a training set for future classification. The analysis will also identify the most important variables in discrimination.
      - Proc discrim output includes within group and between group covariance matrices, covariance diagnostics, generalized pairwise distances between groups, discriminant function coefficients, and the misclassification (resubstitution) rate.
      - Proc stepdisc finds that faceheight is the most important variable for classifying the members into groups. Cross-validated with another proc discrim using only faceheight.
  • 19. Cluster Analysis
      - Concerned with allocating observations to discrete groups or clusters that are not known in advance.
      - A hierarchy of solutions, from single observation clusters to a single cluster containing all observations, is displayed in a dendrogram. A particular clustering partition is considered optimal based on statistical and practical criteria.
      - Clustering methods operate on the inter-individual Euclidean distance matrix calculated from the raw data.
      - Single Linkage or Nearest Neighbors: groups are merged at a given distance if the closest individuals from each group are within the specified distance.
      - Complete Linkage or Furthest Neighbors: two groups merge only if their most distant members are close enough together.
      - Average Linkage: two groups merge if the average distance between their members is close enough.
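The three linkage rules differ only in how the distance between two clusters is defined. The sketch below is a naive agglomerative procedure in plain Python, written for clarity rather than speed; the function name and the five sample points are hypothetical, and this is not the SAS proc cluster implementation.

```python
import math
from itertools import combinations


def agglomerate(points, linkage="single"):
    """Merge clusters one pair at a time and return the merge history
    as a list of (cluster_a, cluster_b, distance) tuples."""
    def cluster_distance(a, b):
        pair = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # nearest neighbors
            return min(pair)
        if linkage == "complete":    # furthest neighbors
            return max(pair)
        if linkage == "average":     # mean of all pairwise distances
            return sum(pair) / len(pair)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [(i,) for i in range(len(points))]  # start with singletons
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: cluster_distance(*ab))
        history.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return history


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0)]  # hypothetical data
    for method in ("single", "complete", "average"):
        print(method, [round(d, 2) for _, _, d in agglomerate(pts, method)])
```

The merge distances in the history are the heights at which branches join in a dendrogram; single linkage tends to chain clusters together at small distances, while complete linkage delays merges until all members are close.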
  • 20. Cluster Analysis Example using SAS V8
      - File: SASHandCA.sas
      - Analysis of air quality in U.S. cities. The object is to identify groups of similar cities for policy intervention.
      - Clustering variables include SO2, temperature, factories, population, windspeed, rain, and rainydays.
      - The first step is to look for outliers using proc univariate. Chicago is an outlier on manufacturing and population, and Phoenix has the lowest value on all three climate variables; these cities are excluded from the analysis.
      - Results from several runs, each based on a different clustering method, are complex and require interpretation and a feel for the technique.
  • 21. Cluster Analysis Example using SAS V8 (cont.)
      - The cluster history indicates the stages at which various cities and clusters are joined and the distances at which they join, along with other diagnostics.
      - A bimodality index of at least .55 suggests clustering on a particular variable. Factories and population are at .55.
      - The value of the cubic clustering criterion (ccc) is a guide to the number of clusters in the data. It peaks at 4 clusters for the single and complete linkage runs. The number of eigenvalues of the correlation matrix may also suggest dimensionality in the data. Four clusters is only an approximation, as the evidence is not that clear.
      - Dendrograms may also suggest evidence of structure but generally do not make the optimal number of groups obvious.
  • 22. Cluster Analysis Example using SAS V8 (cont.)
      - Means of the clustering variables can be examined to understand how the clusters differ. Mean differences on these variables can be tested.
      - Clustering solutions can be displayed by plotting the data in principal component space, since the principal components are linear transformations of the clustering variables.
      - In this example the first two principal components are derived and the observations in each cluster are graphed. The clusters are distinct in the location of their observations, although the solution is not optimal.
      - A box plot was created and means were tested for differences on the SO2 level.