UNIVERSITY OF SOUTH FLORIDA
Classification Methods
Dr. S. Shivendu
Objectives
Overview of Classification Methods
01 Understand the statistical concepts of principal component analysis and factor analysis.
02 Interpret the results of principal component analysis and factor analysis.
03 Be able to debug SAS programs and export data from SAS files.
Agenda
Overview of Statistical Concepts
Principal Component Analysis
Factor Analysis
Course Textbooks
Why Principal Component or Factor Analysis?
• We study phenomena that cannot be directly observed
– ego, personality, intelligence in psychology
– Underlying factors that govern the observed data
• We want to identify and operate with underlying latent factors rather than the observed data
– E.g., topics in news articles
– Transcription factors in genomics
• We want to discover and exploit hidden relationships
– “beautiful car” and “gorgeous automobile” are closely related
– So are “driver” and “automobile”
– But does your search engine know this?
– Exploiting such relationships reduces noise and error in results
Why Principal Component or Factor Analysis? (contd.)
• We have too many observations and dimensions
– To reason about or obtain insights from
– To visualize
– Too much noise in the data
– Need to “reduce” them to a smaller set of factors
– Better representation of data without losing much information
– Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
• Combinations of observed variables may be more effective bases for insights, even if physical meaning is obscure
Factor or Component Analysis
• Discover a new set of factors/dimensions/axes against which to represent, describe or evaluate the data
– For more effective reasoning, insights, or better visualization
– Reduce noise in the data
– Typically a smaller set of factors: dimension reduction
• Factors are combinations of observed variables
– May be more effective bases for insights, even if physical meaning is obscure
– Observed data are described in terms of these factors rather than in terms of original variables/dimensions
Basic Concept
• Areas of variance in data are where items can be best discriminated and key
underlying phenomena observed
• Areas of greatest “signal” in the data
• If two items or dimensions are highly correlated or dependent
• They are likely to represent highly related phenomena
• If they tell us about the same underlying variance in the data, combining them to
form a single measure is reasonable
• Parsimony
• Reduction in Error
• So we want to combine related variables, and focus on uncorrelated or independent
ones, especially those along which the observations have high variance
• We want a smaller set of variables that explain most of the variance in the original
data, in more compact and insightful form
Basic Concept
• What if the dependences and correlations are not so strong or direct?
• And suppose you have 3 variables, or 4, or 5, or 10000?
• Look for the phenomena underlying the observed covariance/co-dependence in a set
of variables
• Once again, phenomena that are uncorrelated or independent, and especially
those along which the data show high variance
• These phenomena are called “factors” or “principal components” or “independent
components,” depending on the methods used
• Factor analysis: based on variance/covariance/correlation
• Independent Component Analysis: based on independence
Principal Component Analysis
• Most common form of factor analysis in the broad sense (strictly speaking, PCA is a component method rather than a factor model)
• The new variables/dimensions
• Are linear combinations of the original ones
• Are uncorrelated with one another
• Are orthogonal in the original dimension space
• Capture as much of the original variance in the data as possible
• Are called Principal Components
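The properties listed above can be checked numerically. Below is a minimal NumPy sketch, assuming rows are observations and columns are variables; the function name `pca` and the simulated data are illustrative, not from the course materials. It extracts components as eigenvectors of the sample covariance matrix, sorted by decreasing eigenvalue.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (observations x variables) onto the top-k principal components.

    Returns (scores, components, eigenvalues); components are columns, sorted by
    decreasing variance.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    C = np.cov(Xc, rowvar=False)            # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh: C is symmetric; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]       # re-sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]             # each column is a linear combination of variables
    scores = Xc @ components                # data expressed in the new coordinates
    return scores, components, eigvals

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(500, 1)),
               2 * latent + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 1))])
scores, components, eigvals = pca(X, k=3)
print(np.round(np.cov(scores, rowvar=False), 3))  # off-diagonal entries are ~0
```

The variance of each score column equals its eigenvalue, and the scores are mutually uncorrelated. This is the sense in which principal components "capture as much of the original variance as possible" one direction at a time.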
Principal Component Analysis: An Overview
• It is a mathematical tool from applied linear algebra.
• It is a simple, non-parametric method of extracting relevant information from a data set with a large number of variables.
• It provides a roadmap for how to reduce a complex data set to a lower dimension.
Let’s start with first principles: Variance
So, what is Covariance?
What do we understand by Covariance?
So, for Covariance
Why Covariance?
• Why bother with calculating (expensive) covariance when we could just plot the two variables to see their relationship?
• Covariance calculations are used to find relationships between dimensions in high-dimensional data sets (usually more than three dimensions), where visualization is difficult.
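For more than three dimensions, the pairwise covariances are collected in a covariance matrix. The sketch below computes it from the usual sample definition, dividing by n − 1, and checks it against NumPy's `np.cov`. The data are made-up numbers for illustration only.

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance matrix of X (rows = observations, columns = dimensions).

    Entry (i, j) is sum_k (x_ki - mean_i) * (x_kj - mean_j) / (n - 1).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)        # subtract each column's mean
    return Xc.T @ Xc / (n - 1)     # p x p matrix of pairwise covariances

# Five observations of four dimensions (made-up numbers)
X = np.array([[2.5, 2.4, 1.0, 0.5],
              [0.5, 0.7, 1.2, 0.3],
              [2.2, 2.9, 0.8, 0.9],
              [1.9, 2.2, 1.1, 0.4],
              [3.1, 3.0, 0.9, 0.8]])
C = covariance_matrix(X)
print(np.round(C, 3))
print(np.allclose(C, np.cov(X, rowvar=False)))  # True: matches NumPy's built-in
```

A positive entry such as `C[0, 1]` says the first two dimensions tend to increase together. No plot is needed, which is the point when there are dozens or thousands of dimensions.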
Another concept: Linear Independence
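• A set of n-dimensional vectors $x_1, \dots, x_k \in \mathbb{R}^n$ is said to be linearly independent if none of them can be written as a linear combination of the others.
• In other words, $c_1 x_1 + c_2 x_2 + \dots + c_k x_k = 0$ holds only when $c_1 = c_2 = \dots = c_k = 0$.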
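A practical way to test the definition is to stack the vectors as columns of a matrix. They are linearly independent exactly when the matrix has full column rank, meaning only the trivial combination gives the zero vector. The helper name `linearly_independent` is hypothetical; `np.linalg.matrix_rank` uses a floating-point tolerance, so near-dependent vectors may be judged dependent.

```python
import numpy as np

def linearly_independent(vectors, tol=None):
    """True if the given vectors (each in R^n) are linearly independent.

    Stacks them as columns; they are independent exactly when the matrix
    has full column rank.
    """
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A, tol=tol) == A.shape[1]

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = 2 * x1 - 3 * x2            # deliberately a linear combination of x1 and x2
print(linearly_independent([x1, x2]))      # True
print(linearly_independent([x1, x2, x3]))  # False
```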
Linear Independence continued
Some other terms: Span and Basis
Orthonormal and Orthogonal Basis
Some more basic terms: Eigenvalue Problem and Eigenvector
How to calculate Eigenvalues and Eigenvectors?
Calculating Eigenvalues and Eigenvectors (you can skip this)
Properties of Eigenvalues and Eigenvectors
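For a square matrix $A$, a nonzero vector $v$ with $A v = \lambda v$ is an eigenvector, and $\lambda$ is its eigenvalue. The NumPy sketch below solves this problem for a small symmetric matrix, as a covariance matrix always is. It then checks three standard properties: the defining equation, sum of eigenvalues = trace, and product = determinant. It also checks that the eigenvectors of a symmetric matrix are orthonormal. The matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # a symmetric matrix, like a covariance matrix

eigvals, eigvecs = np.linalg.eigh(A)    # eigh is for symmetric/Hermitian matrices
# eigvals -> [1., 3.] in ascending order; columns of eigvecs are unit eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)  # the defining equation A v = lambda v

print(np.isclose(eigvals.sum(), np.trace(A)))        # sum of eigenvalues = trace
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # product = determinant
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))   # eigenvectors are orthonormal
```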
So, What is Principal Component Analysis?
• An example:
Example (continued)
In other words:
Which parameters do we want to ignore, and which do we want to keep?
• Which parameters to keep:
• Parameters that do not depend on others (e.g., eye color), i.e., that are uncorrelated with them and hence have low covariance
• Parameters that change a lot, i.e., have high variance
• Which parameters to drop:
• Constant parameters (number of heads)
• Parameters with very low variance (thickness of hair)
• Parameters that are linearly dependent on other parameters: $Z = aX + bY$
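The keep/drop rules above can be sketched as a simple column filter. This is a rough illustration rather than part of PCA itself, and the helpers `select_parameters` and `is_linear_combination` are hypothetical. It drops columns whose variance is below an absolute threshold `var_tol` (an assumed value that depends on the units). It also drops columns that a least-squares fit on already-kept columns reproduces almost exactly. The data mimic the slide's example and are simulated.

```python
import numpy as np

def is_linear_combination(col, others, rtol=1e-8):
    """True if col is (almost exactly) an affine combination of the columns in others."""
    A = np.column_stack([others, np.ones(len(col))])   # predictors plus an intercept
    beta, *_ = np.linalg.lstsq(A, col, rcond=None)
    resid = col - A @ beta
    return resid.var() <= rtol * col.var()             # near-zero residual => redundant

def select_parameters(X, names, var_tol=1e-3):
    """Hypothetical helper: keep columns that vary enough and are not redundant."""
    kept = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if col.var() <= var_tol:                       # constant or very low variance
            continue
        if is_linear_combination(col, X[:, kept]):     # e.g. Z = aX + bY
            continue
        kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(1)
n = 200
eye_color = rng.integers(0, 4, n).astype(float)        # coded categories, made up
height = rng.normal(170, 10, n)
heads = np.ones(n)                                     # constant
hair_thickness = 0.05 + 0.001 * rng.normal(size=n)     # very low variance
z = 2 * eye_color + 0.5 * height                       # linearly dependent
X = np.column_stack([eye_color, height, heads, hair_thickness, z])
names = ["eye_color", "height", "heads", "hair_thickness", "z"]
print(select_parameters(X, names))                     # ['eye_color', 'height']
```

PCA handles the last case automatically: an exactly dependent column adds a direction with zero variance, i.e., an eigenvalue of 0.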
Now we are ready!
Core principles of Principal Component Analysis (PCA)
• PCA uses linear algebra to “best express” the raw data along the minimum number of dimensions.
• PCA allows us to filter out noise and extract the relevant information from the given data set.
• Hence, the representation we are looking for is one that reduces both noise and redundancy in the data set at hand.
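One way to see the noise-filtering idea is a low-rank reconstruction: keep only the top-k principal directions and map the data back to the original variables. This sketch assumes the noise is spread roughly evenly across directions, so the discarded low-variance directions are mostly noise. The choice k = 1 and the simulated rank-1 signal are illustrative. It uses the SVD of the centered data, whose right singular vectors are the same principal directions as the eigenvectors of the covariance matrix.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Approximate X using only its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                  # p x k matrix of the top-k directions
    return Xc @ Vk @ Vk.T + mean   # project onto them, then map back

rng = np.random.default_rng(42)
t = rng.normal(size=(300, 1))
signal = t @ np.array([[1.0, 2.0, -1.0, 0.5]])      # rank-1 "true" structure
noisy = signal + 0.2 * rng.normal(size=signal.shape)
denoised = pca_reconstruct(noisy, k=1)
print(np.mean((noisy - signal) ** 2), np.mean((denoised - signal) ** 2))
```

The second error should be noticeably smaller than the first. With k equal to the number of variables, nothing is discarded and the data come back unchanged.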
What is noise?
What is redundancy?
Goal of PCA:
PCA assumption and how it works
How does it work?
In summary
Factor Analysis
• General Concepts
• Factor analysis provides information about reliability, item quality, and
construct validity
• General goal is to understand whether and to what extent items from a
scale may reflect an underlying hypothetical construct or constructs, known
as factors
• An analytic method with high sensitivity to identify problematic items and
assess the number of factors
• Useful for analysis of survey data
General concepts (contd.)
• In general, factor analysis methods decompose (or break down) the
covariation among items in a measure into meaningful components
• Higher inter-item correlations should reflect greater overlap in what the
items measure, and, therefore, higher inter-item correlations reflect higher
internal reliability
Objectives
• What is factor analysis?
• What do we need factor analysis for?
• What are the modeling assumptions?
• How to specify, fit, and interpret factor models?
• What is the difference between exploratory and confirmatory factor analysis?
• What is model identifiability, and how is it assessed?
What is factor analysis?
• Factor analysis is a theory driven statistical data reduction technique used to
explain covariance among observed random variables in terms of fewer
unobserved random variables named factors
General concept
• In practice, a factor cannot be estimated with one item
• It should be estimated with three or more items; items with higher correlations with the factor contribute more to the measure
General Concepts
• Items are referred to as indicators
• Regression slopes between the factor and its indicators are referred to as loadings
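To make “loadings as regression slopes” concrete, the sketch below simulates a hypothetical one-factor model $x_j = \lambda_j f + e_j$ with three indicators. It then regresses each indicator on the factor. In real data the factor is unobserved, so loadings have to be estimated by a factor analysis; the simulation only shows what the estimated numbers mean. The loadings 0.8, 0.7, 0.6 and the noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
true_loadings = np.array([0.8, 0.7, 0.6])     # hypothetical loadings for three indicators
f = rng.normal(size=n)                         # the (normally unobserved) factor
noise = 0.5 * rng.normal(size=(n, 3))          # unique/error part of each indicator
items = f[:, None] * true_loadings + noise     # x_j = lambda_j * f + e_j

# Regress each indicator on the factor; the slope is that indicator's loading
A = np.column_stack([f, np.ones(n)])           # factor plus intercept
coef, *_ = np.linalg.lstsq(A, items, rcond=None)
estimated_loadings = coef[0]
print(np.round(estimated_loadings, 2))         # close to [0.8, 0.7, 0.6]
```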
General Concepts
• Patterns of high inter-item correlations among subsets of items suggest more
than one factor because the items tend to “cluster” together
• Any number of factors might underlie a set of items, up to the total number
of items (which would imply no common factor)
• Example: a set of six items might assess extroversion and openness
General Concepts
General Concepts
• We never know the meaning of the factors directly; we can only use theory to decide what they mean and then test their validity
• The factors may be related or not related—correlated or orthogonal
(uncorrelated)
• If those who are extroverted tend to be a little more open, then the factors
are correlated (contrary to what is suggested by the table)
Types of Factor Analysis
• Two major types of factor analysis
• Exploratory factor analysis (EFA)
• Confirmatory factor analysis (CFA)
• The major difference is that EFA seeks to discover the number of factors and does not specify which items load on which factors
Exploratory Factor Analysis
• The researcher may discover there is one factor underlying the items or
many factors
• Items may be eliminated by the researcher if they do not load highly
• Researchers choose items that load highly on one factor and low on other
factors to achieve simple structure
• Composite scale scores are often created based on the factor analysis to be used in further research
Exploratory Factor Analysis
• An initial analysis called principal components analysis (PCA) is first
conducted to help determine the number of factors that underlie the set of
items
• PCA is the default EFA method in most software and the first stage in other
exploratory factor analysis methods to select the number of factors
• PCA is not considered a “true factor analysis method,” because measurement
error is not estimated.
EFA
• PCA gives as many eigenvalues (components, or potential factors) as there are items
• If 12 items, there will be 12 eigenvalues
• Each component is a potential “cluster” of highly inter-correlated items
• Eigenvalues represent the amount of variance accounted for by each component, but they are not in a standardized metric
• Larger eigenvalues indicate more important (and more likely real) components or factors; smaller ones may merely reflect unimportant factors or random variation
• When the correlation matrix is analyzed, the eigenvalues sum to the number of items, so with 12 items there will be 12 eigenvalues that sum to 12
• The proportion or percentage of (co)variance accounted for by each factor can be calculated by dividing its eigenvalue by the number of items
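The bookkeeping above can be checked on simulated data. This is a sketch assuming 12 hypothetical items driven by two underlying factors, six items each. The eigenvalues of the 12 × 12 correlation matrix sum to 12 because each diagonal entry of a correlation matrix is 1, and the trace equals the sum of the eigenvalues. Dividing by 12 gives each component's share of the total variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 12
f = rng.normal(size=(n, 2))                        # two hypothetical underlying factors
W = np.zeros((2, p))
W[0, :6] = 0.8                                     # items 1-6 load on factor 1
W[1, 6:] = 0.8                                     # items 7-12 load on factor 2
items = f @ W + 0.6 * rng.normal(size=(n, p))      # 12 simulated items

R = np.corrcoef(items, rowvar=False)               # 12 x 12 correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]     # 12 eigenvalues, largest first
print(np.round(eigvals, 2))
print(eigvals.sum())                               # ~12: trace of R is the number of items
proportion = eigvals / p                           # share of total variance per component
print(np.round(proportion[:3], 3))
```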
EFA
• There are several possible rules which may be used for choosing the number of factors
based on eigenvalues
• The common rule of retaining factors with eigenvalues greater than 1.0 (the Kaiser-Guttman rule) does not seem to work best
• Most researchers use the scree plot and a subjective scree test, identifying the biggest drop in eigenvalues
• The scree test seems to work well for identifying the correct number of factors
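The two rules can disagree on the same eigenvalues. The sketch below counts eigenvalues above 1.0 for the Kaiser-Guttman rule. It then applies a crude, hypothetical automation of the scree test: keep the factors that come before the largest drop between consecutive eigenvalues. A real scree test is a visual judgment of the “elbow”, so treat `biggest_drop_count` as an illustration only. The eigenvalues are made up for 12 items and sum to 12.

```python
import numpy as np

def kaiser_count(eigvals):
    """Kaiser-Guttman rule: number of eigenvalues greater than 1.0."""
    return int(np.sum(np.asarray(eigvals) > 1.0))

def biggest_drop_count(eigvals):
    """Hypothetical scree heuristic: factors before the largest drop in sorted eigenvalues."""
    ev = np.sort(np.asarray(eigvals))[::-1]
    drops = ev[:-1] - ev[1:]           # drop from each eigenvalue to the next
    return int(np.argmax(drops)) + 1

# Made-up eigenvalues for 12 items (they sum to 12)
eigvals = [3.0, 2.8, 1.1, 1.02, 0.7, 0.6, 0.55, 0.5, 0.5, 0.45, 0.4, 0.38]
print(kaiser_count(eigvals))        # 4
print(biggest_drop_count(eigvals))  # 2
```

Here the greater-than-1.0 rule keeps two borderline components (1.1 and 1.02) that the scree drop suggests are not real factors. This is one reason the rule is often said to over-extract.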
EFA
• The next step in an EFA, after deciding on the number of factors, is to choose a method of extraction
• The extraction method is the statistical algorithm used to estimate loadings
• There are several to choose from, of which principal factors (principal axis
factoring) or maximum likelihood seem to perform the best
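As a sketch of one extraction method, here is iterated principal axis factoring on a correlation matrix. It starts from squared multiple correlations as communality estimates (a common choice; software defaults differ). It replaces the diagonal of R with those communalities, takes the top-k eigenvectors, and repeats until the communalities stop changing. The function name, iteration limits, and the one-factor test matrix are assumptions for illustration, not SAS's implementation.

```python
import numpy as np

def principal_axis_factoring(R, k, n_iter=1000, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R (sketch)."""
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # initial communalities: squared multiple correlations
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                 # "reduced" correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1][:k]         # k largest eigenvalues
        loadings = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)   # communality = row sum of squared loadings
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

lam = np.array([0.8, 0.7, 0.6, 0.5])             # hypothetical one-factor loadings
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)                          # correlation matrix implied by the model
loadings, h2 = principal_axis_factoring(R, k=1)
print(np.round(np.abs(loadings.ravel()), 3))      # recovers the loadings up to sign
```

Unlike PCA, which keeps 1s on the diagonal, principal axis factoring analyzes only the shared (common) variance. This is why it counts as a “true” factor method.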
EFA
• Factor rotation
• Factor rotation is a mathematical scaling process for the loadings that also specifies
whether the factors are correlated (oblique) or uncorrelated (orthogonal)
• There is usually no harm in allowing factors to correlate
• If the factor correlation is zero, the result is the same as an orthogonal solution
• Orthogonal rotation makes a strong assumption that the factors are uncorrelated, which is probably unrealistic in most applications
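To show what a rotation does to loadings, below is the classic SVD-based varimax algorithm. Varimax is an orthogonal rotation, used here only because it is compact; oblique rotations such as promax or oblimin need more code. The example takes a clean two-factor structure, views it from axes rotated by 30°, and lets varimax rotate back. Communalities (row sums of squared loadings) do not change, because a rotation only re-expresses the same solution. The loading values are made up.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x k loading matrix (classic SVD-based algorithm)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)                       # accumulated orthogonal rotation matrix
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lr = L @ T
        # Target matrix for the varimax criterion at the current rotation
        B = L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag(np.sum(Lr ** 2, axis=0)))
        u, s, vt = np.linalg.svd(B)
        T = u @ vt                      # closest orthogonal matrix to B
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ T, T

theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
simple = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
A = simple @ rot                        # the same structure, seen from rotated axes
rotated, T = varimax(A)
print(np.round(rotated, 2))             # each row: one large, one near-zero loading
```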
Confirmatory Factor Analysis
• Confirmatory factor analysis (CFA) starts with a hypothesis about how many factors there are
and which items load on which factors
• Factor loadings and factor correlations are obtained as in EFA
• EFA, in contrast, does not specify a measurement model initially and usually seeks to discover
the measurement model
• In EFA, all items load on all factors, but in CFA most researchers start with a model in which each item loads on only one factor (simple structure)
CFA
• A test is computed to investigate how well the hypothesized factor structure fits with the data
• The fit test seeks to find a non-significant result, indicating good fit to the data
• The model fit is derived from comparing the correlations (technically, the covariances) among
the items to the correlations expected by the model being tested
• One may hear about many fit indices, so here are some
• Chi-square (χ²): lower values indicate better fit
• RMSEA: lower values indicate better fit (< .06)
• SRMR: lower values indicate better fit (< .08)
• Comparative Fit Index (CFI): higher values indicate better fit (> .95)
• Tucker-Lewis Index (TLI): higher values indicate better fit (> .95)
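Three of these indices can be computed directly from chi-square statistics. The inputs are the tested model's χ² and degrees of freedom, the χ² and degrees of freedom of a baseline (independence) model, and the sample size n. The formulas below are common textbook forms. Some programs use n instead of n − 1 in RMSEA, so results can differ slightly across software. SRMR is omitted because it needs the full residual correlation matrix. The chi-square values are made up.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (common form using n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index: tested model (m) versus baseline model (b)."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index); can fall outside [0, 1]."""
    return ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)

# Made-up results: model chi2 = 85 on 48 df, baseline chi2 = 1200 on 66 df, n = 300
r = rmsea(85.0, 48, 300)
c = cfi(85.0, 48, 1200.0, 66)
t = tli(85.0, 48, 1200.0, 66)
print(round(r, 3), round(c, 3), round(t, 3))   # 0.051 0.967 0.955
```

Against the cutoffs above, this hypothetical model would count as fitting well, even though its χ² (85 on 48 df) would be statistically significant. This is why the indices are reported alongside the test.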
CFA (contd.)
EFA vs CFA
EFA vs CFA
CFA to SEM
Another method to classify?
Important characteristics of discriminant analysis
Moreover, discriminant analysis
• Extracts dominant, underlying gradients of variation (canonical functions) among
groups of sample entities (e.g., species, sites, observations, etc.) from a set of
multivariate observations, such that variation among groups is maximized and variation
within groups is minimized along the gradient.
• Reduces the dimensionality of a multivariate data set by condensing a large number of
original variables into a smaller set of new composite dimensions (canonical functions)
with a minimum loss of information.
Hence, discriminant analysis
• Summarizes data redundancy by placing similar entities in proximity in canonical
space and producing a parsimonious understanding of the data in terms of a few
dominant gradients of variation.
• Describes maximum differences among pre-specified groups of sampling entities
based on a suite of discriminating characteristics (i.e., canonical analysis of
discrimination).
• Predicts the group membership of future samples, or samples from unknown
groups, based on a suite of classification characteristics (i.e., classification).
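For two groups, the canonical function can be sketched with Fisher's linear discriminant. It is the direction $w = S_W^{-1}(m_1 - m_0)$ that maximizes between-group separation relative to within-group scatter. New samples are then classified by which side of a threshold their score falls on. The midpoint threshold assumes equal group sizes and priors, and the helper names and simulated groups are hypothetical.

```python
import numpy as np

def fit_fisher_lda(X, y):
    """Two-group Fisher linear discriminant (sketch). y holds labels 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Direction maximizing between-group separation relative to within-group spread
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2        # midpoint of the projected group means
    return w, threshold

def predict(X, w, threshold):
    """Assign group 1 when the discriminant score exceeds the threshold."""
    return (X @ w > threshold).astype(int)

rng = np.random.default_rng(5)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 200)
w, thr = fit_fisher_lda(X, y)
accuracy = np.mean(predict(X, w, thr) == y)
print(accuracy)
```

The score `X @ w` is the one-dimensional gradient along which the groups differ most. With more than two groups, the same idea yields several canonical functions.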
Takeaways
• Principal Component Analysis and Factor Analysis are data-intensive methods
• These methods provide insights beyond inferential statistics
• These data-intensive methods are increasingly important in pattern recognition, computer vision and natural language processing.
• In this course, we have just touched the tip of the iceberg!
U N I V E R S I T Y O F S O U T H F L O R I D A //
You have reached the end of the
presentation.

More Related Content

Similar to Classification

Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.pptmanaswidebbarma1
 
Research Course - RCT.pdf
Research Course - RCT.pdfResearch Course - RCT.pdf
Research Course - RCT.pdfMarioKopljar1
 
Research Course - RCT.pptx
Research Course - RCT.pptxResearch Course - RCT.pptx
Research Course - RCT.pptxMarioKopljar1
 
Research Course - RCT.pdf
Research Course - RCT.pdfResearch Course - RCT.pdf
Research Course - RCT.pdfMarioKopljar1
 
Background on Usability Engineering
Background on Usability EngineeringBackground on Usability Engineering
Background on Usability EngineeringAndres Baravalle
 
Practical applications and analysis in Research Methodology
Practical applications and analysis in Research Methodology Practical applications and analysis in Research Methodology
Practical applications and analysis in Research Methodology Hafsa Ranjha
 
Data and scales of measurement
Data and scales of measurement Data and scales of measurement
Data and scales of measurement riturandad
 
4. Formulating research problems
4. Formulating research problems4. Formulating research problems
4. Formulating research problemsRazif Shahril
 
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher TrainingONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher TrainingOffice for National Statistics
 
Chapter 11 Data Analysis Classification and Tabulation
Chapter 11 Data Analysis Classification and TabulationChapter 11 Data Analysis Classification and Tabulation
Chapter 11 Data Analysis Classification and TabulationInternational advisers
 
Quantitative research presentation, safiah almurashi
Quantitative research presentation, safiah almurashiQuantitative research presentation, safiah almurashi
Quantitative research presentation, safiah almurashiQUICKFIXQUICKFIX
 
Methods of data collection
Methods of data collectionMethods of data collection
Methods of data collectionYogeshSorot
 
Data Presentation & Analysis.pptx
Data Presentation & Analysis.pptxData Presentation & Analysis.pptx
Data Presentation & Analysis.pptxheencomm
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptxBillyMoses1
 

Similar to Classification (20)

Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.ppt
 
EDA
EDAEDA
EDA
 
Research Course - RCT.pdf
Research Course - RCT.pdfResearch Course - RCT.pdf
Research Course - RCT.pdf
 
Research Course - RCT.pptx
Research Course - RCT.pptxResearch Course - RCT.pptx
Research Course - RCT.pptx
 
Research Course - RCT.pdf
Research Course - RCT.pdfResearch Course - RCT.pdf
Research Course - RCT.pdf
 
Descriptive Analysis.pptx
Descriptive Analysis.pptxDescriptive Analysis.pptx
Descriptive Analysis.pptx
 
Background on Usability Engineering
Background on Usability EngineeringBackground on Usability Engineering
Background on Usability Engineering
 
Practical applications and analysis in Research Methodology
Practical applications and analysis in Research Methodology Practical applications and analysis in Research Methodology
Practical applications and analysis in Research Methodology
 
Data and scales of measurement
Data and scales of measurement Data and scales of measurement
Data and scales of measurement
 
research.pptx
research.pptxresearch.pptx
research.pptx
 
4. Formulating research problems
4. Formulating research problems4. Formulating research problems
4. Formulating research problems
 
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher TrainingONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
 
Chapter 11 Data Analysis Classification and Tabulation
Chapter 11 Data Analysis Classification and TabulationChapter 11 Data Analysis Classification and Tabulation
Chapter 11 Data Analysis Classification and Tabulation
 
Statistics
StatisticsStatistics
Statistics
 
Happiness ppt (2) (1)
Happiness ppt (2) (1)Happiness ppt (2) (1)
Happiness ppt (2) (1)
 
Quantitative research presentation, safiah almurashi
Quantitative research presentation, safiah almurashiQuantitative research presentation, safiah almurashi
Quantitative research presentation, safiah almurashi
 
Methods of data collection
Methods of data collectionMethods of data collection
Methods of data collection
 
Data Presentation & Analysis.pptx
Data Presentation & Analysis.pptxData Presentation & Analysis.pptx
Data Presentation & Analysis.pptx
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptx
 
Data Science 1.pdf
Data Science 1.pdfData Science 1.pdf
Data Science 1.pdf
 

More from Michael770443

Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Michael770443
 
Discrete Choice Model
Discrete Choice ModelDiscrete Choice Model
Discrete Choice ModelMichael770443
 
Categorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisCategorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisMichael770443
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of VarianceMichael770443
 
Introduction to Statistical Methods
Introduction to Statistical MethodsIntroduction to Statistical Methods
Introduction to Statistical MethodsMichael770443
 
Overview of Statistical Concepts
Overview of Statistical ConceptsOverview of Statistical Concepts
Overview of Statistical ConceptsMichael770443
 

More from Michael770443 (8)

Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Discrete Choice Model - Part 2
Discrete Choice Model - Part 2
 
Discrete Choice Model
Discrete Choice ModelDiscrete Choice Model
Discrete Choice Model
 
Categorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisCategorical Data and Statistical Analysis
Categorical Data and Statistical Analysis
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Introduction to Statistical Methods
Introduction to Statistical MethodsIntroduction to Statistical Methods
Introduction to Statistical Methods
 
Overview of Statistical Concepts
Overview of Statistical ConceptsOverview of Statistical Concepts
Overview of Statistical Concepts
 

Recently uploaded

Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdfssuserdda66b
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 

Recently uploaded (20)

Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Spellings Wk 3 English CAPS CARES Please Practise

Classification

  • 1. U N I V E R S I T Y O F S O U T H F L O R I D A // 1 Classification Methods Dr. S. Shivendu
  • 2. Objectives
      • Overview of Classification Methods
      • 01 Understand statistical concepts of Principal component analysis and Factor analysis.
      • 02 Interpret results of Principal component analysis and Factor analysis.
      • 03 Be able to debug SAS programs and export data from SAS files.
  • 3. Agenda
      • Overview of Statistical Concepts
      • Principal Component Analysis
      • Factor Analysis
  • 4. Course Textbooks
  • 5. Why Principal Component or Factor Analysis?
      • We study phenomena that cannot be directly observed
          – Ego, personality, intelligence in psychology
          – Underlying factors that govern the observed data
      • We want to identify and operate with underlying latent factors rather than the observed data
          – E.g., topics in news articles
          – Transcription factors in genomics
      • We want to discover and exploit hidden relationships
          – “beautiful car” and “gorgeous automobile” are closely related
          – So are “driver” and “automobile”
          – But does your search engine know this?
          – Exploiting such relationships reduces noise and error in results
  • 6. Why Principal Component or Factor Analysis?
      • We have too many observations and dimensions
          – To reason about or obtain insights from
          – To visualize
          – Too much noise in the data
          – Need to “reduce” them to a smaller set of factors
          – Better representation of data without losing much information
          – Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
      • Combinations of observed variables may be more effective bases for insights, even if physical meaning is obscure
  • 7. Factor or Component Analysis
      • Discover a new set of factors/dimensions/axes against which to represent, describe or evaluate the data
          – For more effective reasoning, insights, or better visualization
          – Reduce noise in the data
          – Typically a smaller set of factors: dimension reduction
      • Factors are combinations of observed variables
          – May be more effective bases for insights, even if physical meaning is obscure
          – Observed data are described in terms of these factors rather than in terms of original variables/dimensions
  • 8. Basic Concept
      • Areas of variance in data are where items can be best discriminated and key underlying phenomena observed
          – Areas of greatest “signal” in the data
      • If two items or dimensions are highly correlated or dependent
          – They are likely to represent highly related phenomena
          – If they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable
              – Parsimony
              – Reduction in error
      • So we want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance
      • We want a smaller set of variables that explain most of the variance in the original data, in more compact and insightful form
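The idea of combining two highly correlated items into one measure can be sketched in a few lines. This is a minimal illustration, assuming the Pearson correlation as the measure of dependence and the average of z-scores as the way of "combining" the items; both choices, the helper names, and the ratings are hypothetical, not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def zscores(x):
    """Standardize a sequence using the sample standard deviation (n - 1)."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]

def composite(x, y):
    """Average of the two standardized items: one combined measure."""
    return [(a + b) / 2 for a, b in zip(zscores(x), zscores(y))]

# Hypothetical ratings of the same five cars on two near-synonymous items
beautiful = [7, 5, 9, 4, 6]
gorgeous = [6, 5, 9, 3, 7]
print(round(pearson_r(beautiful, gorgeous), 3))
print([round(v, 2) for v in composite(beautiful, gorgeous)])
```

Standardizing first keeps an item with a larger spread from dominating the composite.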
  • 9. Basic Concept
      • What if the dependences and correlations are not so strong or direct?
      • And suppose you have 3 variables, or 4, or 5, or 10000?
      • Look for the phenomena underlying the observed covariance/co-dependence in a set of variables
          – Once again, phenomena that are uncorrelated or independent, and especially those along which the data show high variance
      • These phenomena are called “factors” or “principal components” or “independent components,” depending on the methods used
          – Factor analysis: based on variance/covariance/correlation
          – Independent Component Analysis: based on independence
  • 10. Principal Component Analysis
      • Often described as the most common form of factor analysis (strictly speaking, PCA is not a “true” factor analysis method; see slide 48)
      • The new variables/dimensions
          – Are linear combinations of the original ones
          – Are uncorrelated with one another
              – Orthogonal in original dimension space
          – Capture as much of the original variance in the data as possible
          – Are called Principal Components
  • 11. Principal Component Analysis: An Overview
      • It is a mathematical tool from applied linear algebra.
      • It is a simple, non-parametric method of extracting relevant information from a data set with a large number of variables.
      • It provides a roadmap for how to reduce a complex data set to a lower dimension.
  • 12. Let’s start with first principles: Variance
  • 13. So, what is Covariance?
  • 14. What do we understand by Covariance?
  • 15. So, for Covariance
  • 16. Why Covariance?
      • Why bother calculating the (expensive) covariance when we could just plot the 2 values to see their relationship?
      • Covariance calculations are used to find relationships between dimensions in high-dimensional data sets (usually more than 3 dimensions), where visualization is difficult.
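For more than two or three variables, the pairwise covariances are collected in a covariance matrix, whose diagonal holds the variances. The sketch below assumes the usual sample definitions with an (n − 1) denominator; the variable names and data are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance: sum of squared deviations from the mean over (n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    """Sample covariance: summed co-deviation of two variables over (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(columns):
    """p x p matrix whose (i, j) entry is cov(column i, column j)."""
    return [[covariance(a, b) for b in columns] for a in columns]

# Hypothetical data: 3 variables (columns), 5 observations each
hours = [2, 4, 6, 8, 10]
score = [50, 60, 65, 80, 95]
absences = [9, 7, 6, 3, 1]
C = covariance_matrix([hours, score, absences])
for row in C:
    print([round(v, 2) for v in row])
```

A positive entry means two variables tend to rise together, a negative one (hours vs. absences here) means one rises as the other falls, and the matrix is always symmetric.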
  • 17. Another concept: Linear Independence
      • A set of n-dimensional vectors $x_i \in \mathbb{R}^n$ is said to be linearly independent if none of them can be written as a linear combination of the others.
      • In other words, $\sum_i c_i x_i = 0$ holds only when $c_i = 0$ for every $i$.
  • 18. Linear Independence continued
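One practical test of linear independence: a set of k vectors is independent exactly when the matrix they form has rank k. The sketch below computes the rank by Gaussian elimination with exact fractions so that rounding cannot hide a dependency; the helper names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0  # number of pivot rows found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every other row
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True when no vector is a linear combination of the others (rank == count)."""
    return rank(vectors) == len(vectors)

print(linearly_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
print(linearly_independent([[1, 2, 3], [2, 4, 6], [0, 1, 1]]))  # 2nd = 2 * 1st
```

Any set of more than n vectors in $\mathbb{R}^n$ is dependent, because the rank can never exceed n.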
  • 19. Some other terms: Span and Basis
  • 20. Orthonormal and Orthogonal Basis
  • 21. Some more basic terms: Eigenvalue Problem and Eigenvector
  • 22. How to calculate Eigenvalues and Eigenvectors?
  • 23. Calculating Eigenvalues and Eigenvectors (you can skip this)
  • 24. Properties of Eigenvalues and Eigenvectors
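For a 2 × 2 symmetric matrix (such as a two-variable covariance matrix), the eigenvalue problem $A v = \lambda v$ can be solved by hand from the characteristic equation $\det(A - \lambda I) = 0$. A minimal sketch, with an illustrative function name:

```python
import math

def eig_2x2_symmetric(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first.

    The eigenvalues are the roots of det(A - lam*I) = lam^2 - (a + d)*lam + (a*d - b*b) = 0.
    """
    if b == 0:  # already diagonal: the eigenvectors are the coordinate axes
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    trace = a + d
    disc = math.sqrt((a - d) ** 2 + 4 * b * b)  # equals sqrt(trace^2 - 4*det) >= 0
    pairs = []
    for lam in ((trace + disc) / 2, (trace - disc) / 2):
        # First row of (A - lam*I) v = 0 gives (a - lam)*x + b*y = 0, solved by v = (b, lam - a)
        x, y = b, lam - a
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))  # scale to unit length
    return pairs

A = (2.0, 1.0, 2.0)  # the matrix [[2, 1], [1, 2]]
for lam, vec in eig_2x2_symmetric(*A):
    print(lam, vec)
```

Two standard properties can be checked on the output: the eigenvalues sum to the trace and multiply to the determinant, and for a symmetric matrix the eigenvectors are orthogonal.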
  • 25. So, what is Principal Component Analysis?
      • An example:
  • 26. Example (continued)
  • 27. In other words:
  • 28. Which parameters do we want to ignore, and which do we want to keep?
      • Which parameters to keep:
          – A parameter that does not depend on the others (e.g., eye color), i.e., one that is uncorrelated with them and hence has low covariance
          – A parameter that changes a lot, i.e., one with high variance
      • Which parameters to drop:
          – A constant parameter (e.g., number of heads)
          – A parameter with very low variance (e.g., thickness of hair)
          – A parameter that is linearly dependent on other parameters: $Z = aX + bY$
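Two of the "drop" rules above can be checked mechanically. This is a sketch under stated assumptions: a constant parameter shows up as zero sample variance, and a parameter with $Z = aX + bY$ makes the covariance matrix of $(X, Y, Z)$ singular (determinant 0). The parameter names, the 1e-9 threshold, and the data are hypothetical.

```python
def sample_cov(xs, ys):
    """Sample covariance with an (n - 1) denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def det3(m):
    """Determinant of a 3 x 3 matrix (cofactor expansion along the first row)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical measurements on 5 people
params = {
    "X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Y": [2.0, 1.0, 4.0, 3.0, 5.0],
    "heads": [1.0] * 5,  # constant parameter
}
params["Z"] = [2 * x + 3 * y for x, y in zip(params["X"], params["Y"])]  # Z = 2X + 3Y

# Rule 1: flag parameters whose variance is (near) zero
variances = {k: sample_cov(v, v) for k, v in params.items()}
low_variance = [k for k, v in variances.items() if v < 1e-9]

# Rule 2: linearly dependent parameters give a singular covariance matrix
cols = [params["X"], params["Y"], params["Z"]]
cov_xyz = [[sample_cov(a, b) for b in cols] for a in cols]
print(low_variance, det3(cov_xyz))
```

PCA handles the second case automatically: a redundant parameter produces a zero (or near-zero) eigenvalue, so the corresponding component can be discarded.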
  • 29. Now we are ready!
  • 30. Core principles of Principal Component Analysis (PCA)
      • PCA uses linear algebra to “best express” the raw data along the minimum number of dimensions.
      • PCA allows us to filter out noise and extract the relevant information from the given data set.
      • Hence, the representation we are looking for is one that reduces both noise and redundancy in the data set at hand.
  • 31. What is noise?
  • 32. What is redundancy?
  • 33. Goal of PCA:
  • 34. PCA assumption and how it works
  • 35. How does it work?
  • 36. In summary
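The textbook PCA recipe is: center the data, compute the covariance matrix, find its eigenvectors, sort them by eigenvalue, and project onto the leading ones. The sketch below follows that recipe in plain Python. Cyclic Jacobi rotations stand in for a library eigen-solver (such as `numpy.linalg.eigh`) only to keep it dependency-free. Function names and data are illustrative, and this is not necessarily the procedure shown on the original slides.

```python
import math

def matmul(x, y):
    """Plain-Python matrix product of x (n x m) and y (m x q)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def covariance_matrix(rows):
    """Center each column, then return (sample covariance matrix, centered rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (values, vectors), where vectors[k] is the unit eigenvector for values[k].
    """
    p = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                g = [[1.0 if r == q else 0.0 for q in range(p)] for r in range(p)]
                g[i][i], g[j][j], g[i][j], g[j][i] = c, c, s, -s
                a = matmul(transpose(g), matmul(a, g))  # A <- G^T A G
                v = matmul(v, g)                        # accumulate eigenvectors
    return [a[k][k] for k in range(p)], transpose(v)

def pca(rows, k):
    """Return (top-k components, explained-variance ratios, k-dimensional scores)."""
    cov, centered = covariance_matrix(rows)
    values, vectors = jacobi_eigen(cov)
    order = sorted(range(len(values)), key=lambda idx: values[idx], reverse=True)
    values = [values[idx] for idx in order]
    components = [vectors[idx] for idx in order][:k]
    total = sum(values)
    explained = [val / total for val in values]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return components, explained, scores

# Hypothetical 2-variable data set (6 observations)
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]]
components, explained, scores = pca(data, k=1)
print(components, [round(e, 3) for e in explained])
```

The explained-variance ratios express the "noise vs. signal" trade-off: components with tiny ratios carry little variance and are the ones dropped.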
  • 37. Factor Analysis
      • General Concepts
          – Factor analysis provides information about reliability, item quality, and construct validity
          – The general goal is to understand whether, and to what extent, items from a scale may reflect an underlying hypothetical construct or constructs, known as factors
          – An analytic method with high sensitivity for identifying problematic items and assessing the number of factors
          – Useful for analysis of survey data
  • 38. General concepts (contd.)
      • In general, factor analysis methods decompose (break down) the covariation among items in a measure into meaningful components
      • Higher inter-item correlations should reflect greater overlap in what the items measure and, therefore, higher internal reliability
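The link between inter-item correlation and internal reliability can be made concrete with standardized Cronbach's alpha, $\alpha = k\bar{r} / (1 + (k - 1)\bar{r})$, where $k$ is the number of items and $\bar{r}$ the mean inter-item correlation. Alpha is not mentioned in the slides; it is swapped in here only as one common reliability index, and the uniform correlation matrices are hypothetical.

```python
def mean_inter_item_r(corr):
    """Average of the off-diagonal entries of a k x k correlation matrix."""
    k = len(corr)
    off = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    return sum(off) / len(off)

def standardized_alpha(corr):
    """Standardized Cronbach's alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    k = len(corr)
    r_bar = mean_inter_item_r(corr)
    return k * r_bar / (1 + (k - 1) * r_bar)

def uniform_corr(k, r):
    """Hypothetical k-item correlation matrix with every inter-item correlation equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]

for r in (0.2, 0.4, 0.6):
    print(r, round(standardized_alpha(uniform_corr(4, r)), 3))
```

With the number of items held fixed, alpha rises monotonically with the average inter-item correlation, which is the relationship the slide describes.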
  • 39. Objectives
      • What is factor analysis?
      • What do we need factor analysis for?
      • What are the modeling assumptions?
      • How do we specify, fit, and interpret factor models?
      • What is the difference between exploratory and confirmatory factor analysis?
      • What is model identifiability, and how is it assessed?
  • 40. What is factor analysis?
      • Factor analysis is a theory-driven statistical data reduction technique used to explain covariance among observed random variables in terms of fewer unobserved random variables named factors
  • 41. General concept
      • In practice, a factor cannot be estimated with one item
      • It should only be estimated with three or more items
      • Items with a higher correlation with the factor contribute more to the measure
  • 42. General Concepts
      • Items are referred to as indicators
      • Regression slopes between factor and indicators are referred to as loadings
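In matrix form, the common factor model is usually written as below. This assumes p observed items $\mathbf{x}$ with mean $\boldsymbol{\mu}$; m < p factors $\mathbf{f}$ with covariance matrix $\boldsymbol{\Phi}$; a p × m loading matrix $\boldsymbol{\Lambda}$ (the regression slopes above); and unique errors $\boldsymbol{\varepsilon}$, uncorrelated with the factors and with each other, with diagonal covariance $\boldsymbol{\Psi}$:

```latex
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Sigma}
  = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}
```

With orthogonal factors ($\boldsymbol{\Phi} = \mathbf{I}$), each item's variance splits into its communality (the sum of its squared loadings) plus its unique variance. This is the sense in which factor analysis decomposes the covariation among items. The explicit error term $\boldsymbol{\Psi}$ is also what PCA lacks.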
  • 43. General Concepts
      • Patterns of high inter-item correlations among subsets of items suggest more than one factor, because the items tend to “cluster” together
      • Any number of factors might underlie a set of items, up to the total number of items (which would imply no common factor)
      • Example: a set of six items might assess extroversion and openness
  • 44. General Concepts
  • 45. General Concepts
      • We never know the meaning of the factors directly; we can only use theory to decide what they mean and then test their validity
      • The factors may be related or unrelated, i.e., correlated or orthogonal (uncorrelated)
      • If those who are extroverted tend to be a little more open, then the factors are correlated (contrary to what is suggested by the table)
  • 46. Types of Factor Analysis
      • Two major types of factor analysis
          – Exploratory factor analysis (EFA)
          – Confirmatory factor analysis (CFA)
      • The major difference is that EFA seeks to discover the number of factors and does not specify in advance which items load on which factors
  • 47. Exploratory Factor Analysis
      • The researcher may discover that there is one factor underlying the items, or many factors
      • Items may be eliminated by the researcher if they do not load highly
      • Researchers choose items that load highly on one factor and low on other factors to achieve simple structure
      • Composite scale scores are often created based on the factor analysis, to be used in further research
  • 48. Exploratory Factor Analysis
      • An initial analysis called principal components analysis (PCA) is first conducted to help determine the number of factors that underlie the set of items
      • PCA is the default EFA method in most software and the first stage in other exploratory factor analysis methods to select the number of factors
      • PCA is not considered a “true factor analysis method,” because measurement error is not estimated
  • 49. EFA
      • PCA gives as many eigenvalues (components/factors) as there are items
          – If there are 12 items, there will be 12 eigenvalues
      • Each component is a potential “cluster” of highly inter-correlated items
      • Eigenvalues represent the amount of variance accounted for by each component, but they are not in a standardized metric
      • Larger eigenvalues indicate a more important (and more likely real) component or factor; smaller ones may merely reflect unimportant factors or random variation
      • When PCA is run on the correlation matrix, the eigenvalues sum to the number of items: with 12 items, the 12 eigenvalues sum to 12
      • The proportion of (co)variance accounted for by each factor is therefore its eigenvalue divided by the number of items
  • 50. EFA
      • Several rules can be used for choosing the number of factors based on eigenvalues
      • The usual rule of retaining eigenvalues greater than 1.0 (the Kaiser-Guttman rule) does not seem to work best
      • Most researchers use the scree plot and a subjective scree test, identifying the biggest drop in eigenvalues
      • The scree test seems to work well for identifying the correct number of factors
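The eigenvalue bookkeeping from slides 49 and 50 can be sketched as follows. The proportion is each eigenvalue divided by the total, which equals the number of items for a correlation-matrix PCA. The Kaiser-Guttman count keeps eigenvalues above 1.0. `biggest_drop` is only a crude automated stand-in for the subjective scree test: it keeps the components before the largest drop. The eigenvalues are hypothetical.

```python
def proportions(eigenvalues):
    """Share of total variance per component (total = number of items for a correlation matrix)."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def kaiser_guttman(eigenvalues):
    """Kaiser-Guttman rule: retain components whose eigenvalue exceeds 1.0."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

def biggest_drop(eigenvalues):
    """Crude scree heuristic: retain the components that come before the largest drop."""
    ev = sorted(eigenvalues, reverse=True)
    drops = [ev[i] - ev[i + 1] for i in range(len(ev) - 1)]
    return drops.index(max(drops)) + 1

# Hypothetical eigenvalues from a PCA of a 6-item correlation matrix (they sum to 6)
eigs = [2.5, 2.0, 0.5, 0.45, 0.3, 0.25]
print([round(p, 3) for p in proportions(eigs)])
print("Kaiser-Guttman:", kaiser_guttman(eigs), "largest drop:", biggest_drop(eigs))
```

With these made-up values, both rules keep two factors. On real data they can disagree, and the slide recommends relying on the scree plot.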
  • 52. EFA
      • The next step in an EFA, after deciding on the number of factors, is to choose a method of extraction
      • The extraction method is the statistical algorithm used to estimate loadings
      • There are several to choose from, of which principal factors (principal axis factoring) and maximum likelihood seem to perform best
  • 53. EFA
      • Factor rotation
          – Factor rotation is a mathematical transformation of the loadings that also specifies whether the factors are correlated (oblique) or uncorrelated (orthogonal)
          – There is usually no harm in allowing factors to correlate
          – If the estimated factor correlation is zero, the oblique solution is the same as an orthogonal one
          – Orthogonal rotation makes the strong assumption that the factors are uncorrelated, which is unlikely to hold in most applications
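An orthogonal rotation redistributes loadings across factors without changing how much of each item's variance the factors explain. The sketch below rotates a hypothetical two-factor loading matrix by an arbitrary 30° and checks that the communalities (row sums of squared loadings) are unchanged. Real rotation methods such as varimax choose the angle to optimize simple structure, which is not shown here.

```python
import math

def rotate(loadings, theta):
    """Orthogonally rotate a p x 2 loading matrix by angle theta (radians): L @ T."""
    c, s = math.cos(theta), math.sin(theta)
    # T = [[c, -s], [s, c]] is orthogonal (T @ T.T = I), so row lengths are preserved
    return [[l1 * c + l2 * s, -l1 * s + l2 * c] for l1, l2 in loadings]

def communalities(loadings):
    """Row sums of squared loadings: each item's variance explained by the common factors."""
    return [sum(l * l for l in row) for row in loadings]

# Hypothetical unrotated loadings for 4 items on 2 factors
L = [[0.7, 0.4], [0.6, 0.5], [0.5, -0.5], [0.6, -0.4]]
R = rotate(L, math.radians(30))
print([round(h, 3) for h in communalities(L)])
print([round(h, 3) for h in communalities(R)])
```

Because fit depends only on the communalities and the implied covariances, any rotation is equally good statistically. The choice among rotations is about interpretability.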
  • 54. Confirmatory Factor Analysis
      • Confirmatory factor analysis (CFA) starts with a hypothesis about how many factors there are and which items load on which factors
      • Factor loadings and factor correlations are obtained as in EFA
      • EFA, in contrast, does not specify a measurement model initially and usually seeks to discover the measurement model
      • In EFA, all items load on all factors, but in CFA most researchers start with a model in which each item loads on only one factor (simple structure)
  • 55. CFA
      • A test is computed to investigate how well the hypothesized factor structure fits the data
      • A non-significant result of the fit test indicates good fit to the data
      • Model fit is derived from comparing the correlations (technically, the covariances) among the items to the correlations expected by the model being tested
      • One may hear about many fit indices; here are some:
          – Chi-square (χ²): lower values indicate better fit
          – RMSEA: lower values indicate better fit (< .06)
          – SRMR: lower values indicate better fit (< .08)
          – Comparative Fit Index (CFI): higher values indicate better fit (> .95)
          – Tucker-Lewis Index (TLI): higher values indicate better fit (> .95)
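RMSEA, CFI, and TLI can all be computed from the model chi-square and degrees of freedom, sample size N, and the chi-square of a baseline (independence) model. The sketch uses the common textbook formulas; some software uses N rather than N − 1 in RMSEA, so small differences from a given program's output are possible. All input values are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (form using n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis Index (non-normed fit index); can fall outside [0, 1]."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)

# Hypothetical CFA results: 9 items, N = 300; baseline = independence model
chi2, df, n = 30.0, 24, 300
chi2_base, df_base = 800.0, 36
print("RMSEA", round(rmsea(chi2, df, n), 3))
print("CFI", round(cfi(chi2, df, chi2_base, df_base), 3))
print("TLI", round(tli(chi2, df, chi2_base, df_base), 3))
```

Against the slide's cutoffs, these hypothetical values would count as good fit (RMSEA < .06, CFI and TLI > .95). SRMR needs the residual correlation matrix, so it is not shown.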
  • 56. CFA (contd.)
  • 57. EFA vs CFA
  • 58. EFA vs CFA
  • 59. CFA to SEM
  • 60. Another method to classify?
  • 61. Important characteristics of discriminant analysis
  • 62. Moreover, discriminant analysis
      • Extracts dominant, underlying gradients of variation (canonical functions) among groups of sample entities (e.g., species, sites, observations) from a set of multivariate observations, such that variation among groups is maximized and variation within groups is minimized along the gradient
      • Reduces the dimensionality of a multivariate data set by condensing a large number of original variables into a smaller set of new composite dimensions (canonical functions) with a minimum loss of information
  • 63. Hence, discriminant analysis
      • Summarizes data redundancy by placing similar entities in proximity in canonical space and producing a parsimonious understanding of the data in terms of a few dominant gradients of variation
      • Describes maximum differences among pre-specified groups of sampling entities based on a suite of discriminating characteristics (i.e., canonical analysis of discrimination)
      • Predicts the group membership of future samples, or samples from unknown groups, based on a suite of classification characteristics (i.e., classification)
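The simplest instance of these ideas is Fisher's linear discriminant for two groups. It has a single canonical direction $w = S_W^{-1}(m_B - m_A)$, which maximizes between-group separation relative to within-group scatter. A new sample is assigned to the group whose side of the projected midpoint it falls on. The sketch below assumes two variables, equal weighting of the groups, and hypothetical data; general canonical discriminant analysis with more groups is not shown.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2 x 2 within-group scatter matrix: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in points:
        d = (x - m[0], y - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_lda(group_a, group_b):
    """Two-group Fisher discriminant: w = S_W^{-1} (m_b - m_a), cut at the projected midpoint."""
    ma, mb = mean_vec(group_a), mean_vec(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    def project(p):
        return w[0] * p[0] + w[1] * p[1]

    threshold = (project(ma) + project(mb)) / 2

    def classify(p):
        # Group B's mean always projects above group A's mean along w
        return "B" if project(p) > threshold else "A"

    return w, classify

# Hypothetical 2-variable measurements for two pre-specified groups
group_a = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 1.0)]
group_b = [(4.0, 4.5), (4.5, 3.8), (5.0, 5.0), (4.2, 4.0)]
w, classify = fisher_lda(group_a, group_b)
print(w, classify((1.4, 1.9)), classify((4.6, 4.4)))
```

The direction `w` plays the role of the canonical function: it is the one-dimensional reduction of the data, and the threshold turns it into a classification rule for new samples.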
  • 64. Takeaways
      • Principal Component Analysis and Factor Analysis are data-intensive methods
      • These methods provide insights beyond inferential statistics
      • These data-intensive methods are increasingly important in pattern recognition, computer vision and natural language processing
      • In this course, we have just touched the tip of the iceberg!