Exploratory factor analysis


Notes on slide 1

7126/6667 Survey Research & Design in Psychology
Semester 1, 2009, University of Canberra, ACT, Australia
James T. Neill
Home page: http://ucspace.canberra.edu.au/display/7126
Lecture page: http://en.wikiversity.org/wiki/Exploratory_factor_analysis
Notes page: http://en.wikiversity.org/wiki/Survey_research_methods_and_design_in_psychology
Image source: http://www.flickr.com/photos/jimkster/832582262/
By jimkster: http://www.flickr.com/photos/jimkster
License: CC-by-A 2.0
Description: This lecture is accompanied by a computer-lab based tutorial, the notes for which are available here: http://ucspace.canberra.edu.au/display/7126/Tutorial+-+Psychometrics
Introduces and explains the use of exploratory factor analysis, particularly for the purposes of psychometric instrument development.

Exploratory factor analysis - Presentation Transcript

  1. Lecture 5 Survey Research & Design in Psychology James Neill, 2009 Exploratory Factor Analysis
  2. Overview
    • Introduction
    • Assumptions
    • Steps / Process
    • Summary
  3. Introduction
    • What is factor analysis?
    • Purpose
    • History
  4. Conceptual model of factor analysis FA uses correlations among many items to search for common clusters.
  5. What is factor analysis?
    • A family of multivariate statistical techniques for examining correlations amongst variables.
    • Used to identify clusters of inter-correlated variables (or 'factors').
  6. What is factor analysis?
    • Groups of related variables are called ‘factors’.
    • Empirically tests theoretical data structures.
    • Commonly used in psychometric instrument development.
  7. Purposes There are two main applications of factor analytic techniques:
      • Data reduction : Reduce the number of variables to a smaller number of factors.
      • Theory development : Detect structure in the relationships between variables, that is, to classify variables.
  8. Data reduction
    • Simplifies data by revealing a smaller number of underlying factors
    • Helps to eliminate:
      • redundant variables
      • unclear variables
      • irrelevant variables
  9. EFA = Exploratory Factor Analysis
      • explores & summarises underlying correlational structure for a data set
    CFA = Confirmatory Factor Analysis
      • tests the correlational structure of a data set against a hypothesised structure and rates the “goodness of fit”
    Exploratory vs. confirmatory factor analysis
  10. History of factor analysis
    • Invented by Spearman (1904)
    • Usage was hampered by onerousness of hand calculation
    • Since the advent of computers, usage has thrived, esp. to develop:
      • Theory e.g., determining the structure of personality
      • Practice e.g., development of tens of thousands of psychological screening & measurement tests
  11. Conceptual model - Simple model
    • e.g., 12 test items might actually tap only 3 underlying factors
    • Factors consist of relatively homogeneous variables.
    Factor 1 Factor 2 Factor 3
  12. Conceptual model - Simple model
  13. Eysenck’s three personality factors e.g., 12 items testing three underlying dimensions of personality Extraversion/ introversion Neuroticism Psychoticism talkative shy sociable fun anxious gloomy relaxed tense unconventional nurturing harsh loner
  14. Conceptual model – Complex model (with cross-loadings)
  15. How many factors? One factor? Three factors? Nine factors? (independent items)
  16. Conceptual model – Area plot
  17. Conceptual models for factor analysis (3-d)
  18. Factor analysis process
  19. IQ – does intelligence consist of separate factors, e.g.,
    • Verbal
    • Mathematical
    • Interpersonal, etc.?
    ...or is it one global factor (g)? ...or is it hierarchically structured? Examples of psychological factor structures: Intelligence
  20. Personality – does it consist of 2, 3, or 5, 16, etc. factors? e.g., the “Big 5”?
    • Neuroticism
    • Extraversion
    • Agreeableness
    • Openness
    • Conscientiousness
    Examples of psychological factor structures: Personality
  21. Examples of psychological factor structures: Essential facial features
  22. Six orthogonal factors represent 76.5% of the total variability in facial recognition (in order of importance):
      • upper-lip
      • eyebrow-position
      • nose-width
      • eye-position
      • eye/eyebrow-length
      • face-width.
    Essential facial features
  23. Assumptions
    • GIGO
    • Sample size
    • Levels of measurement
    • Normality
    • Linearity
    • Outliers
    • Factorability
  24. Garbage. In. Garbage. Out.
  25. Assumption testing: Sample size Some guidelines:
    • Min.: N > 5 cases per variable
      • e.g., 20 variables, should have > 100 cases (1:5)
    • Ideal: N > 20 cases per variable
      • e.g., 20 variables, ideally have > 400 cases (1:20)
    • Total N > 200 preferable
  26. Assumption testing: Sample size Comrey and Lee (1992): 50 = very poor, 100 = poor, 200 = fair, 300 = good, 500 = very good, 1000+ = excellent
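The sample-size guidelines above can be checked mechanically. The following is a minimal sketch, assuming the Comrey and Lee (1992) figures are treated as lower bounds for each label; the function name `sample_size_report` and the example of 447 cases on 20 variables are illustrative assumptions, not part of the lecture.

```python
def sample_size_report(n_cases, n_variables):
    """Summarise sample-size adequacy using the rules of thumb on the slides."""
    ratio = n_cases / n_variables
    # Comrey and Lee (1992) labels, read here as lower bounds (an assumption).
    bands = [(1000, "excellent"), (500, "very good"), (300, "good"),
             (200, "fair"), (100, "poor"), (50, "very poor")]
    label = "below 50 (not rated on the slide)"
    for threshold, name in bands:
        if n_cases >= threshold:
            label = name
            break
    return {
        "cases_per_variable": ratio,
        "meets_minimum_1_to_5": ratio > 5,    # Min.: N > 5 cases per variable
        "meets_ideal_1_to_20": ratio > 20,    # Ideal: N > 20 cases per variable
        "total_n_over_200": n_cases > 200,    # Total N > 200 preferable
        "comrey_lee_label": label,
    }


# Hypothetical study: 447 cases measured on 20 variables.
report = sample_size_report(n_cases=447, n_variables=20)
for key, value in report.items():
    print(f"{key}: {value}")
```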
  27. Assumption testing – Sample size
    • 15 classroom behaviours of high-school children were rated by teachers using a 5-point scale.
    • Task: Identify groups of variables (behaviours) that are strongly inter-related & represent underlying factors.
    Example factor analysis: Classroom behaviour
  28. Classroom behaviour items
    • Cannot concentrate ↔ can concentrate
    • Curious & enquiring ↔ little curiosity
    • Perseveres ↔ lacks perseverance
    • Irritable ↔ even-tempered
    • Easily excited ↔ not easily excited
    • Patient ↔ demanding
    • Easily upset ↔ contented
    • Control ↔ no control
    • Relates warmly to others ↔ provocative, disruptive
    • Persistent ↔ easily frustrated
    • Difficult ↔ easy
    • Restless ↔ relaxed
    • Lively ↔ settled
    • Purposeful ↔ aimless
    • Cooperative ↔ disputes
    Classroom behaviour items
  29. Classroom behaviour items
  30. Assumption testing: Level of Measurement
    • All variables must be suitable for correlational analysis, i.e., they should be ratio/metric data or at least Likert data with several interval levels.
  31. Assumption testing: Normality
    • FA is robust to violation of assumptions of normality
    • If the variables are normally distributed then the solution is enhanced
  32. Assumption Testing: Linearity
    • Because FA is based on correlations between variables, it is important to check there are linear relations amongst the variables (i.e., check scatterplots)
  33. Assumption testing: Outliers
    • FA is sensitive to outlying cases
      • Bivariate outliers (e.g., check scatterplots)
      • Multivariate outliers (e.g., Mahalanobis’ distance)
    • Identify outliers, then remove or transform
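Screening for multivariate outliers with Mahalanobis' distance might look like the sketch below. It assumes NumPy is available and uses simulated data with one planted outlier; the function name and the data are hypothetical.

```python
import numpy as np


def mahalanobis_squared(data):
    """Squared Mahalanobis distance of each case from the sample centroid."""
    centred = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    # d_i^2 = (x_i - mean)' S^-1 (x_i - mean), evaluated for every row i.
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))       # hypothetical: 200 cases, 4 variables
data[0] = [6.0, -6.0, 6.0, -6.0]       # plant one obvious multivariate outlier

d2 = mahalanobis_squared(data)
worst = int(np.argmax(d2))
print("most extreme case:", worst, "with squared distance", round(float(d2[worst]), 1))
```

Cases with unusually large distances are candidates for the "remove or transform" step; the cut-off used to call a distance large is a judgement the slides leave open.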
  34. Assumption testing: Factorability Check the factorability of the correlation matrix (i.e., how suitable is the data for factor analysis?)
    • Correlation matrix correlations > .3?
    • Anti-image matrix diagonals > .5?
    • Measures of sampling adequacy (MSAs)?
      • Bartlett’s
      • KMO
  35. Assumption Testing: Factorability
    • Examine correlation matrix
    • Take note whether there are SOME correlations over .30 – if not, reconsider doing an FA – remember G.I.G.O.
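Counting how many correlations exceed .30 is a quick first check. A minimal sketch, assuming NumPy and a made-up correlation matrix for six items that form two clusters of three:

```python
import numpy as np

# Hypothetical correlation matrix: six items forming two clusters of three.
R = np.array([
    [1.00, 0.60, 0.50, 0.10, 0.10, 0.10],
    [0.60, 1.00, 0.55, 0.10, 0.10, 0.10],
    [0.50, 0.55, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.65, 0.55],
    [0.10, 0.10, 0.10, 0.65, 1.00, 0.60],
    [0.10, 0.10, 0.10, 0.55, 0.60, 1.00],
])

# Take each pair of variables once (upper triangle, excluding the diagonal).
pairs = R[np.triu_indices_from(R, k=1)]
n_over = int(np.sum(np.abs(pairs) > 0.30))
print(f"{n_over} of {pairs.size} correlations exceed .30 in absolute size")
```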
  36. Assumption testing: Factorability
    • Medium effort, reasonably accurate
    • Examine the diagonals on the anti-image correlation matrix – variables with correlations less than .5 should be excluded from the analysis – they lack sufficient correlation with other variables
    Assumption testing: Factorability: Anti-image correlation matrix
  37. Anti-Image correlation matrix
    • Quickest method, but least reliable
    • Global diagnostic indicators - correlation matrix is factorable if:
      • Bartlett’s test of sphericity is significant and/or
      • Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy > .5
    Assumption testing: Factorability: Measures of sampling adequacy
  38. Assumption testing: Factorability
  39. Summary: Measures of sampling adequacy
    • Several correlations > .3?
    • Anti-image matrix diagonals > .5?
    • Bartlett’s test significant?
    • KMO > .5 to .6? (depends on whose rule of thumb)
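The factorability checks summarised above can be computed directly from a correlation matrix. The sketch below assumes NumPy, the standard textbook formulas for KMO and Bartlett's test of sphericity, a hypothetical six-item correlation matrix, and a hypothetical sample of 200 cases. The per-variable values it returns are the measures of sampling adequacy that statistical packages typically print on the diagonal of the anti-image correlation matrix.

```python
import numpy as np


def kmo(R):
    """Overall KMO and per-variable measures of sampling adequacy."""
    inv = np.linalg.inv(R)
    # Partial correlations between each pair, controlling for all other variables.
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diagonal, R ** 2, 0.0)
    a2 = np.where(off_diagonal, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + a2.sum())
    return overall, per_variable


def bartlett_sphericity(R, n_cases):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical correlation matrix: six items forming two clusters of three.
R = np.array([
    [1.00, 0.60, 0.50, 0.10, 0.10, 0.10],
    [0.60, 1.00, 0.55, 0.10, 0.10, 0.10],
    [0.50, 0.55, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.65, 0.55],
    [0.10, 0.10, 0.10, 0.65, 1.00, 0.60],
    [0.10, 0.10, 0.10, 0.55, 0.60, 1.00],
])

overall, per_variable = kmo(R)
chi_square, df = bartlett_sphericity(R, n_cases=200)   # n_cases is assumed
print("overall KMO:", round(float(overall), 3))
print("per-variable MSA:", np.round(per_variable, 3))
print("Bartlett chi-square:", round(float(chi_square), 1), "on", df, "df")
```

The significance of Bartlett's statistic would be read from a chi-square distribution with the reported degrees of freedom.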
  40. Steps / Process
    • Test assumptions
    • Select type of analysis
    • Determine no. of factors
    • Identify which items belong in each factor
    • Drop items as necessary and repeat steps 3 to 4
    • Name and define factors
    • Examine correlations amongst factors
    • Analyse internal reliability
  41. Type of EFA: Extraction method: PC vs. PAF There are two main approaches to EFA based on:
    • Analysing all variance: Principal Components (PC)
    • Analysing only shared variance: Principal Axis Factoring (PAF)
  42. Principal components (PC)
    • More common
    • More practical
    • Used to reduce data to a set of factor scores for use in other analyses
    • Analyses all the variance in each variable
  43. Principal axis factoring (PAF)
    • Used to uncover the structure of an underlying set of p original variables
    • More theoretical
    • Analyses only shared variance (i.e. leaves out unique variance)
  44. Total variance of a variable
    • Often there is little difference in the solutions for the two procedures.
    • Often it’s a good idea to check your solution using both techniques
    • If you get a different solution between the two methods
      • Try to work out why and decide on which solution is more appropriate
    PC vs. PAF
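The difference between the two extraction methods comes down to what sits on the diagonal of the matrix being analysed. The sketch below assumes NumPy and a hypothetical correlation matrix. It shows a single, non-iterated principal axis step that uses squared multiple correlations as the initial communality estimates; statistical packages usually iterate this step, so treat it as an illustration rather than a full PAF routine.

```python
import numpy as np


def extract_loadings(matrix, n_factors):
    """Loadings from the largest eigenvalues of a (possibly reduced) correlation matrix."""
    values, vectors = np.linalg.eigh(matrix)          # eigenvalues in ascending order
    order = np.argsort(values)[::-1][:n_factors]      # keep the largest n_factors
    kept = np.clip(values[order], 0.0, None)          # guard against tiny negatives
    return vectors[:, order] * np.sqrt(kept)


# Hypothetical correlation matrix: six items forming two clusters of three.
R = np.array([
    [1.00, 0.60, 0.50, 0.10, 0.10, 0.10],
    [0.60, 1.00, 0.55, 0.10, 0.10, 0.10],
    [0.50, 0.55, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.65, 0.55],
    [0.10, 0.10, 0.10, 0.65, 1.00, 0.60],
    [0.10, 0.10, 0.10, 0.55, 0.60, 1.00],
])

# PC: analyse R as it is, with 1s on the diagonal (all variance).
pc_loadings = extract_loadings(R, n_factors=2)

# PAF (one step): replace the diagonal with squared multiple correlations,
# so that only variance shared with the other variables is analysed.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
reduced = R.copy()
np.fill_diagonal(reduced, smc)
paf_loadings = extract_loadings(reduced, n_factors=2)

print("PC communalities: ", np.round((pc_loadings ** 2).sum(axis=1), 2))
print("PAF communalities:", np.round((paf_loadings ** 2).sum(axis=1), 2))
```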
  45. Communalities
    • The proportion of variance in each variable explained by the factors
    • Communality for a variable = sum of the squared loadings for the variable on each of the factors
    • Communalities range between 0 and 1
    • High communalities (> .5): Extracted factors explain most of the variance in the variables being analysed
    • Low communalities (< .5): A variable has considerable variance unexplained by the extracted factors
    • May then need to extract MORE factors to explain the variance
    Communalities
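As a worked example of the communality formula, the sketch below sums the squared loadings for each variable. The item names and loadings are invented for illustration.

```python
# Hypothetical loadings of three items on three factors.
loadings = {
    "item_a": [0.75, 0.10, 0.05],
    "item_b": [0.30, 0.25, 0.20],
    "item_c": [0.15, 0.70, 0.10],
}

# Communality = sum of the squared loadings for the variable on each factor.
communalities = {item: sum(value ** 2 for value in row)
                 for item, row in loadings.items()}

for item, h2 in communalities.items():
    verdict = "high" if h2 > 0.5 else "low"
    print(f"{item}: communality = {h2:.4f} ({verdict})")
```

Here item_b would be flagged as low: the three factors leave most of its variance unexplained.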
  46. Communalities - 2
  47. Eigen values
    • Each factor has an eigen value
    • Sum of squared correlations (loadings) between the factor and each variable
    • Indicates the overall strength of relationship between a factor and the variables
    • Successive EVs have lower values
    • Eigen values over 1 are ‘stable’
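The link between eigen values and loadings can be verified numerically. A minimal sketch, assuming NumPy, a hypothetical correlation matrix, and principal components extraction:

```python
import numpy as np

# Hypothetical correlation matrix: six items forming two clusters of three.
R = np.array([
    [1.00, 0.60, 0.50, 0.10, 0.10, 0.10],
    [0.60, 1.00, 0.55, 0.10, 0.10, 0.10],
    [0.50, 0.55, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.65, 0.55],
    [0.10, 0.10, 0.10, 0.65, 1.00, 0.60],
    [0.10, 0.10, 0.10, 0.55, 0.60, 1.00],
])

values, vectors = np.linalg.eigh(R)          # ascending order
order = np.argsort(values)[::-1]             # re-order from largest to smallest
values, vectors = values[order], vectors[:, order]

# Unrotated loadings: each column holds the correlations between
# one component and the six variables.
loadings = vectors * np.sqrt(values)
sum_of_squares = (loadings ** 2).sum(axis=0)

print("eigen values:           ", np.round(values, 3))
print("sum of squared loadings:", np.round(sum_of_squares, 3))
```

The two printed rows agree, and the eigen values add up to the number of variables because each standardised variable contributes one unit of variance.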
  48. Explained variance
    • A good factor solution is one that explains the most variance with the fewest factors
    • Realistically, happy with 50-75% of the variance explained
  49. Explained variance
  50. How many factors? A subjective process ... Seek to explain maximum variance using fewest factors, considering:
    • Theory – what is predicted/expected?
    • Eigen Values > 1? (Kaiser’s criterion)
    • Scree Plot – where does it drop off?
    • Interpretability of last factor?
    • Try several different solutions?
    • Factors must be able to be meaningfully interpreted & make theoretical sense?
  51. How many factors?
    • Aim for 50-75% of variance explained with 1/4 to 1/3 as many factors as variables/items.
    • Stop extracting factors when they no longer represent useful/meaningful clusters of variables
    • Keep checking/clarifying the meaning of each factor and its items.
  52. Scree plot
    • A line graph of Eigen Values
    • Depicts amount of variance explained by each factor.
    • Look for where additional factors fail to add appreciably to the cumulative explained variance.
    • 1st factor explains the most variance
    • Last factor explains the least amount of variance
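The numbers behind a scree plot can be tabulated before drawing anything. The sketch below assumes NumPy and a hypothetical correlation matrix; it prints a rough text version of the plot together with the cumulative variance explained and the count of eigen values over 1 (Kaiser's criterion).

```python
import numpy as np

# Hypothetical correlation matrix: six items forming two clusters of three.
R = np.array([
    [1.00, 0.60, 0.50, 0.10, 0.10, 0.10],
    [0.60, 1.00, 0.55, 0.10, 0.10, 0.10],
    [0.50, 0.55, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.65, 0.55],
    [0.10, 0.10, 0.10, 0.65, 1.00, 0.60],
    [0.10, 0.10, 0.10, 0.55, 0.60, 1.00],
])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)
n_kaiser = int(np.sum(eigenvalues > 1.0))

for number, (ev, cum) in enumerate(zip(eigenvalues, cumulative), start=1):
    bar = "#" * int(round(ev * 10))                  # crude text scree plot
    print(f"factor {number}: EV = {ev:5.2f}  cumulative = {cum:6.1%}  {bar}")
print("factors with EV > 1:", n_kaiser)
```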
  53. Scree plot
  54. Scree plot
  55. Scree plot
    • Factor loadings (FLs) indicate relative importance of each item to each factor.
      • In the initial solution, each factor tries “selfishly” to grab maximum unexplained variance.
      • All variables will tend to load strongly on the 1st factor
    Initial solution: Unrotated factor structure
  56. Initial solution - Unrotated factor structure Factors are made up of linear combinations of the variables (max. poss. sum of squared rs for each variable)
  57. 1st factor extracted:
    • Best possible line of best fit through the original variables
    • Seeks to explain maximum overall variance
    • A single, best summary of the variance in the whole set of items
    Initial solution - Unrotated factor structure
    • Each subsequent factor tries to maximise the amount of unexplained variance which it can explain.
      • Second factor is orthogonal to first factor - seeks to maximize its own eigen value (i.e., tries to gobble up as much of the remaining unexplained variance as possible)
    Initial solution - Unrotated factor structure
  58. Vectors (Lines of best fit)
  59. Initial solution: Unrotated factor structure
    • Seldom see a simple unrotated factor structure
    • Many variables will load on 2 or more factors
    • Some variables may not load highly on any factors
    • Until the FLs are rotated, they are difficult to interpret.
    • Rotation of the FL matrix helps to find a more interpretable factor structure.
  60. Initial solution – Unrotated factor structure 4
  61. Two basic types of factor rotation Orthogonal (Varimax) Oblique (Oblimin)
  62. Two basic types of factor rotation
    • Orthogonal (e.g., Varimax) minimises factor covariation and produces factors which are uncorrelated
    • Oblique (e.g., Oblimin) allows factors to covary, i.e., allows correlations between factors
  63. Orthogonal rotation
  64. Why rotate a factor loading matrix?
    • After rotation, the vectors (lines of best fit) are rearranged to optimally go through clusters of shared variance
    • Then the FLs and the factor they represent can be more readily interpreted
  65. Why rotate a factor loading matrix?
    • A rotated factor structure is simpler & more easily interpretable
      • each variable loads strongly on only one factor
      • each factor shows at least 3 strong loadings
      • all loadings are either strong or weak, no intermediate loadings
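To make the effect of rotation concrete, the sketch below applies a varimax rotation to unrotated principal component loadings. It assumes NumPy and a hypothetical correlation matrix, and uses a commonly circulated SVD-based varimax routine without Kaiser normalisation, so its output may differ slightly from what a statistics package reports.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and the rotation matrix."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):   # stop once the criterion stops improving
            break
        criterion = s.sum()
    return loadings @ rotation, rotation


# Hypothetical correlation matrix: six items forming two clusters of three.
R = np.array([
    [1.00, 0.60, 0.50, 0.10, 0.10, 0.10],
    [0.60, 1.00, 0.55, 0.10, 0.10, 0.10],
    [0.50, 0.55, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.65, 0.55],
    [0.10, 0.10, 0.10, 0.65, 1.00, 0.60],
    [0.10, 0.10, 0.10, 0.55, 0.60, 1.00],
])

# Unrotated two-component solution.
values, vectors = np.linalg.eigh(R)
top = np.argsort(values)[::-1][:2]
unrotated = vectors[:, top] * np.sqrt(values[top])

rotated, rotation = varimax(unrotated)
print("unrotated loadings:\n", np.round(unrotated, 2))
print("rotated loadings:\n", np.round(rotated, 2))
```

Before rotation every item loads on the first component; after rotation each item loads mainly on one of the two factors, while each item's communality is unchanged.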
  66. Orthogonal vs. oblique rotations
    • Consider purpose of factor analysis
    • If in doubt, try both
    • Consider interpretability
    • Look at correlations between factors in oblique solution
      • if > .3 then go with oblique rotation (roughly 10% or more shared variance between factors, since .3² = .09)
  67. Rotated factor matrix Task Orientation Sociability Settledness
  68. 3-d plot
  69. 2-d plots
  70. 2-d plots
  71. Interpretability
    • It is dangerous to be driven by factor loadings only – think about it - be guided by theory and common sense in selecting factor structure.
    • You must be able to understand and interpret a factor if you’re going to extract it.
  72. Interpretability
    • However, watch out for ‘seeing what you want to see’ when factor analysis evidence might suggest a different solution.
    • There may be more than one good solution! e.g.,
      • 2 factor model of personality
      • 5 factor model of personality
      • 16 factor model of personality
  73. Factor loadings & item selection A factor structure is most interpretable when:
    1. Each variable loads strongly (> +.40) on only one factor
    2. Each factor shows 3 or more strong loadings; more loadings = greater reliability
    3. Most loadings are either high or low, with few intermediate values.
    4. These elements give a ‘simple’ factor structure.
    • Bare min. = 2
    • Recommended min. = 3
    • Max. = unlimited
    • More items:
    -> ↑ reliability -> ↑ 'roundedness' -> Law of diminishing returns
    • Typically = 4 to 10 is reasonable
    How many items per factor?
  74. How do I eliminate items? A subjective process, but consider:
    • Size of main loading (min=.4)
    • Size of cross loadings (max=.3?)
    • Meaning of item (face validity)
    • Contribution it makes to the factor
    • Eliminate 1 variable at a time, then re-run, before deciding which/if any items to eliminate next
    • Number of items already in the factor
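The two numeric criteria in this list (main loading of at least .4, cross-loadings of at most .3) can be turned into a simple screening aid. The sketch below uses invented item names and loadings; it only flags candidates, since the slides recommend eliminating one item at a time, re-running the analysis, and weighing item meaning as well.

```python
def flag_items(loadings, main_min=0.40, cross_max=0.30):
    """Return, for each item, the reasons it might be considered for removal."""
    flags = {}
    for item, row in loadings.items():
        sizes = sorted((abs(value) for value in row), reverse=True)
        main = sizes[0]
        cross = sizes[1] if len(sizes) > 1 else 0.0   # largest secondary loading
        reasons = []
        if main < main_min:
            reasons.append("weak main loading")
        if cross > cross_max:
            reasons.append("high cross-loading")
        flags[item] = reasons
    return flags


# Hypothetical rotated loadings of four items on three factors.
loadings = {
    "item_1": [0.72, 0.10, 0.05],
    "item_2": [0.35, 0.20, 0.15],
    "item_3": [0.55, 0.45, 0.02],
    "item_4": [0.08, -0.66, 0.21],
}

flags = flag_items(loadings)
for item, reasons in flags.items():
    print(item, "->", ", ".join(reasons) if reasons else "keep for now")
```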
  75. Factor loadings & item selection Choosing a cut-off for acceptable loadings:
    • Look for gap in loadings (e.g., .8, .7, .6 , .3, .2)
    • Choose cut-off because factors can be interpreted above but not below cut-off
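Looking for a gap in the loadings can be sketched as finding the largest drop between neighbouring values once they are sorted. The function name is hypothetical; the example values are the ones given on the slide.

```python
def largest_gap(loadings):
    """Return the loadings just above and just below the largest gap."""
    ordered = sorted((abs(value) for value in loadings), reverse=True)
    gaps = [(ordered[i] - ordered[i + 1], ordered[i], ordered[i + 1])
            for i in range(len(ordered) - 1)]
    _, above, below = max(gaps)               # largest drop between neighbours
    return above, below


above, below = largest_gap([0.8, 0.7, 0.6, 0.3, 0.2])
print(f"largest gap lies between {above} and {below}")
```

A cut-off anywhere between the two returned values would separate the strong loadings from the weak ones; whether the factors are interpretable above that cut-off still has to be judged by the analyst.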
  76. Factor loadings & item selection Comrey & Lee (1992):
    • loadings > .70 - excellent
    • > .63 - very good
    • > .55 - good
    • > .45 - fair
    • > .32 - poor
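These labels can be applied as a small lookup. The sketch below treats each cut-off as a strict "greater than", as written on the slide, and rates the four Factor 1 loadings reported for the condom use example later in the lecture; the function name is an assumption.

```python
def rate_loading(loading):
    """Label a loading using the Comrey & Lee (1992) cut-offs quoted above."""
    size = abs(loading)
    bands = [(0.70, "excellent"), (0.63, "very good"), (0.55, "good"),
             (0.45, "fair"), (0.32, "poor")]
    for cutoff, label in bands:
        if size > cutoff:
            return label
    return "below .32 (not rated on the slide)"


ratings = {value: rate_loading(value) for value in (0.56, 0.61, 0.65, 0.75)}
for value, label in ratings.items():
    print(f"{value:.2f}: {label}")
```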
  77. Factor analysis in practice
    • To find a good solution, consider each combination of:
      • PC-varimax
      • PC-oblimin
      • PAF-varimax
      • PAF-oblimin
    • Apply the above methods to a range of possible/likely factors, e.g., for 2, 3, 4, 5, 6, and 7 factors
    • Eliminate poor items one at a time, retesting the possible solutions
    • Conduct reliability analysis
    • Check factor structure across sub-groups (e.g., gender) if there is sufficient data
    • You will probably come up with a different solution from someone else!
    Factor analysis in practice
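The slides do not name a statistic for the reliability analysis step. One widely used index of internal consistency is Cronbach's alpha, shown here as an assumption about what that step might involve. The sketch uses NumPy and simulated scores for four items that share one underlying factor.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a cases-by-items array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1 - item_variances / total_variance)


# Simulated data: 300 cases, four items that each reflect one common factor plus noise.
rng = np.random.default_rng(1)
factor = rng.normal(size=300)
items = np.column_stack([factor + rng.normal(size=300) for _ in range(4)])

alpha = cronbach_alpha(items)
print("Cronbach's alpha:", round(float(alpha), 2))
```

In practice alpha would be computed separately for the items retained on each factor.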
  78. Example: Condom use
    • The Condom Use Self-Efficacy Scale (CUSES) was administered to 447 multicultural college students.
    • PC FA with a varimax rotation.
    • Three distinct factors were extracted:
      • 'Appropriation'
      • 'Sexually Transmitted Diseases'
      • 'Partners' Disapproval'
  79. Factor loadings & item selection Factor 1: Appropriation
    FL   Item
    .56  I feel confident I could gracefully remove and dispose of a condom after sexual intercourse
    .61  I feel confident I could remember to carry a condom with me should I need one
    .65  I feel confident I could purchase condoms without feeling embarrassed
    .75  I feel confident in my ability to put a condom on myself or my partner
  80. Factor loadings & item selection Factor 2: STDs
    FL   Item
    .80  I would not feel confident suggesting using condoms with a new partner because I would be afraid he or she would think I thought they had a sexually transmitted disease
    .86  I would not feel confident suggesting using condoms with a new partner because I would be afraid he or she would think I have a sexually transmitted disease
    .72  I would not feel confident suggesting using condoms with a new partner because I would be afraid he or she would think I've had a past homosexual experience
  81. Factor loadings & item selection Factor 3: Partner's reaction
    FL   Item
    .58  If my partner and I were to try to use a condom and did not succeed, I would feel embarrassed to try to use one again (e.g. not being able to unroll condom, putting it on backwards or awkwardness)
    .65  If I were unsure of my partner's feelings about using condoms I would not suggest using one
    .73  If I were to suggest using a condom to a partner, I would feel afraid that he or she would reject me
  82. Factor correlations
  83. Summary
    • Introduction
    • Assumptions
    • Steps/Process
  84. Introduction: Summary
    • Factor analysis is a family of multivariate correlational data analysis methods for summarising clusters of covariance.
    • FA analyses and summarises correlations amongst items.
    • These common clusters (the factors) can be used as summary indicators of the underlying constructs.
  85. Assumptions: Summary
    • Sample size
      • 5+ cases per variable (ideal is 20+)
      • N > 200
    • Bivariate & multivariate outliers
    • Factorability of correlation matrix (Measures of Sampling Adequacy)
    • Normality enhances the solution
  86. Summary: Steps / Process
    • Test assumptions
    • Select type of analysis
    • Determine no. of factors (Eigen Values, Scree plot, % variance explained)
    • Select items (drop items one by one and repeat analysis)
    • Name and define factors
    • Examine correlations amongst factors
    • Analyse internal reliability
  87. Summary: Types of FA: PC vs. PAF
    • PC: Data reduction e.g., computing composite scores for another analysis (uses all variance)
    • PAF: Theoretical data exploration (uses shared variance)
    • Try both ways – are solutions different?
  88. Summary: Rotation
    • Orthogonal – perpendicular vectors
    • Oblique – angled vectors
    • Try both ways – are solutions different?
  89. No. of factors to extract?
    • Inspect EVs - look for > 1 or sudden drop (Inspect scree plot)
    • Use % of variance explained (aim for 50 to 75%)
    • Interpretability / theory
    Summary: Factor extraction
    • Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272-299.
    • Tabachnick, B. G., & Fidell, L. S. (2001). Principal components and factor analysis. In Using multivariate statistics (4th ed., pp. 582-633). Needham Heights, MA: Allyn & Bacon.
    References

CC Attribution-ShareAlike License
