Data Analysis

 Anindita C. Rao
Data Analysis
Data collection -> Data analysis -> Interpretation of results

Stages of data analysis:
1. Getting data ready for analysis
   • Editing
   • Handling blank responses
   • Coding
   • Categorizing
   • Creating data file
   • Programming
2. Feel for data
   • Mean
   • S.D.
   • Correlation
   • Frequency distribution
3. Goodness of data
   • Reliability
   • Validity
4. Hypothesis testing
   • Appropriate statistical tools
Data Preparation
•   Getting data ready for analysis (SPSS)
•   Editing
•   Handling blank responses
•   Coding
•   Categorizing
•   Creating data file
•   Programming
Coding & Categorization
• Coding usually assigns a number to each response
• Coding of questionnaires
• Example: male = 1, female = 2
• Negatively worded questions, for example:
  "Some compromises with ethics help you in practical life"
  Strongly disagree 1-2-3-4-5-6-7 Strongly agree

• In Excel, we designate specific columns to
  specific questions and responses
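
Negatively worded items are usually reverse-scored so that a high number means the same thing on every item. A minimal sketch in Python, assuming the 7-point scale above; the function name `reverse_code` and the sample responses are illustrative and not from the slides:

```python
def reverse_code(score, scale_min=1, scale_max=7):
    """Reverse-score one response on a scale_min..scale_max rating scale."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


# Hypothetical responses to a negatively worded item (1 = strongly disagree).
responses = [1, 4, 7, 2]
reversed_responses = [reverse_code(r) for r in responses]
print(reversed_responses)  # [7, 4, 1, 6]
```

The midpoint (4) is unchanged, while the two ends of the scale swap places.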
Descriptive statistics
• Frequency (tabulation)
• Measures of central tendency (mean,
  median & mode)
• Measures of dispersion (range, variance,
  standard deviation & interquartile range)
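
The measures listed above can be computed with Python's standard `statistics` module. This is only a sketch: the `scores` list is made-up data, and the variance and standard deviation shown are the sample versions (n - 1 denominator), which is an assumption about what the analysis needs.

```python
import statistics

# Hypothetical sample of scores; replace with a real column of data.
scores = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)      # sample standard deviation

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(mean, median, mode, data_range, variance, round(std_dev, 3), iqr)
```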
Inferential Statistics
• Correlation (direction, strength &
  significance), ranging from -1 to +1
• Parametric: Pearson’s correlation
• Nonparametric: Spearman’s correlation
  & Kendall’s rank correlation (ordinal data)
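
To make the parametric and rank-based coefficients concrete, here is a small pure-Python sketch. The helper names (`pearson`, `ranks`, `spearman`) and the sample data are illustrative assumptions; the sketch covers direction and strength only and does not compute a significance level.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: a monotonic but non-linear relationship.
hours = [1, 2, 3, 4, 5]
marks = [1, 4, 9, 16, 25]
print(round(pearson(hours, marks), 3), round(spearman(hours, marks), 3))
```

Because the relationship is perfectly monotonic but curved, the rank-based coefficient reaches +1 while Pearson's stays slightly below it.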
Data Analysis
•   Simple tabulation & Cross tabulation
•   ANOVA
•   Correlation & Regression
•   Discriminant Analysis
•   Factor Analysis
•   Conjoint Analysis
•   Multidimensional Scaling
•   Cluster Analysis
Tabulation

Variables:    Male    Female    Total
Frequency:    25      35        60

Descriptive Statistics: Family Background

                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Nuclear        72        60.0        60.0             60.0
        Joint          48        40.0        40.0            100.0
        Total         120       100.0       100.0
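
A frequency table like the family background one can be rebuilt from raw responses. The sketch below assumes a plain list of category labels; `frequency_table` is a hypothetical helper name, and the list is reconstructed from the counts shown (72 nuclear, 48 joint) rather than taken from the original data file.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cumulative = [], 0
    # most_common() lists categories from most to least frequent.
    for category, freq in counts.most_common():
        cumulative += freq
        rows.append((category, freq, 100 * freq / total, 100 * cumulative / total))
    return rows


# Rebuild the family background column from the counts in the table.
family = ["Nuclear"] * 72 + ["Joint"] * 48
for category, freq, pct, cum_pct in frequency_table(family):
    print(f"{category:<8} {freq:>4} {pct:>6.1f} {cum_pct:>6.1f}")
```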
Cross Tabulation
Two variable interaction: Variable 1 (nominal) by Variable 2 (nominal)

Qualification * Centrality Crosstabulation (Count)

                            Centrality
Qualification      -2     0     1     2     3     4   Total
Doctorate           4     6     0    18     6     0      34
Post Graduate       0     4     3    20    14     8      49
Graduate            0     0     5    28     4     0      37
Total               4    10     8    66    24     8     120
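
A cross tabulation is a count of how often each pair of categories occurs together. A minimal sketch, assuming respondent-level data held in two parallel lists; the `crosstab` helper and the five sample respondents are hypothetical and are not the data behind the table above.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    cell_counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    table = [[cell_counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Hypothetical respondent-level data: one qualification and one score each.
qualification = ["Graduate", "Doctorate", "Graduate", "Post Graduate", "Doctorate"]
centrality = [2, 0, 2, 3, 2]

row_labels, col_labels, table = crosstab(qualification, centrality)
for label, counts in zip(row_labels, table):
    print(f"{label:<14}", counts)
```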
Chi square
• Determines whether there is a systematic association
  between two variables
• Null hypothesis: no association
• Compares expected cell frequencies with
  actual (observed) cell frequencies
• The greater the discrepancies, the greater the
  chi square statistic
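
The comparison of expected and observed cell frequencies can be sketched in a few lines of Python. The function name `chi_square` is an illustrative choice; the observed counts are those of the Qualification * Centrality crosstabulation. The sketch returns the statistic and degrees of freedom only; a p value would still have to come from a chi-square table or a statistics package.

```python
def chi_square(observed):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count in each cell under the null hypothesis of no association.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Counts from the Qualification * Centrality crosstabulation.
observed = [
    [4, 6, 0, 18, 6, 0],   # Doctorate
    [0, 4, 3, 20, 14, 8],  # Post Graduate
    [0, 0, 5, 28, 4, 0],   # Graduate
]
statistic, dof, expected = chi_square(observed)
print(f"chi-square = {statistic:.2f}, df = {dof}")
```

Several expected counts in this table fall below 5 (for example, 34 * 4 / 120 is about 1.1), so the resulting statistic should be read with caution.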
Chi square test

• Two nominal variables
• Cross tabulation
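• Nonparametric
• SPSS steps
1. Open Analyze from the SPSS menu bar
2. For two variables: Analyze > Descriptive Statistics > Crosstabs, then Statistics > Chi-square
   (Analyze > Nonparametric Tests > Chi-square runs the one-sample goodness-of-fit test)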
Exercise
•    In this case study, we examine the association between the educational
     background (independent variable) of PGDBM students and their performance in
     terms of the grade secured (dependent variable). We want to test whether the
     association is significant at the 90% and 95% confidence levels, i.e. at
     significance levels of 0.10 and 0.05 (refer to the file in SPSS).

Educational Background    Code
B.Com                     1
B.E.                      2
B.Sc.                     3
BBA                       4
B.A.                      5

Grade Obtained    Grade Code
A                 1
B                 2
C                 3
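
The coding scheme and the decision rule for the exercise can be written down as follows. This is a sketch under assumptions: the student records are invented (the real data is in the SPSS file), and `encode` and `is_significant` are hypothetical helper names.

```python
# Coding scheme from the exercise.
BACKGROUND_CODES = {"B.Com": 1, "B.E.": 2, "B.Sc.": 3, "BBA": 4, "B.A.": 5}
GRADE_CODES = {"A": 1, "B": 2, "C": 3}


def encode(records):
    """Turn (background, grade) labels into (background code, grade code)."""
    return [(BACKGROUND_CODES[b], GRADE_CODES[g]) for b, g in records]


def is_significant(p_value, confidence_level):
    """Reject the null hypothesis when p is below alpha = 1 - confidence."""
    alpha = 1 - confidence_level
    return p_value < alpha


# Hypothetical student records, for illustration only.
students = [("B.Com", "A"), ("B.E.", "B"), ("BBA", "C")]
coded = encode(students)
print(coded)  # [(1, 1), (2, 2), (4, 3)]

# A hypothetical p value of 0.07 is significant at 90% but not at 95%.
print(is_significant(0.07, 0.90), is_significant(0.07, 0.95))  # True False
```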
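t test
• Tests for a significant mean difference between two
  groups
• Groups defined by a nominal variable are compared on an
  interval- or ratio-scaled variable
• (e.g. smokers & non-smokers on extent of well-being)
• Used when sample size is less than 30
• df = N - 1 (one sample); df = n1 + n2 - 2 (two independent groups)
• Nonparametric alternative: Mann-Whitney U test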
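
For two independent groups, the t statistic divides the difference in means by its standard error. The sketch below assumes equal variances in the two groups (the pooled-variance form); the well-being scores are made up, and `independent_t` is a hypothetical name.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    dof = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, dof


# Hypothetical well-being scores (higher = better).
smokers = [4, 5, 6, 5, 4]
non_smokers = [6, 7, 8, 7, 6]
t_stat, dof = independent_t(smokers, non_smokers)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The computed t is then compared with the critical value for the chosen significance level and degrees of freedom.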
ANOVA
• Tests for a significant mean difference among
  multiple groups

• Multiple regression
  Variance in the dependent variable explained by several
  independent variables simultaneously
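
The ANOVA comparison of group means can be sketched as the ratio of between-group to within-group variance (the F statistic). This is a one-way sketch only; the groups are invented, and `one_way_anova` is a hypothetical helper name.

```python
def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova(groups)
print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within})")
```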
Hypothesis Testing
•   Errors
•   Normal population
•   Degrees of freedom
•   One tailed or two tailed
•   Single Population
•   p value
Univariate Techniques

Metric data (interval or ratio)
  One sample
    • t test
    • Z test
  Two or more samples
    Independent
      • Two-group t test
      • Z test
      • One-way ANOVA
    Related
      • Paired t test

Nonmetric data (nominal or ordinal scale)
  One sample
    • Frequency
    • Chi square
    • K-S
    • Runs
    • Binomial
  Two or more samples
    Independent
      • Chi square
      • Mann-Whitney
      • Median
      • K-S
      • K-W ANOVA
    Related
      • Sign
      • Wilcoxon
      • McNemar
      • Chi square
Multivariate Techniques

Dependence techniques
  One dependent variable
    • Cross tabs
    • ANOVA & covariance
    • Multiple regression
    • Discriminant analysis
    • Conjoint analysis
  More than one dependent variable
    • Canonical correlation
    • Multiple discriminant analysis

Interdependence techniques
  Variable interdependence
    • Factor analysis
  Interobject similarity
    • Cluster analysis
    • MDS
Criterion (dependent)

                         One                                                       Two or more
                         Nominal      Ordinal                  Interval            Nominal   Ordinal   Interval
One          Nominal     Chi square   • Sign test              Analysis of                             Multiple
                                      • Mann-Whitney           variance                                discriminant
                                      • Kruskal-Wallis ANOVA                                           analysis
             Ordinal                  • Spearman rank
                                        correlation
                                      • Kendall’s rank
                                        correlation
Two or more  Interval    ANOVA                                 Regression          ANOVA               Multiple
                                                               analysis                                regression
                                                                                                       analysis
             Nominal                  Friedman two-way         ANOVA factorial                         ANOVA
                                      analysis                 design
             Ordinal
