SOC2002 Lecture 11

Published in: Technology, Business
  • SOC2002 Lecture 11

    1. 1. SOC2002: Sociological Analysis and Research Methods LECTURE 11: Data Analysis (1) Quantitative data analysis and SPSS Lecturer: Bonnie Green [email_address]
    2. 2. The research process: what we’ve covered so far… [Diagram of the research process, steps 1 to 6. Labels: Topic/Object; Research question; Research design; Data collection; Data analysis; Interpretation; Reporting; Literature review, and/or field reconnaissance; Choosing indicators & Project Planning; Ethics; Quality]
    3. 3. The research process: today… [Same diagram of the research process, steps 1 to 6, with the marker “LECTURES 11, 12 & 13”. Labels: Topic/Object; Research question; Research design; Data collection; Data analysis; Interpretation; Reporting; Literature review, and/or field reconnaissance; Choosing indicators & Project Planning; Ethics; Quality]
    4. 4. Data Analysis (1): Overview <ul><li>There are numerous techniques for quantitative data analysis </li></ul><ul><ul><li>Which one is appropriate depends on what information you want to generate </li></ul></ul><ul><li>Today: </li></ul><ul><ul><li>Descriptive analysis </li></ul></ul><ul><ul><li>Exploratory analysis </li></ul></ul><ul><ul><li>Statistical analysis </li></ul></ul><ul><li>Suitable for closed questions on a self- or interviewer-completed questionnaire </li></ul>
    5. 5. STEP 1: Data entry
    6. 6. STEP 1: Data entry <ul><li>Types of data: </li></ul><ul><ul><li>Categorical </li></ul></ul><ul><ul><ul><li>Nominal </li></ul></ul></ul><ul><ul><ul><li>Ordinal </li></ul></ul></ul><ul><ul><li>Interval/ratio (scale) </li></ul></ul><ul><li>Missing data: </li></ul><ul><ul><li>Answers to closed questions are often values in themselves (e.g. age), and a value of 0 can be a genuine answer </li></ul></ul><ul><ul><li>Missing cases are conventionally coded &quot;99&quot;, but the code must be a value that is not found in the data for that variable </li></ul></ul>
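The missing-data rule above can be sketched in a few lines of Python. This is only an illustration, not SPSS itself: the names `MISSING_CODE`, `recode_missing` and `code_is_safe`, and the example answers, are hypothetical.

```python
MISSING_CODE = 99  # hypothetical missing-data code for this variable


def recode_missing(values, missing_code=MISSING_CODE):
    """Replace the missing-data code with None so it is left out of statistics."""
    return [None if v == missing_code else v for v in values]


def code_is_safe(possible_values, missing_code=MISSING_CODE):
    """A missing code is only safe if no genuine answer can take that value."""
    return missing_code not in possible_values


# Hypothetical answers to "How many children do you have?"; 0 is a genuine answer.
children = [0, 2, 1, 99, 3, 0, 99]
cleaned = recode_missing(children)
valid = [v for v in cleaned if v is not None]

print(cleaned)     # 99 becomes None, 0 is kept
print(len(valid))  # number of valid cases

# 99 is a plausible age, so it would not be a safe missing code for an age variable.
print(code_is_safe(range(0, 121)))
```

For a variable such as age, a code outside the plausible range (for example 999) would be a safer choice.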
    7. 7. STEP 1: Data entry <ul><li>Nominal Measures: </li></ul><ul><li>Data with a limited number of distinct categories or values </li></ul><ul><li>There is no inherent order to the categories </li></ul>
    8. 8. STEP 1: Data entry <ul><li>Ordinal Measures: </li></ul><ul><li>Data with a limited number of distinct categories or values </li></ul><ul><li>There is a meaningful order of categories, but no measurable distance between values </li></ul>
    9. 9. STEP 1: Data entry <ul><li>Scale Measures: </li></ul><ul><li>Data measured on an interval or ratio scale </li></ul><ul><li>Data values indicate both the order of values and the distance between values </li></ul>
    10. 10. STEP 2: Data analysis <ul><li>Descriptive (summary) statistics </li></ul><ul><ul><li>Frequency tables and charts for individual variables </li></ul></ul><ul><ul><li>Summary statistics for individual variables </li></ul></ul><ul><li>Exploratory statistics </li></ul><ul><ul><li>Cross-tabulations for two or more variables </li></ul></ul><ul><ul><li>Correlations </li></ul></ul><ul><li>Statistical tests </li></ul><ul><ul><li>Chi-squared </li></ul></ul><ul><ul><li>T-test </li></ul></ul><ul><ul><li>Regression analysis </li></ul></ul>
    11. 11. Frequency tables and charts <ul><li>For categorical data descriptive or summary statistics include tables or graphs/charts of frequency </li></ul><ul><li>Frequency may appear as either the number or percentage of cases in each category </li></ul><ul><li>In SPSS use the &quot;Frequencies&quot; submenu </li></ul>
    12. 12. Frequency tables <ul><li>Frequency tables for two categorical variables </li></ul><ul><li>What information is presented here? </li></ul><ul><ul><li>n of valid cases=6400 </li></ul></ul><ul><ul><li>1307 people out of 6400 (20.4%) answered &quot;yes&quot; to owning a PDA and 5093 (79.6%) answered &quot;no&quot; </li></ul></ul><ul><ul><li>6337 people out of 6400 (99%) answered &quot;yes&quot; to owning a TV and 63 (1%) answered &quot;no&quot; </li></ul></ul><ul><li>This information can also be presented graphically… </li></ul>
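As a rough check on the PDA figures quoted above, a frequency table can be rebuilt from the counts in plain Python. The function name `frequency_table` is an illustrative assumption, and the response list is reconstructed from the slide's counts rather than taken from the original data file.

```python
from collections import Counter


def frequency_table(responses):
    """Return {category: (count, percentage of valid cases)}."""
    counts = Counter(responses)
    n = sum(counts.values())
    return {cat: (k, round(100 * k / n, 1)) for cat, k in counts.items()}


# Rebuild the PDA responses from the counts quoted on the slide.
pda = ["yes"] * 1307 + ["no"] * 5093
table = frequency_table(pda)

for category, (count, percent) in table.items():
    print(f"{category:>3}  {count:>5}  {percent:>5}%")
```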
    13. 13. Frequency graphs and charts <ul><li>The same information can be displayed as: </li></ul>
    14. 14. Numerical summaries <ul><li>Measures of central tendency: </li></ul><ul><ul><li>Mean: the arithmetic average </li></ul></ul><ul><ul><li>Median: the middle value, with half the cases above it and half below </li></ul></ul><ul><ul><li>Mode: the category with the greatest number of cases </li></ul></ul><ul><li>Measures of dispersion: </li></ul><ul><ul><li>Minimum </li></ul></ul><ul><ul><li>Maximum </li></ul></ul><ul><ul><li>Standard deviation: measures the spread of a distribution around the mean </li></ul></ul>
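A minimal sketch of these summaries using Python's standard `statistics` module. The `years` sample is hypothetical and much smaller than the lecture's data set.

```python
import statistics

# Hypothetical sample: years spent at the current address for nine respondents.
years = [1, 2, 2, 3, 5, 8, 12, 20, 28]

summary = {
    "mean": statistics.mean(years),
    "median": statistics.median(years),
    "mode": statistics.mode(years),
    "minimum": min(years),
    "maximum": max(years),
    "sd": statistics.stdev(years),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

In this skewed sample the mean is well above the median, which is one reason to look at more than one measure of central tendency.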
    15. 15. Numerical summaries <ul><li>For categorical data the mode (and, for ordinal data, the median) may be relevant </li></ul><ul><li>For interval data the mean and standard deviation may be the most useful </li></ul><ul><li>In SPSS use the &quot;Frequencies&quot; submenu </li></ul>
    16. 16. Numerical summaries <ul><li>Numerical description of one interval/ratio variable </li></ul><ul><li>What information is presented here? </li></ul><ul><ul><li>n of valid cases=6400 </li></ul></ul><ul><ul><li>Mean average number of years spent at the current address is 11.6 in this sample </li></ul></ul><ul><ul><li>The standard deviation in number of years spent at the current address is 9.9 in this sample </li></ul></ul><ul><li>This information can also be presented graphically… </li></ul>
    17. 17. STEP 2: Data analysis <ul><li>Descriptive (summary) statistics </li></ul><ul><ul><li>Frequency tables and charts for individual variables </li></ul></ul><ul><ul><li>Summary statistics for individual variables </li></ul></ul><ul><li>Exploratory statistics </li></ul><ul><ul><li>Cross-tabulations for two or more variables </li></ul></ul><ul><ul><li>Correlations </li></ul></ul><ul><li>Statistical tests </li></ul><ul><ul><li>Chi-squared </li></ul></ul><ul><ul><li>T-test </li></ul></ul><ul><ul><li>Regression analysis </li></ul></ul>
    18. 18. Crosstabs <ul><li>Used to examine relationships between variables </li></ul><ul><li>Here, between income and PDA ownership </li></ul><ul><li>In SPSS use the “Crosstabs” submenu </li></ul>
    19. 19. Crosstabs <ul><li>Here the relationship between two categorical variables is explored </li></ul><ul><li>What information is presented here? </li></ul><ul><ul><li>Table cells show the number of cases for each joint combination of values (e.g. 455 people in the income range $25,000 - $49,000 own PDAs) </li></ul></ul><ul><ul><li>Percentages tell us more: the percentage of people who own PDAs rises as the income category rises </li></ul></ul>
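To illustrate why percentages tell us more than raw counts, here is a small sketch that turns a cross-tabulation into row percentages. The income bands and counts are hypothetical, not the figures from the lecture's SPSS output, and `row_percentages` is an illustrative name.

```python
# Hypothetical counts: {income band: (owns PDA, does not own PDA)}.
counts = {
    "low": (50, 450),
    "middle": (120, 380),
    "high": (200, 300),
}


def row_percentages(table):
    """Percentage of each row (income band) falling in each column."""
    result = {}
    for row, cells in table.items():
        total = sum(cells)
        result[row] = tuple(round(100 * c / total, 1) for c in cells)
    return result


percent = row_percentages(counts)
for band, (owners, non_owners) in percent.items():
    print(f"{band:>6}: {owners}% own a PDA, {non_owners}% do not")
```

Because every row sums to 100%, the bands can be compared directly even when they contain different numbers of people.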
    20. 20. Correlations <ul><li>Correlation coefficients </li></ul><ul><ul><li>Pearson’s r </li></ul></ul><ul><ul><li>Spearman’s ρ (Rho) </li></ul></ul><ul><li>Numerical indices which describe: </li></ul><ul><ul><li>How closely related two variables are and how they relate to each other </li></ul></ul><ul><ul><ul><li>Positive: scores on one variable increase as they increase on the other variable </li></ul></ul></ul><ul><ul><ul><li>Negative: scores on one variable increase as they decrease on the other variable </li></ul></ul></ul><ul><ul><ul><li>None: scores on one variable show no consistent relation to scores on the other </li></ul></ul></ul>
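The three cases (positive, negative, none) can be illustrated with a hand-written Pearson's r. This is a sketch with made-up scores; `pearson_r` is an illustrative name, and Spearman's ρ is not covered here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
positive = pearson_r(x, [2, 4, 6, 8, 10])  # both rise together
negative = pearson_r(x, [10, 8, 6, 4, 2])  # one rises as the other falls
none = pearson_r(x, [2, 1, 3, 1, 2])       # no linear relationship

print(round(positive, 2), round(negative, 2), round(none, 2))
```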
    21. 21. Correlations <ul><li>Used for interval data </li></ul><ul><li>Bivariate correlations use two variables </li></ul><ul><li>In SPSS use the “Bivariate” submenu </li></ul>
    22. 22. Correlations <ul><li>Correlation between scores of musical and mathematical ability </li></ul><ul><li>What information is presented here? </li></ul><ul><ul><li>N=10 </li></ul></ul><ul><ul><li>The Pearson correlation between scores on the music test and the maths test is -0.900 ( r =-0.90) </li></ul></ul><ul><ul><li>The significance of this is reported as 0.000, which means p < 0.001 (the value is rounded to three decimal places, it is not exactly zero) </li></ul></ul><ul><li>But what does this mean? </li></ul>
    23. 23. Measures of Significance (Bryman, 2001: 232-234) <ul><li>Used when you have a random (probability) sample and you want to generalise to a population </li></ul><ul><li>Produced when you do a statistical test </li></ul><ul><li>The significance or p -value is an indicator of how confident you can be in your finding </li></ul><ul><ul><li>Relates to hypothesis testing </li></ul></ul><ul><ul><li>The probability of obtaining a result at least this strong by chance alone, if there were no relationship in the population </li></ul></ul><ul><li>Acceptable levels of significance in social science </li></ul><ul><ul><li>p ≥ 0.05: finding is not significant (i.e. we cannot be confident that it did not occur by chance) </li></ul></ul><ul><ul><li>p < 0.05: finding is significant (i.e. we can be reasonably confident that it did not occur by chance) </li></ul></ul>
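One way to get a feel for what "occurred by chance" means is a permutation test: shuffle one variable many times and count how often the shuffled data give a correlation at least as strong as the one observed. This is a swapped-in teaching device, not the method SPSS uses for its p-values, and the scores, function names and number of shuffles below are all assumptions.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical scores for ten people with a strong negative relationship.
music = [2, 1, 3, 5, 4, 6, 8, 7, 10, 9]
maths = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

p = permutation_p_value(music, maths)
print("significant" if p < 0.05 else "not significant")
```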
    24. 24. STEP 2: Data analysis <ul><li>Descriptive (summary) statistics </li></ul><ul><ul><li>Frequency tables and charts for individual variables </li></ul></ul><ul><ul><li>Summary statistics for individual variables </li></ul></ul><ul><li>Exploratory statistics </li></ul><ul><ul><li>Cross-tabulations for two or more variables </li></ul></ul><ul><ul><li>Correlations </li></ul></ul><ul><li>Statistical tests </li></ul><ul><ul><li>Chi-squared </li></ul></ul><ul><ul><li>T-test </li></ul></ul><ul><ul><li>Regression analysis </li></ul></ul>
    25. 25. Chi-square <ul><li>A significance test for crosstabs </li></ul><ul><li>Can be used with categorical data, and with interval data once it has been grouped into categories </li></ul><ul><li>In SPSS use “Crosstabs” > “Statistics” </li></ul>
    26. 26. Chi-square <ul><li>Here testing whether the differences in PDA ownership between different income categories are due to chance </li></ul><ul><li>The value of the statistic itself is not that important. The p -value is </li></ul><ul><li>Here p < 0.05 </li></ul>
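For readers who want to see where the statistic and its p-value come from, here is a sketch of the Pearson chi-square test for a 2×2 table, without continuity correction. The counts are hypothetical, `chi_square_2x2` is an illustrative name, and the closed-form p-value used here only holds for 1 degree of freedom (a 2×2 table); larger tables need a chi-square distribution function from a statistics package.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts; returns (statistic, p-value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are income (low, high), columns are (owns PDA, does not).
related = ((50, 450), (200, 300))
unrelated = ((10, 20), (30, 60))

stat_related, p_related = chi_square_2x2(related)
stat_unrelated, p_unrelated = chi_square_2x2(unrelated)
print(round(stat_related, 1), p_related < 0.05)
print(round(stat_unrelated, 1), p_unrelated < 0.05)
```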
    27. 27. T-tests <ul><li>Uncorrelated (independent samples) t-test tells you whether the means of two sets of scores are significantly different from one another </li></ul><ul><li>Used for interval data drawn from different samples </li></ul><ul><li>In SPSS use the “Compare Means” submenu </li></ul>
    28. 28. T-tests <ul><li>Data from the 2002 General Social Survey </li></ul><ul><li>Comparing men and women (i.e. independent groups) on their age when their first child was born </li></ul><ul><li>What information is presented here? </li></ul><ul><ul><li>Means and standard deviations for both groups </li></ul></ul><ul><ul><li>Levene’s Test for Equality of Variances </li></ul></ul><ul><ul><li>2 sets of p -values </li></ul></ul><ul><li>If Levene’s Test is statistically significant, the variances are treated as unequal; if not, they are treated as equal </li></ul><ul><li>Here p < 0.05 for Levene’s Test, therefore we read the row “equal variances not assumed” </li></ul>
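The “equal variances not assumed” row corresponds to what is usually called Welch's t-test. A minimal sketch of its t statistic and degrees of freedom follows; the ages are made up (they are not General Social Survey data), `welch_t` is an illustrative name, and the p-value is omitted because it needs a t-distribution function that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """t statistic and degrees of freedom without assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error of each group mean.
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical ages at birth of first child.
men = [25, 27, 30, 32, 26]
women = [22, 24, 23, 25, 21]

t, df = welch_t(men, women)
print(f"t = {t:.2f}, df = {df:.2f}")
```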
    29. 29. Regression analysis <ul><li>Scatterplot of musical v. mathematical ability </li></ul><ul><li>The relationship between the two can be described by a line </li></ul><ul><li>This allows you to predict musical ability from mathematical ability </li></ul><ul><li>Where the regression coefficient is statistically significant we can say that mathematical ability is a significant predictor of musical ability; how good a predictor it is depends on the strength of the relationship </li></ul>
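Describing the relationship "by a line" means finding a slope and an intercept. The sketch below fits a least-squares line to made-up scores and uses it for a prediction; `fit_line` and the data are illustrative assumptions, and no significance test is included.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical test scores for five people.
maths = [1, 2, 3, 4, 5]
music = [9, 7, 6, 4, 2]

slope, intercept = fit_line(maths, music)
predicted = intercept + slope * 6  # predicted music score for a maths score of 6
print(f"music = {intercept:.1f} + ({slope:.1f}) * maths")
print(f"predicted music score at maths = 6: {predicted:.1f}")
```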
    30. 30. Regression analysis <ul><li>Where we have more than two variables we use multiple regression </li></ul><ul><li>Used for interval data </li></ul><ul><li>In SPSS use the “Regression” submenu </li></ul>
    31. 31. Beware of assumptions <ul><li>Many statistical tests assume the data follow a normal distribution </li></ul><ul><ul><li>Check this as best you can </li></ul></ul><ul><ul><li>If you are unsure, use a non-parametric test or acknowledge this as a possible problem </li></ul></ul><ul><li>Correlation ≠ Causality </li></ul><ul><ul><li>Though a may be correlated with b it does not necessarily follow that a causes b </li></ul></ul><ul><ul><ul><li>b may cause a </li></ul></ul></ul><ul><ul><ul><li>Both a and b may be caused by c </li></ul></ul></ul>
    32. 32. Data Analysis (1): Summary <ul><li>Today, techniques for the analysis of quantitative data: </li></ul><ul><ul><li>Descriptive analysis </li></ul></ul><ul><ul><li>Exploratory analysis </li></ul></ul><ul><ul><li>Statistical analysis </li></ul></ul><ul><li>Computer class follows (Library seminar room): </li></ul><ul><ul><li>Introduction to SPSS </li></ul></ul><ul><ul><li>Unpack what some of these numbers mean </li></ul></ul>
