SOC2002: Sociological Analysis and Research Methods LECTURE 11: Data Analysis (1) Quantitative data analysis and SPSS Lecturer: Bonnie Green [email_address]
The research process: what we’ve covered so far… (diagram: Reporting; Data collection; Topic/Object; Research question; Research design; Data analysis; Interpretation; Literature review, and/or field reconnaissance; Choosing indicators & Project Planning; Ethics; Quality)
The research process: today… (diagram, annotated "LECTURES 11, 12 & 13": Reporting; Data collection; Topic/Object; Research question; Research design; Data analysis; Interpretation; Literature review, and/or field reconnaissance; Choosing indicators & Project Planning; Ethics; Quality)
Data Analysis (1): Overview. There are numerous techniques for quantitative data analysis; which one to use depends on what information you want to generate. Today: descriptive analysis, exploratory analysis and statistical analysis, all suitable for closed questions on a self-completed or interviewer-completed questionnaire.
STEP 1: Data entry
STEP 1: Data entry. Types of data: categorical (nominal, ordinal) and interval/ratio (scale). Missing data: closed questions often record values in themselves (e.g. age), and 0 can be a genuine value, so 0 cannot be used to mark a missing answer. Missing cases are conventionally coded "99", but the code must be a value that is not found in the data for that variable.
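A minimal Python sketch of the missing-value rule above, assuming unanswered items arrive as None. The helper names and the candidate codes (99, 999, -1) are hypothetical, chosen only for illustration; the point is that 99 is rejected when 99 is a genuine answer for that variable.

```python
def choose_missing_code(values, candidates=(99, 999, -1)):
    """Return the first candidate code that never occurs as a genuine answer."""
    observed = {v for v in values if v is not None}
    for code in candidates:
        if code not in observed:
            return code
    raise ValueError("every candidate code occurs in the data")


def encode_missing(values, code):
    """Replace unanswered items (None) with the chosen missing-value code."""
    return [code if v is None else v for v in values]


def valid_cases(values, code):
    """Keep only the cases that are not coded as missing."""
    return [v for v in values if v != code]


# Hypothetical ages of household members: 0 (a baby) and 99 are genuine ages.
ages = [34, 0, 57, None, 21, 99, 45]
code = choose_missing_code(ages)      # 99 is a real answer here, so 999 is chosen
coded = encode_missing(ages, code)
print(code, coded)
print(len(valid_cases(coded, code)), "valid cases")
```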
STEP 1: Data entry. Nominal measures: data with a limited number of distinct categories or values. There is no inherent order to the categories.
STEP 1: Data entry. Ordinal measures: data with a limited number of distinct categories or values. There is a meaningful order of categories, but no measurable distance between values.
STEP 1: Data entry. Scale measures: data measured on an interval or ratio scale. Data values indicate both the order of values and the distance between values.
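As a rough illustration (a sketch, not SPSS behaviour), the three levels of measurement can be mapped to the summaries that make sense for each. The Level names and the MEANINGFUL table are hypothetical; they follow the usual convention that each level permits everything the level below it permits.

```python
from enum import Enum


class Level(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    SCALE = "scale"  # interval/ratio


# Each level permits everything the level below it permits, plus more.
_NOMINAL = {"frequency", "mode"}
_ORDINAL = _NOMINAL | {"median", "minimum", "maximum"}
_SCALE = _ORDINAL | {"mean", "standard deviation"}

MEANINGFUL = {Level.NOMINAL: _NOMINAL, Level.ORDINAL: _ORDINAL, Level.SCALE: _SCALE}


def is_meaningful(statistic, level):
    """True if the summary statistic makes sense for data at this level."""
    return statistic in MEANINGFUL[level]


print(is_meaningful("median", Level.ORDINAL))  # True: categories are ordered
print(is_meaningful("mean", Level.ORDINAL))    # False: no measurable distance
```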
STEP 2: Data analysis Descriptive (summary) statistics Frequency tables and charts for individual variables Summary statistics for individual variables Exploratory statistics Cross-tabulations for two or more variables Correlations Statistical tests Chi-squared T-test Regression analysis
Frequency tables and charts. For categorical data, descriptive or summary statistics include tables or graphs/charts of frequency. Frequency may appear as either the number or the percentage of cases in each category. In SPSS use the "Frequencies" submenu.
Frequency tables. Frequency tables for two categorical variables. What information is presented here? n of valid cases = 6400. 1307 people out of 6400 (20.4%) answered "yes" to owning a PDA and 5093 (79.6%) answered "no". 6337 people out of 6400 (99%) answered "yes" to owning a TV and 63 (1%) answered "no". This information can also be presented graphically…
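The frequency tables on this slide can be reproduced with a short Python sketch that rebuilds the 6400 PDA and TV responses from the counts shown. frequency_table is a hypothetical helper, and percentages are rounded to one decimal place.

```python
from collections import Counter


def frequency_table(responses):
    """Return (category, count, percent of valid cases) rows, largest count first."""
    counts = Counter(responses)
    n = sum(counts.values())
    return [(cat, k, round(100 * k / n, 1)) for cat, k in counts.most_common()]


# Rebuild the responses from the counts on the slide (n of valid cases = 6400).
pda = ["yes"] * 1307 + ["no"] * 5093
tv = ["yes"] * 6337 + ["no"] * 63

for label, data in (("PDA", pda), ("TV", tv)):
    for category, count, percent in frequency_table(data):
        print(f"{label:<4}{category:<4}{count:>6}{percent:>7.1f}%")
```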
Frequency graphs and charts The same  information  can be displayed as:
Numerical summaries. Measures of central tendency: Mean: the arithmetic average. Median: the middle value, with half the cases falling above it and half below. Mode: the category (or value) with the greatest number of cases. Measures of dispersion: Minimum; Maximum; Standard deviation: measures the spread of a distribution around the mean.
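These measures are all available in Python's standard statistics module. A minimal sketch on a small, hypothetical set of years-at-address values; note that statistics.stdev is the sample standard deviation (dividing by n - 1), whereas statistics.pstdev divides by n.

```python
import statistics

# Hypothetical years spent at the current address for eight respondents.
years = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(years),        # arithmetic average
    "median": statistics.median(years),    # average of the two middle values here
    "mode": statistics.mode(years),        # most frequent value
    "minimum": min(years),
    "maximum": max(years),
    "std dev": statistics.stdev(years),    # sample SD, divides by n - 1
}

for name, value in summary.items():
    print(f"{name:<8}{value:>8.2f}")
```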
Numerical summaries. For categorical data the mode is relevant, and for ordinal data the median as well. For interval data the mean and standard deviation may be the most useful. In SPSS use the "Frequencies" submenu.
Numerical summaries. Numerical description of one interval/ratio variable. What information is presented here? n of valid cases = 6400. The mean number of years spent at the current address is 11.6 in this sample. The standard deviation of the number of years spent at the current address is 9.9 in this sample. This information can also be presented graphically…
STEP 2: Data analysis Descriptive (summary) statistics Frequency tables and charts for individual variables Summary statistics for individual variables Exploratory statistics Cross-tabulations for two or more variables Correlations Statistical tests Chi-squared T-test Regression analysis
Crosstabs. Used to examine relationships between variables; here, between income and PDA ownership. In SPSS use the "Crosstabs" submenu.
Crosstabs. Here the relationship between two categorical variables is explored. What information is presented here? Table cells show the number of cases for each joint combination of values (e.g. 455 people in the income range $25,000 - $49,000 own PDAs). Percentages tell us more: the percentage of people who own PDAs rises as the income category rises.
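A sketch of how a crosstab and its row percentages are built, using hypothetical income bands and counts (not the survey data on the slide). crosstab and row_percentages are illustrative helpers, not SPSS functions.

```python
from collections import Counter


def crosstab(pairs, row_levels, col_levels):
    """Count cases for each joint combination of (row value, column value)."""
    counts = Counter(pairs)
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * k / total, 1) for col, k in cells.items()}
    return result


# Hypothetical (income band, owns PDA?) answers for 30 respondents.
pairs = ([("low", "yes")] * 2 + [("low", "no")] * 8
         + [("middle", "yes")] * 4 + [("middle", "no")] * 6
         + [("high", "yes")] * 7 + [("high", "no")] * 3)

table = crosstab(pairs, ["low", "middle", "high"], ["yes", "no"])
percent = row_percentages(table)
for band in table:
    print(band, table[band], percent[band])
```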
Correlations. Correlation coefficients: Pearson’s r and Spearman’s ρ (rho). Numerical indices, ranging from -1 to +1, which describe how closely two variables are related (strength) and how they relate to each other (direction). Positive: as scores on one variable increase, scores on the other also increase. Negative: as scores on one variable increase, scores on the other decrease. None: no relationship (a coefficient close to 0).
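To show what the two coefficients measure, here is a minimal pure-Python sketch: Pearson's r from deviations around the means, and Spearman's ρ as Pearson's r computed on ranks (tied values get their average rank). The example data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1   # positions i..j are 0-based
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r calculated on the ranks of the scores."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]        # hypothetical: rises with x, but not in a straight line
print(round(pearson_r(x, y), 3), spearman_rho(x, y))   # 0.981 1.0
```

Because y always rises with x but not along a straight line, ρ is exactly 1 while r falls just short of it; Spearman's ρ only cares about the order of the scores.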
Correlations. Pearson’s r is used for interval data (Spearman’s ρ can also be used for ordinal data). Bivariate correlations use two variables. In SPSS use the "Bivariate" submenu.
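Correlations. Correlation between scores of musical and mathematical ability. What information is presented here? N = 10. The Pearson correlation between scores on the music test and the maths test is -0.900 (r = -0.90), a strong negative correlation. The significance of this is shown as 0.000 (p = 0.000); SPSS rounds to three decimal places, so this means p < 0.0005 rather than exactly zero, and is usually reported as p < 0.001. But what does this mean?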
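One way to see why r = -0.90 with N = 10 counts as significant: the usual test converts r into a t statistic with N - 2 degrees of freedom. The sketch below assumes the standard formula t = r√(N − 2) / √(1 − r²) and a critical value of 2.306 (two-tailed, α = 0.05, 8 df) taken from standard t tables; it does not compute the exact p-value that SPSS reports.

```python
import math


def t_for_r(r, n):
    """t statistic for testing whether Pearson's r differs from 0, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = -0.90, 10                # values from the slide
t = t_for_r(r, n)
T_CRITICAL_8DF = 2.306          # two-tailed, alpha = 0.05, 8 df (standard t tables)
print(round(t, 2), abs(t) > T_CRITICAL_8DF)   # -5.84 True
```

The size of t (about -5.84) also exceeds the two-tailed 0.001 critical value for 8 df in standard tables (about 5.04), which is consistent with SPSS displaying p = 0.000.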
Measures of Significance (Bryman, 2001: 232-234). Used when you have a random (probability) sample and you want to generalise to a population. Produced when you do a statistical test. The significance or p-value is an indicator of how confident you can be in your finding. It relates to hypothesis testing: it is the probability of obtaining a result at least as strong as the one observed if there were no relationship in the population (i.e. if the result were due to chance alone). Acceptable levels of significance in social science: p ≥ 0.05: finding is not significant (i.e. we cannot be confident that it did not occur by chance); p < 0.05: finding is significant (i.e. we can be confident that it did not occur by chance).
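The definition of a p-value can be made concrete with a permutation test. This is a different procedure from the tests SPSS runs, but it follows the same logic: shuffle the group labels many times (simulating "no relationship") and count how often a difference at least as large as the observed one turns up. The function and the example data are hypothetical.

```python
import random


def mean_difference(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Two-sided p-value: the share of random relabellings of the pooled scores
    whose mean difference is at least as extreme as the observed one."""
    rng = random.Random(seed)                 # seeded so results are repeatable
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # "no relationship": labels are arbitrary
        if abs(mean_difference(pooled[:len(a)], pooled[len(a):])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)       # +1 counts the observed split itself


# Hypothetical scores for two groups.
print(permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))  # small: well below 0.05
print(permutation_p_value([1, 2, 3], [1, 2, 3]))                    # 1.0: no difference at all
```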
STEP 2: Data analysis Descriptive (summary) statistics Frequency tables and charts for individual variables Summary statistics for individual variables Exploratory statistics Cross-tabulations for two or more variables Correlations Statistical tests Chi-squared T-test Regression analysis
Chi-square. A significance test for crosstabs. Used with categorical data; interval data must first be grouped into categories. In SPSS use "Crosstabs" > "Statistics".
Chi-square. Here testing whether the differences in PDA ownership between income categories are due to chance. The value of the statistic itself is not that important; the p-value is. Here p < 0.05, so the differences are unlikely to be due to chance alone.
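A sketch of the Pearson chi-square statistic for a crosstab, computing each expected count as row total × column total / N. The counts are hypothetical, and the critical value 5.991 (α = 0.05, 2 df) comes from standard chi-square tables; SPSS reports the exact p-value instead.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected if no relationship
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are income bands, columns are (owns PDA, does not).
observed = [
    [20, 80],   # low income
    [40, 60],   # middle income
    [70, 30],   # high income
]

chi2, df = chi_square(observed)
CRITICAL_2DF = 5.991   # alpha = 0.05 with 2 df, from standard chi-square tables
print(round(chi2, 2), df, chi2 > CRITICAL_2DF)   # 51.58 2 True
```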
T-tests. The uncorrelated (independent samples) t-test tells you whether the means of two sets of scores are significantly different from one another. Used for interval data drawn from different samples. In SPSS use the "Compare Means" submenu.
T-tests. Data from the 2002 General Social Survey, comparing the age at which men and women had their first child (i.e. independent groups). What information is presented here? Means and standard deviations for both groups; Levene’s Test for Equality of Variances; two rows of t-test results, each with its own p-value. If Levene’s Test is statistically significant, the variances are treated as unequal; if not, they are treated as equal. Here p < 0.05 for Levene’s Test, therefore we read the row "equal variances not assumed".
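The two rows SPSS prints differ in how the standard error and degrees of freedom are calculated. A minimal sketch of both, assuming the textbook pooled-variance formula for "equal variances assumed" and Welch's formula for "equal variances not assumed"; Levene's test itself is not computed here, and the ages are hypothetical.

```python
import math
import statistics


def independent_t(a, b, equal_var=True):
    """Return (t, df) for an independent-samples t-test of mean(a) - mean(b)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    if equal_var:
        # "Equal variances assumed": pool the two variances.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # "Equal variances not assumed": Welch's t with Welch-Satterthwaite df.
        q1, q2 = v1 / n1, v2 / n2
        se = math.sqrt(q1 + q2)
        df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# Hypothetical ages at the birth of a first child.
men = [24, 26, 28, 30, 32]
women = [22, 23, 24, 25, 26]

for assumed in (True, False):
    t, df = independent_t(men, women, equal_var=assumed)
    print(f"equal variances assumed={assumed}: t = {t:.3f}, df = {df:.2f}")
```

With equal group sizes, as here, the two t values coincide and only the degrees of freedom differ (8 versus about 5.88); with unequal group sizes and variances the t values differ as well.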
Regression analysis. Scatterplot of musical v. mathematical ability. The relationship between the two can be described by a straight line (the regression line). This allows you to predict musical ability from mathematical ability. Where the regression coefficient is statistically significant, we can say that mathematical ability is a significant predictor of musical ability.
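The regression line can be sketched as an ordinary least-squares fit (slope = Sxy / Sxx, intercept = mean of y minus slope × mean of x). The scores are hypothetical; Python 3.10 and later also provide statistics.linear_regression for the same fit.

```python
def least_squares_line(x, y):
    """Intercept a and slope b of the ordinary least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def predict(a, b, x):
    """Predicted y for a given x on the fitted line."""
    return a + b * x


# Hypothetical test scores: mathematical ability (x) and musical ability (y).
maths = [1, 2, 3, 4, 5]
music = [9, 7, 6, 4, 2]

a, b = least_squares_line(maths, music)
print(round(a, 2), round(b, 2))       # 10.7 -1.7
print(round(predict(a, b, 6), 2))     # predicted music score for a maths score of 6
```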
Regression analysis. Where we have more than two variables (one outcome and two or more predictors), we use multiple regression. Used for interval data. In SPSS use the "Regression" submenu.
Beware of assumptions. Many statistical tests assume the data follow a normal distribution: check this as far as possible (e.g. with a histogram). If you are unsure, use a non-parametric test (e.g. Spearman’s ρ instead of Pearson’s r) or acknowledge that this could be a problem. Correlation ≠ causality: though a may be correlated with b, it does not necessarily follow that a causes b; b may cause a, or both a and b may be caused by c.
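The third possibility (a common cause c) can be illustrated with a small simulation, assuming that c drives both a and b and that a and b never influence each other. The variables are simulated, not real data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)                        # fixed seed so the run is repeatable
c = [rng.gauss(0, 1) for _ in range(1000)]     # simulated common cause
a = [ci + rng.gauss(0, 0.5) for ci in c]       # a depends only on c (plus noise)
b = [ci + rng.gauss(0, 0.5) for ci in c]       # b depends only on c (plus noise)

print(round(pearson_r(a, b), 2))   # strongly positive, although neither causes the other
```

With this set-up r(a, b) should come out at around 0.8, even though neither variable appears in the formula for the other.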
Data Analysis (1):  Summary Today, techniques for the analysis of quantitative data: Descriptive analysis Exploratory analysis Statistical analysis Computer class follows (Library seminar room): Introduction to SPSS Unpack what some of these numbers mean
