SOC2002: Sociological Analysis and Research Methods LECTURE 11: Data Analysis (1) Quantitative data analysis and SPSS Lecturer: Bonnie Green [email_address]
The research process: what we’ve covered so far… [Diagram of the research process, stages numbered 1 to 6, with the labels: Topic/Object; Research question; Research design; Data collection; Data analysis; Interpretation; Reporting; Literature review and/or field reconnaissance; Choosing indicators & project planning; Ethics; Quality]
The research process: today… [Same diagram of the research process, with LECTURES 11, 12 & 13 marked on it. Labels: Topic/Object; Research question; Research design; Data collection; Data analysis; Interpretation; Reporting; Literature review and/or field reconnaissance; Choosing indicators & project planning; Ethics; Quality]
Data Analysis (1): Overview
Numerous techniques for quantitative data analysis
The choice of technique depends on what information you want to generate
Today:
Descriptive analysis
Exploratory analysis
Statistical analysis
Suitable for closed questions on a self-completed or interviewer-completed questionnaire
STEP 1: Data entry
Types of data:
Categorical
Nominal
Ordinal
Interval/ratio (scale)
Missing data:
Closed questions often provide numerical values in themselves (e.g. age), and 0 can be a genuine value, so it should not be used to mark missing data
Missing cases are conventionally coded "99", but the code must be a value that is not found in the data for that variable
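A minimal sketch of how a missing-value code might be handled once the data are entered, written in Python rather than SPSS. The variable (number of children), the answers and the helper recode_missing are illustrative assumptions, not lecture data.

```python
MISSING_CODE = 99  # assumed code; it must not be a real value for this variable


def recode_missing(values, missing_code=MISSING_CODE):
    """Replace the missing-value code with None so it is never treated as data."""
    return [None if v == missing_code else v for v in values]


# Hypothetical answers to a closed question on number of children.
# Note that 0 is a genuine answer here, which is why it cannot mark missing data.
children = [2, 0, 99, 1, 99, 3]

recoded = recode_missing(children)
valid = [v for v in recoded if v is not None]

print(recoded)
print("n of valid cases =", len(valid))
```

Only the valid cases should then be used when calculating percentages or averages.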
STEP 1: Data entry
Nominal Measures:
Data with a limited number of distinct categories or values
There is no inherent order to the categories
STEP 1: Data entry
Ordinal Measures:
Data with a limited number of distinct categories or values
There is a meaningful order of categories, but no measurable distance between values
STEP 1: Data entry
Scale Measures:
Data measured on an interval or ratio scale
Data values indicate both the order of values and the distance between values
STEP 2: Data analysis
Descriptive (summary) statistics
Frequency tables and charts for individual variables
Summary statistics for individual variables
Exploratory statistics
Cross-tabulations for two or more variables
Correlations
Statistical tests
Chi-squared
T-test
Regression analysis
Frequency tables and charts
For categorical data descriptive or summary statistics include tables or graphs/charts of frequency
Frequency may appear as either the number or percentage of cases in each category
In SPSS use the "Frequencies" submenu
Frequency tables
Frequency tables for two categorical variables
What information is presented here?
n of valid cases=6400
1307 people out of 6400 (20.4%) answered "yes" to owning a PDA and 5093 (79.6%) answered "no"
6337 people out of 6400 (99%) answered "yes" to owning a TV and 63 (1%) answered "no"
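As a rough illustration of what a frequency table contains, the sketch below rebuilds the PDA figures quoted above in Python. The helper frequency_table is a hypothetical name, and this is not how SPSS itself computes the table.

```python
from collections import Counter


def frequency_table(responses):
    """Return {category: (count, percent of valid cases)} for one variable."""
    counts = Counter(responses)
    n = sum(counts.values())
    return {cat: (k, round(100 * k / n, 1)) for cat, k in counts.items()}


# Rebuild the PDA answers from the counts quoted above (1307 yes, 5093 no).
pda = ["yes"] * 1307 + ["no"] * 5093

table = frequency_table(pda)
for category, (count, percent) in table.items():
    print(f"{category:<4}{count:>6}{percent:>7}%")
```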
This information can also be presented graphically…
Frequency graphs and charts
The same information can be displayed as:
Numerical summaries
Measures of central tendency:
Mean: the arithmetic average
Median: the middle value, with half the cases falling above it and half below
Mode: the category with the greatest number of cases
Measures of dispersion:
Minimum
Maximum
Standard deviation: measures the spread of a distribution around the mean
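The measures listed above can be sketched with Python's standard statistics module. The data are a small hypothetical sample of years spent at the current address, not the lecture data set.

```python
import statistics

# Hypothetical interval data: years spent at the current address.
years = [2, 5, 5, 8, 10, 12, 21]

summary = {
    "mean": statistics.mean(years),      # arithmetic average
    "median": statistics.median(years),  # middle value once sorted
    "mode": statistics.mode(years),      # most frequent value
    "minimum": min(years),
    "maximum": max(years),
    "std_dev": statistics.stdev(years),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:<8} {value:.2f}")
```

The single large value (21) pulls the mean above the median, which is one reason to look at more than one measure.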
Numerical summaries
For categorical data the mode may be relevant and, where the categories are ordered (ordinal data), so may the median
For interval data the mean and standard deviation may be the most useful
In SPSS use the "Frequencies" submenu
Numerical summaries
Numerical description of one interval/ratio variable
What information is presented here?
n of valid cases=6400
The mean number of years spent at the current address is 11.6 in this sample
The standard deviation of the number of years spent at the current address is 9.9 in this sample
This information can also be presented graphically…
Crosstabs
Used to examine relationships between variables
Here, between income and PDA ownership
In SPSS use the "Crosstabs" submenu
Crosstabs
Here the relationship between two categorical variables is explored
What information is presented here?
Table cells show the count (number of cases) for each joint combination of values (e.g. 455 people in the income range $25,000 - $49,000 own PDAs)
Percentages tell us more: the percentage of people who own PDAs rises as the income category rises
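A minimal sketch of a cross-tabulation with row percentages. The ten cases, the two income bands and the helper names crosstab and row_percent are hypothetical and are not taken from the table shown in the lecture.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count cases for each joint combination of two categorical variables."""
    return Counter(zip(rows, cols))


def row_percent(table, row, col):
    """Percentage of the cases in category `row` that fall in category `col`."""
    row_total = sum(n for (r, _), n in table.items() if r == row)
    return 100 * table[(row, col)] / row_total


# Hypothetical cases: income band and PDA ownership for ten respondents.
income = ["low"] * 5 + ["high"] * 5
owns_pda = ["no", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]

table = crosstab(income, owns_pda)
for band in ("low", "high"):
    print(band, table[(band, "yes")], f"{row_percent(table, band, 'yes'):.1f}%")
```

Comparing the percentages across rows, rather than the raw counts, is what allows income bands of different sizes to be compared.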
Correlations
Correlation coefficients
Pearson’s r
Spearman’s ρ (Rho)
Numerical indices which describe:
How closely related two variables are and how they relate to each other
Positive: scores on one variable increase as scores on the other variable increase
Negative: scores on one variable increase as they decrease on the other variable
None: scores on one variable are unrelated to scores on the other
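To make the idea of a positive or negative coefficient concrete, here is a small sketch that computes Pearson's r by hand. The function pearson_r and all three lists of scores are hypothetical illustrations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(ss_x * ss_y)


# Hypothetical scores for five people.
test_a = [1, 2, 3, 4, 5]
rises_with_a = [2, 4, 5, 8, 9]   # tends to increase as test_a increases
falls_with_a = [9, 8, 6, 5, 2]   # tends to decrease as test_a increases

print(round(pearson_r(test_a, rises_with_a), 3))  # close to +1: positive
print(round(pearson_r(test_a, falls_with_a), 3))  # close to -1: negative
```

The coefficient always lies between -1 and +1; values near 0 indicate little or no linear relationship.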
Correlations
Used for interval data
Bivariate correlations use two variables
In SPSS use the "Bivariate" submenu
Correlations
Correlation between scores of musical and mathematical ability
What information is presented here?
N=10
The Pearson correlation between scores on the music test and the maths test is -0.900 (r = -0.90)
The significance of this is reported as 0.000, which means p is less than 0.001, not that it is exactly zero
But what does this mean?
Measures of Significance (Bryman, 2001: 232-234)
Used when you have a random (probability) sample and you want to generalise to a population
Produced when you do a statistical test
The significance or p-value is an indicator of how confident you can be in your finding
Relates to hypothesis testing
The probability of obtaining a result at least this strong by chance alone, if there were no relationship in the population
Acceptable levels of significance in social science
p ≥ 0.05 finding is not significant (i.e. we cannot be confident that it did not occur by chance)
p < 0.05 finding is significant (i.e. we can be confident that it did not occur by chance)
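One way to see what "occurred by chance" means is a permutation test, sketched below. This is a swapped-in illustration and not the calculation SPSS performs: the scores are repeatedly re-paired at random, and the p-value is the share of random pairings that give a correlation at least as strong as the one observed. The data, the function names, the number of shuffles and the seed are all assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(ss_x * ss_y)


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of random re-pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical scores for eight people on two tests.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 4, 6, 8, 7, 9]

p = permutation_p_value(x, y)
print("significant" if p < 0.05 else "not significant")
```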
Chi-square
A significance test for crosstabs
Used with categorical data; interval data must first be grouped into categories
In SPSS use “Crosstabs” > “Statistics”
Chi-square
Here we are testing whether the differences in PDA ownership between income categories are due to chance
The value of the statistic itself is not that important. The p-value is
Here p < 0.05
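A sketch of the arithmetic behind the chi-squared statistic, assuming a hypothetical 2x2 table of counts (not the lecture's table). The statistic compares each observed count with the count expected if the two variables were unrelated. The p-value shortcut in p_value_df1 is valid only for 1 degree of freedom, i.e. a 2x2 table, and no continuity correction is applied.

```python
import math


def chi_square(table):
    """Pearson chi-squared statistic for a table (list of rows) of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value for chi-squared with 1 degree of freedom (2x2 tables only)."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = lower / higher income, columns = owns PDA yes / no.
observed = [[10, 40],
            [30, 20]]

stat = chi_square(observed)
p = p_value_df1(stat)
print(round(stat, 2), "significant" if p < 0.05 else "not significant")
```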
T-tests
Uncorrelated (independent samples) t-test tells you whether the means of two sets of scores are significantly different from one another
Used for interval data drawn from different samples
In SPSS use the "Compare Means" submenu
T-tests
Data from the 2002 General Social Survey
Comparing men's and women's age when their first child was born (i.e. two independent groups)
What information is presented here?
Means and standard deviations for both groups
Levene’s Test for Equality of Variances
Two sets of p-values
If Levene’s Test is statistically significant, the variances are treated as unequal; if not, they are treated as equal
Here p < 0.05 for Levene’s Test, therefore we read the row “equal variances not assumed”
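The "equal variances not assumed" row corresponds to what is usually called Welch's t-test. The sketch below shows the calculation of the t statistic and its degrees of freedom under that assumption. The ages are hypothetical, not the General Social Survey data, and welch_t is an illustrative name. Turning t into a p-value needs the t distribution, which is left to the statistics package.

```python
import math
import statistics


def welch_t(a, b):
    """t statistic and degrees of freedom without assuming equal variances."""
    # Squared standard error contributed by each group.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical ages at birth of first child for two independent groups.
men = [25, 27, 30, 32, 36]
women = [21, 22, 24, 25, 28]

t, df = welch_t(men, women)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Because the groups have different spreads, the degrees of freedom come out below the n1 + n2 - 2 used when equal variances are assumed.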
Regression analysis
Scatterplot of musical v. mathematical ability
The relationship between the two can be described by a line
This allows you to predict musical ability from mathematical ability
Where the regression coefficient is statistically significant we can say that mathematical ability predicts musical ability better than chance alone; how good a predictor it is depends on the strength of the relationship
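A minimal sketch of fitting the line by least squares and using it to predict one score from another. The scores are hypothetical, not those in the scatterplot, and fit_line and predict are illustrative names.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Predicted y for a given x, read off the fitted line."""
    return intercept + slope * x_value


# Hypothetical scores for five people.
maths = [1, 2, 3, 4, 5]
music = [9, 8, 6, 5, 2]

slope, intercept = fit_line(maths, music)
print(f"music = {intercept:.1f} + ({slope:.1f} x maths)")
print(f"predicted music score for maths = 3: {predict(3, slope, intercept):.1f}")
```

The slope is the regression coefficient: the predicted change in the music score for each one-point increase in the maths score.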
Regression analysis
Where we have more than two variables we use multiple regression
Used for interval data
In SPSS use the "Regression" submenu
Beware of assumptions
Many statistical tests assume the data follows a normal distribution
Check this as best you can
If you are unsure, use a non-parametric test or acknowledge this could be a possible problem
Correlation ≠ Causality
Though a may be correlated with b it does not necessarily follow that a causes b
b may cause a
Both a and b may be caused by c
Data Analysis (1): Summary
Today, techniques for the analysis of quantitative data: