
- 1. INTRODUCTION TO STATISTICS FOR SECOND LANGUAGE EDUCATORS DR ACHILLEAS KOSTOULAS
- 2. OBJECTIVES OF THIS SESSION
  - You will learn how to construct a sample.
  - You will learn how to describe your sample using statistical methods.
  - You will learn how to find connections between different phenomena in your data.
- 3. OUTLINE OF THIS SESSION
  1. Populations and samples
  2. Different types of data
  3. Univariate analysis: central tendency, spread
  4. Bivariate analysis: cross-tabulations, t-tests, correlations
- 4. POPULATIONS & SAMPLES
- 5. POPULATION All the people (or events, or things) whose properties or behaviour we are interested in understanding, e.g. university students in Austria. Symbol (population size): N [Diagram: Population, Sampling frame, Sample]
- 6. SAMPLING FRAME The people (or events, or things) in the population that we actually have access to for our research, e.g. students currently present in this classroom
- 7. SAMPLE The people who were contacted and agreed to participate in the study. Symbol (sample size): n
- 8. SAMPLING METHODS / STRATEGIES
  - Simple random sampling
  - Systematic sampling
  - Convenience sampling
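Simple random and systematic sampling are both mechanical procedures, so they can be sketched in a few lines of Python. The sampling frame (120 hypothetical student IDs) and the function names are illustrative assumptions, not part of the session; convenience sampling has no procedure to code, since it simply uses whoever is available.

```python
import random

# Hypothetical sampling frame: 120 student IDs we have access to
sampling_frame = [f"student_{i:03d}" for i in range(1, 121)]


def simple_random_sample(frame, n, seed=None):
    """Each member of the frame has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Choose a random starting point, then take every k-th member."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)     # random start within the first interval
    return frame[start::k][:n]


print(simple_random_sample(sampling_frame, 10, seed=1))
print(systematic_sample(sampling_frame, 10, seed=1))
```

With 120 students and n = 10, the systematic sample takes every 12th student after a random start; the fixed `seed` only makes the illustration repeatable.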
- 9. HOW LARGE SHOULD THE SAMPLE BE? It depends on:
  - the population size
  - the degree of certainty required
  - the statistical tools we want to use
- 10. DIFFERENT TYPES OF DATA
- 11. Variables Cases Values
- 12. LEVELS OF MEASUREMENT 1. Nominal data 2. Ordinal data 3. Scale data
- 13. CATEGORICAL / NOMINAL DATA A variable is nominal if its values do not have a numerical relation to each other. Examples: Gender (M / F / Other), Place of Birth
- 14. ORDINAL DATA Ordinal variables are like categorical ones, but we can rank the values according to order, size, frequency, etc. Examples: Level of education (High School, BA, MA/Mag., Doctorate) Attitudes (strongly disagree, disagree, neutral, agree, strongly agree)
- 15. CONTINUOUS / SCALE DATA A variable is continuous if it can take an infinite number of values that can be mathematically manipulated. Examples: Age (12, 13, 13:2, 14…), Height (165cm, 167cm, 183cm…)
- 16. UNIVARIATE ANALYSIS CENTRAL TENDENCY
- 17. MEASURES OF CENTRAL TENDENCY
  - Mode (the most common value)
  - Median (the middle value when all values are put in order)
  - Mean (the sum of all values divided by the number of values)
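- 18. EXAMPLE (RAW DATA)

  | Case | Height (cm) | Gender | Loves Statistics |
  |---|---|---|---|
  | 1 | 167 | M | Strongly agree |
  | 2 | 178 | M | Strongly disagree |
  | 3 | 189 | F | Agree |
  | 4 | 201 | F | Agree |
  | 5 | 182 | M | Disagree |
  | 6 | 175 | F | Strongly agree |
  | 7 | 162 | M | Strongly disagree |
  | 8 | 180 | F | Disagree |
  | 9 | 187 | M | Agree |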
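- 19. EXAMPLE (PROCESSED)
  Codes: Gender 1 = M, 2 = F; Loves Statistics 1 = Strongly disagree, 2 = Disagree, 3 = Agree, 4 = Strongly agree

  | Case | Height (cm) | Gender | Loves Statistics |
  |---|---|---|---|
  | 1 | 167 | 1 | 4 |
  | 2 | 178 | 1 | 1 |
  | 3 | 189 | 2 | 3 |
  | 4 | 201 | 2 | 3 |
  | 5 | 182 | 1 | 2 |
  | 6 | 175 | 2 | 4 |
  | 7 | 162 | 1 | 1 |
  | 8 | 180 | 2 | 2 |
  | 9 | 187 | 1 | 3 |
  | Total | 1,621 | - | - |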
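Turning raw data into processed data is essentially a lookup: each label is replaced by a number. A minimal Python sketch follows; the dictionaries and the `code` helper are hypothetical, and the attitude codes (1 = Strongly disagree up to 4 = Strongly agree) are inferred from the processed table rather than stated on the slides.

```python
# Assumed coding schemes, read off the processed table
GENDER_CODES = {"M": 1, "F": 2}  # nominal: the numbers are only labels
LOVES_STATS_CODES = {            # ordinal: the numbers preserve the order
    "Strongly disagree": 1,
    "Disagree": 2,
    "Agree": 3,
    "Strongly agree": 4,
}


def code(values, scheme):
    """Replace each label with its numeric code; unknown labels raise KeyError."""
    return [scheme[v] for v in values]


raw_gender = ["M", "M", "F", "F", "M", "F", "M", "F", "M"]  # cases 1-9
print(code(raw_gender, GENDER_CODES))                  # [1, 1, 2, 2, 1, 2, 1, 2, 1]
print(code(["Agree", "Disagree"], LOVES_STATS_CODES))  # [3, 2]
```

Failing loudly on an unknown label (for example a typo such as "Agre") is deliberate: it catches data-entry errors before they silently become missing values.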
- 20. CENTRAL TENDENCY: THE MODE
  Coded gender by case (1 = male, 2 = female): 1, 1, 2, 2, 1, 2, 1, 2, 1

  | Gender | n | % |
  |---|---|---|
  | Male | 5 | 56 |
  | Female | 4 | 44 |
  | Total | 9 | 100 |

  “The majority of respondents were male (n = 5, 56%).”

  “Respondents were almost evenly split between male (n = 5, 56%) and female (n = 4, 44%).”
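The mode and the frequency table can be reproduced with Python's standard library, as a small sketch using the coded gender column from slide 19. Note that 5/9 is 55.6%, which rounds to 56%.

```python
from collections import Counter
from statistics import mode

gender = [1, 1, 2, 2, 1, 2, 1, 2, 1]  # coded data: 1 = male, 2 = female
LABELS = {1: "Male", 2: "Female"}

counts = Counter(gender)  # frequency of each code
n = len(gender)
for code, count in counts.most_common():
    # percentage rounded to a whole number
    print(f"{LABELS[code]}: n = {count}, {100 * count / n:.0f}%")

print("Mode:", LABELS[mode(gender)])
# Male: n = 5, 56%
# Female: n = 4, 44%
# Mode: Male
```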
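- 21. CENTRAL TENDENCY: THE MEDIAN
  Coded answers (Loves Statistics) by case: 4, 1, 3, 3, 2, 4, 1, 2, 3
  Sorted: 4 (case 1), 4 (case 6), 3 (case 3), 3 (case 4), 3 (case 9), 2 (case 5), 2 (case 8), 1 (case 2), 1 (case 7). The middle (5th) value is 3.

  | I love statistics | n | % |
  |---|---|---|
  | Strongly agree | 2 | 22 |
  | Agree | 3 | 33 |
  | Disagree | 2 | 22 |
  | Strongly disagree | 2 | 22 |
  | Total | 9 | 100* |

  *Percentages do not add up to exactly 100 because of rounding.

  “As can be seen in Table 1, attitudes towards statistics were largely positive (Mdn = 3).”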
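Finding the median of ordinal data amounts to sorting the coded answers and taking the middle one. A Python sketch, assuming the coding 4 = Strongly agree down to 1 = Strongly disagree; `median_low` is used because, with an even number of cases, it returns an actual category instead of averaging two categories.

```python
from statistics import median_low

# Coded answers to "I love statistics" for cases 1-9
loves_stats = [4, 1, 3, 3, 2, 4, 1, 2, 3]
LABELS = {4: "Strongly agree", 3: "Agree", 2: "Disagree", 1: "Strongly disagree"}

ordered = sorted(loves_stats, reverse=True)  # highest to lowest, as on the slide
mdn = median_low(loves_stats)                # 9 values: the 5th in order
print(ordered, mdn, LABELS[mdn])
# [4, 4, 3, 3, 3, 2, 2, 1, 1] 3 Agree
```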
- 22. CENTRAL TENDENCY: THE MEAN
  Heights (cm) by case: 167, 178, 189, 201, 182, 175, 162, 180, 187 (total 1,621)
  1,621 / 9 = 180.1

  “Respondents were rather tall (M = 180.1 cm).”
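The arithmetic on this slide can be checked in Python; the heights are those of the example data on slide 18.

```python
from statistics import mean

heights_cm = [167, 178, 189, 201, 182, 175, 162, 180, 187]

total = sum(heights_cm)  # 1621
m = mean(heights_cm)     # total divided by the number of cases (9)
print(total, round(m, 1))  # 1621 180.1
```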
- 23. UNIVARIATE ANALYSIS SPREAD
- 24. COMPARE THESE TWO SCHOOLS [Figure: frequency of test scores in School A and School B, grouped in bands from 40-49 to 90-100. Based on Muijs 2007]
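- 25. COMPARE THESE TWO SCHOOLS

  | Case | School A | School B |
  |---|---|---|
  | 1 | 45 | 60 |
  | 2 | 50 | 65 |
  | 3 | 55 | 65 |
  | 4 | 60 | 70 |
  | 5 | 65 | 70 |
  | 6 | 70 | 70 |
  | 7 | 70 | 70 |
  | 8 | 75 | 70 |
  | 9 | 80 | 70 |
  | 10 | 85 | 75 |
  | 11 | 90 | 75 |
  | 12 | 95 | 80 |
  | Median | 70 | 70 |
  | Mean | 70 | 70 |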
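The point of this slide, that both schools have identical central tendency, can be verified in Python using the scores from the table. With 12 cases, the median is the average of the 6th and 7th values.

```python
from statistics import mean, median

school_a = [45, 50, 55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
school_b = [60, 65, 65, 70, 70, 70, 70, 70, 70, 75, 75, 80]

for name, scores in (("A", school_a), ("B", school_b)):
    # even number of cases: median = mean of the two middle values
    print(f"School {name}: median = {median(scores)}, mean = {mean(scores)}")
```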
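- 26. MEASURES OF SPREAD
  - Range (the difference between the highest and the lowest value)
  - Interquartile range (the difference between the third and the first quartile, i.e. the range of the middle 50% of values once the lowest and the highest quarters are removed)
  - Standard deviation (roughly, how far values typically lie from the mean)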
- 27. MEASURES OF SPREAD: RANGE
  Scores as on slide 25 (Median = 70 and Mean = 70 in both schools).
  - Range (School A): 95 - 45 = 50
  - Range (School B): 80 - 60 = 20

  “The test scores in School A ranged from 45 to 95 (M = 70). Scores in School B were more tightly distributed, ranging from 60 to 80 (M = 70).”
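The range is just the highest value minus the lowest. A short Python sketch with the two schools' scores; `score_range` is a hypothetical helper name.

```python
school_a = [45, 50, 55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
school_b = [60, 65, 65, 70, 70, 70, 70, 70, 70, 75, 75, 80]


def score_range(scores):
    """Difference between the highest and the lowest value."""
    return max(scores) - min(scores)


print(score_range(school_a), score_range(school_b))  # 50 20
```

Because it depends on only two values, the range is very sensitive to a single unusual score, which is why the IQR and SD are also used.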
- 28. MEASURES OF SPREAD: INTERQUARTILE RANGE
  Scores as on slide 25.
  - IQR (School A): 82.5 - 57.5 = 25
  - IQR (School B): 72.5 - 67.5 = 5

  “Although the average test performance in both schools was similar (M = 70), the test scores in School A were much more widely distributed than those in School B (IQR_A = 25, IQR_B = 5).”
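The quartiles on this slide match the "median of each half" method: Q1 is the median of the lower half of the sorted scores and Q3 the median of the upper half. A Python sketch of that method follows; the function name is hypothetical. Other quartile conventions exist (including the default of `statistics.quantiles` and of many spreadsheet functions), so software may report slightly different quartiles for the same data.

```python
from statistics import median

school_a = [45, 50, 55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
school_b = [60, 65, 65, 70, 70, 70, 70, 70, 70, 75, 75, 80]


def quartiles_iqr(scores):
    """Return (Q1, Q3, IQR), with Q1 and Q3 taken as the medians of the lower
    and upper halves of the sorted data (the middle value is left out when n is odd)."""
    s = sorted(scores)
    if len(s) < 2:
        raise ValueError("need at least two values")
    lower = s[: len(s) // 2]
    upper = s[(len(s) + 1) // 2 :]
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1


print(quartiles_iqr(school_a))  # (57.5, 82.5, 25.0)
print(quartiles_iqr(school_b))  # (67.5, 72.5, 5.0)
```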
- 29. MEASURES OF SPREAD: STANDARD DEVIATION
  Scores as on slide 25 (Median = 70 and Mean = 70 in both schools).
  - SD (School A) / SD_A: 15.81
  - SD (School B) / SD_B: 5.22

  “The test scores in School A were satisfactory (M = 70, SD = 15.81). School B reported similar results, which clustered more tightly around the average (M = 70, SD = 5.22).”
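The standard deviations on this slide are sample standard deviations, i.e. the sum of squared deviations from the mean is divided by n - 1 before taking the square root. Python's `statistics.stdev` uses that formula; `pstdev` divides by n instead and gives a smaller value.

```python
from statistics import pstdev, stdev

school_a = [45, 50, 55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
school_b = [60, 65, 65, 70, 70, 70, 70, 70, 70, 75, 75, 80]

# Sample SD (divides by n - 1): reproduces the values on the slide
print(round(stdev(school_a), 2), round(stdev(school_b), 2))  # 15.81 5.22

# Population SD (divides by n), for comparison
print(round(pstdev(school_a), 2))  # 15.14
```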
- 30. UNIVARIATE STATISTICS: SUMMARY Central Tendency Spread Mode Median Mean Range IQR SD Nominal Ordinal Continuous
- 31. POP QUIZ
  - Average and mean are the same thing. In daily use, the words average and mean are interchangeable. In statistics, the mean is one type of average; the mode and the median are also types of average.
  - We must always use the median with ordinal variables. Technically, we can use both the median and the mode, but the median is a more informative metric. The third option, the mean, cannot be used with ordinal data.
  - We must always use the mean with continuous variables. It is usually the best option. However, if we have unusual data (with one or two very high or very low values), it may be better to use the median.
  - We can calculate mean values on a Likert scale (1: Strongly agree; 2: Agree; 3: Disagree; 4: Strongly disagree). Some people do, but you shouldn’t. Likert scales produce ordinal data, and you should not use the mean when your data are ordinal.
  - The appropriate spread metric for nominal variables is the IQR. No. Nominal data cannot be ranked in any sensible way, so measures of spread such as the range, the IQR and the SD cannot be used.
- 32. BIVARIATE ANALYSIS CROSSTABULATIONS & CHI-SQUARE TESTS
- 33. CROSSTABULATIONS We use a cross-tabulation when we want to compare two ordinal or nominal variables Examples: Gender x Favourite colour School type x Attitudes towards mathematics
- 34. EXAMPLE CROSSTABULATION
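- 35. CHI-SQUARE TEST
  Counts if toy preference were split evenly within each group:

  | | Action figures | Barbie dolls |
  |---|---|---|
  | Male (n = 50) | 25 | 25 |
  | Female (n = 60) | 30 | 30 |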
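- 36. CHI-SQUARE TEST
  Observed counts, with the even split in brackets:

  | | Action figures | Barbie dolls |
  |---|---|---|
  | Male (n = 50) | 48 (25) | 2 (25) |
  | Female (n = 60) | 25 (30) | 35 (30) |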
- 37. CHI-SQUARE
  “A statistically significant difference was found in the toy preferences of boys and girls. As can be seen in Table 1, boys were much more likely than girls to prefer action figures (χ2 = 36.068, df = 1, p < .001).”

  | Gender | Action figures | Barbie dolls | Total |
  |---|---|---|---|
  | Male | 48 | 2 | 50 |
  | Female | 25 | 35 | 60 |
  | Total | 73 | 37 | 110 |
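The reported χ2 can be reproduced by hand. Note that the test of independence does not use the even split from the earlier slides: it computes each expected count from the row and column totals (for example, boys choosing action figures: 50 × 73 / 110 ≈ 33.2). A Python sketch of Pearson's chi-square without a continuity correction, which is the variant that gives 36.068; the p-value line uses a shortcut that is valid only for df = 1. In practice, a library routine such as `scipy.stats.chi2_contingency(observed, correction=False)` would normally be used.

```python
import math

observed = [[48, 2],   # boys: action figures, Barbie dolls
            [25, 35]]  # girls

row_totals = [sum(row) for row in observed]        # [50, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [73, 37]
n = sum(row_totals)                                # 110

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # count we would expect if gender and toy preference were unrelated
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail probability, df = 1 only

print(round(chi2, 3), df, p < 0.001)  # 36.068 1 True
```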
- 38. BIVARIATE ANALYSIS T-TESTS
- 39. T-TESTS We use a t-test to see if there is a connection between a nominal variable with two values (the independent variable) and a continuous one (the dependent variable). The t-test splits your sample into two groups (e.g., boys and girls), calculates the mean value of the dependent variable for each group, and then tests whether the difference between the two means is statistically significant.
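As an illustration only (the session does not run this test on these data), here is a Python sketch of Student's independent-samples t-test, applied to the heights of male and female respondents from slide 18. It assumes equal variances in the two groups and pools them; the function name is hypothetical. Turning t into a p-value needs the t distribution, e.g. via `scipy.stats.ttest_ind(male, female)`.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Student's t for two independent groups (pooled sample variance)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


male = [167, 178, 182, 162, 187]    # heights (cm), cases 1, 2, 5, 7, 9
female = [189, 201, 175, 180]       # heights (cm), cases 3, 4, 6, 8

t, df = independent_t(male, female)
print(round(t, 2), df)  # -1.52 7
```

The groups differ by about 11 cm on average, but with only nine cases that difference is small relative to the spread within each group, so t is modest.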
- 40. T-TESTS
- 41. T-TEST
- 42. BIVARIATE ANALYSIS CORRELATIONS
- 43. CORRELATIONS We use a correlation coefficient (e.g., Spearman’s or Pearson’s) to see if there is a connection between two continuous variables (e.g., weight and height). Correlation coefficients range from -1 to +1. A value close to +1 or -1 means that the two variables are strongly connected (positively or negatively); a value close to 0 means that they are not. We can depict correlations visually with a scatterplot.
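A Python sketch of the two coefficients named on the slide: Pearson’s r measures how closely the points follow a straight line, and Spearman’s coefficient is Pearson’s r computed on the ranks of the values (tied values share their average rank). The function names and the small example lists are illustrative assumptions; Python 3.10+ also offers `statistics.correlation`, which gained Spearman support (`method="ranked"`) in 3.12.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance divided by the product of the spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 upwards; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
print(pearson(x, [2, 4, 6, 8, 10]))   # strong positive: close to 1
print(pearson(x, [10, 8, 6, 4, 2]))   # strong negative: close to -1
print(pearson(x, [1, 4, 9, 16, 25]), spearman(x, [1, 4, 9, 16, 25]))
```

The last line shows the difference: the relationship is perfectly monotonic but curved, so Spearman’s coefficient is 1 while Pearson’s r is slightly lower.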
- 44. STRONG POSITIVE CORRELATION
- 45. STRONG NEGATIVE CORRELATION
- 46. NO CORRELATION
- 47. CORRELATION IS NOT CAUSATION!
- 48. REALLY, IT DOESN’T
- 49. OR DOES IT?
- 50. SUMMARY

  | | Nominal | Ordinal | Continuous |
  |---|---|---|---|
  | Nominal | Crosstabs / χ2 | Crosstabs / χ2 | T-test (if it has two values) |
  | Ordinal | Crosstabs / χ2 | Crosstabs / χ2 | T-test (if it has two values) |
  | Continuous | T-test (if it has two values) | T-test (if it has two values) | Correlation |
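The summary table can be read as a small decision rule. The Python function below is a hypothetical sketch of that rule; it returns `None` for combinations the table does not cover (a nominal or ordinal variable with more than two values paired with a continuous one).

```python
VALID_LEVELS = {"nominal", "ordinal", "continuous"}


def choose_test(level_a, level_b, categories=None):
    """Suggest a procedure following the summary table.

    level_a, level_b: "nominal", "ordinal" or "continuous".
    categories: number of values of the non-continuous variable
                (only used when exactly one variable is continuous).
    """
    levels = {level_a, level_b}
    if not levels <= VALID_LEVELS:
        raise ValueError(f"unknown level(s): {levels - VALID_LEVELS}")
    if levels <= {"nominal", "ordinal"}:
        return "crosstab / chi-square"
    if levels == {"continuous"}:
        return "correlation"
    # One nominal/ordinal variable and one continuous variable
    if categories == 2:
        return "t-test"
    return None  # not covered by the table


print(choose_test("nominal", "ordinal"))                   # crosstab / chi-square
print(choose_test("nominal", "continuous", categories=2))  # t-test
print(choose_test("continuous", "continuous"))             # correlation
```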
- 51. POP QUIZ
  - If I want to test whether there is a connection between music preferences and gender, I must use a cross-tab. That is correct. Music preferences and gender are both nominal variables. The correct procedure for pairing nominal variables is a crosstab (with a chi-square test).
  - A p value of 0.045 shows that something is statistically significant. That is correct. The usual threshold of statistical significance in educational research is 0.05, and any p value below it is considered significant.
  - I can prove that something is causing something else using a Pearson’s correlation coefficient. No, you cannot. Correlation does not imply causation.
