Introduction to statistics RSS6 2014

Published in: Health & Medicine, Technology

  1. 1. Introduction to Statistics Amr Albanna, MD, MSc
  2. 2. Content • Scales of Measurement – Categorical Variables – Numerical Variables • Displays of Categorical Data – Frequencies – Bar Graph – Pie Chart • Numerical Measures of Central Tendency – Mean – Median – Mode • Numerical Measures of Spread • Association • Correlation • Regression
  3. 3. Scales of Measurement • Categorical Variables: – Nominal: categorical variable with no order (e.g. blood type A, B, AB or O). – Ordinal: categorical variable with an order (e.g. pain: “none”, “mild”, “moderate” or “severe”). • Numerical Variables: – Interval: quantitative data where differences are meaningful but ratios are not (e.g. calendar years 2009 and 2010). – Ratio: quantitative data where ratios are also meaningful (e.g. weight: 200 lbs is twice as heavy as 100 lbs).
  4. 4. Categorical Variables • Displays of Categorical Data – Frequencies – Bar Graph – Pie Chart
  5. 5. Categorical Variables • Frequency table for the variable Sex: Male: frequency 609, proportion 0.61; Female: frequency 391, proportion 0.39; Total: frequency 1000, proportion 1.00. • [Figures: bar graph and pie chart of the Male and Female frequencies]
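The frequency table above can be reproduced with a few lines of Python. This is only a sketch: the list `sex` is rebuilt from the counts on the slide (609 male, 391 female), and the helper name `frequency_table` is illustrative rather than anything from the slides.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (frequency, proportion)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Records rebuilt from the slide's counts; the individual rows are assumed.
sex = ["Male"] * 609 + ["Female"] * 391
table = frequency_table(sex)

for category, (freq, prop) in table.items():
    print(f"{category}: frequency={freq}, proportion={prop:.2f}")
```

The proportions sum to 1, which is a quick sanity check on any frequency table.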
  6. 6. Bar Graph
  7. 7. Numerical Variables • Central Tendency • Numerical Spread
  8. 8. Measures of Central Tendency • The 3 M's – Mean – Median – Mode
  9. 9. Measures of Central Tendency • Sample Mean: the sample mean, $\bar{x}$, is the sum of all values in the sample divided by the total number of observations, $n$, in the sample: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  10. 10. Example: Sample Mean • Mean systolic blood pressure, Scenario 1. Subjects and BP: 1: 120 (x1); 2: 135 (x2); 3: 115 (x3); 4: 110 (x4); 5: 105 (x5); 6: 140 (x6). Mean = (120 + 135 + 115 + 110 + 105 + 140)/6 ≈ 121
  11. 11. Sample Mean • The mean is affected by extreme observations and is not a resistant measure. Scenario 2. Subjects and BP: 1: 120 (x1); 2: 135 (x2); 3: 115 (x3); 4: 110 (x4); 5: 105 (x5); 6: 140 (x6); 7: 280 (x7). Mean = (120 + 135 + 115 + 110 + 105 + 140 + 280)/7 ≈ 144
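To make the effect of the extreme reading concrete, the two scenarios can be recomputed in Python. This is a minimal sketch using the blood pressure values from the slides; the function name `sample_mean` is chosen here for illustration.

```python
def sample_mean(values):
    """Sum of all observations divided by the number of observations n."""
    return sum(values) / len(values)

bp = [120, 135, 115, 110, 105, 140]  # Scenario 1 (values from the slides)
bp_with_outlier = bp + [280]         # Scenario 2 adds one extreme reading

mean_1 = sample_mean(bp)
mean_2 = sample_mean(bp_with_outlier)
print(round(mean_1), round(mean_2))
```

A single added observation shifts the mean by more than 20 units, which is what "not resistant" means in practice.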
  12. 12. Median • The sample median, M, is the number such that “half” the values in the sample are smaller and the other “half” are larger. • Use the following steps to find M. – Sort the data (arrange in increasing order). – Is the size of the data set n even or odd? – If odd: M = value in the exact middle. – If even: M = the average of the two middle numbers.
  13. 13. Example: Sample Median • Median systolic BP: Scenario 1: 120 : 135 : 115 : 110 : 105 : 140; sorted: 105 : 110 : 115 : 120 : 135 : 140; Median = (115 + 120)/2 = 117.5 Scenario 2: 120 : 135 : 115 : 110 : 105 : 140 : 280; sorted: 105 : 110 : 115 : 120 : 135 : 140 : 280; Median = 120 • The median is barely affected by the extreme observation (117.5 vs. 120, while the mean moves from about 121 to about 144) and is a resistant measure.
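The sort-then-pick-the-middle steps for the median can be sketched directly in Python. The function name `sample_median` is illustrative; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

def sample_median(values):
    """Median via the slide's steps: sort, then take the middle value(s)."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the value in the exact middle
        return ordered[mid]
    # even n: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

bp = [120, 135, 115, 110, 105, 140]  # Scenario 1 (values from the slides)
bp_with_outlier = bp + [280]         # Scenario 2 adds one extreme reading

median_1 = sample_median(bp)
median_2 = sample_median(bp_with_outlier)
print(median_1, median_2)
print(median_1 == statistics.median(bp))
```

Sorting first matters: taking the middle of the unsorted list gives a different, incorrect answer.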
  14. 14. Mode • The sample mode is the value that occurs most frequently in the sample (a data set can have more than one mode). • This is the only measure of center which can also be used for categorical data. • The population mode is the highest point on the population distribution.
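As a small illustration of the points on the mode, Python's `statistics.multimode` (available from Python 3.8) returns every most-frequent value, and it works on categorical as well as numerical data. Both samples below are made up for the example and do not come from the slides.

```python
from statistics import multimode

# Made-up categorical sample (blood types) and made-up numerical sample.
blood_types = ["O", "A", "O", "B", "AB", "A", "O"]
readings = [110, 120, 120, 135, 135, 140]

categorical_modes = multimode(blood_types)  # one mode
numerical_modes = multimode(readings)       # two values tie, so two modes
print(categorical_modes, numerical_modes)
```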
  15. 15. Symmetric Data Distribution [Figure: histogram of Frequency against Value]
  16. 16. Rightward Skewness of Data [Figure: histogram of Frequency against Value; from left to right: Mode, Median, Mean]
  17. 17. Leftward Skewness of Data [Figure: histogram of Frequency against Value; from left to right: Mean, Median, Mode]
  18. 18. Numerical Measures of Spread • Range • Sample Variance • Inter Quartile Range (IQR)
  19. 19. Numerical Measures of Spread Range: The range of the data set is the difference between the highest value and the lowest value. – Range = highest value - lowest value – Easy to compute BUT ignores a great deal of information. – Obviously the range is affected by extreme observations and is not a resistant measure.
  20. 20. Numerical Measures of Spread • Variance: equal to the sum of squared deviations from the sample mean divided by n - 1, where n is the number of observations in the sample.
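The definition of the sample variance can be written out step by step. The sketch below applies it to the six blood pressure values used in the earlier mean example; the function name `sample_variance` is illustrative, and `statistics.variance` (which also divides by n - 1) serves as a cross-check.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

bp = [120, 135, 115, 110, 105, 140]  # blood pressure values from the slides

var_manual = sample_variance(bp)
var_library = statistics.variance(bp)
print(round(var_manual, 2), round(var_library, 2))
```

The square root of this quantity is the sample standard deviation, which is expressed in the same units as the data.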
  21. 21. Numerical Measures of Spread • Percentile: The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
  22. 22. Numerical Measures of Spread • The most commonly used percentiles are the quartiles. 1st quartile Q1 = 25th percentile. 2nd quartile Q2 = 50th percentile (the median). 3rd quartile Q3 = 75th percentile.
  23. 23. Numerical Measures of Spread Inter Quartile Range (IQR) A simple measure of spread, giving the range covered by the middle half of the data, is the IQR, defined as IQR = Q3 - Q1. The IQR is a resistant measure of spread.
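The contrast between the range and the IQR can be checked numerically. The sketch below uses `statistics.quantiles` from the Python standard library with its default "exclusive" method; quartile conventions differ between textbooks and software, so the exact Q1 and Q3 values are an assumption of that method rather than something stated in the slides. The helper names `iqr` and `value_range` are illustrative.

```python
import statistics

def iqr(values):
    """Q3 - Q1, with quartiles from statistics.quantiles (default method)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

bp = [120, 135, 115, 110, 105, 140]  # blood pressure values from the slides
bp_with_outlier = bp + [280]         # same data plus one extreme reading

range_change = value_range(bp_with_outlier) - value_range(bp)
iqr_change = iqr(bp_with_outlier) - iqr(bp)
print(range_change, iqr_change)
```

Adding the extreme reading widens the range by the full distance to the outlier, while the IQR moves only slightly, which is the sense in which the IQR is resistant.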
  24. 24. Numerical Measures of Spread Outliers: extreme observations that fall well outside the overall pattern of the distribution. • An outlier may be the result of: – a recording error, – an observation from a different population, – an unusual extreme observation (biological diversity).
  25. 25. Numerical Measures of Spread
  26. 26. Association Between Variables • Explanatory (exposure) variable “X” • Response (outcome) variable “Y”
  27. 27. Association Between Variables
  28. 28. Association Between Variables
  29. 29. Association Between Variables
  30. 30. Measurement of Correlation
  31. 31. Correlation is NOT Association
  32. 32. Regression
