Introduction to statistics RSS6 2014
Published in: Health & Medicine, Technology

Transcript

  • 1. Introduction to Statistics Amr Albanna, MD, MSc
  • 2. Content • Scales of Measurement – Categorical Variables – Numerical Variables • Displays of Categorical Data – Frequencies – Bar Graph – Pie Chart • Numerical Measures of Central Tendency – Mean – Median – Mode • Numerical Measures of Spread • Association • Correlation • Regression
  • 3. Scales of Measurement • Categorical Variables: – Nominal: Categorical variable with no order (e.g. blood type A, B, AB, or O). – Ordinal: Categorical variable with an order (e.g. pain: “none”, “mild”, “moderate”, or “severe”). • Numerical Variables: – Interval: Quantitative data where differences are meaningful but ratios are not (e.g. calendar years such as 2009 and 2010). – Ratio: Quantitative data where ratios are also meaningful (e.g. weight: 200 lbs is twice as heavy as 100 lbs).
  • 4. Categorical Variables • Displays of Categorical Data – Frequencies – Bar Graph – Pie Chart
  • 5. Categorical Variables
      Variable (Sex)   Frequency   Proportion
      Male                   609         0.61
      Female                 391         0.39
      Total                 1000         1.00
    (Figures: bar graph and pie chart of frequency by sex.)
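The frequency table on this slide can be reproduced with a short Python sketch. The raw list `sex` is a hypothetical reconstruction built from the slide's counts (609 male, 391 female); the slides do not give the underlying records.

```python
from collections import Counter

# Hypothetical raw data rebuilt from the slide's counts (not the original records).
sex = ["Male"] * 609 + ["Female"] * 391

# Frequency: how many observations fall in each category.
counts = Counter(sex)
total = sum(counts.values())

# Proportion: frequency divided by the total number of observations.
proportions = {category: count / total for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category:<8}{count:>6}{proportions[category]:>8.2f}")
```

The proportions always sum to 1, which is a quick check on any frequency table.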
  • 6. Bar Graph
  • 7. Numerical Variables Central Tendency Numerical Spread
  • 8. Measures of Central Tendency • The 3 M's – Mean – Median – Mode
  • 9. Measures of Central Tendency Sample Mean The sample mean, $\bar{x}$, is the sum of all values in the sample divided by the total number of observations, $n$, in the sample: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  • 10. Example: Sample Mean • Mean systolic blood pressure Scenario 1: Mean = (120 + 135 + 115 + 110 + 105 + 140)/6 ≈ 121
      Subject   BP
      1         120 (x1)
      2         135 (x2)
      3         115 (x3)
      4         110 (x4)
      5         105 (x5)
      6         140 (x6)
  • 11. Sample Mean • The mean is affected by extreme observations and is not a resistant measure. Scenario 2: Mean = (120 + 135 + 115 + 110 + 105 + 140 + 280)/7 ≈ 144
      Subject   BP
      1         120 (x1)
      2         135 (x2)
      3         115 (x3)
      4         110 (x4)
      5         105 (x5)
      6         140 (x6)
      7         280 (x7)
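As a minimal sketch, the two blood pressure scenarios can be checked with Python's standard `statistics` module. The variable names are illustrative and not from the slides.

```python
from statistics import mean

# Systolic blood pressure readings from the two scenarios on the slides.
bp_scenario_1 = [120, 135, 115, 110, 105, 140]
bp_scenario_2 = bp_scenario_1 + [280]  # same data plus one extreme reading

mean_1 = mean(bp_scenario_1)  # 725 / 6
mean_2 = mean(bp_scenario_2)  # 1005 / 7

print(round(mean_1), round(mean_2))
```

A single extreme reading of 280 moves the rounded mean from 121 to 144, which is what "not a resistant measure" means in practice.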
  • 12. Median • The sample median, M, is the number such that “half” the values in the sample are smaller and the other “half” are larger. • Use the following steps to find M. – Sort the data (arrange in increasing order). – Is the size of the data set n even or odd? – If odd: M = value in the exact middle. – If even: M = the average of the two middle numbers.
  • 13. Example: Sample Median • Median systolic BP: Scenario 1 (sorted): 105 : 110 : 115 : 120 : 135 : 140 Median = (115 + 120)/2 = 117.5 Scenario 2 (sorted): 105 : 110 : 115 : 120 : 135 : 140 : 280 Median = 120 • The median is barely affected by extreme observations and is a resistant measure.
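The sort-then-pick-the-middle steps can be sketched in Python as follows. The helper name `sample_median` is illustrative; the result is compared with the standard library's `statistics.median`. Note that the readings must be sorted before the middle values are taken, giving 105, 110, 115, 120, 135, 140 for Scenario 1.

```python
from statistics import median

def sample_median(values):
    """Median via the steps on the slide: sort, then take the middle."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the value in the exact middle
        return ordered[mid]
    # even n: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

bp_scenario_1 = [120, 135, 115, 110, 105, 140]
bp_scenario_2 = bp_scenario_1 + [280]

median_1 = sample_median(bp_scenario_1)
median_2 = sample_median(bp_scenario_2)
print(median_1, median_2)
```

Adding the extreme reading of 280 shifts the median only from 117.5 to 120, while the mean of the same data rises by more than 20.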
  • 14. Mode • The sample mode is the value that occurs most frequently in the sample (a data set can have more than one mode). • This is the only measure of center which can also be used for categorical data. • The population mode is the highest point on the population distribution.
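A short sketch of the mode using `statistics.multimode` (Python 3.8 or later), which returns every most-frequent value. Both data sets below are hypothetical and chosen only to show a categorical variable and a sample with two modes.

```python
from statistics import multimode

# Hypothetical categorical sample: blood types of seven subjects.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]

# Hypothetical numerical sample in which two values tie for most frequent.
pain_scores = [1, 2, 2, 3, 3]

blood_type_modes = multimode(blood_types)
pain_score_modes = multimode(pain_scores)
print(blood_type_modes, pain_score_modes)
```

The first call works even though blood type has no numerical value, which is why the mode is the only measure of center available for categorical data.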
  • 15. Symmetric Data Distribution (Figure: histogram of Frequency against Value.)
  • 16. Rightward Skewness of Data (Figure: histogram of Frequency against Value, with the Mode, Median, and Mean marked from left to right.)
  • 17. Leftward Skewness of Data (Figure: histogram of Frequency against Value, with the Mean, Median, and Mode marked from left to right.)
  • 18. Numerical Measures of Spread • Range • Sample Variance • Inter Quartile Range (IQR)
  • 19. Numerical Measures of Spread Range: The range of the data set is the difference between the highest value and the lowest value. – Range = highest value - lowest value – Easy to compute BUT ignores a great deal of information. – Because it depends only on the two most extreme values, the range is affected by extreme observations and is not a resistant measure.
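A minimal sketch of the range, reusing the systolic blood pressure readings from the earlier mean example. The helper name `sample_range` is illustrative.

```python
def sample_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

bp_scenario_1 = [120, 135, 115, 110, 105, 140]
bp_scenario_2 = bp_scenario_1 + [280]  # one extreme reading added

range_1 = sample_range(bp_scenario_1)
range_2 = sample_range(bp_scenario_2)
print(range_1, range_2)
```

One extreme observation raises the range from 35 to 175, so the range is not resistant.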
  • 20. Numerical Measures of Spread • Variance: the sample variance, $s^2$, is the sum of squared deviations from the sample mean divided by n - 1, where n is the number of observations in the sample: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  • 21. Numerical Measures of Spread • Percentile: The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
  • 22. Numerical Measures of Spread • The most commonly used percentiles are the quartiles. 1st quartile Q1 = 25th percentile. 2nd quartile Q2 = 50th percentile (the median). 3rd quartile Q3 = 75th percentile.
  • 23. Numerical Measures of Spread Inter Quartile Range (IQR) A simple measure of spread, giving the range covered by the middle half of the data, is the IQR defined below. IQR = Q3 - Q1 The IQR is a resistant measure of spread.
  • 24. Numerical Measures of Spread Outliers: extreme observations that fall well outside the overall pattern of the distribution. • An outlier may be the result of – A recording error, – An observation from a different population, – An unusual extreme observation (biological diversity).
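The variance definition can be sketched directly in Python and checked against `statistics.variance`, which also divides by n - 1. The data are the Scenario 1 blood pressure readings from the earlier slides; the helper name `sample_variance` is illustrative.

```python
from statistics import mean, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = mean(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)

bp_scenario_1 = [120, 135, 115, 110, 105, 140]

manual = sample_variance(bp_scenario_1)
library = variance(bp_scenario_1)  # standard library, same n - 1 divisor
print(round(manual, 2), round(library, 2))
```

The square root of this value, the standard deviation, is in the same units as the data (here mmHg) and is often easier to interpret.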
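A minimal sketch of the quartiles and IQR using `statistics.quantiles` (Python 3.8 or later) on the blood pressure readings from the earlier slides. Several conventions exist for interpolating quartiles in small samples; this uses the library's default ("exclusive") method, which is an assumption and may differ slightly from the convention the slides intend.

```python
from statistics import quantiles

bp_scenario_1 = [120, 135, 115, 110, 105, 140]
bp_scenario_2 = bp_scenario_1 + [280]  # one extreme reading added

def iqr(values):
    """IQR = Q3 - Q1, with quartiles from the library's default method."""
    q1, _q2, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return q3 - q1

iqr_1 = iqr(bp_scenario_1)
iqr_2 = iqr(bp_scenario_2)
print(iqr_1, iqr_2)
```

With this method the IQR moves only from 27.5 to 30 when the reading of 280 is added, whereas the range of the same data jumps from 35 to 175.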
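The slides do not state a numerical rule for flagging outliers. One common convention, assumed here purely for illustration, flags any value more than 1.5 × IQR below Q1 or above Q3. The sketch applies it to the Scenario 2 blood pressure readings, with quartiles from the default method of `statistics.quantiles`.

```python
from statistics import quantiles

bp_scenario_2 = [120, 135, 115, 110, 105, 140, 280]

# Quartiles from the library's default ("exclusive") method.
q1, _q2, q3 = quantiles(bp_scenario_2, n=4)
iqr = q3 - q1

# Assumed convention (not from the slides): fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in bp_scenario_2 if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

A flagged value is only a candidate: whether it is a recording error, an observation from a different population, or genuine biological variation has to be judged from the context.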
  • 25. Numerical Measures of Spread
  • 26. Association Between Variables • Explanatory (exposure) variable “X” • Response (outcome) variable “Y”
  • 27. Association Between Variables
  • 28. Association Between Variables
  • 29. Association Between Variables
  • 30. Measurement of Correlation
  • 31. Correlation is NOT Association
  • 32. Regression