# Introduction to statistics RSS6 2014

Published in: Health & Medicine, Technology


### Transcript

• 1. Introduction to Statistics Amr Albanna, MD, MSc
• 2. Content • Scales of Measurement – Categorical Variables – Numerical Variables • Displays of Categorical Data – Frequencies – Bar Graph – Pie Chart • Numerical Measures of Central Tendency – Mean – Median – Mode • Numerical Measures of Spread • Association • Correlation • Regression
• 3. Scales of Measurement • Categorical Variables: – Nominal: categorical variable with no order (e.g. blood type: A, B, AB, or O). – Ordinal: categorical variable with an order (e.g. pain: "none", "mild", "moderate", or "severe"). • Numerical Variables: – Interval: quantitative data where differences are meaningful but ratios are not (e.g. calendar years: the difference between 2009 and 2010 is meaningful, but a ratio of the two is not). – Ratio: quantitative data where ratios are also meaningful (e.g. weight: 200 lbs is twice as heavy as 100 lbs).
• 4. Categorical Variables • Displays of Categorical Data – Frequencies – Bar Graph – Pie Chart
• 5. Categorical Variables

| Variable (Sex) | Frequency | Proportion |
| --- | --- | --- |
| Male | 609 | 0.61 |
| Female | 391 | 0.39 |
| Total | 1000 | 1.00 |

[Figures: bar graph and pie chart of frequency by sex (Male, Female)]
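As a minimal sketch of how the proportion column follows from the frequencies, the snippet below divides each count by the total. The counts come from the slide; the variable names are illustrative only.

```python
# Counts taken from the slide's frequency table.
counts = {"Male": 609, "Female": 391}

total = sum(counts.values())

# Proportion = category frequency / total number of observations.
proportions = {category: count / total for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category:<7}{count:>6}{proportions[category]:>7.2f}")
print(f"{'Total':<7}{total:>6}{sum(proportions.values()):>7.2f}")
```

The proportions of all categories always sum to 1, which is a quick check on a hand-built frequency table.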
• 6. Bar Graph
• 7. Numerical Variables Central Tendency Numerical Spread
• 8. Measures of Central Tendency • The 3 M's – Mean – Median – Mode
• 9. Measures of Central Tendency: Sample Mean. The sample mean, $\bar{x}$, is the sum of all values in the sample divided by the total number of observations, $n$, in the sample.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• 10. Example: Sample Mean • Mean systolic blood pressure, Scenario 1: Mean = (120 + 135 + 115 + 110 + 105 + 140)/6 = 725/6 ≈ 121

| Subject | BP |
| --- | --- |
| 1 | 120 ($x_1$) |
| 2 | 135 ($x_2$) |
| 3 | 115 ($x_3$) |
| 4 | 110 ($x_4$) |
| 5 | 105 ($x_5$) |
| 6 | 140 ($x_6$) |

• 11. Sample Mean • The mean is affected by extreme observations and is not a resistant measure. Scenario 2: Mean = (120 + 135 + 115 + 110 + 105 + 140 + 280)/7 = 1005/7 ≈ 144

| Subject | BP |
| --- | --- |
| 1 | 120 ($x_1$) |
| 2 | 135 ($x_2$) |
| 3 | 115 ($x_3$) |
| 4 | 110 ($x_4$) |
| 5 | 105 ($x_5$) |
| 6 | 140 ($x_6$) |
| 7 | 280 ($x_7$) |
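The two scenarios can be reproduced with a short sketch. The blood pressure readings are the ones on the slides; the function name `sample_mean` is an illustrative choice, not something from the deck.

```python
def sample_mean(values):
    """Sum of all values divided by the number of observations."""
    return sum(values) / len(values)

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]  # same subjects plus one extreme reading

mean_1 = sample_mean(scenario_1)
mean_2 = sample_mean(scenario_2)

# A single extreme value pulls the mean up by more than 20 units.
print(round(mean_1), round(mean_2))
```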
• 12. Median • The sample median, M, is the number such that "half" the values in the sample are smaller and the other "half" are larger. • Use the following steps to find M. – Sort the data (arrange in increasing order). – Is the size of the data set n even or odd? – If odd: M = value in the exact middle. – If even: M = the average of the two middle numbers.
• 13. Example: Sample Median • Median systolic BP: Scenario 1: 120, 135, 115, 110, 105, 140; sorted: 105, 110, 115, 120, 135, 140; Median = (115 + 120)/2 = 117.5. Scenario 2: 120, 135, 115, 110, 105, 140, 280; sorted: 105, 110, 115, 120, 135, 140, 280; Median = 120. • The median changes very little when an extreme observation is added (117.5 versus 120 here, while the mean moves from about 121 to about 144), so it is a resistant measure.
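The steps from slide 12 can be sketched as a small function and applied to the same blood pressure readings. The function name `sample_median` is illustrative; the key point is that the data are sorted before the middle value is picked.

```python
def sample_median(values):
    ordered = sorted(values)      # step 1: arrange in increasing order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd n: the value in the exact middle
        return ordered[middle]
    # even n: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]

median_1 = sample_median(scenario_1)
median_2 = sample_median(scenario_2)
print(median_1, median_2)
```

Skipping the sort and taking the middle of the readings in their recorded order is a common slip and gives a different, incorrect answer.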
• 14. Mode • The sample mode is the value that occurs most frequently in the sample (a data set can have more than one mode). • This is the only measure of center which can also be used for categorical data. • The population mode is the highest point on the population distribution.
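A rough sketch of finding the mode, including the case of more than one mode. The two data sets below are hypothetical and chosen only to illustrate a categorical variable and a numerical variable with a tie.

```python
from collections import Counter

def sample_modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

# Hypothetical categorical data: the mode works where mean and median do not.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]

# Hypothetical numerical data with two modes.
readings = [120, 135, 120, 110, 135, 140]

blood_type_modes = sample_modes(blood_types)
reading_modes = sample_modes(readings)
print(blood_type_modes, reading_modes)
```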
• 15. Symmetric Data Distribution [Figure: histogram of Frequency against Value]
• 16. Rightward Skewness of Data [Figure: histogram of Frequency against Value, with Mode, Median, and Mean marked from left to right]
• 17. Leftward Skewness of Data [Figure: histogram of Frequency against Value, with Mean, Median, and Mode marked from left to right]
• 18. Numerical Measures of Spread • Range • Sample Variance • Inter Quartile Range (IQR)
• 19. Numerical Measures of Spread Range: The range of the data set is the difference between the highest value and the lowest value. – Range = highest value - lowest value – Easy to compute BUT ignores a great deal of information, since it uses only two observations. – The range is determined entirely by the extreme observations, so it is not a resistant measure.
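A short sketch of the range, reusing the blood pressure readings from the sample mean example; the function name `sample_range` is illustrative. It shows how a single extreme reading changes the result.

```python
def sample_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]  # one extreme reading added

range_1 = sample_range(scenario_1)
range_2 = sample_range(scenario_2)
print(range_1, range_2)
```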
• 20. Numerical Measures of Spread • Sample variance: the sum of squared deviations from the sample mean divided by n - 1, where n is the number of observations in the sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.
• 21. Numerical Measures of Spread • Percentile: The pth percentile of a distribution is the value at or below which p percent of the observations fall.
• 22. Numerical Measures of Spread • The most commonly used percentiles are the quartiles. 1st quartile Q1 = 25th percentile. 2nd quartile Q2 = 50th percentile (the median). 3rd quartile Q3 = 75th percentile.
• 23. Numerical Measures of Spread Inter Quartile Range (IQR): a simple measure of spread that gives the range covered by the middle half of the data. IQR = Q3 - Q1. The IQR is a resistant measure of spread.
• 24. Numerical Measures of Spread Outliers: extreme observations that fall well outside the overall pattern of the distribution. • An outlier may be the result of: – a recording error, – an observation from a different population, – an unusual extreme observation (biological diversity).
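The definition can be sketched directly in code and checked against Python's standard library, whose `statistics.variance` also divides by n - 1. The readings are the Scenario 1 blood pressures from the sample mean example; the function name `sample_variance` is illustrative.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

scenario_1 = [120, 135, 115, 110, 105, 140]

variance = sample_variance(scenario_1)
print(round(variance, 2))

# The standard library uses the same n - 1 divisor.
print(round(statistics.variance(scenario_1), 2))
```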
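A minimal sketch of the IQR using `statistics.quantiles` from Python's standard library, applied to the blood pressure readings from the sample mean example. One assumption to note: textbooks and software use several slightly different conventions for computing quartiles, and this sketch uses Python's default ("exclusive") method, so hand calculations under another convention may differ a little.

```python
import statistics

def interquartile_range(values):
    """Return Q3 - Q1, with quartiles from statistics.quantiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    # Assumption: Python's default "exclusive" method is acceptable here.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]  # one extreme reading added

iqr_1 = interquartile_range(scenario_1)
iqr_2 = interquartile_range(scenario_2)
print(iqr_1, iqr_2)
```

Adding the extreme reading of 280 changes the IQR only slightly, whereas the range of the same data grows from 35 to 175.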
• 25. Numerical Measures of Spread
• 26. Association Between Variables • Explanatory (exposure) variable “X” • Response (outcome) variable “Y”
• 27. Association Between Variables
• 28. Association Between Variables
• 29. Association Between Variables
• 30. Measurement of Correlation
• 31. Correlation is NOT Association
• 32. Regression