
# MD Paediatrics (Part 1) - Overview of Basic Statistics

A brief overview of basic statistics that might be useful for the MD (Paediatrics) Part 1 examination.
Please note that some images and slides were taken from the internet to give readers a clear picture.

Published in: Health & Medicine, Technology

### MD Paediatrics (Part 1) - Overview of Basic Statistics

1. Basic Statistics Overview for MD Paediatrics (Part 01), 2014
2. Contents
    - Statistics: introduction
    - Variables and their representations (tables, graphs)
    - Measures of central tendency and dispersion
    - Normal distribution
    - Tests of significance
    - Sampling
    - Hypothesis testing
        - Null hypothesis
        - Alternative hypothesis
        - Type 1 and Type 2 errors
    - Study designs
    - Epidemiology
3. © bdwjayamanne@gmail.com/djayamanne@yahoo.com 01. Statistics
    - What is statistics? The science of collecting data, analysing it, and drawing inferences or conclusions from it.
        - Collection
        - Analysis
        - Making inferences
    - Note: the word "statistic" (singular) has a different meaning.
4. 02. Variables and Constants
    - Variable: a quantity that varies from one unit to another, so one value is not sufficient to describe it. E.g. height, weight, blood pressure, crop yield.
        - Discrete: a fixed number of possible values (e.g. blood group)
        - Continuous: an infinite number of possible values, even within a finite interval (e.g. blood pressure)
5. Constant
    - Constant: the opposite of a variable. A quantity that does not vary from one unit to another, so one value is sufficient. E.g. the density of an element.
6. Levels of measurement in statistics
    1. Nominal scale
        - Only indicates category
        - E.g. religion: Buddhism, Christianity, Hinduism
    2. Ordinal scale
        - In addition to the category, allows cases to be ordered by degree according to the measurement
        - E.g. very poor, poor, OK, good, excellent
    3. Interval scale
        - Units represent intervals of equal distance between values (a linear scale)
        - No true zero
        - E.g. temperature in Celsius, date, latitude
    4. Ratio scale
        - Has a true zero, so ratios between values are meaningful
        - Has equal intervals like the interval scale (e.g. height, weight)
7. Graphs
    - Four major types of graphs
        - Bar (Pareto diagram)
        - Pictorial (pictogram)
        - Pie (circle)
        - Line
    - Other types
        - Histogram (a special bar graph)
        - Stem-and-leaf plot
        - Dot plot (stem-and-leaf + histogram)
        - Box-and-whisker plot
        - Scatter plot
8. Graphs: http://www.sophia.org/tutorials/types-of-graphs
9. Graphs: pictogram
10. Box-and-whisker plot
    - The box spans Q1 to Q3, with the median marked inside it.
    - The whiskers can represent any one of the following:
        1. the minimum and maximum of all of the data
        2. the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile (Tukey box plot)
        3. one standard deviation above and below the mean of the data
        4. the 9th percentile and the 91st percentile
        5. the 2nd percentile and the 98th percentile
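As a rough illustration of the Tukey convention (option 2 above), the sketch below computes the box and whisker positions for a small data set. The data values are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`; other software may use a slightly different quartile convention and give slightly different fences.

```python
from statistics import quantiles

# Hypothetical data set with one unusually large value (not from the slides)
data = sorted([2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30])

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Tukey fences: 1.5 IQR beyond each quartile
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)

# Points outside the fences are drawn individually as outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Box:", q1, q2, q3)
print("Whiskers:", lower_whisker, upper_whisker)
print("Outliers:", outliers)
```

Here the value 30 falls beyond the upper fence, so the upper whisker stops at 12 and 30 is plotted as a separate point.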
11. Scatter plot
    - Plots two variables: the dependent variable (y) against the independent variable (x)
12. 04. Measures of Central Tendency & Dispersion
    - Measures of centre (central tendency): 1. Mean 2. Median 3. Mode
    - Measures of dispersion: 1. Range 2. Variance 3. Standard deviation 4. Absolute deviance 5. Quantiles
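13. Mean
    - i) Arithmetic mean: simply the average, the sum of the values divided by their number.
    - ii) Harmonic mean (H): suitable for averaging the speeds of moving objects. E.g. average speed: a distance x is covered at 60 km/h and a further distance x at 30 km/h. The total distance is 2x and speed = distance / time, so the average speed is $2x / (x/60 + x/30) = 40$ km/h. This is the harmonic mean of 60 and 30, not their arithmetic mean (45 km/h).
    - iii) Geometric mean (G): e.g. used to calculate the average of rates.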
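The three means can be compared with Python's standard `statistics` module. This is only a sketch: the speeds come from the slide, while the distance `x` and the growth factors used for the geometric mean are hypothetical values chosen for illustration.

```python
from statistics import mean, harmonic_mean, geometric_mean

speeds = [60, 30]  # km/h, each driven over the same distance x (from the slide)

arithmetic = mean(speeds)          # overstates the true average speed
harmonic = harmonic_mean(speeds)   # matches total distance / total time

# Check from first principles with an assumed distance x = 60 km per leg
x = 60
total_time = x / 60 + x / 30       # hours spent on the two legs
average_speed = 2 * x / total_time

# Hypothetical growth factors for two periods (a doubling, then an 8-fold rise)
growth_factors = [2, 8]
geometric = geometric_mean(growth_factors)

print("Arithmetic mean:", arithmetic)
print("Harmonic mean:", round(harmonic, 6))
print("Distance / time:", average_speed)
print("Geometric mean:", round(geometric, 6))
```

The harmonic mean agrees with the distance-over-time calculation, which is why it is preferred for averaging speeds over equal distances.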
14. Median
    - The value of the middle observation after arranging the data set in order (ascending or descending).
    - E.g. 1) for 3, 8, 5 the median is 5; 2) with 25 observations, the median is the value of the 13th observation.
    - Location of the median = $(n+1)/2$. E.g. for n = 24 the location is $(24+1)/2 = 12.5$, so the median is the average of the 12th and 13th observations.
    - Advantage: not sensitive to outliers (observations that deviate highly from the rest).
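The location rule can be checked with a short sketch. The helper `median_position` and the 24-value data set are hypothetical, introduced only to illustrate the $(n+1)/2$ rule.

```python
from statistics import median

def median_position(n):
    """Return the 1-based location of the median among n ordered observations."""
    return (n + 1) / 2

# Example from the slide: the data are sorted internally before the middle is taken
small = [3, 8, 5]
small_median = median(small)

# Odd n: the position is a whole number
pos_25 = median_position(25)

# Even n: the position falls halfway between two observations
pos_24 = median_position(24)

# Hypothetical ordered data set of 24 values: 10, 20, ..., 240
values = [10 * i for i in range(1, 25)]
# Average of the 12th and 13th observations (indices 11 and 12)
by_hand = (values[11] + values[12]) / 2

print(small_median, pos_25, pos_24, by_hand, median(values))
```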
15. Mode
    - The most frequent value of a data set.
    - E.g. 2, 5, 11, 5, 8, 8 has two modes, 5 and 8 (each occurs twice); 2, 5, 11, 5, 8, 8, 5 has a single mode, 5 (it occurs three times).
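The two data sets on this slide can be checked by counting frequencies. The sketch below uses `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest count.

```python
from collections import Counter
from statistics import multimode

first = [2, 5, 11, 5, 8, 8]
second = [2, 5, 11, 5, 8, 8, 5]

# Frequency tables make the ties visible
print(Counter(first))
print(Counter(second))

# multimode lists all most-frequent values, in order of first appearance
modes_first = multimode(first)
modes_second = multimode(second)

print(modes_first)
print(modes_second)
```

In the first data set 5 and 8 both occur twice, so it is bimodal; adding one more 5 makes 5 the only mode.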
16. Measures of Dispersion: 1. Range 2. Variance 3. Standard deviation 4. Absolute deviance 5. Quantiles
17. Range, absolute deviance, variance and standard deviation
    - 1. Range: the difference between the maximum and the minimum.
        - Always a positive value (or zero)
        - Gives no idea of what the minimum and maximum themselves are
    - 2. Absolute deviance
    - 3. Variance: the average of the squared deviations of the observations from the mean.
        - Population variance (as an estimator of variability): $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
        - No variability = 0; small variability = small value; large variability = large value
        - Units are the square of the original unit
        - Limitation: unit (scale) dependent
    - 4. Standard deviation: SD = $\sqrt{\text{Variance}}$
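A minimal sketch of these measures on a small, hypothetical data set follows. It uses the population formulas (`pvariance`, `pstdev`) and also shows the sample variance, which divides by n - 1 instead of n; the mean absolute deviation is computed by hand since it is taken here as the meaning of "absolute deviance".

```python
from statistics import mean, pvariance, pstdev, variance

# Hypothetical observations (not from the slides)
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

mu = mean(data)
# Mean absolute deviation: average distance from the mean, ignoring sign
mean_abs_dev = sum(abs(x - mu) for x in data) / len(data)

pop_variance = pvariance(data)   # average squared deviation (divide by N)
pop_sd = pstdev(data)            # square root of the population variance
sample_variance = variance(data) # divide by N - 1 instead

print("Range:", data_range)
print("Mean absolute deviation:", mean_abs_dev)
print("Population variance:", pop_variance)
print("Population SD:", pop_sd)
print("Sample variance:", round(sample_variance, 3))
```

Note that the variance is in squared units while the standard deviation is back in the original units of the data.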
18. Quantiles
    - Dividing ordered data into q essentially equal-sized subsets is the motivation for q-quantiles. E.g.
        - Median: the 2nd quartile is called the median
        - Quartiles (Q): 4-quantiles
        - Deciles (D): 10-quantiles
        - Percentiles (P): 100-quantiles
    - Quartiles (Q): Q1 = 1st quartile, Q2 = 2nd quartile (the median), Q3 = 3rd quartile
        - Q1 is the value that divides the data set with 25% to the left and 75% to the right.
        - Q2 is the value that divides the data set with 50% on each side.
    - Interquartile range: IQR = Q3 - Q1
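The relationship between quartiles, deciles and percentiles can be sketched with `statistics.quantiles`, which returns the q - 1 cut points of a q-quantile split. The data set (the integers 1 to 99) is hypothetical and chosen so that the cut points are easy to read; the function's default interpolation method is assumed, and other conventions exist.

```python
from statistics import quantiles, median

# Hypothetical ordered data: 1, 2, ..., 99
data = list(range(1, 100))

quartiles = quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = quantiles(data, n=10)       # 9 cut points: D1 ... D9
percentiles = quantiles(data, n=100)  # 99 cut points: P1 ... P99

q1, q2, q3 = quartiles
iqr = q3 - q1

print("Quartiles:", quartiles)
print("IQR:", iqr)
# The 2nd quartile, 5th decile and 50th percentile are all the median
print(q2, deciles[4], percentiles[49], median(data))
# The 1st quartile is the 25th percentile
print(q1, percentiles[24])
```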
19. 05. Normal Distribution
    - The most important distributions in statistics: 1. Normal distribution 2. Poisson distribution 3. Binomial distribution
    - Normal distribution(s): A. Normal family of distributions B. Standard normal distribution
    - Estimates: mean, variance (standard deviation), skewness, kurtosis
20. Normal Family of Distributions
    - Different means, constant SD
    - Different SDs, constant mean
21. Mean and SD both different
22. Normal Family of Distributions
    - Mean = Median = Mode
    - About 68% of data lie between -1 SD and +1 SD
    - About 95% of data lie between -2 SD and +2 SD
    - About 99.7% of data lie between -3 SD and +3 SD
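These percentages can be reproduced from the cumulative distribution function of a normal distribution. The sketch below uses `statistics.NormalDist`; the mean and SD passed in are arbitrary hypothetical values, since the rule holds for every member of the normal family.

```python
from statistics import NormalDist

# Hypothetical normal distribution; any mean and SD give the same coverage
dist = NormalDist(mu=100, sigma=15)

coverage = {}
for k in (1, 2, 3):
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # Probability of falling within k SDs of the mean
    coverage[k] = dist.cdf(upper) - dist.cdf(lower)
    print(f"Within {k} SD: {coverage[k]:.4f}")
```

The computed values are approximately 0.68, 0.95 and 0.997, which is where the rounded figures on the slide come from.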
23. Skewness
    - Simply, skewness = (Mean - Mode) / SD
    - Positively (right) skewed: the right tail is longer; typically Mode < Median < Mean, so more than 50% of the data lie below the mean.
    - Negatively (left) skewed: the left tail is longer; typically Mean < Median < Mode, so more than 50% of the data lie above the mean.
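The simple skewness formula above can be tried on a small right-skewed data set. The values are hypothetical, and the population SD is assumed as the denominator; this is a sketch of the mode-based coefficient only, not of the moment-based skewness reported by most statistical packages.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical right-skewed data: most values are small, one is large
data = [1, 2, 2, 2, 3, 4, 5, 9]

m = mean(data)
med = median(data)
mo = mode(data)
sd = pstdev(data)

# Mode-based skewness coefficient from the slide
skewness = (m - mo) / sd

# Share of observations lying below the mean
share_below_mean = sum(x < m for x in data) / len(data)

print("Mode, median, mean:", mo, med, m)
print("Skewness:", round(skewness, 3))
print("Share below mean:", share_below_mean)
```

The positive coefficient, the ordering mode < median < mean, and the fact that most observations fall below the mean all point to a right-skewed distribution.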
24. Multimodal Distributions
    - Bimodal distributions: 2 modes
25. Standard Normal Distribution (Z)
26. Standard Normal Distribution (Z)
27. Standard Normal Distribution
    - Mean = 0
    - Variance = SD = 1
    - Skewness = 0
    - Kurtosis = 0 (this is the excess kurtosis; the kurtosis itself is 3)
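Any normal variable can be converted to the standard normal scale with the usual transformation z = (x - mean) / SD. The sketch below applies it to a hypothetical data set and checks that the resulting z-scores have mean 0 and SD 1; the helper name `z_scores` is an illustrative assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical observations with mean 5 and SD 2
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)

print("z-scores:", z)
print("Mean of z:", mean(z))
print("SD of z:", pstdev(z))
```

A z-score states how many standard deviations an observation lies above (positive) or below (negative) the mean.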
28. Parametric vs Nonparametric
    - General rule
        - For nominal or ordinal measurement scales, nonparametric tests are usually used
        - For interval and ratio-scale variables, parametric tests are used
    - Tests of normality
        - Shapiro-Wilk test
        - Kolmogorov-Smirnov test
        - Anderson-Darling test
        - Chi-square test
29. 05. Parametric Tests vs Nonparametric Tests

    | Parametric | Nonparametric |
    |---|---|
    | Mean | Median |
    | SD | IQR / range |
    | Pearson correlation | Spearman correlation / Kendall's tau |
    | One-sample t test | Sign test |
    | Independent-samples t test | Mann-Whitney U test / rank-sum test |
    | Related-samples t test | Wilcoxon signed-rank test |
    | One-way ANOVA | Kruskal-Wallis test |
    | Two-way ANOVA | Friedman test / Quade test |
30. 06. Sampling
    - a. Probability sampling: (1) every element has a known nonzero probability of being sampled, and (2) it involves random selection at some point.
        - i. Simple random sampling: (a) with replacement, (b) without replacement
        - ii. Systematic sampling
        - iii. Stratified sampling
        - iv. Probability-proportional-to-size sampling
        - v. Cluster or multistage sampling
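Three of the probability sampling schemes listed above can be sketched with Python's `random` module. Everything here is hypothetical: the sampling frame of 100 ID numbers, the "urban"/"rural" strata, the 10% sampling fraction, and the fixed seed, which is used only to make the run reproducible.

```python
import random

rng = random.Random(42)           # fixed seed, for reproducibility only
population = list(range(1, 101))  # hypothetical sampling frame of 100 IDs
sample_size = 10

# Simple random sampling
without_replacement = rng.sample(population, k=sample_size)  # no ID repeats
with_replacement = rng.choices(population, k=sample_size)    # IDs may repeat

# Systematic sampling: random start, then every k-th unit
interval = len(population) // sample_size
start = rng.randrange(interval)
systematic = population[start::interval]

# Stratified sampling: draw the same fraction (10%) within each stratum
strata = {"urban": population[:60], "rural": population[60:]}
stratified = {
    name: rng.sample(units, k=len(units) // 10)
    for name, units in strata.items()
}

print("SRS without replacement:", without_replacement)
print("SRS with replacement:", with_replacement)
print("Systematic:", systematic)
print("Stratified:", stratified)
```

In each scheme every unit has a known, nonzero chance of selection, which is what distinguishes probability sampling from the non-probability methods.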
31. b. Non-probability sampling
    - Convenience, haphazard or accidental sampling: the sample is composed of whatever persons can be most easily accessed to fill out the survey.
    - Quota sampling (ad hoc quotas): the sample is designed to include a designated number of people with certain specified characteristics, for example 100 coffee drinkers. This type of sampling is common in non-probability market research surveys.
    - Purposive or judgmental sampling: a researcher decides which population members to include in the sample based on his or her judgement. The researcher may provide some alternative justification for the representativeness of the sample.
    - Snowball sampling (respondent-driven sampling), e.g. social networks: often used when the target population is rare; members of the target population recruit other members of the population for the survey.
    - Deviant case sampling (a special case of purposive sampling)
    - Case study sampling
32. 07. Hypothesis Testing
    - Goal: make statement(s) regarding unknown population parameter values based on sample data.
    - Elements of a hypothesis test:
        - Null hypothesis: a statement regarding the value(s) of unknown parameter(s). In our applications it will typically imply no association between explanatory and response variables (it will always contain an equality).
        - Alternative hypothesis: a statement contradictory to the null hypothesis (it will always contain an inequality).
33. What to test
    - The effect or difference we are interested in:
        - Difference in means or proportions
        - Odds ratio (OR)
        - Relative risk (RR)
        - Correlation coefficient
    - Clinically important difference: the smallest difference considered biologically or clinically relevant
34. Null Hypothesis
    - Usually that there is no effect
    - Difference in means = 0
    - OR = 1
    - RR = 1
    - Correlation coefficient = 0
35. Alternative Hypothesis
    - Contradicts the null
    - There is an effect
    - What you want to prove
36. Type 1 & 2 errors

    | | Null hypothesis (H0) is true | Null hypothesis (H0) is false |
    |---|---|---|
    | Reject null hypothesis | Type I error (false positive) | Correct outcome (true positive) |
    | Fail to reject null hypothesis | Correct outcome (true negative) | Type II error (false negative) |
38. E.g. mean birth weight: national = 2.5 kg, sample = 2.6 kg
    - Null hypothesis (H0): there is no difference between the sample mean and the national (population) mean.
    - Alternative hypothesis:
        - Ha: the sample mean is different from the national mean, or
        - Ha1: the sample mean is higher than the national mean
        - Ha2: the sample mean is lower than the national mean
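To show how the three alternative hypotheses lead to different p-values, here is a minimal sketch of a one-sample z-test for the birth weight example. The two means come from the slide; the standard deviation (0.5 kg), the sample size (100) and the significance level (0.05) are assumptions made purely for illustration, and a t-test would be the more usual choice when the SD is estimated from the sample.

```python
from math import sqrt
from statistics import NormalDist

national_mean = 2.5  # kg, from the slide
sample_mean = 2.6    # kg, from the slide
assumed_sd = 0.5     # kg, ASSUMPTION: not given on the slide
assumed_n = 100      # ASSUMPTION: not given on the slide
alpha = 0.05         # ASSUMPTION: conventional significance level

# Test statistic: distance of the sample mean from H0 in standard errors
standard_error = assumed_sd / sqrt(assumed_n)
z = (sample_mean - national_mean) / standard_error

std_normal = NormalDist()  # standard normal: mean 0, SD 1
p_different = 2 * (1 - std_normal.cdf(abs(z)))  # Ha: two-sided
p_higher = 1 - std_normal.cdf(z)                # Ha1: one-sided, upper tail
p_lower = std_normal.cdf(z)                     # Ha2: one-sided, lower tail

reject_h0 = p_different < alpha

print("z =", round(z, 3))
print("p (different):", round(p_different, 4))
print("p (higher):", round(p_higher, 4))
print("p (lower):", round(p_lower, 4))
print("Reject H0 at alpha = 0.05 (two-sided):", reject_h0)
```

Under these assumed values the two-sided test rejects H0; if H0 were in fact true, that decision would be a Type I error, and alpha is the accepted probability of making it.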
39. 08. Study Designs
    - Descriptive: case report; case series; descriptive cross-sectional; ecological; prevalence / surveillance
    - Analytical: cohort studies; case-control (trohoc); analytical cross-sectional
    - Randomized / non-randomized
40. LANCET 2002:359:57 -
41. Observational - Descriptive
    - Frequency, natural history, possible determinants
    - No comparison groups
    - Useful for hypothesis generation about causal associations
    - E.g.
        - Case reports (SLJOP), www.sljol.info
        - Case series
        - Descriptive cross-sectional
        - Ecological / population
        - Correlations
42. Observational - Analytical
    - Always have a comparison/control group
    - Allows determination of causal association
    - Hypothesis testing
    - E.g. cohort study; case-control study (trohoc study); analytical cross-sectional study
43. Observational - Analytical. LANCET 2002:359:57 - 61
44. Cohort Studies
    - Types: prospective, retrospective, ambi-directional
45. Cohort studies - Advantages
    - Temporality can be established.
    - Incidence can be calculated.
    - Several possible outcomes related to the exposure can be studied simultaneously.
    - Provide a direct estimate of risk.
    - Since comparison groups are formed before disease develops, certain forms of bias, such as misclassification bias, can be minimized.
    - Allow conclusions about cause-effect relationships.
46. Cohort studies - Disadvantages
    - A large population is needed.
    - Not suitable for rare diseases.
    - Time consuming and expensive.
    - Certain administrative problems, such as loss of staff, loss of funding and extensive record keeping, are common.
    - Attrition (drop-outs) from the initial cohort is common.
    - The study itself may alter people's behavior.
47. Case Control Studies
48. Case Control Studies - Advantages
    - Quick and less expensive
    - Well suited to diseases with a long latent period
    - Optimal for evaluating rare diseases
    - Can study etiological factors for a single disease
    - Require a smaller sample than a cohort study
    - No attrition (drop-out) problem
    - Ethical problems are minimal; no risk to subjects
49. Case Control Studies - Disadvantages
    - More prone to bias: rely on records or recall for exposure information
    - Validation of exposure data is often difficult
    - Selection of an appropriate control group may be difficult
    - Inefficient for evaluating rare exposures
    - Cannot directly measure incidence; can only estimate relative risk
    - Study of the natural history of disease is not possible
50. Observational - Analytical
    - Data analysis / presentation (basic)
        - Rates
        - Proportions with 95% confidence interval
        - Percentages
        - Odds ratio (95% CI): case-control and cross-sectional studies
        - Cohort studies: relative risk (95% CI); attributable risk / risk difference; attributable risk ratio / aetiologic fraction
        - Population attributable risk
        - Number needed to treat / harm
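Several of these measures come from the same 2x2 table. The sketch below uses entirely hypothetical counts and the common log-scale (Wald-type) approximation for the 95% confidence intervals; other interval methods exist and may be preferred for small counts.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Hypothetical 2x2 table (not from the slides)
a, b = 30, 70   # exposed:   a with disease, b without
c, d = 10, 90   # unexposed: c with disease, d without

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

relative_risk = risk_exposed / risk_unexposed        # cohort studies
odds_ratio = (a * d) / (b * c)                       # case-control / cross-sectional
risk_difference = risk_exposed - risk_unexposed      # attributable risk
attributable_fraction = risk_difference / risk_exposed  # aetiologic fraction (exposed)
number_needed_to_harm = 1 / risk_difference

# 95% CI on the log scale: estimate * exp(+/- z * SE)
z = NormalDist().inv_cdf(0.975)  # about 1.96
se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

rr_ci = (exp(log(relative_risk) - z * se_log_rr),
         exp(log(relative_risk) + z * se_log_rr))
or_ci = (exp(log(odds_ratio) - z * se_log_or),
         exp(log(odds_ratio) + z * se_log_or))

print(f"RR = {relative_risk:.2f} (95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f})")
print(f"OR = {odds_ratio:.2f} (95% CI {or_ci[0]:.2f} to {or_ci[1]:.2f})")
print(f"Risk difference = {risk_difference:.2f}")
print(f"Attributable fraction (exposed) = {attributable_fraction:.2f}")
print(f"Number needed to harm = {number_needed_to_harm:.1f}")
```

Because neither confidence interval includes 1, the null hypotheses RR = 1 and OR = 1 would be rejected at the 5% level for these made-up counts.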
51. 09. Epidemiology - Introduction to terms
    - Prevalence: the proportion of a population found to have a condition or disease
        1. Point prevalence
        2. Period prevalence
        3. Lifetime prevalence
52. Incidence
    - The number of new cases during some time period
        1. Incidence proportion / cumulative incidence
        2. Incidence (density) rate / person-time incidence rate
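The difference between prevalence, incidence proportion and incidence rate is easiest to see with numbers. All counts below are hypothetical and serve only to illustrate which denominator each measure uses.

```python
# Hypothetical community survey (all figures are made up for illustration)
population = 1000
existing_cases = 50  # people with the disease on the survey date

# Point prevalence: existing cases / total population at one point in time
point_prevalence = existing_cases / population

# Incidence proportion (cumulative incidence):
# new cases / people at risk at the start of the period
at_risk = population - existing_cases
new_cases_in_year = 19
cumulative_incidence = new_cases_in_year / at_risk

# Incidence (density) rate: new cases / total person-time at risk
person_years = 1900  # hypothetical follow-up time summed over individuals
new_cases_in_follow_up = 38
incidence_rate = new_cases_in_follow_up / person_years

print(f"Point prevalence: {point_prevalence:.1%}")
print(f"Cumulative incidence over 1 year: {cumulative_incidence:.1%}")
print(f"Incidence rate: {incidence_rate * 1000:.0f} per 1000 person-years")
```

Prevalence counts everyone who currently has the condition, whereas both incidence measures count only new cases among those initially at risk.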
53. Thank you