
Exploratory Data Analysis - Checking For Normality


  1. FK6163: Explore & Summarise. Dr Azmi Mohd Tamil, Dept of Community Health, Universiti Kebangsaan Malaysia. ©drtamil@gmail.com 2012
  2. Introduction: the method of exploring and summarising data differs according to the type of variable.
  3. Dependent/Independent
     • Independent variables: food intake, frequency of exercise
     • Dependent variable: obesity
  5. Explore
     • It is the first step in the analytic process.
     • To explore the characteristics of the data.
     • To screen for errors and correct them.
     • To look for distribution patterns: normal distribution or not.
     • The data may require transformation before further analysis using parametric methods,
     • or may need analysis using non-parametric techniques.
  6. Data Screening
     • By running frequencies, we may detect inappropriate responses.
     • How many in the audience have 15 children and are currently pregnant with the 16th?

     Parity   Frequency   Percent
        1         67        30.7
        2         44        20.2
        3         36        16.5
        4         22        10.1
        5         21         9.6
        6          8         3.7
        7          3         1.4
        8          7         3.2
        9          5         2.3
       10          3         1.4
       11          1         0.5
       15          1         0.5
     Total       218       100.0
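The frequency screening on this slide can be sketched in a few lines of Python. This is only an illustration: the counts are copied from the parity table, while the plausibility limit `MAX_PLAUSIBLE_PARITY` is a hypothetical cut-off chosen for the example, not a value from the slides.

```python
from collections import Counter

# Parity counts copied from the frequency table on the slide.
parity_counts = {1: 67, 2: 44, 3: 36, 4: 22, 5: 21, 6: 8,
                 7: 3, 8: 7, 9: 5, 10: 3, 11: 1, 15: 1}

# Expand into one value per respondent, as the raw data would look.
parity = [value for value, count in parity_counts.items() for _ in range(count)]


def frequency_table(values):
    """Return (value, frequency, percent) rows sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(v, c, round(100 * c / total, 1)) for v, c in sorted(counts.items())]


table = frequency_table(parity)
for value, freq, pct in table:
    print(f"{value:>3} {freq:>4} {pct:>6.1f}")

# Hypothetical plausibility limit, used only to flag rows for checking.
MAX_PLAUSIBLE_PARITY = 12
suspect = [row for row in table if row[0] > MAX_PLAUSIBLE_PARITY]
print("Check against the questionnaire:", suspect)
```

Flagged rows are candidates for checking against the original questionnaire; they are not automatically wrong.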
  7. Data Screening
     • See whether the data make sense or not.
     • E.g. parity 10 but age only 25.
  10. Data Screening
      • By looking at measures of central tendency and range, we can also detect abnormal values for quantitative data.

      Descriptive Statistics
                               N    Minimum   Maximum    Mean    Std. Deviation
      Pre-pregnancy weight    184      32       484      53.05       33.37
      Valid N (listwise)      184
  11. Interpreting the Box Plot
      • Labelled parts, from top to bottom: outlier, largest non-outlier, upper quartile, median, lower quartile, smallest non-outlier, outlier.
      • The whiskers extend up to 1.5 times the box width (the interquartile range) from both ends of the box and end at an observed value.
      • Three times the box width marks the boundary between "mild" and "extreme" outliers.
      • "Mild" = closed dots; "extreme" = open dots.
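A minimal sketch of the 1.5 and 3 box-width rules, assuming the quartiles are taken from Python's `statistics.quantiles` (other software may compute quartiles slightly differently). The `weights` list is hypothetical; 484 mimics the suspicious maximum reported for pre-pregnancy weight.

```python
import statistics


def box_plot_summary(values):
    """Classify values using the 1.5x and 3x box-width rules."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    box_width = q3 - q1  # the interquartile range
    inner_low, inner_high = q1 - 1.5 * box_width, q3 + 1.5 * box_width
    outer_low, outer_high = q1 - 3 * box_width, q3 + 3 * box_width

    non_outliers = [v for v in values if inner_low <= v <= inner_high]
    extreme = [v for v in values if v < outer_low or v > outer_high]
    mild = [v for v in values if v not in non_outliers and v not in extreme]
    return {
        "quartiles": (q1, median, q3),
        # Whiskers end at the most extreme observed values that are not outliers.
        "whiskers": (min(non_outliers), max(non_outliers)),
        "mild": mild,
        "extreme": extreme,
    }


# Hypothetical weights in kg, not data from the slides.
weights = [45, 48, 50, 52, 53, 55, 57, 60, 62, 85, 484]
summary = box_plot_summary(weights)
print(summary)
```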
  12. Data Screening
      • We can also make use of graphical tools such as the box plot to detect wrong data entry.
      [Figure: box plot of pre-pregnancy weight, N = 184; labelled points 73, 181, 211, 198 and 141]
  13. Data Cleaning
      • Identify the extreme/wrong values.
      • Check with the original data source, i.e. the questionnaire.
      • If incorrect, make the necessary correction.
      • Correction must be done before transformation, recoding and analysis.
  14. Parameters of Data Distribution
      • Mean: the central value of the data.
      • Standard deviation: a measure of how the data scatter around the mean.
      • Symmetry (skewness): the degree to which the data pile up on one side of the mean.
      • Kurtosis: the "peakedness" of the curve, i.e. how far the data scatter from the mean.
  15. Normal distribution
      • The Normal distribution is represented by a family of curves defined uniquely by two parameters: the mean and the standard deviation of the population.
      • The curves are always symmetrically bell shaped, but the extent to which the bell is compressed or flattened out depends on the standard deviation of the population.
      • However, the mere fact that a curve is bell shaped does not mean that it represents a Normal distribution, because other distributions may have a similar sort of shape.
  16. Normal distribution
      • If the observations follow a Normal distribution, a range covered by one standard deviation above the mean and one standard deviation below it (± 1 SD) includes about 68.3% of the observations;
      • a range of two standard deviations above and two below (± 2 SD) includes about 95.4% of the observations; and
      • a range of three standard deviations above and three below (± 3 SD) includes about 99.7% of the observations.
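The three percentages can be checked numerically. The sketch below uses the standard library's `statistics.NormalDist`; the helper name `coverage` is introduced here only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Proportion of a Normal distribution lying within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * coverage(k):.1f}%")
```

The same proportions hold for any mean and standard deviation, because the range is expressed in units of SD.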
  17. Normality
      • Why bother with normality?
      • Because it dictates the type of analysis that you can run on the data.
  18. Normality - Why? Parametric tests
      • Qualitative (dichotomous) vs quantitative; normally distributed data: Student's t test
      • Qualitative (polytomous) vs quantitative; normally distributed data: ANOVA
      • Quantitative vs quantitative; repeated measurement of the same individual & item (e.g. Hb level before & after treatment), normally distributed data: paired t test
      • Quantitative (continuous) vs quantitative (continuous); normally distributed data: Pearson correlation & linear regression
  19. Normality - Why? Non-parametric tests
      • Qualitative (dichotomous) vs quantitative; data not normally distributed: Wilcoxon rank sum test or Mann-Whitney U test
      • Qualitative (polytomous) vs quantitative; data not normally distributed: Kruskal-Wallis one-way ANOVA test
      • Quantitative vs quantitative; repeated measurement of the same individual & item: Wilcoxon signed rank test
      • Quantitative (continuous/ordinal) vs quantitative (continuous); data not normally distributed: Spearman/Kendall rank correlation
  20. Normality - How?
      • Explored graphically
        – Histogram
        – Stem & leaf
        – Box plot
        – Normal probability plot
        – Detrended normal plot
      • Explored statistically
        – Kolmogorov-Smirnov statistic, with Lilliefors significance level, and the Shapiro-Wilk statistic
        – Skewness (0)
        – Kurtosis (0): + leptokurtic, 0 mesokurtic, − platykurtic
  21. Kolmogorov-Smirnov
      • In the 1930s, Andrei Nikolaevich Kolmogorov (1903-1987) and N.V. Smirnov (his student) came up with an approach for comparing distributions that did not make use of parameters.
      • This is known as the Kolmogorov-Smirnov test.
  22. Skewness
      • Skewed to the right indicates the presence of large extreme values.
      • Skewed to the left indicates the presence of small extreme values.
  23. Kurtosis
      • For symmetrical distributions only.
      • Describes the shape of the curve.
      • Mesokurtic: average shaped
      • Leptokurtic: narrow & slim
      • Platykurtic: flat & wide
  24. Skewness & Kurtosis
      • Skewness typically ranges from -3 to 3.
      • The acceptable range for normality is skewness lying between -1 and 1.
      • Normality should not be judged on skewness alone; the kurtosis measures the "peakedness" of the bell curve (see Fig. 4).
      • Likewise, the acceptable range for normality is kurtosis lying between -1 and 1.
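As a rough illustration of the ±1 rule of thumb, the sketch below computes moment-based skewness and excess kurtosis (kurtosis minus 3, so that a Normal curve scores 0). These are the simple population formulas; statistical packages such as SPSS apply small-sample corrections, so their output will differ slightly. The function names are introduced here for illustration, and `ages` is the 20-value example data set used later in the slides.

```python
def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def within_rule_of_thumb(values, limit=1.0):
    """True when both skewness and excess kurtosis lie within +/- limit."""
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit


ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
print(round(skewness(ages), 3), round(excess_kurtosis(ages), 3))
print(within_rule_of_thumb(ages))
```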
  26. Normality - Examples: Graphically
      [Figure: histogram of Height; Mean = 151.60, Std. Dev = 5.26, N = 218]
  27. Q-Q Plot
      • This plot compares the quantiles of a data distribution with the quantiles of a standardised theoretical distribution from a specified family of distributions (in this case, the normal distribution).
      • If the distributional shapes differ, the points will plot along a curve instead of a line.
      • The interest here is the central portion of the line; severe deviations there mean non-normality. Deviations at the "ends" of the curve signify the existence of outliers.
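The points of a normal Q-Q plot can be sketched as follows. The plotting position (i - 0.5)/n is one common convention and is an assumption here; statistical packages may use a different one. `normal_qq_points` is a name introduced for illustration, and `ages` is the example data set used later in the slides.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each ranked observation with its expected standard normal quantile."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    # (observed value, expected normal value) for the i-th ranked observation.
    return [(x, std_normal.inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]


ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
points = normal_qq_points(ages)
for observed, expected in points:
    print(f"{observed:>3} {expected:>7.3f}")
```

Plotting observed against expected values should give roughly a straight line when the data are close to normal.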
  28. Normality - Examples: Graphically
      [Figure: Normal Q-Q Plot of Height (Expected Normal against Observed Value) and Detrended Normal Q-Q Plot of Height (Dev from Normal against Observed Value)]
  29. Normal distribution: mean = median = mode
  30. Normality - Examples: Statistically

      Descriptives (Height)                      Statistic   Std. Error
      Mean                                         151.65       .356
      95% Confidence Interval    Lower Bound       150.94
      for Mean                   Upper Bound       152.35
      5% Trimmed Mean                              151.59
      Median                                       151.50
      Variance                                     27.649
      Std. Deviation                                5.258
      Minimum                                         139
      Maximum                                         168
      Range                                            29
      Interquartile Range                            8.00
      Skewness                                       .148       .165
      Kurtosis                                       .061       .328

      Tests of Normality: Kolmogorov-Smirnov (a)
                 Statistic    df     Sig.
      Height       .060       218    .052
      a. Lilliefors Significance Correction

      • Normal distribution: mean = median = mode.
      • Skewness & kurtosis within ±1.
      • p > 0.05, so normal distribution.
      • Shapiro-Wilk: only if the sample size is less than 100.
  31. K-S Test
  32. K-S Test
      • Very sensitive to the sample size of the data.
      • For small samples (n < 20, say), the likelihood of getting p < 0.05 is low.
      • For large samples (n > 100), even a slight deviation from normality is likely to be reported as a non-normal distribution.
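To show what the test measures, the sketch below computes only the Kolmogorov-Smirnov distance D: the largest gap between the empirical cumulative distribution and a normal curve fitted with the sample mean and SD. It does not compute the Lilliefors-corrected significance, which needs special tables or simulation. `ks_statistic_normal` is a name introduced for illustration.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
d_stat = ks_statistic_normal(ages)
print(round(d_stat, 3))
```

Because the significance of a given D depends on n, the same distance can be non-significant in a small sample and significant in a large one, which is the sensitivity described above.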
  33. Guide to deciding on normality
  34. Normality: Transformation
      [Figure: Normal Q-Q Plot of PARITY and Normal Q-Q Plot of LN_PARIT (Expected Normal against Observed Value)]
  35. Types of Transformations
      • Square root
      • Logarithm
      • Inverse
      • Reflect and square root
      • Reflect and logarithm
      • Reflect and inverse
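The six transformations can be sketched as below. The reflection convention used here (subtract each value from the maximum plus one) is an assumption; the slides do not define it. Reflection is conventionally applied when data are skewed to the left. The `parity` values are hypothetical, and all values must be positive for the logarithm and inverse.

```python
import math


def reflect(values):
    """Reflect values so the largest becomes 1 (assumed convention: max + 1 - x)."""
    pivot = max(values) + 1
    return [pivot - v for v in values]


TRANSFORMS = {
    "square root": math.sqrt,
    "logarithm": math.log,        # natural logarithm
    "inverse": lambda v: 1 / v,
}


def transform(values, kind, reflected=False):
    """Apply one of the basic transformations, optionally after reflection."""
    source = reflect(values) if reflected else list(values)
    return [TRANSFORMS[kind](v) for v in source]


# Hypothetical positively skewed values, not data from the slides.
parity = [1, 1, 2, 2, 3, 4, 5, 7, 10, 15]
ln_parity = transform(parity, "logarithm")
print([round(v, 2) for v in ln_parity])
```

After transforming, normality should be re-checked on the new variable.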
  36. Summarise
      • Summarise a large set of data by a few meaningful numbers.
      • Single variable analysis
        – For the purpose of describing the data.
        – Example: in one year, what kinds of cases are treated by the Psychiatric Dept?
        – Tables & diagrams are usually used to describe the data.
        – For numerical data, measures of central tendency & spread are usually used.
  37. Frequency Table
      • Illustrates the frequency observed for each category.

      Race        F         %
      Malay      760     95.84%
      Chinese      5      0.63%
      Indian       0      0.00%
      Others      28      3.53%
      TOTAL      793    100.00%
  38. Frequency Distribution Table
      • More than 20 observations are best presented as a frequency distribution table.
      • Columns are divided into class & frequency.
      • The modal class can be determined using such tables.

      Age         Count       %
      0-0.99        25      3.26%
      1-4.99        78     10.18%
      5-14.99      140     18.28%
      15-24.99     126     16.45%
      25-34.99     112     14.62%
      35-44.99      90     11.75%
      45-54.99      66      8.62%
      55-64.99      60      7.83%
      65-74.99      50      6.53%
      75-84.99      16      2.09%
      85+            3      0.39%
      TOTAL        766
  39. Measurement of Central Tendency & Spread
  40. Measures of Central Tendency
      • Mean
      • Mode
      • Median
  41. Measures of Variability
      • Standard deviation
      • Inter-quartile range
      • Skewness & kurtosis
  42. Mean
      • The average of the data collected.
      • To calculate the mean, add up the observed values and divide by the number of them.
      • A major disadvantage of the mean is that it is sensitive to outlying points.
  43. Mean: Example
      • 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • Total of x = 648
      • n = 20
      • Mean = 648/20 = 32.4
  44. Measures of variation: standard deviation
      • Tells us how much all the scores in a dataset cluster around the mean. A large SD indicates more varied scores.
      • A summary measure of the differences of each observation from the mean.
      • If the differences themselves were added up, the positive would exactly balance the negative and so their sum would be zero.
      • Consequently the squares of the differences are added.
  46. SD: Example
      • 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • Mean = 32.4; n = 20
      • Total of (x - mean)^2 = 1405.8 + 1645 = 3050.8
      • Variance = 3050.8/19 = 160.5684
      • SD = 160.5684^0.5 = 12.67

      x       (x - mean)^2        x       (x - mean)^2
      12         416.16           32           0.16
      13         376.36           35           6.76
      17         237.16           37          21.16
      21         129.96           38          31.36
      24          70.56           41          73.96
      24          70.56           43         112.36
      26          40.96           44         134.56
      27          29.16           46         184.96
      27          29.16           53         424.36
      30           5.76           58         655.36
      TOTAL     1405.80           TOTAL     1645.00
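The mean and SD worked out by hand above can be reproduced with Python's standard `statistics` module. This is a small check using the same 20 values; `stdev` divides by n - 1, matching the slide.

```python
import statistics

ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

mean_age = statistics.mean(ages)                       # total of x divided by n
squared_diffs = [(x - mean_age) ** 2 for x in ages]    # the (x - mean)^2 column
sum_sq = sum(squared_diffs)
variance = sum_sq / (len(ages) - 1)                    # sample variance, n - 1 = 19
sd = variance ** 0.5

print(mean_age, round(sum_sq, 1), round(variance, 4), round(sd, 2))
# statistics.stdev performs the same calculation in one call.
print(round(statistics.stdev(ages), 2))
```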
  47. Median
      • The ranked value that lies in the middle of the data.
      • The point which has the property that half the data are greater than it, and half the data are less than it.
      • If n is even, average the (n/2)th and the (n/2 + 1)th ranked observations.
      • "Robust" to outliers.
  48. Median: Example
      • 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • (20+1)/2 = 10.5, so the median lies between the 10th value (30) and the 11th value (32).
      • Therefore the median is (30 + 32)/2 = 31.
  49. Measures of variation: quartiles
      • The range is very susceptible to what are known as outliers.
      • A more robust approach is to divide the distribution of the data into four, and find the points below which are 25%, 50% and 75% of the distribution. These are known as quartiles, and the median is the second quartile.
  50. Quartiles
      • 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • 25th percentile: (24+24)/2 = 24
      • 50th percentile: (30+32)/2 = 31 = median
      • 75th percentile: (41+43)/2 = 42 (interpolating at position 0.75(n+1) = 15.75 instead gives 42.5)
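Quartiles depend on the method used. The sketch below compares the "median of each half" approach shown in the worked arithmetic with the interpolation at position p(n + 1) used by `statistics.quantiles` (its default "exclusive" method). Both are standard; which one a given package reports is an assumption to check in its documentation.

```python
import statistics

ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Method 1: median of the lower half and median of the upper half.
half = len(ages) // 2
q1_halves = statistics.median(ages[:half])    # (24 + 24) / 2
q2 = statistics.median(ages)                  # (30 + 32) / 2
q3_halves = statistics.median(ages[half:])    # (41 + 43) / 2

# Method 2: linear interpolation at position p * (n + 1).
q1_interp, q2_interp, q3_interp = statistics.quantiles(ages, n=4)

print(q1_halves, q2, q3_halves)
print(q1_interp, q2_interp, q3_interp)
print("IQR:", q3_halves - q1_halves, "or", q3_interp - q1_interp)
```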
  51. Mode
      • The most frequently occurring number. E.g. 3, 13, 13, 20, 22, 25: mode = 13.
      • It is usually more informative to quote the mode accompanied by the percentage of times it happened; e.g. the mode is 13 with 33% of the occurrences.
  52. Mode: Example
      • 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • Modes are 24 (10%) & 27 (10%).
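A short sketch of the mode calculation, including the percentage of occurrences, using `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest frequency.

```python
import statistics
from collections import Counter

ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

modes = statistics.multimode(ages)   # all values sharing the top frequency
counts = Counter(ages)
mode_share = {m: 100 * counts[m] / len(ages) for m in modes}

for value, pct in mode_share.items():
    print(f"mode {value}: {pct:.0f}% of observations")
```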
  53. Mean or Median?
      • Which measure of central tendency should we use?
      • If the distribution is normal, the mean and SD are the measures to present; otherwise the median and IQR are more appropriate.
  54. Not normal distribution: use median & IQR. Normal distribution: use mean & SD.
  55. Presentation: Qualitative & Quantitative Data, Charts & Tables
  56. Presentation: Qualitative Data
  57. Graphing Categorical Data: Univariate Data
      • Tabulating data: the summary table
      • Graphing data: pie charts, bar charts, Pareto diagram
      [Figure: example charts for the categories Stocks, Bonds, Savings and CD]
  58. Bar Chart
      [Figure: bar chart of percent by type of work; categories Housewife, Office work, Field work; bar labels 69, 20, 11]
  59. Pie Chart
      [Figure: pie chart with segments Malay, Chinese, Others]
  60. Tabulating and Graphing Bivariate Categorical Data
      • Contingency tables:

      Table 1: Contingency table of pregnancy induced hypertension and SGA (counts)

                                             SGA
      Pregnancy induced hypertension    Normal    SGA    Total
      No                                  103      94     197
      Yes                                   5      16      21
      Total                               108     110     218
  61. Tabulating and Graphing Bivariate Categorical Data
      • Side by side charts
      [Figure: side-by-side bar chart of count by pregnancy induced hypertension (No, Yes), with separate bars for Normal and SGA; bar labels 103, 94 and 16]
  62. Presentation: Quantitative Data
  63. Tabulating and Graphing Numerical Data
      • Numerical data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
      • Ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
      • Stem and leaf display:
          2 | 144677
          3 | 028
          4 | 1
      • Frequency distributions and cumulative distributions: tables, histograms, polygons, ogive
  64. Tabulating Numerical Data: Frequency Distributions
      • Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • Find the range: 58 - 12 = 46
      • Select the number of classes: 5 (usually between 5 and 15)
      • Compute the class interval (width): 10 (46/5, then round up)
      • Determine the class boundaries (limits): 10, 20, 30, 40, 50, 60
      • Compute the class midpoints: 14.95, 24.95, 34.95, 44.95, 54.95
      • Count the observations & assign them to classes
  65. Frequency Distributions and Percentage Distributions
      Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

      Class          Midpoint    Freq      %
      10.0 - 19.9      14.95       3      15%
      20.0 - 29.9      24.95       6      30%
      30.0 - 39.9      34.95       5      25%
      40.0 - 49.9      44.95       4      20%
      50.0 - 59.9      54.95       2      10%
      TOTAL                       20     100%
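The steps for building this table can be sketched as below. The choice of starting the first class at the largest multiple of the class width not exceeding the minimum is an assumption that happens to reproduce the boundaries on the slide; the variable names are introduced for illustration.

```python
import math

ages = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

n_classes = 5
data_range = max(ages) - min(ages)             # 58 - 12 = 46
width = math.ceil(data_range / n_classes)      # 46 / 5 = 9.2, rounded up to 10
start = (min(ages) // width) * width           # first class boundary: 10

# Count the observations falling into each class.
freq = [0] * n_classes
for age in ages:
    freq[(age - start) // width] += 1

rows = []
for k, f in enumerate(freq):
    low = start + k * width
    high = low + width - 0.1                   # classes written as 10.0 - 19.9
    midpoint = round((low + high) / 2, 2)
    rows.append((f"{low:.1f} - {high:.1f}", midpoint, f, 100 * f / len(ages)))

for label, midpoint, f, pct in rows:
    print(f"{label:<12} {midpoint:>6.2f} {f:>3} {pct:>4.0f}%")
```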
  66. Graphing Numerical Data: The Histogram
      Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      [Figure: histogram of frequency against age, bars centred on the class midpoints 14.95, 24.95, 34.95, 44.95, 54.95 with frequencies 3, 6, 5, 4, 2; no gaps between bars]
  67. Graphing Numerical Data: The Frequency Polygon
      Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      [Figure: frequency polygon plotted at the class midpoints 14.95, 24.95, 34.95, 44.95, 54.95]
  68. Calculate Measures of Central Tendency & Spread
      • We can use the frequency distribution table to calculate:
        – Mean
        – Standard deviation
        – Median
        – Mode
  69. Mean
      $\bar{X} = \frac{\sum f \cdot mp}{n}$
      • Mean = 659/20 = 32.95
      • Compare with 32.4 from direct calculation.

      Class          Midpoint    Freq    freq x m.p.
      10.0 - 19.9      14.95       3        44.85
      20.0 - 29.9      24.95       6       149.70
      30.0 - 39.9      34.95       5       174.75
      40.0 - 49.9      44.95       4       179.80
      50.0 - 59.9      54.95       2       109.90
      TOTAL                       20       659.00
  70. Standard deviation
      $s = \sqrt{\frac{\sum f \cdot mp^2 - \frac{(\sum f \cdot mp)^2}{n}}{n - 1}}$
      • s^2 = (24634.05 - (659^2/20))/19
      • s^2 = 2920.00/19
      • s^2 = 153.68
      • s = 12.4
      • Compare with 12.67 from direct calculation.

      Class          Midpoint    Freq     f.mp      f.mp^2
      10.0 - 19.9      14.95       3      44.85      670.51
      20.0 - 29.9      24.95       6     149.70     3735.02
      30.0 - 39.9      34.95       5     174.75     6107.51
      40.0 - 49.9      44.95       4     179.80     8082.01
      50.0 - 59.9      54.95       2     109.90     6039.01
      TOTAL                       20     659.00    24634.05
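The grouped mean and standard deviation can be reproduced from the midpoints and frequencies alone. This is a small check of the arithmetic above; the variable names are introduced for illustration.

```python
import math

midpoints = [14.95, 24.95, 34.95, 44.95, 54.95]
freq = [3, 6, 5, 4, 2]

n = sum(freq)
sum_fm = sum(f * m for f, m in zip(freq, midpoints))        # sum of f.mp
sum_fm2 = sum(f * m * m for f, m in zip(freq, midpoints))   # sum of f.mp^2

grouped_mean = sum_fm / n
grouped_var = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(round(sum_fm, 2), round(sum_fm2, 2))
print(round(grouped_mean, 2), round(grouped_var, 2), round(grouped_sd, 1))
```

The grouped results differ slightly from the direct calculation because every observation is treated as if it sat at its class midpoint.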
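  71. Median
      $\text{Median} = L_1 + i \cdot \frac{(n+1)/2 - f_1}{f_{med}}$
      • L1 = lower boundary of the median class (29.95); i = class width (10)
      • f1 = cumulative frequency of the classes above the median class (3 + 6 = 9); fmed = frequency of the median class (5)
      • 29.95 + 10((21/2) - 9)/5
      • 29.95 + 15/5 = 32.95
      • From direct calculation, median = 31.

      Class          Freq
      10.0 - 19.9      3
      20.0 - 29.9      6
      30.0 - 39.9      5    (median class)
      40.0 - 49.9      4
      50.0 - 59.9      2
      TOTAL           20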
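  72. Mode
      $\text{Mode} = L_1 + i \cdot \frac{Diff_1}{Diff_1 + Diff_2}$
      • Diff1 = frequency of the mode class minus that of the class before it (6 - 3 = 3); Diff2 = frequency of the mode class minus that of the class after it (6 - 5 = 1)
      • Mode = 19.95 + 10(3/(3+1)) = 27.45
      • Compare with modes of 24 & 27 from direct calculation.

      Class          Freq
      10.0 - 19.9      3
      20.0 - 29.9      6    (mode class)
      30.0 - 39.9      5
      40.0 - 49.9      4
      50.0 - 59.9      2
      TOTAL           20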
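The grouped median and mode formulas can be sketched as two small functions. They follow the formulas as given on these slides, including the (n + 1)/2 position for the median (many textbooks use n/2 instead). The function names and the treatment of a mode class at either end of the table (a neighbouring frequency of 0) are assumptions made for illustration.

```python
lower_boundaries = [9.95, 19.95, 29.95, 39.95, 49.95]
freq = [3, 6, 5, 4, 2]
width = 10


def grouped_median(lower_boundaries, freq, width):
    """L1 + i * ((n + 1)/2 - f1) / fmed, as on the slide."""
    target = (sum(freq) + 1) / 2
    cumulative = 0  # f1: cumulative frequency before the current class
    for low, f in zip(lower_boundaries, freq):
        if cumulative + f >= target:
            return low + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")


def grouped_mode(lower_boundaries, freq, width):
    """L1 + i * Diff1 / (Diff1 + Diff2) for the class with the highest frequency."""
    k = freq.index(max(freq))
    before = freq[k - 1] if k > 0 else 0
    after = freq[k + 1] if k < len(freq) - 1 else 0
    diff1, diff2 = freq[k] - before, freq[k] - after
    return lower_boundaries[k] + width * diff1 / (diff1 + diff2)


median_est = grouped_median(lower_boundaries, freq, width)
mode_est = grouped_mode(lower_boundaries, freq, width)
print(round(median_est, 2), round(mode_est, 2))
```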
  73. Graphing Bivariate Numerical Data (Scatter Plot)
  74. Linear Regression Line
  75. Survival Function
      [Figure: cumulative survival against duration (0 to 7), showing the survival function with censored cases marked]
  76. Principles of Graphical Excellence
      • Presents data in a way that provides substance, statistics and design.
      • Communicates complex ideas with clarity, precision and efficiency.
      • Gives the largest number of ideas in the most efficient manner.
      • Almost always involves several dimensions.
      • Tells the truth about the data.