Introductory Statistics

Presentation on introductory statistics

  1. Introductory Statistics
     AvMed Health Plans
     Brian Wells, MSM, MPH
  2. "It's all about observing what you see."
     (Yogi Berra, Great American philosopher)
     Intro to Statistics, Wells 2006
  3. Statistics: What's It All About?
     - The hunt for the truth.
     - Finding relationships and causality.
     - Separating the wheat from the chaff [lots of chaff out there!].
     - A process of critical thinking and an analytic approach to the literature and research.
     - It can help you avoid contracting the dreaded "data rich but information poor" syndrome!
     (Fein 2003)
  4. Statistics and the "Truth"
     - Can we ever know the truth?
     - Statistics is a way of telling us the likelihood that we have arrived at the truth of a matter [or not!].
     (Fein 2003)
  5. Statistics: Based on Probability and Multiple Assumptions
     - An approach to searching for the truth that recognizes that there is little, if anything, that is concrete or dichotomous.
     - Employs quantitative concepts like "confidence", "reliability", and "significance" to get at the "truth".
     (Fein 2003)
  6. Statistics & the "Truth"
     - Two kinds of statistics:
       - Descriptive: describes "what is"
       - Experimental (inferential): makes a point; tries to "prove" something
     - Problem: it is almost impossible to prove something, but much easier to disprove it.
     - Thus the null hypothesis, H₀: there is no difference.
  7. Types of Statistics
     - Descriptive: describe / communicate what you see without any attempt to generalize beyond the sample at hand.
     - Inferential: determine the likelihood that observed results:
       - can be generalized to larger populations
       - occurred by chance rather than as a result of known, specified interventions [H₀]
     (Fein 2003)
  8. The Null Hypothesis
     - A hypothesis that is tested for possible rejection under the assumption that it is true (usually, that the observations are the result of chance).
     - Experimental statistics works to disprove the null hypothesis: to show that it is wrong and that a difference exists [e.g., glucose levels and diabetics]. In other words, to show that you have found or discovered something new.
     - The null hypothesis usually represents the opposite of what the researcher may believe to be true.
     (Fein 2003)
  9. Normality
     - When variability in data points is due to the sum of numerous independent sources, with no one source dominating, the result should be a normal, or Gaussian, distribution (named for Carl Friedrich Gauss).
     - Note: technically, true Gaussian distributions do not occur in nature, because a Gaussian distribution extends infinitely in both directions. Bell-shaped curves are the norm for biological data, but they have finite end points to the left and right of the mean.
     - Bell curve vs. normal distribution vs. Gaussian distribution
     - "Normal distribution" is an unfortunate name, because it encourages the fallacy that many or all probability distributions are "normal".
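The claim above can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions: each hypothetical "data point" is the sum of 12 independent uniform(0, 1) contributions (an arbitrary choice), and the function names are illustrative only. If the sums behave like a normal distribution, roughly 68% and 95% of them should fall within 1 and 2 SD of the mean.

```python
import random
import statistics


def simulate_sums(n_sources: int, n_samples: int, seed: int = 1) -> list[float]:
    """Hypothetical model: each data point is the sum of n_sources independent uniform(0, 1) terms."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_sources)) for _ in range(n_samples)]


def fraction_within(data: list[float], k: float) -> float:
    """Fraction of points lying within k sample standard deviations of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


sums = simulate_sums(n_sources=12, n_samples=10_000)
print(round(fraction_within(sums, 1), 3))  # should be close to 0.68
print(round(fraction_within(sums, 2), 3))  # should be close to 0.95
```

No single uniform term dominates the sum, which is the condition the slide names; a single uniform source alone would give a flat, not bell-shaped, histogram.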
  10. Why Is the Normal Distribution Important?
      - It provides the fundamental mathematical substrate that permits the use and calculation of most statistical analyses.
      - The normal distribution (ND) represents one of the empirically verified elementary "truths about the general nature of reality", and its status can be compared with one of the fundamental laws of the natural sciences. The exact shape of the ND is defined by a function that has only two parameters: the mean and the standard deviation.
      (Fein 2003)
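For reference, the two-parameter function mentioned above is the normal probability density, written here with mean μ and standard deviation σ:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

Changing μ shifts the curve left or right; changing σ makes it wider or narrower without changing its bell shape.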
  11. Gaussian Distribution [figure]
  12. Why Examine the Data?
      - Means are the most commonly used measures of populations:
        - amenable to mathematical manipulation
        - a handy measure of central tendency
      - BUT means can be misleading.
      - Examine the curve, median, mode, range, and outliers.
      - Look for bi- or multi-modal distributions.
      (Fein 2003)
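As a sketch of why the mean alone can mislead, the snippet below uses two small hypothetical data sets (invented for illustration, not taken from the presentation): one with a single outlier and one with two separate clusters.

```python
import statistics

# Hypothetical data set 1: nine typical values and one extreme outlier.
values = [10, 11, 11, 12, 12, 12, 13, 13, 14, 95]
mean = statistics.mean(values)          # 20.3: pulled far above the bulk of the data
median = statistics.median(values)      # 12.0: unaffected by the outlier
mode = statistics.mode(values)          # 12
value_range = max(values) - min(values)  # 85: the outlier shows up here
print(mean, median, mode, value_range)

# Hypothetical data set 2: two clusters (bimodal).
bimodal = [2, 2, 2, 3, 8, 9, 9, 9]
print(statistics.mean(bimodal))       # 5.5: a value where almost no data lie
print(statistics.multimode(bimodal))  # [2, 9]: two modes reveal the two clusters
```

Reporting only the mean would hide both the outlier and the two-cluster structure.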
  13. Beware small samples!
      - Fancy statistical tests can bury the truth here!
      - Conversely, not finding any difference means nothing, because a small sample may lack the power to detect a real difference.
      - Normality testing is pointless with fewer than 20 to 30 data points, and can be misleading.
      (Fein 2003)
  14. Variability of Measurements
      - Unbiasedness: tendency to arrive at the true or correct value.
      - Precision: degree of spread of a series of observations [repeatability] [also referred to as reliability]. Can also refer to the number of decimal places reported, which can be misleading.
      - Accuracy: encompasses both unbiasedness and precision. Accurate measurements are both unbiased and precise. Inaccurate measurements may be biased, imprecise, or both.
      (Fein 2003)
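The three terms can be separated in a simulation. In this hypothetical sketch, each instrument reading is modeled as true value + fixed bias + Gaussian noise; the mean error estimates bias, and the SD of the readings estimates (im)precision. The true value, bias, and noise levels are all assumptions chosen for illustration.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical true quantity being measured


def measure(bias: float, noise_sd: float, n: int, seed: int) -> list[float]:
    """Simulate n readings from a hypothetical instrument with fixed bias and random noise."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0, noise_sd) for _ in range(n)]


def summarize(readings: list[float]) -> tuple[float, float]:
    """Return (estimated bias, spread): systematic error and random error of the readings."""
    bias = statistics.mean(readings) - TRUE_VALUE
    spread = statistics.stdev(readings)
    return round(bias, 2), round(spread, 2)


instruments = {
    "unbiased, precise (accurate)": measure(bias=0.0, noise_sd=0.5, n=1000, seed=1),
    "biased, precise": measure(bias=5.0, noise_sd=0.5, n=1000, seed=2),
    "unbiased, imprecise": measure(bias=0.0, noise_sd=5.0, n=1000, seed=3),
}
for name, readings in instruments.items():
    print(name, summarize(readings))
```

Only the first instrument is accurate in the slide's sense: small bias and small spread.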
  15. The Mean
      - μ vs. X̄
      - Potentially dangerous and misleading value
      - μ = true population mean
      - X̄ ("X-bar") = mean of the sample
      (Fein 2003)
  16. Standard Deviation
      - The SD is a measure of the scatter, or dispersion, of data around the mean.
      - For normally distributed data, ~68% of values fall within 1 SD of the mean [±1 Z].
      - ~95% of values fall within 2 SD of the mean (exactly 95% within 1.96 SD), with ~2.5% in each tail.
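The 68% and 95% figures follow from the standard normal distribution and can be checked directly, since for a standard normal Z, P(|Z| ≤ k) = erf(k/√2). This sketch assumes the data are normally distributed; the function name is illustrative.

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z, i.e. the share within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


print(round(prob_within(1), 4))     # 0.6827
print(round(prob_within(2), 4))     # 0.9545
print(round(prob_within(1.96), 4))  # 0.95
```

The 1.96 case is where the "2.5% in each tail" figure comes from, and it is the same 1.96 that reappears as the large-sample critical value later in the deck.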
  17. Single Sample Means
      - The mean value you calculate from your sample of data points depends on which values you happened to sample, and is unlikely to equal the true population mean exactly.
      (Fein 2003)
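A minimal simulation of this point, assuming a hypothetical normally distributed population with mean 50 and SD 10: each sample of 25 values yields a slightly different X̄, and none equals μ exactly.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 100,000 values; mu is its true mean.
population = [rng.gauss(50, 10) for _ in range(100_000)]
mu = statistics.mean(population)

# Five independent samples of 25 values each, and the mean (x-bar) of each.
sample_means = [statistics.mean(rng.sample(population, 25)) for _ in range(5)]
for xbar in sample_means:
    print(f"x-bar = {xbar:.2f}  (mu = {mu:.2f})")
```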
  18. Confidence Intervals
      - A confidence interval gives an estimated range of values that is likely to include an unknown population parameter, the range being calculated from a given set of sample data.
      - If independent samples are taken repeatedly from the same population, and a confidence interval is calculated for each sample, then a certain percentage (the confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but 90%, 99%, 99.9% (or any other level) intervals can also be produced for the unknown parameter.
  19. Confidence Intervals
      - The width of the confidence interval indicates how uncertain we are about the unknown parameter (see precision). A very wide interval may indicate that more data should be collected before anything definite can be said about the parameter.
      - Confidence intervals are more informative than the simple results of hypothesis tests (where we decide "reject H₀" or "don't reject H₀"), because they provide a range of plausible values for the unknown parameter.
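The repeated-sampling interpretation of a confidence interval can be checked by simulation. This sketch makes a simplifying assumption that the slides do not: the population SD σ is known, so a z-based interval is used (with σ unknown, a t critical value would replace 1.96). The sample size, σ, number of repetitions, and function names are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean, assuming the population SD sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(len(sample))
    xbar = mean(sample)
    return xbar - half_width, xbar + half_width


def coverage(mu=50.0, sigma=10.0, n=25, reps=2000, seed=7):
    """Fraction of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= mu <= hi
    return hits / reps


print(coverage())  # should be close to 0.95
```

The half-width shrinks with √n, which is why collecting more data narrows a wide interval.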
  20. Errors of Analysis/Detection
      - Type I (alpha) error: finding a significant difference when none really exists; a false positive caused by chance (random sampling variation).
      - Type II (beta) error: not finding a difference when there is in fact a difference; more likely to occur with smaller sample sizes.
      - The chosen significance level affects both: for a fixed sample size, lowering α reduces Type I errors but increases Type II errors.
  21. Errors of Analysis

                                    TRUE STATE
      DECISION                      Null hypothesis is correct    Null hypothesis is wrong
      Reject null hypothesis as     Type I ERROR                  CORRECT
      incorrect (i.e., find a       probability = α               probability = 1 - β ("power")
      difference)
      Accept null hypothesis        CORRECT                       Type II ERROR
      (decide that there are no     probability = 1 - α           probability = β
      effects or differences)
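The probabilities in the table can be estimated by simulation. This hypothetical sketch uses a two-sided one-sample z-test with known σ (a simpler test than the t-test discussed later): when H₀ is true, the rejection rate estimates α; when H₀ is false, it estimates power (1 - β), which grows with sample size. The effect size (0.5 SD) and sample sizes are assumptions.

```python
import math
import random
from statistics import NormalDist, mean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test with known sigma: True if H0 (mean == mu0) is rejected."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, n, reps=2000, mu0=0.0, sigma=1.0, seed=3):
    """Share of simulated samples (drawn with the given true mean) in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(true_mean, sigma) for _ in range(n)], mu0, sigma)
        for _ in range(reps)
    )
    return rejections / reps


type_1 = rejection_rate(true_mean=0.0, n=20)       # H0 true: estimates alpha
power_small = rejection_rate(true_mean=0.5, n=10)  # H0 false, small sample
power_large = rejection_rate(true_mean=0.5, n=50)  # H0 false, larger sample
print("Type I rate:", type_1)
print("beta at n=10:", 1 - power_small, "beta at n=50:", 1 - power_large)
```

The same true difference is missed far more often at n = 10 than at n = 50, which is the small-sample Type II problem the previous slide describes.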
  22. Statistical Significance
      - The p-value: the probability of obtaining findings at least as extreme as those observed if chance alone were operating (i.e., if H₀ were true).
      - p < .05: less than a 5% probability of results this extreme arising by chance alone.
      - p < .05 is commonly used but arbitrary.
      (Fein 2003)
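For a test statistic that is (approximately) standard normal under H₀, the two-sided p-value can be computed with the standard library's NormalDist. This is a large-sample sketch with an illustrative function name; for small samples the t distribution would be used instead.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) under H0, for a standard normal test statistic Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (1.0, 1.96, 2.58):
    print(z, round(two_sided_p_value(z), 3))
# 1.0 0.317
# 1.96 0.05
# 2.58 0.01
```

A statistic of 1.96 sits exactly at the conventional p = .05 cut-off.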
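  23. Two-Sample t-Test for Means (unequal variance)
      - Used to determine whether two population means are equal.
      - $H_0: \mu_1 = \mu_2$
      - $H_a: \mu_1 \neq \mu_2$
      - Test statistic:
        $$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/N_1 + s_2^2/N_2}}$$
      - DF:
        $$\nu = \frac{\left(s_1^2/N_1 + s_2^2/N_2\right)^2}{\frac{\left(s_1^2/N_1\right)^2}{N_1 - 1} + \frac{\left(s_2^2/N_2\right)^2}{N_2 - 1}}$$
      - Reject $H_0$ if: $|T| > t_{\alpha/2,\,\nu}$
      where $\bar{X}_i$, $s_i^2$, and $N_i$ are the sample mean, sample variance, and sample size of group $i$, and $t_{\alpha/2,\,\nu}$ is the upper critical value of Student's t distribution with $\nu$ degrees of freedom.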
  24. Two-Sample t-Test
      - The E&M database uses the Smith-Satterthwaite test (also known as the Welch test) for unequal variances.
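The Welch (Smith-Satterthwaite) test statistic and degrees of freedom can be computed directly from the two samples. This is a minimal sketch with hypothetical data and a hypothetical function name; it is not the E&M database's implementation, which the presentation does not show. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test and can serve as a cross-check.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Welch (Smith-Satterthwaite) t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # s1^2 / N1
    vb = variance(b) / len(b)  # s2^2 / N2
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples with clearly different variances (2.5 vs 10).
t, df = welch_t_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 2))  # -1.8974 5.88
```

The degrees of freedom are generally not a whole number, which matters when looking up a critical value in a t table.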
  25. Critical Values
      - Assuming α = 0.05 (a standard choice), we can say the following:
      - $\lim_{\nu \to \infty} t_{\alpha/2,\,\nu} = 1.96$
      - So, for ν > 100, the critical t value is approximately 1.96 (e.g., $t_{0.025,\,100} \approx 1.98$).
      - For smaller ν, a t table is needed.
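The rule of thumb above can be written as a lookup. In this sketch, the dictionary is a small excerpt of standard two-sided α = 0.05 critical t values, and the function name is hypothetical. Non-integer (e.g., Welch) degrees of freedom are rounded down to the nearest tabulated value, which gives a slightly larger, more conservative critical value.

```python
from statistics import NormalDist

# Excerpt of two-sided alpha = 0.05 critical values of Student's t, keyed by df.
T_TABLE_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
              10: 2.228, 20: 2.086, 30: 2.042, 60: 2.000, 100: 1.984}


def t_critical_05(df: float) -> float:
    """Approximate two-sided alpha = 0.05 critical value of t (hypothetical helper)."""
    if df > 100:
        # Large-df limit: the standard normal value, about 1.96.
        return round(NormalDist().inv_cdf(0.975), 2)
    # Otherwise use the largest tabulated df not exceeding df (conservative).
    usable = [d for d in T_TABLE_05 if d <= df]
    if not usable:
        raise ValueError("df must be at least 1")
    return T_TABLE_05[max(usable)]


print(t_critical_05(5.88))  # 2.571
print(t_critical_05(250))   # 1.96
```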
  26. Chart of Critical Values [figure]
  27. What does it mean?
      - Rejecting H₀ with a positive t value means that the provider likely has a higher mean billing level than his/her peers, at the 95% confidence level (α = 0.05).
      - Again, it's probability: this does not mean we know for sure that the provider's mean billing level is higher than his/her peers'.
      - The result can be skewed if the underlying data are sufficiently non-Gaussian.
  28. Database Recommendations and Limitations
      - Test for equal variances using the F distribution.
      - Add a test for normality to the database (e.g., Kolmogorov-Smirnov, Shapiro-Wilk).
      - Use a non-parametric test or a transformation when the data are non-Gaussian.
      - Cannot discriminate between a physician billing higher levels because of upcoding and a physician simply seeing sicker patients than his/her peers.
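Two of these recommendations can be sketched with the standard library. The functions below are hypothetical helpers that compute only the test statistics: the F ratio of sample variances (larger over smaller) for checking equal variances, and the Mann-Whitney U statistic as one common non-parametric alternative to the t-test. Turning either into a p-value requires the F or U distribution; if SciPy is available, scipy.stats.f, scipy.stats.mannwhitneyu, and scipy.stats.shapiro (a normality test) are options.

```python
from statistics import variance


def variance_ratio(a, b):
    """F statistic for equal variances: larger sample variance divided by the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a: count pairs (x, y) with x > y as 1 and ties as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical samples.
print(variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 4.0
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))               # 0.0
print(mann_whitney_u([4, 5, 6], [1, 2, 3]))               # 9.0
```

U depends only on the ordering of the values, not their magnitudes, which is why it is less sensitive to non-Gaussian data than the t-test.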
  29. Thank you!
