    Introductory Statistics: Presentation Transcript

    • Introductory Statistics AvMed Health Plans Brian Wells, MSM, MPH
      • “It’s all about observing what you see.”
          • Yogi Berra, Great American philosopher
      Intro to Statistics Wells 2006
    • Statistics – What’s It All About?
      • The hunt for the truth.
      • Finding relationships and causality
      • Separating the wheat from the chaff [lots of chaff out there!]
      • A process of critical thinking and an analytic approach to the literature and research
      • It can help you avoid contracting the dreaded “data rich but information poor” syndrome!
      Wells 2006 Fein 2003
    • Statistics and the “Truth”
      • Can we ever know the truth?
      • Statistics is a way of telling us the likelihood that we have arrived at the truth of a matter [or not!].
      Wells 2006 Fein 2003
    • Statistics – Based on Probability and Multiple Assumptions
      • An approach to searching for the truth that recognizes that little, if anything, is concrete or dichotomous
      • Employs quantitative concepts like “confidence”, “reliability”, and “significance” to get at the “truth”
      Wells 2006 Fein 2003
    • Statistics & the “Truth”
      • Two kinds of statistics:
        • Descriptive – describes “what is”
        • Experimental – makes a point – tries to “prove” something
      • Problem: Almost impossible to prove something, but much easier to disprove
      • Thus: the null hypothesis H₀, i.e., there is no difference
      Wells 2006
    • Types of Statistics
      • Descriptive: describe / communicate what you see without any attempts at generalizing beyond the sample at hand
      • Inferential: determine the likelihood that observed results:
        • Can be generalized to larger populations
        • Occurred by chance rather than as a result of known, specified interventions [H₀]
      Wells 2006 Fein 2003
    • The Null Hypothesis
      • A hypothesis which is tested for possible rejection under the assumption that it is true (usually that observations are the result of chance).
      • Experimental stats works to disprove the null hypothesis, to show that the null hypothesis is wrong, that a difference exists. [e.g., glucose levels and diabetics] In other words, that you have found or discovered something new.
      • The null hypothesis usually represents the opposite of what the researcher may believe to be true.
      Wells 2006 Fein 2003
    • Normality
      • When variability in data points is due to the sum of numerous independent sources, with no one source dominating, the result should be a normal, or Gaussian, distribution (named for Carl Friedrich Gauss).
      • Note: technically, true Gaussian distributions do not occur in nature: Gaussian distributions extend infinitely in both directions. Bell shaped curves are the norm for biological data, with end points to the right and left of the mean.
      • Bell curve vs. normal distribution vs. Gaussian distribution
      • Normal distribution is unfortunately named because it encourages the fallacy that many or all probability distributions are "normal".
      Wells 2006
    • Why Is the Normal Distribution Important?
      • Provides the fundamental mathematical substrate permitting the use and calculation of most statistical analyses.
      • The ND represents one of the empirically verified elementary “truths about the general nature of reality” and its status can be compared with one of the fundamental laws of natural sciences. The exact shape of the ND is defined by a function which has only two parameters: mean and standard deviation.
      Wells 2006 Fein 2003
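      For reference, the two-parameter function referred to above is the Gaussian density with mean μ and standard deviation σ:

          f(x) = (1 / (σ √(2π))) · exp( −(x − μ)² / (2σ²) )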
    • Gaussian Distribution Wells 2006
    • Why Examine the Data?
      • Means are the most commonly used measures of populations
        • Amenable to mathematical manipulation
        • Handy measure of central tendency
      • BUT – can be misleading
      • Examine the curve, median, mode, range, and outliers
      • Look for bi- or multi-modal distributions
      Wells 2006 Fein 2003
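      To make the point above concrete, a small Python sketch (with made-up values) showing how a single outlier can pull the mean well away from the median:

          import statistics

          # Hypothetical lengths of stay in days; the last value is an outlier
          los = [2, 3, 3, 4, 4, 4, 5, 6, 60]

          print("mean:  ", round(statistics.mean(los), 1))  # about 10.1, dragged up by the outlier
          print("median:", statistics.median(los))          # 4, a more typical value here
          print("mode:  ", statistics.mode(los))            # 4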
    • Beware small samples!
      • Fancy statistical tests can bury the truth here!
      • Conversely, not finding any difference means nothing
      • Normality testing is pointless when there are fewer than 20-30 data points – it can be misleading
      Wells 2006 Fein 2003
    • Variability of Measurements
      • Unbiasedness: tendency to arrive at the true or correct value
      • Precision: degree of spread of a series of observations [repeatability; also referred to as reliability]. Can also refer to the number of decimal places reported – can be misleading
      • Accuracy: encompasses both unbiasedness and precision. Accurate measurements are both unbiased and precise. Inaccurate measurements may be either biased or imprecise or both.
      Wells 2006 Fein 2003
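      A minimal simulation, using invented numbers, of the distinction between bias and precision described above:

          import numpy as np

          rng = np.random.default_rng(0)
          true_value = 100.0  # hypothetical true quantity being measured

          # Unbiased but imprecise: centered on the truth, widely scattered
          unbiased_imprecise = rng.normal(loc=true_value, scale=10.0, size=1000)

          # Biased but precise: tightly clustered, but around the wrong value
          biased_precise = rng.normal(loc=true_value + 5.0, scale=0.5, size=1000)

          for name, x in [("unbiased / imprecise", unbiased_imprecise),
                          ("biased / precise", biased_precise)]:
              print(f"{name:22s} mean = {x.mean():7.2f}   SD = {x.std(ddof=1):5.2f}")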
    • The Mean
      • μ vs. X̄
      • Potentially dangerous and misleading value
      • μ = true population mean
      • X̄ = mean of the sample
      Wells 2006 Fein 2003
    • Standard Deviation
      • SD is a measure of scatter or dispersion of data around the mean
      • ~68% of values fall within 1 SD of the mean [+/- one Z]
      • ~95% of values fall within 2 SD of the mean, with ~2.5% in each tail.
      Wells 2006
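      The 68% / 95% rules above can be checked on simulated Gaussian data with a quick Python sketch:

          import numpy as np

          rng = np.random.default_rng(1)
          x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # simulated normal data

          mean, sd = x.mean(), x.std(ddof=1)
          within_1sd = np.mean(np.abs(x - mean) <= 1 * sd)
          within_2sd = np.mean(np.abs(x - mean) <= 2 * sd)

          print(f"within 1 SD: {within_1sd:.3f}")  # roughly 0.68
          print(f"within 2 SD: {within_2sd:.3f}")  # roughly 0.95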
    • Single Sample Means
      • The mean value you calculate from your sample of data points depends on which values you happened to sample and is unlikely to equal the true population mean exactly.
      Wells 2006 Fein 2003
    • Confidence Intervals
      • A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
      • If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the unknown parameter.
      Wells 2006
    • Confidence Intervals
      • The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter (see precision). A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter.
      • Confidence intervals are more informative than the simple results of hypothesis tests (where we decide "reject H₀" or "don't reject H₀") since they provide a range of plausible values for the unknown parameter.
      Wells 2006
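      As a worked example of the definition above, a short Python sketch computing a 95% confidence interval for a mean from a small, entirely hypothetical sample:

          import numpy as np
          from scipy import stats

          # Hypothetical sample of measurements
          x = np.array([4.1, 5.2, 6.0, 5.5, 4.8, 5.9, 6.3, 5.1, 4.7, 5.6])

          n = len(x)
          mean = x.mean()
          sem = x.std(ddof=1) / np.sqrt(n)         # standard error of the mean
          t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% critical value

          low, high = mean - t_crit * sem, mean + t_crit * sem
          print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

      A wider interval (larger standard error or smaller n) signals more uncertainty about the population mean, exactly as described above.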
    • Errors of Analysis/Detection
      • Type 1 [alpha] error – finding a significant difference when none really exists - usually due to random error
      • Type 2 [beta] error – not finding a difference when there is in fact a difference – more likely to occur with smaller sample sizes.
      • Chosen significance levels will impact both of these
      Wells 2006
    • Errors of Analysis
      • Reject the null hypothesis as incorrect – i.e., find a difference:
        • Null hypothesis is correct: Type I ERROR, probability = α
        • Null hypothesis is wrong: CORRECT, probability = 1 − β ["power"]
      • Accept the null hypothesis – decide that there are no effects or differences:
        • Null hypothesis is correct: CORRECT, probability = 1 − α
        • Null hypothesis is wrong: Type II ERROR, probability = β
      Wells 2006
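      The Type I error rate can be seen directly by simulation: if both samples really come from the same population (so H₀ is true) and we test at α = 0.05, roughly 5% of tests will still "find a difference". A rough Python sketch, with arbitrary parameters:

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(2)
          alpha, n_sims, n = 0.05, 5000, 30
          false_positives = 0

          # Both samples are drawn from the same population, so H0 is true by construction
          for _ in range(n_sims):
              a = rng.normal(loc=50.0, scale=10.0, size=n)
              b = rng.normal(loc=50.0, scale=10.0, size=n)
              _, p = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
              if p < alpha:
                  false_positives += 1

          print(f"Type I error rate: {false_positives / n_sims:.3f}  (expected about {alpha})")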
    • Statistical Significance
      • The probability of obtaining findings at least as extreme as those observed by chance alone, assuming the null hypothesis is true
      • p < .05 – less than a 5% likelihood of such findings arising by chance under the null hypothesis
      • p < .05 is commonly used but arbitrary
      Wells 2006 Fein 2003
    • Two-Sample t-Test for Means (unequal variance)
      • Used to determine if two population means are equal
      • H₀: μ₁ = μ₂
      • Hₐ: μ₁ ≠ μ₂
      • Test statistic: t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂)
      • DF (Welch-Satterthwaite): ν = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)² / (n₁ − 1) + (s₂²/n₂)² / (n₂ − 1) ]
      • Reject H₀ if |t| > t(α/2, ν)
      Wells 2006
    • Two-Sample t- Test
      • The E&M database uses the Smith-Satterthwaite test (also known as the Welch test) for unequal variance
      Wells 2006
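      As a rough illustration (with entirely hypothetical billing-level data), the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed by hand and checked against SciPy's built-in unequal-variance t-test:

          import numpy as np
          from scipy import stats

          # Hypothetical mean billing levels: one provider vs. a peer group
          provider = np.array([3.8, 4.1, 4.4, 4.0, 4.5, 4.2, 4.3, 3.9])
          peers    = np.array([3.2, 3.6, 3.1, 3.5, 3.3, 3.7, 3.4, 3.0, 3.6, 3.2])

          m1, m2 = provider.mean(), peers.mean()
          v1, v2 = provider.var(ddof=1), peers.var(ddof=1)
          n1, n2 = len(provider), len(peers)

          # Welch test statistic and Welch-Satterthwaite degrees of freedom
          t = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
          df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

          p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
          print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.4f}")

          # Same test via SciPy
          print(stats.ttest_ind(provider, peers, equal_var=False))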
    • Critical Values
      • Assuming that α = 0.05 (the standard choice), we can say the following:
      • lim ν→∞ t(α/2, ν) = 1.96
      • So, for ν > 100, the critical t-value is approximately 1.96
      • For other values of ν, a t-value table is needed
      Wells 2006
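      Rather than a printed table, the same critical values can be pulled from SciPy's t distribution (a sketch):

          from scipy import stats

          alpha = 0.05
          for df in (10, 30, 100, 1000):
              t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
              print(f"df = {df:4d}:  t-critical = {t_crit:.3f}")
          # The values approach the normal cutoff of 1.96 as df grows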
    • Chart of Critical Values Wells 2006
    • What does it mean?
      • Rejecting H o with a positive T-value means that it is likely that the provider has a higher mean billing level at the 95% confidence level
      • Again, it’s probability – this does not mean we know for sure that the provider’s mean billing level is higher than his/her peers
      • Result can be skewed if underlying data is sufficiently non-Gaussian
      Wells 2006
    • Database Recommendations and Limitations
      • Test for equality of variances using the F-distribution
      • Add a test for normality to the database (e.g., Kolmogorov-Smirnov, Shapiro-Wilk)
      • Use a non-parametric test or a transformation when the data are non-Gaussian
      • Cannot discriminate between a physician billing higher levels because of upcoding and a physician simply seeing sicker patients than his/her peers
      Wells 2006
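      A sketch of how those recommendations might look in Python, using deliberately skewed (hypothetical) data; the Shapiro-Wilk test is one of the normality checks named above, and the Mann-Whitney U test is one common non-parametric alternative:

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(3)
          # Hypothetical, clearly skewed (non-Gaussian) billing data for two groups
          a = rng.lognormal(mean=1.0, sigma=0.6, size=40)
          b = rng.lognormal(mean=1.2, sigma=0.6, size=40)

          # Normality check: a small p-value suggests the data are not Gaussian
          print("Shapiro-Wilk p-values:", stats.shapiro(a).pvalue, stats.shapiro(b).pvalue)

          # Non-parametric comparison of the two groups
          print("Mann-Whitney U:", stats.mannwhitneyu(a, b, alternative="two-sided"))

          # Or transform toward normality (log) and use the Welch t-test
          print("Welch t on log data:", stats.ttest_ind(np.log(a), np.log(b), equal_var=False))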
    • Thank you! Wells 2006