Practical Applications of Statistical Methods in the Clinical Laboratory Roger L. Bertholf, Ph.D., DABCC Associate Professor of Pathology Director of Clinical Chemistry & Toxicology UF Health Science Center/Jacksonville
“ [Statistics are] the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of Man.” [Sir] Francis Galton (1822-1911)
“ There are three kinds of lies:  Lies, damned lies, and statistics” Benjamin Disraeli (1804-1881)
What are statistics, and what are they used for? Descriptive statistics are used to characterize data Statistical analysis is used to distinguish between random and meaningful variations In the laboratory, we use statistics to monitor and verify method performance, and interpret the results of clinical laboratory tests
“ Do not worry about your difficulties in mathematics, I assure you that mine are greater” Albert Einstein (1879-1955)
“ I don't believe in mathematics” Albert Einstein
Summation function The summation function adds a series of values: Σx = x1 + x2 + … + xn
Product function The product function multiplies a series of values: Πx = x1 · x2 · … · xn
The Mean (average) The mean is a measure of the centrality of a set of data.
Mean (arithmetic) The arithmetic mean is the sum of the values divided by their number: x̄ = (Σx)/n
Mean (geometric) The geometric mean is the nth root of the product of the values: (Πx)^(1/n)
Use of the Geometric mean: The geometric mean is primarily used to average ratios or rates of change.
Mean (harmonic) The harmonic mean is the reciprocal of the mean of the reciprocals: n / Σ(1/x)
Example of the use of Harmonic mean: Suppose you spend $6 on pills costing 30 cents per dozen, and $6 on pills costing 20 cents per dozen.  What was the average price of the pills you bought?
Example of the use of Harmonic mean: You spent $12 on 50 dozen pills, so the average cost is 12/50=0.24, or 24 cents. This also happens to be the harmonic mean of 20 and 30:
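As a quick check, Python's statistics module reproduces the 24-cent answer; the prices are the ones from the example above.

```python
from statistics import harmonic_mean

# Price per dozen for the two purchases (cents). Equal dollar amounts
# were spent at each price, so the average cost per dozen is the
# harmonic mean of the two prices, not the arithmetic mean (25).
prices = [30, 20]

avg = harmonic_mean(prices)
print(avg)  # 24 cents per dozen, matching the $12 / 50 dozen calculation
```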
Root mean square (RMS) The RMS is the square root of the mean of the squared values: √((Σx²)/n)
For the data set: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10:
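A minimal sketch for the data set above, showing the RMS and how it compares with the arithmetic mean:

```python
from math import sqrt

data = list(range(1, 11))  # the data set 1 through 10

mean = sum(data) / len(data)                      # arithmetic mean = 5.5
rms = sqrt(sum(x * x for x in data) / len(data))  # root mean square

# The RMS weights large values more heavily, so it is always >= the mean
print(mean, round(rms, 3))
```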
The Weighted Mean Each value is multiplied by a weighting factor w before averaging: x̄w = (Σwx)/(Σw)
Other measures of centrality Mode
The Mode The mode is the value that occurs most often
Other measures of centrality Mode Midrange
The Midrange The midrange is the mean of the highest and lowest values
Other measures of centrality Mode Midrange Median
The Median The median is the value for which half of the remaining values are above and half are below it.  I.e. , in an ordered array of 15 values, the 8th value is the median.  If the array has 16 values, the median is the mean of the 8th and 9th values.
Example of the use of median vs. mean: Suppose you’re thinking about building a house in a certain neighborhood, and the real estate agent tells you that the average (mean) size house in that area is 2,500 sq. ft.  Astutely, you ask “What’s the median size?”  The agent replies “1,800 sq. ft.” What does this tell you about the sizes of the houses in the neighborhood?
Measuring variance Two sets of data may have similar means, but otherwise be very dissimilar.  For example, males and females have similar baseline LH concentrations, but there is much wider variation in females. How do we express quantitatively the amount of variation in a data set?
 
The Variance
The Variance The variance is the mean of the squared differences between individual data points and the mean of the array. Or, after simplifying, the mean of the squares minus the squared mean.
The Variance σ² = Σ(x − µ)²/N = (Σx²)/N − µ²
The Variance In what units is the variance? Is that a problem?
The Standard Deviation
The Standard Deviation The standard deviation is the square root of the variance.  Standard deviation is  not  the mean difference between individual data points and the mean of the array.
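Both forms of the variance from the slides, and the standard deviation derived from them, can be verified on a small hypothetical data set:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
n = len(data)
mean = sum(data) / n

# Variance as the mean of the squared differences from the mean...
var = sum((x - mean) ** 2 for x in data) / n

# ...which simplifies to the mean of the squares minus the squared mean
var_alt = sum(x * x for x in data) / n - mean ** 2

# Standard deviation: the square root of the variance
sd = var ** 0.5
print(var, var_alt, sd)
```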
The Standard Deviation In what units is the standard deviation? Is that a problem?
The Coefficient of Variation* CV = (s / x̄) × 100% * Sometimes called the Relative Standard Deviation (RSD or %RSD)
Standard Deviation (or Error) of the Mean The standard deviation of an average decreases by the reciprocal of the square root of the number of data points used to calculate the average.
Exercises How many measurements must we average to improve our precision by a factor of 2?
Answer To improve precision by a factor of 2: since the SEM decreases as 1/√N, we need √N = 2, i.e., we must average N = 4 measurements.
Exercises How many measurements must we average to improve our precision by a factor of 2? How many to improve our precision by a factor of 10?
Answer To improve precision by a factor of 10: √N = 10, so we must average N = 100 measurements.
Exercises How many measurements must we average to improve our precision by a factor of 2? How many to improve our precision by a factor of 10? If an assay has a CV of 7%, and we decide to run samples in duplicate and average the measurements, what should the resulting CV be?
Answer Improvement in CV by running duplicates: the CV decreases by a factor of √2, so 7%/√2 ≈ 4.9%.
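All three exercises follow from SEM = s/√N; a minimal sketch using the 7% CV from the exercise:

```python
from math import sqrt

# SEM = s / sqrt(N): averaging N measurements shrinks the CV by 1/sqrt(N)
cv_single = 7.0  # % CV of a single measurement

cv_duplicate = cv_single / sqrt(2)   # run in duplicate and average
n_for_half   = 2 ** 2                # factor-of-2 improvement -> N = 4
n_for_tenth  = 10 ** 2               # factor-of-10 improvement -> N = 100

print(round(cv_duplicate, 2), n_for_half, n_for_tenth)
```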
Population vs. Sample standard deviation When we speak of a population, we’re referring to the entire data set, which will have a mean µ:
Population vs. Sample standard deviation When we speak of a population, we’re referring to the entire data set, which will have a mean µ. When we speak of a sample, we’re referring to a subset of the population, customarily designated x̄ (“x-bar”). Which is used to calculate the standard deviation?
“ Sir, I have found you an argument. I am not obliged to find you an understanding.” Samuel Johnson (1709-1784)
Population vs. Sample standard deviation
Distributions Definition
Statistical (probability) Distribution A statistical distribution is a mathematically-derived probability function that can be used to predict the characteristics of certain applicable real populations Statistical methods based on probability distributions are  parametric , since certain assumptions are made about the data
Distributions Definition Examples
Binomial distribution The binomial distribution applies to events that have two possible outcomes. The probability of r successes in n attempts, when the probability of success in any individual attempt is p, is given by: P(r) = [n! / (r!(n − r)!)] p^r (1 − p)^(n − r)
Example What is the probability that 10 of the 12 babies born one busy evening in your hospital will be girls?
Solution P(10 girls in 12 births) = [12! / (10!·2!)] (0.5)^10 (0.5)^2 = 66/4096 ≈ 0.016, or about 1.6%
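The solution can be checked with the binomial formula; math.comb is the binomial coefficient n!/(r!(n − r)!):

```python
from math import comb

n, r, p = 12, 10, 0.5  # 12 births, 10 girls, P(girl) = 0.5

# Binomial probability of exactly r successes in n attempts
prob = comb(n, r) * p**r * (1 - p)**(n - r)
print(round(prob, 4))  # about 0.016, i.e. roughly 1.6%
```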
Distributions Definition Examples Binomial
“ God does arithmetic” Karl Friedrich Gauss (1777-1855)
The Gaussian Distribution What is the Gaussian distribution?
Demonstration: rows of uniformly distributed random numbers (63, 81, 36, 12, 28, 7, 79, 52, 96, 17, 22, 4, 61, 85, …) are added together, row upon row. The individual numbers are uniformly distributed, but as more and more rows are summed, the distribution of the sums approaches a Gaussian curve [figure: probability vs. x].
The Gaussian Probability Function The probability of x in a Gaussian distribution with mean µ and standard deviation σ is given by: P(x) = (1/(σ√(2π))) e^(−(x − µ)²/2σ²)
The Gaussian Distribution What is the Gaussian distribution? What types of data fit a Gaussian distribution?
“ Like the ski resort full of girls hunting for husbands and husbands hunting for girls, the situation is not as symmetrical as it might seem.” Alan Lindsay Mackay (1926- )
Are these Gaussian? Human height Outside temperature Raindrop size Blood glucose concentration Serum CK activity QC results Proficiency results
The Gaussian Distribution What is the Gaussian distribution? What types of data fit a Gaussian distribution? What is the advantage of using a Gaussian distribution?
Gaussian probability distribution [figure: probability curve with markers at µ±1σ, µ±2σ, and µ±3σ; about 67% of observations fall within ±1σ and 95% within ±2σ]
What are the odds of an observation . . . more than 1σ from the mean (+/-) more than 2σ greater than the mean more than 3σ from the mean
Some useful Gaussian probabilities
Range / Probability / Odds of exceeding
±1.00σ 68.3% 1 in 3
±1.64σ 90.0% 1 in 10
±1.96σ 95.0% 1 in 20
±2.58σ 99.0% 1 in 100
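These probabilities can be reproduced from the Gaussian error function; central_prob is a small helper defined here for illustration, not a library call:

```python
from math import erf, sqrt

def central_prob(z):
    """Probability that a Gaussian observation falls within +/- z sigma."""
    return erf(z / sqrt(2))

# 1.64 sigma gives ~89.9%; the table's 90.0% corresponds to z = 1.645
for z in (1.00, 1.64, 1.96, 2.58):
    print(z, round(central_prob(z) * 100, 1))
```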
[On the Gaussian curve]  “Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.” Gabriel Lippman (1845-1921 )
Distributions Definition Examples Binomial Gaussian
"Life is good for only two things, discovering mathematics and teaching mathematics" Siméon Poisson (1781-1840)
The Poisson Distribution The Poisson distribution predicts the frequency of r events occurring randomly in time, when the expected frequency is µ: P(r) = (µ^r e^−µ)/r!
Examples of events described by a Poisson distribution Lightning Accidents Laboratory?
A very useful property of the Poisson distribution: the variance equals the mean, so the standard deviation of a count of N random events is √N.
Using the Poisson distribution How many counts must be collected in an RIA in order to ensure an analytical CV of 5% or less?
Answer The CV of a count of N events is √N/N = 1/√N. For 1/√N ≤ 0.05, N must be at least 400 counts.
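A sketch of the counting-statistics reasoning, where 1/√N is the CV of a Poisson count:

```python
from math import sqrt, ceil

# For Poisson counting statistics the SD of a count N equals sqrt(N),
# so the CV is sqrt(N)/N = 1/sqrt(N).
target_cv = 0.05

n_required = ceil(1 / target_cv**2)  # CV <= 0.05 requires N >= 400 counts
print(n_required)

# check: the CV at 400 counts
print(sqrt(n_required) / n_required)
```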
Distributions Definition Examples Binomial Gaussian Poisson
The Student’s t Distribution When a small sample is selected from a large population, we sometimes have to make certain assumptions in order to apply statistical methods
Questions about our sample Is the mean of our sample, x̄, the same as the mean of the population, µ? Is the standard deviation of our sample, s, the same as the standard deviation of the population, σ? Unless we can answer both of these questions affirmatively, we don’t know whether our sample has the same distribution as the population from which it was drawn.
Recall that the Gaussian distribution is defined by the probability function: P(x) = (1/(σ√(2π))) e^(−(x − µ)²/2σ²) Note that the exponential factor contains both µ and σ, both population parameters. The factor is often simplified by making the substitution: z = (x − µ)/σ
The variable z in the equation z = (x − µ)/σ is distributed according to a unit Gaussian, since it has a mean of zero and a standard deviation of 1
Gaussian probability distribution [figure: unit Gaussian curve in z, with about 67% of observations between z = −1 and +1 and 95% between z = −2 and +2]
But if we use the sample mean and standard deviation instead, we get: t = (x − x̄)/s and we’ve defined a new quantity, t, which is not distributed according to the unit Gaussian. It is distributed according to the Student’s t distribution.
Important features of the Student’s t distribution Use of the t statistic assumes that the parent distribution is Gaussian The degree to which the t distribution approximates a Gaussian distribution depends on N (the degrees of freedom) As N gets larger (above 30 or so), the differences between t and z become negligible
Application of Student’s t distribution to a sample mean The Student’s t statistic can also be used to analyze differences between the sample mean and the population mean: t = (x̄ − µ)/(s/√N)
Comparison of Student’s t and Gaussian distributions Note that, for a sufficiently large N (>30),  t  can be replaced with  z , and a Gaussian distribution can be assumed
Exercise The mean age of the 20 participants in one workshop is 27 years, with a standard deviation of 4 years.  Next door, another workshop has 16 participants with a mean age of 29 years and standard deviation of 6 years.  Is the second workshop attracting older technologists?
Preliminary analysis Is the population Gaussian? Can we use a Gaussian distribution for our sample? What statistic should we calculate?
Solution First, calculate the t statistic for the two means: t = |x̄1 − x̄2| / √(s1²/N1 + s2²/N2)
Solution, cont. Next, determine the degrees of freedom: df = N1 + N2 − 2 = 20 + 16 − 2 = 34
Statistical Tables
Conclusion Since 1.16 is less than 1.64 (the t value corresponding to 90% confidence limit), the difference between the mean ages for the participants in the two workshops is not significant
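A sketch of the calculation; this uses the common unpooled form of the two-sample t, which gives t ≈ 1.15 rather than the slide's 1.16 (the small difference is rounding or a pooled-variance variant), and the conclusion is the same:

```python
from math import sqrt

# Workshop 1: N = 20, mean 27 y, SD 4; Workshop 2: N = 16, mean 29 y, SD 6
n1, m1, s1 = 20, 27.0, 4.0
n2, m2, s2 = 16, 29.0, 6.0

# Unpooled two-sample t statistic
t = abs(m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)
df = n1 + n2 - 2  # 34 degrees of freedom

# Well below the 90% critical value of 1.64: not significant
print(round(t, 2), df)
```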
The Paired t Test Suppose we are comparing two sets of data in which each value in one set has a corresponding value in the other.  Instead of calculating the difference between the means of the two sets, we can calculate the mean difference between data pairs.
Instead of the difference between the two means, we use the mean of the paired differences, d̄, and its standard deviation, sd, to calculate t: t = d̄ / (sd/√N)
Advantage of the Paired t If the type of data permit paired analysis, the paired t test is much more sensitive than the unpaired t. Why? Because pairing removes the variability between subjects: each difference is measured within a single pair, so the variance of the differences is usually much smaller than the variance of either data set.
Applications of the Paired t Method correlation Comparison of therapies
Distributions Definition Examples Binomial Gaussian Poisson Student’s t
The χ² (Chi-square) Distribution There is a general formula that relates actual measurements to their predicted values: χ² = Σ (observed − expected)²/expected
The χ² (Chi-square) Distribution A special (and very useful) application of the χ² distribution is to frequency data
Exercise In your hospital, you have had 83 cases of iatrogenic strep infection in your last 725 patients.  St. Elsewhere, across town, reports 35 cases of strep in their last 416 patients. Do you need to review your infection control policies?
Analysis If your infection control policy is roughly as effective as St. Elsewhere’s, we would expect that the rates of strep infection for the two hospitals would be similar. The expected frequency, then, would be the pooled average rate: (83 + 35)/(725 + 416) ≈ 0.103, or 10.3%
Calculating χ² First, calculate the expected frequencies at your hospital (f1) and St. Elsewhere (f2): f1 = 725 × 0.103 ≈ 75 and f2 = 416 × 0.103 ≈ 43
Calculating χ² Next, we sum the squared differences between actual and expected frequencies, each divided by the expected frequency: χ² = (83 − 75)²/75 + (35 − 43)²/43 ≈ 2.35
Degrees of freedom In general, when comparing  k  sample proportions, the degrees of freedom for   2  analysis are  k  - 1.  Hence, for our problem, there is 1 degree of freedom.
Conclusion A table of χ² values lists 3.841 as the χ² corresponding to a probability of 0.05. So the variation (χ²) between strep infection rates at the two hospitals is within statistically-predicted limits, and therefore is not significant.
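A sketch of the χ² computation as the slides set it up, comparing the infection counts to their expected frequencies (a full 2×2 table that also includes the uninfected patients gives a slightly larger χ², still below 3.841):

```python
# Strep infections: 83 of 725 at your hospital, 35 of 416 at St. Elsewhere
obs = [83, 35]
n   = [725, 416]

# Expected frequencies assume both hospitals share the pooled rate
pooled = sum(obs) / sum(n)          # about 0.103
exp = [pooled * ni for ni in n]     # about 75.0 and 43.0

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(round(chi2, 2))  # below the 3.841 critical value at p = 0.05
```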
Distributions Definition Examples Binomial Gaussian Poisson Student’s t χ²
The  F  distribution The  F  distribution predicts the expected differences between the  variances  of two samples This distribution has also been called  Snedecor’s F distribution, Fisher distribution, and variance ratio distribution
The F distribution The F statistic is simply the ratio of two variances (by convention, the larger variance is the numerator): F = V1/V2
Applications of the  F  distribution There are several ways the  F  distribution can be used.  Applications of the  F  statistic are part of a more general type of statistical analysis called  analysis of variance  (ANOVA).  We’ll see more about ANOVA later.
Example You’re asked to do a “quick and dirty” correlation between three whole blood glucose analyzers.  You prick your finger and measure your blood glucose four times on each of the analyzers. Are the results equivalent?
Data
Analysis The mean glucose concentrations for the three analyzers are 70, 85, and 76. If the three analyzers are equivalent, then we can assume that all of the results are drawn from an overall population with mean µ and variance σ².
Analysis, cont. Approximate µ by calculating the mean of the means: (70 + 85 + 76)/3 = 77
Analysis, cont. Calculate the variance of the means: s²x̄ = [(70 − 77)² + (85 − 77)² + (76 − 77)²]/2 = 57
Analysis, cont. But what we really want is the variance of the population. Recall that: σ²x̄ = σ²/N
Analysis, cont. Since we just calculated the variance of the means, we can solve for σ²: σ² = N · σ²x̄ = 4 × 57 = 228
Analysis, cont. So we now have an estimate of the population variance, which we’d like to compare to the real variance to see whether they differ.  But what is the real variance? We don’t know, but we can calculate the variance based on our individual measurements.
Analysis, cont. If all the data were drawn from a larger population, we can assume that the variances are the same, and we can simply average the variances for the three data sets.
Analysis, cont. Now calculate the  F  statistic:
Conclusion A table of  F  values indicates that 4.26 is the limit for the  F  statistic at a 95% confidence level (when the appropriate degrees of freedom are selected).  Our value of 10.6 exceeds that, so we conclude that there is significant variation between the analyzers.
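The slide's raw data are not reproduced here, so this sketch uses hypothetical replicates with the same group means (70, 85, 76) to show the mechanics of the F calculation; the numbers therefore differ from the slide's F = 10.6:

```python
from statistics import mean, variance

# Hypothetical replicate glucose results (mg/dL), four per analyzer
groups = [
    [68, 70, 72, 70],   # analyzer 1, mean 70
    [83, 85, 87, 85],   # analyzer 2, mean 85
    [74, 76, 78, 76],   # analyzer 3, mean 76
]
n = len(groups[0])

means = [mean(g) for g in groups]

# Between-analyzer estimate of the population variance:
# var(means) estimates sigma^2 / n, so multiply by n
between = n * variance(means)

# Within-analyzer estimate: average of the three group variances
within = mean(variance(g) for g in groups)

F = between / within
print(round(F, 1))  # compare to the 95% critical value of 4.26
```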
Distributions Definition Examples Binomial Gaussian Poisson Student’s t χ² F
Unknown or irregular distribution Transform
Log transform [figure: a skewed distribution of x appears approximately Gaussian when plotted against log x]
Unknown or irregular distribution Transform Non-parametric methods
Non-parametric methods Non-parametric methods make no assumptions about the distribution of the data There are non-parametric methods for characterizing data, as well as for comparing data sets These methods are also called  distribution-free, robust,  or sometimes  non-metric  tests
Application to Reference Ranges The concentrations of most clinical analytes are not usually distributed in a Gaussian manner.  Why? How do we determine the reference range (limits of expected values) for these analytes?
Application to Reference Ranges Reference ranges for normal, healthy populations are customarily defined as the “central 95%”. An entirely non-parametric way of expressing this is to eliminate the upper and lower 2.5% of data, and use the remaining upper and lower values to define the range. NCCLS recommends 120 values, dropping the two highest and two lowest.
Application to Reference Ranges What happens when we want to compare one reference range with another?  This is precisely what CLIA ‘88 requires us to do. How do we do this?
“ Everything should be made as simple as possible, but not simpler.” Albert Einstein
Solution #1:  Simple comparison Suppose we just do a small internal reference range study, and compare our results to the manufacturer’s range. How do we compare them? Is this a valid approach?
NCCLS recommendations Inspection Method: Verify reference populations are equivalent Limited Validation: Collect 20 reference specimens No more than 2 exceed range Repeat if failed Extended Validation: Collect 60 reference specimens; compare ranges.
Solution #2: Mann-Whitney* Rank the normal values (x1, x2, x3 … xn) together with the reference population (y1, y2, y3 … yn): x1, y1, x2, x3, y2, y3 … xn, yn Count the number of y values that follow each x, and call the sum Ux. Calculate Uy also. * Also called the U test, rank sum test, or Wilcoxon’s test.
Mann-Whitney, cont. It should be obvious that: Ux + Uy = NxNy If the two distributions are the same, then: Ux = Uy = ½NxNy Large differences between Ux and Uy indicate that the distributions are not equivalent
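A minimal sketch of the U count; mann_whitney_u and the x/y data are hypothetical illustrations, not a library routine, and ties are counted as half per the usual convention:

```python
def mann_whitney_u(xs, ys):
    """Count, over every (x, y) pair, how often y exceeds x (ties = 1/2).
    Returns (Ux, Uy); note Ux + Uy = len(xs) * len(ys)."""
    ux = sum(1.0 if y > x else 0.5 if y == x else 0.0
             for x in xs for y in ys)
    return ux, len(xs) * len(ys) - ux

# Hypothetical in-house (x) and manufacturer (y) reference values
x = [4.1, 4.5, 4.8, 5.2, 5.6]
y = [4.3, 4.9, 5.1, 5.5, 5.9]

ux, uy = mann_whitney_u(x, y)
print(ux, uy)  # values near (1/2) * NxNy suggest equivalent distributions
```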
“‘ Obvious’ is the most dangerous word in mathematics.”  Eric Temple Bell (1883-1960)
Solution #3: Run test In the run test, order the values in the two distributions as before: x1, y1, x2, x3, y2, y3 … xn, yn Add up the number of runs (consecutive values from the same distribution). If the two data sets are randomly selected from one population, the values will interleave and there will be many runs; too few runs indicate that the distributions differ.
Solution #4:  The Monte Carlo method Sometimes, when we don’t know anything about a distribution, the best thing to do is independently test its characteristics.
The Monte Carlo method [figure: repeated random samples of size N are drawn from the reference population, and the mean and SD of each sample are calculated]
The Monte Carlo method With the Monte Carlo method, we have simulated the test we wish to apply--that is, we have randomly selected samples from the parent distribution, and determined whether our in-house data are in agreement with the randomly-selected samples.
Analysis of paired data For certain types of laboratory studies, the data we gather is  paired We typically want to know how closely the paired data  agree We need quantitative measures of the extent to which the data agree or disagree Examples?
Examples of paired data Method correlation data Pharmacodynamic effects Risk analysis Pathophysiology
Correlation [figure: scatter plot of paired x and y values, both axes 0 to 50]
Linear regression (least squares) Linear regression analysis generates an equation for a straight line y = mx + b where  m  is the slope of the line and  b  is the value of  y  when  x  = 0 (the  y-intercept ). The calculated equation  minimizes  the differences between actual  y  values and the linear regression line.
Correlation [figure: scatter plot, axes 0 to 50] y = 1.031x - 0.024
Covariance Do  x  and  y  values vary in concert, or randomly?
What if  y   increases  when  x  increases? What if  y   decreases  when  x  increases? What if  y  and  x  vary  independently ?
Covariance It is clear that the greater the covariance, the stronger the relationship between  x  and  y . But . . .  what about units? e.g.,  if you measure glucose in  mg/dL , and I measure it in  mmol/L , who’s likely to have the highest covariance?
The Correlation Coefficient The correlation coefficient divides the covariance by the product of the standard deviations: r = cov(x, y)/(sx·sy)
The Correlation Coefficient The correlation coefficient is a unitless quantity that roughly indicates the degree to which x and y vary in the same direction. r is useful for detecting relationships between parameters, but it is not a very sensitive measure of the spread.
Correlation [figure: tight scatter plot, axes 0 to 50] y = 1.031x - 0.024, r = 0.9986
Correlation [figure: scatter plot with more spread, axes 0 to 50] y = 1.031x - 0.024, r = 0.9894
Standard Error of the Estimate The linear regression equation gives us a way to calculate an “estimated” y for any given x value, given the symbol ŷ (y-hat): ŷ = mx + b
Standard Error of the Estimate Now what we are interested in is the average difference between the measured y and its estimate, ŷ: sy/x = √(Σ(y − ŷ)²/(N − 2))
Correlation [figure: tight scatter plot, axes 0 to 50] y = 1.031x - 0.024, r = 0.9986, sy/x = 1.83
Correlation [figure: scatter plot with more spread, axes 0 to 50] y = 1.031x - 0.024, r = 0.9894, sy/x = 5.32
Standard Error of the Estimate If we assume that the errors in the  y  measurements are Gaussian ( is that a safe assumption? ), then the standard error of the estimate gives us the boundaries within which 67% of the  y  values will fall.  2s y/x  defines the 95% boundaries..
Limitations of linear regression Assumes no error in  x  measurement Assumes that variance in  y  is constant throughout concentration range
Alternative approaches Weighted  linear regression analysis can compensate for non-constant variance among  y  measurements Deming  regression analysis takes into account variance in the  x  measurements Weighted Deming  regression analysis allows for both
Evaluating method performance Precision
Method Precision Within-run:  10 or 20 replicates What types of errors does within-run precision reflect? Day-to-day:  NCCLS recommends evaluation over 20 days What types of errors does day-to-day precision reflect?
Evaluating method performance Precision Sensitivity
Method Sensitivity The  analytical  sensitivity of a method refers to the lowest concentration of analyte that can be reliably detected. The most common definition of sensitivity is the analyte concentration that will result in a signal two or three standard deviations above background.
[figure: signal vs. time, with the signal/noise threshold marked]
Other measures of sensitivity Limit of Detection (LOD) is sometimes defined as the concentration producing an S/N > 3. In drug testing, LOD is customarily defined as the lowest concentration that meets all identification criteria. Limit of Quantitation (LOQ) is sometimes defined as the concentration producing an S/N > 5. In drug testing, LOQ is customarily defined as the lowest concentration that can be measured within ±20%.
Question At an S/N ratio of 5, what is the minimum CV of the measurement? If the S/N is 5, 20% of the measured signal is noise, which is random.  Therefore, the CV must be at least 20%.
Evaluating method performance Precision Sensitivity Linearity
Method Linearity A linear relationship between concentration and signal is not absolutely necessary, but it is highly desirable.  Why? CLIA ‘88 requires that the linearity of analytical methods is verified on a periodic basis.
Ways to evaluate linearity Visual/linear regression
[figure: signal vs. concentration with a fitted regression line]
Outliers We can eliminate any point that differs from the next highest value by more than 0.765 (p=0.05) times the spread between the highest and lowest values (Dixon test). Example: 4, 5, 6, 13 (13 − 4) × 0.765 = 6.89 Since 13 − 6 = 7 exceeds 6.89, the value 13 may be rejected as an outlier.
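The Dixon check from the example, written as a small helper; dixon_outlier is illustrative, and the 0.765 critical ratio is the slide's p = 0.05 value for this sample size:

```python
def dixon_outlier(values, ratio=0.765):
    """Flag the highest value as an outlier when its gap from the next
    highest exceeds `ratio` times the total spread (Dixon test)."""
    v = sorted(values)
    spread = v[-1] - v[0]
    return (v[-1] - v[-2]) > ratio * spread

# 13 - 6 = 7 exceeds 0.765 * (13 - 4) = 6.89, so 13 is flagged
print(dixon_outlier([4, 5, 6, 13]))
```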
Limitation of linear regression method If the analytical method has a high variance (CV), it is likely that small deviations from linearity will not be detected due to the high standard error of the estimate
[figure: signal vs. concentration; high scatter obscures a small deviation from linearity]
Ways to evaluate linearity Visual/linear regression Quadratic regression
Quadratic regression Recall that, for linear data, the relationship between x and y can be expressed as y = f(x) = a + bx
Quadratic regression A curve is described by the quadratic equation: y = f(x) = a + bx + cx² which is identical to the linear equation except for the addition of the cx² term.
Quadratic regression It should be clear that the smaller the  x 2  coefficient,  c , the closer the data are to linear (since the equation reduces to the linear form when  c  approaches 0). What is the drawback to this approach?
Ways to evaluate linearity Visual/linear regression Quadratic regression Lack-of-fit analysis
Lack-of-fit analysis There are two components of the variation from the regression line Intrinsic variability of the method Variability due to deviations from linearity The problem is to distinguish between these two sources of variability What statistical test do you think is appropriate?
[figure: signal vs. concentration with replicate measurements at each concentration level]
Lack-of-fit analysis The ANOVA technique requires that method variance is constant at all concentrations.  Cochran’s test is used to test whether this is the case.
Lack-of-fit method calculations Total sum of the squares:   the variance calculated from all of the  y  values Linear regression sum of the squares:   the variance of  y  values from the regression line Residual sum of the squares:   difference between TSS and LSS Lack of fit sum of the squares:   the RSS minus the pure error (sum of variances)
Lack-of-fit analysis The LOF is compared to the pure error to give the “ G ” statistic (which is actually  F ) If the LOF is small compared to the pure error,  G  is small and the method is linear If the LOF is large compared to the pure error,  G  will be large, indicating significant deviation from linearity
Significance limits for  G 90% confidence = 2.49 95% confidence = 3.29 99% confidence = 5.42
“ If your experiment needs statistics, you ought to have done a better experiment.” Ernest Rutherford (1871-1937)
Evaluating Clinical Performance of laboratory tests The  clinical performance  of a laboratory test defines how well it predicts disease The  sensitivity  of a test indicates the likelihood that it will be positive when disease is present
Clinical Sensitivity If TP is the number of “true positives”, and FN is the number of “false negatives”, the sensitivity is defined as: sensitivity = TP/(TP + FN)
Example Of 25 admitted cocaine abusers, 23 tested positive for urinary benzoylecgonine and 2 tested negative.  What is the sensitivity of the urine screen?
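The sensitivity calculation for this example:

```python
# Cocaine screen: 23 true positives and 2 false negatives among 25 abusers
TP, FN = 23, 2

sensitivity = TP / (TP + FN)
print(f"{sensitivity:.0%}")  # 92%
```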
Evaluating Clinical Performance of laboratory tests The  clinical performance  of a laboratory test defines how well it predicts disease The  sensitivity  of a test indicates the likelihood that it will be positive when disease is present The  specificity  of a test indicates the likelihood that it will be negative when disease is absent
Clinical Specificity If TN is the number of “true negative” results, and FP is the number of falsely positive results, the specificity is defined as: specificity = TN/(TN + FP)
Example What would you guess is the specificity of any particular clinical laboratory test?  (Choose any one you want)
Answer Since reference ranges are customarily set to include the central 95% of values in healthy subjects, we expect 5% of values from healthy people to be “abnormal”--this is the false positive rate. Hence, the specificity of most clinical tests is no better than 95%.
Sensitivity vs. Specificity Sensitivity and specificity are inversely related.
[figure: overlapping distributions of marker concentration for the disease-absent (−) and disease-present (+) populations]
Sensitivity vs. Specificity Sensitivity and specificity are inversely related. How do we determine the best compromise between sensitivity and specificity?
Receiver Operating Characteristic [figure: ROC curve plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity)]
Evaluating Clinical Performance of laboratory tests The  sensitivity  of a test indicates the likelihood that it will be positive when disease is present The  specificity  of a test indicates the likelihood that it will be negative when disease is absent The  predictive value  of a test indicates the probability that the test result correctly classifies a patient
Predictive Value The predictive value of a clinical laboratory test takes into account the prevalence of a certain disease, to quantify the probability that a positive test is associated with the disease in a randomly-selected individual, or alternatively, that a negative test is associated with health.
Illustration Suppose you have invented a new screening test for Addison disease. The test correctly identified 98 of 100 patients with confirmed Addison disease ( What is the sensitivity? ) The test was positive in only 2 of 1000 patients with no evidence of Addison disease ( What is the specificity? )
Test performance The sensitivity is 98.0% The specificity is 99.8% But Addison disease is a rare disorder--incidence = 1:10,000 What happens if we screen 1 million people?
Analysis In 1 million people, there will be 100 cases of Addison disease. Our test will identify 98 of these cases ( TP ) Of the 999,900 non-Addison subjects, the test will be positive in 0.2%, or about 2,000 ( FP ).
Predictive value of the positive test The predictive value is the % of all positives that are true positives: PV+ = TP/(TP + FP) = 98/(98 + 2,000) ≈ 4.7%
What about the negative predictive value? TN = 999,900 − 2,000 = 997,900 FN = 100 − 98 = 2 PV− = TN/(TN + FN) = 997,900/997,902 ≈ 99.9998%
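A sketch pulling the screening example together; note that FN is the 2 missed cases (100 true cases at 98% sensitivity):

```python
# Addison screen applied to 1,000,000 people (prevalence 1 in 10,000)
TP, FN = 98, 2            # 100 true cases, sensitivity 98%
FP = 2_000                # about 0.2% of the 999,900 unaffected
TN = 999_900 - FP

ppv = TP / (TP + FP)                # predictive value of a positive
npv = TN / (TN + FN)                # predictive value of a negative
efficiency = (TP + TN) / 1_000_000  # fraction classified correctly

print(round(ppv, 3), round(npv, 6), round(efficiency, 4))
```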
Summary of predictive value Predictive value describes the usefulness of a clinical laboratory test in the real world. Or does it?
Lessons about predictive value Even when you have a very good test, it is generally not cost effective to screen for diseases which have low incidence in the general population.  Exception? The higher the clinical suspicion, the better the predictive value of the test.  Why?
Efficiency We can combine the PV+ and PV− to give a quantity called the efficiency: efficiency = (TP + TN)/(TP + TN + FP + FN) The efficiency is the percentage of all patients that are classified correctly by the test result.
Efficiency of our Addison screen (98 + 997,900)/1,000,000 ≈ 99.8%
“ To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.” Ronald Aylmer Fisher (1890 - 1962)
Application of Statistics to Quality Control We expect quality control to fit a Gaussian distribution We can use Gaussian statistics to predict the variability in quality control values What sort of  tolerance  will we allow for variation in quality control values? Generally, we will question variations that have a statistical probability of less than 5%
“ He uses statistics as a drunken man uses lamp posts -- for support rather than illumination.” Andrew Lang (1844-1912)
Westgard’s rules Rule (approximate probability of random occurrence): 1-2s (1 in 20), 1-3s (1 in 300), 2-2s (1 in 400), R-4s (1 in 800), 4-1s (1 in 600), 10-x (1 in 1000)
Some examples [four Levey-Jennings control charts, each plotting QC results against the mean and ±1, ±2, and ±3 SD limits]
“ In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.” Paul Adrien Maurice Dirac (1902- 1984)

  • 34.
    The Standard DeviationIn what units is the standard deviation? Is that a problem?
  • 35.
    The Coefficient ofVariation * * Sometimes called the Relative Standard Deviation (RSD or %RSD)
  • 36.
    Standard Deviation (orError) of the Mean The standard deviation of an average decreases by the reciprocal of the square root of the number of data points used to calculate the average.
  • 37.
    Exercises How manymeasurements must we average to improve our precision by a factor of 2?
  • 38.
    Answer To improveprecision by a factor of 2:
  • 39.
    Exercises How manymeasurements must we average to improve our precision by a factor of 2? How many to improve our precision by a factor of 10?
  • 40.
    Answer To improveprecision by a factor of 10:
  • 41.
    Exercises How manymeasurements must we average to improve our precision by a factor of 2? How many to improve our precision by a factor of 10? If an assay has a CV of 7%, and we decide run samples in duplicate and average the measurements, what should the resulting CV be?
  • 42.
    Answer Improvement inCV by running duplicates:
  • 43.
    Population vs. Samplestandard deviation When we speak of a population , we’re referring to the entire data set, which will have a mean  :
  • 44.
    Population vs. Samplestandard deviation When we speak of a population , we’re referring to the entire data set, which will have a mean  When we speak of a sample , we’re referring to a subset of the population, customarily designated “x-bar” Which is used to calculate the standard deviation?
  • 45.
    “ Sir, Ihave found you an argument. I am not obliged to find you an understanding.” Samuel Johnson (1709-1784)
  • 46.
    Population vs. Samplestandard deviation
  • 47.
  • 48.
    Statistical (probability) DistributionA statistical distribution is a mathematically-derived probability function that can be used to predict the characteristics of certain applicable real populations Statistical methods based on probability distributions are parametric , since certain assumptions are made about the data
  • 49.
  • 50.
    Binomial distribution The binomial distribution applies to events that have two possible outcomes . The probability of r successes in n attempts, when the probability of success in any individual attempt is p , is given by:
  • 51.
    Example What isthe probability that 10 of the 12 babies born one busy evening in your hospital will be girls?
  • 52.
  • 53.
  • 54.
    “ God doesarithmetic” Karl Friedrich Gauss (1777-1855)
  • 55.
    The Gaussian DistributionWhat is the Gaussian distribution?
  • 56.
    63 81 3612 28 7 79 52 96 17 22 4 61 85 etc.
  • 57.
  • 58.
    63 81 3612 28 7 79 52 96 17 22 4 61 85 22 73 54 33 99 5 61 28 58 24 16 77 43 8 + 85 152 90 45 127 12 140 70 154 41 38 81 104 93 =
  • 59.
  • 60.
    . . . etc.
  • 61.
  • 62.
    The Gaussian ProbabilityFunction The probability of x in a Gaussian distribution with mean  and standard deviation  is given by:
  • 63.
    The Gaussian DistributionWhat is the Gaussian distribution? What types of data fit a Gaussian distribution?
  • 64.
    “ Like theski resort full of girls hunting for husbands and husbands hunting for girls, the situation is not as symmetrical as it might seem.” Alan Lindsay Mackay (1926- )
  • 65.
    Are these Gaussian?Human height Outside temperature Raindrop size Blood glucose concentration Serum CK activity QC results Proficiency results
  • 66.
    The Gaussian DistributionWhat is the Gaussian distribution? What types of data fit a Gaussian distribution? What is the advantage of using a Gaussian distribution?
  • 67.
    Gaussian probability distributionProbability µ µ+  µ+2  µ+3  µ-  µ-2  µ-3  .67 .95
  • 68.
    What are theodds of an observation . . . more than 1  from the mean (+/-) more than 2  greater than the mean more than 3  from the mean
  • 69.
    Some useful Gaussianprobabilities Range Probability Odds +/- 1.00  68.3% 1 in 3 +/- 1.64  90.0% 1 in 10 +/- 1.96  95.0% 1 in 20 +/- 2.58  99.0% 1 in 100
  • 70.
  • 71.
    [On the Gaussiancurve] “Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.” Gabriel Lippman (1845-1921 )
  • 72.
  • 73.
    "Life is goodfor only two things, discovering mathematics and teaching mathematics" Siméon Poisson (1781-1840)
  • 74.
    The Poisson DistributionThe Poisson distribution predicts the frequency of r events occurring randomly in time, when the expected frequency is 
  • 75.
    Examples of eventsdescribed by a Poisson distribution Lightning Accidents Laboratory? ?
  • 76.
    A very usefulproperty of the Poisson distribution
  • 77.
    Using the Poissondistribution How many counts must be collected in an RIA in order to ensure an analytical CV of 5% or less?
  • 78.
  • 79.
    Distributions Definition ExamplesBinomial Gaussian Poisson
  • 80.
    The Student’s tDistribution When a small sample is selected from a large population, we sometimes have to make certain assumptions in order to apply statistical methods
  • 81.
    Questions about oursample Is the mean of our sample, x bar, the same as the mean of the population,  ? Is the standard deviation of our sample, s , the same as the standard deviation for the population,  ? Unless we can answer both of these questions affirmatively, we don’t know whether our sample has the same distribution as the population from which it was drawn.
  • 82.
    Recall that theGaussian distribution is defined by the probability function: Note that the exponential factor contains both  and  , both population parameters. The factor is often simplified by making the substitution:
  • 83.
    The variable z in the equation: is distributed according to a unit gaussian , since it has a mean of zero and a standard deviation of 1
  • 84.
    Gaussian probability distributionProbability 0 1 2 3 -1 -2 -3 .95 z .67
  • 85.
    But if weuse the sample mean and standard deviation instead, we get: and we’ve defined a new quantity, t , which is not distributed according to the unit Gaussian. It is distributed according to the Student’s t distribution .
  • 86.
    Important features ofthe Student’s t distribution Use of the t statistic assumes that the parent distribution is Gaussian The degree to which the t distribution approximates a gaussian distribution depends on N (the degrees of freedom) As N gets larger (above 30 or so), the differences between t and z become negligible
  • 87.
    Application of Student’st distribution to a sample mean The Student’s t statistic can also be used to analyze differences between the sample mean and the population mean:
  • 88.
    Comparison of Student’st and Gaussian distributions Note that, for a sufficiently large N (>30), t can be replaced with z , and a Gaussian distribution can be assumed
  • 89.
    Exercise The meanage of the 20 participants in one workshop is 27 years, with a standard deviation of 4 years. Next door, another workshop has 16 participants with a mean age of 29 years and standard deviation of 6 years. Is the second workshop attracting older technologists?
  • 90.
    Preliminary analysis Isthe population Gaussian? Can we use a Gaussian distribution for our sample? What statistic should we calculate?
  • 91.
    Solution First, calculatethe t statistic for the two means:
  • 92.
    Solution, cont. Next,determine the degrees of freedom:
  • 93.
  • 94.
    Conclusion Since 1.16is less than 1.64 (the t value corresponding to 90% confidence limit), the difference between the mean ages for the participants in the two workshops is not significant
  • 95.
    The Paired tTest Suppose we are comparing two sets of data in which each value in one set has a corresponding value in the other. Instead of calculating the difference between the means of the two sets, we can calculate the mean difference between data pairs.
  • 96.
    Instead of: weuse: to calculate t:
  • 97.
    Advantage of thePaired t If the type of data permit paired analysis, the paired t test is much more sensitive than the unpaired t. Why?
  • 98.
    Applications of thePaired t Method correlation Comparison of therapies
  • 99.
    Distributions Definition ExamplesBinomial Gaussian Poisson Student’s t
  • 100.
    The 2 (Chi-square) Distribution There is a general formula that relates actual measurements to their predicted values
  • 101.
    The 2 (Chi-square) Distribution A special (and very useful) application of the  2 distribution is to frequency data
  • 102.
    Exercise In yourhospital, you have had 83 cases of iatrogenic strep infection in your last 725 patients. St. Elsewhere, across town, reports 35 cases of strep in their last 416 patients. Do you need to review your infection control policies?
  • 103.
    Analysis If yourinfection control policy is roughly as effective as St. Elsewhere’s, we would expect that the rates of strep infection for the two hospitals would be similar. The expected frequency, then would be the average
  • 104.
    Calculating 2 First, calculate the expected frequencies at your hospital ( f 1 ) and St. Elsewhere ( f 2 )
  • 105.
    Calculating 2 Next, we sum the squared differences between actual and expected frequencies
  • 106.
    Degrees of freedomIn general, when comparing k sample proportions, the degrees of freedom for  2 analysis are k - 1. Hence, for our problem, there is 1 degree of freedom.
  • 107.
    Conclusion A tableof  2 values lists 3.841 as the  2 corresponding to a probability of 0.05. So the variation (  2  between strep infection rates at the two hospitals is within statistically-predicted limits, and therefore is not significant.
  • 108.
    Distributions Definition ExamplesBinomial Gaussian Poisson Student’s t  2
  • 109.
    The F distribution The F distribution predicts the expected differences between the variances of two samples This distribution has also been called Snedecor’s F distribution, Fisher distribution, and variance ratio distribution
  • 110.
    The F distribution The F statistic is simply the ratio of two variances (by convention, the larger V is the numerator)
  • 111.
    Applications of the F distribution There are several ways the F distribution can be used. Applications of the F statistic are part of a more general type of statistical analysis called analysis of variance (ANOVA). We’ll see more about ANOVA later.
  • 112.
    Example You’re askedto do a “quick and dirty” correlation between three whole blood glucose analyzers. You prick your finger and measure your blood glucose four times on each of the analyzers. Are the results equivalent?
  • 113.
  • 114.
    Analysis The meanglucose concentrations for the three analyzers are 70, 85, and 76. If the three analyzers are equivalent, then we can assume that all of the results are drawn from a overall population with mean  and variance  2 .
  • 115.
    Analysis, cont. Approximate  by calculating the mean of the means:
  • 116.
    Analysis, cont. Calculatethe variance of the means:
  • 117.
    Analysis, cont. Butwhat we really want is the variance of the population. Recall that:
  • 118.
    Analysis, cont. Sincewe just calculated we can solve for 
  • 119.
    Analysis, cont. Sowe now have an estimate of the population variance, which we’d like to compare to the real variance to see whether they differ. But what is the real variance? We don’t know, but we can calculate the variance based on our individual measurements.
  • 120.
    Analysis, cont. Ifall the data were drawn from a larger population, we can assume that the variances are the same, and we can simply average the variances for the three data sets.
  • 121.
    Analysis, cont. Nowcalculate the F statistic:
  • 122.
    Conclusion A tableof F values indicates that 4.26 is the limit for the F statistic at a 95% confidence level (when the appropriate degrees of freedom are selected). Our value of 10.6 exceeds that, so we conclude that there is significant variation between the analyzers.
  • 123.
    Distributions Definition ExamplesBinomial Gaussian Poisson Student’s t  2 F
  • 124.
    Unknown or irregulardistribution Transform
  • 125.
    Log transform Probabilityx Probability log x
  • 126.
    Unknown or irregulardistribution Transform Non-parametric methods
  • 127.
    Non-parametric methods Non-parametricmethods make no assumptions about the distribution of the data There are non-parametric methods for characterizing data, as well as for comparing data sets These methods are also called distribution-free, robust, or sometimes non-metric tests
  • 128.
    Application to ReferenceRanges The concentrations of most clinical analytes are not usually distributed in a Gaussian manner. Why? How do we determine the reference range (limits of expected values) for these analytes?
  • 129.
    Application to ReferenceRanges Reference ranges for normal, healthy populations are customarily defined as the “central 95%”. An entirely non-parametric way of expressing this is to eliminate the upper and lower 2.5% of data, and use the remaining upper and lower values to define the range. NCCLS recommends 120 values, dropping the two highest and two lowest.
  • 130.
    Application to ReferenceRanges What happens when we want to compare one reference range with another? This is precisely what CLIA ‘88 requires us to do. How do we do this?
  • 131.
    “ Everything shouldbe made as simple as possible, but not simpler.” Albert Einstein
  • 132.
    Solution #1: Simple comparison Suppose we just do a small internal reference range study, and compare our results to the manufacturer’s range. How do we compare them? Is this a valid approach?
  • 133.
    NCCLS recommendations InspectionMethod: Verify reference populations are equivalent Limited Validation: Collect 20 reference specimens No more than 2 exceed range Repeat if failed Extended Validation: Collect 60 reference specimens; compare ranges .
  • 134.
    Solution #2: Mann-Whitney * Rank normal values ( x 1 ,x 2 ,x 3 ...x n ) and the reference population ( y 1 ,y 2 ,y 3 ...y n ): x 1 , y 1 , x 2 , x 3 , y 2 , y 3 ... x n , y n Count the number of y values that follow each x , and call the sum U x . Calculate U y also. * Also called the U test, rank sum test, or Wilcoxen’s test .
  • 135.
    Mann-Whitney, cont. Itshould be obvious that: U x + U y = N x N y If the two distributions are the same, then: U x = U y = 1/2 N x N y Large differences between U x and U y indicate that the distributions are not equivalent
  • 136.
    “‘ Obvious’ isthe most dangerous word in mathematics.” Eric Temple Bell (1883-1960)
  • 137.
    Solution #3: Runtest In the run test , order the values in the two distributions as before: x 1 , y 1 , x 2 , x 3 , y 2 , y 3 ... x n , y n Add up the number of runs (consecutive values from the same distribution). If the two data sets are randomly selected from one population, there will be few runs.
  • 138.
    Solution #4: The Monte Carlo method Sometimes, when we don’t know anything about a distribution, the best thing to do is independently test its characteristics.
  • 139.
    The Monte Carlomethod x y
  • 140.
    The Monte Carlomethod Reference population mean, SD mean, SD mean, SD mean, SD N N N N
  • 141.
    The Monte Carlomethod With the Monte Carlo method, we have simulated the test we wish to apply--that is, we have randomly selected samples from the parent distribution, and determined whether our in-house data are in agreement with the randomly-selected samples.
  • 142.
    Analysis of paireddata For certain types of laboratory studies, the data we gather is paired We typically want to know how closely the paired data agree We need quantitative measures of the extent to which the data agree or disagree Examples?
  • 143.
    Examples of paireddata Method correlation data Pharmacodynamic effects Risk analysis Pathophysiology
  • 144.
    Correlation 0 510 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50
  • 145.
    Linear regression (leastsquares) Linear regression analysis generates an equation for a straight line y = mx + b where m is the slope of the line and b is the value of y when x = 0 (the y-intercept ). The calculated equation minimizes the differences between actual y values and the linear regression line.
  • 146.
    Correlation 0 510 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 y = 1.031 x - 0.024
  • 147.
    Covariance Do x and y values vary in concert, or randomly?
  • 148.
    What if y increases when x increases? What if y decreases when x increases? What if y and x vary independently ?
  • 149.
    Covariance It isclear that the greater the covariance, the stronger the relationship between x and y . But . . . what about units? e.g., if you measure glucose in mg/dL , and I measure it in mmol/L , who’s likely to have the highest covariance?
  • 150.
  • 151.
    The Correlation CoefficientThe correlation coefficient is a unitless quantity that roughly indicates the degree to which x and y vary in the same direction.  is useful for detecting relationships between parameters, but it is not a very sensitive measure of the spread .
  • 152.
    Correlation 0 510 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 y = 1.031 x - 0.024  = 0.9986
  • 153.
    Correlation 0 510 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 y = 1.031 x - 0.024  = 0.9894
  • 154.
    Standard Error ofthe Estimate The linear regression equation gives us a way to calculate an “estimated” y for any given x value, given the symbol ŷ (y-hat) :
  • 155.
    Standard Error ofthe Estimate Now what we are interested in is the average difference between the measured y and its estimate, ŷ :
  • 156.
    Correlation 0 510 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 y = 1.031 x - 0.024  = 0.9986 s y/x =1.83
  • 157.
    Correlation 0 510 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 y = 1.031 x - 0.024  = 0.9894 s y/x = 5.32
  • 158.
    Standard Error ofthe Estimate If we assume that the errors in the y measurements are Gaussian ( is that a safe assumption? ), then the standard error of the estimate gives us the boundaries within which 67% of the y values will fall.  2s y/x defines the 95% boundaries..
  • 159.
    Limitations of linearregression Assumes no error in x measurement Assumes that variance in y is constant throughout concentration range
  • 160.
    Alternative approaches Weighted linear regression analysis can compensate for non-constant variance among y measurements Deming regression analysis takes into account variance in the x measurements Weighted Deming regression analysis allows for both
  • 161.
  • 162.
    Method Precision Within-run: 10 or 20 replicates What types of errors does within-run precision reflect? Day-to-day: NCCLS recommends evaluation over 20 days What types of errors does day-to-day precision reflect?
  • 163.
    Evaluating method performancePrecision Sensitivity
  • 164.
    Method Sensitivity The analytical sensitivity of a method refers to the lowest concentration of analyte that can be reliably detected. The most common definition of sensitivity is the analyte concentration that will result in a signal two or three standard deviations above background.
  • 165.
  • 166.
    Other measures ofsensitivity Limit of Detection (LOD) is sometimes defined as the concentration producing an S/N > 3. In drug testing, LOD is customarily defined as the lowest concentration that meets all identification criteria. Limit of Quantitation (LOQ) is sometimes defined as the concentration producing an S/N >5. In drug testing, LOQ is customarily defined as the lowest concentration that can be measured within ±20%.
  • 167.
    Question At anS/N ratio of 5, what is the minimum CV of the measurement? If the S/N is 5, 20% of the measured signal is noise, which is random. Therefore, the CV must be at least 20%.
  • 168.
    Evaluating method performancePrecision Sensitivity Linearity
  • 169.
    Method Linearity Alinear relationship between concentration and signal is not absolutely necessary, but it is highly desirable. Why? CLIA ‘88 requires that the linearity of analytical methods is verified on a periodic basis.
  • 170.
    Ways to evaluatelinearity Visual/linear regression
  • 171.
  • 172.
    Outliers We caneliminate any point that differs from the next highest value by more than 0.765 (p=0.05) times the spread between the highest and lowest values (Dixon test). Example: 4, 5, 6, 13 (13 - 4) x 0.765 = 6.89
  • 173.
    Limitation of linearregression method If the analytical method has a high variance (CV), it is likely that small deviations from linearity will not be detected due to the high standard error of the estimate
  • 174.
  • 175.
    Ways to evaluatelinearity Visual/linear regression Quadratic regression
  • 176.
    Quadratic regression Recallthat, for linear data, the relationship between x and y can be expressed as y = f ( x ) = a + bx
  • 177.
    Quadratic regression Acurve is described by the quadratic equation: y = f ( x ) = a + bx + cx 2 which is identical to the linear equation except for the addition of the cx 2 term.
  • 178.
    Quadratic regression Itshould be clear that the smaller the x 2 coefficient, c , the closer the data are to linear (since the equation reduces to the linear form when c approaches 0). What is the drawback to this approach?
  • 179.
    Ways to evaluatelinearity Visual/linear regression Quadratic regression Lack-of-fit analysis
  • 180.
    Lack-of-fit analysis Thereare two components of the variation from the regression line Intrinsic variability of the method Variability due to deviations from linearity The problem is to distinguish between these two sources of variability What statistical test do you think is appropriate?
  • 181.
  • 182.
    Lack-of-fit analysis TheANOVA technique requires that method variance is constant at all concentrations. Cochran’s test is used to test whether this is the case.
  • 183.
    Lack-of-fit method calculationsTotal sum of the squares: the variance calculated from all of the y values Linear regression sum of the squares: the variance of y values from the regression line Residual sum of the squares: difference between TSS and LSS Lack of fit sum of the squares: the RSS minus the pure error (sum of variances)
  • 184.
    Lack-of-fit analysis TheLOF is compared to the pure error to give the “ G ” statistic (which is actually F ) If the LOF is small compared to the pure error, G is small and the method is linear If the LOF is large compared to the pure error, G will be large, indicating significant deviation from linearity
  • 185.
    Significance limits for G 90% confidence = 2.49 95% confidence = 3.29 99% confidence = 5.42
  • 186.
    “ If yourexperiment needs statistics, you ought to have done a better experiment.” Ernest Rutherford (1871-1937)
  • 187.
    Evaluating Clinical Performanceof laboratory tests The clinical performance of a laboratory test defines how well it predicts disease The sensitivity of a test indicates the likelihood that it will be positive when disease is present
  • 188.
    Clinical Sensitivity If TP as the number of “true positives”, and FN is the number of “false negatives”, the sensitivity is defined as:
  • 189.
    Example Of 25admitted cocaine abusers, 23 tested positive for urinary benzoylecgonine and 2 tested negative. What is the sensitivity of the urine screen?
  • 190.
    Evaluating Clinical Performanceof laboratory tests The clinical performance of a laboratory test defines how well it predicts disease The sensitivity of a test indicates the likelihood that it will be positive when disease is present The specificity of a test indicates the likelihood that it will be negative when disease is absent
  • 191.
    Clinical Specificity If TN is the number of “true negative” results, and FP is the number of falsely positive results, the specificity is defined as:
  • 192.
    Example What wouldyou guess is the specificity of any particular clinical laboratory test? (Choose any one you want)
  • 193.
    Answer Since referenceranges are customarily set to include the central 95% of values in healthy subjects, we expect 5% of values from healthy people to be “abnormal”--this is the false positive rate. Hence, the specificity of most clinical tests is no better than 95%.
  • 194.
    Sensitivity vs. SpecificitySensitivity and specificity are inversely related.
  • 195.
  • 196.
    Sensitivity vs. SpecificitySensitivity and specificity are inversely related. How do we determine the best compromise between sensitivity and specificity?
  • 197.
    Receiver Operating CharacteristicTrue positive rate (sensitivity) False positive rate 1-specificity
  • 198.
    Evaluating Clinical Performanceof laboratory tests The sensitivity of a test indicates the likelihood that it will be positive when disease is present The specificity of a test indicates the likelihood that it will be negative when disease is absent The predictive value of a test indicates the probability that the test result correctly classifies a patient
  • 199.
    Predictive Value Thepredictive value of a clinical laboratory test takes into account the prevalence of a certain disease, to quantify the probability that a positive test is associated with the disease in a randomly-selected individual, or alternatively, that a negative test is associated with health.
  • 200.
    Illustration Suppose youhave invented a new screening test for Addison disease. The test correctly identified 98 of 100 patients with confirmed Addison disease ( What is the sensitivity? ) The test was positive in only 2 of 1000 patients with no evidence of Addison disease ( What is the specificity? )
  • 201.
    Test performance Thesensitivity is 98.0% The specificity is 99.8% But Addison disease is a rare disorder--incidence = 1:10,000 What happens if we screen 1 million people?
  • 202.
    Analysis In 1million people, there will be 100 cases of Addison disease. Our test will identify 98 of these cases ( TP ) Of the 999,900 non-Addison subjects, the test will be positive in 0.2%, or about 2,000 ( FP ).
  • 203.
    Predictive value ofthe positive test The predictive value is the % of all positives that are true positives:
  • 204.
    What about thenegative predictive value? TN = 999,900 - 2000 = 997,900 FN = 100 * 0.002 = 0 (or 1)
  • 205.
    Summary of predictivevalue Predictive value describes the usefulness of a clinical laboratory test in the real world. Or does it?
  • 206.
    Lessons about predictivevalue Even when you have a very good test, it is generally not cost effective to screen for diseases which have low incidence in the general population. Exception? The higher the clinical suspicion, the better the predictive value of the test. Why?
  • 207.
    Efficiency We cancombine the PV + and PV - to give a quantity called the efficiency : The efficiency is the percentage of all patients that are classified correctly by the test result.
  • 208.
    Efficiency of ourAddison screen
  • 209.
    “ To callin the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.” Ronald Aylmer Fisher (1890 - 1962)
  • 210.
    Application of Statisticsto Quality Control We expect quality control to fit a Gaussian distribution We can use Gaussian statistics to predict the variability in quality control values What sort of tolerance will we allow for variation in quality control values? Generally, we will question variations that have a statistical probability of less than 5%
  • 211.
    “ He usesstatistics as a drunken man uses lamp posts -- for support rather than illumination.” Andrew Lang (1844-1912)
  • 212.
    Westgard’s rules 12s 1 3s 2 2s R 4s 4 1s 10 x 1 in 20 1 in 300 1 in 400 1 in 800 1 in 600 1 in 1000
  • 213.
    Some examples mean+1sd +2sd +3sd -1sd -2sd -3sd
  • 214.
    Some examples mean+1sd +2sd +3sd -1sd -2sd -3sd
  • 215.
    Some examples mean+1sd +2sd +3sd -1sd -2sd -3sd
  • 216.
    Some examples mean+1sd +2sd +3sd -1sd -2sd -3sd
  • 217.
    “ In scienceone tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.” Paul Adrien Maurice Dirac (1902- 1984)

Editor's Notes