Overview – Courses



  1. 1. Statistics & graphics for the laboratory. Applications: Reference interval & Biological variation. Linda Thienpont (Linda.thienpont@ugent.be), Dietmar Stöckl (Dietmar@stt-consulting.com). In cooperation with AQML: D Stöckl, L Thienpont &
      • Kristian Linnet, MD, PhD (Linnet@post7.tele.dk)
      • Per Hyltoft Petersen, MSc (Per.hyltoft.petersen@ouh.fyns-amt.dk)
      • Sverre Sandberg, MD, PhD (Sverre.sandberg@isf.uib.no)
  2. 2. Prof Dr Linda M Thienpont, University of Gent, Institute for Pharmaceutical Sciences, Laboratory for Analytical Chemistry, Harelbekestraat 72, B-9000 Gent, Belgium; e-mail: linda.thienpont@ugent.be. STT Consulting, Dietmar Stöckl, PhD, Abraham Hansstraat 11, B-9667 Horebeke, Belgium; e-mail: dietmar@stt-consulting.com; Tel + FAX: +32/5549 8671. Copyright: STT Consulting 2007
  3. 3. Content overview
      Reference interval
      • Introduction
      • Data presentation: histogram; normal probability plot & rankit-transformation; graphical interpretation of rankit-plots
      • Partitioning
      • Statistical estimation: parametric and non-parametric
      Biological variation
      • Introduction
      • Estimation (ANOVA application)
      • Index-of-individuality
      • Comparison of a result with a reference interval ("Grey-zone")
      • Reference change value (RCV)
  4. 4. Estimation of reference intervals – Overview
      REFERENCE INDIVIDUALS comprise a REFERENCE POPULATION, from which is selected a REFERENCE SAMPLE GROUP, on which are determined REFERENCE VALUES, on which is observed a REFERENCE DISTRIBUTION, from which are calculated REFERENCE LIMITS, that may define REFERENCE INTERVALS, that help with the interpretation of an OBSERVED VALUE.
      [Flowchart, figure not reproduced. Node labels: Start; Collect samples; Inspect distribution; Detect & handle outliers; Partition?; Select statistics; Intuitive assessment; Non-parametric method; Parametric method; Gaussian (Yes/No); Transform data; Backtransform estimates; Stop]
  5. 5. Inclusion criteria (NORIP, Malmø 27/4-2004)
      The reference individual should
      • be feeling subjectively well
      • have reached the age of 18
      • not be pregnant or breast-feeding
      • not have been an in-patient in a hospital nor have been subjectively dangerously ill during the last month
      • not have had more than 2 measures of alcohol (24 g) in the last 24 hours
      • not have given blood as a donor in the last five months
      • not have taken prescribed drugs other than the P-pill or estrogens (female sex hormone) during the last two weeks
      • not have smoked in the last hour prior to blood sampling
      Preanalytical conditions (NORIP, Malmø 27/4-2004)
      Reference individual
      • Sitting at least 15 min before sampling
      Sample collection
      • Li-heparin plasma or serum, EDTA-blood for haematology
      • Standard procedure
      • Minimal stasis
      Sample handling (plasma and serum)
      • Stored in the dark
      • Storage at room temperature before centrifugation: serum 0.5-1.5 h, plasma max 15 min
      • Centrifugation: 10 min at a minimum of 1500 g
      • Distributed to secondary tubes within 2 h
      • Stored at -80 °C within 4 h
      Outliers
      • Gross or slight deviation
      • Check records
      • Check results
      • Re-analyse
      • ? Include
      • ? Omit
  6. 6. Data presentation
      Tools for presentation and inspection of distributions
      • Histogram
      • Normal probability plot and "Rankit-transformation"
      Rankit-transformation
      Reference population: Gauss-distribution (Hyltoft Petersen P, Hørder M. Influence of analytical quality on test results. Scand J Clin Lab Invest 1992;52 Suppl 208:65-87).
      The frequency distribution is transformed to the cumulative frequency distribution and then to the Rankit- or Normal Probability Plot.
      In the Normal Probability Plot, the values are plotted on the x-axis and their normalized deviation from the mean (z-value, or Rankit) on the y-axis. In the figure, a second axis has been introduced on which the probabilities corresponding to the z-values can be read. Note that this second axis is non-linear and has to be inserted as a picture; it cannot be created with EXCEL. The tick-marks, however, can be programmed into an EXCEL chart (see: NormalRankitPlot.xls).
      Use: Visual test for Normal distribution: the data should fit a line.
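The rankit transformation described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `rankits`, the example values, and the plotting position (i - 0.5)/n are assumptions and are not taken from the slides or from NormalRankitPlot.xls; other plotting-position conventions exist.

```python
from statistics import NormalDist

def rankits(values):
    """Return (value, z) pairs for a normal probability plot.

    Assumption: the cumulative probability of the i-th smallest value
    (1-based) is taken as (i - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]

# Hypothetical example values; x goes on the x-axis, z on the y-axis.
points = rankits([4.1, 3.8, 4.4, 4.0, 4.2])
for x, z in points:
    print(f"{x:5.2f}  {z:6.3f}")
```

If the data are Normal distributed, the printed pairs fall close to a straight line when plotted.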
  7. 7. The rankit plot
      Triacylglyceride example (NormalRankitPlot): effect of imprecision (left Fig) and bias (right Fig) on the Normal Probability Plot.
      • An increase in imprecision (here 1.5 x) rotates the line clockwise and changes the probability at z = 1.65 from 95% to 84%.
      • The introduction of a bias (here = 1) moves the line to the right and changes the probability at z = 1.65 from 95% to 74%.
  8. 8. The rankit plot
      Bimodal situation: left population healthy, right population diseased.
      When we apply the plot in the bimodal situation, we can directly read the false negatives (FN) and the false positives (FP). Note that the healthy are cumulated from right to left. Under the conditions chosen (diseased at a distance of +2 SD and cutoff = 1.28 SD), FN = 24% and FP = 10%.
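The FN and FP percentages quoted on this slide can be checked numerically. The sketch below assumes that both populations are Normal with the same SD (set to 1) and that the healthy mean is 0; the slide states only the distance (+2 SD) and the cutoff (1.28 SD), so the variable names and the equal-SD assumption are ours.

```python
from statistics import NormalDist

healthy = NormalDist(mu=0.0, sigma=1.0)   # assumed: mean 0, SD 1
diseased = NormalDist(mu=2.0, sigma=1.0)  # +2 SD away, same SD (assumed)
cutoff = 1.28                             # in SD units of the healthy

# Healthy individuals above the cutoff are false positives.
false_positive = 1.0 - healthy.cdf(cutoff)
# Diseased individuals below the cutoff are false negatives.
false_negative = diseased.cdf(cutoff)

print(f"FP = {false_positive:.1%}, FN = {false_negative:.1%}")
```

Under these assumptions the values round to the 10% and 24% given on the slide.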
  9. 9. Data inspection – Examples
      Uric acid (µmol/l) – Simulation (distributions moved!)
              Female   Male
      Mean    250      370
      SD      40       40
      n       1000     1000
      Depending on the bin-size, bimodal distributions may be hidden in histograms!
      Uric acid ~reality, but Normal distributed
              Female   Male
      Mean    250      330
      SD      55       65
      n       1000     1000
      Graphical techniques are too weak to uncover bimodal situations where the population means are close together!
      Test for normal distribution   P
      Chi-square                     0.836
      Kolmogorov-Smirnov             0.249
      Anderson-Darling               0.02
      D'Agostino-Pearson             0.016
      Statistical techniques may uncover that "something is wrong" (not Normal) with the distribution. From that, one may consider looking for subgroups! However, different tests may differ enormously in power!
      [Figure: histogram; x-axis: Analyte Conc.]
  10. 10. Calculations with logarithms
      Data transformation: Logarithms
      When the data are not normally distributed, one can try a transformation. Because data in nature are often log-normally distributed, a logarithmic transformation can make them normally distributed.
      Test for normality: Triglycerides (see: Datasets.xls)
      n = 282; lowest value: 0.3 mmol/L; highest value: 3.2 mmol/L; median: 0.92 mmol/L.
      CBstat Anderson Darling test:
      • Original data: P < 0.01 → data not normally distributed
      • After logarithmic (natural) transformation: P = 0.13 → data compatible with a log-normal distribution
      Normal Probability Plot (ln-transformed data): data are "on a line" → data are ln-Normal distributed
  11. 11. Working with logarithms
      Calculate the reference interval of a logarithmic distribution: Triglycerides
      1. Transform the original data to ln
      2. Calculate the mean of the ln(xi) values
      3. Take the anti-ln of the mean of ln(xi). This equals the geometric mean of the original population, which is close to its median.
      • The anti-ln of the mean of the logged values, $e^{-0.0689} = 0.933$, is equal to the geometric mean of the original distribution, which is given by $(x_1 \cdot x_2 \cdots x_n)^{1/n}$.
      • The anti-ln of the SD is meaningless.
      Number   mmol/l   ln
      1        0.3      -1.204
      2        0.32     -1.139
      3        0.34     -1.079
      4        0.38     -0.968
      5        0.4      -0.916
      6        0.4      -0.916
      …        …        …
      282      3.2      1.163
      Median: 0.92 mmol/l. Mean, ln: -0.069; anti-ln ($e^x$): 0.933 (EXCEL: EXP(x)). Geometric mean: 0.933 (EXCEL: GEOMEAN).
      Calculation of 2.5 and 97.5 percentile
      Mean (ln transformed): -0.0689
      SD (ln transformed): 0.395
      2.5 percentile: -0.0689 – 1.96*0.395 = -0.843
      97.5 percentile: -0.0689 + 1.96*0.395 = 0.7053
      Anti-ln of 2.5 & 97.5 percentiles: 0.43 – 2.02 mmol/l
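The arithmetic on this slide can be reproduced with a short Python sketch. The mean and SD of the ln-transformed triglyceride values are taken from the slide; the variable names are illustrative only.

```python
import math

mean_ln = -0.0689   # mean of ln(x), from the slide
sd_ln = 0.395       # SD of ln(x), from the slide
z = 1.96            # two-sided 95% interval of a Normal distribution

# Limits on the ln scale.
lower_ln = mean_ln - z * sd_ln
upper_ln = mean_ln + z * sd_ln

# Back-transform (anti-ln) to the original scale in mmol/l.
geometric_mean = math.exp(mean_ln)
lower = math.exp(lower_ln)
upper = math.exp(upper_ln)

print(f"ln scale: {lower_ln:.3f} to {upper_ln:.4f}")
print(f"geometric mean: {geometric_mean:.3f} mmol/l")
print(f"reference interval: {lower:.2f} to {upper:.2f} mmol/l")
```

Note that only the limits are back-transformed; as the slide points out, taking the anti-ln of the SD itself is not meaningful.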
  12. 12. Partitioning of reference intervals
      Visual, on the basis of suspected differences (sex, race, age, …).
      [Figures: frequency polygon; rankit-plot]
  13. 13. Example: Partitioning – Visual
      Comparison of orosomucoid values: Caucasians and Indians in Leeds (Johnson et al. CCLM 2004;42:792-9).
      Statistical criteria for partitioning (Lahti et al. Clin Chem 2002;48:338-52)
      Difference (D) between two upper or two lower limits
      • D < 0.25*s: No partitioning
      • D = 0.25 – 0.75*s: Variable
      • D > 0.75*s: Partitioning
      • or percentage: Pb 0.9 and Pa 4.1 %
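The distance criterion listed on this slide can be written as a small decision function. This is a hedged sketch: the function name and the example numbers are hypothetical, and the slide does not define s, so it is passed in as whatever standard deviation the criterion refers to in the cited paper. The percentage criterion is not covered.

```python
def partition_decision(limit_a, limit_b, s):
    """Classify the difference D between two subgroup reference limits.

    limit_a, limit_b: the two upper (or the two lower) reference limits.
    s: the standard deviation the criterion refers to (not defined on
       the slide; supplied by the caller as an assumption).
    """
    d = abs(limit_a - limit_b)
    if d < 0.25 * s:
        return "no partitioning"
    if d > 0.75 * s:
        return "partitioning"
    return "variable"

# Hypothetical numbers, for illustration only.
print(partition_decision(100.0, 105.0, s=40.0))
print(partition_decision(100.0, 120.0, s=40.0))
print(partition_decision(100.0, 140.0, s=40.0))
```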
  14. 14. Statistical model for estimating a reference interval
      The statistical procedures assume random sampling in the target population. Traditionally, the 2.5- and 97.5-percentiles are estimated, so that on average 95% of the population is included. In some contexts, one-sided limits are used: 95-, 97.5- or 99-percentiles.
      Statistical estimation procedures
      • Parametric: assumes a normal distribution or a distribution that can be transformed to the normal distribution
      • Nonparametric: model-free estimation of percentiles
      • Partitioning: subdivision according to gender, age, race, etc. should be considered where relevant
      Reference interval & type of distribution
      Normal distributions can be expected for analytes with a relatively narrow biological distribution, e.g. electrolytes. The reference interval for Normal distributions ranges from the 2.5th to the 97.5th percentile (= mean +/- 1.96 SD).
      [Figure: 95% reference interval with lower and upper reference limits]
  15. 15. Skewed distributions
      • Biological variation is very often skewed to the right, i.e. there is a tail of high values.
      • The theoretical background is that many factors have a multiplicative impact (an additive impact of many independent factors yields a normal distribution).
      Skewed distributions can often be modeled by the log-normal distribution. The log-normal type of distribution actually constitutes a family of distributions with a spectrum of degrees of skewness determined by the parameter values (ratio between standard deviation and mean).
      Coefficient of skewness: $C_{skew} = [\sum (x_i - x_m)^3 / N] / SD^3$
      Zero: symmetric distribution; Positive: skewed to the right; Negative: skewed to the left
      Coefficient of kurtosis: $C_{kurt} = [\sum (x_i - x_m)^4 / N] / SD^4 - 3$
      Zero: Normal distribution; Positive: Peaked distribution; Negative: Flat distribution
      Nonparametric procedure
      Applicable to all types of distributions
      • Simple procedures: based on ordering (ranking) of values according to size
      • Refined procedures: weighted percentile estimation, smoothing techniques, resampling principle (bootstrap)
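The two coefficients defined on this slide can be computed directly from their formulas. The sketch below is illustrative: the function name is ours, and the SD is computed with divisor N (population SD), which is an assumption, since the slide does not say which SD is meant.

```python
import math

def skewness_and_kurtosis(values):
    """Coefficients of skewness and (excess) kurtosis as on the slide.

    Assumption: SD is the population SD (divisor N), matching the
    divisor N used in the numerators.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    c_skew = (sum((x - mean) ** 3 for x in values) / n) / sd ** 3
    c_kurt = (sum((x - mean) ** 4 for x in values) / n) / sd ** 4 - 3.0
    return c_skew, c_kurt

# Symmetric example data: the skewness is zero.
print(skewness_and_kurtosis([1, 2, 3, 4, 5]))
# A long tail of high values gives a positive skewness.
print(skewness_and_kurtosis([1, 1, 1, 2, 10]))
```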
  16. 16. Simple nonparametric procedure(s)
      Approach
      • Sort N reference values in increasing numerical order
      • Assign rank numbers; lowest = 1; highest = N
      • Rank number of 2.5-Percentile = 0.025 x (N+1) or 0.025 x N + 0.5
      • Rank number of 97.5-Percentile = 0.975 x (N+1) or 0.975 x N + 0.5
      • Lower reference limit = reference value corresponding to rank number of 2.5-Percentile
      • Upper reference limit = reference value corresponding to rank number of 97.5-Percentile
      Remark – Estimation of 2.5 & 97.5 percentiles
      Procedure recommended by the IFCC and CLSI:
      • 2.5-Percentile = Value of number: 0.025 x (N+1)
      • 97.5-Percentile = Value of number: 0.975 x (N+1)
      Optimal procedure (slightly different from above):
      • 2.5-Percentile = Value of number: 0.025 x N + 0.5
      • 97.5-Percentile = Value of number: 0.975 x N + 0.5
      (Linnet K. Clin Chem 2000;46:867-9)
      Triglycerides: n = 282
      • 0.025 x (282 + 1) = 7.1 → rank 7 → 0.42 mmol/L
      • 0.975 x (282 + 1) = 275.9 → rank 276 → 2.12 mmol/L
      Reference interval = 0.42 – 2.12 mmol/L
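The rank-based procedure above can be sketched as follows. The function names and the `rule` parameter are hypothetical. One deliberate difference from the slide: the slide rounds the rank to the nearest integer, whereas this sketch interpolates linearly between neighbouring values for fractional ranks, which is an assumption on our part.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional rank (linear interpolation)."""
    n = len(ordered)
    rank = max(1.0, min(rank, float(n)))  # keep the rank inside 1..n
    lo = int(rank)
    hi = min(lo + 1, n)
    return ordered[lo - 1] + (rank - lo) * (ordered[hi - 1] - ordered[lo - 1])

def reference_limits(values, rule="n_plus_1"):
    """2.5 and 97.5 percentiles by the simple nonparametric procedure.

    rule="n_plus_1": rank = p * (N + 1)   (IFCC/CLSI rule on the slide)
    rule="half":     rank = p * N + 0.5   (alternative rule on the slide)
    """
    ordered = sorted(values)
    n = len(ordered)
    if rule == "n_plus_1":
        ranks = (0.025 * (n + 1), 0.975 * (n + 1))
    else:
        ranks = (0.025 * n + 0.5, 0.975 * n + 0.5)
    return tuple(value_at_rank(ordered, r) for r in ranks)

# Placeholder data 1..282, so each value equals its own rank.
demo = list(range(1, 283))
print(reference_limits(demo))               # ranks p * (N + 1)
print(reference_limits(demo, rule="half"))  # ranks p * N + 0.5
```

With the placeholder data the output shows the ranks themselves, which makes the small difference between the two rules visible.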
  17. 17. Sample size and precision of estimates
      Precision of percentiles of Normal distribution
      Can be expressed as the ratio between the 90%-confidence interval (90%-CI) of a percentile and the width of the 95%-reference interval (e.g. ratios 0.3, 0.2 or 0.1 as outlined below). The necessary sample sizes for the parametric and non-parametric procedures are:
      Ratio (90% CI/95% RI)   Parametric N   Non-parametric N
      0.3                     23             56
      0.2                     50             125
      0.1                     205            500
  18. 18. Sample size and precision of estimates
      Coefficient of skewness: 0.75
      Ratio (90% CI/95% RI)   Parametric N   Non-parametric N
      0.3                     90             140
      0.2                     200            315
      0.1                     800            1250
      Coefficient of skewness: 1.5
      Ratio (90% CI/95% RI)   Parametric N   Non-parametric N
      0.3                     200            315
      0.2                     440            695
      0.1                     1750           2740
  19. 19. Bootstrap principle
      Repeated random re-sampling with replacement of observations.
      • For a set of N observations: each observation has a probability of 1/N of being re-sampled.
      • A re-sampled set of N observations (a so-called pseudo-set of observations) may contain several copies of one observation and lack others.
      Origin of the name
      The term bootstrap refers to the phrase "to pull oneself up by one's bootstraps", originating from the tale The Adventures of Baron Munchausen (by Rudolph Erich Raspe (1737-94)), in which "The Baron had fallen to the bottom of a deep lake. Just when it looked like all was lost, he thought to pick himself up by his own bootstraps".
      Calculation of estimates
      • For each pseudo-set of N observations, the percentiles are computed by the simple nonparametric procedure.
      • By repetition on a computer, e.g. 100 or more times, a distribution of estimated percentiles is obtained that mimics the real sampling variation.
      • The bootstrap estimates are the means of the pseudo-estimates.
      • The bootstrap procedure is slightly (5-15%) more efficient than simple nonparametric estimation.
      • Standard errors of estimates are provided.
      Limitations
      • Too low coverage* at small sample sizes (N < 40)
      • Modified versions with smoothing might improve coverage at small sample sizes
      • Some bias problems with the bootstrap estimates at low sample sizes (N < 40)
      *Coverage: expected percentage of times an estimated CI includes the true value, i.e. ideally 90% for a nominal 90%-CI
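A minimal sketch of the bootstrap principle described above, using only the Python standard library. It is not the CBstat implementation: the function names, the number of resamples (500), the fixed seeds and the simulated log-normal data are all assumptions made for illustration, and the percentile rule p x (N+1) with linear interpolation is one of several possible choices.

```python
import random
import statistics

def percentile(values, p):
    """Simple nonparametric percentile at rank p * (N + 1), interpolated."""
    ordered = sorted(values)
    n = len(ordered)
    rank = max(1.0, min(p * (n + 1), float(n)))
    lo = int(rank)
    hi = min(lo + 1, n)
    return ordered[lo - 1] + (rank - lo) * (ordered[hi - 1] - ordered[lo - 1])

def bootstrap_percentile(values, p, n_boot=500, seed=1):
    """Bootstrap estimate and standard error of the p-th percentile."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Pseudo-set: N draws with replacement, each with probability 1/N.
        pseudo = [rng.choice(values) for _ in range(n)]
        estimates.append(percentile(pseudo, p))
    # Bootstrap estimate = mean of pseudo-estimates; SE = their spread.
    return statistics.mean(estimates), statistics.stdev(estimates)

# Simulated right-skewed data (hypothetical, not the triglyceride dataset).
data_rng = random.Random(0)
data = [data_rng.lognormvariate(0.0, 0.48) for _ in range(200)]

estimate, std_error = bootstrap_percentile(data, 0.975)
print(f"97.5 percentile: {estimate:.2f} (SE {std_error:.2f})")
```

Fixing the seeds makes the run repeatable; in practice the number of resamples would be chosen to suit the required stability of the standard error.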
  20. 20. Comparison of statistical procedures
      Can be studied theoretically and/or by simulation on the basis of specified model distributions, e.g. normal and log-normal types. In simulation, the procedure is repeated a large number of times in order to study the bias (= difference between the average of the percentile estimates and the true value) and the precision (standard error: SE) of the estimation procedure (small SE: efficient procedure). SEs should reflect the real uncertainty so that estimated confidence intervals become correct.
      Tool: Root mean squared error (RMSE)
      $RMSE = [\sum (x_{obs} - x_{True})^2 / N_{run}]^{0.5} = [Bias^2 + SE^2]^{0.5}$ ($N_{run}$: no. of simulation runs)
      A combined error measure taking both systematic deviation and random error into account. Often used in statistics as an overall error measure allowing ranking of various statistical estimation procedures studied theoretically or by simulations.
      Model example
      Using a theoretical model distribution, e.g. a CHI-square-distribution, the true percentile values are known. By simulation, the performance of parametric and nonparametric procedures can be compared and the RMSE of the percentile estimates can be related to the sample size.
      Outcome
      The larger the sample size, the higher the likelihood that the nonparametric procedure is the optimal approach (lowest RMSE at a given sample size). The reason is that the bias associated with parametric estimation is independent of sample size and will tend to dominate the RMSE at large sample sizes, where the random error vanishes.
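The decomposition of the RMSE into bias and SE can be verified numerically. In the sketch below the function name and the four example estimates are hypothetical; the SE is taken as the spread of the estimates around their own mean with divisor N_run, which is the assumption under which the identity holds exactly.

```python
import math
import statistics

def error_summary(estimates, true_value):
    """Bias, SE and RMSE of a set of simulated percentile estimates."""
    n_run = len(estimates)
    bias = statistics.fmean(estimates) - true_value
    # SE: spread of the estimates around their own mean (divisor n_run).
    se = statistics.pstdev(estimates)
    # RMSE: root of the mean squared deviation from the true value.
    rmse = math.sqrt(sum((x - true_value) ** 2 for x in estimates) / n_run)
    return bias, se, rmse

# Hypothetical estimates from four simulation runs; true value 2.0.
bias, se, rmse = error_summary([2.0, 2.2, 1.9, 2.3], true_value=2.0)
print(f"bias={bias:.3f}  SE={se:.3f}  RMSE={rmse:.3f}")
print(f"sqrt(bias^2 + SE^2) = {math.sqrt(bias ** 2 + se ** 2):.3f}")
```

Both printed RMSE values agree, which illustrates why a bias that does not shrink with sample size eventually dominates the RMSE.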
  21. 21. Statistical procedures – Summary
      Ranking of procedures according to efficiency
      1. Parametric procedure
      2. Bootstrap non-parametric
      3. Simple non-parametric
      Non-parametric vs parametric
      • About half as effective, i.e. about twice the sample size is required to attain the same SE of the percentiles
      • The difference in effectiveness is larger the more extreme the percentiles are (e.g. 99 vs 97.5 percentile)
      Simple non-parametric procedure
      • Rank N x p + 0.5 is slightly better than rank (N + 1) x p for both normal and skewed distributions
      Bootstrap non-parametric vs simple non-parametric
      • Slightly more efficient (5-15% savings of sample size)
      • Confidence intervals can be estimated for smaller sample sizes (for simple non-parametric N ≥ 120 for 90%-CI)
  22. 22. Example: Triglycerides with CBstat
      Procedure                        CI Lower limit   CI Upper limit
      Parametric direct                0.08 – 0.23      1.79 – 1.94
      Non-parametric                   0.34 – 0.52      1.92 – 2.60
      Non-parametric bootstrap         0.37 – 0.52      1.88 – 2.33
      Parametric after log-transform   0.40 – 0.46      1.90 – 2.16
      Note: Direct parametric is not correct!
      Simulation of triacylglyceride data (DataGeneration)
      We simulate data that correspond to the triacylglyceride data: skew ~1.64. We do that with Worksheet LnNormal 3 (mean = 0; SD = 0.48; n = 1000).
      • Copy the data in the file RefInt.xls.
      • Round the values to 2 decimal places (precision as displayed).
      • Sample 20 values from these data (Tools>Data Analysis>Sampling).
      • Compare the 90% confidence intervals for n = 20 with the respective ones for n = 1000.
  23. 23. Software & references
      CBstat
      A Windows program distributed by K. Linnet (via aaccdirect.org). Offers general statistical methods and procedures dedicated to clinical biochemistry.
      Estimation of reference intervals:
      • Simple nonparametric and bootstrap procedure
      • Parametric direct
      • Parametric after transformations
        – One-stage: log-, 3-parameter-log-, Box & Cox- and Manly-
        – Two-stage: 1) Correction of skewness; 2) Correction of kurtosis
      • Normality testing with appropriate corrections after transformations
      • Appropriate confidence intervals of percentiles after transformation
      Further information: www.cbstat.com
      References
      • Linnet K. Nonparametric estimation of reference intervals by simple and bootstrap-based procedures. Clin Chem 2000;46:867-9.
      • Linnet K. Two-stage transformation systems for normalization of reference distributions evaluated. Clin Chem 1987;33:381-6.
      • IFCC. J Clin Chem Clin Biochem 1987;25:645-56.
      • Linnet K. Testing normality of transformed distributions. Appl Statist 1988;37:180-6.