Managing Data Quality: What does your Chemical Result Mean?


K. Stagg (1), D. Tuhovak (2) & M. Beard (3)
(1) Environmental Simulations, UK
(2) Conestoga-Rovers and Associates, U.S.A.
(3) EarthSoft, Inc., U.S.A.

Abstract

What does your chemical result mean? How does one result relate to another result taken 3 months earlier? Were the analytical methods the same? Were the sampling procedures the same? Are these questions that you can answer? Quality assurance provides a history for every piece of data, enabling validation and traceability of your data. Quality assurance of data should be a prime concern of every data handler; however, cost implications, inadequate data management software and the lack of legislation result in little such data being recorded. With a greater need to justify data, and with advances in electronic data transfer and comprehensive database systems, a good QA/QC system should be standard for all environmental data.

EarthSoft's EQuIS (Environmental Quality Information System) provides a means of obtaining and validating laboratory data electronically. The laboratory provides data in a standardized electronic format that includes both sample data and supporting quality control data. The data are imported into the end user's database, eliminating potential hand-entry errors. An initial validation of the data is then performed using the Data Verification Module (DVM) software. The DVM identifies outlying quality control results, summarizes potentially impacted sample results and recommends data qualifiers to the validator, significantly reducing the time needed to validate data. Using EQuIS, data of known quality are easily and accurately transferred from the Laboratory Information Management System (LIMS) to the end user's database.
Introduction

What does your chemical result mean? There are many influences that may affect the chemical result given for your sample, and there are many reasons why this result may differ from another result that you would expect to be the same. It is therefore extremely important to be able to assess the influences that produce these differences, to enable an accurate interpretation of the results. In some cases it may be possible to quantify the errors associated with these results; in other cases this may prove impossible. It is therefore imperative to track quality assurance results that may indicate unreliable chemical results from environmental samples. This paper looks at possible ways of assessing the errors associated with chemical results, both by quantifying those errors and through good quality assurance that provides validation and traceability of your data.

Error analysis concerns the investigation of data to identify and quantify influences on the data. The accuracy of data (how close a single measurement is to the true value) is determined by the bias and/or the precision of the data. Bias is a measure of the systematic error of a method; precision is a measure of the random error of a method. These measures are generally summed and expressed as an uncertainty value: the interval around the measurement result that contains the true value with high probability. Standard uncertainty is generally expressed as one standard deviation from the measurement; expanded uncertainty is generally two standard deviations [1].

Classifying Errors in Environmental Data

Error analysis can be applied to environmental samples by classifying the errors associated with the sample as sampling or analytical error. Sampling error indicates whether the sample is representative of the scale of interest for which it was obtained.
Analytical error provides an assessment of the accuracy of the analytical characterization methods and of whether those methods accurately determine the data from the sample. The errors associated with environmental samples can generally be divided into four categories: sampling precision, sampling bias, analytical precision and analytical bias. Various modifications to the sampling procedures are required to quantify these errors, and considerable work is available on the determination of errors associated with sampling [1-4]. These errors can be calculated by adopting a simple error-analysis field-sampling protocol.

Analytical Precision and Bias

Analytical precision may be calculated by producing duplicate samples, obtained by splitting the original sample and performing the analytical method on both halves. The analytical precision variance is obtained from the differences between the results produced by the two halves.
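The duplicate-based precision estimate described above can be sketched numerically. The following is a minimal illustration (the duplicate values and the relative-percent-difference convention are hypothetical examples, not data from this paper):

```python
def relative_percent_difference(r1, r2):
    """RPD between a pair of duplicate results: a common precision measure."""
    return abs(r1 - r2) / ((r1 + r2) / 2.0) * 100.0

def precision_variance(pairs):
    """Precision variance estimated from split-sample duplicate pairs.
    For n paired duplicates, s^2 = sum(d_i^2) / (2n) with d_i = r1_i - r2_i."""
    return sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))

# Hypothetical split-sample duplicates (mg/kg):
pairs = [(32.0, 30.0), (45.0, 47.0), (7.0, 7.5)]
rpd = relative_percent_difference(32.0, 30.0)
var = precision_variance(pairs)
print(f"RPD = {rpd:.1f}%, analytical precision variance = {var:.3f}")
```

The same estimator applied to repeat field samples, rather than laboratory splits, gives the combined sampling-plus-analytical precision variance; subtracting the analytical component then isolates the sampling precision discussed later.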
The analytical bias of the analytical method is the systematic variance of the result from the true value. This bias may be introduced by incorrect calibration of analytical instrumentation, use of defective standards, or improper execution of analytical techniques. The analytical bias is assessed by varying these procedures: using different laboratories, standards from independent sources, or different analytical methods. The analytical bias variance is therefore determined from the difference in the results produced by duplicate analytical samples (samples where the original is split) analysed with the procedure changed.

Quality assurance programs are used by analytical laboratories to measure the bias and variability that cause errors in analytical results, and to define acceptable ranges for these errors. Under a comprehensive QA program, a laboratory continually monitors analytical bias and precision by including quality control samples with each batch of investigative samples and comparing the results to statistically derived acceptance limits. QC results that fall outside these limits prompt the laboratory to investigate the cause and take the corrective actions necessary to improve data quality.

Other elements of a standard laboratory QA program are also designed to ensure analytical accuracy and precision, including: sample handling procedures to ensure that samples are not mislabelled, are properly stored, and are not cross-contaminated; sample documentation procedures to ensure that laboratory calculations are performed and reported correctly; standard preparation procedures to ensure that reference standards are prepared properly or obtained from a reputable source, and are checked against a second source to confirm accuracy; and training procedures to ensure that all analysts perform analyses in a consistent manner.
Standard quality control samples used by analytical laboratories include the following:

• Matrix spike samples are prepared by spiking a second aliquot of an investigative sample with known concentrations of the analytes of concern, to assess the potential effects of the sample matrix on analytical accuracy. The recoveries of the spiked analytes, less the concentrations of these analytes in the original samples, are reported as a percentage of the expected values.

• Laboratory control samples are blank samples that are spiked with known concentrations of the analytes of interest and are prepared and analysed as investigative samples. The recoveries of the spiking compounds are used to assess overall analytical accuracy.

• Surrogate compounds are compounds similar in structure and characteristics to the analytes of interest, but which would not typically be detected at the source from which the samples were collected. Surrogates are added to every investigative sample, and the recoveries are compared to the expected values to assess potential matrix effects on analytical accuracy.

• Blank samples are prepared and analysed as samples to assess potential contamination introduced during sample analysis. Analytes detected in the blanks may also have been introduced into the associated investigative samples.
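The recovery calculations behind these QC sample types can be sketched as follows. The values are hypothetical, and the formulas are the standard percent-recovery conventions rather than code from EQuIS:

```python
def matrix_spike_recovery(spiked_result, original_result, spike_added):
    """Matrix spike recovery: spiked-aliquot result less the native
    concentration, as a percentage of the amount of spike added."""
    return (spiked_result - original_result) / spike_added * 100.0

def surrogate_recovery(measured, added):
    """Surrogate (or blank/LCS spike) recovery: surrogates are absent from
    the source, so no native concentration is subtracted."""
    return measured / added * 100.0

# Hypothetical antimony matrix spike: native 7 mg/kg, spiked aliquot
# measured at 16.4 mg/kg after a 50 mg/kg spike addition:
print(f"{matrix_spike_recovery(16.4, 7.0, 50.0):.1f}% recovery")  # 18.8% recovery
print(f"{surrogate_recovery(47.0, 50.0):.1f}% recovery")          # 94.0% recovery
```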
The quality control samples typically used to assess analytical precision include duplicate and/or matrix spike duplicate samples.

• Duplicate samples are prepared and analysed independently and the results compared to assess analytical precision.

• Matrix spike duplicate samples are prepared by spiking two separate sample aliquots; the recoveries are compared to assess analytical precision.

Sampling Precision and Bias

The sampling precision is determined by obtaining two samples, both representing the intended sample scale. This repeat sampling may vary the samples spatially or temporally. For example, if a sample is intended to represent the groundwater from a borehole over a period of a day, then repeat samples should be taken at different levels within the borehole (spatial variance) and at different times during that day (temporal variance). The sampling precision variance is determined from the differences between the results obtained for these repeat samples.

The sampling bias is probably the hardest error to quantify; it is the systematic variance of the result from the true result caused by sampling practice. For example, if samples are contaminated during sampling, the accuracy of the results is impacted. To quantify this error, different sampling methods are required and the results produced by each method assessed. However, to calculate the sampling error directly, the other variances (sampling precision etc.) must remain constant. This error is therefore generally determined from the variance of the results obtained from separate sampling sessions in which all of the errors discussed are determined in both sessions [5, 6]. Changes to field protocols and sampling patterns can provide an indication of these errors; however, complex sampling patterns are expensive, and cost implications, inadequate data management software and lack of legislation may leave too little data to produce these calculations.
These field protocols also do not allow assessment of where or how these errors occurred, and therefore of how to minimize them. Interpreting these errors requires assessment of laboratory quality assurance data along with strong validation and traceability of your data.

Assessing Data Usability

Prior to collecting and analyzing samples, the data end-user must first define the program objectives and determine the level of error in the data that can be tolerated while still meeting those objectives. These data quality objectives (DQOs) are often expressed as tolerance ranges for the QC sample results.
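As an illustration, a DQO check of this kind reduces to comparing each QC result against its tolerance range and proposing a qualifier. The flag letters below (U for non-detect, J for estimated, R for rejected) follow common data-validation conventions, but the numeric limits are hypothetical:

```python
def qualify_recovery(recovery, lcl=75.0, ucl=125.0, reject_below=10.0):
    """Propose a validation qualifier from a spike recovery checked
    against hypothetical DQO control limits (percent recovery)."""
    if recovery < reject_below:
        return "R"   # recovery so low the associated data are unusable
    if recovery < lcl or recovery > ucl:
        return "J"   # out of control but usable: qualify as estimated
    return ""        # within the tolerance range: no qualifier

for rec in (18.8, 98.0, 129.0, 4.0):
    print(rec, qualify_recovery(rec) or "(no qualifier)")
# 18.8 -> J, 98.0 -> no qualifier, 129.0 -> J, 4.0 -> R
```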
The assessment of the analytical results to determine whether program DQOs have been met is typically performed by an independent third party, using a process referred to as data validation. During data validation, the validator reviews the results of laboratory and field QC sample analyses and assesses them against the project DQOs. Depending on the level of review, the validator may also examine raw analytical data to perform a more comprehensive review of the analytical activities, including verifying the laboratory's calculations, assessing instrument calibration data, and reviewing analyte identifications. Using the results of the data review, the validator determines the impact of any identified error on the sample results. Data deemed unusable are rejected. Sample data associated with QC results that show potential bias or variability in the analytical process, but not significant enough to warrant rejection, are qualified by the data validator as estimated values.

Electronic Tools for Data Quality Assessment and Handling

Obtaining data from the analytical laboratory, assessing data quality, and transferring the data into the format required by the end-user is a difficult process. Using EQuIS, the data end-user obtains analytical data from the laboratory in a standardised electronic format for import into the electronic database, without introducing the errors associated with manual data input. The end-user can then access the data electronically from the database and import them into the required documents, in whatever format is required. EQuIS is a comprehensive geo-environmental data management database designed to store analytical test data and related data obtained during environmental site investigations, routine site monitoring, and hazardous waste remediation projects.
EQuIS can be used for report and chart generation and is integrated with multiple statistical, numerical modeling and data visualization tools. Although data validation requires significant expertise in analytical chemistry, portions of the process can be automated using the Data Verification Module (DVM), an optional companion program to EQuIS. DVM accesses the data in an EQuIS chemistry project database to identify out-of-control QA results and to flag the corresponding environmental results. The comprehensive electronic deliverable includes analytical sample results, QC results, the control limits established to assess the QC results, analytical method test details, test batch groupings, preparation and analytical test dates and times, and other significant information about the analytical program.

EQuIS DVM allows the data manager to edit validation parameters and settings; data can be selected by sample delivery group (SDG), and multiple sample delivery groups can be processed simultaneously. Verification options include detection limit, accuracy, contamination, precision, holding times, surrogate, and percent solids.
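Two of these verification options, holding times and contamination, can be illustrated with a simplified sketch. The function names, dates and limits below are hypothetical and do not represent EarthSoft's actual implementation:

```python
from datetime import datetime

def holding_time_exceeded(sampled, analysed, limit_days):
    """Holding-time check: was the analysis performed after the
    method or contractual limit (in days) from sampling?"""
    return (analysed - sampled).days > limit_days

def blank_qualifies_result(sample_result, blank_result, ratio=5.0):
    """Blank-contamination rule: a sample result less than `ratio` times
    the associated blank hit is qualified as potentially due to
    contamination; results above that factor are considered valid."""
    return blank_result > 0 and sample_result < ratio * blank_result

# Hypothetical test: sampled 1 March, analysed 20 March, 14-day limit.
print(holding_time_exceeded(datetime(1999, 3, 1), datetime(1999, 3, 20), 14))  # True

# Benzene at 7 mg/liter with an associated field blank hit of 5 mg/liter,
# using a 10x blank ratio: 7 < 50, so the result is qualified.
print(blank_qualifies_result(7.0, 5.0, ratio=10.0))   # True
print(blank_qualifies_result(70.0, 5.0, ratio=10.0))  # False
```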
Selecting one or more sample delivery groups for processing starts the data verification process. Warning messages are generated during verification unless interactive warning-message suppression is specified or the data are in order. In any case, a DVM process log (with any warnings) is displayed on screen and can be printed and/or saved to a file. Using DVM, error levels that did not meet the program DQOs are identified. The DVM report includes a summary of outlying QC results and of the sample results potentially impacted by the biases and variability indicated by these outliers. DVM can assign validator qualifiers to these data in accordance with standard data validation conventions.

DVM can be used immediately after data are imported into EQuIS to identify potential data quality issues that may require re-collection of samples. In many cases, identifying these outliers as quickly as possible helps avoid additional costs and project delays. DVM performs the majority of the components of a low-level data validation. For more comprehensive data validations, DVM can perform the majority of the steps that do not require analytical decision-making (e.g. identifying QC outliers and summarising potentially impacted samples), so that the validator's time can be spent on the facets of validation that require analytical expertise (e.g. reviewing raw data to confirm analyte identifications).

The reports prepared with EQuIS Chemistry/DVM focus on relevant findings in each of the following functional areas: holding times, contamination, accuracy, precision, and surrogates. The reports prepared by DVM are simple and concise, showing only the data needed to assess data quality and minimizing raw data. For example, the DVM accuracy report shows spike recoveries.
The spike recoveries shown by DVM are the most directly relevant information; showing the spike concentration values does not contribute directly to the data quality review. If spike concentration data are present (usually they are not available), DVM will recalculate the recoveries to check the laboratory's arithmetic, but this review does not directly impact issues of accuracy. The reports are designed to be usable in 12-point text in a landscape page layout; shading is not used, so that the reports remain legible when faxed. The reports are created as Microsoft Word files to simplify editing and inclusion in other reports.

The DVM contamination report includes information from lab blanks as well as field blanks; this level of review is generally impossible within the lab. EQuIS Chemistry allows very flexible associations between field blanks and normal environmental samples. For example, a normal environmental sample can be associated with the equipment rinsate blank that preceded it and with the equipment blank that followed it. DVM limits the report to only those results that are thought to be related to contamination. If a normal environmental sample result is some factor higher than the blank hit result (typically 5 or 10 times higher), then the normal environmental sample result is usually considered valid (i.e., not due only to contamination). DVM allows the user to select this blank ratio on a per-chemical basis.

The DVM holding time summary report focuses on elapsed times and limits. Sample preparation information is shown only if a relevant holding time exists. This summary report is at the test event level; it helps to show how many tests were late. The DVM holding time detail report focuses on sample results; it is at the result level and helps to show the impact on the results. DVM supports both method and contractual holding limits.

The DVM accuracy report includes information from matrix spikes as well as laboratory control samples. For out-of-control blank spike recoveries (sample types BS, BD), the prep or analytical batch is used to find the associated environmental sample results. For out-of-control inorganic matrix spike recoveries (sample types MS, SD), the SDG is used to find the associated environmental sample results. For out-of-control organic matrix spike recoveries (sample types MS, SD), the flags are assigned only to the environmental sample from which the spike was derived (i.e., the "neat" sample).

The DVM precision report includes information from matrix spike duplicates, laboratory control sample duplicates, lab duplicates, and field duplicates or replicates. The DVM surrogate report summarizes all surrogate recoveries that did not meet the individual program DQOs and lists the sample results potentially impacted by the errors in analytical accuracy that these outlying recoveries indicate. See the Appendix for example contamination and accuracy reports.

Conclusion

There are many sources of potential error in analytical data. For error that is inherent to sample collection and analysis activities, systems must be in place to identify and quantify the level of error. The data end-user must determine the level of error that can be tolerated for their program objectives prior to sample collection.
To avoid some of the error potentially introduced by data manipulation, data should be used in an electronic format. EarthSoft's EQuIS (Environmental Quality Information System) provides a means of obtaining and validating laboratory data electronically. Using this system, data manipulation errors are minimized, and analytical and sampling error can be identified and assessed against project DQOs. Using EQuIS, data of known quality are easily and accurately transferred from the Laboratory Information Management System (LIMS) to the end-user's database, providing the end-user with full flexibility for data reporting and manipulation.

References
[1] Ramsey, M.H., Watkins, P.J. & Sams, M.S. Estimation of measurements for in-situ borehole determinations using a geochemical tool. Q.J.E.G., 15, pp. 1-15, 1996.
[2] Davis, J.C. Statistics and Data Analysis in Geology, John Wiley & Sons: New York, 1986.
[3] Puls, R.W., Clark, D.A., Bledsoe, B., Powell, R.M. & Paul, C.J. Metals in groundwater: sampling artifacts and reproducibility. Hazardous Waste & Hazardous Materials, 9(2), pp. 149-162, 1992.
[4] Rouhani, S. & Hall, T.J. Geostatistical schemes for groundwater sampling. Journal of Hydrology, 103, pp. 85-102, 1988.
[5] Hydrochemical variations with depth in a major UK aquifer: the fractured, high permeability Triassic Sandstone. Gambling with Groundwater: Physical, Chemical, and Biological Aspects of Aquifer-Stream Relationships, eds. Brahana et al., American Institute of Hydrology: Minnesota, pp. 53-58, 1998.
[6] Stagg, K.A. An Investigation into Colloids in Triassic Sandstone Groundwaters. Unpublished PhD thesis, University of Birmingham, UK, 1999.
Appendix

Table 1: EQuIS Chemistry DVM report for contamination. Sample delivery group A01. Blank hits.

Blank matrix | Blank type | Blank sample ID | Contaminant       | Blank concentration | Qualification limit
WQ           | LB         | LB001           | Styrene (monomer) | 5 mg/liter          | 50 mg/liter
WQ           | FB         | FB001           | Benzene           | 5 mg/liter          | 50 mg/liter

Table 2: Environmental sample results associated with blank hits.

Sample ID | Matrix | Method | Analyte           | Sample result | Qualified result | Qualification limit | Blank ID
N001      | WG     | 8240   | Benzene           | 7 mg/liter    | 7 U* mg/liter    | 50 mg/liter         | FB001
N002      | WG     | 8240   | Styrene (monomer) | 40 mg/liter   | 40 U* mg/liter   | 50 mg/liter         | LB001
Table 3: EQuIS/Chem Data Verification Module report for accuracy. Sample delivery group A02. Out-of-control spike recoveries.

Sample ID | Lab sample ID | Type | Analyte   | Recovery | LCL  | UCL
001SOMS   | AA78414       | MS   | Antimony  | 18.8%    | 75.0 | 125
001SOMSD  | AA78437       | SD   | Antimony  | 17.2%    | 75.0 | 125
001SOMSD  | AA78437       | SD   | Barium    | 129%     | 75.0 | 125
001SOMS   | AA78414       | MS   | Cadmium   | 45.0%    | 75.0 | 125
001SOMSD  | AA78437       | SD   | Cadmium   | 56.6%    | 75.0 | 125
001SOMS   | AA78414       | MS   | Copper    | 140%     | 75.0 | 125
001SOMS   | AA78414       | BS   | Manganese | 267%     | 75.0 | 125
001SOMSD  | AA78437       | BD   | Manganese | 302%     | 75.0 | 125

Table 4: Environmental sample results associated with out-of-control spike recoveries.

Sample ID | Matrix | Prep method | Analysis method | Analyte   | Result    | Hit? | Flag
N011      | SO     | SW3050      | SW6010          | Antimony  | 7 mg/kg   | Y    | J
N011      | SO     | SW3050      | SW6010          | Barium    | 32 mg/kg  | Y    | J
N011      | SO     | SW3050      | SW6010          | Cadmium   | 45 mg/kg  | Y    | J
N011      | SO     | SW3050      | SW6010          | Copper    | 16 mg/kg  | Y    | J
N011      | SO     | SW3050      | SW6010          | Manganese | 102 mg/kg | Y    | J