Managing Data Quality: What does your Chemical Result Mean?
K. Stagg¹, D. Tuhovak², & M. Beard³
¹Environmental Simulations, UK
²Conestoga-Rovers and Associates, U.S.A.
³EarthSoft, Inc, U.S.A.
What does your chemical result mean? How does one result relate to
another result taken 3 months earlier? Were the analytical methods the same?
Were the sampling procedures the same? Are these questions that you can
answer?
Quality assurance provides a history for every piece of data, enabling
validation and traceability of your data. Quality assurance of data should be a
prime concern of every data handler; however, cost implications, inadequate data
management software and the lack of legislation mean that little QA data are
recorded. With a growing need to justify data, and with technology such as
electronic data transfer and comprehensive database systems now available, a
good QA/QC system should be standard for all environmental data.
EarthSoft's EQuIS (Environmental Quality Information System)
provides a means of obtaining and validating laboratory data electronically. The
laboratory provides data in a standardized electronic format that includes both
sample data and supporting quality control data. The data are imported into the
end user's database, eliminating potential hand-entry errors. An initial validation
of the data is then performed using the Data Verification Module (DVM)
software. The DVM essentially identifies outlying quality control results,
summarizes potentially impacted sample results and recommends data qualifiers
to the validator, significantly reducing the amount of time needed to validate
data. Using EQuIS, data of known quality are easily and accurately transferred
from the Laboratory Information Management System (LIMS) to the end user's
database.
What does your chemical result mean? There are many influences that
may affect the chemical result that is given for your sample, and there are many
reasons that this result may differ from another result that you would expect to
be the same. It is therefore extremely important to be able to assess the
influences that produce these differences to enable an accurate interpretation of
these results. In some cases it may be possible to quantify the errors associated
with these results, but in other cases this may prove impossible. Therefore, it is
imperative to track quality assurance results that may indicate unreliable
chemical results from environmental samples. This paper looks at the possible
ways of assessing errors associated with chemical results by quantification of the
errors and by good quality assurance of data to provide validation and
traceability of your data.
Error analysis concerns the investigation of data to identify and quantify
influences on the data. The accuracy of data—how close a single measurement is
to the true value—is determined by the bias and/or the precision of the data. The
bias is a measure of the systematic error of a method and the precision is a
measure of the random error of a method. These components are generally
combined and expressed as an uncertainty value: the interval around the
measured result that contains the true value with high probability. Standard
uncertainty is generally expressed as one standard deviation about the
measurement; expanded uncertainty is generally two standard deviations.
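One common way to combine independent bias and precision components is to sum their variances and take the square root (the root-sum-of-squares convention of the ISO Guide to the Expression of Uncertainty in Measurement), then multiply by a coverage factor k = 2 for the expanded uncertainty. The sketch below assumes both components are already expressed as standard deviations in the units of the result; the function names and example values are hypothetical.

```python
import math


def combined_standard_uncertainty(components):
    """Combine independent standard-uncertainty components (each one standard
    deviation) by summing their variances and taking the square root."""
    return math.sqrt(sum(u ** 2 for u in components))


def expanded_uncertainty(combined, k=2.0):
    """Expanded uncertainty: the combined standard uncertainty multiplied by a
    coverage factor k (k = 2 corresponds to two standard deviations)."""
    return k * combined


# Hypothetical components in the units of the result (e.g. mg/kg):
# 3.0 attributed to bias and 4.0 attributed to precision.
u_c = combined_standard_uncertainty([3.0, 4.0])
U = expanded_uncertainty(u_c)
print(f"standard uncertainty = {u_c}, expanded uncertainty = {U}")
```

Summing variances rather than the standard deviations themselves assumes the components are independent; correlated components would need a covariance term.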
Classifying Errors in Environmental Data
Error analysis can be applied to environmental samples by classifying
the errors associated with the sample as sampling or analytical error. Sampling
error provides an indication of whether the sample is representative of the scale
of interest for which it was obtained. Analytical error provides an assessment of
the accuracy of the analytical characterization methods, that is, whether the
chosen method accurately measures the constituents of the sample.
Within environmental samples the associated errors can generally be divided
into four categories: sampling precision; sampling bias; analytical precision; and
analytical bias. Quantifying these errors requires modifications to the sampling
procedures, and considerable work has been published on determining the
errors associated with sampling [1-4]. These errors can be calculated by
adopting a simple error-analysis field-sampling protocol.
Analytical Precision and Bias
Analytical precision may be calculated by producing duplicate samples,
obtained by splitting the original sample and performing the analytical method
on both halves. The analytical precision variance is obtained from the differences
between the results produced for the two halves.
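For a set of split-sample duplicates, a common estimator pools the squared differences between the two halves: for n pairs with differences d, the analytical variance is s² = Σd²/(2n). The sketch below assumes every pair is an independent split of one sample analysed by the same method; the function name and data are hypothetical.

```python
import math


def duplicate_precision(pairs):
    """Estimate analytical precision from split-sample duplicates.

    Each pair holds the two results for the halves of one split sample.
    For n pairs with differences d_i, the pooled variance estimate is
    sum(d_i ** 2) / (2 * n); its square root is the standard deviation.
    """
    if not pairs:
        raise ValueError("at least one duplicate pair is required")
    sum_sq = sum((a - b) ** 2 for a, b in pairs)
    variance = sum_sq / (2 * len(pairs))
    return variance, math.sqrt(variance)


# Hypothetical duplicate results (mg/kg) for two split samples.
variance, sd = duplicate_precision([(10.0, 12.0), (20.0, 18.0)])
print(variance, sd)  # 2.0 1.4142135623730951
```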
The analytical bias of the analytical method is the systematic deviation of
the result from the true value. This bias may be introduced by incorrect
calibration of analytical instrumentation, use of defective standards, or
improper execution of analytical techniques. The analytical bias is
determined by varying these procedures, for example by using different
laboratories, standards from independent sources, or different analytical
methods. Therefore, the analytical bias is determined from the difference
between the results produced by duplicate analytical samples (samples split
from the original) when the analytical method is changed.
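When the same split samples are analysed with one procedure changed (a different laboratory, standard source or analytical method), the mean of the paired differences estimates the relative bias between the two procedures, and a paired t statistic shows whether that difference is large compared with the random scatter. This is a sketch under those assumptions, not a prescribed procedure; it measures bias relative to the second procedure, not to the true value, and the names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_bias(results_a, results_b):
    """Compare two analytical procedures applied to the same split samples.

    Returns the mean paired difference (a - b), an estimate of the relative
    bias between the procedures, and the paired t statistic for that mean.
    """
    if len(results_a) != len(results_b) or len(results_a) < 2:
        raise ValueError("need at least two pairs of equal-length results")
    diffs = [a - b for a, b in zip(results_a, results_b)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation of the differences
    if s_d == 0:
        # Identical differences: no random scatter to scale against.
        t = 0.0 if d_bar == 0 else math.copysign(math.inf, d_bar)
    else:
        t = d_bar / (s_d / math.sqrt(len(diffs)))
    return d_bar, t


# Hypothetical results (mg/kg) for three split samples analysed by
# laboratory A and laboratory B.
lab_a = [11.0, 21.0, 17.0]
lab_b = [10.0, 20.0, 15.0]
bias, t_stat = paired_bias(lab_a, lab_b)
print(f"mean difference = {bias:.3f}, t = {t_stat:.2f}")
```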
Quality assurance programs are used by analytical laboratories to
measure bias and variability in analytical results that cause errors in the data, and
to define acceptable ranges for these errors. Under a comprehensive QA
program, a laboratory continually monitors analytical bias and precision by
including quality control samples with each batch of investigative samples, and
comparing the results to statistically derived acceptance limits. QC results that
fall outside of these limits prompt the laboratory to investigate the cause and to
take necessary corrective actions to improve data quality. Other elements of a
standard laboratory QA program are also designed to ensure analytical accuracy
and precision, including sample handling procedures to ensure that samples are
not mislabelled, are properly stored, and are not cross-contaminated; sample
documentation procedures to ensure that laboratory calculations are performed
and reported correctly; standard preparation procedures to ensure that reference
standards are prepared properly or obtained from a reputable source, and are all
checked against a second source to confirm accuracy; and training procedures to
ensure that all analysts perform analyses in a consistent manner.
Standard quality control samples used by analytical laboratories include the
following:
• Matrix spike samples are prepared by spiking a second aliquot of an
investigative sample with known concentrations of analytes of concern to
assess the potential effects of sample matrix on analytical accuracy. The
recoveries of the spiked analytes, less the concentrations of these analytes in
the original samples, are reported as a percentage of the expected values.
• Laboratory control samples are blank samples that are spiked with known
concentrations of analytes of interest, and are prepared and analysed as
investigative samples. The recoveries of the spiking compounds are used to
assess overall analytical accuracy.
• Surrogate compounds are compounds similar in structure and characteristics
to the analytes of interest, but would not typically be detected at the source
from which the samples were collected. Surrogates are added to every
investigative sample and the recoveries are compared to the expected values
to assess potential matrix effects on analytical accuracy.
• Blank samples are prepared and analysed as samples to assess potential
contamination introduced during sample analysis. Analytes detected in the
blanks may also have been introduced into the associated investigative samples.
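The recovery calculations described in the list above can be sketched as follows. The formulas assume the amount spiked is known and, for matrix spikes, that the native concentration comes from the unspiked original sample. The 75-125% acceptance window is borrowed from the Appendix accuracy report purely as an example; real limits are method- and laboratory-specific. The function names and values are hypothetical.

```python
def matrix_spike_recovery(spiked_result, unspiked_result, spike_added):
    """Percent recovery of a matrix spike: the spiked-aliquot result less the
    concentration already present in the original sample, expressed as a
    percentage of the amount of analyte added."""
    return 100.0 * (spiked_result - unspiked_result) / spike_added


def lcs_recovery(measured, spike_added):
    """Percent recovery of a laboratory control sample (a spiked blank)."""
    return 100.0 * measured / spike_added


def within_limits(recovery, lcl=75.0, ucl=125.0):
    """True if a recovery lies inside the acceptance limits. The 75-125%
    defaults are illustrative, copied from the Appendix accuracy report."""
    return lcl <= recovery <= ucl


# Hypothetical values in consistent concentration units.
ms = matrix_spike_recovery(spiked_result=58.0, unspiked_result=8.0, spike_added=50.0)
lcs = lcs_recovery(measured=45.0, spike_added=50.0)
print(ms, within_limits(ms))    # 100.0 True
print(lcs, within_limits(lcs))  # 90.0 True
```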
The quality control samples typically used to assess analytical precision include
duplicate and/or matrix spike duplicate samples.
• Duplicate samples are prepared and analysed independently and the results
compared to assess analytical precision.
• Matrix spike duplicate samples are prepared by spiking two separate sample
aliquots and the recoveries are compared to assess analytical precision.
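Duplicate and matrix spike duplicate results are often compared using the relative percent difference (RPD): the absolute difference divided by the mean of the pair. The sketch below applies it to the antimony and cadmium MS/MSD recoveries listed in Table 3; the 20% RPD limit is a hypothetical placeholder, not a value taken from this paper, and the function names are likewise illustrative.

```python
def relative_percent_difference(x1, x2):
    """Relative percent difference between two duplicate results:
    |x1 - x2| divided by their mean, times 100 (assumes non-negative results)."""
    if x1 == x2:
        return 0.0
    mean = (x1 + x2) / 2.0
    if mean == 0:
        raise ValueError("RPD is undefined when the mean of the pair is zero")
    return 100.0 * abs(x1 - x2) / mean


def precision_acceptable(x1, x2, max_rpd=20.0):
    """Compare the RPD with a limit; 20% is a hypothetical placeholder."""
    return relative_percent_difference(x1, x2) <= max_rpd


# MS and MSD recoveries (%) taken from Table 3 in the Appendix.
for analyte, ms, msd in [("Antimony", 18.8, 17.2), ("Cadmium", 45.0, 56.6)]:
    rpd = relative_percent_difference(ms, msd)
    print(f"{analyte}: RPD = {rpd:.1f}%, acceptable = {precision_acceptable(ms, msd)}")
```

Note that the antimony pair is precise (RPD about 8.9%) even though both recoveries are far outside the accuracy limits: precision and bias are assessed separately.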
Sampling Precision and Bias
The sampling precision is determined by obtaining two samples, both
representing the intended sample scale. This repeat sampling may vary
spatially or temporally. For example, if a sample is intended to
represent the groundwater from a borehole over a period of a day, then repeat
samples should be taken at different levels within the borehole (spatial variance)
and at different times during that day (temporal variance). The sampling
precision variance is determined by differences between the results obtained for
these repeat samples.
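Repeat samples also carry analytical error, so one way to isolate the sampling precision is a balanced duplicate design: two field samples per sampling target (for example, two levels in the same borehole), each analysed twice. The sketch below uses simple moment estimates rather than a full (or robust) analysis of variance: the analytical variance comes from the duplicate analyses, and half of it is subtracted from the variance between the two field-sample means. The function name, the design and the data are hypothetical.

```python
def nested_duplicate_variances(targets):
    """Separate sampling and analytical variance from a balanced duplicate design.

    Each target is ((a11, a12), (a21, a22)): two field samples taken to
    represent the same sampling target, each analysed twice.
    Returns (sampling_variance, analytical_variance).
    """
    if not targets:
        raise ValueError("at least one sampling target is required")
    n = len(targets)
    sum_d2 = 0.0  # squared differences between duplicate analyses
    sum_D2 = 0.0  # squared differences between the two field-sample means
    for (a11, a12), (a21, a22) in targets:
        sum_d2 += (a11 - a12) ** 2 + (a21 - a22) ** 2
        sum_D2 += ((a11 + a12) / 2.0 - (a21 + a22) / 2.0) ** 2
    # Each analytical pair's d**2 / 2 estimates the analytical variance;
    # average over the 2n pairs.
    analytical_var = sum_d2 / (2 * 2 * n)
    # The variance of a two-analysis mean includes half the analytical
    # variance, so subtract it; negative estimates are truncated to zero.
    between_var = sum_D2 / (2 * n)
    sampling_var = max(between_var - analytical_var / 2.0, 0.0)
    return sampling_var, analytical_var


# Hypothetical results (mg/l) for two sampling targets.
targets = [((10.0, 12.0), (14.0, 16.0)), ((20.0, 20.0), (24.0, 24.0))]
print(nested_duplicate_variances(targets))  # (7.5, 1.0)
```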
The sampling bias is probably the hardest error to quantify; it is the
systematic deviation of the result from the true value caused by sampling
practice. For example, if samples are contaminated during sampling, the
accuracy of the results would be impacted. To quantify this error, different
sampling methods are required and the results produced by the methods
compared. However, to calculate the sampling bias directly, the other
variances (sampling precision etc.) must remain constant. Therefore, this error is
generally determined from the variance of the results obtained from separate
sampling sessions in which all of the errors discussed are determined in both
sessions [5, 6].
Changes to field protocols and sampling patterns can provide an
indication of these errors; however, complex sampling patterns are expensive, and
cost implications, inadequate data management software and a lack of legislation
may result in too little data to perform these calculations. These field protocols
also do not show where or how the errors occurred, and therefore how to
minimize them. To interpret these errors, an assessment of laboratory quality
assurance data, together with strong validation and traceability of the data,
is required.
Assessing Data Usability
Prior to collecting and analyzing samples, the data end-user must first
define the program objectives and determine the level of error in the data that
can be tolerated while still meeting these objectives. These data quality
objectives (DQOs) are often expressed as tolerance ranges for the QC sample
results.
The assessment of the analytical results to determine whether program
DQOs have been met is typically performed by an independent third party using
a process referred to as data validation.
During data validation, the validator reviews the results of laboratory
and field QC sample analyses and assesses them against the project DQOs.
Depending on the level of review, the validator may also examine raw analytical
data to perform a more comprehensive review of the analytical activities,
including verifying the laboratory’s calculations, assessing instrument
calibration data, and reviewing analyte identifications.
Using the results of the data review, the validator determines the impact
of any identified error on the sample results. Data deemed unusable are rejected.
Sample data associated with QC results that indicate bias or variability in the
analytical process, but not to an extent that warrants rejection, are qualified
by the data validator as estimated values.
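The accept / estimate / reject decision can be caricatured in code for a single recovery-based check. The qualifier "J" (estimated) matches the flag used in Table 4, and "R" (rejected) follows common US data-validation conventions; the 10% rejection threshold is a hypothetical placeholder, since real rules come from the program DQOs and the applicable validation guidelines. The function name is illustrative.

```python
from typing import Optional


def qualifier_for_recovery(recovery: float, lcl: float, ucl: float,
                           reject_below: float = 10.0) -> Optional[str]:
    """Return a validation qualifier for sample results tied to a QC recovery.

    None -> recovery within the DQO limits, no qualifier needed
    "R"  -> recovery so low that the associated data are deemed unusable
    "J"  -> outside limits but usable; results qualified as estimated
    The 10% rejection threshold is a hypothetical placeholder.
    """
    if lcl <= recovery <= ucl:
        return None
    if recovery < reject_below:
        return "R"
    return "J"


# Recoveries (%) from Table 3 checked against its 75-125% limits,
# plus two hypothetical values (100.0 and 5.0).
for rec in (18.8, 45.0, 140.0, 100.0, 5.0):
    print(rec, qualifier_for_recovery(rec, lcl=75.0, ucl=125.0))
```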
Electronic Tools for Data Quality Assessment and Handling
The process of obtaining data from the analytical laboratory, assessing
data quality, and transferring these data into the format required by the end-user
is a difficult one. Using EQuIS, the data end-user obtains analytical data
from the laboratory in a standardised electronic format for import into the
electronic database, without introducing the errors associated with manual data
input. The end-user can then access the data electronically from the database
and import them into the required documents in whatever format is needed.
EQuIS is a comprehensive geo-environmental data management
database designed to store analytical test data and related data obtained during
environmental site investigations, routine site monitoring, and hazardous waste
remediation projects. EQuIS can be used for report and chart generation and is
integrated with multiple statistical, numerical modeling and data visualization
tools.
Although data validation requires significant expertise in analytical
chemistry, some portions of this process can be automated using the Data
Verification Module (DVM). DVM is an optional companion program to
EQuIS. DVM can access the data in an EQuIS chemistry project database to
identify out-of-control QC results and to flag the corresponding environmental
results. The comprehensive electronic deliverable from the laboratory includes
analytical sample results, QC results, the control limits used to assess the QC
results, analytical method test details, test batch groupings, preparation and
analytical test dates and times, and other significant information about the
analytical program.
EQuIS DVM allows the data manager to edit validation parameters and
settings; data can be selected by sample delivery group (SDG). Multiple sample
delivery groups can be processed simultaneously. Verification options include
detection limit, accuracy, contamination, precision, holding times, surrogate, and
Selecting one or more sample delivery groups for processing starts the
data verification process. Warning messages are generated during
verification unless interactive warning message suppression is
specified or the data are in order. In either case, a DVM process log (with any
warnings) is displayed on the screen and can be printed and/or saved to a file.
Using DVM, QC results with error levels that do not meet the program DQOs are
identified. The DVM report includes a summary of outlying QC results and the
sample results potentially impacted by the biases and variability indicated by
these outliers. DVM can assign validation qualifiers to these data in accordance
with standard data validation conventions.
DVM can be used immediately after data are imported into EQuIS to
identify potential data quality issues that may require re-collection of samples.
In many cases, identifying these outliers as quickly as possible can help avoid
additional costs and project delays.
DVM performs most components of a low-level data validation. For more
comprehensive validations, DVM can perform most of the steps that do not
require analytical decision-making (e.g., identifying QC outliers and
summarising potentially impacted samples), leaving the validator free to spend
time on the facets of validation that require analytical expertise (e.g.,
reviewing raw data to confirm analyte identifications).
The reports prepared with EQuIS Chemistry/DVM focus on relevant
findings in each of the following functional areas: Holding Times,
Contamination, Accuracy, Precision, and Surrogates.
The reports prepared by DVM are simple and concise, showing only the
data needed to assess data quality and minimizing raw data. For example, the
DVM accuracy report shows spike recoveries. The spike recoveries are the most
directly relevant information; the spike concentration values do not contribute
directly to data quality review. If spike concentration data are present (usually
they are not), DVM will recalculate recoveries to check the laboratory's
arithmetic, but this check does not bear directly on accuracy. The reports are
designed to be legible in 12-point text in a landscape page layout. Shading is
not used, so that the reports remain legible when faxed. The reports are created
as Microsoft Word files to simplify editing and inclusion in other reports.
The DVM contamination report includes information from lab blanks as
well as field blanks; this level of review is generally impossible within the lab.
EQuIS Chemistry allows very flexible associations between field blanks and
normal environmental samples. For example, a normal environmental sample
can be associated with the equipment rinsate blank that preceded it and with the
equipment blank that followed it.
DVM limits the report to only those results that are thought to be
related to contamination. If a normal environmental sample result is some factor
higher than the blank hit result (typically 5 or 10 times higher), then the normal
environmental sample result is usually considered valid (i.e., not due only to
contamination). DVM allows the user to select this blank ratio on a per-
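The blank-ratio rule can be sketched for a single result as below. A ratio of 10 is assumed because it reproduces the Appendix tables, where a 5 mg/liter blank hit gives a 50 mg/liter qualification limit and both associated results (7 and 40 mg/liter) are flagged "U*". Whether a result exactly at the limit is flagged is not stated in the text, so leaving it unflagged is an assumption; the function name is hypothetical.

```python
def apply_blank_ratio(sample_result, blank_result, ratio=10.0):
    """Blank-ratio screening for one environmental result.

    Returns (qualification_limit, flag). A result below ratio x blank
    concentration may be due to contamination and is flagged "U*";
    a result at or above the limit is left unflagged (boundary handling
    is an assumption).
    """
    limit = ratio * blank_result
    flag = "U*" if sample_result < limit else ""
    return limit, flag


# Reproducing Table 2: both results fall below 10 x 5 mg/liter = 50 mg/liter.
for sample_id, analyte, result in [("N001", "Benzene", 7.0),
                                   ("N002", "Styrene (monomer)", 40.0)]:
    limit, flag = apply_blank_ratio(result, blank_result=5.0)
    print(sample_id, analyte, result, flag, limit)
```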
The DVM holding time summary report focuses on elapsed times and
limits. Sample preparation information is shown only if a relevant holding
time exists. This summary report is at the test event level; it helps to show how
many tests were late. The DVM holding time detail report focuses on sample
results. This detail report is at the result level; it helps to show the impact
on results. DVM supports both method and contractual holding time limits.
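The elapsed-time check behind the summary report can be sketched as follows: compute the time from collection to analysis for each test event and count the events that exceed the limit. The 14-day limit and the dates are hypothetical; actual holding times depend on the method, analyte and contract, and the function names are illustrative.

```python
from datetime import datetime, timedelta


def holding_time_status(sampled, analysed, limit_days):
    """Elapsed time from collection to analysis, and whether it exceeds the
    holding-time limit (given in days)."""
    elapsed = analysed - sampled
    return elapsed, elapsed > timedelta(days=limit_days)


def count_late(events, limit_days):
    """Summary-level view: how many (sampled, analysed) test events were late."""
    return sum(1 for sampled, analysed in events
               if holding_time_status(sampled, analysed, limit_days)[1])


# Hypothetical test events checked against a hypothetical 14-day limit.
events = [
    (datetime(2001, 3, 1, 9, 0), datetime(2001, 3, 10, 14, 0)),
    (datetime(2001, 3, 1, 9, 0), datetime(2001, 3, 16, 10, 0)),
]
for sampled, analysed in events:
    print(holding_time_status(sampled, analysed, limit_days=14))
print("late tests:", count_late(events, limit_days=14))
```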
The DVM accuracy report includes information from matrix spikes as
well as laboratory control samples. For out-of-control blank spike recoveries
(sample types BS, BD), the prep or analytical batch is used to find associated
environmental sample results. For out-of-control inorganic matrix spike
recoveries (MS, SD), the SDG is used to find associated environmental sample
results. For out-of-control organic matrix spike recoveries (sample types MS,
SD), flags are assigned only to the environmental sample from which the spike
was derived (i.e., the "neat" sample).
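The association rules for out-of-control spike recoveries can be sketched as a small lookup. The sample-type codes BS/BD (blank spike and duplicate) and MS/SD (matrix spike and duplicate) are taken from Table 3; the record fields, including the "fraction" field used to tell inorganic from organic analyses, are hypothetical, and this sketch associates blank spikes by preparation batch only, whereas the text allows either the prep or the analytical batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcResult:
    sample_type: str    # "BS", "BD", "MS" or "SD" (codes as in Table 3)
    fraction: str       # "inorganic" or "organic" (hypothetical field)
    prep_batch: str
    sdg: str
    parent_sample: str  # environmental sample that was spiked, for MS/SD


@dataclass(frozen=True)
class EnvResult:
    sample_id: str
    prep_batch: str
    sdg: str


def associated_results(qc, env_results):
    """Environmental results to flag for one out-of-control spike recovery."""
    if qc.sample_type in ("BS", "BD"):
        # Blank spikes: associate by batch (prep batch only in this sketch).
        return [r for r in env_results if r.prep_batch == qc.prep_batch]
    if qc.sample_type in ("MS", "SD"):
        if qc.fraction == "inorganic":
            # Inorganic matrix spikes: associate by sample delivery group.
            return [r for r in env_results if r.sdg == qc.sdg]
        # Organic matrix spikes: only the "neat" parent sample.
        return [r for r in env_results if r.sample_id == qc.parent_sample]
    return []


# Hypothetical environmental results and one inorganic matrix spike.
env = [EnvResult("N101", "P1", "A02"), EnvResult("N102", "P2", "A02"),
       EnvResult("N103", "P2", "B01")]
ms_inorganic = QcResult("MS", "inorganic", "P1", "A02", "N101")
print([r.sample_id for r in associated_results(ms_inorganic, env)])
```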
The DVM precision report includes information from matrix spike
duplicates, laboratory control sample duplicates, lab duplicates, and field
duplicates or replicates.
The DVM surrogate report summarizes all surrogate recoveries that did
not meet the individual program DQOs and lists the sample results potentially
impacted by the errors in analytical accuracy that these outlying recoveries
indicate.
See the Appendix for example Contamination and Accuracy reports.
There are many sources of potential error in analytical data. For error
that is inherent to sample collection and analysis activities, systems must be in
place to identify and quantify the level of error. The data end-user must
determine the level of error that can be tolerated for their program objectives
prior to sample collection. To avoid some of the error potentially introduced
by data manipulation, data should be handled in electronic format.
EarthSoft's EQuIS (Environmental Quality Information System)
provides a means of obtaining and validating laboratory data electronically.
Using this system, data manipulation errors are minimized, and analytical and
sampling error can be identified and assessed against project DQOs.
Using EQuIS, data of known quality are easily and accurately transferred from
the Laboratory Information Management System (LIMS) to the end-user's
database, providing the end-user with unlimited flexibility for data reporting and
[1] Ramsey, M.H., Watkins, P.J. & Sams, M.S. Estimation of measurements for
in-situ borehole determinations using a geochemical tool. Q.J.E.G., 15, pp.
[2] Davis, J.C. Statistics and Data Analysis in Geology, John Wiley & Sons:
New York, 1986.
[3] Puls, R.W., Clark, D.A., Bledsoe, B., Powell, R.M. & Paul, C.J. Metals in
groundwater: sampling artifacts and reproducibility. Hazardous Waste &
Hazardous Materials, 9(2), pp. 149-162, 1992.
[4] Rouhani, S. & Hall, T.J. Geostatistical schemes for groundwater sampling.
Journal of Hydrology, 103, pp. 85-102, 1988.
[5] Brahana et al. (eds). Hydrochemical variations with depth in a major UK
aquifer: the fractured, high permeability Triassic Sandstone. Gambling
with groundwater: physical, chemical, and biological aspects of aquifer-
stream relationships, American Institute of Hydrology: Minnesota, pp. 53-
[6] Stagg, K.A. An investigation into colloids in Triassic Sandstone
Groundwaters. Unpublished PhD thesis, University of Birmingham, UK,
Table 1: EQuIS Chemistry DVM Report for Contamination. Sample delivery group A01. Blank Hits.
Blank matrix | Blank type | Blank sample ID | Contaminant | Blank concentration | Qualification limit
WQ | LB | LB001 | Styrene (monomer) | 5 mg/liter | 50 mg/liter
WQ | FB | FB001 | Benzene | 5 mg/liter | 50 mg/liter
Table 2: Environmental sample results associated with blank hits.
Sample ID | Matrix | Method | Analyte | Sample result | Qualified result | Qualification limit | Blank ID
N001 | WG | 8240 | Benzene | 7 mg/liter | 7 U* mg/liter | 50 mg/liter | FB001
N002 | WG | 8240 | Styrene (monomer) | 40 mg/liter | 40 U* mg/liter | 50 mg/liter | LB001
Table 3: EQuIS/Chem Data Verification Module for accuracy. Sample delivery group A02. Out-of-control spike recoveries.
Sample ID | Lab sample ID | Type | Analyte | Recovery | LCL (%) | UCL (%)
001SOMS | AA78414 | MS | Antimony | 18.8% | 75.0 | 125
001SOMSD | AA78437 | SD | Antimony | 17.2% | 75.0 | 125
001SOMSD | AA78437 | SD | Barium | 129% | 75.0 | 125
001SOMS | AA78414 | MS | Cadmium | 45.0% | 75.0 | 125
001SOMSD | AA78437 | SD | Cadmium | 56.6% | 75.0 | 125
001SOMS | AA78414 | MS | Copper | 140% | 75.0 | 125
001SOMS | AA78414 | BS | Manganese | 267% | 75.0 | 125
001SOMSD | AA78437 | BD | Manganese | 302% | 75.0 | 125
Table 4: Environmental sample results associated with out-of-control spike recoveries.
Sample ID | Matrix | Prep method | Analysis method | Analyte | Result | Hit? | Flag
N011 | SO | SW3050 | SW6010 | Antimony | 7 mg/kg | Y | J
N011 | SO | SW3050 | SW6010 | Barium | 32 mg/kg | Y | J
N011 | SO | SW3050 | SW6010 | Cadmium | 45 mg/kg | Y | J
N011 | SO | SW3050 | SW6010 | Copper | 16 mg/kg | Y | J
N011 | SO | SW3050 | SW6010 | Manganese | 102 mg/kg | Y | J