The Role of Data Quality
Assessment in a Project
Dr. Ferdin Joe John Joseph
Kamnoetvidya Science Academy
Rayong, Thailand
ferdinjoe@gmail.com
Course Objectives
At the end of this course, you will be able to:
•Explain why DQA is important and how it can be applied to
your projects
•List the five steps of DQA and explain the purpose of each
step
•Evaluate the application of DQA on a dataset
•Interpret basic statistics and simple graphs
•Recognize different software tools and other resources for
performing DQA
Data Quality
Meaningful only when "data quality" relates to
intended use of data
Some data are good ("high quality") for some
purposes but are bad ("low quality") for others
Data Quality Assessment
A scientific and a statistical evaluation to determine if
data are adequate for their intended use
DQA is described in the Guidance for Data Quality
Assessment: Practical Methods for Data Analysis
(EPA QA/G-9), EPA/600/R-96/084, July 2000
The Project Life Cycle
1. Plan for Data Collection - Set data quality objectives or other performance and acceptance criteria. Document them in the QA Project Plan.
2. Collect Data - Collect/assemble data in accordance with the QA Project Plan. Perform the assessments defined in the Plan.
3. Assess and Use Data - Verify whether the data meet acceptance criteria. Run statistical methods to analyze the data.
Result: Product or Decision
DQA is Performed ...
Whenever data are used to make a decision, for
estimation, or for research purposes
This applies to:
–New data to be collected
–Data collected by someone else
–Data collected by you for another project
Figure: DQA within the project life cycle
PLANNING: Systematic Planning (e.g., Data Quality Objectives Process); QA Project Plan Development
IMPLEMENTATION: Field Data Collection and Associated QA/QC Activities
ASSESSMENT: Data Validation/Verification; Data Quality Assessment
QUALITY ASSURANCE ASSESSMENT
DATA VALIDATION/VERIFICATION (inputs: routine data and QC/performance evaluation data)
–Verify measurement performance
–Verify measurement procedures and reporting requirements
Output: VALIDATED/VERIFIED DATA
DATA QUALITY ASSESSMENT (input: validated/verified data)
–Review objectives and design
–Conduct preliminary data review
–Select statistical method
–Verify assumptions
–Draw conclusions
Output: CONCLUSIONS DRAWN FROM DATA
Data Verification - the process of evaluating the
completeness, correctness, and conformance/
compliance of a specific data set against the method,
procedural, or contractual requirements
Data Validation - an analyte- and sample-specific
process that extends the evaluation of data beyond
method, procedural, or contractual compliance (i.e.,
data verification) to determine the analytical quality of
a specific data set
Data Quality Assessment - the process to determine if
the data are suitable for a specific use
Verification vs. Validation vs.
Assessment
The Five Steps of
Data Quality Assessment
1. Review the Objectives and Sampling Design
2. Conduct a Preliminary Data Review
3. Select the Statistical Method
4. Verify the Assumptions of the Statistical
Method
5. Draw Conclusions from the Data
Data Quality Assessment
Step 1: Review Decision Problem
Step 2: Learn more about the data
Step 3: Select the statistical method
Step 4: Verify the assumptions of the method
Do the assumptions hold?
–Yes: Step 5: Draw conclusions, leading to the product/decision
–No: Revise the scope of the problem, OR choose a new statistical test, OR transform or otherwise modify the data
Two Views of DQA
DECISION MAKER'S VIEW
1. Define the Decision Rule and Decision Errors
2. Specify Acceptable Limits on Decision Errors
3. Identify Method for Applying Decision Rule
4. Ensure that Method is Defensible
5. Apply the Decision Rule
DATA ANALYST'S VIEW
1. Define the Statistical Hypotheses
2. Determine Acceptable Type I and Type II Error Rates
3. Identify Statistical Test or Method and Assumptions
4. Assess Validity of Statistical Test/Method
5. Perform Statistical Test and Assess Design
DQA Project Table
Project Objective & Data Collection Design (Step 1)
[This column will contain an overview of the project and background information against which to determine "quality."]
LIST:
- Objective
- Parameter of interest
- Type of analysis needed
- Type of data collection design
- Information on deviations from the design in the implementation
Observations from QA Reports, Summary Statistics, and Graphs (Step 2)
[This column will contain information that will provide insight into which assumptions might be met.]
LIST:
- Non-detects
- Probable distribution
- Potential outliers
- Anomalies
Statistical Method and Assumptions (Step 3)
[This column will contain information on the statistical method and its assumptions.]
LIST:
- Analysis method
- Assumptions to verify
- Significance levels
Verification of Assumptions (Step 4)
[This column will describe what assumptions were checked, how they were checked, and what the results were.]
LIST:
- Assumptions, whether they were met, and how they were verified (including significance levels)
Results from Statistical Method (Step 5)
[This column will summarize the final results from the statistical test and other factors to consider in the final product or decision.]
LIST:
- Final results from data analysis
- Other factors affecting the final product or decision
Overview of
Data Quality Assessment
The Five Steps of Data Quality
Assessment
1. Review objectives and data collection design
2. Conduct a preliminary data review
3. Select the statistical method
4. Verify the assumptions of the statistical method
5. Draw conclusions from the data
IMPORTANT: If data other than (or in addition to) newly collected
data are being used, ALL steps must still be performed!
DQA Step 1: Review Objectives
and Data Collection Design
Translate the data user's objectives into a statement
of the primary statistical hypothesis or estimation
goal
Translate the data user's objectives into tolerable
limits on the probability of committing decision errors
Review the sampling design and note any special
features or deviations from the sampling plan
Step 1: Input
QA Project Plan or any other planning documents that
contain:
–Project objective or question to be answered
–Decision performance criteria (DQOs) or other
performance and acceptance criteria
Field Sampling Plan and any reports on actual
implementation of sampling plan
If Systematic Planning was Performed...
Use the reports documenting the planning to answer:
–What is the objective of the project?
–What are the performance or acceptance criteria for
the product or decision?
If Systematic Planning was NOT
performed...
Decision Making: Apply the Data Quality Objectives
Process or other planning process to:
–develop hypotheses
–define the potential decision errors
–specify tolerable limits on making decision errors
Estimation: Use a systematic planning process to:
–select parameters
–develop performance or acceptance criteria
Example Hypothesis & Decision Errors
Example Hypotheses:
–Null Hypothesis: True mean is less than 50 mg/Kg
–Alternative: True mean is greater than 50 mg/Kg
Example Decision Errors: If the null hypothesis is
that the mean is less than 50 mg/Kg:
–False Rejection: Decide that the true mean PAH
concentration is more than 50 mg/Kg when it is
really less than 50 mg/Kg
–False Acceptance: Decide that the true mean PAH
concentration is less than 50 mg/Kg when it is
really greater than 50 mg/Kg
Example Limits on Decision Errors
False Rejection: Decide that the true mean PAH
concentration is more than 50 mg/Kg when it is really
less than 50 mg/Kg
–10% probability of making this error when the true mean is 50 mg/Kg
–5% probability of making this error when the true mean is 25 mg/Kg
False Acceptance: Decide that the true mean PAH
concentration is less than 50 mg/Kg when it is really
greater than 50 mg/Kg
–10% probability of making this error when the true mean is 70 mg/Kg
–5% probability of making this error when the true mean is 100 mg/Kg
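The decision error definitions and limits above can be encoded directly. The sketch below is a hypothetical illustration, not part of the EPA guidance: it labels a single decision as correct, a false rejection, or a false acceptance for the 50 mg/Kg example, and checks estimated error probabilities for a candidate design against the tolerable limits. The function names and the estimated probabilities in the usage lines are assumptions.

```python
# Hypothetical sketch for the 50 mg/Kg PAH example.
# Baseline (null) condition: true mean is not more than 50 mg/Kg;
# alternative: true mean is greater than 50 mg/Kg.
ACTION_LEVEL = 50.0  # mg/Kg

# Tolerable limits from the example: {true mean (mg/Kg): maximum error probability}
FALSE_REJECTION_LIMITS = {50.0: 0.10, 25.0: 0.05}
FALSE_ACCEPTANCE_LIMITS = {70.0: 0.10, 100.0: 0.05}


def classify_decision(true_mean: float, decided_exceeds: bool) -> str:
    """Label one decision, given the (normally unknown) true mean."""
    actually_exceeds = true_mean > ACTION_LEVEL
    if decided_exceeds and not actually_exceeds:
        return "false rejection"   # baseline rejected although it was true
    if not decided_exceeds and actually_exceeds:
        return "false acceptance"  # baseline kept although it was false
    return "correct"


def meets_limits(estimated: dict, limits: dict) -> bool:
    """True if every estimated error probability is within its tolerable limit."""
    return all(estimated[mean] <= limit for mean, limit in limits.items())


print(classify_decision(40.0, decided_exceeds=True))    # false rejection
print(classify_decision(80.0, decided_exceeds=False))   # false acceptance
# Hypothetical error probabilities estimated for some candidate sampling design:
print(meets_limits({50.0: 0.08, 25.0: 0.01}, FALSE_REJECTION_LIMITS))  # True
```

In practice the estimated probabilities would come from the planned test and sample size (for example, by simulation); here they are made-up numbers used only to show the comparison.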
Reviewing the Sampling Design
Review the planned sampling design and the
information on the actual data collection; note any
special features or deviations
Determine whether these deviations could affect the
potential analysis of the data
Step 1 - Output
Well-defined project objectives and criteria
Verification that the hypothesis chosen is consistent
with the objective and criteria
A list of any deviations from the planned sampling
design and the effects of these deviations
DQA Step 2:
Conduct Preliminary Data Review
Review quality assurance reports for anomalies
Calculate standard statistical quantities
Display the data using graphical representations
Step 2: Input
Verified and Validated Data
QA reports, QC data
Technical systems audit results:
–Performance evaluations
–Corrective action reports
–Data verification and validation reports
QA Project Plan, Sampling and Analysis Plan, or other
planning documents
Review QA Reports
Look for:
–Failure to meet acceptance criteria/obvious QC
violations:
  • variable detection limits
  • nonequivalent analytical methods
–Implementation anomalies from the QA Project
Plan:
  • negative emission rates
  • pH values exceeding 14.0
  • values in wrong reporting units
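The QA-report checks above amount to rule-based screening of the reported values. A minimal sketch, assuming each result is available as a record with a parameter name, value, and units; the `Record` type, the parameter names, and the expected-units table are hypothetical. Flagged values are candidates for follow-up with the laboratory or field team, not automatic deletions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    sample_id: str
    parameter: str  # e.g. "pH" or "emission_rate" (hypothetical names)
    value: float
    units: str


# Assumed reporting units for this sketch; a real project takes them from the QA Project Plan.
EXPECTED_UNITS = {"pH": "SU", "emission_rate": "g/s"}


def flag_anomalies(records):
    """Return (sample_id, reason) pairs for implausible or mis-reported values."""
    flags = []
    for r in records:
        if r.parameter == "pH" and r.value > 14.0:
            flags.append((r.sample_id, "pH exceeds 14.0"))
        if r.parameter == "emission_rate" and r.value < 0:
            flags.append((r.sample_id, "negative emission rate"))
        expected = EXPECTED_UNITS.get(r.parameter)
        if expected is not None and r.units != expected:
            flags.append((r.sample_id, f"reported in {r.units}, expected {expected}"))
    return flags


records = [
    Record("S1", "pH", 7.2, "SU"),
    Record("S2", "pH", 15.1, "SU"),
    Record("S3", "emission_rate", -0.4, "g/s"),
    Record("S4", "emission_rate", 2.0, "lb/hr"),
]
for sample_id, reason in flag_anomalies(records):
    print(sample_id, reason)
```

With these hypothetical records, S2, S3, and S4 are flagged and S1 passes.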
Calculate Standard Statistical Quantities
Statistical Quantities include measures of:
–Central Tendency (mean, median, etc.)
–Relative Standing (percentiles)
–Dispersion (range, variance)
–Association (correlation)
Review these quantities to determine:
–Do the data look reasonable - do the values make
sense?
–Are there any obvious anomalies?
–Are there any trends or patterns?
Display the Data using Graphs
Common Graphs:
–Histogram
–Stem-and-Leaf
–Box and Whiskers
–Scatter Plot
–Time Plot
Review graphs to determine:
–Do the data look reasonable?
–What is the distribution like? Is it symmetric,
bimodal?
–Are there extremely high or extremely low values?
–Are there any obvious trends?
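A stem-and-leaf display needs no plotting software, which makes it a quick first look at shape and extreme values. The sketch below is one simple way to build it, assuming non-negative values and a chosen leaf unit; the function name is hypothetical. It is applied to the 16 PCB soil concentrations from the PCB example.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1.0):
    """Text stem-and-leaf display for non-negative values.

    Each value is rounded to a whole number of leaf units; the last digit of
    that number is the leaf and the remaining digits are the stem.
    """
    groups = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v / leaf_unit), 10)
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems visible
        leaves = "".join(str(d) for d in sorted(groups.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines


# PCB concentrations (mg/Kg) from the PCB example; leaf unit = 1 mg/Kg
pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]
print("\n".join(stem_and_leaf(pcb)))
```

This prints three rows, `0 | 111222233335689`, an empty `1 |`, and `2 | 5`: the data pile up at low values, and the single value near 25 stands apart.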
Step 2: Output
Statistical quantities and graphs that provide you with
a preliminary understanding of the data and any
potential issues including:
–Distribution of data
–Potential outliers
–Non-detects
DQA Step 3:
Select the Statistical Method
Select the statistical method based on the data user's
objectives and the preliminary data review
Identify the assumptions underlying the statistical
method
Step 3: Input
Project objectives, hypotheses, and preliminary
statistical method if identified
Background on statistical methods
Select Method
If a method was identified during planning, determine whether that choice
still seems reasonable based on the preliminary review of
the data
Otherwise, select the statistical method based on the data
user's objectives and the preliminary data review
Example Methods:
–Tests: One-sample t-test, Two-sample t-test, Test for
a single proportion, Wilcoxon Signed Rank Test
–Estimation
–Regression Analysis
–Time Series Analysis
Identify Assumptions
Every method has assumptions.
Common Assumptions:
–Distributional form
–Independence
–Dispersion characteristics
–Homogeneity
–Basis for randomization
Example - One-sample t-test: random sample;
independence of data; sample mean is normally
distributed; no outliers; few "non-detects"
Step 3: Output
Proposed statistical method that looks appropriate for
the data and the project objectives
List of assumptions for the statistical method
DQA Step 4: Verify the Assumptions
of the Statistical Method
Determine approach for verifying assumptions
Perform tests of assumptions
If necessary, determine corrective actions to be taken
Step 4: Input
Data
Assumptions identified for statistical method
Methods to verify these assumptions along with their
formulas
Determine Approach for Verifying
Assumptions and Perform Test
Evaluate Step 3 to see what assumptions need to be
verified
Determine what tests are available for verifying
assumptions for this dataset
Select test and appropriate significance level
Determine Corrective Actions
If this data set does not meet the needed assumptions,
determine the next steps that should be taken:
–Repeat Step 3 and select a different method for
analyzing the data
–Transform the data
–Reduce significance level
–Gather additional data
–Modify objective
  • . . . but this should be done with caution.
Step 4: Output
Documentation of the method used to verify each
assumption and the results of these methods
Corrective actions (if necessary)
DQA Step 5:
Draw Conclusions from the Data
Perform the calculations for the statistical method
Evaluate the results and draw conclusions
Step 5: Input
Data
Objective, hypotheses (if applicable), and
performance or acceptance criteria
Formulas for statistical method
Non-statistical factors to incorporate into the final
decision or product
Perform the Statistical Method
Use formulas and procedures from standard textbooks
Use software to perform the calculations:
–SPSS
–SAS or Splus
–R
–DataQUEST
Note: Software could be used on any data, whether
the assumptions have been verified or not. But when
the assumptions don't hold, then the results are
highly suspect.
Evaluate the Results
The statistical results are not necessarily the answer.
Factor in items like:
–Practical significance
–Political/social factors
–Contextual significance
Step 5: Output
Statistical results with a specified significance level
Final Product or Decision
DQA Steps - Summary Table
Step 1
INPUT: QA Project Plan or any other planning documents; project objective or question to be answered; decision performance criteria or other performance and acceptance criteria; reports (e.g., Field Sampling Plan) on actual implementation of the sampling plan
PROCESS: Translate objectives into a statement of the primary statistical hypothesis or estimation goal; translate objectives into tolerable limits on the probability of committing decision errors; review the sampling design and note any special features or deviations
OUTPUT: Well-defined project objectives and criteria; verification that the hypothesis chosen is consistent with the objective and criteria; a list of deviations from the planned sampling design and the effects of these deviations
Step 2
INPUT: Verified and validated data; QA reports, QC data; technical systems audit results; QA Project Plan, Sampling and Analysis Plan, or other planning documents
PROCESS: Review quality assurance reports for anomalies; calculate standard statistical quantities; display the data using graphical representations
OUTPUT: Statistical quantities and graphs that provide a preliminary understanding of the data and any potential issues
Step 3
INPUT: Project objectives, hypotheses, and preliminary statistical method if identified; background on statistical methods
PROCESS: Select the statistical method based on the data user's objectives and the preliminary data review; identify the assumptions underlying the statistical method
OUTPUT: Proposed statistical method that seems appropriate for the data and the project objectives; list of assumptions for the statistical method
Step 4
INPUT: Data; assumptions identified for the statistical method; methods to verify the assumptions along with their formulas
PROCESS: Determine approach for verifying assumptions; perform tests of assumptions; if necessary, determine corrective actions to be taken
OUTPUT: Documentation of the methods used to verify each assumption and the results; corrective actions (if necessary)
Step 5
INPUT: Data; hypotheses (if applicable) and performance or acceptance criteria; formula for the statistical method; non-statistical factors to incorporate into the final decision or product
PROCESS: Perform the calculations for the statistical method; evaluate the results of the statistical method and draw conclusions
OUTPUT: Statistical results with a specified significance level; final product or decision
Data Quality
Assessment
Steps 1, 2, and 3
DQA Step 1: Review Objectives
and Data Collection Design
A. Translate the project's objectives into a statement
of the primary statistical hypothesis or estimation goal
B. Translate the data user's objectives into
performance or acceptance criteria
C. Review the sampling design and note any special
features or deviations from the sampling plan
Two Situations to Consider
Project Systematically Planned
–Use QA Project Plan and other planning documents
to perform this action
Project NOT Systematically Planned
–Use a systematic planning process (e.g., the Data
Quality Objectives Process) to plan retrospectively
Project Objective (Step 1A)
The objective indicates what the final outputs from
the project should be
For example:
–If the goal is to ascertain if the contamination
exceeds a threshold, the objective would be
"determine whether the contamination is greater
than X"
–If the goal is to ascertain if the contamination in the
soil has reached the groundwater, the objective
would be "determine whether the contamination
can be detected in the aquifer"
Different Analysis Methods
Estimation
Hypothesis Testing
Regression
Analysis of Variance
Time Series Analysis
Spatial Analysis
Defining the Boundaries (Step 1A)
Define the geographical area within which decisions
apply and the media of concern (Spatial Boundary)
Determine the time frame to which the study results
apply (Temporal Boundary)
Define a scale of decision making
Specifying Criteria (Step 1B)
Set quantitative performance or acceptance criteria
Consider consequences of any potential decision
errors. Consequences may include:
  • Health risks
  • Ecological risks
  • Political risks
  • Social risks
  • Resource risks
EXAMPLE: When selecting between two opposing
conditions, define a 'gray region', a false rejection error
limit, and a false acceptance error limit.
Review the Sampling Design (Step 1C)
Information needed, and where it would be found:
–Original sampling plan (locations and types of samples to be taken): QA Project Plan or other planning documents; documentation from the systematic planning process
–Details on how the samples were actually collected: summary reports from field notes, maps
–Documentation of deviations from the sampling plan: must be developed based on a comparison of the original plan and the actual implementation
–Review of deviations to ensure that the implemented plan still meets the objectives for the project: must be developed based on information from systematic planning on study objectives
What if you are assessing data
collected by another project?
Gather available information about how the data
were collected, for example, sample collection
plans, implementation of those plans, the
analytical method used, . . .
PCB Example: Background
Electronic Manufacturing Corporation of America
operated at site from 1965 to 1985 and sold the site to
Energy Components Company in 1985. Both
companies went bankrupt in 1990.
In 1991, chlorinated solvents were discovered in
water from city wells located in a field east of site.
Waste oil contaminated with PCBs was sprayed on a
dirt road on the site for dust suppression while the
site was operational.
Problem: Determine whether the extent of PCB
contamination along the road presents unacceptable
risks and whether remedial action is needed.
PCB Example: Statistical Hypotheses
If the mean concentration of total PCBs in surface soil
(top 1 inch) over the dirt road exceeds 2 mg/Kg, then
take remedial action; otherwise, take no further
action.
Null Hypothesis: True Mean < 2 mg/Kg
vs.
Alternative Hypothesis: True Mean > 2 mg/Kg
PCB Example: Tolerable Limits on
Decision Errors
[Figure: decision performance goal diagram plotting the probability of deciding that the mean exceeds 2 mg/Kg (y-axis) against the true concentration of PCB in mg/Kg (x-axis), with the action level, the gray region, and the tolerable false rejection and false acceptance decision error rates marked]
PCB Example: Sampling Design
Composite samples selected using simple random
sampling from the dirt road.
Dirt road was one of 4 strata at the site
(stratum 1).
Each composite consists of 5 mini-samples.
[Figure: site map showing Stratum 1]
PCB Example: Data
PCB concentration levels were measured (in mg/Kg)
from 16 surface soil samples (top one inch of soil)
from the dirt road. Each soil sample consists of 5
mini-samples composited together.
1.92 2.49 4.58 1.17
2.48 5.62 2.54 25.15
7.72 1.02 2.91 3.23
2.87 8.66 1.71 1.18
DQA Step 2:
Conduct Preliminary Data Review
A. Review quality assurance reports for anomalies:
  • Anomalies in recorded data, missing values,
deviations from standard operating procedures,
failure to meet acceptance criteria, use of
nonstandard data collection methodologies
B. Calculate standard statistical quantities:
  • Do the data look reasonable? Do the values make
sense? Are there any anomalies?
C. Display the data using graphical representations:
  • Are there any trends? What is the distribution
like? Are there any extreme values?
Review QA Reports (Step 2A)
Data validation reports that document sample
collection, handling, analysis, data reduction, and
reporting procedures used
Quality control reports from laboratories or field
stations that document measurement system
performance, including data from check samples,
split samples, spiked samples, or any other internal
QC measures
Technical system reviews, performance evaluation
audits, and audits of data quality, including data from
performance evaluation samples
Summary Statistics (Step 2B)
Central Tendency: mean, median, mode
Relative Standing:
–5th percentile
–25th percentile
–50th percentile
–75th percentile
–95th percentile
–99th percentile
Dispersion: variance, standard deviation, range
Association: correlation coefficient, regression
Commonly Used Graphs (Step 2C)
–Histogram
–Stem-and-Leaf
–Ranked Data Plot
–Quantile Plot
–Normal Probability Plot
–Scatter Plot
–Time Plot
–Spatial Correlogram
–Posting Plot
–Symbol Plot
PCB Example: Statistical Quantities
Number of Observations: 16
Minimum: 1.020 Maximum: 25.150
Mean: 4.703 Median: 2.705
Variance: 34.808 Standard Deviation: 5.900
Range: 24.130 Interquartile Range: 3.285
Coefficient of Variation: 1.254
Coefficient of Skewness: 2.818
Coefficient of Kurtosis: 7.321
Percentiles:
1st: 1.020 75th: 5.100
5th: 1.020 90th: 8.660
10th: 1.170 95th: 25.150
25th: 1.815 99th: 25.150
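Most of the quantities above can be reproduced with Python's standard `statistics` module, which is a useful cross-check on the software output. This sketch covers the minimum, maximum, mean, median, sample variance, standard deviation, range, and coefficient of variation. Percentiles, skewness, and kurtosis are left out because their definitions differ between packages; for instance, the 25th percentile of 1.815 shown above appears to be the average of the 4th and 5th ordered values, which is not the default in `statistics.quantiles`.

```python
import statistics

# PCB concentrations (mg/Kg), 16 composite surface soil samples
pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]

summary = {
    "minimum": min(pcb),
    "maximum": max(pcb),
    "mean": statistics.mean(pcb),
    "median": statistics.median(pcb),
    "variance": statistics.variance(pcb),  # sample variance, n - 1 divisor
    "std_dev": statistics.stdev(pcb),
    "range": max(pcb) - min(pcb),
}
summary["coef_of_variation"] = summary["std_dev"] / summary["mean"]

print("Number of observations:", len(pcb))
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```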
PCB Example: Histogram
PCB Example: Ordered Data Plot
PCB Example: Summary of Statistical
Quantities and Graphs
Not symmetric
One extreme value (25.15)
Does not appear to be normally distributed
DQA Step 3:
Select the Statistical Method
A. Select the statistical method based on the project's
objectives and the preliminary data review
B. Identify the assumptions underlying the statistical
method
Select Method (Step 3A)
If the statistical method has been identified during
planning, ensure it is applicable based on the
preliminary review of the data
Otherwise, select the statistical method based on the data
user's objectives and the preliminary data review
Example Tests:
–One-sample t-test
–Two-sample t-test
–Test for a single proportion
–Wilcoxon Signed Rank Test
Identify Assumptions (Step 3B)
Each method has assumptions, which can be found in
references on that test or in Guidance for Data Quality
Assessment
Common Assumptions:
–Distributional form
–Independence
–Dispersion characteristics
–Homogeneity
–Basis for randomization
Example: One-sample t-test: random sample;
independence of data; sample mean is normally
distributed; no outliers; not an excessive amount of
"non-detects"
PCB Example: Select Method and
Identify Assumptions
ONE SAMPLE t-TEST ASSUMPTIONS:
No outliers (sample mean and standard deviation are
very sensitive to outliers)
Sample mean approximately normally distributed
Random sample (independence of the data values)
Relatively few values below the detection limit
Exercise: Using Graphs
Example: Raw Data
ND 4.5 19.4
ND 5.8 19.5
ND 6.2 19.5
ND 6.8 19.7
ND 8.33348 19.7
ND 7.4 19.7
ND 12.5 19.8
ND 14.7 19.8
ND 19.0
204.6
lowest highest
Example: Ordered Data Plot
[Figure: back-to-back stem-and-leaf display comparing June ozone at Site 1 and Site 2, stems 35 to 60]
Example: Stem and Leaf
Example: Frequency Plot
Example: Box and Whiskers
Example: Time Plot
Example: Scatter Plot
Example: Posting Plot
[Figure: posting plot showing measured values (ranging from 2.4 to 43.0) at their sampling locations, with a creek and a road marked]
Data Quality Assessment
Steps 4 and 5
DQA Step 4: Verify the Assumptions
of the Statistical Test
A. Perform method to test assumptions:
  • Determine what methods are available
  • Select method and significance level
  • Perform calculations
B. If necessary, determine corrective actions to be
taken, e.g., select a different test in Step 3, reduce the
significance level, gather additional data, slightly
modify the objectives of the study, etc.
Types of Assumptions (Step 4A)
Random sample
Independence of data
Distribution of Data
Existence of Outliers
Extent of non-detect data
Verify Random Sample
Review data collection plan to verify that some
element of random selection was used to locate the
samples
If judgmental sampling was used to choose some
sample locations, you may need to consult a
statistician to determine if the data collection is
"random enough" to be able to draw reliable
conclusions from the data
If judgmental sampling was used to choose all sample
locations, the ability to draw conclusions about the
entire population is limited
Verify Independence of Data
For one variable (i.e., one contaminant) ensure there
are no trends within the data - for example, by
location, by time period, etc.
For two (or more) variables (i.e., two or more
contaminants), ensure that they are not correlated
with one another
Verify Distribution
Typical assumptions are:
–Normal distribution
–Symmetric distribution
An indication of the distribution can be gained by
reviewing the summary statistics and the graphs
Quantitative tests are available to verify whether a
data set has a specific distribution
Verify Outliers
Statistical outliers are anomalies with respect to the
proposed distribution for the data -- an extreme value
may be a statistical outlier but may still be a valid data
point
Before removing any statistical outliers, review these
values on a scientific or quality assurance basis.
Do not delete any values unless there is a scientific
or quality assurance reason for the removal
If any data are deleted, run all analyses both with and
without the data to see the effect of the deletions
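Running the analysis "with and without" a suspect value is easy to automate. This sketch uses the PCB data from the example and treats 25.15 as the suspect value; the helper name is hypothetical. It shows how strongly the mean and standard deviation react to one value compared with the median.

```python
import statistics

pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]
suspect = 25.15  # extreme value flagged in the preliminary review


def describe(data):
    """Summary statistics used to compare the two versions of the data set."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std_dev": statistics.stdev(data),
    }


with_value = describe(pcb)
without_value = describe([x for x in pcb if x != suspect])
for label, stats in (("with 25.15", with_value), ("without 25.15", without_value)):
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

Removing the single value moves the mean from about 4.70 to 3.34 mg/Kg, while the median only moves from 2.705 to 2.54 mg/Kg.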
Using Data Below the Detection Limit
Quality control information would indicate which data
points were non-detects
Methods for Addressing non-detects:
–Substitution, e.g., detection limit, 1/2 detection limit,
. . .
–Special algorithms, e.g., Winsorized Mean, Cohen's
Method, . . .
–Use of percentiles
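Simple substitution is easy to implement, and comparing several substitution fractions shows how sensitive a result is to the choice. A minimal sketch, assuming non-detects are stored as `None` and share a single detection limit; the data and function name are hypothetical. Substituting 0 is included only as a lower bound for comparison.

```python
import statistics

NOT_DETECTED = None  # placeholder for a value reported as "ND"


def substitute_nondetects(values, detection_limit, fraction=0.5):
    """Replace non-detects with fraction * detection_limit (simple substitution)."""
    return [fraction * detection_limit if v is NOT_DETECTED else v for v in values]


# Hypothetical results with a common detection limit of 1.0
results = [NOT_DETECTED, 2.4, 3.1, NOT_DETECTED, 5.0]
for fraction in (1.0, 0.5, 0.0):
    filled = substitute_nondetects(results, detection_limit=1.0, fraction=fraction)
    print(f"ND -> {fraction} x DL: mean = {statistics.mean(filled):.2f}")
```

Here the mean ranges from 2.10 to 2.50 depending on the substitution; with many non-detects the spread grows, which is when the special algorithms listed above become preferable.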
Types of Corrective Actions (Step 4B)
Select a different test or method of analysis
Transform the data to a different metric
Gather additional data or modify objective
PCB Example: Background
Electronic Manufacturing Corporation of America
(EMCA) operated at site from 1965 to 1985, when site
was sold to Energy Components Company (ECC).
Both companies went bankrupt in 1990.
In 1991, chlorinated solvents were discovered in water from the
city well field east of the site.
Waste oil contaminated with PCBs was sprayed on a
dirt road for dust suppression.
Problem: Determine the extent of PCB contamination
on the dirt road that presents unacceptable risks.
PCB Example: Assumptions of the t-Test
for a Single Mean
No outliers?
Data are approximately normally distributed?
Random sample (for independence of data values)?
No data reported as "Not Detected"?
PCB Example: Identifying Outliers
Is the extreme value of 25.15 a statistical outlier?
An extreme value test will be used; this test assumes the
data without the suspected outlier are normally distributed.
PCB Example: Testing for Normality
Shapiro-Wilk W Test
Null Hypothesis: 'Data are normally distributed'
Sample Value: 0.836
Tabled Value: 0.881
Non-normality has been detected at a 5% significance level.
Results: Data are Not Normally Distributed
PCB Example: Not Normally Distributed
Data without the outlier appear to be lognormally
distributed in the histogram
So, apply the Shapiro-Wilk W Test to the natural logarithms of
the data to test for lognormality: if the logged data are
normally distributed, then the untransformed data are
lognormally distributed
Original data value 1.92 becomes 0.652, 2.49 becomes
0.912, etc.
PCB Example: Testing Data for
Lognormality
Shapiro-Wilk Test
Null Hypothesis: 'Data are normally distributed'
Sample Value: 0.953
Tabled Value: 0.881
There is not enough evidence to reject the assumption of
normality with a 5% significance level.
Result: Cannot reject the assumption that the data are
lognormally distributed and that the logs of the data are
normally distributed.
PCB Example: Discordance Test for
Outliers
Discordance Test for Outliers
Value Tested: 3.225 [ln(25.15)]
Sample Value: 2.476
Tabled Value: 2.443
Conclude 3.225 is an outlier at a 5%
significance level.
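The sample value on this slide can be reproduced under the assumption that the discordance statistic is (largest value - mean) / s, computed on the natural logs of all 16 observations; that assumption yields 2.476, matching the slide. The tabled value 2.443 is taken from the slide rather than computed. Variable names are illustrative.

```python
import math
import statistics

pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]
logs = [math.log(x) for x in pcb]  # natural logs of all 16 values

suspect = max(logs)  # ln(25.15), about 3.225
# Assumed form of the discordance statistic: (max - mean) / sample standard deviation
statistic = (suspect - statistics.mean(logs)) / statistics.stdev(logs)
TABLED_VALUE = 2.443  # 5% tabled value quoted on the slide for n = 16

print(f"tested value = {suspect:.3f}, statistic = {statistic:.3f}")
print("outlier at 5%" if statistic > TABLED_VALUE else "not an outlier at 5%")
```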
PCB Example: Assumptions Satisfied?
No outliers?
Data approximately normally distributed?
Random sample?
All data are above the detection limit?
PCB Example: A New Look at the
Logged Data
Number of Observations: 15
Minimum: 0.020 Maximum: 2.159
Mean: 1.002 Median: 0.932
Variance: 0.428 Standard Deviation: 0.654
Range: 2.139 Interquartile Range: 0.985
Coefficient of Variation: 0.653
Coefficient of Skewness: 0.244
Coefficient of Kurtosis: -0.783
Percentiles:
1st: 0.020 75th: 1.522
5th: 0.020 90th: 2.044
10th: 0.157 95th: 2.159
25th: 0.536 99th: 2.159
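The logged-data summary can be checked the same way: drop the value identified as an outlier (25.15), take natural logs of the remaining 15 values, and recompute. This sketch reproduces the minimum, maximum, mean, median, variance, and standard deviation shown above, as well as the example transformations ln(1.92) and ln(2.49).

```python
import math
import statistics

pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]
# Drop the value identified as an outlier, then take natural logs of the rest
logged = [math.log(x) for x in pcb if x != 25.15]

print(f"ln(1.92) = {math.log(1.92):.3f}, ln(2.49) = {math.log(2.49):.3f}")
print("Number of Observations:", len(logged))
print(f"Minimum: {min(logged):.3f}  Maximum: {max(logged):.3f}")
print(f"Mean: {statistics.mean(logged):.3f}  Median: {statistics.median(logged):.3f}")
print(f"Variance: {statistics.variance(logged):.3f}  "
      f"Standard Deviation: {statistics.stdev(logged):.3f}")
```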
DQA Step 5: Perform Method and
Draw Conclusions from the Data
A. Perform the calculations for the statistical method
B. Evaluate the results and draw conclusions
Perform the Statistical Method (Step 5A)
For basic methods, formulas and procedures are
available in standard textbooks
Software can also do the calculations:
–SAS, SPSS, or SPlus
–DataQUEST
A statistical test can give two results:
–Reject the baseline condition or
–Fail to reject the baseline condition
If the test does not reject the baseline condition, then
the false acceptance error rate must be verified
Evaluate the Results (Step 5B)
If the baseline condition (Null Hypothesis) is true, a result
extreme enough for the statistical test to reject the baseline
condition at a 5% significance level would occur by chance
less than one time in twenty.
Because such a result is unlikely when the baseline condition
is true, the test rejects the baseline condition and the
alternative condition is selected as correct, accepting at
most a 5% chance that this is a false rejection.
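The one-in-twenty reasoning can be checked by simulation: generate many data sets for which the baseline condition is exactly true and count how often the one-sided t-test rejects it. A minimal sketch, assuming normally distributed (log-scale) data with n = 15, a standard deviation similar to the logged PCB data, and the one-sided critical value 1.761 for 14 degrees of freedom; all parameter values and the function name are illustrative.

```python
import math
import random
import statistics


def simulated_false_rejection_rate(trials=10_000, n=15, sigma=0.654,
                                   t_crit=1.761, seed=2024):
    """Fraction of simulated data sets that reject a baseline condition that is true.

    Each data set has n normal values whose true mean sits exactly at the
    action level (ln 2.0), so every rejection is a false rejection.
    """
    rng = random.Random(seed)
    action_level = math.log(2.0)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(action_level, sigma) for _ in range(n)]
        t = (statistics.mean(sample) - action_level) / (statistics.stdev(sample) / math.sqrt(n))
        if t > t_crit:
            rejections += 1
    return rejections / trials


rate = simulated_false_rejection_rate()
print(rate)  # close to 0.05
```

The simulated rate should be close to 0.05; it varies slightly with the seed and the number of trials.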
PCB Example: Perform Calculations
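Formula:
  • Reject the baseline condition if $t > t_{1-\alpha}$
$\alpha = 0.05$; $t_{0.95} = 1.761$ (one-sided, $n - 1 = 14$ degrees of freedom)
AL = 0.6931 [the natural log of 2.0]
$$ t = \frac{\bar{X} - AL}{s/\sqrt{n}} = \frac{1.002 - 0.6931}{0.654/\sqrt{15}} \approx 1.83 $$
Results: Since 1.83 > 1.761, reject the baseline condition at a 5% level of
significance.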
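The one-sample t-test on the logged data takes only a few lines. This sketch follows the slides: it removes the outlier, takes natural logs, and compares the mean of the logs with ln(2.0). The critical value 1.761 is hard-coded from a standard t table for n - 1 = 14 degrees of freedom (one-sided, 95th percentile). If SciPy is available, `scipy.stats.t.ppf(0.95, 14)` gives the same value. With unrounded inputs, the statistic is about 1.828.

```python
import math
import statistics

pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]
logged = [math.log(x) for x in pcb if x != 25.15]  # outlier removed, as in Step 4

ACTION_LEVEL = 2.0  # mg/Kg
T_CRITICAL = 1.761  # one-sided t(0.95) for n - 1 = 14 degrees of freedom (t table)

n = len(logged)
t = (statistics.mean(logged) - math.log(ACTION_LEVEL)) / (statistics.stdev(logged) / math.sqrt(n))
print(f"t = {t:.3f}; reject baseline: {t > T_CRITICAL}")
```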
PCB Example: Conclusions
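The t-test rejected the null hypothesis that the mean
is less than the action level at a 5% significance level
Statistically, the mean concentration of total PCBs in
surface soil (top 1 inch) over the dirt road exceeds
2 mg/Kg (2 ppm), and remedial action is necessary
Because the test was run on log-transformed data, it strictly
shows that the geometric mean exceeds 2 mg/Kg; for lognormal
data the arithmetic mean is at least as large as the geometric
mean, so the conclusion also holds for the mean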
Research Example:
Inhalation Exposure to
Manganese
Manganese Exposure
Manganese (Mn) is a metal emitted from industrial
operations (e.g., steel mills) and from cars burning gasoline
that contains the additive MMT
Mn is an essential dietary requirement in trace
amounts but higher exposures are toxic:
–occupational exposure shown to affect motor skills
–disruption of neurotransmitters may adversely
affect higher-level cognitive skills
–exposure during infancy and early childhood may
cause learning disabilities and emotional problems
Research Problem: Determine if elevated exposure to
Mn during early childhood can be linked to cognitive
dysfunction in teenagers
Manganese Exposure Routes
Mn is usually inhaled or ingested
MMT can also be absorbed through the skin
Key route of concern: when coarse particles containing Mn are inhaled, Mn is transported to the brain via the olfactory nerve
Exposure to Mn can be estimated by analyzing ambient air monitoring data for Mn in total suspended particulates (TSP); undisturbed dust in attics and other areas of homes can be an indicator of long-term exposure
DQA Step 1: Review Objectives and Data Collection Design
Translate the data user's objectives into a statement of the primary statistical hypothesis or estimation goal
Translate the data user's objectives into tolerable limits on the probability of committing decision errors
Review the sampling design and note any special features or deviations from the sampling plan
Pilot Study: Objective and Background
Objective: Determine whether the proposed study
area is suitable for a large-scale investigation
Background
–Mn exposures range from very high to very low
–Concentrations of Mn in air and dust decrease with
distance from known point source (steel mill)
–There are few, if any, confounding factors, such as
lead in paint
The exploratory nature of the pilot study does not warrant specification of rigorous performance criteria; however, the results must support the goals of the larger study:
–Detect a correlation of 40% with at least 80% power
–Identify very high and very low exposures
The sample size for TSP analysis was driven by the design of the existing ambient air monitoring network; the sample size for dust samples was based on performance criteria, budget, professional judgment, and experience with similar studies
Performance Criteria for Pilot Study
Pilot Study: Statistical Methods and Performance Criteria
Statistical Methods:
–Determine the correlation between distance to the steel mill and Mn concentrations in air (TSP) and house dust
–Estimate annual average air concentrations of Mn in TSP at various distances from the steel mill
A 10% significance level was chosen for testing hypotheses about relationships between Mn exposure concentrations and the results of cognitive tests
Pilot Study: Performance Criteria (continued)
Correlations are expected to be relatively low (10% to 50%) and negative
The false negative error rate was specified by determining the sample size needed to achieve various levels of power under various assumed correlations:
Sample size required, by power and true population correlation
            True Population Correlation
Power     10%    20%    30%    40%    50%
50%       166     42     19     11      7
60%       237     60     27     15     10
70%       327     82     36     20     13
75%       383     96     42     23     14
80%       451    113     49     27     17
85%       537    134     58     32     19
90%       656    163     72     39     24
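One common way to compute a table like this is the Fisher z-transformation approximation, n = ((z₁₋α + z_power) / C)² + 3 with C = ½ ln((1 + r)/(1 − r)). The sketch below assumes a one-sided test at the 10% significance level chosen for the pilot study; the slide does not say which formula produced its table. Under that assumption it matches the 80% and 90% power entries in the 10% column (451 and 656) but asks for a few more samples at higher correlations (29 rather than 27 at 40% correlation and 80% power), so treat it as a cross-check rather than the method behind the table. It relies on `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist

def sample_size_for_correlation(r, power, alpha=0.10):
    """Approximate n needed to detect a true correlation r (one-sided test).

    Fisher z approximation: n = ((z_{1-alpha} + z_{power}) / C)^2 + 3,
    where C = 0.5 * ln((1 + r) / (1 - r)). Rounded up to a whole sample.
    """
    z = NormalDist().inv_cdf          # standard normal quantile function
    c = 0.5 * math.log((1 + r) / (1 - r))
    n = ((z(1 - alpha) + z(power)) / c) ** 2 + 3
    return math.ceil(n)

for power in (0.50, 0.80, 0.90):
    row = [sample_size_for_correlation(r, power) for r in (0.1, 0.2, 0.3, 0.4, 0.5)]
    print(f"power {power:.0%}: {row}")
```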
Review Sampling Design
Two-stage stratified random sampling design
–The sampling frame was a phone list for the target area around the steel mill, pre-filtered to include households with children aged 13-17
–The sampling population was stratified by distance to the steel mill
–The first stage of sampling used random-digit dialing to recruit households that met the study criteria
–The second stage of sampling was conducted within the set of eligible and willing participants
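The two-stage selection described above can be sketched as follows. Everything in this sketch is hypothetical: the stratum names, household IDs, the eligibility rule standing in for the telephone screening, and the number selected per stratum are illustrative assumptions, not details from the study.

```python
import random

def two_stage_stratified_sample(frame, per_stratum, is_eligible, seed=None):
    """Hypothetical two-stage stratified random sample.

    frame:       dict mapping stratum name -> list of household IDs
    per_stratum: number of households to select in each stratum
    is_eligible: callable standing in for the stage-1 telephone screen
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, households in frame.items():
        # Stage 1: contact households in random order, keep eligible and willing ones
        contact_order = rng.sample(households, len(households))
        eligible = [h for h in contact_order if is_eligible(h)]
        # Stage 2: simple random sample within the eligible, willing set
        selected[stratum] = rng.sample(eligible, min(per_stratum, len(eligible)))
    return selected

# Hypothetical frame: three distance strata of 40 households each
frame = {name: [f"{name}-{i}" for i in range(40)] for name in ("near", "middle", "far")}

def is_eligible_household(household_id):
    # Hypothetical screen: every third household qualifies and agrees to take part
    return int(household_id.rsplit("-", 1)[1]) % 3 == 0

sample = two_stage_stratified_sample(frame, 10, is_eligible_household, seed=1)
print({stratum: len(ids) for stratum, ids in sample.items()})
```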
Review Data Collection Reports
For each household selected, dust from the attic or basement was collected using a vacuum device according to established protocols
Thirty households were successfully sampled, including individual houses, attached townhouses, and apartments
The field report indicated that 50% of the households were sampled in a basement furnace area, 10% in other basement areas, 10% in attics, and the remaining 30% in miscellaneous crawl spaces or other furnace areas
All of these locations seem reasonable in light of the project objectives
DQA Step 2: Conduct Preliminary Data Review
Review quality assurance reports for anomalies
Calculate standard statistical quantities
Display the data using graphical representations
Quality Control Reports
There were no "below the detection limit" values in
the Mn data set
Duplicate sample analyses and instrument analyses
indicated excellent agreement, all within 5%
There were no significant anomalies in the QC reports
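The slide does not say how agreement between duplicates was measured. One common QC metric is the relative percent difference (RPD) between a result and its duplicate; the sketch below assumes that metric and uses hypothetical duplicate pairs, since the actual QC results are not reproduced here.

```python
def relative_percent_difference(a, b):
    """RPD between duplicate results: |a - b| divided by their mean, as a percent."""
    return abs(a - b) / ((a + b) / 2) * 100

# Hypothetical duplicate pairs (ppm); not the study's actual QC data
duplicates = [(412.0, 405.0), (1020.0, 1049.0), (88.0, 90.5)]

rpds = [relative_percent_difference(a, b) for a, b in duplicates]
within_5_percent = all(rpd <= 5.0 for rpd in rpds)
print([round(rpd, 2) for rpd in rpds], within_5_percent)
```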
Summary Statistics for Mn in Dust (ppm), N = 30
Mean         924.23
Std Dev     2566.9
C.V.           2.78
Median       390.0
Mode         N.A.
Range      14420
Minimum       16
Maximum    14436
Percentile   Value
95%          1339
75%           699
50%           390
25%           258
5%            130
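These summary statistics can be recomputed from the 30 Mn values listed with the scatter plot below. In this sketch the non-median percentiles use the nearest-rank definition, which is an assumption on our part: it happens to reproduce the tabulated 5th, 25th, 75th, and 95th percentiles, while the median is the usual average of the two middle values. Software that interpolates between ranks can report somewhat different percentiles for the same data.

```python
import math
import statistics

# Mn in dust (ppm) for the 30 households, from the scatter-plot data
mn = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
      243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
      258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]

def nearest_rank_percentile(data, p):
    """Nearest-rank percentile: the value at rank ceil(p * n) in sorted order."""
    ordered = sorted(data)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

mean = statistics.mean(mn)
sd = statistics.stdev(mn)           # sample standard deviation (n - 1 divisor)
summary = {
    "N": len(mn),
    "mean": round(mean, 2),
    "std dev": round(sd, 1),
    "C.V.": round(sd / mean, 2),
    "median": statistics.median(mn),
    "range": max(mn) - min(mn),
    "min": min(mn),
    "max": max(mn),
}
percentiles = {p: nearest_rank_percentile(mn, p) for p in (0.95, 0.75, 0.25, 0.05)}
print(summary)
print(percentiles)
```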
Histogram of Mn in Dust Example
Conc. (ppm) Frequency
0-99 1
100-199 1
200-299 6
300-399 7
400-499 5
500-599 1
600-699 2
700-799 4
800-899 0
900-999 0
1000-1099 1
1100-1199 0
1200-1299 0
1300-1399 1
1400-1499 0
1500 + 1
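A frequency table like this one is easy to rebuild from the raw values. The sketch below assumes 100-ppm bins labeled by their integer endpoints plus a single open-ended bin from 1500 ppm up, mirroring the table; the data are the 30 values listed with the scatter plot.

```python
from collections import Counter

# Mn in dust (ppm) for the 30 households, from the scatter-plot data
mn = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
      243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
      258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]

def bin_label(value, width=100, top=1500):
    """Label a concentration with its 100-ppm bin; values >= top share one open bin."""
    if value >= top:
        return f"{top} +"
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

counts = Counter(bin_label(v) for v in mn)
labels = [f"{lo}-{lo + 99}" for lo in range(0, 1500, 100)] + ["1500 +"]
for label in labels:
    print(f"{label:>10}  {counts[label]}")   # Counter returns 0 for empty bins
```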
Box-and-Whiskers Plots of Mn in Dust
[Figure, full data set: minimum 16, median 390, mean 924; 14436 is plotted as a potential outlier]
[Figure, without the potential outlier: minimum 16, median 366, mean 458.3, maximum 1339]
Scatter Plot of Mn vs. Distance (logarithmic scale for Mn)
Mn (ppm)   Distance (mi)
465 39.06
712 20.98
456 20.29
502 27.42
448 34.82
414 27.53
341 39.26
212 16.88
619 34.86
1339 30.80
243 42.77
250 39.26
16 39.19
203 39.26
130 39.41
327 12.83
332 12.15
309 16.17
350 12.82
366 12.25
258 14.67
225 6.66
743 6.60
468 16.08
14436 13.85
726 14.74
347 16.29
1037 1.21
754 1.22
699 13.86
DQA Step 3: Select the Statistical Method
Select the statistical method based on the data user's
objectives and the preliminary data review
Identify the assumptions underlying the statistical
method
Two potential measures of correlation:
–Pearson's correlation coefficient
  • detects linear relationship between two sets of values
  • sensitive to extreme values
  • not affected by linear transformations
–Spearman's rank correlation coefficient
  • uses ranks of data values
  • less sensitive to extreme values
  • not affected by monotonic nonlinear transformations
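The properties listed above can be seen on a small example. This sketch uses hypothetical toy data (not the Mn data) and plain-Python implementations of both coefficients, with tied values given average ranks. One extreme value pulls Pearson's coefficient down to roughly 0.7 even though the relationship is perfectly monotonic, while Spearman's coefficient stays at 1 and is unchanged by a log transform.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1   # average of 0-based positions i..j, made 1-based
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: perfectly monotonic, with one extreme value
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]
print(pearson(x, y))                            # pulled well below 1 by the extreme value
print(spearman(x, y))                           # ranks agree perfectly
print(spearman(x, [math.log(v) for v in y]))    # unchanged by a monotonic transform
```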
Correlation Between Mn and Distance to Point Source
Assumptions to be Verified for Pearson's Correlation Coefficient
Random Sample
Independence of data
Linear Relationship
No Outliers
Normal Distribution
DQA Step 4: Verify the Assumptions of the Statistical Method
Determine approach for verifying assumptions
Perform tests of assumptions
If necessary, determine corrective actions to be taken
Assumption: Random Sample
Verification Method: Review how the data collection
plan was developed and implemented (i.e., how the
sample locations were chosen)
Verification Results: The QA Project Plan and the data collection documentation report that random sample locations were chosen
It is highly probable that the assumption of a random sample has been met
Assumption: Independence of Data
Verification Method: Rank von Neumann test
Verification Result: With the baseline condition (null hypothesis) that no serial correlation is present, the rank von Neumann test does not reject the baseline condition at the 10% level of significance, so no serial correlation was detected
It is highly probable that the assumption of independence has been met
Assumption: Few Values Below Detection Limit
Verification Method: Review data
Verification Result: There are none in this data set
This assumption of few values below the detection
limit has been met
Assumption: No Outliers
Verification Method:
–Review the histogram: look for very high or very low values compared with the rest of the distribution
–Review the box plot: potential outliers show up quickly
–Use Rosner's test on any extreme values (at a 5% significance level)
Review Histogram
Review Box and Whiskers Plot
Looks like an extreme outlier!
[Figure: box plot of the full data set (minimum 16, median 390, mean 924) with 14436 plotted as a potential outlier]
Assumption: No Outliers
Verification Result: Rosner's test has determined that the observation 14436 is a statistical outlier at the 5% significance level and should be investigated further (test statistic 5.264; critical value 2.910).
This assumption of no outliers has not been met!
Look at the data without the outlier...
Looks better, but still may have an outlier...
[Figure: box plot without 14436 (minimum 16, median 366, mean 458.3, maximum 1339)]
Check for outliers in the data set without 14436
Verification Result: Rosner's test has determined that the observation 1339 is also a statistical outlier at the 5% significance level and should be investigated further (test statistic 3.129; critical value 2.890).
HOWEVER, Rosner's test assumes normality...
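The two test statistics quoted above can be recomputed directly. The sketch below implements only the extreme studentized deviate R = max|x − mean| / s that Rosner's test is built on, removing the most extreme value after each step; the critical values are copied from the slides rather than computed, and the normality assumption noted above still applies.

```python
import statistics

# Mn in dust (ppm) for the 30 households, from the scatter-plot data
mn = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
      243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
      258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]

def esd_statistics(data, k):
    """Return (value, R) for the first k extreme studentized deviates.

    At each step R = max|x - mean| / s over the remaining data (s uses the
    n - 1 divisor); the most extreme value is then removed.
    """
    remaining = list(data)
    results = []
    for _ in range(k):
        mean = statistics.mean(remaining)
        s = statistics.stdev(remaining)
        extreme = max(remaining, key=lambda v: abs(v - mean))
        results.append((extreme, abs(extreme - mean) / s))
        remaining.remove(extreme)
    return results

# Critical values quoted on the slides (n = 30, then n = 29; 5% level)
critical = [2.910, 2.890]
stats = esd_statistics(mn, 2)
for (value, r_stat), lam in zip(stats, critical):
    print(f"value {value}: R = {r_stat:.3f}, critical = {lam}, outlier: {r_stat > lam}")
```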
Shapiro-Wilk Test for Normality
The Shapiro-Wilk test rejected the baseline condition that the data are normally distributed at the 5% significance level (test statistic W = 0.900; critical value = 0.926)
Non-normality has been detected at the 5% level of
significance, but only just!
What about a log transform of the data set without the outlier?
[Figure: box plot and histogram (frequency vs. concentration, ppm) of the data set without the outlier]
Still looks skewed: the Shapiro-Wilk test confirms that the log-transformed data are not strictly normal either.
What should we conclude about assumptions?
The data seem to be an independent random sample
The data are not normally distributed (nor lognormally distributed)
The data probably contain multiple outliers; the distribution is heavily skewed to the right
Methods such as Pearson's correlation coefficient, which are sensitive to extreme values, are not recommended
Spearman's rank correlation coefficient is a better approach for evaluating the relationship between Mn in dust and distance to the point source
DQA Step 5: Perform Method and Draw Conclusions from the Data
Perform the calculations for the statistical method
Evaluate the results of the statistical method and
draw conclusions
Spearman's Rank Correlation Coefficient
Results are less sensitive to extreme values (so the original, full data set is used)
Spearman's correlation coefficient between distance from the source and Mn concentration is -0.406
  • The correlation is negative (as suspected): concentration decreases as distance from the point source increases
  • The relationship is not strong: the squared coefficient is about 0.16, so only about 16% of the variation in the ranked Mn concentrations is associated with the ranked distances
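The reported coefficient can be checked against the 30 (Mn, distance) pairs listed with the scatter plot. This sketch computes Spearman's coefficient as Pearson's coefficient on average-tied ranks (three households share a distance of 39.26 mi) in plain Python; with these data it gives r_s ≈ −0.406 and r_s² ≈ 0.16.

```python
import math

# (Mn in ppm, distance in mi) pairs from the scatter-plot data, in slide order
mn = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
      243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
      258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]
distance = [39.06, 20.98, 20.29, 27.42, 34.82, 27.53, 39.26, 16.88, 34.86, 30.80,
            42.77, 39.26, 39.19, 39.26, 39.41, 12.83, 12.15, 16.17, 12.82, 12.25,
            14.67, 6.66, 6.60, 16.08, 13.85, 14.74, 16.29, 1.21, 1.22, 13.86]

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Spearman's r_s is Pearson's r applied to the ranks
rho = pearson(average_ranks(distance), average_ranks(mn))
print(f"Spearman r_s = {rho:.3f}, r_s^2 = {rho ** 2:.3f}")
```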
Interpret the Results
Factors other than distance to the point source that affect concentrations of Mn in dust:
–Infiltration rates into the dwelling
–Housekeeping practices
–Age of dwelling (how long the dust was left undisturbed)
–Changing wind patterns in the region
Those other factors weaken the relationship between Mn in dust and distance to the point source.
Nonetheless, the researchers believe there is enough of a relationship to support using distance as a basis for stratifying the population in the larger study
THANK YOU!

Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Voces Mineras
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxStephen266013
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证pwgnohujw
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样jk0tkvfv
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 

Recently uploaded (20)

Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
jll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdf
 
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxChapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
bams-3rd-case-presentation-scabies-12-05-2020.pptx
bams-3rd-case-presentation-scabies-12-05-2020.pptxbams-3rd-case-presentation-scabies-12-05-2020.pptx
bams-3rd-case-presentation-scabies-12-05-2020.pptx
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 

Role of Data Quality Assessment in a Project

  • 1. The Role of Data Quality Assessment in a Project Dr. Ferdin Joe John Joseph Kamnoetvidya Science Academy Rayong, Thailand ferdinjoe@gmail.com
  • 2. Course Objectives At the end of this course, you will be able to: •Explain why DQA is important and how it can be applied to your projects •List the five steps of DQA and explain the purpose of each step •Evaluate the application of DQA on a dataset •Interpret basic statistics and simple graphs •Recognize different software tools and other resources for performing DQA
  • 3. Data Quality Meaningful only when "data quality" relates to intended use of data Some data are good ("high quality") for some purposes but are bad ("low quality") for others
  • 4. Data Quality Assessment A scientific and a statistical evaluation to determine if data are adequate for their intended use DQA is described in the Guidance for Data Quality Assessment: Practical Methods for Data Analysis (EPA/QA G-9), EPA/600/R-96/084, July 2000
  • 5. The Project Life Cycle Plan for Data Collection - Set data quality objectives or other performance and acceptance criteria. Document in QA Project Plan. Collect Data - Collect/assemble data in accordance with QA Project Plan. Perform assessments defined in Plan. Assess and Use Data - Verify whether the data meet acceptance criteria. Run statistical methods to analyze data. Product or Decision
  • 6. DQA is Performed ... Whenever data are used to make a decision, for estimation, or for research purposes This applies to: –New data to be collected –Data collected by someone else –Data collected by you for another project
  • 7. PLANNING: Systematic Planning (e.g., Data Quality Objectives Process); QA Project Plan Development. IMPLEMENTATION: Field Data Collection and Associated QA/QC Activities. ASSESSMENT: Data Validation/Verification; Data Quality Assessment. QUALITY ASSURANCE ASSESSMENT: INPUTS (Routine Data; QC/Performance Evaluation Data) → DATA VALIDATION/VERIFICATION (verify measurement performance; verify measurement procedures and reporting requirements) → OUTPUT: VALIDATED/VERIFIED DATA → INPUT to DATA QUALITY ASSESSMENT (review objectives and design; conduct preliminary data review; select statistical method; verify assumptions; draw conclusions) → OUTPUT: CONCLUSIONS DRAWN FROM DATA
  • 8. Verification vs. Validation vs. Assessment Data Verification - the process of evaluating the completeness, correctness, and conformance/compliance of a specific data set against the method, procedural, or contractual requirements Data Validation - an analyte- and sample-specific process that extends the evaluation of data beyond method, procedural, or contractual compliance (i.e., data verification) to determine the analytical quality of a specific data set Data Quality Assessment - the process to determine if the data are suitable for a specific use
  • 9. The Five Steps of Data Quality Assessment 1. Review the Objectives and Sampling Design 2. Conduct a Preliminary Data Review 3. Select the Statistical Method 4. Verify the Assumptions of the Statistical Method 5. Draw Conclusions from the Data
  • 10. Data Quality Assessment Step 1: Review decision problem → Step 2: Learn more about the data → Step 3: Select the statistical method → Step 4: Verify the assumptions of the method → Do the assumptions hold? Yes: Step 5: Draw conclusions → Product/decision. No: Revise the scope of the problem, OR choose a new statistical test, OR transform or otherwise modify the data.
  • 11. Two Views of DQA DECISION MAKER'S VIEW: 1 Define the Decision Rule and Decision Errors; 2 Specify Acceptable Limits on Decision Errors; 3 Identify Method for Applying Decision Rule; 4 Ensure that Method is Defensible; 5 Apply the Decision Rule. DATA ANALYST'S VIEW: 1 Define the Statistical Hypotheses; 2 Determine Acceptable Type I and Type II Error Rates; 3 Identify Statistical Test or Method and Assumptions; 4 Assess Validity of Statistical Test/Method; 5 Perform Statistical Test and Assess Design
  • 12. DQA Project Table
    – Project Objective & Data Collection Design (Step 1): an overview of the project and background information against which to determine "quality." List: objective; parameter of interest; type of analysis needed; type of data collection design; information on deviations from the design in the implementation.
    – Observations from QA Reports, Summary Statistics, and Graphs (Step 2): information that provides insight into which assumptions might be met. List: non-detects; probable distribution; potential outliers; anomalies.
    – Statistical Method and Assumptions (Step 3): information on the statistical method and its assumptions. List: analysis method; assumptions to verify; significance levels.
    – Verification of Assumptions (Step 4): what assumptions were checked, how they were checked, and what the results were. List: assumptions, whether they were met, and how they were verified (including significance levels).
    – Results from Statistical Method (Step 5): a summary of the final results from the statistical test and other factors to consider in the final product or decision. List: final results from data analysis; other factors affecting the final product or decision.
  • 14. The Five Steps of Data Quality Assessment 1. Review objectives and data collection design 2. Conduct a preliminary data review 3. Select the statistical method 4. Verify the assumptions of the statistical method 5. Draw conclusions from the data IMPORTANT: If data other than (or in addition to) new data is being used, ALL steps must still be performed!
  • 15. DQA Step 1: Review Objectives and Data Collection Design Translate the data user's objectives into a statement of the primary statistical hypothesis or estimation goal Translate the data user's objectives into tolerable limits on the probability of committing decision errors Review the sampling design and note any special features or deviations from the sampling plan
  • 16. Step 1: Input QA Project Plan or any other planning documents that contain: –Project objective or question to be answered –Decision performance criteria (DQOs) or other performance and acceptance criteria Field Sampling Plan and any reports on actual implementation of sampling plan
  • 17. If Systematic Planning was Performed... Use the reports documenting the planning to answer: –What is the objective of the project? –What are the performance or acceptance criteria for the product or decision?
  • 18. If Systematic Planning was NOT performed... Decision Making: Apply the Data Quality Objectives Process or other planning process to: –develop hypotheses –define the potential decision errors –specify tolerable limits on making decision errors Estimation: Use a systematic planning process to: –select parameters –develop performance or acceptance criteria.
  • 19. Example Hypothesis & Decision Errors Example Hypotheses: –Null Hypothesis: True mean is less than 50 mg/Kg –Alternative: True mean is greater than 50 mg/Kg Example Decision Errors: If the null hypothesis is that the mean is less than 50 mg/Kg: –False Rejection: Decide that the true mean PAH concentration is more than 50 mg/Kg when it is really less than 50 mg/Kg –False Acceptance: Decide that the true mean PAH concentration is less than 50 mg/Kg when it is really greater than 50 mg/Kg
  • 20. Example Limits on Decision Errors False Rejection: Decide that the true mean PAH concentration is more than 50 mg/Kg, when it is really less than 50 mg/Kg –10% probability of making an error at 50 mg/Kg –5% probability of making an error at 25 mg/Kg False Acceptance: Decide that the true mean PAH concentration is less than 50 mg/Kg, when it is really greater than 50 mg/Kg –10% probability of making an error at 70 mg/Kg –5% probability of making an error at 100 mg/Kg
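Error limits like these determine how many samples a design needs. The following is a minimal sketch, assuming a one-sample test of the mean and a commonly used normal-approximation sample-size formula (with a 0.5·z² small-sample correction). It also assumes a hypothetical standard deviation of 25 mg/Kg, which the example does not give. The gray region is taken to run from 50 mg/Kg (10% false rejection limit) to 70 mg/Kg (10% false acceptance limit), and the 5% limits at 25 and 100 mg/Kg are ignored in this simplification.

```python
import math
from statistics import NormalDist


def approx_sample_size(sigma, action_level, alt_level, alpha, beta):
    """Approximate n for a one-sample test of the mean (normal approximation).

    sigma        -- assumed standard deviation of the measurements (hypothetical)
    action_level -- threshold where the false rejection limit alpha applies
    alt_level    -- other edge of the gray region, where the false acceptance limit beta applies
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha)
    z_beta = z(1 - beta)
    delta = abs(alt_level - action_level)  # width of the gray region
    n = sigma**2 * (z_alpha + z_beta) ** 2 / delta**2 + 0.5 * z_alpha**2
    return math.ceil(n)  # round up to a whole number of samples


# PAH example: 10% limits at 50 and 70 mg/Kg; sigma = 25 mg/Kg is an assumed value
n_pah = approx_sample_size(sigma=25, action_level=50, alt_level=70, alpha=0.10, beta=0.10)
print(f"approximate number of samples: {n_pah}")
```

Narrowing the gray region or tightening either error limit increases the required n. As a result, the limits chosen in Step 1 directly affect whether the collected data can support the decision.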
  • 21. Reviewing the Sampling Design Review the planned sampling design and the information on the actual data collection; note any special features or deviations. Determine whether these deviations could affect the potential analysis of the data
  • 22. Step 1 - Output Well-defined project objectives and criteria Verification that the hypothesis chosen is consistent with the objective and criteria A list of any deviations from the planned sampling design and the effects of these deviations
  • 23. DQA Step 2: Conduct Preliminary Data Review Review quality assurance reports for anomalies Calculate standard statistical quantities Display the data using graphical representations
  • 24. Step 2: Input Verified and Validated Data QA reports, QC data Technical systems audit results: –Performance evaluations –Corrective action reports –Data verification and validation reports QA Project Plan, Sampling and Analysis Plan, or other planning documents
  • 25. Review QA Reports Look for: –Failure to meet acceptance criteria/obvious QC violations (e.g., variable detection limits, nonequivalent analytical methods) –Implementation anomalies from the QA Project Plan (e.g., negative emission rates, pH values exceeding 14.0, values in wrong reporting units)
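Checks like these are easy to automate as a first pass over the records. The following is a minimal sketch, assuming each record is a dictionary with hypothetical keys `id`, `ph`, `emission_rate`, and `units`, and assuming mg/Kg is the only expected reporting unit. The rules simply mirror the examples on the slide.

```python
EXPECTED_UNITS = {"mg/Kg"}  # hypothetical: the units the QA Project Plan calls for


def screen_record(rec):
    """Return anomaly messages for one record (hypothetical field names)."""
    problems = []
    ph = rec.get("ph")
    if ph is not None and ph > 14.0:
        problems.append(f"{rec['id']}: pH {ph} exceeds 14.0")
    rate = rec.get("emission_rate")
    if rate is not None and rate < 0:
        problems.append(f"{rec['id']}: negative emission rate {rate}")
    if rec.get("units") not in EXPECTED_UNITS:
        problems.append(f"{rec['id']}: unexpected units {rec.get('units')!r}")
    return problems


records = [
    {"id": "S1", "ph": 7.2, "emission_rate": 0.4, "units": "mg/Kg"},
    {"id": "S2", "ph": 15.1, "emission_rate": 0.3, "units": "mg/Kg"},
    {"id": "S3", "ph": 6.8, "emission_rate": -0.2, "units": "ug/Kg"},
]
issues = [msg for rec in records for msg in screen_record(rec)]
for msg in issues:
    print(msg)
```

A flagged record prompts a check of the QA reports; it is not an automatic reason to discard the value.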
  • 26. Calculate Standard Statistical Quantities Statistical Quantities include measures of: –Central Tendency (mean, median, etc.) –Relative Standing (percentiles) –Dispersion (range, variance) –Association (correlation) Review these quantities to determine: –Do the data look reasonable - do the values make sense? –Are there any obvious anomalies? –Are there any trends or patterns?
  • 27. Display the Data using Graphs Common Graphs: –Histogram –Stem-and-Leaf –Box and Whiskers –Scatter Plot –Time Plot Review graphs to determine: –Do the data look reasonable? –What is the distribution like? Is it symmetric, bimodal? –Are there extremely high or extremely low values? –Are there any obvious trends?
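Most of these graphs need plotting software, but a stem-and-leaf display can be produced as plain text. The following is a minimal sketch, assuming non-negative values and a chosen leaf unit. Each value is divided by the leaf unit and rounded to an integer, and the last digit of that integer becomes the leaf. The input values below are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a stem-and-leaf display for non-negative values."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)       # express the value in leaf units
        stems[scaled // 10].append(scaled % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # show empty stems too
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


readings = [38, 41, 43, 45, 47, 47, 52, 55, 58, 61]  # hypothetical data
for line in stem_and_leaf(readings):
    print(line)
```

Like a histogram, the display shows the shape of the distribution, but it also keeps every individual value visible.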
  • 28. Step 2: Output Statistical quantities and graphs that provide you with a preliminary understanding of the data and any potential issues including: –Distribution of data –Potential outliers –Non-detects
  • 29. DQA Step 3: Select the Statistical Method Select the statistical method based on the data user's objectives and the preliminary data review Identify the assumptions underlying the statistical method
  • 30. Step 3: Input Project objectives, hypotheses, and preliminary statistical method if identified Background on statistical methods
  • 31. Select Method If identified during planning, determine if that choice seems reasonable based on the preliminary review of the data. Otherwise, select statistical method based on the data user's objectives and the preliminary data review. Example Methods: –Tests: One-sample t-test, Two-sample t-test, Test for a single proportion, Wilcoxon Signed Rank Test –Estimation –Regression Analysis –Time Series Analysis
  • 32. Identify Assumptions Every method has assumptions. Common Assumptions: –Distributional form –Independence –Dispersion characteristics –Homogeneity –Basis for randomization Example - One-sample t-test: random sample, independence of data; sample mean is normally distributed; no outliers; few "non-detects"
  • 33. Step 3: Output Proposed statistical method that looks appropriate for the data and the project objectives List of assumptions for the statistical method
  • 34. DQA Step 4: Verify the Assumptions of the Statistical Method Determine approach for verifying assumptions Perform tests of assumptions If necessary, determine corrective actions to be taken
  • 35. Step 4: Input Data Assumptions identified for statistical method Methods to verify these assumptions along with their formulas
  • 36. Determine Approach for Verifying Assumptions and Perform Test Evaluate Step 3 to see what assumptions need to be verified Determine what tests are available for verifying assumptions for this dataset Select test and appropriate significance level
  • 37. Determine Corrective Actions If this data set does not meet the needed assumptions, determine the next steps that should be taken: –Repeat Step 3 and select a different method for analyzing the data –Transform the data –Reduce significance level –Gather additional data –Modify objective –. . . But this should be done with caution.
  • 38. Step 4: Output Documentation of the method used to verify each assumption and the results of these methods Corrective actions (if necessary)
  • 39. DQA Step 5: Draw Conclusions from the Data Perform the calculations for the statistical method Evaluate the results and draw conclusions
  • 40. Step 5: Input Data Objective, hypotheses (if applicable), and performance or acceptance criteria Formulas for statistical method Non-statistical factors to incorporate into the final decision or product
  • 41. Perform the Statistical Method Use formulas and procedures from standard textbooks. Use software to perform the calculations: –SPSS –SAS or S-PLUS –R –DataQUEST Note: Software will run on any data, whether or not the assumptions have been verified. But when the assumptions don't hold, the results are highly suspect.
  • 42. Evaluate the Results The statistical results are not necessarily the answer. Factor in items like: –Practical significance –Political/social factors –Contextual significance
  • 43. Step 5: Output Statistical results with a specified significance level Final Product or Decision
  • 44. DQA Steps - Summary Table (Input, Process, Output for each step)
    – Step 1. Input: QA Project Plan or any other planning documents; project objective or question to be answered; decision performance criteria or other performance and acceptance criteria; reports (e.g., Field Sampling Plan) on actual implementation of sampling plan. Process: translate objectives into a statement of the primary statistical hypothesis or estimation goal; translate objectives into tolerable limits on the probability of committing decision errors; review the sampling design and note any special features or deviations. Output: well-defined project objectives and criteria; verification that the hypothesis chosen is consistent with the objective and criteria; a list of deviations from the planned sampling design and the effects of these deviations.
    – Step 2. Input: verified and validated data; QA reports, QC data; technical systems audit results; QA Project Plan, Sampling and Analysis Plan, or other planning documents. Process: review quality assurance reports for anomalies; calculate standard statistical quantities; display the data using graphical representations. Output: statistical quantities and graphs that provide you with a preliminary understanding of the data and any potential issues.
    – Step 3. Input: project objectives, hypotheses, and preliminary statistical method if identified; background on statistical methods. Process: select the statistical method based on the data user's objectives and the preliminary data review; identify the assumptions underlying the statistical method. Output: proposed statistical method that seems appropriate for the data and the project objectives; list of assumptions for the statistical method.
    – Step 4. Input: data; assumptions identified for statistical method; methods to verify assumptions along with formulas. Process: determine approach for verifying assumptions; perform tests of assumptions; if necessary, determine corrective actions to be taken. Output: documentation of the methods used to verify each assumption and the results; corrective actions (if necessary).
    – Step 5. Input: data; hypotheses (if applicable) and performance or acceptance criteria; formulas for statistical method; non-statistical factors to incorporate into the final decision or product. Process: perform the calculations for the statistical method; evaluate the results of the statistical method and draw conclusions. Output: statistical results with a specified significance level; final product or decision.
  • 46. The Five Steps of Data Quality Assessment 1. Review objectives and data collection design 2. Conduct a preliminary data review 3. Select the statistical method 4. Verify the assumptions of the statistical method 5. Draw conclusions from the data
  • 47. DQA Step 1: Review Objectives and Data Collection Design A. Translate the project's objectives into a statement of the primary statistical hypothesis or estimation goal B. Translate the data user's objectives into performance or acceptance criteria C. Review the sampling design and note any special features or deviations from the sampling plan
  • 48. Two Situations to Consider Project Systematically Planned –Use QA Project Plan and other planning documents to perform this action Project NOT Systematically Planned –Use a systematic planning process (e.g., the Data Quality Objectives Process) to plan retrospectively
  • 49. Project Objective (Step 1A) The objective indicates what the final outputs from the project should be For example: –If the goal is to ascertain if the contamination exceeds a threshold, the objective would be "determine whether the contamination is greater than X" –If the goal is to ascertain if the contamination in the soil has reached the groundwater, the objective would be "determine whether the contamination can be detected in the aquifer"
  • 50. Different Analysis Methods Estimation Hypothesis Testing Regression Analysis of Variance Time Series Analysis Spatial Analysis
  • 51. Defining the Boundaries (Step 1A) Define the geographical area within which decisions apply and the media of concern (Spatial Boundary) Determine the time frame to which the study results apply (Temporal Boundary) Define a scale of decision making
  • 52. Specifying Criteria (Step 1B) Set quantitative performance or acceptance criteria. Consider consequences of any potential decision errors. Consequences may include: –Health risks –Ecological risks –Political risks –Social risks –Resource risks EXAMPLE: When selecting between two opposing conditions, define a 'gray region', a false rejection error limit, and a false acceptance error limit.
  • 53. Review the Sampling Design (Step 1C) Information needed, and where it would be found:
    – Original sampling plan (locations and types of samples to be taken): QA Project Plan or other planning documents; documentation from the systematic planning process.
    – Details on how the samples were actually collected: summary reports from field notes, maps.
    – Documentation of deviations from sampling plan: must be developed based on comparison of original plan and actual implementation.
    – Review of deviations to ensure that the implemented plan still meets the objectives for the project: must be developed based on information from systematic planning on study objectives.
  • 54. What if you are assessing data collected by another project? Gather available information about the way the data were collected, for example, sample collection plans, implementation of sample collection plans, analytical method used, . . .
  • 55. PCB Example: Background Electronic Manufacturing Corporation of America operated at the site from 1965 to 1985 and sold the site to Energy Components Company in 1985. Both companies went bankrupt in 1990. In 1991, chlorinated solvents were discovered in water from city wells located in a field east of the site. Waste oil contaminated with PCBs was sprayed on a dirt road on the site for dust suppression while the site was operational. Problem: Determine whether the extent of PCB contamination along the road presents unacceptable risks and remedial action is needed.
  • 56. PCB Example: Statistical Hypotheses If the mean concentration of total PCBs in surface soil (top 1 inch) over the dirt road exceeds 2 mg/Kg, then take remedial action; otherwise, take no further action. Null Hypothesis: True Mean < 2 mg/Kg vs. Alternative Hypothesis: True Mean > 2 mg/Kg
  • 57. PCB Example: Tolerable Limits on Decision Errors [Figure: y-axis "Probability of Deciding that the Mean Exceeds 2 mg/Kg"; x-axis "True Concentration of PCB (mg/Kg)"; marked regions: Action Level, Gray Region, Tolerable False Rejection Decision Error Rates, Tolerable False Acceptance Decision Error Rates]
  • 58. PCB Example: Sampling Design Composite samples were selected using simple random sampling from the dirt road. The dirt road was one of 4 strata at the site (stratum 1). Each composite consists of 5 mini-samples.
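One way to sketch this design is to draw mini-sample locations at random along the road and group them five to a composite. The following is a minimal sketch, assuming the road can be treated as a one-dimensional line of hypothetical length 400 m and that locations are drawn independently and uniformly. The actual site layout and road length are not given here.

```python
import random


def composite_design(road_length_m, n_composites=16, per_composite=5, seed=None):
    """Simple random sampling of mini-sample positions (metres along the road),
    grouped into composites. Returns one sorted list of positions per composite."""
    rng = random.Random(seed)  # seeded generator so the layout can be reproduced
    return [
        sorted(rng.uniform(0, road_length_m) for _ in range(per_composite))
        for _ in range(n_composites)
    ]


design = composite_design(400, seed=1)  # 400 m is a hypothetical road length
for i, positions in enumerate(design, start=1):
    print(i, [round(p, 1) for p in positions])
```

Because a composite physically mixes its mini-samples, each composite result reflects an average over several locations rather than a single point.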
  • 59. PCB Example: Data PCB concentration levels were measured (in mg/Kg) from 16 surface soil samples (top one inch of soil) from the dirt road. Each soil sample consists of 5 mini-samples composited together. 1.92 2.49 4.58 1.17 2.48 5.62 2.54 25.15 7.72 1.02 2.91 3.23 2.87 8.66 1.71 1.18
  • 60. DQA Step 2: Conduct Preliminary Data Review A. Review quality assurance reports for anomalies: –Anomalies in recorded data, missing values, deviations from standard operating procedures, failure to meet acceptance criteria, use of nonstandard data collection methodologies B. Calculate standard statistical quantities: –Do the data look reasonable? Do the values make sense? Are there any anomalies? C. Display the data using graphical representations: –Are there any trends? What is the distribution like? Are there any extreme values?
  • 61. Review QA Reports (Step 2A) Data validation reports that document sample collection, handling, analysis, data reduction, and reporting procedures used Quality control reports from laboratories or field stations that document measurement system performance, including data from check samples, split samples, spiked samples, or any other internal QC measures Technical system reviews, performance evaluation audits, and audits of data quality, including data from performance evaluation samples
  • 62. Summary Statistics (Step 2B) Central Tendency: mean, median, mode Relative Standing: –5th percentile –25th percentile –50th percentile –75th percentile –95th percentile –99th percentile Dispersion: variance, standard deviation, range Association: correlation coefficient, regression
  • 63. Commonly Used Graphs (Step 2C) Histogram Scatter Plot Stem-and-Leaf Time Plot Ranked Data Plot Spatial Correlogram Quantile Plot Posting Plot Normal Probability Plot Symbol Plot
  • 64. PCB Example: Statistical Quantities Number of Observations: 16; Minimum: 1.020; Maximum: 25.150; Mean: 4.703; Median: 2.705; Variance: 34.808; Standard Deviation: 5.900; Range: 24.130; Interquartile Range: 3.285; Coefficient of Variation: 1.254; Coefficient of Skewness: 2.818; Coefficient of Kurtosis: 7.321. Percentiles: 1st: 1.020; 5th: 1.020; 10th: 1.170; 25th: 1.815; 75th: 5.100; 90th: 8.660; 95th: 25.150; 99th: 25.150
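These quantities can be reproduced from the 16 PCB values. The following is a minimal sketch using Python's standard `statistics` module. The percentile function is a hand-written empirical rule: when n·p/100 is a whole number, it averages the two straddling ordered values; otherwise it takes the next ordered value. This rule was chosen because it reproduces the percentiles listed above. Other software may use a different percentile convention and report slightly different quartiles. The skewness and kurtosis coefficients are left out because their exact definitions vary between packages.

```python
import math
import statistics

# PCB concentrations (mg/Kg) from the 16 composite samples
pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]


def percentile(data, p):
    """Empirical p-th percentile (p an integer from 0 to 100).

    If n*p/100 is a whole number k, average the k-th and (k+1)-th smallest
    values; otherwise take the ceil(n*p/100)-th smallest value.
    """
    x = sorted(data)
    n = len(x)
    if (n * p) % 100 == 0:       # integer check avoids floating-point error
        k = n * p // 100
        if k == 0:
            return x[0]
        if k >= n:
            return x[-1]
        return (x[k - 1] + x[k]) / 2
    return x[math.ceil(n * p / 100) - 1]


mean = statistics.mean(pcb)
sd = statistics.stdev(pcb)            # sample standard deviation (n - 1 divisor)
summary = {
    "n": len(pcb),
    "min": min(pcb),
    "max": max(pcb),
    "mean": mean,
    "median": statistics.median(pcb),
    "variance": statistics.variance(pcb),
    "std_dev": sd,
    "range": max(pcb) - min(pcb),
    "iqr": percentile(pcb, 75) - percentile(pcb, 25),
    "cv": sd / mean,
}
for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}" if isinstance(value, float) else f"{name:>9}: {value}")
for p in (1, 5, 10, 25, 75, 90, 95, 99):
    print(f"{p:>2}th percentile: {percentile(pcb, p):.3f}")
```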
  • 66. PCB Example: Ordered Data Plot
  • 67. PCB Example: Summary of Statistical Quantities and Graphs Not symmetric; one extreme value (25.15); does not appear to be normally distributed
  • 68. DQA Step 3: Select the Statistical Method A. Select the statistical method based on the project's objectives and the preliminary data review B. Identify the assumptions underlying the statistical method
  • 69. Select Method (Step 3A) If the statistical method has been identified during planning, ensure it is applicable based on the preliminary review of the data. Otherwise, select statistical method based on the data user's objectives and the preliminary data review. Example Tests: –One-sample t-test –Two-sample t-test –Test for a single proportion –Wilcoxon Signed Rank Test
  • 70. Identify Assumptions (Step 3B) Each method has assumptions, which can be found in references on that test or in Guidance for Data Quality Assessment. Common Assumptions: –Distributional form –Independence –Dispersion characteristics –Homogeneity –Basis for randomization Example: One-sample t-test: random sample; independence of data; sample mean is normally distributed; no outliers; not an excessive amount of "non-detects"
  • 71. PCB Example: Select Method and Identify Assumptions ONE SAMPLE t-TEST ASSUMPTIONS: No outliers (sample mean and standard deviation are very sensitive to outliers); sample mean approximately normally distributed; random sample (independence of the data values); relatively few values below the detection limit
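The test statistic itself takes only a few lines to compute. The following is a minimal sketch for the PCB decision rule (H0: true mean < 2 mg/Kg). It assumes a one-sided significance level of 0.05, which the example does not state. The critical value 1.753 is the standard t-table value for 15 degrees of freedom at that level; it is hard-coded because the standard library has no t distribution. Step 2 found skewness and an extreme value, so the assumptions are in doubt until Step 4. The comparison below therefore illustrates only the calculation, not the example's conclusion.

```python
import math
import statistics

pcb = [1.92, 2.49, 4.58, 1.17, 2.48, 5.62, 2.54, 25.15,
       7.72, 1.02, 2.91, 3.23, 2.87, 8.66, 1.71, 1.18]
ACTION_LEVEL = 2.0   # mg/Kg, from the PCB decision rule
T_CRIT = 1.753       # one-sided t critical value, alpha = 0.05 (assumed), df = 15

n = len(pcb)
mean = statistics.mean(pcb)
std_err = statistics.stdev(pcb) / math.sqrt(n)   # standard error of the mean
t_stat = (mean - ACTION_LEVEL) / std_err

print(f"n = {n}, mean = {mean:.3f}, t = {t_stat:.3f}, critical value = {T_CRIT}")
print("reject H0" if t_stat > T_CRIT else "do not reject H0")
```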
  • 73. Example: Raw Data
    ND 4.5 19.4
    ND 5.8 19.5
    ND 6.2 19.5
    ND 6.8 19.7
    ND 8.33348 19.7
    ND 7.4 19.7
    ND 12.5 19.8
    ND 14.7 19.8
    ND 19.0 204.6
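A quick tally of this listing shows what share is non-detect and which entries look out of place. The following is a minimal sketch, assuming the values are taken in the order listed on the slide, that "ND" marks a non-detect, and that any value reported to a different number of decimal places than most of the data is worth querying. What counts as "relatively few" non-detects for a given method is a judgment for Step 4.

```python
from collections import Counter

# Values from the raw-data listing, in the order listed ("ND" = non-detect)
raw = ["ND", "4.5", "19.4", "ND", "5.8", "19.5", "ND", "6.2", "19.5",
       "ND", "6.8", "19.7", "ND", "8.33348", "19.7", "ND", "7.4", "19.7",
       "ND", "12.5", "19.8", "ND", "14.7", "19.8", "ND", "19.0", "204.6"]

non_detects = raw.count("ND")
detected = [v for v in raw if v != "ND"]
print(f"non-detects: {non_detects} of {len(raw)} ({non_detects / len(raw):.0%})")


def decimals(s):
    """Number of digits reported after the decimal point."""
    return len(s.split(".")[1]) if "." in s else 0


# Flag values whose reported precision differs from the most common precision
usual = Counter(decimals(v) for v in detected).most_common(1)[0][0]
odd_precision = [v for v in detected if decimals(v) != usual]
print("unusual precision:", odd_precision)
print("largest detected values:", sorted(detected, key=float)[-2:])
```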
  • 75. Example: Stem and Leaf [Figure: stem-and-leaf display comparing June ozone readings at Site 1 and Site 2]
  • 77. Example: Box and Whiskers
  • 80. Example: Posting Plot [figure: map of sampling locations with measured values posted (15.4, 11.4, 7.4, 12.3, 8.3, 43.0, 14.7, 10.7, 6.7, 10.5, 6.5, 2.4); map labels "Creek" and "Road"]
  • 82. The Five Steps of Data Quality Assessment 1. Review objectives and data collection design 2. Conduct a preliminary data review 3. Select the statistical method 4. Verify the assumptions of the statistical method 5. Draw conclusions from the data
  • 83. DQA Step 4: Verify the Assumptions of the Statistical Test A. Perform methods to test the assumptions: –Determine what methods are available –Select a method and significance level –Perform calculations B. If necessary, determine corrective actions to be taken, e.g., select a different test in Step 3, reduce the significance level, gather additional data, slightly modify the objectives of the study, etc.
  • 84. Types of Assumptions (Step 4A): –Random sample –Independence of data –Distribution of data –Existence of outliers –Extent of non-detect data
  • 85. Verify Random Sample Review data collection plan to verify that some element of random selection was used to locate the samples If judgmental sampling was used to choose some sample locations, you may need to consult a statistician to determine if the data collection is "random enough" to be able to draw reliable conclusions from the data If judgmental sampling was used to choose all sample locations, the ability to draw conclusions about the entire population is limited
  • 86. Verify Independence of Data For one variable (i.e., one contaminant) ensure there are no trends within the data - for example, by location, by time period, etc. For two (or more) variables (i.e., two or more contaminants), ensure that they are not correlated with one another
  • 87. Verify Distribution Typical assumptions are: –Normal distribution –Symmetric distribution An indication of the distribution can be gained by reviewing the summary statistics and the graphs Quantitative tests are available to verify whether a data set has a specific distribution
  • 88. Verify Outliers Statistical outliers are anomalies with respect to the proposed distribution for the data -- an extreme value may be a statistical outlier but may still be a valid data point Before removing any statistical outliers, review these values on both a scientific or quality assurance basis -- do not delete any values unless there is a scientific or quality assurance based reason for the removal If any data are deleted, run all analyses both with and without the data to see the effect of the deletions
  • 89. Using Data Below the Detection Limit Quality control information would indicate which data points were non-detects Methods for Addressing non-detects: –Substitution, e.g., detection limit, 1/2 detection limit, . . . –Special algorithms, e.g., Winsorized Mean, Cohen's Method, . . . –Use of percentiles
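The simplest of these approaches, substitution, can be sketched as follows. This assumes non-detects are recorded as `None` and that a single detection limit applies to every sample; the function name and the example values are hypothetical.

```python
def substitute_non_detects(values, detection_limit, fraction=0.5):
    """Replace non-detects (None) with fraction * detection_limit.

    fraction=1.0 substitutes the detection limit itself; fraction=0.5 gives the
    1/2-detection-limit rule. Substitution is only a convention and can bias the
    mean and variance when many values are non-detects.
    """
    return [fraction * detection_limit if v is None else v for v in values]

# Hypothetical data: None marks a non-detect; a detection limit of 2.0 is assumed.
print(substitute_non_detects([None, 4.5, None, 5.8], detection_limit=2.0))
# [1.0, 4.5, 1.0, 5.8]
```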
  • 90. Types of Corrective Actions (Step 4B) Select a different test or method of analysis Transform the data to a different metric Gather additional data or modify objective
  • 91. PCB Example: Background Electronic Manufacturing Corporation of America (EMCA) operated at site from 1965 to 1985, when site was sold to Energy Components Company (ECC). Both companies went bankrupt in 1990. In 1991, chlorinated solvents discovered in water from city well field east of site. Waste oil contaminated with PCBs was sprayed on a dirt road for dust suppression. Problem: Determine the extent of PCB contamination on the dirt road that presents unacceptable risks.
  • 92. No outliers? Data are approximately normally distributed? Random sample (for independence of data values)? No data reported as "Not Detected?" PCB Example: Assumptions of the t-Test for a Single Mean
  • 93. Is the extreme value of 25.15 a statistical outlier? Extreme Value Test will be used - this test assumes data without the outlier are normally distributed. PCB Example: Identifying Outliers
  • 94. PCB Example: Testing for Normality Shapiro-Wilk W Test: Null Hypothesis: 'Data are normally distributed'. Sample Value: 0.836; Tabled Value: 0.881. Non-normality has been detected at a 5% significance level. Results: Data are Not Normally Distributed
  • 95. PCB Example: Not Normally Distributed The histogram of the data without the outlier suggests a lognormal distribution, so apply the Shapiro-Wilk W test to the natural logarithms of the data to test for lognormality: if the logged data are normally distributed, then the untransformed data are lognormally distributed. For example, the original value 1.92 becomes ln(1.92) = 0.652, 2.49 becomes 0.912, etc.
  • 96. PCB Example: Testing Data for Lognormality Shapiro-Wilk Test (applied to the logged data): Null Hypothesis: 'Data are normally distributed'. Sample Value: 0.953; Tabled Value: 0.881. There is not enough evidence to reject the assumption of normality at a 5% significance level. Result: We cannot reject the assumption that the data are lognormally distributed, i.e., that the logs of the data are normally distributed.
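The decision rule on these slides is simple: small values of W indicate non-normality, so normality is rejected when the sample W falls below the tabled critical value. The minimal sketch below applies that rule to the W values reported above and repeats the natural-log transform from slide 95. Computing W itself requires tabled coefficients; SciPy's `scipy.stats.shapiro` is one existing implementation, though it reports a p-value rather than a comparison with a tabled value.

```python
import math

def shapiro_wilk_rejects_normality(w_sample, w_tabled):
    """Small W signals non-normality: reject H0 when W is below the tabled value."""
    return w_sample < w_tabled

# W values reported on the slides above (tabled value 0.881 at 5%)
print(shapiro_wilk_rejects_normality(0.836, 0.881))  # True: non-normality detected
print(shapiro_wilk_rejects_normality(0.953, 0.881))  # False: cannot reject normality

# Natural-log transform used before retesting for lognormality
for x in (1.92, 2.49):
    print(x, round(math.log(x), 3))  # 0.652 and 0.912
```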
  • 97. PCB Example: Discordance Test for Outliers Value Tested: 3.225 [ln(25.15)]; Sample Value: 2.476; Tabled Value: 2.443. Conclude that 3.225 is an outlier at a 5% significance level.
  • 98. PCB Example: Assumptions Satisfied? No outliers? Data approximately normally distributed? Random sample? All data above the detection limit?
  • 99. PCB Example: A New Look at the Logged Data Number of Observations: 15; Minimum: 0.020; Maximum: 2.159; Mean: 1.002; Median: 0.932; Variance: 0.428; Standard Deviation: 0.654; Range: 2.139; Interquartile Range: 0.985; Coefficient of Variation: 0.653; Coefficient of Skewness: 0.244; Coefficient of Kurtosis: -0.783. Percentiles: 1st: 0.020; 5th: 0.020; 10th: 0.157; 25th: 0.536; 75th: 1.522; 90th: 2.044; 95th: 2.159; 99th: 2.159
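A discordance statistic of this kind is commonly computed as the distance of the largest value from the mean, in units of the sample standard deviation. The sketch below assumes that form, D = (max - mean) / s; the critical value (2.443 on the slide) still has to come from a table for the sample size and significance level, and the example data are hypothetical.

```python
import statistics

def discordance_statistic(values):
    """D = (largest value - mean) / s, with the sample standard deviation (n - 1)."""
    return (max(values) - statistics.mean(values)) / statistics.stdev(values)

def is_discordant(d_sample, d_tabled):
    """Flag the largest value as an outlier when D exceeds the tabled critical value."""
    return d_sample > d_tabled

# Hypothetical data for illustration; these are not the logged PCB values.
d = discordance_statistic([1.0, 2.0, 3.0, 4.0, 10.0])
print(round(d, 3))                  # 1.697
print(is_discordant(2.476, 2.443))  # True, as concluded on the slide
```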
  • 100. DQA Step 5: Perform Method and Draw Conclusions from the Data A. Perform the calculations for the statistical method B. Evaluate the results and draw conclusions
  • 101. Perform the Statistical Method (Step 5A) For basic methods, formulas and procedures are available in standard textbooks. Software can also do the calculations: –SAS, SPSS, or SPlus –DataQUEST
  • 102. Evaluate the Results (Step 5B): A statistical test can give two results: –Reject the baseline condition or –Fail to reject the baseline condition. If the test does not reject the baseline condition, then the false acceptance error rate must be verified.
  • 103. Significant at 5%: Suppose the statistical test rejects the baseline condition (null hypothesis) at a 5% significance level. If the baseline condition were true, results at least as extreme as those observed would occur by chance less than one time in twenty. Because such results are unlikely under the baseline condition, it is rejected and the alternative condition is accepted, with a significance level of 5%.
  • 104. PCB Example: Perform Calculations Formula: reject the baseline condition if $t > t_{1-\alpha}$, where $\alpha = 0.05$, $t_{0.95,\,14} = 1.761$ (n - 1 = 14 degrees of freedom), and $AL = \ln(2.0) = 0.6931$ [the natural log of the 2 ppm action level]. $t = \frac{\bar{X} - AL}{s/\sqrt{n}} = \frac{1.002 - 0.6931}{0.654/\sqrt{15}} \approx 1.829$. Results: Reject the baseline condition at a 5% level of significance.
  • 105. PCB Example: Conclusions The t-test rejected, at a 5% significance level, the null hypothesis that the mean of the log-transformed concentrations is less than or equal to the action level, ln(2.0). Statistically, the geometric mean concentration of total PCBs in surface soil (top 1 inch) over the dirt road exceeds 2 ppm, and remedial action is necessary.
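The arithmetic on this slide can be reproduced from the summary statistics of the 15 logged values. A minimal sketch, assuming the tabled one-sided critical value t(0.95) = 1.761 for n - 1 = 14 degrees of freedom:

```python
import math

def one_sample_t(mean, sd, n, action_level):
    """t statistic for H0: mu <= action_level versus HA: mu > action_level."""
    return (mean - action_level) / (sd / math.sqrt(n))

# Summary statistics of the 15 logged PCB values; action level is ln(2.0 ppm)
t = one_sample_t(mean=1.002, sd=0.654, n=15, action_level=math.log(2.0))
T_CRIT = 1.761  # assumed tabled one-sided t(0.95) for 14 degrees of freedom
print(round(t, 3), t > T_CRIT)  # 1.829 True
```

With these rounded inputs t is about 1.829, which exceeds the critical value, so the baseline condition is rejected.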
  • 107. Manganese Exposure Manganese (Mn) is a metal emitted from industrial operations (e.g., steel mills) and cars using gasoline with additive MMT Mn is an essential dietary requirement in trace amounts but higher exposures are toxic: –occupational exposure shown to affect motor skills –disruption of neurotransmitters may adversely affect higher-level cognitive skills –exposure during infancy and early childhood may cause learning disabilities and emotional problems Research Problem: Determine if elevated exposure to Mn during early childhood can be linked to cognitive dysfunction in teenagers
  • 108. Manganese Exposure Routes Mn is usually inhaled or ingested MMT can be absorbed through the skin Key route of concern: Mn is transported to the brain via the olfactory nerve when Mn in coarse particles is inhaled Exposure to Mn can be estimated by analyzing ambient air monitoring data in Total Suspended Particulates (TSP) - undisturbed dust in attics and other areas of homes can be an indicator of long-term exposure
  • 109. DQA Step 1: Review Objectives and Data Collection Design Translate the data user's objectives into a statement of the primary statistical hypothesis or estimation goal. Translate the data user's objectives into tolerable limits on the probability of committing decision errors. Review the sampling design and note any special features or deviations from the sampling plan.
  • 110. Pilot Study: Objective and Background Objective: Determine whether the proposed study area is suitable for a large-scale investigation Background –Mn exposures range from very high to very low –Concentrations of Mn in air and dust decrease with distance from a known point source (steel mill) –There are few, if any, confounding factors, such as lead in paint
  • 111. Performance Criteria for Pilot Study: The exploratory nature of the pilot study does not warrant specifying rigorous performance criteria; however, the results must support the larger study's goals: –Detect a correlation of 40% with at least 80% power –Identify very high and very low exposures. The sample size for TSP analysis is driven by the existing ambient air monitoring network design; the sample size for dust samples is based on performance criteria, budget, professional judgment, and experience with similar studies.
  • 112. Pilot Study: Statistical Methods and Performance Criteria Statistical Methods: –Determine correlation between distance to steel mill and Mn concentrations in air (TSP) and house dust –Estimate annual average air concentrations of Mn in TSP at various distances from steel mill 10% significance level chosen for testing hypotheses about relationships between Mn exposure concentrations and results of cognitive tests
  • 113. Pilot Study: Performance Criteria (continued) Correlations are expected to be relatively low (10% to 50%) and negative. The false negative error rate was specified by determining power under various assumptions about the correlation. Required sample size by power (rows) and true population correlation (columns):
Power | 10% | 20% | 30% | 40% | 50%
50% | 166 | 42 | 19 | 11 | 7
60% | 237 | 60 | 27 | 15 | 10
70% | 327 | 82 | 36 | 20 | 13
75% | 383 | 96 | 42 | 23 | 14
80% | 451 | 113 | 49 | 27 | 17
85% | 537 | 134 | 58 | 32 | 19
90% | 656 | 163 | 72 | 39 | 24
  • 114. Review Sampling Design 2-stage stratified random sampling design –The sampling frame was a phone list for the target area around the steel mill, pre-filtered to include households with children aged 13-17 –The sampling population was stratified by distance to the steel mill –The first stage of sampling was random-digit dialing to recruit households that met the study criteria –The second stage of sampling was within the set of eligible and willing participants. For each household selected, dust from the attic or basement was collected using a vacuum device according to established protocols.
  • 115. Review Data Collection Reports: 30 households were successfully sampled, including individual houses, attached townhouses, and apartments. The field report indicated that 50% of the households were sampled in a basement furnace area, 10% in other basement areas, 10% in attics, and the remainder in miscellaneous crawl spaces or other furnace areas. All of these locations seem reasonable in light of the project objectives.
  • 116. DQA Step 2: Conduct Preliminary Data Review: –Review quality assurance reports for anomalies –Calculate standard statistical quantities –Display the data using graphical representations
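The slide does not say how the sample sizes in the table were computed. A common approximation uses Fisher's z-transformation of the correlation, n = ((z_(1-α) + z_power) / atanh(r))² + 3. The sketch below assumes a one-sided test at the 10% significance level mentioned for the pilot study. It matches some 80%-power entries exactly (451 at 10%, 113 at 20%) and exceeds others by one or two (51, 29 and 18 versus 49, 27 and 17), so treat it as an approximation rather than the method behind the table.

```python
import math
from statistics import NormalDist

def sample_size_for_correlation(r, power, alpha=0.10):
    """Approximate n for a one-sided test of zero correlation (Fisher z method).

    n = ((z_(1 - alpha) + z_power) / atanh(|r|))**2 + 3, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)

# 80%-power row of the table, for true correlations of 10% to 50%
print([sample_size_for_correlation(r, power=0.80) for r in (0.1, 0.2, 0.3, 0.4, 0.5)])
# [451, 113, 51, 29, 18]
```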
  • 117. Quality Control Reports There were no "below the detection limit" values in the Mn data set Duplicate sample analyses and instrument analyses indicated excellent agreement, all within 5% There were no significant anomalies in the QC reports
  • 118. Summary Statistics for Mn in Dust (N = 30): Mean: 924.23; Std Dev: 2566.9; C.V.: 2.78; Median: 390.0; Mode: N.A.; Range: 14420; Minimum: 16; Maximum: 14436. Percentiles: 5%: 130; 25%: 258; 50%: 390; 75%: 699; 95%: 1339
  • 119. Histogram of Mn in Dust Example. Frequency by concentration (ppm): 0-99: 1; 100-199: 1; 200-299: 6; 300-399: 7; 400-499: 5; 500-599: 1; 600-699: 2; 700-799: 4; 800-899: 0; 900-999: 0; 1000-1099: 1; 1100-1199: 0; 1200-1299: 0; 1300-1399: 1; 1400-1499: 0; 1500+: 1
  • 120. Box-and-Whiskers Plots of Mn in Dust [figure: full data set: minimum 16, median 390, mean 924, extreme value 14436; without the potential outlier: minimum 16.0, mean 458.3, maximum 1339.0, with 256.0 also labeled]
  • 121. Scatter Plot of Mn vs. Distance using a logarithmic scale for Mn. Data plotted (Mn in ppm, distance in mi): (465, 39.06), (712, 20.98), (456, 20.29), (502, 27.42), (448, 34.82), (414, 27.53), (341, 39.26), (212, 16.88), (619, 34.86), (1339, 30.80), (243, 42.77), (250, 39.26), (16, 39.19), (203, 39.26), (130, 39.41), (327, 12.83), (332, 12.15), (309, 16.17), (350, 12.82), (366, 12.25), (258, 14.67), (225, 6.66), (743, 6.60), (468, 16.08), (14436, 13.85), (726, 14.74), (347, 16.29), (1037, 1.21), (754, 1.22), (699, 13.86)
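The summary statistics and histogram counts on these two slides can be checked against the 30 Mn values listed with the scatter plot of Mn versus distance. A minimal standard-library sketch; the bin layout (100 ppm bins plus an open 1500+ bin) mirrors the histogram table, and the helper name is hypothetical.

```python
import statistics

# Mn in dust (ppm) for the 30 households, as listed with the scatter plot
MN_PPM = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
          243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
          258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]

def bin_counts(values, width=100, top=1500):
    """Counts in bins [0, width), [width, 2*width), ... plus one open bin >= top."""
    last = top // width
    counts = [0] * (last + 1)
    for v in values:
        counts[min(int(v // width), last)] += 1  # everything >= top lands in the last bin
    return counts

print(round(statistics.mean(MN_PPM), 2))  # 924.23
print(statistics.median(MN_PPM))          # 390.0
print(bin_counts(MN_PPM))  # [1, 1, 6, 7, 5, 1, 2, 4, 0, 0, 1, 0, 0, 1, 0, 1]
```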
  • 122. DQA Step 3: Select the Statistical Method Select the statistical method based on the data user's objectives and the preliminary data review Identify the assumptions underlying the statistical method
  • 123. Correlation Between Mn and Distance to Point Source. Two potential measures of correlation: –Pearson's correlation coefficient: detects a linear relationship between two sets of values; sensitive to extreme values; not affected by linear transformations –Spearman's rank correlation coefficient: uses ranks of data values; less sensitive to extreme values; not affected by monotonic nonlinear transformations
  • 124. Assumptions to be Verified for Pearson's Correlation Coefficient: –Random sample –Independence of data –Linear relationship –No outliers –Normal distribution
  • 125. DQA Step 4: Verify the Assumptions of the Statistical Method Determine approach for verifying assumptions Perform tests of assumptions If necessary, determine corrective actions to be taken
  • 126. Assumption: Random Sample Verification Method: Review how the data collection plan was developed and implemented (i.e., how the sample locations were chosen). Verification Results: The QA Project Plan and the data collection documentation report that random sample locations were chosen. It is highly probable that this assumption of a random sample has been met.
  • 127. Assumption: Independence of Data Verification Method: Rank von Neumann test Verification Result: Using the baseline condition (null hypothesis) that no serial correlation is present, the Rank von Neumann test did not detect serial correlation at the 10% level of significance. It is highly probable that this assumption of independence has been met.
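The rank von Neumann ratio compares squared differences between successive ranks with the overall spread of the ranks. A minimal sketch follows; it assumes the values are supplied in the order they were collected, which this section does not give for the Mn samples, so it is only demonstrated on short hypothetical sequences. Critical values for the chosen n and significance level still come from tables.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_von_neumann_ratio(values):
    """Sum of squared successive rank differences over the sum of squared rank
    deviations. Values must be in collection order; the ratio averages 2 under
    independence."""
    ranks = average_ranks(values)
    mean_rank = sum(ranks) / len(ranks)
    num = sum((ranks[k] - ranks[k - 1]) ** 2 for k in range(1, len(ranks)))
    den = sum((r - mean_rank) ** 2 for r in ranks)
    return num / den

# Hypothetical sequences for illustration
print(rank_von_neumann_ratio([1, 2, 3, 4, 5]))  # 0.4: trend, positive serial correlation
print(rank_von_neumann_ratio([5, 1, 4, 2, 3]))  # 3.0: alternating pattern
```

Values well below 2 point to positive serial correlation (for example, a trend over time or location), which would violate the independence assumption.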
  • 128. Assumption: Few Values Below Detection Limit Verification Method: Review data Verification Result: There are none in this data set This assumption of few values below the detection limit has been met
  • 129. Assumption: No Outliers Verification Method: –Review histogram: Look for very high or very low values as compared to the rest of the distribution –Review Box Plot: Potential outliers show up quickly. –Use Rosner's Test on any extreme values (with a 5% signicance level.
  • 131. Review Box and Whiskers Plot Looks like an extreme outlier! + 16 (median) 390 (mean) 924 o14436
  • 132. Assumption: No Outliers Verification Result: Rosner's test has determined that the observation 14436 is a statistical outlier at the 5% significance level and should be investigated further. (The sample value was 5.264 and the critical value was 2.910.) This assumption of no outliers has not been met!
  • 133. Look at the data without the outlier... Looks better, but there still may be an outlier... [figure: box-and-whisker plot without 14436: minimum 16.0, mean 458.3, maximum 1339.0, with 256.0 also labeled]
  • 134. Check for outliers in the data set without the outlier Verification Result: Rosner's test has determined that the observation 1339 is a statistical outlier at the 5% significance level and should be investigated further. (The sample value was 3.129 and the critical value was 2.890.) HOWEVER, Rosner's test assumes normality...
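The two Rosner's test statistics reported on slides 132 and 134 can be reproduced from the 30 Mn values listed with the scatter plot. A minimal sketch of the successive statistics used by Rosner's (generalized ESD) procedure: at each step the value farthest from the mean is scored as |x - mean| / s and then removed. Comparing each statistic with its critical value (2.910 and 2.890 on the slides) still requires tables or a separate calculation, and the procedure assumes normality.

```python
import statistics

# Mn in dust (ppm) for the 30 households, as listed with the scatter plot
MN_PPM = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
          243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
          258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]

def rosner_statistics(values, k):
    """Successive statistics R_1..R_k: |most extreme - mean| / s, removing each in turn."""
    data = list(values)
    results = []
    for _ in range(k):
        mean = statistics.mean(data)
        sd = statistics.stdev(data)  # sample standard deviation (n - 1)
        extreme = max(data, key=lambda x: abs(x - mean))
        results.append((extreme, abs(extreme - mean) / sd))
        data.remove(extreme)
    return results

for value, r in rosner_statistics(MN_PPM, k=2):
    print(value, round(r, 3))  # 14436 5.264, then 1339 3.129
```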
  • 135. Shapiro-Wilk Test for Normality The Shapiro-Wilk test for normality determined that the baseline condition that the data are normally distributed can be rejected at the 5% significance level (Sample value = 0.900; Critical Value = 0.926) Non-normality has been detected at the 5% level of significance, but only just!
  • 136. What about a log transform of the data set without the outlier? [figure: box-and-whisker plot; labeled values 16.0, 256.0, 458.3 (mean), 1339.0]
  • 137. What about a log transform of the data set without the outlier? Still looks skewed: the Shapiro-Wilk test confirms that the data are not strictly normal. [figure: histogram of concentration (ppm) vs. frequency]
  • 138. What should we conclude about assumptions? The data appear to be an independent random sample. The data are not normally distributed (nor lognormally distributed). The data probably contain multiple outliers; they are heavily skewed to the right. Methods that are sensitive to extreme values, such as Pearson's correlation coefficient, are not recommended. Spearman's rank correlation coefficient is a better approach for evaluating the relationship between Mn in dust and distance to the point source.
  • 139. DQA Step 5: Perform Method and Draw Conclusions from the Data Perform the calculations for the statistical method Evaluate the results of the statistical method and draw conclusions
  • 140. Spearman's Rank Correlation Coefficient Results are less sensitive to extreme values, so the original full data set is used. The Spearman's correlation coefficient between distance from the source and Mn concentration is -0.406. –The correlation is negative (as suspected): concentration decreases as distance from the point source increases. –The relationship is not strong: the squared coefficient, 0.406² ≈ 0.16, indicates that only about 16% of the variation in the ranks is associated with distance.
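Spearman's coefficient is Pearson's correlation applied to the ranks of the data, with tied values given the average of the ranks they span (a distance of 39.26 mi occurs three times). A minimal standard-library sketch using the 30 pairs listed with the scatter plot reproduces the reported value; SciPy's `scipy.stats.spearmanr` is an existing alternative.

```python
import math

MN_PPM = [465, 712, 456, 502, 448, 414, 341, 212, 619, 1339,
          243, 250, 16, 203, 130, 327, 332, 309, 350, 366,
          258, 225, 743, 468, 14436, 726, 347, 1037, 754, 699]
DIST_MI = [39.06, 20.98, 20.29, 27.42, 34.82, 27.53, 39.26, 16.88, 34.86, 30.80,
           42.77, 39.26, 39.19, 39.26, 39.41, 12.83, 12.15, 16.17, 12.82, 12.25,
           14.67, 6.66, 6.60, 16.08, 13.85, 14.74, 16.29, 1.21, 1.22, 13.86]

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

rho = spearman(DIST_MI, MN_PPM)
print(round(rho, 3))       # -0.406
print(round(rho ** 2, 3))  # 0.165
```

Because only ranks enter the calculation, the extreme value 14436 counts simply as the largest rank, which is why the full data set can be used here.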
  • 141. Interpret the Results Factors other than distance-to-point-source that determine concentrations of Mn in dust: –Infiltration rates into the dwelling –Housekeeping practices –Age of dwelling (how long the dust was left undisturbed) –Changing wind patterns in the region Those other factors mediate the relationship between Mn in dust and distance to point source. Nonetheless, the researchers believe there is enough of a relationship to support the use of distance as a basis for stratifying the population in the larger study