As companies scramble to adjust to the demands of an increasingly data-driven world, testers are told “go test data quality” without any guidance as to what that entails or how to go about it. The fact that the data is often a living, flowing ecosystem, rather than just a single object, requires the use of different strategies to gain meaningful insights. Shauna Ayers and Catherine Cruz Agosto guide you through the challenges of data quality and apply a structured approach to analyze, measure, test, and monitor living data sets, and gauge the business impact of data quality issues. Shauna and Catherine define data quality, describe the five goals of data quality management, provide the four pillars of data quality assurance, and show how data flow, scale, and properties interact to build the data quality landscape. Learn how to tame the data quality beast, determine what and how to test, overcome technical obstacles—and emerge with a usable plan of attack.
Survival Guide: Taming the Data Quality Beast
4/23/15
By Shauna Ayers and Catherine Cruz Agosto
About Availity
• Availity is a trusted intermediary for information exchange between health plans and providers
• Availity eases the complexity of moving business and clinical information to health care stakeholders nationwide
• Availity’s real-time, point-to-point connectivity provides speed and accuracy at the intersection of health care and technology
• Availity’s tools include:
– A multi-payer Web Portal
– An all-payer Advanced Clearinghouse
– A powerful Revenue Cycle Management suite
– A smarter Patient Access solution
Overview
• Data Quality Definitions and Impact
• The 5 Goals of Data Quality
• The 4 Pillars of Data Quality
• The Flow of Your Data
• The 4 V’s of Your Data Sets
• The Properties of Your Data
• Sharing the Health of Your Data
Definitions and Impact
• Data quality is data's fitness and usability for its intended purpose.
• Data quality assurance is the monitoring and analysis of data sets and the processes that create or manipulate data, in order to ensure the data’s quality meets the company's needs.
• The role of data quality assurance within the company is to identify problems with its data and to manage these problems, preventing them wherever possible, and correcting those that cannot be prevented.
• Functions supporting data quality assurance, and frequently integrated with it, include but are not limited to data governance, data architecture, data stewardship, data quality testing, and data cleansing.
The 5 Goals of Data Quality
• Prevent
• Detect
• Communicate
• Mitigate
• Correct
These goals guide us and light our path.
The 4 Pillars of Data Quality
• Analysis and Profiling
• Strategies and Tactics
• Testing
• Intelligence
The Flow of Your Data
• Data is not static. It constantly flows between data sets and applications in continuing waves of gathering, delivery, storage, integration / transformation, retrieval and analysis.
• …So, how do we test a moving target?
The 4 V’s of Your Data Sets
The scale of your data is driven by the four V’s:
• Volume
• Variety
• Vitality
• Velocity
The boundaries of each data set are defined by business rules and constraints.
The content of each data set is what is measured or evaluated.
The Properties of Your Data
The quality of your data is driven by various properties:
• Accuracy
• Completeness
• Timeliness
• Consistency
• Validity
• Temporal Reliability
• Interpretability
• Accessibility
• Usage
• Precision
• Uniqueness
Property + Business Value = Impact of Quality Problem
Sharing the Health of Your Data
To find your quarry, and tame it, you must be able to see the forest for the trees.
Artifacts used to communicate data system health:
• Dashboards
• System monitoring alerts
• Reports
• Bug-tracking tickets
Analysis and Profiling Pillar
Analyzing the data can give valuable insight into the data. It can shed light on patterns that might not have been seen previously. Profiling allows for similar data to be grouped.
• Categorization
• Methods
• “Gotchas” and possible challenges
• Gathering metrics
– On data
– On test coverage
• Dependencies, relationships and patterns
Strategies and Tactics Pillar
Most companies use a mix of strategies and tactics, such as:
• Input validation
• Critical value checks (sampling or periodic analysis of standing data)
• In-line validation
• Hash values and checksums
• Tolerance checks and statistical analysis
• Architectural and domain integrity checks
Without a plan, your results can be haphazard.
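To illustrate the hash value tactic listed above, a record can be reduced to a digest at the source and compared with the digest computed at the target. This is only a minimal sketch, not the presenters' implementation: the function name `row_hash`, the field separator, and the sample records are assumptions made for illustration.

```python
import hashlib

def row_hash(row, columns):
    """Return a SHA-256 hex digest over the chosen columns of one record."""
    # Join values with a separator assumed not to occur in the data,
    # so that ("ab", "c") and ("a", "bc") produce different digests.
    joined = "\x1f".join(str(row[col]) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Hypothetical records: the same row as read from source and target.
source = {"id": 1, "name": "John Doe", "zip": "32256"}
target_ok = {"id": 1, "name": "John Doe", "zip": "32256"}
target_bad = {"id": 1, "name": "John Doe", "zip": "32265"}  # transposed digits

cols = ["id", "name", "zip"]
print(row_hash(source, cols) == row_hash(target_ok, cols))   # True
print(row_hash(source, cols) == row_hash(target_bad, cols))  # False
```

A digest comparison only says that something differs; finding which field changed still needs a value-by-value compare.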
Testing Pillar
Types of tests
• Count checks
• Compare checks
• Business Rule Validation
• Null value checks
• Code Checks
Methods and Strategies
• Exploratory
• Manual
• Automated
Tools
• Buying vs. In-house
• A machine cannot replace a human
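Two of the test types above, count checks and null value checks, might look like the following when automated. This is a sketch under stated assumptions: rows are modeled as plain dictionaries, and the function names and sample data are hypothetical.

```python
def count_check(source_rows, target_rows):
    """Return (passed, source_count, target_count)."""
    s, t = len(source_rows), len(target_rows)
    return s == t, s, t

def null_check(rows, required_columns):
    """Return (row_index, column) pairs where a required value is missing."""
    failures = []
    for i, row in enumerate(rows):
        for col in required_columns:
            # Treat None, an absent key, and an empty string as missing.
            if row.get(col) in (None, ""):
                failures.append((i, col))
    return failures

# Hypothetical source and target extracts.
source = [{"id": 1, "payer": "A"}, {"id": 2, "payer": "B"}]
target = [{"id": 1, "payer": "A"}, {"id": 2, "payer": None}]

print(count_check(source, target))          # (True, 2, 2)
print(null_check(target, ["id", "payer"]))  # [(1, 'payer')]
```

The sample shows why one test type is rarely enough: the counts match even though the target lost a value.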
Intelligence Pillar
Data Quality intelligence provides visibility of the data environment, supporting:
• Operational Troubleshooting
• Process Improvement
• Risk Analysis
• Data Governance and Regulatory Compliance
Metrics useful for DQ Intelligence
• Current state: unresolved defects or failed tests
• Property Tolerances: e.g., histogram analysis, % change over time
• Defect Trends over time: defect count by data set or type
• Test Coverage: % implemented/% possible
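Two of the metrics above reduce to simple arithmetic: test coverage as implemented tests over possible tests, and a property tolerance as a bound on percent change between two measurements. The sketch below assumes those readings of the slide; the function names, the 5% threshold, and the sample numbers are illustrative only.

```python
def coverage_pct(implemented, possible):
    """Percentage of the possible data quality tests that are implemented."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    return 100.0 * implemented / possible

def percent_change(previous, current):
    """Percent change from the previous measurement to the current one."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return 100.0 * (current - previous) / previous

def within_tolerance(previous, current, max_pct):
    """True when the absolute percent change does not exceed max_pct."""
    return abs(percent_change(previous, current)) <= max_pct

print(coverage_pct(45, 60))               # 75.0
print(percent_change(1000, 1100))         # 10.0
print(within_tolerance(1000, 1100, 5.0))  # False
print(within_tolerance(1000, 1030, 5.0))  # True
```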
Property: Accuracy
• Definition: Whether the data values stored for an object are the correct values. To be correct, a data value must be the right value, and must be represented in a consistent and unambiguous form.
• Possible DQ checks: Hash values and checksums, business rule validations, source-to-target value comparisons
• Examples:
– Mismatch between labeling and content
– American vs European date formats
– “John Doe” vs “JOHN DOE”
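The date format example above can be made concrete: a slash-separated date is unambiguous only when one of its first two parts cannot be a month. The following sketch classifies such strings; the function name, the category labels, and the assumed `D/M/YYYY` or `M/D/YYYY` input shape are all illustrative choices, and the check looks only at format, not at calendar validity.

```python
import re

DATE_PATTERN = r"(\d{1,2})/(\d{1,2})/(\d{4})"

def classify_date(text):
    """Classify a slash-separated date string by what its first two parts allow."""
    m = re.fullmatch(DATE_PATTERN, text)
    if not m:
        return "invalid"
    first, second = int(m.group(1)), int(m.group(2))
    if not (1 <= first <= 31 and 1 <= second <= 31):
        return "invalid"
    if first <= 12 and second <= 12:
        return "ambiguous"    # could be American or European
    if first <= 12:
        return "month-first"  # American style, e.g. 04/23/2015
    if second <= 12:
        return "day-first"    # European style, e.g. 23/04/2015
    return "invalid"          # neither part can be a month

for value in ["04/23/2015", "23/04/2015", "04/05/2015", "13/13/2015"]:
    print(value, classify_date(value))
```

Values reported as ambiguous cannot be verified from the data alone; they need the source system's documented format.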
Property: Completeness
• Definition: Whether all the data required to meet the requirements or business need is available in the target.
• Possible DQ checks: Source-to-target count checks, compare checks, not-null checks
• Examples:
– Inconsistent data types between source and target
– Unenforced column is null in the target
– Missing criteria in filter causing records to be missed
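When a filter silently drops records, a count check shows that something is missing but not what. One way to find the missing records, sketched here with hypothetical ID lists and function names, is a set difference between source and target keys.

```python
def missing_in_target(source_ids, target_ids):
    """Return the sorted IDs present in the source but absent from the target."""
    return sorted(set(source_ids) - set(target_ids))

def completeness_pct(source_ids, target_ids):
    """Percentage of distinct source IDs that arrived in the target."""
    source = set(source_ids)
    if not source:
        raise ValueError("source_ids must not be empty")
    return 100.0 * len(source & set(target_ids)) / len(source)

# Hypothetical keys read from each side of a load.
source_ids = [101, 102, 103, 104]
target_ids = [101, 103]

print(missing_in_target(source_ids, target_ids))  # [102, 104]
print(completeness_pct(source_ids, target_ids))   # 50.0
```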
Property: Timeliness
• Definition: Whether data is visible when the user or consuming application expects it to be.
• Possible DQ checks: process control tolerance checks, ID comparisons, missing update checks
• Examples:
– Package delivery
– Credit card account activity
– CRM data
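A process control tolerance check for timeliness can be as simple as comparing the lag between an event and its arrival against an agreed limit. In this sketch the field names `event_at` and `loaded_at`, the one-hour tolerance, and the sample records are assumptions for illustration.

```python
from datetime import datetime, timedelta

def late_records(records, tolerance):
    """Return IDs whose load time lags the event time by more than tolerance."""
    return [r["id"] for r in records
            if r["loaded_at"] - r["event_at"] > tolerance]

# Hypothetical records: A arrives after 5 minutes, B after 2.5 hours.
records = [
    {"id": "A",
     "event_at": datetime(2015, 4, 23, 9, 0),
     "loaded_at": datetime(2015, 4, 23, 9, 5)},
    {"id": "B",
     "event_at": datetime(2015, 4, 23, 9, 0),
     "loaded_at": datetime(2015, 4, 23, 11, 30)},
]

print(late_records(records, timedelta(hours=1)))  # ['B']
```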
Property: Consistency
• Definition: The process works all the time. No matter what source you get the data from, it should be the same if it correlates.
• Possible DQ checks: Business Rule Validation, Source-to-target Compare
• Examples:
– Table A shows one address for a customer and Table B shows another
– Account information is different when you look at the profile on the website vs the mobile app
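The Table A versus Table B example can be checked by comparing one field across both sources for every key they share. The sketch below models each table as a dictionary keyed by customer ID; the table contents and the function name are hypothetical.

```python
def inconsistent_keys(table_a, table_b, field):
    """Return sorted keys present in both tables whose `field` values differ."""
    shared = table_a.keys() & table_b.keys()
    return sorted(k for k in shared if table_a[k][field] != table_b[k][field])

# Hypothetical customer tables keyed by customer ID.
table_a = {
    "C1": {"address": "1 Main St"},
    "C2": {"address": "9 Oak Ave"},
}
table_b = {
    "C1": {"address": "1 Main St"},
    "C2": {"address": "12 Elm Rd"},
    "C3": {"address": "4 Pine Ct"},  # only in B, so not compared here
}

print(inconsistent_keys(table_a, table_b, "address"))  # ['C2']
```

Keys that exist in only one table are a completeness question rather than a consistency one, which is why this check skips them.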
Property: Validity
• Definition: The correctness and reasonableness of data, how well it conforms to the syntax (format, type, range) of its definition.
• Possible DQ checks: input validation, parametric checks, domain checks
• Examples:
– Two-digit years on birthdates for Medicare enrollees
– Negative cycle times
– Invalid customer codes
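The three examples above map onto a format check, a range check, and a domain check. The sketch below applies all three to one record; the field names, the `YYYY-MM-DD` date convention, and the set of valid customer codes are invented for illustration.

```python
import re

# Hypothetical domain of allowed customer codes.
VALID_CUSTOMER_CODES = {"GOLD", "SILVER", "BRONZE"}

def validate(record):
    """Return a list of validity errors found in one record (empty if none)."""
    errors = []
    # Format check: require a four-digit year (assumed YYYY-MM-DD layout).
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["birthdate"]):
        errors.append("birthdate must be YYYY-MM-DD with a four-digit year")
    # Range check: elapsed time cannot be negative.
    if record["cycle_time_days"] < 0:
        errors.append("cycle time cannot be negative")
    # Domain check: the code must belong to the known set.
    if record["customer_code"] not in VALID_CUSTOMER_CODES:
        errors.append("unknown customer code")
    return errors

good = {"birthdate": "1931-07-04", "cycle_time_days": 3, "customer_code": "GOLD"}
bad = {"birthdate": "31-07-04", "cycle_time_days": -2, "customer_code": "XX"}

print(validate(good))      # []
print(len(validate(bad)))  # 3
```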
Property: Temporal Reliability
• Definition: Time-dependent data
• Possible DQ checks: Source-to-target count checks, compare checks
• Examples:
– Source to view change from daily to real-time
– Process that loads data to the source table is delayed
Property: Interpretability
• Definition: How easy is it to extract understandable information from the data?
• Possible DQ checks: Histograms, source-to-target ID compares over date range
• Examples:
– Units of measurement: Metric mishap caused loss of NASA orbiter
Property: Accessibility
• Definition: Is it available?
• Possible DQ checks: Security checks, source-to-target checks
• Examples:
– User unable to search for data when using one identifier but can find record using a different identifier
– Order specific
Property: Usage
• Definition: Does the data support the usage to which it is being applied?
• Possible DQ checks: Duplicate checks, histograms, ID compares over time, domain checks
• Examples:
– Time Zone assumptions: Data from the future
– Page rankings derived from links to the page
– Cross-grain configuration values (“All” or “Other”)
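The "data from the future" example usually comes from comparing timestamps without their time zones. A domain check that flags future timestamps could be sketched as below; it assumes every timestamp is timezone-aware, and the function name, the five-minute clock skew allowance, and the fixed UTC-4 offset are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

def from_the_future(timestamps, now, allowed_skew=timedelta(minutes=5)):
    """Return the timestamps later than now + allowed_skew.

    All values, including now, must be timezone-aware.
    """
    return [ts for ts in timestamps if ts > now + allowed_skew]

now = datetime(2015, 4, 23, 12, 0, tzinfo=timezone.utc)
eastern = timezone(timedelta(hours=-4))  # fixed offset, for illustration only

a = datetime(2015, 4, 23, 7, 30, tzinfo=eastern)        # 11:30 UTC, in the past
b = datetime(2015, 4, 23, 11, 30, tzinfo=timezone.utc)  # 11:30 UTC, in the past
c = datetime(2015, 4, 23, 11, 30, tzinfo=eastern)       # 15:30 UTC, in the future

print(from_the_future([a, b, c], now) == [c])  # True
```

Records `b` and `c` show the same wall-clock time; only the attached zone reveals that one of them has not happened yet.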
Property: Precision
• Definition: Correlation between what is reality and what is shown in the data.
• Possible DQ checks: Business Rule Validation, Source-to-target comparison
• Examples:
– Incorrect address displayed for customer
– Showing Customer A data in Customer B’s account page
– Calculations
Property: Uniqueness
• Definition: What makes a data entity one of its kind.
• Possible DQ checks: Duplicate checks
• Examples:
– Multiple customer entries in CRM system
– Multiple conflicting configuration entries for same entity
– Duplicate inventory entries
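A duplicate check counts how often each business key occurs and reports the keys seen more than once. In this sketch the choice of key columns and the normalization (trimming whitespace and lowercasing) are assumptions; real matching rules depend on what makes the entity unique in your domain.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return {key_tuple: count} for normalized keys that occur more than once."""
    counts = Counter(
        tuple(str(row[col]).strip().lower() for col in key_columns)
        for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical CRM entries: the first two describe the same customer.
customers = [
    {"name": "John Doe", "email": "jd@example.com"},
    {"name": "JOHN DOE ", "email": "JD@example.com"},
    {"name": "Jane Roe", "email": "jr@example.com"},
]

print(duplicate_keys(customers, ["name", "email"]))
# {('john doe', 'jd@example.com'): 2}
```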
Overall picture/conclusion
• Any expedition to ensure data quality in the living, dynamic data ecosystem that occurs in every company requires the following:
– clear goals to guide efforts,
– a functional framework providing the tools to work with,
– an understanding of the living flow of your data,
– an understanding of its fundamental shape and nature,
– clear communication of these elements to all members of the party involved