A first step towards defining roles and formalizing responsibilities of key players and stakeholders for ensuring and improving data quality and usability of digital Earth Science datasets
Introducing the concept of multi-domain stewards serving as a knowledge and communication hub for effective long-term scientific stewardship of digital environmental data products.
Applying a User-Centered Design Approach to Improve Data Use in Decision Making (MEASURE Evaluation)
This document summarizes the application of a user-centered design approach to improve data use in decision making. Key activities included conducting immersion interviews with data users, holding design workshops to understand barriers and generate ideas, and prototyping solutions. Some prototypes developed included a digital portal for accessing data and policies, a social media platform for communication, and data use scorecards for facilities. The process identified technical, behavioral, and organizational barriers to data use and provided lessons on engaging stakeholders and testing prototypes.
This presentation introduced participants to the DC 101 course and was given at the Digital Curation and Preservation Outreach and Capacity Building Workshop in Belfast on September 14-15, 2009.
http://www.dcc.ac.uk/events/workshops/digital-curation-and-preservation-outreach-and-capacity-building-workshop
This document discusses the fundamentals of data quality management. It begins by introducing the speaker, Laura Sebastian-Coleman, and providing an abstract and agenda for the presentation. The abstract states that while organizations rely on data, traditional data management requires many skills and a strategic perspective. Technology changes have increased data volume, velocity and variety, but veracity is still a challenge. Both traditional and big data must be managed together. The presentation will revisit data quality management fundamentals and how to apply them to traditional and big data environments. Attendees will learn how to assess their data environment and provide reliable data to stakeholders.
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup (Edward Curry)
Data management efforts such as Master Data Management and Data Curation are popular approaches to high-quality enterprise data. However, Data Curation can be heavily centralised and labour-intensive, and its cost and effort can become prohibitively high. The concentration of data management and stewardship in a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourcing data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality And Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
Data-Ed: Unlock Business Value through Data Quality Engineering (DATAVERSITY)
This webinar focuses on obtaining business value from data quality initiatives. The presenter will illustrate how chronic business challenges can often be traced to poor data quality. Data quality should be engineered by providing a framework to more quickly identify business and data problems, as well as prevent recurring issues caused by structural or process defects. The webinar will cover data quality definitions, the data quality engineering cycle and complications, causes of data quality issues, quality across the data lifecycle, tools for data quality engineering, and takeaways.
Data-Ed: Unlock Business Value through Data Quality Engineering (Data Blueprint)
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar focuses on obtaining business value from data quality initiatives. I will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects, and prevent these from recurring.
You can sign up for future Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ... (DATAVERSITY)
Are you looking to measure Data Quality in a more organized way? Look no further: use the Conformed Dimensions of Data Quality to organize your efforts, improve communication with stakeholders, and track improvement over time. In this webinar, Information Quality practitioner Dan Myers will present the Conformed Dimensions of Data Quality framework along with the complete results of the 3rd Annual Dimensions of Data Quality survey. This presentation will provide the first view of the 2017 results, and all attendees will receive the associated whitepaper free.
In this webinar you will learn:
Why organizations use the Dimensions of Data Quality
Why there are so many options, and what he recommends you use
3rd Annual Survey data about how frequently organizations use the dimensions and specifically which dimensions are most used
Industry trends in adoption and more resources on the topic
This presentation covers:
Introduction: What Is “Research Data”? and Data Lifecycle
Part 1:
Why Manage Your Data?
Formatting and organizing the data
Storage and Security of Data
Data documentation and metadata
Quality Control
Version control
Working with sensitive data
Controlled Vocabulary
Centralized Data Management
Part 2:
Data sharing
What are publishers & funders saying about data sharing?
Researchers’ Attitudes
Benefits of data sharing
Considerations before data sharing
Methods of Data Sharing
Shared Data Uses and Their Limitations
Data management plans
Brief summary
Acknowledgment, References
Curlew Research Brussels 2014 Electronic Data & Knowledge Management (Nick Lynch)
An overview of Life Science externalisation and collaboration, and the challenges that Life Science companies face in delivering successful data sharing with their partners in either Open Innovation or pre-competitive workflows.
An information system should have several key elements and characteristics for success. It should be functional, usable, operational, scalable and allow for revisions. Specifically, it needs to be user-friendly, reliable, efficient, secure and easily maintained. Maximizing the system requires features like interoperability, portability, reusability and integration. High-quality data is also essential for decision making, and must be timely, complete and accurate. Common challenges to implementation include lengthy timelines and lack of post-go-live support. Critical success factors include stakeholder buy-in, local ownership and an enabling environment.
These slides give an overview of advanced data quality management (ADQM): why data quality is important, and the steps of DQ management.
Data-Ed Webinar: Data Quality Success Stories (DATAVERSITY)
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects, and prevent these from recurring.
Takeaways:
•Understanding foundational data quality concepts based on the DAMA DMBOK
•Utilizing data quality engineering in support of business strategy
•Case Studies illustrating data quality success
•Data Quality guiding principles & best practices
•Steps for improving data quality at your organization
Custodian Interviews - How to Leverage a Valuable Opportunity (Logikcull.com)
Custodian interviews are a valuable opportunity to satisfy preservation obligations, gather discovery intelligence, and disseminate litigation information. Key custodians, representative custodians, and departmental custodians should be interviewed. Questions should focus on who they are, what they do, how they communicate and store information, and whether they understand preservation obligations. Answers can be documented through conversational interviews, scripted interviews, or automated questionnaires. The results of custodian interviews can be leveraged to validate preservation efforts, prioritize collection and processing, and customize document review.
Sharon Dawes (CTG Albany) – Open data quality: a practical view (Open City Foundation)
This document discusses open data quality and focuses on ensuring data is fit for its intended use. It notes that while open data aims to provide easy access, the value depends on the quality and how users apply the data. Quality issues can arise from how data is originally collected and maintained by different government systems. The document recommends open data providers adopt stewardship practices to maintain metadata and ensure quality, while users should approach data cautiously and look for ways to engage in data communities. Overall it promotes openness but also a realistic view of potential quality problems and the need for tools and strategies to maximize data value for various users.
This document discusses ensuring data quality and mapping outcomes for quality assurance and control. It covers defining data quality standards, quality assurance and quality control activities, and mapping research outcomes to data. Key points include recognizing the need for quality standards, identifying quality assurance and control processes before and during data collection, and documenting all study details to ensure data integrity and reproducibility.
This document provides an introduction to data management. It discusses why managing data well is important, including enabling reproducibility, data sharing and citation. It covers topics such as data entry and manipulation, quality control and assurance. The goal of good data management is to produce high quality, accessible data that can be easily shared and reused.
- Thanuja T has over 9 years of experience in clinical data management, primarily working in oncology.
- She has worked at Quintiles as an Assistant Manager for the past 6 years, managing clinical data and a team of 7 direct reports.
- Prior to Quintiles, she worked at Accenture and Jubilant Biosys in data validation and research roles respectively.
John Koch of Merck presented on improving scientific information management at Merck Research Labs. He discussed the challenges of managing vast amounts of scientific data and information from multiple sources. Merck's Scientific Information Architecture and Search group developed an approach to engage business areas, identify pain points, pilot solutions, and embed improved practices. Their solution called QUICK created a centralized knowledgebase of pre-clinical compound data to address issues like dispersed data, duplicative data capture, and inaccessible definitive data. It is expected to improve data reporting efficiency, analytical productivity, and collaboration while enabling better study selection decisions.
This document provides an introduction to data management. It discusses why data should be managed, including benefits like enabling verification, new research, and cost savings. It also covers topics like data entry and manipulation, quality control, and sharing data. Effective data management results in high quality, accessible data that can be cited, reused, and helps researchers gain recognition.
Digital Preservation - Manage and Provide AccessMichaelPaulmeno
This document discusses managing and providing access to digital content over the long term. It covers several key points:
- Digital preservation involves managing content through its entire lifecycle, from initial creation through long-term storage and access.
- Effective management requires addressing organizational needs, technological opportunities and changes, and available resources. It involves designating responsible people, policies, and technology.
- When providing access, it is important to use proven, sustainable technologies and deliver content completely and accurately according to access policies.
- Legal and rights issues must be considered to ensure appropriate access to content over time based on factors like donor agreements or confidential information.
- Understanding current and future users is essential for developing access strategies
Data Governance Final 011315 (Ashley Ohmann)
This presentation discusses enterprise data governance with Tableau. It defines data governance as processes that formally manage important data assets. The goals of data governance include establishing standards, processes, compliance, security, and metrics. Good data governance benefits an organization by improving accuracy, enabling better decisions with less waste. The presentation provides examples of how one organization improved data governance through stakeholder involvement, establishing metrics, building a data warehouse, and implementing Tableau for analytics. Key goals discussed are building trust, communicating validity, enabling access, managing metadata, provisioning rights, and maintaining compliance.
How do you assess the quality and reliability of data sources in data analysis... (Soumodeep Nanee Kundu)
**Assessing the Quality and Reliability of Data Sources in Data Analysis**
Data is often referred to as the lifeblood of data analysis. It forms the foundation upon which decisions are made, insights are drawn, and actions are taken. However, not all data is created equal. The quality and reliability of data sources are paramount to the success of data analysis efforts. In this essay, we will explore the intricate process of assessing data quality and reliability, touching on the methods, considerations, and best practices to ensure the data used in the analysis is trustworthy and fit for purpose.
Ray Scott - Agile Solutions – Leading with Test Data Management - EuroSTAR 2012 (TEST Huddle)
Ray Scott discusses test data management in agile environments. He notes that while development may be agile, supporting test data often cannot keep up with frequent changes. Traditional test data generation methods take weeks but agile needs data in hours. He advocates treating test data management as a development project and service. Testers should own the data by determining usage, mapping test conditions to data conditions, and ensuring versioning. With solid data provisioning focusing on business rules and repeatability, testing can add value in agile projects.
Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
Improving Stewardship of Scientific Data Through Use of a Maturity Matrix (Ge Peng)
This presentation uses the highly utilized monthly land surface temperature data product derived from the Global Historical Climatology Network (GHCN-M) to demonstrate how a data stewardship maturity matrix (DSMM) can help identify potential areas of improvement in both stewardship practices and system integration. This success story shows how people from multiple disciplines utilized the DSMM to address topics needing improvement by integrating interoperable, high quality metadata with product-specific descriptive information, resulting in enhanced product accessibility and usability.
Service Tools and Social Media Data Sharing Use Case (Ge Peng)
Want to improve sharing and expand the user base of your data? This presentation provides a use case study with some of the available service tools and social media, in addition to peer-reviewed data publishing.
Similar to New Paradigm for Ensuring and Improving Data Quality and Usability
Non Functional Requirements for Climate Data Records (Ge Peng)
This document discusses the importance of data stewardship for climate data records. It outlines several principles and guidelines for ensuring accessible, credible, and useful environmental data, including: (1) data and metadata require expert stewardship to preserve and improve information; (2) long-term stewardship of scientific data and oversight from scientists is important; (3) guidelines from the Office of Management and Budget aim to ensure data quality, objectivity, and integrity.
This document presents a draft maturity matrix for long-term scientific data stewardship. The matrix defines 5 levels of maturity for 10 key components of data stewardship, including preservation, accessibility, usability, production sustainability, and data quality. Each increasing level represents more advanced and formalized approaches to managing the data according to established standards and community best practices. The authors thank various subject matter experts who helped define the maturity levels based on their expertise in areas such as data archiving, access, and product development.
Scientific Data Stewardship Maturity Matrix (Ge Peng)
The document presents a stewardship maturity matrix for digital environmental data products. It outlines six levels of maturity for various aspects of data preservation, accessibility, usability, production sustainability, and data quality assurance/control. Each increasing level incorporates greater definition, implementation, and conformance to community standards for things like archiving, metadata, documentation, data quality procedures, and integrity/authenticity verification. The highest level involves national/international commitments, external reviews, and fully monitored and reported performance of all quality assurance processes.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the past year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Global Situational Awareness of A.I. and Where It's Headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or me; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from Basic Queries to Advanced Queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Open Source Contributions to Postgres: The Basics POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
New Paradigm for Ensuring and Improving Data Quality and Usability
1. A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders
Ge Peng
NOAA's Cooperative Institute for Climate and Satellites – North Carolina (CICS-NC), NC State University, and NOAA's National Centers for Environmental Information (NCEI)
In collaboration with Nancy Ritchey, Kenneth Casey, Edward Kearns, Jeffrey Privette, Drew Saunders, Philip Jones, Tom Maycock, and Steve Ansari
Version 20160515 | CC-BY-SA 4.0 | POC: gpeng@cicsnc.org
2. What Is Data Quality? Who Should Care?
Ø How good or bad a data product is.
Ø All key players - everyone who develops, creates, produces, stewards, manages, publishes, or serves the product
Ø Other major product stakeholders (including sponsors, power users, and management)
Ø General users
What Is Data Usability?
Ø How easy or hard a data product is to understand and use.
3. Quality - how good or bad something is
• Product quality – the degree to which the data product is produced and described correctly.
• Stewardship quality – the degree to which the data product is preserved and cared for properly.
Steward - a person managing or caring for another's assets
• A role that incorporates processes, policies, guidelines, and responsibilities into administering an organization's data in compliance with policy and/or regulatory obligations.
• Requires expert domain knowledge, general knowledge of the relevant domains, and the intention to ensure and improve the stewardship of other people's datasets.
§ Data steward: a role responsible for managing both the dataset and its metadata
§ Scientific steward: a role responsible for managing data quality and usability
§ Technology steward: a role responsible for managing tools and systems
(Source: Chisholm 2014; Peng et al. 2016)
4. Something about Stewards
• Stewards are stewardship roles assigned to domain subject matter experts (SMEs) who have general knowledge of other relevant domains.
§ SMEs are people with extensive knowledge and experience in their local domains.
§ The role of SME is gained, not assigned.
• Stewards need to have a mindset of caring for other people's assets (e.g., data products) and must be capable of communicating within and across domains.
• One person could be assigned more than one stewardship role.
(Source: Chisholm 2014; Peng et al. 2016)
5. Ensuring and improving data quality and usability throughout the life cycle of a dataset
• Old days – one person
Ø Primarily done by data producers
Ø Usability, i.e., ease of use, was usually not taken into consideration
Ø Information about procedures or practices for data quality was hard to come by
Ø Data choices were limited; users had no option but to wait for the release of the dataset
• Nowadays – an integrated team
Ø Needs to be more scalable
Ø Needs to be more integrated
Ø Needs to be more timely
Ø Information about methods and results needs to be readily available, in an easy-to-understand and interoperable format
Ø Users have many choices; they do not have to wait for, or even use, your data
7. Food quality as an analog for data quality – a shared responsibility in ensuring quality!
• Product Quality – Data Producers: define/create/obtain. Food analog: requirements; production/distribution; info on product specs. Data-quality counterpart: producers.
• Stewardship Quality – Stewards: maintain/preserve/document/access. Food analog: storage, transport, re-distribution; product packing/labels; cooking instructions. Data-quality counterpart: middlemen.
• Use/Service Quality – Data Providers/Users: use/service. Food analog: stores/restaurants/homes; derived products --> timeliness/presentation. Data-quality counterpart: providers.
8. So We All Have To Talk To Each Other – That Is The Problem!
(Another example: adapting the ISO OAIS Reference Model for long-term preservation)
[Diagram: potential interfaces in knowledge domains, mapping functional entities to roles]
• Functional entities: data production, ingest, metadata documentation, archive, dissemination, access service, and data use
• Roles: data producer, metadata specialist, access POC, science POC, user service POC, access specialist, archive POC, and data consumer – plus stakeholders, including sponsors and management
• We do not talk in the same language
• We do not communicate through the same channel
9. Why Do We Need to Define Roles of Stewards?
[Cartoon: a Data Producer and a Metadata POC adapting the ISO Data Quality (DQ) Metadata Standard]
10. Why Do We Need to Define Roles of Stewards?
Stewards help capture and convey DQ info into the context of DQ metadata!
[Cartoon: a Data Producer and a Metadata POC adapting the ISO Data Quality (DQ) Metadata Standard]
11. Why Do We Need to Define Responsibilities of Key Players and Stakeholders?
[Cartoon: a Data Producer, Program Managers, a Metadata POC, and Stewardship Management adapting ISO Data Quality (DQ) Metadata Standards, with dueling speech bubbles: "You are responsible for the data quality of your data. So you should provide us with the DQ metadata!" and "You are responsible for metadata. You should create the DQ metadata yourself!"]
Ø Creating and improving DQ metadata and documentation is beyond the current job scope and expertise of data providers and metadata curators.
Ø Defining responsibilities will help facilitate the process!
Ø It will help raise awareness of, and improve requirements for, data quality and usability.
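As a minimal sketch of what "capturing DQ info into the context of DQ metadata" might look like in practice, the example below assembles producer-supplied quality information into a machine-readable record loosely patterned on ISO 19157 quality elements. The field names, values, and dataset identifier are illustrative assumptions, not the standard's normative schema.

```python
import json

# Producer-supplied data quality (DQ) information, as a steward might
# collect it from product documentation or a producer interview.
# All names and values below are illustrative, not from ISO 19157 itself.
dq_report = {
    "dataset_id": "ghcn-m-lst-v3",  # hypothetical identifier
    "scope": "dataset",
    "reports": [
        {
            "element": "quantitative_attribute_accuracy",  # ISO-19157-flavored
            "measure": "mean absolute error vs. reference stations",
            "value": 0.5,
            "unit": "degC",
        },
        {
            "element": "completeness_omission",
            "measure": "fraction of missing monthly values",
            "value": 0.02,
            "unit": "1",
        },
    ],
    "lineage": "Derived from GHCN-M station records; bias-adjusted.",
    "steward_contact": "scientific.steward@example.org",  # placeholder
}

# Serialize so the record is both human-readable and machine-integrable.
print(json.dumps(dq_report, indent=2))
```

A metadata POC could then map such a record onto the formal standard's elements without having to chase the producer for missing numbers, which is exactly the hub role the slides argue stewards should play.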
12. First Step in Formalizing Roles and High-Level Responsibilities
13. Roles and Responsibilities (within the context of ensuring and improving dataset quality (DQ) and usability)
Data Producer:
• Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, and uncertainty sources and estimates
• Ensure Data Quality during production – screening/assurance
• Assess and improve Data Quality – verification/validation
• Ensure Data Integrity – creation/staging
• Help ensure Preservability - providing information about the data product (time, space, size, variables, etc.)
• Ensure Production Sustainability
• Help ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
• Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
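To make the "screening/assurance" bullet concrete, here is a minimal production-time screening sketch; the variable, valid range, and sample values are assumptions for illustration, not from the presentation.

```python
# Minimal production-time screening sketch: flag physically implausible
# land surface temperatures and record simple assurance statistics.
# The valid range below is an illustrative assumption, not a NOAA spec.
VALID_RANGE_DEGC = (-90.0, 60.0)

def screen(values):
    """Return (passed, flagged) lists and a small assurance summary."""
    lo, hi = VALID_RANGE_DEGC
    passed = [v for v in values if lo <= v <= hi]
    flagged = [v for v in values if not (lo <= v <= hi)]
    summary = {
        "n_input": len(values),
        "n_flagged": len(flagged),
        "flagged_fraction": len(flagged) / len(values) if values else 0.0,
        "valid_range_degc": VALID_RANGE_DEGC,
    }
    return passed, flagged, summary

passed, flagged, summary = screen([12.3, -95.1, 30.0, 61.2, 18.4])
print(summary)  # two values fall outside the assumed valid range
```

Recording the summary alongside the data is one simple way the producer's screening results can later feed the DQ metadata described above.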
14. Roles and Responsibilities (within the context of ensuring and improving dataset quality (DQ) and usability)
Data Steward:
• Ensure Data Integrity – ingest and archive
• Ensure and improve Data Provenance and Traceability
• Improve Data Quality metadata
• Ensure and improve archiving requirements
Scientific Steward:
• Assess/improve Data Quality – evaluation/verification
• Promote and improve Data Usability – characterization
• Help ensure and improve Data Quality metadata
• Ensure and improve data quality and usability requirements
Technology Steward:
• Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrades
• Ensure and improve Data Accessibility and Discoverability
• Promote and improve Data Interoperability
• Ensure and improve software and system requirements
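"Ensure Data Integrity – ingest and archive" is commonly implemented as fixity checking: computing a checksum at ingest and re-verifying it for the archived copy. A minimal sketch using Python's standard hashlib follows; the file paths are placeholders, and the deck does not prescribe any particular mechanism.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# At ingest: checksum the incoming file, then verify the archived copy
# against it. Both paths are illustrative placeholders.
expected = sha256_of("incoming/ghcn_m_lst_201605.nc")
actual = sha256_of("archive/ghcn_m_lst_201605.nc")
if actual != expected:
    raise RuntimeError("Fixity check failed: archived copy differs from ingest")
```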
15. Roles and Responsibilities (within the context of ensuring and improving dataset quality (DQ) and usability)
End-User:
• Request Transparency in data quality procedures and practices
• Request Provenance of the data product
• Request evaluation results of product, stewardship, and service maturity of the data product
• Provide feedback on Quality and Usability of the data product
Manager:
• Help increase awareness of Data Quality and Usability
• Help improve data quality and usability requirements
• Help ensure Data Interoperability
Sponsor:
• Define Data Quality and Usability requirements
• Require data quality oversight and monitoring
• Encourage Transparency in data quality procedures and practices
Data Distributor:
• Ensure and improve Representation of data quality information
• Ensure and improve Traceability of data quality information
• Ensure user feedback
• Help improve data quality and usability requirements
16. Roles and Responsibilities – Summary (within the context of ensuring and improving dataset quality (DQ) and usability). One person may wear several hats!
Data Originator:
• Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, uncertainty sources and estimates
• Ensure Data Quality during production – screening/assurance
• Assess and improve Data Quality – verification/validation
• Ensure Data Integrity – creation/staging
• Help ensure Preservability - providing information about data product (time, space, size, variables, etc.)
• Ensure Production Sustainability
• Help Ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
• Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
Data Steward:
• Ensure Data Integrity – ingest and archive
• Ensure and improve Data Provenance and Traceability
• Improve Data Quality metadata
• Ensure and improve archiving requirements
Technology Steward:
• Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrades
• Ensure and Improve Data Accessibility and Discoverability
• Promote and improve Data Interoperability
• Ensure and improve software and system requirements
Scientific Steward:
• Assess/improve Data Quality – evaluation/verification
• Promote and improve Data Usability – characterization
• Help ensure and improve Data Quality metadata
• Ensure and improve data quality and usability requirements
End-User:
• Request Transparency in data quality procedures and practices
• Request Provenance of the data product
• Request evaluation results of product, stewardship, and service maturity of the data product
• Provide feedback on Quality and Usability of the data product
Data Distributor:
• Ensure and improve Representation of data quality information
• Ensure and improve Traceability of data quality information
• Ensure user feedback
• Help improve data quality and usability requirements
Sponsor:
• Define Data Quality and Usability requirements
• Require data quality oversight and monitoring
• Encourage Transparency in data quality procedures and practices
Manager:
• Help increase awareness of Data Quality and Usability
• Help improve data quality and usability requirements
• Help ensure Data Interoperability
Documentation: capture and convey; be traceable, transparent, machine-readable, and human-understandable.
Quality rating: assess and improve; be transparent, quantifiable, machine-readable, and human-understandable.
• Understandable info for users • Actionable info for management • Integrable tags for machines
Version: 20160515 | CC-BY-SA 4.0 | POC: gpeng@cicsnc.org
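For digital environmental data products, "integrable tags for machines" can mean embedding quality and stewardship information directly in the data files. The sketch below assumes the third-party netCDF4 Python package and uses illustrative attribute names (they are not from any normative convention) to write such tags as netCDF global attributes.

```python
from netCDF4 import Dataset  # third-party package: pip install netCDF4

# Create a product file and attach quality/stewardship tags as global
# attributes so downstream tools can read them programmatically.
# Attribute names and values below are illustrative assumptions.
ds = Dataset("example_product.nc", "w")
ds.setncattr("title", "Example gridded product")
ds.setncattr("dq_accuracy_degc", 0.5)  # producer-estimated accuracy
ds.setncattr("dq_validation_report", "http://example.org/dq/report")  # placeholder URL
ds.setncattr("stewardship_maturity_reference", "doi:10.1045/may2016-peng")
ds.setncattr("license", "CC-BY-SA 4.0")
ds.close()
```

Tags written this way are readable by both people and software, serving the slide's three audiences at once: users, management, and machines.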
17. Take-Away Messages
• Ensuring data quality is an end-to-end process and a shared responsibility of all key players (data producers, managers/stewards, providers/publishers) and other major stakeholders (sponsors, power users, and management).
• Effective stewardship of scientific data requires:
§ Expert domain knowledge in data management, technology, and science
§ Continuous oversight from all stewards, and
§ Open and continuous communication among key players and stakeholders
• Defining the roles and responsibilities of key players and stakeholders will help facilitate the process of:
§ Ensuring and improving dataset quality and usability
§ Capturing and conveying information about data quality
18. Acknowledgement
The idea of using food quality as an analog for data quality originated from one of our family dinner table discussions. I thank my family for the beneficial discussions that followed, for allowing me to use them as "Guinea Pigs", and for their helpful comments!
To cite this presentation:
Peng, G., 2015: A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. Updated: May 15, 2016. Slideshare. Access date: mm/dd/yyyy.
View the latest version of this presentation: http://tinyurl.com/RolesRs-DQU
Related presentation: Stewards – Knowledge and Communication Hub: http://tinyurl.com/Stewards-Hub
19. Image sources:
http://www.busyinbrooklyn.com/wp-content/uploads/2013/09/USDA_GRADES.jpg;
http://www.kaleelbrothers.com/images/Fresh-Produce.png;
http://www.pgabeef.com/images/storage_chart.gif;
https://www.colorado.gov/pacific/sites/default/files/u/6556/Egg-Grading.JPG;
http://www.hickmanseggs.com/w3/wp-content/uploads/2014/04/egg_size.jpg;
https://c2.staticflickr.com/8/7159/6801729225_82e823a5d6_z.jpg;
http://www.thepoultrysite.com/articles/contents/09-12CobbChicks1.jpg;
http://www.topratedsteakhouses.com/wp-content/uploads/2013/12/Grilled-Beef-with-Tomato.jpg;
http://cdn2.hubspot.net/hub/66214/file-15223310-jpg/images/wearingmanyhats.jpg
References
Chisholm, M., 2014: Data Stewards versus Subject Matter Experts and Data Managers. Information Management. Version: May 28, 2014. [Available online at: http://www.information-management.com/news/news/data-stewards-versus-subject-matter-experts-and-data-managers-10025704-1.html.]
Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016: Scientific Stewardship in the Open Data and Big Data Era – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. D-Lib Magazine, 22. doi:10.1045/may2016-peng. [Available online at: http://dlib.org/dlib/may16/peng/05peng.html.]