The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of the CDO's responsibility, moving the innovation meter is a global challenge among CDOs. Specifically, the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-world CDO challenges and shares techniques to overcome them, manage corporate disruption, and achieve success.
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT CDOIQ, 2017)
1. Integrating the CDO Role Into Your Organization
Managing the Disruption
Presented By: Joe Caserta
July 13, 2017
@joe_caserta #mitcdoiq
Massachusetts Institute of Technology
Chief Data Officer and Information Quality Symposium
2. Joe Caserta
Timeline years shown on the slide: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2016, 2017
• Launched Big Data practice
• Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley)
• Data Analysis, Data Warehousing and Business Intelligence since 1996
• Began consulting in database programming and data modeling
• 30+ years hands-on experience building database solutions
• Founded Caserta Concepts in NYC
• Web log analytics solution published in Intelligent Enterprise magazine
• Launched Data Science, Data Interaction and Cloud practices
• Laser focus on extending Data Analytics with Big Data solutions
• Dedicated to Data Governance Techniques on Big Data (Innovation)
• Awarded Top 20 Big Data Companies 2016
• Top 20 Most Powerful Big Data consulting firms
• Launched Big Data Warehousing (BDW) Meetup NYC: 4,500+ Members
• Added Disruption Management Practice to Caserta
• Established Best Practices for big data ecosystem implementations
3. About Caserta Concepts
– Consulting Data Innovation and Modern Data Engineering
– Award-winning company
– Internationally recognized work force
– Strategy, Architecture, Implementation, Governance
– Innovation Partner
– Strategic Consulting
– Advanced Architecture
– Build & Deploy
– Leader in Enterprise Data Solutions
– Big Data Analytics
– Data Warehousing
– Business Intelligence
– Data Science
– Cloud Computing
– Data Governance
8. Harnessing the Customer Journey
Journey stages: Awareness, Consideration, Purchase, Service, Loyalty, Expansion
Physical Touchpoints and Digital Touchpoints shown on the slide:
• PR, Radio, TV, Print, Outdoor, Word of Mouth, Direct Mail, Customer Service
• Search, Paid Content, email, Website/Landing Pages, Social Media, Community, Chat, Call Center
• Offers, Mailings, Survey, Loyalty Programs, Agents, Partners
• Ads, Website, Mobile, 3rd Party Sites, Web self-service
9. Why do we Care?
Example touches: Mailing, E-mail, Ad-Click

Attribution Type: Single Touch
Assign the credit to the first or last exposure. Example credit: 100%.
- Last touch only
- Ignores bulk of customer journey
- Undervalues other interactions and influencers

Attribution Type: Rules-Based
Assign the credit to each interaction based on business rules. Example credit: 33%, 33%, 33%.
- Subjective
- Assigns arbitrary values to each interaction
- Lacks analytics rigor to determine weights

Attribution Type: Statistically Driven
Assign the credit to interactions based on a data-driven model. Example credit: 27%, 49%, 24%.
✓ Looks at full behavior patterns
✓ Considers all touch points
✓ Can apply different models for best results
✓ Uses data to find correlations between touch points (winning combinations)
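
The difference between the attribution types can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the function names are hypothetical, the journey order (Mailing, E-mail, Ad-Click) is assumed, and the 27/49/24 weights stand in for whatever a fitted statistical model would produce.

```python
from collections import defaultdict


def single_touch(journey, position="last"):
    """Single touch: give 100% of the credit to the first or last exposure."""
    touch = journey[0] if position == "first" else journey[-1]
    return {touch: 1.0}


def rules_based(journey, weights=None):
    """Split credit across touches by supplied weights (equal split by default).

    Passing weights estimated from data turns this into a stand-in for a
    statistically driven model; the estimation itself is out of scope here.
    """
    if weights is None:
        weights = [1.0] * len(journey)
    total = sum(weights)
    credit = defaultdict(float)
    for touch, weight in zip(journey, weights):
        credit[touch] += weight / total
    return dict(credit)


# Assumed journey order, for illustration only.
journey = ["Mailing", "E-mail", "Ad-Click"]
print(single_touch(journey))                        # last touch only
print(rules_based(journey))                         # equal business rule
print(rules_based(journey, weights=[27, 49, 24]))   # placeholder model weights
```

The single-touch result ignores two of the three interactions, which is the weakness the slide calls out; the weighted variant only becomes "statistically driven" once the weights come from a model rather than from a rule.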
10. Onboarding New Data
Business: “I need to analyze some new data”
✓ IT collects requirements
✓ Creates normalized and/or dimensional data models
✓ Profiles and conforms the data
✓ Sophisticated ETL programs and quality standards
✓ Loads it into data models
✓ Builds a BI semantic layer
✓ Creates dashboards and reports
IT: “You’ll have your data in 3-6 months to see if it has value!”
– Onboarding new data is difficult!
– Rigid Structures and Data Governance
– Disconnected/removed from business
11. Houston, we have a Problem: Data Sprawl
• There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise. - Michael Vizard, ITBusinessEdge.com
• Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less. - “The High Cost of Not Finding Information,” IDC
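
One way to put a number on duplicate idle data is to group files by a hash of their contents. The sketch below is an assumption-laden illustration rather than anything shown in the talk: `find_duplicates` is a hypothetical helper, it reads each file fully into memory, and it only detects byte-identical copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return groups of files under `root` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Identical contents yield the same SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates("/some/shared/drive")` (a placeholder path) returns one list per set of identical files, which gives a rough measure of how much of a share is redundant.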
13. GDPR Cannot be Ignored
GDPR Compliance Top Data Protection Priority for 92% of US Organizations in 2017 - PwC Survey
• The GDPR requirements will force U.S. companies to change the way they process, store, and protect customers’ personal data.
• Companies must be able to show compliance by May 25, 2018
• Data Elements Regulated:
  – Basic identity information such as name, address and ID numbers
  – Web data such as location, IP address, cookie data and RFID tags
  – Health and genetic data
  – Biometric data
  – Racial or ethnic data
  – Political opinions
  – Sexual orientation
• A data protection officer (DPO) may be required
New York legislature, inspired by the GDPR, proposed the Right to be Forgotten Act.
• GDPR will continue influencing privacy regulations across the globe
• Companies that comply with the GDPR will be better prepared for future changes in U.S. legislation.
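
A first practical step toward handling the regulated data elements listed above is knowing which columns hold them. The following sketch flags columns by name; every column name in the mapping is a hypothetical example, name matching alone will miss regulated data stored under other names, and none of this amounts to a compliance check.

```python
# Hypothetical column names mapped to the data-element categories on the slide.
REGULATED_CATEGORIES = {
    "basic identity": {"name", "address", "id_number"},
    "web data": {"location", "ip_address", "cookie", "rfid_tag"},
    "health and genetic": {"diagnosis", "genetic_marker"},
    "biometric": {"fingerprint"},
    "racial or ethnic": {"ethnicity"},
    "political opinions": {"party_affiliation"},
    "sexual orientation": {"sexual_orientation"},
}


def flag_regulated_columns(columns):
    """Return {column: category} for columns whose name matches a known element."""
    flagged = {}
    for column in columns:
        key = column.strip().lower()
        for category, names in REGULATED_CATEGORIES.items():
            if key in names:
                flagged[column] = category
                break
    return flagged


# Illustrative schema; only "Name" and "ip_address" match the mapping.
schema = ["order_id", "Name", "ip_address", "basket_total"]
print(flag_regulated_columns(schema))
```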
14. The New Data Paradigm
OLD WAY:
• Structure Data → Ingest Data → Analyze Data
• Fully Governed
• Monolith
NEW WAY:
• Ingest Data → Analyze Data → Structure Data
• Just Enough Governance
• Dynamic
RECIPE:
• Data Officer & Data Organization
• Enterprise Data Lake
• Holistic Data Architecture & Framework
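
The "ingest, analyze, then structure" ordering can be illustrated with a small schema-on-read sketch. The records, field names, and the "present in at least half of the records" rule are all assumptions made for illustration; the point is only that the structure is derived after looking at the data rather than designed up front.

```python
import json
from collections import Counter

# Raw records with inconsistent fields, as they might arrive from a source system.
raw_lines = [
    '{"customer": "a1", "channel": "email", "amount": 20}',
    '{"customer": "b2", "channel": "web"}',
    '{"customer": "a1", "channel": "web", "amount": 5, "coupon": "X"}',
]

# 1. Ingest: keep every record as-is, with no schema imposed up front.
landing = [json.loads(line) for line in raw_lines]

# 2. Analyze: profile which fields actually occur, and how often.
field_counts = Counter(name for record in landing for name in record)

# 3. Structure: promote fields seen in at least half of the records to columns.
columns = sorted(name for name, n in field_counts.items() if n >= len(landing) / 2)
table = [{name: record.get(name) for name in columns} for record in landing]

print(columns)
print(table)
```

Under the old way, the `coupon` field would have required a model change before any data could be loaded; here it simply stays in the landing area until profiling shows it is worth structuring.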
15. Corporate Data Pyramid (CDP)
Tiers, top to bottom, with usage pattern:
• Big Data Warehouse: Arbitrary/Ad-hoc Queries and Reporting
• Data Science Workspace: Munging, Blending, Machine Learning
• Data Lake – Integrated Sandbox: Organize, Define, Complete
• Landing Area – Source Data in “Full Fidelity”: Ingest Raw Data
Data Governance:
• Fully Governed (trusted)
• Data Catalog, Data Integration
• Data Quality and Monitoring
• Metadata, ILM, Security
16. Data Asset Development Lifecycle
• Data Science is performed in the ephemeral workspaces to derive new insights/assets
• The work products of data science are promoted from insights to assets.
• Rigorous Data Governance applied
• Processes must be hardened, repeatable, and performant
Diagram labels: Big Data Warehouse; Data Science Workspace; Data Lake – Integrated Sandbox; Landing Area – Source Data in “Full Fidelity”; New Data; New Insights; Governance Refinery
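
Promotion from insight to asset can be thought of as a gate that a work product passes only once governance requirements are met. The sketch below is a hypothetical illustration: the `WorkProduct` fields and the three checks are assumptions chosen to mirror "governed, hardened, repeatable", not a checklist from the talk.

```python
from dataclasses import dataclass


@dataclass
class WorkProduct:
    """A data science output; it starts life as an ungoverned insight."""
    name: str
    owner: str = ""
    has_repeatable_tests: bool = False
    documented: bool = False
    status: str = "insight"


def governance_gaps(product):
    """List the (assumed) governance requirements the product does not yet meet."""
    gaps = []
    if not product.owner:
        gaps.append("no data owner assigned")
    if not product.has_repeatable_tests:
        gaps.append("process is not covered by repeatable tests")
    if not product.documented:
        gaps.append("no metadata or business definition recorded")
    return gaps


def promote(product):
    """Promote an insight to an asset, refusing if any governance gap remains."""
    gaps = governance_gaps(product)
    if gaps:
        raise ValueError(f"cannot promote {product.name}: " + "; ".join(gaps))
    product.status = "asset"
    return product
```

A product created as `WorkProduct("churn_score")` would be rejected by `promote` with all three gaps listed, while one with an owner, tests, and documentation moves to status `"asset"`.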
17. Enter the Chief Data Officer
• Evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Monitor and enforce data quality in collaboration with data owners
• Monitor and enforce data security along with Legal/Security/Compliance
• Work with IT to develop/maintain an enterprise repository of strategic data
• Set standards for analytical reporting and generate data insights
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data
• Enrich and augment data by combining internal and external sources
• Support efficient and agile analytics through training and templates
18. The CDO: The Whole Brain Challenge
Front / Back
Analytics Oriented
• Data Science
• Research
Process Oriented
• Data Governance
• Compliance
Operations Oriented
• Shared Services
• Data Engineering
Revenue Oriented
• Revenue Goals
• Monetizing Data
19. Data Organization Roles
Data Officer
• Create and evangelize vision, strategy, and mission statement
• Create, communicate, and enforce policies, procedures, and processes
• Plan, prioritize, and project manage data initiatives
• Prepare & maintain budget for staff, infrastructure, services, tools & training
• Innovate ways to use existing data
• Enrich and augment data by combining internal and external sources
• Protection – ensuring data privacy and security
Data Governance Lead
• Represent business interests across departments
• Prioritize and manage data requests and remediation efforts
• Identify pockets of business, technical, and data expertise
• Socialize policies and support programs
Data Stewards
• Receive, manage, prioritize and track data quality issues
• Proactively lead data quality monitoring of high value data
• Identify, train, and manage critical data sources
• Ensure remediation efforts follow change management policies
• Assist in management and maintenance of master data
Data Librarian
• Track and manage data related assets (sources, metadata, business glossary, data lineage)
• Track and manage common queries with embedded business logic
• Track and manage canned reports (to prevent duplication)
• Track and manage custom reports (to prevent duplication)
• Track and manage standard reports and dashboard templates
• Track internal and external data and tool experts
• Manage the Data Governance knowledge repository
20. Disruption Management
Forces for Change
• Global economics
• Intensity of competition
• Reduce costs
• Move to cross-functional teams
• New executive leadership
• Speed of technical change
• Social trends and changes
Status Quo
Forces Resisting Change
• Period of time in present role
• Status & perks of office/dept under threat
• No apparent reasons for proposed changes
• Lack of understanding of proposed changes
• Fear of inability to cope with new technology
• Concern over job security
http://www.change-management-coach.com/force-field-analysis.html
21. It Takes a Village!
Chief Data Organization (Oversight)
Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc]
• Product Owner
• SCRUM Master
• Development Team
• Business Subject Matter Expertise
• Data Librarian/Data Stewardship
• Data Science/Statistical Skills
• Data Engineering / Architecture
• Presentation/BI Report Development Skills
• Data Quality Assurance
• DevOps
IT Organization (Oversight)
• Enterprise Data Architect
• Solution Engineers
• Data Integration Practice
• User Experience Practice
• QA Practice
• Operations Practice
Advanced Analytics
• Business Analysts
• Data Analysts
• Data Scientists
• Statisticians
• Data Engineers
Planning Organization
• Project Managers
Data Organization
• Data Gov Coordinator
• Data Librarians
• Data Stewards
22. Caution: Assembly Required
– Some of the most hopeful tools are brand new or in incubation
– Enterprise big data implementations typically combine products with custom-built components
Making it Happen
People, Processes and Business commitment are still critical!
Tool categories shown: Data Integration & Quality; Data Catalog & Governance; Emerging Solutions
23. CDO Success in Summary
Streamline Processes
• Self-service, reduce ongoing dependency on IT
• Automate Workflows
Business Definitions
• Identification of KPIs
• Iterative Process – definitions mature over time
Automation
• Tools provide user-centric experience
• Data Discovery
• Data Profiling
• Workflows
• Data Quality
• Automated ILM
Roles
• CDO
• Data Governance Council
• Data Stewardship Team
• Business SMEs
• Data Scientists for Insights
Architecture
• Consolidated view of data
• Flexibility for future growth
• Viewable Everywhere
Metrics
• Gauge overall governance of data
• Data Quality reporting
• Issue Tracking
Data Centric, Technology Enabled, Business Focused
24. What the Future Holds
• DevOps for Analytics
• Search-Based BI (NLP)
• Artificial Intelligence (AI)
• Virtual Reality BI (VR)
• Virtual Assistant BI (Voice)
• Reporting/Predictions Converge
• Citizen Data Scientists Emerge
25. Thank You
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
Data is not important, it’s what you do with it that’s important!
27. The Data Lake on the Cloud
Cloud Component | AWS | Google | Microsoft
Scalable distributed storage | S3 | GCS | Azure Storage
Pluggable fit-for-purpose processing | EMR | DataProc | HDInsight
Compute Services | EC2 | GCE | VMs
Consistent extensible framework | Spark | Spark | Spark
Dimensional MPP Data Warehouse | Redshift/Snowflake | BigQuery | Azure SQL Data Warehouse
Data Streaming | Kinesis | PubSub | Azure Stream
Common Interface | Jupyter | DataLab | Azure Notebook
• Remove barriers between data ingestion and analysis
• Democratize data with Just Enough Data Governance (JEDG)