Integrating the CDO Role Into Your Organization
Managing the Disruption
Presented By: Joe Caserta
July 13, 2017
@joe_caserta #mitcdoiq
Massachusetts Institute of Technology
Chief Data Officer and Information Quality Symposium

Joe Caserta
Timeline years shown on the slide: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2016, 2017
• Launched Big Data practice
• Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley)
• Data Analysis, Data Warehousing and Business Intelligence since 1996
• Began consulting in database programming and data modeling; 30+ years of hands-on experience building database solutions
• Founded Caserta Concepts in NYC
• Web log analytics solution published in Intelligent Enterprise magazine
• Launched Data Science, Data Interaction and Cloud practices
• Laser focus on extending Data Analytics with Big Data solutions
• Dedicated to Data Governance Techniques on Big Data (Innovation)
• Awarded Top 20 Big Data Companies 2016
• Top 20 Most Powerful Big Data consulting firms
• Launched Big Data Warehousing (BDW) Meetup NYC: 4,500+ members
• 2017: Added Disruption Management Practice to Caserta
• Established best practices for big data ecosystem implementations

About Caserta Concepts
– Consulting Data Innovation and Modern Data Engineering
– Award-winning company
– Internationally recognized work force
– Strategy, Architecture, Implementation, Governance
– Innovation Partner
– Strategic Consulting
– Advanced Architecture
– Build & Deploy
– Leader in Enterprise Data Solutions
– Big Data Analytics
– Data Warehousing
– Business Intelligence
– Data Science
– Cloud Computing
– Data Governance
  
Caserta Client Portfolio
• Retail/eCommerce & Manufacturing
• Finance, Healthcare & Insurance
• Digital Media/AdTech
• Education & Services

Awards & Recognition
Top 10 Fastest Growing Big Data Companies 2016

Our Partners
  
Why is Data So Important?
1500s: Printing Press
1840s: Penny Post
1850s: Telegraph
1850s: Rural Free Post
1890s: Telephone
1900s: Radio
1950s: TV
1970s: PCs
1980s: Internet
1990s: Web
2000s: Social Media, Mobile, Big Data, Cloud
Every 60 Seconds:
• 98,000+ tweets
• 695,000 status updates
• 11 million instant messages
• 698,445 Google searches
• 168 million+ emails sent
• 1,829 TB of data created
• 217 new mobile web users
  
Harnessing the Customer Journey
Stages: Awareness, Consideration, Purchase, Service, Loyalty, Expansion
Diagram labels (Physical Touchpoints and Digital Touchpoints, in the order they appear on the slide): PR; Radio; TV; Print; Outdoor; Word of Mouth; Direct Mail; Customer Service; Search; Paid Content; email; Website/Landing Pages; Social Media; Community; Chat; Social Media; Call Center; Offers; Mailings; Survey; Loyalty Programs; email; Agents; Partners; Ads; Website; Mobile; 3rd Party Sites; Offers; Web self-service
  
Why do we Care?

Attribution Type: Single Touch
Assign the credit to the first or last exposure
Example: Ad-Click 100%
Comments:
- Last touch only
- Ignores bulk of customer journey
- Undervalues other interactions and influencers

Attribution Type: Rules-Based
Assign the credit to each interaction based on business rules
Example: Mailing 33%, E-mail 33%, Ad-Click 33%
Comments:
- Subjective
- Assigns arbitrary values to each interaction
- Lacks analytics rigor to determine weights

Attribution Type: Statistically Driven
Assign the credit to interactions based on a data-driven model
Example: Mailing 27%, E-mail 49%, Ad-Click 24%
Comments:
✓ Looks at full behavior patterns
✓ Considers all touch points
✓ Can apply different models for best results
✓ Uses data to find correlations between touch points (winning combinations)
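
The three attribution types above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function names are hypothetical, the rules-based rule is assumed to be an equal split, and the statistically driven weights are passed in rather than fitted by a model. The pairing of channels to the 27%/49%/24% figures is assumed from the slide's layout.

```python
from typing import Dict, List


def single_touch(journey: List[str], use_last: bool = True) -> Dict[str, float]:
    """Single touch: all credit goes to the last (or first) exposure."""
    winner = journey[-1] if use_last else journey[0]
    return {winner: 1.0}


def rules_based(journey: List[str]) -> Dict[str, float]:
    """Rules-based: the assumed business rule here is an equal split."""
    share = 1.0 / len(journey)
    credit: Dict[str, float] = {}
    for touch in journey:
        credit[touch] = credit.get(touch, 0.0) + share
    return credit


def statistically_driven(journey: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Data-driven: weights would come from a fitted model (not shown here)."""
    touches = list(dict.fromkeys(journey))  # unique touch points, order kept
    total = sum(weights[touch] for touch in touches)
    return {touch: weights[touch] / total for touch in touches}


journey = ["Mailing", "E-mail", "Ad-Click"]
# Assumed pairing of channels to the percentages shown on the slide.
model_weights = {"Mailing": 27, "E-mail": 49, "Ad-Click": 24}

print(single_touch(journey))
print(rules_based(journey))
print(statistically_driven(journey, model_weights))
```

Only the third function changes its answer when the underlying data changes, which is the slide's argument for the statistically driven approach.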
  
Onboarding New Data
Business: “I need to analyze some new data”
✓ IT collects requirements
✓ Creates normalized and/or dimensional data models
✓ Profiles and conforms the data
✓ Sophisticated ETL programs and quality standards
✓ Loads it into data models
✓ Builds a BI semantic layer
✓ Creates dashboards and reports
IT: “You’ll have your data in 3-6 months to see if it has value!”
– Onboarding new data is difficult!
– Rigid Structures and Data Governance
– Disconnected/removed from business
	
  
Houston, we have a Problem: Data Sprawl
• There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise.
  - Michael Vizard, ITBusinessEdge.com
• Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less.
  - “The High Cost of Not Finding Information,” IDC

GDPR Cannot be Ignored
GDPR Compliance Top Data Protection Priority for 92% of US Organizations in 2017
  - PwC Survey
• The GDPR requirements will force U.S. companies to change the way they process, store, and protect customers’ personal data.
• Companies must be able to show compliance by May 25, 2018
• Data Elements Regulated:
  • Basic identity information such as name, address and ID numbers
  • Web data such as location, IP address, cookie data and RFID tags
  • Health and genetic data
  • Biometric data
  • Racial or ethnic data
  • Political opinions
  • Sexual orientation
• A data protection officer (DPO) may be required
New York legislature, inspired by the GDPR, proposed the Right to be Forgotten Act.
• GDPR will continue influencing privacy regulations across the globe
• Companies that comply with the GDPR will be better prepared for future changes in U.S. legislation.
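
The regulated data elements listed on this slide could seed a first-pass inventory check. The sketch below is illustrative only: the keyword lists, category names and the `flag_columns` helper are assumptions, and simple substring matching on column names is far too crude to demonstrate compliance. It only shows how the categories might be encoded.

```python
# Hypothetical keyword map built from the categories named on the slide.
# "web" is listed first so that "ip_address" is not caught by "address".
REGULATED = {
    "web": ("location", "ip_address", "cookie", "rfid"),
    "identity": ("name", "address", "id_number"),
    "health_genetic": ("health", "genetic"),
    "biometric": ("biometric",),
    "racial_ethnic": ("race", "ethnic"),
    "political": ("political",),
    "sexual_orientation": ("sexual_orientation",),
}


def flag_columns(columns):
    """Return {column: category} for columns whose name matches a keyword."""
    flagged = {}
    for column in columns:
        lowered = column.lower()
        for category, keywords in REGULATED.items():
            if any(keyword in lowered for keyword in keywords):
                flagged[column] = category
                break  # first matching category wins
    return flagged


print(flag_columns(["customer_name", "ip_address", "order_total"]))
```

In practice a data catalog with profiled content, not column names alone, would be needed to find regulated data reliably.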
  
The New Data Paradigm
OLD WAY:
• Structure Data → Ingest Data → Analyze Data
• Fully Governed
• Monolith
NEW WAY:
• Ingest Data → Analyze Data → Structure Data
• Just Enough Governance
• Dynamic
RECIPE:
• Data Officer & Data Organization
• Enterprise Data Lake
• Holistic Data Architecture & Framework
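
The "new way" ordering (ingest, then analyze, then structure) can be sketched with the standard library alone. Everything here is an assumption for illustration: the sample events, the function names and the 60% fill threshold are hypothetical and do not come from the talk.

```python
import json
from collections import Counter

# Hypothetical raw events; the records do not share one fixed shape.
raw_events = [
    '{"customer": "a1", "channel": "email", "amount": 20.5}',
    '{"customer": "b2", "channel": "web"}',
    '{"customer": "a1", "channel": "web", "amount": 5, "coupon": "X"}',
]


def ingest(lines):
    """Ingest: land the records unchanged, with no upfront data model."""
    return list(lines)


def analyze(landed):
    """Analyze: parse on read and count how often each field appears."""
    records = [json.loads(line) for line in landed]
    field_counts = Counter(key for record in records for key in record)
    return records, field_counts


def structure(records, field_counts, min_fill=0.6):
    """Structure: keep only fields filled often enough to be worth modeling."""
    threshold = min_fill * len(records)
    return sorted(name for name, count in field_counts.items() if count >= threshold)


landed = ingest(raw_events)
records, field_counts = analyze(landed)
schema = structure(records, field_counts)
print(schema)
```

The structure is derived from what the analysis finds in the data, rather than being designed before any data arrives, which is the reversal the slide describes.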
  
Corporate Data Pyramid (CDP)
Tiers, as listed on the slide: Big Data Warehouse; Data Science Workspace; Data Lake – Integrated Sandbox; Landing Area – Source Data in “Full Fidelity”
Usage Pattern labels: Ingest Raw Data; Organize, Define, Complete; Munging, Blending, Machine Learning; Arbitrary/Ad-hoc Queries and Reporting
Data Governance labels: Metadata, ILM, Security; Data Catalog; Data Integration; Data Quality and Monitoring; Fully Governed (trusted)
  
Data Asset Development Lifecycle
• Data Science is performed in the ephemeral workspaces to derive new insights/assets
• The work products of data science are promoted from insights to assets.
• Rigorous Data Governance applied
• Processes must be hardened, repeatable, and performant
Diagram labels: Big Data Warehouse; Data Science Workspace; Data Lake – Integrated Sandbox; Landing Area – Source Data in “Full Fidelity”; New Data; New Insights; Governance; Refinery

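
The promotion of a work product from insight to asset can be pictured as a simple gate. This is a hedged sketch: the `WorkProduct` class, the `promote` function and the boolean checks are hypothetical names, and the three required checks simply mirror the slide's "hardened, repeatable, and performant".

```python
from dataclasses import dataclass, field

# Checks taken from the slide's wording; how each is verified is not specified.
REQUIRED_CHECKS = ("hardened", "repeatable", "performant")


@dataclass
class WorkProduct:
    name: str
    tier: str = "Data Science Workspace"  # where insights are first derived
    checks: dict = field(default_factory=dict)


def promote(product: WorkProduct) -> list:
    """Promote to the Big Data Warehouse only if every check passes.

    Returns the list of failed checks (empty when the promotion succeeded).
    """
    missing = [check for check in REQUIRED_CHECKS if not product.checks.get(check)]
    if not missing:
        product.tier = "Big Data Warehouse"
    return missing


churn = WorkProduct("churn_score", checks={"hardened": True, "repeatable": True})
print(promote(churn), churn.tier)   # still in the workspace
churn.checks["performant"] = True
print(promote(churn), churn.tier)   # now promoted
```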
Enter the Chief Data Officer
• Evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Monitor and enforce data quality in collaboration with data owners
• Monitor and enforce data security along with Legal/Security/Compliance
• Work with IT to develop/maintain an enterprise repository of strategic data
• Set standards for analytical reporting and generate data insights
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data
• Enrich and augment data by combining internal and external sources
• Support efficient and agile analytics through training and templates
  
The CDO: The Whole Brain Challenge
Diagram axis labels: Front, Back
Analytics Oriented
• Data Science
• Research
Process Oriented
• Data Governance
• Compliance
Operations Oriented
• Shared Services
• Data Engineering
Revenue Oriented
• Revenue Goals
• Monetizing Data
  
Data Organization Roles
Data Officer
• Create and evangelize vision, strategy, and mission statement
• Create, communicate, and enforce policies, procedures, and processes
• Plan, prioritize, and project manage data initiatives
• Prepare & maintain budget for staff, infrastructure, services, tools & training
• Innovate ways to use existing data
• Enrich and augment data by combining internal and external sources
• Protection – ensuring data privacy and security
Data Governance Lead
• Represent business interests across departments
• Prioritize and manage data requests and remediation efforts
• Identify pockets of business, technical, and data expertise
• Socialize policies and support programs
Data Stewards
• Receive, manage, prioritize and track data quality issues
• Proactively lead data quality monitoring of high value data
• Identify, train, and manage critical data sources
• Ensure remediation efforts follow change management policies
• Assist in management and maintenance of master data
Data Librarian
• Track and manage data related assets (sources, metadata, business glossary, data lineage)
• Track and manage common queries with embedded business logic
• Track and manage canned reports (to prevent duplication)
• Track and manage custom reports (to prevent duplication)
• Track and manage standard reports and dashboard templates
• Track internal and external data and tool experts
• Manage the Data Governance knowledge repository

Disruption Management
Forces for Change:
• Global economics
• Intensity of competition
• Reduce costs
• Move to cross-functional teams
• New executive leadership
• Speed of technical change
• Social trends and changes
Status Quo
Forces Resisting Change:
• Period of time in present role
• Status & perks of office/dept under threat
• No apparent reasons for proposed changes
• Lack of understanding of proposed changes
• Fear of inability to cope with new technology
• Concern over job security
http://www.change-management-coach.com/force-field-analysis.html
  
It Takes a Village!
Chief Data Organization (Oversight)
Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc]
• Product Owner
• SCRUM Master
• Development Team
• Business Subject Matter Expertise
• Data Librarian/Data Stewardship
• Data Science/Statistical Skills
• Data Engineering/Architecture
• Presentation/BI Report Development Skills
• Data Quality Assurance
• DevOps
IT Organization (Oversight)
• Enterprise Data Architect
• Solution Engineers
• Data Integration Practice
• User Experience Practice
• QA Practice
• Operations Practice
Advanced Analytics
• Business Analysts
• Data Analysts
• Data Scientists
• Statisticians
• Data Engineers
Planning Organization
• Project Managers
Data Organization
• Data Gov Coordinator
• Data Librarians
• Data Stewards
  
Making it Happen
Caution: Assembly Required
• Some of the most hopeful tools are brand new or in incubation
• Enterprise big data implementations typically combine products with custom built components
People, Processes and Business commitment are still critical!
Tool categories shown: Data Integration & Quality; Data Catalog & Governance; Emerging Solutions
  
CDO Success in Summary
Data Centric, Technology Enabled, Business Focused
Section headings on the slide: Streamline Processes; Automation; Business Definitions; Roles; Metrics; Architecture
• Self-service, reduce ongoing dependency on IT
• Automate Workflows
• Identification of KPIs
• Iterative Process – definitions mature over time
• Tools provide user-centric experience
• Data Discovery
• Data Profiling
• Workflows
• Data Quality
• Automated ILM
• CDO
• Data Governance Council
• Data Stewardship Team
• Business SMEs
• Data Scientists for Insights
• Consolidated view of data
• Flexibility for future growth
• Viewable Everywhere
• Gauge overall governance of data
• Data Quality reporting
• Issue Tracking
  
What the Future Holds
• DevOps for Analytics
• Search-Based BI (NLP)
• Artificial Intelligence (AI)
• Virtual Reality BI (VR)
• Virtual Assistant BI (Voice)
• Reporting/Predictions Converge
• Citizen Data Scientists Emerge
  
Thank You
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
Data is not important, it’s what you do with it that’s important!

Sample Solution Architecture
Column headings: Data Sources; Ingest; Storage; ETL; Presentation; Visualization
Dataset types: Relational Datasets; Flat File Datasets; Streaming Data Sets
Datasets named: OPRA; Equifax; CDS; Moody’s; BlackBox; Barclay; Eureka; Hedge Fund Intelligence; Hedge Fund Research; Lipper; Morningstar; MF Holdings; BD/ADV; CAT
Component labels: S3; S/FTP Push; Kinesis; Landing; Data Lake (Tier 1); Data Lake (Tier 2); Data Science (Ephemeral); Redshift; Spark (Streaming*/Batch); Lambda; Structured Data; Redshift; Metadata Repository; Data Marketplace
Data Science: Python; SQL; Scala
Processing steps: Clean; Match; Derive; Aggregate; MLlib; CoreNLP; Prepare; Deliver
Outputs: Predictive Analytics; Text Analytics; Business Intelligence
  
The Data Lake on the Cloud
• Remove barriers between data ingestion and analysis
• Democratize data with Just Enough Data Governance (JEDG)

Cloud Component | AWS | Google | Microsoft
Scalable distributed storage | S3 | GCS | Azure Storage
Pluggable fit-for-purpose processing | EMR | DataProc | HDInsight
Compute Services | EC2 | GCE | VMs
Consistent extensible framework | Spark | Spark | Spark
Dimensional MPP Data Warehouse | Redshift/Snowflake | BigQuery | Azure SQL Data Warehouse
Data Streaming | Kinesis | PubSub | Azure Stream
Common Interface | Jupyter | DataLab | Azure Notebook
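
The component-to-service mapping on this slide can be captured as a small lookup table, which is handy when the same data lake design has to be described for more than one provider. The service names below are copied from the slide; the dictionary layout and the `service_for` and `stack_for` helpers are assumptions made for illustration.

```python
# Service names as given on the slide, keyed by component and provider.
CLOUD_COMPONENTS = {
    "Scalable distributed storage": {"AWS": "S3", "Google": "GCS", "Microsoft": "Azure Storage"},
    "Pluggable fit-for-purpose processing": {"AWS": "EMR", "Google": "DataProc", "Microsoft": "HDInsight"},
    "Compute Services": {"AWS": "EC2", "Google": "GCE", "Microsoft": "VMs"},
    "Consistent extensible framework": {"AWS": "Spark", "Google": "Spark", "Microsoft": "Spark"},
    "Dimensional MPP Data Warehouse": {
        "AWS": "Redshift/Snowflake",
        "Google": "BigQuery",
        "Microsoft": "Azure SQL Data Warehouse",
    },
    "Data Streaming": {"AWS": "Kinesis", "Google": "PubSub", "Microsoft": "Azure Stream"},
    "Common Interface": {"AWS": "Jupyter", "Google": "DataLab", "Microsoft": "Azure Notebook"},
}


def service_for(component: str, provider: str) -> str:
    """Look up the service a provider offers for one component."""
    return CLOUD_COMPONENTS[component][provider]


def stack_for(provider: str) -> dict:
    """Return the full component-to-service stack for one provider."""
    return {component: services[provider] for component, services in CLOUD_COMPONENTS.items()}


print(service_for("Data Streaming", "AWS"))
for component, service in stack_for("Google").items():
    print(f"{component}: {service}")
```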
  

More Related Content

What's hot

Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsCaserta
 
Journey to Cloud Analytics
Journey to Cloud Analytics Journey to Cloud Analytics
Journey to Cloud Analytics Datavail
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It? Caserta
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyTamrMarketing
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data LakeCaserta
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Focus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL CodeFocus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL CodeDATAVERSITY
 
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...TamrMarketing
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationEmbarcadero Technologies
 
How to Consume Your Data for AI
How to Consume Your Data for AIHow to Consume Your Data for AI
How to Consume Your Data for AIDATAVERSITY
 
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...DLT Solutions
 

What's hot (20)

Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Journey to Cloud Analytics
Journey to Cloud Analytics Journey to Cloud Analytics
Journey to Cloud Analytics
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data Lake
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Focus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL CodeFocus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL Code
 
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
 
How to Consume Your Data for AI
How to Consume Your Data for AIHow to Consume Your Data for AI
How to Consume Your Data for AI
 
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
 

Viewers also liked

"Building an Epic Brand" at SaaStr Annual 2016
"Building an Epic Brand" at SaaStr Annual 2016"Building an Epic Brand" at SaaStr Annual 2016
"Building an Epic Brand" at SaaStr Annual 2016saastr
 
Graylog for open stack 3 steps to know why
Graylog for open stack    3 steps to know whyGraylog for open stack    3 steps to know why
Graylog for open stack 3 steps to know whyMạnh Đinh
 
Dataiku pig - hive - cascading
Dataiku   pig - hive - cascadingDataiku   pig - hive - cascading
Dataiku pig - hive - cascadingDataiku
 
15h00 intel - intel big data for aws summits rev3
15h00   intel - intel big data for aws summits rev315h00   intel - intel big data for aws summits rev3
15h00 intel - intel big data for aws summits rev3infolive
 
Fracture du pied chez l'enfant
Fracture du pied chez l'enfantFracture du pied chez l'enfant
Fracture du pied chez l'enfantAyoub EL KADDOURI
 
Four Graphics credentials
Four Graphics credentialsFour Graphics credentials
Four Graphics credentialsEmile Melki
 
Bioocean1 :Introduction to Biological Oceanography
Bioocean1 :Introduction to Biological Oceanography Bioocean1 :Introduction to Biological Oceanography
Bioocean1 :Introduction to Biological Oceanography Gazi Abdullah
 
7+1 hiba, amit Te is elkövet(het)sz
7+1 hiba, amit Te is elkövet(het)sz7+1 hiba, amit Te is elkövet(het)sz
7+1 hiba, amit Te is elkövet(het)szCzímer Zoltán
 
прайс лист ооо форсэт
прайс лист ооо форсэтпрайс лист ооо форсэт
прайс лист ооо форсэтstrelk
 
Four Strategies to Create a DevOps Culture & System that Favors Innovation & ...
Four Strategies to Create a DevOps Culture & System that Favors Innovation & ...Four Strategies to Create a DevOps Culture & System that Favors Innovation & ...
Four Strategies to Create a DevOps Culture & System that Favors Innovation & ...Amazon Web Services
 
IT6701 Information Management Unit-I
IT6701 Information Management Unit-IIT6701 Information Management Unit-I
IT6701 Information Management Unit-IMikel Raj
 
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...VMware Tanzu
 
Esdm Case Studies
Esdm Case StudiesEsdm Case Studies
Esdm Case StudiesTony Andre
 
Technical Radar (Chinese version) 2014-06
Technical Radar (Chinese version) 2014-06Technical Radar (Chinese version) 2014-06
Technical Radar (Chinese version) 2014-06Freyr Lin
 
EMOCON 2017 S/S - 마음이 편해지는 글로벌 인프라 만들기
EMOCON 2017 S/S - 마음이 편해지는 글로벌 인프라 만들기EMOCON 2017 S/S - 마음이 편해지는 글로벌 인프라 만들기
EMOCON 2017 S/S - 마음이 편해지는 글로벌 인프라 만들기Seung Heun Noh
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityCaserta
 

Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT CDOIQ, 2017)

  • 1. Integrating the CDO Role Into Your Organization: Managing the Disruption
      Presented by: Joe Caserta (@joe_caserta, #MITCDOIQ)
      July 13, 2017
      Massachusetts Institute of Technology Chief Data Officer and Information Quality Symposium
  • 2. Joe Caserta
      Timeline years: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2016, 2017
      - Began consulting in database programming and data modeling
      - 30+ years of hands-on experience building database solutions
      - Data Analysis, Data Warehousing and Business Intelligence since 1996
      - Founded Caserta Concepts in NYC
      - Web log analytics solution published in Intelligent Enterprise magazine
      - Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley)
      - Launched Big Data practice
      - Laser focus on extending Data Analytics with Big Data solutions
      - Launched Big Data Warehousing (BDW) Meetup NYC: 4,500+ members
      - Launched Data Science, Data Interaction and Cloud practices
      - Established Best Practices for big data ecosystem implementations
      - Dedicated to Data Governance Techniques on Big Data (Innovation)
      - Awarded Top 20 Big Data Companies 2016
      - Top 20 Most Powerful Big Data consulting firms
      - 2017: Added Disruption Management Practice to Caserta
  • 3. About Caserta Concepts
      - Consulting Data Innovation and Modern Data Engineering
      - Award-winning company
      - Internationally recognized work force
      - Strategy, Architecture, Implementation, Governance
      - Innovation Partner
      - Strategic Consulting
      - Advanced Architecture
      - Build & Deploy
      - Leader in Enterprise Data Solutions
      - Big Data Analytics
      - Data Warehousing
      - Business Intelligence
      - Data Science
      - Cloud Computing
      - Data Governance
  • 4. Caserta Client Portfolio
      - Retail/eCommerce & Manufacturing
      - Finance, Healthcare & Insurance
      - Digital Media/AdTech
      - Education & Services
  • 5. Awards & Recognition
      - Top 10 Fastest Growing Big Data Companies 2016
  • 6. Our Partners
  • 7. Why is Data So Important?
      Communication timeline:
      - 1500s: Printing Press
      - 1840s: Penny Post
      - 1850s: Telegraph
      - 1850s: Rural Free Post
      - 1890s: Telephone
      - 1900s: Radio
      - 1950s: TV
      - 1970s: PCs
      - 1980s: Internet
      - 1990s: Web
      - 2000s: Social Media, Mobile, Big Data, Cloud
      Every 60 seconds:
      - 98,000+ tweets
      - 695,000 status updates
      - 11 million instant messages
      - 698,445 Google searches
      - 168 million+ emails sent
      - 1,829 TB of data created
      - 217 new mobile web users
  • 8. Harnessing the Customer Journey
      Journey stages: Awareness, Consideration, Purchase, Service, Loyalty, Expansion
      Touchpoint categories: Physical Touchpoints, Digital Touchpoints
      Touchpoints shown: PR, Radio, TV, Print, Outdoor, Word of Mouth, Direct Mail, Customer Service, Search, Paid Content, email, Website/Landing Pages, Social Media, Community, Chat, Social Media, Call Center, Offers, Mailings, Survey, Loyalty Programs, email, Agents, Partners, Ads, Website, Mobile, 3rd Party Sites, Offers, Web self-service
  • 9. Attribution: Why do we Care?
      Single Touch
      - Type: Assign the credit to the first or last exposure
      - Example: Ad-Click 100%
      - Comments: Last touch only; ignores bulk of customer journey; undervalues other interactions and influencers
      Rules-Based
      - Type: Assign the credit to each interaction based on business rules
      - Example: Mailing 33%, E-mail 33%, Ad-Click 33%
      - Comments: Subjective; assigns arbitrary values to each interaction; lacks analytics rigor to determine weights
      Statistically Driven
      - Type: Assign the credit to interactions based on a data-driven model
      - Example: unequal credit of 27%, 49%, and 24% across the Mailing, E-mail, and Ad-Click touches
      - Comments: Looks at full behavior patterns; considers all touch points; can apply different models for best results; uses data to find correlations between touch points (winning combinations)
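The first two attribution types on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `single_touch` and `rules_based` and the example weights are hypothetical and do not come from the deck. The statistically driven type is left out because it needs a model fitted to real journey data.

```python
from collections import defaultdict


def single_touch(journey, position="last"):
    """Give 100% of the credit to the first or last touchpoint."""
    if not journey:
        return {}
    touch = journey[0] if position == "first" else journey[-1]
    return {touch: 1.0}


def rules_based(journey, weights=None):
    """Split credit by a business rule; the default rule is an equal share."""
    if not journey:
        return {}
    if weights is None:
        weights = [1.0] * len(journey)
    if len(weights) != len(journey):
        raise ValueError("need one weight per touchpoint")
    total = sum(weights)
    credit = defaultdict(float)
    for touch, weight in zip(journey, weights):
        credit[touch] += weight / total  # repeated touches accumulate credit
    return dict(credit)


# Hypothetical journey using the touchpoints named on the slide.
journey = ["Mailing", "E-mail", "Ad-Click"]
print(single_touch(journey))                    # all credit to Ad-Click
print(rules_based(journey))                     # equal thirds
print(rules_based(journey, weights=[1, 1, 2]))  # an arbitrary business rule
```

The arbitrary `weights` argument makes the slide's criticism concrete: nothing in the rule itself justifies one weighting over another.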
  • 10. Onboarding New Data
      Business: "I need to analyze some new data"
      - IT collects requirements
      - Creates normalized and/or dimensional data models
      - Profiles and conforms the data
      - Sophisticated ETL programs and quality standards
      - Loads it into data models
      - Builds a BI semantic layer
      - Creates dashboards and reports
      IT: "You'll have your data in 3-6 months to see if it has value!"
      - Onboarding new data is difficult!
      - Rigid Structures and Data Governance
      - Disconnected/removed from business
  • 11. Houston, we have a Problem: Data Sprawl
      - There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise. (Michael Vizard, ITBusinessEdge.com)
      - Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less. ("The High Cost of Not Finding Information," IDC)
  • 13. GDPR Cannot be Ignored
      GDPR Compliance Top Data Protection Priority for 92% of US Organizations in 2017 (PwC Survey)
      - The GDPR requirements will force U.S. companies to change the way they process, store, and protect customers' personal data.
      - Companies must be able to show compliance by May 25, 2018
      - Data Elements Regulated:
          - Basic identity information such as name, address and ID numbers
          - Web data such as location, IP address, cookie data and RFID tags
          - Health and genetic data
          - Biometric data
          - Racial or ethnic data
          - Political opinions
          - Sexual orientation
      - A data protection officer (DPO) may be required
      New York legislature, inspired by the GDPR, proposed the Right to be Forgotten Act.
      - GDPR will continue influencing privacy regulations across the globe
      - Companies that comply with the GDPR will be better prepared for future changes in U.S. legislation.
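One way to start acting on the regulated data elements listed in the GDPR slide is to flag candidate columns in a schema for a data steward to review. The sketch below is a rough illustration, not a compliance tool: the category names follow the slide, but the keyword lists and the function names `classify_column` and `scan_schema` are assumptions made for this example, and keyword matching will produce both false positives and misses.

```python
import re

# Categories follow the slide; the keywords per category are assumptions.
REGULATED_KEYWORDS = {
    "basic identity": {"name", "address", "ssn", "passport"},
    "web data": {"ip", "cookie", "location", "rfid"},
    "health and genetic": {"health", "diagnosis", "genetic"},
    "biometric": {"biometric", "fingerprint"},
    "racial or ethnic": {"race", "ethnic", "ethnicity"},
    "political opinions": {"political"},
    "sexual orientation": {"orientation"},
}


def classify_column(column_name):
    """Return the regulated categories whose keywords match a column name."""
    # Split on non-alphanumerics so "ip" matches "ip_address" but not "zip".
    tokens = set(re.split(r"[^a-z0-9]+", column_name.lower()))
    return sorted(
        category
        for category, keywords in REGULATED_KEYWORDS.items()
        if tokens & keywords
    )


def scan_schema(columns):
    """Map each flagged column to its candidate categories."""
    flagged = {}
    for column in columns:
        categories = classify_column(column)
        if categories:
            flagged[column] = categories
    return flagged


# Hypothetical table schema.
print(scan_schema(["customer_name", "ip_address", "order_total", "zip"]))
```

A column can land in more than one category (here `ip_address` matches both "web data" and "basic identity"), which is why the output is a review list rather than a verdict.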
  • 14. The New Data Paradigm
      OLD WAY:
      - Structure Data → Ingest Data → Analyze Data
      - Fully Governed
      - Monolith
      NEW WAY:
      - Ingest Data → Analyze Data → Structure Data
      - Just Enough Governance
      - Dynamic
      RECIPE:
      - Data Officer & Data Organization
      - Enterprise Data Lake
      - Holistic Data Architecture & Framework
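The "new way" ordering (ingest, then analyze, then structure) can be illustrated with a small schema-on-read sketch. It assumes newline-delimited JSON records and a plain Python list standing in for a data lake landing area; the function names and the `min_share` threshold are hypothetical and not taken from the deck.

```python
import json
from collections import Counter


def ingest(landing_area, line):
    """Ingest: store the record exactly as received, with no schema enforced."""
    landing_area.append(line)


def analyze(landing_area):
    """Analyze: parse on read and count which fields actually occur."""
    field_counts = Counter()
    for line in landing_area:
        field_counts.update(json.loads(line).keys())
    return field_counts


def structure(field_counts, record_count, min_share=0.5):
    """Structure: keep only fields present in at least min_share of records."""
    if record_count == 0:
        return []
    return sorted(
        field for field, n in field_counts.items() if n / record_count >= min_share
    )


# Hypothetical raw events; a list stands in for the landing area.
landing_area = []
for raw in [
    '{"user": "a", "page": "/home"}',
    '{"user": "b", "page": "/cart", "coupon": "X1"}',
    '{"user": "c", "page": "/home", "referrer": "ad"}',
]:
    ingest(landing_area, raw)

counts = analyze(landing_area)
print(structure(counts, len(landing_area)))
```

The schema is decided last, from what the data turned out to contain, instead of being fixed before any record could be loaded.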
  • 15. Corporate Data Pyramid (CDP)
      Tiers:
      - Big Data Warehouse
      - Data Science Workspace
      - Data Lake: Integrated Sandbox
      - Landing Area: Source Data in "Full Fidelity"
      Usage Pattern:
      - Ingest Raw Data
      - Organize, Define, Complete
      - Munging, Blending, Machine Learning
      - Arbitrary/Ad-hoc Queries and Reporting
      Data Governance:
      - Metadata, ILM, Security
      - Data Catalog
      - Data Integration
      - Data Quality and Monitoring
      - Fully Governed (trusted)
  • 16. Data Asset Development Lifecycle
      - Data Science is performed in the ephemeral workspaces to derive new insights/assets
      - The work products of data science are promoted from insights to assets
      - Rigorous Data Governance applied
      - Processes must be hardened, repeatable, and performant
      Diagram labels: Big Data Warehouse; Data Science Workspace; Data Lake: Integrated Sandbox; Landing Area: Source Data in "Full Fidelity"; New Data; New Insights; Governance Refinery
  • 17. Enter the Chief Data Officer
      - Evangelize a data vision for the organization
      - Support & enforce data governance policies via outreach, training & tools
      - Monitor and enforce data quality in collaboration with data owners
      - Monitor and enforce data security along with Legal/Security/Compliance
      - Work with IT to develop/maintain an enterprise repository of strategic data
      - Set standards for analytical reporting and generate data insights
      - Provide a single point of accountability for data initiatives and issues
      - Innovate ways to use existing data
      - Enrich and augment data by combining internal and external sources
      - Support efficient and agile analytics through training and templates
  • 18. The CDO: The Whole Brain Challenge
      Axis labels: Front, Back
      - Analytics Oriented: Data Science; Research
      - Process Oriented: Data Governance; Compliance
      - Operations Oriented: Shared Services; Data Engineering
      - Revenue Oriented: Revenue Goals; Monetizing Data
  • 19. Data Organization Roles
      Data Officer
      - Create and evangelize vision, strategy, and mission statement
      - Create, communicate, and enforce policies, procedures, and processes
      - Plan, prioritize, and project manage data initiatives
      - Prepare & maintain budget for staff, infrastructure, services, tools & training
      - Innovate ways to use existing data
      - Enrich and augment data by combining internal and external sources
      - Protection: ensuring data privacy and security
      Data Governance Lead
      - Represent business interests across departments
      - Prioritize and manage data requests and remediation efforts
      - Identify pockets of business, technical, and data expertise
      - Socialize policies and support programs
      Data Stewards
      - Receive, manage, prioritize and track data quality issues
      - Proactively lead data quality monitoring of high value data
      - Identify, train, and manage critical data sources
      - Ensure remediation efforts follow change management policies
      - Assist in management and maintenance of master data
      Data Librarian
      - Track and manage data related assets (sources, metadata, business glossary, data lineage)
      - Track and manage common queries with embedded business logic
      - Track and manage canned reports (to prevent duplication)
      - Track and manage custom reports (to prevent duplication)
      - Track and manage standard reports and dashboard templates
      - Track internal and external data and tool experts
      - Manage the Data Governance knowledge repository
  • 20. Disruption Management. Forces for Change: Global economics; Intensity of competition; Reduce costs; Move to cross-functional teams; New executive leadership; Speed of technical change; Social trends and changes. Forces Resisting Change: Period of time in present role; Status & perks of office/dept under threat; No apparent reasons for proposed changes; Lack of understanding of proposed changes; Fear of inability to cope with new technology; Concern over job security. Status Quo. http://www.change-management-coach.com/force-field-analysis.html
  • 21. It Takes a Village! Chief Data Organization (Oversight). Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc]: Product Owner; SCRUM Master; Development Team; Business Subject Matter Expertise; Data Librarian/Data Stewardship; Data Science/Statistical Skills; Data Engineering/Architecture; Presentation/BI Report Development Skills; Data Quality Assurance; DevOps. IT Organization (Oversight): Enterprise Data Architect; Solution Engineers; Data Integration Practice; User Experience Practice; QA Practice; Operations Practice. Advanced Analytics: Business Analysts; Data Analysts; Data Scientists; Statisticians; Data Engineers. Planning Organization: Project Managers. Data Organization: Data Gov Coordinator; Data Librarians; Data Stewards.
  • 22. Making it Happen. Caution: Assembly Required. Some of the most hopeful tools are brand new or in incubation. Enterprise big data implementations typically combine products with custom-built components. People, Processes and Business commitment are still critical! Emerging Solutions: Data Integration & Quality; Data Catalog & Governance.
  • 23. CDO Success in Summary. Automation: Self-service, reduce ongoing dependency on IT; Automate Workflows; Streamline Processes. Business Definitions: Identification of KPIs; Iterative Process: definitions mature over time. Tools provide user-centric experience; Data Discovery; Data Profiling; Workflows; Data Quality; Automated ILM. Roles: CDO; Data Governance Council; Data Stewardship Team; Business SMEs; Data Scientists for Insights. Architecture: Consolidated view of data; Flexibility for future growth; Viewable Everywhere. Metrics: Gauge overall governance of data; Data Quality reporting; Issue Tracking. Data Centric, Technology Enabled, Business Focused.
  • 24. What the Future Holds: DevOps for Analytics; Search-Based BI (NLP); Artificial Intelligence (AI); Virtual Reality BI (VR); Virtual Assistant BI (Voice); Reporting/Predictions Converge; Citizen Data Scientists Emerge.
  • 25. Thank You. Joe Caserta, President, Caserta Concepts, joe@casertaconcepts.com. Data is not important, it’s what you do with it that’s important! Massachusetts Institute of Technology Chief Data Officer and Information Quality Symposium
  • 26. Sample Solution Architecture. Stages: Data Sources; Ingest; Storage; ETL; Presentation; Visualization. Relational Datasets: OPRA; Equifax; CDS; Moody’s; BlackBox. Flat File Datasets: Barclay; Eureka; Hedge Fund Intelligence; Hedge Fund Research; Lipper; Morningstar; MF Holdings; BD/ADV. Streaming Data Sets: CAT. Diagram components: S/FTP Push; Kinesis; S3; Landing; Data Lake (Tier 1); Data Lake (Tier 2); Data Science (Ephemeral); Redshift; Spark (Streaming*/Batch); Lambda. Data Science: Python; SQL; Scala; Predictive Analytics; Text Analytics; Business Intelligence. Structured Data; Redshift; Metadata Repository; Data Marketplace; Clean; Match; Derive; Aggregate; MLlib; CoreNLP; Prepare; Deliver.
  • 27. The Data Lake on the Cloud. Cloud Component: AWS / Google / Microsoft. Scalable distributed storage: S3 / GCS / Azure Storage. Pluggable fit-for-purpose processing: EMR / DataProc / HDInsight. Compute Services: EC2 / GCE / VMs. Consistent extensible framework: Spark / Spark / Spark. Dimensional MPP Data Warehouse: Redshift, Snowflake / BigQuery / Azure SQL Data Warehouse. Data Streaming: Kinesis / PubSub / Azure Stream. Common Interface: Jupyter / DataLab / Azure Notebook. Remove barriers between data ingestion and analysis. Democratize data with Just Enough Data Governance (JEDG).
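
The cloud component comparison on slide 27 can be read as a simple lookup from a data lake capability to the equivalent service on each provider. The following is a minimal Python sketch of that idea, not something shown in the deck: the dictionary layout and the helper name `service_for` are assumptions made for illustration. Only the capability and service names come from the slide, with "Kinesis", "Redshift" and "Microsoft" given in their standard spellings.

```python
# Illustrative lookup built from the slide 27 comparison table.
# ASSUMPTION: the dictionary layout and the helper name `service_for`
# are hypothetical; only the capability and service names are from the slide.
CLOUD_COMPONENTS = {
    "Scalable distributed storage": {
        "AWS": "S3", "Google": "GCS", "Microsoft": "Azure Storage"},
    "Pluggable fit-for-purpose processing": {
        "AWS": "EMR", "Google": "DataProc", "Microsoft": "HDInsight"},
    "Compute Services": {
        "AWS": "EC2", "Google": "GCE", "Microsoft": "VMs"},
    "Consistent extensible framework": {
        "AWS": "Spark", "Google": "Spark", "Microsoft": "Spark"},
    "Dimensional MPP Data Warehouse": {
        "AWS": "Redshift/Snowflake", "Google": "BigQuery",
        "Microsoft": "Azure SQL Data Warehouse"},
    "Data Streaming": {
        "AWS": "Kinesis", "Google": "PubSub", "Microsoft": "Azure Stream"},
    "Common Interface": {
        "AWS": "Jupyter", "Google": "DataLab", "Microsoft": "Azure Notebook"},
}


def service_for(component: str, provider: str) -> str:
    """Return the service the slide lists for a capability on a provider."""
    try:
        return CLOUD_COMPONENTS[component][provider]
    except KeyError as exc:
        # Raised for a capability or provider that the slide does not list.
        raise KeyError(f"No entry for {component!r} on {provider!r}") from exc


if __name__ == "__main__":
    # Print one row per capability, in the slide's column order.
    for component, services in CLOUD_COMPONENTS.items():
        row = " / ".join(services[p] for p in ("AWS", "Google", "Microsoft"))
        print(f"{component}: {row}")
```

Keeping the mapping as data rather than branching logic mirrors the slide's point that the same data lake capabilities exist on every provider, with Spark as the common framework across all three.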