
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT CDOIQ, 2017)



The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With data governance at the core of the CDO's responsibilities, moving the innovation meter is a challenge shared by CDOs worldwide. Specifically, the CDO must:

• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science

In this session, Joe Caserta addresses real-world CDO challenges and shares techniques to overcome them, manage corporate disruption, and achieve success.

Published in: Data & Analytics


  1. Integrating the CDO Role Into Your Organization; Managing the Disruption. Presented by Joe Caserta, July 13, 2017, at the Massachusetts Institute of Technology Chief Data Officer and Information Quality Symposium. @joe_caserta #mitcdoiq
  2. Joe Caserta. Timeline years shown: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2016, 2017. Milestones: Launched Big Data practice; Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley); Data Analysis, Data Warehousing and Business Intelligence since 1996; Began consulting in database programming and data modeling; 30+ years hands-on experience building database solutions; Founded Caserta Concepts in NYC; Web log analytics solution published in Intelligent Enterprise magazine; Launched Data Science, Data Interaction and Cloud practices; Laser focus on extending Data Analytics with Big Data solutions; Dedicated to Data Governance Techniques on Big Data (Innovation); Awarded Top 20 Big Data Companies 2016; Top 20 Most Powerful Big Data consulting firms; Launched Big Data Warehousing (BDW) Meetup NYC: 4,500+ members; Added Disruption Management Practice to Caserta; Established best practices for big data ecosystem implementations.
  3. About Caserta Concepts
      • Consulting Data Innovation and Modern Data Engineering: award-winning company; internationally recognized work force; Strategy, Architecture, Implementation, Governance
      • Innovation Partner: Strategic Consulting; Advanced Architecture; Build & Deploy
      • Leader in Enterprise Data Solutions: Big Data Analytics; Data Warehousing; Business Intelligence; Data Science; Cloud Computing; Data Governance
  4. Caserta Client Portfolio: Retail/eCommerce & Manufacturing; Finance, Healthcare & Insurance; Digital Media/AdTech; Education & Services
  5. Awards & Recognition: Top 10 Fastest Growing Big Data Companies 2016
  6. Our Partners
  7. Why is Data So Important?
      • Timeline: 1500s Printing Press; 1840s Penny Post; 1850s Telegraph; 1850s Rural Free Post; 1890s Telephone; 1900s Radio; 1950s TV; 1970s PCs; 1980s Internet; 1990s Web; 2000s Social Media, Mobile, Big Data, Cloud
      • Every 60 seconds: 98,000+ tweets; 695,000 status updates; 11 million instant messages; 698,445 Google searches; 168 million+ emails sent; 1,829 TB of data created; 217 new mobile web users
  8. Harnessing the Customer Journey
      • Stages: Awareness, Consideration, Purchase, Service, Loyalty, Expansion
      • Touchpoints (physical and digital): PR, Radio, TV, Print, Outdoor, Word of Mouth, Direct Mail, Customer Service, Search, Paid Content, email, Website/Landing Pages, Social Media, Community, Chat, Social Media, Call Center, Offers, Mailings, Survey, Loyalty Programs, email, Agents, Partners, Ads, Website, Mobile, 3rd Party Sites, Offers, Web self-service
  9. Attribution: Why Do We Care?
      • Single Touch: assign the credit to the first or last exposure. Example: Ad-Click 100%. Comments: last touch only; ignores bulk of customer journey; undervalues other interactions and influencers.
      • Rules-Based: assign the credit to each interaction based on business rules. Example: Mailing 33%, E-mail 33%, Ad-Click 33%. Comments: subjective; assigns arbitrary values to each interaction; lacks analytics rigor to determine weights.
      • Statistically Driven: assign the credit to interactions based on a data-driven model. Example: Mailing, E-mail and Ad-Click at 27%, 49% and 24%. Comments: looks at full behavior patterns; considers all touch points; can apply different models for best results; uses data to find correlations between touch points (winning combinations).
  10. Onboarding New Data
      • Business: “I need to analyze some new data”
      • IT: collects requirements; creates normalized and/or dimensional data models; profiles and conforms the data; builds sophisticated ETL programs and quality standards; loads the data into data models; builds a BI semantic layer; creates dashboards and reports
      • IT: “You’ll have your data in 3-6 months to see if it has value!”
      • Onboarding new data is difficult!
      • Rigid structures and data governance
      • Disconnected/removed from business
  11. Houston, we have a Problem: Data Sprawl
      • There is one application for every 5-10 employees generating copies of the same files, leading to massive amounts of duplicate idle data strewn all across the enterprise. (Michael Vizard)
      • Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less. (“The High Cost of Not Finding Information,” IDC)
  12. (image only)
  13. GDPR Cannot Be Ignored
      “GDPR Compliance Top Data Protection Priority for 92% of US Organizations in 2017” (PwC Survey)
      • The GDPR requirements will force U.S. companies to change the way they process, store, and protect customers’ personal data.
      • Companies must be able to show compliance by May 25, 2018.
      • Data elements regulated: basic identity information such as name, address and ID numbers; web data such as location, IP address, cookie data and RFID tags; health and genetic data; biometric data; racial or ethnic data; political opinions; sexual orientation.
      • A data protection officer (DPO) may be required.
      • The New York legislature, inspired by the GDPR, proposed the Right to be Forgotten Act.
      • GDPR will continue influencing privacy regulations across the globe.
      • Companies that comply with the GDPR will be better prepared for future changes in U.S. legislation.
  14. The New Data Paradigm
      • OLD WAY: Structure Data → Ingest Data → Analyze Data; Fully Governed; Monolith
      • NEW WAY: Ingest Data → Analyze Data → Structure Data; Just Enough Governance; Dynamic
      • RECIPE: Data Officer & Data Organization; Enterprise Data Lake; Holistic Data Architecture & Framework
  15. Corporate Data Pyramid (CDP)
      • Tiers: Big Data Warehouse; Data Science Workspace; Data Lake (Integrated Sandbox); Landing Area (Source Data in “Full Fidelity”)
      • Usage pattern labels: Ingest Raw Data; Organize, Define, Complete; Munging, Blending, Machine Learning; Arbitrary/Ad-hoc Queries and Reporting
      • Data governance labels: Metadata, ILM, Security; Data Catalog; Data Integration; Data Quality and Monitoring; Fully Governed (trusted)
  16. Data Asset Development Lifecycle
      • Data science is performed in the ephemeral workspaces to derive new insights/assets.
      • The work products of data science are promoted from insights to assets.
      • Rigorous data governance is applied.
      • Processes must be hardened, repeatable, and performant.
      • Diagram labels: New Data; New Insights; Governance; Refinery; Landing Area (Source Data in “Full Fidelity”); Data Lake (Integrated Sandbox); Data Science Workspace; Big Data Warehouse
  17. Enter the Chief Data Officer
      • Evangelize a data vision for the organization
      • Support & enforce data governance policies via outreach, training & tools
      • Monitor and enforce data quality in collaboration with data owners
      • Monitor and enforce data security along with Legal/Security/Compliance
      • Work with IT to develop/maintain an enterprise repository of strategic data
      • Set standards for analytical reporting and generate data insights
      • Provide a single point of accountability for data initiatives and issues
      • Innovate ways to use existing data
      • Enrich and augment data by combining internal and external sources
      • Support efficient and agile analytics through training and templates
  18. The CDO: The Whole Brain Challenge (Front / Back)
      • Analytics Oriented: Data Science; Research
      • Process Oriented: Data Governance; Compliance
      • Operations Oriented: Shared Services; Data Engineering
      • Revenue Oriented: Revenue Goals; Monetizing Data
  19. Data Organization Roles
      • Data Officer: create and evangelize vision, strategy, and mission statement; create, communicate, and enforce policies, procedures, and processes; plan, prioritize, and project manage data initiatives; prepare & maintain budget for staff, infrastructure, services, tools & training; innovate ways to use existing data; enrich and augment data by combining internal and external sources; protection (ensuring data privacy and security)
      • Data Governance Lead: represent business interests across departments; prioritize and manage data requests and remediation efforts; identify pockets of business, technical, and data expertise; socialize policies and support programs
      • Data Stewards: receive, manage, prioritize and track data quality issues; proactively lead data quality monitoring of high value data; identify, train, and manage critical data sources; ensure remediation efforts follow change management policies; assist in management and maintenance of master data
      • Data Librarian: track and manage data related assets (sources, metadata, business glossary, data lineage); track and manage common queries with embedded business logic; track and manage canned reports (to prevent duplication); track and manage custom reports (to prevent duplication); track and manage standard reports and dashboard templates; track internal and external data and tool experts; manage the Data Governance knowledge repository
  20. Disruption Management
      • Forces for Change: global economics; intensity of competition; reduce costs; move to cross-functional teams; new executive leadership; speed of technical change; social trends and changes
      • Status Quo
      • Forces Resisting Change: period of time in present role; status & perks of office/dept under threat; no apparent reasons for proposed changes; lack of understanding of proposed changes; fear of inability to cope with new technology; concern over job security
      • Source: http://www.change-management--field-analysis.html
  21. It Takes a Village!
      • Chief Data Organization (Oversight)
      • Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc]: Product Owner; SCRUM Master; Development Team
      • Skills listed: Business Subject Matter Expertise; Data Librarian/Data Stewardship; Data Science/Statistical Skills; Data Engineering/Architecture; Presentation/BI Report Development Skills; Data Quality Assurance; DevOps
      • IT Organization (Oversight): Enterprise Data Architect; Solution Engineers; Data Integration Practice; User Experience Practice; QA Practice; Operations Practice
      • Advanced Analytics: Business Analysts; Data Analysts; Data Scientists; Statisticians; Data Engineers
      • Planning Organization: Project Managers
      • Data Organization: Data Gov Coordinator; Data Librarians; Data Stewards
  22. Making it Happen
      • Caution: Assembly Required. Some of the most hopeful tools are brand new or in incubation. Enterprise big data implementations typically combine products with custom built components.
      • People, Processes and Business commitment are still critical!
      • Tool categories shown: Data Integration & Quality; Data Catalog & Governance; Emerging Solutions
  23. CDO Success in Summary: Data Centric, Technology Enabled, Business Focused
      • Streamline Processes: self-service, reduce ongoing dependency on IT; automate workflows
      • Business Definitions: identification of KPIs; iterative process (definitions mature over time)
      • Automation: tools provide user-centric experience; data discovery; data profiling; workflows; data quality; automated ILM
      • Roles: CDO; Data Governance Council; Data Stewardship Team; business SMEs; data scientists for insights
      • Architecture: consolidated view of data; flexibility for future growth; viewable everywhere
      • Metrics: gauge overall governance of data; data quality reporting; issue tracking
  24. What the Future Holds
      • DevOps for Analytics
      • Search-Based BI (NLP)
      • Artificial Intelligence (AI)
      • Virtual Reality BI (VR)
      • Virtual Assistant BI (Voice)
      • Reporting/Predictions Converge
      • Citizen Data Scientists Emerge
  25. Thank You. Joe Caserta, President, Caserta Concepts. “Data is not important, it’s what you do with it that’s important!”
  26. Sample Solution Architecture
      • Stages: Data Sources; Ingest; Storage; ETL; Presentation; Visualization
      • Relational datasets: OPRA; Equifax; CDS; Moody’s; BlackBox
      • Flat file datasets: Barclay; Eureka; Hedge Fund Intelligence; Hedge Fund Research; Lipper; Morningstar; MF Holdings; BD/ADV
      • Streaming data sets: CAT
      • Ingest: S/FTP Push; Kinesis
      • Storage (S3): Landing; Data Lake (Tier 1); Data Lake (Tier 2); Data Science (Ephemeral)
      • Processing: Spark (Streaming*/Batch); Lambda; Clean; Match; Derive; Aggregate; MLlib; CoreNLP; Prepare; Deliver
      • Data Science: Python; SQL; Scala
      • Other components shown: Redshift; Structured Data; Metadata Repository; Data Marketplace; Predictive Analytics; Text Analytics; Business Intelligence
  27. The Data Lake on the Cloud
      • Scalable distributed storage: S3 (AWS); GCS (Google); Azure Storage (Microsoft)
      • Pluggable fit-for-purpose processing: EMR (AWS); DataProc (Google); HDInsight (Microsoft)
      • Compute services: EC2 (AWS); GCE (Google); VMs (Microsoft)
      • Consistent extensible framework: Spark (AWS); Spark (Google); Spark (Microsoft)
      • Dimensional MPP data warehouse: Redshift/Snowflake (AWS); BigQuery (Google); Azure SQL Data Warehouse (Microsoft)
      • Data streaming: Kinesis (AWS); PubSub (Google); Azure Stream (Microsoft)
      • Common interface: Jupyter (AWS); DataLab (Google); Azure Notebook (Microsoft)
      • Remove barriers between data ingestion and analysis
      • Democratize data with Just Enough Data Governance (JEDG)
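
Slide 9 contrasts single-touch, rules-based, and statistically driven attribution. As a minimal sketch of the first two, assuming a customer journey is simply an ordered list of touchpoint names, the Python below shows how credit might be assigned. The function names `single_touch` and `rules_based`, the `weights` parameter, and the example journey are illustrative assumptions, not taken from the presentation. A statistically driven model would estimate the weights from conversion data instead, and is not attempted here.

```python
from typing import Dict, List


def single_touch(journey: List[str], position: str = "last") -> Dict[str, float]:
    """Give 100% of the credit to the first or last touchpoint (hypothetical helper)."""
    if not journey:
        return {}
    winner = journey[0] if position == "first" else journey[-1]
    return {winner: 1.0}


def rules_based(journey: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Split credit by business-rule weights, normalized so the shares sum to 1.

    Each distinct touchpoint is counted once; touchpoints without a weight get 0.
    """
    raw = {tp: weights.get(tp, 0.0) for tp in journey}
    total = sum(raw.values())
    if total == 0:
        return {}
    return {tp: w / total for tp, w in raw.items()}


# Example journey using the touchpoint names from the slide.
journey = ["Mailing", "E-mail", "Ad-Click"]

last_touch = single_touch(journey)
equal_split = rules_based(journey, {"Mailing": 1, "E-mail": 1, "Ad-Click": 1})

print(last_touch)
print({tp: round(share, 2) for tp, share in equal_split.items()})
```

With equal business-rule weights, each touchpoint receives one third of the credit, which matches the 33% / 33% / 33% example on the slide; last-touch attribution gives Ad-Click the full 100%. Changing the `weights` dictionary is all it takes to encode a different rule, which also illustrates why the slide calls this approach subjective.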