The Data Unicorns

An overview of Data Centric trends, analytics, data science and the journey to becoming a data driven + data centric business.

Published in: Technology

  1. Copyright@STKI_2019. Do not remove source or attribution from any slide or graph. STKI Summit 2019: THE DATA UNICORNS
  2. Main Themes 2019 for Data and Analytics. DATA-CENTRIC: Applications, processes & decisions becoming data-centric. THE DATA DEBT: Data catalogs proliferation, but lack of data ownership and strategy remains. REAL PROBLEMS: Use of Design Thinking and Empathy concepts to solve REAL problems. DATA LITERACY: The data “language” in organizations will increase. DATA SCIENCE FOR ALL: AI, ML and Automation will empower citizen DS. DATA PRODUCTS: “Data product managers” will manage the entire lifecycle. DATA TEAMS: Agile-like teams will collaborate around data “products”. AUTOMATION: Automation in data management and data science processes.
  3. ARE WE READY FOR A DATA-CENTRIC REALITY? Intelligent Automation; Seamless Experiences; AI-fueled processes
  4. Payroll Sales Call center Software Software Software Infra Infra Infra Developers Developers Developers Users Users Silo Silo Silo Application Centric Computing (systems of transactions) Customer Facing Computing (systems of engagement) DATA Centric Computing (systems of decisions) Automation Revolution (Preemptive) AI/ML/DL Data Science Intelligence Systems Human/Machine Workforce IoT Process Engineering Users Users Digital (forced) Transformation Channels APIs AGILE Customer Journey UX Marketing Automation RPA Data Analytics
  5. The race for AI: … of organizations have adopted or plan to adopt AI in the next 5 years (IDC). AI-driven companies will steal $1.2 trillion from competitors by 2020.
  6. The race for AI: 63% of CEOs think AI will have a greater impact than the internet. Source: PwC
  7. Like it or not, the AGE of AUTOMATION is here. Ratio of human-machine working hours, 2018 vs. 2022.
  8. Processes and business operations rely on data. This means future businesses will be DATA CENTRIC.
  9. WHAT DO CEOs SAY ABOUT THEIR OWN DATA GAP? A 10-year-old gap! Source: PwC
  10. 3 reasons for this gap: lack of analytical mindset; data siloing; poor data reliability
  11. DATA DRIVEN is more of a cultural thing. Being data-driven means that people’s decisions & actions rely on data.
  12. DATA LITERACY* is a new language, and we all need to be fluent in it. *Data literacy: the ability to read, write and communicate data in context
  13. Source: The data literacy project
  14. POOR DATA LITERACY IS A MAJOR ROAD BLOCK FOR CDOs. Source: Gartner CDO Survey
  15. thedataliteracyproject.org: A global community dedicated to building a data-literate culture for all
  16. The rise of CDOs: 29 FTEs reporting directly to CDOs; 25% increase in CDOs’ funding; will be a mission-critical function in 75% of orgs. Source: Gartner CDO Survey
  17. CDOs time allocation: Risk Mitigation 27%, Cost Cutting 28%, Value Creation 45%. Source: Gartner CDO Survey
  18. DO YOU HAVE A CDO (DATA) IN PLACE? YES: 63%, 28%. Sources: STKI DATA Survey, 2019; Gartner CDO Study
  19. CDO survey: Israel. STKI CX Survey 2019
  20. CDO survey: Israel. STKI CX Survey 2019
  21. ONE CDO STRUCTURE >DOESN’T< FIT ALL
  22. Source: Oracle
  23. Source: IBM
  24. WE WANT DATA DEMOCRACY
  25. WE WANT DATA DEMOCRACY. WHO ARE WE? DATA SCIENTISTS! WHAT DO WE WANT? SELF SERVICE! WHEN DO WE WANT IT? NOW!!!
  26. …and then came the DATA LAKE
  27. MYTH: Data lakes are the answer to data democratization and self service. REALITY: Let’s upload a lot of data into the lake as quickly as possible.
  28. This actually created a big problem. Data is not harmonized, and data lakes are full of isolated data islands: organizations widened their data debt.
  29. The DATA DEBT 64% Duplicate data Missing data - Fields that should contain values, but do not. 25% data entry errors No single version of the truth
  30. “80% of data science is cleaning the data; 20% is complaining about cleaning the data.” WHAT ARE DATA SCIENTISTS’ MAIN CHALLENGES? 1. Dirty data 2. No access 3. Privacy issues. Source: Kaggle State of Data Science Survey
  32. BAD DATA = BAD DECISIONS. Figures on the slide: 20%, 33%, $600B, 12%. Captions on the slide: of average data set is dirty; is the annual cost to the U.S. economy due to bad data; of company projects fail because of weak data; is the average annual revenue loss. (Sources: Springer Link; IBM)
  33. DATA CENTRIC ARCHITECTURE: 1. ACCESS; 2. INGEST; 3. STORE (store the data: DW / DL / Data Mart / Logical DW); 4. PREPARE (transform, clean, understand); 5. MODEL (model, learn, train); 6. DEPLOY (run code in operational processes; systems of decisions). GOVERN. Data Dictionary.
  34. From Waterfall to Agile, Iterative Processes
  35. WHAT’S THE RIGHT BALANCE for a DATA-CENTRIC-READY BUSINESS?
  36. 1. HARMONIZED DATA PLATFORM: define a data governance strategy; enforce a “data catalog” policy; define harmonized data definitions; create a central COE for DS teams. 2. AGILE TEAM: create data teams/squads; product owner. 3. AUTOMATION (DATAOPS): automate as many processes as possible in the DS value chain; use DataOps/MLOps as reference.
  38. What is stopping you from becoming data centric? Source: AtScale Big Data Maturity Report
  39. Do you need a data catalog? Yes.
  40. WHY DO YOU NEED A DATA CATALOG? DISCOVERY: easily search and browse data. ENABLES: self service to DS and analysts. TAGGING: data is described in technical and business terms. CURATION: self service to DS and analysts. FEEDBACK: rating and reviews by users. BALANCE: between the need to control and to consume. AWARENESS: be informed of relevant and available data. HARMONIZE: enable single version of the truth.
  41. AUTOMATION in Data Management: cleaning, wrangling, transforming, and loading. Through to the end of 2022, manual tasks in data management will be cut by 45% thanks to ML and automated service-level management (Gartner).
  43. WANTED: Analytics Engineer Research tasks Build/plan models Statistical languages Prototype ML models DATA ENGINEER BUSINESS ANALYST DATA SCIENTIST Ingestion Storage Transformation Preparation Virtualization Enrichment Business Logic Understand the impact to the business R, Python Hadoop Spark Kafka ML Data Visualization Unstructured data Business understanding Communication skills Storytelling skills DB Administration Storage Visualization SQL Data Pipeline Business understanding Communication skills Data Architecture NoSQL Data Integration, ETL, APIs
  44. Data Engineer; Business Analyst; Operations; Data Scientist
  46. THE GOAL: Managing data products
  47. DATAOPS IS NOW A “THING”
  48. The next evolution: MLOps (“A/B Testing” for DS). ML training (a.k.a. model generation, model build or model fit) generates the model. ML inference (a.k.a. prediction, scoring, or model serve) generates the insights.
  49. Source: EY
  50. Source: EY. Chatbots, NLP/NLG and RPA. Chatbots, NLP/NLG. IPA, ML, NLP/NLG, RPA. IPA (Intelligent process automation), ML and RPA. Deep Learning, ML and IPA (Intelligent process automation).
  51. AI & ML USE CASE EXAMPLES. GNS HEALTH: DISCOVERING CAUSAL LINKS. GNS applies ML to find overlooked relationships in patients’ health records. It creates hypotheses to explain them and then suggests which are most likely. Result: GNS uncovered a new drug interaction hidden in unstructured patient notes. AGRICULTURE: FARMERS RECOMMENDATIONS. AI system provides real-time recommendations for farmers on how to increase productivity (which crops to plant, where to grow, nitrogen in soil…). Result: farmers happy about the crop yields obtained with AI’s guidance. DANSKE BANK: AI FOR FRAUD DETECTION. AI solution that improves accuracy of fraud detection. Monitors millions of transactions daily, purchase location, customer behavior, IP addresses… to identify patterns that signal possible fraud. Result: Improved fraud detection rate by 50%, decreased false positives by 60%. Investigators can concentrate efforts on flagged transactions.
  52. OCEAN MEDALLION: A “LOVE BOAT” EXPERIENCE. Instead of just alleviating the “friction” of typical travel experiences (lines, room keys, paying for things), it will use data to anticipate what you want to do, eat, and see. The medallion can be used to pay, to unlock the door to your room as you approach, and on the ship’s gambling platform; it also provides recommendations based on preferences.
  53. 35% consider machine learning models to be “black boxes” (but feel the models can be explained by experts, or “explainers”). 10% of the participants are confident of explaining most or all models.
  54. #MyData
  56. What is “personal data”? How do you manage it? Source: Skillzme
  57. Fair data economy
  58. “Give me your data”. 42% say that lack of trust prevents them from using digital services. Source: Sitra’s 2018 four country survey (Europe: Finland, Netherlands, France, Germany). Trust is built by having the power to influence how your data is used. In a survey for IBM, 75 percent of respondents said they will not buy a product from a company, no matter how great the product, if they don’t trust that company to protect their data.
  59. • Build a data catalog as a “data-lake gatekeeper” • Tackle point-specific data quality projects • Assign mixed data teams and an “agile way of working” for specific dynamic analytic • Automate as much as possible! • Define key business questions: “Start with the problem, not the data” • Design a data governance strategy • Establish CDO-IT-LOBs collaborative processes • Focus on promoting data literacy • Implement DevOps/DataOps principles
  60. Einat Shimoni, EVP & Senior Analyst, STKI
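
The data-debt symptoms named on slide 29 (duplicate data, and missing data in fields that should contain values) can be measured before starting a point-specific data quality project like the ones recommended on slide 59. What follows is a minimal sketch, not something taken from the deck: the function name `profile_data_debt`, the record layout, and the sample `customers` rows are all hypothetical, and it assumes records are flat dictionaries with hashable values.

```python
from collections import Counter


def profile_data_debt(records, required_fields):
    """Return the share of duplicate rows and, per required field, the share of rows missing a value.

    Hypothetical helper for illustration; assumes each record is a flat dict
    whose values are hashable (strings, numbers, None).
    """
    total = len(records)
    if total == 0:
        return {"duplicate_share": 0.0,
                "missing_share": {f: 0.0 for f in required_fields}}

    # A row counts as a duplicate when an identical row has already been seen.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(n - 1 for n in counts.values())

    # A field is missing when it is absent, None, or an empty/whitespace string.
    def is_missing(value):
        return value is None or (isinstance(value, str) and value.strip() == "")

    missing = {
        f: sum(1 for r in records if is_missing(r.get(f))) / total
        for f in required_fields
    }
    return {"duplicate_share": duplicates / total, "missing_share": missing}


# Hypothetical sample data: one exact duplicate and two rows without an email.
customers = [
    {"id": 1, "name": "Dana", "email": "dana@example.com"},
    {"id": 1, "name": "Dana", "email": "dana@example.com"},
    {"id": 2, "name": "Avi", "email": ""},
    {"id": 3, "name": "Noa", "email": None},
]

report = profile_data_debt(customers, ["name", "email"])
print(report)
```

Only exact duplicates are counted here; near-duplicates (for example, the same customer spelled two ways) and data entry errors would need matching rules that depend on the data set.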