
Keynote 2: The challenge of data management in the big data era and the underlying enterprise architecture shift


Keynote presentation of the 3rd workshop on Real-time & Stream Analytics in Big Data & Stream Data Management (https://workshop.euranova.eu/bigdata18.html)

Published in: Software

  1. 1. THE PROGRAM COMMITTEE: The brains behind the workshop
  2. 2. TWO KEYNOTES: The Workshop content. Fabian Hüske, co-founder, Data Artisans: Unified Processing of Static and Streaming Data with SQL on Apache Flink (55 min). Sabri Skhiri, R&D Director, EURA NOVA: The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift (15 min).
  3. 3. THE PAPERS: The Workshop Topics. Data Streaming Architecture, CEP / CER, Stream Mining, IoT Device integration
  4. 4. KEYNOTE 1: Unified Processing of Static and Streaming Data with SQL on Apache Flink. Fabian Hüske, co-founder, Data Artisans
  5. 5. KEYNOTE 2: The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift. Sabri Skhiri, Research director @EURA NOVA
  6. 6. Agenda: 1. Emerging challenges in data management 2. What is a data architecture? 3. The LinkedIn/Confluent vision of data architecture 4. Open Challenges 5. Digazu as an implementation
  7. 7. Emerging challenges in data management Supporting Digital Transformations
  8. 8. The objectives of your company & The new Customer’s behaviour What is the Current Situation?
  9. 9. The Objectives of your Company: make new revenues, reduce cost, better operations
  10. 10. The New Customer’s behaviour: Requires more direct interactions. Chat Bots & Context-aware applications, Insurance Real-Time quotes, Real-Time Marketing, Marketing Automation, Dynamic & adaptive QoE, Proactive Customer Exp. Management (CEM), Trends analysis
  11. 11. Before (diagram labels): Application; Reporting; Enterprise Data Warehouse; DataMart; Decision makers; business logic; database
  12. 12. Needed Architecture (diagram labels): Application (x3); Enterprise Data Warehouse; data-driven database; algorithm; Decision in real time; Decision batch
  13. 13. Challenge 1 (on the Needed Architecture diagram): Sharing information in real time between applications and data storages
  14. 14. Challenge 2 (on the Needed Architecture diagram): Implementing algorithms in a real-time-driven environment
  15. 15. Challenge 3 (on the Needed Architecture diagram): Online / Incremental / Reinforcement learning
  16. 16. Challenge 4 (on the Needed Architecture diagram): Integration strategy BI-Datalake
  17. 17. Challenge 5 (on the Needed Architecture diagram): GDPR Compliance
  18. 18. Challenge 5 (on the Needed Architecture diagram): GDPR Compliance. Access policy management; On-purpose storage (● Contracts ● Opt-in ● Legitimate interest ● Regulations); Deletion
  19. 19. Challenge 6 (on the Needed Architecture diagram): Data Governance (data lineage, where is my data?, data meaning)
  20. 20. With these 6 features... 1. Sharing information in real time between applications and data storages 2. Implementing algorithms in a real-time-driven environment 3. Online / Incremental / Reinforcement learning 4. Integration strategy BI-Datalake 5. GDPR Compliance 6. Data Governance (data lineage, where is my data?, data meaning)
  21. 21. ...the business strategy is supported (the objectives of your company + the new customer’s behaviour). And it is called a “data architecture”.
  22. 22. What is a Data Architecture? Organising your data strategy
  23. 23. What is a Data architecture? A global plan depicting how to collect, store, use, & manage data. Diagram labels: App. 1, App. 2, ..., App. N; Analytics layer; Exposure layer; Governance layer; Security layer; Storage layer; Users; Data processes (Create, Read, Update, Delete). Questions: ● Where is the master data? ● How do we manage the replicas' consistency? ● Where are the data? ● How to use the data in apps or analytics? ● Best technology stack? ● Convergence of BI/Analytics? (The 3 DW from Gartner) ● How to productize predictive models? ● What about data governance processes?
  24. 24. 3 needs in Enterprises, 3 facets of the same story: business teams want to implement use cases; the CDO wants to pool the use cases; IT wants to set up the right infrastructure
  25. 25. The foundation: The (simplified) Hadoop ecosystem
  26. 26. So, you have the choice. Data Architecture: Tooling-Driven or Data Strategy-Driven
  27. 27. The LinkedIn/Confluent Vision: Thinking of Data Management as an Event-Driven Architecture
  28. 28. Point-to-point data architecture
  29. 29. Point-to-point data architecture. Who are the users? data scientists / BI analysts, developers of data-driven apps, data owners
  30. 30. Point-to-point data architecture
  31. 31. Point-to-point data architecture. Problem: every new use case increases maintenance cost. The more I stick to the roadmap of company use cases, the higher the operating cost of the data. IT ENTROPY, DATA TCO
  32. 32. The Story of the Data stream: The new wave of architecture. We can use these patterns in 1. DATA ARCHITECTURE 2. SERVICE ARCHITECTURE. https://data-artisans.com/flink-forward-berlin/resources/the-convergence-of-stream-processing-and-microservice-architecture
  33. 33. FIRST EFFICIENT SOLUTION @LINKEDIN: DECOUPLING DATA PRODUCERS & CONSUMERS. Diagram labels: Apps; OLAP; Newsfeed; Search; Social Graph; Log Search; Monitoring; Security; RT analytic; Samza; Stream Data Platform; Hadoop; Key value storage; Oracle; Teradata
  34. 34. Open Challenges Technological challenges and entry barriers
  35. 35. Open challenges: STILL A LOT OF QUESTIONS (on the Stream Data Platform diagram of slide 33). Governance? Data exposure management? Security & regulation? Data Transf.? ETL? History Management in data lake? Integration with Data Science Workbench? Integration with EDW?
  36. 36. THE DAV: FUNCTIONAL COMPONENTS. THE RESULT OF 7 YEARS OF R&D @EURANOVA ON DATA MANAGEMENT. Diagram labels: Data Warehouse Historical Storage Layer Operational System 1 Operational System 2 Operational System 3 Applications Data Profiling Profiling Lake Access & Policy Manager Audit & Reporting Management Lineage tracker CIM & Data Location Tracker Governance Stack Governance BI Stack Data Analytics Lab DAL Data Service Gateway Derived-views Transformer Layer Transformer Data Collector Policy Interceptor CEP Interceptor Collector. Legend: External sources of data; Existing operational systems; Existing EDW/BI tooling; DIGAZU components; Labels; External data
  37. 37. FROM ARCHITECTURE TO PRODUCT: DATA & IGAZU FALLS => DIGAZU (same component diagram and legend as slide 36)
  38. 38. A product strategy for end-to-end data engineering
  39. 39. digazu is an end-to-end data engineering platform which includes ○ data integration ○ data preparation and ○ data lake. It connects to many data sources, collects only once & streams the data to all data consumers.
  40. 40. Sources: data warehouse, open sources and third parties, connected homes, legacy systems, smartwatches, still unused databases, connected devices, sensors. Usages: Live 360° view, Context-aware services, Business Intelligence Cubes, Data Analytics Lab. Users: data scientists, marketing teams
  41. 41. Sources: data warehouse, open sources and third parties, connected homes, legacy systems, smartwatches, still unused databases. Components: Data lake, Transformation layer, Collector, Distributor, Exploration tool. Usages: Live 360° view, Context-aware services, Business Intelligence Cubes, Data Analytics Lab. Users: data scientists, marketing teams
  42. 42. One-stop-shop data management. Components: Data lake, Transformation layer, Collector, Distributor, Exploration tool. Features: Historical data management; Real-time & batch data pipeline management; Real-time enrichment process management; (built-in) Data Registry; Connector for files, RDB, Kafka, NoSQL; Fully elastic; GDPR-ready; Data Governance pre-built connector. https://digazu.com/
  43. 43. Summary Key takeaways
  44. 44. CONCLUSION: Key takeaways. The digital transformation drivers all rely on data. New customer behaviours and direct interaction require a new way to think about data architecture. DATA CAN BE SHARED THROUGH STREAMS BY APPLYING A KAPPA-STYLE ARCHITECTURE => APPLIES TO EITHER APPLICATIONS OR DATA. YOU STILL NEED TO PUT IN PLACE A GLOBAL DATA MANAGEMENT STRATEGY (GOVERNANCE, SECURITY, REGULATION, INTEGRATION WITH EDWH).
  45. 45. CONTACT: @sskhiri, @euranova, euranova.eu, research.euranova.eu
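
Illustration (not part of the original slides): the "decoupling data producers & consumers" idea and the "collects only once & streams the data to all data consumers" claim can be sketched with a toy append-only log in Python. The `EventLog` class, its method names, the consumer names, and the sample events are all hypothetical; this is not the API of Kafka, Samza, or Digazu. The two link-count helpers assume the worst case for point-to-point integration, where every producer is wired to every consumer.

```python
from collections import defaultdict


class EventLog:
    """Toy in-memory append-only log (hypothetical stand-in for a stream data platform)."""

    def __init__(self):
        self._records = []                # ordered, never rewritten
        self._offsets = defaultdict(int)  # consumer name -> next offset to read

    def append(self, record):
        """Producer side: write once, without knowing who will read."""
        self._records.append(record)
        return len(self._records) - 1     # offset of the new record

    def poll(self, consumer):
        """Consumer side: return records this consumer has not seen yet."""
        start = self._offsets[consumer]
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch


def point_to_point_links(producers, consumers):
    """Worst-case number of integrations when every producer feeds every consumer."""
    return producers * consumers


def stream_links(producers, consumers):
    """Number of integrations when everyone connects only to the shared log."""
    return producers + consumers


log = EventLog()
log.append({"source": "crm", "event": "customer_updated", "customer": 42})
log.append({"source": "web", "event": "page_view", "customer": 42})

# Two independent consumers read the same data; the producers wrote it once.
warehouse_batch = log.poll("data_warehouse")
scoring_batch = log.poll("realtime_scoring")

# A later event reaches each consumer at its own pace.
log.append({"source": "web", "event": "quote_requested", "customer": 42})
warehouse_new = log.poll("data_warehouse")

print(len(warehouse_batch), len(scoring_batch), len(warehouse_new))
print(point_to_point_links(5, 8), stream_links(5, 8))
```

Each consumer keeps its own offset, so adding a new use case means adding one reader of the log rather than one new pipeline per source, which is the maintenance-cost argument made on the point-to-point slides.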
