Keynote 2: The challenge of data management in the big data era and the underlying enterprise architecture shift
Keynote presentation of the 3rd workshop on Real-time & Stream Analytics in Big Data & Stream Data Management (https://workshop.euranova.eu/bigdata18.html)
TWO KEYNOTES
The Workshop content
Fabian Hüske, co-founder, Data Artisans
Unified Processing of Static and Streaming Data with SQL on Apache Flink.
55min
Sabri Skhiri, R&D Director, EURA NOVA
The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift
15min
THE PAPERS
The Workshop Topics
Data Streaming Architecture
CEP / CER
Stream Mining
IoT Device integration
KEYNOTE 1
Unified Processing of Static and Streaming Data with SQL on Apache Flink.
Fabian Hüske, co-founder, Data Artisans
KEYNOTE 2
The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift
Sabri Skhiri, Research Director @ EURA NOVA
Agenda
1. Emerging challenges in data management
2. What is a data architecture?
3. The LinkedIn/Confluent vision of data architecture
4. Open Challenges
5. Digazu as an implementation
Challenge 1
[Diagram: applications, a database, an Enterprise Data Warehouse, and a data-driven algorithm, with decisions taken in real time and in batch]
Sharing information in real time between applications and data stores
Challenge 2
Implementing algorithms in a real-time-driven environment
Challenge 3
Online / Incremental / Reinforcement learning
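To make the online / incremental part of this challenge concrete, here is a minimal sketch of a model that learns from one event at a time instead of from a batch. It is an illustration only: the class `OnlineLinearModel`, the `learn_one` method, the learning rate, and the synthetic stream are all hypothetical and are not taken from the talk. Reinforcement learning is not covered by this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence, Tuple


@dataclass
class OnlineLinearModel:
    """Linear model updated per event with stochastic gradient descent (hypothetical example)."""
    n_features: int
    learning_rate: float = 0.1
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0

    def __post_init__(self) -> None:
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, x: Sequence[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x: Sequence[float], y: float) -> float:
        """Update the model from a single (x, y) event; return the error before the update."""
        error = self.predict(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error
        return error


def event_stream(n: int) -> Iterator[Tuple[List[float], float]]:
    """Synthetic, noise-free stream following y = 2*x + 1 (made-up data for illustration)."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield [x], 2.0 * x + 1.0


model = OnlineLinearModel(n_features=1)
for features, target in event_stream(5000):
    # No batch retraining: each event adjusts the model as soon as it arrives.
    model.learn_one(features, target)

print(f"weight={model.weights[0]:.3f}, bias={model.bias:.3f}")
```

The point of the design is that the model never needs the full history in memory, which is what makes it usable inside a streaming pipeline.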
Challenge 4
Integration strategy BI-Datalake
Challenge 5
GDPR Compliance
Challenge 6
Data Governance
- data lineage
- where is my data?
- data meaning
With these 6 features...
1. Sharing information in real time between applications and data stores
2. Implementing algorithms in a real-time-driven environment
3. Online / Incremental / Reinforcement learning
4. Integration strategy BI-Datalake
5. GDPR Compliance
6. Data Governance: data lineage, where is my data?, data meaning
...the business strategy is supported
And it is called a “data architecture”
The objectives of your company + The new Customer’s behaviour
What is a Data Architecture?
Organising your data strategy
What is a Data architecture?
A global plan depicting how to collect, store, use, &
manage data
App. 1
App. 2
...
App. N
Analytics layer
Exposure layer
Governance layer
Security layer
Storage layer
Users
Data processes
(Create, Read, Update, Delete)
Questions
● Where is the master data?
● How do we manage replica consistency?
● Where are the data?
● How to use the data in apps or analytics?
● Best technology stack?
● Convergence of BI/Analytics? (The 3 DW from Gartner)
● How to productize predictive models?
● What about data governance processes?
3 needs in Enterprises
3 facets of the same story
Business teams want to implement use cases
The CDO wants to mutualise (pool) the use cases
IT wants to set up the right infrastructure
Point-to-point data architecture
Every new use case increases maintenance cost.
The more I stick to the roadmap of company use cases, the higher the operating cost of data is.
Problem
IT ENTROPY
DATA TCO
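A back-of-the-envelope sketch of why point-to-point integration drives maintenance cost up. It assumes a deliberately simplified worst case that is not stated in the slides: every use case needs its own link to every source. The function names and the figure of 10 sources are hypothetical.

```python
def point_to_point_links(sources: int, use_cases: int) -> int:
    """Worst case: every use case builds its own link to every source."""
    return sources * use_cases


def shared_platform_links(sources: int, use_cases: int) -> int:
    """Each source and each use case connects once to a shared platform."""
    return sources + use_cases


SOURCES = 10  # hypothetical number of operational systems

for use_cases in (1, 5, 10, 20):
    p2p = point_to_point_links(SOURCES, use_cases)
    shared = shared_platform_links(SOURCES, use_cases)
    print(f"{use_cases:>2} use cases: point-to-point={p2p:>3} links, shared={shared:>2} links")
```

Under this assumption, each additional use case adds as many links as there are sources in the point-to-point model, but only one link when sources and consumers meet on a shared platform.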
The Story of the Data stream
The new wave of architecture
We can use these patterns in
1. DATA ARCHITECTURE
2. SERVICE ARCHITECTURE
https://data-artisans.com/flink-forward-berlin/resources/the-convergence-of-stream-processing-and-microservice-architecture
FIRST EFFICIENT SOLUTION @LINKEDIN
DECOUPLING DATA PRODUCERS & CONSUMERS
[Diagram: a central Stream Data Platform connecting apps, OLAP, newsfeed, search, social graph, log search, monitoring, security, real-time analytics (Samza), Hadoop, key-value storage, Oracle, and Teradata]
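The decoupling idea can be sketched with a toy append-only log in which each consumer keeps its own read position. This is an in-memory illustration under stated assumptions, not the LinkedIn implementation: `StreamLog`, `poll`, `rewind`, and the consumer names are hypothetical.

```python
from typing import Any, Dict, List


class StreamLog:
    """In-memory append-only log; a toy stand-in for a stream data platform."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._offsets: Dict[str, int] = {}  # next position to read, per consumer

    def append(self, record: Any) -> int:
        """Producers only append; they know nothing about consumers."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 100) -> List[Any]:
        """Return unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer: str, offset: int = 0) -> None:
        """Let a consumer replay history, for example to rebuild a derived view."""
        self._offsets[consumer] = offset


log = StreamLog()
log.append({"user": "a", "action": "view"})
log.append({"user": "b", "action": "click"})

search_batch = log.poll("search-indexer")      # reads both events
log.append({"user": "a", "action": "click"})
analytics_batch = log.poll("rt-analytics")     # a new consumer starts from the beginning
search_next = log.poll("search-indexer")       # only the event it has not seen yet

log.rewind("search-indexer")
replayed = log.poll("search-indexer")          # full replay of the log
```

Because producers write once and each consumer tracks its own offset, a new consumer can be added without touching any producer, and a replay from offset 0 reprocesses the whole history.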
Open challenges
STILL A LOT OF QUESTIONS
Governance?
Data exposure management?
Security & regulation?
Data transformation? ETL?
History management in the data lake?
Integration with a Data Science Workbench?
Integration with the EDW?
THE DAV: FUNCTIONAL COMPONENTS
THE RESULT OF 7 YEARS OF R&D @EURANOVA ON DATA MANAGEMENT
[Diagram labels: Operational Systems 1 to 3, Applications, External data, Data Collector, Policy Interceptor, CEP Interceptor, Transformer Layer, Derived views, Data Service Gateway, DAL, Data Warehouse, Historical Storage Layer, Lake, Data Profiling, Access & Policy Manager, Audit & Reporting, Management, Lineage tracker, CIM & Data Location Tracker, Governance Stack, BI Stack, Data Analytics Lab]
[Legend: external sources of data, existing operational systems, existing EDW/BI tooling, DIGAZU components]
FROM ARCHITECTURE TO PRODUCT
DATA & IGAZU FALLS => DIGAZU
[Same functional components diagram as on the previous slide]
digazu
is an end-to-end data engineering platform which includes
○ data integration,
○ data preparation, and
○ a data lake.
It connects to many data sources, collects only once, and streams the data to all data consumers.
Sources: data warehouse, open sources and third parties, connected homes, legacy systems, smartwatches, still unused databases, connected devices, sensors
Usages: Live 360° view, context-aware services, Business Intelligence cubes, Data Analytics Lab
Users: data scientists, marketing teams
[Diagram repeated with the platform layers added: Collector, Transformation layer, Data lake, Distributor, Exploration tool]
Data lake
Transformation layer
Collector
Distributor
Exploration tool
One-stop-shop data management
Historical data management
Real-time & batch data pipeline management
Real-time enrichment process management (built-in)
Data Registry
Connector for files, RDB, Kafka, NoSQL
Fully elastic
GDPR-ready
Data Governance pre-built connector
https://digazu.com/
CONCLUSION
Key takeaways
The digital transformation drivers all rely on data
New customer behaviors and direct interaction require a new way to think about data architecture
DATA CAN BE SHARED THROUGH STREAMS BY APPLYING A KAPPA-STYLE ARCHITECTURE
=> APPLIES TO EITHER APPLICATIONS OR DATA
YOU STILL NEED TO PUT IN PLACE A GLOBAL DATA MANAGEMENT STRATEGY
(GOVERNANCE, SECURITY, REGULATION, INTEGRATION WITH EDWH)