THE PROGRAM COMMITTEE
The brains behind the workshop
TWO KEYNOTES
The Workshop content
Fabian Hüske, co-founder, Data Artisans
Unified Processing of Static and Streaming Data with SQL on Apache Flink.
55min
Sabri SKHIRI, R&D Director, EURA NOVA
The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift
15min
THE PAPERS
The Workshop Topics
Data Streaming Architecture
CEP / CER
Stream Mining
IoT Device integration
KEYNOTE 1
Unified Processing of Static and Streaming Data with SQL on Apache Flink.
Fabian Hüske, co-founder, Data Artisans
KEYNOTE 2
The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift
Sabri Skhiri, Research director @EURA NOVA
Agenda
1. Emerging challenges in data management
2. What is a data architecture?
3. The LinkedIn/Confluent vision of data architecture
4. Open Challenges
5. Digazu as an implementation
Emerging challenges in data management
Supporting Digital Transformations
The objectives of your company & The new Customer’s behaviour
What is the Current Situation?
The Objectives of your Company
make new revenues
reduce cost
better operations
The New Customer’s behaviour
Requires more direct interactions
Chat Bots & Context-aware applications
Insurance Real-Time quotes
Real-Time Marketing
Marketing Automation
Dynamic & adaptive QoE
Proactive Customer Experience Management (CEM)
Trends analysis
Before
Diagram labels: Application; business logic; database; Reporting; Enterprise Data Warehouse; DataMart; Decision makers
Needed Architecture
Diagram labels: Application; Enterprise Data Warehouse; data-driven; database; algorithm; Decision in real time; Decision batch
Challenge 1
Sharing information in real time between applications and data storages
Challenge 2
Implementing algorithms in a real-time-driven environment
Challenge 3
Online / Incremental / Reinforcement learning
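As a rough illustration of what "incremental" means here, the sketch below keeps a mean and a variance up to date one event at a time (Welford's algorithm), without storing past events. It is only the simplest case of a statistic updated per event, not a learning method taken from the talk; the class name `RunningStats` and the sample values are assumptions made for the example.

```python
class RunningStats:
    """Mean and variance maintained incrementally (Welford's algorithm)."""

    def __init__(self):
        self.n = 0          # number of events seen so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self._m2 / self.n if self.n else 0.0


# Hypothetical stream of measurements arriving one by one.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

The point of the design is that each update touches only a constant amount of state, so the same code can run inside a stream processor for as long as events keep arriving.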
Challenge 4
Integration strategy BI-Datalake
Challenge 5
GDPR Compliance
Challenge 5
GDPR Compliance
Access policy management
On-purpose storage
● Contracts
● Opt-in
● Legitimate interest
● Regulations
Deletion
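One possible reading of this slide is that every stored record carries the legal basis and the purposes it was collected for, that reads are filtered by purpose, and that all data about one person can be deleted on request. The sketch below is a hypothetical illustration of that reading only: the `Record` and `PurposeStore` types, the field names, and the purpose strings are assumptions, and real GDPR compliance involves far more than this check.

```python
from dataclasses import dataclass, field
from enum import Enum


class LegalBasis(Enum):
    """Legal grounds listed on the slide for storing personal data."""
    CONTRACT = "contract"
    OPT_IN = "opt-in"
    LEGITIMATE_INTEREST = "legitimate interest"
    REGULATION = "regulation"


@dataclass
class Record:
    subject_id: str       # the person the data is about
    payload: dict         # the data itself
    basis: LegalBasis     # why it may be stored
    purposes: frozenset   # what it may be used for


@dataclass
class PurposeStore:
    records: list = field(default_factory=list)

    def put(self, record):
        self.records.append(record)

    def read(self, subject_id, purpose):
        """Access policy: return only data collected for this purpose."""
        return [r.payload for r in self.records
                if r.subject_id == subject_id and purpose in r.purposes]

    def delete_subject(self, subject_id):
        """Deletion: drop every record about one person, return the count."""
        kept = [r for r in self.records if r.subject_id != subject_id]
        removed = len(self.records) - len(kept)
        self.records = kept
        return removed


store = PurposeStore()
store.put(Record("c42", {"email": "a@example.org"},
                 LegalBasis.OPT_IN, frozenset({"newsletter"})))
store.put(Record("c42", {"iban": "hidden"},
                 LegalBasis.CONTRACT, frozenset({"billing"})))
```

With this layout, `store.read("c42", "newsletter")` returns only the e-mail record, and a purpose that was never declared returns nothing.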
Challenge 6
Data Governance
- data lineage
- where is my data?
- data meaning
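Data lineage can be pictured as a directed graph from each derived dataset back to the datasets it was computed from; "where is my data?" then becomes a graph traversal. The sketch below assumes that picture. The `LineageTracker` class and the dataset names are hypothetical and do not describe any specific governance tool.

```python
from collections import defaultdict


class LineageTracker:
    """Records which datasets each derived dataset was built from."""

    def __init__(self):
        self._parents = defaultdict(set)   # dataset -> direct inputs

    def record(self, output, inputs):
        """Declare that `output` was derived from `inputs`."""
        self._parents[output].update(inputs)

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, transitively."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


tracker = LineageTracker()
tracker.record("customer_360", ["crm", "billing"])
tracker.record("churn_score", ["customer_360", "web_clicks"])
sources = tracker.upstream("churn_score")
```

Here `sources` contains `customer_360`, `web_clicks`, `crm`, and `billing`, which is the kind of answer a lineage question expects.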
With these 6 features...
1. Sharing information in real time between applications and data storages
2. Implementing algorithms in a real-time-driven environment
3. Online / Incremental / Reinforcement learning
4. Integration strategy BI-Datalake
5. GDPR Compliance
6. Data Governance: data lineage, where is my data?, data meaning
...the business strategy is supported
And it is called a “data architecture”
The objectives of your company + The new Customer’s behaviour
What is a Data Architecture?
Organising your data strategy
What is a Data architecture?
A global plan depicting how to collect, store, use, & manage data
Diagram labels: App. 1, App. 2, ..., App. N; Analytics layer; Exposure layer; Governance layer; Security layer; Storage layer; Users; Data processes (Create, Read, Update, Delete)
Questions
● Where is the master data?
● How do we manage the consistency of replicas?
● Where are the data?
● How to use the data in apps or analytics?
● Best technology stack?
● Convergence of BI/Analytics? (The 3 DW from Gartner)
● How to productize predictive models?
● What about data governance processes?
3 needs in Enterprises
3 facets of the same story
Business teams want to implement use cases
The CDO wants to mutualise the use cases
IT wants to set up the right infrastructure
The foundation
The (simplified) Hadoop ecosystem
So, you have the choice
Data Architecture
Tooling-Driven
Data Strategy-Driven
The LinkedIn/Confluent Vision
Thinking Data Management as an Event-Driven Architecture
Point-to-point data architecture
Who are the users?
data scientists / BI analysts
developers of data-driven apps
data owners
Point-to-point data architecture
Every new use case increases maintenance cost.
The more closely I follow the roadmap of company use cases, the higher the operating cost of data becomes.
Problem
IT ENTROPY
DATA TCO
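The cost argument can be made concrete with a small count. The sketch below assumes the worst case, in which every consumer needs its own connection to every source, and compares it with a layout where each system connects once to a shared platform such as the stream platform discussed later in the talk. The function names and the figures are illustrative assumptions, not measurements.

```python
def point_to_point_links(sources, consumers):
    """Worst case: one dedicated integration per (source, consumer) pair."""
    return sources * consumers


def shared_platform_links(sources, consumers):
    """Each source and each consumer connects once to a shared platform."""
    return sources + consumers


# Hypothetical company with 8 data sources and 4 consuming use cases.
before = point_to_point_links(8, 4)
shared = shared_platform_links(8, 4)

# Cost of adding one more use case in each layout.
extra_point_to_point = point_to_point_links(8, 5) - before
extra_shared = shared_platform_links(8, 5) - shared
```

Under these assumptions the point-to-point layout needs 32 integrations against 12, and a new use case adds 8 integrations instead of 1, which is the growth the slide calls IT entropy.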
The Story of the Data stream
The new wave of architecture
We can use these patterns in
1. DATA ARCHITECTURE
2. SERVICE ARCHITECTURE
https://data-artisans.com/flink-forward-berlin/resources/the-convergence-of-stream-processing-and-microservice-architecture
Diagram labels: Apps; OLAP; Newsfeed; Search; Social Graph; Log Search; Monitoring; Security; RT analytic; Samza; Stream Data Platform; Hadoop; Key value storage; Oracle; Teradata
FIRST EFFICIENT SOLUTION @LINKEDIN
DECOUPLING DATA PRODUCERS & CONSUMERS
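Decoupling producers from consumers can be sketched as an append-only log in which each consumer tracks its own read position. The in-memory `Log` class below is a deliberately simplified, hypothetical stand-in for a stream data platform; it is not the API of Kafka, Samza, or any other product, and the consumer names are made up.

```python
class Log:
    """Append-only event log with one independent offset per consumer."""

    def __init__(self):
        self._entries = []   # events in arrival order, never modified
        self._offsets = {}   # consumer name -> index of next event to read

    def append(self, event):
        """Producer side: add an event and return its position."""
        self._entries.append(event)
        return len(self._entries) - 1

    def poll(self, consumer, max_events=10):
        """Consumer side: read the next events and advance only this consumer."""
        start = self._offsets.get(consumer, 0)
        batch = self._entries[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = Log()
for event in ["signup:alice", "order:alice", "signup:bob"]:
    log.append(event)

search_batch = log.poll("search-indexer", max_events=2)   # reads two events
hadoop_batch = log.poll("hadoop-loader")                  # reads all three
search_rest = log.poll("search-indexer")                  # resumes where it stopped
```

The producer never needs to know who reads the data, and a slow consumer does not hold back a fast one, because the only per-consumer state is an offset.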
Open Challenges
Technological challenges and entry barriers
Open challenges
STILL A LOT OF QUESTIONS
Governance?
Data exposure management?
Security & regulation?
Data Transf.? ETL?
History Management in data lake?
Integration with Data Science Workbench
Integration with EDW?
THE DAV: FUNCTIONAL COMPONENTS
THE RESULT OF 7 YEARS OF R&D @EURANOVA ON DATA MANAGEMENT
Diagram labels: Operational System 1; Operational System 2; Operational System 3; Applications; Data Profiling; Lake; Access & Policy Manager; Audit & Reporting; Lineage tracker; CIM & Data Location Tracker; Governance Stack; BI Stack; Data Analytics Lab; DAL; Data Service Gateway; Derived views; Transformer Layer; Data Collector; Policy Interceptor; CEP Interceptor; External data; Data Warehouse; Historical Storage Layer
Group labels: Profiling; Management; Governance; Transformer; Collector
Legend: External sources of data; Existing operational systems; Existing EDW/BI tooling; DIGAZU components; Labels
FROM ARCHITECTURE TO PRODUCT
DATA & IGAZU FALLS => DIGAZU
A product strategy for end-to-end data engineering
Digazu is an end-to-end data engineering platform that includes
○ data integration,
○ data preparation, and
○ a data lake.
It connects to many data sources, collects the data only once, and streams it to all data consumers.
Sources: data warehouse; open sources and third parties; connected homes; legacy systems; smartwatches; still unused databases; connected devices; sensors
Usages: Live 360° view; Context-aware services; Business Intelligence Cubes; Data Analytics Lab
Users: data scientists; marketing teams
Between sources and usages: Data lake; Transformation layer; Collector; Distributor; Exploration tool
One-stop shop for data management
Historical data management
Real-time & batch data pipeline management
Real-time enrichment process management (built-in)
Data Registry
Connectors for files, RDB, Kafka, NoSQL
Fully elastic
GDPR-ready
Pre-built Data Governance connector
https://digazu.com/
Summary
Key takeaways
CONCLUSION
Key takeaways
The digital transformation drivers all rely on data
New customer behaviours and direct interaction require a new way to think about data architecture
DATA CAN BE SHARED THROUGH STREAMS BY APPLYING A KAPPA-STYLE ARCHITECTURE
=> APPLIES TO EITHER APPLICATIONS OR DATA
YOU STILL NEED TO PUT IN PLACE A GLOBAL DATA MANAGEMENT STRATEGY
(GOVERNANCE, SECURITY, REGULATION, INTEGRATION WITH EDWH)
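A Kappa-style architecture is usually described as keeping the event log as the source of truth and using a single stream-processing code path, so that a derived view is rebuilt by replaying the log rather than by running a separate batch job. The sketch below illustrates that idea under simplifying assumptions: the log is a plain Python list, and `apply_event`, `replay`, and the sample events are hypothetical names chosen for the example.

```python
def apply_event(view, event):
    """The single processing step, used for live events and for replays."""
    customer, amount = event
    view[customer] = view.get(customer, 0) + amount
    return view


def replay(log, step):
    """Rebuild a derived view from scratch by re-reading the whole log."""
    view = {}
    for event in log:
        step(view, event)
    return view


# The log is the source of truth; the view is derived from it.
log = [("alice", 10), ("bob", 5), ("alice", 7)]
view = replay(log, apply_event)

# A live event goes through exactly the same code path.
log.append(("bob", 3))
apply_event(view, log[-1])

# After a logic change or a failure, the view is rebuilt by replaying.
rebuilt = replay(log, apply_event)
```

Because the live path and the replay path share `apply_event`, the incrementally updated `view` and the `rebuilt` one are identical, which is the property that removes the need for a separate batch layer.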
CONTACT
@sskhiri
@euranova
euranova.eu
research.euranova.eu
