SlideShare a Scribd company logo
1 of 54
Download to read offline
world of microservices
Flink in Zalando’s
Agenda
Zalando’s Microservices Architecture
Saiki - Data Integration and Distribution at Scale
Flink in a Microservices World
Stream Processing Use Cases:
Business Process Monitoring
Continuous ETL
Future Work
2
About us
Mihail Vieru
Big Data Engineer,
Business Intelligence
Javier López
Big Data Engineer,
Business Intelligence
3
4
15 countries
4 fulfillment centers
~18 million active customers
2.9 billion € revenue 2015
150,000+ products
10,000+ employees
One of Europe's largest
online fashion retailers
Zalando Technology
1100+ TECHNOLOGISTS
Rapidly growing
international team
http://tech.zalando.com
6
VINTAGE ARCHITECTURE
Classical ETL process
Vintage Business Intelligence
Business Logic
Data Warehouse (DWH)
DatabaseDBA
BI
Business Logic
Database
Business Logic
Database
Business Logic
Database
Dev
8
Vintage Business Intelligence
Very stable architecture that is still in use in the oldest
components
Classical ETL process
• Use-case specific
• Usually outputs data into a Data Warehouse
• well structured
• easy to use by the end user (SQL) 9
RADICAL AGILITY
AUTONOMY
MASTERY
PURPOSE
Radical Agility
11
Autonomy
Autonomous teams
• can choose own technology stack
• including persistence layer
• are responsible for operations
• should use isolated AWS accounts
12
Supporting autonomy — Microservices
Business
Logic
Database
RESTAPI
Business
Logic
Database
RESTAPI
Business
Logic
Database
RESTAPI
Business
Logic
Database
RESTAPI
Business
Logic
Database
RESTAPI
Business
Logic
Database
RESTAPI
Business
Logic
Database
RESTAPI
13
Supporting autonomy — Microservices
Business Logic
Database
Team
A
Business Logic
Database
Team
B
RESTAPI
RESTAPI
Applications communicate using REST APIs
Databases hidden behind the walls of AWS VPC
public Internet
14
Supporting autonomy — Microservices
Business Logic
Database
Team
A
Business Logic
Database
Team
B
Classical ETL process is impossible!
RESTAPI
public Internet
RESTAPI
15
Supporting autonomy — Microservices
Business
Logic
Database
RESTAPI
AppA
Business
Logic
Database
RESTAPI
AppB
Business
Logic
Database
RESTAPI
AppC
Business
Logic
Database
RESTAPI
AppD
16
Supporting autonomy — Microservices
Business Intelligence
Business
Logic
Database
RESTAPI
AppA
Business
Logic
Database
RESTAPI
AppB
Business
Logic
Database
RESTAPI
AppC
Business
Logic
Database
RESTAPI
AppD
17
Supporting autonomy — Microservices
App A App B App DApp C BI
Data Warehouse
?
18
SAIKI
SAIKI
Saiki Data Platform
App A App B App DApp C BI
Data Warehouse
20
SAIKI
Saiki — Data Integration & Distribution
App A App B App DApp C BI
Data Warehouse
Buku
21
SAIKI
Saiki — Data Integration & Distribution
App A App B App DApp C BI
Data Warehouse
Buku
AWS S3
Tukang
REST API
22
SAIKI
Saiki — Data Integration & Distribution
App A App B App DApp C BI
Data Warehouse
Buku Tukang
REST API
AWS S3
23
E.g. Forecast DB
Saiki — Data Integration & Distribution
Old Load Process New Load Process
relied on Delta Loads relies on Event Stream
JDBC connection RESTful HTTPS connections
Data Quality could be controlled by BI
independently
trust for Correctness of Data in the
delivery teams
PostgreSQL dependent Independent of the source technology
stack
N to 1 data stream N to M streams, no single data sink
24
BI
Data Warehouse E.g. Forecast DB
SAIKI
Saiki — Data Integration & Distribution
App A App B App DApp C
Buku Tukang
REST API
AWS S3
25
BI
Data Warehouse E.g. Forecast DB
SAIKI
Saiki — Data Integration & Distribution
App A App B App DApp C
Buku Tukang
REST API
Stream Processing
via Apache Flink
AWS S3
26
BI
Data Warehouse E.g. Forecast DB
SAIKI
Saiki — Data Integration & Distribution
App A App B App DApp C
Buku Tukang
REST API
Stream Processing
via Apache Flink Data Lake .
AWS S3
27
FLINK IN A
MICROSERVICES
WORLD
Current state in BI
29
• Centralized ETL tools
(Pentaho DI/Kettle and Oracle Stored Procedures)
• Reporting data does not arrive in real time
• Data to data science & analytical teams is provided using
Exasol DB
• All data comes from SQL databases
Opportunities for Next Gen BI
• Cloud Computing
• Distributed ETL
• Scale
• Access to real time data
• All teams publishing data to central bus (Kafka)
30
Opportunities for Next Gen BI
• Hub for Data teams
• Data Lake provides distributed access and fine
grained security
• Data can be transformed (aggregated, joined, etc.)
before delivering it to data teams
• Non structured data
• “General-purpose data processing engines like Flink or
Spark let you define own data types and functions.” -
Fabian Hueske 31
The right fit
Stream Processing
32
The right fit - Data processing engine (I)
33
Feature Apache Spark 1.5.2 Apache Flink 0.10.1
processing mode micro-batching tuple at a time
temporal processing
support
processing time event time, ingestion time, processing
time
batch processing yes yes
latency seconds sub-second
back pressure handling through manual configuration implicit, through system architecture
state storage distributed dataset in memory distributed key/value store
state access full state scan for each microbatch value lookup by key
state checkpointing yes yes
high availability mode yes yes
The right fit - Data processing engine (II)
34
Feature Apache Spark 1.5.2 Apache Flink 0.10.1
event processing guarantee exactly once exactly once
Apache Kafka connector yes yes
Amazon S3 connector yes yes
operator library neutral ++ (split, windowByCount..)
language support Java, Scala, Python Java, Scala, Python
deployment standalone, YARN, Mesos standalone, YARN
support neutral ++ (mailing list, direct contact & support
from data Artisans)
documentation + neutral
maturity ++ neutral
Saiki Data Platform
Apache Flink
• true stream processing framework
• process events at a consistently high rate with low latency
• scalable
• great community and on-site support from Berlin/ Europe
• university graduates with Flink skills
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
35
Flink in AWS - Our appliance
36
MASTER ELB
EC2
Docker
Flink Master
EC2
Docker
Flink Shadow Master
WORKERS ELB
EC2
Docker
Flink Worker
EC2
Docker
Flink Worker
EC2
Docker
Flink Worker
USE CASES
BUSINESS PROCESS
MONITORING
Business Process
A business process is in its simplest form a chain of
correlated events:
Business Events from the whole Zalando platform flow through
Saiki => opportunity to process those streams in near real-time
39
start event completion event
ORDER_CREATED ALL_SHIPMENTS_DONE
Near Real-Time Business Process Monitoring
• Check if technically the Zalando platform works
• 1000+ Event Types
• Analyze data on the fly:
• Order velocities
• Delivery velocities
• Control SLAs of correlated events, e.g. parcel sent out
after order
40
Architecture BPM
41
App A App B
Nakadi Event Bus
App C
Operational Systems
Kafka2Kafka
Unified Log
Stream Processing Tokoh
ZMON
Cfg Service
PUBLICINTERNET
OAUTH
Saiki BPM
Elasticsearch
How we use Flink in BPM
• 1 Event Type -> 1 Kafka topic
• Define a granularity level of 1 minute (TumblingWindows)
• Analyze processes with multiple event types (Join &
Union)
• Generation and processing of Complex Events (CEP &
State)
42
CONTINUOUS ETL
Extract Transform Load (ETL)
Traditional ETL process:
• Batch processing
• No real time
• ETL tools
• Heavy processing on the storage side
44
What changed with Radical Agility?
• Data comes in a semi-structured format (JSON payload)
• Data is distributed in separate Kafka topics
• There would be peak times, meaning that the data flow
will increase by several factors
• Data sources number increased by several factors
45
Architecture Continuous ETL
46
App A App B
Nakadi Event Bus
App C
Operational Systems
Kafka2Kafka
Unified Log
Stream Processing Saiki Continuous ETL
Tukang
Oracle DWH
Orang
How we (would) use Flink in Continuous ETL
• Transformation of complex payloads into simple ones for
easier consumption in Oracle DWH
• Combine several topics based on Business Rules (Union,
Join)
• Pre-Aggregate data to improve performance in the
generation of reports (Windows, State)
• Data cleansing
• Data validation
47
FUTURE WORK
Complex Event Processing for BPM
Cont. example business process:
• Multiple PARCEL_SHIPPED events per order
• Generate complex event ALL_SHIPMENTS_DONE, when
all PARCEL_SHIPPED events received
(CEP lib, State)
49
Deployments from other BI Teams
Flink Jobs from other BI Teams
Requirements:
• manage and control deployments
• isolation of data flows
• prevent different jobs from writing to the same sink
• resource management in Flink
• share cluster resources among concurrently running jobs
StreamSQL would significantly lower the entry barrier
50
Replace Kafka2Kafka
• Python app
• extracts events from REST API Nakadi Event Bus
• writes them to our Kafka cluster
Idea: Replace Python app with Flink Jobs, which would write
directly to Saiki’s Kafka cluster (under discussion)
51
Other Future Topics
• CD integration for Flink appliance
• Monitoring data flows via heartbeats
• Flink <-> Kafka
• Flink <-> Elasticsearch
• New use cases for Real Time Analytics/ BI
• Sales monitoring
52
THANK YOU
Open Source @ZalandoTech
https://zalando.github.io/
https://tech.zalando.de/blog
https://github.com/zalando/saiki/wiki
54
QUESTIONS?

More Related Content

What's hot

Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Databricks
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoopgregchanan
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovBig Data Spain
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with SparkVincent GALOPIN
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming宇 傅
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Wee Hyong Tok
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story Roman Chukh
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Juan Pedro Moreno
 
Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPDatabricks
 
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Databricks
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 

What's hot (20)

Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLP
 
Quark Virtualization Engine for Analytics
Quark Virtualization Engine for Analytics Quark Virtualization Engine for Analytics
Quark Virtualization Engine for Analytics
 
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 

Viewers also liked

The ultimate benchmark
The ultimate benchmarkThe ultimate benchmark
The ultimate benchmarkarif1948
 
John Resume17(default)
John Resume17(default)John Resume17(default)
John Resume17(default)John Hinton
 
Paul charife-allen resume-it security
Paul charife-allen resume-it securityPaul charife-allen resume-it security
Paul charife-allen resume-it securityPaul-Charife Allen
 
WEF_FinancialDevelopmentReport_2010
WEF_FinancialDevelopmentReport_2010WEF_FinancialDevelopmentReport_2010
WEF_FinancialDevelopmentReport_2010Eva Gustavsson
 
Dispositivos De Almacenamiento, Maria Belen
Dispositivos De Almacenamiento, Maria BelenDispositivos De Almacenamiento, Maria Belen
Dispositivos De Almacenamiento, Maria Belenguestecf5a92c
 
Daily Forex News February 18th 2013
Daily Forex News February 18th 2013Daily Forex News February 18th 2013
Daily Forex News February 18th 2013FCTOFX
 
ποντιακη κουζινα
ποντιακη κουζιναποντιακη κουζινα
ποντιακη κουζιναeirkara
 
Diagrama de actividades power point
Diagrama de actividades power pointDiagrama de actividades power point
Diagrama de actividades power pointarteaga22
 
Origen Del Relieve
Origen Del RelieveOrigen Del Relieve
Origen Del Relievemanuelmch
 
Nuevas tecnologías en el estudio del medio ambiente
Nuevas tecnologías en el estudio del medio ambienteNuevas tecnologías en el estudio del medio ambiente
Nuevas tecnologías en el estudio del medio ambientegabrielavargas123
 

Viewers also liked (13)

The ultimate benchmark
The ultimate benchmarkThe ultimate benchmark
The ultimate benchmark
 
Alhuda CIBE - UK Real Estate Investment Trust an overview
Alhuda CIBE - UK Real Estate Investment Trust an overviewAlhuda CIBE - UK Real Estate Investment Trust an overview
Alhuda CIBE - UK Real Estate Investment Trust an overview
 
John Resume17(default)
John Resume17(default)John Resume17(default)
John Resume17(default)
 
Paul charife-allen resume-it security
Paul charife-allen resume-it securityPaul charife-allen resume-it security
Paul charife-allen resume-it security
 
WEF_FinancialDevelopmentReport_2010
WEF_FinancialDevelopmentReport_2010WEF_FinancialDevelopmentReport_2010
WEF_FinancialDevelopmentReport_2010
 
Dispositivos De Almacenamiento, Maria Belen
Dispositivos De Almacenamiento, Maria BelenDispositivos De Almacenamiento, Maria Belen
Dispositivos De Almacenamiento, Maria Belen
 
CV ERZALINA PDF
CV ERZALINA PDFCV ERZALINA PDF
CV ERZALINA PDF
 
Daily Forex News February 18th 2013
Daily Forex News February 18th 2013Daily Forex News February 18th 2013
Daily Forex News February 18th 2013
 
ποντιακη κουζινα
ποντιακη κουζιναποντιακη κουζινα
ποντιακη κουζινα
 
Diagrama de actividades power point
Diagrama de actividades power pointDiagrama de actividades power point
Diagrama de actividades power point
 
Origen Del Relieve
Origen Del RelieveOrigen Del Relieve
Origen Del Relieve
 
Nuevas tecnologías en el estudio del medio ambiente
Nuevas tecnologías en el estudio del medio ambienteNuevas tecnologías en el estudio del medio ambiente
Nuevas tecnologías en el estudio del medio ambiente
 
Parazitologie: Toxoplasmoza
Parazitologie: ToxoplasmozaParazitologie: Toxoplasmoza
Parazitologie: Toxoplasmoza
 

Similar to Flink in Zalando's world of Microservices

Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Zalando Technology
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product AnnouncementFlink Forward
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
What's new in Elasticsearch v5
What's new in Elasticsearch v5What's new in Elasticsearch v5
What's new in Elasticsearch v5Idan Tohami
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionRittman Analytics
 
ETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsC4Media
 
Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018Synergo!
 
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...Lucas Jellema
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureGyula Fóra
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedInAmy W. Tang
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFlink Forward
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Implyconfluent
 

Similar to Flink in Zalando's world of Microservices (20)

Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
data Artisans Product Announcement
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
What's new in Elasticsearch v5
What's new in Elasticsearch v5What's new in Elasticsearch v5
What's new in Elasticsearch v5
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
 
ETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live Streams
 
Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018
 
AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...
AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...
AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...
 
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Flink in Zalando's world of Microservices

  • 2. Agenda Zalando’s Microservices Architecture Saiki - Data Integration and Distribution at Scale Flink in a Microservices World Stream Processing Use Cases: Business Process Monitoring Continuous ETL Future Work 2
  • 3. About us Mihail Vieru Big Data Engineer, Business Intelligence Javier López Big Data Engineer, Business Intelligence 3
  • 4. 4
  • 5. 15 countries 4 fulfillment centers ~18 million active customers 2.9 billion € revenue 2015 150,000+ products 10,000+ employees One of Europe's largest online fashion retailers
  • 6. Zalando Technology 1100+ TECHNOLOGISTS Rapidly growing international team http://tech.zalando.com 6
  • 8. Classical ETL process Vintage Business Intelligence Business Logic Data Warehouse (DWH) DatabaseDBA BI Business Logic Database Business Logic Database Business Logic Database Dev 8
  • 9. Vintage Business Intelligence Very stable architecture that is still in use in the oldest components Classical ETL process • Use-case specific • Usually outputs data into a Data Warehouse • well structured • easy to use by the end user (SQL) 9
  • 12. Autonomy Autonomous teams • can choose own technology stack • including persistence layer • are responsible for operations • should use isolated AWS accounts 12
  • 13. Supporting autonomy — Microservices Business Logic Database RESTAPI Business Logic Database RESTAPI Business Logic Database RESTAPI Business Logic Database RESTAPI Business Logic Database RESTAPI Business Logic Database RESTAPI Business Logic Database RESTAPI 13
  • 14. Supporting autonomy — Microservices Business Logic Database Team A Business Logic Database Team B RESTAPI RESTAPI Applications communicate using REST APIs Databases hidden behind the walls of AWS VPC public Internet 14
  • 15. Supporting autonomy — Microservices Business Logic Database Team A Business Logic Database Team B Classical ETL process is impossible! RESTAPI public Internet RESTAPI 15
  • 16. Supporting autonomy — Microservices Business Logic Database RESTAPI AppA Business Logic Database RESTAPI AppB Business Logic Database RESTAPI AppC Business Logic Database RESTAPI AppD 16
  • 17. Supporting autonomy — Microservices Business Intelligence Business Logic Database RESTAPI AppA Business Logic Database RESTAPI AppB Business Logic Database RESTAPI AppC Business Logic Database RESTAPI AppD 17
  • 18. Supporting autonomy — Microservices App A App B App DApp C BI Data Warehouse ? 18
  • 19. SAIKI
  • 20. SAIKI Saiki Data Platform App A App B App DApp C BI Data Warehouse 20
  • 21. SAIKI Saiki — Data Integration & Distribution App A App B App DApp C BI Data Warehouse Buku 21
  • 22. SAIKI Saiki — Data Integration & Distribution App A App B App DApp C BI Data Warehouse Buku AWS S3 Tukang REST API 22
  • 23. SAIKI Saiki — Data Integration & Distribution App A App B App DApp C BI Data Warehouse Buku Tukang REST API AWS S3 23 E.g. Forecast DB
  • 24. Saiki — Data Integration & Distribution Old Load Process New Load Process relied on Delta Loads relies on Event Stream JDBC connection RESTful HTTPS connections Data Quality could be controlled by BI independently trust for Correctness of Data in the delivery teams PostgreSQL dependent Independent of the source technology stack N to 1 data stream N to M streams, no single data sink 24
  • 25. BI Data Warehouse E.g. Forecast DB SAIKI Saiki — Data Integration & Distribution App A App B App DApp C Buku Tukang REST API AWS S3 25
  • 26. BI Data Warehouse E.g. Forecast DB SAIKI Saiki — Data Integration & Distribution App A App B App DApp C Buku Tukang REST API Stream Processing via Apache Flink AWS S3 26
  • 27. BI Data Warehouse E.g. Forecast DB SAIKI Saiki — Data Integration & Distribution App A App B App DApp C Buku Tukang REST API Stream Processing via Apache Flink Data Lake . AWS S3 27
  • 29. Current state in BI 29 • Centralized ETL tools (Pentaho DI/Kettle and Oracle Stored Procedures) • Reporting data does not arrive in real time • Data to data science & analytical teams is provided using Exasol DB • All data comes from SQL databases
  • 30. Opportunities for Next Gen BI • Cloud Computing • Distributed ETL • Scale • Access to real time data • All teams publishing data to central bus (Kafka) 30
  • 31. Opportunities for Next Gen BI • Hub for Data teams • Data Lake provides distributed access and fine grained security • Data can be transformed (aggregated, joined, etc.) before delivering it to data teams • Non structured data • “General-purpose data processing engines like Flink or Spark let you define own data types and functions.” - Fabian Hueske 31
  • 32. The right fit Stream Processing 32
  • 33. The right fit - Data processing engine (I) 33 Feature Apache Spark 1.5.2 Apache Flink 0.10.1 processing mode micro-batching tuple at a time temporal processing support processing time event time, ingestion time, processing time batch processing yes yes latency seconds sub-second back pressure handling through manual configuration implicit, through system architecture state storage distributed dataset in memory distributed key/value store state access full state scan for each microbatch value lookup by key state checkpointing yes yes high availability mode yes yes
  • 34. The right fit - Data processing engine (II) 34 Feature Apache Spark 1.5.2 Apache Flink 0.10.1 event processing guarantee exactly once exactly once Apache Kafka connector yes yes Amazon S3 connector yes yes operator library neutral ++ (split, windowByCount..) language support Java, Scala, Python Java, Scala, Python deployment standalone, YARN, Mesos standalone, YARN support neutral ++ (mailing list, direct contact & support from data Artisans) documentation + neutral maturity ++ neutral
  • 35. Saiki Data Platform Apache Flink • true stream processing framework • process events at a consistently high rate with low latency • scalable • great community and on-site support from Berlin/ Europe • university graduates with Flink skills https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/ 35
  • 36. Flink in AWS - Our appliance 36 MASTER ELB EC2 Docker Flink Master EC2 Docker Flink Shadow Master WORKERS ELB EC2 Docker Flink Worker EC2 Docker Flink Worker EC2 Docker Flink Worker
  • 39. Business Process A business process is in its simplest form a chain of correlated events: Business Events from the whole Zalando platform flow through Saiki => opportunity to process those streams in near real-time 39 start event completion event ORDER_CREATED ALL_SHIPMENTS_DONE
  • 40. Near Real-Time Business Process Monitoring • Check if technically the Zalando platform works • 1000+ Event Types • Analyze data on the fly: • Order velocities • Delivery velocities • Control SLAs of correlated events, e.g. parcel sent out after order 40
  • 41. Architecture BPM 41 App A App B Nakadi Event Bus App C Operational Systems Kafka2Kafka Unified Log Stream Processing Tokoh ZMON Cfg Service PUBLICINTERNET OAUTH Saiki BPM Elasticsearch
  • 42. How we use Flink in BPM • 1 Event Type -> 1 Kafka topic • Define a granularity level of 1 minute (TumblingWindows) • Analyze processes with multiple event types (Join & Union) • Generation and processing of Complex Events (CEP & State) 42
  • 44. Extract Transform Load (ETL) Traditional ETL process: • Batch processing • No real time • ETL tools • Heavy processing on the storage side 44
  • 45. What changed with Radical Agility? • Data comes in a semi-structured format (JSON payload) • Data is distributed in separate Kafka topics • There would be peak times, meaning that the data flow will increase by several factors • Data sources number increased by several factors 45
  • 46. Architecture Continuous ETL 46 App A App B Nakadi Event Bus App C Operational Systems Kafka2Kafka Unified Log Stream Processing Saiki Continuous ETL Tukang Oracle DWH Orang
  • 47. How we (would) use Flink in Continuous ETL • Transformation of complex payloads into simple ones for easier consumption in Oracle DWH • Combine several topics based on Business Rules (Union, Join) • Pre-Aggregate data to improve performance in the generation of reports (Windows, State) • Data cleansing • Data validation 47
  • 49. Complex Event Processing for BPM Cont. example business process: • Multiple PARCEL_SHIPPED events per order • Generate complex event ALL_SHIPMENTS_DONE, when all PARCEL_SHIPPED events received (CEP lib, State) 49
  • 50. Deployments from other BI Teams Flink Jobs from other BI Teams Requirements: • manage and control deployments • isolation of data flows • prevent different jobs from writing to the same sink • resource management in Flink • share cluster resources among concurrently running jobs StreamSQL would significantly lower the entry barrier 50
  • 51. Replace Kafka2Kafka • Python app • extracts events from REST API Nakadi Event Bus • writes them to our Kafka cluster Idea: Replace Python app with Flink Jobs, which would write directly to Saiki’s Kafka cluster (under discussion) 51
  • 52. Other Future Topics • CD integration for Flink appliance • Monitoring data flows via heartbeats • Flink <-> Kafka • Flink <-> Elasticsearch • New use cases for Real Time Analytics/ BI • Sales monitoring 52