SlideShare a Scribd company logo
1 of 34
KONSTANTIN KNAUF, SOLUTION ARCHITECT
STREAM PROCESSING
TAKES ON EVERYTHING
© 2018 data Artisans2
About Data Artisans
Original Creators of
Apache Flink®
Enterprise Ready
Real Time Stream Processing
© 2018 data Artisans3
Stream Processing
Your
Code
one at a time
event processing
...
State
© 2018 data Artisans4
Streaming Applications over Time
Real time
Approximate
Analytics
Fraud
Detection
Online
Machine
Learning
Realtime
ETL
Intrusion
Prevention
Financial Risk &
Reporting
Real time
dashboards
Anomaly
Detection
Logistics
Tracking
Recommender
Systems
Web
Applications
Masterdata
Management
Continuous
Processing
Continuous
Processing
and Analytics
Unification of
Analytics and
Applications
Data-driven
Applications
© 2018 data Artisans5
Enablers of new Applications
Abstractions,
APIs
Consistency
Event-time, streaming SQL,
state & time, CEP
Exactly-once,
savepoints
Interoperability
Deployment, Connectors,
Operations
Scalability
high parallelism, large
state
Scalable timers,
dynamic scaling,
local recovery, …
Framework and Library,
REST-ified Flink,
SQL Client, …
© 2018 data Artisans6
STREAM PROCESSING
TAKES ON ACID
© 2018 data Artisans7
Exactly-once Changed Applications
Stateless
Application
K/V Store
CRUD / request/response
Applications
Streaming
Application
State
Stateful Stream
Processing Applications
… become …
© 2018 data Artisans8
Some Applications don't move to Stream
Processing
Application
Relational Database
© 2018 data Artisans9
Limitation of Current Stream Processors
Transferring money from one account (key) to
another with transactional guarantees is not feasible
The Limitation Example
All stream processors so far can
update a single key-at-a-time
with correctness guarantees
(exactly once)
© 2018 data Artisans10
With dA Streaming Ledger you can ...
•… access and update state with multiple keys at the same time
•… maintain full isolation/correctness for the multi-key operations
•… operate on multiple states at the same time
•… share the states between multiple streams
© 2018 data Artisans11
‒ Atomicity: the transfer affects
either both accounts or none
‒ Consistency: the transfer must
only happen if the account
have sufficient funds
‒ Isolation: no other operation
can interfere and cause an
incorrect result
‒ Durability: the result of the
transfer is durable
ACID Transactions for Multi-key Stream Processing
Streaming Ledger provides ACID guarantees
across multiple states, rows, and streams
© 2018 data Artisans12
Example: Transferring Cash/Assets between
Accounts
© 2018 data Artisans13
Example: Position-keeping, Reporting, Risk
Management in Investment Banking
© 2018 data Artisans14
A Library on top of Apache Flink
• https://github.com/dataArtisans/da-
streamingledger
• No additional dependencies needed
• Seamlessly integrates and composes with
DataStream API and SQL
• Read from- and write to all Flink
connectors
• Supports savepoints for upgrades
© 2018 data Artisans15
STREAM PROCESSING
TAKES ON SQL
© 2018 data Artisans16
StreamSQL in Flink
Flink APIs
Stream/Batch Processing
Runtime
Distributed Streaming Data Flow
Java/Scala
© 2018 data Artisans17
Flink APIs
Stream/Batch Processing
Runtime
Distributed Streaming Data Flow
Java/Scala SQL
StreamSQL in Flink
● SQL Command Line Client
○ https://github.com/dataArtisans/sql-training
● Event & Processing Time
● Configuration in YAML
● Source/Sink Definition in YAML
● User-defined functions
● Streaming and Batch
© 2018 data Artisans18
https://github.com/dataArtisans/sql-training
© 2018 data Artisans19
Join
Enrichment Joins
bu
y
bu
y
sell
bu
y
bu
y
sell
$ 17
£ 42
12.5₪
© 2018 data Artisans20
Join
Temporal Table Joins
bu
y
bu
y
sell
bu
y
bu
y
sell
1453
31753
14
curr rate time
£ 42 3
£ 12 17
© 2018 data Artisans21
SELECT * from ?
Complex Event Processing with SQL
© 2018 data Artisans22
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rideTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S M{2,} E)
DEFINE
S AS S.isStart = true,
M AS M.rideId <> S.rideId,
E AS E.isStart = false
AND E.rideId = S.rideId
)
Introducing MATCH_RECOGNIZE
© 2018 data Artisans23
TECHNICAL ENABLERS MATTER
© 2018 data Artisans24
Resource Efficient, Scalable, Real-Time Services
based on
The Promise of Stateful Stream Processing
Parallel Computation on Local State
© 2018 data Artisans25
Local State leads to Scalability
Stateless
Application
Database
Streaming
Application
State
… become …
● database needs to be
scaled in addition to
application
● in pratice database often
becomes the bottleneck
● compute and storage
are scaled alongside
© 2018 data Artisans26
Local State leads to Performance
Stateless
Application
Database
Streaming
Application
State
… become …
● reads/write over tier
boundaries
● local state
● asynchronous writes of
large blobs for durability
© 2018 data Artisans27
Local State leads to Simpler Operations
Stateless
Application
Database
Streaming
Application
State
… become …
● additional database
operations
● new service requries
additional database
● only additional backup
storage needed
© 2018 data Artisans28
Local State leads to Consistency
Stateless
Application
Database
Streaming
Application
State
… become …
● distributed transactions
● at scale usually low
isolation and consistency
guarantees
● exactly-once
● multi-key multi-table
serializable transaction
© 2018 data Artisans29
Resource Efficient, Scalable, Real-Time Services
● Scalability
● Performance
based on
The Promise of Stateful Stream Processing
Parallel Computation on Local State
● Operational Simplicity
● Consistency
THANK YOU!
@snntrable
@dataArtisans
@ApacheFlink
WE ARE HIRING
data-artisans.com/careers
© 2018 data Artisans31
BACKUP
© 2018 data Artisans32
DATA ARTISANS PLATFORM OVERVIEW
Disclaimer: Apache Flink is not a product of data Artisans. It is a project by the Apache Software Foundation.
© 2018 data Artisans33
Performance (early results) - Scalability
(parallelism)
200 million rows
100% update
queries
4 rows
written/query
© 2018 data Artisans34
Performance (early results) – Key Contention
Artificial case of
extreme contention:
4 x 200,000
= 800,000 updates/sec
on the same 1,000 keys
Slowdown, but does not
break down like
optimistic concurrency
approaches
100% update
queries
4 written/query

More Related Content

What's hot

Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
confluent
 
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
confluent
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Flink Forward
 
Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams
confluent
 

What's hot (20)

Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streams
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
 
Pipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and GraphPipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and Graph
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
 
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
 
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projects
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processing
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
 
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
 
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
 
Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams
 

Similar to Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING

Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
DataWorks Summit
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward
 

Similar to Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING (20)

The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache Flink
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
YugaByte + PKS CloudFoundry Meetup 10/15/2018
YugaByte + PKS CloudFoundry Meetup 10/15/2018YugaByte + PKS CloudFoundry Meetup 10/15/2018
YugaByte + PKS CloudFoundry Meetup 10/15/2018
 
Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
 
Event Horizon at Solace Connect Singapore
Event Horizon at Solace Connect SingaporeEvent Horizon at Solace Connect Singapore
Event Horizon at Solace Connect Singapore
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
 
IT Modernization in Practice
IT Modernization in PracticeIT Modernization in Practice
IT Modernization in Practice
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 

More from Matt Stubbs

Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Matt Stubbs
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Matt Stubbs
 

More from Matt Stubbs (20)

Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Blueprint Series: Banking In The Cloud – Ultra-high Reliability ArchitecturesBlueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
 
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
 
Blueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data PlatformBlueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data Platform
 
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
 
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
 
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCEBig Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
 
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQLBig Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
 
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTSBig Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
 
Big Data LDN 2018: AI VS. GDPR
Big Data LDN 2018: AI VS. GDPRBig Data LDN 2018: AI VS. GDPR
Big Data LDN 2018: AI VS. GDPR
 
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
 
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...
 
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
 
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICSBig Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
 
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSEBig Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
 
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNINGBig Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
 
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
 
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATEBig Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE
 

Recently uploaded

Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
pyhepag
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
Amil baba
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
ppy8zfkfm
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
fztigerwe
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
dq9vz1isj
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
 
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
0uyfyq0q4
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 

Recently uploaded (20)

Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
123.docx. .
123.docx.                                 .123.docx.                                 .
123.docx. .
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 

Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING

  • 1. KONSTANTIN KNAUF, SOLUTION ARCHITECT STREAM PROCESSING TAKES ON EVERYTHING
  • 2. © 2018 data Artisans2 About Data Artisans Original Creators of Apache Flink® Enterprise Ready Real Time Stream Processing
  • 3. © 2018 data Artisans3 Stream Processing Your Code one at a time event processing ... State
  • 4. © 2018 data Artisans4 Streaming Applications over Time Real time Approximate Analytics Fraud Detection Online Machine Learning Realtime ETL Intrusion Prevention Financial Risk & Reporting Real time dashboards Anomaly Detection Logistics Tracking Recommender Systems Web Applications Masterdata Management Continuous Processing Continuous Processing and Analytics Unification of Analytics and Applications Data-driven Applications
  • 5. © 2018 data Artisans5 Enablers of new Applications Abstractions, APIs Consistency Event-time, streaming SQL, state & time, CEP Exactly-once, savepoints Interoperability Deployment, Connectors, Operations Scalability high parallelism, large state Scalable timers, dynamic scaling, local recovery, … Framework and Library, REST-ified Flink, SQL Client, …
  • 6. © 2018 data Artisans6 STREAM PROCESSING TAKES ON ACID
  • 7. © 2018 data Artisans7 Exactly-once Changed Applications Stateless Application K/V Store CRUD / request/response Applications Streaming Application State Stateful Stream Processing Applications … become …
  • 8. © 2018 data Artisans8 Some Applications don't move to Stream Processing Application Relational Database
  • 9. © 2018 data Artisans9 Limitation of Current Stream Processors Transferring money from one account (key) to another with transactional guarantees is not feasible The Limitation Example All stream processors so far can update a single key-at-a-time with correctness guarantees (exactly once)
  • 10. © 2018 data Artisans10 With dA Streaming Ledger you can ... •… access and update state with multiple keys at the same time •… maintain full isolation/correctness for the multi-key operations •… operate on multiple states at the same time •… share the states between multiple streams
  • 11. © 2018 data Artisans11 ‒ Atomicity: the transfer affects either both accounts or none ‒ Consistency: the transfer must only happen if the account have sufficient funds ‒ Isolation: no other operation can interfere and cause an incorrect result ‒ Durability: the result of the transfer is durable ACID Transactions for Multi-key Stream Processing Streaming Ledger provides ACID guarantees across multiple states, rows, and streams
  • 12. © 2018 data Artisans12 Example: Transferring Cash/Assets between Accounts
  • 13. © 2018 data Artisans13 Example: Position-keeping, Reporting, Risk Management in Investment Banking
  • 14. © 2018 data Artisans14 A Library on top of Apache Flink • https://github.com/dataArtisans/da- streamingledger • No additional dependencies needed • Seamlessly integrates and composes with DataStream API and SQL • Read from- and write to all Flink connectors • Supports savepoints for upgrades
  • 15. © 2018 data Artisans15 STREAM PROCESSING TAKES ON SQL
  • 16. © 2018 data Artisans16 StreamSQL in Flink Flink APIs Stream/Batch Processing Runtime Distributed Streaming Data Flow Java/Scala
  • 17. © 2018 data Artisans17 Flink APIs Stream/Batch Processing Runtime Distributed Streaming Data Flow Java/Scala SQL StreamSQL in Flink ● SQL Command Line Client ○ https://github.com/dataArtisans/sql-training ● Event & Processing Time ● Configuration in YAML ● Source/Sink Definition in YAML ● User-defined functions ● Streaming and Batch
  • 18. © 2018 data Artisans18 https://github.com/dataArtisans/sql-training
  • 19. © 2018 data Artisans19 Join Enrichment Joins bu y bu y sell bu y bu y sell $ 17 £ 42 12.5₪
  • 20. © 2018 data Artisans20 Join Temporal Table Joins bu y bu y sell bu y bu y sell 1453 31753 14 curr rate time £ 42 3 £ 12 17
  • 21. © 2018 data Artisans21 SELECT * from ? Complex Event Processing with SQL
  • 22. © 2018 data Artisans22 SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rideTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S M{2,} E) DEFINE S AS S.isStart = true, M AS M.rideId <> S.rideId, E AS E.isStart = false AND E.rideId = S.rideId ) Introducing MATCH_RECOGNIZE
  • 23. © 2018 data Artisans23 TECHNICAL ENABLERS MATTER
  • 24. © 2018 data Artisans24 Resource Efficient, Scalable, Real-Time Services based on The Promise of Stateful Stream Processing Parallel Computation on Local State
  • 25. © 2018 data Artisans25 Local State leads to Scalability Stateless Application Database Streaming Application State … become … ● database needs to be scaled in addition to application ● in pratice database often becomes the bottleneck ● compute and storage are scaled alongside
  • 26. © 2018 data Artisans26 Local State leads to Performance Stateless Application Database Streaming Application State … become … ● reads/write over tier boundaries ● local state ● asynchronous writes of large blobs for durability
  • 27. © 2018 data Artisans27 Local State leads to Simpler Operations Stateless Application Database Streaming Application State … become … ● additional database operations ● new service requries additional database ● only additional backup storage needed
  • 28. © 2018 data Artisans28 Local State leads to Consistency Stateless Application Database Streaming Application State … become … ● distributed transactions ● at scale usually low isolation and consistency guarantees ● exactly-once ● multi-key multi-table serializable transaction
  • 29. © 2018 data Artisans29 Resource Efficient, Scalable, Real-Time Services ● Scalability ● Performance based on The Promise of Stateful Stream Processing Parallel Computation on Local State ● Operational Simplicity ● Consistency
  • 31. © 2018 data Artisans31 BACKUP
  • 32. © 2018 data Artisans32 DATA ARTISANS PLATFORM OVERVIEW Disclaimer: Apache Flink is not a product of data Artisans. It is a project by the Apache Software Foundation.
  • 33. © 2018 data Artisans33 Performance (early results) - Scalability (parallelism) 200 million rows 100% update queries 4 rows written/query
  • 34. © 2018 data Artisans34 Performance (early results) – Key Contention Artificial case of extreme contention: 4 x 200,000 = 800,000 updates/sec on the same 1,000 keys Slowdown, but does not break down like optimistic concurrency approaches 100% update queries 4 written/query