SlideShare a Scribd company logo
1 of 26
Download to read offline
Introduction to stream processing
with Apache Flink
Seif Haridi
KTH/SICS
Stream processing
2Data Science Summit 2015
Why streaming
3
Data
Warehouse
Batch
Data availability Streaming
2008 20152000
- Which data?
- When?
- Who?
Data Science Summit 2015 S. Haridi
3 Parts of a Streaming Infrastructure
4
Gathering Broker Analysis
Sensors
Transaction
logs …
Server Logs
Data Science Summit 2015 S. Haridi
Example: Bouygues Telecom
5Data Science Summit 2015 S. Haridi
• Network and subscriber data
gathered
• Added to Broker in raw format
• Transformed and analyzed by
streaming engine
• Stored back for further procesing
http://data-artisans.com/flink-at-bouygues.html
What is Apache Flink?
6Data Science Summit 2015
1 year of Flink - code
April 2014 April 2015
Data Science Summit 2015 S. Haridi 7
What is Apache Flink
8
Distributed Data Flow Processing System
▪Focused on large-scale data analytics
▪Unified real-time stream and batch processing
▪Expressive and rich APIs in Java / Scala (+ Python)
▪Robust and fast execution backend
Reduce
Join
Filter
Reduce
Map
Iterate
Source
Sink
Source
Data Science Summit 2015 S. Haridi
Flink Stack
9
Gelly
Table
ML
SAMOA
DataSet (Java/Scala) DataStream (Java/Scala)
HadoopM/R
Local Cluster Yarn
Tez
Embedded
Dataflow
Dataflow
Table
Streaming dataflow
runtime
Storm
Zeppelin
Data Science Summit 2015 S. Haridi
Stream Processing with Flink
10Data Science Summit 2015
What is Flink Streaming
11
 Native, low-latency stream processor
 Expressive functional API
 Flexible operator state, iterations, windows
 Exactly-once processing semantics
Data Science Summit 2015 S. Haridi
Native vs non-native streaming
12
Stream
discretizer
Job Job Job Jobwhile (true) {
// get next few records
// issue batch computation
}
Non-native streaming
while (true) {
// process next record
}
Long-standing
operators
Native streaming
Data Science Summit 2015 S. Haridi
Stream processing in Flink
 Continuous Streaming model
 Low processing latency
 O(1) state updates per operator
 Exactly once semantics for state
operators
Data Science Summit 2015 S. Haridi 13
DataStream API
14Data Science Summit 2015
Overview of the API
15Data Science Summit 2015 S. Haridi
Windowing Semantics
16
• Trigger and Eviction policies
• window(<eviction>).every(<trigger>)
• Built-in policies:
– Time: Time.of(length, TimeUnit/Custom timestamp)
– window(Time.of(20, SECONDS))
– Count: Count.of(windowSize)
– window(Count.of(20)).every(Count.of(10))
– Delta: Delta.of(Threshold, Distance function, Start value)
– window(Delta.of(0.1, priceDistanceFun, initPrice)
Data Science Summit 2015 S. Haridi
Word count in Batch and Streaming
17
case class Word (word: String, frequency: Int)
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ")
.map(word => Word(word,1))}
.keyBy("word”).window(Time.of(5,SECONDS))
.every(Time.of(1,SECONDS)).sum("frequency")
.print()
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap {line => line.split(" ")
.map(word => Word(word,1))}
.groupBy("word").sum("frequency")
.print()
DataSet API (batch):
DataStream API (streaming):
Data Science Summit 2015 S. Haridi
Flexible windows
18
More at: http://flink.apache.org/news/2015/02/09/streaming-example.html
Keyed Stream Windowed StreamData Stream Keyed Stream Windowed Stream
 Stream of stocks
 Trigger warning if price fluctuates by 5%
 Count the number of warnings per stock in
30 second (tumbling) window
 Do it continuously
Data Science Summit 2015 S. Haridi
Stock
Stream
Delta 5%
of price
Warning Count
30 sec
window Sum
keyBy
symbol
keyBy
symbol
Flexible windows
19
More at: http://flink.apache.org/news/2015/02/09/streaming-example.html
case class Count(symbol: String, count:
Int)
val defaultPrice = StockPrice(“”, 1000)
val priceWarnings =
stockStream.keyBy(“symbol”)
.window(Delta.of(0.05, priceChange,
defaultPrice)
.mapWindow(sendWarning _)
Use delta policy to create
change warnings
Count number of warning
per stock every half a minute
val warningPerStock = priceWarnings.flatten()
.map(Count(_, 1))
.keyBy(“symbol”)
.window(Time.of(30, SECONDS))
.sum(“count”)
Data Science Summit 2015 S. Haridi
Stock
Stream
Delta 5%
of price
Warning Count
30 sec
window
Sum
keyBy
symbol
keyBy
symbol
Iterative stream processing
20
Motivation
 Many applications require cyclic
streams
 Machine learning applications (parallel
model training, evaluation)
Iterations in Flink Streaming
 Native support for cyclic dataflows
 Integrated with functional API
 High performance and expressivity
Input
Train
Evaluate
Data Science Summit 2015 S. Haridi
Fault tolerance
21Data Science Summit 2015
Exactly-once processing in for operator
state
22
 Based on consistent global snapshots
 Low runtime overhead, stateful exactly-
once semantics
Data Science Summit 2015 S. Haridi
Checkpointing / Recovery
23
Detailed algorithm: Lightweight Asynchronous Snapshots for Distributed Dataflows
Data Science Summit 2015 S. Haridi
Fault tolerance
 Check-pointing and recovery of operator state
is very fast
• Data processing does not block
 Executions based on CPU/operator time are
not idempotent
 Other execution modes are based on
timestamps of input streams (Event/Ingress
time)
• Allows idempotent executions
• End-to-End exactly-once semantics
• In Flink version 0.10
24Data Science Summit 2015 S. Haridi
Streaming in Apache Flink
 True streaming over stateful distributed
dataflow engine
 Expressive Streaming API in Java/Scala
• Flexible window semantics
• Iterative computation
 Low streaming latency, exactly-once
semantics depending on execution
mode, and low overhead for recovery
25Data Science Summit 2015 S. Haridi
Special Thanks to
Gyula Fora, SICS
Paris Carbone, KTH
Kostas Tzoumas, Data Artisans
Stephan Ewen, Data Artisans
Volker Markl, TU-Berlin
26Data Science Summit 2015

More Related Content

What's hot

Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapKostas Tzoumas
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsStephan Ewen
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkDataWorks Summit/Hadoop Summit
 
Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache FlinkJohn Gorman (BSc, CISSP)
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkFlink Forward
 
Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Flink Forward
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Flink Forward
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondBowen Li
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkVasia Kalavri
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Vasia Kalavri
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Martin Junghanns
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingFlink Forward
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingFabian Hueske
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkFlink Forward
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)Robert Metzger
 

What's hot (20)

Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
 
Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache Flink
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
 
Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?Alexander Kolb – Flink. Yet another Streaming Framework?
Alexander Kolb – Flink. Yet another Streaming Framework?
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 

Viewers also liked

Business Envirment of England
Business Envirment of EnglandBusiness Envirment of England
Business Envirment of EnglandSheikh Shahnawaz
 
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...rmtjaycees
 
Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...
Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...
Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...Марья Ивановна
 
CV Svindland Inger (english) 2016
CV Svindland Inger (english) 2016CV Svindland Inger (english) 2016
CV Svindland Inger (english) 2016Inger Svindland
 
Рік безкарності: громадський аналіз розслідування справ Євромайдану
Рік безкарності: громадський аналіз розслідування справ ЄвромайдануРік безкарності: громадський аналіз розслідування справ Євромайдану
Рік безкарності: громадський аналіз розслідування справ ЄвромайдануМарья Ивановна
 
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksCollective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksTuri, Inc.
 
Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...
Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...
Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...Марья Ивановна
 
Deep Learning in a Dumpster
Deep Learning in a DumpsterDeep Learning in a Dumpster
Deep Learning in a DumpsterTuri, Inc.
 
ASAM 2014 Year in Review
ASAM 2014 Year in ReviewASAM 2014 Year in Review
ASAM 2014 Year in Reviewasamdecks
 
Manufacturing Analytics at Scale
Manufacturing Analytics at ScaleManufacturing Analytics at Scale
Manufacturing Analytics at ScaleTuri, Inc.
 

Viewers also liked (20)

Business Envirment of England
Business Envirment of EnglandBusiness Envirment of England
Business Envirment of England
 
Driving in the dark
Driving in the darkDriving in the dark
Driving in the dark
 
CV Jannie
CV JannieCV Jannie
CV Jannie
 
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
 
Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...
Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...
Аналіз бойових дій в районі Іловайська після вторгнення російських військ 24-...
 
Car tips for winter
Car tips for winterCar tips for winter
Car tips for winter
 
Opps... i got a speeding ticket
Opps... i got a speeding ticketOpps... i got a speeding ticket
Opps... i got a speeding ticket
 
CV Svindland Inger (english) 2016
CV Svindland Inger (english) 2016CV Svindland Inger (english) 2016
CV Svindland Inger (english) 2016
 
Рік безкарності: громадський аналіз розслідування справ Євромайдану
Рік безкарності: громадський аналіз розслідування справ ЄвромайдануРік безкарності: громадський аналіз розслідування справ Євромайдану
Рік безкарності: громадський аналіз розслідування справ Євромайдану
 
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksCollective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social Networks
 
Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...
Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...
Додаткові докази участі військовослужбовців ГРУ ГШ РФ у військових діях на те...
 
Gregory Crewdson
Gregory CrewdsonGregory Crewdson
Gregory Crewdson
 
Creative
CreativeCreative
Creative
 
Driving anxiety
Driving anxietyDriving anxiety
Driving anxiety
 
Miss movin on
Miss movin onMiss movin on
Miss movin on
 
Deep Learning in a Dumpster
Deep Learning in a DumpsterDeep Learning in a Dumpster
Deep Learning in a Dumpster
 
ASAM 2014 Year in Review
ASAM 2014 Year in ReviewASAM 2014 Year in Review
ASAM 2014 Year in Review
 
Manufacturing Analytics at Scale
Manufacturing Analytics at ScaleManufacturing Analytics at Scale
Manufacturing Analytics at Scale
 
Resume_Arun
Resume_ArunResume_Arun
Resume_Arun
 
Road games for everyone
Road games for everyoneRoad games for everyone
Road games for everyone
 

Similar to SICS: Apache Flink Streaming

Historian & Live Dashboard
Historian & Live DashboardHistorian & Live Dashboard
Historian & Live DashboardAvanceon-Lahore
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureKhalid Salama
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterpriseVMware Tanzu
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
How to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent IIHow to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent IIconfluent
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward
 
Apache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT ManagementApache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT ManagementApache StreamPipes
 
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...VMware Tanzu
 
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieSpring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieVMware Tanzu
 
From measurement to knowledge with sofia2 Platform
From measurement to knowledge with sofia2 PlatformFrom measurement to knowledge with sofia2 Platform
From measurement to knowledge with sofia2 PlatformSofia2 Smart Platform
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 

Similar to SICS: Apache Flink Streaming (20)

Historian & Live Dashboard
Historian & Live DashboardHistorian & Live Dashboard
Historian & Live Dashboard
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
How to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent IIHow to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent II
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
 
Apache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT ManagementApache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT Management
 
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
 
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieSpring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
 
From measurement to knowledge with sofia2 Platform
From measurement to knowledge with sofia2 PlatformFrom measurement to knowledge with sofia2 Platform
From measurement to knowledge with sofia2 Platform
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 

More from Turi, Inc.

Webinar - Analyzing Video
Webinar - Analyzing VideoWebinar - Analyzing Video
Webinar - Analyzing VideoTuri, Inc.
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission RiskTuri, Inc.
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Turi, Inc.
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Turi, Inc.
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Turi, Inc.
 
Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Turi, Inc.
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsTuri, Inc.
 
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataPattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataTuri, Inc.
 
Intelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsIntelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsTuri, Inc.
 
Text Analysis with Machine Learning
Text Analysis with Machine LearningText Analysis with Machine Learning
Text Analysis with Machine LearningTuri, Inc.
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab CreateTuri, Inc.
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesTuri, Inc.
 
Machine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinMachine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinTuri, Inc.
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data scienceTuri, Inc.
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Turi, Inc.
 
Introduction to Recommender Systems
Introduction to Recommender SystemsIntroduction to Recommender Systems
Introduction to Recommender SystemsTuri, Inc.
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringTuri, Inc.
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with DatoTuri, Inc.
 

More from Turi, Inc. (20)

Webinar - Analyzing Video
Webinar - Analyzing VideoWebinar - Analyzing Video
Webinar - Analyzing Video
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission Risk
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)
 
Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
 
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataPattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log Data
 
Intelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsIntelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning Toolkits
 
Text Analysis with Machine Learning
Text Analysis with Machine LearningText Analysis with Machine Learning
Text Analysis with Machine Learning
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive Services
 
Machine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinMachine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos Guestrin
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data science
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
Introduction to Recommender Systems
Introduction to Recommender SystemsIntroduction to Recommender Systems
Introduction to Recommender Systems
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
SFrame
SFrameSFrame
SFrame
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with Dato
 

Recently uploaded

Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5DianaGray10
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Juan Carlos Gonzalez
 
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideIEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideHironori Washizaki
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdfPaige Cruz
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 

Recently uploaded (20)

Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?
 
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideIEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 

SICS: Apache Flink Streaming

  • 1. Introduction to stream processing with Apache Flink Seif Haridi KTH/SICS
  • 3. Why streaming 3 Data Warehouse Batch Data availability Streaming 2008 20152000 - Which data? - When? - Who? Data Science Summit 2015 S. Haridi
  • 4. 3 Parts of a Streaming Infrastructure 4 Gathering Broker Analysis Sensors Transaction logs … Server Logs Data Science Summit 2015 S. Haridi
  • 5. Example: Bouygues Telecom 5Data Science Summit 2015 S. Haridi • Network and subscriber data gathered • Added to Broker in raw format • Transformed and analyzed by streaming engine • Stored back for further procesing http://data-artisans.com/flink-at-bouygues.html
  • 6. What is Apache Flink? 6Data Science Summit 2015
  • 7. 1 year of Flink - code April 2014 April 2015 Data Science Summit 2015 S. Haridi 7
  • 8. What is Apache Flink 8 Distributed Data Flow Processing System ▪Focused on large-scale data analytics ▪Unified real-time stream and batch processing ▪Expressive and rich APIs in Java / Scala (+ Python) ▪Robust and fast execution backend Reduce Join Filter Reduce Map Iterate Source Sink Source Data Science Summit 2015 S. Haridi
  • 9. Flink Stack 9 Gelly Table ML SAMOA DataSet (Java/Scala) DataStream (Java/Scala) HadoopM/R Local Cluster Yarn Tez Embedded Dataflow Dataflow Table Streaming dataflow runtime Storm Zeppelin Data Science Summit 2015 S. Haridi
  • 10. Stream Processing with Flink 10Data Science Summit 2015
  • 11. What is Flink Streaming 11  Native, low-latency stream processor  Expressive functional API  Flexible operator state, iterations, windows  Exactly-once processing semantics Data Science Summit 2015 S. Haridi
  • 12. Native vs non-native streaming 12 Stream discretizer Job Job Job Jobwhile (true) { // get next few records // issue batch computation } Non-native streaming while (true) { // process next record } Long-standing operators Native streaming Data Science Summit 2015 S. Haridi
  • 13. Stream processing in Flink  Continuous Streaming model  Low processing latency  O(1) state updates per operator  Exactly once semantics for state operators Data Science Summit 2015 S. Haridi 13
  • 15. Overview of the API 15Data Science Summit 2015 S. Haridi
  • 16. Windowing Semantics 16 • Trigger and Eviction policies • window(<eviction>).every(<trigger>) • Built-in policies: – Time: Time.of(length, TimeUnit/Custom timestamp) – window(Time.of(20, SECONDS)) – Count: Count.of(windowSize) – window(Count.of(20)).every(Count.of(10)) – Delta: Delta.of(Threshold, Distance function, Start value) – window(Delta.of(0.1, priceDistanceFun, initPrice) Data Science Summit 2015 S. Haridi
  • 17. Word count in Batch and Streaming 17 case class Word (word: String, frequency: Int) val lines: DataStream[String] = env.fromSocketStream(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .keyBy("word”).window(Time.of(5,SECONDS)) .every(Time.of(1,SECONDS)).sum("frequency") .print() val lines: DataSet[String] = env.readTextFile(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print() DataSet API (batch): DataStream API (streaming): Data Science Summit 2015 S. Haridi
  • 18. Flexible windows 18 More at: http://flink.apache.org/news/2015/02/09/streaming-example.html Keyed Stream Windowed StreamData Stream Keyed Stream Windowed Stream  Stream of stocks  Trigger warning if price fluctuates by 5%  Count the number of warnings per stock in 30 second (tumbling) window  Do it continuously Data Science Summit 2015 S. Haridi Stock Stream Delta 5% of price Warning Count 30 sec window Sum keyBy symbol keyBy symbol
  • 19. Flexible windows 19 More at: http://flink.apache.org/news/2015/02/09/streaming-example.html case class Count(symbol: String, count: Int) val defaultPrice = StockPrice(“”, 1000) val priceWarnings = stockStream.keyBy(“symbol”) .window(Delta.of(0.05, priceChange, defaultPrice) .mapWindow(sendWarning _) Use delta policy to create change warnings Count number of warning per stock every half a minute val warningPerStock = priceWarnings.flatten() .map(Count(_, 1)) .keyBy(“symbol”) .window(Time.of(30, SECONDS)) .sum(“count”) Data Science Summit 2015 S. Haridi Stock Stream Delta 5% of price Warning Count 30 sec window Sum keyBy symbol keyBy symbol
  • 20. Iterative stream processing 20 Motivation  Many applications require cyclic streams  Machine learning applications (parallel model training, evaluation) Iterations in Flink Streaming  Native support for cyclic dataflows  Integrated with functional API  High performance and expressivity Input Train Evaluate Data Science Summit 2015 S. Haridi
  • 22. Exactly-once processing in for operator state 22  Based on consistent global snapshots  Low runtime overhead, stateful exactly- once semantics Data Science Summit 2015 S. Haridi
  • 23. Checkpointing / Recovery 23 Detailed algorithm: Lightweight Asynchronous Snapshots for Distributed Dataflows Data Science Summit 2015 S. Haridi
  • 24. Fault tolerance  Check-pointing and recovery of operator state is very fast • Data processing does not block  Executions based on CPU/operator time are not idempotent  Other execution modes are based on timestamps of input streams (Event/Ingress time) • Allows idempotent executions • End-to-End exactly-once semantics • In Flink version 0.10 24Data Science Summit 2015 S. Haridi
  • 25. Streaming in Apache Flink  True streaming over stateful distributed dataflow engine  Expressive Streaming API in Java/Scala • Flexible window semantics • Iterative computation  Low streaming latency, exactly-once semantics depending on execution mode, and low overhead for recovery 25Data Science Summit 2015 S. Haridi
  • 26. Special Thanks to Gyula Fora, SICS Paris Carbone, KTH Kostas Tzoumas, Data Artisans Stephan Ewen, Data Artisans Volker Markl, TU-Berlin 26Data Science Summit 2015