SlideShare a Scribd company logo
Real-time driving score service
using Flink
Dongwon Kim
SK telecom
My talks @FlinkForward
Flink Forward 2015
A Comparative Performance
Evaluation of Flink
Flink Forward 2017
Predictive Maintenance
with
Deep Learning and Flink .
Flink Forward 2018
Real-time driving score service
using Flink
T map, a mobile navigation app by SK telecom
≈
Choose from
frequent locations
Enter an address
or
a place nameWaze
Google Maps
T map, a mobile navigation app by SK telecom
multiple route options in driving mode arriving at destination
Driving score service by T map
I scored 83
out of 100! yay!
Driving score
KB Insurance DB Insurance
10% discount 10% discount
Car insurance discount for safe drivers
If you drive safely with ,
automobile insurance premiums
go down.
Driving score is based on three factors
My driving score
Rank : 970k Speeding
Rapid
accel.
Rapid
decel.
greatgood good
Monthly chart
Apr May Jun Jul Aug
The three factors are calculated for each session
6/29 (Fri.)
min
min
SKT Network Operation Center
Yanghyeon Village
●speeding 0 ●rapid acc. 0 ●rapid decel. 0
●speeding 1 ●rapid acc. 1 ●rapid decel. 0
6/28 (Thu.)
min
min
SKT Network Operation Center
Yanghyeon Village
●speeding 1 ●rapid acc. 1 ●rapid decel. 0
●speeding 1 ●rapid acc. 1 ●rapid decel. 1
● ● ●
The three factors are calculated for each session
● ● ●
Speeding
0.2km
My speed : 90km/h
(Speed limit : 70km/h)
Rapid accel.
(within 3 sec)
Rapid decel.
(within 3 sec)
Current client-server architecture
A GPS trajectory is generated
for each driving session
…
GPS coord.
• latitude
• longitude
• altitude
T1
GPS coord.
• latitude
• longitude
• altitude
T2
GPS coord.
• latitude
• longitude
• altitude
TN
T map
GPS trajectory
driving score
(+1day)
Batch ETL jobs are executed
twice a day
to calculate three factors ●●●
from trajectories
The main drawback
Users cannot see today’s driving scores until tomorrow
T map service server
...
11min
SKT Network Operation Center
●speeding 1 ●rapid acc. 1 ●rapid decel. 1
Migration from batch ETL to streaming processing
... ...
Service
DB
Millions
of users
...
Batch processing
Real-time streaming
processing
Goal
Let users know driving scores ASAP
Why we choose to use Flink?
https://flink.apache.org/introduction.html#features-why-flink
Exactly-once semantics
for stateful computations
stream processing and windowing
with event time semantics
flexible windowing
light-weight fault-tolerance high throughput and low latency
Contents
• Dataflow design and trigger customization
• Instrumentation with Prometheus
Source
JSON
parser SinkKafka Kafka Service DBUser
key-based
Bounded
OutOfOrderness
TimestampExtractor
(BOOTE)
messages
USER1 to ...
USER2 to ...
USER3 to ...
......
user ID +
destination
Session window
with a custom trigger
Define metrics Collect metrics Plot metrics
A 12-minute driving with 720 GPS coordinates
T map T map service server
...
...
...
...
T map generates
a GPS coordinate
every second
T map sends 4 messages to the service server
1st periodic message
(300 coordinates for the first 5 mins)
2nd periodic message
(300 coordinates for the next 5 mins)
End message
(120 coordinates for the last 2 mins)
...
...
...
T map T map service server
...
Init
message
Return scores right after receiving end messages
T map
driving score
7:20
T map service server
...
Init
a
7:08
Periodic
b
7:13
c
7:18
End
d
7:20
Messages
11min
SKT Network Operation Center
●speeding 1 ●rapid acc. 1 ●rapid decel. 1
Real-time driving score dataflow using
Source
JSON
parser SinkKafka Kafka Service DBUser
key-based
Logical
dataflow
messages
USER1 to ...
USER2 to ...
USER3 to ...
......
user ID +
destination
Session window
with a custom trigger
Bounded
OutOfOrderness
TimestampExtractor
(BOOTE)
at-least-once Kafka producer
session gap : 1 hour
Real-time driving score dataflow using
Source
JSON
parser SinkKafka Kafka Service DBUser
key-based
Logical
dataflow
Bounded
OutOfOrderness
TimestampExtractor
(BOOTE)
...
Source
Session window
with a custom trigger
p0
p1
p2
p19
20
partitions
20
tasks
256 tasks
...
...
...
p0
p1
p2
p19
20
partitions
Sink
...
several million
users
20
tasks
256
tasks
Service DB
...
User
Physical
dataflow
...
20
tasks
JSON
parser BOOTE
messages
USER1 to ...
USER2 to ...
USER3 to ...
......
user ID +
destination
messages
…
…
......
user ID +
destination
messages
…
…
......
user ID +
destination
messages
…
…
......
user ID +
destination
Session window
with a custom trigger
Session window (gap : 1 hour) with different triggers
The default
EventTimeTrigger
EarlyResultEventTimeTrigger
Time
Time
Session window (gap : 1 hour) with different triggers
8:13 8:18 8:208:08
abcd
7:13
b
Periodic
7:08
a
Init
7:18
c
Periodic
The default
EventTimeTrigger
● 1 ● 1 ● 1
EarlyResultEventTimeTrigger
7:20
d
End
fire
Time
Time
Watermark
Session window (gap : 1 hour) with different triggers
8:13 8:18 8:208:087:13
b
Periodic
7:08
a
Init
7:18
c
Periodic
7:20
d
End
8:13 8:18 8:208:087:13
b
Periodic
7:08
a
Init
7:18
c
Periodic
abcd
abcd ● 1 ● 1 ● 1
● 1 ● 1 ● 1
The default
EventTimeTrigger
EarlyResultEventTimeTrigger
7:20
d
End
early fire DO NOT fire
fire
(necessary in case of out-of-order messages)
Time
Time
Early timer
Slow for some reason
Out-or-order messages
...
Source
... ......
JSON
parser
p0
p1
p2
p19
...
p0
p1
p2
p19
Service
DB
... ...
a
Init
b
Periodic
cd
End
a
b
c
d
messages
……
……
......
user ID +
destination
messages
……
……
......
user ID +
destination
Session window w/
EarlyResultEventTimeTrigger
(session gap : 1 hour)
Sink
messages
user ID +
destination
……
……
Dongwon
to
SKT NOC
ab
Dongwon’s
iPhone
BOOTE
(maxOoO : 1 sec)
c d
How EarlyResultEventTimeTrigger deals with out-or-order messages
[Case 1] C arrives
before the early timer expires
c
[Case 2] C arrives
after the early timer expires
c
Time
Time
b
Periodic
a
Init
c
Periodic
abdc ● 1 ● 1 ● 1
d
End
early fire (perfect result) DO NOT fire
(no messages added after the last fire)
[Case 1] C arrives
before the early timer expires
c
[Case 2] C arrives
after the early timer expires
c
Time
Time
How EarlyResultEventTimeTrigger deals with out-or-order messages
b
Periodic
a
Init
c
Periodic
● 1 ● 1 ● 1
d
End
early fire (perfect result) DO NOT fire
(no messages added after the last fire)
b
Periodic
a
Init
c
Periodic
abd ● 0 ● 1 ● 1
d
End
early fire (incomplete result)
[Case 1] C arrives
before the early timer expires
c
[Case 2] C arrives
after the early timer expires
c
2nd fire (perfect result)
abc d ● 1 ● 1 ● 1
Time
Time
abdc
How EarlyResultEventTimeTrigger deals with out-or-order messages
EarlyResultEventTimeTrigger
[Constructor]
Get an evaluator to determine early firing
https://github.com/eastcirclek/flink-examples/blob/master/src/main/scala/com/github/eastcirclek/flink/trigger/EarlyResultEventTimeTrigger.scala
[onElement]
register an early timer
if the evaluator returns true
(e.g. when the end message comes in)
[onEventTime]
Fire if the early timer expires
Contents
• Dataflow design and trigger customization
• Instrumentation with Prometheus
Source
JSON
parser SinkKafka Kafka Service DBUser
key-based
Bounded
OutOfOrderness
TimestampExtractor
(BOOTE)
messages
USER1 to ...
USER2 to ...
USER3 to ...
......
user ID +
destination
Session window
with a custom trigger
Define metrics Collect metrics Plot metrics
Individual message statistics
N:1
Message stats.
extractor
Message stats.
sink
Source SinkKafka Kafka
JSON
parser BOOTE
key-based
messages
……
……
......
user ID +
destination
Source
20 tasks
... ...
20 tasks
JSON
parser
...
Message stats.
extractor
Message stats.
sink
20 tasks
1 task
Logical
dataflow
Physical
dataflow
Session window
Service DBUser
Individual message statistics
1K messages per second
100M messages per day
10s of MB per second
2 TB per day
N:1
Message stats.
extractor
Message stats.
sink
Source SinkKafka Kafka
JSON
parser BOOTE
key-based
messages
……
……
......
user ID +
destination
Logical
dataflow
Session window
Service DBUser
meter histogram histogrammeter
Jitter (ingestion time – event time)
Source SinkKafka Kafka
JSON
parser
Bounded
OutOfOrderness
TimestampExtractor
key-based
messages
……
……
......
user ID +
destination
Logical
dataflow
Session window
Service DB
event
time
ingestion
time
User
1 sec
Based on this observation,
we use 1 sec for maxOutOfOrderness
Session output statistics
N:1 N:1
Message stats.
extractor
Message stats.
sink
Session output stats.
extractor
Session output stats.
sink
Source SinkKafka Kafka
JSON
parser BOOTE
key-based
messages
……
……
......
user ID +
destination
Source
20 tasks
... ...
20 tasks
JSON
parser
...
Message stats.
extractor
Message stats.
sink
20 tasks
1 task
messages
……
……
......
user ID +
destination
256 tasks
Session output stats.
extractor
Session output stats.
sink
256 tasks
1 task...
Session window
...
messages
……
……
......
user ID +
destination
messages
……
……
......
user ID +
destination
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
Logical
dataflow
Physical
dataflow
Session window
Service DBUser
Session output statistics
N:1
Session output stats.
extractor
Session output stats.
sink
Source SinkKafka Kafka
JSON
parser BOOTE
key-based
messages
……
……
......
user ID +
destination
Logical
dataflow
Session window
Service DBUser
N:1
Message stats.
extractor
Message stats.
sink
meter histogram histogrammeter
Our own definition of latency
ingestion time
of end messages
Session output stats.
extractor
Session output stats.
sink
Source
Sink
Kafka
Kafka
JSON
parser BOOTE
Session window
messages
user ID +
destination
……
……
Dongwon
to
SKT NOC
abcd
End
d
End
d
End
d
End
d
processing time of
the resultant session output
@extractor
● 1 ● 1 ● 1
● 1 ● 1 ● 1
Considering maxOutOfOrderness is 1 second,
Flink takes at most 250 milliseconds
N:1 N:1
Message stats.
extractor
Message stats.
sink
Session output stats.
extractor
Session output stats.
sink
Source SinkKafka Kafka
JSON
parser BOOTE
key-based
messages
……
……
......
user ID +
destination
Service DBUser
How to expose metrics to Prometheus?
Session window
Flink metric reporters
TaskManager #1 TaskManager #2JobManager
Push-model and pull-model
Ganglia
reporter
Ganglia
reporter
Ganglia
reporter
push push push
TaskManager #1 TaskManager #2JobManager
Push-model and pull-model
Ganglia
reporter
Graphite
reporter
Ganglia
reporter
Graphite
reporter
Ganglia
reporter
Graphite
reporter
push push push
pushed
TaskManager #1 TaskManager #2JobManager
Push-model and pull-model
Prometheus
reporter
(HTTP endpoint)
Ganglia
reporter
Graphite
reporter
Prometheus
reporter
(HTTP endpoint)
Ganglia
reporter
Graphite
reporter
Prometheus
reporter
(HTTP endpoint)
Ganglia
reporter
Graphite
reporter
pull
pushed pushed
Reporter configuration
Prometheus
reporter
Ganglia
reporter
Graphite
reporter
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter
client
socket
client
socket
server
socket
Node
Manager
w1
Node
Manager
w2
Node
Manager
w3
Node
Manager
w4
Resource
Manager
Endpoint addresses cannot be determined in advance
#!/bin/bash
# launch a Flink per-job cluster on YARN
flink run
--jobmanager yarn-cluster
--yarncontainer 4
...
# flink-conf.yaml
...
metrics.reporter.prom.port: 5001-5100
...
Q. Can we list the endpoint addresses
before YARN’s scheduling?
A. No, impossible
Node
Manager
w1
Node
Manager
w2
Node
Manager
w3
Node
Manager
w4
Resource
Manager
Endpoint addresses cannot be determined in advance
#!/bin/bash
# launch a Flink per-job cluster on YARN
flink run
--jobmanager yarn-cluster
--yarncontainer 4
...
# flink-conf.yaml
...
metrics.reporter.prom.port: 5001-5100
...
TM
Prom. endpoint
w1:5002
TM
Prom. endpoint
w2:5001
JM
Prom. endpoint
w2:5001 TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
Possible world #1
Node
Manager
w1
Node
Manager
w2
Node
Manager
w3
Node
Manager
w4
Resource
Manager
Endpoint addresses cannot be determined in advance
#!/bin/bash
# launch a Flink per-job cluster on YARN
flink run
--jobmanager yarn-cluster
--yarncontainer 4
...
# flink-conf.yaml
...
metrics.reporter.prom.port: 5001-5100
...
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5002
JM
Prom. endpoint
w2:5001 TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
Possible world #2
Node
Manager
w1
Node
Manager
w2
Node
Manager
w3
Node
Manager
w4
Resource
Manager
Endpoint addresses cannot be determined in advance
#!/bin/bash
# launch a Flink per-job cluster on YARN
flink run
--jobmanager yarn-cluster
--yarncontainer 4
...
# flink-conf.yaml
...
metrics.reporter.prom.port: 5001-5100
...
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5001
JM
Prom. endpoint
w2:5001
TM
Prom. endpoint
w3:5002
TM
Prom. endpoint
w4:5001
Possible world #3
Node
Manager
w1
Node
Manager
w2
Node
Manager
w3
Node
Manager
w4
Resource
Manager
Endpoint addresses cannot be determined in advance
#!/bin/bash
# launch a Flink per-job cluster on YARN
flink run
--jobmanager yarn-cluster
--yarncontainer 4
...
# flink-conf.yaml
...
metrics.reporter.prom.port: 5001-5100
...
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5001
JM
Prom. endpoint
w2:5001TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5002
Possible world #4
Node
Manager
w1
Node
Manager
w2
Node
Manager
w3
Node
Manager
w4
Resource
Manager
Endpoint addresses cannot be determined in advance
#!/bin/bash
# launch a Flink per-job cluster on YARN
flink run
--jobmanager yarn-cluster
--yarncontainer 4
...
# flink-conf.yaml
...
metrics.reporter.prom.port: 5001-5100
...
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5002
JM
Prom. endpoint
w2:5001 TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
TM
Prom. endpoint
w1:5002
TM
Prom. endpoint
w2:5003
JM
Prom. endpoint
w3:5002
TM
Prom. endpoint
w3:5003
TM
Prom. endpoint
w4:5002
Where to scrape metrics form?
Endpoint addresses are available after a cluster is up
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5002
JobManager
Prom. endpoint : w2:5001
TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
TM
Prom. endpoint
w1:5002
TM
Prom. endpoint
w2:5003
JobManager
Prom. endpoint : w3:5002
TM
Prom. endpoint
w3:5003
TM
Prom. endpoint
w4:5002
A per-job cluster
(YARN ID : application_1500000000000_0001)
Another per-job cluster
(YARN ID : application_1500000000000_0002)
File-based service discovery mechanism
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5002
JobManager
Prom. endpoint : w2:5001
TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
TM
Prom. endpoint
w1:5002
TM
Prom. endpoint
w2:5003
JobManager
Prom. endpoint : w3:5002
TM
Prom. endpoint
w3:5003
TM
Prom. endpoint
w4:5002
A per-job cluster
(YARN ID : application_1500000000000_0001)
Another per-job cluster
(YARN ID : application_1500000000000_0002)
/etc/prometheus/flink-service-discovery/
[
{
“targets”: [“w2:5001”, “w1:5001”, “w2:5002”, “w3:5001”, “w4:5001”],
}
]
application_1528160315197_0001.json
[
{
“targets”: [“w3:5002”, “w1:5002”, “w2:5003”, “w3:5003”, “w4:5002”],
}
]
application_1528160315197_0002.json
🙄 watches file names matching a given pattern
File-based service discovery
scrape metrics
from known endpoints
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5002
JobManager
Prom. endpoint : w2:5001
TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
TM
Prom. endpoint
w1:5002
TM
Prom. endpoint
w2:5003
JobManager
Prom. endpoint : w3:5002
TM
Prom. endpoint
w3:5003
TM
Prom. endpoint
w4:5002
A per-job cluster
(YARN ID : application_1500000000000_0001)
Another per-job cluster
(YARN ID : application_1500000000000_0002)
w2:5001, w1:5001, w2:5002, w3:5001, w4:5001
w3:5002, w1:5002, w2:5003, w3:5003, w4:5002
flink-service-discovery
https://github.com/eastcirclek/flink-service-discovery
YARN
Resource
Manager
discovery.py
param1) rmAddr
param2) targetDir
TM
Prom. endpoint
w1:5001
TM
Prom. endpoint
w2:5002
JobManager
Prom. endpoint : w2:5001
TM
Prom. endpoint
w3:5001
TM
Prom. endpoint
w4:5001
1) watch a new Flink cluster
2) get the address of JM
3) get all TM identifiers
4) identify all endpoints
by scrapping JM/TM logs
[
{
“targets”: [“w2:5001”, “w1:5001”,
“w2:5002”, “w3:5001”, “w4:5001”],
}
]
application_1528160315197_0001.json
/etc/prometheus/flink-service-discovery/
🙄
5) create a file
6) scrape metrics from JM and TMs
Grafana dashboard
Overview & summary
• Dataflow design and trigger customization
• Instrumentation with Prometheus
Source
JSON
parser SinkKafka Kafka Service DBUser
key-based
Bounded
OutOfOrderness
TimestampExtractor
(BOOTE)
messages
USER1 to ...
USER2 to ...
USER3 to ...
......
user ID +
destination
Session window
with a custom trigger
Define metrics Collect metrics Plot metrics
THE END

More Related Content

Similar to Flink Forward Berlin 2018: Dongwon Kim - "Real-time driving score service using Flink"

Netflix - Realtime Impression Store
Netflix - Realtime Impression Store Netflix - Realtime Impression Store
Netflix - Realtime Impression Store
Nitin S
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Seunghyun Lee
 
Story of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streamingStory of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streaming
lohitvijayarenu
 
Keystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architectureKeystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architecture
Zhenzhong Xu
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
Amazon Web Services
 
Evolution of Real-time User Engagement Event Consumption at Pinterest
Evolution of Real-time User Engagement Event Consumption at PinterestEvolution of Real-time User Engagement Event Consumption at Pinterest
Evolution of Real-time User Engagement Event Consumption at Pinterest
HostedbyConfluent
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
linuxlab_conf
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
Amazon Web Services
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
KafkaZone
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
KafkaZone
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
Monal Daxini
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Seunghyun Lee
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Flink Forward
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Big Data Spain
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
HostedbyConfluent
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
Prakash Chockalingam
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache Flink
Dongwon Kim
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
SATOSHI TAGOMORI
 

Similar to Flink Forward Berlin 2018: Dongwon Kim - "Real-time driving score service using Flink" (20)

Netflix - Realtime Impression Store
Netflix - Realtime Impression Store Netflix - Realtime Impression Store
Netflix - Realtime Impression Store
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
 
Story of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streamingStory of migrating event pipeline from batch to streaming
Story of migrating event pipeline from batch to streaming
 
Keystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architectureKeystone event processing pipeline on a dockerized microservices architecture
Keystone event processing pipeline on a dockerized microservices architecture
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Evolution of Real-time User Engagement Event Consumption at Pinterest
Evolution of Real-time User Engagement Event Consumption at PinterestEvolution of Real-time User Engagement Event Consumption at Pinterest
Evolution of Real-time User Engagement Event Consumption at Pinterest
 
Mirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in GoMirko Damiani - An Embedded soft real time distributed system in Go
Mirko Damiani - An Embedded soft real time distributed system in Go
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache Flink
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 

Flink Forward Berlin 2018: Dongwon Kim - "Real-time driving score service using Flink"

  • 1. Real-time driving score service using Flink Dongwon Kim SK telecom
  • 2. My talks @FlinkForward Flink Forward 2015 A Comparative Performance Evaluation of Flink Flink Forward 2017 Predictive Maintenance with Deep Learning and Flink . Flink Forward 2018 Real-time driving score service using Flink
  • 3. T map, a mobile navigation app by SK telecom ≈ Choose from frequent locations Enter an address or a place nameWaze Google Maps
  • 4. T map, a mobile navigation app by SK telecom multiple route options in driving mode arriving at destination
  • 5. Driving score service by T map I scored 83 out of 100! yay! Driving score KB Insurance DB Insurance 10% discount 10% discount Car insurance discount for safe drivers If you drive safely with , automobile insurance premiums go down.
  • 6. Driving score is based on three factors My driving score Rank : 970k Speeding Rapid accel. Rapid decel. greatgood good Monthly chart Apr May Jun Jul Aug
  • 7. The three factors are calculated for each session 6/29 (Fri.) min min SKT Network Operation Center Yanghyeon Village ●speeding 0 ●rapid acc. 0 ●rapid decel. 0 ●speeding 1 ●rapid acc. 1 ●rapid decel. 0 6/28 (Thu.) min min SKT Network Operation Center Yanghyeon Village ●speeding 1 ●rapid acc. 1 ●rapid decel. 0 ●speeding 1 ●rapid acc. 1 ●rapid decel. 1 ● ● ●
  • 8. The three factors are calculated for each session ● ● ● Speeding 0.2km My speed : 90km/h (Speed limit : 70km/h) Rapid accel. (within 3 sec) Rapid decel. (within 3 sec)
  • 9. Current client-server architecture A GPS trajectory is generated for each driving session … GPS coord. • latitude • longitude • altitude T1 GPS coord. • latitude • longitude • altitude T2 GPS coord. • latitude • longitude • altitude TN T map GPS trajectory driving score (+1day) Batch ETL jobs are executed twice a day to calculate three factors ●●● from trajectories The main drawback Users cannot see today’s driving scores until tomorrow T map service server ... 11min SKT Network Operation Center ●speeding 1 ●rapid acc. 1 ●rapid decel. 1
  • 10. Migration from batch ETL to streaming processing ... ... Service DB Millions of users ... Batch processing Real-time streaming processing Goal Let users know driving scores ASAP
  • 11. Why we choose to use Flink? https://flink.apache.org/introduction.html#features-why-flink Exactly-once semantics for stateful computations stream processing and windowing with event time semantics flexible windowing light-weight fault-tolerance high throughput and low latency
  • 12. Contents • Dataflow design and trigger customization • Instrumentation with Prometheus Source JSON parser SinkKafka Kafka Service DBUser key-based Bounded OutOfOrderness TimestampExtractor (BOOTE) messages USER1 to ... USER2 to ... USER3 to ... ...... user ID + destination Session window with a custom trigger Define metrics Collect metrics Plot metrics
  • 13. A 12-minute driving with 720 GPS coordinates T map T map service server ... ... ... ... T map generates a GPS coordinate every second
  • 14. T map sends 4 messages to the service server 1st periodic message (300 coordinates for the first 5 mins) 2nd periodic message (300 coordinates for the next 5 mins) End message (120 coordinates for the last 2 mins) ... ... ... T map T map service server ... Init message
  • 15. Return scores right after receiving end messages T map driving score 7:20 T map service server ... Init a 7:08 Periodic b 7:13 c 7:18 End d 7:20 Messages 11min SKT Network Operation Center ●speeding 1 ●rapid acc. 1 ●rapid decel. 1
  • 16. Real-time driving score dataflow using Source JSON parser SinkKafka Kafka Service DBUser key-based Logical dataflow messages USER1 to ... USER2 to ... USER3 to ... ...... user ID + destination Session window with a custom trigger Bounded OutOfOrderness TimestampExtractor (BOOTE) at-least-once Kafka producer session gap : 1 hour
  • 17. Real-time driving score dataflow using Source JSON parser SinkKafka Kafka Service DBUser key-based Logical dataflow Bounded OutOfOrderness TimestampExtractor (BOOTE) ... Source Session window with a custom trigger p0 p1 p2 p19 20 partitions 20 tasks 256 tasks ... ... ... p0 p1 p2 p19 20 partitions Sink ... several million users 20 tasks 256 tasks Service DB ... User Physical dataflow ... 20 tasks JSON parser BOOTE messages USER1 to ... USER2 to ... USER3 to ... ...... user ID + destination messages … … ...... user ID + destination messages … … ...... user ID + destination messages … … ...... user ID + destination Session window with a custom trigger
  • 18. Session window (gap : 1 hour) with different triggers The default EventTimeTrigger EarlyResultEventTimeTrigger Time Time
  • 19. Session window (gap : 1 hour) with different triggers 8:13 8:18 8:208:08 abcd 7:13 b Periodic 7:08 a Init 7:18 c Periodic The default EventTimeTrigger ● 1 ● 1 ● 1 EarlyResultEventTimeTrigger 7:20 d End fire Time Time Watermark
  • 20. Session window (gap : 1 hour) with different triggers 8:13 8:18 8:208:087:13 b Periodic 7:08 a Init 7:18 c Periodic 7:20 d End 8:13 8:18 8:208:087:13 b Periodic 7:08 a Init 7:18 c Periodic abcd abcd ● 1 ● 1 ● 1 ● 1 ● 1 ● 1 The default EventTimeTrigger EarlyResultEventTimeTrigger 7:20 d End early fire DO NOT fire fire (necessary in case of out-of-order messages) Time Time Early timer
  • 21. Slow for some reason Out-or-order messages ... Source ... ...... JSON parser p0 p1 p2 p19 ... p0 p1 p2 p19 Service DB ... ... a Init b Periodic cd End a b c d messages …… …… ...... user ID + destination messages …… …… ...... user ID + destination Session window w/ EarlyResultEventTimeTrigger (session gap : 1 hour) Sink messages user ID + destination …… …… Dongwon to SKT NOC ab Dongwon’s iPhone BOOTE (maxOoO : 1 sec) c d
  • 22. How EarlyResultEventTimeTrigger deals with out-or-order messages [Case 1] C arrives before the early timer expires c [Case 2] C arrives after the early timer expires c Time Time
  • 23. b Periodic a Init c Periodic abdc ● 1 ● 1 ● 1 d End early fire (perfect result) DO NOT fire (no messages added after the last fire) [Case 1] C arrives before the early timer expires c [Case 2] C arrives after the early timer expires c Time Time How EarlyResultEventTimeTrigger deals with out-or-order messages
  • 24. b Periodic a Init c Periodic ● 1 ● 1 ● 1 d End early fire (perfect result) DO NOT fire (no messages added after the last fire) b Periodic a Init c Periodic abd ● 0 ● 1 ● 1 d End early fire (incomplete result) [Case 1] C arrives before the early timer expires c [Case 2] C arrives after the early timer expires c 2nd fire (perfect result) abc d ● 1 ● 1 ● 1 Time Time abdc How EarlyResultEventTimeTrigger deals with out-or-order messages
  • 25. EarlyResultEventTimeTrigger [Constructor] Get an evaluator to determine early firing https://github.com/eastcirclek/flink-examples/blob/master/src/main/scala/com/github/eastcirclek/flink/trigger/EarlyResultEventTimeTrigger.scala [onElement] register an early timer if the evaluator returns true (e.g. when the end message comes in) [onEventTime] Fire if the early timer expires
  • 26. Contents • Dataflow design and trigger customization • Instrumentation with Prometheus Source JSON parser SinkKafka Kafka Service DBUser key-based Bounded OutOfOrderness TimestampExtractor (BOOTE) messages USER1 to ... USER2 to ... USER3 to ... ...... user ID + destination Session window with a custom trigger Define metrics Collect metrics Plot metrics
  • 27. Individual message statistics N:1 Message stats. extractor Message stats. sink Source SinkKafka Kafka JSON parser BOOTE key-based messages …… …… ...... user ID + destination Source 20 tasks ... ... 20 tasks JSON parser ... Message stats. extractor Message stats. sink 20 tasks 1 task Logical dataflow Physical dataflow Session window Service DBUser
  • 28. Individual message statistics 1K messages per second 100M messages per day 10s of MB per second 2 TB per day N:1 Message stats. extractor Message stats. sink Source SinkKafka Kafka JSON parser BOOTE key-based messages …… …… ...... user ID + destination Logical dataflow Session window Service DBUser meter histogram histogrammeter
  • 29. Jitter (ingestion time – event time) Source SinkKafka Kafka JSON parser Bounded OutOfOrderness TimestampExtractor key-based messages …… …… ...... user ID + destination Logical dataflow Session window Service DB event time ingestion time User 1 sec Based on this observation, we use 1 sec for maxOutOfOrderness
  • 30. Session output statistics N:1 N:1 Message stats. extractor Message stats. sink Session output stats. extractor Session output stats. sink Source SinkKafka Kafka JSON parser BOOTE key-based messages …… …… ...... user ID + destination Source 20 tasks ... ... 20 tasks JSON parser ... Message stats. extractor Message stats. sink 20 tasks 1 task messages …… …… ...... user ID + destination 256 tasks Session output stats. extractor Session output stats. sink 256 tasks 1 task... Session window ... messages …… …… ...... user ID + destination messages …… …… ...... user ID + destination ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Logical dataflow Physical dataflow Session window Service DBUser
  • 31. Session output statistics N:1 Session output stats. extractor Session output stats. sink Source SinkKafka Kafka JSON parser BOOTE key-based messages …… …… ...... user ID + destination Logical dataflow Session window Service DBUser N:1 Message stats. extractor Message stats. sink meter histogram histogrammeter
  • 32. Our own definition of latency ingestion time of end messages Session output stats. extractor Session output stats. sink Source Sink Kafka Kafka JSON parser BOOTE Session window messages user ID + destination …… …… Dongwon to SKT NOC abcd End d End d End d End d processing time of the resultant session output @extractor ● 1 ● 1 ● 1 ● 1 ● 1 ● 1 Considering maxOutOfOrderness is 1 second, Flink takes at most 250 milliseconds
  • 33. N:1 N:1 Message stats. extractor Message stats. sink Session output stats. extractor Session output stats. sink Source SinkKafka Kafka JSON parser BOOTE key-based messages …… …… ...... user ID + destination Service DBUser How to expose metrics to Prometheus? Session window
  • 35. TaskManager #1 TaskManager #2JobManager Push-model and pull-model Ganglia reporter Ganglia reporter Ganglia reporter push push push
  • 36. TaskManager #1 TaskManager #2JobManager Push-model and pull-model Ganglia reporter Graphite reporter Ganglia reporter Graphite reporter Ganglia reporter Graphite reporter push push push pushed
  • 37. TaskManager #1 TaskManager #2JobManager Push-model and pull-model Prometheus reporter (HTTP endpoint) Ganglia reporter Graphite reporter Prometheus reporter (HTTP endpoint) Ganglia reporter Graphite reporter Prometheus reporter (HTTP endpoint) Ganglia reporter Graphite reporter pull pushed pushed
  • 39. Node Manager w1 Node Manager w2 Node Manager w3 Node Manager w4 Resource Manager Endpoint addresses cannot be determined in advance #!/bin/bash # launch a Flink per-job cluster on YARN flink run --jobmanager yarn-cluster --yarncontainer 4 ... # flink-conf.yaml ... metrics.reporter.prom.port: 5001-5100 ... Q. Can we list the endpoint addresses before YARN’s scheduling? A. No, impossible
  • 40. Node Manager w1 Node Manager w2 Node Manager w3 Node Manager w4 Resource Manager Endpoint addresses cannot be determined in advance #!/bin/bash # launch a Flink per-job cluster on YARN flink run --jobmanager yarn-cluster --yarncontainer 4 ... # flink-conf.yaml ... metrics.reporter.prom.port: 5001-5100 ... TM Prom. endpoint w1:5002 TM Prom. endpoint w2:5001 JM Prom. endpoint w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 Possible world #1
  • 41. Node Manager w1 Node Manager w2 Node Manager w3 Node Manager w4 Resource Manager Endpoint addresses cannot be determined in advance #!/bin/bash # launch a Flink per-job cluster on YARN flink run --jobmanager yarn-cluster --yarncontainer 4 ... # flink-conf.yaml ... metrics.reporter.prom.port: 5001-5100 ... TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5002 JM Prom. endpoint w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 Possible world #2
  • 42. Node Manager w1 Node Manager w2 Node Manager w3 Node Manager w4 Resource Manager Endpoint addresses cannot be determined in advance #!/bin/bash # launch a Flink per-job cluster on YARN flink run --jobmanager yarn-cluster --yarncontainer 4 ... # flink-conf.yaml ... metrics.reporter.prom.port: 5001-5100 ... TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5001 JM Prom. endpoint w2:5001 TM Prom. endpoint w3:5002 TM Prom. endpoint w4:5001 Possible world #3
  • 43. Node Manager w1 Node Manager w2 Node Manager w3 Node Manager w4 Resource Manager Endpoint addresses cannot be determined in advance #!/bin/bash # launch a Flink per-job cluster on YARN flink run --jobmanager yarn-cluster --yarncontainer 4 ... # flink-conf.yaml ... metrics.reporter.prom.port: 5001-5100 ... TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5001 JM Prom. endpoint w2:5001TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5002 Possible world #4
  • 44. Node Manager w1 Node Manager w2 Node Manager w3 Node Manager w4 Resource Manager Endpoint addresses cannot be determined in advance #!/bin/bash # launch a Flink per-job cluster on YARN flink run --jobmanager yarn-cluster --yarncontainer 4 ... # flink-conf.yaml ... metrics.reporter.prom.port: 5001-5100 ... TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5002 JM Prom. endpoint w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 TM Prom. endpoint w1:5002 TM Prom. endpoint w2:5003 JM Prom. endpoint w3:5002 TM Prom. endpoint w3:5003 TM Prom. endpoint w4:5002
  • 45. Where to scrape metrics form? Endpoint addresses are available after a cluster is up TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5002 JobManager Prom. endpoint : w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 TM Prom. endpoint w1:5002 TM Prom. endpoint w2:5003 JobManager Prom. endpoint : w3:5002 TM Prom. endpoint w3:5003 TM Prom. endpoint w4:5002 A per-job cluster (YARN ID : application_1500000000000_0001) Another per-job cluster (YARN ID : application_1500000000000_0002)
  • 46. File-based service discovery mechanism TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5002 JobManager Prom. endpoint : w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 TM Prom. endpoint w1:5002 TM Prom. endpoint w2:5003 JobManager Prom. endpoint : w3:5002 TM Prom. endpoint w3:5003 TM Prom. endpoint w4:5002 A per-job cluster (YARN ID : application_1500000000000_0001) Another per-job cluster (YARN ID : application_1500000000000_0002) /etc/prometheus/flink-service-discovery/ [ { “targets”: [“w2:5001”, “w1:5001”, “w2:5002”, “w3:5001”, “w4:5001”], } ] application_1528160315197_0001.json [ { “targets”: [“w3:5002”, “w1:5002”, “w2:5003”, “w3:5003”, “w4:5002”], } ] application_1528160315197_0002.json 🙄 watches file names matching a given pattern
  • 47. File-based service discovery scrape metrics from known endpoints TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5002 JobManager Prom. endpoint : w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 TM Prom. endpoint w1:5002 TM Prom. endpoint w2:5003 JobManager Prom. endpoint : w3:5002 TM Prom. endpoint w3:5003 TM Prom. endpoint w4:5002 A per-job cluster (YARN ID : application_1500000000000_0001) Another per-job cluster (YARN ID : application_1500000000000_0002) w2:5001, w1:5001, w2:5002, w3:5001, w4:5001 w3:5002, w1:5002, w2:5003, w3:5003, w4:5002
  • 48. flink-service-discovery https://github.com/eastcirclek/flink-service-discovery YARN Resource Manager discovery.py param1) rmAddr param2) targetDir TM Prom. endpoint w1:5001 TM Prom. endpoint w2:5002 JobManager Prom. endpoint : w2:5001 TM Prom. endpoint w3:5001 TM Prom. endpoint w4:5001 1) watch a new Flink cluster 2) get the address of JM 3) get all TM identifiers 4) identify all endpoints by scrapping JM/TM logs [ { “targets”: [“w2:5001”, “w1:5001”, “w2:5002”, “w3:5001”, “w4:5001”], } ] application_1528160315197_0001.json /etc/prometheus/flink-service-discovery/ 🙄 5) create a file 6) scrape metrics from JM and TMs
  • 50. Overview & summary • Dataflow design and trigger customization • Instrumentation with Prometheus Source JSON parser SinkKafka Kafka Service DBUser key-based Bounded OutOfOrderness TimestampExtractor (BOOTE) messages USER1 to ... USER2 to ... USER3 to ... ...... user ID + destination Session window with a custom trigger Define metrics Collect metrics Plot metrics