Gearpump
Realtime Streaming on Akka
2015/4/24
Xiang Zhong (钟翔)
Mail: xiang.zhong@intel.com
Weibo: http://weibo.com/clockfly
Intel SSG
Big Data Technology Department
QCON Beijing 2015
What is Gearpump
• An Akka-based, lightweight real-time data processing platform.
• Apache License: http://gearpump.io
• Akka provides communication, concurrency, isolation, and fault tolerance.
• Simple and powerful
• Message-level streaming
• Long-running daemons
2
What is Akka?
It is like our human society: driven by messages.
• Micro-service (Actor) oriented
• Message driven
• Lock-free
• Location transparent
• Better performance
• Fails independently
• Scales linearly
Gearpump in Big Data Stack
[Diagram: Gearpump's place in the big data stack: storage engines; batch, stream (Gearpump, Storm), SQL (Catalyst, StreamSQL, Impala) and machine-learning (GraphX) engines; analytics, data exploration and visualization; cluster manager and monitoring/alerting (e.g. Cloudera Manager).]
4
Why another streaming platform?
• The requirements are not fully met.
• We need a higher abstraction to build software!
5
Streaming requirements
• Michael Stonebraker, The 8 Requirements of Real-Time Stream Processing (2006)
My summary (circled numbers refer to Stonebraker's requirements):
• Flexible: easy programming (any time, anywhere, any size, any source, any use case), dynamic DAG, ② StreamSQL
• Volume: high throughput, ⑦ scale linearly
• Speed: ① in-stream processing, zero latency, ⑥ HA, ⑧ responsive
• Accuracy: exactly-once, ③ handle message loss/delay/out-of-order, ④ predictable results
• Visual: WYSIWYG
6
Gearpump Highlights
• 100% Akka
• Performance: 11 million messages/s throughput [*], 2 ms latency
• Function: exactly-once, dynamic DAG, out-of-order message handling
• Usability: flexible DSL, DAG visualization, Internet of Things
[*] on 4 nodes
7
USING GEARPUMP
8
How to Submit an application
1. Submit a jar to the Master.
2. The Master creates the AppMaster on a Worker (the cluster can also run on YARN).
3. The AppMaster launches Executors on the Workers.
4. Executors report back to the AppMaster.
9
DAG representation and API
Graph(A~>B~>C~>D, B~>E~>D)
DAG API Syntax:
[Diagram: processors A -> B -> C -> D with a second path B -> E -> D; edges between processors are shuffles.]
10
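For illustration, the same branching DAG could be wired up with the Processor API in the style of the WordCount example on the next slide (A..E stand for hypothetical user-defined Task classes; the parallelism values are made up):

val a = Processor[A](1)
val b = Processor[B](1)
val c = Processor[C](1)
val d = Processor[D](1)
val e = Processor[E](1)
// B fans out to C and E (shuffle); both branches join again at D
val dag = Graph(a ~> b ~> c ~> d, b ~> e ~> d)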
DAG API Example - WordCount
val context = new ClientContext()
val split = Processor[Split](splitParallism)
val sum = Processor[Sum](sumParallism)
val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty)
val appId = context.submit(app)
context.close()
class Split(taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
override def onNext(msg : Message) : Unit = { /* split the line */ }
}
class Sum (taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
override def onNext(msg : Message) : Unit = {/* do aggregation on word*/}
}
11
WordCount with DSL API
val context = ClientContext()
val app = new StreamApp("dsl", context)
val data = "This is a good start, bingo!! bingo!!"
app.fromCollection(data.lines)
// word => (word, count = 1)
.flatMap(line => line.split("\\s+")).map((_, 1))
// (word, count1), (word, count2) => (word, count1 + count2)
.groupByKey().sum.log
val appId = context.submit(app)
context.close()
12
High level DSL API Details
Transformer operators: flatMap, map, filter, reduce, merge, groupBy, process
Concepts:
• StreamApp: a graph of operators (OPs)
• Stream: a path in the OP graph
• OP (Transformer): a transformation on a Stream
• OP graph -> Processor graph: the optimizer compiles the OP graph into a processor graph (bottom-up optimization, co-locating operators where possible)
[Diagram: an OP graph (sources, flatmap, map, reduce, merge, join, sinks) optimized bottom-up into a processor graph with co-located operators.]
13
DAG Page
DAG Visualization
• Tracks the global min-clock of all messages in the DAG.
• Node size reflects throughput.
• Edge width represents flow rate.
• A red node means something has gone wrong.
14
DAG Visualization
Processor Page
• Skew analysis
• Task throughput and latency
• Executor JVM deployment
15
PERFORMANCE TEST
THROUGHPUT, SCALABILITY, FAULT TOLERANCE
16
Throughput and Latency
• Throughput: 11 million messages/second
• Latency: 17 ms at full load
• SOL shuffle test: 32 tasks -> 32 tasks, 4 nodes, 10 GbE, 32-core E5-2680
17
Scalability
• Test run on 100 nodes with 3000 tasks
• Gearpump performance scales with the number of nodes (tested up to 100 nodes)
18
Fault-Tolerance: Recovery time
Failure scenario -> recovery time [*]:
• Cluster: Master node down -> 0 s (Master HA takes effect)
• Cluster: Worker node down -> ~10 seconds (timeout detection takes a long time)
• Message loss -> ~300 ms (still optimizing; target is under 10 ms)
• Application: AppMaster down -> ~10 seconds (timeout detection takes a long time)
Test environment: 91 worker nodes, 1000 tasks (7 physical machines simulate the 91 worker nodes).
[*] Recovery time is the interval between (a) the failure happening and (b) all tasks in the topology resuming data processing.
19
GEARPUMP INTERNAL
20
Overview and general ideas (Normal Flow)
• Components: State, MinClock service, replayable Source, DAG runtime.
• Message(timestamp): every message in the system carries an application timestamp (the message's birth time).
• The MinClock service tracks the minimum timestamp of all pending messages in the system, now and in the future.
• ① High-performance streaming.
21
Overview and general ideas (Recovery Flow)
• Components: State, MinClock service, replayable Source, Checkpoint store, DAG runtime, Message(timestamp).
• ⑥ Exactly-once: state can be reconstructed by message replay:
  ② detect message loss at time Tp
  ③ recover the DAG executor process
  ④ pause the clock at Tp
  ⑤ replay from Tp
22
1. High performance streaming
2. Detect Message loss and other failures
3. DAG Executor Recovery
4. Clock Service, know when message is lost
5. Message replay from clock
6. Exactly-once, de-duplication
23
Actor Hierarchy
• 100% actors: communication, concurrency, isolation, error handling.
• [Diagram: the Master cluster (HA design, usable as a general service, deployable on YARN), the per-application AppMaster and Executors, and Clients that hook in and query state.]
24
Master HA
• Quorum (majority) of masters: one leader plus standby masters; workers register with the master cluster.
• Masters gossip state to each other.
• Conflict-free replicated data types (CRDTs) for consistency.
• Decentralized: no central meta server.
25
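The "CRDT data type example" on the slide is a diagram; as a minimal stand-in, here is a grow-only set, one of the simplest CRDTs, showing why gossiping replicas can converge without a central meta server (illustrative only, not Gearpump's internal state type):

case class GSet[T](elements: Set[T] = Set.empty[T]) {
  def add(e: T): GSet[T] = GSet(elements + e)
  // Merge is a commutative, associative, idempotent union, so replicas can
  // gossip and merge in any order and still reach the same state.
  def merge(other: GSet[T]): GSet[T] = GSet(elements ++ other.elements)
}

val replicaA = GSet[String]().add("app-1")
val replicaB = GSet[String]().add("app-2")
println(replicaA.merge(replicaB).elements) // Set(app-1, app-2), on either replica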
High performance Messaging Layer
• An Akka remote message carries a big overhead: the full sender and receiver addresses.
• Gearpump removes ~95% of that overhead (from ~400 bytes down to ~20 bytes) by converting addresses to short ids (the translation tables are synced with the other executors) and by effective batching.
26
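A minimal sketch of the short-address idea (class and method names are assumptions, not Gearpump's wire format): each executor keeps a dictionary so the wire only carries a small integer id instead of full actor paths.

import scala.collection.mutable

class AddressTable {
  private var nextId = 0
  private val toShort = mutable.Map.empty[String, Int]
  private val toFull = mutable.Map.empty[Int, String]

  // Sender side: replace the full actor path with a short integer id.
  def shorten(actorPath: String): Int =
    toShort.getOrElseUpdate(actorPath, {
      val id = nextId
      nextId += 1
      toFull(id) = actorPath
      id
    })

  // Receiver side: resolve the short id back to the full actor path.
  // In the real system these tables are kept in sync with the other executors.
  def resolve(id: Int): String = toFull(id)
}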
Effective batching
• Network idle: flush as fast as we can.
• Network busy: smart batching until the network is open again.
• Result: network bandwidth roughly doubled for 100-byte messages.
• This feature is ported from STORM-297.
27
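A minimal sketch of that batching policy (SmartBatcher is a hypothetical wrapper around the real sender, not Gearpump's code):

import scala.collection.mutable.ArrayBuffer

class SmartBatcher[T](send: Seq[T] => Unit, maxBatch: Int = 1000) {
  private val buffer = ArrayBuffer.empty[T]
  private var networkBusy = false // in practice driven by pending-send feedback

  def onMessage(msg: T): Unit = {
    buffer += msg
    // Idle network: flush as fast as we can. Busy network: keep batching.
    if (!networkBusy || buffer.size >= maxBatch) flush()
  }

  def onNetworkStatus(busy: Boolean): Unit = {
    networkBusy = busy
    if (!busy) flush() // the network opened up again: ship the batch
  }

  private def flush(): Unit =
    if (buffer.nonEmpty) { send(buffer.toList); buffer.clear() }
}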
Flow Control
• Back-pressure is passed upstream level by level, from task to task.
• Implemented with a sliding window of unacknowledged messages.
• Another option (not used): big-loop feedback flow control.
28
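A minimal sliding-window sketch (hypothetical names, not Gearpump's implementation): a sender may only keep a bounded number of unacknowledged messages in flight, so when a downstream task slows down its sender stalls, which stalls the level above it in turn; back-pressure propagates level by level.

class SlidingWindowSender[T](send: T => Unit, windowSize: Int = 100) {
  private var sent = 0L
  private var acked = 0L

  // Returns false when the window is full; the caller must stop pulling from
  // its own input, which propagates back-pressure one level upstream.
  def trySend(msg: T): Boolean =
    if (sent - acked < windowSize) { send(msg); sent += 1; true }
    else false

  // Called when the downstream task acknowledges `count` more messages.
  def onAck(count: Long): Unit = acked += count
}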
1. High performance streaming
2. Detect Message loss and other failures
3. DAG Recovery
4. Clock Service, know when message is lost
5. Message replay from clock
6. Exactly-once, de-duplication
29
Failure Detection
• For message loss: AckRequest and Ack messages between tasks.
• For JVM crash or network down: actor supervision.
• Failures are reported up the hierarchy: Task -> Executor -> AppMaster -> Master.
30
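A minimal sketch of the AckRequest/Ack idea (the message fields are assumptions for illustration): the sender counts what it has sent and periodically asks the receiver how much actually arrived; a shortfall means messages were lost. JVM crashes and network failures are caught separately by actor supervision.

case class AckRequest(sessionId: Int, sentCount: Long)
case class Ack(sessionId: Int, receivedCount: Long)

class LossDetector(sessionId: Int) {
  private var sentCount = 0L

  def onSend(): Unit = sentCount += 1

  // Sent periodically alongside the data: "I have sent `sentCount` messages so far."
  def ackRequest(): AckRequest = AckRequest(sessionId, sentCount)

  // If the Ack answering AckRequest(_, requested) reports fewer received messages,
  // some of those messages were lost on the way.
  def lossDetected(requested: Long, ack: Ack): Boolean =
    ack.sessionId == sessionId && ack.receivedCount < requested
}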
1. High performance streaming
2. Detect Message loss and other failures
3. DAG Recovery
4. Clock Service, know when message is lost
5. Message replay from clock
6. Exactly-once, de-duplication
31
DAG Recovery: Quarantine and Recover
1. Message loss detected: the global clock service and the AppMaster detect the error while tasks are sending messages.
2. Fence the zombies: a new dynamic session ID is issued, so messages from tasks on the half-dead ("zombie") executor are rejected.
3. Recover the executor JVM and replay messages from the source.
[Diagrams: AppMaster, executors with tasks, the source, the checkpoint store and the global clock service; ① error detected, ② fence/isolate zombie, ③ recover.]
32-34
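A minimal sketch of session-ID fencing (type and field names are assumptions): every recovery starts a new session, and receivers drop messages stamped with an old session id, so a zombie executor cannot corrupt the recovered DAG.

case class TaskMessage(sessionId: Int, payload: Any)

class FencedReceiver(private var currentSession: Int) {
  // Called when the AppMaster starts a new incarnation of the DAG.
  def startNewSession(newSession: Int): Unit = currentSession = newSession

  // Messages from a zombie executor still carry the old session id and are dropped.
  def accept(msg: TaskMessage): Boolean = msg.sessionId == currentSession
}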
1. High performance streaming
2. Detect Message loss and other failures
3. DAG Recovery
4. Clock Service, know when message is lost
5. Message replay from clock
6. Exactly-once, de-duplication
35
Global Clock Service – track application min-clock (1)
Definition: a task's min-clock is the minimum of
• the minimum timestamp of the pending messages in the current task, and
• the task min-clocks of all of its upstream tasks.
Each task reports its min-clock; the min-clock of the final task D is the global min-clock.
[Diagram: A, B, C and E upstream of D.]
36
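A tiny worked example of that definition (plain Scala, not the Gearpump API):

// A task's min-clock = min(own pending-message timestamps, upstream task min-clocks).
def taskMinClock(pendingTimestamps: Seq[Long], upstreamMinClocks: Seq[Long]): Long =
  (pendingTimestamps ++ upstreamMinClocks).foldLeft(Long.MaxValue)((a, b) => math.min(a, b))

// Task D has pending messages born at t = 620 and 700, and its upstream tasks report
// min-clocks 600 and 800; D's min-clock (here the global min-clock) is therefore 600,
// meaning every message with timestamp < 600 has been fully processed.
println(taskMinClock(Seq(620L, 700L), Seq(600L, 800L))) // 600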
Global Clock Service – track application min-clock (2)
• Same definition: each task reports a min-clock computed from its own pending messages and its upstream tasks' min-clocks.
• Problem: a single task can have thousands of upstream tasks. How can it track all of them efficiently?
37
Global Clock Service – track application min-clock (3)
• Implementation: a level clock. One clock value is tracked per DAG level (e.g. 1000, 800, 600, 400 at different levels of A..E), so a task consults its upstream level's clock instead of thousands of individual upstream tasks.
• The clocks are ever-increasing (monotonic).
38
1. High performance streaming
2. Detect Message loss and other failures
3. DAG Executor Recovery
4. Clock Service, know when message is lost
5. Message replay from clock
6. Exactly-once, de-duplication
39
Source-based Message Replay
• Normal flow: messages flow from the replayable source through the DAG.
• Recovery flow: replay from the very beginning, i.e. the source: ① resolve the offset for the replay timestamp from the Global Clock Service, ② replay from that offset.
40-41
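A minimal sketch of that recovery flow (the timestamp-to-offset index and the read function are assumptions standing in for a real replayable source such as a Kafka connector):

class ReplayableSource(index: Map[Long, Long],            // message timestamp -> source offset
                       read: Long => Iterator[String]) {  // read the source from an offset onward
  // ① Resolve the offset for the min-clock Tp, ② replay from that offset.
  def replayFrom(minClock: Long): Iterator[String] = {
    val indexed = index.filter { case (ts, _) => ts <= minClock }
    val offset =
      if (indexed.isEmpty) 0L        // nothing indexed before Tp: replay everything
      else indexed(indexed.keys.max) // largest indexed timestamp not after Tp
    read(offset)
  }
}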
1. High performance streaming
2. Detect Message loss and other failures
3. DAG Executor Recovery
4. Clock Service, know when message is lost
5. Message replay from clock
6. Exactly-once, de-duplication
42
Exactly-once message processing
• How? A checkpoint store plus the DAG runtime.
• Key: ensure State(t) only contains messages with timestamp <= t; the state checkpointed at the min-clock, plus a replay of everything after it, then reconstructs exactly-once results.
[Diagram: checkpoint flow]
[Diagram: recovery flow]
43-45
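A minimal sketch of the key invariant above (all names are assumptions, not the Gearpump state API): state is checkpointed at the global min-clock, and replayed messages at or below the checkpoint time are filtered out, so every message affects the state exactly once.

class ExactlyOnceState[S](initial: S, update: (S, Long) => S) {
  private var state: S = initial
  private var checkpointTime = 0L // min-clock of the last checkpoint
  private var checkpointed: S = initial

  // Replayed messages at or below the checkpoint time are dropped, so after a
  // recovery no message can be applied to the state twice.
  def onMessage(timestamp: Long): Unit =
    if (timestamp > checkpointTime) state = update(state, timestamp)

  // Called when the global min-clock advances: everything up to minClock is final.
  def checkpoint(minClock: Long): Unit = {
    checkpointTime = minClock
    checkpointed = state
  }

  // On failure: restore the checkpoint, then the source replays from checkpointTime.
  def recover(): S = { state = checkpointed; state }
}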
Dynamic DAG
• Uses a pub-sub model.
• ① Add processor F: subscribe C to F.
• ② Remove processor E: un-subscribe E from B, and un-subscribe D from E.
• The correct min-clock is maintained during the transition.
[Diagram: DAG A -> B -> C -> D with B -> E -> D; F is attached after C, E is removed.]
46
UNIQUE USE CASES
47
IoT Transparent Cloud
• Location transparent: the same programming model runs on the IoT device and in the data center; part of the DAG runs on the device side and ships logs to the data center.
• Target problem: the large gap between edge devices and the data center.
48
Unified log ingestion and processing
• A single Gearpump application for everything: agents collect logs from different places into a Kafka queue, the DAG processes them, and the results serve query and visualization.
• Target problem: otherwise the distributed application is broken into many isolated pieces.
49
Exactly-once: Financial use cases
• [Diagram: users and a service feed a billing system; the streaming system applies the billing model and produces bills, which are shown in an admin portal.]
• Target problem: a transactional, exactly-once, real-time streaming system.
50
Transformers: Dynamic DAG
• Manipulate the DAG on the fly: dynamic attach, dynamic remove (delete), dynamic replace, add a sub-graph.
• Target problem: no existing way to manipulate the DAG on the fly.
51
Eve: Online Machine Learning
• Decide immediately based on online learning results: sensor input -> learn -> predict -> decide -> output.
• Target problem: train and predict with machine learning online, in real time, for decision support.
52

Crowd sourcing
53
APPLICATION AND DEMO
54
Example 1: Stock Drawdown Tracking
• More than 3,000 indexes: crawlers publish stock data, the DAG groups it by stock ID and tracks drawdown, serving alerts and queries.
• Drawdown definition [diagram].
55
Example 2: Intelligent Traffic System
• 3,000 crossroads send data over a dedicated line to a gateway and into the DAG; results land in a KV store for query.
• Real-time statistics: travel time, fake-plate detection, over-speed detection, traffic planning.
56
Other demos
• complex Graph
• Stock data analysis (Drawdown tracking)
• ITS
• Scalability, 100 nodes, 1,000,000 tasks
• DSL
57
Status & Plan
• Our goal: Make this an Apache project
– Welcome code contribution!
– http://gearpump.io
• Plan:
– Connect: IOT
– Platform: Dynamic DAG, Exactly-once, etc. (release soon)
– Data: Real-time analysis algorithm
58
References
• Xiang Zhong, "Software architecture paradigms in the big data era: Reactive architecture and Akka in practice" (大数据时代的软件架构范式:Reactive架构及Akka实践), Programmer (程序员) magazine, issue 2015-2A
• Gearpump whitepaper: http://typesafe.com/blog/gearpump-real-time-streaming-engine-using-akka
• Gansha Wu, "The comeback of low-latency stream processing systems" (低延迟流处理系统的逆袭), Programmer (程序员) magazine, issue 2013-10
• Stonebraker, The 8 Requirements of Real-Time Stream Processing: http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• Gearpump: https://github.com/intel-hadoop/gearpump
• http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
• https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
• SQLstream: http://www.sqlstream.com/customers/
• http://www.statsblogs.com/2014/05/19/a-general-introduction-to-stream-processing/
• http://www.statalgo.com/2014/05/28/stream-processing-with-messaging-systems/
• Gartner report on IoT: http://www.zdnet.com/article/internet-of-things-devices-will-dwarf-number-of-pcs-tablets-and-smartphones/
59