
A Tale of Squirrels and Storms

Describes some differences and similarities of Apache Flink and Apache Storm. Gives an introduction to Flink's compatibility layer, which allows running Storm topologies in Flink and embedding spouts and bolts in Flink streaming programs.

Published in: Software

A Tale of Squirrels and Storms

  1. A Tale of Squirrels and Storms. Flink Forward 2015. Matthias J. Sax, mjsax@{informatik.hu-berlin.de|apache.org}, @MatthiasJSax. Humboldt-Universität zu Berlin, Department of Computer Science (DBIS, Institut für Informatik). October 13, 2015
  2. About Me (slide 1/22)
      • Ph.D. student in CS, DBIS Group, HU Berlin
      • involved in the Stratosphere research project
      • working on data stream processing and optimization
      • Aeolus: built on top of Apache Storm (https://github.com/mjsax/aeolus)
      • Committer at Apache Flink
  4. Flink and Storm: Flink vs. Storm
  12. Similarities of Flink and Storm (slide 3/22)
      • true stream processing engines (no micro-batching)
      • low latencies (in the 100 ms range)
      • executing data flow programs
      • parallel and distributed
      • fault-tolerant
      • cloud or cluster environment
      • Trident: similar Java API, exactly-once processing
  13. Flink vs. Storm (slide 4/22). Advantages of Storm:
      • super low latency (< 10 ms)
      • very robust: stateless JVM for easy restart on failure; Zookeeper manages cluster state
      • isolation of topology
      • dynamic scaling (to some extent)
      • multi-language protocol (for experts only)
      • distributed RPC
  14. Flink vs. Storm (slide 5/22). Advantages of Flink [1]:
      • richer API
      • Java and Scala
      • type-safe programs
      • system is aware of multiple input streams
      • ordered stream processing
      • system and user timestamps
      • count/time and customized windows
      • stateful processing
      • lightweight fault tolerance (Chandy-Lamport distributed snapshots)
      [1] http://data-artisans.com/real-time-stream-processing-the-next-step-for-apache-flink/
  15. Flink vs. Storm (slide 6/22). Advantages of Flink (cont.):
      • provides exactly-once sinks
      • native flow control (back pressure) [2]
      • higher throughput (more than 100x) [3]
      • no lambda or kappa architecture necessary
      • native support for iterations (cyclic data flows)
      • managed memory
      [2] http://data-artisans.com/how-flink-handles-backpressure/
      [3] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
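The "native flow control (back pressure)" item can be pictured with a bounded buffer between a fast producer and a slow consumer. The following is only a plain-Java analogy, not Flink's implementation: the class `BackPressureSketch`, the method `offerAll`, and the buffer size of 4 are all made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackPressureSketch {
    /** Tries to hand over 'attempts' records without blocking; returns how many were accepted. */
    public static int offerAll(BlockingQueue<Integer> buffer, int attempts) {
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false instead of blocking when the bounded buffer is full
            if (buffer.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Hypothetical buffer between a fast producer and a slow consumer.
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(4);
        System.out.println("accepted while consumer is idle: " + offerAll(buffer, 10));
        buffer.poll(); // the consumer processes one record and frees one slot
        System.out.println("accepted after one record was consumed: " + offerAll(buffer, 10));
    }
}
```

A real pipeline would block the producer (for example with `put()`) rather than drop records. The point of the sketch is only that a full bounded buffer limits the producer to the consumer's pace.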
  24. System Architecture: Storm (slide 7/22). Diagram: Nimbus, Client, five Supervisors, three Zookeeper nodes, six Workers.
  31. System Architecture: Flink (slide 8/22). Diagram: JobManager; WebClient, CLI, Shell; five TaskManagers; a second JobManager.
  38. Topology Deployment: Storm (slide 9/22)
      • default: round-robin scheduling
      • high overhead due to intra-JVM and/or network communication
      • localOrShuffle connection pattern poorly exploited
      • isolation of topologies
      • custom scheduler possible (for experts only)
      Diagram: operators Src, T1, T2, F1, F2, C1, C2, Sk.
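Why round-robin placement causes communication overhead can be seen in a toy assignment of the diagram's operators to workers. This is a minimal sketch and not Storm's scheduler code: the class `RoundRobinSketch`, the method `assign`, and the worker count of 3 are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinSketch {
    /** Assigns task i to worker (i mod workerCount); returns the worker index of each task. */
    public static List<Integer> assign(int taskCount, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("need at least one worker");
        }
        List<Integer> workerOfTask = new ArrayList<>();
        for (int task = 0; task < taskCount; task++) {
            workerOfTask.add(task % workerCount);
        }
        return workerOfTask;
    }

    public static void main(String[] args) {
        // Operator instances named as in the slide's diagram, in declaration order.
        String[] tasks = {"Src", "T1", "T2", "F1", "F2", "C1", "C2", "Sk"};
        List<Integer> placement = assign(tasks.length, 3);
        for (int i = 0; i < tasks.length; i++) {
            System.out.println(tasks[i] + " -> worker " + placement.get(i));
        }
    }
}
```

With three workers, Src lands on worker 0 while T1 and T2 land on workers 1 and 2, so in this toy placement the very first hop of the pipeline already crosses a worker boundary.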
  46. Topology Deployment: Flink (slide 10/22)
      • deploys whole pipeline to each TaskManager
      • local-forward is default
      • operator chaining
      Diagram: operators Src, T1, T2, F1, F2, C1, C2, Sk.
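Operator chaining can be thought of as fusing consecutive operators into one unit, so that a record moves from one to the next by a plain method call instead of through a queue or the network. The sketch below shows that idea with ordinary function composition. It is not Flink's chaining code, and `ChainingSketch`, `chain`, and the two example operators are hypothetical.

```java
import java.util.function.Function;

public class ChainingSketch {
    /** Fuses two operators into one function; records pass between them by direct call. */
    public static <A, B, C> Function<A, C> chain(Function<A, B> first, Function<B, C> second) {
        return first.andThen(second);
    }

    public static void main(String[] args) {
        Function<String, String> trim = String::trim;      // hypothetical first operator
        Function<String, Integer> length = String::length; // hypothetical second operator
        Function<String, Integer> chained = chain(trim, length);
        System.out.println(chained.apply("  squirrel  ")); // prints 8
    }
}
```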
  47. Storm Compatibility
  49. Storm Compatibility (slide 12/22). Allows you to [4]:
      • execute Storm topologies in Flink
      • embed Spouts/Bolts in Flink streaming programs
      Diagram (Flink stack): deployment (Local: JVM, Embedded; Cluster: Standalone, YARN; Cloud: GCE, EC2); runtime (Distributed Streaming Dataflow); APIs (DataSet API for batch processing, Streaming API for stream processing); libraries (FlinkML machine learning, Gelly graph API and library, Table API batch, Hadoop M/R Compatibility, Table API streaming, Storm Compatibility).
      [4] https://ci.apache.org/projects/flink/flink-docs-master/apis/storm_compatibility.html
  52. Storm Compatibility: API (slide 13/22)
      • Execute whole topologies: FlinkTopologyBuilder, FlinkSubmitter, FlinkClient, FlinkLocalCluster
      • Embedded mode: SpoutWrapper, BoltWrapper
      • Additionally: FiniteSpout interface
  59. Storm Compatibility: Internals (slide 14/22). Wrappers for Operators and Collectors:
      • redirecting method calls: run() ⇒ nextTuple(); processElement() ⇒ execute(); emit() ⇒ collect()
      • translating data types: TupleX, POJO ⇔ Tuple/Values; primitive types for single-attribute input/output
      Diagram: a Bolt inside a BoltWrapper; processElement() on the wrapper calls execute() on the Bolt; emit() from the Bolt calls collect() on the Flink Collector.
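The redirection of method calls listed above is essentially the adapter pattern. The sketch below mimics it with deliberately tiny stand-in interfaces: `MiniBolt`, `MiniStormCollector`, `MiniOperator`, `MiniFlinkCollector`, and `MiniBoltWrapper` are all hypothetical and much simpler than the real Storm and Flink types. It is meant only to show the direction of the calls.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapperSketch {
    // Hypothetical stand-ins for the Storm side (not the real Storm interfaces).
    public interface MiniBolt { void execute(String input, MiniStormCollector out); }
    public interface MiniStormCollector { void emit(String value); }

    // Hypothetical stand-ins for the Flink side (not the real Flink interfaces).
    public interface MiniOperator { void processElement(String input, MiniFlinkCollector out); }
    public interface MiniFlinkCollector { void collect(String value); }

    /** Adapter: looks like a Flink-style operator, delegates to a Storm-style bolt. */
    public static final class MiniBoltWrapper implements MiniOperator {
        private final MiniBolt bolt;

        public MiniBoltWrapper(MiniBolt bolt) {
            this.bolt = bolt;
        }

        @Override
        public void processElement(String input, MiniFlinkCollector out) {
            // processElement() => execute(); the collector handed to the bolt
            // forwards every emit() => collect().
            bolt.execute(input, out::collect);
        }
    }

    /** Runs one line through a wrapped tokenizer bolt and returns what was collected. */
    public static List<String> tokenize(String line) {
        MiniBolt tokenizer = (input, out) -> {
            for (String word : input.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.emit(word);
                }
            }
        };
        List<String> collected = new ArrayList<>();
        new MiniBoltWrapper(tokenizer).processElement(line, collected::add);
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("to be or not to be"));
    }
}
```

The bolt never learns that it is running inside a different engine, because it only ever sees a collector with an `emit` method.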
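  66. WordCount on Storm (slide 15/22)

      import backtype.storm.Config;
      import backtype.storm.StormSubmitter;
      import backtype.storm.topology.TopologyBuilder;
      import backtype.storm.tuple.Fields;

      // FileSpout, BoltTokenizer, BoltCounter and BoltFileSink are the talk's example classes.
      public static void main(String[] args) throws Exception {
          TopologyBuilder builder = new TopologyBuilder();

          // source -> tokenizer -> counter -> sink
          builder.setSpout("source", new FileSpout("/tmp/hamlet.txt"));
          builder.setBolt("tokenizer", new BoltTokenizer())
                 .shuffleGrouping("source");
          // group by word so that each word is always counted by the same task
          builder.setBolt("counter", new BoltCounter())
                 .fieldsGrouping("tokenizer", new Fields("word"));
          builder.setBolt("sink", new BoltFileSink("/tmp/count.txt"))
                 .shuffleGrouping("counter");

          Config conf = new Config();
          StormSubmitter.submitTopology("WordCount", conf, builder.createTopology());
      }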
  67. WordCount on Flink (slide 16/22)

      public static void main(String[] args) throws Exception {
          // changed line 1 of 2: FlinkTopologyBuilder instead of TopologyBuilder
          FlinkTopologyBuilder builder = new FlinkTopologyBuilder();

          builder.setSpout("source", new FileSpout("/tmp/hamlet.txt"));
          builder.setBolt("tokenizer", new BoltTokenizer())
                 .shuffleGrouping("source");
          builder.setBolt("counter", new BoltCounter())
                 .fieldsGrouping("tokenizer", new Fields("word"));
          builder.setBolt("sink", new BoltFileSink("/tmp/count.txt"))
                 .shuffleGrouping("counter");

          Config conf = new Config();
          // changed line 2 of 2: FlinkSubmitter instead of StormSubmitter
          FlinkSubmitter.submitTopology("WordCount", conf, builder.createTopology());
      }
  68. Storm on Flink (slide 17/22). To run a Storm topology on Flink, changing two lines of code is sufficient.
  74. WordCount: Embedded Spout (slide 18/22)

      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env =
              StreamExecutionEnvironment.getExecutionEnvironment();

          // wrap the Storm spout so that it acts as a Flink source
          DataStream<Tuple1<String>> source = env.addSource(
              new SpoutWrapper<Tuple1<String>>(new FileSpout("/tmp/hamlet.txt")),
              TypeExtractor.getForObject(new Tuple1<String>("")));

          // do further processing on source
          source.flatMap(new Tokenizer()) // out -> Tuple2<String, Integer>
                .keyBy(0).sum(1).writeAsText("/tmp/count.txt");

          env.execute("WordCount");
      }
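The `keyBy(0).sum(1)` step keeps one running total per word and emits the updated total for every incoming (word, 1) pair. The following plain-Java sketch reproduces that behaviour on a list, purely as an illustration. `KeyedSumSketch` and `rollingCounts` are made-up names, and no Flink classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedSumSketch {
    /** For each word, emits "word=runningCount", like a keyed rolling sum over (word, 1) pairs. */
    public static List<String> rollingCounts(List<String> words) {
        Map<String, Integer> countPerKey = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            // merge() adds 1 to the existing count, or starts at 1 for a new key
            int updated = countPerKey.merge(word, 1, Integer::sum);
            emitted.add(word + "=" + updated);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(rollingCounts(Arrays.asList("to", "be", "to")));
    }
}
```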
  80. WordCount: Embedded Bolt (slide 19/22)

      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env =
              StreamExecutionEnvironment.getExecutionEnvironment();

          DataStream<String> text = env.readTextFile("/tmp/hamlet.txt");

          // wrap the Storm bolt so that it acts as a Flink operator
          DataStream<Tuple2<String, Integer>> tokens = text.transform(
              "tokenizer",
              TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
              new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer()));

          // do further processing on tokens
          tokens.keyBy(0).sum(1).writeAsText("/tmp/count.txt");

          env.execute("WordCount");
      }
  83. Embedded Compatibility Mode (slide 20/22). Re-use code within a Flink streaming program:
      • Spouts as Flink sources
      • Bolts as Flink operators
      Pros:
      • mix-and-match of Storm and Flink operators
      • configure Spouts/Bolts (Map/Config)
      • splitting Spout/Bolt output streams
      • type-safe embedding (also raw types, i.e., String instead of Tuple1<String>)
      • convert infinite Spouts to finite sources (FiniteSpout interface)
      Cons:
      • currently, quite some boilerplate code is necessary :/
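Converting an infinite spout into a finite source comes down to giving the source loop a stop condition. The sketch below assumes a miniature spout interface with a `reachedEnd()` check. `MiniSpout`, `MiniFiniteSpout`, `ListSpout`, and `runToEnd` are hypothetical names for illustration and do not reproduce the compatibility layer's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class FiniteSpoutSketch {
    // Hypothetical miniature of a spout: emits at most one record per nextTuple() call.
    public interface MiniSpout { void nextTuple(Consumer<String> out); }

    // Hypothetical miniature of a finite spout: can also report that it is exhausted.
    public interface MiniFiniteSpout extends MiniSpout { boolean reachedEnd(); }

    /** A finite spout backed by an in-memory list of lines. */
    public static final class ListSpout implements MiniFiniteSpout {
        private final Iterator<String> it;

        public ListSpout(List<String> lines) {
            this.it = lines.iterator();
        }

        @Override
        public void nextTuple(Consumer<String> out) {
            if (it.hasNext()) {
                out.accept(it.next());
            }
        }

        @Override
        public boolean reachedEnd() {
            return !it.hasNext();
        }
    }

    /** Source loop: run() => nextTuple(), but stop once the spout reports its end. */
    public static List<String> runToEnd(MiniFiniteSpout spout) {
        List<String> emitted = new ArrayList<>();
        while (!spout.reachedEnd()) {
            spout.nextTuple(emitted::add);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runToEnd(new ListSpout(Arrays.asList("a", "b", "c"))));
    }
}
```

Without the `reachedEnd()` check the loop would keep calling `nextTuple()` forever, which is the normal behaviour of a spout in a never-ending streaming job.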
  86. Outlook: Storm Compatibility (slide 21/22)
      Current status:
      • available in master branch
      • based on Storm 0.9.4
      • will be part of Flink 0.10.0
      Work in progress:
      • Hooks
      • Metrics
      Next steps:
      • enable fault-tolerance
      • introduce FlinkTridentTopology
      • improve embedded mode (StormEnvironment)
  87. Thanks!
