Processing “BIG-DATA” In Real Time
Yanai Franchi, Tikal

Transcript of "Processing Big Data in Realtime"

  1. Processing “BIG-DATA” In Real Time. Yanai Franchi, Tikal
  2. (image)
  3. Vacation to Barcelona
  4. After a Long Travel Day
  5. Going to a Salsa Club
  6. Best Salsa Club NOW
     ● Good Music
     ● Crowded – Now!
  7. Same Problem in “gogobot”
  8. (image)
  9. Let's Develop “Gogobot Checkins Heat-Map” – a gogobot checkin Heat Map Service
  10. Key Notes
      ● Collector Service – collects checkins as text addresses
        – We need to use a GeoLocation service
      ● Upon elapsed interval, the last locations list is displayed as a Heat-Map in the GUI
      ● Web-scale service – 10Ks of checkins/second all over the world (imaginary, but let's do it for the exercise)
      ● Accuracy – sample data, NOT critical data
        – Proportionately representative
        – Data volume is large enough to compensate for data loss
  11. Heat-Map Context (diagram): Gogobot micro services send a Text-Address to the Checkins Heat-Map Service, which calls Get-GeoCode(Address) on the Geo Location Service and serves the Last Interval Locations back to the Gogobot system as a Heat-Map
  12. Plan A – Simulate Checkins with a File (diagram): Check-in #1 ... Check-in #9 → Read Text Address → Processing Checkins → GET Geo Location (Geo Location Service) → Persist Checkin Intervals → Database
  13. Tons of Addresses Arriving Every Second
  14. Architect – First Reaction...
  15. Second Reaction...
  16. Developer – First Reaction
  17. Second Reaction
  18. Problems?
      ● Tedious: spend time configuring where to send messages, deploying workers, and deploying intermediate queues
      ● Brittle: there's little fault-tolerance
      ● Painful to scale: repartitioning running workers is complicated
  19. What We Want?
      ● Horizontal scalability
      ● Fault-tolerance
      ● No intermediate message brokers!
      ● Higher level abstraction than message passing
      ● “Just works”
      ● Guaranteed data processing (not in this case)
  20. Apache Storm
      ✔ Horizontal scalability
      ✔ Fault-tolerance
      ✔ No intermediate message brokers!
      ✔ Higher level abstraction than message passing
      ✔ “Just works”
      ✔ Guaranteed data processing
  21. Anatomy of Storm
  22. What is Storm?
      ● CEP – an open-source, distributed realtime computation system
        – Makes it easy to reliably process unbounded streams of tuples
        – Does for realtime processing what Hadoop did for batch processing
      ● Fast – 1M tuples/sec per node
        – It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate
  23. Streams: an unbounded sequence of tuples
  24. Spouts: sources of streams
  25. Bolts: process input streams and produce new streams
  26. Storm Topology: a network of spouts and bolts
  27. Guarantee for Processing
      ● Storm guarantees the full processing of a tuple by tracking its state
      ● In case of failure, Storm can re-process it
      ● Source tuples with fully “acked” trees are removed from the system
  28. Tasks (Bolt/Spout Instance): spouts and bolts execute as many tasks across the cluster
  29. Stream Grouping: when a tuple is emitted, which task (instance) does it go to?
  30. Stream Grouping
      ● Shuffle grouping: pick a random task
      ● Fields grouping: consistent hashing on a subset of tuple fields
      ● All grouping: send to all tasks
      ● Global grouping: pick the task with the lowest id
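The routing rule behind fields grouping can be sketched in a few lines of plain Java. This is a toy model for illustration only, not Storm's actual implementation; the invariant it shows is the one the deck relies on: tuples with equal field values always land on the same task, which is why grouping on "city" lets one HeatMapBuilder task own each city.

```java
import java.util.Objects;

class FieldsGroupingSketch {
    // Toy model: route a tuple to a task by hashing its grouping field.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int taskFor(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Every "New York" check-in maps to the same heat-map task:
        System.out.println(
            taskFor("New York", numTasks) == taskFor("New York", numTasks)); // true
    }
}
```

Shuffle grouping, by contrast, would pick the task at random, which is fine for the stateless geocode lookup but would scatter one city's check-ins across heat-map builders.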
  31. Tasks, Executors, Workers (diagram): a Worker Process is a JVM; an Executor is a thread inside it; a Task is a spout/bolt instance. One worker runs several executors, and one executor may run several tasks.
  32. (diagram) Each Node runs a Supervisor and Worker Processes; each worker hosts executors for Spout A, Bolt B, and Bolt C tasks
  33. Storm Architecture: Nimbus, the master node (similar to the Hadoop JobTracker), handles topology upload/rebalance to the Supervisor nodes. It is NOT critical for a running topology
  34. Storm Architecture: a few ZooKeeper nodes are used for cluster coordination
  35. Storm Architecture: the Supervisor nodes run the worker processes
  36. Assembling Heatmap Topology
  37. HeatMap Input/Output Tuples
      ● Input tuples: timestamp and text address
        – (9:00:07 PM, “287 Hudson St New York NY 10013”)
      ● Output tuple: a time interval, and the list of points for it
        – (9:00:00 PM to 9:00:15 PM, List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987)))
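Slide 37's output tuple groups check-ins into 15-second windows. One illustrative way to derive the window key from a check-in timestamp is flooring it to the interval start (a hypothetical helper for this write-up, not code shown in the talk):

```java
class IntervalBucket {
    static final long INTERVAL_MS = 15_000; // 15-second heat-map windows, as on the slide

    // Floor a check-in timestamp to the start of its interval, so every
    // check-in inside the same window shares one map key.
    static long intervalStart(long timestampMs) {
        return timestampMs - (timestampMs % INTERVAL_MS);
    }

    public static void main(String[] args) {
        long t1 = 1_000_007L; // both fall in [990_000, 1_005_000)
        long t2 = 1_000_014L;
        System.out.println(intervalStart(t1) == intervalStart(t2)); // true
    }
}
```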
  38. (diagram) Checkins Spout → (9:01 PM @ 287 Hudson st) → Geocode Lookup Bolt → (9:01 PM, (40.736, -74.354)) → Heatmap Builder Bolt → upon elapsed interval → (9:00 PM – 9:15 PM, List((40.73, -74.34), (51.36, -83.33), (69.73, -34.24))) → Persistor Bolt
  39. Checkins Spout

      public class CheckinsSpout extends BaseRichSpout {
          // We hold state here; no need for thread safety.
          private List<String> sampleLocations;
          private int nextEmitIndex;
          private SpoutOutputCollector outputCollector;

          @Override
          public void open(Map map, TopologyContext topologyContext,
                           SpoutOutputCollector spoutOutputCollector) {
              this.outputCollector = spoutOutputCollector;
              this.nextEmitIndex = 0;
              sampleLocations = IOUtils.readLines(
                  ClassLoader.getSystemResourceAsStream("sample-locations.txt"));
          }

          // Called iteratively by Storm.
          @Override
          public void nextTuple() {
              String address = sampleLocations.get(nextEmitIndex);
              String checkin = new Date().getTime() + "@ADDRESS:" + address;
              outputCollector.emit(new Values(checkin));
              nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size();
          }

          // Declare output fields.
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("str"));
          }
      }
  40. Geocode Lookup Bolt

      public class GeocodeLookupBolt extends BaseBasicBolt {
          private LocatorService locatorService;

          @Override
          public void prepare(Map stormConf, TopologyContext context) {
              locatorService = new GoogleLocatorService();
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              String str = tuple.getStringByField("str");
              String[] parts = str.split("@");
              Long time = Long.valueOf(parts[0]);
              String address = parts[1];
              // Get the geocode and create a DTO.
              LocationDTO locationDTO = locatorService.getLocation(address);
              if (locationDTO != null)
                  outputCollector.emit(new Values(time, locationDTO));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) {
              fieldsDeclarer.declare(new Fields("time", "location"));
          }
      }
  41. Tick Tuple – Repeating Mantra
  42. Two Streams to Heat-Map Builder: the check-ins stream plus a tick stream into the HeatMapBuilder Bolt. On a tick tuple, we flush our Heat-Map
  43. Tick Tuple in Action

      public class HeatMapBuilderBolt extends BaseBasicBolt {
          // Holds the latest intervals.
          private Map<String, List<LocationDTO>> heatmaps;

          @Override
          public Map<String, Object> getComponentConfiguration() {
              Config conf = new Config();
              // Tick interval, in seconds.
              conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);
              return conf;
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              if (isTickTuple(tuple)) {
                  // Emit accumulated intervals
              } else {
                  // Add check-in info to the current interval in the Map
              }
          }

          private boolean isTickTuple(Tuple tuple) {
              return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                  && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
          }
      }
  44. Persistor Bolt

      public class PersistorBolt extends BaseBasicBolt {
          private Jedis jedis;

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              Long timeInterval = tuple.getLongByField("time-interval");
              String city = tuple.getStringByField("city");
              String locationsList = objectMapper.writeValueAsString(
                  tuple.getValueByField("locationsList"));
              String dbKey = "checkins-" + timeInterval + "@" + city;
              // Persist in Redis for 24h.
              jedis.setex(dbKey, 3600 * 24, locationsList);
              // Publish on a Redis channel for debugging.
              jedis.publish("location-key", dbKey);
          }
      }
  45. Transforming the Tuples (diagram): Sample Checkins File → (read text addresses) → Checkins Spout → shuffle grouping → Geocode Lookup Bolt (GET Geo Location from the Geo Location Service) → fields grouping(city), i.e. group by city → Heatmap Builder Bolt → shuffle grouping → Persistor Bolt → Database
  46. Heat Map Topology

      public class LocalTopologyRunner {
          public static void main(String[] args) {
              TopologyBuilder builder = buildTopology();
              StormSubmitter.submitTopology(
                  "local-heatmap", new Config(), builder.createTopology());
          }

          private static TopologyBuilder buildTopology() {
              TopologyBuilder builder = new TopologyBuilder();
              builder.setSpout("checkins", new CheckinsSpout());
              builder.setBolt("geocode-lookup", new GeocodeLookupBolt())
                     .shuffleGrouping("checkins");
              builder.setBolt("heatmap-builder", new HeatMapBuilderBolt())
                     .fieldsGrouping("geocode-lookup", new Fields("city"));
              builder.setBolt("persistor", new PersistorBolt())
                     .shuffleGrouping("heatmap-builder");
              return builder;
          }
      }
  47. It's NOT Scaled
  48. (image)
  49. Scaling the Topology

      public class LocalTopologyRunner {
          public static void main(String[] args) {
              TopologyBuilder builder = buildTopology();
              Config conf = new Config();
              // Set the number of workers.
              conf.setNumWorkers(20);
              StormSubmitter.submitTopology(
                  "local-heatmap", conf, builder.createTopology());
          }

          private static TopologyBuilder buildTopology() {
              TopologyBuilder builder = new TopologyBuilder();
              // Parallelism hints; setNumTasks adds extra tasks for future rebalancing.
              builder.setSpout("checkins", new CheckinsSpout(), 4);
              builder.setBolt("geocode-lookup", new GeocodeLookupBolt(), 8)
                     .shuffleGrouping("checkins").setNumTasks(64);
              builder.setBolt("heatmap-builder", new HeatMapBuilderBolt(), 4)
                     .fieldsGrouping("geocode-lookup", new Fields("city"));
              builder.setBolt("persistor", new PersistorBolt(), 2)
                     .shuffleGrouping("heatmap-builder").setNumTasks(4);
              return builder;
          }
      }
  50. Demo
  51. Recap – Plan A (diagram): Sample Checkins File → Read Text Address → Storm Heat-Map Topology → GET Geo Location (Geo Location Service) → Persist Checkin Intervals → Database
  52. We have something working
  53. Add Kafka Messaging
  54. Plan B – Kafka Spout & Bolt to HeatMap (diagram): Publish Checkins → Checkin Kafka Topic → read text addresses → Checkins Spout → Geocode Lookup Bolt (Geo Location Service) → Heatmap Builder Bolt → Persistor Bolt → Database, plus a Kafka Locations Bolt publishing to a Locations Topic
  55. (image)
  56. They are all good, but not for all use-cases
  57. Kafka – a little introduction
  58. (image)
  59. Pub-Sub Messaging System
  60.–63. (images)
  64. Doesn't Fear the File System
  65.–67. (images)
  68. Topics
      ● Logical collections of partitions (the physical files)
      ● A broker contains some of the partitions for a topic
  69. A Partition is Consumed by Exactly One Group's Consumer
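That rule can be illustrated with a toy round-robin assignment (a deliberate simplification; Kafka's real group coordination is richer and rebalances dynamically): within a consumer group, every partition has exactly one owning consumer, while one consumer may own several partitions.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionAssignmentSketch {
    // Toy assignment: spread a topic's partitions over a group's consumers.
    // Each partition gets exactly one owner; a consumer may own several.
    static Map<Integer, Integer> assign(int numPartitions, int numConsumers) {
        Map<Integer, Integer> owner = new HashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            owner.put(p, p % numConsumers); // partition p -> consumer index
        }
        return owner;
    }

    public static void main(String[] args) {
        // 6 partitions, 2 consumers in the group: each consumer owns 3 partitions.
        System.out.println(assign(6, 2));
    }
}
```

This is also why a group can never have more active consumers than partitions: extra consumers would simply own nothing.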
  70. Distributed & Fault-Tolerant
  71.–83. (diagram animation: Producers 1–2 and Consumers 1–2 connected to Brokers 1–4, coordinated via ZooKeeper, as brokers and consumers join and leave the cluster)
  84. Performance Benchmark: 1 broker, 1 producer, 1 consumer
  85.–86. (images)
  87. Add Kafka to our Topology

      public class LocalTopologyRunner {
          ...
          private static TopologyBuilder buildTopology() {
              ...
              // Kafka spout
              builder.setSpout("checkins", new KafkaSpout(kafkaConfig));
              ...
              // Kafka bolt
              builder.setBolt("kafkaProducer", new KafkaOutputBolt(
                      "localhost:9092",
                      "kafka.serializer.StringEncoder",
                      "locations-topic"))
                  .shuffleGrouping("persistor");
              return builder;
          }
      }
  88. Plan C – Add Reactor (diagram): Text-Address → Publish Checkins → Checkin HTTP Reactor → Checkin Kafka Topic → Consume Checkins → Storm Heat-Map Topology (GET Geo Location from the Geo Location Service) → Persist Checkin Intervals (Database), Publish Interval Key → Locations Kafka Topic → Index Interval Locations → Search Server Index
  89. Why Reactor?
  90. C10K Problem
  91. 2008: Thread Per Request/Response
  92. Reactor Pattern Paradigm: the application registers handlers ... events trigger handlers
  93. Reactor Pattern – Key Points
      ● Single thread / single event loop
      ● EVERYTHING runs on it
      ● You MUST NOT block the event loop
      ● Many implementations (partial list):
        – Node.js (JavaScript), EventMachine (Ruby), Twisted (Python)... and Vert.X
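A minimal single-threaded event loop can be sketched in Java (a toy illustration of the pattern, not Vert.X or Node.js internals): handlers are registered up front, and one thread dispatches every queued event, which is exactly why a single blocking handler stalls everything behind it.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

class EventLoopSketch {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    private final Queue<String[]> events = new ArrayDeque<>();

    // The application registers handlers...
    void on(String event, Consumer<String> handler) {
        handlers.put(event, handler);
    }

    void emit(String event, String payload) {
        events.add(new String[]{event, payload});
    }

    // ...and events trigger handlers, all on this single thread.
    // A handler that blocks here blocks every later event too.
    void run() {
        while (!events.isEmpty()) {
            String[] e = events.poll();
            Consumer<String> h = handlers.get(e[0]);
            if (h != null) h.accept(e[1]);
        }
    }

    public static void main(String[] args) {
        EventLoopSketch loop = new EventLoopSketch();
        loop.on("checkin", addr -> System.out.println("got checkin: " + addr));
        loop.emit("checkin", "287 Hudson St");
        loop.run(); // prints: got checkin: 287 Hudson St
    }
}
```

Real reactors (java.nio's Selector, libuv, Vert.X) replace the in-memory queue with readiness events from the OS, but the dispatch discipline is the same.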
  94. Reactor Pattern Problems
      ● Some work is naturally blocking:
        – Intensive data crunching
        – 3rd-party blocking APIs (e.g. JDBC)
      ● A pure reactor (e.g. Node.js) is not a good fit for this kind of work!
  95. (image)
  96. Vert.X Architecture: Vert.X instances communicate over the Event Bus
  97. Vert.X Goodies
      ● Growing module repository
      ● TCP/SSL servers/clients
      ● HTTP/HTTPS servers/clients
      ● Web server
      ● WebSockets support
      ● SockJS support
      ● Persistors (Mongo, JDBC, ...)
      ● Work queue
      ● Timers
      ● Authentication manager
      ● Buffers
      ● Streams and Pumps
      ● Session manager
      ● Routing
      ● Socket.IO
      ● Asynchronous File I/O
  98. Node.JS vs Vert.X
  99. Node.js vs Vert.X
      ● Node.js
        – JavaScript only
        – Inherently single threaded
        – Not much help with IPC
        – All code MUST be in the event loop
      ● Vert.X
        – Polyglot (JavaScript, Java, Ruby, Python...)
        – Leverages JVM multithreading
        – Nervous Event Bus
        – Blocking work can be done off the event loop
  100. Node.js vs Vert.X Benchmark – http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/ (AMD Phenom II X6, 6 cores, 8GB RAM, Ubuntu 11.04)
  101. HeatMap Reactor Architecture (diagram): inside each Vert.X instance, an HTTP Server Verticle and a Kafka module talk over the Event Bus; the Kafka module automatically sends EventBus Msg → Kafka Topic, which feeds the Storm topology
  102. Heat-Map Server – Only 6 LOC!

      var vertx = require('vertx');
      var container = require('vertx/container');
      var console = require('vertx/console');
      var config = container.config;

      vertx.createHttpServer().requestHandler(function(request) {
          request.dataHandler(function(buffer) {
              // Send the checkin to the Vert.X EventBus.
              vertx.eventBus.send(config.address, {payload: buffer.toString()});
          });
          request.response.end();
      }).listen(config.httpServerPort, config.httpServerHost);
      console.log("HTTP CheckinsReactor started on port " + config.httpServerPort);
  103. (final architecture diagram): a Checkin HTTP Firehose publishes checkins to the Checkin HTTP Reactor → Checkin Kafka Topic → the Storm Heat-Map Topology consumes checkins (GET Geo Location from the Geo Location Service), persists checkin intervals to the Database, and publishes interval keys to the Hotzones Kafka Topic → interval locations are indexed into the Search Server Index → the Web App consumes interval keys, searches and gets interval locations, and pushes them via WebSocket
  104. Demo
  105. Summary
  106. When You Go Out to a Salsa Club
      ● Good Music
      ● Crowded
  107. More Conclusions
      ● Storm – great for real-time Big Data processing; complementary to Hadoop batch jobs
      ● Kafka – great messaging for logs/events data; serves well as a source for a Storm spout
      ● Vert.X – worth trying out as an alternative reactor
  108. Thanks