Processing “BIG-DATA” in Real Time
Yanai Franchi, Tikal

1
2
Vacation to Barcelona

3
After a Long Travel Day

4
Going to a Salsa Club

5
Best Salsa Club NOW
● Good Music
● Crowded – Now!
6
Same Problem in “gogobot”

7
8
Let's Develop
“Gogobot Checkins Heat-Map”

gogobot checkin
Heat Map Service

9
Key Notes
● Collector Service - Collects checkins as text addresses
  – We need to use GeoLocation Service
● Upon elapsed interval, the last locations list will be displayed as Heat-Map in GUI.
● Web Scale service – tens of thousands of checkins per second all over the world (imaginary, but let's do it for the exercise).
● Accuracy – Sample data, NOT critical data.
  – Proportionately representative
  – Data volume is large enough to compensate for data loss.
10
Heat-Map Context
[Diagram: Gogobot micro services in the Gogobot System send text addresses to the Checkins Heat-Map Service; the service calls Get-GeoCode(Address) on the Geo Location Service and supplies the last interval's locations to the Heat-Map.]
11
Plan A
Simulate Checkins with a File
[Diagram: a file of check-ins (Check-in #1, Check-in #2, ...) → Read Text Address → Processing Checkins → GET Geo Location from the Geo Location Service → Persist Checkin Intervals → Database]
12
Tons of Addresses
Arriving Every Second

13
Architect - First Reaction...

14
Second Reaction...

15
Developer
First
Reaction

16
Second
Reaction

17
Problems ?
● Tedious: Spend time configuring where to send messages, deploying workers, and deploying intermediate queues.
● Brittle: There's little fault-tolerance.
● Painful to scale: Repartitioning running workers is complicated.

18
What We Want ?
● Horizontal scalability
● Fault-tolerance
● No intermediate message brokers!
● Higher level abstraction than message
passing
● “Just works”
● Guaranteed data processing (not in this
case)
19
Apache Storm

✔Horizontal scalability
✔Fault-tolerance
✔No intermediate message brokers!
✔Higher level abstraction than message
passing
✔“Just works”
✔Guaranteed data processing

20
Anatomy of Storm

21
What is Storm ?
● CEP - Open source and distributed realtime computation system.
  – Makes it easy to reliably process unbounded streams of tuples
  – Doing for realtime processing what Hadoop did for batch processing.
● Fast - 1M Tuples/sec per node.
  – It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
22
Streams

Unbounded sequence of tuples

23
Spouts
Sources of Streams
24
Bolts
Processes input streams and produces new streams

25
Storm Topology
Network of spouts and bolts

26
Guarantee for Processing
● Storm guarantees the full processing of a tuple by tracking its state
● In case of failure, Storm can re-process it.
● Source tuples with full “acked” trees are removed from the system
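
Storm's documentation describes this tracking as a 64-bit "ack val" per spout tuple: the XOR of every tuple id created or acked in the tree. Each id is XORed in twice (once when emitted, once when acked), so the value returns to zero exactly when the whole tree has been acked. The sketch below imitates that bookkeeping in plain Java. The class and method names are illustrative, not Storm's API, and fixed small ids stand in for the random 64-bit ids Storm generates.

```java
// Illustrative only: mimics the XOR bookkeeping of Storm's acker, not its real classes.
class AckTreeSketch {
    private long ackVal = 0L;

    // A tuple was emitted into the tree: XOR its id in.
    void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    // A bolt acked the tuple: XOR the same id in again, cancelling it out.
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    // Zero means every emitted tuple has also been acked.
    boolean fullyProcessed() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        AckTreeSketch tree = new AckTreeSketch();
        long spoutTuple = 0x1L;
        long childA = 0x2L;
        long childB = 0x4L;

        tree.emitted(spoutTuple);
        tree.emitted(childA);
        tree.emitted(childB);
        tree.acked(spoutTuple);
        System.out.println(tree.fullyProcessed()); // false: both children are pending

        tree.acked(childA);
        tree.acked(childB);
        System.out.println(tree.fullyProcessed()); // true: the tree is fully acked
    }
}
```

If the value never reaches zero within the topology's message timeout, the spout tuple is treated as failed and can be replayed.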

27
Tasks (Bolt/Spout Instance)

Spouts and bolts execute as
many tasks across the cluster

28
Stream Grouping

When a tuple is emitted, which task
(instance) does it go to?

29
Stream Grouping
● Shuffle grouping: pick a random task
● Fields grouping: consistent hashing on a subset of tuple fields
● All grouping: send to all tasks
● Global grouping: pick task with lowest id
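
The four groupings can be read as four ways of choosing target task indexes. The following is a minimal sketch in plain Java, not Storm's implementation: the names are hypothetical, and fields grouping is modelled as a plain hash modulo the task count, which is enough to show the property that matters (equal field values always reach the same task).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Illustrative only: task selection per grouping, not Storm's real implementation.
class GroupingSketch {
    private static final Random RANDOM = new Random();

    // Shuffle grouping: any task, chosen at random.
    static int shuffle(int numTasks) {
        return RANDOM.nextInt(numTasks);
    }

    // Fields grouping: the same field values always map to the same task.
    static int fields(int numTasks, Object... fieldValues) {
        return Math.floorMod(Objects.hash(fieldValues), numTasks);
    }

    // All grouping: every task receives a copy of the tuple.
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }

    // Global grouping: the task with the lowest id.
    static int global(int numTasks) {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        System.out.println(fields(numTasks, "Barcelona") == fields(numTasks, "Barcelona")); // true
        System.out.println(all(numTasks));    // [0, 1, 2, 3]
        System.out.println(global(numTasks)); // 0
    }
}
```

In the heat-map topology this is why the builder bolt is fed by a fields grouping on the city: all check-ins for one city land on the same task and can be accumulated there.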

30
Tasks, Executors, Workers
● Worker Process = JVM
● Executor = Thread
● Task = Spout / Bolt instance
[Diagram: one worker process holding two executors; the first runs one task, the second runs two.]

31
Node
Worker Process
Executor

Spout A

Supervisor

Executor

Bolt C Bolt C

Executor
Bolt B Bolt B

Node
Worker Process

Supervisor

Executor

Executor

Spout A

Bolt C Bolt C

Executor
Bolt B Bolt B

32
Storm Architecture
[Diagram: the Heat-Map topology is uploaded/rebalanced through Nimbus; Nimbus and the Supervisors coordinate through the ZooKeeper nodes.]
Nimbus: Master Node (similar to Hadoop JobTracker), NOT critical for running topology

33
Storm Architecture
ZooKeeper: a few nodes, used for cluster coordination
34
Storm Architecture
Supervisors: run worker processes
35
Assembling Heatmap Topology

36
HeatMap Input/Output Tuples
● Input Tuples: Timestamp and Text Address:
  – (9:00:07 PM, “287 Hudson St New York NY 10013”)
● Output Tuple: Time interval, and a list of points for it:
  – (9:00:00 PM to 9:00:15 PM, List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987)))
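
Mapping a check-in timestamp to its interval amounts to flooring it to a multiple of the interval length. A minimal sketch, assuming 15-second intervals as in the sample output tuple; the class and method names are hypothetical.

```java
// Illustrative only: buckets millisecond timestamps into fixed-length intervals.
class IntervalSketch {
    static final long INTERVAL_MILLIS = 15_000L; // 15 s, as in the sample output tuple

    // Start of the interval that contains the given timestamp.
    static long intervalStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, INTERVAL_MILLIS) * INTERVAL_MILLIS;
    }

    public static void main(String[] args) {
        long nineOClock = 21L * 60 * 60 * 1000; // 9:00:00 PM as millis since midnight
        long checkin = nineOClock + 7_000L;     // 9:00:07 PM
        System.out.println(intervalStart(checkin) == nineOClock); // true
    }
}
```

Every check-in between 9:00:00 PM and 9:00:15 PM (exclusive) gets the same interval start, so it can serve as the key under which locations are collected.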

37
Heat Map Storm Topology
Checkins Spout
  → (9:01 PM @ 287 Hudson st)
Geocode Lookup Bolt
  → (9:01 PM, (40.736, -74.354))
Heatmap Builder Bolt
  → Upon Elapsed Interval: (9:00 PM – 9:15 PM, List((40.73, -74.34), (51.36, -83.33), (69.73, -34.24)))
Persistor Bolt
38
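Checkins Spout

import java.util.Date;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.IOUtils;
// Storm 1.0+ package names; earlier releases use backtype.storm.* instead.
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class CheckinsSpout extends BaseRichSpout {
    // We hold state. No need for thread safety.
    private List<String> sampleLocations;
    private int nextEmitIndex;
    private SpoutOutputCollector outputCollector;

    @Override
    public void open(Map map, TopologyContext topologyContext,
                     SpoutOutputCollector spoutOutputCollector) {
        this.outputCollector = spoutOutputCollector;
        this.nextEmitIndex = 0;
        try {
            sampleLocations = IOUtils.readLines(
                ClassLoader.getSystemResourceAsStream("sample-locations.txt"));
        } catch (Exception e) {
            // readLines declares IOException in older Commons IO releases.
            throw new RuntimeException(e);
        }
    }

    // Called iteratively by Storm.
    @Override
    public void nextTuple() {
        String address = sampleLocations.get(nextEmitIndex);
        String checkin = new Date().getTime() + "@ADDRESS:" + address;
        outputCollector.emit(new Values(checkin));
        nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size();
    }

    // Declare output fields.
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("str"));
    }
}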

39
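Geocode Lookup Bolt

import java.util.Map;
// Storm 1.0+ package names; earlier releases use backtype.storm.* instead.
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class GeocodeLookupBolt extends BaseBasicBolt {
    private LocatorService locatorService;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        locatorService = new GoogleLocatorService();
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
        String str = tuple.getStringByField("str");
        String[] parts = str.split("@");
        Long time = Long.valueOf(parts[0]);
        String address = parts[1];

        // Get Geocode, create DTO.
        LocationDTO locationDTO = locatorService.getLocation(address);
        if (locationDTO != null) {
            outputCollector.emit(new Values(time, locationDTO));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) {
        fieldsDeclarer.declare(new Fields("time", "location"));
    }
}
40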
Tick Tuple – Repeating Mantra

41
Two Streams to Heat-Map Builder
Checkin 1

Checkin 4 Checkin 5 Checkin 6

HeatMapBuilder Bolt

On tick tuple, we flush our Heat-Map
42
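Tick Tuple in Action

import java.util.List;
import java.util.Map;
// Storm 1.0+ package names; earlier releases use backtype.storm.* instead.
import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class HeatMapBuilderBolt extends BaseBasicBolt {
    // Hold latest intervals.
    private Map<String, List<LocationDTO>> heatmaps;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60); // Tick interval
        return conf;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
        if (isTickTuple(tuple)) {
            // Emit accumulated intervals
        } else {
            // Add check-in info to the current interval in the Map
        }
    }

    private boolean isTickTuple(Tuple tuple) {
        return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
            && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
    }

    // declareOutputFields(...) is omitted on the slide.
}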
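
The two branches of execute() follow an accumulate-then-flush pattern. The sketch below shows that pattern in plain Java without any Storm classes. It is an assumption about how the elided bodies might look, not the author's code: locations are simplified to strings, every collected interval is flushed on each tick, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the accumulate-then-flush pattern behind a tick-driven bolt.
class HeatMapAccumulatorSketch {
    private Map<Long, List<String>> heatmaps = new HashMap<>();

    // Regular tuple: file the location under its interval.
    void onCheckin(long intervalStart, String latLng) {
        heatmaps.computeIfAbsent(intervalStart, key -> new ArrayList<>()).add(latLng);
    }

    // Tick tuple: hand back everything collected so far and start over.
    Map<Long, List<String>> onTick() {
        Map<Long, List<String>> flushed = heatmaps;
        heatmaps = new HashMap<>();
        return flushed;
    }

    public static void main(String[] args) {
        HeatMapAccumulatorSketch bolt = new HeatMapAccumulatorSketch();
        bolt.onCheckin(0L, "40.719,-73.987");
        bolt.onCheckin(0L, "40.726,-74.001");
        System.out.println(bolt.onTick()); // {0=[40.719,-73.987, 40.726,-74.001]}
        System.out.println(bolt.onTick()); // {}
    }
}
```

Because a bolt task handles one tuple at a time, the map needs no locking: tick tuples and check-in tuples arrive through the same execute() call.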
43
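Persistor Bolt

import java.io.IOException;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
// Storm 1.0+ package names; earlier releases use backtype.storm.* instead.
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;

public class PersistorBolt extends BaseBasicBolt {
    private Jedis jedis;
    private ObjectMapper objectMapper;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        // Assumption: the slide omits this setup; host and port are placeholders.
        jedis = new Jedis("localhost", 6379);
        objectMapper = new ObjectMapper();
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
        Long timeInterval = tuple.getLongByField("time-interval");
        String city = tuple.getStringByField("city");
        String locationsList;
        try {
            locationsList = objectMapper.writeValueAsString(
                tuple.getValueByField("locationsList"));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        String dbKey = "checkins-" + timeInterval + "@" + city;

        // Persist in Redis for 24h.
        jedis.setex(dbKey, 3600 * 24, locationsList);
        // Publish in Redis channel for debugging.
        jedis.publish("location-key", dbKey);
    }
}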
44
Transforming the Tuples
Sample Checkins File (Check-in #1, Check-in #2, ...)
  → Checkins Spout: read text addresses
  → Shuffle Grouping → Geocode Lookup Bolt: get geo location from the Geo Location Service
  → Field Grouping(city) → Heatmap Builder Bolt: group by city
  → Shuffle Grouping → Persistor Bolt → Database

45
Heat Map Topology

// Storm 1.0+ package names; earlier releases use backtype.storm.* instead.
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class LocalTopologyRunner {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = buildTopology();
        StormSubmitter.submitTopology(
            "local-heatmap", new Config(), builder.createTopology());
    }

    private static TopologyBuilder buildTopology() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("checkins", new CheckinsSpout());
        builder.setBolt("geocode-lookup", new GeocodeLookupBolt())
            .shuffleGrouping("checkins");
        // Fields grouping needs "city" among the fields declared by geocode-lookup.
        builder.setBolt("heatmap-builder", new HeatMapBuilderBolt())
            .fieldsGrouping("geocode-lookup", new Fields("city"));
        builder.setBolt("persistor", new PersistorBolt())
            .shuffleGrouping("heatmap-builder");
        return builder;
    }
}
46
It's NOT Scaled

47
48
Scaling the Topology

public class LocalTopologyRunner {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = buildTopology();
        Config conf = new Config();
        // Set no. of workers (the slide shows 2, then 20).
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology(
            "local-heatmap", conf, builder.createTopology());
    }

    private static TopologyBuilder buildTopology() {
        TopologyBuilder builder = new TopologyBuilder();
        // The last argument is the parallelism hint.
        builder.setSpout("checkins", new CheckinsSpout(), 4);
        // Increase tasks for future.
        builder.setBolt("geocode-lookup", new GeocodeLookupBolt(), 8)
            .shuffleGrouping("checkins").setNumTasks(64);
        builder.setBolt("heatmap-builder", new HeatMapBuilderBolt(), 4)
            .fieldsGrouping("geocode-lookup", new Fields("city"));
        builder.setBolt("persistor", new PersistorBolt(), 2)
            .shuffleGrouping("heatmap-builder").setNumTasks(4);
        return builder;
    }
}
49
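
The parallelism hint sets the initial number of executors (threads) for a component, while setNumTasks fixes the number of tasks for the topology's lifetime. Executors can be changed later with a rebalance but can never exceed the task count, which is why the tasks are over-provisioned "for future". The sketch below only does the arithmetic, assuming tasks are spread as evenly as possible over executors; the names are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: how many tasks each executor runs for one component.
class ParallelismSketch {
    // Spread the tasks as evenly as possible over the executors.
    static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        int[] counts = new int[numExecutors];
        for (int task = 0; task < numTasks; task++) {
            counts[task % numExecutors]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // geocode-lookup: 64 tasks on 8 executors.
        System.out.println(Arrays.toString(tasksPerExecutor(64, 8))); // [8, 8, 8, 8, 8, 8, 8, 8]
        // persistor: 4 tasks on 2 executors.
        System.out.println(Arrays.toString(tasksPerExecutor(4, 2)));  // [2, 2]
        // After a rebalance to 16 executors, each would run 4 geocode-lookup tasks.
        System.out.println(tasksPerExecutor(64, 16)[0]);              // 4
    }
}
```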
Demo

50
Recap – Plan A
[Diagram: Sample Checkins File (Check-in #1, Check-in #2, ...) → Read Text Address → Storm Heat-Map Topology → GET Geo Location from the Geo Location Service → Persist Checkin Intervals → Database]
51
We have
something working

52
Add Kafka Messaging

53
Plan B – Kafka Spout & Bolt to HeatMap
[Diagram: check-ins are published to the Checkin Kafka Topic → Kafka Checkins Spout (reads text addresses) → Geocode Lookup Bolt (calls the Geo Location Service) → Heatmap Builder Bolt → Persistor Bolt (writes to the Database) → Kafka Locations Bolt → Locations Topic]

54
55
They all are Good
But not for all use-cases

56
Kafka
A little introduction

57
58
Pub-Sub Messaging System

59
60
61
62
63
Doesn't Fear the File System

64
65
66
67
Topics
● Logical collections of partitions (the physical files).
● A broker contains some of the partitions for a topic

68
A Partition Is Consumed by Exactly One Consumer per Group
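
The invariant can be shown with a small assignment routine: inside one consumer group, every partition has exactly one owner, and the partitions are redistributed when the group's membership changes. This is a sketch of the invariant only. It uses a simple round-robin, which is an assumption and not Kafka's actual assignment logic, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: round-robin assignment of partitions to the consumers of ONE group.
class PartitionAssignmentSketch {
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        // Each partition gets exactly one owner within the group.
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            owned.get(owner).add(partition);
        }
        return owned;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, Arrays.asList("consumer-1", "consumer-2")));
        // {consumer-1=[0, 2], consumer-2=[1, 3]}
        System.out.println(assign(4, Arrays.asList("consumer-1")));
        // {consumer-1=[0, 1, 2, 3]}
    }
}
```

A second group reading the same topic would get its own, independent assignment, so each group sees every message once.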

69
Distributed &
Fault-Tolerant
70
[Diagram sequence: Producer 1 and Producer 2 publish to Broker 1, Broker 2 and Broker 3, coordinated by ZooKeeper, with Consumer 1 and Consumer 2 reading from the brokers. Later frames add Broker 4, then remove it, then drop Consumer 2.]
83
Performance Benchmark
1 Broker
1 Producer
1 Consumer
84
85
86
Add Kafka to our Topology
public class LocalTopologyRunner {
    ...
    private static TopologyBuilder buildTopology() {
        ...
        // Kafka Spout
        builder.setSpout("checkins", new KafkaSpout(kafkaConfig));
        ...
        // Kafka Bolt
        builder.setBolt("kafkaProducer", new KafkaOutputBolt(
                "localhost:9092",
                "kafka.serializer.StringEncoder",
                "locations-topic"))
            .shuffleGrouping("persistor");
        return builder;
    }
}

87
Plan C – Add Reactor
[Diagram: text addresses → Checkin HTTP Reactor → publish check-ins to the Checkin Kafka Topic → Storm Heat-Map Topology consumes check-ins, gets geo locations from the Geo Location Service, persists check-in intervals to the Database, and publishes each interval key to the Locations Kafka Topic → Search Server indexes the interval locations]

88
Why Reactor ?

89
C10K
Problem
90
2008:
Thread Per Request/Response

91
Reactor Pattern Paradigm

Application
registers
handlers

...events
trigger
handlers

92
Reactor Pattern – Key Points
● Single thread / single event loop
● EVERYTHING runs on it
● You MUST NOT block the event loop
● Many Implementations (partial list):
  – Node.js (JavaScript), EventMachine (Ruby), Twisted (Python)... and Vert.X
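
A toy event loop makes these points concrete: handlers are registered up front, and a single thread pulls events off a queue and runs the matching handler to completion before touching the next event. This is a minimal sketch in plain Java with hypothetical names, not Vert.X or Node.js code. A real reactor waits on I/O readiness instead of stopping when the queue is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Illustrative only: a single-threaded reactor; not Vert.X or Node.js code.
class ReactorSketch {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    private final Queue<String[]> events = new ArrayDeque<>();

    // The application registers a handler per event type.
    void on(String eventType, Consumer<String> handler) {
        handlers.put(eventType, handler);
    }

    // Event sources enqueue events as {type, payload} pairs.
    void publish(String eventType, String payload) {
        events.add(new String[] {eventType, payload});
    }

    // The event loop: one thread drains the queue, one handler at a time.
    void run() {
        String[] event;
        while ((event = events.poll()) != null) {
            Consumer<String> handler = handlers.get(event[0]);
            if (handler != null) {
                handler.accept(event[1]); // a slow handler here stalls every later event
            }
        }
    }

    public static void main(String[] args) {
        ReactorSketch reactor = new ReactorSketch();
        List<String> seen = new ArrayList<>();
        reactor.on("checkin", seen::add);
        reactor.publish("checkin", "287 Hudson St New York NY 10013");
        reactor.publish("ignored", "no handler registered");
        reactor.run();
        System.out.println(seen); // [287 Hudson St New York NY 10013]
    }
}
```

Since every handler shares the one loop thread, a handler that blocks delays all pending events, which is the reason for the "MUST NOT block" rule.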

93
Reactor Pattern Problems
● Some work is naturally blocking:
  – Intensive data crunching
  – 3rd-party blocking APIs (e.g. JDBC)
● Pure reactor (e.g. Node.js) is not a good fit for this kind of work!

94
95
Vert.X Architecture

Event Bus

96
Vert.X Goodies
● TCP/SSL servers/clients
● HTTP/HTTPS servers/clients
● WebSockets support
● SockJS support
● Timers
● Buffers
● Streams and Pumps
● Routing
● Asynchronous File I/O
● Growing Module Repository
  – web server
  – Persistors (Mongo, JDBC, ...)
  – Work queue
  – Authentication Manager
  – Session manager
  – Socket.IO
97
Node.JS vs Vert.X

98
Node.js vs Vert.X
● Node.js
  – JavaScript Only
  – Inherently Single Threaded
  – Not much help with IPC
  – All code MUST be in Event loop
● Vert.X
  – Polyglot (JavaScript, Java, Ruby, Python...)
  – Leverages JVM multithreading
  – Nervous Event Bus
  – Blocking work can be done off the event loop
99
Node.js vs Vert.X Benchmark

http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/

AMD Phenom II X6 (6 core), 8GB
RAM, Ubuntu 11.04

100
HeatMap Reactor Architecture
[Diagram: in each Vert.X instance, the HTTP Server Verticle sends check-ins over the Event Bus to the Kafka module, which automatically sends each EventBus Msg → Kafka Topic; the Storm Topology reads from the Kafka Topic.]
101
Heat-Map Server – Only 6 LOC !
var vertx = require('vertx');
var container = require('vertx/container');
var console = require('vertx/console');
var config = container.config;

// Send checkin to Vert.X EventBus
vertx.createHttpServer().requestHandler(function(request) {
    request.dataHandler(function(buffer) {
        vertx.eventBus.send(config.address, {payload: buffer.toString()});
    });
    request.response.end();
}).listen(config.httpServerPort, config.httpServerHost);
console.log("HTTP CheckinsReactor started on port " + config.httpServerPort);

102
[Diagram: Checkin HTTP Firehose → Checkin HTTP Reactor → publish check-ins to the Checkin Kafka Topic → Storm Heat-Map Topology consumes check-ins, gets geo locations from the Geo Location Service, persists check-in intervals to the Database, and publishes each interval key to the Hotzones Kafka Topic → Search Server consumes the interval keys, gets the interval locations, and indexes them → Web App searches the index and receives pushes via WebSocket]

103
Demo

104
Summary

105
When You Go Out to a Salsa Club
● Good Music
● Crowded

106
More Conclusions..
● Storm – Great for real-time BigData processing. Complementary to Hadoop batch jobs.
● Kafka – Great messaging for logs/events data; serves as a good “source” for a Storm spout.
● Vert.X – Worth trying as an alternative reactor implementation.

107
Thanks

108

Tikal Fuse Day   Access Layer Implementation (C#) Based On Mongo DbTikal Fuse Day   Access Layer Implementation (C#) Based On Mongo Db
Tikal Fuse Day Access Layer Implementation (C#) Based On Mongo Db
 
Ship early ship often with Django
Ship early ship often with DjangoShip early ship often with Django
Ship early ship often with Django
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
AWS Case Study
AWS Case StudyAWS Case Study
AWS Case Study
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
 
Building Components In Flex3
Building Components In Flex3Building Components In Flex3
Building Components In Flex3
 
Osgi Democamp
Osgi DemocampOsgi Democamp
Osgi Democamp
 
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With DeadlinesJBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
 

Recently uploaded

Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Processing Big Data in Real Time with Apache Storm

  • 4. After a Long Travel Day
  • 5. Going to a Salsa Club
  • 6. Best Salsa Club NOW ● Good Music ● Crowded – Now!
  • 7. Same Problem in “gogobot”
  • 9. Let's Develop “Gogobot Checkins Heat-Map” (gogobot checkin Heat Map Service)
  • 10. Key Notes ● Collector Service: collects check-ins as text addresses – We need to use a GeoLocation Service ● Upon elapsed interval, the list of the last interval's locations is displayed as a Heat-Map in the GUI. ● Web-scale service: tens of thousands of check-ins per second from all over the world (imaginary, but let's do it for the exercise). ● Accuracy: sample data, NOT critical data. – Proportionately representative – Data volume is large enough to compensate for data loss.
  • 11. Heat-Map Context [diagram: Gogobot micro services send text-address check-ins to the Checkins Heat-Map Service inside the Gogobot system; the service calls Get-GeoCode(Address) on the Geo Location Service and outputs the last interval's locations as a heat map]
  • 12. Plan A: Simulate Checkins with a File [diagram: a file of check-ins (#1, #2, ... #9, ...) is read as text addresses by the check-in processing step, which GETs the geo location from the Geo Location Service and persists check-in intervals to the database]
  • 13. Tons of Addresses Arriving Every Second
  • 14. Architect - First Reaction...
  • 18. Problems? ● Tedious: you spend time configuring where to send messages, deploying workers, and deploying intermediate queues. ● Brittle: there's little fault tolerance. ● Painful to scale: partitioning the running workers is complicated.
  • 19. What Do We Want? ● Horizontal scalability ● Fault tolerance ● No intermediate message brokers! ● Higher-level abstraction than message passing ● “Just works” ● Guaranteed data processing (not in this case)
  • 20. Apache Storm ✔ Horizontal scalability ✔ Fault tolerance ✔ No intermediate message brokers! ✔ Higher-level abstraction than message passing ✔ “Just works” ✔ Guaranteed data processing
  • 22. What Is Storm? ● CEP: an open-source, distributed real-time computation system. – Makes it easy to reliably process unbounded streams of tuples. – Does for real-time processing what Hadoop did for batch processing. ● Fast: 1M tuples/sec per node. – It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
  • 27. Guarantee for Processing ● Storm guarantees the full processing of a tuple by tracking its state. ● In case of failure, Storm can re-process it. ● Source tuples with fully “acked” trees are removed from the system.
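The tracking described on this slide can be illustrated with the XOR trick that Storm's documentation describes for its acker: every tuple id is XOR-ed into a running value once when the tuple is anchored and once when it is acked, so the value returns to zero exactly when the whole tree is acked. The following is a simplified, plain-Java sketch of that idea only; the class and method names (`AckTrackerSketch`, `track`, `emitted`, `acked`) are hypothetical and are not Storm's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch of XOR-based tuple-tree tracking; hypothetical names, not Storm classes. */
final class AckTrackerSketch {
    // One running XOR value per spout (root) tuple.
    private final Map<Long, Long> pending = new HashMap<>();

    /** Register a new root tuple emitted by a spout. */
    void track(long rootId, long tupleId) {
        pending.put(rootId, tupleId);
    }

    /** A bolt anchored a new child tuple to the tree: XOR its id in. */
    void emitted(long rootId, long childId) {
        pending.computeIfPresent(rootId, (k, v) -> v ^ childId);
    }

    /** A tuple was acked: XOR its id in again. Returns true once the whole tree is acked. */
    boolean acked(long rootId, long tupleId) {
        Long current = pending.get(rootId);
        if (current == null) {
            return false;
        }
        long next = current ^ tupleId;
        if (next == 0L) {
            pending.remove(rootId); // fully acked: remove the source tuple from the system
            return true;
        }
        pending.put(rootId, next);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }
}
```

In this sketch a child must be reported via `emitted` before its parent is acked; otherwise the value could reach zero too early. A root that stays pending past a timeout would be the one to re-process.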
  • 28. Tasks (Bolt/Spout Instance): Spouts and bolts execute as many tasks across the cluster
  • 29. Stream Grouping: When a tuple is emitted, which task (instance) does it go to?
  • 30. Stream Grouping ● Shuffle grouping: pick a random task ● Fields grouping: consistent hashing on a subset of tuple fields ● All grouping: send to all tasks ● Global grouping: pick the task with the lowest id
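The four groupings can be pictured as four ways of choosing a target task index. The sketch below is plain Java with hypothetical names (`GroupingSketch` and its methods), not Storm's implementation; in particular, the fields grouping is approximated with a simple hash-modulo, which is only an assumption standing in for whatever hashing Storm applies internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative task selection for each grouping; hypothetical helper, not Storm code. */
final class GroupingSketch {
    /** Shuffle grouping: pick a random task. */
    static int shuffle(int numTasks) {
        return ThreadLocalRandom.current().nextInt(numTasks);
    }

    /** Fields grouping: hash the grouping field so equal values always reach the same task. */
    static int fields(Object fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    /** All grouping: every task receives a copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        return tasks;
    }

    /** Global grouping: the task with the lowest id receives everything. */
    static int global(List<Integer> taskIds) {
        return Collections.min(taskIds);
    }
}
```

The property that matters for the heat map is the one `fields` provides: all check-ins with the same grouping value land on the same task, so that task can accumulate them locally.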
  • 31. Tasks, Executors, Workers [diagram: a worker process is a JVM; it runs executors, each of which is a thread; each executor runs one or more tasks, and each task is a spout or bolt instance]
  • 32. [diagram: two nodes, each with a Supervisor and a worker process; each worker process runs three executors: one with a Spout A task, one with two Bolt C tasks, and one with two Bolt B tasks]
  • 33. Storm Architecture [diagram: the Heat-Map topology is uploaded/rebalanced through Nimbus, the master node (similar to the Hadoop JobTracker), which is NOT critical for a running topology; Nimbus and the Supervisors coordinate through the ZooKeeper nodes]
  • 34. Storm Architecture [diagram: same layout, highlighting ZooKeeper: a few nodes, used for cluster coordination]
  • 37. HeatMap Input/Output Tuples ● Input tuple: timestamp and text address: – (9:00:07 PM, “287 Hudson St New York NY 10013”) ● Output tuple: time interval and a list of points for it: – (9:00:00 PM to 9:00:15 PM, List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987)))
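Mapping an input timestamp to its output interval is a matter of rounding down to the interval boundary. A minimal sketch, assuming 15-second intervals as in the example tuples and timestamps in epoch milliseconds; `IntervalSketch` and `intervalStart` are hypothetical names, not part of the talk's code.

```java
/** Rounds a timestamp down to the start of its interval; hypothetical helper. */
final class IntervalSketch {
    // Assumption: 15-second intervals, matching the 9:00:00 to 9:00:15 example.
    static final long INTERVAL_MILLIS = 15_000L;

    /** Start of the interval (epoch millis) that contains the given timestamp. */
    static long intervalStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, INTERVAL_MILLIS) * INTERVAL_MILLIS;
    }
}
```

With this rule a check-in stamped 9:00:07 PM falls into the interval that starts at 9:00:00 PM, and one stamped exactly 9:00:15 PM opens the next interval.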
  • 38. Heat Map Storm Topology [diagram: Checkins Spout emits (9:01 PM @ 287 Hudson st) → Geocode Lookup Bolt emits (9:01 PM, (40.736, -74.354)) → Heatmap Builder Bolt emits, upon elapsed interval, (9:00 PM – 9:15 PM, List((40.73, -74.34), (51.36, -83.33), (69.73, -34.24))) → Persistor Bolt]
  • 39. Checkins Spout
      public class CheckinsSpout extends BaseRichSpout {
          // We hold state; no need for thread safety
          private List<String> sampleLocations;
          private int nextEmitIndex;
          private SpoutOutputCollector outputCollector;

          @Override
          public void open(Map map, TopologyContext topologyContext,
                           SpoutOutputCollector spoutOutputCollector) {
              this.outputCollector = spoutOutputCollector;
              this.nextEmitIndex = 0;
              sampleLocations = IOUtils.readLines(
                  ClassLoader.getSystemResourceAsStream("sanple-locations.txt"));
          }

          // Called iteratively by Storm
          @Override
          public void nextTuple() {
              String address = sampleLocations.get(nextEmitIndex);
              String checkin = new Date().getTime() + "@ADDRESS:" + address;
              outputCollector.emit(new Values(checkin));
              nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size();
          }

          // Declare output fields
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("str"));
          }
      }
  • 40. Geocode Lookup Bolt
      public class GeocodeLookupBolt extends BaseBasicBolt {
          private LocatorService locatorService;

          @Override
          public void prepare(Map stormConf, TopologyContext context) {
              locatorService = new GoogleLocatorService();
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              String str = tuple.getStringByField("str");
              String[] parts = str.split("@");
              Long time = Long.valueOf(parts[0]);
              String address = parts[1];
              // Get geocode, create DTO
              LocationDTO locationDTO = locatorService.getLocation(address);
              if (locationDTO != null)
                  outputCollector.emit(new Values(time, locationDTO));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) {
              fieldsDeclarer.declare(new Fields("time", "location"));
          }
      }
  • 41. Tick Tuple – Repeating Mantra
  • 42. Two Streams to Heat-Map Builder [diagram: check-ins (Checkin 1, 4, 5, 6) and tick tuples both flow into the HeatMapBuilder Bolt] On a tick tuple, we flush our Heat-Map
  • 43. Tick Tuple in Action
      public class HeatMapBuilderBolt extends BaseBasicBolt {
          // Hold latest intervals
          private Map<String, List<LocationDTO>> heatmaps;

          @Override
          public Map<String, Object> getComponentConfiguration() {
              Config conf = new Config();
              // Tick interval
              conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);
              return conf;
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              if (isTickTuple(tuple)) {
                  // Emit accumulated intervals
              } else {
                  // Add check-in info to the current interval in the Map
              }
          }

          private boolean isTickTuple(Tuple tuple) {
              return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                  && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
          }
      }
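The two branches of `execute` are left as comments on the slide. One way they might be filled in is an accumulate-then-flush pattern, sketched here in plain Java without any Storm types. Everything in it is an assumption for illustration: the class name `HeatMapAccumulatorSketch`, the use of an interval start time as the map key, and the representation of a point as a `double[]` of latitude and longitude.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Accumulate points per interval and flush them on a tick; hypothetical sketch. */
final class HeatMapAccumulatorSketch {
    private Map<Long, List<double[]>> heatmaps = new HashMap<>();

    /** Regular tuple: add the point to the bucket for its interval. */
    void onCheckin(long intervalStart, double lat, double lon) {
        heatmaps.computeIfAbsent(intervalStart, k -> new ArrayList<>())
                .add(new double[] {lat, lon});
    }

    /** Tick tuple: hand back everything accumulated so far and start fresh. */
    Map<Long, List<double[]>> onTick() {
        Map<Long, List<double[]>> flushed = heatmaps;
        heatmaps = new HashMap<>();
        return flushed;
    }
}
```

In a real bolt, each entry returned by `onTick` would be emitted as one output tuple. No locking is needed in this sketch for the same reason given on the spout slide: a task's `execute` calls are not run concurrently.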
  • 44. Persistor Bolt
      public class PersistorBolt extends BaseBasicBolt {
          private Jedis jedis;

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              Long timeInterval = tuple.getLongByField("time-interval");
              String city = tuple.getStringByField("city");
              String locationsList = objectMapper.writeValueAsString(
                  tuple.getValueByField("locationsList"));
              String dbKey = "checkins-" + timeInterval + "@" + city;
              // Persist in Redis for 24h
              jedis.setex(dbKey, 3600 * 24, locationsList);
              // Publish in Redis channel for debugging
              jedis.publish("location-key", dbKey);
          }
      }
  • 45. Transforming the Tuples [diagram: sample check-ins file (Check-in #1, #2, ... #9, ...) → Checkins Spout (reads text addresses) → shuffle grouping → Geocode Lookup Bolt (gets the geo location from the Geo Location Service) → fields grouping on city (group by city) → Heatmap Builder Bolt → shuffle grouping → Persistor Bolt → database]
  • 46. Heat Map Topology
      public class LocalTopologyRunner {
          public static void main(String[] args) {
              TopologyBuilder builder = buildTopolgy();
              StormSubmitter.submitTopology(
                  "local-heatmap", new Config(), builder.createTopology());
          }

          private static TopologyBuilder buildTopolgy() {
              TopologyBuilder builder = new TopologyBuilder();
              builder.setSpout("checkins", new CheckinsSpout());
              builder.setBolt("geocode-lookup", new GeocodeLookupBolt())
                     .shuffleGrouping("checkins");
              builder.setBolt("heatmap-builder", new HeatMapBuilderBolt())
                     .fieldsGrouping("geocode-lookup", new Fields("city"));
              builder.setBolt("persistor", new PersistorBolt())
                     .shuffleGrouping("heatmap-builder");
              return builder;
          }
      }
  • 49. Scaling the Topology
      public class LocalTopologyRunner {
          public static void main(String[] args) {
              TopologyBuilder builder = buildTopolgy();
              Config conf = new Config();
              // Set no. of workers (the slide also shows an overlaid conf.setNumWorkers(20))
              conf.setNumWorkers(2);
              StormSubmitter.submitTopology(
                  "local-heatmap", conf, builder.createTopology());
          }

          private static TopologyBuilder buildTopolgy() {
              TopologyBuilder builder = new TopologyBuilder();
              // Parallelism hint
              builder.setSpout("checkins", new CheckinsSpout(), 4);
              // Increase tasks for the future
              builder.setBolt("geocode-lookup", new GeocodeLookupBolt(), 8)
                     .shuffleGrouping("checkins").setNumTasks(64);
              builder.setBolt("heatmap-builder", new HeatMapBuilderBolt(), 4)
                     .fieldsGrouping("geocode-lookup", new Fields("city"));
              builder.setBolt("persistor", new PersistorBolt(), 2)
                     .shuffleGrouping("heatmap-builder").setNumTasks(4);
              return builder;
          }
      }
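The parallelism hint sets how many executors (threads) a component starts with, while `setNumTasks` fixes how many task instances exist in total, so each executor runs several tasks. The arithmetic can be sketched as below, assuming tasks are spread as evenly as possible across executors; `TaskLayoutSketch` and `tasksPerExecutor` are hypothetical names, and the round-robin placement is an illustration rather than Storm's actual scheduler.

```java
/** How many tasks each executor ends up with under an even spread; hypothetical helper. */
final class TaskLayoutSketch {
    static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        int[] counts = new int[numExecutors];
        for (int task = 0; task < numTasks; task++) {
            counts[task % numExecutors]++; // deal tasks out round-robin
        }
        return counts;
    }
}
```

For the geocode-lookup bolt above (8 executors, 64 tasks) this gives 8 tasks per executor, and for the persistor bolt (2 executors, 4 tasks) it gives 2 per executor. Declaring more tasks than executors is what leaves room to raise the executor count later, hence "increase tasks for the future".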
  • 51. Recap – Plan A [diagram: sample check-ins file (Check-in #1, #2, ... #9, ...) is read as text addresses by the Storm Heat-Map topology, which GETs the geo location from the Geo Location Service and persists check-in intervals to the database]
  • 54. Plan B: Kafka Spout & Bolt to HeatMap [diagram: check-ins are published to the Checkin Kafka topic; the Checkins Spout reads text addresses from Kafka → Geocode Lookup Bolt (Geo Location Service) → Heatmap Builder Bolt → Persistor Bolt (database) → Kafka Locations Bolt → Locations topic]
  • 56. They Are All Good, But Not for All Use Cases
  • 64. Doesn't Fear the File System
  • 68. Topics ● Logical collections of partitions (the physical files). ● A broker contains some of the partitions for a topic
  • 69. A Partition Is Consumed by Exactly One Consumer in Each Group
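The rule that each partition has exactly one consumer within a group can be illustrated with a simple round-robin assignment. This is a plain-Java sketch with hypothetical names (`PartitionAssignmentSketch`, `assign`); Kafka's own assignment strategies differ in detail, so treat it as an illustration of the invariant rather than of Kafka's algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Assigns every partition to exactly one consumer of a group; hypothetical sketch. */
final class PartitionAssignmentSketch {
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            // Each partition gets a single owner, chosen round-robin.
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

Calling `assign` again with a shorter consumer list models what happens when a consumer leaves the group: its partitions are redistributed among the remaining members, and every partition still has exactly one owner.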
  • 71 to 83. [animated diagram: Producers 1 and 2 write to Brokers 1 to 3, coordinated through ZooKeeper and read by Consumers 1 and 2; slides 72 to 77 add Broker 4, slides 78 to 80 return to three brokers, and slides 81 to 83 show only Consumer 1]
  • 84. Performance Benchmark: 1 Broker, 1 Producer, 1 Consumer
  • 87. Add Kafka to Our Topology
      public class LocalTopologyRunner {
          ...
          private static TopologyBuilder buildTopolgy() {
              ...
              // Kafka Spout
              builder.setSpout("checkins", new KafkaSpout(kafkaConfig));
              ...
              // Kafka Bolt
              builder.setBolt("kafkaProducer", new KafkaOutputBolt(
                      "localhost:9092", "kafka.serializer.StringEncoder", "locations-topic"))
                     .shuffleGrouping("persistor");
              return builder;
          }
      }
  • 88. Plan C – Add Reactor [diagram: text-address check-ins are published through the Checkin HTTP Reactor to the Checkin Kafka topic; the Storm Heat-Map topology consumes check-ins, GETs geo locations from the Geo Location Service, persists check-in intervals to the database, and publishes the interval key to the Locations Kafka topic; a search server indexes the interval locations]
  • 93. Reactor Pattern – Key Points ● Single thread / single event loop ● EVERYTHING runs on it ● You MUST NOT block the event loop ● Many implementations (partial list): – Node.js (JavaScript), EventMachine (Ruby), Twisted (Python)... and Vert.X
  • 94. Reactor Pattern Problems ● Some work is naturally blocking: – Intensive data crunching – 3rd-party blocking APIs (e.g. JDBC) ● A pure reactor (e.g. Node.js) is not a good fit for this kind of work!
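A common way around the blocking problem is to keep the single event loop for handlers and push blocking calls onto a separate worker pool, delivering the result back to the loop when it is ready. The sketch below shows that shape using only `java.util.concurrent`; it is not Vert.X code, and `ReactorSketch`, `handle`, and `slowLookup` are hypothetical names, with the sleep standing in for a blocking geocoding or JDBC call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Single event-loop thread plus a worker pool for blocking work; hypothetical sketch. */
final class ReactorSketch {
    // The single event-loop thread: every handler runs here, one at a time.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // A separate pool for work that would otherwise block the event loop.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    /** Runs the blocking lookup off the loop, then continues on the loop with its result. */
    CompletableFuture<String> handle(String address) {
        return CompletableFuture
                .supplyAsync(() -> slowLookup(address), workers)
                .thenApplyAsync(result -> "geocoded:" + result, eventLoop);
    }

    // Stand-in for a blocking third-party call.
    private static String slowLookup(String address) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return address.toUpperCase();
    }

    void shutdown() {
        eventLoop.shutdown();
        workers.shutdown();
    }
}
```

The event-loop thread is never parked in `slowLookup`, so it stays free to accept other requests while the lookup runs; only the short continuation executes on it.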
  • 97. Vert.X Goodies ● Growing Module Repository: – web server – Persistors (Mongo, JDBC, ...) – Work queue – Authentication Manager – Session manager – Socket.IO ● TCP/SSL servers/clients ● HTTP/HTTPS servers/clients ● WebSockets support ● SockJS support ● Timers ● Buffers ● Streams and Pumps ● Routing ● Asynchronous File I/O
  • 99. Node.js vs Vert.X ● Node.js – JavaScript only – Inherently single threaded – Not much help with IPC – All code MUST be in the event loop ● Vert.X – Polyglot (JavaScript, Java, Ruby, Python...) – Leverages JVM multithreading – Nervous Event Bus – Blocking work can be done off the event loop
  • 100. Node.js vs Vert.X Benchmark: http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/ (AMD Phenom II X6 (6 core), 8GB RAM, Ubuntu 11.04)
  • 101. HeatMap Reactor Architecture [diagram: Vert.X instances run an HTTP Server Verticle and a Kafka module connected by the Event Bus; the Kafka module automatically sends Event Bus messages to the Kafka topic, which feeds the Storm topology]
  • 102. Heat-Map Server – Only 6 LOC!
      var vertx = require('vertx');
      var container = require('vertx/container');
      var console = require('vertx/console');
      var config = container.config;

      vertx.createHttpServer().requestHandler(function(request) {
          request.dataHandler(function(buffer) {
              // Send checkin to Vert.X EventBus
              vertx.eventBus.send(config.address, {payload: buffer.toString()});
          });
          request.response.end();
      }).listen(config.httpServerPort, config.httpServerHost);

      console.log("HTTP CheckinsReactor started on port " + config.httpServerPort);
  • 103. [diagram: full architecture. Check-ins arrive via the Checkin HTTP Firehose and are published by the Checkin HTTP Reactor to the Checkin Kafka topic; the Storm Heat-Map topology consumes check-ins, GETs geo locations from the Geo Location Service, persists check-in intervals to the database, and publishes interval keys to the Hotzones Kafka topic; the search server consumes the interval keys, gets the interval locations, and indexes them; the web app searches the index and pushes results via WebSocket]
  • 106. When You Go Out to a Salsa Club ● Good Music ● Crowded
  • 107. More Conclusions ● Storm: great for real-time Big Data processing; complementary to Hadoop batch jobs. ● Kafka: great messaging for log/event data; serves as a good “source” for a Storm spout. ● Vert.X: worth a trial as an alternative reactor.