Storm Anatomy
Eiichiro Uchiumi
http://www.eiichiro.org/
About Me
Eiichiro Uchiumi
• A solutions architect at
working in emerging enterprise
technologies
- Cloud transformation
- Enterprise mobility
- Information optimization (big data)
https://github.com/eiichiro
@eiichirouchiumi
http://www.facebook.com/eiichiro.uchiumi
What is Stream Processing?
Stream processing is a technical paradigm for processing
a large-volume, unbounded sequence of tuples in real time
• Algorithmic trading
• Sensor data monitoring
• Continuous analytics
[Diagram: a Source sends a stream to a Stream Processor]
What is Storm?
Storm is a distributed realtime computation system
• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source
- Originally developed by Nathan Marz at BackType (acquired by Twitter)
- Written in Java and Clojure
Conceptual View
Topology: Graph of computation composed of spouts/bolts as the nodes and streams as the edges
Spout: Source of streams
Bolt: Consumer of streams; does some processing and possibly emits new tuples
Stream: Unbounded sequence of tuples
Tuple: List of name-value pairs
Physical View
[Diagram: Nimbus and several Supervisors coordinate through a ZooKeeper cluster. Each Supervisor (worker node) runs N workers, each worker runs N executors, and each executor runs N tasks]
Nimbus: Master daemon process responsible for
• distributing code
• assigning tasks
• monitoring failures
ZooKeeper: Stores cluster operational state
Supervisor: Worker daemon process listening for work assigned to its node
Worker: Java process that executes a subset of a topology
Executor: Java thread spawned by a worker; runs one or more tasks of the same component
Task: Component (spout/bolt) instance that performs the actual data processing
Spout
import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout {
    SpoutOutputCollector collector;
    Random random;

    // Called once when the spout task is initialized
    @Override
    public void open(Map conf, TopologyContext context,
            SpoutOutputCollector collector) {
        this.collector = collector;
        random = new Random();
    }

    // Called repeatedly; emits one randomly chosen sentence per call
    @Override
    public void nextTuple() {
        String[] sentences = new String[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature"
        };
        String sentence = sentences[random.nextInt(sentences.length)];
        collector.emit(new Values(sentence));
    }

    // Names the single field of every emitted tuple
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    @Override
    public void ack(Object msgId) {}

    @Override
    public void fail(Object msgId) {}
}
Bolt
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    OutputCollector collector;

    // Called once when the bolt task is initialized
    @Override
    public void prepare(Map stormConf, TopologyContext context,
            OutputCollector collector) {
        this.collector = collector;
    }

    // Called for every incoming tuple; emits one tuple per word
    @Override
    public void execute(Tuple input) {
        // Split the sentence on whitespace
        for (String s : input.getString(0).split("\\s+")) {
            collector.emit(new Values(s));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
Topology
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentence", new RandomSentenceSpout(), 2);
        builder.setBolt("split", new SplitSentenceBolt(), 4)
                .shuffleGrouping("sentence")
                .setNumTasks(8);
        builder.setBolt("count", new WordCountBolt(), 6)
                .fieldsGrouping("split", new Fields("word"));

        Config config = new Config();
        config.setNumWorkers(4);

        StormSubmitter.submitTopology("wordcount", config, builder.createTopology());

        // Local testing
//      LocalCluster cluster = new LocalCluster();
//      cluster.submitTopology("wordcount", config, builder.createTopology());
//      Thread.sleep(10000);
//      cluster.shutdown();
    }

}
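The topology above connects "split" to "count" with a fields grouping on "word", so tuples carrying the same word always reach the same WordCountBolt task. The following is a minimal plain-Java sketch of that routing idea, not Storm code: the class name `FieldsGroupingSketch`, the `chooseTask` helper and the hash-modulo rule are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldsGroupingSketch {
    /** Picks the target task for a word; equal words always map to the same task (assumed hash-modulo rule). */
    public static int chooseTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    /** Splits sentences into words and counts each word inside the task it is routed to. */
    public static List<Map<String, Integer>> count(List<String> sentences, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            for (String word : sentence.split("\\s+")) {
                // Every occurrence of a word lands in one task, so its count is never split
                tasks.get(chooseTask(word, numTasks)).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the cow jumped over the moon",
                "snow white and the seven dwarfs");
        // 6 tasks, matching the parallelism hint of the "count" bolt
        List<Map<String, Integer>> tasks = count(sentences, 6);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " -> " + tasks.get(i));
        }
    }
}
```

With a shuffle grouping instead, occurrences of "the" could be spread over several tasks and each task would hold only a partial count, which is why the counting bolt is wired with a fields grouping.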
Starting Topology
• StormSubmitter (> bin/storm jar)
- Uploads topology JAR to Nimbus’ inbox with dependencies
- Submits topology configuration as JSON and structure as Thrift
• Nimbus (Thrift server)
- Copies topology JAR, configuration and structure into local file system
- Sets up static information for topology
- Makes assignment
- Starts topology
• Supervisor
- Downloads topology JAR, configuration and structure
- Writes assignment on its node into local file system
- Starts worker based on the assignment
• Worker
- Refreshes connections
- Makes executors
• Executor
- Makes tasks
- Starts processing
Extremely Significant Performance
Parallelism
• RandomSentenceSpout
- Parallelism hint = 2
- Number of tasks = Not specified = Same as parallelism hint = 2
• SplitSentenceBolt
- Parallelism hint = 4
- Number of tasks = 8
• WordCountBolt
- Parallelism hint = 6
- Number of tasks = Not specified = Same as parallelism hint = 6
Number of topology workers = 4
Number of worker slots / node = 4
Number of worker nodes = 2
Number of executor threads = 2 + 4 + 6 = 12
Number of component instances = 2 + 8 + 6 = 16
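The arithmetic above can be reproduced with a few lines of plain Java. This is only a sketch of the counting rule stated on the slide (executors follow the parallelism hint; tasks follow the explicit task count, or the hint when none is given). The class and method names are hypothetical, and the "executors per worker" line assumes the executors are spread evenly over the 4 workers.

```java
public class ParallelismSketch {
    /** Executor threads: one per unit of parallelism hint. */
    public static int totalExecutors(int[] parallelismHints) {
        int sum = 0;
        for (int hint : parallelismHints) {
            sum += hint;
        }
        return sum;
    }

    /** Task instances: the explicit task count, or the hint when none is given (0 = not specified). */
    public static int totalTasks(int[] parallelismHints, int[] numTasks) {
        int sum = 0;
        for (int i = 0; i < parallelismHints.length; i++) {
            sum += numTasks[i] > 0 ? numTasks[i] : parallelismHints[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] hints = {2, 4, 6}; // sentence spout, split bolt, count bolt
        int[] tasks = {0, 8, 0}; // only the split bolt calls setNumTasks(8)
        int workers = 4;

        System.out.println("executors = " + totalExecutors(hints));                 // 12
        System.out.println("tasks = " + totalTasks(hints, tasks));                  // 16
        // Assumption: executors are distributed evenly across the workers
        System.out.println("executors per worker = " + totalExecutors(hints) / workers); // 3
        System.out.println("tasks per split executor = " + tasks[1] / hints[1]);    // 2
    }
}
```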
[Diagram: 2 worker nodes hosting the 4 worker processes; the executor threads inside them hold 2 RandomSentenceSpout (RS), 8 SplitSentenceBolt (SS) and 6 WordCountBolt (WC) tasks]
A running topology can be spread out (rebalanced) manually, without downtime,
when a worker node is added
Message Passing
[Diagram: a worker process containing executors, a receive thread (from other workers), a transfer thread (to other workers), and the receiver, transfer and internal transfer queues that connect them]
• Interprocess communication is mediated by ZeroMQ
- Transfer to other workers is done with Kryo serialization
• Local communication is mediated by LMAX Disruptor
- Transfer inside a worker is done with no serialization
LMAX Disruptor
A large concurrent "magic" ring buffer that can be used like a blocking queue between producer and consumer
• Consumer can easily keep up with producer by batching
• CPU cache friendly
- The ring is implemented as an array, so the entries can be preloaded
• GC safe
- The entries are preallocated up front and live forever
6 million orders per second can be processed on a single thread at LMAX
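The three properties listed above (array-backed ring, preallocated entries, batch consumption) can be illustrated with a toy ring buffer. This is a single-threaded sketch for intuition only: it is not the LMAX Disruptor API, it is not thread-safe, and every name in it (`RingBufferSketch`, `tryPublish`, `drainBatch`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RingBufferSketch {
    /** Mutable slot, allocated once and reused for every event (no per-event garbage). */
    static final class Entry {
        long value;
    }

    private final Entry[] ring;   // the ring is a plain array
    private final int mask;       // size is a power of two, so index = sequence & mask
    private long published = -1;  // sequence of the last published entry
    private long consumed = -1;   // sequence of the last consumed entry

    public RingBufferSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        ring = new Entry[sizePowerOfTwo];
        for (int i = 0; i < ring.length; i++) {
            ring[i] = new Entry(); // preallocate every entry up front
        }
        mask = sizePowerOfTwo - 1;
    }

    /** Overwrites the next slot in place; returns false when the ring is full. */
    public boolean tryPublish(long value) {
        if (published - consumed == ring.length) {
            return false;
        }
        long next = published + 1;
        ring[(int) (next & mask)].value = value;
        published = next;
        return true;
    }

    /** Consumes everything published since the last call as one batch. */
    public List<Long> drainBatch() {
        List<Long> batch = new ArrayList<>();
        while (consumed < published) {
            consumed++;
            batch.add(ring[(int) (consumed & mask)].value);
        }
        return batch;
    }

    public static void main(String[] args) {
        RingBufferSketch buffer = new RingBufferSketch(4);
        for (long v = 1; v <= 5; v++) {
            System.out.println("publish " + v + " -> " + buffer.tryPublish(v)); // the 5th is rejected
        }
        System.out.println("batch = " + buffer.drainBatch()); // [1, 2, 3, 4]
        System.out.println("publish 5 -> " + buffer.tryPublish(5)); // true, slot 0 is reused
        System.out.println("batch = " + buffer.drainBatch()); // [5]
    }
}
```

A consumer that falls behind catches up in a single `drainBatch` call instead of paying a hand-off cost per entry, which is the batching effect the slide refers to.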
Fault-tolerance
Cluster works normally
• Nimbus
- Monitoring cluster state (ZooKeeper)
• Supervisor
- Synchronizing assignment (ZooKeeper)
- Sending heartbeat (ZooKeeper)
- Reading worker heartbeat from local file system
• Worker
- Sending executor heartbeat (ZooKeeper)
Nimbus goes down
Processing will still continue, but topology lifecycle operations
and the reassignment facility are lost
Worker node goes down
Nimbus will reassign the tasks to other machines
and the processing will continue
Supervisor goes down
Processing will still continue, but assignment is
never synchronized
Worker process goes down
Supervisor will restart the worker process
and the processing will continue
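Failure detection in these scenarios rests on heartbeats: a component is treated as down once its last heartbeat is too old. A minimal plain-Java sketch of that check follows. It is not Storm code; the class name, the worker ids and the 30-second timeout are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatMonitorSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastBeat = new LinkedHashMap<>();

    public HeartbeatMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat written by a worker at the given time. */
    public void beat(String workerId, long nowMillis) {
        lastBeat.put(workerId, nowMillis);
    }

    /** Returns the workers whose last heartbeat is older than the timeout. */
    public List<String> findDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitorSketch monitor = new HeartbeatMonitorSketch(30_000); // hypothetical timeout
        monitor.beat("worker-6700", 0);
        monitor.beat("worker-6701", 0);
        monitor.beat("worker-6700", 25_000); // worker-6701 stays silent

        for (String id : monitor.findDead(40_000)) {
            System.out.println("restarting " + id); // prints "restarting worker-6701"
        }
    }
}
```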
Reliability API
public class RandomSentenceSpout extends BaseRichSpout {
    public void nextTuple() {
        ...;
        UUID msgId = getMsgId();
        // Emitting tuple with message id
        collector.emit(new Values(sentence), msgId);
    }

    public void ack(Object msgId) {
        // Do something with acked message id.
    }

    public void fail(Object msgId) {
        // Do something with failed message id.
    }
}

public class SplitSentenceBolt extends BaseRichBolt {
    public void execute(Tuple input) {
        // Split the sentence on whitespace
        for (String s : input.getString(0).split("\\s+")) {
            // Anchoring incoming tuple to outgoing tuples
            collector.emit(input, new Values(s));
        }

        // Sending ack
        collector.ack(input);
    }
}

Tuple tree
"the cow jumped over the moon"
- "the"
- "cow"
- "jumped"
- "over"
- "the"
- "moon"
Acking Framework
[Diagram: RandomSentenceSpout sends "Acker init" to the acker implicit bolt; SplitSentenceBolt and WordCountBolt send "Acker ack" / "Acker fail" to it; the acker implicit bolt sends "Acker ack" / "Acker fail" back to the spout. Tuples A, B and C form the tuple tree]
Acker implicit bolt keeps, for each spout tuple
• Spout tuple id
• Spout task id
• 64 bit number called “Ack val”
Ack val is updated as follows
• Emitted tuple A, XOR tuple A id with ack val
• Emitted tuple B, XOR tuple B id with ack val
• Emitted tuple C, XOR tuple C id with ack val
• Acked tuple A, XOR tuple A id with ack val
• Acked tuple B, XOR tuple B id with ack val
• Acked tuple C, XOR tuple C id with ack val
When ack val has become 0, Acker implicit bolt knows
the tuple tree has been completed
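The XOR bookkeeping can be checked with a small plain-Java sketch. It models only the ack val described above, not Storm's acker: the class name `AckValSketch` and the fixed tuple ids are hypothetical (real ids would be random 64-bit numbers, which makes a premature zero very unlikely).

```java
public class AckValSketch {
    private long ackVal = 0L;

    /** A tuple was emitted into the tree: XOR its id into the ack val. */
    public void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    /** A tuple was acked: XOR the same id again, which cancels it out. */
    public void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    /** The tree is complete once every emitted id has been cancelled by an ack. */
    public boolean treeCompleted() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        // Hypothetical tuple ids, fixed here to keep the example reproducible
        long a = 0x1F3A5C7E9B2D4608L;
        long b = 0x2468ACE013579BDFL;
        long c = 0x0F0F0F0F0F0F0F0FL;

        AckValSketch tree = new AckValSketch();
        tree.emitted(a);
        tree.emitted(b);
        tree.emitted(c);

        // Acks may arrive in any order; XOR is commutative
        tree.acked(b);
        tree.acked(a);
        System.out.println("completed after two acks: " + tree.treeCompleted()); // false
        tree.acked(c);
        System.out.println("completed after all acks: " + tree.treeCompleted()); // true
    }
}
```

Because each id is XORed in exactly twice, the acker needs only one 64-bit value per spout tuple regardless of how large the tuple tree grows.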
Cluster Setup
• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
- ZeroMQ 2.1.7 and JZMQ
- Java 6 and Python 2.6.6
- unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in mandatory configuration in storm.yaml
• Launch daemons under supervision using the “storm” script
Cluster Summary
Topology Summary
Component Summary
Basic Resources
• Storm is available at
- http://storm-project.net/
- https://github.com/nathanmarz/storm
under Eclipse Public License 1.0
• Get help on
- http://groups.google.com/group/storm-user
- #storm-user freenode room
• Follow
- @stormprocessor and @nathanmarz
for updates on the project
Many Contributions
• Community repository for modules to use Storm at
- https://github.com/nathanmarz/storm-contrib
including integration with Redis, Kafka, MongoDB,
HBase, JMS, Amazon SQS and so on
• Good articles for understanding Storm internals
- http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
- http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
• Good slides for understanding real-life examples
- http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-you-can-get-in-30-minutes
- http://www.slideshare.net/KrishnaGade2/storm-at-twitter
Features on Deck
• Current release: 0.8.2 as of 6/28/2013
• Work in progress (older): 0.8.3-wip3
- Some bug fixes
• Work in progress (newest): 0.9.0-wip19
- SLF4J and Logback
- Pluggable tuple serialization and blowfish encryption
- Pluggable interprocess messaging and Netty implementation
- Some bug fixes
- And more
Advanced Topics
• Distributed RPC
• Transactional topologies
• Trident
• Using non-JVM languages with Storm
• Unit testing
• Patterns
...These are not described in this presentation, so check
them out yourself, or at my upcoming session if I get
the chance :)
Thank You

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Storm Anatomy

  • 2. About Me Eiichiro Uchiumi • A solutions architect working in emerging enterprise technologies - Cloud transformation - Enterprise mobility - Information optimization (big data) • https://github.com/eiichiro • @eiichirouchiumi • http://www.facebook.com/eiichiro.uchiumi
  • 3. What is Stream Processing? Stream processing is a technical paradigm for processing large-volume, unbounded sequences of tuples in real time. Typical uses: • Algorithmic trading • Sensor data monitoring • Continuous analytics (Diagram: a stream flows from a source to a stream processor.)
  • 4. What is Storm? Storm is a distributed realtime computation system: • Fast & scalable • Fault-tolerant • Guarantees messages will be processed • Easy to set up & operate • Free & open source. Background: - Originally developed by Nathan Marz at BackType (acquired by Twitter) - Written in Java and Clojure
  • 5. Conceptual View • Topology: graph of computation with spouts and bolts as the nodes and streams as the edges • Spout: source of streams • Bolt: consumer of streams that does some processing and possibly emits new tuples • Stream: unbounded sequence of tuples • Tuple: list of name-value pairs
  • 6. Physical View • Nimbus: master daemon process responsible for distributing code, assigning tasks and monitoring failures • ZooKeeper: stores the cluster's operational state • Supervisor: worker-node daemon process listening for work assigned to its node • Worker: Java process (worker process) that executes a subset of a topology • Executor: Java thread spawned by a worker that runs one or more tasks of the same component • Task: component (spout/bolt) instance that performs the actual data processing (Diagram: Nimbus, a ZooKeeper ensemble and several supervisors; a supervisor has N workers, a worker has N executors, and an executor has N tasks.)
  • 7. Spout
      import java.util.Map;
      import java.util.Random;

      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;
      import backtype.storm.utils.Utils;

      public class RandomSentenceSpout extends BaseRichSpout {
          SpoutOutputCollector collector;
          Random random;

          // Called once when the spout task is initialized.
          @Override
          public void open(Map conf, TopologyContext context,
                  SpoutOutputCollector collector) {
              this.collector = collector;
              random = new Random();
          }

          // Called repeatedly; emits one randomly chosen sentence per call.
          @Override
          public void nextTuple() {
              String[] sentences = new String[] {
                      "the cow jumped over the moon",
                      "an apple a day keeps the doctor away",
                      "four score and seven years ago",
                      "snow white and the seven dwarfs",
                      "i am at two with nature"
              };
              String sentence = sentences[random.nextInt(sentences.length)];
              collector.emit(new Values(sentence));
          }
  • 8. Spout (continued)
          // Declares the single output field of the emitted tuples.
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("sentence"));
          }

          @Override
          public void ack(Object msgId) {}

          @Override
          public void fail(Object msgId) {}
      }
  • 9. Bolt
      import java.util.Map;

      import backtype.storm.task.OutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      public class SplitSentenceBolt extends BaseRichBolt {
          OutputCollector collector;

          // Called once when the bolt task is initialized.
          @Override
          public void prepare(Map stormConf, TopologyContext context,
                  OutputCollector collector) {
              this.collector = collector;
          }

          // Splits the incoming sentence on whitespace and emits one tuple per word.
          @Override
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s")) {
                  collector.emit(new Values(s));
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word"));
          }
      }
  • 10. Topology
      import backtype.storm.Config;
      import backtype.storm.LocalCluster;
      import backtype.storm.StormSubmitter;
      import backtype.storm.topology.TopologyBuilder;
      import backtype.storm.tuple.Fields;

      public class WordCountTopology {
          public static void main(String[] args) throws Exception {
              TopologyBuilder builder = new TopologyBuilder();
              // Spout with a parallelism hint of 2.
              builder.setSpout("sentence", new RandomSentenceSpout(), 2);
              // Parallelism hint of 4 and 8 tasks, fed by shuffle grouping.
              builder.setBolt("split", new SplitSentenceBolt(), 4)
                      .shuffleGrouping("sentence")
                      .setNumTasks(8);
              // Parallelism hint of 6, grouped on the "word" field.
              // (WordCountBolt is not shown in these slides.)
              builder.setBolt("count", new WordCountBolt(), 6)
                      .fieldsGrouping("split", new Fields("word"));

              Config config = new Config();
              config.setNumWorkers(4);

              StormSubmitter.submitTopology("wordcount", config, builder.createTopology());

              // Local testing
              // LocalCluster cluster = new LocalCluster();
              // cluster.submitTopology("wordcount", config, builder.createTopology());
              // Thread.sleep(10000);
              // cluster.shutdown();
          }
      }
  • 11. Starting Topology (StormSubmitter, Nimbus Thrift server, ZooKeeper) - Run > bin/storm jar - StormSubmitter uploads the topology JAR with dependencies to Nimbus’ inbox - StormSubmitter submits the topology configuration as JSON and the structure as Thrift - Nimbus copies the topology JAR, configuration and structure into its local file system - Nimbus sets up static information for the topology - Nimbus makes the assignment - Nimbus starts the topology
  • 12. Starting Topology, continued (Nimbus Thrift server, ZooKeeper, Supervisor, Worker, Executor, Task) - Supervisor downloads the topology JAR, configuration and structure - Supervisor writes the assignment on its node into the local file system - Supervisor starts a worker based on the assignment - Worker refreshes connections - Worker makes executors - Executors make tasks - Processing starts
  • 15. Parallelism • RandomSentenceSpout: parallelism hint = 2; number of tasks not specified, so same as parallelism hint = 2 • SplitSentenceBolt: parallelism hint = 4; number of tasks = 8 • WordCountBolt: parallelism hint = 6; number of tasks not specified = 6 • Number of topology workers = 4 • Number of worker slots per node = 4 • Number of worker nodes = 2 • Number of executor threads = 2 + 4 + 6 = 12 • Number of component instances = 2 + 8 + 6 = 16 • The topology can be spread out manually without downtime when a worker node is added (Diagram: executor threads of the RandomSentenceSpout (RS), SplitSentenceBolt (SS) and WordCountBolt (WC) distributed over the worker processes of two worker nodes.)
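The arithmetic on the parallelism slide can be reproduced with a small plain-Java sketch. It does not use the Storm API; the class `ParallelismSketch` and its helper names are hypothetical and exist only to illustrate the rule stated on the slide: one executor per unit of parallelism hint, and a task count that defaults to the parallelism hint when not specified.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models the counting rules from the slide, not Storm itself.
final class ParallelismSketch {

    /** One topology component with its parallelism hint and optional task count. */
    static final class Component {
        final int parallelismHint;
        final Integer numTasks; // null means "not specified"

        Component(int parallelismHint, Integer numTasks) {
            this.parallelismHint = parallelismHint;
            this.numTasks = numTasks;
        }

        /** Executors (threads) equal the parallelism hint. */
        int executors() {
            return parallelismHint;
        }

        /** Tasks default to the parallelism hint when not specified. */
        int tasks() {
            return numTasks != null ? numTasks : parallelismHint;
        }
    }

    /** The word count topology with the numbers from the slides. */
    static Map<String, Component> wordCount() {
        Map<String, Component> topology = new LinkedHashMap<>();
        topology.put("sentence", new Component(2, null));
        topology.put("split", new Component(4, 8));
        topology.put("count", new Component(6, null));
        return topology;
    }

    static int totalExecutors(Map<String, Component> topology) {
        int sum = 0;
        for (Component c : topology.values()) {
            sum += c.executors();
        }
        return sum;
    }

    static int totalTasks(Map<String, Component> topology) {
        int sum = 0;
        for (Component c : topology.values()) {
            sum += c.tasks();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Component> topology = wordCount();
        System.out.println("executor threads    = " + totalExecutors(topology));
        System.out.println("component instances = " + totalTasks(topology));
    }
}
```

With the slide's numbers this gives 12 executor threads and 16 component instances, which are then spread over the 4 topology workers.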
  • 16. Message Passing • Interprocess communication is mediated by ZeroMQ; outside transfer is done with Kryo serialization • Local communication is mediated by LMAX Disruptor; inside transfer is done with no serialization (Diagram of a worker process: a receive thread taking tuples from other workers, executors, a transfer thread sending tuples to other workers, and the receiver queue, internal transfer queue and transfer queue between them.)
  • 17. LMAX Disruptor A large concurrent "magic" ring buffer that can be used like a blocking queue between producer and consumer • Consumer can easily keep up with producer by batching • CPU cache friendly - The ring is implemented as an array, so the entries can be preloaded • GC safe - The entries are preallocated up front and live forever • 6 million orders per second can be processed on a single thread at LMAX
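The three properties listed for the Disruptor (array-backed ring, preallocated entries, batching consumer) can be sketched in a few lines of plain Java. This is not the LMAX Disruptor API; `RingBufferSketch`, `tryPublish` and `drain` are hypothetical names, and the sketch assumes exactly one producer thread and one consumer thread.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Illustrative single-producer/single-consumer ring buffer, not the real Disruptor.
final class RingBufferSketch {

    /** Mutable slot: allocated once up front and reused forever. */
    static final class Entry {
        long value;
    }

    private final Entry[] ring;   // array-backed ring
    private final int mask;       // size - 1, valid because size is a power of two
    private final AtomicLong published = new AtomicLong(-1); // last published sequence
    private final AtomicLong consumed = new AtomicLong(-1);  // last consumed sequence

    RingBufferSketch(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        ring = new Entry[sizePowerOfTwo];
        for (int i = 0; i < ring.length; i++) {
            ring[i] = new Entry(); // preallocate every entry
        }
        mask = sizePowerOfTwo - 1;
    }

    /** Producer side: returns false when the ring is full. */
    boolean tryPublish(long value) {
        long next = published.get() + 1;
        if (next - consumed.get() > ring.length) {
            return false; // would overwrite an entry not yet consumed
        }
        ring[(int) (next & mask)].value = value; // reuse the slot, no allocation
        published.set(next);                     // make the entry visible
        return true;
    }

    /** Consumer side: handles everything published so far as one batch. */
    int drain(LongConsumer handler) {
        long available = published.get();
        int count = 0;
        for (long seq = consumed.get() + 1; seq <= available; seq++) {
            handler.accept(ring[(int) (seq & mask)].value);
            count++;
        }
        consumed.set(available); // release the whole batch at once
        return count;
    }

    public static void main(String[] args) {
        RingBufferSketch buffer = new RingBufferSketch(8);
        for (long v = 1; v <= 5; v++) {
            buffer.tryPublish(v);
        }
        int batch = buffer.drain(v -> System.out.println("consumed " + v));
        System.out.println("batch size = " + batch);
    }
}
```

Draining up to the latest published sequence in one pass is what lets a slow consumer catch up: the further it falls behind, the larger the batch it handles per call.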
  • 19. Fault-tolerance: cluster works normally (Diagram of Nimbus, Supervisor, Worker and ZooKeeper with these interactions: monitoring cluster state; synchronizing assignment; sending heartbeat; reading worker heartbeat from local file system; sending executor heartbeat.)
  • 20. Fault-tolerance: Nimbus goes down. Processing will still continue, but topology lifecycle operations and the reassignment facility are lost
  • 21. Fault-tolerance: worker node goes down. Nimbus will reassign the tasks to other machines and the processing will continue
  • 22. Fault-tolerance: Supervisor goes down. Processing will still continue, but the assignment is never synchronized
  • 23. Fault-tolerance: worker process goes down. Supervisor will restart the worker process and the processing will continue
  • 25. Reliability API
      public class RandomSentenceSpout extends BaseRichSpout {
          public void nextTuple() {
              // ...
              // Emitting tuple with message id (getMsgId() is not shown on the slide)
              UUID msgId = getMsgId();
              collector.emit(new Values(sentence), msgId);
          }

          public void ack(Object msgId) {
              // Do something with acked message id.
          }

          public void fail(Object msgId) {
              // Do something with failed message id.
          }
      }

      public class SplitSentenceBolt extends BaseRichBolt {
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s")) {
                  // Anchoring incoming tuple to outgoing tuples
                  collector.emit(input, new Values(s));
              }
              // Sending ack
              collector.ack(input);
          }
      }
    Tuple tree: "the cow jumped over the moon" is the root, with "the", "cow", "jumped", "over", "the" and "moon" as its children
  • 26. Acking Framework • For each spout tuple, the implicit acker bolt keeps the spout tuple id, the spout task id and a 64 bit number called "ack val" • Emitted tuple A, XOR tuple A id with ack val • Emitted tuple B, XOR tuple B id with ack val • Emitted tuple C, XOR tuple C id with ack val • Acked tuple A, XOR tuple A id with ack val • Acked tuple B, XOR tuple B id with ack val • Acked tuple C, XOR tuple C id with ack val • Once ack val has become 0, the implicit acker bolt knows the tuple tree has been completed (Diagram: RandomSentenceSpout, SplitSentenceBolt and WordCountBolt pass tuples A, B and C along the topology and exchange acker init, acker ack and acker fail messages with the implicit acker bolt.)
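The XOR bookkeeping on the acking slide can be sketched in plain Java. This is a minimal model rather than Storm's acker: `AckerSketch`, `init` and `ack` are hypothetical names, and the tuple ids are fixed constants for readability where Storm would use random 64 bit ids. As a simplification chosen for this sketch, each ack carries the acked tuple's id together with the ids of any new tuples anchored to it, so one update both removes the parent and adds its children. The net effect is the same six XORs listed on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the "ack val" idea; not Storm's acker implementation.
final class AckerSketch {

    // spout tuple id -> current ack val
    private final Map<Long, Long> ackVals = new HashMap<>();

    /** The spout emitted a tuple: start tracking its tree with that tuple's id. */
    void init(long spoutTupleId, long tupleId) {
        ackVals.put(spoutTupleId, tupleId);
    }

    /**
     * A bolt acked a tuple after emitting zero or more tuples anchored to it.
     * Returns true when ack val reaches 0, i.e. the tuple tree is complete.
     */
    boolean ack(long spoutTupleId, long ackedTupleId, long... anchoredTupleIds) {
        long update = ackedTupleId;
        for (long id : anchoredTupleIds) {
            update ^= id; // each emitted tuple id is XORed in exactly once
        }
        long ackVal = ackVals.getOrDefault(spoutTupleId, 0L) ^ update;
        if (ackVal == 0) {
            ackVals.remove(spoutTupleId); // every id was XORed in twice
            return true;
        }
        ackVals.put(spoutTupleId, ackVal);
        return false;
    }

    public static void main(String[] args) {
        long spoutTupleId = 1L;
        long a = 0x1111L, b = 0x2222L, c = 0x4444L; // fixed ids for the example

        AckerSketch acker = new AckerSketch();
        acker.init(spoutTupleId, a);                        // spout emits tuple A
        System.out.println(acker.ack(spoutTupleId, a, b, c)); // A acked, B and C emitted
        System.out.println(acker.ack(spoutTupleId, b));       // B acked
        System.out.println(acker.ack(spoutTupleId, c));       // C acked, tree complete
    }
}
```

Because every tuple id is XORed in once on emit and once on ack, the order in which acks arrive does not matter, and the acker needs only a fixed amount of state per spout tuple regardless of the size of the tree.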
  • 28. Cluster Setup • Set up a ZooKeeper cluster • Install dependencies on Nimbus and worker machines - ZeroMQ 2.1.7 and JZMQ - Java 6 and Python 2.6.6 - unzip • Download and extract a Storm release to Nimbus and worker machines • Fill in the mandatory configuration in storm.yaml • Launch the daemons under supervision using the “storm” script
  • 33. Basic Resources • Storm is available at - http://storm-project.net/ - https://github.com/nathanmarz/storm under Eclipse Public License 1.0 • Get help on - http://groups.google.com/group/storm-user - #storm-user freenode room • Follow - @stormprocessor and @nathanmarz for updates on the project
  • 34. Many Contributions • Community repository for modules to use Storm at - https://github.com/nathanmarz/storm-contrib including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS and so on • Good articles for understanding Storm internals - http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/ - http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/ • Good slides for understanding real-life examples - http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-you-can-get-in-30-minutes - http://www.slideshare.net/KrishnaGade2/storm-at-twitter
  • 35. Features on Deck • Current release: 0.8.2 as of 6/28/2013 • Work in progress (older): 0.8.3-wip3 - Some bug fixes • Work in progress (newest): 0.9.0-wip19 - SLF4J and Logback - Pluggable tuple serialization and blowfish encryption - Pluggable interprocess messaging and Netty implementation - Some bug fixes - And more
  • 36. Advanced Topics • Distributed RPC • Transactional topologies • Trident • Using non-JVM languages with Storm • Unit testing • Patterns. These are not described in this presentation, so check them out yourself, or in my upcoming session if I get the chance :)