SlideShare a Scribd company logo
Farzad Nozarian
5/9/15 @AUT
Purpose
• In this tutorial, you'll learn how to create Storm topologies and deploy
them to a Storm cluster.
• Java will be the main language used.
• This tutorial uses examples from the storm-starter project. It's
recommended that you clone the project and follow along with the
examples.
2
Components of a Storm cluster (1/2)
• Hadoop "MapReduce jobs“ vs. Storm "topologies“
• One key difference: MapReduce job eventually finishes, whereas a topology
processes messages forever (or until you kill it).
• There are two kinds of nodes on a Storm cluster:
• Master node: runs a daemon called "Nimbus“. (similar to Hadoop's "JobTracker")
• Distributing code around the cluster.
• Assigning tasks to machines.
• Monitoring for failures
• Worker nodes: runs a daemon called "Supervisor“. (similar to Hadoop's “TaskTracker")
• Listens for work assigned to its machine.
• Starts and stops worker processes.
3
Components of a Storm cluster (2/2)
• All coordination between Nimbus and the Supervisors is done through
a Zookeeper cluster
4
Topologies
• A topology is a graph of computation.
• Node in a topology contains processing logic.
• Links between nodes indicate how data should be passed around between
nodes.
• Running a topology is straightforward:
• The storm jar part takes care of connecting to Nimbus and uploading the jar.
• A topology runs forever, or until you kill it!
5
storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2
Streams (1/2)
• A stream is an unbounded sequence of tuples.
• Storm provides the primitives for transforming a stream into a new stream
in a distributed and reliable way.
• The basic primitives Storm provides for doing stream transformations are
spouts and bolts.
• Spouts and bolts have interfaces that you implement to run your
application-specific logic.
• A spout is a source of streams.
• For example, spout may connect to the Twitter API and emit a stream of
tweets.
• A bolt consumes any number of input streams, does some processing,
and possibly emits new streams.
6
Streams (2/2)
• Bolts can do anything:
• run functions,
• filter tuples,
• do streaming aggregations,
• do streaming joins,
• talk to databases.
• Networks of spouts and bolts are packaged into a "topology“.
• Edges in the graph indicate which bolts are subscribing to which streams.
• Each node in a Storm topology executes in parallel.
• You can specify how much parallelism you want for each node.
7
Data model
• Storm uses tuples as its data model.
• A tuple is a named list of values, and a field in a tuple can be an object of any
type.
• Storm supports all the primitive types, strings, and byte arrays as tuple field
values.
• To use an object of another type, you just need to implement a serializer for
the type.
• Every node in a topology must declare the output fields for the tuples it
emits:
8
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("double", "triple"));
}
A simple topology (1/8)
• Let's look at the ExclamationTopology definition from storm-starter:
• This topology contains a spout and two bolts.
• The spout emits words, and each bolt appends the string "!!!" to its input.
9
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
.shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
.shuffleGrouping("exclaim1");
builder.setSpout("words", new TestWordSpout(), 10);
A simple topology (2/8)
• This code defines the nodes using the setSpout and setBolt methods.
• These methods take as input:
• A user-specified id,
• An object containing the processing logic,
• The amount of parallelism you want for the node.
• The spout is given id "words" and the bolts are given ids "exclaim1" and
"exclaim2"
• The object containing the processing logic implements
the IRichSpout interface for spouts and the IRichBolt interface for bolts.
• The last parameter, how much parallelism you want for the node, is optional.
• How many threads should execute that component across the cluster. Default,
Storm will only allocate one thread for that node.
10
A simple topology (3/8)
• setBolt returns an InputDeclarer object that is used to define the
inputs to the Bolt.
• "shuffle grouping" means that tuples should be randomly distributed from the
input tasks to the bolt's tasks.
• There are many ways to group data between components.
• Input declarations can be chained to specify multiple sources for the Bolt:
11
builder.setBolt("exclaim2", new ExclamationBolt(), 5)
.shuffleGrouping("words")
.shuffleGrouping("exclaim1");
A simple topology (4/8)
Spouts implementation
Let's dig into the implementations of the spouts and bolts in this topology.
• Spouts are responsible for emitting new messages into the topology.
• The implementation of nextTuple() in TestWordSpout looks like this:
12
public void nextTuple() {
Utils.sleep(100);
final String[] words = new String[] {"nathan", "mike", "jackson", "golda","bertels"};
final Random rand = new Random();
final String word = words[rand.nextInt(words.length)];
_collector.emit(new Values(word));
}
A simple topology (5/8)
Bolts implementation
13
public static class ExclamationBolt extends BaseRichBolt {
OutputCollector _collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
public void execute(Tuple tuple) {
_collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
_collector.ack(tuple);
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
public Map getComponentConfiguration() {
return null;
}
}
A simple topology (6/8)
Bolts implementation
• The prepare method provides the bolt with an OutputCollector that is
used for emitting tuples from this bolt.
• Tuples can be emitted at anytime from the bolt:
• in the prepare,
• in the execute,
• or cleanup methods,
• or even asynchronously in another thread.
14
public void prepare(Map conf, TopologyContext context, OutputCollector collector){
_collector = collector;
}
A simple topology (7/8)
Bolts implementation
• The execute method receives a tuple from one of the bolt's inputs.
• The ExclamationBolt grabs the first field from the tuple and emits a
new tuple with the string "!!!" appended to it.
• In case of multiple input sources, you can find out which component
the Tuple came from by using the Tuple#getSourceComponent method.
15
public void execute(Tuple tuple) {
_collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
_collector.ack(tuple);
}
A simple topology (8/8)
Bolts implementation
• The cleanup method is called when a Bolt is being shutdown and should
cleanup any resources that were opened.
• The declareOutputFields method declares that the ExclamationBolt
emits 1-tuples with one field called "word".
• The getComponentConfiguration method allows you to configure
various aspects of how this component runs.
16
public void cleanup() {}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
public Map getComponentConfiguration() { return null; }
Running ExclamationTopology in local mode (1/3)
• Storm has two modes of operation:
• Local mode
• Storm executes completely in process by simulating worker nodes with threads.
• Local mode is useful for testing and development of topologies.
• To create an in-process cluster, simply use the LocalCluster class.
• Distributed mode:
• Storm operates as a cluster of machines.
• Running topologies on a production cluster is similar to running in Local mode:
1. Define the topology (Use TopologyBuilder if defining using Java).
2. Use StormSubmitter to submit the topology to the cluster.
3. Create a jar containing your code and all the dependencies of your code.
4. Submit the topology to the cluster using the storm client.
17
Running ExclamationTopology in local mode (2/3)
Here's the code that runs ExclamationTopology in local mode:
• It submits a topology to the LocalCluster by calling submitTopology.
• This method takes as arguments a name for the running topology, a
configuration for the topology, and then the topology itself.
18
Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(2);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("test");
cluster.shutdown();
Running ExclamationTopology in local mode (3/3)
• The name is used to identify the topology so that you can kill it later on.
• The configuration is used to tune various aspects of the running topology.
• The two configurations specified here are very common:
• TOPOLOGY_WORKERS (set with setNumWorkers):
• Specifies how many processes you want allocated around the cluster to execute the
topology.
• Each component in the topology will execute as many threads.
• Those threads exist within worker processes.
• TOPOLOGY_DEBUG (set with setDebug):
• When set to true, tells Storm to log every message every emitted by a component.
• This is useful in local mode when testing topologies
19
Stream groupings (1/3)
• A stream grouping tells a topology how to send tuples between two
components.
• A task level topology execution is looks something like this:
20
When a task for Bolt A
emits a tuple to Bolt B,
which task should it send
the tuple to?
Stream groupings (2/3)
Let's take a look at another topology from storm-starter: WordCountTopology
• It reads sentences off of a spout and streams out of WordCountBolt the total
number of times it has seen that word before:
• SplitSentence emits a tuple for each word in each sentence it receives.
• WordCount keeps a map in memory from word to count.
• Each time WordCount receives a word, it updates its state and emits the new word count.
21
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
.shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
.fieldsGrouping("split", new Fields("word"));
Stream groupings (3/3)
• There's a few different kinds of stream groupings:
• Shuffle grouping (simplest kind of grouping):
• Sends the tuple to a random task.
• A shuffle grouping is used in the WordCountTopology to send tuples from
RandomSentenceSpout to the SplitSentence bolt.
• Distribute the work of processing the tuples across all of SplitSentence bolt's tasks.
• Fields grouping:
• Lets you group a stream by a subset of its fields.
• This causes equal values for that subset of fields to go to the same task.
• A fields grouping is used between the SplitSentence bolt and the WordCount bolt.
• It is critical for the functioning of the WordCount bolt that the same word always go to
the same task.
22
References:
• http://storm.apache.org/documentation/Tutorial.html
23

More Related Content

What's hot

Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQXin Wang
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter StormUwe Printz
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with StormMariusz Gil
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridDataWorks Summit
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Dan Lynn
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 BasicFlink Forward
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormJohn Georgiadis
 

What's hot (19)

Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter Storm
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with Storm
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop Grid
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 

Viewers also liked

Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentFarzad Nozarian
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab AssignmentFarzad Nozarian
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialFarzad Nozarian
 
Big Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing EnvironmentsBig Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing EnvironmentsFarzad Nozarian
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformFarzad Nozarian
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesFarzad Nozarian
 

Viewers also liked (11)

Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Shark - Lab Assignment
Shark - Lab AssignmentShark - Lab Assignment
Shark - Lab Assignment
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab Assignment
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce Tutorial
 
Big Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing EnvironmentsBig Data Processing in Cloud Computing Environments
Big Data Processing in Cloud Computing Environments
 
Object Based Databases
Object Based DatabasesObject Based Databases
Object Based Databases
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing Platform
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 

Similar to Apache Storm Tutorial (20)

Storm
StormStorm
Storm
 
BWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation systemBWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation system
 
Storm 0.8.2
Storm 0.8.2Storm 0.8.2
Storm 0.8.2
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Storm
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
1 storm-intro
1 storm-intro1 storm-intro
1 storm-intro
 
STORM
STORMSTORM
STORM
 
Storm
StormStorm
Storm
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
java Statements
java Statementsjava Statements
java Statements
 
Apache Storm Basics
Apache Storm BasicsApache Storm Basics
Apache Storm Basics
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Slide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMSlide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROM
 
Slide #2: Setup Apache Storm
Slide #2: Setup Apache StormSlide #2: Setup Apache Storm
Slide #2: Setup Apache Storm
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
Blocks & GCD
Blocks & GCDBlocks & GCD
Blocks & GCD
 

Recently uploaded

Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfMeon Technology
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1KnowledgeSeed
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfkalichargn70th171
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownloadvrstrong314
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Krakówbim.edu.pl
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...Alluxio, Inc.
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockSkilrock Technologies
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAlluxio, Inc.
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfmbmh111980
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareinfo611746
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfOrtus Solutions, Corp
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion Clinic
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEJelle | Nordend
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessWSO2
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
 

Recently uploaded (20)

Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdf
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by Skilrock
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 

Apache Storm Tutorial

  • 2. Purpose • In this tutorial, you'll learn how to create Storm topologies and deploy them to a Storm cluster. • Java will be the main language used. • This tutorial uses examples from the storm-starter project. It's recommended that you clone the project and follow along with the examples. 2
  • 3. Components of a Storm cluster (1/2) • Hadoop "MapReduce jobs“ vs. Storm "topologies“ • One key difference: MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it). • There are two kinds of nodes on a Storm cluster: • Master node: runs a daemon called "Nimbus“. (similar to Hadoop's "JobTracker") • Distributing code around the cluster. • Assigning tasks to machines. • Monitoring for failures • Worker nodes: runs a daemon called "Supervisor“. (similar to Hadoop's “TaskTracker") • Listens for work assigned to its machine. • Starts and stops worker processes. 3
  • 4. Components of a Storm cluster (2/2) • All coordination between Nimbus and the Supervisors is done through a Zookeeper cluster 4
  • 5. Topologies • A topology is a graph of computation. • Node in a topology contains processing logic. • Links between nodes indicate how data should be passed around between nodes. • Running a topology is straightforward: • The storm jar part takes care of connecting to Nimbus and uploading the jar. • A topology runs forever, or until you kill it! 5 storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2
  • 6. Streams (1/2) • A stream is an unbounded sequence of tuples. • Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way. • The basic primitives Storm provides for doing stream transformations are spouts and bolts. • Spouts and bolts have interfaces that you implement to run your application-specific logic. • A spout is a source of streams. • For example, spout may connect to the Twitter API and emit a stream of tweets. • A bolt consumes any number of input streams, does some processing, and possibly emits new streams. 6
  • 7. Streams (2/2) • Bolts can do anything: • run functions, • filter tuples, • do streaming aggregations, • do streaming joins, • talk to databases. • Networks of spouts and bolts are packaged into a "topology“. • Edges in the graph indicate which bolts are subscribing to which streams. • Each node in a Storm topology executes in parallel. • You can specify how much parallelism you want for each node. 7
  • 8. Data model • Storm uses tuples as its data model. • A tuple is a named list of values, and a field in a tuple can be an object of any type. • Storm supports all the primitive types, strings, and byte arrays as tuple field values. • To use an object of another type, you just need to implement a serializer for the type. • Every node in a topology must declare the output fields for the tuples it emits: 8 @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("double", "triple")); }
  • 9. A simple topology (1/8) • Let's look at the ExclamationTopology definition from storm-starter: • This topology contains a spout and two bolts. • The spout emits words, and each bolt appends the string "!!!" to its input. 9 TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("words", new TestWordSpout(), 10); builder.setBolt("exclaim1", new ExclamationBolt(), 3) .shuffleGrouping("words"); builder.setBolt("exclaim2", new ExclamationBolt(), 2) .shuffleGrouping("exclaim1");
  • 10. builder.setSpout("words", new TestWordSpout(), 10); A simple topology (2/8) • This code defines the nodes using the setSpout and setBolt methods. • These methods take as input: • A user-specified id, • An object containing the processing logic, • The amount of parallelism you want for the node. • The spout is given id "words" and the bolts are given ids "exclaim1" and "exclaim2" • The object containing the processing logic implements the IRichSpout interface for spouts and the IRichBolt interface for bolts. • The last parameter, how much parallelism you want for the node, is optional. • How many threads should execute that component across the cluster. Default, Storm will only allocate one thread for that node. 10
  • 11. A simple topology (3/8) • setBolt returns an InputDeclarer object that is used to define the inputs to the Bolt. • "shuffle grouping" means that tuples should be randomly distributed from the input tasks to the bolt's tasks. • There are many ways to group data between components. • Input declarations can be chained to specify multiple sources for the Bolt: 11 builder.setBolt("exclaim2", new ExclamationBolt(), 5) .shuffleGrouping("words") .shuffleGrouping("exclaim1");
  • 12. A simple topology (4/8) Spouts implementation Let's dig into the implementations of the spouts and bolts in this topology. • Spouts are responsible for emitting new messages into the topology. • The implementation of nextTuple() in TestWordSpout looks like this: 12 public void nextTuple() { Utils.sleep(100); final String[] words = new String[] {"nathan", "mike", "jackson", "golda","bertels"}; final Random rand = new Random(); final String word = words[rand.nextInt(words.length)]; _collector.emit(new Values(word)); }
  • 13. A simple topology (5/8) Bolts implementation 13 public static class ExclamationBolt extends BaseRichBolt { OutputCollector _collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { _collector = collector; } public void execute(Tuple tuple) { _collector.emit(tuple, new Values(tuple.getString(0) + "!!!")); _collector.ack(tuple); } public void cleanup() { } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } public Map getComponentConfiguration() { return null; } }
  • 14. A simple topology (6/8) Bolts implementation • The prepare method provides the bolt with an OutputCollector that is used for emitting tuples from this bolt. • Tuples can be emitted at anytime from the bolt: • in the prepare, • in the execute, • or cleanup methods, • or even asynchronously in another thread. 14 public void prepare(Map conf, TopologyContext context, OutputCollector collector){ _collector = collector; }
  • 15. A simple topology (7/8) Bolts implementation • The execute method receives a tuple from one of the bolt's inputs. • The ExclamationBolt grabs the first field from the tuple and emits a new tuple with the string "!!!" appended to it. • In case of multiple input sources, you can find out which component the Tuple came from by using the Tuple#getSourceComponent method. 15 public void execute(Tuple tuple) { _collector.emit(tuple, new Values(tuple.getString(0) + "!!!")); _collector.ack(tuple); }
  • 16. A simple topology (8/8) Bolts implementation • The cleanup method is called when a Bolt is being shutdown and should cleanup any resources that were opened. • The declareOutputFields method declares that the ExclamationBolt emits 1-tuples with one field called "word". • The getComponentConfiguration method allows you to configure various aspects of how this component runs. 16 public void cleanup() {} public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } public Map getComponentConfiguration() { return null; }
  • 17. Running ExclamationTopology in local mode (1/3) • Storm has two modes of operation: • Local mode • Storm executes completely in process by simulating worker nodes with threads. • Local mode is useful for testing and development of topologies. • To create an in-process cluster, simply use the LocalCluster class. • Distributed mode: • Storm operates as a cluster of machines. • Running topologies on a production cluster is similar to running in Local mode: 1. Define the topology (Use TopologyBuilder if defining using Java). 2. Use StormSubmitter to submit the topology to the cluster. 3. Create a jar containing your code and all the dependencies of your code. 4. Submit the topology to the cluster using the storm client. 17
  • 18. Running ExclamationTopology in local mode (2/3) Here's the code that runs ExclamationTopology in local mode: • It submits a topology to the LocalCluster by calling submitTopology. • This method takes as arguments a name for the running topology, a configuration for the topology, and then the topology itself. 18 Config conf = new Config(); conf.setDebug(true); conf.setNumWorkers(2); LocalCluster cluster = new LocalCluster(); cluster.submitTopology("test", conf, builder.createTopology()); Utils.sleep(10000); cluster.killTopology("test"); cluster.shutdown();
  • 19. Running ExclamationTopology in local mode (3/3) • The name is used to identify the topology so that you can kill it later on. • The configuration is used to tune various aspects of the running topology. • The two configurations specified here are very common: • TOPOLOGY_WORKERS (set with setNumWorkers): • Specifies how many processes you want allocated around the cluster to execute the topology. • Each component in the topology will execute as many threads. • Those threads exist within worker processes. • TOPOLOGY_DEBUG (set with setDebug): • When set to true, tells Storm to log every message every emitted by a component. • This is useful in local mode when testing topologies 19
  • 20. Stream groupings (1/3) • A stream grouping tells a topology how to send tuples between two components. • A task level topology execution is looks something like this: 20 When a task for Bolt A emits a tuple to Bolt B, which task should it send the tuple to?
  • 21. Stream groupings (2/3) Let's take a look at another topology from storm-starter: WordCountTopology • It reads sentences off of a spout and streams out of WordCountBolt the total number of times it has seen that word before: • SplitSentence emits a tuple for each word in each sentence it receives. • WordCount keeps a map in memory from word to count. • Each time WordCount receives a word, it updates its state and emits the new word count. 21 TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("sentences", new RandomSentenceSpout(), 5); builder.setBolt("split", new SplitSentence(), 8) .shuffleGrouping("sentences"); builder.setBolt("count", new WordCount(), 12) .fieldsGrouping("split", new Fields("word"));
  • 22. Stream groupings (3/3) • There's a few different kinds of stream groupings: • Shuffle grouping (simplest kind of grouping): • Sends the tuple to a random task. • A shuffle grouping is used in the WordCountTopology to send tuples from RandomSentenceSpout to the SplitSentence bolt. • Distribute the work of processing the tuples across all of SplitSentence bolt's tasks. • Fields grouping: • Lets you group a stream by a subset of its fields. • This causes equal values for that subset of fields to go to the same task. • A fields grouping is used between the SplitSentence bolt and the WordCount bolt. • It is critical for the functioning of the WordCount bolt that the same word always go to the same task. 22