This document provides an overview of Storm, an open source distributed real-time computation system. It describes key Storm concepts like tuples, streams, spouts, bolts and topologies. It provides an example of a streaming word count topology and discusses tips for using Storm like using log aggregation, rolling deployments, tuning parallelism and anchoring tuples. It also briefly introduces the Trident abstraction for Storm.
34. Streaming Word Count
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
.shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
.fieldsGrouping("split", new Fields("word"));
35. Streaming Word Count
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
.shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
.fieldsGrouping("split", new Fields("word"));
36. Streaming Word Count
public static class SplitSentence extends ShellBolt implements IRichBolt {
public SplitSentence() {
super("python", "splitsentence.py");
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitSentence.java
37. Streaming Word Count
public static class SplitSentence extends ShellBolt implements IRichBolt {
public SplitSentence() {
super("python", "splitsentence.py");
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitSentence.java
splitsentence.py
38. Streaming Word Count
public static class SplitSentence extends ShellBolt implements IRichBolt {
public SplitSentence() {
super("python", "splitsentence.py");
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitSentence.java
39. Streaming Word Count
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
.shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
.fieldsGrouping("split", new Fields("word"));
java
40. Streaming Word Count
public static class WordCount extends BaseBasicBolt {
Map<String, Integer> counts = new HashMap<String, Integer>();
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String word = tuple.getString(0);
Integer count = counts.get(word);
if(count==null) count = 0;
count++;
counts.put(word, count);
collector.emit(new Values(word, count));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
WordCount.java
41. Streaming Word Count
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
.shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
.fieldsGrouping("split", new Fields("word"));
java
Groupings control how tuples are routed
54. TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
.shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
.fieldsGrouping("split", new Fields("word"));
java
see:
https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
Tune your parallelism
55. Tune your parallelism
Supervisor
Worker
Process
(JVM)
Executor
(thread)
Task
Task
Executor
(thread)
Task
Task
Worker
Process
(JVM)
Executor
(thread)
Task
Task
Executor
(thread)
Task
Task
Parallelism hints control
the number of Executors
59. A little taste of Trident
TridentState
urlToTweeters
=
topology.newStaticState(getUrlToTweetersState());
TridentState
tweetersToFollowers
=
topology.newStaticState(getTweeterToFollowersState());
topology.newDRPCStream("reach")
.stateQuery(urlToTweeters,
new
Fields("args"),
new
MapGet(),
new
Fields("tweeters"))
.each(new
Fields("tweeters"),
new
ExpandList(),
new
Fields("tweeter"))
.shuffle()
.stateQuery(tweetersToFollowers,
new
Fields("tweeter"),
new
MapGet(),
new
Fields("followers"))
.parallelismHint(200)
.each(new
Fields("followers"),
new
ExpandList(),
new
Fields("follower"))
.groupBy(new
Fields("follower"))
.aggregate(new
One(),
new
Fields("one"))
.parallelismHint(20)
.aggregate(new
Count(),
new
Fields("reach"));
h,ps://github.com/nathanmarz/storm/wiki/Trident-‐tutorial