The Future of Apache Storm
Hadoop Summit 2016, San Jose, CA
P. Taylor Goetz, Hortonworks
@ptgoetz
About Me
• Tech Staff @ Hortonworks
• PMC Chair, Apache Storm
• ASF Member
• PMC, Apache Incubator, Apache Arrow, Apache
Kylin, Apache Apex
• Mentor/PPMC, Apache Eagle (Incubating), Apache
Mynewt (Incubating), Apache Metron (Incubating),
Apache Gossip (Incubating)
Apache Storm 0.9.x
Storm moves to Apache
Apache Storm 0.9.x
• First official Apache Release
• Storm becomes an Apache TLP
• ZeroMQ (0MQ) replaced by Netty for inter-worker communication
• Expanded Integration (Kafka, HDFS, HBase)
• Dependency conflict reduction (It was a start ;) )
Apache Storm 0.10.x
Enterprise Readiness
Apache Storm 0.10.x
• Security, Multi-Tenancy
• Enable Rolling Upgrades
• Flux (declarative topology wiring/configuration)
• Partial Key Groupings
Apache Storm 0.10.x
• Improved logging (Log4j 2)
• Streaming Ingest to Apache Hive
• Azure Event Hubs Integration
• Redis Integration
• JDBC Integration
Apache Storm 1.0
Maturity and Improved Performance
Release Date: April 12, 2016
Pacemaker
Heartbeat Server
Pacemaker
• Replaces ZooKeeper for Heartbeats
• In-Memory key-value store
• Allows Scaling to 2k-3k+ Nodes
• Secure: Kerberos/Digest Authentication
Pacemaker
• Compared to ZooKeeper:
• Less Memory/CPU
• No Disk
• Spared the overhead of maintaining consistency
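The Pacemaker idea above can be sketched in a few lines: a heartbeat is a plain in-memory key-value write, and liveness is just a timestamp comparison, with no disk I/O or consensus round. This is an illustrative sketch with hypothetical names, not Storm's implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an in-memory heartbeat store: workers overwrite their own
// entry on every beat; a worker is dead once its entry goes stale.
public class HeartbeatStore {
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();

    // A worker reports in; overwrites any previous heartbeat for its key.
    public void beat(String workerKey, long nowMillis) {
        heartbeats.put(workerKey, nowMillis);
    }

    // Alive only if the last heartbeat is within the timeout window.
    public boolean isAlive(String workerKey, long nowMillis, long timeoutMillis) {
        Long last = heartbeats.get(workerKey);
        return last != null && (nowMillis - last) <= timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatStore store = new HeartbeatStore();
        store.beat("topo-1/worker-6700", 1_000L);
        System.out.println(store.isAlive("topo-1/worker-6700", 5_000L, 10_000L));  // true
        System.out.println(store.isAlive("topo-1/worker-6700", 20_000L, 10_000L)); // false
    }
}
```

Because every operation is a single map write or read, nothing needs to be persisted or replicated for agreement, which is exactly the overhead ZooKeeper imposes and Pacemaker avoids.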
Distributed Cache API
Distributed Cache API
• Topology resources:
• Dictionaries, ML Models, Geolocation Data, etc.
• Typically packaged in topology jar
• Fine for small files
• Large files negatively impact topology startup time
• Immutable: Changes require repackaging and deployment
Distributed Cache API
• Allows sharing of files (BLOBs) among topologies
• Files can be updated from the command line
• Allows for files from several KB to several GB in size
• Files can change over the lifetime of the topology
• Allows for compression (e.g. zip, tar, gzip)
Distributed Cache API
• Two implementations: LocalFsBlobStore and HdfsBlobStore
• Local implementation supports Replication Factor (not needed for
HDFS-backed implementation)
• Both support ACLs
Distributed Cache API
Creating a blob:
storm blobstore create --file dict.txt --acl o::rwa
--repl-fctr 2 key1
Making it available to a topology:
storm jar topo.jar my.topo.Class test_topo -c
topology.blobstore.map='{"key1":
{"localname":"dict.txt", "uncompress":"false"}}'
High Availability Nimbus
Before HA Nimbus
[Diagram: a single Nimbus coordinating through ZooKeeper; four Supervisors, each running multiple Workers]
HA Nimbus
[Diagram: three Nimbus nodes, one elected Leader, coordinating through Pacemaker (ZooKeeper); four Supervisors, each running multiple Workers]
HA Nimbus - Failover
[Diagram: the Leader Nimbus fails; the remaining Nimbus nodes hold a leader election via ZooKeeper]
HA Nimbus - Failover
[Diagram: a surviving Nimbus node becomes the new Leader; Supervisors and Workers continue running undisturbed]
HA Nimbus
• Increase overall availability of Nimbus
• Nimbus hosts can join/leave at any time
• Leverages Distributed Cache API
• Topology JAR, Config, and Serialized Topology uploaded to
Distributed Cache
• Replication guarantees availability of all files
Native Streaming Windows
Streaming Windows
• Window Length - Duration or Tuple Count
• Slide Interval - How often to advance the window
Sliding Windows
Windows can overlap
{…} {…} {…} {…} {…} {…} {…} {…} {…}
Time
Window 1 Window 2
Tumbling Windows
Windows do not overlap
{…} {…} {…} {…} {…} {…} {…} {…} {…}
Time
Window 1 Window 2 Window 3
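The two window types above can be illustrated with a small count-based sketch (plain Java, not Storm's implementation): a tumbling window is simply a sliding window whose slide interval equals its length.

```java
import java.util.ArrayList;
import java.util.List;

// Partitions a finite stream into count-based windows: each window holds
// `length` tuples, and consecutive windows start `slide` tuples apart.
public class CountWindows {
    public static <T> List<List<T>> windows(List<T> stream, int length, int slide) {
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start + length <= stream.size(); start += slide) {
            result.add(new ArrayList<>(stream.subList(start, start + length)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> stream = List.of(1, 2, 3, 4, 5, 6);
        // Sliding: length 4, slide 2 -> windows overlap by two tuples.
        System.out.println(windows(stream, 4, 2)); // [[1, 2, 3, 4], [3, 4, 5, 6]]
        // Tumbling: length 3, slide 3 -> no overlap.
        System.out.println(windows(stream, 3, 3)); // [[1, 2, 3], [4, 5, 6]]
    }
}
```

In Storm 1.0 the same idea is expressed on a bolt via `BaseWindowedBolt`, where both length and slide can be given as a tuple count or a duration.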
Streaming Windows
• Timestamps (Event Time, Ingestion Time and Processing Time)
• Out of Order Tuples, Late Tuples
• Watermarks
• Window State Checkpointing
State Management
Stateful Bolts with Automatic Checkpointing
What you see:
Spout → Stateful Bolt 1 → Bolt → Stateful Bolt 2
State Management

import java.util.Map;
import org.apache.storm.state.KeyValueState;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.base.BaseStatefulBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseStatefulBolt<KeyValueState<String, Integer>> {
    private KeyValueState<String, Integer> wordCounts;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    // Initialize State
    @Override
    public void initState(KeyValueState<String, Integer> state) {
        this.wordCounts = state;
    }

    // Read/Update State
    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getString(0);
        Integer count = wordCounts.get(word, 0);
        count++;
        wordCounts.put(word, count);
        collector.emit(new Values(word, count));
    }
}
State Management
Automatic Checkpointing
Checkpointing/Snapshotting
• Asynchronous Barrier Snapshotting (ABS) algorithm [1]
• Chandy-Lamport Algorithm [2]
[1] http://arxiv.org/pdf/1506.08603v1.pdf
[2] http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
Storm State Management
Checkpointing/Snapshotting: What you see.
Spout → Stateful Bolt 1 (execute/update state) → Bolt (execute) → Stateful Bolt 2 (execute/update state)
Checkpointing/Snapshotting: What you get.
[Diagram: a hidden Checkpoint Spout injects a $chkpt marker that flows through Stateful Bolt 1, Bolt, and Stateful Bolt 2; each component ACKs to the ACKER, and state is written to the State Store]
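The marker flow above can be simulated in miniature (hypothetical names, not Storm's internal classes): a $chkpt marker travels down the chain, each stateful bolt snapshots its state when the marker arrives, and the checkpoint commits only once every bolt has acked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy simulation of barrier-style checkpointing over a bolt chain.
public class CheckpointFlow {
    interface StatefulBolt {
        void execute(String tuple);
        Map<String, Integer> snapshot(); // invoked when the $chkpt marker arrives
    }

    static class CountingBolt implements StatefulBolt {
        private final Map<String, Integer> counts = new LinkedHashMap<>();
        public void execute(String tuple) { counts.merge(tuple, 1, Integer::sum); }
        public Map<String, Integer> snapshot() { return new LinkedHashMap<>(counts); }
    }

    // Drives tuples, then the marker, through the chain; the checkpoint
    // commits (snapshots are returned) only when every bolt has acked.
    static List<Map<String, Integer>> runCheckpoint(List<StatefulBolt> chain,
                                                    List<String> tuples) {
        for (String t : tuples) {
            for (StatefulBolt b : chain) b.execute(t);
        }
        List<Map<String, Integer>> stateStore = new ArrayList<>();
        int acks = 0;
        for (StatefulBolt b : chain) { // marker flows bolt to bolt
            stateStore.add(b.snapshot());
            acks++;                    // each bolt acks the marker
        }
        return acks == chain.size() ? stateStore : List.of();
    }

    public static void main(String[] args) {
        List<StatefulBolt> chain = List.of(new CountingBolt(), new CountingBolt());
        System.out.println(runCheckpoint(chain, List.of("a", "b", "a")));
    }
}
```

The real implementation handles multiple inputs per bolt (barrier alignment, per the ABS paper) and failure recovery, which this sketch deliberately omits.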
Automatic Back Pressure
Automatic Back Pressure
• In previous Storm versions, the only way to throttle a topology was to
enable ACKing and set topology.spout.max.pending.
• If you didn't require at-least-once guarantees, this imposed a
significant performance penalty.**
** In Storm 1.0 this penalty is drastically reduced (more on this later)
Automatic Back Pressure
• High/Low Watermarks (expressed as % of buffer size)
• Back Pressure thread monitors buffers
• If High Watermark reached, slow down Spouts
• If Low Watermark reached, stop throttling
• All Spouts Supported
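The watermark rule above is a small state machine with hysteresis: throttling starts when a buffer fills past the high watermark and only stops once it drains below the low watermark, so the system doesn't flap around a single threshold. A minimal sketch (hypothetical class, not Storm's internals):

```java
// Tracks throttling state from periodic buffer-fill observations.
public class BackPressureMonitor {
    private final double highWatermark; // fraction of buffer size, e.g. 0.9
    private final double lowWatermark;  // e.g. 0.4
    private boolean throttling = false;

    public BackPressureMonitor(double highWatermark, double lowWatermark) {
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    // Called by the back pressure thread with the current fill ratio;
    // returns whether spouts should currently be throttled.
    public boolean update(double fillRatio) {
        if (!throttling && fillRatio >= highWatermark) {
            throttling = true;   // high watermark reached: slow down spouts
        } else if (throttling && fillRatio <= lowWatermark) {
            throttling = false;  // drained below low watermark: stop throttling
        }
        return throttling;
    }

    public static void main(String[] args) {
        BackPressureMonitor m = new BackPressureMonitor(0.9, 0.4);
        System.out.println(m.update(0.5));  // false: below high watermark
        System.out.println(m.update(0.95)); // true: crossed high watermark
        System.out.println(m.update(0.6));  // true: still above low watermark
        System.out.println(m.update(0.3));  // false: drained below low watermark
    }
}
```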
Resource Aware Scheduler
(RAS)
Resource Aware Scheduler
• Specify the resource requirements (Memory/CPU) for individual
topology components (Spouts/Bolts)
• Memory: On-Heap / Off-Heap (if off-heap is used)
• CPU: Point system based on number of cores
• Resource requirements are per component instance (parallelism
matters)
Resource Aware Scheduler
• CPU and Memory availability described in storm.yaml on each
supervisor node. E.g.:
supervisor.memory.capacity.mb: 3072.0
supervisor.cpu.capacity: 400.0
• Convention for CPU capacity is to use 100 for each CPU core
Resource Aware Scheduler
Setting component resource requirements:
SpoutDeclarer spout = builder.setSpout("sp1", new TestSpout(), 10);
//set cpu requirement
spout.setCPULoad(20);
//set onheap and offheap memory requirement
spout.setMemoryLoad(64, 16);
BoltDeclarer bolt1 = builder.setBolt("b1", new MyBolt(), 3).shuffleGrouping("sp1");
//sets cpu requirement. Not necessary to set both CPU and memory.
//For requirements not set, a default value will be used
bolt1.setCPULoad(15);
BoltDeclarer bolt2 = builder.setBolt("b2", new MyBolt(), 2).shuffleGrouping("b1");
bolt2.setMemoryLoad(100);
Storm Usability Improvements
Enhanced Debugging and Monitoring of Topologies
Dynamic Log Level Settings
Dynamic Log Levels
• Set the log level for a running topology
• Via Storm UI and CLI
• Optional timeout after which changes will be reverted
• Logs searchable from Storm UI/Logviewer
Dynamic Log Levels
Via Storm UI:
Dynamic Log Levels
Via Storm CLI:
./bin/storm set_log_level [topology name] -l
[logger_name]=[LEVEL]:[TIMEOUT]
Tuple Sampling
• No more debug bolts or Trident functions!
• In Storm UI: Select a Topology or component and click “Debug”
• Specify a sampling percentage (% of tuples to be sampled)
• Click on the “Events” link to view the sample log.
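The sampling percentage behaves like an independent coin flip per tuple: each tuple is logged with probability pct/100, so the event log sees roughly that fraction of the stream without any debug bolts. An illustrative sketch (hypothetical TupleSampler class, not Storm's implementation):

```java
import java.util.Random;

// Samples each tuple independently with probability pct/100.
public class TupleSampler {
    private final double pct;
    private final Random random;

    public TupleSampler(double pct, Random random) {
        this.pct = pct;
        this.random = random;
    }

    // Decide whether this tuple goes to the event log.
    public boolean sample() {
        return random.nextDouble() * 100.0 < pct;
    }

    public static void main(String[] args) {
        TupleSampler sampler = new TupleSampler(10.0, new Random(42));
        int logged = 0;
        for (int i = 0; i < 10_000; i++) {
            if (sampler.sample()) logged++;
        }
        System.out.println(logged + " of 10000 tuples sampled (~10%)");
    }
}
```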
Distributed Log Search
• Search across all log files for a specific topology
• Search in archived (ZIP) logs
• Results include matches from all Supervisor nodes
Dynamic Worker Profiling
• Request worker profile data from Storm UI:
• Heap Dumps
• JStack Output
• JProfile Recordings
• Download generated files for off-line analysis
• Restart workers from UI
Supervisor Health Checks
• Identify Supervisor nodes that are in a bad state
• Automatically decommission bad nodes
• Simple shell script
• You define what constitutes “Unhealthy”
New Integrations
• Cassandra
• Solr
• Elasticsearch
• MQTT
Integration Improvements
• Kafka
• HDFS Spout
• Avro Integration for HDFS
• HBase
• Hive
Before I forget...
Performance
Up to 16x faster throughput.
Realistically 3x -- Highly dependent on use case and fault tolerance settings
> 60% Latency Reduction
Bear in mind performance varies
widely depending on the use case.
Consider the origin and motivation
behind any third party benchmark.
The most important benchmarks
are the ones you do.
Storm 1.1.0
Summer 2016
Apache Storm v1.1.0
• Revamped metrics API
• Focus on user-defined metrics
• More metrics available in Storm UI
• Enhanced metrics integration with Apache Ambari
What’s next for Storm?
2.0 and Beyond
Clojure to Java
Broadening the contributor base
Clojure to Java
Alibaba JStorm Contribution
Streaming SQL
Currently Beta/WIP
Apache Beam (incubating) Integration
• Unified API for dealing with bounded/unbounded data sources (i.e. batch/streaming)
• One API. Multiple implementations (execution engines). Called "Runners" in Beam-speak.
Twitter Heron
2 years late.
Twitter Heron vs. Storm
• Heron benchmarked against pre-Apache WIP version of Storm.
• 2 years late to the game (the Apache Storm community has been anything but idle)
• Storm is now ahead in terms of features and on par with respect to
performance.
• Apache is about collaboration, and the Storm community is
committed to advancing innovation in stream processing.
Is Apache Storm Dead?
Competitors may say so. But hundreds of successful production deployments, and
a vibrant, growing developer community tell a very different story.
Thank you!
Questions?
P. Taylor Goetz, Hortonworks
@ptgoetz
