SlideShare a Scribd company logo
1 of 65
The Future of Apache Storm
Hadoop Summit 2016 - Dublin
P. Taylor Goetz, Hortonworks
@ptgoetz
About Me
• Tech Staff @ Hortonworks
• PMC Chair, Apache Storm
• ASF Member
• PMC, Apache Incubator
• PMC, Apache Arrow
• PMC, Apache Kylin
• Mentor/PPMC, Apache Eagle (Incubating)
• Mentor/PPMC, Apache Metron (Incubating)
• Mentor/PPMC, Apache Apex (Incubating)
Apache Storm 0.9.x
Storm moves to Apache
Apache Storm 0.9.x
• First official Apache Release
• Storm becomes an Apache TLP
• 0mq to Netty for inter-worker communication
• Expanded Integration (Kafka, HDFS, HBase)
• Dependency conflict reduction (It was a start ;) )
Apache Storm 0.10.x
Enterprise Readiness
Apache Storm 0.10.x
• Security, Multi-Tenancy
• Enable Rolling Upgrades
• Flux (declarative topology wiring/configuration)
• Partial Key Groupings
Apache Storm 0.10.x
• Improved logging (Log4j 2)
• Streaming Ingest to Apache Hive
• Azure Event Hubs Integration
• Redis Integration
• JDBC Integration
Apache Storm 1.0
Maturity and Improved Performance
Release Date: April 12, 2016
Pacemaker
Heartbeat Server
Pacemaker
• Replaces Zookeeper for Heartbeats
• In-Memory key-value store
• Allows Scaling to 2k-3k+ Nodes
• Secure: Kerberos/Digest Authentication
Pacemaker
• Compared to Zookeeper:
• Less Memory/CPU
• No Disk
• Spared the overhead of maintaining consistency
Distributed Cache API
Distributed Cache API
• Topology resources:
• Dictionaries, ML Models, Geolocation Data, etc.
• Typically packaged in topology jar
• Fine for small files
• Large files negatively impact topology startup time
• Immutable: Changes require repackaging and deployment
Distributed Cache API
• Allows sharing of files (BLOBs) among topologies
• Files can be updated from the command line
• Allows for files from several KB to several GB in size
• Files can change over the lifetime of the topology
• Allows for compression (e.g. zip, tar, gzip)
Distributed Cache API
• Two implementations: LocalFsBlobStore and HdfsBlobStore
• Local implementation supports Replication Factor (not needed for
HDFS-backed implementation)
• Both support ACLs
Distributed Cache API
Creating a blob:
storm blobstore create --file dict.txt --acl o::rwa
--repl-fctr 2 key1
Making it available to a topology:
storm jar topo.jar my.topo.Class test_topo -c
topology.blobstore.map=‘{"key1":{"localname":"dict.t
xt", "uncompress":"false"}}'
High Availability Nimbus
Before HA Nimbus
HA Nimbus
HA Nimbus - Failover
HA Nimbus - Failover
HA Nimbus
• Increase overall availability of Nimbus
• Nimbus hosts can join/leave at any time
• Leverages Distributed Cache API
• Topology JAR, Config, and Serialized Topology uploaded to
Distributed Cache
• Replication guarantees availability of all files
Native Streaming Windows
Streaming Windows
• Specify Length - Duration or Tuple Count
• Slide Interval - How often to advance the window
Sliding Windows
Windows can overlap
Tumbling Windows
Windows do not overlap
Streaming Windows
• Timestamps (Event Time, Ingestion Time and Processing Time)
• Out of Order Tuples
• Watermarks
• Window State Checkpointing
Sate Management
Stateful Bolts with Automatic Checkpointing
What you see.
State Management
State Management
public class WordCountBolt extends BaseStatefulBolt<KeyValueState> {
private KeyValueState wordCounts;
private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void initState(KeyValueState state) {
this.wordCounts = state;
}
public void execute(Tuple tuple) {
String word = tuple.getString(0);
Integer count = (Integer) wordCounts.get(word, 0);
count++;
wordCounts.put(word, count);
collector.emit(new Values(word, count));
}
}
public class WordCountBolt extends BaseStatefulBolt<KeyValueState> {
private KeyValueState wordCounts;
private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void initState(KeyValueState state) {
this.wordCounts = state;
}
public void execute(Tuple tuple) {
String word = tuple.getString(0);
Integer count = (Integer) wordCounts.get(word, 0);
count++;
wordCounts.put(word, count);
collector.emit(new Values(word, count));
}
}
Initialize State
State Management
public class WordCountBolt extends BaseStatefulBolt<KeyValueState> {
private KeyValueState wordCounts;
private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void initState(KeyValueState state) {
this.wordCounts = state;
}
public void execute(Tuple tuple) {
String word = tuple.getString(0);
Integer count = (Integer) wordCounts.get(word, 0);
count++;
wordCounts.put(word, count);
collector.emit(new Values(word, count));
}
}
Read/Update State
State Management
Sate Management
Automatic Checkpointing
Checkpointing/Snapshotting
• Asynchronous Barrier Snapshotting (ABS) algorithm [1]
• Chandy-Lamport Algorithm [2]
[1] http://arxiv.org/pdf/1506.08603v1.pdf
[2] http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
State Management
Checkpointing/Snapshotting: What you see.
Storm State Management
execute/update state execute execute/update state
Checkpointing/Snapshotting: What you get.
Storm State Management
Automatic Back Pressure
Automatic Back Pressure
• In previous Storm versions, the only way to throttle topologies was to
enable ACKing and set topology.spout.max.pending.
• If you don’t require at-least-once guarantees, this imposed a
significant performance penalty.**
** In Storm 1.0 this penalty is drastically reduced (more on this later)
Automatic Backpressure
• High/Low Watermarks (expressed as % of buffer size)
• Back Pressure thread monitors buffers
• If High Watermark reached, slow down Spouts
• If Low Watermark reached, stop throttling
• All Spouts Supported
Resource Aware Scheduler
(RAS)
Resource Aware Scheduler
• Specify the resource requirements (Memory/CPU) for individual
topology components (Spouts/Bolts)
• Memory: On-Heap / Off-Heap (if off-heap is used)
• CPU: Point system based on number of cores
• Resource requirements are per component instance (parallelism
matters)
Resource Aware Scheduler
• CPU and Memory availability described in storm.yaml on each
supervisor node. E.g.:
supervisor.memory.capacity.mb: 3072.0
supervisor.cpu.capacity: 400.0
• Convention for CPU capacity is to use 100 for each CPU core
Resource Aware Scheduler
Setting component resource requirements:
SpoutDeclarer spout = builder.setSpout("sp1", new TestSpout(), 10);
//set cpu requirement
spout.setCPULoad(20);
//set onheap and offheap memory requirement
spout.setMemoryLoad(64, 16);
BoltDeclarer bolt1 = builder.setBolt("b1", new MyBolt(), 3).shuffleGrouping("sp1");
//sets cpu requirement. Not neccessary to set both CPU and memory.
//For requirements not set, a default value will be used
bolt1.setCPULoad(15);
BoltDeclarer bolt2 = builder.setBolt("b2", new MyBolt(), 2).shuffleGrouping("b1");
bolt2.setMemoryLoad(100);
Storm Usability Improvements
Enhanced Debugging and Monitoring of Topologies
Dynamic Log Level Settings
Dynamic Log Levels
• Set log level setting for a running topology
• Via Storm UI and CLI
• Optional timeout after which changes will be reverted
• Logs searchable from Storm UI/Logviewer
Dynamic Log Levels
Via Storm UI:
Dynamic Log Levels
Via Storm CLI:
./bin/storm set_log_level [topology name] -l
[logger_name]=[LEVEL]:[TIMEOUT]
Tuple Sampling
• No more debug bolts or Trident functions!
• In Storm UI: Select a Topology or component and click “Debug”
• Specify a sampling percentage (% of tuples to be sampled)
• Click on the “Events” link to view the sample log.
Distributed Log Search
• Search across all log files for a specific topology
• Search in archived (ZIP) logs
• Results include matches from all Supervisor nodes
Dynamic Worker Profiling
• Request worker profile data from Storm UI:
• Heap Dumps
• JStack Output
• JProfile Recordings
• Download generated files for off-line analysis
• Restart workers from UI
Supervisor Health Checks
• Identify Supervisor nodes that are in a bad state
• Automatically decommission bad nodes
• Simple shell script
• You define what constitutes “Unhealthy”
New Integrations
• Cassandra
• Solr
• Elastic Search
• MQTT
Integration Improvements
• Kafka
• HDFS Spout
• Avro Integration for HDFS
• HBase
• Hive
Before I forget...
Performance
Up to 16x faster throughput.
Realistically 3x -- Highly dependent on use case
> 60% Latency Reduction
Bear in mind performance varies
widely depending on the use case.
The most important benchmarks
are the ones you do.
Storm 2.0
Underway @ Apache
Clojure to Java
Broadening the contributor base
Clojure to Java
Alibaba JStorm Contribution
Questions?
Thank you!
P. Taylor Goetz, Hortonworks
@ptgoetz

More Related Content

What's hot

Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & StormOtto Mok
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormJohn Georgiadis
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataDataWorks Summit
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and MessagingXin Wang
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Folio3 Software
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormRan Silberman
 
Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksHuafeng Wang
 
Building Streaming Applications with Apache Storm 1.1
Building Streaming Applications with Apache Storm 1.1Building Streaming Applications with Apache Storm 1.1
Building Streaming Applications with Apache Storm 1.1Hugo Louro
 
Multi-tenant Apache Storm as a service
Multi-tenant Apache Storm as a serviceMulti-tenant Apache Storm as a service
Multi-tenant Apache Storm as a serviceRobert Evans
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs stormTrong Ton
 

What's hot (19)

The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & Storm
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming Frameworks
 
Building Streaming Applications with Apache Storm 1.1
Building Streaming Applications with Apache Storm 1.1Building Streaming Applications with Apache Storm 1.1
Building Streaming Applications with Apache Storm 1.1
 
Multi-tenant Apache Storm as a service
Multi-tenant Apache Storm as a serviceMulti-tenant Apache Storm as a service
Multi-tenant Apache Storm as a service
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs storm
 

Similar to The Future of Apache Storm

Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormP. Taylor Goetz
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Apache Eagle: eBay构建开源分布式实时预警引擎实践
Apache Eagle: eBay构建开源分布式实时预警引擎实践Apache Eagle: eBay构建开源分布式实时预警引擎实践
Apache Eagle: eBay构建开源分布式实时预警引擎实践Hao Chen
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at BristechJulien Nioche
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsthelabdude
 
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...Yahoo Developer Network
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureRobert Metzger
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Docker presentasjon java bin
Docker presentasjon java binDocker presentasjon java bin
Docker presentasjon java binOlve Hansen
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in ActionHao Chen
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesOleksii Diagiliev
 
Kubernetes for the PHP developer
Kubernetes for the PHP developerKubernetes for the PHP developer
Kubernetes for the PHP developerPaul Czarkowski
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Ltd
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
 
Apache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformApache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformWangda Tan
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 

Similar to The Future of Apache Storm (20)

The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Apache Eagle: eBay构建开源分布式实时预警引擎实践
Apache Eagle: eBay构建开源分布式实时预警引擎实践Apache Eagle: eBay构建开源分布式实时预警引擎实践
Apache Eagle: eBay构建开源分布式实时预警引擎实践
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Pr...
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Docker presentasjon java bin
Docker presentasjon java binDocker presentasjon java bin
Docker presentasjon java bin
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Kubernetes for the PHP developer
Kubernetes for the PHP developerKubernetes for the PHP developer
Kubernetes for the PHP developer
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
Apache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformApache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning Platform
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 

More from P. Taylor Goetz

Flux: Apache Storm Frictionless Topology Configuration & Deployment
Flux: Apache Storm Frictionless Topology Configuration & DeploymentFlux: Apache Storm Frictionless Topology Configuration & Deployment
Flux: Apache Storm Frictionless Topology Configuration & DeploymentP. Taylor Goetz
 
From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...P. Taylor Goetz
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 

More from P. Taylor Goetz (6)

Flux: Apache Storm Frictionless Topology Configuration & Deployment
Flux: Apache Storm Frictionless Topology Configuration & DeploymentFlux: Apache Storm Frictionless Topology Configuration & Deployment
Flux: Apache Storm Frictionless Topology Configuration & Deployment
 
From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 

Recently uploaded

VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Roomdivyansh0kumar0
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...tanu pandey
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Servicegwenoracqe6
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts servicevipmodelshub1
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girladitipandeya
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Standkumarajju5765
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 

Recently uploaded (20)

VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 

The Future of Apache Storm

  • 1. The Future of Apache Storm Hadoop Summit 2016 - Dublin P. Taylor Goetz, Hortonworks @ptgoetz
  • 2. About Me • Tech Staff @ Hortonworks • PMC Chair, Apache Storm • ASF Member • PMC, Apache Incubator • PMC, Apache Arrow • PMC, Apache Kylin • Mentor/PPMC, Apache Eagle (Incubating) • Mentor/PPMC, Apache Metron (Incubating) • Mentor/PPMC, Apache Apex (Incubating)
  • 3. Apache Storm 0.9.x Storm moves to Apache
  • 4. Apache Storm 0.9.x • First official Apache Release • Storm becomes an Apache TLP • 0mq to Netty for inter-worker communication • Expanded Integration (Kafka, HDFS, HBase) • Dependency conflict reduction (It was a start ;) )
  • 6. Apache Storm 0.10.x • Security, Multi-Tenancy • Enable Rolling Upgrades • Flux (declarative topology wiring/configuration) • Partial Key Groupings
  • 7. Apache Storm 0.10.x • Improved logging (Log4j 2) • Streaming Ingest to Apache Hive • Azure Event Hubs Integration • Redis Integration • JDBC Integration
  • 8. Apache Storm 1.0 Maturity and Improved Performance Release Date: April 12, 2016
  • 10. Pacemaker • Replaces Zookeeper for Heartbeats • In-Memory key-value store • Allows Scaling to 2k-3k+ Nodes • Secure: Kerberos/Digest Authentication
  • 11. Pacemaker • Compared to Zookeeper: • Less Memory/CPU • No Disk • Spared the overhead of maintaining consistency
  • 13. Distributed Cache API • Topology resources: • Dictionaries, ML Models, Geolocation Data, etc. • Typically packaged in topology jar • Fine for small files • Large files negatively impact topology startup time • Immutable: Changes require repackaging and deployment
  • 14. Distributed Cache API • Allows sharing of files (BLOBs) among topologies • Files can be updated from the command line • Allows for files from several KB to several GB in size • Files can change over the lifetime of the topology • Allows for compression (e.g. zip, tar, gzip)
  • 15. Distributed Cache API • Two implementations: LocalFsBlobStore and HdfsBlobStore • Local implementation supports Replication Factor (not needed for HDFS-backed implementation) • Both support ACLs
  • 16. Distributed Cache API Creating a blob: storm blobstore create --file dict.txt --acl o::rwa --repl-fctr 2 key1 Making it available to a topology: storm jar topo.jar my.topo.Class test_topo -c topology.blobstore.map=‘{"key1":{"localname":"dict.t xt", "uncompress":"false"}}'
  • 20. HA Nimbus - Failover
  • 21. HA Nimbus - Failover
  • 22. HA Nimbus • Increase overall availability of Nimbus • Nimbus hosts can join/leave at any time • Leverages Distributed Cache API • Topology JAR, Config, and Serialized Topology uploaded to Distributed Cache • Replication guarantees availability of all files
  • 24. Streaming Windows • Specify Length - Duration or Tuple Count • Slide Interval - How often to advance the window
  • 27. Streaming Windows • Timestamps (Event Time, Ingestion Time and Processing Time) • Out of Order Tuples • Watermarks • Window State Checkpointing
  • 28. Sate Management Stateful Bolts with Automatic Checkpointing
  • 29. What you see. State Management
  • 30. State Management public class WordCountBolt extends BaseStatefulBolt<KeyValueState> { private KeyValueState wordCounts; private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; } public void initState(KeyValueState state) { this.wordCounts = state; } public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); } }
  • 31. public class WordCountBolt extends BaseStatefulBolt<KeyValueState> { private KeyValueState wordCounts; private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; } public void initState(KeyValueState state) { this.wordCounts = state; } public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); } } Initialize State State Management
  • 32. public class WordCountBolt extends BaseStatefulBolt<KeyValueState> { private KeyValueState wordCounts; private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; } public void initState(KeyValueState state) { this.wordCounts = state; } public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); } } Read/Update State State Management
  • 34. Checkpointing/Snapshotting • Asynchronous Barrier Snapshotting (ABS) algorithm [1] • Chandy-Lamport Algorithm [2] [1] http://arxiv.org/pdf/1506.08603v1.pdf [2] http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf State Management
  • 35. Checkpointing/Snapshotting: What you see. Storm State Management execute/update state execute execute/update state
  • 36. Checkpointing/Snapshotting: What you get. Storm State Management
  • 38. Automatic Back Pressure • In previous Storm versions, the only way to throttle topologies was to enable ACKing and set topology.spout.max.pending. • If you don’t require at-least-once guarantees, this imposed a significant performance penalty.** ** In Storm 1.0 this penalty is drastically reduced (more on this later)
  • 39. Automatic Backpressure • High/Low Watermarks (expressed as % of buffer size) • Back Pressure thread monitors buffers • If High Watermark reached, slow down Spouts • If Low Watermark reached, stop throttling • All Spouts Supported
  • 41. Resource Aware Scheduler • Specify the resource requirements (Memory/CPU) for individual topology components (Spouts/Bolts) • Memory: On-Heap / Off-Heap (if off-heap is used) • CPU: Point system based on number of cores • Resource requirements are per component instance (parallelism matters)
  • 42. Resource Aware Scheduler • CPU and Memory availability described in storm.yaml on each supervisor node. E.g.: supervisor.memory.capacity.mb: 3072.0 supervisor.cpu.capacity: 400.0 • Convention for CPU capacity is to use 100 for each CPU core
  • 43. Resource Aware Scheduler Setting component resource requirements: SpoutDeclarer spout = builder.setSpout("sp1", new TestSpout(), 10); //set cpu requirement spout.setCPULoad(20); //set onheap and offheap memory requirement spout.setMemoryLoad(64, 16); BoltDeclarer bolt1 = builder.setBolt("b1", new MyBolt(), 3).shuffleGrouping("sp1"); //sets cpu requirement. Not neccessary to set both CPU and memory. //For requirements not set, a default value will be used bolt1.setCPULoad(15); BoltDeclarer bolt2 = builder.setBolt("b2", new MyBolt(), 2).shuffleGrouping("b1"); bolt2.setMemoryLoad(100);
  • 44. Storm Usability Improvements Enhanced Debugging and Monitoring of Topologies
  • 45. Dynamic Log Level Settings
  • 46. Dynamic Log Levels • Set log level setting for a running topology • Via Storm UI and CLI • Optional timeout after which changes will be reverted • Logs searchable from Storm UI/Logviewer
  • 48. Dynamic Log Levels Via Storm CLI: ./bin/storm set_log_level [topology name] -l [logger_name]=[LEVEL]:[TIMEOUT]
  • 49. Tuple Sampling • No more debug bolts or Trident functions! • In Storm UI: Select a Topology or component and click “Debug” • Specify a sampling percentage (% of tuples to be sampled) • Click on the “Events” link to view the sample log.
  • 50. Distributed Log Search • Search across all log files for a specific topology • Search in archived (ZIP) logs • Results include matches from all Supervisor nodes
  • 51. Dynamic Worker Profiling • Request worker profile data from Storm UI: • Heap Dumps • JStack Output • JProfile Recordings • Download generated files for off-line analysis • Restart workers from UI
  • 52. Supervisor Health Checks • Identify Supervisor nodes that are in a bad state • Automatically decommission bad nodes • Simple shell script • You define what constitutes “Unhealthy”
  • 53. New Integrations • Cassandra • Solr • Elastic Search • MQTT
  • 54. Integration Improvements • Kafka • HDFS Spout • Avro Integration for HDFS • HBase • Hive
  • 57. Up to 16x faster throughput. Realistically 3x -- Highly dependent on use case
  • 58. > 60% Latency Reduction
  • 59. Bear in mind performance varies widely depending on the use case.
  • 60. The most important benchmarks are the ones you do.
  • 62. Clojure to Java Broadening the contributor base
  • 63. Clojure to Java Alibaba JStorm Contribution
  • 65. Thank you! P. Taylor Goetz, Hortonworks @ptgoetz