Building Streaming Applications with Apache Storm 1.1

Meetup
Hortonworks, April 20th, 2017

Presenters
• Sriharsha Chintalapani, Storm & Kafka Committer, PMC @ Hortonworks
• Karthik Deivasigamani, Walmart Labs
• Roshan Naik, Storm Contributor, Flume Committer @ Hortonworks
• Hugo Louro, Storm Committer, PMC @ Hortonworks

Apache Storm Brief History
• 2010 – First Streaming Framework, Created at BackType
• 2011 – BackType Acquired by Twitter; Storm Deployed There and Open Sourced
• 2013 – Entered the Apache Incubator
• Present – Large Scale Production Deployments
– Yahoo 3500+ Nodes
– Alibaba 1PB of Data per Day

Prior Releases Highlights
• 0.9.x
– Storm becomes an Apache TLP
– First Official Apache Release
– Expanded Kafka, HDFS, HBase Integration
• 0.10.x
– Multi Tenancy
– Rolling Upgrades
– Improved Logging (Log4j2)
– JDBC, Event Hubs, Hive Integration

• 1.0
– Pacemaker (Replaces Zookeeper for Heartbeats)
– Security (Kerberos/Digest Authentication)
– Nimbus HA (Eliminates Single Point of Failure)
– Supervisor Health Checks
– Resource Aware Scheduler

• 1.0 (continued)
– Stateful Bolts
– Automatic Checkpointing/Snapshots (ABS [2], Chandy-Lamport [3] Algorithms)
– Streaming Windows (Sliding, Tumbling, Watermarks, Out of Order Tuples)
– Dynamic Log Levels
– Distributed Log Search
– Worker Profiling
– Solr, Cassandra, Elasticsearch, MQTT Integration
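
Tumbling and sliding windows differ only in how far the window advances between emissions. The following is a minimal plain-Java sketch of that idea using count-based windows over an in-memory list. It is not Storm's windowing API; the class and method names are hypothetical, and watermarks and out-of-order tuples are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: count-based windows over an in-memory list.
// This is NOT Storm's windowing API; all names here are hypothetical.
class WindowSketch {

    // Emits every full window of `length` tuples, advancing by `slide` tuples.
    // slide == length gives tumbling (non-overlapping) windows;
    // slide <  length gives sliding (overlapping) windows.
    // A trailing partial window is not emitted.
    static <T> List<List<T>> windows(List<T> tuples, int length, int slide) {
        if (length <= 0 || slide <= 0) {
            throw new IllegalArgumentException("length and slide must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start + length <= tuples.size(); start += slide) {
            out.add(new ArrayList<>(tuples.subList(start, start + length)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> tuples = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println("tumbling: " + windows(tuples, 3, 3));
        System.out.println("sliding:  " + windows(tuples, 3, 1));
    }
}
```

With six tuples, a window length of 3 and a slide of 3 yields two disjoint windows, while a slide of 1 yields four overlapping ones.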

Apache Storm 1.1.0
March 29, 2017
• Streaming SQL
• Improved Apache Kafka Integration
• PMML Support (Machine Learning)
• Druid Integration
• OpenTSDB Integration

• AWS Kinesis Support
• HDFS Spout
• Other Enhancements
– Flux
– Topology Deployment
– Resource Aware Scheduler

Streaming SQL
• Apache Calcite for Query Parsing/Planning
• Define a Topology Using a SQL-like Query
• SQL Compiled and Transformed into a Trident Topology
• Streaming To/From Arbitrary Data Sources
– Kafka, Redis, HDFS, MongoDB
– Extensible by Implementing ISqlTridentDataSource

• Tuple Filtering
• Projections
• CSV, TSV, and Avro input/output formats
• User Defined Functions (UDFs)
• Fine-Grained User Control over Parallelism of Generated Components

Streaming SQL – Example [1]
• Read Apache HTTPD server logs from Kafka
• Filter out everything but error log events
• Write the error events onto a Kafka topic
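
The three steps above amount to a read, filter, write pipeline. As a rough illustration of the filtering step only, here is a plain-Java sketch in which in-memory lists stand in for the Kafka source and sink topics. It is not Storm SQL and uses no Storm API. The `LogEvent` class, its fields, and the rule that an error event is one with an HTTP status of 400 or above are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Conceptual sketch only: plain Java standing in for the streaming SQL pipeline.
// In-memory lists replace the Kafka source and sink topics; no Storm API is used.
class ErrorLogFilterSketch {

    // Hypothetical, minimal log event; real HTTPD log records carry more fields.
    static final class LogEvent {
        final String remoteIp;
        final String requestUrl;
        final int status;

        LogEvent(String remoteIp, String requestUrl, int status) {
            this.remoteIp = remoteIp;
            this.requestUrl = requestUrl;
            this.status = status;
        }

        @Override
        public String toString() {
            return remoteIp + " " + requestUrl + " " + status;
        }
    }

    // Assumption: an "error" event is one whose HTTP status is 400 or above.
    static boolean isError(LogEvent event) {
        return event.status >= 400;
    }

    // Plays the role of a WHERE clause: keep error events, drop the rest.
    static List<LogEvent> filterErrors(List<LogEvent> source) {
        return source.stream()
                .filter(ErrorLogFilterSketch::isError)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LogEvent> sourceTopic = Arrays.asList(
                new LogEvent("10.0.0.1", "/index.html", 200),
                new LogEvent("10.0.0.2", "/missing", 404),
                new LogEvent("10.0.0.3", "/api/orders", 500));
        List<LogEvent> errorTopic = filterErrors(sourceTopic);
        errorTopic.forEach(System.out::println);
    }
}
```

In the SQL form, the same predicate would sit in the WHERE clause of a query that selects from the source table and inserts into the sink table.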

Improved Apache Kafka Integration
• Enhanced configuration API
• Support for Consumer Groups
• Pluggable Translators (Kafka Record -> Tuple)
• Support for Topic Wildcards
• Support for Multiple Streams, Topics per Stream
• Trident Kafka Support for Kafka 0.10 Onwards
• Integrates with Secure Kafka Environments
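
Two of the items above, topic wildcards and pluggable record-to-tuple translators, can be illustrated without any Kafka or Storm dependency. The sketch below is plain Java under stated assumptions: `Record` is a hypothetical stand-in for a Kafka consumer record, a wildcard is modeled as a `java.util.regex.Pattern`, and a translator is modeled as a `Function` from a record to a list of tuple fields. None of these names come from the Storm Kafka integration itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Conceptual sketch only: no Kafka or Storm classes are used.
class KafkaTranslationSketch {

    // Hypothetical stand-in for a Kafka consumer record.
    static final class Record {
        final String topic;
        final String key;
        final String value;

        Record(String topic, String key, String value) {
            this.topic = topic;
            this.key = key;
            this.value = value;
        }
    }

    // Topic wildcard: select every topic whose full name matches the regex.
    static List<String> matchTopics(Pattern wildcard, List<String> allTopics) {
        return allTopics.stream()
                .filter(topic -> wildcard.matcher(topic).matches())
                .collect(Collectors.toList());
    }

    // Pluggable translators: each one decides which fields the tuple carries.
    static final Function<Record, List<Object>> TOPIC_KEY_VALUE =
            r -> Arrays.<Object>asList(r.topic, r.key, r.value);
    static final Function<Record, List<Object>> VALUE_ONLY =
            r -> Collections.<Object>singletonList(r.value);

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("logs-web", "logs-db", "metrics");
        System.out.println(matchTopics(Pattern.compile("logs-.*"), topics));

        Record record = new Record("logs-web", "host-1", "GET /index.html 200");
        System.out.println(TOPIC_KEY_VALUE.apply(record));
        System.out.println(VALUE_ONLY.apply(record));
    }
}
```

Swapping the translator changes the shape of the emitted tuple without touching the code that consumes records, which is the point of making it pluggable.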

PMML Support (Machine Learning)
• Predictive Model Markup Language
• Describes Models Learned by ML Algorithms
• PMMLPredictorBolt Computes Predicted Scores for Live Tuples According to a PMML Model
• PMML Model Uploaded to or Downloaded from Distributed Cache
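
Scoring live tuples against a previously trained model can be pictured as follows. This is a minimal plain-Java sketch, not the PMML bolt: hand-coded linear regression coefficients stand in for a model that would normally be parsed from a PMML document, and the `LinearModel` class, field names, and coefficient values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: no PMML parsing and no Storm API.
class ModelScoringSketch {

    // Hypothetical linear model: score = intercept + sum(coefficient * field value).
    static final class LinearModel {
        final double intercept;
        final Map<String, Double> coefficients;

        LinearModel(double intercept, Map<String, Double> coefficients) {
            this.intercept = intercept;
            this.coefficients = coefficients;
        }

        // Scores one tuple, represented here as a map from field name to value.
        double score(Map<String, Double> tuple) {
            double result = intercept;
            for (Map.Entry<String, Double> c : coefficients.entrySet()) {
                Double value = tuple.get(c.getKey());
                if (value == null) {
                    throw new IllegalArgumentException("Missing field: " + c.getKey());
                }
                result += c.getValue() * value;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Map<String, Double> coefficients = new LinkedHashMap<>();
        coefficients.put("speed", 2.0);
        coefficients.put("load", 0.5);
        LinearModel model = new LinearModel(1.0, coefficients);

        Map<String, Double> tuple = new LinkedHashMap<>();
        tuple.put("speed", 3.0);
        tuple.put("load", 4.0);
        System.out.println("score = " + model.score(tuple));
    }
}
```

The model is built once and then applied to each incoming tuple, which mirrors the split between loading a model and computing scores per tuple.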

Storm 1.1.0 Improvements
• Flux
– Visualization in Storm UI
• Topology Deployment
– Alternative to Uber Jar
– storm jar --jars /path/to/local/jar --artifacts `resolve Maven dependencies` --artifactRepositories `additional Maven repos`
• Resource Aware Scheduler
– Specify the resource requirements (Memory/CPU) for individual topology components (Spouts/Bolts)

Try Storm 1.1.0
https://hortonworks.com/hadoop-tutorial/processing-trucking-iot-data-with-apache-storm/

Apache Storm 2.0
• Storm Code entirely in Java (no more Clojure)
• Performance Improvements
• Worker/Threading Model Redesign
• Apache Beam Integration
• Bounded Spouts
• Metrics Enhancements
• Worker-Classloader Isolation
• Improved Backpressure
• Dynamic Topology Updates

References
• [1] Taylor Goetz Presentation @ DataWorks/Hadoop Summit, Munich 2017
• [2] http://arxiv.org/pdf/1506.08603v1.pdf
• [3] http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
