Building Streaming Applications with Apache Storm 1.1
Meetup, Hortonworks, April 20th, 2017
Presenters
• Sriharsha Chintalapani, Storm & Kafka Committer, PMC @ Hortonworks
• Karthik Deivasigamani, Walmart Labs
• Roshan Naik, Storm Contributor, Flume Committer @ Hortonworks
• Hugo Louro, Storm Committer, PMC @ Hortonworks
Apache Storm
Apache Storm Brief History
• 2010 - First Streaming Framework - Backtype
• 2011 – Acquired by and Deployed at Twitter
• 2013 - Open Sourced into Apache
• Present – Large Scale Production Deployments
– Yahoo 3500+ Nodes
– Alibaba 1PB of Data per Day
Prior Releases Highlights
• 0.9.x
• Storm becomes an Apache TLP
• First Official Apache Release
• Expanded Kafka, HDFS, HBase Integration
• 0.10.x
• Multi Tenancy
• Rolling Upgrades
• Improved Logging (Log4j2)
• JDBC, Event Hubs, Hive Integration
Prior Releases Highlights
• 1.0
– Pacemaker (Replaces ZooKeeper for Heartbeats)
– Security (Kerberos/Digest Authentication)
– Nimbus HA (Eliminates Single Point of Failure)
– Supervisor Health Checks
– Resource Aware Scheduler
Prior Releases Highlights
• 1.0
– Stateful Bolts
– Automatic Checkpointing/Snapshots
• Asynchronous Barrier Snapshotting (ABS) [2], Chandy-Lamport [3] Algorithms
– Streaming Windows
• Sliding, Tumbling, Watermarks, Out of Order Tuples
– Dynamic Log Levels
– Distributed Log Search
– Worker Profiling
– Solr, Cassandra, Elastic Search, MQTT Integration
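The "Streaming Windows" item above names tumbling and sliding windows. The following is a minimal sketch in plain Python, not Storm's windowing API, of how the two differ when events are grouped by timestamp. The function names, the integer timestamps, and the choice to start windows at time 0 are assumptions made for illustration; the sketch processes a finite list and does not model watermarks or late tuples.

```python
from collections import defaultdict

def tumbling_windows(events, size):
    """Group (timestamp, value) events into non-overlapping windows of `size` units."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // size) * size  # each event falls into exactly one window
        windows[start].append(value)
    return dict(sorted(windows.items()))

def sliding_windows(events, size, slide):
    """Group events into windows of length `size` that start every `slide` units."""
    windows = defaultdict(list)
    for ts, value in events:
        # Earliest start (a multiple of `slide`) whose span [start, start + size) covers ts.
        first = ((ts - size) // slide + 1) * slide
        start = max(first, 0)  # assumption: no window starts before time 0
        while start <= ts:
            windows[start].append(value)  # an event may land in several windows
            start += slide
    return dict(sorted(windows.items()))

events = [(1, "a"), (4, "b"), (6, "c"), (11, "d")]
tumbling = tumbling_windows(events, size=5)
sliding = sliding_windows(events, size=10, slide=5)
```

With these inputs, every event appears once in the tumbling result, while in the sliding result an event can appear in each window whose span covers its timestamp.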
Apache Storm 1.1.0
March 29, 2017
• Streaming SQL
• Improved Apache Kafka Integration
• PMML Support (Machine Learning)
• Druid Integration
• OpenTSDB Integration
Apache Storm 1.1.0
March 29, 2017
• AWS Kinesis Support
• HDFS Spout
• Other Enhancements
– Flux
– Topology Deployment
– Resource Aware Scheduler
Streaming SQL
• Apache Calcite for Query Parsing/Planning
• Define Topology Using SQL Like Query
• SQL Compiled and Transformed into a Trident Topology
• Streaming Onto/From Arbitrary Data Sources
– Kafka, Redis, HDFS, MongoDB
– Extensible by Implementing ISqlTridentDataSource
Streaming SQL
• Tuple Filtering
• Projections
• CSV, TSV, and Avro input/output formats
• User Defined Functions (UDFs)
• Fine-Grained User Control over Parallelism of Generated Components
Streaming SQL - Aggregate UDF
Streaming SQL – Example [1]
• Read Apache HTTPD server logs from Kafka
• Filter out everything but error log events
• Write the error events onto a Kafka topic
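The three steps above (read, filter, write) can be sketched in plain Python to show the logic that the SQL query expresses. This is not Storm SQL and not the code from the original slide: in-memory lists stand in for the Kafka topics, the log pattern assumes the Apache Common Log Format, and "error" is assumed to mean an HTTP status of 400 or higher. All names are hypothetical.

```python
import re

# Assumed pattern for the Apache Common Log Format; real logs may differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record

def error_events(lines, min_status=400):
    """Yield parsed records whose status marks an error (assumed: >= 400)."""
    for line in lines:
        record = parse(line)
        if record is not None and record["status"] >= min_status:
            yield record

# In-memory lists stand in for the input and output Kafka topics.
input_topic = [
    '10.0.0.1 - - [20/Apr/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [20/Apr/2017:10:00:01 +0000] "GET /missing HTTP/1.1" 404 209',
    '10.0.0.3 - - [20/Apr/2017:10:00:02 +0000] "POST /api HTTP/1.1" 500 0',
    'not a log line',
]
output_topic = list(error_events(input_topic))
```

In a SQL formulation the filter would correspond to a WHERE clause and the choice of output fields to the projection list.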
Improved Apache Kafka Integration
• Enhanced configuration API
• Support Consumer Groups
• Pluggable Translators (Kafka Record -> Tuple)
• Support for Topic Wildcards
• Support Multiple Streams, Topics/Stream
• Trident Kafka supporting Kafka 0.10 onwards
• Integrates with Secure Kafka Environments
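Two of the items above, pluggable record-to-tuple translators and topic wildcards, can be illustrated with a small sketch. It is plain Python and deliberately not the Storm Kafka client API: `Record`, the translator functions, and `subscribe` are hypothetical names, and a regular expression is assumed as the wildcard mechanism.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a consumed Kafka record.
Record = namedtuple("Record", "topic partition offset key value")

def default_translator(record):
    """Map a record to a tuple that carries all of its fields."""
    return (record.topic, record.partition, record.offset, record.key, record.value)

def value_only_translator(record):
    """Map a record to a single-field tuple that holds only the value."""
    return (record.value,)

def subscribe(records, topic_pattern, translator):
    """Return translated tuples for records whose topic fully matches the pattern."""
    pattern = re.compile(topic_pattern)
    return [translator(r) for r in records if pattern.fullmatch(r.topic)]

records = [
    Record("logs-web", 0, 42, "k1", "GET /"),
    Record("logs-db", 1, 7, "k2", "SELECT 1"),
    Record("metrics", 0, 3, "k3", "cpu=0.5"),
]
log_values = subscribe(records, r"logs-.*", value_only_translator)
metric_tuples = subscribe(records, r"metrics", default_translator)
```

Because the translator is passed in as a function, the same subscription logic can produce differently shaped tuples without being changed, which is the point of making the translation step pluggable.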
PMML Support (Machine Learning)
• Predictive Model Markup Language
• Describes Model Learned by ML algorithms
• PmmlPredictorBolt Computes Predicted Scores for Live Tuples according to PMML Model
• PMML Model Uploaded or Downloaded from Distributed Cache
Storm 1.1.0 Improvements
• Flux
– Visualization in Storm UI
• Specify the resource requirements (Memory/CPU) for individual topology components (Spouts/Bolts)
• Topology Deployment
– Alternative to Uber Jar
– storm jar --jars /path/to/local/jar --artifacts `resolve Maven dependencies` --artifactRepositories `additional Maven repos`
• Specify the resource requirements (Memory/CPU) for individual topology components (Spouts/Bolts)
Try Storm 1.1.0
https://hortonworks.com/hadoop-tutorial/processing-trucking-iot-data-with-apache-storm/
Apache Storm 2.0
• Storm Code entirely in Java (no more Clojure)
• Performance Improvements
• Worker/Threading Model Redesign
• Apache Beam Integration
• Bounded Spouts
• Metrics Enhancements
• Worker-Classloader Isolation
• Improved Backpressure
• Dynamic Topology Updates
References
• [1] Taylor Goetz Presentation @ DataWorks/Hadoop Summit, Munich 2017
• [2] http://arxiv.org/pdf/1506.08603v1.pdf
• [3] http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
