APACHE STORM
Presenter: Kyle Lin
Date: Dec 2014
Agenda
- Introduction
- Architecture
- Topology
- Related packages
- Reading from and writing to multiple data stores
- Demo
Profile
- History
  - Initial release in 2011
  - Open sourced by Twitter after it acquired BackType
  - An Apache Incubator project since September 2013
  - Became an Apache Top-Level Project in September 2014
- Development
  - BackType, Twitter
  - On 5 July 2011, BackType was acquired by Twitter.
Ref: http://en.wikipedia.org/wiki/Storm_(event_processor)
http://en.wikipedia.org/wiki/BackType
Architecture
Ref: http://doc.mapr.com/display/MapR/Storm
Architecture
Hadoop           Storm
JobTracker       Nimbus
TaskTracker      Supervisor
Child            Worker
Job              Topology
Mapper/Reducer   Spout/Bolt
Ref: http://www.biaodianfu.com/storm.html
Architecture
- Topology
  - Analogous to a MapReduce job, with one key difference: a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).
  - A topology is a graph of stream transformations where each node is a spout or a bolt.
Ref: http://storm.apache.org/documentation/Tutorial.html
Architecture - Topology
Ref: https://blog.safaribooksonline.com/2013/06/11/your-guide-to-storm/
Features
- Scalable
- Fault-tolerant
  - When workers die, Storm automatically restarts them. If a node dies, its workers are restarted on another node.
  - Nimbus and the Supervisors are designed to be stateless (all state is kept in ZooKeeper or on disk) and fail-fast (a daemon exits as soon as it hits an unexpected situation).
Ref: https://storm.apache.org/about/fault-tolerant.html
https://storm.apache.org/documentation/Fault-tolerance.html
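The restart behaviour described above can be illustrated with a small plain-Java sketch. It is conceptual only: `RestartingSupervisor` and `supervise` are hypothetical names, a "worker" here is just a thread rather than a JVM process, and Storm's real supervisor detects dead workers through heartbeats rather than `Thread.join`. The worker is fail-fast: it simply throws on an unexpected condition, and the supervisor starts a fresh one.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, conceptual sketch: not Storm's Supervisor implementation.
final class RestartingSupervisor {
    /**
     * Runs the worker on a new thread and restarts it each time it dies with an
     * uncaught exception. Returns how many restarts were needed before the worker
     * finished normally; gives up after maxRestarts restarts.
     */
    static int supervise(Runnable worker, int maxRestarts) {
        int restarts = 0;
        while (true) {
            AtomicBoolean crashed = new AtomicBoolean(false);
            Thread thread = new Thread(worker);
            // Record the crash instead of printing the default stack trace.
            thread.setUncaughtExceptionHandler((t, e) -> crashed.set(true));
            thread.start();
            try {
                thread.join(); // wait until the worker finishes or dies
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("supervisor interrupted", e);
            }
            if (!crashed.get()) {
                return restarts; // worker completed without failing
            }
            if (restarts == maxRestarts) {
                throw new IllegalStateException("worker failed " + (restarts + 1) + " times");
            }
            restarts++; // fail-fast worker died: start a fresh one
        }
    }
}
```

Because failures are detected from the outside and the worker is simply replaced, the worker code needs no recovery logic of its own, which is the point of a fail-fast design.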
Features
- Real-time processing
  - “Storm is more for real-time computation (e.g. streaming analytics) where you analyze data in flight and don't necessarily land it anywhere.”
- Included in HDP
  - Shipped with Hortonworks Data Platform (HDP) 2.1
Ref: https://groups.google.com/forum/#!topic/storm-user/sjGoDf2FMCs
Topology
- Topology
  - A DAG of spouts and bolts
- Spout
  - A source of streams
  - “Typically a spout reads from a queuing broker such as Kestrel, RabbitMQ, or Kafka, but a spout can also generate its own stream or read from somewhere like the Twitter streaming API.”
- Bolt
  - Implements the core functions of a streaming computation: it consumes input streams, processes them (functions, filters, aggregations, joins, database access), and possibly emits new streams.
- Metrics
  - Report summary statistics across the full topology.
Ref: https://storm.apache.org/about/simple-api.html
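To make the spout and bolt roles concrete, here is a plain-Java sketch of a two-node topology. The `Spout` and `Bolt` interfaces and all class names are hypothetical and far simpler than Storm's own `ISpout`/`IBolt` interfaces (no acking, no output-field declarations, no parallelism). The spout generates its own stream from a fixed list of sentences, which the quote above allows, and the bolt splits each sentence into word tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical interfaces, much simpler than Storm's ISpout/IBolt:
// a tuple is just a List<Object>, and "emit" hands it downstream.
interface Spout {
    void nextTuple(Consumer<List<Object>> emit);
}

interface Bolt {
    void execute(List<Object> tuple, Consumer<List<Object>> emit);
}

/** Spout that generates its own stream by cycling through fixed sentences. */
final class CyclingSentenceSpout implements Spout {
    private final String[] sentences;
    private int next = 0;

    CyclingSentenceSpout(String... sentences) {
        this.sentences = sentences;
    }

    @Override
    public void nextTuple(Consumer<List<Object>> emit) {
        emit.accept(List.of(sentences[next]));
        next = (next + 1) % sentences.length;
    }
}

/** Bolt that turns each one-field sentence tuple into one tuple per word. */
final class SplitSentenceBolt implements Bolt {
    @Override
    public void execute(List<Object> tuple, Consumer<List<Object>> emit) {
        for (String word : ((String) tuple.get(0)).split("\\s+")) {
            emit.accept(List.of(word));
        }
    }
}

/** Wires spout -> bolt in a single thread and collects what the bolt emits. */
final class MiniTopology {
    static List<List<Object>> run(Spout spout, Bolt bolt, int spoutCalls) {
        List<List<Object>> emitted = new ArrayList<>();
        for (int i = 0; i < spoutCalls; i++) {
            spout.nextTuple(tuple -> bolt.execute(tuple, emitted::add));
        }
        return emitted;
    }
}
```

In a real topology the spout's `nextTuple` is called repeatedly by Storm and tuples cross thread and process boundaries; the single loop here only shows the data flow.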
Topology
Ref: http://jansipke.nl/storm-in-pictures/
Topology
Ref: https://blog.safaribooksonline.com/2013/06/11/your-guide-to-storm/
Stream grouping
- A stream grouping tells Storm how to send tuples between sets of tasks.
Ref: https://storm.apache.org/documentation/Tutorial.html
Stream grouping
- Shuffle grouping
  - Tuples are randomly distributed such that each of the bolt's tasks is guaranteed to receive an equal number of tuples.
- Fields grouping
  - Tuples are partitioned by the fields specified in the grouping: tuples with the same field values always go to the same task.
- All grouping
  - The stream is replicated to all of the bolt's tasks.
- Global grouping
  - The entire stream goes to a single one of the bolt's tasks (the one with the lowest task id).
- Direct grouping
  - The producer of a tuple decides which task of the consumer receives it.
Ref: https://blog.safaribooksonline.com/2013/06/11/your-guide-to-storm/
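The following plain-Java sketch models how each grouping picks target task(s) for one tuple, assuming a consuming bolt with `numTasks` tasks numbered from 0. It is a simplified model, not Storm's code: the `Groupings` class is hypothetical, and Storm's fields grouping uses its own hash of the grouping values, but hashing the values and taking the result modulo the number of tasks conveys the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical, simplified model of stream groupings; not Storm's implementation.
final class Groupings {
    private final int numTasks;
    private final Random random;
    private final List<Integer> order = new ArrayList<>();
    private int cursor = 0;

    Groupings(int numTasks, long seed) {
        this.numTasks = numTasks;
        this.random = new Random(seed);
        for (int task = 0; task < numTasks; task++) order.add(task);
    }

    /** Shuffle: visit tasks in a random order, reshuffling after every full
     *  pass, so each task gets exactly one tuple per pass. */
    int shuffle() {
        if (cursor == 0) Collections.shuffle(order, random);
        int task = order.get(cursor);
        cursor = (cursor + 1) % numTasks;
        return task;
    }

    /** Fields: hash the grouping values, so equal values always pick the same task. */
    int fields(Object... groupingValues) {
        return Math.floorMod(Objects.hash(groupingValues), numTasks);
    }

    /** All: every task receives a copy of the tuple. */
    List<Integer> all() {
        List<Integer> tasks = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) tasks.add(task);
        return tasks;
    }

    /** Global: the whole stream goes to the task with the lowest id. */
    int global() {
        return 0;
    }

    /** Direct: the producer names the target task itself; we only validate it. */
    int direct(int chosenTask) {
        if (chosenTask < 0 || chosenTask >= numTasks) {
            throw new IllegalArgumentException("no such task: " + chosenTask);
        }
        return chosenTask;
    }
}
```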
Stream grouping
Ref: https://blog.safaribooksonline.com/2013/06/11/your-guide-to-storm/
Topology
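import org.apache.storm.Config; // Storm < 1.0 used the backtype.storm.* packages
import org.apache.storm.StormSubmitter;
import org.apache.storm.metric.LoggingMetricsConsumer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.elasticsearch.storm.EsBolt; // from elasticsearch-hadoop
// SentenceSpout, SplitSentence and WordCount are user-defined components
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new SentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
// same word -> same WordCount task ("word" assumed: the slide cut the field name off)
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5).shuffleGrouping("count"); // ES resource storm/docs
Config conf = new Config();
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 2); // metrics consumer, parallelism 2
conf.setNumWorkers(3); // 3 worker processes
StormSubmitter.submitTopology(args[0], conf, builder.createTopology()); // args[0] = topology name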
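Why does the `count` bolt use a fields grouping on `word` instead of a shuffle grouping? Assuming `WordCount` keeps a per-task in-memory map of counts (as the storm-starter example does), all occurrences of a word must reach the same task. The plain-Java simulation below is not Storm code: `CountTasks` is a hypothetical name, round-robin stands in for shuffle grouping, and `String.hashCode` stands in for Storm's own field hash. For example, routing 24 copies of "storm" to 12 tasks round-robin leaves a partial count of 2 in every task, whereas hashing the word puts all 24 in one task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of 'numTasks' WordCount tasks, each with its own map.
final class CountTasks {
    /** Routes each word to a task, round-robin or by word hash, and counts it there. */
    static List<Map<String, Integer>> route(List<String> words, int numTasks, boolean byField) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) tasks.add(new HashMap<>());
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            int task = byField
                    ? Math.floorMod(word.hashCode(), numTasks) // like fieldsGrouping on "word"
                    : i % numTasks;                            // stand-in for shuffleGrouping
            tasks.get(task).merge(word, 1, Integer::sum);
        }
        return tasks;
    }

    /** Number of tasks holding a (partial) count for the given word. */
    static long tasksHolding(List<Map<String, Integer>> tasks, String word) {
        return tasks.stream().filter(m -> m.containsKey(word)).count();
    }
}
```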
Trident on Storm
- Trident
  - “A high-level abstraction for doing realtime computing on top of Storm.”
  - “An abstraction on storm, just like pig over hadoop, which provides us with various useful functions like aggregation, filter etc.”
Ref:
https://storm.apache.org/documentation/Trident-tutorial.html
http://realtime-cachedmind.tumblr.com/post/89974796387/real-time-processing-storm-tride
Trident
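import org.apache.storm.trident.TridentState; // Storm < 1.0: storm.trident.*, backtype.storm.tuple.Fields
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;
// `spout` and the Split function are defined elsewhere (see the Trident tutorial)
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        // "count" assumed: the output field name was cut off on the slide
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(6);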
Ref: https://storm.apache.org/documentation/Trident-tutorial.html
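In plain terms, the Trident topology above splits each sentence into words, groups by word, and adds each batch's counts to persistent state. The plain-Java sketch below mimics that flow one batch at a time. `BatchWordCount` is hypothetical, the `HashMap` merely stands in for `MemoryMapState`, and nothing here uses Trident's API; splitting on a single space mirrors the `Split` function in the Trident tutorial.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Trident word count's semantics; not Trident's API.
final class BatchWordCount {
    private final Map<String, Long> state = new HashMap<>(); // stand-in for MemoryMapState

    /** Processes one batch: split (each), group by word + Count, then update state. */
    void processBatch(List<String> sentences) {
        Map<String, Long> batchCounts = new HashMap<>(); // partial counts for this batch only
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {     // like .each(..., new Split(), ...)
                batchCounts.merge(word, 1L, Long::sum);   // like .groupBy(word) + Count
            }
        }
        // like .persistentAggregate: fold the batch result into the stored counts
        batchCounts.forEach((word, count) -> state.merge(word, count, Long::sum));
    }

    long count(String word) {
        return state.getOrDefault(word, 0L);
    }
}
```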
Packages
- $STORM_HOME/contrib
  - storm-cassandra, storm-hbase, storm-hdfs, storm-hive, storm-jms, storm-jmxetric, storm-kafka, storm-kafka-example, storm-starter
- Access Elasticsearch
  - elasticsearch-hadoop
- Access HDFS
  - hadoop-hdfs
  - storm-hdfs
Demo
- Start a topology
  - Write random sentences into Elasticsearch (ES)
- Check the topology's status
  - From the Storm web UI
Steps
- Done
  - Testing (arranging Maven dependencies, checking that the packages work)
  - Write to/read from ES
  - Write to/read from HDFS
- Next
  - Integrate Kafka with Storm
    - Kafka: a distributed messaging system
Q and A
Reference
- Storm concepts
  - https://storm.apache.org/documentation/Concepts.html
