Slide updated for
                                   STORM 0.8.2




        STORM
    COMPARISON – INTRODUCTION - CONCEPTS




PRESENTATION BY KASPER MADSEN
NOVEMBER - 2012
HADOOP                              VS             STORM
     Batch processing                            Real-time processing
     Jobs run to completion                    Topologies run forever
     JobTracker is SPOF*                      No single point of failure
     Stateful nodes                                    Stateless nodes


     Scalable                                                 Scalable
     Guarantees no data loss                  Guarantees no data loss
     Open source                                          Open source




* Hadoop 0.21 added some checkpointing
 SPOF: Single Point Of Failure
COMPONENTS
     Nimbus daemon is the master; it is comparable to the Hadoop JobTracker
     Supervisor daemon spawns workers; it is comparable to the Hadoop TaskTracker
     Worker is spawned by a supervisor, one per port defined in the storm.yaml configuration
     Executor is spawned by a worker and runs as a thread
     Task is run by an executor, on that executor's thread
     Zookeeper* is a distributed coordination service, used to store metadata. The Nimbus and
     Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper.


         Note that all communication between Nimbus and the
            Supervisors is done through Zookeeper

      On a cluster with 2k+1 Zookeeper nodes, the system
          can recover as long as at most k nodes fail.
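
The 2k+1 rule follows from majority voting: the ensemble keeps working while more than half of its servers are up. A minimal sketch of that arithmetic is shown here; the class and method names are hypothetical helpers for illustration, not part of the ZooKeeper or Storm APIs.

```java
/** Quorum arithmetic for a ZooKeeper ensemble (illustrative helper, not a ZooKeeper API). */
final class QuorumMath {

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Largest number of servers that can fail while a majority is still alive. */
    static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    public static void main(String[] args) {
        // Print the quorum size and failure tolerance for small ensembles.
        for (int n = 1; n <= 7; n++) {
            System.out.printf("ensemble=%d quorum=%d toleratedFailures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

Under this arithmetic an even-sized ensemble tolerates no more failures than the next smaller odd-sized one, which is why odd sizes (2k+1) are the usual choice.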




* Zookeeper is an Apache top-level project
EXECUTORS
Executor is a new abstraction
    •   Decouples the number of tasks of a component from the number of threads
    •   Allows changing #executors dynamically, without changing #tasks
    •   Makes elasticity much simpler, as semantics are kept valid (e.g. for a grouping)
    •   Enables elasticity in a multi-core environment
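
To illustrate why a fixed task count keeps grouping semantics valid while the executor count changes, here is a simplified model that spreads a fixed set of task ids over a variable number of executors. It is a sketch under assumptions: the class name and the even, contiguous split are illustrative and are not taken from Storm's scheduler code.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model: a fixed set of tasks spread over a variable number of executor threads. */
final class ExecutorAssignment {

    /** Splits task ids 0..numTasks-1 into numExecutors contiguous groups of near-equal size. */
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= numExecutors <= numTasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;   // every executor gets at least this many tasks
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int nextTask = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // The same 8 tasks, first on 2 threads and then on 4 threads.
        System.out.println(assign(8, 2));
        System.out.println(assign(8, 4));
    }
}
```

In both calls the task ids stay 0 to 7, so anything that routes tuples by task id (for example a grouping) is unaffected; only the number of threads serving those tasks changes.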
STREAMS
A stream is an unbounded sequence of tuples.
A topology is a graph where each node is a spout or a bolt, and the edges indicate
which bolts subscribe to which streams.
•   A spout is a source of a stream
•   A bolt consumes a stream (and possibly emits a new one)
•   An edge represents a grouping

[Figure: example topology. Two spouts are the sources of streams A and B. One bolt subscribes to A and emits C, a second bolt subscribes to A and emits D, a third bolt subscribes to A & B, and a fourth bolt subscribes to C & D.]
GROUPINGS
Each spout or bolt runs X instances in parallel (called tasks).
Groupings decide which task of the subscribing bolt a tuple is sent to.
Shuffle grouping     sends each tuple to a randomly chosen task
Fields grouping      groups by field value, so that equal values go to the same task
All grouping         replicates each tuple to all tasks
Global grouping      makes all tuples go to one task
None grouping        expresses no preference; currently equivalent to shuffle grouping (the intent is to eventually run the bolt in the same thread as the bolt/spout it subscribes to)
Direct grouping      the producer (the task that emits) controls which consumer task receives the tuple
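
The routing decisions above can be sketched as a toy model. This is not Storm's implementation: the class, the method names, and the hash-modulo rule for fields grouping are assumptions chosen to show the idea, with task ids numbered 0 to numTasks-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy model of how groupings pick target tasks; task ids are 0..numTasks-1. */
final class GroupingSketch {

    /** Shuffle grouping: pick a task at random. */
    static int shuffle(int numTasks, Random random) {
        return random.nextInt(numTasks);
    }

    /** Fields grouping: equal field values always map to the same task. */
    static int fields(Object fieldValue, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    /** All grouping: every task receives a copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }

    /** Global grouping: every tuple goes to one task (in this model, the lowest id). */
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        System.out.println("shuffle -> task " + shuffle(numTasks, new Random()));
        System.out.println("fields(\"dog\") -> task " + fields("dog", numTasks));
        System.out.println("all -> tasks " + all(numTasks));
        System.out.println("global -> task " + global());
    }
}
```

Because the fields rule depends on the number of tasks, changing the task count would remap values to different tasks, which is one reason to keep #tasks fixed and vary #executors instead.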
[Figure: a topology whose components run with different task counts (2, 4, 3, and 2 tasks)]

TestWordSpout          ExclamationBolt     ExclamationBolt

    EXAMPLE
     TopologyBuilder builder = new TopologyBuilder();

     // Create a spout with id "words"; its stream is referred to as "words".
     // Run 10 tasks (parallelism hint 10).
     builder.setSpout("words", new TestWordSpout(), 10);

     // Create a bolt with id "exclaim1"; its stream is referred to as "exclaim1".
     // Run 3 tasks and subscribe to stream "words" using shuffle grouping.
     builder.setBolt("exclaim1", new ExclamationBolt(), 3)
                 .shuffleGrouping("words");

     // Create a bolt with id "exclaim2"; its stream is referred to as "exclaim2".
     // Run 2 tasks and subscribe to stream "exclaim1" using shuffle grouping.
     builder.setBolt("exclaim2", new ExclamationBolt(), 2)
                 .shuffleGrouping("exclaim1");



        A bolt can subscribe to an unlimited number of
                streams, by chaining groupings.



The source code for this example is part of the storm-starter project on GitHub

EXAMPLE – 1
TestWordSpout
public void nextTuple() {
     Utils.sleep(100);
     final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
     final Random rand = new Random();
     final String word = words[rand.nextInt(words.length)];
     _collector.emit(new Values(word));
}



The TestWordSpout emits a random string from the
       array words every 100 milliseconds

EXAMPLE – 2
ExclamationBolt

OutputCollector _collector;

// prepare is called when the bolt is created
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
     _collector = collector;
}

// execute is called for each tuple
public void execute(Tuple tuple) {
     _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
     _collector.ack(tuple);
}

// declareOutputFields is called when the bolt is created
public void declareOutputFields(OutputFieldsDeclarer declarer) {
     declarer.declare(new Fields("word"));
}


declareOutputFields is used to declare streams and their schemas. It
 is possible to declare several streams and specify the stream to use
           when outputting tuples in the emit function call.
TRIDENT TOPOLOGY
  Trident topology is a new abstraction built on top of STORM primitives
  •   Supports
       • Joins
       • Aggregations
       • Grouping
       • Functions
       • Filters
  •   Easy to use, read the wiki
  •   Guarantees exactly-once processing when using an (opaque) transactional spout
        • Some basic ideas are the same as in the deprecated transactional topology*
        • Tuples are processed as small batches
        • Each batch gets a transaction id (txid); if a batch is replayed, it gets the same txid
        • State updates are strongly ordered among batches
        • State updates atomically store metadata with the data
  •   Transactional topologies are superseded by Trident topologies as of 0.8.0


* See my first slide deck (March 2012) on STORM for detailed information. www.slideshare.com/KasperMadsen
EXACTLY-ONCE-PROCESSING - 1
Transactional spouts guarantee that the same data is replayed for every batch
Guaranteeing exactly-once processing for transactional spouts
    • The txid is stored with the data, so the last txid that updated the data is known
    • This information is used to decide what to update in case of a replay
Example
     1. Currently processing txid 2, with data ["man", "dog", "dog"]
     2. Current state is:
            "man" => [count=3, txid=1]
            "dog" => [count=2, txid=2]
     3. The batch with txid 2 fails and gets replayed.
     4. Resulting state is:
            "man" => [count=4, txid=2]
            "dog" => [count=2, txid=2]
     5. Because the txid is stored with the data, it is known that the count for "dog" should
        not be increased again.
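
The replay check in the example above can be sketched as follows. This is a minimal in-memory illustration, not Trident's state API: the class `TransactionalCounts` and its methods are hypothetical, and the sketch assumes that real txids are never negative so that -1 can mark "never updated".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory word counts that store, per word, the txid of the last batch that updated it. */
final class TransactionalCounts {

    private static final class Entry {
        long count;
        long txid = -1; // assumption: real txids are never negative
    }

    private final Map<String, Entry> state = new HashMap<>();

    /** Seeds the state, e.g. to reproduce step 2 of the example. */
    void put(String word, long count, long txid) {
        Entry entry = new Entry();
        entry.count = count;
        entry.txid = txid;
        state.put(word, entry);
    }

    /** Applies a batch; words already updated by this txid are skipped on replay. */
    void applyBatch(long txid, List<String> words) {
        // First total the increments per word within the batch.
        Map<String, Long> increments = new HashMap<>();
        for (String word : words) {
            increments.merge(word, 1L, Long::sum);
        }
        for (Map.Entry<String, Long> increment : increments.entrySet()) {
            Entry entry = state.computeIfAbsent(increment.getKey(), k -> new Entry());
            if (entry.txid == txid) {
                continue; // this batch was already applied to this word
            }
            entry.count += increment.getValue();
            entry.txid = txid;
        }
    }

    long count(String word) { return state.get(word).count; }

    long txid(String word) { return state.get(word).txid; }

    public static void main(String[] args) {
        TransactionalCounts counts = new TransactionalCounts();
        counts.put("man", 3, 1);
        counts.put("dog", 2, 2);
        counts.applyBatch(2, Arrays.asList("man", "dog", "dog")); // replay of txid 2
        System.out.println("man=" + counts.count("man") + " dog=" + counts.count("dog"));
    }
}
```

The check is only safe because a transactional spout replays exactly the same batch for a given txid; a real store would also have to write the count and the txid atomically.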
EXACTLY-ONCE-PROCESSING - 2
An opaque transactional spout is not guaranteed to replay the same data for a failed
batch as originally existed in the batch.
    • Guarantees that every tuple is successfully processed in exactly one batch
    • Useful for getting exactly-once processing while allowing some inputs to fail
Guaranteeing exactly-once processing for opaque transactional spouts
    • The same trick doesn't work, as a replayed batch might have changed, meaning
      some state might now hold incorrect data. Consider the previous example!
    • The problem is solved by storing more metadata with the data (the previous value)
Example
Step   Data            Count (dog, cat)   prevValue (dog, cat)   Txid (dog, cat)   Note
1      2 dog, 1 cat    2, 1               0, 0                   1, 1
2      1 dog, 2 cat    3, 1               2, 1                   2, 1              Updates dog count, then fails
2.1    2 dog, 2 cat    4, 3               2, 1                   2, 2              Batch contains new data, but updates are OK as previous values are used

Consider the potential problems if the new data for 2.1 doesn't contain any cat.
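
The previous-value trick can be sketched as a per-key update rule. This is an illustration under assumptions, not Trident's opaque state implementation: the class `OpaqueCounts` and its methods are hypothetical, updates are applied one key at a time so a partial failure can be simulated, and -1 is assumed never to be a real txid.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory counts that keep the current value, the previous value, and the last txid per key. */
final class OpaqueCounts {

    private static final class Entry {
        long value;
        long prevValue;
        long txid = -1; // assumption: real txids are never negative
    }

    private final Map<String, Entry> state = new HashMap<>();

    /** Adds delta to a key as part of batch txid, tolerating replays with different data. */
    void update(String key, long txid, long delta) {
        Entry entry = state.computeIfAbsent(key, k -> new Entry());
        if (entry.txid == txid) {
            // Replay: discard the earlier attempt and recompute from the previous value.
            entry.value = entry.prevValue + delta;
        } else {
            // First attempt for this txid: remember the old value before changing it.
            entry.prevValue = entry.value;
            entry.value += delta;
            entry.txid = txid;
        }
    }

    long value(String key) { return state.get(key).value; }

    long prevValue(String key) { return state.get(key).prevValue; }

    long txid(String key) { return state.get(key).txid; }

    public static void main(String[] args) {
        OpaqueCounts counts = new OpaqueCounts();
        counts.update("dog", 1, 2);   // step 1: 2 dog, 1 cat
        counts.update("cat", 1, 1);
        counts.update("dog", 2, 1);   // step 2: dog is updated, then the batch fails
        counts.update("dog", 2, 2);   // step 2.1: replay with new data, 2 dog, 2 cat
        counts.update("cat", 2, 2);
        System.out.println("dog=" + counts.value("dog") + " cat=" + counts.value("cat"));
    }
}
```

The sketch only corrects keys that appear in the replayed batch; a key updated by the failed attempt but absent from the replay keeps its stale value, which is the situation the note about cat asks the reader to consider.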
ELASTICITY
• Rebalancing workers and executors (not tasks)
   • Pause spouts
   • Wait for message timeout
   • Set new assignment
   • All moved tasks will be killed and restarted in new location
• Swapping (STORM 0.8.2)
    •   Submit inactive new topology
    •   Pause spouts of old topology
    •   Wait for message timeout of old topology
    •   Activate new topology
    •   Deactivate old topology
    •   Kill old topology

What about state on tasks that are killed and restarted?
It is up to the user to solve!
LEARN MORE
Website (http://storm-project.net/)
Wiki (https://github.com/nathanmarz/storm/wiki)
Storm-starter (https://github.com/nathanmarz/storm-starter)
Mailing list (http://groups.google.com/group/storm-user)
#storm-user room on freenode
UTSL: https://github.com/nathanmarz/storm
More slides: www.slideshare.net/KasperMadsen




                                             from: http://www.cupofjoe.tv/2010/11/learn-lesson.html

More Related Content

What's hot

Intro to Reactive Thinking and RxJava 2
Intro to Reactive Thinking and RxJava 2Intro to Reactive Thinking and RxJava 2
Intro to Reactive Thinking and RxJava 2
JollyRogers5
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
P. Taylor Goetz
 
Storm
StormStorm
Storm
nathanmarz
 
streamparse and pystorm: simple reliable parallel processing with storm
streamparse and pystorm: simple reliable parallel processing with stormstreamparse and pystorm: simple reliable parallel processing with storm
streamparse and pystorm: simple reliable parallel processing with storm
Daniel Blanchard
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
the100rabh
 
Real-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and KafkaReal-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and Kafka
Andrew Montalenti
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter Storm
Uwe Printz
 
Automate Your Application on the Cloud
Automate Your Application on the CloudAutomate Your Application on the Cloud
Automate Your Application on the Cloud
tamirko
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
NAVER D2
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Do more than one thing at the same time, the Python way
Do more than one thing at the same time, the Python wayDo more than one thing at the same time, the Python way
Do more than one thing at the same time, the Python wayJaime Buelta
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentation
Gabriel Eisbruch
 
JUnit5 and TestContainers
JUnit5 and TestContainersJUnit5 and TestContainers
JUnit5 and TestContainers
Sunghyouk Bae
 
Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)
Alexey Fyodorov
 
DevoxxPL: JRebel Under The Covers
DevoxxPL: JRebel Under The CoversDevoxxPL: JRebel Under The Covers
DevoxxPL: JRebel Under The Covers
Simon Maple
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
DECK36
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
Md. Shamsur Rahim
 
Is your profiler speaking the same language as you? -- Docklands JUG
Is your profiler speaking the same language as you? -- Docklands JUGIs your profiler speaking the same language as you? -- Docklands JUG
Is your profiler speaking the same language as you? -- Docklands JUG
Simon Maple
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
Robert Evans
 

What's hot (20)

Intro to Reactive Thinking and RxJava 2
Intro to Reactive Thinking and RxJava 2Intro to Reactive Thinking and RxJava 2
Intro to Reactive Thinking and RxJava 2
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Storm
StormStorm
Storm
 
streamparse and pystorm: simple reliable parallel processing with storm
streamparse and pystorm: simple reliable parallel processing with stormstreamparse and pystorm: simple reliable parallel processing with storm
streamparse and pystorm: simple reliable parallel processing with storm
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
Real-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and KafkaReal-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and Kafka
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter Storm
 
Automate Your Application on the Cloud
Automate Your Application on the CloudAutomate Your Application on the Cloud
Automate Your Application on the Cloud
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
Do more than one thing at the same time, the Python way
Do more than one thing at the same time, the Python wayDo more than one thing at the same time, the Python way
Do more than one thing at the same time, the Python way
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentation
 
JUnit5 and TestContainers
JUnit5 and TestContainersJUnit5 and TestContainers
JUnit5 and TestContainers
 
Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)
 
DevoxxPL: JRebel Under The Covers
DevoxxPL: JRebel Under The CoversDevoxxPL: JRebel Under The Covers
DevoxxPL: JRebel Under The Covers
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Is your profiler speaking the same language as you? -- Docklands JUG
Is your profiler speaking the same language as you? -- Docklands JUGIs your profiler speaking the same language as you? -- Docklands JUG
Is your profiler speaking the same language as you? -- Docklands JUG
 
속도체크
속도체크속도체크
속도체크
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 

Similar to Storm 0.8.2

STORM
STORMSTORM
Storm
StormStorm
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
Sonal Raj
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Davorin Vukelic
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
Tiziano De Matteis
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
Lester Martin
 
BWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation systemBWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation system
Andrii Gakhov
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
DataWorks Summit/Hadoop Summit
 
storm-170531123446.dotx.pptx
storm-170531123446.dotx.pptxstorm-170531123446.dotx.pptx
storm-170531123446.dotx.pptx
IbrahimBenhadhria
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Storm
justinjleet
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with StormMariusz Gil
 
Storm begins
Storm beginsStorm begins
Storm begins
SungMin OH
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
DataWorks Summit/Hadoop Summit
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
P. Taylor Goetz
 
Java 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from OredevJava 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from Oredev
Mattias Karlsson
 
storm-170531123446.pptx
storm-170531123446.pptxstorm-170531123446.pptx
storm-170531123446.pptx
IbrahimBenhadhria
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
Michael Noll
 
The Back Propagation Learning Algorithm
The Back Propagation Learning AlgorithmThe Back Propagation Learning Algorithm
The Back Propagation Learning AlgorithmESCOM
 

Similar to Storm 0.8.2 (20)

STORM
STORMSTORM
STORM
 
Storm
StormStorm
Storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 
BWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation systemBWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation system
 
Storm
StormStorm
Storm
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
storm-170531123446.dotx.pptx
storm-170531123446.dotx.pptxstorm-170531123446.dotx.pptx
storm-170531123446.dotx.pptx
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Storm
 
Storm
StormStorm
Storm
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with Storm
 
Storm begins
Storm beginsStorm begins
Storm begins
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Java 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from OredevJava 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from Oredev
 
storm-170531123446.pptx
storm-170531123446.pptxstorm-170531123446.pptx
storm-170531123446.pptx
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
The Back Propagation Learning Algorithm
The Back Propagation Learning AlgorithmThe Back Propagation Learning Algorithm
The Back Propagation Learning Algorithm
 

Recently uploaded

Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 

Recently uploaded (20)

Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Storm 0.8.2

  • 1. STORM: COMPARISON – INTRODUCTION – CONCEPTS
       Presentation by Kasper Madsen, November 2012
       Slides updated for Storm 0.8.2
  • 2. HADOOP VS STORM
       HADOOP                     | STORM
       Batch processing           | Real-time processing
       Jobs run to completion     | Topologies run forever
       JobTracker is a SPOF*      | No single point of failure
       Stateful nodes             | Stateless nodes
       Scalable                   | Scalable
       Guarantees no data loss    | Guarantees no data loss
       Open source                | Open source
       * Hadoop 0.21 added some checkpointing. SPOF: Single Point Of Failure.
  • 3. COMPONENTS
         • Nimbus daemon is the master. It is comparable to the Hadoop JobTracker.
         • Supervisor daemon spawns workers. It is comparable to the Hadoop TaskTracker.
         • Worker is a process spawned by a supervisor, one per port defined in the storm.yaml configuration.
         • Executor is spawned by a worker and runs as a thread.
         • Task is run by an executor, inside the executor's thread.
         • Zookeeper* is a distributed system used to store metadata. The Nimbus and Supervisor daemons are fail-fast and stateless. All state is kept in Zookeeper.
       Notice that all communication between Nimbus and the Supervisors is done through Zookeeper.
       On a cluster with 2k+1 Zookeeper nodes, the system can recover when at most k nodes fail.
       * Zookeeper is an Apache top-level project.
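The 2k+1 rule on the slide above follows from majority quorums: the ensemble stays available as long as more than half of its nodes can still talk to each other. A minimal sketch of that arithmetic follows; `QuorumMath` is a hypothetical helper written for illustration and is not part of Zookeeper or Storm.

```java
// Hypothetical helper for illustration only; not a Zookeeper or Storm API.
final class QuorumMath {
    // Smallest number of nodes that forms a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of nodes that may fail while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.printf("%d nodes: quorum %d, tolerates %d failure(s)%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

With 5 nodes (k = 2) the quorum is 3, so two failures are tolerated. A sixth node raises the quorum to 4 and still tolerates only two failures, which is why odd ensemble sizes are the usual choice.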
  • 4. EXECUTORS
       Executor is a new abstraction
         • Decouples the number of tasks of a component from the number of threads
         • Allows changing #executors dynamically, without changing #tasks
         • Makes elasticity much simpler, as semantics are kept valid (e.g. for a grouping)
         • Enables elasticity in a multi-core environment
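To make the task/executor split above concrete, here is a small stand-alone simulation. It assumes tasks are handed to executors in contiguous ranges; `ExecutorLayout` and `assign` are hypothetical names, and the real scheduler may place tasks differently. The point is only that the set of task ids stays the same when the number of executors changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation for illustration only; not Storm's scheduler.
final class ExecutorLayout {
    // Spread task ids 0..numTasks-1 over numExecutors threads in contiguous ranges.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= numExecutors <= numTasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;   // every executor gets at least this many tasks
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int nextTask = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2)); // [[0, 1], [2, 3]]
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]
    }
}
```

Because a grouping addresses tasks rather than threads, going from 2 to 4 executors in this sketch changes where a task runs but not which task a tuple is routed to.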
  • 5. STREAMS
       A stream is an unbounded sequence of tuples.
       A topology is a graph where each node is a spout or a bolt, and the edges indicate which bolts subscribe to which streams.
         • A spout is the source of a stream
         • A bolt consumes a stream (and possibly emits a new one)
         • An edge represents a grouping
       [Figure: example topology. Node labels: "Source of stream A", "Source of stream B", "Subscribes: A" (twice), "Subscribes: A & B", "Subscribes: C & D", "Emits: C", "Emits: D".]
  • 6. GROUPINGS
       Each spout or bolt runs X instances in parallel (called tasks).
       Groupings decide which task of the subscribing bolt a tuple is sent to.
         • Shuffle grouping is a random grouping
         • Fields grouping groups by value, such that equal values go to the same task
         • All grouping replicates every tuple to all tasks
         • Global grouping makes all tuples go to one task
         • None grouping makes the bolt run in the same thread as the bolt/spout it subscribes to (in current releases it behaves like a shuffle grouping)
         • Direct grouping lets the producer (the task that emits) control which consumer task receives the tuple
       [Figure: example topology whose components run 4, 3, 2, and 2 tasks.]
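The routing rules above can be sketched without Storm at all. The class below is a hypothetical simulation (`GroupingSketch` and its methods are not Storm APIs), and it makes two simplifying assumptions: shuffle grouping is modeled as a uniform random pick, and fields grouping as the key's hash code modulo the number of tasks.

```java
import java.util.Random;

// Hypothetical simulation for illustration only; these are not Storm APIs.
final class GroupingSketch {
    private GroupingSketch() {}

    // Shuffle grouping (simplified): pick any task uniformly at random.
    static int shuffle(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields grouping (assumed hash-mod scheme): equal keys always reach the same task.
    static int fields(Object key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // All grouping: every task receives a copy of the tuple.
    static int[] all(int numTasks) {
        int[] targets = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            targets[i] = i;
        }
        return targets;
    }

    // Global grouping: every tuple goes to one fixed task (here the lowest id).
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 3;
        for (String word : new String[] {"dog", "man", "dog"}) {
            System.out.println(word + " -> task " + fields(word, numTasks));
        }
    }
}
```

In the demo both occurrences of "dog" print the same task id, which is the property a word count relies on when it keeps per-word state inside a task.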
  • 7. EXAMPLE (TestWordSpout -> ExclamationBolt -> ExclamationBolt)
           TopologyBuilder builder = new TopologyBuilder();

           // Create stream called "words"; run 10 tasks
           builder.setSpout("words", new TestWordSpout(), 10);

           // Create stream called "exclaim1"; run 3 tasks;
           // subscribe to stream "words", using shuffle grouping
           builder.setBolt("exclaim1", new ExclamationBolt(), 3)
                  .shuffleGrouping("words");

           // Create stream called "exclaim2"; run 2 tasks;
           // subscribe to stream "exclaim1", using shuffle grouping
           builder.setBolt("exclaim2", new ExclamationBolt(), 2)
                  .shuffleGrouping("exclaim1");

       A bolt can subscribe to an unlimited number of streams by chaining groupings.
       The source code for this example is part of the storm-starter project on GitHub.
  • 8. EXAMPLE – 1: TestWordSpout
           public void nextTuple() {
               Utils.sleep(100);
               final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
               final Random rand = new Random();
               final String word = words[rand.nextInt(words.length)];
               _collector.emit(new Values(word));
           }

       The TestWordSpout emits a random string from the array words every 100 milliseconds.
  • 9. EXAMPLE – 2: ExclamationBolt
       prepare is called when the bolt is created:
           OutputCollector _collector;

           public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
               _collector = collector;
           }

       execute is called for each tuple:
           public void execute(Tuple tuple) {
               _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
               _collector.ack(tuple);
           }

       declareOutputFields is called when the bolt is created:
           public void declareOutputFields(OutputFieldsDeclarer declarer) {
               declarer.declare(new Fields("word"));
           }

       declareOutputFields is used to declare streams and their schemas. It is possible to declare several streams and to specify which stream to use when outputting tuples in the emit function call.
  • 10. TRIDENT TOPOLOGY
       Trident topology is a new abstraction built on top of Storm primitives
         • Supports
             • Joins
             • Aggregations
             • Grouping
             • Functions
             • Filters
         • Easy to use, read the wiki
         • Guarantees exactly-once processing, if using an (opaque) transactional spout
         • Some basic ideas are the same as in the deprecated transactional topology*
             • Tuples are processed as small batches
             • Each batch gets a transaction id (txid); if a batch is replayed, the same txid is given
             • State updates are strongly ordered among batches
             • State updates atomically store meta-data with data
         • Transactional topology is superseded by the Trident topology from 0.8.0
       * See my first slide deck (March 2012) on Storm for detailed information: www.slideshare.com/KasperMadsen
  • 11. EXACTLY-ONCE-PROCESSING - 1
       Transactional spouts guarantee that the same data is replayed for every batch.
       Guaranteeing exactly-once processing for transactional spouts
         • The txid is stored with the data, so the last txid that updated the data is known
         • This information is used to know what to update in case of a replay
       Example
         1. Currently processing txid 2, with data ["man", "dog", "dog"]
         2. Current state is:
              "man" => [count=3, txid=1]
              "dog" => [count=2, txid=2]
         3. The batch with txid 2 fails and gets replayed.
         4. Resulting state is:
              "man" => [count=4, txid=2]
              "dog" => [count=2, txid=2]
         5. Because the txid is stored with the data, it is known that the count for "dog" should not be increased again.
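The example above can be replayed in a few lines of plain Java. This is a minimal sketch of the txid check only, not Trident's state implementation. `TransactionalCounts` and its methods are hypothetical names, and the sketch assumes that txids are positive and that a replayed batch contains exactly the same words as the failed attempt.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration only; not Trident's state implementation.
final class TransactionalCounts {
    // Per key: element 0 is the count, element 1 is the txid of the last batch that updated it.
    private final Map<String, long[]> state = new HashMap<>();

    void put(String key, long count, long txid) {
        state.put(key, new long[] {count, txid});
    }

    long count(String key) {
        long[] entry = state.get(key);
        return entry == null ? 0L : entry[0];
    }

    long txid(String key) {
        long[] entry = state.get(key);
        return entry == null ? -1L : entry[1];
    }

    void applyBatch(long txid, List<String> words) {
        // Aggregate the batch first, so each key is updated at most once per batch.
        Map<String, Long> deltas = new HashMap<>();
        for (String word : words) {
            deltas.merge(word, 1L, Long::sum);
        }
        for (Map.Entry<String, Long> delta : deltas.entrySet()) {
            long[] entry = state.computeIfAbsent(delta.getKey(), k -> new long[] {0L, -1L});
            if (entry[1] == txid) {
                continue; // this batch already updated the key before it failed
            }
            entry[0] += delta.getValue();
            entry[1] = txid;
        }
    }

    public static void main(String[] args) {
        TransactionalCounts counts = new TransactionalCounts();
        counts.put("man", 3, 1); // not yet updated by txid 2
        counts.put("dog", 2, 2); // already updated by txid 2 before the failure
        counts.applyBatch(2, Arrays.asList("man", "dog", "dog")); // the replay
        System.out.println("man=" + counts.count("man") + ", dog=" + counts.count("dog"));
    }
}
```

Running the replay leaves "man" at count 4 and "dog" at count 2, matching step 4 of the example. The check is only safe because batches are committed in strict txid order.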
  • 12. EXACTLY-ONCE-PROCESSING - 2
       An opaque transactional spout is not guaranteed to replay the same data for a failed batch as originally existed in the batch.
         • Guarantees every tuple is successfully processed in exactly one batch
         • Useful for having exactly-once processing while allowing some inputs to fail
       Guaranteeing exactly-once processing for opaque transactional spouts
         • The same trick doesn't work, as the replayed batch might have changed, meaning some state might now have stored incorrect data. Consider the previous example!
         • The problem is solved by storing more meta-data with the data (the previous value)
       Example (each pair of values is given as dog, cat)
         Step | Data          | Count | prevValue | Txid | Note
         1    | 2 dog, 1 cat  | 2, 1  | 0, 0      | 1, 1 |
         2    | 1 dog, 2 cat  | 3, 1  | 2, 1      | 2, 1 | Updates dog count, then fails
         2.1  | 2 dog, 2 cat  | 4, 3  | 2, 1      | 2, 2 | Batch contains new data, but updates are ok as previous values are used
       Consider the potential problems if the new data for 2.1 doesn't contain any cat.
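The previous-value trick above can be sketched as follows. This is a stand-alone illustration under assumptions, not Trident's opaque state implementation. `OpaqueCounts` and `Entry` are hypothetical names, and txids are assumed to be positive. The rule is: if the stored txid equals the current txid, rebuild the value from prevValue; otherwise shift the current value into prevValue and then apply the update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration only; not Trident's opaque state implementation.
final class OpaqueCounts {
    static final class Entry {
        long value;      // current count
        long prevValue;  // count before the batch identified by txid was applied
        long txid = -1;  // last batch that touched this key; -1 means never
    }

    private final Map<String, Entry> state = new HashMap<>();

    void update(String key, long delta, long txid) {
        Entry entry = state.computeIfAbsent(key, k -> new Entry());
        if (entry.txid == txid) {
            // Replay of the same batch: discard the earlier attempt and
            // recompute from the value stored before this txid.
            entry.value = entry.prevValue + delta;
        } else {
            // First update from this batch: remember the old value, then apply.
            entry.prevValue = entry.value;
            entry.value += delta;
            entry.txid = txid;
        }
    }

    Entry get(String key) {
        return state.get(key);
    }

    public static void main(String[] args) {
        OpaqueCounts counts = new OpaqueCounts();
        counts.update("dog", 2, 1); // step 1: batch 1 holds 2 dog, 1 cat
        counts.update("cat", 1, 1);
        counts.update("dog", 1, 2); // step 2: batch 2 updates dog, then fails
        counts.update("dog", 2, 2); // step 2.1: replay with different data
        counts.update("cat", 2, 2);
        System.out.println("dog=" + counts.get("dog").value + ", cat=" + counts.get("cat").value);
    }
}
```

After the replay the sketch ends with dog = 4 and cat = 3, the counts shown in row 2.1 of the table. The partial update from the failed attempt (dog = 3) is overwritten rather than added to.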
  • 13. ELASTICITY
         • Rebalancing workers and executors (not tasks)
             • Pause spouts
             • Wait for message timeout
             • Set new assignment
             • All moved tasks will be killed and restarted in the new location
         • Swapping (Storm 0.8.2)
             • Submit inactive new topology
             • Pause spouts of old topology
             • Wait for message timeout of old topology
             • Activate new topology
             • Deactivate old topology
             • Kill old topology
       What about state on tasks which are killed and restarted? It is up to the user to solve!
  • 14. LEARN MORE
         • Website (http://storm-project.net/)
         • Wiki (https://github.com/nathanmarz/storm/wiki)
         • Storm-starter (https://github.com/nathanmarz/storm-starter)
         • Mailing list (http://groups.google.com/group/storm-user)
         • #storm-user room on freenode
         • UTSL: https://github.com/nathanmarz/storm
         • More slides: www.slideshare.net/KasperMadsen
       Image from: http://www.cupofjoe.tv/2010/11/learn-lesson.html