SlideShare a Scribd company logo
Realtime Processing with
         Storm
Who are we?

              Dario Simonassi
              @ldsimonassi



              Gabriel Eisbruch
              @geisbruch



              Jonathan Leibiusky
              @xetorthio
Agenda

         ● The problem

         ● Storm - Basic concepts

         ● Trident

         ● Deploying

         ● Some other features

         ● Contributions
Tons of
                                                                           information
                                                                           arriving
                                                                           every
                                                                           second




http://necromentia.files.wordpress.com/2011/01/20070103213517_iguazu.jpg
Heavy
                                                   workload




http://www.computerkochen.de/fun/bilder/esel.jpg
Do
                                     Re   it
                                        alti
                                             me
                                                 !




http://www.asimon.co.il/data/images/a10689.jpg
Worker
                                                         Complex
                       Queue
              Worker

                                        Queue


                       Queue   Worker           Worker

Information   Worker
  Source                                Queue              DB

                       Queue   Worker
                                                Worker

              Worker
                                        Queue

                       Queue   Worker
What is Storm?

●   Is a highly distributed realtime computation system.

●   Provides general primitives to do realtime computation.

●   Can be used with any programming language.

●   It's scalable and fault-tolerant.
Example - Main Concepts
https://github.com/geisbruch/strata2012
Main Concepts - Example



         "Count tweets related to strataconf and show hashtag totals."
Main Concepts - Spout

● First thing we need is to read all tweets related to strataconf from twitter

● A Spout is a special component that is a source of messages to be processed.




                                Tweets                    Tuples
                                            Spout
Main Concepts - Spout
      public class ApiStreamingSpout extends BaseRichSpout {
        SpoutOutputCollector collector;
        TwitterReader reader;

          public void nextTuple() {
            Tweet tweet = reader.getNextTweet();
            if(tweet != null)
               collector.emit(new Values(tweet));
          }
          public void open(Map conf, TopologyContext context,
               SpoutOutputCollector collector) {
            reader = new TwitterReader(conf.get("track"), conf.get("user"), conf.get("pass"));
            this.collector = collector;
          }
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("tweet"));
          }
      }



                                           Tweets                                Tuples
                                                             Spout
Main Concepts - Bolt

● For every tweet, we find hashtags

● A Bolt is a component that does the processing.




                        tweet                                            hashtags
                                                                        hashtags
                        stream               tweet                      hashtags

                                  Spout              HashtagsSplitter
Main Concepts - Bolt
         public class HashtagsSplitter extends BaseBasicBolt {
           public void execute(Tuple input, BasicOutputCollector collector) {
              Tweet tweet = (Tweet)input.getValueByField("tweet");
              for(String hashtag : tweet.getHashTags()){
                collector.emit(new Values(hashtag));
              }
            }
           public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("hashtag"));
           }
         }


                    tweet                                                 hashtags
                                              tweet                      hashtags
                                                                         hashtags
                    stream
                                Spout                 HashtagsSplitter
Main Concepts - Bolt
           public class HashtagsCounterBolt extends BaseBasicBolt {
             private Jedis jedis;

                 public void execute(Tuple input, BasicOutputCollector collector) {
                   String hashtag = input.getStringByField("hashtag");
                   jedis.hincrBy("hashs", key, val);
                 }

                 public void prepare(Map conf, TopologyContext context) {
                   jedis = new Jedis("localhost");
                 }
           }

        tweet
        stream                     tweet                      hashtag
                                                              hashtag
                                                               hashtag
                      Spout                HashtagsSplitter              HashtagsCounterBolt
Main Concepts - Topology
     public static Topology createTopology() {
       TopologyBuilder builder = new TopologyBuilder();

         builder.setSpout("tweets-collector", new ApiStreamingSpout(), 1);
         builder.setBolt("hashtags-splitter", new HashTagsSplitter(), 2)
           .shuffleGrouping("tweets-collector");
         builder.setBolt("hashtags-counter", new HashtagsCounterBolt(), 2)
           .fieldsGrouping("hashtags-splitter", new Fields("hashtags"));

         return builder.createTopology();
     }

                                                               fields
                                  shuffle

                                            HashtagsSplitter            HashtagsCounterBolt
                      Spout
                                            HashtagsSplitter            HashtagsCounterBolt
Example Overview - Read Tweet

 Spout                           Splitter   Counter   Redis



         This tweet has #hello
         #world hashtags!!!
Example Overview - Split Tweet

 Spout                           Splitter                     Counter   Redis



         This tweet has #hello
         #world hashtags!!!                 #hello



                                                     #world
Example Overview - Split Tweet

 Spout                           Splitter                     Counter                  Redis



         This tweet has #hello
         #world hashtags!!                  #hello
                                                                        incr(#hello)
                                                                                       #hello = 1
                                                     #world
                                                                        incr(#world)
                                                                                       #world = 1
Example Overview - Split Tweet

 Spout                         Splitter   Counter   Redis




                                                    #hello = 1


         This tweet has #bye
         #world hashtags
                                                    #world = 1
Example Overview - Split Tweet

 Spout                         Splitter                   Counter   Redis




                                                                    #hello = 1
                                          #world
                                                   #bye
         This tweet has #bye
         #world hashtags                                            #world = 1
Example Overview - Split Tweet

 Spout                         Splitter                   Counter                  Redis




                                                                                   #hello = 1
                                          #world
                                                   #bye
         This tweet has #bye                                        incr(#bye)
         #world hashtags                                                           #world = 2
                                                                    incr(#world)    #bye =1
Main Concepts - In summary

 ○ Topology

 ○ Spout

 ○ Stream

 ○ Bolt
Guaranteeing Message Processing




                                  http://www.steinborn.com/home_warranty.php
Guaranteeing Message Processing




                       HashtagsSplitter   HashtagsCounterBolt
             Spout
                       HashtagsSplitter   HashtagsCounterBolt
Guaranteeing Message Processing




                       HashtagsSplitter        HashtagsCounterBolt
             Spout
                       HashtagsSplitter        HashtagsCounterBolt

                      _collector.ack(tuple);
Guaranteeing Message Processing




                       HashtagsSplitter   HashtagsCounterBolt
             Spout
                       HashtagsSplitter   HashtagsCounterBolt
Guaranteeing Message Processing




                       HashtagsSplitter   HashtagsCounterBolt
             Spout
                       HashtagsSplitter   HashtagsCounterBolt
Guaranteeing Message Processing




                                      HashtagsSplitter   HashtagsCounterBolt
               Spout
                                      HashtagsSplitter   HashtagsCounterBolt

           void fail(Object msgId);
Trident




          http://www.raytekmarine.com/images/trident.jpg
Trident - What is Trident?


   ● High-level abstraction on top of Storm.

   ● Will make it easier to build topologies.

   ● Similar to Hadoop, Pig
Trident - Topology



   TridentTopology topology =
                    new TridentTopology();



                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet

                      Spout           HashtagsSplitter               Memory
Trident - Stream



  Stream tweetsStream = topology
          .newStream("tweets", new ApiStreamingSpout());




                        tweet
                        stream
                                 Spout
Trident
          Stream hashtagsStream =
               tweetsStrem.each(new Fields("tweet"),new BaseFunction() {
                   public void execute(TridentTuple tuple,
                                 TridentCollector collector) {
                       Tweet tweet = (Tweet)tuple.getValueByField("tweet");
                       for(String hashtag : tweet.getHashtags())
                          collector.emit(new Values(hashtag));
                       }
                    }, new Fields("hashtag")
                 );



                       tweet                                           hashtags
                                                                      hashtags
                       stream              tweet                      hashtags

                                Spout              HashtagsSplitter
Trident - States

   TridentState hashtagCounts =
      hashtagsStream.groupBy(new Fields("hashtag"))
       .persistentAggregate(new MemoryMapState.Factory(),
                            new Count(),
                            new Fields("count"));




                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet
                                                                     Memory
                      Spout           HashtagsSplitter
                                                                     (State)
Trident - States

   TridentState hashtagCounts =
      hashtagsStream.groupBy(new Fields("hashtag"))
       .persistentAggregate(new MemoryMapState.Factory(),
                            new Count(),
                            new Fields("count"));




                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet
                                                                     Memory
                      Spout           HashtagsSplitter
                                                                     (State)
Trident - States

   TridentState hashtagCounts =
      hashtagsStream.groupBy(new Fields("hashtag"))
       .persistentAggregate(new MemoryMapState.Factory(),
                            new Count(),
                            new Fields("count"));




                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet
                                                                     Memory
                      Spout           HashtagsSplitter
                                                                     (State)
Trident - States

   TridentState hashtagCounts =
      hashtagsStream.groupBy(new Fields("hashtag"))
       .persistentAggregate(new MemoryMapState.Factory(),
                            new Count(),
                            new Fields("count"));




                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet
                                                                     Memory
                      Spout           HashtagsSplitter
                                                                     (State)
Trident - Complete example


 TridentState hashtags = topology
        .newStream("tweets", new ApiStreamingSpout())
        .each(new Fields("tweet"),new HashTagSplitter(), new Fields("hashtag"))
        .groupBy(new Fields("hashtag"))
        .persistentAggregate(new MemoryMapState.Factory(),
                                      new Count(),
                                      new Fields("count"))




                                                              hashtags
                                                             hashtags
               tweet                                         hashtags
               stream             tweet
                                                                         Memory
                        Spout             HashtagsSplitter
                                                                         (State)
Transactions
Transactions
  ● Exactly once message processing semantic
Transactions
  ● The intermediate processing may be executed in a parallel way
Transactions
  ● But persistence should be strictly sequential
Transactions
  ● Spouts and persistence should be designed to be transactional
Trident - Queries




     We want to know on-line the count of tweets with hashtag hadoop and the
                    amount of tweets with strataconf hashtag
Trident - DRPC
Trident - Query



 DRPCClient drpc = new DRPCClient("MyDRPCServer",
 "MyDRPCPort");

 drpc.execute("counts", "strataconf hadoop");
Trident - Query




       Stream drpcStream =
              topology.newDRPCStream("counts",drpc)
Trident - Query



  Stream singleHashtag =
       drpcStream.each(new Fields("args"), new Split(), new Fields("hashtag"))




                                     DRPC            Hashtag
                DRPC                 stream          Splitter
                Server
Trident - Query



   Stream hashtagsCountQuery =
        singleHashtag.stateQuery(hashtagCountMemoryState, new Fields("hashtag"),
        new MapGet(), new Fields("count"));




                            DRPC              Hashtag         Hashtag Count
                            stream            Splitter           Query
       DRPC
       Server
Trident - Query



     Stream drpcCountAggregate =
                hashtagsCountQuery.groupBy(new Fields("hashtag"))
                .aggregate(new Fields("count"), new Sum(), new Fields("sum"));




                         DRPC           Hashtag       Hashtag Count   Aggregate
    DRPC                 stream         Splitter         Query          Count
    Server
Trident - Query - Complete Example


    topology.newDRPCStream("counts",drpc)
        .each(new Fields("args"), new Split(), new Fields("hashtag"))
        .stateQuery(hashtags, new Fields("hashtag"),
              new MapGet(), new Fields("count"))
        .each(new Fields("count"), new FilterNull())
        .groupBy(new Fields("hashtag"))
        .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
Trident - Query - Complete Example




                       DRPC      Hashtag    Hashtag Count   Aggregate
         DRPC          stream    Splitter      Query          Count
         Server
Trident - Query - Complete Example



    Client Call

                       DRPC      Hashtag    Hashtag Count   Aggregate
              DRPC     stream    Splitter      Query          Count
              Server
Trident - Query - Complete Example


    Hold the
                        Execute
    Request


                                  DRPC     Hashtag    Hashtag Count   Aggregate
               DRPC               stream   Splitter      Query          Count
               Server
Trident - Query - Complete Example


  Request continue
      holded


                       DRPC           Hashtag        Hashtag Count   Aggregate
              DRPC     stream         Splitter          Query          Count
              Server

                                On finish return to DRPC
                                         server
Trident - Query - Complete Example




                          DRPC     Hashtag    Hashtag Count   Aggregate
              DRPC        stream   Splitter      Query          Count
              Server

 Return response to the
          client
Running Topologies
Running Topologies - Local Cluster
    Config conf = new Config();
    conf.put("some_property","some_value")
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology());



    ●   The whole topology will run in a single machine.

    ●   Similar to run in a production cluster

    ●   Very Useful for test and development

    ●   Configurable parallelism

    ●   Similar methods than StormSubmitter
Running Topologies - Production

   package twitter.streaming;
   public class Topology {
     public static void main(String[] args){
       StormSubmitter.submitTopology("twitter-hashtag-summarizer",
          conf, builder.createTopology());
     }
   }


    mvn package

    storm jar Strata2012-0.0.1.jar twitter.streaming.Topology 
         "$REDIS_HOST" $REDIS_PORT "strata,strataconf"
       $TWITTER_USER $TWITTER_PASS
Running Topologies - Cluster Layout

● Single Master process                Storm Cluster
                                                                                 Supervisor
● No SPOF                                                                  (where the processing is done)


● Automatic Task distribution              Nimbus                                Supervisor
                                           (The master
                                                                           (where the processing is done)
                                             Process)
● Amount of task configurable by
  process                                                                        Supervisor
                                                                           (where the processing is done)

● Automatic tasks re-distribution on
  supervisors fails
                                                            Zookeeper Cluster
                                                       (Only for coordination and cluster state)
● Work continue on nimbus fails
Running Topologies - Cluster Architecture - No SPOF

          Storm Cluster


                                            Supervisor
                                      (where the processing is done)




                   Nimbus                   Supervisor
               (The master Process)   (where the processing is done)




                                            Supervisor
                                      (where the processing is done)
Running Topologies - Cluster Architecture - No SPOF

          Storm Cluster


                                            Supervisor
                                      (where the processing is done)




                   Nimbus                   Supervisor
               (The master Process)   (where the processing is done)




                                            Supervisor
                                      (where the processing is done)
Running Topologies - Cluster Architecture - No SPOF

          Storm Cluster


                                            Supervisor
                                      (where the processing is done)




                   Nimbus                   Supervisor
               (The master Process)   (where the processing is done)




                                            Supervisor
                                      (where the processing is done)
Other features
Other features - Non JVM languages
                                             {
    ● Use Non-JVM Languages                      "conf": {
                                                    "topology.message.timeout.secs": 3,
                                                    // etc
    ● Simple protocol                            },
                                                 "context": {
    ● Use stdin & stdout for communication          "task->component": {
                                                       "1": "example-spout",
                                                       "2": "__acker",
    ● Many implementations and abstraction
                                                       "3": "example-bolt"
      layers done by the community                  },
                                                    "taskid": 3
                                                 },
                                                 "pidDir": "..."
                                             }
Contribution
●   A collection of community developed spouts, bolts, serializers, and other goodies.
●   You can find it in: https://github.com/nathanmarz/storm-contrib

●   Some of those goodies are:

    ○   cassandra: Integrates Storm and Cassandra by writing tuples to a Cassandra Column Family.

    ○   mongodb: Provides a spout to read documents from mongo, and a bolt to write documents to mongo.

    ○   SQS: Provides a spout to consume messages from Amazon SQS.

    ○   hbase: Provides a spout and a bolt to read and write from HBase.

    ○   kafka: Provides a spout to read messages from kafka.

    ○   rdbms: Provides a bolt to write tuples to a table.

    ○   redis-pubsub: Provides a spout to read messages from redis pubsub.

    ○   And more...
Questions?

More Related Content

What's hot

Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache Storm
Andrea Iacono
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
Eugene Dvorkin
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
T Jake Luciani
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
Chicago Hadoop Users Group
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with StormMariusz Gil
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
Tiziano De Matteis
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
Sonal Raj
 
Storm Anatomy
Storm AnatomyStorm Anatomy
Storm Anatomy
Eiichiro Uchiumi
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
Dung Ngua
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
Shyam Raj
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
DataWorks Summit
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
Nati Shalom
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
viirya
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
Robert Evans
 
Storm
StormStorm
Tutorial Kafka-Storm
Tutorial Kafka-StormTutorial Kafka-Storm
Tutorial Kafka-Storm
Universidad de Santiago de Chile
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
Lester Martin
 

What's hot (20)

Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache Storm
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with Storm
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Introduction to Storm
Introduction to StormIntroduction to Storm
Introduction to Storm
 
Storm Anatomy
Storm AnatomyStorm Anatomy
Storm Anatomy
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Storm
StormStorm
Storm
 
Tutorial Kafka-Storm
Tutorial Kafka-StormTutorial Kafka-Storm
Tutorial Kafka-Storm
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 

Similar to Realtime processing with storm presentation

Storm - The Real-Time Layer Your Big Data's Been Missing
Storm - The Real-Time Layer Your Big Data's Been MissingStorm - The Real-Time Layer Your Big Data's Been Missing
Storm - The Real-Time Layer Your Big Data's Been Missing
FullContact
 
掀起 Swift 的面紗
掀起 Swift 的面紗掀起 Swift 的面紗
掀起 Swift 的面紗
Pofat Tseng
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
source{d}
 
The Open Source... Behind the Tweets
The Open Source... Behind the TweetsThe Open Source... Behind the Tweets
The Open Source... Behind the TweetsChris Aniszczyk
 
Unleashing Twitter Data for Fun and Insight
Unleashing Twitter Data for Fun and InsightUnleashing Twitter Data for Fun and Insight
Unleashing Twitter Data for Fun and Insight
Matthew Russell
 
Unleashing twitter data for fun and insight
Unleashing twitter data for fun and insightUnleashing twitter data for fun and insight
Unleashing twitter data for fun and insight
Digital Reasoning
 
Twitter Big Data
Twitter Big DataTwitter Big Data
Twitter Big Data
Colin Surprenant
 
FOSDEM2018 Janus Lua plugin presentation
FOSDEM2018 Janus Lua plugin presentationFOSDEM2018 Janus Lua plugin presentation
FOSDEM2018 Janus Lua plugin presentation
Lorenzo Miniero
 
TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기
TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기
TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기
Heejong Ahn
 
Git_new.pptx
Git_new.pptxGit_new.pptx
Git_new.pptx
BruceLee275640
 
Git, from the beginning
Git, from the beginningGit, from the beginning
Git, from the beginning
James Aylett
 
Twitter Stream Processing
Twitter Stream ProcessingTwitter Stream Processing
Twitter Stream Processing
Colin Surprenant
 
Git 101 for Beginners
Git 101 for Beginners Git 101 for Beginners
Git 101 for Beginners
Anurag Upadhaya
 
He stopped using for/while loops, you won't believe what happened next!
He stopped using for/while loops, you won't believe what happened next!He stopped using for/while loops, you won't believe what happened next!
He stopped using for/while loops, you won't believe what happened next!
François-Guillaume Ribreau
 
Introduction to Git and GitHub
Introduction to Git and GitHubIntroduction to Git and GitHub
The journey to GitOps
The journey to GitOpsThe journey to GitOps
The journey to GitOps
Nicola Baldi
 
KI University - Git internals
KI University - Git internalsKI University - Git internals
KI University - Git internals
mfkaptan
 
State of GeoTools 2012
State of GeoTools 2012State of GeoTools 2012
State of GeoTools 2012
Jody Garnett
 
Adopting F# at SBTech
Adopting F# at SBTechAdopting F# at SBTech
Adopting F# at SBTech
Antya Dev
 
OpenWhisk by Example - Auto Retweeting Example in Python
OpenWhisk by Example - Auto Retweeting Example in PythonOpenWhisk by Example - Auto Retweeting Example in Python
OpenWhisk by Example - Auto Retweeting Example in Python
CodeOps Technologies LLP
 

Similar to Realtime processing with storm presentation (20)

Storm - The Real-Time Layer Your Big Data's Been Missing
Storm - The Real-Time Layer Your Big Data's Been MissingStorm - The Real-Time Layer Your Big Data's Been Missing
Storm - The Real-Time Layer Your Big Data's Been Missing
 
掀起 Swift 的面紗
掀起 Swift 的面紗掀起 Swift 的面紗
掀起 Swift 的面紗
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
 
The Open Source... Behind the Tweets
The Open Source... Behind the TweetsThe Open Source... Behind the Tweets
The Open Source... Behind the Tweets
 
Unleashing Twitter Data for Fun and Insight
Unleashing Twitter Data for Fun and InsightUnleashing Twitter Data for Fun and Insight
Unleashing Twitter Data for Fun and Insight
 
Unleashing twitter data for fun and insight
Unleashing twitter data for fun and insightUnleashing twitter data for fun and insight
Unleashing twitter data for fun and insight
 
Twitter Big Data
Twitter Big DataTwitter Big Data
Twitter Big Data
 
FOSDEM2018 Janus Lua plugin presentation
FOSDEM2018 Janus Lua plugin presentationFOSDEM2018 Janus Lua plugin presentation
FOSDEM2018 Janus Lua plugin presentation
 
TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기
TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기
TypeScript와 Flow: 
자바스크립트 개발에 정적 타이핑 도입하기
 
Git_new.pptx
Git_new.pptxGit_new.pptx
Git_new.pptx
 
Git, from the beginning
Git, from the beginningGit, from the beginning
Git, from the beginning
 
Twitter Stream Processing
Twitter Stream ProcessingTwitter Stream Processing
Twitter Stream Processing
 
Git 101 for Beginners
Git 101 for Beginners Git 101 for Beginners
Git 101 for Beginners
 
He stopped using for/while loops, you won't believe what happened next!
He stopped using for/while loops, you won't believe what happened next!He stopped using for/while loops, you won't believe what happened next!
He stopped using for/while loops, you won't believe what happened next!
 
Introduction to Git and GitHub
Introduction to Git and GitHubIntroduction to Git and GitHub
Introduction to Git and GitHub
 
The journey to GitOps
The journey to GitOpsThe journey to GitOps
The journey to GitOps
 
KI University - Git internals
KI University - Git internalsKI University - Git internals
KI University - Git internals
 
State of GeoTools 2012
State of GeoTools 2012State of GeoTools 2012
State of GeoTools 2012
 
Adopting F# at SBTech
Adopting F# at SBTechAdopting F# at SBTech
Adopting F# at SBTech
 
OpenWhisk by Example - Auto Retweeting Example in Python
OpenWhisk by Example - Auto Retweeting Example in PythonOpenWhisk by Example - Auto Retweeting Example in Python
OpenWhisk by Example - Auto Retweeting Example in Python
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 

Realtime processing with storm presentation

  • 2. Who are we? Dario Simonassi @ldsimonassi Gabriel Eisbruch @geisbruch Jonathan Leibiusky @xetorthio
  • 3. Agenda ● The problem ● Storm - Basic concepts ● Trident ● Deploying ● Some other features ● Contributions
  • 4. Tons of information arriving every second http://necromentia.files.wordpress.com/2011/01/20070103213517_iguazu.jpg
  • 5. Heavy workload http://www.computerkochen.de/fun/bilder/esel.jpg
  • 6. Do Re it alti me ! http://www.asimon.co.il/data/images/a10689.jpg
  • 7.
  • 8.
  • 9. Worker Complex Queue Worker Queue Queue Worker Worker Information Worker Source Queue DB Queue Worker Worker Worker Queue Queue Worker
  • 10. What is Storm? ● Is a highly distributed realtime computation system. ● Provides general primitives to do realtime computation. ● Can be used with any programming language. ● It's scalable and fault-tolerant.
  • 11. Example - Main Concepts https://github.com/geisbruch/strata2012
  • 12. Main Concepts - Example "Count tweets related to strataconf and show hashtag totals."
  • 13. Main Concepts - Spout ● First thing we need is to read all tweets related to strataconf from twitter ● A Spout is a special component that is a source of messages to be processed. Tweets Tuples Spout
  • 14. Main Concepts - Spout public class ApiStreamingSpout extends BaseRichSpout { SpoutOutputCollector collector; TwitterReader reader; public void nextTuple() { Tweet tweet = reader.getNextTweet(); if(tweet != null) collector.emit(new Values(tweet)); } public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { reader = new TwitterReader(conf.get("track"), conf.get("user"), conf.get("pass")); this.collector = collector; } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("tweet")); } } Tweets Tuples Spout
  • 15. Main Concepts - Bolt ● For every tweet, we find hashtags ● A Bolt is a component that does the processing. tweet hashtags hashtags stream tweet hashtags Spout HashtagsSplitter
  • 16. Main Concepts - Bolt public class HashtagsSplitter extends BaseBasicBolt { public void execute(Tuple input, BasicOutputCollector collector) { Tweet tweet = (Tweet)input.getValueByField("tweet"); for(String hashtag : tweet.getHashTags()){ collector.emit(new Values(hashtag)); } } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("hashtag")); } } tweet hashtags tweet hashtags hashtags stream Spout HashtagsSplitter
  • 17. Main Concepts - Bolt public class HashtagsCounterBolt extends BaseBasicBolt { private Jedis jedis; public void execute(Tuple input, BasicOutputCollector collector) { String hashtag = input.getStringByField("hashtag"); jedis.hincrBy("hashs", key, val); } public void prepare(Map conf, TopologyContext context) { jedis = new Jedis("localhost"); } } tweet stream tweet hashtag hashtag hashtag Spout HashtagsSplitter HashtagsCounterBolt
  • 18. Main Concepts - Topology public static Topology createTopology() { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("tweets-collector", new ApiStreamingSpout(), 1); builder.setBolt("hashtags-splitter", new HashTagsSplitter(), 2) .shuffleGrouping("tweets-collector"); builder.setBolt("hashtags-counter", new HashtagsCounterBolt(), 2) .fieldsGrouping("hashtags-splitter", new Fields("hashtags")); return builder.createTopology(); } fields shuffle HashtagsSplitter HashtagsCounterBolt Spout HashtagsSplitter HashtagsCounterBolt
  • 19. Example Overview - Read Tweet Spout Splitter Counter Redis This tweet has #hello #world hashtags!!!
  • 20. Example Overview - Split Tweet Spout Splitter Counter Redis This tweet has #hello #world hashtags!!! #hello #world
  • 21. Example Overview - Split Tweet Spout Splitter Counter Redis This tweet has #hello #world hashtags!! #hello incr(#hello) #hello = 1 #world incr(#world) #world = 1
  • 22. Example Overview - Split Tweet Spout Splitter Counter Redis #hello = 1 This tweet has #bye #world hashtags #world = 1
  • 23. Example Overview - Split Tweet Spout Splitter Counter Redis #hello = 1 #world #bye This tweet has #bye #world hashtags #world = 1
  • 24. Example Overview - Split Tweet Spout Splitter Counter Redis #hello = 1 #world #bye This tweet has #bye incr(#bye) #world hashtags #world = 2 incr(#world) #bye =1
  • 25. Main Concepts - In summary ○ Topology ○ Spout ○ Stream ○ Bolt
  • 26. Guaranteeing Message Processing http://www.steinborn.com/home_warranty.php
  • 27. Guaranteeing Message Processing HashtagsSplitter HashtagsCounterBolt Spout HashtagsSplitter HashtagsCounterBolt
  • 28. Guaranteeing Message Processing HashtagsSplitter HashtagsCounterBolt Spout HashtagsSplitter HashtagsCounterBolt _collector.ack(tuple);
  • 29. Guaranteeing Message Processing HashtagsSplitter HashtagsCounterBolt Spout HashtagsSplitter HashtagsCounterBolt
  • 30. Guaranteeing Message Processing HashtagsSplitter HashtagsCounterBolt Spout HashtagsSplitter HashtagsCounterBolt
  • 31. Guaranteeing Message Processing HashtagsSplitter HashtagsCounterBolt Spout HashtagsSplitter HashtagsCounterBolt void fail(Object msgId);
  • 32. Trident http://www.raytekmarine.com/images/trident.jpg
  • 33. Trident - What is Trident? ● High-level abstraction on top of Storm. ● Will make it easier to build topologies. ● Similar to Hadoop, Pig
  • 34. Trident - Topology TridentTopology topology = new TridentTopology(); hashtags hashtags tweet hashtags stream tweet Spout HashtagsSplitter Memory
  • 35. Trident - Stream Stream tweetsStream = topology .newStream("tweets", new ApiStreamingSpout()); tweet stream Spout
  • 36. Trident Stream hashtagsStream = tweetsStrem.each(new Fields("tweet"),new BaseFunction() { public void execute(TridentTuple tuple, TridentCollector collector) { Tweet tweet = (Tweet)tuple.getValueByField("tweet"); for(String hashtag : tweet.getHashtags()) collector.emit(new Values(hashtag)); } }, new Fields("hashtag") ); tweet hashtags hashtags stream tweet hashtags Spout HashtagsSplitter
  • 37. Trident - States TridentState hashtagCounts = hashtagsStream.groupBy(new Fields("hashtag")) .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count")); hashtags hashtags tweet hashtags stream tweet Memory Spout HashtagsSplitter (State)
  • 38. Trident - States TridentState hashtagCounts = hashtagsStream.groupBy(new Fields("hashtag")) .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count")); hashtags hashtags tweet hashtags stream tweet Memory Spout HashtagsSplitter (State)
  • 39. Trident - States TridentState hashtagCounts = hashtagsStream.groupBy(new Fields("hashtag")) .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count")); hashtags hashtags tweet hashtags stream tweet Memory Spout HashtagsSplitter (State)
  • 40. Trident - States TridentState hashtagCounts = hashtagsStream.groupBy(new Fields("hashtag")) .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count")); hashtags hashtags tweet hashtags stream tweet Memory Spout HashtagsSplitter (State)
  • 41. Trident - Complete example TridentState hashtags = topology .newStream("tweets", new ApiStreamingSpout()) .each(new Fields("tweet"),new HashTagSplitter(), new Fields("hashtag")) .groupBy(new Fields("hashtag")) .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count")) hashtags hashtags tweet hashtags stream tweet Memory Spout HashtagsSplitter (State)
  • 43. Transactions ● Exactly once message processing semantic
  • 44. Transactions ● The intermediate processing may be executed in a parallel way
  • 45. Transactions ● But persistence should be strictly sequential
  • 46. Transactions ● Spouts and persistence should be designed to be transactional
  • 47. Trident - Queries We want to know on-line the count of tweets with hashtag hadoop and the amount of tweets with strataconf hashtag
  • 49. Trident - Query DRPCClient drpc = new DRPCClient("MyDRPCServer", "MyDRPCPort"); drpc.execute("counts", "strataconf hadoop");
  • 50. Trident - Query Stream drpcStream = topology.newDRPCStream("counts",drpc)
  • 51. Trident - Query Stream singleHashtag = drpcStream.each(new Fields("args"), new Split(), new Fields("hashtag")) DRPC Hashtag DRPC stream Splitter Server
  • 52. Trident - Query Stream hashtagsCountQuery = singleHashtag.stateQuery(hashtagCountMemoryState, new Fields("hashtag"), new MapGet(), new Fields("count")); DRPC Hashtag Hashtag Count stream Splitter Query DRPC Server
  • 53. Trident - Query Stream drpcCountAggregate = hashtagsCountQuery.groupBy(new Fields("hashtag")) .aggregate(new Fields("count"), new Sum(), new Fields("sum")); DRPC Hashtag Hashtag Count Aggregate DRPC stream Splitter Query Count Server
  • 54. Trident - Query - Complete Example topology.newDRPCStream("counts",drpc) .each(new Fields("args"), new Split(), new Fields("hashtag")) .stateQuery(hashtags, new Fields("hashtag"), new MapGet(), new Fields("count")) .each(new Fields("count"), new FilterNull()) .groupBy(new Fields("hashtag")) .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
  • 55. Trident - Query - Complete Example DRPC Hashtag Hashtag Count Aggregate DRPC stream Splitter Query Count Server
  • 56. Trident - Query - Complete Example Client Call DRPC Hashtag Hashtag Count Aggregate DRPC stream Splitter Query Count Server
  • 57. Trident - Query - Complete Example Hold the Execute Request DRPC Hashtag Hashtag Count Aggregate DRPC stream Splitter Query Count Server
  • 58. Trident - Query - Complete Example Request continue holded DRPC Hashtag Hashtag Count Aggregate DRPC stream Splitter Query Count Server On finish return to DRPC server
  • 59. Trident - Query - Complete Example DRPC Hashtag Hashtag Count Aggregate DRPC stream Splitter Query Count Server Return response to the client
  • 61. Running Topologies - Local Cluster Config conf = new Config(); conf.put("some_property","some_value") LocalCluster cluster = new LocalCluster(); cluster.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology()); ● The whole topology will run in a single machine. ● Similar to run in a production cluster ● Very Useful for test and development ● Configurable parallelism ● Similar methods than StormSubmitter
  • 62. Running Topologies - Production package twitter.streaming; public class Topology { public static void main(String[] args){ StormSubmitter.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology()); } } mvn package storm jar Strata2012-0.0.1.jar twitter.streaming.Topology "$REDIS_HOST" $REDIS_PORT "strata,strataconf" $TWITTER_USER $TWITTER_PASS
  • 63. Running Topologies - Cluster Layout ● Single Master process Storm Cluster Supervisor ● No SPOF (where the processing is done) ● Automatic Task distribution Nimbus Supervisor (The master (where the processing is done) Process) ● Amount of task configurable by process Supervisor (where the processing is done) ● Automatic tasks re-distribution on supervisors fails Zookeeper Cluster (Only for coordination and cluster state) ● Work continue on nimbus fails
  • 64. Running Topologies - Cluster Architecture - No SPOF Storm Cluster Supervisor (where the processing is done) Nimbus Supervisor (The master Process) (where the processing is done) Supervisor (where the processing is done)
  • 65. Running Topologies - Cluster Architecture - No SPOF Storm Cluster Supervisor (where the processing is done) Nimbus Supervisor (The master Process) (where the processing is done) Supervisor (where the processing is done)
  • 66. Running Topologies - Cluster Architecture - No SPOF Storm Cluster Supervisor (where the processing is done) Nimbus Supervisor (The master Process) (where the processing is done) Supervisor (where the processing is done)
  • 68. Other features - Non JVM languages { ● Use Non-JVM Languages "conf": { "topology.message.timeout.secs": 3, // etc ● Simple protocol }, "context": { ● Use stdin & stdout for communication "task->component": { "1": "example-spout", "2": "__acker", ● Many implementations and abstraction "3": "example-bolt" layers done by the community }, "taskid": 3 }, "pidDir": "..." }
  • 69. Contribution ● A collection of community developed spouts, bolts, serializers, and other goodies. ● You can find it in: https://github.com/nathanmarz/storm-contrib ● Some of those goodies are: ○ cassandra: Integrates Storm and Cassandra by writing tuples to a Cassandra Column Family. ○ mongodb: Provides a spout to read documents from mongo, and a bolt to write documents to mongo. ○ SQS: Provides a spout to consume messages from Amazon SQS. ○ hbase: Provides a spout and a bolt to read and write from HBase. ○ kafka: Provides a spout to read messages from kafka. ○ rdbms: Provides a bolt to write tuples to a table. ○ redis-pubsub: Provides a spout to read messages from redis pubsub. ○ And more...