Realtime Computation with Storm
 


Storm is a distributed realtime computation system. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use! We will talk about how Storm is architected, how it interoperates with Hadoop, and a few real-world use cases.

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation Transcript

    • Realtime Computation with Storm. Brad Anderson, banderson@maprtech.com, @boorad
    • Definition & Overview; Interoperability; Use Cases
    • Stream Processing; CEP (Complex Event Processing); Distributed RPC
    • Source Data
        • Social Media
        • Weather Data Feeds
        • Auctions of Ad Impressions
        • Network Sensors
        • App/Web Logs
        • Payment Transactions
        • Stock Tick Data
    • Before Storm: queues and workers
    • Example (simplified)
    • Storm
        • Guaranteed data processing
        • Horizontal scalability
        • Fault tolerance
        • No intermediate message brokers!
        • Higher-level abstraction than message passing
        • “Just works”
    • Concepts
    • Streams: an unbounded sequence of tuples
    • Spouts: source of streams
    • Spouts
          public interface ISpout extends Serializable {
              void open(Map conf,
                        TopologyContext context,
                        SpoutOutputCollector collector);
              void close();
              void nextTuple();
              void ack(Object msgId);
              void fail(Object msgId);
          }
    • Bolts: process input streams and produce new streams
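    • Bolts
          // Storm 0.8.x package names; later Apache Storm releases use org.apache.storm.*
          import java.util.Map;
          import backtype.storm.task.OutputCollector;
          import backtype.storm.task.TopologyContext;
          import backtype.storm.topology.OutputFieldsDeclarer;
          import backtype.storm.topology.base.BaseRichBolt;
          import backtype.storm.tuple.Fields;
          import backtype.storm.tuple.Tuple;
          import backtype.storm.tuple.Values;

          // Emits the double and triple of the integer in field 0 of each input tuple.
          public class DoubleAndTripleBolt extends BaseRichBolt {
              private OutputCollector _collector;

              @Override
              public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                  _collector = collector;
              }

              @Override
              public void execute(Tuple input) {
                  int val = input.getInteger(0);
                  // Anchor the new tuple to the input so a downstream failure fails the spout tuple.
                  _collector.emit(input, new Values(val * 2, val * 3));
                  _collector.ack(input);
              }

              @Override
              public void declareOutputFields(OutputFieldsDeclarer declarer) {
                  declarer.declare(new Fields("double", "triple"));
              }
          }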
    • Topologies: network of spouts and bolts
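    • Topologies
          TopologyBuilder builder = new TopologyBuilder();
          // The last argument is the parallelism hint for each component.
          builder.setSpout("spout", new RandomSentenceSpout(), 5);
          // Sentences are distributed randomly across the split tasks.
          builder.setBolt("split", new SplitSentence(), 8)
                 .shuffleGrouping("spout");
          // Tuples with the same "word" value always reach the same count task.
          builder.setBolt("count", new WordCount(), 12)
                 .fieldsGrouping("split", new Fields("word"));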
    • Trident: “Cascading for Storm”, a higher-level abstraction on top of Storm
    • Trident Facilities
        • Joins
        • Aggregations
        • Grouping
        • Functions
        • Filters
        • Consistent, exactly-once semantics
    • Trident word count
          TridentTopology topology = new TridentTopology();
          // Split each sentence into words, group by word, and keep a count per word in memory.
          TridentState wordCounts =
              topology.newStream("spout1", spout)
                  .each(new Fields("sentence"), new Split(), new Fields("word"))
                  .groupBy(new Fields("word"))
                  .persistentAggregate(new MemoryMapState.Factory(),
                                       new Count(),
                                       new Fields("count"))
                  .parallelismHint(6);
    • Interoperability
    • Spouts
        • Kafka (with transactions)
        • Kestrel
        • JMS
        • AMQP
        • Beanstalkd
    • Bolts
        • Functions
        • Filters
        • Aggregation
        • Joins
        • Talk to databases, Hadoop write-behind
    • [Diagram: Raw Data, Queue, Storm realtime processes, Hadoop batch processes, Apps, Business Value]
    • [Diagram: as above, with Parallel Cluster Ingest added on the Hadoop side]
    • [Diagram: as above, with TailSpout and Franz added]
    • [Diagram: as above, with the Queue removed, leaving TailSpout and Franz]
    • Use Cases
    • Twitter reach [Diagram: URL → Tweeters → Followers → Distinct followers → Reach]
    • Heartbyte
    • Fleet Logistics
    • http://github.com/{tdunning | boorad}/mapr-spout; Brad Anderson, banderson@maprtech.com, @boorad
    • Thank you.
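The "Before Storm" slide describes realtime pipelines wired together from queues and workers. The sketch below is a minimal, single-JVM illustration of that pattern. It rests on a few assumptions: `java.util.concurrent` queues stand in for external message brokers, threads stand in for worker processes, and the class `QueueWorkerPipeline` and its stages are hypothetical. Each stage has to know which queue it reads, which queue it writes, and how to pass shutdown along by hand.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical "before Storm" pipeline: hand-wired queues connect worker stages.
// In production each queue would be an external broker and each worker its own process.
class QueueWorkerPipeline {
    // Sentinel telling a worker that no more input will arrive.
    private static final String POISON = "__END__";

    static Map<String, Integer> run(List<String> sentences) {
        BlockingQueue<String> sentenceQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> wordQueue = new LinkedBlockingQueue<>();
        Map<String, Integer> counts = new HashMap<>();

        // Worker 1: take sentences, split them, and push each word onto the next queue.
        Thread splitter = new Thread(() -> {
            try {
                for (String s = sentenceQueue.take(); !s.equals(POISON); s = sentenceQueue.take()) {
                    for (String w : s.split(" ")) {
                        wordQueue.put(w);
                    }
                }
                wordQueue.put(POISON); // shutdown must be propagated by hand
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Worker 2: take words and keep a running count per word.
        Thread counter = new Thread(() -> {
            try {
                for (String w = wordQueue.take(); !w.equals(POISON); w = wordQueue.take()) {
                    counts.merge(w, 1, Integer::sum);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        splitter.start();
        counter.start();
        try {
            for (String s : sentences) {
                sentenceQueue.put(s);
            }
            sentenceQueue.put(POISON);
            splitter.join();
            counter.join(); // after join(), the counter thread's writes to counts are visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return counts;
    }
}
```

Nothing in this sketch retries a lost message or restarts a crashed worker. Guaranteed data processing and fault tolerance, as listed on the Storm slide, address exactly those gaps.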
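The Spouts slide lists the `ISpout` interface. Its `ack(msgId)` and `fail(msgId)` callbacks are what make guaranteed processing possible. A spout emits each tuple under a message ID. Storm later calls `ack` once that tuple has been fully processed, or `fail` if it was not. The self-contained sketch below mimics that contract without depending on Storm. The class `ReplayingSource` and its methods are hypothetical, and the sketch assumes a simple policy: failed messages are re-queued for replay, which gives at-least-once delivery.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a spout's bookkeeping; it mirrors ISpout's
// nextTuple/ack/fail callbacks but does not use the Storm API.
class ReplayingSource {
    private final Deque<String> ready = new ArrayDeque<>();    // messages waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>(); // emitted but not yet acked
    private long nextId = 0;

    ReplayingSource(Iterable<String> messages) {
        for (String m : messages) {
            ready.add(m);
        }
    }

    // Analogue of nextTuple(): emit one message under a fresh ID, or return null if none is ready.
    Map.Entry<Long, String> nextTuple() {
        String msg = ready.poll();
        if (msg == null) {
            return null;
        }
        long id = nextId++;
        pending.put(id, msg);
        return new AbstractMap.SimpleImmutableEntry<>(id, msg);
    }

    // Analogue of ack(msgId): the message was fully processed, so stop tracking it.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Analogue of fail(msgId): put the message back at the front so it is replayed next.
    void fail(long msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            ready.addFirst(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A real spout would typically re-emit from a durable source such as a Kafka offset rather than an in-memory deque. The bookkeeping idea is the same: hold on to every emitted message until it is acked.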
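The Topologies slide wires `split` to `count` with `fieldsGrouping` on `word`. Every tuple carrying the same word is therefore routed to the same `WordCount` task, so each task can keep correct counts for the words it owns. `shuffleGrouping`, by contrast, spreads tuples across tasks regardless of content. The plain-Java sketch below illustrates the difference under stated assumptions: hash-modulo partitioning for the fields grouping and round-robin as a stand-in for random shuffling. It conveys the routing behavior and is not necessarily Storm's exact implementation. The `Groupings` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a grouping picks the target task for each tuple.
class Groupings {
    // Shuffle-style: distribute tuples evenly, ignoring their contents (round-robin here).
    static List<Integer> shuffle(List<String> words, int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            targets.add(i % numTasks);
        }
        return targets;
    }

    // Fields-style: hash the grouping field so equal values always reach the same task.
    static List<Integer> byField(List<String> words, int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (String w : words) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            targets.add(Math.floorMod(w.hashCode(), numTasks));
        }
        return targets;
    }
}
```

With 12 tasks, matching the parallelism hint on the slide's `count` bolt, `byField` sends every occurrence of `"storm"` to a single task. `shuffle` instead spreads them out.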
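Stripped of distribution, persistence, and exactly-once bookkeeping, the Trident word count slide describes a simple data flow. It splits each sentence into words, groups by word, and counts each group. The in-memory Java sketch below shows only that flow. `MemoryMapState`, batching, and `parallelismHint` have no counterpart here, and the class name `LocalWordCount` is hypothetical. Splitting on whitespace is an assumption about what the slide's `Split` function does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-process analogue of the slide's Trident word count.
class LocalWordCount {
    static Map<String, Long> count(List<String> sentences) {
        return sentences.stream()
                // Corresponds to .each(new Fields("sentence"), new Split(), new Fields("word"))
                .flatMap(s -> Arrays.stream(s.trim().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // Corresponds to .groupBy(new Fields("word")) plus the Count aggregation
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```

Trident's value lies in everything this sketch leaves out: partitioning the groups across machines and keeping the persisted counts consistent when batches are replayed.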
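The Twitter reach slide shows a four-step computation:

1. Find every tweeter who posted a URL.
2. Gather each tweeter's followers.
3. Remove duplicates.
4. Count what remains.

This is the canonical example in Storm's distributed RPC documentation, because the follower fan-out can be far too large for a single request to handle on one machine. The single-process Java sketch below shows only the logic. The in-memory maps stand in for hypothetical tweet and follower stores, and the class `Reach` is illustrative.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process version of the reach computation; a Storm topology
// would spread the follower lookups and de-duplication across many tasks.
class Reach {
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        for (String tweeter : tweetersByUrl.getOrDefault(url, Collections.emptyList())) {
            // Followers shared by several tweeters are counted once because of the set.
            distinctFollowers.addAll(followersByTweeter.getOrDefault(tweeter, Collections.emptyList()));
        }
        return distinctFollowers.size();
    }
}
```

Suppose `alice` and `bob` both tweeted the URL, `alice` is followed by `carol` and `dave`, and `bob` by `dave` and `erin`. The reach is then 3, because `dave` is counted once.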