Realtime Computation with Storm

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use!

  • C - Best accessible distributed realtime computation system going; A - Learn about and start using Storm; B - You will get a great new tool in your technology stack - interesting uses
  • CEP - continuous; not HFT-grade
  • Scaling is painful; poor fault tolerance; coding is hard
  • Tweets, stock ticks, manufacturing machine data, sensor messages
  • DAG; runs continuously
  • Abstractions like Cascading, Hive, Pig make MR approachable; code size reduction
  • Kestrel - via Thrift; Kafka - transactional topologies, idempotency, process only once; ActiveMQ
  • Current architecture; data ingest tool for Hadoop (avoid Flume madness)
  • New architecture
  • Trending Topics (stream processing of the firehose); computing the 'reach' of a URL (Dist RPC)
  • Android devices, sampling geo every 5 seconds; route optimization; road tax reduction; idle alerts
  • C - Exciting times, much like Hadoop/NoSQL beginning; A - Start tinkering with Storm, integrate into your workflows; B - Be more responsive in turning data into information

Realtime Computation with Storm: Presentation Transcript

  • Realtime Computation with Storm Brad Anderson banderson@maprtech.com @boorad
  • Definition & Overview, Interoperability, Use Cases
  • Stream Processing, CEP, Distributed RPC
  • Before Storm: Queues, Workers
  • Example (simplified)
  • Storm: Guaranteed data processing; Horizontal scalability; Fault-tolerance; No intermediate message brokers!; Higher level abstraction than message passing
  • Concepts
  • streams: Unbounded sequence of tuples
  • spouts: Source of streams
  • spouts
    public interface ISpout extends Serializable {
        void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
        void close();
        void nextTuple();
        void ack(Object msgId);
        void fail(Object msgId);
    }
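The `ack(msgId)` and `fail(msgId)` callbacks are what let a spout support guaranteed processing: a common pattern is to remember each emitted message until it is acked, and to put it back on the queue when it fails. The following is a minimal plain-Java sketch of that bookkeeping only. It has no Storm dependency, and the class and method names (`ReplayingSourceSketch`, `offer`, `emitNext`) are hypothetical, not part of the Storm API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of ack/fail bookkeeping; not a Storm spout.
class ReplayingSourceSketch {
    private final Deque<String> queue = new ArrayDeque<>();      // messages waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>();   // emitted but not yet acked
    private long nextId = 0;

    // Adds a message to the back of the queue.
    void offer(String message) {
        queue.addLast(message);
    }

    // Plays the role of nextTuple(): emits one message and returns its id,
    // or null when the queue is empty.
    Long emitNext() {
        String message = queue.pollFirst();
        if (message == null) {
            return null;
        }
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    // Fully processed: forget the message.
    void ack(long id) {
        pending.remove(id);
    }

    // Processing failed: re-queue the message so it is emitted again later.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            queue.addLast(message);
        }
    }

    int queuedCount() {
        return queue.size();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSourceSketch source = new ReplayingSourceSketch();
        source.offer("tweet-1");
        Long first = source.emitNext();
        source.fail(first);                 // message goes back on the queue
        Long second = source.emitNext();    // same message, new id
        source.ack(second);                 // now it is done
        System.out.println(source.queuedCount() + " queued, " + source.pendingCount() + " pending");
    }
}
```

A real spout would typically read from an external queue such as Kestrel or Kafka instead of an in-memory deque.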
  • bolts: Processes input streams and produces new streams
  • bolts
    public class DoubleAndTripleBolt extends BaseRichBolt {
        private OutputCollector _collector;

        // Called once before processing starts; keep the collector for emitting.
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        // Emits (val*2, val*3) anchored to the input tuple, then acks the input.
        public void execute(Tuple input) {
            int val = input.getInteger(0);
            _collector.emit(input, new Values(val*2, val*3));
            _collector.ack(input);
        }

        // Names the two fields of each emitted tuple.
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("double", "triple"));
        }
    }
  • topologies: Network of spouts and bolts
  • Trident: Cascading for Storm
  • TridentTopology topology = new TridentTopology();
    TridentState wordCounts =
        topology.newStream("spout1", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
            .parallelismHint(6);
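To make the data flow of the Trident word count concrete, here is a plain-Java sketch of what split, group-by-word, and count compute on a small finite batch. It is an illustration only: it does not use Trident, the name `WordCountSketch` is hypothetical, and it assumes `Split` breaks sentences on whitespace. The real topology runs continuously over an unbounded stream and keeps its counts in persistent state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of split -> groupBy -> count on an in-memory batch.
class WordCountSketch {

    // Splits each sentence on whitespace, groups by word, and counts occurrences.
    static Map<String, Long> countWords(List<String> sentences) {
        return sentences.stream()
                .flatMap(s -> Arrays.stream(s.trim().split("\\s+")))  // sentence -> words
                .filter(w -> !w.isEmpty())                            // drop blanks from empty sentences
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store");
        System.out.println(countWords(batch));
    }
}
```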
  • Interoperability
  • spouts: Kafka (with transactions), Kestrel, JMS, AMQP, Beanstalkd
  • bolts: Functions, Filters, Aggregation, Joins, Talk to databases, Hadoop write-behind
  • [Diagram: Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  • [Diagram: Raw Data, Queue, Parallel Cluster Ingest, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  • [Diagram: Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  • [Diagram: Raw Data, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  • Use Cases
  • [Diagram: Twitter reach. Labels: URL, Tweeter, Follower, Distinct Follower, Reach]
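The reach of a URL is the number of distinct followers across everyone who tweeted it. The sketch below shows that computation in plain Java on in-memory maps. It is an assumption-laden illustration: `ReachSketch` and both lookup maps are hypothetical stand-ins for the data sources, and the parallel, distributed-RPC execution that Storm provides is not modeled here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the reach computation; not a Storm topology.
class ReachSketch {

    // URL -> tweeters -> followers -> distinct followers -> count.
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        for (String tweeter : tweetersByUrl.getOrDefault(url, Collections.emptyList())) {
            distinctFollowers.addAll(
                    followersByTweeter.getOrDefault(tweeter, Collections.emptyList()));
        }
        return distinctFollowers.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> tweetersByUrl = new HashMap<>();
        tweetersByUrl.put("http://example.com", Arrays.asList("alice", "bob"));

        Map<String, List<String>> followersByTweeter = new HashMap<>();
        followersByTweeter.put("alice", Arrays.asList("carol", "dave"));
        followersByTweeter.put("bob", Arrays.asList("dave", "erin"));

        // dave follows both tweeters but is counted once: prints 3
        System.out.println(reach("http://example.com", tweetersByUrl, followersByTweeter));
    }
}
```

The deduplication step is why the followers of every tweeter have to be brought together before counting; a follower shared by two tweeters must contribute only once.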
  • Heartbyte
  • Fleet Logistics
  • Thank you. http://github.com/{tdunning | boorad}/mapr-spout Brad Anderson banderson@maprtech.com @boorad