
storm at twitter



  1. storm stream processing @twitter Krishna Gade Twitter @krishnagade Sunday, June 16, 13
  2. what is storm? storm is a platform for doing analysis on streams of data as they come in, so you can react to data as it happens.
  3. storm v hadoop: storm & hadoop are complementary! hadoop => big batch processing; storm => fast, reactive, real time processing.
  4. origins
      • originated at backtype, which twitter acquired in 2011.
      • built to vastly simplify dealing with queues & workers.
  5. queue-worker model [diagram: queues feeding workers]
  6. typical workflow [diagram: queues, workers, data store]
  7. problems
      • scaling is painful - queue partitioning & worker deploy.
      • operational overhead - worker failures & queue backups.
      • no guarantees on data processing.
  8. storm
  9. what does storm provide?
      • at least once message processing.
      • horizontal scalability.
      • no intermediate queues.
      • less operational overhead.
      • “just works”.
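
To make "at least once" concrete, here is a minimal plain-Java sketch of the idea: a message stays pending until it is acknowledged, and a failed message is replayed, so it may be processed more than once but is never dropped. This is a toy model under stated assumptions, not Storm's actual acking mechanism; the class `AtLeastOnceSketch`, the method `run`, and the `failOnce` parameter are all hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of at-least-once delivery; all names here are hypothetical.
class AtLeastOnceSketch {

    // Processes every message. Each message listed in failOnce fails on its
    // first attempt and is replayed. Returns all processing attempts in order.
    static List<String> run(List<String> input, Set<String> failOnce) {
        Deque<String> pending = new ArrayDeque<>(input); // not yet acked
        Set<String> failedBefore = new HashSet<>();
        List<String> attempts = new ArrayList<>();

        while (!pending.isEmpty()) {
            String msg = pending.poll();
            attempts.add(msg); // the side effect happens on every attempt
            if (failOnce.contains(msg) && failedBefore.add(msg)) {
                pending.add(msg); // failed: no ack, so the source replays it
            }
            // otherwise the message counts as acked and is forgotten
        }
        return attempts;
    }

    public static void main(String[] args) {
        List<String> attempts = run(Arrays.asList("a", "b", "c"),
                                    new HashSet<>(Arrays.asList("b")));
        System.out.println(attempts); // prints [a, b, c, b]
    }
}
```

Message "b" shows up twice in the output, which is why downstream processing under this guarantee generally needs to tolerate duplicates.
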
  10. storm primitives
      • streams
      • spouts
      • bolts
      • topologies
  11. streams: unbounded sequence of tuples [diagram: a sequence of tuples T]
  12. spouts: source of streams [diagram: stream A and stream B]
  13. typical spouts
      • read from a kestrel/kafka queue. {tuples = events}
      • read from an http server log. {tuples = http requests}
      • read from twitter streaming api. {tuples = tweets}
  14. bolts: process input stream - A, produce output stream - B [diagram: stream A in, stream B out]
  15. bolts
      • filtering tuples in a stream.
      • aggregation of tuples.
      • joining multiple streams.
      • arbitrary functions on streams.
      • communication with external caches/dbs.
  16. topology: directed-acyclic-graph of spouts and bolts. [diagram: spouts s1, s2 and bolts b1, b2, b3, b4, b5]
  17. storm cluster [diagram: topology submission to nimbus (master node); ZK; two supervisors (slave nodes), each with workers w1, w2, w3, w4; labels: topology map, sync code]
  18. nimbus
      • master node.
      • manages the topologies.
      • comparable to the job tracker in hadoop.
      $ storm jar myapp.jar com.twitter.MyTopology demo
  19. supervisor
      • runs on slave nodes.
      • co-ordinates with zookeeper.
      • manages workers.
  20. worker [diagram: a worker jvm process containing executors, each running tasks]
  21. recap
      • worker - process that executes a subset of a topology.
      • executor - a thread spawned by a worker.
      • task - performs the actual data processing.
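
The worker / executor / task hierarchy can be pictured as a simple assignment problem. The sketch below spreads a component's task ids over its executor threads as evenly as possible. The round-robin rule is an assumption made for illustration, not a statement about Storm's scheduler, and `TaskLayoutSketch` and `assign` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: spreads task ids over executor threads round-robin.
class TaskLayoutSketch {

    // Returns one list of task ids per executor, for task ids 0..numTasks-1.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t); // round-robin placement
        }
        return executors;
    }

    public static void main(String[] args) {
        // 4 tasks over 2 executors: each thread runs two tasks
        System.out.println(assign(4, 2)); // prints [[0, 2], [1, 3]]
    }
}
```
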
  22. stream grouping
      • shuffle grouping - random distribution of tuples.
      • fields grouping - groups tuples by a field, so tuples with the same field value go to the same task.
      • all grouping - replicates to all tasks.
      • global grouping - sends the entire stream to one task.
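
The four groupings differ only in how a target task is chosen for each tuple. A rough plain-Java sketch of that choice follows; it assumes tasks are numbered 0..n-1, uses `hashCode` modulo the task count for fields grouping, and picks task 0 for global grouping. These are simplifications for illustration rather than Storm's internals, and `GroupingSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative target-task selection for each grouping; not Storm's code.
class GroupingSketch {

    // shuffle grouping: any task, chosen at random
    static int shuffle(Random rnd, int numTasks) {
        return rnd.nextInt(numTasks);
    }

    // fields grouping: equal field values always map to the same task
    static int fields(Object fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // all grouping: every task receives a copy of the tuple
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            targets.add(t);
        }
        return targets;
    }

    // global grouping: the whole stream goes to one task (task 0 here)
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(fields("storm", 4) == fields("storm", 4)); // prints true
        System.out.println(all(3)); // prints [0, 1, 2]
    }
}
```
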
  23. streaming word-count
      TopologyBuilder builder = new TopologyBuilder();

      // the last argument of setSpout/setBolt is the parallelism hint
      builder.setSpout("tweet_spout", new RandomTweetSpout(), 5);

      builder.setBolt("parse_bolt", new ParseTweetBolt(), 8)
             .shuffleGrouping("tweet_spout")
             .setNumTasks(2);

      // fields grouping on "word" sends equal words to the same counting task
      builder.setBolt("count_bolt", new WordCountBolt(), 12)
             .fieldsGrouping("parse_bolt", new Fields("word"));

      Config config = new Config();
      config.setNumWorkers(3);

      StormSubmitter.submitTopology("demo", config, builder.createTopology());
  24. tweet spout
      class RandomTweetSpout extends BaseRichSpout {
          SpoutOutputCollector collector;
          Random rand;
          String[] tweets = new String[] {
              "@jkrums:There’s a plane in the Hudson. I’m on the ferry to pick up people. Crazy",
              "@barackobama: Four more years. pic.twitter.com/bAJE6Vom",
              ...
          };
          ....

          // emits one randomly chosen tweet per call, after a 100 ms pause
          @Override
          public void nextTuple() {
              Utils.sleep(100);
              String tweet = tweets[rand.nextInt(tweets.length)];
              collector.emit(new Values(tweet));
          }
      }
  25. parse bolt
      class ParseTweetBolt extends BaseBasicBolt {
          // splits each tweet on spaces and emits one tuple per word
          @Override
          public void execute(Tuple tuple, BasicOutputCollector collector) {
              String tweet = tuple.getString(0);
              for (String word : tweet.split(" ")) {
                  collector.emit(new Values(word));
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word"));
          }
      }
  26. word count bolt
      class WordCountBolt extends BaseBasicBolt {
          // running counts, held in memory by this bolt instance
          Map<String, Integer> counts = new HashMap<String, Integer>();

          // increments the count for the incoming word and emits the new total
          @Override
          public void execute(Tuple tuple, BasicOutputCollector collector) {
              String word = tuple.getString(0);
              Integer count = counts.get(word);
              count = (count == null) ? 1 : count + 1;
              counts.put(word, count);
              collector.emit(new Values(word, count));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word", "count"));
          }
      }
  27. word-count topology [diagram: RandomTweetSpout -(shuffle grouping)-> ParseTweetBolt -(fields grouping)-> WordCountBolt]
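
The word-count pipeline can be simulated without a cluster to see why the counting step uses fields grouping: each counting task keeps its own in-memory map, so a word's total is only correct if every occurrence reaches the same task. The sketch below is plain Java rather than Storm code. `LocalWordCountSketch`, `run`, and `total` are hypothetical names, and routing by `hashCode` modulo the task count is an assumption standing in for fields grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local simulation of parse -> fields grouping -> count; not Storm code.
class LocalWordCountSketch {

    // Returns one count map per simulated counting task.
    static List<Map<String, Integer>> run(List<String> tweets, int numCountTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numCountTasks; i++) {
            tasks.add(new HashMap<>());
        }
        for (String tweet : tweets) {
            for (String word : tweet.split(" ")) {             // parse step
                int task = Math.floorMod(word.hashCode(), numCountTasks);
                tasks.get(task).merge(word, 1, Integer::sum);  // count step
            }
        }
        return tasks;
    }

    // Sums a word's count over all tasks.
    static int total(List<Map<String, Integer>> tasks, String word) {
        int sum = 0;
        for (Map<String, Integer> counts : tasks) {
            sum += counts.getOrDefault(word, 0);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> tasks = run(Arrays.asList("a b a", "b a"), 3);
        System.out.println(total(tasks, "a")); // prints 3
        System.out.println(total(tasks, "b")); // prints 2
    }
}
```

Because routing depends only on the word, each word lives in exactly one task's map, and that map holds the word's full count.
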
  28. how do we run storm @twitter?
  29. storm on mesos: we run multiple instances of storm on the same cluster via mesos. mesos provides efficient resource isolation and sharing across distributed frameworks such as storm. [diagram: storm (production) and storm (dev) on mesos, across four nodes]
  30. topology isolation: the isolation scheduler solves the problem of multi-tenancy (avoiding resource contention between topologies) by providing full isolation between topologies.
  31. topology isolation
      • shared pool - multiple topologies can run on the same host.
      • isolated pool - dedicated set of hosts to run a single topology.
  32. topology isolation [diagram: storm cluster with a shared pool]
  33. topology isolation [diagram: shared pool plus isolated pools: joe’s topology]
  34. topology isolation [diagram: shared pool plus isolated pools: joe’s topology, jane’s topology]
  35. topology isolation [diagram: shared pool plus isolated pools: joe’s topology, jane’s topology, dave’s topology]
  36. topology isolation [diagram: same pools; a host is marked X: host failure]
  37. topology isolation [diagram: same pools; labels: repair host, add host]
  38. topology isolation [diagram: same pools; label: add to shared pool]
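
One possible reading of the diagram sequence above: hosts start in a shared pool, a topology is given dedicated hosts drawn from it, a failed isolated host is replaced by a spare, and the repaired host rejoins the shared pool. The sketch below models that bookkeeping in plain Java. It is an interpretation of the slides, not the isolation scheduler's implementation, and `IsolationSketch` and all of its members are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Bookkeeping model of shared and isolated host pools; illustrative only.
class IsolationSketch {
    final Deque<String> sharedPool = new ArrayDeque<>();
    final Map<String, Set<String>> isolatedPools = new HashMap<>();

    IsolationSketch(List<String> hosts) {
        sharedPool.addAll(hosts);
    }

    // Moves `count` hosts from the shared pool into the topology's own pool.
    void isolate(String topology, int count) {
        if (count > sharedPool.size()) {
            throw new IllegalStateException("not enough free hosts");
        }
        Set<String> pool =
            isolatedPools.computeIfAbsent(topology, t -> new LinkedHashSet<>());
        for (int i = 0; i < count; i++) {
            pool.add(sharedPool.poll());
        }
    }

    // Replaces a failed isolated host with a spare from the shared pool.
    void hostFailed(String topology, String host) {
        Set<String> pool = isolatedPools.get(topology);
        if (pool == null || !pool.contains(host)) {
            throw new IllegalArgumentException("host not in that pool");
        }
        if (sharedPool.isEmpty()) {
            throw new IllegalStateException("no spare host available");
        }
        pool.remove(host);
        pool.add(sharedPool.poll());
    }

    // A repaired host goes back to the shared pool.
    void hostRepaired(String host) {
        sharedPool.add(host);
    }

    public static void main(String[] args) {
        IsolationSketch cluster =
            new IsolationSketch(Arrays.asList("h1", "h2", "h3", "h4"));
        cluster.isolate("joe", 2);
        cluster.hostFailed("joe", "h1");
        cluster.hostRepaired("h1");
        System.out.println(cluster.isolatedPools.get("joe")); // prints [h2, h3]
        System.out.println(cluster.sharedPool);               // prints [h4, h1]
    }
}
```
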
  39. numbers
      • benchmarked at a million tuples processed per second per node.
      • running 30 topologies in a 200 node cluster.
      • processing 50 billion messages a day with an average complete latency under 50 ms.
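
For a sense of scale, the daily volume can be turned into an average rate. The arithmetic below assumes load is spread evenly over the day and over all 200 nodes, which real traffic will not be, so treat the results as rough averages only. `ThroughputSketch` and `perSecond` are hypothetical names.

```java
// Back-of-the-envelope averages derived from the figures on the slide.
class ThroughputSketch {

    // Average messages per second for a given daily volume (integer division).
    static long perSecond(long perDay) {
        return perDay / 86_400L; // 86,400 seconds in a day
    }

    public static void main(String[] args) {
        long clusterRate = perSecond(50_000_000_000L);
        long perNodeRate = clusterRate / 200; // assumes an even spread over 200 nodes
        System.out.println(clusterRate);  // prints 578703
        System.out.println(perNodeRate);  // prints 2893
    }
}
```

That is roughly 580 thousand messages per second across the cluster, or about 2.9 thousand per node on average.
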
  40. storm use-cases @twitter
  41. stream processing applications [diagram: twitter streams (tweets, favorites, retweets, impressions) into storm (spout, bolt, bolt); outputs: $$$$, realtime dashboards, new features]
  42. current use-cases
      • discovery of emerging topics/stories.
      • online learning of tweet features for search result ranking.
      • realtime analytics for ads.
      • internal log processing.
  43. tweet scoring pipeline [diagram labels: data streams (tweets, impressions, interactions); storm topology (join: tweets, impressions; join: tweets, interactions); graph store; metadata store; write tweet features; last 7 days of: tweet -> feature_val, feature_type, timestamp; persistent store: tweet -> feature_val, feature_type, timestamp; cassandra; twemcache; thrift service; input: tweet id; output: score]
  44. road ahead
      • auto scaling.
      • persistent bolts.
      • better grouping schemes.
      • replicated computation.
      • higher-level abstractions.
  45. companies using storm
  46. questions? krishna@twitter.com
      project: https://storm-project.net
      mailing-list: http://groups.google.com/group/storm-user
