Twitter Storm


Slides for internal techtalk about Twitter Storm

    Twitter Storm: Presentation Transcript

    • Twitter Storm: Realtime distributed computations. Sergey Lukjanov <slukjanov@mirantis.com>, Dmitry Mescheryakov <dmescheryakov@mirantis.com>. Wednesday, October 3, 2012
    • Real-time data processing before Twitter Storm: a network of queues and workers. Message routing can be complex! [diagram: workers wired together through message queues]
    • Real-time data processing, continued: queue replication is needed for reliability; queues are hard to maintain; each new computation branch requires routing reconfiguration. [diagram: a multi-stage network of message queues]
    • Twitter Storm: distributed; fault-tolerant; real-time; computation; fail-fast components.
    • (Very) basic info: created by Nathan Marz from BackType/Twitter; Eclipse Public License 1.0; open sourced on September 19th, 2011; about 16k Java and 7k Clojure LoC; most watched Java repo on GitHub (> 4k watchers); active user group.
    • Current status: current stable release: 0.8.1; 0.8.2 with small bug fixes is already on the way; 0.9.0 with major core improvements is planned; contributions are not very active, so we could try to get involved; used by over 30 companies (Twitter, Groupon, Alibaba, GumGum, etc.).
    • Key properties: extremely broad set of use cases (stream processing; database updating; distributed RPC); scalable and extremely robust; guarantees no data loss; fault-tolerant; programming-language agnostic.
    • Key concepts: Tuples (an ordered list of elements), e.g. (“Saratov”, “slukjanov”, “event1”, “10/3/12 16:20”).
    • Key concepts: Streams (an unbounded sequence of tuples).
    • Key concepts: Spouts (sources of streams). Spouts can talk with: queues; logs; API calls; event data.
    • Key concepts: Bolts (process tuples and create new streams). In Bolts you can: apply functions/transformations; filter; aggregate; do streaming joins; access DBs, APIs, etc.
    • Key concepts: Topologies (a directed graph of Spouts and Bolts). [diagram: spouts feeding streams of tuples into a network of bolts]
    • Key concepts: Tasks (instances of spouts and bolts, e.g. Task 1 through Task 4 running in parallel).
    • Key concepts: Cluster: Nimbus (the master, analogous to Hadoop’s JobTracker), a Zookeeper ensemble, Supervisors (analogous to Hadoop’s TaskTracker), and the UI. [diagram: Nimbus coordinating Supervisors through Zookeeper nodes]
    • Based on: Apache Zookeeper (maintaining configs); ZeroMQ (transport layer); Apache Thrift (cross-language bridge, RPC); LMAX Disruptor (bounded producer-consumer queue); Kryo (serialization framework).
    • Grouping: shuffle (randomly and evenly distributed); local or shuffle (local workers are preferred); fields (the stream is partitioned by the specified fields); all (the stream is replicated across all the bolt’s tasks); global (the entire stream goes to a single bolt’s task); direct (the producer of a tuple decides which task of the consumer receives it); custom (implement the CustomStreamGrouping interface).
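A fields grouping can be pictured as hashing the selected field values and taking the result modulo the number of target tasks, so equal values always land on the same task. A minimal sketch of that idea in plain Java (the class and method names are illustrative, not Storm's internals):

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {
    // Pick a target task index from the grouped field values.
    // Equal field values always map to the same task index.
    static int chooseTask(List<Object> fieldValues, int numTasks) {
        int hash = Arrays.hashCode(fieldValues.toArray());
        return Math.floorMod(hash, numTasks); // floorMod keeps the index non-negative
    }

    public static void main(String[] args) {
        int tasks = 12;
        int t1 = chooseTask(List.of("the"), tasks);
        int t2 = chooseTask(List.of("the"), tasks);
        // The same word is always routed to the same counter task.
        System.out.println(t1 == t2); // true
    }
}
```

This is why the WordCount topology below can keep per-word counts in task-local memory: all tuples for a given word are guaranteed to arrive at the same counter task.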
    • WordCount sample: random sentence generator; sentence splitter; word counter; ping spout (metronome).
    • WordCount sample topology: SENTENCE GENERATOR -> (sentences) -> SPLITTER -> (words, grouped by word) -> COUNTER; a PING GENERATOR sends pings to every counter task, which dumps its counts to stdout or a DB. [diagram]
    • Sentence generator

public class RandSentenceGenerator extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Random random;
    private String[] sentences;

    @Override
    public void open(Map map, TopologyContext ctx, SpoutOutputCollector collector) {
        this.collector = collector;
        this.random = new Random();
        this.sentences = <sentences array>;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(10);
        String sentence = sentences[random.nextInt(sentences.length)];
        collector.emit(new Values(sentence));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
    • Sentence splitter

public class SplitSentence extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split("\\s+")) {
            collector.emit(new Values(word));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
    • Word count

public class WordCount extends BaseBasicBolt {
    private HashMultiset<String> words = HashMultiset.create();
    private Logger logger;
    private String name;
    private int task;

    @Override
    public void prepare(Map conf, TopologyContext ctx) {
        super.prepare(conf, ctx);
        this.logger = Logger.getLogger(this.getClass());
        this.name = ctx.getThisComponentId();
        this.task = ctx.getThisTaskIndex();
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String source = tuple.getSourceComponent();
        if ("split".equals(source)) {
            words.add(tuple.getString(0));
        } else if ("ping".equals(source)) {
            logger.warn("RESULT " + name + ":" + task + " :: " + words);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
    • Topology builder

public class WordCounter {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("source", new RandSentenceGenerator(), 3);
        builder.setSpout("ping", new PingSpout());
        builder.setBolt("split", new SplitSentence(), 8)
               .shuffleGrouping("source");
        builder.setBolt("count", new WordCount(), 12)
               .fieldsGrouping("split", new Fields("word"))
               .allGrouping("ping");
        <topology submitting>
    }
}
    • Topology submitter

public class WordCounter {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        <building topology>
        Config conf = new Config();
        conf.setDebug(true);
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology("tplg-name", conf, builder.createTopology());
    }
}
    • Multilang support: DSLs for Scala, JRuby and Clojure; ShellSpout, ShellBolt; JSON-based protocol: receive/emit tuples; ack/fail tuples; write to logs.
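The multilang protocol exchanges newline-delimited JSON messages (each followed by an `end` line) over the shell component's stdin/stdout. Roughly, an emit from a shell bolt looks like the fragment below; treat the exact field set as illustrative rather than authoritative:

```json
{
    "command": "emit",
    "anchors": ["1231231"],
    "stream": "default",
    "tuple": ["word", 1]
}
```

Acking a received tuple is a similar message, along the lines of {"command": "ack", "id": "1231231"}, and log lines are sent as {"command": "log", "msg": "..."}.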
    • Online logs processing: applications push log messages to a RabbitMQ queue; the Twitter Storm cluster consumes and processes them; results land in Cassandra storage. [diagram]
    • Online logs processing pipeline: QUEUE CONSUMER (reads log messages from RabbitMQ) -> shuffle grouping -> MESSAGE PARSER -> fields grouping -> EVENT AGGREGATOR -> realtime info & stats in Cassandra. [diagram]
    • Storm fault-tolerance. Parts of a Storm cluster: Zookeeper nodes; the Nimbus (master) node; Supervisor nodes.
    • Nimbus as a point of failure. When Nimbus is down: topologies continue to work; tasks from failing nodes aren’t respawned; you can’t upload a new topology or rebalance an old one. You can’t simply run Nimbus on another node: either fix the failed node, or create a new one and resubmit all topologies.
    • Tuple types: spout tuples are emitted from Spouts; child tuples are emitted from Bolts, based on parent tuple(s) (child or spout ones). Example tree: [“the cow jumped over the moon”] -> [“the”], [“cow”], [“jumped”], [“over”], [“the”], [“moon”] -> [“the”, 1], [“cow”, 1], [“jumped”, 1], [“over”, 1], [“the”, 2], [“moon”, 1].
    • Reliability API guarantees

public class QueueConsumer extends BaseRichSpout {
    ...
    @Override
    public void nextTuple() {
        Message msg = queueClient.popMessage();
        collector.emit(msg.getPayload(), msg.getId());
    }

    @Override
    public void ack(Object msgId) {
        queueClient.ack(msgId);
    }

    @Override
    public void fail(Object msgId) {
        queueClient.fail(msgId);
    }
    ...
}
    • Tuple tree tracking: spout tuple creation: collector.emit(values, msgId); child tuple creation: collector.emit(parentTuples, values); tuple finished processing: collector.ack(tuple); tuple failed to process: collector.fail(tuple).
    • Disabling the reliability API: globally: set Config.TOPOLOGY_ACKER_EXECUTORS = 0; for a whole tuple tree: emit the spout tuple without a message id (collector.emit(values) instead of collector.emit(values, msgId)); for a single tuple: emit it unanchored (collector.emit(values) instead of collector.emit(parentTuples, values)).
    • Acker system implementation: every tuple is assigned a random 64-bit ID. [diagram: Spout -> Bolt A -> Bolt B -> Bolt C exchanging tuples [1], [2], [3]; the acker observes, in order: [1] emit, [2] emit, [1] ack, [3] emit, [2] ack, [3] ack]
    • Acker, simplified algorithm: tuple tree state: { spout tuple ID -> set of pending tuple IDs }; message processing: on [tuple_id] emit: set.add(tuple_id); on [tuple_id] ack: set.remove(tuple_id); if (set.size == 0): send ack to the parent spout.
    • Acker, real algorithm: tuple tree state: { spout tuple ID -> ackVal: int64 }; message processing: on [tuple_id] emit: ackVal ^= tuple_id; on [tuple_id] ack: ackVal ^= tuple_id; if (ackVal == 0): send ack to the parent spout.
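The XOR trick works because each tuple ID is XORed into the accumulator exactly twice, once at emit and once at ack, so a fully processed tree always collapses to zero while using constant memory per spout tuple (instead of a whole set of pending IDs). A self-contained sketch of the idea (illustrative names, not Storm's acker code):

```java
import java.util.HashMap;
import java.util.Map;

public class AckerSketch {
    // One 64-bit accumulator per spout tuple.
    private final Map<Long, Long> ackVals = new HashMap<>();

    void onEmit(long spoutTupleId, long tupleId) {
        ackVals.merge(spoutTupleId, tupleId, (a, b) -> a ^ b);
    }

    // Returns true when the whole tuple tree is complete.
    boolean onAck(long spoutTupleId, long tupleId) {
        long v = ackVals.merge(spoutTupleId, tupleId, (a, b) -> a ^ b);
        if (v == 0) {
            ackVals.remove(spoutTupleId);
            return true; // would send ack to the parent spout
        }
        return false;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        long spout = 42L;
        acker.onEmit(spout, 1L);                // spout tuple [1] emitted
        acker.onEmit(spout, 2L);                // bolt A emits child [2]
        boolean d1 = acker.onAck(spout, 1L);    // [1] acked: tree not done yet
        acker.onEmit(spout, 3L);                // bolt B emits child [3]
        boolean d2 = acker.onAck(spout, 2L);
        boolean d3 = acker.onAck(spout, 3L);    // last ack: tree complete
        System.out.println(d1 + " " + d2 + " " + d3); // false false true
    }
}
```

With random 64-bit IDs, a non-complete tree XOR-ing to zero by accident is astronomically unlikely, which is what makes this a safe replacement for the set-based bookkeeping.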
    • Correctness of the tracking: a bolt fails before sending an ack for a tuple: no ack arrives before the timeout, so the spout tuple fails; the acker fails before acking the tuple tree: same as above; the spout fails before acking the message: the message source should handle the client’s death.
    • Reliability API, conclusion: easy to dismiss: one message, at-most-once processing; if used, little overhead and high durability: one message, at-least-once processing; with some further work (transactions, Trident API): one message, exactly-once processing.
    • Transactional approach, design #1: the input provides messages in strong order; each message is assigned a Transaction ID; if (curr_tx_id > prev_tx_id) commit(result, curr_tx_id). [diagram: Spout -> message/tuple -> Bolt A -> commit]
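The strong ordering is what makes commits idempotent: the state stores the last committed transaction ID next to the value, and a replayed transaction whose ID is not newer than the stored one is simply skipped. A minimal sketch of that commit gate (hypothetical class, not Storm's API):

```java
import java.util.HashMap;
import java.util.Map;

public class TxCommitGate {
    // Per key: last committed transaction ID and the accumulated value.
    private final Map<String, Long> lastTxId = new HashMap<>();
    private final Map<String, Long> counts = new HashMap<>();

    // Applied at most once per transaction ID, so replays after a failure are safe.
    void commit(String key, long delta, long txId) {
        long prev = lastTxId.getOrDefault(key, -1L);
        if (txId > prev) {
            counts.merge(key, delta, Long::sum);
            lastTxId.put(key, txId);
        }
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        TxCommitGate state = new TxCommitGate();
        state.commit("the", 2, 1);
        state.commit("the", 2, 1); // replay of tx 1 is ignored
        state.commit("the", 1, 2);
        System.out.println(state.get("the")); // 3
    }
}
```

Designs #2 and #3 below keep exactly this gate; they only change what a transaction covers (a batch instead of a single message) and how much of the work may run out of order.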
    • Transactional approach, design #2: the input provides messages in strong order; each batch of messages is assigned a Transaction ID; if (curr_tx_id > prev_tx_id) commit(result, curr_tx_id). [diagram: Spout -> batch of messages/batch of tuples -> Bolt A -> commit]
    • Transactional approach, design #3: the same as #2, but each transaction is split into a processing phase and a commit phase; processing phases may overlap for different transactions; commit phases go in strong order.
    • Trident API, intro: a high-level abstraction for doing realtime computations; high throughput (millions of messages per second); stateful stream processing; low-latency distributed querying; different semantics (including exactly-once); something like Pig or Cascading.
    • Trident API, operations: partition-local operations (no network transfer): function, filter, partitionAggregate, stateQuery, etc.; repartitioning operations (grouping); aggregation operations: aggregate, persistentAggregate; operations on grouped streams; merges and joins.
    • Trident API, demo

TridentTopology topology = new TridentTopology();
TridentState wordCounts = topology.newStream("spout1", new FixedBatchSpout())
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
    .parallelismHint(6);

topology.newDRPCStream("words")
    .each(new Fields("args"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
    .each(new Fields("count"), new FilterNull())
    .aggregate(new Fields("count"), new Sum(), new Fields("sum"));

Config config = new Config();
config.setMaxSpoutPending(100);
cluster.submitTopology("word-count-tplg", config, topology.build());

DRPCClient client = new DRPCClient("drpc.server.host", 3772);
System.out.println(client.execute("words", "cat dog the man"));
System.out.println(client.execute("words", "cat"));
// prints the JSON-encoded result, e.g.: "[[5078]]"
    • Q&A