BWB Meetup: Storm - distributed realtime computation system

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

  • Transcript

    • 1. Storm: overview. Distributed and fault-tolerant realtime computation. Backend Web Berlin
    • 2. Storm www.storm-project.net Storm is a free and open source distributed realtime computation system. September BWB Meetup
    • 3. Use cases: distributed RPC, continuous computations, stream processing
    • 4. Overview • free and open source • integrates with any queuing and database system • distributed and scalable • fault-tolerant • supports multiple languages
    • 5. Scalable Storm topologies are inherently parallel and run across a cluster of machines. Different parts of the topology can be scaled individually by tweaking their parallelism. The "rebalance" command of the "storm" command line client can adjust the parallelism of running topologies on the fly.
    • 6. Fault tolerant When workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node. The Storm daemons, Nimbus and the Supervisors, are designed to be stateless and fail-fast.
    • 7. Guarantees data processing Storm guarantees every tuple will be fully processed. One of Storm's core mechanisms is the ability to track the lineage of a tuple as it makes its way through the topology in an extremely efficient way. Messages are only replayed when there are failures. Storm's basic abstractions provide an at-least-once processing guarantee, the same guarantee you get when using a queueing system.
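      A note on how this guarantee is used in practice (not shown on the slide): the at-least-once behavior only applies when the spout emits each tuple with a message ID and reacts to the ack/fail callbacks. Below is a minimal, hedged sketch of such a spout; the pending map, UUID message IDs, and pollSource() helper are illustrative assumptions, not part of the talk.

      import java.util.Map;
      import java.util.UUID;
      import java.util.concurrent.ConcurrentHashMap;

      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;

      public class ReliableDemoSpout extends BaseRichSpout {
          private SpoutOutputCollector _collector;
          // Items awaiting an ack, keyed by message ID (illustrative bookkeeping).
          private Map<String, String> _pending;

          @Override
          public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
              _collector = collector;
              _pending = new ConcurrentHashMap<String, String>();
          }

          @Override
          public void nextTuple() {
              String item = pollSource();                   // hypothetical: read from your queue
              if (item == null) return;
              String msgId = UUID.randomUUID().toString();
              _pending.put(msgId, item);
              _collector.emit(new Values(item), msgId);     // emitting with a message ID enables tracking
          }

          @Override
          public void ack(Object msgId) {
              _pending.remove(msgId);                       // fully processed; forget it
          }

          @Override
          public void fail(Object msgId) {
              String item = _pending.remove(msgId);
              if (item != null) {
                  _collector.emit(new Values(item), msgId); // replay on failure (at-least-once)
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }

          private String pollSource() { return null; }      // placeholder for a real source
      }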
    • 8. Use with many languages Storm was designed from the ground up to be usable with any programming language. Similarly, spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate to Storm over a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, Javascript, Perl, and PHP.
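      To make the multi-language point concrete, here is a hedged sketch (not from the slides; it follows the well-known storm-starter pattern) of the JVM-side wrapper for a Python bolt. The splitsentence.py script is an assumption: it would have to ship in the topology jar's multilang resources and speak the JSON stdin/stdout protocol via the storm Python module.

      import java.util.Map;

      import backtype.storm.task.ShellBolt;
      import backtype.storm.topology.IRichBolt;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.tuple.Fields;

      // JVM-side wrapper for a non-JVM bolt; the actual processing happens in the Python script.
      public class SplitSentenceBolt extends ShellBolt implements IRichBolt {

          public SplitSentenceBolt() {
              super("python", "splitsentence.py");   // command and script executed for each task
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word"));  // output fields are still declared on the JVM side
          }

          @Override
          public Map<String, Object> getComponentConfiguration() {
              return null;                           // no per-component configuration
          }
      }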
    • 9. How Storm works? Storm cluster (diagram): one Nimbus node, a ZooKeeper cluster (three nodes), and multiple Supervisor nodes (five shown).
    • 10. How Storm works? Basic concepts Topology A topology is a graph of computation. A topology runs forever, or until you kill it. Stream A stream is an unbounded sequence of tuples. Spout A spout is a source of streams. Bolt A bolt is where calculations are done. Bolts can run functions, filter tuples, do streaming aggregations and joins, talk to databases, etc.
    • 11. How Storm works? Basic concepts Worker process A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. Executor (thread) An executor is a thread spawned by a worker process. It may run one or more tasks for the same component, and it always uses that single thread for all of its tasks. Task A task performs the actual data processing: each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology.
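      As a small illustration of how these three numbers are set (a sketch, not from the slides, using the DemoSpout and BoltA classes defined on the following slides): the parallelism hint controls the number of executors, setNumTasks the number of tasks, and Config.setNumWorkers the number of worker processes.

      import backtype.storm.Config;
      import backtype.storm.StormSubmitter;
      import backtype.storm.topology.TopologyBuilder;

      public class ParallelismSketch {
          public static void main(String[] args) throws Exception {
              Config conf = new Config();
              conf.setNumWorkers(2);                           // 2 worker processes (JVMs) for this topology

              TopologyBuilder builder = new TopologyBuilder();
              builder.setSpout("Spout", new DemoSpout(), 2);   // parallelism hint: 2 executors (threads)
              builder.setBolt("BoltA", new BoltA(), 2)         // 2 executors ...
                     .setNumTasks(4)                           // ... running 4 tasks in total (2 per executor)
                     .shuffleGrouping("Spout");

              StormSubmitter.submitTopology("parallelism-sketch", conf, builder.createTopology());
          }
      }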
    • 12. How Storm works? Basic concepts (diagram of components and their tasks): Spout (2 tasks), BoltA (3 tasks), BoltB (2 tasks), BoltC (6 tasks), BoltD (3 tasks), BoltE (2 tasks), BoltF (1 task).
    • 13. How Storm works? Topology Example
      class DemoTopology {
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("Spout", new DemoSpout(), 2).setNumTasks(2)
                 .declareDefaultStream("uid", "item").declareStream("item_copy", "uid", "item");
          builder.setBolt("BoltA", new BoltA(), 2).setNumTasks(3).shuffleGrouping("Spout", "item_copy");
          builder.setBolt("BoltB", new BoltB(), 2).setNumTasks(2).shuffleGrouping("Spout")
                 .declareDefaultStream("uid", "fromB");
          builder.setBolt("BoltC", new BoltC(), 2).setNumTasks(6).shuffleGrouping("BoltA")
                 .declareDefaultStream("uid", "fromC");
          builder.setBolt("BoltD", new BoltD(), 3).setNumTasks(3).shuffleGrouping("BoltC")
                 .fieldsGrouping("BoltC", new Fields("uid")).fieldsGrouping("BoltB", new Fields("uid"))
                 .declareStream("forE", "uid", "text").declareStream("forF", "uid", "text", "ne");
          builder.setBolt("BoltE", new BoltE(), 1).setNumTasks(2).shuffleGrouping("BoltD", "forE");
          builder.setBolt("BoltF", new BoltF(), 1).setNumTasks(1).shuffleGrouping("BoltD", "forF");
          StormSubmitter.submitTopology("demoTopology", conf, builder.createTopology());
      }
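      The declareDefaultStream/declareStream chaining above is presentation shorthand. In the standard Storm API of that time, named streams are declared inside each component's declareOutputFields and subscribed to by stream name in the grouping calls, exactly as shuffleGrouping("BoltD", "forE") does here. Below is a hedged sketch of how a BoltD-like component could declare its two streams; the class name and the routing values are illustrative, not the deck's actual code.

      import java.util.Map;

      import backtype.storm.task.OutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      // Illustrative stand-in for BoltD that declares two named output streams.
      public class StreamingBoltD extends BaseRichBolt {
          private OutputCollector _collector;

          @Override
          public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
              _collector = collector;
          }

          @Override
          public void execute(Tuple tuple) {
              String uid = tuple.getStringByField("uid");   // assumes the input stream carries a "uid" field
              // Emit to both named streams, anchored to the input tuple (routing logic is made up for the sketch).
              _collector.emit("forE", tuple, new Values(uid, "text for E"));
              _collector.emit("forF", tuple, new Values(uid, "text for F", "ne"));
              _collector.ack(tuple);
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declareStream("forE", new Fields("uid", "text"));
              declarer.declareStream("forF", new Fields("uid", "text", "ne"));
          }
      }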
    • 14. How Storm works? Spout Example
      public class DemoSpout extends BaseRichSpout {
          private SpoutOutputCollector _collector;
          private MyFavoriteQueue<String> _queue;

          @Override
          public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
              _collector = collector;
              _queue = new MyFavoriteQueue<String>();
          }

          @Override
          public void nextTuple() {
              String nextItem = _queue.poll();
              _collector.emit(new Values(nextItem));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }
      }
    • 15. How Storm works? Bolt Example
      public class BoltA extends BaseRichBolt {
          private OutputCollector _collector;

          @Override
          public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
              _collector = collector;
          }

          @Override
          public void execute(Tuple tuple) {
              Object obj = tuple.getValue(0);
              String capitalizedItem = capitalize((String) obj);
              _collector.emit(tuple, new Values(capitalizedItem));
              _collector.ack(tuple);
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }
      }
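      For bolts that, like this one, only emit tuples anchored to the input and then ack it, Storm also provides BaseBasicBolt, which handles the anchoring and acking automatically. A hedged sketch of the same bolt in that style (not from the slides; the capitalize() helper is a stand-in, as on the slide):

      import backtype.storm.topology.BasicOutputCollector;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseBasicBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      public class BasicBoltA extends BaseBasicBolt {

          @Override
          public void execute(Tuple tuple, BasicOutputCollector collector) {
              String item = tuple.getString(0);
              // Emitted tuples are anchored to the input and acked automatically when execute returns.
              collector.emit(new Values(capitalize(item)));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }

          // Stand-in for the capitalize() helper used on the slide.
          private static String capitalize(String s) {
              return (s == null || s.isEmpty()) ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
          }
      }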
    • 16. Storm UI
    • 17. Read More about Storm • Storm http://storm-project.net/ • Example Storm Topologies https://github.com/nathanmarz/storm-starter • Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/ • Understanding the Internal Message Buffers of Storm http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/ • Understanding the Parallelism of a Storm Topology http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
    • 18. Storm in our company ferret-go.com
    • 19. Ferret go GmbH Trend & Media Analytics ferret-go.com
    • 20. Our data flow (simplified) (diagram): sources: Twitter, Facebook, Google+, blogs, comments, online media, offline media, reviews; ElasticSearch storage; stages: processing, classification, analyzing.
    • 21. Problem overview • we have a number of streams that spout items (Google+, Twitter, Facebook, etc.) • for every item we do different calculations • at the end of the calculations we save the item into storage(s): ElasticSearch, PostgreSQL, etc. • if processing fails because of some environment issue, we want to re-queue the item easily • some of our calculations can be done in parallel
    • 22. Solution • Redis-based queues for spouting • 1-2 spouts per topology • 1 bulk bolt per worker for storage writing • Storm cluster with 2 nodes: 32 GB, 4-core i7 CPU, Java 7, Ubuntu 12.04 • ~20 items per sec (could be increased) • 3 slots per worker, 198 tasks, 68 executors
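      The talk does not show the Redis spout itself, so the following is only a rough sketch of what a Redis-list-backed spout could look like, assuming the Jedis client, a list key named "items", and a local Redis instance; none of these details come from the presentation.

      import java.util.Map;

      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;
      import redis.clients.jedis.Jedis;

      public class RedisQueueSpout extends BaseRichSpout {
          private SpoutOutputCollector _collector;
          private transient Jedis _jedis;   // not serializable, so created in open()

          @Override
          public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
              _collector = collector;
              _jedis = new Jedis("localhost", 6379);   // assumed host and port
          }

          @Override
          public void nextTuple() {
              String item = _jedis.lpop("items");      // pop one item from an assumed Redis list
              if (item != null) {
                  _collector.emit(new Values(item), item);   // use the item as its own message ID
              }
          }

          @Override
          public void fail(Object msgId) {
              _jedis.rpush("items", (String) msgId);   // re-queue on failure, matching the re-queue requirement
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }
      }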
    • 23. Thank you! 30.09.2013 September BWB Meetup Andrii Gakhov
