Storm: Distributed and fault-tolerant realtime computation

Published in: Engineering, Technology

  1. Storm: Distributed and fault-tolerant realtime computation. Ferran Galí i Reniu, @ferrangali, 19/06/2014
  2. Ferran Galí i Reniu ● UPC - FIB ● Trovit ○ Hadoop ○ Lucene/Solr ○ Storm
  3. Big Data ● Too much data ○ Store ○ Compute ○ Analyse ● Distributed systems ○ Provide horizontal scalability
  4. Distributed Systems ● Hadoop [Diagram labels: File; three HDFS nodes]
  5. Distributed Systems ● Hadoop [Diagram labels: File; three nodes, each with HDFS and MapReduce]
  6. Distributed Systems ● Hadoop ○ Huge files ○ Useful for batch ○ High latency ○ No real time
  7. Storm “Storm is a distributed realtime computation system. Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!” http://storm.incubator.apache.org/
  8. Storm ● Who’s using it?
  9. Storm ● Tuple ○ Ordered list of elements ○ Any type [Diagram: a tuple holding String, Integer, Serialized Object, ...]
  10. Storm ● Stream ○ Unbounded sequence of tuples [Diagram: a sequence of tuples]
  11. Storm ● Spout ○ Source of streams ○ From data sources: Queues, API... [Diagram: a spout emitting tuples]
  12. Storm ● Bolt ○ Consumes streams ○ Does some processing (transform, join, ...) ○ Emits streams [Diagram: a bolt consuming and emitting tuples]
  13. Storm ● Topology ○ Graph of spouts & bolts ○ Runs forever
  14. Architecture [Diagram labels: Master: Nimbus; Coordinator: three Zookeeper nodes; Workers: two Supervisors, each with four Slots]
  15. Architecture [Diagram: a Supervisor with four Slots; labels: Worker process, Single JVM, Tasks - Threads]
  16. [Diagram: topology components with parallelism hints 4, 1, 2, 2, 3, 4; two Supervisors with four Slots each] Worker processes = 8
  17. [Diagram: topology components with parallelism hints 4, 1, 2, 2, 3, 4; two Supervisors] Worker processes = 8 ● Combined parallelism = 4 + 1 + 2 + 2 + 3 + 4 = 16 ● Tasks per worker = 16 / 8 = 2
  18. Example: Word Count [Diagram labels: File, FileSpout (parallelism hint = 2), SplitterBolt (parallelism hint = 3), CounterBolt (parallelism hint = 2); streams: line, word]
  19. Example: Word Count [Diagram labels: FileSpout, SplitterBolt, CounterBolt] Input text: “Storm is a distributed realtime computation system. Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!”
  20. Example: Word Count [Input text as on slide 19] ● Line emitted: “Storm is a distributed”
  21. Example: Word Count [Input text as on slide 19] ● Lines emitted: “Storm is a distributed”, “realtime computation system. Storm provides a”
  22. Example: Word Count [Input text as on slide 19] ● Lines emitted: “Storm is a distributed”, “realtime computation system. Storm provides a” ● shuffle grouping
  23. Example: Word Count [Input text as on slide 19] ● Lines emitted: “Storm is a distributed”, “realtime computation system. Storm provides a” ● shuffle grouping ● Words emitted: Storm, a, is, distributed, realtime, computation, system, provides, Storm, a
  24. Example: Word Count [Input text as on slide 19] ● Lines emitted: “Storm is a distributed”, “realtime computation system. Storm provides a” ● shuffle grouping ● Words emitted: Storm, a, is, distributed, realtime, computation, system, provides, Storm, a ● Counts shown: each of the ten words x1
  25. Example: Word Count [Input text as on slide 19] ● Lines emitted: “Storm is a distributed”, “realtime computation system. Storm provides a” ● shuffle grouping ● fields grouping ● Counts shown for eight distinct words (a, is, Storm, distributed, provides, realtime, computation, system): two at x2, six at x1
  26. Groupings ● Shuffle grouping ● Fields grouping ● All grouping ● Global grouping ● Direct grouping ● Local or shuffle grouping
  27. Fault-tolerance [Diagram labels: Nimbus; three Zookeeper nodes; two Supervisors]
  28. Fault-tolerance ● Worker dies ○ Supervisor will restart it ● Worker dies too many times ○ Nimbus will reassign it to another node ● Node dies ○ Nimbus will reassign the task to another node ● Nimbus is not a SPOF ● Nimbus & Supervisors are fail-fast
  29. Guaranteeing message processing ● Through API ○ ack ○ fail ● Manual tuple replay ○ e.g. the spout emits the message with a specific id again
  30. Guaranteeing message processing ● When is a message “fully processed”? ● Solutions ○ Transactional Topologies ○ Trident framework [Diagram: the line “Storm is a distributed” and the words Storm, is, distributed, a; results shown: Ok, Fail, Ok, Ok]
  31. Yet another example [Diagram labels: TwitterSpout, SplitterBolt, CounterBolt, CommitBolt, DB; streams: tweet, word, signal; groupings: shuffle grouping, fields grouping, all grouping] https://github.com/ferrangali/betabeers-storm
  32. Batch + Real time ● Lambda architecture [Diagram labels: New data, Batch layer, Serving] ● Batch layer ○ High latency ○ Reprocesses all data
  33. Batch + Real time ● Lambda architecture [Diagram labels: New data, Speed layer, Batch layer, Serving] ● Speed layer ○ Low latency ○ Fast & incremental algorithms ○ Eventually overridden by batch layer ● Batch layer ○ High latency ○ Reprocesses all data
  34. Storm ● Who’s using it?
  35. Trovit ● 40 countries ● 5 verticals ● Hundreds of millions of ads
  36. Trovit ● Batch layer: ○ MapReduce pipeline over HDFS [Diagram labels: kafka, xml, HDFS; stages: Filter, Enrich, Dedup, Index]
  37. Trovit ● Speed layer ○ Storm topology [Diagram labels: kafka, xml, Feeds Spout, Kafka Spout, Processor Bolt, Indexer Bolt; streams: ad, rich ad; notes: Group by index, Commit in batch every 5 minutes]
  38. Trovit [Diagram labels: kafka, xml, HDFS, Filter, Enrich, Dedup, Index, HBase, Zookeeper; streams: ad, rich ad]
  39. Questions? Ferran Galí i Reniu, @ferrangali, 19/06/2014
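
The arithmetic on slide 17 can be checked with a small sketch. This is plain Python, not Storm code: `combined_parallelism` and `spread_round_robin` are hypothetical helper names, and the round-robin spread is only an assumption used to illustrate an even distribution, not a description of how Storm's scheduler actually assigns work.

```python
from collections import Counter


def combined_parallelism(hints):
    """Sum the parallelism hints of every spout and bolt in the topology."""
    return sum(hints)


def spread_round_robin(total, num_workers):
    """Hypothetical even spread: unit i is placed on worker i % num_workers."""
    counts = Counter(i % num_workers for i in range(total))
    return [counts[w] for w in range(num_workers)]


# Values taken from slide 17.
hints = [4, 1, 2, 2, 3, 4]
num_workers = 8

total = combined_parallelism(hints)
per_worker = spread_round_robin(total, num_workers)

print(total)       # 16
print(per_worker)  # [2, 2, 2, 2, 2, 2, 2, 2]
```

With 16 units of parallelism and 8 worker processes, each worker ends up with 2, matching the figure on the slide.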

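The word count walkthrough on slides 18 to 25 can be imitated in a few lines. The sketch below is a single-process simulation in plain Python, not Storm code, and every function name in it is hypothetical. It assumes that shuffle grouping can be modelled as a random choice of target task and that fields grouping can be modelled as a stable hash of the word modulo the number of tasks; the task counts (3 splitters, 2 counters) follow the parallelism hints on slide 18.

```python
import random
import zlib
from collections import Counter


def shuffle_grouping(num_tasks, rng):
    """Model of shuffle grouping: pick any target task at random."""
    return rng.randrange(num_tasks)


def fields_grouping(value, num_tasks):
    """Model of fields grouping: equal values always map to the same task."""
    return zlib.crc32(value.encode("utf-8")) % num_tasks


def run_word_count(lines, splitter_tasks=3, counter_tasks=2, seed=0):
    rng = random.Random(seed)

    # Spout -> splitter tasks: lines are spread with shuffle grouping.
    splitters = [[] for _ in range(splitter_tasks)]
    for line in lines:
        splitters[shuffle_grouping(splitter_tasks, rng)].append(line)

    # Splitter -> counter tasks: words are routed with fields grouping,
    # so each distinct word is counted by exactly one counter task.
    counters = [Counter() for _ in range(counter_tasks)]
    for received in splitters:
        for line in received:
            for raw in line.split():
                word = raw.strip(".,!")  # drop trailing punctuation
                counters[fields_grouping(word, counter_tasks)][word] += 1
    return counters


LINES = [
    "Storm is a distributed",
    "realtime computation system. Storm provides a",
]

for task_id, counts in enumerate(run_word_count(LINES)):
    print(task_id, dict(counts))
```

Because routing depends only on the word, "Storm" and "a" each reach a single counter task and are counted twice there. If the words were shuffled instead, the two occurrences could land on different tasks and each task would report a partial count.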
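
The manual replay described on slide 29 (ack, fail, and a spout that emits a message with a specific id again) can be sketched as follows. This is a plain Python model and not the Storm API: the class `ReplayingSpout`, the driver `run`, and the `flaky` processing function are hypothetical names, and processing is assumed to be synchronous to keep the example short.

```python
from collections import deque


class ReplayingSpout:
    """Keeps every emitted message until it is acked; re-queues it on fail."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (message id, payload)
        self.pending = {}                        # emitted but not yet acked

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Fully processed: the message can be forgotten.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing failed: emit the same id again later.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(spout, process):
    """Drive the spout until nothing is left, acking or failing each message."""
    processed = []
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            return processed
        msg_id, payload = emitted
        if process(msg_id, payload):
            spout.ack(msg_id)
            processed.append(payload)
        else:
            spout.fail(msg_id)


attempts = {}


def flaky(msg_id, payload):
    """Hypothetical processing step that fails the first attempt for "is"."""
    attempts[msg_id] = attempts.get(msg_id, 0) + 1
    return not (payload == "is" and attempts[msg_id] == 1)


spout = ReplayingSpout(["Storm", "is", "a", "distributed"])
done = run(spout, flaky)
print(done)           # ['Storm', 'a', 'distributed', 'is']
print(spout.pending)  # {}
```

The failed message is replayed after the others, so every message is eventually processed but ordering is not preserved. The sketch also says nothing about messages that fan out into several tuples, which is the "fully processed" question raised on slide 30.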