Storm: distributed and fault-tolerant realtime computation
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Storm: distributed and fault-tolerant realtime computation

on

  • 91,602 views

 

Statistics

Views

Total Views
91,602
Views on SlideShare
82,841
Embed Views
8,761

Actions

Likes
176
Downloads
2,244
Comments
4

92 Embeds 8,761

http://www.readwriteweb.com 4646
http://readwrite.com 1694
http://1.bigdata.sinaapp.com 580
http://architecturalatrocities.com 570
http://paper.li 151
http://lanyrd.com 129
http://irr.posterous.com 124
http://feeds.feedburner.com 107
http://gaurko-haurrak.blogspot.com 75
http://m.readwriteweb.com 68
http://willolbrys.com 57
http://feedproxy.google.com 56
http://allenjoe.net 54
http://a0.twimg.com 35
http://bigdata.sinaapp.com 28
http://us-w1.rockmelt.com 28
http://www.linkedin.com 23
http://www.cnblogs.com 23
http://webcache.googleusercontent.com 23
http://snstouch.com 20
http://twitter.com 17
https://twitter.com 17
http://www.hanrss.com 14
http://mrmcfarlen.blogspot.com 13
http://chrisnews-unknown.blogspot.com 12
http://www.twylah.com 11
http://wiki.kthcorp.com 10
http://potap-m.com 9
http://nuevospowerpoints.blogspot.com 9
http://oschuba.cedar.buffalo.edu 8
http://semanticweb.collected.info 8
http://translate.googleusercontent.com 6
http://blog.fasoulas.com 6
http://www.newsisfree.com 6
http://presentworth.blogspot.com 5
http://readwriteweb.com 5
http://www.techgig.com 5
http://technohidelic.posterous.com 5
http://mobilemarketing.collected.info 4
http://socialmediasuperstars.collected.info 4
http://zizindrin.com 4
http://spacelounge.collected.info 4
http://digitalapprentices.collected.info 4
http://www.blogger.com 4
http://edu.ashdod.muni.il 4
http://elfarodemedianoche.blogspot.com 4
http://futureweb.collected.info 3
http://aitao.collected.info 3
http://local.einstein.golfe 3
http://ideafarming.collected.info 3
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Storm: distributed and fault-tolerant realtime computation Presentation Transcript

  • 1. StormDistributed and fault-tolerant realtime computation Nathan Marz Twitter
  • 2. Storm at Twitter Twitter Web Analytics
  • 3. Before StormQueues Workers
  • 4. Example (simplified)
  • 5. ExampleWorkers schemify tweets and append to Hadoop
  • 6. ExampleWorkers update statistics on URLs byincrementing counters in Cassandra
  • 7. ExampleDistribute tweets randomly on multiple queues
  • 8. ExampleWorkers share the load of schemifying tweets
  • 9. ExampleDesire all updates for same URL go to same worker
  • 10. Message locality• Because: • No transactions in Cassandra (and no atomic increments at the time) • More effective batching of updates
  • 11. Implementing message locality• Have a queue for each consuming worker• Choose queue for a URL using consistent hashing
  • 12. ExampleWorkers choose queue to enqueue to using hash/mod of URL
  • 13. Example All updates for same URLguaranteed to go to same worker
  • 14. Adding a worker
  • 15. Adding a worker DeployReconfigure/redeploy
  • 16. Problems• Scaling is painful• Poor fault-tolerance• Coding is tedious
  • 17. What we want• Guaranteed data processing• Horizontal scalability• Fault-tolerance• No intermediate message brokers!• Higher level abstraction than message passing• “Just works”
  • 18. StormGuaranteed data processingHorizontal scalabilityFault-toleranceNo intermediate message brokers!Higher level abstraction than message passing“Just works”
  • 19. Use cases Stream Distributed Continuousprocessing RPC computation
  • 20. Storm Cluster
  • 21. Storm ClusterMaster node (similar to Hadoop JobTracker)
  • 22. Storm ClusterUsed for cluster coordination
  • 23. Storm Cluster Run worker processes
  • 24. Starting a topology
  • 25. Killing a topology
  • 26. Concepts• Streams• Spouts• Bolts• Topologies
  • 27. StreamsTuple Tuple Tuple Tuple Tuple Tuple Tuple Unbounded sequence of tuples
  • 28. SpoutsSource of streams
  • 29. Spout examples• Read from Kestrel queue• Read from Twitter streaming API
  • 30. BoltsProcesses input streams and produces new streams
  • 31. Bolts• Functions• Filters• Aggregation• Joins• Talk to databases
  • 32. TopologyNetwork of spouts and bolts
  • 33. TasksSpouts and bolts execute asmany tasks across the cluster
  • 34. Stream groupingWhen a tuple is emitted, which task does it go to?
  • 35. Stream grouping• Shuffle grouping: pick a random task• Fields grouping: consistent hashing on a subset of tuple fields• All grouping: send to all tasks• Global grouping: pick task with lowest id
  • 36. Topologyshuffle [“id1”, “id2”] shuffle[“url”] shuffle all
  • 37. Streaming word countTopologyBuilder is used to construct topologies in Java
  • 38. Streaming word countDefine a spout in the topology with parallelism of 5 tasks
  • 39. Streaming word countSplit sentences into words with parallelism of 8 tasks
  • 40. Streaming word countConsumer decides what data it receives and how it gets groupedSplit sentences into words with parallelism of 8 tasks
  • 41. Streaming word count Create a word count stream
  • 42. Streaming word count splitsentence.py
  • 43. Streaming word count
  • 44. Streaming word count Submitting topology to a cluster
  • 45. Streaming word count Running topology in local mode
  • 46. Demo
  • 47. Traditional data processing
  • 48. Traditional data processing Intense processing (Hadoop, databases, etc.)
  • 49. Traditional data processingLight processing on a single machine to resolve queries
  • 50. Distributed RPCDistributed RPC lets you do intense processing at query-time
  • 51. Game changer
  • 52. Distributed RPCData flow for Distributed RPC
  • 53. DRPC ExampleComputing “reach” of a URL on the fly
  • 54. ReachReach is the number of unique people exposed to a URL on Twitter
  • 55. Computing reach Follower Distinct Tweeter Follower follower Follower DistinctURL Tweeter follower Count Reach Follower Follower Distinct Tweeter follower Follower
  • 56. Reach topology
  • 57. Guaranteeing message processing “Tuple tree”
  • 58. Guaranteeing message processing• A spout tuple is not fully processed until all tuples in the tree have been completed
  • 59. Guaranteeing message processing• If the tuple tree is not completed within a specified timeout, the spout tuple is replayed
  • 60. Guaranteeing message processing Reliability API
  • 61. Guaranteeing message processing“Anchoring” creates a new edge in the tuple tree
  • 62. Guaranteeing message processing Marks a single node in the tree as complete
  • 63. Guaranteeing message processing• Storm tracks tuple trees for you in an extremely efficient way
  • 64. Storm UI
  • 65. Storm UI
  • 66. Storm UI
  • 67. Storm on EC2https://github.com/nathanmarz/storm-deploy One-click deploy tool
  • 68. Documentation
  • 69. State spout (almost done) Synchronize a large amount of frequently changing state into a topology
  • 70. State spout (almost done)Optimizing reach topology by eliminating the database calls
  • 71. State spout (almost done) Each GetFollowers task keeps a synchronous cache of a subset of the social graph
  • 72. State spout (almost done)This works because GetFollowers repartitions the social graph the same way it partitions GetTweeter’s stream
  • 73. Future work• Storm on Mesos• “Swapping”• Auto-scaling• Higher level abstractions
  • 74. Questions?http://github.com/nathanmarz/storm
  • 75. What Storm does• Distributes code and configurations• Robust process management• Monitors topologies and reassigns failed tasks• Provides reliability by tracking tuple trees• Routing and partitioning of streams• Serialization• Fine-grained performance stats of topologies