Introduction to Storm
Published in: Technology
Transcript

  • 1. Storm: Distributed and fault-tolerant realtime computation system. Chandler@PyHug, previa [at] gmail.com
  • 2. Outline
      • Background
      • Why Storm
      • Components
      • Topology
      • Storm & DRPC
      • Multilang Protocol
      • Experience
  • 3. Background
  • 4. Background
      • Created by Nathan Marz @ BackType/Twitter
          – Analyzes tweets, links, and users on Twitter
      • Open-sourced in Sep 2011
          – Eclipse Public License 1.0
          – Storm 0.5.2
          – 16k Java and 7k Clojure LOC
          – Current stable release: 0.8.2
              • 0.9.0: major core improvements
  • 5. Background
      • Active user group
          – https://groups.google.com/group/storm-user
          – https://github.com/nathanmarz/storm
          – Most watched Java repo on GitHub (>4k watchers)
          – Used by over 30 companies
              • Twitter, Groupon, Alibaba, GumGum, ...
  • 6. Why Storm?
  • 7. Before Storm
  • 8. Problems
      • Scale is painful
      • Poor fault-tolerance
          – Hadoop is stateful
      • Coding is tedious
      • Batch processing
          – Long latency
          – Not realtime
  • 9. Storm
      • Scalable and robust
          – No persistent layer
      • Guarantees no data loss
      • Fault-tolerant
      • Programming language agnostic
      • Use cases
          – Stream processing
          – Distributed RPC
          – Continuous computation
  • 10. Components
  • 11. Based on
      • Apache Zookeeper
          – Distributed system, used to store metadata
      • ØMQ
          – Asynchronous message transport layer
      • Apache Thrift
          – Cross-language bridge, RPC
      • LMAX Disruptor
          – High performance queue shared by threads
      • Kryo
          – Serialization framework
  • 12. System architecture
  • 13. System architecture
      • Nimbus
          – Like the JobTracker in Hadoop
      • Supervisor
          – Manages workers
      • Zookeeper
          – Stores metadata
      • UI
          – Web UI
  • 14. Topology
  • 15. Topology
      • Tuples
          – Ordered list of elements
          – (“user”, “link”, “event”, “10/3/12 17:50”)
      • Streams
          – Unbounded sequence of tuples
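To make the two terms on this slide concrete, here is a minimal sketch in plain Python (not Storm's API). It models a tuple as a fixed-order record and a stream as a generator that never ends. The `Event` type, its field values, and the `event_stream` helper are invented for the illustration.

```python
import itertools
from collections import namedtuple

# A tuple is an ordered list of fields; the field names follow the slide's example.
Event = namedtuple("Event", ["user", "link", "event", "timestamp"])

def event_stream():
    """Hypothetical unbounded stream: yields one tuple after another, forever."""
    for n in itertools.count():
        yield Event("user%d" % (n % 3), "http://example.com/%d" % n, "click", n)

# A consumer can only ever look at a finite prefix of an unbounded stream.
first_three = list(itertools.islice(event_stream(), 3))
```

Because the stream is unbounded, anything downstream has to work incrementally, one tuple at a time, instead of waiting for the input to finish.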
  • 16. Spouts
      • Source of streams
      • Examples
          – Read from logs, API calls, event data, queues, ...
  • 17. Spout
      • Interface ISpout
          – BaseRichSpout, ClojureSpout, DRPCSpout, FeederSpout, FixedTupleSpout, MasterBatchCoordinator, NoOpSpout, RichShellSpout, RichSpoutBatchTriggerer, ShellSpout, SpoutTracker, TestPlannerSpout, TestWordSpout, TransactionalSpoutCoordinator
  • 18. Topology
      • Bolts
          – Process input streams and produce new streams
          – Examples
              • Stream joins, DBs, APIs, filters, aggregation, ...
  • 19. Bolts
      • Interface IBolt
          – BaseRichBolt, BasicBoltExecutor, BatchBoltExecutor, BoltTracker, ClojureBolt, CoordinatedBolt, JoinResult, KeyedFairBolt, NonRichBoltTracker, ReturnResults, BaseShellBolt, ShellBolt, TestAggregatesCounter, TestGlobalCount, TestPlannerBolt, TransactionalSpoutBatchExecutor, TridentBoltExecutor, TupleCaptureBolt
  • 20. Topology
      • Topology
          – A directed graph of Spouts and Bolts
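The "directed graph" idea can be sketched with a plain adjacency map. This is only an illustration in Python, not how Storm's TopologyBuilder represents a topology; the component names and the `downstream_order` helper are hypothetical.

```python
from collections import deque

# Hypothetical topology: component name -> components that consume its stream.
topology = {
    "log_spout": ["parse_bolt"],
    "parse_bolt": ["count_bolt", "filter_bolt"],
    "count_bolt": [],
    "filter_bolt": [],
}

def downstream_order(graph, source):
    """Breadth-first walk from a spout: the order in which components first see data."""
    seen, order, queue = {source}, [], deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

order = downstream_order(topology, "log_spout")
```

Spouts are the nodes with no incoming edges, and every bolt is reachable from at least one spout.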
  • 21. Tasks
      • Instances of Spouts and Bolts
      • Managed by Supervisor
          – http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
  • 22. Stream grouping
      • All grouping
          – Send to all tasks
      • Global grouping
          – Pick task with lowest id
      • Shuffle grouping
          – Pick a random task
      • Fields grouping
          – Consistent hashing on a subset of tuple fields
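The four groupings above can be sketched as small routing functions that map a tuple to the task ids that should receive it. This is a simplified Python illustration, not Storm's implementation. The function names are made up, and the fields grouping uses a plain hash modulo the task count as a stand-in for the consistent hashing named on the slide (`zlib.crc32` is used only because it is stable across runs).

```python
import random
import zlib

def all_grouping(tup, task_ids):
    """Every task receives a copy of the tuple."""
    return list(task_ids)

def global_grouping(tup, task_ids):
    """The whole stream goes to the task with the lowest id."""
    return [min(task_ids)]

def shuffle_grouping(tup, task_ids, rng=random):
    """One task chosen at random."""
    return [rng.choice(list(task_ids))]

def fields_grouping(tup, task_ids, fields):
    """Hash only the selected fields, so equal field values reach the same task."""
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    tasks = sorted(task_ids)
    return [tasks[zlib.crc32(key) % len(tasks)]]

tasks = [3, 4, 5]
click_a = {"user": "alice", "link": "http://example.com/a"}
click_b = {"user": "alice", "link": "http://example.com/b"}
same_task = fields_grouping(click_a, tasks, ["user"]) == fields_grouping(click_b, tasks, ["user"])
```

Fields grouping is what makes per-key state (such as a per-user counter) safe to keep inside a single task, since all tuples for that key arrive at the same place.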
  • 23. Storm fault-tolerance
      • Reliability API
          – Spout tuple creation
              • collector.emit(values, msgID);
          – Child tuple creation (Bolts)
              • collector.emit(parentTuples, values);
          – Tuple end of processing
              • collector.ack(tuple);
          – Tuple failed to process
              • collector.fail(tuple);
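The calls above feed a tracking mechanism that decides when a spout tuple is fully processed. Storm's documentation describes its acker as keeping, per spout tuple, an XOR of the ids of all tuples created and acked in that tuple's tree; the tree is complete when the value returns to zero. The Python class below is a toy sketch of that idea only. `AckTracker` and its method names are hypothetical, and it ignores timeouts and the distribution of ackers across tasks.

```python
import random

class AckTracker:
    """Toy stand-in for Storm's acker; not the real implementation."""

    def __init__(self):
        self.pending = {}    # msg_id -> XOR of tuple ids emitted but not yet acked
        self.completed = []  # msg_ids whose whole tuple tree was acked
        self.failed = []     # msg_ids that should be replayed by the spout

    def emit(self, msg_id, rng=random):
        """Register a new tuple (spout tuple or anchored child) in msg_id's tree."""
        tuple_id = rng.getrandbits(64) or 1  # never use 0 as an id
        self.pending[msg_id] = self.pending.get(msg_id, 0) ^ tuple_id
        return tuple_id

    def ack(self, msg_id, tuple_id):
        """XOR the id out again; zero means every tuple in the tree was acked."""
        if msg_id not in self.pending:
            return  # already completed or failed
        self.pending[msg_id] ^= tuple_id
        if self.pending[msg_id] == 0:
            del self.pending[msg_id]
            self.completed.append(msg_id)

    def fail(self, msg_id):
        """Any failure fails the whole tree."""
        self.pending.pop(msg_id, None)
        self.failed.append(msg_id)

tracker = AckTracker()
root = tracker.emit("msg-1")    # spout: emit(values, msgID)
child = tracker.emit("msg-1")   # bolt: emit(parentTuples, values)
tracker.ack("msg-1", root)      # bolt: ack(tuple) after emitting its children
done_after_root = list(tracker.completed)
tracker.ack("msg-1", child)     # last ack completes the tree
```

In this sketch a bolt must emit its children before acking the parent, otherwise the tree would look complete too early.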
  • 24. Storm fault-tolerance
      • Disable reliability API
          – Globally
              • Set Config.TOPOLOGY_ACKER_EXECUTORS to 0
          – On topology level
              • collector.emit(values); (spout emits without a msgID)
          – For a single tuple
              • collector.emit(values); (bolt emits without parentTuples, i.e. unanchored)
  • 25. Storm & DRPC
  • 26. Distributed RPC
  • 27. Multilang Protocol
  • 28. Multilang protocol
      • Uses ShellSpout/ShellBolt
      • Processes communicate over standard input/output
      • Messages are encoded as JSON or lines of plain text
  • 29. Three steps
      • Initiate a handshake
          – Keeps track of the process ID
          – Storm sends a JSON object to standard input at startup
          – Contains
              • Storm configuration, topology context, PID directory
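A rough sketch of the handshake from the script's side, in Python. It assumes the framing described in Storm's multilang documentation: each message is JSON followed by a line containing only `end`, the setup object carries `conf`, `context` and `pidDir`, and the script answers with its PID after leaving an empty file named after that PID in the PID directory. The helper names `send_msg` and `handshake` are made up, and in-memory streams stand in for the real stdin/stdout.

```python
import io
import json
import os
import tempfile

def read_msg(stream):
    """Read lines until a line containing only 'end', then parse them as JSON."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))

def send_msg(stream, obj):
    """Write one JSON message followed by the 'end' marker line."""
    stream.write(json.dumps(obj) + "\nend\n")
    stream.flush()

def handshake(stdin, stdout):
    setup = read_msg(stdin)
    pid = os.getpid()
    # Leave an empty file named after our PID so the process can be tracked.
    open(os.path.join(setup["pidDir"], str(pid)), "w").close()
    send_msg(stdout, {"pid": pid})
    return setup["conf"], setup["context"]

# Simulated exchange; a real script would pass sys.stdin and sys.stdout.
pid_dir = tempfile.mkdtemp()
fake_in = io.StringIO(json.dumps({"conf": {}, "context": {}, "pidDir": pid_dir}) + "\nend\n")
fake_out = io.StringIO()
conf, context = handshake(fake_in, fake_out)
```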
  • 30. Three steps
      • Start looping
          – storm_sync would expect storm_ack
      • Read or write tuples
          – Follow the defined structure
          – Implement read_msg(), storm_emit(), ...
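The read/write loop of a shell bolt might look roughly like the Python sketch below. The helper names are modelled on the ones listed on the slide, and the message shapes (`id` and `tuple` on input, `command: emit` and `command: ack` on output, each followed by an `end` line) follow Storm's multilang documentation as far as I know it. `run_split_bolt` and the sample tuple are hypothetical, and the sketch skips the task-id reply that Storm sends back after a non-direct emit.

```python
import io
import json

def read_msg(stream):
    """Return the next JSON message, or None when the input is closed."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    return None

def send_msg(stream, obj):
    stream.write(json.dumps(obj) + "\nend\n")

def storm_emit(stream, values, anchor_id):
    """Emit a child tuple anchored to the input tuple."""
    send_msg(stream, {"command": "emit", "anchors": [anchor_id], "tuple": values})

def storm_ack(stream, tuple_id):
    """Tell Storm the input tuple was fully handled."""
    send_msg(stream, {"command": "ack", "id": tuple_id})

def run_split_bolt(stdin, stdout):
    """Hypothetical bolt: split a sentence into words, one output tuple per word."""
    while True:
        msg = read_msg(stdin)
        if msg is None:
            break
        for word in msg["tuple"][0].split():
            storm_emit(stdout, [word], msg["id"])
        storm_ack(stdout, msg["id"])

# Simulated input; a real script would loop over sys.stdin and write to sys.stdout.
incoming = {"id": "t1", "comp": "1", "stream": "default", "task": 9, "tuple": ["hello storm"]}
fake_in = io.StringIO(json.dumps(incoming) + "\nend\n")
fake_out = io.StringIO()
run_split_bolt(fake_in, fake_out)
replies = [json.loads(chunk) for chunk in fake_out.getvalue().split("\nend\n") if chunk]
```

Anchoring each emitted word to the input tuple's id and acking only afterwards keeps the bolt inside Storm's reliability tracking.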
  • 31. Experience
  • 32. Experience
      • Not hard to set up, but
          – Beware of certain versions of Zookeeper
          – Wait a while after the topology is deployed
      • Fast
          – Better to use fabric
      • Stable
          – But beware of memory leaks
  • 33. Reference
  • 34. Reference
      • “Getting Started with Storm”, O’Reilly
      • Twitter Storm – Sergey Lukjanov@slideshare
          – http://www.slideshare.net/lukjanovsv/twitter-storm
      • Storm – nathanmarz@slideshare
          – http://www.slideshare.net/nathanmarz/storm-11164672
      • Realtime Analytics with Storm and Hadoop – Hadoop_Summit@slideshare
          – http://www.slideshare.net/Hadoop_Summit/realtime-analytics-with-storm
  • 35. Q/A
  • 36. Thanks
