Storm - Altamira University Presentation

Published in: Technology, Business

  1. Apache Storm: A distributed, real-time computation system. Ryan Lanman. Some content borrowed from Nathan Marz's presentation of a similar name.
  2. Objectives: (1) Their Motivation, (2) Our Motivation, (3) Storm Basics, (4) Demo
  3. Their Motivation: How Storm Came To Be
  4. What They Wanted
     • Guaranteed data processing
     • Horizontal scalability
     • Fault-tolerance
     • No intermediate message brokers!
     • Higher level abstraction than message passing
     • "Just works"
  5. Our Motivation: Why We Chose Storm
  6. Lumify Ingest: Raw Data, Text Extraction, Entity Extraction, Text Highlighting, Location Extraction, Full Text Indexing
  7. Issues
     • No Reducers
     • High DB Read/Writes
     • Batch-style processing
     • M/R Overhead
     • Zero Fault Tolerance
  8. What We Really Wanted
     • Distributed, Stream-type Processing
     • Simple Logical DAG
     • Better Fault Tolerance
  9. Storm Ingest Workflow: Documents, Raw Data, Content Sorter, Video, Images, Text Extraction, Video Frame Splitting, Image Text Extraction, Video Frame Text Extraction, Text …
  10. Storm Basics: What the heck's a Topology?
  11. Storm Cluster (diagram labels): one Nimbus, three Zookeeper nodes, five Supervisor nodes
  15. Storm Data Concepts
     • Tuples
     • Streams
     • Spouts
     • Bolts
     • Topologies
  16. Tuples
     • Single unit of data in Storm
     • Examples: Tweet, User Activity Log Entry, File Info
  17. Streams: An unbounded sequence of Tuples
  18. Spouts: Producers of Streams
  19. Bolts: Process input streams to create new streams
  20. Examples
     • Spout Examples: HDFS Filesystem Spout, Kafka Queue Spout
     • Bolt Examples: Filtering, Aggregation, DB Operations
  21. Topologies (diagram with three Spouts)
  22. Demo
  23. Demo Topology: Twitter → Twitter Hosebird Spout → Sentence Splitter → Word Count → Accumulo
  24. Demo Topology: Twitter → Twitter Hosebird Spout → Sentence Splitter → Word Count → Accumulo, with Shuffle Grouping and Field Grouping labeled on the connections
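
The demo topology on the last two slides can be modeled in a few lines of plain Python to make the data concepts concrete. This is a minimal sketch, not Storm's API and not the presenter's demo code: the Twitter Hosebird spout is replaced by an in-memory list of sentences, Accumulo by in-memory counters, and every name (`sentence_spout`, `split_bolt`, `WordCountBolt`, `run_topology`, and the two grouping helpers) is hypothetical. It assumes the shuffle grouping feeds the sentence splitter and the field grouping feeds the word counter, which is the usual arrangement for a word count but is not stated on the slides. Shuffle grouping is modeled as round-robin so the output is reproducible; Storm itself distributes tuples randomly.

```python
from collections import Counter
from itertools import cycle
from zlib import crc32


def sentence_spout(sentences):
    """Spout stand-in: emits a stream of one-field tuples, one per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(tup):
    """Bolt stand-in: consumes a sentence tuple and emits one tuple per word."""
    (sentence,) = tup
    for word in sentence.lower().split():
        yield (word,)


class WordCountBolt:
    """Bolt stand-in: one instance models one task holding its own counts."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        (word,) = tup
        self.counts[word] += 1


def shuffle_grouping(num_tasks):
    """Spread tuples evenly across tasks (round-robin here, for determinism)."""
    next_task = cycle(range(num_tasks))
    return lambda tup: next(next_task)


def fields_grouping(num_tasks, field_index=0):
    """Route tuples with the same field value to the same task via a stable hash."""
    return lambda tup: crc32(tup[field_index].encode("utf-8")) % num_tasks


def run_topology(sentences, splitter_tasks=2, counter_tasks=3):
    """Wire spout -> splitter -> counter and return (counter tasks, splitter load)."""
    to_splitter = shuffle_grouping(splitter_tasks)
    to_counter = fields_grouping(counter_tasks)
    counters = [WordCountBolt() for _ in range(counter_tasks)]
    splitter_load = [0] * splitter_tasks

    for sentence_tuple in sentence_spout(sentences):
        # The splitter is stateless, so any task can handle any sentence.
        splitter_load[to_splitter(sentence_tuple)] += 1
        for word_tuple in split_bolt(sentence_tuple):
            # The counter is stateful, so each word must always reach the same task.
            counters[to_counter(word_tuple)].execute(word_tuple)
    return counters, splitter_load


counters, load = run_topology(["the cow jumped", "the moon", "the cow"])
total = sum((c.counts for c in counters), Counter())
print(total["the"], total["cow"])  # 3 2
print(load)  # [2, 1]
```

The design point the sketch illustrates is why the two groupings differ: a stateless bolt only needs even load, while a stateful aggregation needs every tuple for a given key to land on the same task, otherwise the per-task counts for one word would be split across tasks.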
