Storm - Altamira University Presentation

Transcript

  • 1. Apache Storm: A distributed, real-time computation system. Ryan Lanman. Some content borrowed from Nathan Marz's presentation of a similar name.
  • 2. Objectives
      1. Their Motivation
      2. Our Motivation
      3. Storm Basics
      4. Demo
  • 3. Their Motivation: How Storm Came To Be
  • 4. What They Wanted
      • Guaranteed data processing
      • Horizontal scalability
      • Fault-tolerance
      • No intermediate message brokers!
      • Higher level abstraction than message passing
      • “Just works”
  • 5. Our Motivation: Why We Chose Storm
  • 6. Lumify Ingest (diagram): Raw Data, Text Extraction, Entity Extraction, Text Highlighting, Location Extraction, Full Text Indexing
  • 7. Issues
      • No Reducers
      • High DB Read/Writes
      • Batch-style processing
      • M/R Overhead
      • Zero Fault Tolerance
  • 8. What We Really Wanted
      • Distributed, Stream-type Processing
      • Simple Logical DAG
      • Better Fault Tolerance
  • 9. Storm Ingest Workflow (diagram): Documents, Raw Data, Content Sorter, Video, Images, Text Extraction, Video Frame Splitting, Image Text Extraction, Video Frame Text Extraction, Text …
  • 10. Storm Basics: What the heck's a Topology?
  • 11-14. Storm Cluster (same diagram repeated across four slides): one Nimbus node, three Zookeeper nodes, five Supervisor nodes
  • 15. Storm Data Concepts
      • Tuples
      • Streams
      • Spouts
      • Bolts
      • Topologies
  • 16. Tuples
      • Single unit of data in Storm
      • Examples
          – Tweet
          – User Activity Log Entry
          – File Info
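
A tuple can be pictured as a small record with named fields. The sketch below is plain Python, not the Storm API, and every field name in it (`user`, `text`, `action`, `timestamp`, `path`, `size_bytes`) is a hypothetical choice made for illustration; the slides do not specify the fields.

```python
from collections import namedtuple

# Hypothetical shapes for the three example tuples named on the slide.
Tweet = namedtuple("Tweet", ["user", "text"])
ActivityLogEntry = namedtuple("ActivityLogEntry", ["user", "action", "timestamp"])
FileInfo = namedtuple("FileInfo", ["path", "size_bytes"])

# Each instance is one unit of data that would flow through a stream.
tweet = Tweet(user="alice", text="storm is fast")
entry = ActivityLogEntry(user="alice", action="login", timestamp=1700000000)
info = FileInfo(path="/data/report.pdf", size_bytes=2048)

# Values can be read by field name or by position.
print(tweet.text, tweet[1])
print(entry._fields)
print(info.size_bytes)
```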
  • 17. Streams: An unbounded sequence of Tuples
  • 18. Spouts: Producers of Streams
  • 19. Bolts: Process input streams to create new streams
  • 20. Examples
      • Spout Examples
          – HDFS Filesystem Spout
          – Kafka Queue Spout
      • Bolt Examples
          – Filtering
          – Aggregation
          – DB Operations
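
The spout and bolt roles can be sketched with Python generators. This is an illustration of the concepts only, not Storm code: the function names `sentence_spout`, `filter_bolt`, and `count_bolt` are hypothetical, and a real spout would read from a source such as HDFS or Kafka rather than an in-memory list.

```python
from collections import Counter

def sentence_spout(sentences):
    """Spout: produces a stream of one-field tuples."""
    for sentence in sentences:
        yield (sentence,)

def filter_bolt(stream, min_words):
    """Filtering bolt: passes through only sentences with enough words."""
    for (sentence,) in stream:
        if len(sentence.split()) >= min_words:
            yield (sentence,)

def count_bolt(stream):
    """Aggregation bolt: emits (word, running_count) for every word seen."""
    counts = Counter()
    for (sentence,) in stream:
        for word in sentence.split():
            counts[word] += 1
            yield (word, counts[word])

def run_example():
    source = ["storm processes streams", "hi", "streams of tuples"]
    # Bolts consume one stream and create a new one, so they chain naturally.
    return list(count_bolt(filter_bolt(sentence_spout(source), min_words=2)))

if __name__ == "__main__":
    for word, count in run_example():
        print(word, count)
```

Each bolt takes a stream and returns a new stream, which is what lets spouts and bolts be wired into a larger topology.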
  • 21. Topologies (diagram with three Spouts)
  • 22. Demo
  • 23. Demo Topology: Twitter -> Twitter Hosebird Spout -> Sentence Splitter -> Word Count -> Accumulo
  • 24. Demo Topology: Twitter -> Twitter Hosebird Spout -> (Shuffle Grouping) -> Sentence Splitter -> (Field Grouping) -> Word Count -> Accumulo
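
The two groupings in the demo topology decide which task of the next bolt receives each tuple. The sketch below simulates that routing in plain Python rather than with the Storm API. It assumes two splitter tasks and three counter tasks (numbers chosen for illustration), uses a CRC32 hash as a stand-in for whatever hash Storm applies, and leaves out the Twitter spout and the Accumulo write.

```python
import random
import zlib
from collections import Counter

NUM_SPLITTERS = 2  # hypothetical parallelism of the Sentence Splitter bolt
NUM_COUNTERS = 3   # hypothetical parallelism of the Word Count bolt

def shuffle_grouping(rng, num_tasks):
    """Shuffle grouping: pick a task at random to spread the load."""
    return rng.randrange(num_tasks)

def fields_grouping(word, num_tasks):
    """Field grouping: the same field value always maps to the same task."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(word.encode("utf-8")) % num_tasks

def run_topology(sentences, seed=0):
    rng = random.Random(seed)
    # Spout -> Sentence Splitter, routed by shuffle grouping.
    splitter_inbox = [[] for _ in range(NUM_SPLITTERS)]
    for sentence in sentences:
        splitter_inbox[shuffle_grouping(rng, NUM_SPLITTERS)].append(sentence)
    # Sentence Splitter -> Word Count, routed by field grouping on the word.
    counters = [Counter() for _ in range(NUM_COUNTERS)]
    for inbox in splitter_inbox:
        for sentence in inbox:
            for word in sentence.lower().split():
                counters[fields_grouping(word, NUM_COUNTERS)][word] += 1
    return counters

if __name__ == "__main__":
    for task_id, counts in enumerate(run_topology(["the cat", "the dog", "the cat sat"])):
        print(task_id, dict(counts))
```

Because every occurrence of a word lands on the same counter task, each task can keep a local count without coordinating with the others. Sentences carry no such constraint, so they can be spread at random.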