Apache Storm
A distributed, real-time computation system

Ryan Lanman

Some content borrowed from Nathan Marz's presentation of a similar name
Objectives
1. Their Motivation
2. Our Motivation
3. Storm Basics
4. Demo
Their Motivation
How Storm Came To Be
What They Wanted
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• No intermediate message brokers!
• Higher level abstraction than message passing
• “Just works”
Our Motivation
Why We Chose Storm
Lumify Ingest
[Diagram: Raw Data, Text Extraction, Entity Extraction, Text Highlighting, Location Extraction, Full Text Indexing]
Issues
• No Reducers
• High DB Read/Writes
• Batch-style processing
• M/R Overhead
• Zero Fault Tolerance
What We Really Wanted

• Distributed, Stream-type Processing
• Simple Logical DAG
• Better Fault Tolerance
Storm Ingest Workflow
[Diagram: Raw Data, Content Sorter, Documents, Video, Images, Text Extraction, Video Frame Splitting, Image Text Extraction, Video Frame Text Extraction, Text …]
Storm Basics
What the heck’s a Topology?
Storm Cluster
[Diagram: one Nimbus node, three Zookeeper nodes, five Supervisor nodes]
Storm Data Concepts
• Tuples
• Streams
• Spouts
• Bolts
• Topologies
Tuples
• Single unit of data in Storm
• Examples
– Tweet
– User Activity Log Entry
– File Info
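A minimal sketch of the idea, in plain Python rather than the Storm API: a tuple is treated as a named list of values. The class name `StormTuple`, its `get` method, and the sample field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class StormTuple:
    """Illustrative stand-in for a tuple: field names paired with values."""
    fields: Sequence[str]
    values: Sequence[Any]

    def __post_init__(self):
        # Every field name must have exactly one value.
        if len(self.fields) != len(self.values):
            raise ValueError("each field needs exactly one value")

    def get(self, field: str) -> Any:
        # Look a value up by its field name.
        return self.values[self.fields.index(field)]


# Two of the slide's examples, with made-up field names and values.
tweet = StormTuple(("user", "text"), ("alice", "storm is fast"))
log_entry = StormTuple(("user", "action"), ("bob", "login"))
print(tweet.get("text"))
```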
Streams

An unbounded sequence of Tuples
[Diagram: a sequence of Tuples]
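One way to picture a sequence with no defined end is a Python generator that never stops on its own. This is only an analogy, not Storm code; `number_stream` and the `id` field are hypothetical names.

```python
import itertools
from typing import Iterator


def number_stream() -> Iterator[dict]:
    """Yield tuples forever; the consumer decides how many to take."""
    for n in itertools.count():
        yield {"id": n}


# A consumer can only ever look at a finite prefix of the stream.
first_three = list(itertools.islice(number_stream(), 3))
print(first_three)
```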
Spouts

Producers of Streams
[Diagram: Spout]
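The sketch below shows the producer role in plain Python, assuming an in-memory list stands in for the external source. The class `QueueSpout` and the `sentence` field are hypothetical; the method name `next_tuple` is loosely modeled on the polling style Storm spouts use, not copied from its API.

```python
from collections import deque
from typing import Iterable, Optional


class QueueSpout:
    """Illustrative spout: turns items from a source into tuples on demand."""

    def __init__(self, source: Iterable[str]):
        self._pending = deque(source)

    def next_tuple(self) -> Optional[dict]:
        # Return the next tuple, or None when nothing is available right now.
        if not self._pending:
            return None
        return {"sentence": self._pending.popleft()}


spout = QueueSpout(["storm is fast", "storm is fun"])
print(spout.next_tuple())
```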
Bolts

Process input streams to create new streams
[Diagram: Tuple in, Tuple out]
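As a rough illustration in plain Python (not the Storm API), a bolt can be pictured as a function that reads one stream and yields another. The function `split_sentences` and the field names `sentence` and `word` are hypothetical.

```python
from typing import Iterable, Iterator


def split_sentences(stream: Iterable[dict]) -> Iterator[dict]:
    """Consume sentence tuples and emit one word tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word}


sentences = [{"sentence": "storm is fast"}, {"sentence": "bolts emit"}]
words = list(split_sentences(sentences))
print(words)
```

One input tuple may produce several output tuples, or none at all for an empty sentence.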
Examples
Spout Examples
• HDFS Filesystem Spout
• Kafka Queue Spout

Bolt Examples
• Filtering
• Aggregation
• DB Operations
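Two of the bolt examples above, filtering and aggregation, can be sketched as chained Python generators. This is an assumption-laden illustration rather than Storm code: `filter_bolt`, `running_count_bolt`, and the `word` and `count` fields are hypothetical names.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator


def filter_bolt(stream: Iterable[dict], keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Filtering: pass along only the tuples the predicate accepts."""
    for tup in stream:
        if keep(tup):
            yield tup


def running_count_bolt(stream: Iterable[dict], field: str) -> Iterator[dict]:
    """Aggregation: emit the running count for each value of `field`."""
    counts = Counter()
    for tup in stream:
        key = tup[field]
        counts[key] += 1
        yield {field: key, "count": counts[key]}


words = [{"word": w} for w in ["storm", "a", "storm", "bolt"]]
long_words = filter_bolt(words, lambda t: len(t["word"]) > 1)
results = list(running_count_bolt(long_words, "word"))
print(results)
```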
Topologies
[Diagram: Spout, Spout, Spout]
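A topology can be thought of as a graph that wires spouts to bolts. The sketch below describes such a graph as a plain dictionary and, assuming the graph is acyclic as the "Simple Logical DAG" slide suggests, derives an order in which data would flow. The component names are hypothetical, and `graphlib` is Python's standard library, not part of Storm.

```python
from graphlib import CycleError, TopologicalSorter

# Each component maps to the set of components it reads from.
topology = {
    "sentence_splitter": {"tweet_spout"},
    "word_count": {"sentence_splitter"},
}


def processing_order(graph: dict) -> list:
    """Return components so that every producer precedes its consumers."""
    return list(TopologicalSorter(graph).static_order())


order = processing_order(topology)
print(order)
```

`processing_order` raises `CycleError` if the wiring contains a loop, which is a cheap sanity check before deploying a design like this.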
Demo
Demo Topology

[Diagram: Twitter → Twitter Hosebird Spout → Sentence Splitter → Word Count → Accumulo]
Demo Topology

[Diagram: the same topology, with its connections labeled Shuffle Grouping and Field Grouping]
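The two groupings named on this slide decide which task of the next bolt receives each tuple. The sketch below simulates that choice in plain Python, assuming the shuffle grouping feeds the Sentence Splitter and the field grouping feeds Word Count. It is a simplification, not Storm code: the shuffle grouping is modeled as a random pick, the field grouping as a hash of the word, and `NUM_TASKS` and the function names are hypothetical.

```python
import random
import zlib
from collections import Counter

NUM_TASKS = 2  # hypothetical number of parallel tasks per bolt


def shuffle_task(rng: random.Random) -> int:
    """Shuffle grouping (simplified): any task may receive the tuple."""
    return rng.randrange(NUM_TASKS)


def fields_task(word: str) -> int:
    """Field grouping (simplified): equal words always map to the same task."""
    return zlib.crc32(word.encode("utf-8")) % NUM_TASKS


sentences = ["storm is fast", "storm is fun"]
rng = random.Random(0)

# Spout -> Sentence Splitter: sentences are spread over the splitter tasks.
splitter_inbox = [[] for _ in range(NUM_TASKS)]
for sentence in sentences:
    splitter_inbox[shuffle_task(rng)].append(sentence)

# Sentence Splitter -> Word Count: each word is routed by its value.
counter_state = [Counter() for _ in range(NUM_TASKS)]
for inbox in splitter_inbox:
    for sentence in inbox:
        for word in sentence.split():
            counter_state[fields_task(word)][word] += 1

totals = sum(counter_state, Counter())
print(dict(totals))
```

Because every occurrence of a word lands on the same counting task, each task can keep its own counts without coordinating with the others.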

Storm - Altamira University Presentation

  1. Apache Storm: A distributed, real-time computation system. Ryan Lanman. Some content borrowed from Nathan Marz's presentation of a similar name.
  2. Objectives: 1. Their Motivation 2. Our Motivation 3. Storm Basics 4. Demo
  3. Their Motivation: How Storm Came To Be
  4. What They Wanted: • Guaranteed data processing • Horizontal scalability • Fault-tolerance • No intermediate message brokers! • Higher level abstraction than message passing • “Just works”
  5. Our Motivation: Why We Chose Storm
  6. Lumify Ingest: Raw Data, Text Extraction, Entity Extraction, Text Highlighting, Location Extraction, Full Text Indexing
  7. Issues: • No Reducers • High DB Read/Writes • Batch-style processing • M/R Overhead • Zero Fault Tolerance
  8. What We Really Wanted: • Distributed, Stream-type Processing • Simple Logical DAG • Better Fault Tolerance
  9. Storm Ingest Workflow: Documents, Raw Data, Content Sorter, Video, Images, Text Extraction, Video Frame Splitting, Image Text Extraction, Video Frame Text Extraction, Text …
  10. Storm Basics: What the heck’s a Topology?
  11. Storm Cluster: Supervisor, Zookeeper, Nimbus, Supervisor, Zookeeper, Supervisor, Zookeeper, Supervisor, Supervisor
  12. Storm Cluster: Supervisor, Zookeeper, Nimbus, Supervisor, Zookeeper, Supervisor, Zookeeper, Supervisor, Supervisor
  13. Storm Cluster: Supervisor, Zookeeper, Nimbus, Supervisor, Zookeeper, Supervisor, Zookeeper, Supervisor, Supervisor
  14. Storm Cluster: Supervisor, Zookeeper, Nimbus, Supervisor, Zookeeper, Supervisor, Zookeeper, Supervisor, Supervisor
  15. Storm Data Concepts: • Tuples • Streams • Spouts • Bolts • Topologies
  16. Tuples: • Single unit of data in Storm • Examples – Tweet – User Activity Log Entry – File Info
  17. Streams: An unbounded sequence of Tuples
  18. Spouts: Producers of Streams
  19. Bolts: Process input streams to create new streams
  20. Examples: Spout Examples • HDFS Filesystem Spout • Kafka Queue Spout. Bolt Examples • Filtering • Aggregation • DB Operations
  21. Topologies: Spout, Spout, Spout
  22. Demo
  23. Demo Topology: Twitter, Twitter Hosebird Spout, Sentence Splitter, Word Count, Accumulo
  24. Demo Topology: Twitter, Twitter Hosebird Spout, Shuffle Grouping, Field Grouping, Sentence Splitter, Word Count, Accumulo