Storm and Cassandra
Cassandra NYC Meetup 11/5/2013
Jake Luciani (@tjake)
What is Storm?
• Distributed event processor
• Provides constructs to reliably process all events
• Simple conceptual...
Storm Concepts
Spout - Collects work and submits it to be processed.
Tracks success or failure of each tuple.

…

Tuple - ...
Storm Topologies
A directed graph of spouts and bolts connected via streams

[Diagram labels: Firehose; A-F, G-P, Q-Z; Zookeeper; Host A, Ho...]
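
To make the spout/bolt relationship concrete, here is a minimal in-process sketch in Python. It does not use Storm's API: the class names (`UrlSpout`, `LengthBolt`), the `run` loop, and the retry-on-fail policy are all assumptions made for illustration. It only shows the idea from the slides that a spout emits tuples and tracks the success or failure of each one.

```python
from collections import deque


class UrlSpout:
    """Hypothetical spout: emits one tuple per URL and tracks ack/fail."""

    def __init__(self, urls):
        self.pending = deque(urls)   # work not yet emitted
        self.in_flight = {}          # msg_id -> url, awaiting ack or fail
        self.acked = []              # urls that were fully processed
        self.next_id = 0

    def next_tuple(self):
        if not self.pending:
            return None
        url = self.pending.popleft()
        msg_id = self.next_id
        self.next_id += 1
        self.in_flight[msg_id] = url
        return msg_id, {"url": url}

    def ack(self, msg_id):
        self.acked.append(self.in_flight.pop(msg_id))

    def fail(self, msg_id):
        # Assumed policy: put the URL back on the queue so it is retried.
        self.pending.append(self.in_flight.pop(msg_id))


class LengthBolt:
    """Hypothetical bolt: processes a tuple, failing once for chosen URLs."""

    def __init__(self, fail_once=()):
        self.fail_once = set(fail_once)
        self.results = {}

    def execute(self, tup):
        url = tup["url"]
        if url in self.fail_once:
            self.fail_once.discard(url)
            raise RuntimeError("simulated processing failure")
        self.results[url] = len(url)


def run(spout, bolt):
    """Drive the spout until no work remains, reporting each outcome back."""
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            break
        msg_id, tup = emitted
        try:
            bolt.execute(tup)
        except RuntimeError:
            spout.fail(msg_id)
        else:
            spout.ack(msg_id)


spout = UrlSpout(["http://a.example", "http://b.example"])
bolt = LengthBolt(fail_once=["http://a.example"])
run(spout, bolt)
print(spout.acked)
```

The first URL fails once, is re-queued behind the second, and succeeds on its second attempt, so both end up acked. A real Storm cluster distributes this loop across hosts; the sketch only mirrors the bookkeeping.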
Example Topologies
• Track the top 10 most popular links being shared in the last N minutes.
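
The counting logic behind such a topology can be sketched on a single machine. The following Python example is an illustration only, not code from the talk: the `TopLinks` class and its method names are hypothetical, and it assumes events arrive with non-decreasing timestamps (in seconds).

```python
from collections import Counter, deque


class TopLinks:
    """Sliding-window counter: top-n links seen in the last window_seconds."""

    def __init__(self, window_seconds, n=10):
        self.window = window_seconds
        self.n = n
        self.events = deque()     # (timestamp, url) in arrival order
        self.counts = Counter()   # url -> occurrences inside the window

    def record(self, url, now):
        self.events.append((now, url))
        self.counts[url] += 1

    def _expire(self, now):
        # Drop events that fell out of the window and undo their counts.
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old_url = self.events.popleft()
            self.counts[old_url] -= 1
            if self.counts[old_url] == 0:
                del self.counts[old_url]

    def top(self, now):
        self._expire(now)
        return self.counts.most_common(self.n)


tracker = TopLinks(window_seconds=60, n=2)
tracker.record("http://a.example", now=0)
tracker.record("http://b.example", now=10)
tracker.record("http://b.example", now=20)
tracker.record("http://c.example", now=50)
print(tracker.top(now=65))
```

At time 65 the share of `http://a.example` from time 0 has aged out of the 60-second window, so only the other two links are reported. In a distributed setting each counting bolt would typically own a slice of the URL space and a downstream bolt would merge the partial rankings.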
Where does data end up?
• Storm supports built-in RPC, so client requests can effectively become a spout.
• Put the da...
Why Cassandra?
• Cassandra’s data model allows incremental modifications to rows.
• Different bolts can update differen...
Example
StormScraper!
A web crawling system built on Storm + Cassandra
http://github.com/tjake/stormscraper
StormScraper C* DataModel
CREATE TABLE scrape_list (
url text PRIMARY KEY,
last_update timestamp,
depth int
);

CREATE ...
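
One way to picture how separate writer bolts could each touch only their own part of a `pages` row is to look at the statements they might issue. The sketch below builds CQL `UPDATE` strings with `?` bind markers and does not talk to a database. The helper `partial_update` is hypothetical, and the column names are taken from the `pages` table as it appears in the full slide transcript.

```python
from datetime import datetime

# Non-key columns of the pages table, per the full slide transcript.
PAGE_COLUMNS = {"title", "html", "text", "inbound_links", "outbound_links"}


def partial_update(column, value, url, scrape_date, add_to_set=False):
    """Build a CQL UPDATE that modifies exactly one column of one pages row.

    Returns (statement, parameters). With add_to_set=True the value is
    appended to a set<text> column instead of overwriting it.
    """
    if column not in PAGE_COLUMNS:
        raise ValueError(f"unknown pages column: {column}")
    assignment = f"{column} = {column} + ?" if add_to_set else f"{column} = ?"
    cql = f"UPDATE pages SET {assignment} WHERE url = ? AND scrape_date = ?"
    return cql, (value, url, scrape_date)


when = datetime(2013, 11, 5)
url = "http://example.com/"

# Each hypothetical writer bolt issues its own independent statement.
statements = [
    partial_update("html", "<html>...</html>", url, when),
    partial_update("text", "extracted text", url, when),
    partial_update("outbound_links", {"http://example.org/"}, url, when,
                   add_to_set=True),
]
for cql, params in statements:
    print(cql)
```

Because each statement names a single column, the writes do not need to be coordinated or read the row first, which is the property the "Why Cassandra?" slide points at.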
StormScraper Topology
[Diagram built up over successive slides. Components shown: Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer, Cassandra. The final frames add a "Fail" label.]
Code Walkthrough
http://github.com/tjake/stormscraper
Storm Summary
• Powerful
• But easy to make mistakes
• Wrong tuple expectation, names, types
• Bad topology wiring
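
Both kinds of mistake can be caught before a topology is submitted if each component's declared output fields are compared with what its subscribers expect. The Python sketch below is one possible check, not something Storm provides: `check_wiring`, the component names, and the field names are all assumptions chosen to resemble the StormScraper diagram.

```python
def check_wiring(declared, expected, edges):
    """Report wiring problems in a topology description.

    declared: component name -> set of field names it emits
    expected: component name -> set of field names it reads from its input
    edges:    list of (source, target) subscriptions
    """
    problems = []
    for source, target in edges:
        if source not in declared:
            problems.append(f"{target} subscribes to unknown component {source}")
            continue
        missing = expected.get(target, set()) - declared[source]
        if missing:
            problems.append(f"{target} expects {sorted(missing)} from {source}")
    return problems


# Hypothetical field declarations, loosely modelled on the diagram.
declared = {
    "url_spout": {"url"},
    "scraper_bolt": {"url", "html"},
}
expected = {
    "scraper_bolt": {"url"},
    "text_extraction_bolt": {"url", "body"},   # "body" is never emitted
    "html_writer": {"url", "html"},
}
edges = [
    ("url_spout", "scraper_bolt"),
    ("scraper_bolt", "text_extraction_bolt"),
    ("scraper", "html_writer"),                # typo in the source name
]

for problem in check_wiring(declared, expected, edges):
    print(problem)
```

The check flags the misnamed field and the misspelled component. It covers names only; verifying field types would need extra metadata that this sketch does not model.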
Thank You!
Q&A?

Slides from talk given at the NYC Cassandra Meetup. Discussing how Storm works and how it integrates well with Apache Cassandra.

There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler.

http://github.com/tjake/stormscraper

Published in: Technology, Design
Transcript of "Storm and Cassandra "

  1. Storm and Cassandra. Cassandra NYC Meetup 11/5/2013. Jake Luciani (@tjake)
  2. What is Storm? • Distributed event processor • Provides constructs to reliably process all events • Simple conceptual model • New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
  3. Storm Concepts. Spout - Collects work and submits it to be processed. Tracks success or failure of each tuple. … Tuple - A collection of data that is passed within Storm. Bolt - Processes tuples and optionally emits more tuples. Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
  4. Storm Topologies. A directed graph of spouts and bolts connected via streams. [Diagram labels: Firehose; A-F, G-P, Q-Z; Zookeeper; Host A, Host B, Host C; Cassandra (optional)]
  5. Example Topologies • Track the top 10 most popular links being shared in the last N minutes.
  6. Where does data end up? • Storm supports built-in RPC, so client requests can effectively become a spout. • Put the data into a database… • Why Cassandra though?
  7. Why Cassandra? • Cassandra’s data model allows incremental modifications to rows. • Different bolts can update different parts of a Cassandra row asynchronously.
  8. Example
  9. StormScraper! A web crawling system built on Storm + Cassandra. http://github.com/tjake/stormscraper
  10. StormScraper C* DataModel
        CREATE TABLE scrape_list (
          url text PRIMARY KEY,
          last_update timestamp,
          depth int
        );
        CREATE TABLE pages (
          url text,
          scrape_date timestamp,
          title text,
          html text,
          text text,
          inbound_links set<text>,
          outbound_links set<text>,
          PRIMARY KEY (url, scrape_date)
        );
  11-30. StormScraper Topology (diagram built up step by step across these slides). Components shown: Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer, Cassandra. Slides 28 to 30 add a "Fail" label.
  31. Code Walkthrough. http://github.com/tjake/stormscraper
  32. Storm Summary • Powerful • But easy to make mistakes • Wrong tuple expectation, names, types • Bad topology wiring
  33. Thank You! Q&A?