Storm and Cassandra

Slides from talk given at the NYC Cassandra Meetup. Discussing how Storm works and how it integrates well with Apache Cassandra.

There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler.

http://github.com/tjake/stormscraper

Published in: Technology, Design

Transcript

  • 1. Storm and Cassandra Cassandra NYC Meetup 11/5/2013 Jake Luciani (@tjake)
  • 2. What is Storm? • Distributed event processor • Provides constructs to reliably process all events • Simple conceptual model • New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
  • 3. Storm Concepts Spout - Collects work and submits it to be processed. Tracks success or failure of each tuple. … Tuple - A collection of data that is passed within Storm. Bolt - Processes tuples and optionally emits more tuples. Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
  • 4. Storm Topologies A directed graph of spouts and bolts connected via streams. [Diagram labels: Firehose; A-F, G-P, Q-Z; Host A, Host B, Host C; Zookeeper; Cassandra (optional)]
  • 5. Example Topologies • Track the top 10 most popular links being shared in the last N minutes.
  • 6. Where does data end up? • Storm supports built-in RPC, so client requests can effectively become a spout. • Put the data into a database… • Why Cassandra though?
  • 7. Why Cassandra? • Cassandra’s data model allows incremental modifications to rows. • Different bolts can update different parts of a Cassandra row asynchronously.
  • 8. Example
  • 9. StormScraper! A web crawling system built on Storm + Cassandra: http://github.com/tjake/stormscraper
  • 10. StormScraper C* Data Model
        CREATE TABLE scrape_list (
            url text PRIMARY KEY,
            last_update timestamp,
            depth int
        );
        CREATE TABLE pages (
            url text,
            scrape_date timestamp,
            title text,
            html text,
            text text,
            inbound_links set<text>,
            outbound_links set<text>,
            PRIMARY KEY (url, scrape_date)
        );
  • 11-30. StormScraper Topology (one diagram built up step by step across these slides). Components in order of appearance: Cassandra, Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer. Slides 28-30 add a "Fail" label.
  • 31. Code Walkthrough http://github.com/tjake/stormscraper
  • 32. Storm Summary • Powerful • But easy to make mistakes • Wrong tuple expectation, names, types • Bad topology wiring
  • 33. Thank You! Q&A?
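
The concepts on slide 3 (spout, tuple, bolt, and per-tuple success/failure tracking) can be illustrated with a small plain-Python simulation. This is only a sketch of the idea, not Storm's API: the names `ListSpout`, `upper_bolt`, and `run` are hypothetical, and a real topology runs spouts and bolts on separate workers rather than in one loop.

```python
from collections import deque


class ListSpout:
    """Toy spout (hypothetical): hands out work and tracks each tuple's fate."""

    def __init__(self, items):
        self.queue = deque(enumerate(items))  # (message id, payload) pairs
        self.pending = {}   # emitted but not yet acked or failed
        self.acked = []     # payloads processed successfully
        self.failed = []    # payloads whose processing failed

    def next_tuple(self):
        """Return (message id, tuple values), or None when there is no work."""
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, (payload,)  # a tuple with a single field

    def ack(self, msg_id):
        self.acked.append(self.pending.pop(msg_id))

    def fail(self, msg_id):
        self.failed.append(self.pending.pop(msg_id))


def upper_bolt(values):
    """Toy bolt (hypothetical): consumes one tuple, emits zero or more tuples."""
    (word,) = values
    if not word:
        raise ValueError("empty word")
    return [(word.upper(),)]


def run(spout, bolt):
    """Drive the spout through the bolt, reporting success or failure back."""
    emitted = []
    while (item := spout.next_tuple()) is not None:
        msg_id, values = item
        try:
            emitted.extend(bolt(values))
            spout.ack(msg_id)
        except ValueError:
            spout.fail(msg_id)
    return emitted


spout = ListSpout(["storm", "", "cassandra"])
out = run(spout, upper_bolt)
```

In this toy run the empty string is the only tuple reported as failed; a real spout could choose to replay it.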
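
Slide 5's example, tracking the most popular links shared in the last N minutes, comes down to a sliding-window count. The sketch below shows one way a single counting bolt might keep that state in memory. The class name `RollingTopLinks` and its methods are hypothetical, timestamps are assumed to arrive in non-decreasing order, and the slides do not say how the talk's own example was implemented.

```python
from collections import Counter, deque


class RollingTopLinks:
    """Sliding-window link counter (hypothetical helper, not part of Storm)."""

    def __init__(self, window_seconds, top_n=10):
        self.window_seconds = window_seconds
        self.top_n = top_n
        self.events = deque()    # (timestamp, url), oldest first
        self.counts = Counter()  # url -> shares inside the window

    def _expire(self, now):
        # Drop events that have fallen out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, url = self.events.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]

    def record(self, url, timestamp):
        """Register one share of `url` at `timestamp` (seconds)."""
        self._expire(timestamp)
        self.events.append((timestamp, url))
        self.counts[url] += 1

    def top(self, now):
        """Return up to top_n (url, count) pairs for the current window."""
        self._expire(now)
        return self.counts.most_common(self.top_n)


tracker = RollingTopLinks(window_seconds=60, top_n=10)
tracker.record("a", 0)
tracker.record("b", 10)
tracker.record("a", 20)
tracker.record("c", 50)
early = tracker.top(now=55)  # all four shares are still inside the window
late = tracker.top(now=75)   # shares at t=0 and t=10 have expired
```

With many counting bolts in parallel, each would hold a partial count for its share of the URLs, and a further step would have to merge the partial rankings.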
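
Slide 7's point, that different bolts can update different parts of one Cassandra row, can be illustrated with an in-memory stand-in for the `pages` table from slide 10. Everything here is an assumption made for illustration: `FakePagesTable` is not a Cassandra client, the three writer functions only loosely mirror the Html, Text, and Link writers in the topology diagram, and the URL and date are made up. In CQL each call would roughly correspond to a separate `UPDATE` on the same primary key.

```python
from itertools import permutations


class FakePagesTable:
    """In-memory stand-in (hypothetical) for the pages table, keyed like its primary key."""

    def __init__(self):
        self.rows = {}  # (url, scrape_date) -> {column: value}

    def upsert(self, url, scrape_date, **columns):
        # Writes only the named columns; other columns in the row are untouched.
        self.rows.setdefault((url, scrape_date), {}).update(columns)

    def add_to_set(self, url, scrape_date, column, values):
        # Adds to a set-typed column without reading the current contents first.
        row = self.rows.setdefault((url, scrape_date), {})
        row.setdefault(column, set()).update(values)


URL = "http://example.com/"           # made-up sample data
SCRAPE_DATE = "2013-11-05T00:00:00Z"  # made-up sample data


def html_writer(table):
    table.upsert(URL, SCRAPE_DATE, html="<html><title>Hi</title></html>")


def text_writer(table):
    table.upsert(URL, SCRAPE_DATE, title="Hi", text="Hi")


def link_writer(table):
    table.add_to_set(URL, SCRAPE_DATE, "outbound_links",
                     {"http://example.com/about"})


def final_rows(writers):
    """Apply the writers in the given order and return the resulting rows."""
    table = FakePagesTable()
    for writer in writers:
        writer(table)
    return table.rows


# Every arrival order of the three writers yields the same final row.
results = [final_rows(order)
           for order in permutations([html_writer, text_writer, link_writer])]
```

Because each writer touches its own columns, no writer needs to wait for another or read the row before writing.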