Storm and Cassandra

Slides from a talk given at the NYC Cassandra Meetup, discussing how Storm works and how well it integrates with Apache Cassandra.

There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler.

http://github.com/tjake/stormscraper

Transcript

  • 1. Storm and Cassandra • Cassandra NYC Meetup, 11/5/2013 • Jake Luciani (@tjake)
  • 2. What is Storm? • Distributed event processor • Provides constructs to reliably process all events • Simple conceptual model • New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
  • 3. Storm Concepts • Spout - Collects work and submits it to be processed. Tracks success or failure of each tuple. … • Tuple - A collection of data that is passed within Storm. • Bolt - Processes tuples and optionally emits more tuples. • Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
  • 4. Storm Topologies • A directed graph of spouts and bolts connected via streams [Diagram labels: Firehose, A-F, G-P, Q-Z, Host A, Host B, Host C, Zookeeper, Cassandra (optional)]
  • 5. Example Topologies • Track the top 10 most popular links being shared in the last N minutes.
  • 6. Where does data end up? • Storm supports built-in RPC (DRPC), so client requests can effectively become a spout. • Put the data into a database… • Why Cassandra, though?
  • 7. Why Cassandra? • Cassandra’s data model allows incremental modifications to rows: individual columns can be written without rewriting the whole row. • Different bolts can update different parts of a Cassandra row asynchronously.
  • 8. Example
  • 9. StormScraper • A web crawling system built on Storm + Cassandra • http://github.com/tjake/stormscraper
  • 10. StormScraper C* Data Model

        CREATE TABLE scrape_list (
            url text PRIMARY KEY,
            last_update timestamp,
            depth int
        );

        CREATE TABLE pages (
            url text,
            scrape_date timestamp,
            title text,
            html text,
            text text,
            inbound_links set<text>,
            outbound_links set<text>,
            PRIMARY KEY (url, scrape_date)
        );

  • 11. StormScraper Topology
  • 12. StormScraper Topology [diagram: Cassandra]
  • 13-15. StormScraper Topology [diagram: Url Spout, Cassandra]
  • 16-18. StormScraper Topology [diagram: Url Spout, Scraper Bolt, Cassandra]
  • 19. StormScraper Topology [diagram: Url Spout, Scraper Bolt, Html Writer, Cassandra]
  • 20. StormScraper Topology [diagram: adds Link Writer]
  • 21. StormScraper Topology [diagram: adds Text Extraction Bolt]
  • 22-27. StormScraper Topology [diagram: adds Text Writer]
  • 28-30. StormScraper Topology [diagram: same components, with a "Fail" label added]
  • 31. Code Walkthrough • http://github.com/tjake/stormscraper
  • 32. Storm Summary • Powerful • But easy to make mistakes • Wrong tuple expectations (field names, types) • Bad topology wiring
  • 33. Thank You! Q&A?
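Slide 2 says Storm provides constructs to reliably process all events. Storm's documentation explains the mechanism as an "acker" that keeps one 64-bit value per spout tuple. The id of every tuple in that tuple's tree is XORed into the value twice: once when the tuple is created and once when it is acked. The value therefore returns to zero when the whole tree has been processed, barring a vanishingly unlikely id collision. The plain-Python sketch below models only that bookkeeping. The class and method names are hypothetical, and this is not Storm's code.

```python
import random


class Acker:
    """Toy model of XOR-based tuple-tree tracking (hypothetical names)."""

    def __init__(self):
        self.pending = {}    # root id -> running XOR of outstanding tuple ids
        self.completed = []  # root ids whose whole tuple tree was acked
        self.failed = []     # root ids that failed and should be replayed

    @staticmethod
    def new_id():
        # Random nonzero 64-bit ids make accidental cancellation very unlikely.
        return random.getrandbits(64) or 1

    def init(self, root_id):
        # The spout emits a root tuple: its id is XORed in once.
        self.pending[root_id] = root_id

    def ack(self, root_id, input_id, child_ids=()):
        # A bolt acks its input and reports the anchored children it emitted.
        # Each id is XORed in twice overall (on creation and on ack), so the
        # running value returns to zero once the tree is fully processed.
        if root_id not in self.pending:
            return
        val = self.pending[root_id] ^ input_id
        for cid in child_ids:
            val ^= cid
        if val == 0:
            del self.pending[root_id]
            self.completed.append(root_id)
        else:
            self.pending[root_id] = val

    def fail(self, root_id):
        # An explicit failure (or a timeout, not modeled here) means replay.
        if self.pending.pop(root_id, None) is not None:
            self.failed.append(root_id)


if __name__ == "__main__":
    acker = Acker()
    root = Acker.new_id()
    acker.init(root)
    a, b = Acker.new_id(), Acker.new_id()
    acker.ack(root, root, (a, b))  # first bolt acks the root, emits a and b
    acker.ack(root, a)             # a downstream bolt acks a
    print(root in acker.pending)   # b is still outstanding
    acker.ack(root, b)
    print(acker.completed == [root])
```

The XOR trick lets the acker track an arbitrarily large tuple tree in constant memory per spout tuple. The trade-off is that it cannot say *which* tuple is missing, only that the tree is incomplete.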
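The vocabulary on slide 3 (spout, tuple, bolt, stream) can be made concrete with a toy, single-process model in plain Python. This is not Storm's Java API. A stream declares its field names and tuples are checked against them. A spout emits tuples and records which ones succeeded or failed. A bolt processes a tuple and optionally emits more. All class, stream, and field names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    fields: tuple  # declared field names every tuple on this stream carries


def make_tuple(stream, *values):
    # A stream forces tuples to have its declared structure.
    if len(values) != len(stream.fields):
        raise ValueError(f"stream {stream.name!r} expects fields {stream.fields}")
    return dict(zip(stream.fields, values))


class UrlSpout:
    """Collects work, emits it as tuples, and tracks success or failure."""

    out = Stream("urls", ("url",))

    def __init__(self, urls):
        self.pending = list(urls)
        self.acked, self.failed = [], []

    def next_tuple(self):
        return make_tuple(self.out, self.pending.pop(0)) if self.pending else None

    def ack(self, tup):
        self.acked.append(tup["url"])

    def fail(self, tup):
        # A real spout would typically re-emit the work later.
        self.failed.append(tup["url"])


class LengthBolt:
    """Processes a tuple and emits a new tuple on its own declared stream."""

    out = Stream("lengths", ("url", "length"))

    def execute(self, tup):
        return [make_tuple(self.out, tup["url"], len(tup["url"]))]


def run(spout, bolt):
    # Feed every spout tuple to the bolt, then ack or fail it at the spout.
    emitted = []
    while (tup := spout.next_tuple()) is not None:
        try:
            emitted.extend(bolt.execute(tup))
            spout.ack(tup)
        except Exception:
            spout.fail(tup)
    return emitted
```

For example, `run(UrlSpout(["http://a.example"]), LengthBolt())` emits one tuple on the `lengths` stream and records the URL as acked at the spout.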
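In a topology like the one on slide 4, stream groupings decide which task receives each tuple. With a fields grouping, Storm sends every tuple with the same value of the grouped field(s) to the same task, so a bolt can keep per-key state locally. The diagram labels on that slide (A-F, G-P, Q-Z across three hosts) suggest key-range routing. The plain-Python sketch below shows that range idea and a hash-based variant side by side. The ranges, the catch-all rule, and the use of CRC32 are assumptions for illustration, not Storm's internal implementation.

```python
import zlib

# Illustrative ranges taken from the slide's diagram labels.
RANGES = [("A", "F"), ("G", "P"), ("Q", "Z")]


def range_task(key: str) -> int:
    # Route by the key's first letter, like the A-F / G-P / Q-Z diagram.
    first = key[:1].upper()
    for task, (lo, hi) in enumerate(RANGES):
        if lo <= first <= hi:
            return task
    # Hypothetical catch-all for digits, symbols, or empty keys.
    return len(RANGES) - 1


def hash_task(key: str, num_tasks: int) -> int:
    # Fields-grouping style: a stable hash of the key modulo the task count,
    # so equal keys always land on the same task.
    return zlib.crc32(key.encode("utf-8")) % num_tasks
```

Range routing keeps related keys together but can skew load when some ranges are hotter than others. Hashing spreads keys more evenly at the cost of losing ordering.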
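Slide 5's example (the top 10 most popular links shared in the last N minutes) can be built as a rolling count. Links are counted in fixed time buckets, buckets older than the window are dropped, and the remaining counts are merged and ranked. In a Storm topology the counting would typically run in a bolt fields-grouped on the link, with a separate bolt doing the final ranking. The single-process Python sketch below shows only the counting and ranking logic. Its class name, window length, and bucket size are hypothetical.

```python
import heapq
from collections import Counter, deque


class RollingTopN:
    """Counts links in fixed time buckets; reports the top N in a window."""

    def __init__(self, window_secs=600, bucket_secs=60):
        self.bucket_secs = bucket_secs
        self.window_secs = window_secs
        self.buckets = deque()  # (bucket_start, Counter), oldest first

    def _evict(self, now):
        # Drop buckets that start at or before the window's lower edge.
        cutoff = now - self.window_secs
        while self.buckets and self.buckets[0][0] <= cutoff:
            self.buckets.popleft()

    def add(self, link, ts):
        # Assumes timestamps (in seconds) arrive in non-decreasing order.
        start = ts - ts % self.bucket_secs
        if not self.buckets or self.buckets[-1][0] != start:
            self.buckets.append((start, Counter()))
        self.buckets[-1][1][link] += 1
        self._evict(ts)

    def top(self, n, now):
        self._evict(now)
        totals = Counter()
        for _, counts in self.buckets:
            totals.update(counts)
        # Highest count first; ties broken alphabetically for stable output.
        return heapq.nsmallest(n, totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

Bucketing bounds memory to roughly one counter per bucket. The window therefore slides in bucket-sized steps rather than per event.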
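Slide 7's point rests on Cassandra reconciling writes per column (cell) rather than per row. Each cell carries a write timestamp and the newest timestamp wins. Bolts writing disjoint columns of the same row therefore do not clobber each other and need no read-before-write. The toy model below is plain Python, not a Cassandra client. It represents a row as a dict of `column -> (timestamp, value)` and borrows column names from the StormScraper `pages` table. Breaking timestamp ties by value is a simplifying assumption of the sketch.

```python
def write(row, column, value, ts):
    """Apply one cell write to a row; the newer timestamp wins."""
    cell = (ts, value)
    # Comparing (ts, value) tuples breaks timestamp ties deterministically.
    if column not in row or cell > row[column]:
        row[column] = cell


def merge(a, b):
    """Reconcile two versions of the same row, one cell at a time."""
    merged = dict(a)
    for column, cell in b.items():
        if column not in merged or cell > merged[column]:
            merged[column] = cell
    return merged


def values(row):
    # Strip timestamps for display.
    return {column: value for column, (_, value) in row.items()}


if __name__ == "__main__":
    html_side, text_side = {}, {}
    write(html_side, "html", "<p>hi</p>", ts=100)  # e.g. an Html Writer bolt
    write(text_side, "text", "hi", ts=105)         # e.g. a Text Writer bolt
    # The merge order does not matter: both columns survive.
    print(values(merge(html_side, text_side)) == values(merge(text_side, html_side)))
```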
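In the slide 10 data model, `inbound_links` and `outbound_links` are CQL sets. A bolt can therefore add links with `SET col = col + ?` without reading the row first, and each bolt can update only the columns it owns. The hypothetical helpers below just build parameterized CQL strings for the `pages` table. The `?` placeholders are the kind used with prepared statements, for example via `session.prepare` in the DataStax Python driver. Which bolt issues which statement is an assumption, not taken from the StormScraper code.

```python
from datetime import datetime, timezone

# Full primary key of the pages table: partition key url, clustering scrape_date.
PAGE_KEY = "WHERE url = ? AND scrape_date = ?"


def html_update(url, scrape_date, html):
    # One bolt writes only the html column of the page row.
    return (f"UPDATE pages SET html = ? {PAGE_KEY}", (html, url, scrape_date))


def text_update(url, scrape_date, title, text):
    # Another bolt writes title and text: same row, different columns.
    return (f"UPDATE pages SET title = ?, text = ? {PAGE_KEY}",
            (title, text, url, scrape_date))


def outbound_links_update(url, scrape_date, links):
    # Set append: adds links without a read-before-write.
    return (f"UPDATE pages SET outbound_links = outbound_links + ? {PAGE_KEY}",
            (set(links), url, scrape_date))


if __name__ == "__main__":
    scraped = datetime(2013, 11, 5, tzinfo=timezone.utc)
    for cql, params in (
        html_update("http://example.com/", scraped, "<html>...</html>"),
        outbound_links_update("http://example.com/", scraped, ["http://example.org/"]),
    ):
        print(cql, params)
```

Because each `UPDATE` names the full primary key and only its own columns, these statements can be issued concurrently by different bolts without coordinating.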
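Slides 11 through 30 build up the StormScraper topology one component at a time, but the slide text lists only component names, not the edges between them. Taking the components in build order, one plausible wiring is the following:

* A Url Spout feeds a Scraper Bolt.
* The Scraper Bolt's output fans out to an Html Writer, a Link Writer, and a Text Extraction Bolt.
* The Text Extraction Bolt feeds a Text Writer.
* A failed fetch goes back to the spout for replay.

The single-process Python sketch below walks through that assumed flow. Plain dicts stand in for Cassandra, a caller-supplied `fetch` function stands in for the HTTP request, and `max_depth`, `max_retries`, and the regex tag stripper are all hypothetical. It is not the code in the StormScraper repository.

```python
import re
from collections import deque


def strip_tags(html):
    # Crude stand-in for the Text Extraction Bolt: drop anything tag-shaped.
    return re.sub(r"<[^>]+>", "", html)


def run_crawl(seed_urls, fetch, max_depth=1, max_retries=3):
    """Single-process walk through a StormScraper-like topology (toy model).

    fetch(url) -> (html, links), raising on error, stands in for the Scraper
    Bolt's HTTP request. Plain dicts stand in for the Cassandra tables.
    """
    scrape_list = {url: 0 for url in seed_urls}  # url -> crawl depth
    pages = {}                                   # url -> {column: value}
    failures = {}                                # url -> failed attempts
    queue = deque(seed_urls)                     # work the Url Spout will emit

    while queue:
        url = queue.popleft()                    # Url Spout emits a tuple
        try:
            html, links = fetch(url)             # Scraper Bolt
        except Exception:
            # Fail path: the spout replays the URL, up to max_retries attempts.
            failures[url] = failures.get(url, 0) + 1
            if failures[url] < max_retries:
                queue.append(url)
            continue
        row = pages.setdefault(url, {})
        row["html"] = html                       # Html Writer
        row["outbound_links"] = set(links)       # Link Writer
        row["text"] = strip_tags(html)           # Text Extraction Bolt -> Text Writer
        depth = scrape_list[url]
        for link in links:                       # new URLs feed back into the crawl
            if link not in scrape_list and depth < max_depth:
                scrape_list[link] = depth + 1
                queue.append(link)
    return pages, failures
```

In the real topology each of these steps would run in parallel tasks, and the writers would update disjoint columns of the same `pages` row rather than a shared dict.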
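Slide 32 singles out wrong tuple expectations and bad topology wiring as easy mistakes. One cheap safeguard is to check the wiring before submitting. Every bolt input should name a component and stream that exist, and the fields the bolt reads should be among those the upstream stream declares. The sketch below is a hypothetical plain-Python validator over a dictionary description of a topology. It is not part of Storm's API, and the component names are illustrative.

```python
def validate(spouts, bolts):
    """Return a list of wiring problems (empty if none are found).

    spouts: {name: {stream: [declared fields]}}
    bolts:  {name: {"outputs": {stream: [declared fields]},
                    "inputs": [(component, stream, [fields read])]}}
    """
    declared = dict(spouts)
    declared.update({name: b["outputs"] for name, b in bolts.items()})
    problems = []
    for name, bolt in bolts.items():
        for component, stream, needed in bolt["inputs"]:
            if component not in declared:
                problems.append(f"{name}: unknown component {component!r}")
            elif stream not in declared[component]:
                problems.append(f"{name}: {component!r} has no stream {stream!r}")
            else:
                missing = set(needed) - set(declared[component][stream])
                if missing:
                    problems.append(
                        f"{name}: stream {component}.{stream} lacks fields {sorted(missing)}")
    return problems
```

Running a check like this in a unit test turns a misnamed field or a typo in a component id into a failing build instead of a runtime surprise in the cluster.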

×