Storm and Cassandra

Slides from a talk given at the NYC Cassandra Meetup, discussing how Storm works and how it integrates with Apache Cassandra.

There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler.

http://github.com/tjake/stormscraper

Published in Technology, Design

Transcript

  • 1. Storm and Cassandra. Cassandra NYC Meetup, 11/5/2013. Jake Luciani (@tjake)
  • 2. What is Storm? • Distributed event processor • Provides constructs to reliably process all events • Simple conceptual model • New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
  • 3. Storm Concepts Spout - Collects work and submits it to be processed. Tracks success or failure of each tuple. … Tuple - A collection of data that is passed within Storm. Bolt - Processes tuples and optionally emits more tuples. Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
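The concepts on slide 3 can be illustrated with a small, self-contained simulation. This is only a sketch in plain Python, not the Storm API: the class names `ListSpout` and `UppercaseBolt`, the `run` loop, and the field names `word` and `upper` are all hypothetical. It shows a spout tracking success or failure per tuple and replaying a failed one, and a bolt consuming a tuple and emitting a new one.

```python
from collections import deque


class ListSpout:
    """Toy spout: hands out items as tuples and tracks ack/fail per tuple id."""

    def __init__(self, items):
        self.queue = deque(enumerate(items))  # (tuple_id, value) pairs awaiting emit
        self.pending = {}                     # emitted but not yet acked or failed
        self.done = []                        # values whose processing succeeded

    def next_tuple(self):
        if not self.queue:
            return None
        tuple_id, value = self.queue.popleft()
        self.pending[tuple_id] = value
        return tuple_id, {"word": value}      # the stream declares one field: "word"

    def ack(self, tuple_id):
        self.done.append(self.pending.pop(tuple_id))

    def fail(self, tuple_id):
        # Re-queue the failed tuple so it is retried later.
        self.queue.append((tuple_id, self.pending.pop(tuple_id)))


class UppercaseBolt:
    """Toy bolt: consumes a tuple and emits a new one; fails once on a chosen word."""

    def __init__(self, fail_once_on):
        self.fail_once_on = {fail_once_on}

    def execute(self, tup):
        word = tup["word"]
        if word in self.fail_once_on:
            self.fail_once_on.discard(word)
            raise RuntimeError(f"simulated failure on {word!r}")
        return {"upper": word.upper()}


def run(spout, bolt):
    """Drive the spout until it is empty, acking or failing each tuple."""
    emitted = []
    while (item := spout.next_tuple()) is not None:
        tuple_id, tup = item
        try:
            emitted.append(bolt.execute(tup))
            spout.ack(tuple_id)
        except RuntimeError:
            spout.fail(tuple_id)
    return emitted


spout = ListSpout(["storm", "cassandra"])
results = run(spout, UppercaseBolt(fail_once_on="storm"))
print(results)  # "storm" fails once, is replayed, and finishes last
```

In this toy version the failed tuple is simply appended to the back of the queue, so it completes after the tuples that succeeded on the first attempt.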
  • 4. Storm Topologies A directed graph of spouts and bolts connected via streams. [Diagram labels: Firehose; A-F, G-P, Q-Z; Host A, Host B, Host C; Zookeeper; Cassandra (optional)]
  • 5. Example Topologies • Track the top 10 most popular links being shared in the last N minutes.
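The "top 10 links in the last N minutes" example on slide 5 comes down to a sliding-window count. The following is a minimal single-process sketch of that counting logic, not a Storm topology; the class name `TopLinks` and its methods are hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TopLinks:
    """Counts link shares inside a sliding time window and reports the top N."""

    def __init__(self, window_seconds, top_n=10):
        self.window_seconds = window_seconds
        self.top_n = top_n
        self.events = deque()    # (timestamp, url), oldest first
        self.counts = Counter()

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, url = self.events.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]

    def record(self, timestamp, url):
        self.events.append((timestamp, url))
        self.counts[url] += 1
        self._expire(timestamp)

    def top(self, now):
        self._expire(now)
        return self.counts.most_common(self.top_n)


tracker = TopLinks(window_seconds=60, top_n=2)
for ts, url in [(0, "a"), (10, "b"), (20, "b"), (50, "c"), (55, "c"), (58, "c")]:
    tracker.record(ts, url)

early = tracker.top(now=59)  # every event is still inside the window
late = tracker.top(now=75)   # the events at t=0 and t=10 have expired
```

In a distributed setting the counting would typically be partitioned by URL across several workers and merged by a final ranking step; that split is left out here.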
  • 6. Where does data end up? • Storm supports built-in RPC, so client requests can effectively become a spout. • Put the data into a database… • Why Cassandra though?
  • 7. Why Cassandra? • Cassandra’s data model allows incremental modifications to rows. • Different bolts can update different parts of a Cassandra row asynchronously.
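Slide 7's point about incremental, asynchronous row updates can be sketched with an in-memory stand-in. This is not a Cassandra client: `Row`, `upsert`, and `read` are hypothetical names, and the model is deliberately simplified to "each column carries its own write timestamp and the newest write wins per column". It shows independent writers touching different columns of the same row, in any arrival order, without overwriting each other.

```python
class Row:
    """Toy stand-in for one Cassandra row: each column keeps its own write timestamp."""

    def __init__(self):
        self.cells = {}  # column name -> (write_timestamp, value)

    def upsert(self, write_timestamp, **columns):
        # Only the named columns are touched; the newest write wins per column.
        for name, value in columns.items():
            current = self.cells.get(name)
            if current is None or write_timestamp >= current[0]:
                self.cells[name] = (write_timestamp, value)

    def read(self):
        return {name: value for name, (_, value) in self.cells.items()}


page = Row()
page.upsert(3, text="Hello")                          # a text writer arrives first
page.upsert(1, html="<h1>Hello</h1>", title="Hello")  # an older html write arrives late
page.upsert(2, title="Hello page")                    # a newer title replaces the older one
merged = page.read()
```

Because no writer has to read the row before writing its own columns, the writers need no coordination with each other.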
  • 8. Example
  • 9. StormScraper! A web crawling system built on Storm + Cassandra http://github.com/tjake/stormscraper
  • 10. StormScraper C* DataModel
      CREATE TABLE scrape_list (
        url text PRIMARY KEY,
        last_update timestamp,
        depth int
      );
      CREATE TABLE pages (
        url text,
        scrape_date timestamp,
        title text,
        html text,
        text text,
        inbound_links set<text>,
        outbound_links set<text>,
        PRIMARY KEY (url, scrape_date)
      );
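To make the shape of the `pages` table on slide 10 concrete, here is a small in-memory model of it. It is a sketch only: the helpers `save_page` and `latest_scrape` are hypothetical and no Cassandra driver is involved. It mirrors two properties of the schema: rows are grouped by `url` and distinguished by `scrape_date`, and the `set<text>` link columns grow by union rather than being overwritten.

```python
from collections import defaultdict
from datetime import datetime

# partition key (url) -> clustering key (scrape_date) -> column values
pages = defaultdict(dict)


def save_page(url, scrape_date, **columns):
    """Write some columns of the row identified by (url, scrape_date)."""
    row = pages[url].setdefault(
        scrape_date, {"inbound_links": set(), "outbound_links": set()}
    )
    for name, value in columns.items():
        if isinstance(value, set):
            row[name] |= value  # set columns grow by union, like adding to set<text>
        else:
            row[name] = value


def latest_scrape(url):
    """Return (scrape_date, row) for the most recent scrape of a url."""
    scrapes = pages[url]
    newest = max(scrapes)  # rows within a url are ordered by scrape_date
    return newest, scrapes[newest]


url = "http://example.com/"
first, second = datetime(2013, 11, 1), datetime(2013, 11, 5)
save_page(url, first, title="Old title", outbound_links={"http://example.com/a"})
save_page(url, second, title="New title", outbound_links={"http://example.com/a"})
save_page(url, second, outbound_links={"http://example.com/b"})  # a later, separate write
when, row = latest_scrape(url)
```

Each scrape of a URL is kept as its own row, so the history of a page is preserved alongside the latest version.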
  • 11-30. StormScraper Topology [Diagram, built up step by step across these slides. Components shown: Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer, Cassandra. Slides 28-30 add a "Fail" label.]
  • 31. Code Walkthrough http://github.com/tjake/stormscraper
  • 32. Storm Summary • Powerful • But easy to make mistakes • Wrong tuple expectation, names, types • Bad topology wiring
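The mistakes listed on slide 32 (wrong tuple expectations or names, bad wiring) can often be caught with a check before a topology is submitted. The sketch below is one hypothetical way to do that in plain Python; it is not part of Storm or StormScraper. The component names come from the topology slides, while the field names and the edges between components are assumptions made for illustration.

```python
def check_wiring(declared_outputs, required_inputs, edges):
    """Return a list of human-readable problems found in a topology description."""
    problems = []
    for source, target in edges:
        if source not in declared_outputs:
            problems.append(f"{target} subscribes to unknown component {source}")
            continue
        missing = set(required_inputs.get(target, ())) - set(declared_outputs[source])
        for field in sorted(missing):
            problems.append(
                f"{target} expects field '{field}' that {source} does not declare"
            )
    return problems


# Assumed field names; the slides do not list them.
declared_outputs = {
    "url_spout": ["url"],
    "scraper_bolt": ["url", "html"],
    "text_extraction_bolt": ["url", "text"],
}
required_inputs = {
    "scraper_bolt": ["url"],
    "html_writer": ["url", "html"],
    "link_writer": ["url", "html"],
    "text_extraction_bolt": ["url", "html"],
    "text_writer": ["url", "text"],
}

# Assumed wiring, guessed from the order in which the slides add components.
good_edges = [
    ("url_spout", "scraper_bolt"),
    ("scraper_bolt", "html_writer"),
    ("scraper_bolt", "link_writer"),
    ("scraper_bolt", "text_extraction_bolt"),
    ("text_extraction_bolt", "text_writer"),
]
bad_edges = [("url_spout", "text_writer"), ("scraper", "html_writer")]

good_report = check_wiring(declared_outputs, required_inputs, good_edges)
bad_report = check_wiring(declared_outputs, required_inputs, bad_edges)
```

A check like this only covers names; type mismatches between what one component emits and what the next expects would need a separate test.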
  • 33. Thank You! Q&A?