Storm and Cassandra
 

Slides from talk given at the NYC Cassandra Meetup. Discussing how Storm works and how it integrates well with Apache Cassandra.

There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler.

http://github.com/tjake/stormscraper

Usage Rights

CC Attribution License

    Storm and Cassandra Presentation Transcript

    • Storm and Cassandra Cassandra NYC Meetup 11/5/2013 Jake Luciani (@tjake)
    • What is Storm?
      • Distributed event processor
      • Provides constructs to reliably process all events
      • Simple conceptual model
      • New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
    • Storm Concepts
      • Spout - Collects work and submits it to be processed. Tracks success or failure of each tuple. …
      • Tuple - A collection of data that is passed within Storm.
      • Bolt - Processes tuples and optionally emits more tuples.
      • Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
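
To make the spout, tuple, and bolt roles concrete, here is a minimal single-process sketch in plain Python. It does not use the Storm API: `ListSpout`, `SplitBolt`, `WordTuple`, and `run` are hypothetical names invented for illustration, and a real topology would run these pieces in parallel across hosts.

```python
from collections import namedtuple

# A "tuple" in this toy model: named fields, as a declared stream would force.
WordTuple = namedtuple("WordTuple", ["word"])


class ListSpout:
    """Toy spout: hands out work items and tracks success or failure per item."""

    def __init__(self, items):
        self.pending = dict(enumerate(items))  # msg_id -> payload not yet acked
        self.failed = []

    def next_tuples(self):
        # Snapshot the items so ack() can mutate self.pending while we iterate.
        return list(self.pending.items())

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        self.failed.append(msg_id)


class SplitBolt:
    """Toy bolt: processes one input and optionally emits more tuples."""

    def process(self, sentence):
        if not isinstance(sentence, str):
            raise TypeError("expected a sentence string")
        return [WordTuple(word=w) for w in sentence.split()]


def run(spout, bolt):
    """Drive every pending item through the bolt, acking or failing each one."""
    emitted = []
    for msg_id, payload in spout.next_tuples():
        try:
            emitted.extend(bolt.process(payload))
            spout.ack(msg_id)
        except TypeError:
            spout.fail(msg_id)
    return emitted


spout = ListSpout(["storm and cassandra", 42])
words = run(spout, SplitBolt())
print([t.word for t in words])  # words from the one valid sentence
print(spout.failed)             # ids of items the bolt could not process
```

The failed item stays in `pending`, which is where a replay policy would hook in; what to do on failure is left to the spout.
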
    • Storm Topologies: A directed graph of spouts and bolts connected via streams.
      [Diagram labels: Firehose, A-F, G-P, Q-Z, Host A, Host B, Host C, Zookeeper, Cassandra (optional)]
    • Example Topologies • Track the top 10 most popular links being shared in the last N minutes.
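
The "top 10 links in the last N minutes" idea can be sketched as a sliding-window counter. This is a single-process illustration, not code from the talk: `TopLinks` is a hypothetical helper, timestamps are plain seconds, and events are assumed to arrive in non-decreasing time order. In a topology the counting and ranking would likely be split across separate bolts.

```python
from collections import Counter, deque


class TopLinks:
    """Counts link shares inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, url), oldest first
        self.counts = Counter()

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, url = self.events.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]

    def record(self, url, timestamp):
        self.events.append((timestamp, url))
        self.counts[url] += 1

    def top(self, n, now):
        self._expire(now)
        return self.counts.most_common(n)


tracker = TopLinks(window_seconds=600)   # "last N minutes" with N = 10
tracker.record("http://a.example", timestamp=0)
tracker.record("http://b.example", timestamp=100)
tracker.record("http://b.example", timestamp=200)
tracker.record("http://b.example", timestamp=300)
tracker.record("http://a.example", timestamp=650)
print(tracker.top(10, now=700))
```
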
    • Where does data end up?
      • Storm supports built-in RPC, so client requests can effectively become a spout.
      • Put the data into a database…
      • Why Cassandra though?
    • Why Cassandra?
      • Cassandra’s data model allows incremental modifications to rows.
      • Different bolts can update different parts of a Cassandra row asynchronously.
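
A rough way to picture "different bolts update different parts of a row" is an in-memory table where each write names only the columns it owns. `ToyTable` below is a hypothetical stand-in, not a Cassandra driver; the key and column names are illustrative. In CQL the same effect would come from INSERT or UPDATE statements that each set only some columns of one row.

```python
class ToyTable:
    """In-memory stand-in for a table with column-level upserts (not a real driver)."""

    def __init__(self):
        self.rows = {}   # primary key -> {column: value}

    def upsert(self, key, **columns):
        # Only the named columns are touched; other columns on the row are left alone.
        self.rows.setdefault(key, {}).update(columns)

    def add_to_set(self, key, column, values):
        # Mirrors appending to a set column without reading it first.
        self.rows.setdefault(key, {}).setdefault(column, set()).update(values)


pages = ToyTable()
key = ("http://example.com/", "2013-11-05")   # (url, scrape_date), illustrative values

# Three independent writers, in any order, each owning different columns.
pages.add_to_set(key, "outbound_links", {"http://example.com/about"})
pages.upsert(key, text="Example Domain")
pages.upsert(key, html="<html>...</html>", title="Example")

print(sorted(pages.rows[key]))
```

Because no writer needs to read the row before writing, the writers do not have to coordinate or run in a fixed order.
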
    • Example
    • StormScraper! A web crawling system built on Storm + Cassandra: http://github.com/tjake/stormscraper
    • StormScraper C* DataModel

      CREATE TABLE scrape_list (
        url text PRIMARY KEY,
        last_update timestamp,
        depth int
      );

      CREATE TABLE pages (
        url text,
        scrape_date timestamp,
        title text,
        html text,
        text text,
        inbound_links set<text>,
        outbound_links set<text>,
        PRIMARY KEY (url, scrape_date)
      );
    • StormScraper Topology (diagram built up over several slides)
      Components shown: Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer, Cassandra. The final slides add a "Fail" label.
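
The transcript lists the topology's components but not its arrows, so the wiring below is an assumption inferred from the component names alone; the actual project may connect them differently. The sketch uses a plain dictionary as the graph and a hypothetical `trace_tuple` function to show how one tuple fans out, and how a failure in any component turns into a "fail" for the whole tuple.

```python
# Assumed wiring, inferred from the component names on the slides only.
TOPOLOGY = {
    "url_spout": ["scraper_bolt"],
    "scraper_bolt": ["html_writer", "link_writer", "text_extraction_bolt"],
    "text_extraction_bolt": ["text_writer"],
    "html_writer": [],
    "link_writer": [],
    "text_writer": [],
}


def trace_tuple(failing=frozenset()):
    """Walk one tuple through the graph; return (visited components, outcome)."""
    visited = []
    queue = ["url_spout"]
    while queue:
        component = queue.pop(0)
        visited.append(component)
        if component in failing:
            # A failure anywhere downstream is reported back for the original tuple.
            return visited, "fail"
        queue.extend(TOPOLOGY[component])
    return visited, "ack"


visited, outcome = trace_tuple()
print(outcome, visited)

_, outcome_when_writer_fails = trace_tuple(failing={"link_writer"})
print(outcome_when_writer_fails)
```
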
    • Code Walkthrough http://github.com/tjake/stormscraper
    • Storm Summary
      • Powerful
      • But easy to make mistakes
      • Wrong tuple expectation, names, types
      • Bad topology wiring
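
One way to catch wrong field names and bad wiring early is to check declared output fields against what each consumer expects before anything runs. The sketch below is a hypothetical pre-flight check in plain Python, not a Storm feature; `check_wiring` and the component and field names are made up for illustration.

```python
def check_wiring(declared_outputs, subscriptions):
    """Return a list of human-readable wiring problems (empty if none found).

    declared_outputs: component -> tuple of field names it emits
    subscriptions:    list of (consumer, producer, fields the consumer reads)
    """
    problems = []
    for consumer, producer, wanted in subscriptions:
        if producer not in declared_outputs:
            problems.append(f"{consumer} subscribes to unknown component {producer}")
            continue
        missing = [f for f in wanted if f not in declared_outputs[producer]]
        if missing:
            problems.append(
                f"{consumer} expects {missing} but {producer} does not declare them"
            )
    return problems


declared = {
    "url_spout": ("url", "depth"),
    "scraper_bolt": ("url", "html"),
}
subs = [
    ("scraper_bolt", "url_spout", ["url"]),
    ("html_writer", "scraper_bolt", ["url", "body"]),   # wrong field name
    ("link_writer", "scrapper_bolt", ["url"]),          # misspelled component
]
for problem in check_wiring(declared, subs):
    print(problem)
```

This only covers names; type mismatches would need a richer declaration than a tuple of field names.
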
    • Thank You! Q&A?