Intro to Cassandra + Hadoop

A high-level introduction to running Hadoop analytics over data stored in Cassandra.

Published in Technology

  • 1. Cassandra + Hadoop
    An Introduction to Hadoop Analytics over Cassandra Data
  • 2. Introductions
    What is Cassandra?
    A highly scalable distributed data store
    Born at Facebook, grew up in the community
    What is Hadoop?
    A set of Apache projects
    Deal with Big Data in a distributed way
    Open source versions of MapReduce, GFS, BigTable, as well as additions, such as Pig and Hive
  • 3. What makes them compatible?
    Cassandra is great at a lot of things
    Fast, extremely scalable writes, fast random reads
    Flexible semi-structured data model
    Not as good with ad-hoc answers
    Enter Hadoop
    MapReduce, Pig, and Hive are extensible
    Output from Hadoop into Cassandra
  • 4. MapReduce
    Input from Cassandra as of 0.6.x
    Built-in output to Cassandra as of 0.7.0
    Streaming support is coming in 0.7
    Example: WordCount
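
The WordCount example is not reproduced in this transcript. As a rough illustration of the MapReduce model it relies on, here is a minimal plain-Python sketch. It does not use Hadoop or Cassandra APIs; `ROWS`, `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names, and the dict stands in for rows that a real job would read from a Cassandra column family.

```python
from collections import defaultdict

# Hypothetical stand-in for rows read from a Cassandra column family:
# row key -> text column value. This is an assumption for illustration,
# not the data used in the original talk.
ROWS = {
    "row1": "the quick brown fox",
    "row2": "the lazy dog",
    "row3": "the fox",
}


def map_phase(row_key, text):
    """Emit a (word, 1) pair for every word in one row's text."""
    for word in text.split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce in sequence over all rows."""
    pairs = (pair for key, text in rows.items() for pair in map_phase(key, text))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In a real job the map and reduce functions run in parallel across the cluster and the framework performs the shuffle; only the shape of the computation is the same here.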
  • 5. Pig
    What is Pig?
    A platform for data analytics developed at Yahoo!
    Includes the Pig Latin language, the Grunt shell, and an interpreter that compiles scripts down to MapReduce
    Simplifies data analysis
    Cassandra integration
    Stu Hood added Pig integration in Cassandra 0.6
    Example: WordCount with Pig
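
The Pig version of WordCount is likewise not included in this transcript. The sketch below mirrors the usual Pig Latin dataflow (tokenize, group, count, order) in plain Python, with roughly equivalent Pig Latin statements in the comments. `LINES` and `pig_style_word_count` are hypothetical names, and the sample lines stand in for data that a LOAD step would pull from Cassandra.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample lines (an assumption for illustration); in a real
# Pig script the LOAD step would read them from a Cassandra column family.
LINES = ["hadoop and cassandra", "pig on hadoop", "cassandra"]


def pig_style_word_count(lines):
    """Count words with the same step order a Pig Latin script would use."""
    # Roughly: words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    words = chain.from_iterable(line.split() for line in lines)
    # Roughly: grouped = GROUP words BY word;
    #          counts  = FOREACH grouped GENERATE group, COUNT(words);
    counts = Counter(words)
    # Roughly: ordered = ORDER counts BY $1 DESC;
    # Ties are broken alphabetically here so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for word, count in pig_style_word_count(LINES):
        print(word, count)
```

The appeal of Pig is that each step is a single declarative statement, and the interpreter decides how to turn the chain into MapReduce jobs.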
  • 6. Hive
    What is Hive?
    A platform for data analytics developed at Facebook
    Draws on familiar SQL -> HiveQL
    Compiles down to MapReduce
    Cassandra integration
    Availability of a Cassandra storage handler is coming soon – HIVE-1434
  • 7. Example Use Case
    Gaming statistics and achievements across platforms
    Home-grown -> Cassandra + Hadoop (Pig)
    Idea to execution much faster
    Query runtime from hours to 10-15 minutes
  • 8. Questions
    Twitter: @jeromatron
    IRC: jeromatron on #cassandra, #hadoop
    Further information
    Cassandra: The Definitive Guide