Intro to Cassandra + Hadoop

A high-level introduction to using Hadoop analytics over data stored in Cassandra.

Published in: Technology

Transcript

  • 1. Cassandra + Hadoop
    An Introduction to Hadoop Analytics over Cassandra Data
  • 2. Introductions
    What is Cassandra?
    A highly scalable distributed data store
    Born at Facebook, grew up in the community
    What is Hadoop?
    A set of Apache projects
    Deal with Big Data in a distributed way
    Open source versions of MapReduce, GFS, BigTable, as well as additions, such as Pig and Hive
  • 3. What makes them compatible?
    Cassandra is great at a lot of things
    Fast, extremely scalable writes, fast random reads
    Flexible semi-structured data model
    Not as good at answering ad-hoc queries
    Enter Hadoop
    MapReduce, Pig, and Hive are extensible
    Output from Hadoop into Cassandra
  • 4. MapReduce
    Input from Cassandra as of 0.6.x
    Built-in output to Cassandra as of 0.7.0
    Streaming support is coming in 0.7
    Example: WordCount
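
The WordCount example named on this slide can be illustrated with a minimal, framework-free sketch of the map, shuffle, and reduce phases. It is not the example shipped with Cassandra and it calls no Cassandra or Hadoop API. The `ROWS` dictionary, the `text` column name, and all function names are assumptions made up for illustration; `ROWS` merely stands in for rows that an input format would hand to the mapper.

```python
from collections import defaultdict

# Hypothetical stand-in for rows read from a Cassandra column family:
# row key -> {column name: column value}. No real Cassandra or Hadoop API is used.
ROWS = {
    "key1": {"text": "the quick brown fox"},
    "key2": {"text": "the lazy dog"},
    "key3": {"text": "the fox"},
}


def map_phase(row_key, columns, column_name="text"):
    """Emit a (word, 1) pair for every word in the chosen column."""
    for word in columns.get(column_name, "").split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce in-process over the given rows."""
    pairs = (pair for key, cols in rows.items() for pair in map_phase(key, cols))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In a real job the mapper and reducer would run on separate Hadoop tasks and the framework would do the grouping; here everything runs in one process so the data flow is easy to follow.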
  • 5. Pig
    What is Pig?
    A platform for data analytics developed at Yahoo!
    Includes Pig Latin, the Grunt shell, and an interpreter that compiles down to MapReduce
    Simplifies data analysis
    Cassandra integration
    Stu Hood added Pig integration in Cassandra 0.6
    Example: WordCount with Pig
  • 6. Hive
    What is Hive?
    A platform for data analytics developed at Facebook
    Its query language, HiveQL, draws on familiar SQL
    Compiles down to MapReduce
    Cassandra integration
    Availability of a Cassandra storage handler is coming soon – HIVE-1434
  • 7. Example Use Case
    Raptr.com
    Gaming statistics and achievements across platforms
    Moved from a home-grown system to Cassandra + Hadoop (Pig)
    Much faster path from idea to execution
    Query runtime dropped from hours to 10-15 minutes
  • 8. Questions
    Contact
    Email: jeremy.hanna@rackspace.com
    Twitter: @jeromatron
    IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
    Further information
    http://wiki.apache.org/cassandra/HadoopSupport
    Cassandra: The Definitive Guide
