Intro to Cassandra + Hadoop

A high-level introduction to running Hadoop analytics over data stored in Cassandra.


Transcript

  • 1. Cassandra + Hadoop
    An Introduction to Hadoop Analytics over Cassandra Data
  • 2. Introductions
    What is Cassandra?
    A highly scalable distributed data store
    Born at Facebook, grew up in the community
    What is Hadoop?
    A set of Apache projects
    Deal with Big Data in a distributed way
    Open source implementations of the ideas behind Google's MapReduce, GFS, and BigTable, plus additions such as Pig and Hive
  • 3. What makes them compatible?
    Cassandra is great at a lot of things
    Fast, extremely scalable writes, fast random reads
    Flexible semi-structured data model
    Not as good at answering ad-hoc queries
    Enter Hadoop
    MapReduce, Pig, and Hive are extensible
    Output from Hadoop into Cassandra
  • 4. MapReduce
    Input from Cassandra as of 0.6.x
    Built-in output to Cassandra as of 0.7.0
    Streaming support is coming in 0.7
    Example: WordCount
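
The WordCount example named on this slide can be sketched in plain Python to show the shape of the job: a map step that emits (word, 1) pairs from each row, a shuffle that groups pairs by word, and a reduce step that sums them. This is a toy simulation, not the Cassandra or Hadoop API. The `ROWS` data, the `text` column name, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: row key -> {column name: column value},
# loosely mimicking rows read from a Cassandra column family.
ROWS = {
    "row1": {"text": "hadoop reads cassandra"},
    "row2": {"text": "cassandra feeds hadoop"},
}


def map_phase(row_key, columns, column_name="text"):
    """Emit (word, 1) for every word in the chosen column of one row."""
    value = columns.get(column_name, "")
    for word in value.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce in sequence over all rows."""
    pairs = (pair for key, cols in rows.items() for pair in map_phase(key, cols))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

Calling `word_count(ROWS)` returns a dictionary of word totals. In a real job the framework distributes the map and reduce calls across the cluster, and the input rows come from Cassandra rather than from an in-memory dictionary.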
  • 5. Pig
    What is Pig?
    A platform for data analytics developed at Yahoo!
    Includes the Pig Latin language, the Grunt shell, and an interpreter that compiles scripts down to MapReduce jobs
    Simplifies data analysis
    Cassandra integration
    Stu Hood added Pig integration in Cassandra 0.6
    Example: WordCount with Pig
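
To show why Pig simplifies this kind of analysis, the same word count can be written as a short chain of dataflow steps, each roughly matching one Pig Latin statement (LOAD, FOREACH with FLATTEN(TOKENIZE(...)), GROUP with COUNT, ORDER). The sketch below is Python standing in for that dataflow, not Pig Latin itself and not the Cassandra integration. The sample rows, the `text` column, and the function names are assumptions.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample rows: row key -> {column name: column value}.
SAMPLE_ROWS = {
    "row1": {"text": "pig compiles to mapreduce"},
    "row2": {"text": "pig simplifies analysis"},
}


def load(rows, column_name):
    """Roughly LOAD: pull one column's value out of every row."""
    return [cols.get(column_name, "") for cols in rows.values()]


def tokenize_flatten(lines):
    """Roughly FOREACH ... GENERATE FLATTEN(TOKENIZE(line)): one word per record."""
    return list(chain.from_iterable(line.split() for line in lines))


def group_and_count(words):
    """Roughly GROUP BY word followed by COUNT per group."""
    return Counter(words)


def order_by_count(counts):
    """Roughly ORDER BY count DESC, with the word as a tie-breaker."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def pig_style_word_count(rows, column_name="text"):
    """Chain the steps the way a Pig script chains relations."""
    return order_by_count(group_and_count(tokenize_flatten(load(rows, column_name))))
```

Each function here corresponds to one line of a Pig script, which is the point of the slide: the author describes the dataflow and the interpreter works out the MapReduce jobs.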
  • 6. Hive
    What is Hive?
    A platform for data analytics developed at Facebook
    Its query language, Hive QL, draws on familiar SQL
    Compiles down to MapReduce
    Cassandra integration
    A Cassandra storage handler is coming soon (HIVE-1434)
  • 7. Example Use Case
    Raptr.com
    Gaming statistics and achievements across platforms
    Moved from a home-grown system to Cassandra + Hadoop (Pig)
    Much faster from idea to execution
    Query runtime dropped from hours to 10-15 minutes
  • 8. Questions
    Contact
    Email: jeremy.hanna@rackspace.com
    Twitter: @jeromatron
    IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
    Further information
    http://wiki.apache.org/cassandra/HadoopSupport
    Cassandra: The Definitive Guide