Intro to Cassandra + Hadoop

A high-level introduction to running Hadoop analytics over data stored in Cassandra.

    Presentation Transcript

    • Cassandra + Hadoop
      An Introduction to Hadoop Analytics over Cassandra Data
    • Introductions
      What is Cassandra?
      A highly scalable distributed data store
      Born at Facebook, grew up in the community
      What is Hadoop?
      A set of Apache projects
      Deal with Big Data in a distributed way
      Open-source implementations of the ideas behind Google's MapReduce, GFS, and BigTable, plus additions such as Pig and Hive
    • What makes them compatible?
      Cassandra is great at a lot of things
      Fast, extremely scalable writes, fast random reads
      Flexible semi-structured data model
      Not as good at answering ad-hoc queries
      Enter Hadoop
      MapReduce, Pig, and Hive are extensible
      Output from Hadoop into Cassandra
    • MapReduce
      Input from Cassandra as of 0.6.x
      Built-in output to Cassandra as of 0.7.0
      Streaming support is coming in 0.7
      Example: WordCount
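
The WordCount example itself is not reproduced in this transcript. As a rough illustration only, the sketch below mimics the map, shuffle, and reduce phases in plain Python over in-memory rows shaped like Cassandra rows (row key mapped to column name/value pairs). The row layout, the column name `text`, and every function name are assumptions made for illustration; the code does not use Hadoop's or Cassandra's actual APIs.

```python
from collections import defaultdict

# Hypothetical stand-in for a column family: row key -> {column name: value}.
ROWS = {
    "row1": {"text": "the quick brown fox"},
    "row2": {"text": "the lazy dog"},
    "row3": {"text": "the fox"},
}


def map_phase(row_key, columns, column_name="text"):
    """Emit a (word, 1) pair for every word in the chosen column."""
    for word in columns.get(column_name, "").split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce over all rows and return {word: count}."""
    pairs = (pair for key, cols in rows.items() for pair in map_phase(key, cols))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In a real job the map function would run in parallel over input splits read from the cluster, and the framework would perform the shuffle; only the per-row and per-key logic would be written by hand.
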
    • Pig
      What is Pig?
      A platform for data analytics developed at Yahoo!
      Includes the Pig Latin language, the Grunt shell, and an interpreter that compiles down to MapReduce
      Simplifies data analysis
      Cassandra integration
      Stu Hood added Pig integration in Cassandra 0.6
      Example: WordCount with Pig
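
The Pig version of WordCount is likewise not included in this transcript. A Pig Latin script for it would typically chain a few relational steps (LOAD, FOREACH with TOKENIZE and FLATTEN, GROUP, COUNT, ORDER). The sketch below mirrors that dataflow in plain Python, one function per step, to show why the script stays short. The row layout, the column name `text`, and the function names are hypothetical; this is not Pig Latin and does not call Pig or Cassandra.

```python
from collections import Counter

# Hypothetical stand-in for a column family: row key -> {column name: value}.
ROWS = {
    "row1": {"text": "the quick brown fox"},
    "row2": {"text": "the lazy dog"},
    "row3": {"text": "the fox"},
}


def load(rows, column_name="text"):
    """LOAD step: pull the chosen column's value out of every row that has it."""
    return [cols[column_name] for cols in rows.values() if column_name in cols]


def tokenize_and_flatten(lines):
    """FOREACH step: split each line into words and flatten into one list."""
    return [word.lower() for line in lines for word in line.split()]


def group_and_count(words):
    """GROUP + COUNT steps: count the occurrences of each word."""
    return Counter(words)


def order_desc(counts):
    """ORDER step: sort by count (descending), then by word for stable output."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def word_count_pipeline(rows):
    """Chain the steps in the order a dataflow script would declare them."""
    return order_desc(group_and_count(tokenize_and_flatten(load(rows))))


if __name__ == "__main__":
    for word, count in word_count_pipeline(ROWS):
        print(word, count)
```

Each function corresponds to roughly one line of a dataflow script, which is the sense in which Pig simplifies analysis compared with hand-written MapReduce.
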
    • Hive
      What is Hive?
      A platform for data analytics developed at Facebook
      Query language (Hive QL) draws on familiar SQL
      Compiles down to MapReduce
      Cassandra integration
      A Cassandra storage handler is coming soon (HIVE-1434)
    • Example Use Case
      Raptr.com
      Gaming statistics and achievements across platforms
      Moved from a home-grown solution to Cassandra + Hadoop (Pig)
      Much faster path from idea to execution
      Query runtime dropped from hours to 10-15 minutes
    • Questions
      Contact
      Email: jeremy.hanna@rackspace.com
      Twitter: @jeromatron
      IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
      Further information
      http://wiki.apache.org/cassandra/HadoopSupport
      Cassandra: The Definitive Guide