Intro to cassandra + hadoop

A high-level introduction to using Hadoop analytics over data stored in Cassandra.

Published in: Technology
Transcript of "Intro to cassandra + hadoop"

  1. Cassandra + Hadoop
       • An Introduction to Hadoop Analytics over Cassandra Data
  2. Introductions
       • What is Cassandra?
       • A highly scalable distributed data store
       • Born at Facebook, grew up in the community
       • What is Hadoop?
       • A set of Apache projects
       • Deal with Big Data in a distributed way
       • Open source versions of MapReduce, GFS, BigTable, as well as additions, such as Pig and Hive
  3. What makes them compatible?
       • Cassandra is great at a lot of things
       • Fast, extremely scalable writes, fast random reads
       • Flexible semi-structured data model
       • Not as good with ad-hoc answers
       • Enter Hadoop
       • MapReduce, Pig, and Hive are extensible
       • Output from Hadoop into Cassandra
  4. MapReduce
       • Input from Cassandra as of 0.6.x
       • Baked in output to Cassandra as of 0.7.0
       • Streaming support is coming in 0.7
       • Example: WordCount
  5. Pig
       • What is Pig?
       • A platform for data analytics developed at Yahoo!
       • Includes PigLatin, Grunt shell, and interpreter that compiles down to MapReduce
       • Simplifies data analysis
       • Cassandra integration
       • Stu Hood added Pig integration in Cassandra 0.6
       • Example: WordCount with Pig
  6. Hive
       • What is Hive?
       • A platform for data analytics developed at Facebook
       • Draws from the familiar SQL -> Hive QL
       • Compiles down to MapReduce
       • Cassandra integration
       • Availability of a Cassandra storage handler is coming soon – HIVE-1434
  7. Example Use Case
       • Raptr.com
       • Gaming statistics and achievements across platforms
       • Home-grown -> Cassandra + Hadoop (Pig)
       • Idea to execution much faster
       • Query runtime from hours to 10-15 minutes
  8. Questions
       • Contact
       • Email: jeremy.hanna@rackspace.com
       • Twitter: @jeromatron
       • IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
       • Further information
       • http://wiki.apache.org/cassandra/HadoopSupport
       • Cassandra: The Definitive Guide
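
Slide 4 names WordCount as its MapReduce example, but the transcript does not include the code. As a rough illustration of the idea only, here is a minimal pure-Python sketch of the map, shuffle, and reduce steps. It is not the example shipped with Cassandra and it does not use Hadoop or any Cassandra API: the ROWS dictionary, the "text" column name, and all function names are assumptions made up for this sketch, standing in for rows that a real job would read through Cassandra's Hadoop input support.

```python
from collections import defaultdict

# Hypothetical stand-in for rows in a Cassandra column family:
# row key -> {column name -> text value}. A real job would receive
# rows from Cassandra's Hadoop input support, not from a dict.
ROWS = {
    "row1": {"text": "cassandra writes fast"},
    "row2": {"text": "hadoop reads cassandra"},
}


def map_row(key, columns):
    """Map step: emit (word, 1) for every word in every column value."""
    for value in columns.values():
        for word in value.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group the emitted counts by word."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_word(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce in-process over the given rows."""
    pairs = (pair for key, cols in rows.items() for pair in map_row(key, cols))
    return dict(reduce_word(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    for word, total in sorted(word_count(ROWS).items()):
        print(word, total)
```

In an actual Hadoop job the map and reduce functions run on different nodes and the framework performs the shuffle; the sketch keeps everything in one process so the data flow is easy to follow.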