Intro to Cassandra + Hadoop

A high-level introduction to using Hadoop analytics over data stored in Cassandra.

Published in: Technology

Transcript

  • 1. Cassandra + Hadoop: An Introduction to Hadoop Analytics over Cassandra Data
  • 2. Introductions
      - What is Cassandra?
          - A highly scalable distributed data store
          - Born at Facebook, grew up in the community
      - What is Hadoop?
          - A set of Apache projects
          - Deal with Big Data in a distributed way
          - Open source versions of MapReduce, GFS, BigTable, as well as additions, such as Pig and Hive
  • 3. What makes them compatible?
      - Cassandra is great at a lot of things
          - Fast, extremely scalable writes, fast random reads
          - Flexible semi-structured data model
      - Not as good with ad-hoc answers
      - Enter Hadoop
          - MapReduce, Pig, and Hive are extensible
          - Output from Hadoop into Cassandra
  • 4. MapReduce
      - Input from Cassandra as of 0.6.x
      - Baked in output to Cassandra as of 0.7.0
      - Streaming support is coming in 0.7
      - Example: WordCount
  • 5. Pig
      - What is Pig?
          - A platform for data analytics developed at Yahoo!
          - Includes PigLatin, Grunt shell, and interpreter that compiles down to MapReduce
          - Simplifies data analysis
      - Cassandra integration
          - Stu Hood added Pig integration in Cassandra 0.6
          - Example: WordCount with Pig
  • 6. Hive
      - What is Hive?
          - A platform for data analytics developed at Facebook
          - Draws from the familiar SQL -> Hive QL
          - Compiles down to MapReduce
      - Cassandra integration
          - Availability of a Cassandra storage handler is coming soon (HIVE-1434)
  • 7. Example Use Case
      - Raptr.com
          - Gaming statistics and achievements across platforms
          - Home-grown -> Cassandra + Hadoop (Pig)
          - Idea to execution much faster
          - Query runtime from hours to 10-15 minutes
  • 8. Questions
      - Contact
          - Email: jeremy.hanna@rackspace.com
          - Twitter: @jeromatron
          - IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
      - Further information
          - http://wiki.apache.org/cassandra/HadoopSupport
          - Cassandra: The Definitive Guide
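
Slide 4 names WordCount as its MapReduce example, but the transcript does not include the code. As a rough illustration only, the idea can be sketched in plain Python. This is not the Cassandra or Hadoop API: the `rows` dictionary is a hypothetical stand-in for rows read from a column family, the column name `text` is assumed, and the `map_phase`, `shuffle`, and `reduce_phase` helpers are made up here to mirror the phases a MapReduce job would run.

```python
from collections import defaultdict

# Hypothetical stand-in for rows read from a Cassandra column family:
# row key -> {column name: column value}. No real Cassandra calls are made.
rows = {
    "row1": {"text": "cassandra and hadoop"},
    "row2": {"text": "hadoop and pig and hive"},
}


def map_phase(row_key, columns, column_name="text"):
    """Emit a (word, 1) pair for every word in the chosen column."""
    value = columns.get(column_name, "")
    for word in value.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce over all rows and return word totals."""
    pairs = (pair for key, cols in rows.items() for pair in map_phase(key, cols))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(rows)
print(counts)
```

In a real job, the input side would presumably read rows through Cassandra's Hadoop input support (slide 4), and the mapper and reducer would run distributed across the cluster rather than in one process. The HadoopSupport wiki page listed under further information is the place to check the actual classes and configuration.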
