Intro to Cassandra + Hadoop

Cassandra + Hadoop
An Introduction to Hadoop Analytics over Cassandra Data

Introductions
- What is Cassandra?
  - A highly scalable distributed data store
  - Born at Facebook, grew up in the community
- What is Hadoop?
  - A set of Apache projects
  - Deal with Big Data in a distributed way
  - Open source versions of MapReduce, GFS, BigTable, as well as additions, such as Pig and Hive

What makes them compatible?
- Cassandra is great at a lot of things
  - Fast, extremely scalable writes, fast random reads
  - Flexible semi-structured data model
- Not as good with ad-hoc answers
- Enter Hadoop
  - MapReduce, Pig, and Hive are extensible
  - Output from Hadoop into Cassandra

MapReduce
- Input from Cassandra as of 0.6.x
- Baked-in output to Cassandra as of 0.7.0
- Streaming support is coming in 0.7
- Example: WordCount
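The WordCount example is not reproduced in this transcript. As a rough illustration of the idea only, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases over rows shaped like Cassandra rows (row key mapped to columns). It does not use Hadoop or Cassandra APIs; `ROWS`, `map_row`, `shuffle`, `reduce_word`, `word_count`, and the `text` column name are all hypothetical names chosen for this sketch.

```python
from collections import defaultdict

# Hypothetical stand-in for rows read from a Cassandra column family:
# row key -> {column name: column value}. Not real Cassandra input.
ROWS = {
    "row1": {"text": "cassandra and hadoop"},
    "row2": {"text": "hadoop over cassandra data"},
}


def map_row(key, columns, column_name="text"):
    """Map phase: emit a (word, 1) pair for each word in the chosen column."""
    value = columns.get(column_name, "")
    for word in value.split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_word(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce in-process over all rows."""
    pairs = (pair for key, cols in rows.items() for pair in map_row(key, cols))
    return dict(reduce_word(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In a real job the map and reduce functions would run on separate Hadoop tasks, and the framework would handle the shuffle; the sketch only shows how the three phases fit together.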

Pig
- What is Pig?
  - A platform for data analytics developed at Yahoo!
  - Includes Pig Latin, the Grunt shell, and an interpreter that compiles down to MapReduce
  - Simplifies data analysis
- Cassandra integration
  - Stu Hood added Pig integration in Cassandra 0.6
- Example: WordCount with Pig
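The Pig version of WordCount is not included in this transcript either. Pig scripts are written as a chain of named dataflow steps, and the sketch below imitates that style in plain Python: tokenize and flatten, group, then count. It is an analogy under stated assumptions, not Pig Latin and not the Cassandra Pig integration; every function name here is hypothetical.

```python
from itertools import groupby


def tokenize_flatten(texts):
    """Split each text into words and flatten into one list of words."""
    return [word for text in texts for word in text.split()]


def group_by_word(words):
    """Collect identical words into (word, bag of occurrences) pairs."""
    ordered = sorted(words)  # groupby only merges adjacent equal items
    return [(word, list(bag)) for word, bag in groupby(ordered)]


def count_groups(groups):
    """Replace each bag with its size."""
    return [(word, len(bag)) for word, bag in groups]


def pig_style_word_count(texts):
    """Chain the steps the way a dataflow script chains its relations."""
    words = tokenize_flatten(texts)
    groups = group_by_word(words)
    return count_groups(groups)


if __name__ == "__main__":
    print(pig_style_word_count(["cassandra and hadoop", "hadoop over cassandra data"]))
```

Each step takes the previous step's result and returns a new collection, which is the part of the dataflow style that tends to make such scripts shorter than hand-written MapReduce.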

Hive
- What is Hive?
  - A platform for data analytics developed at Facebook
  - Draws from the familiar SQL -> Hive QL
  - Compiles down to MapReduce
- Cassandra integration
  - Availability of a Cassandra storage handler is coming soon (HIVE-1434)
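To show what "draws from the familiar SQL" can look like for the same WordCount problem, the sketch below runs an ordinary `GROUP BY` query. It uses Python's built-in SQLite as a stand-in, since no Hive cluster or Cassandra storage handler is assumed here; the `words` table and the function name are hypothetical. On Hive, a query of this shape would be compiled into MapReduce jobs rather than executed in-process.

```python
import sqlite3


def hive_style_word_count(words):
    """Count words with a SQL GROUP BY, using in-memory SQLite as a stand-in."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical single-column table holding one word per row.
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany(
            "INSERT INTO words (word) VALUES (?)", [(w,) for w in words]
        )
        query = (
            "SELECT word, COUNT(*) AS total "
            "FROM words GROUP BY word ORDER BY word"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(hive_style_word_count(["hadoop", "cassandra", "hadoop"]))
```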

Example Use Case
- Raptr.com
  - Gaming statistics and achievements across platforms
  - Home-grown -> Cassandra + Hadoop (Pig)
  - Idea to execution much faster
  - Query runtime from hours to 10-15 minutes

Questions
- Contact
  - Email: jeremy.hanna@rackspace.com
  - Twitter: @jeromatron
  - IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
- Further information
  - http://wiki.apache.org/cassandra/HadoopSupport
  - Cassandra: The Definitive Guide
