CASSANDRA + HADOOP
Two Aspects


MapReduce
Pig
MR + Cassandra - History


Writing to Cassandra - always been possible
Cassandra 0.6.x enables reading data
Uses its own InputSplit, InputFormat, RecordReader
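The idea behind a Cassandra-specific split and record reader can be sketched roughly as follows. This is a minimal illustration in plain Python, not the Hadoop or Cassandra API: `TokenRangeSplit`, `compute_splits`, `in_range` and `read_split` are hypothetical names, tokens are simplified to integers, and replicas are assumed to be the owning node plus the next nodes clockwise on the ring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRangeSplit:
    """Hypothetical stand-in for an input split: a token range plus its replica hosts."""
    start_token: int  # exclusive
    end_token: int    # inclusive
    hosts: tuple


def compute_splits(ring, replication_factor=1):
    """Build one split per node. `ring` is a list of (token, host) sorted by token.

    Assumption: each node owns the range (previous_token, token], and replicas
    are the owner plus the following nodes clockwise.
    """
    n = len(ring)
    splits = []
    for i, (token, _host) in enumerate(ring):
        prev_token = ring[i - 1][0]  # index -1 wraps around the ring for i == 0
        replicas = tuple(ring[(i + k) % n][1] for k in range(min(replication_factor, n)))
        splits.append(TokenRangeSplit(prev_token, token, replicas))
    return splits


def in_range(token, start, end):
    """True if token lies in (start, end], allowing the range to wrap around the ring."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def read_split(split, rows):
    """Play the role of a record reader: yield (key, columns) for rows inside the split.

    `rows` is an iterable of (token, key, columns) tuples.
    """
    for token, key, columns in rows:
        if in_range(token, split.start_token, split.end_token):
            yield key, columns
```

Each split carries its host list so that a scheduler can later try to run the map task next to the data.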
Why MR + Cassandra?


Cassandra is a great data store, but what about analytics? MapReduce!
Arguably a win over MapReduce + HBase: no single point of failure (SPOF)
Setup and Configuration
Job/Task Trackers
  On already established cluster
  Overlays Cassandra cluster
  Hybrid
Locality
  Gives data’s host information to job tracker
  Configure both topologies - Cassandra + Hadoop
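How host information can drive task placement is sketched below. This is a deliberately simplified model, not Hadoop's actual scheduler (which hands out tasks as task trackers ask for work): `assign_tasks` is a hypothetical function that prefers a task tracker running on a host that holds the split's data and otherwise falls back to the least-loaded tracker.

```python
def assign_tasks(split_hosts, task_trackers):
    """Assign each split to a task tracker, preferring data-local trackers.

    split_hosts:   dict of split id -> list of hosts holding that split's data
    task_trackers: list of hosts running a task tracker
    Returns a dict of split id -> (chosen tracker, is_local).
    """
    load = {tracker: 0 for tracker in task_trackers}
    assignment = {}
    for split_id, hosts in split_hosts.items():
        # Trackers collocated with a replica of this split's data
        local = [t for t in task_trackers if t in hosts]
        candidates = local or task_trackers
        # Pick the least-loaded candidate; ties go to the first one listed
        chosen = min(candidates, key=lambda t: load[t])
        load[chosen] += 1
        assignment[split_id] = (chosen, bool(local))
    return assignment
```

In a complete overlay every split finds a local tracker. In a hybrid setup, splits on Cassandra nodes without a task tracker run remotely, as in this usage example with made-up host names:

```python
placement = assign_tasks(
    {"s0": ["cass1"], "s1": ["cass2"], "s2": ["cass3"]},
    ["cass1", "cass2", "hadoop9"],
)
# "s2" has no collocated tracker, so it goes to the idle "hadoop9"
```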
A Separate Cluster
A Complete Overlay

Separate Job Tracker
Task Trackers collocated with Cassandra nodes
Bonus: data locality!
A Hybrid Cluster

Task Trackers on Cassandra nodes
Bonus: data locality, integrates with the cluster
Tutorial



contrib/word_count example
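For readers new to the pattern, the shape of a word count over Cassandra rows looks roughly like this. It is a plain-Python sketch of the map and reduce steps, not the contrib/word_count code itself: `map_row`, `reduce_word` and `run_word_count` are hypothetical names, and each row is assumed to be a (key, columns) pair whose columns map column names to text values.

```python
from collections import defaultdict


def map_row(key, columns, column_name):
    """Map step: emit (word, 1) for every word in the chosen column of one row."""
    value = columns.get(column_name)
    if value is None:
        return  # the row lacks the column, so it contributes nothing
    for word in value.split():
        yield word, 1


def reduce_word(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def run_word_count(rows, column_name):
    """Run map, group by word (the shuffle), then reduce, all in memory."""
    grouped = defaultdict(list)
    for key, columns in rows:
        for word, one in map_row(key, columns, column_name):
            grouped[word].append(one)
    return dict(reduce_word(word, counts) for word, counts in grouped.items())
```

In a real job, Hadoop performs the grouping between map and reduce, and the rows arrive through the Cassandra input format rather than an in-memory list.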
Pig + Cassandra


contrib/pig - a Cassandra-specific storage backend for Pig
Requires latest Pig - 0.7
Future Work

Simple output to Cassandra - CASSANDRA-1101
  OutputFormat, OutputReducer, OutputWriter
Hive support - CASSANDRA-913
Optimizations for start/end row - CASSANDRA-1125
Other refinements based on feedback
Questions...


jeromatron on twitter
jeromatron on #cassandra channel on freenode irc
jeremy (dot) hanna (at) rackspace (dot) com
