Cassandra+Hadoop

A presentation on recent changes to Cassandra that allow it to be used with Hadoop's MapReduce and Pig.

Published in: Technology, Business

  • Cassandra+Hadoop

    1. CASSANDRA + HADOOP
    2. Two Aspects
         • MapReduce
         • Pig
    3-6. MR + Cassandra - History
         • Writing to Cassandra - always been possible
         • Cassandra 0.6.x enables reading data
         • Uses its own InputSplit, InputFormat, RecordReader
    7. Why MR + Cassandra?
         • Cassandra is a great data store, but what about analytics? MapReduce!
         • Arguable win over MapReduce + HBase: no SPOF
    8-15. Setup and Configuration
         • Job/Task Trackers
             - On already established cluster
             - Overlays Cassandra cluster
             - Hybrid
         • Locality
             - Gives data’s host information to job tracker
             - Configure both topologies - Cassandra + Hadoop
    16. A Separate Cluster
    17-18. A Complete Overlay
         • Separate Job Tracker
         • Task Trackers collocated with Cassandra nodes
         • Bonus - data locality!
    19-20. A Hybrid Cluster
         • Task Trackers on Cassandra nodes
         • Integrate w/Cluster
         • Bonus - data locality
    21. Tutorial
         • contrib/word_count example
    22. Pig + Cassandra
         • contrib/pig - a Cassandra-specific storage backing
         • Requires latest Pig - 0.7
    23-27. Future Work
         • Simple output to Cassandra - Cassandra-1101 (OutputFormat, OutputReducer, OutputWriter)
         • Hive support - Cassandra-913
         • Optimizations for start/end row - Cassandra-1125
         • Other refinements based on feedback
    28. Questions...
         • jeromatron on twitter
         • jeromatron on #cassandra channel on freenode irc
         • jeremy (dot) hanna (at) rackspace (dot) com
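
The locality point in the setup slides (each input split gives the data's host information to the job tracker) can be illustrated with a small toy model. This is a sketch under stated assumptions, not Cassandra's or Hadoop's actual code: the `Split` class, `make_splits`, `assign`, and the host names are all hypothetical, the ring uses a replication factor of 1, and the fallback scheduling is a plain round robin. It only shows why the overlay and hybrid layouts get local reads while a fully separate Hadoop cluster does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an input split: a token range plus the hosts holding it."""
    start_token: int
    end_token: int
    hosts: Tuple[str, ...]


def make_splits(ring: Dict[int, str]) -> List[Split]:
    """Build one split per token range.

    Assumption: replication factor 1, so the range (previous token, token]
    lives only on the node that owns `token`.
    """
    tokens = sorted(ring)
    splits = []
    for i, end in enumerate(tokens):
        start = tokens[i - 1]  # index -1 wraps around the ring for the first range
        splits.append(Split(start, end, (ring[end],)))
    return splits


def assign(splits: List[Split], task_trackers: List[str]) -> Dict[Split, Tuple[str, bool]]:
    """Map each split to (task tracker host, is_local).

    A tracker running on a host that holds the split's data is preferred;
    otherwise the split goes to the next tracker in round-robin order.
    """
    assignments = {}
    fallback = 0
    for split in splits:
        local = [host for host in split.hosts if host in task_trackers]
        if local:
            assignments[split] = (local[0], True)
        else:
            tracker = task_trackers[fallback % len(task_trackers)]
            assignments[split] = (tracker, False)
            fallback += 1
    return assignments


def local_count(splits: List[Split], task_trackers: List[str]) -> int:
    """Number of splits that would be read on the node that stores them."""
    return sum(1 for _, is_local in assign(splits, task_trackers).values() if is_local)


if __name__ == "__main__":
    ring = {0: "cass1", 100: "cass2", 200: "cass3"}  # token -> owning node (made up)
    splits = make_splits(ring)
    layouts = {
        "separate cluster": ["hadoop1", "hadoop2"],
        "complete overlay": ["cass1", "cass2", "cass3"],
        "hybrid": ["cass1", "hadoop1"],
    }
    for name, trackers in layouts.items():
        print(name, local_count(splits, trackers), "of", len(splits), "splits local")
```

In this model the only thing that changes between the three layouts is the list of hosts running task trackers, which mirrors the slides: locality comes from task trackers sharing hosts with Cassandra nodes.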
