Cassandra Hadoop Best Practices by Jeremy Hanna

Published in: Technology, Business

  1. Hadoop + Cassandra Best Practices (Thursday, June 6, 13)
  2-7. Some Background
      • Hadoop support since early 2010
      • MapReduce/Pig works with any Hadoop 1.x distribution
      • Hive is a neatly integrated piece of DSE
      • Data locality just like with HDFS
      • Cassandra can handle ~200 column families (CFs)
  8-13. Setup
      • Analytics-specific datacenter
      • Configure replication (keyspace/datacenter specific)
      • Isolated reads at CL.LOCAL_QUORUM
      • Writes will be replicated
      • Same best practices as with Hadoop alone
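
To make the Setup points concrete, here is a minimal sketch in Python of what per-datacenter replication for a keyspace can look like. It only builds a CQL `CREATE KEYSPACE` string with `NetworkTopologyStrategy` and computes how many replicas a LOCAL_QUORUM read waits for in one datacenter. The keyspace name `events`, the datacenter names `realtime` and `analytics`, and both helper functions are assumptions made for illustration; they are not taken from the slides.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per datacenter."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        # Each datacenter gets its own replica count (hypothetical names below).
        parts.append("'%s': %d" % (dc, rf))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(parts))


def local_quorum(rf):
    """Replicas in the local datacenter that must answer a LOCAL_QUORUM request."""
    return rf // 2 + 1


# Hypothetical layout: one datacenter for live traffic, one for analytics jobs.
statement = create_keyspace_cql("events", {"realtime": 3, "analytics": 3})
print(statement)
print(local_quorum(3))
```

Because LOCAL_QUORUM only counts replicas in the datacenter the job connects to, analytics reads stay within the analytics datacenter, while writes made in either datacenter are still replicated to both.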
  14-16. Vanilla Hadoop
      • Co-locate task trackers and data nodes with Cassandra nodes (data locality)
      • Workload isolation with separate Cassandra datacenter configured
  17-21. Planning
      • MapReduce over full column family
      • Model data accordingly
      • Add more column families
      • Can use secondary index, but use caution
  22-25. Execution
      • Project and select early in your workflow
      • Store common intermediate datasets (in CFS/HDFS)
      • Bulk loader output format excels
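
The advice to project and select early can be illustrated with a small, framework-free Python sketch. It is not Pig or Hadoop code; the row layout (`user`, `ts`, `bytes`, `payload`) and both function names are hypothetical. The point it shows is that rows are filtered (select) and trimmed to the needed columns (project) before the aggregation step ever sees them.

```python
from collections import defaultdict


def select_and_project(rows, min_ts, wanted=("user", "bytes")):
    """Drop rows older than min_ts, then keep only the columns later steps need."""
    for row in rows:
        if row["ts"] >= min_ts:                      # select: filter rows first
            yield {name: row[name] for name in wanted}  # project: drop unused columns


def total_bytes_by_user(rows):
    """Aggregate the already-reduced rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["user"]] += row["bytes"]
    return dict(totals)


# Hypothetical input; the large "payload" column never reaches the aggregation.
rows = [
    {"user": "a", "ts": 5, "bytes": 100, "payload": "x" * 1000},
    {"user": "a", "ts": 10, "bytes": 5, "payload": "x" * 1000},
    {"user": "a", "ts": 12, "bytes": 7, "payload": "x" * 1000},
    {"user": "b", "ts": 11, "bytes": 3, "payload": "x" * 1000},
]
result = total_bytes_by_user(select_and_project(rows, min_ts=10))
print(result)
```

In a real job the same idea means less data is shuffled between map and reduce stages, which is usually where the cost lies.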
  26-30. Use Cases
      • Typical Hadoop tasks
      • Validate data
      • Fix data
      • Bootstrap a new column family from existing data
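
As one possible shape for the "validate data" use case, the following Python sketch mimics a map step that emits the name of every rule a row breaks and a reduce step that counts violations per rule. The rules (`bad_email`, `negative_age`), the row fields, and the function names are invented for illustration; the slides do not say which checks were run.

```python
from collections import Counter


def violations(row):
    """Map step: yield the name of each (hypothetical) rule this row breaks."""
    if not row.get("email") or "@" not in row["email"]:
        yield "bad_email"
    if row.get("age") is not None and row["age"] < 0:
        yield "negative_age"


def count_violations(rows):
    """Reduce step: total the violations per rule across all rows."""
    counts = Counter()
    for row in rows:
        counts.update(violations(row))
    return dict(counts)


# Hypothetical rows standing in for a scan over a full column family.
sample = [
    {"email": "a@example.org", "age": 3},
    {"email": "nope", "age": -1},
    {"email": None, "age": 2},
]
report = count_violations(sample)
print(report)
```

A "fix data" job could reuse the same map step and write corrected rows back instead of only counting them.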
  31. Thank you
      • Jeremy Hanna
      • @jeromatron (Twitter and IRC)
      • jeremy@datastax.com
      • Ping me if you have any questions
