Cassandra Hadoop Best Practices by Jeremy Hanna

Transcript

  • 1. Hadoop + Cassandra Best Practices (Thursday, June 6, 2013)
  • 2-7. Some Background
      • Hadoop support since early 2010
      • MapReduce/Pig works with any Hadoop 1.x distribution
      • Hive is a neatly integrated piece of DSE
      • Data locality just like with HDFS
      • Cassandra can handle ~200 CFs
  • 8-13. Setup
      • Analytics specific datacenter
      • Configure replication (KS/DC specific)
      • Isolated reads at CL.LOCAL_QUORUM
      • Writes will be replicated
      • Same best practices as with Hadoop alone
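The replication and consistency points on the Setup slides can be sketched in a few lines. This is only an illustration under stated assumptions: the keyspace name `events`, the datacenter names `Cassandra` and `Analytics`, and the replication factors 3 and 2 are hypothetical and do not come from the slides. The helper builds a CQL `CREATE KEYSPACE` statement using `NetworkTopologyStrategy`, and `local_quorum` shows how many replicas in the local datacenter must answer a `LOCAL_QUORUM` request.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with per-datacenter replication."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replicas in replicas_per_dc.items():
        parts.append("'{}': {}".format(dc_name, replicas))
    return "CREATE KEYSPACE {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(parts)
    )


def local_quorum(replicas_in_local_dc):
    """Replicas in the coordinator's datacenter that must respond."""
    return replicas_in_local_dc // 2 + 1


# Hypothetical layout: one real-time datacenter, one analytics datacenter.
layout = {"Cassandra": 3, "Analytics": 2}
statement = create_keyspace_cql("events", layout)
print(statement)
for dc_name, replicas in layout.items():
    print(dc_name, "LOCAL_QUORUM needs", local_quorum(replicas), "replica(s)")
```

Under these assumptions, a Hadoop job reading at `LOCAL_QUORUM` from the analytics datacenter is answered by replicas in that datacenter only, while writes are still replicated to both datacenters according to the keyspace settings.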
  • 14-16. Vanilla Hadoop
      • Co-locate task trackers and data nodes with Cassandra nodes (data locality)
      • Workload isolation with separate Cassandra datacenter configured
  • 17-21. Planning
      • MapReduce over full column family
      • Model data accordingly
      • Add more column families
      • Can use secondary index, but use caution
  • 22-25. Execution
      • Project and select early in your workflow
      • Store common intermediate datasets (in CFS/HDFS)
      • Bulk loader output format excels
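"Project and select early" can be illustrated with a toy, in-memory map and reduce. This is a sketch in plain Python, not Hadoop or Pig code; the rows, field names, and the `FR` filter are hypothetical. The map step drops rows that fail the filter (select) and keeps only the two fields the reduce step needs (project), so less data reaches the shuffle.

```python
from collections import defaultdict

# Hypothetical rows, as a job might read them from a column family.
ROWS = [
    {"user": "alice", "country": "FR", "bytes": 120, "agent": "curl"},
    {"user": "bob", "country": "US", "bytes": 300, "agent": "firefox"},
    {"user": "alice", "country": "FR", "bytes": 80, "agent": "chrome"},
    {"user": "carol", "country": "FR", "bytes": 50, "agent": "curl"},
]


def map_row(row):
    """Select and project before anything is passed on to the reduce step."""
    if row["country"] != "FR":  # select: discard rows the job does not need
        return []
    return [(row["user"], row["bytes"])]  # project: keep only two fields


def run_job(rows):
    """Run the map step, then sum bytes per user in the reduce step."""
    shuffled = []
    for row in rows:
        shuffled.extend(map_row(row))
    totals = defaultdict(int)
    for user, size in shuffled:
        totals[user] += size
    return dict(totals), len(shuffled)


totals, shuffled_count = run_job(ROWS)
print(totals, shuffled_count)
```

Here three small pairs are shuffled instead of four full rows; on a full column family scan the same habit reduces the data carried through every later stage.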
  • 26-30. Use Cases
      • Typical Hadoop tasks
      • Validate data
      • Fix data
      • Bootstrap a new column family from existing data
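The validate, fix, and bootstrap use cases can be combined in one pass over existing rows. The following is a minimal in-memory sketch, assuming a hypothetical source column family keyed by user id and a hypothetical new one keyed by email; a real job would read and write through the Hadoop input and output formats rather than Python dictionaries.

```python
def validate(row):
    """A row is usable only if it carries something that looks like an email."""
    email = row.get("email")
    return isinstance(email, str) and "@" in email


def fix(row):
    """Return a copy of the row with the email normalized."""
    fixed = dict(row)
    fixed["email"] = row["email"].strip().lower()
    return fixed


def bootstrap_users_by_email(users_by_id):
    """Build rows for a new column family keyed by email from existing rows."""
    users_by_email = {}
    rejected = []
    for user_id, row in users_by_id.items():
        if not validate(row):
            rejected.append(user_id)  # report bad rows instead of copying them
            continue
        clean = fix(row)
        users_by_email[clean["email"]] = {"user_id": user_id, "name": clean["name"]}
    return users_by_email, rejected


# Hypothetical existing data, keyed by user id.
existing = {
    "u1": {"name": "Ada", "email": " Ada@Example.com "},
    "u2": {"name": "Bob", "email": None},
    "u3": {"name": "Cy", "email": "cy@example.com"},
}
new_rows, rejected_ids = bootstrap_users_by_email(existing)
print(new_rows)
print(rejected_ids)
```

Collecting rejected keys rather than silently skipping them keeps the validation result available for a later fix-up job.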
  • 31. Thank you
      • Jeremy Hanna
      • @jeromatron (Twitter and IRC)
      • jeremy@datastax.com
      • Ping me if you have any questions