Cassandra Hadoop Best Practices by Jeremy Hanna

Transcript

  • 1. Hadoop + Cassandra Best Practices (Thursday, June 6, 2013)
  • 7. Some Background
    • Hadoop support since early 2010
    • MapReduce and Pig work with any Hadoop 1.x distribution
    • Hive is neatly integrated into DataStax Enterprise (DSE)
    • Data locality, just as with HDFS
    • Cassandra can handle ~200 column families (CFs)
  • 13. Setup
    • Use an analytics-specific datacenter
    • Configure replication per keyspace and per datacenter (KS/DC specific)
    • Read at CL.LOCAL_QUORUM so analytics reads stay isolated in the local datacenter
    • Writes are still replicated to the other datacenters
    • Same best practices as with Hadoop alone
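The replication and consistency-level bullets come down to simple arithmetic. Below is a minimal sketch, assuming a keyspace with per-datacenter replication factors (as NetworkTopologyStrategy allows); the datacenter names `"Cassandra"` and `"Analytics"` and their replication factors are hypothetical. A LOCAL_QUORUM request waits for a majority of replicas in the coordinator's own datacenter only, while a write is still sent to replicas in every datacenter.

```python
# Hypothetical per-datacenter replication factors for one keyspace,
# in the style of NetworkTopologyStrategy (datacenter name -> replica count).
REPLICATION = {"Cassandra": 3, "Analytics": 2}


def quorum(replicas: int) -> int:
    """Majority of `replicas`: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def local_quorum(replication: dict[str, int], local_dc: str) -> int:
    """Replicas that must respond to a LOCAL_QUORUM request coordinated in `local_dc`."""
    return quorum(replication[local_dc])


def write_targets(replication: dict[str, int]) -> int:
    """Total replicas a write is sent to: every datacenter receives its copies."""
    return sum(replication.values())


if __name__ == "__main__":
    # A job reading in the Analytics DC waits only on Analytics replicas,
    # so it never blocks on the real-time Cassandra DC.
    print("LOCAL_QUORUM in Analytics:", local_quorum(REPLICATION, "Analytics"))
    print("Replicas per write:", write_targets(REPLICATION))
```

Note that with a replication factor of 2 in the analytics datacenter, LOCAL_QUORUM needs both local replicas, so that datacenter tolerates no replica outage for those reads.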
  • 16. Vanilla Hadoop
    • Co-locate task trackers and data nodes with Cassandra nodes (data locality)
    • Isolate workloads by configuring a separate Cassandra datacenter
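Co-location matters because Hadoop tries to run each input split's map task on one of the hosts reported as that split's location; with Cassandra input those hosts are, roughly, the replicas of the split's token range. The plain-Python simulation below illustrates the idea under simplifying assumptions: a hypothetical four-node ring, SimpleStrategy-style placement (owner plus the next nodes clockwise), and made-up node names and tokens. It is not the Cassandra input format's actual code.

```python
from bisect import bisect_left

# Hypothetical 4-node ring as (token, node), sorted by token.
# Each node owns the range (previous token, its token], wrapping at the end.
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
RF = 2  # replicas per range: the owner plus the next RF - 1 nodes clockwise


def replicas_for(token: int) -> list[str]:
    """Nodes holding `token` under simple clockwise placement."""
    tokens = [t for t, _ in RING]
    # The owner is the first node whose token is >= the key's token (wrapping).
    start = bisect_left(tokens, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(RF)]


def pick_tracker(token: int, trackers: set[str]) -> tuple[str, bool]:
    """Prefer a task tracker co-located with a replica; otherwise fall back to any."""
    for node in replicas_for(token):
        if node in trackers:
            return node, True  # data-local: the mapper reads from its own node
    return sorted(trackers)[0], False  # remote: data crosses the network


if __name__ == "__main__":
    print(replicas_for(30))                          # replicas of the range (25, 50]
    print(pick_tracker(30, {"node-a", "node-c"}))    # local tracker available
    print(pick_tracker(30, {"node-a"}))              # only a remote tracker
```

When task trackers run only on nodes outside the replica set, every split falls into the remote branch, which is why the slide recommends running them on the Cassandra nodes themselves.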
  • 21. Planning
    • MapReduce jobs run over a full column family
    • Model data accordingly
    • Add more column families
    • You can use a secondary index, but use caution
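Because a job reads the full column family, a job that needs only a small subset of rows still pays for scanning all of them. That cost is the reason to model data for the job and add column families. The sketch below is a plain-Python simulation, not the Cassandra input format: it compares scanning a hypothetical `EVENTS` column family against a hypothetical denormalized `CLICKS` column family that would be populated at write time. All names and data are made up.

```python
# Hypothetical "events" column family: row key -> columns.
EVENTS = {
    f"evt{i}": {"type": "click" if i % 10 == 0 else "view", "user": f"u{i % 7}"}
    for i in range(1000)
}

# Hypothetical denormalized CF holding only click rows. In practice it would be
# written alongside EVENTS at ingest time; here it is simulated by filtering.
CLICKS = {key: cols for key, cols in EVENTS.items() if cols["type"] == "click"}


def run_map(column_family, mapper):
    """Feed every row of the CF to the mapper, as a full-CF input would."""
    scanned, output = 0, []
    for key, columns in column_family.items():
        scanned += 1
        output.extend(mapper(key, columns))
    return scanned, output


def clicks_mapper(key, columns):
    """Emit (user, 1) for each click row; every other row is read and discarded."""
    if columns["type"] == "click":
        yield columns["user"], 1


if __name__ == "__main__":
    full_scanned, full_out = run_map(EVENTS, clicks_mapper)
    ded_scanned, ded_out = run_map(CLICKS, clicks_mapper)
    print("rows scanned (full CF):", full_scanned, "pairs:", len(full_out))
    print("rows scanned (dedicated CF):", ded_scanned, "pairs:", len(ded_out))
```

Both runs emit the same pairs; only the number of rows read differs. The extra CF trades storage and write work for cheaper analytics scans.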
  • 25. Execution
    • Project (keep only the columns you need) and select (filter rows) early in your workflow
    • Store common intermediate datasets (in CFS/HDFS)
    • The bulk loader output format excels at writing results back to Cassandra
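Projecting and selecting early pays off because everything a mapper emits has to be serialized, shuffled, and sorted before the reducers see it. The following plain-Python sketch compares a hypothetical mapper that emits whole rows with one that filters and projects first. It approximates shuffle volume by the JSON size of the emitted pairs, which is an illustrative proxy, not how Hadoop actually measures it. The row layout and field names are made up.

```python
import json

# Hypothetical input rows with one wide column ("bio") the job does not need.
ROWS = [
    {"id": i, "country": "US" if i % 4 == 0 else "FR", "bio": "x" * 200, "age": 20 + i % 50}
    for i in range(100)
]


def late_mapper(row):
    """Emit the whole row; filtering and projection are left to the reducer."""
    yield row["country"], row


def early_mapper(row):
    """Select (keep only US rows) and project (keep only 'age') before emitting."""
    if row["country"] == "US":
        yield row["country"], {"age": row["age"]}


def shuffled_bytes(mapper, rows):
    """Approximate shuffle volume as the JSON size of all emitted pairs."""
    return sum(len(json.dumps([k, v])) for row in rows for k, v in mapper(row))


def us_average_age(mapper, rows):
    """Reduce side: average age of US rows, whatever the mapper emitted."""
    ages = [v["age"] for row in rows for k, v in mapper(row) if k == "US"]
    return sum(ages) / len(ages)


if __name__ == "__main__":
    print("late:", shuffled_bytes(late_mapper, ROWS), "bytes")
    print("early:", shuffled_bytes(early_mapper, ROWS), "bytes")
```

The answer is identical either way; the early version simply moves far less data between the map and reduce phases.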
  • 30. Use Cases
    • Typical Hadoop tasks
    • Validate data
    • Fix data
    • Bootstrap a new column family from existing data
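The last three use cases often fit into a single map-only pass. The job reads every row of an existing column family, checks it, repairs what it can, and emits rows keyed for a new column family. The sketch below simulates that pass in plain Python under loudly stated assumptions. The `USERS` column family, the email normalization rule, the deliberately simple validity check, and the `users_by_email` target layout are all hypothetical. The sketch does not show how results are written back to Cassandra.

```python
# Hypothetical existing "users" column family: user id -> columns.
USERS = {
    "u1": {"email": " Alice@Example.com ", "name": "Alice"},
    "u2": {"email": "bob@example.com", "name": "Bob"},
    "u3": {"email": "not-an-email", "name": "Carol"},
}


def normalize_email(raw: str) -> str:
    """Fix step (hypothetical rule): trim whitespace and lowercase."""
    return raw.strip().lower()


def is_valid_email(email: str) -> bool:
    """Validate step: a deliberately simple check, not a full RFC 5322 parser."""
    local, sep, domain = email.partition("@")
    return bool(local) and sep == "@" and "." in domain and "@" not in domain


def bootstrap_by_email(users: dict[str, dict[str, str]]):
    """One pass: report invalid rows, record fixes, and build a new CF keyed by email."""
    fixed: dict[str, str] = {}
    invalid: list[str] = []
    by_email: dict[str, dict[str, str]] = {}
    for user_id, cols in users.items():
        email = normalize_email(cols["email"])
        if not is_valid_email(email):
            invalid.append(user_id)  # flag for manual review
            continue
        if email != cols["email"]:
            fixed[user_id] = email  # corrected value to write back
        by_email[email] = {"user_id": user_id, "name": cols["name"]}
    return fixed, invalid, by_email


if __name__ == "__main__":
    fixed, invalid, by_email = bootstrap_by_email(USERS)
    print("fixed:", fixed)
    print("invalid:", invalid)
    print("users_by_email keys:", sorted(by_email))
```

In a real job each of the three outputs would typically go to its own destination, for example the fixes and the new column family back to Cassandra and the invalid keys to a report.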
  • 31. Thank you
    • Jeremy Hanna
    • @jeromatron (Twitter and IRC)
    • jeremy@datastax.com
    • Ping me if you have any questions