Cassandra Hadoop Best Practices by Jeremy Hanna
Published in: Technology, Business
Transcript

  • 1. Hadoop + Cassandra Best Practices (Thursday, June 6, 2013)
  • 2-7. Some Background
      • Hadoop support since early 2010
      • MapReduce/Pig works with any Hadoop 1.x distribution.
      • Hive is a neatly integrated piece of DSE
      • Data locality just like with HDFS
      • Cassandra can handle ~200 CFs
  • 8-13. Setup
      • Analytics specific datacenter
      • Configure replication (KS/DC specific)
      • Isolated reads at CL.LOCAL_QUORUM
      • Writes will be replicated
      • Same best practices as with Hadoop alone
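The replication and consistency bullets on the Setup slide can be sketched in code. The Python below is a minimal illustration and is not taken from the slides: it builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` with a replication factor per datacenter, and computes how many local replicas a `LOCAL_QUORUM` request waits for. The keyspace name `events` and the datacenter names `Cassandra` and `Analytics` are hypothetical; real datacenter names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, replication_by_dc):
    """Build a CQL statement that sets a replication factor per datacenter."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort for a stable, readable statement.
    for dc, rf in sorted(replication_by_dc.items()):
        options.append("'%s': %d" % (dc, rf))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (
        keyspace, ", ".join(options))


def local_quorum(replication_factor):
    """Replicas in the local datacenter that must answer a LOCAL_QUORUM request."""
    return replication_factor // 2 + 1


# Hypothetical names: one datacenter for real-time traffic, one for analytics.
replication = {"Cassandra": 3, "Analytics": 3}
statement = create_keyspace_cql("events", replication)
print(statement)
print(local_quorum(replication["Analytics"]))
```

With a layout like this, Hadoop jobs that read at `LOCAL_QUORUM` from nodes in the analytics datacenter only wait on replicas in that datacenter, while writes made in either datacenter are still replicated to both.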
  • 14-16. Vanilla Hadoop
      • Co-locate task trackers and data nodes with Cassandra nodes (data locality)
      • Workload isolation with separate Cassandra datacenter configured
  • 17-21. Planning
      • MapReduce over full column family
      • Model data accordingly
      • Add more column families
      • Can use secondary index, but use caution
  • 22-25. Execution
      • Project and select early in your workflow
      • Store common intermediate datasets (in CFS/HDFS)
      • Bulk loader output format excels
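"Project and select early" can be illustrated with a small, framework-free sketch. The Python below is an assumption-laden illustration rather than the speaker's code: it filters rows (select) and drops unneeded columns (project) before the aggregation step, so the expensive stage only sees the data it needs. The row layout and the helper names `select`, `project` and `count_by` are hypothetical; in a real workflow the same idea is expressed with Pig `FILTER` and `FOREACH ... GENERATE` or in the map phase of a MapReduce job.

```python
from collections import Counter


def select(rows, predicate):
    """Keep only the rows the job cares about."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Keep only the columns the later stages need."""
    for row in rows:
        yield {column: row[column] for column in columns}


def count_by(rows, column):
    """Stand-in for the aggregation at the end of the workflow."""
    return Counter(row[column] for row in rows)


# Hypothetical input rows; "payload" stands for wide data the job never uses.
rows = [
    {"user": "a", "country": "US", "status": "active", "payload": "x" * 1000},
    {"user": "b", "country": "DE", "status": "active", "payload": "x" * 1000},
    {"user": "c", "country": "US", "status": "deleted", "payload": "x" * 1000},
    {"user": "d", "country": "US", "status": "active", "payload": "x" * 1000},
]

active = select(rows, lambda row: row["status"] == "active")
slim = project(active, ["country"])
counts = count_by(slim, "country")
print(counts)
```

Because the filter and projection run first, the wide `payload` values and the deleted rows never reach the aggregation, which is the saving the slide is pointing at.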
  • 26-30. Use Cases
      • Typical Hadoop tasks
      • Validate data
      • Fix data
      • Bootstrap a new column family from existing data
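The validate, fix and bootstrap use cases can be sketched as a single map-style pass. The Python below is a hypothetical illustration, not code from the talk: it cleans each row of an existing column family, skips rows that fail validation, and emits key/column pairs for a new column family keyed by email. The column names `user_id` and `email` and all function names are assumptions made for the example; a real job would read and write through Cassandra's Hadoop input and output formats.

```python
def validate(row):
    """Return a list of problems found in one row (empty if the row is valid)."""
    problems = []
    if not row.get("email"):
        problems.append("missing email")
    elif "@" not in row["email"]:
        problems.append("malformed email")
    return problems


def fix(row):
    """Return a cleaned copy of the row; the input row is left untouched."""
    fixed = dict(row)
    if fixed.get("email"):
        fixed["email"] = fixed["email"].strip().lower()
    return fixed


def bootstrap_by_email(rows):
    """Map step: emit (new_row_key, columns) pairs for a column family keyed by email."""
    for row in rows:
        cleaned = fix(row)
        if validate(cleaned):
            continue  # skip rows that are still invalid after fixing
        yield cleaned["email"], {"user_id": cleaned["user_id"]}


# Hypothetical rows from an existing column family keyed by user_id.
users = [
    {"user_id": "u1", "email": " Alice@Example.com "},
    {"user_id": "u2", "email": "not-an-email"},
    {"user_id": "u3", "email": ""},
]
new_rows = list(bootstrap_by_email(users))
print(new_rows)
```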
  • 31. Thank you
      • Jeremy Hanna
      • @jeromatron (Twitter and IRC)
      • jeremy@datastax.com
      • Ping me if you have any questions