Cassandra Hadoop Best Practices by Jeremy Hanna

Usage Rights: CC Attribution-NonCommercial-NoDerivs License

Presentation Transcript

  • Hadoop + Cassandra Best Practices (Thursday, June 6, 2013)
  • Some Background
      • Hadoop support since early 2010
      • MapReduce/Pig works with any Hadoop 1.x distribution
      • Hive is a neatly integrated piece of DSE
      • Data locality just like with HDFS
      • Cassandra can handle ~200 CFs
  • Setup
      • Analytics-specific datacenter
      • Configure replication (KS/DC specific)
      • Isolated reads at CL.LOCAL_QUORUM
      • Writes will be replicated
      • Same best practices as with Hadoop alone
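
The replication and consistency bullets above can be sketched in a few lines. This is only an illustration: it builds a CQL `CREATE KEYSPACE` statement using `NetworkTopologyStrategy` and computes how many replicas a `LOCAL_QUORUM` request waits for within one datacenter. The keyspace name `events`, the datacenter names `Cassandra` and `Analytics`, and the replication factors are assumptions chosen for the example, not values from the talk.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with per-datacenter replication.

    replicas_per_dc maps a datacenter name to its replication factor.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += ["'{}': {}".format(dc, rf) for dc, rf in replicas_per_dc.items()]
    return "CREATE KEYSPACE {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(parts)
    )


def local_quorum(replication_factor):
    """Replicas that must respond in the local datacenter for LOCAL_QUORUM."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    # Hypothetical layout: 3 replicas for the real-time DC, 1 for analytics.
    layout = {"Cassandra": 3, "Analytics": 1}
    print(create_keyspace_cql("events", layout))
    for dc, rf in layout.items():
        print(dc, "LOCAL_QUORUM needs", local_quorum(rf), "replica(s)")
```

Because `LOCAL_QUORUM` only counts replicas in the coordinator's own datacenter, analytics reads issued in the `Analytics` datacenter do not wait on nodes serving real-time traffic, while writes still replicate to both.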
  • Vanilla Hadoop
      • Co-locate task trackers and data nodes with Cassandra nodes (data locality)
      • Workload isolation with separate Cassandra datacenter configured
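
To show why co-location matters, here is a simplified sketch of locality-aware assignment. It is not Hadoop's actual scheduler; the function name `assign_splits`, the split ids, and the host addresses are all hypothetical. Each input split lists the hosts holding a replica of its data, and a split runs locally only when one of those hosts also runs a task tracker.

```python
def assign_splits(splits, tracker_hosts):
    """Map each split id to (host, is_local).

    splits: dict of split id -> list of hosts holding a replica of the data.
    tracker_hosts: collection of hosts that run a task tracker.
    """
    trackers = sorted(tracker_hosts)
    assignments = {}
    for i, (split_id, replica_hosts) in enumerate(sorted(splits.items())):
        local = [h for h in replica_hosts if h in tracker_hosts]
        if local:
            # A task tracker sits on a node that owns the data: no network read.
            assignments[split_id] = (local[0], True)
        else:
            # Fall back to any tracker; the data must cross the network.
            assignments[split_id] = (trackers[i % len(trackers)], False)
    return assignments


if __name__ == "__main__":
    splits = {"s1": ["10.0.0.1", "10.0.0.2"], "s2": ["10.0.0.9"]}
    trackers = {"10.0.0.2", "10.0.0.3"}
    for split_id, (host, is_local) in assign_splits(splits, trackers).items():
        print(split_id, host, "local" if is_local else "remote")
```

When every Cassandra node in the analytics datacenter also runs a task tracker, the fallback branch is never needed for data owned by that datacenter.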
  • Planning
      • MapReduce over full column family
      • Model data accordingly
      • Add more column families
      • Can use secondary index, but use caution
  • Execution
      • Project and select early in your workflow
      • Store common intermediate datasets (in CFS/HDFS)
      • Bulk loader output format excels
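
"Project and select early" can be illustrated with a minimal sketch in plain Python rather than an actual MapReduce or Pig job. The row layout (`user`, `country`, `status`, `payload`) and the function names are hypothetical. Rows are filtered and trimmed to the needed columns before the grouping step, so the wide `payload` column never reaches the later stages.

```python
from collections import defaultdict


def select_and_project(rows, predicate, columns):
    """Yield only rows matching predicate, reduced to the requested columns."""
    for row in rows:
        if predicate(row):
            yield {c: row[c] for c in columns}


def count_by(rows, key):
    """Group rows by one column and count them (stands in for the reduce step)."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


if __name__ == "__main__":
    rows = [
        {"user": "a", "country": "FR", "status": "active", "payload": "x" * 1000},
        {"user": "b", "country": "US", "status": "active", "payload": "x" * 1000},
        {"user": "c", "country": "FR", "status": "deleted", "payload": "x" * 1000},
        {"user": "d", "country": "FR", "status": "active", "payload": "x" * 1000},
    ]
    slim = select_and_project(rows, lambda r: r["status"] == "active", ["country"])
    print(count_by(slim, "country"))
```

In a real job the same idea applies at the first map or `FILTER`/`FOREACH` step: everything dropped there is data that does not have to be shuffled or written as intermediate output.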
  • Use Cases
      • Typical Hadoop tasks
      • Validate data
      • Fix data
      • Bootstrap a new column family from existing data
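
The "validate data" and "fix data" use cases might look like the following per-row logic, which a map task could apply to every row of a column family. This is a sketch under assumed conditions: the columns `email` and `age` and the specific rules are hypothetical and only stand in for whatever invariants a real dataset has.

```python
def validate(row):
    """Return a list of problems found in one row (empty when the row is valid)."""
    problems = []
    email = row.get("email")
    if not isinstance(email, str) or "@" not in email:
        problems.append("bad email")
    age = row.get("age")
    if not isinstance(age, int) or age < 0:
        problems.append("bad age")
    return problems


def fix(row):
    """Return a repaired copy of the row; the input row is left untouched."""
    fixed = dict(row)
    if isinstance(fixed.get("email"), str):
        fixed["email"] = fixed["email"].strip().lower()
    age = fixed.get("age")
    if isinstance(age, str) and age.strip().isdigit():
        # Ages stored as digit strings are converted to integers.
        fixed["age"] = int(age.strip())
    return fixed


if __name__ == "__main__":
    row = {"email": " Bob@Example.com ", "age": "42"}
    print(validate(row))
    print(fix(row))
    print(validate(fix(row)))
```

Running validation as a read-only job first, then writing only the repaired rows back, keeps the fixing job small and makes its effect easy to check by re-running the validator.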
  • Thank you
      • Jeremy Hanna
      • @jeromatron (Twitter and IRC)
      • jeremy@datastax.com
      • Ping me if you have any questions