CFS: Cassandra Backed Storage for Hadoop

Nick Bailey
@Nickmbailey
nick@datastax.com

Published in: Technology, Business

Transcript

  • 1. CFS: Cassandra-backed storage for Hadoop. Nick Bailey, @nickmbailey, nick@datastax.com
  • 2. ©2012 DataStax. Motivation
  • 3. Help me Cassandra, you’re my only hope
  • 4. Cassandra
      • Distributed architecture
      • No SPOF
      • Scalable
      • Real time data
      • No ad-hoc query support
  • 5. Cassandra, why can’t you...
  • 6. ...do the things Hadoop was built for.
  • 7. Cassandra + Hadoop = <3
  • 8. The Solution
      • InputFormat/OutputFormat
      • Unfortunately, still need a DFS
      • Run tasktrackers/datanodes locally
      • Data Locality FTW!
      • Run namenode/jobtracker somewhere
      • Since Cassandra 0.6 (the dark ages)
  • 9. Ok, but what about these parts that suck...
  • 10. Do not want...
      • Multiple Hadoop stacks?
      • SPOF?
      • 3 JVMs?
  • 11. CFS
  • 12. Cassandra Data model in 1 minute
  • 13. Column Families
      • Column Family ~= Table
      • Row Key + columns
      • Columns are sparse
  • 14. Static - Users Column Family
      Row Key     | Columns
      nickmbailey | password: *, name: Nick
      zznate      | password: *, name: Nate, phone: 512-7777
  • 15. Secondary Indexes: Select * from Users where name=Nick;
  • 16. Dynamic - Friends
      Row Key     | Columns
      nickmbailey | zznate:, thobbs:
      zznate      | jbeiber:, thobbs:, steve_watt:
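The data model on slides 13 to 16 can be pictured with plain dictionaries. This is only an illustrative sketch, not Cassandra's API: the helper names `build_index` and `select_where` are hypothetical, and the index is a toy stand-in for a real secondary index.

```python
# Static column family: row key -> sparse columns (rows need not share columns).
users = {
    "nickmbailey": {"password": "*", "name": "Nick"},
    "zznate": {"password": "*", "name": "Nate", "phone": "512-7777"},
}

# Dynamic column family: the column names themselves carry the data
# (friend handles) and the values are left empty.
friends = {
    "nickmbailey": {"zznate": "", "thobbs": ""},
    "zznate": {"jbeiber": "", "thobbs": "", "steve_watt": ""},
}


def build_index(column_family, column):
    """Toy secondary index: map each value of `column` to the row keys holding it."""
    index = {}
    for row_key, columns in column_family.items():
        if column in columns:  # sparse rows may lack the column entirely
            index.setdefault(columns[column], []).append(row_key)
    return index


def select_where(column_family, index, value):
    """Rough equivalent of: Select * from <column family> where <column>=<value>."""
    return {key: column_family[key] for key in index.get(value, [])}


name_index = build_index(users, "name")
nick_rows = select_where(users, name_index, "Nick")
```

Without the index, the same query would have to scan every row, which is why the slide pairs the `Select` with secondary indexes.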
  • 17. So what about CFS...
  • 18. Simple...
  • 20. CF: inode
      • Essentially, namenode replacement
      • File metadata
  • 22. CF: inode
      • Row Key = UUID
      • Allows for file renames
      • Secondary indexes for file browsing
      • Columns:
          Column      | Value
          filename    | /home/nick/data.txt
          parent_path | /home/nick/
          attributes  | nick:nick:777
          TimeUUID1   | <block metadata>
          TimeUUID2   | <block metadata>
          TimeUUID3   | <block metadata>
          ...
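A minimal sketch of why a UUID row key makes renames cheap: the path lives in columns, so a rename rewrites two columns and leaves the row key and block columns untouched. The function names are hypothetical, a dict stands in for the column family, `uuid.uuid1()` stands in for a TimeUUID, and `list_dir` scans rows where the real design would use a secondary index on `parent_path`.

```python
import uuid

inode = {}  # row key (UUID) -> columns


def create_file(path, attributes):
    """Add an inode row; the path is stored in columns, not in the row key."""
    row_key = uuid.uuid4()
    parent = path.rpartition("/")[0] + "/"
    inode[row_key] = {"filename": path, "parent_path": parent, "attributes": attributes}
    return row_key


def add_block(row_key, block_metadata):
    """Append a block column named by a time-based UUID."""
    inode[row_key][uuid.uuid1()] = block_metadata


def rename(row_key, new_path):
    """Rename by rewriting two columns; block columns are not touched."""
    inode[row_key]["filename"] = new_path
    inode[row_key]["parent_path"] = new_path.rpartition("/")[0] + "/"


def list_dir(parent_path):
    """Stand-in for a secondary-index lookup on parent_path (here: a full scan)."""
    return sorted(c["filename"] for c in inode.values() if c["parent_path"] == parent_path)


def block_columns(row_key):
    """Block metadata in insertion order (columns whose name is a UUID)."""
    return [v for k, v in inode[row_key].items() if isinstance(k, uuid.UUID)]


key = create_file("/home/nick/data.txt", "nick:nick:777")
add_block(key, "<block metadata 1>")
add_block(key, "<block metadata 2>")
rename(key, "/home/nick/renamed.txt")
```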
  • 24. CF: sblocks
      • Essentially, datanode replacement
      • Stores actual contents of files
      • Each row is an HDFS block
      • Row Key = Block ID
      • Columns:
          Column    | Value
          TimeUUID1 | <compressed file data>
          TimeUUID2 | <compressed file data>
          TimeUUID3 | <compressed file data>
          ...
  • 26. Writes
      • Write file metadata
      • Split into blocks
      • Still controlled by ‘dfs.block.size’
      • also ‘cfs.local.subblock.size’
      • Read in a block
      • split into sub blocks
      • Update inode, sblocks
      • rinse, repeat
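The write steps above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the CFS implementation: `BLOCK_SIZE` and `SUBBLOCK_SIZE` are tiny stand-ins for `dfs.block.size` and `cfs.local.subblock.size`, dicts stand in for the `inode` and `sblocks` column families, a list replaces the TimeUUID-named columns, and zlib is an assumed codec (the slides only say the data is compressed).

```python
import uuid
import zlib

BLOCK_SIZE = 8     # stand-in for dfs.block.size (bytes)
SUBBLOCK_SIZE = 3  # stand-in for cfs.local.subblock.size (bytes)

inode = {}    # file row key -> file metadata plus ordered block ids
sblocks = {}  # block id -> ordered list of compressed sub-blocks


def chunks(data, size):
    """Split bytes into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def write_file(path, data):
    file_key = uuid.uuid4()
    inode[file_key] = {"filename": path, "blocks": []}  # write file metadata first
    for block in chunks(data, BLOCK_SIZE):              # split into blocks
        block_id = uuid.uuid4()
        # Split the block into sub-blocks and store each one compressed.
        sblocks[block_id] = [zlib.compress(sub) for sub in chunks(block, SUBBLOCK_SIZE)]
        inode[file_key]["blocks"].append(block_id)      # update inode, then repeat
    return file_key


payload = b"0123456789abcdefghij"
file_key = write_file("/home/nick/data.txt", payload)
```

With these sizes the 20-byte payload becomes three blocks (8, 8 and 4 bytes), which in turn hold 3, 3 and 2 sub-blocks.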
  • 28. Reads
      • Check for file in inode
      • Determine appropriate blocks
      • Request blocks via thrift
      • If data is local...
      • ...get location on local filesystem
      • If data is remote...
      • ...get actual file content via thrift
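The read steps above, as a hedged sketch: `request_block` is a hypothetical stand-in for the Thrift call and returns either a local file location or the block content itself. For brevity the `inode` dict is keyed by path rather than UUID, and compression and sub-blocks are left out.

```python
import os
import tempfile

# Simplified inode: path -> ordered block ids (the slides use a UUID row key).
inode = {"/home/nick/data.txt": ["block-1", "block-2"]}

# One block is available on the local filesystem, the other only remotely.
tmp_dir = tempfile.mkdtemp()
local_path = os.path.join(tmp_dir, "block-1")
with open(local_path, "wb") as handle:
    handle.write(b"hello ")
local_blocks = {"block-1": local_path}
remote_blocks = {"block-2": b"world"}


def request_block(block_id):
    """Stand-in for the Thrift request: a local path if possible, else raw bytes."""
    if block_id in local_blocks:
        return ("local", local_blocks[block_id])
    return ("remote", remote_blocks[block_id])


def read_file(path):
    if path not in inode:                 # check for file in inode
        raise FileNotFoundError(path)
    parts = []
    for block_id in inode[path]:          # determine appropriate blocks
        kind, payload = request_block(block_id)
        if kind == "local":
            # Local data: only a location came back, so read the file directly.
            with open(payload, "rb") as handle:
                parts.append(handle.read())
        else:
            # Remote data: the content itself came back over the wire.
            parts.append(payload)
    return b"".join(parts)


content = read_file("/home/nick/data.txt")
```

Returning a location for local data avoids copying block contents through the RPC layer, which is where the data-locality benefit comes from.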
  • 29. What Else?
      • Current Implementation: 1.0.4
      • <property>
          <name>fs.cfs.impl</name>
          <value>com.datastax.bdp.hadoop.cfs.CassandraFileSystem</value>
        </property>
      • Supports HDFS append()
      • Immutability makes things easy
      • See the first incarnation: https://github.com/riptano/brisk
  • 30. Want a job? nick@datastax.com
  • 31. Questions?
