CFS: Cassandra Backed Storage for Hadoop
 

Nick Bailey

@Nickmbailey
nick@datastax.com

    CFS: Cassandra Backed Storage for Hadoop Presentation Transcript

    • CFS: Cassandra-backed storage for Hadoop
      Nick Bailey, @nickmbailey, nick@datastax.com
      ©2012 DataStax
    • Motivation
    • Help me Cassandra, you’re my only hope
    • Cassandra
      • Distributed architecture
      • No SPOF
      • Scalable
      • Real time data
      • No ad-hoc query support
    • Cassandra, why can’t you...
    • ...do the things Hadoop was built for.
    • Cassandra + Hadoop = <3
    • The Solution
      • InputFormat/OutputFormat
      • Unfortunately, still need a DFS
      • Run tasktrackers/datanodes locally
      • Data Locality FTW!
      • Run namenode/jobtracker somewhere
      • Since Cassandra 0.6 (the dark ages)
    • Ok, but what about these parts that suck...
    • Do not want...
      • Multiple Hadoop stacks?
      • SPOF?
      • 3 JVMs?
    • CFS
    • Cassandra data model in 1 minute
    • Column Families
      • Column Family ~= Table
      • Row Key + columns
      • Columns are sparse
    • Static - Users Column Family
      • Row key nickmbailey: password: *, name: Nick
      • Row key zznate: password: *, name: Nate, phone: 512-7777
    • Secondary Indexes
      • Select * from Users where name=Nick;
    • Dynamic - Friends
      • Row key nickmbailey, columns: zznate, thobbs
      • Row key zznate, columns: jbeiber, thobbs, steve_watt
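
The static and dynamic column families on these slides can be approximated with plain dictionaries. This is only an illustrative sketch, not Cassandra's API: the `users` and `friends` dicts mirror the slide data, and `build_index` is a hypothetical helper that stands in for a secondary index.

```python
from collections import defaultdict

# Static column family: each row key maps to a sparse set of named columns.
users = {
    "nickmbailey": {"password": "*", "name": "Nick"},
    "zznate": {"password": "*", "name": "Nate", "phone": "512-7777"},
}

# Dynamic column family: the column names themselves carry the data
# (friend handles); the values are left empty.
friends = {
    "nickmbailey": {"zznate": "", "thobbs": ""},
    "zznate": {"jbeiber": "", "thobbs": "", "steve_watt": ""},
}


def build_index(column_family, column):
    """Map each value of `column` to the row keys holding it (hypothetical helper)."""
    index = defaultdict(list)
    for row_key, columns in column_family.items():
        if column in columns:  # rows are sparse, so the column may be absent
            index[columns[column]].append(row_key)
    return index


# Rough equivalent of: Select * from Users where name=Nick;
name_index = build_index(users, "name")
matches = [users[key] for key in name_index["Nick"]]
print(matches)
```

The point of the index is that the lookup by `name` does not have to scan every row at query time.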
    • So what about CFS...
    • Simple...
    • CF: inode
      • Essentially, namenode replacement
      • File metadata
    • CF: inode
      • Row Key = UUID
      • Allows for file renames
      • Secondary indexes for file browsing
      • Columns:
        • filename: /home/nick/data.txt
        • parent_path: /home/nick/
        • attributes: nick:nick:777
        • TimeUUID1: <block metadata>
        • TimeUUID2: <block metadata>
        • TimeUUID3: <block metadata>
        • ...
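
A rough way to see why a UUID row key makes renames cheap is to model the inode column family in memory. This is a sketch under assumptions, not the CFS implementation: `inode`, `parent_index`, `create_file`, `rename`, and `list_dir` are hypothetical names, and a plain dict keyed by `parent_path` stands in for the secondary index.

```python
import uuid

inode = {}         # row key (UUID) -> {column name: value}
parent_index = {}  # parent_path -> set of row keys (stand-in for a secondary index)


def split_parent(path):
    """Return the directory part of a path, with a trailing slash."""
    return path.rsplit("/", 1)[0] + "/"


def create_file(path, attributes):
    key = uuid.uuid4()  # the row key is independent of the file name
    parent = split_parent(path)
    inode[key] = {"filename": path, "parent_path": parent, "attributes": attributes}
    parent_index.setdefault(parent, set()).add(key)
    return key


def rename(key, new_path):
    """Rename by rewriting two columns; the row key and block columns are untouched."""
    row = inode[key]
    parent_index[row["parent_path"]].discard(key)
    row["filename"] = new_path
    row["parent_path"] = split_parent(new_path)
    parent_index.setdefault(row["parent_path"], set()).add(key)


def list_dir(parent_path):
    """Browse a directory through the index instead of scanning every row."""
    return sorted(inode[k]["filename"] for k in parent_index.get(parent_path, ()))


key = create_file("/home/nick/data.txt", "nick:nick:777")
inode[key][uuid.uuid1()] = "<block metadata>"  # one TimeUUID-named column per block
rename(key, "/home/nick/archive/data.txt")
print(list_dir("/home/nick/archive/"))
```

Because the block metadata hangs off the UUID row rather than the path, the rename never has to move or rewrite it.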
    • CF: sblocks
      • Essentially, datanode replacement
      • Stores actual contents of files
      • Each row is an HDFS block
      • Row Key = Block ID
      • Columns:
        • TimeUUID1: <compressed file data>
        • TimeUUID2: <compressed file data>
        • TimeUUID3: <compressed file data>
        • ...
    • Writes
      • Write file metadata
      • Split into blocks
      • Still controlled by ‘dfs.block.size’
      • Also ‘cfs.local.subblock.size’
      • Read in a block
      • Split into sub blocks
      • Update inode, sblocks
      • Rinse, repeat
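
The write path on this slide can be sketched as a two-level split: the file into blocks, then each block into sub-blocks. This is a minimal in-memory sketch, not the CFS code. The function names, the shape of the block metadata, and the use of `zlib` for compression are all assumptions; `block_size` and `subblock_size` play the roles of `dfs.block.size` and `cfs.local.subblock.size`.

```python
import uuid
import zlib


def chunks(data, size):
    """Yield consecutive slices of `data`, each at most `size` bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def write_file(data, block_size, subblock_size, inode_row, sblocks):
    """Store `data` as blocks of compressed sub-blocks and record each block."""
    for block in chunks(data, block_size):
        block_id = uuid.uuid4()
        # One sblocks row per block; one TimeUUID-named column per sub-block.
        sblocks[block_id] = {
            uuid.uuid1(): zlib.compress(sub) for sub in chunks(block, subblock_size)
        }
        # Record the block in the file's inode row (metadata shape is made up here).
        inode_row[uuid.uuid1()] = {"block_id": block_id, "length": len(block)}


data = bytes(range(10)) * 10  # 100 bytes of sample content
inode_row = {"filename": "/home/nick/data.txt"}
sblocks = {}
write_file(data, block_size=40, subblock_size=16, inode_row=inode_row, sblocks=sblocks)

print(len(sblocks))                               # number of blocks written
print([len(cols) for cols in sblocks.values()])   # sub-blocks per block
```

With these sample sizes the 100 bytes become three blocks (40, 40, and 20 bytes), holding three, three, and two sub-blocks respectively.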
    • Reads
      • Check for file in inode
      • Determine appropriate blocks
      • Request blocks via thrift
      • If data is local...
      • ...get location on local filesystem
      • If data is remote...
      • ...get actual file content via thrift
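
The local-versus-remote branch in the read path can be illustrated as follows. This is a sketch under stated assumptions rather than the actual CFS client: `read_block`, `read_file`, the `(path, offset, length)` location tuple, and the `fetch_remote` callable (standing in for the thrift call) are all hypothetical.

```python
import os
import tempfile


def read_block(block_id, local_locations, fetch_remote):
    """Return the bytes of one block, preferring a direct local read."""
    location = local_locations.get(block_id)
    if location is not None:
        path, offset, length = location
        with open(path, "rb") as f:  # data is local: read straight from disk
            f.seek(offset)
            return f.read(length)
    return fetch_remote(block_id)    # data is remote: content comes over the wire


def read_file(block_ids, local_locations, fetch_remote):
    """Concatenate the blocks of a file in order."""
    return b"".join(read_block(b, local_locations, fetch_remote) for b in block_ids)


# Sample data: block "b1" lives in a local file, block "b2" only on a remote node.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"xxhello")
local_locations = {"b1": (tmp.name, 2, 5)}
remote_store = {"b2": b" world"}

content = read_file(["b1", "b2"], local_locations, remote_store.__getitem__)
os.remove(tmp.name)
print(content)
```

Returning a location for local data lets the reader skip copying block contents through the RPC layer, which is the benefit the slide points at.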
    • What Else?
      • Current Implementation: 1.0.4
      • <property><name>fs.cfs.impl</name><value>com.datastax.bdp.hadoop.cfs.CassandraFileSystem</value></property>
      • Supports HDFS append()
      • Immutability makes things easy
      • See the first incarnation
      • https://github.com/riptano/brisk
    • Want a job? nick@datastax.com
    • Questions?