Casbase presentation

A breakdown of the high level design of CasBase and vivid descriptions of the reverse indexes.

Presentation Transcript

  • CasBase Edward Capriolo
  • What is it?
    • Do it yourself secondary indexes
    • Elevator pitch... tabular get Cassandr'ified
    • Pet project (not production ready...yet)
    • Semi quixotic quest to make c* work like RDBMS
  • MySQL vs Cassandra
    • MySQL
      • Row Oriented
      • Fixed columns
      • Normalized
      • Strict schema
    • Cassandra
      • Column Family
      • Ragged Columns
      • De-normalized *
      • Schema less *
  • Q. Because Cassandra is a NoSQL store, what is the first step in using it?
  • A. Strap relational database features and frameworks on top until it works like a relational database! *
    • * Just kidding / No, seriously
  • Obligatory Cassandra slides
  • Obligatory Data Model Slide
  • Obligatory physical data model
  • Obligatory Distribution Model
  • Free with Cassandra data model
    • Cassandra has three levels of “index”
    • Row Key locates server(s) with data
    • SSTable Sorted by row key
    • Inside row columns are sorted by name
      • Different sorts are available
    • Writes do not have to read
  • CasBase motivation
  • Pseudocode for how CasBase would like to work
    • Define a table and indexes
    • new Table("mystuff").addColumn("a",string).addIndex("aidx",["a"]).create();
    • Insert data
    • client.insert("mystuff", "ed", { a=5,b=6 } );
    • Ask questions
    • List<Col> a=client.find("mystuff", "a", "5");
  • Things missing
    • Primary key enforcement
    • Unique index enforcement
    • Indexes of column names i.e. rows with (ldap presence) *
    • Index on value i.e. username ='bob' age>4 age<8 *
    • * Well, not exactly: Cassandra 0.7 added secondary indexes, but there are still reasons to build your own
  • Choosing features that matter to you
    • Primary key / row key enforcement
    • Inserts / overcerts
    • Specific column must exist for row on insert
    • Unique indexes
    • On delete or updates repair index now or defer until read
  • “Auto-magically delicious” index building in CasBase
  • Background
    • Composite columns (link)
    • Indexes in Cassandra (ed enuff)
  • Composites why do you need them?
    • Looks like packing bytes is ok
    • Escaping?
    • Empties?
  • Case for composites
    • Not always byte order
    • Schema validators
    • Reasonable slicing
    • cli support
  • Unique index
    • There goes write without read!
    • But if you want it, you want it.
    • Not atomic in CasBase, could be with zookeeper/cages (maybe next month)
  • Non unique index
    • Do not need read before write
    • Cardinality could be a challenge in some cases (not much different from relational)
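The difference between the two index kinds can be sketched in a few lines. This is a minimal in-memory illustration, not the CasBase API: the class `UniqueIndexSketch` and the method `insertUnique` are hypothetical names, and a `HashMap` stands in for the index column family. It shows why a unique index forces a read before the write, and why that check is not atomic without an external lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a unique reverse index; not the CasBase API.
class UniqueIndexSketch {
    // indexed value -> row key that owns it
    private final Map<String, String> index = new HashMap<>();

    // Returns false when the value already belongs to a different row.
    // The read (get) and the write (put) are two separate steps, so two
    // concurrent writers could both pass the check: this is not atomic.
    boolean insertUnique(String value, String rowKey) {
        String owner = index.get(value); // the read before write
        if (owner != null && !owner.equals(rowKey)) {
            return false;
        }
        index.put(value, rowKey);
        return true;
    }
}
```

A non-unique index would skip the `get` entirely and just write, which is what keeps it a blind write.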
  • Index implementation: Hashed
    • One insert becomes two
    • set user['bsmith']['dog']='rover'
    • set userdogs['rover']['bsmith']=''
  • Hashed
  • Hashed
  • Hashed characteristics
    • Does equality searches dog='rover'
      • Done with c* slice
    • Does exist / not exists
      • Done with c* get_count
    • But cannot do ranges
    • dog >= 'rover' AND dog <= 'sinbad'
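The hashed index above can be modeled with nested maps. This is a rough in-memory sketch rather than Cassandra client code: `HashedIndexSketch`, `findByDog`, and `exists` are hypothetical names, and plain Java collections stand in for the `user` and `userdogs` column families. It does not repair the index when a dog is changed or deleted.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of the hashed reverse index; not the CasBase API.
class HashedIndexSketch {
    // user[rowKey][column] = value
    final Map<String, Map<String, String>> user = new TreeMap<>();
    // userdogs[value][rowKey] = '' (only the column names matter, so a set is enough)
    final Map<String, Set<String>> userdogs = new TreeMap<>();

    // One logical insert becomes two writes, and neither needs a read first.
    void insert(String rowKey, String dog) {
        user.computeIfAbsent(rowKey, k -> new TreeMap<>()).put("dog", dog);
        userdogs.computeIfAbsent(dog, k -> new TreeSet<>()).add(rowKey);
    }

    // Equality search: read the whole index row for that value (the "slice").
    Set<String> findByDog(String dog) {
        return userdogs.getOrDefault(dog, Collections.emptySet());
    }

    // Exists / not exists: only needs to know whether the index row has columns.
    boolean exists(String dog) {
        return !findByDog(dog).isEmpty();
    }
}
```

The index row key is the value itself, so a real cluster would place 'rover' and 'sinbad' on unrelated nodes. That is why this layout answers equality but not ranges.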
  • So how can we build indexes for range queries?
    • Use single key (columns are ordered)
      • That makes a contention point
      • Row not sharded (c* replication unit is row)
      • Won't scale
    • Do not mention super columns (same fundamental problem)
    • Do not even mention order preserving partitioner (Mdennis will find you)
  • If you only remember one thing from this talk
  • Index Implementation: Ordered Buckets
    • One insert becomes two
    • Create a fixed number of shards/buckets
    • set user['bsmith']['dog']='rover'
    • set userdogs['hash(rover) % buckets'][composite(rover,bsmith)]=''
    • hash(value) mod buckets finds the shard key
    • Column composite(value,src_row)
  • Properties of ordered buckets
    • No read before write on insert
    • 1 look up for equality/exist search
    • Each bucket is ordered
    • Getting all results requires optimized get_slice on all buckets
      • Bucket1: name > roger and name < sinbad
      • Bucket2: name > roger and name < sinbad ...
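The ordered-bucket scheme can be sketched in memory as follows. The names are hypothetical and this is not the CasBase implementation. Each bucket is a `TreeMap` from value to a sorted set of source rows, which models columns named composite(value, src_row). `String.hashCode()` stands in for whatever hash CasBase uses, and the range bounds here are inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of an ordered-bucket index; not the CasBase API.
class OrderedBucketIndexSketch {
    // One sorted map per bucket: value -> source rows, ordered like composite(value, srcRow).
    private final List<TreeMap<String, TreeSet<String>>> shards = new ArrayList<>();

    OrderedBucketIndexSketch(int buckets) {
        for (int i = 0; i < buckets; i++) {
            shards.add(new TreeMap<>());
        }
    }

    // hash(value) mod buckets picks the shard; floorMod keeps the result non-negative.
    private TreeMap<String, TreeSet<String>> shardFor(String value) {
        return shards.get(Math.floorMod(value.hashCode(), shards.size()));
    }

    // Blind write: no read is needed before inserting an index entry.
    void insert(String value, String srcRow) {
        shardFor(value).computeIfAbsent(value, k -> new TreeSet<>()).add(srcRow);
    }

    // Equality search touches exactly one bucket.
    Set<String> findEqual(String value) {
        TreeSet<String> rows = shardFor(value).get(value);
        if (rows == null) {
            return new TreeSet<>();
        }
        return rows;
    }

    // Range search slices every bucket, then merges the slices back into value order.
    // Requires lo <= hi; TreeMap.subMap throws IllegalArgumentException otherwise.
    List<String> findRange(String lo, String hi) {
        TreeMap<String, TreeSet<String>> merged = new TreeMap<>();
        for (TreeMap<String, TreeSet<String>> shard : shards) {
            merged.putAll(shard.subMap(lo, true, hi, true));
        }
        List<String> rows = new ArrayList<>();
        for (TreeSet<String> r : merged.values()) {
            rows.addAll(r);
        }
        return rows;
    }
}
```

Because a given value always hashes to the same bucket, merging the per-bucket slices never collides. The cost of a range query grows with the number of buckets, not with the number of distinct values.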
  • AnyType, because CasBase needs null
  • Dealing with nulls and ''
    • Null is a pretty big part of life in RDBMS
    • C* does not allow null or '' rowkey, column name or value
    • Types like LongType don't have a null
    • Argue that a non existing column is null
    • But you cannot build a reverse index where 'value is null'
  • Solution: Create abstract type AnyType
    • any column could be null: int, string, etc
    • Push meta data down to column
    • byte[0] specifies type 1=int, 2=string, 3=varint, 4=binary, 5=gson serialized obj
    • byte[1]-byte[n] is actual data.
    • Sorting - Types sort first, 2nd sort is compareTo
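The type-tag layout above can be sketched as follows. This is an assumption-laden illustration, not the real AnyType: `AnyEncodingSketch` and its methods are hypothetical, only the int and string tags (1 and 2, taken from the slide) are handled, and the slide does not say how null itself is encoded, so that is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a type-tagged encoding; not the real AnyType class.
class AnyEncodingSketch {
    static final byte INT = 1;    // tag values taken from the slide
    static final byte STRING = 2;

    // byte[0] is the type tag, byte[1..n] is the data.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(5).put(INT).putInt(v).array();
    }

    static byte[] encodeString(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + data.length).put(STRING).put(data).array();
    }

    static Object decode(byte[] b) {
        switch (b[0]) {
            case INT:
                return ByteBuffer.wrap(b, 1, b.length - 1).getInt();
            case STRING:
                return new String(b, 1, b.length - 1, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown type tag " + b[0]);
        }
    }

    // Types sort first; within one type, fall back to the natural compareTo order.
    static int compare(byte[] a, byte[] b) {
        if (a[0] != b[0]) {
            return Byte.compare(a[0], b[0]);
        }
        if (a[0] == INT) {
            return Integer.compare((Integer) decode(a), (Integer) decode(b));
        }
        return ((String) decode(a)).compareTo((String) decode(b));
    }
}
```

One side effect of the tag byte in this sketch is that an empty string still encodes to one byte, so the stored value is never empty.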
  • Casbase and AnyType
    • CasBase shields you from UGLY, UGLY ByteBuffers by forcing Any
    • Any a = new Any(String.class, "what up");
    • AnyType.instance.composeAny(a) -> Ugly ByteBuffer for Cassandra
    • AnyType.instance.decomposeAny(BB) -> ByteBuffer back to Any
  • CasBase currently
    • On github, compiles, all tests pass :)
    • Pet project but great concrete implementation of index building, composite columns
    • Now'ish: Efficient map reduce
    • Future: locking w zookeeper/cages
    • Far Future: Query engine (right now API only)
  • Hack at it!
    • Http://
  • Questions?