Casbase presentation

A breakdown of the high level design of CasBase and vivid descriptions of the reverse indexes.

Published in: Technology, Sports

  1. 1. CasBase Edward Capriolo
  2. 2. What is it? Do-it-yourself secondary indexes
  3. 3. Elevator pitch: tabular gets Cassandra-ified
  4. 4. Pet project (not production ready... yet)
  5. 5. Semi-quixotic quest to make C* work like an RDBMS
  6. 6. MySQL vs Cassandra. MySQL: row oriented
  7. 7. Fixed columns
  8. 8. Normalized
  9. 9. Strict schema. Cassandra: column family
  10. 10. Ragged columns
  11. 11. De-normalized *
  12. 12. Schema-less *
  13. 13. Q. Because Cassandra is a NoSQL store, what is the first step in using it?
  14. 14. A. Strap relational database features and frameworks on top until it works like a relational database!* (* Just kidding / no, seriously)
  15. 15. Obligatory Cassandra slides
  16. 16. Obligatory Data Model Slide
  17. 17. Obligatory physical data model
  18. 18. Obligatory Distribution Model
  19. 19. Free with the Cassandra data model: Cassandra has three levels of “index”
  20. 20. Row key locates server(s) with data
  21. 21. SSTable sorted by row key
  22. 22. Inside a row, columns are sorted by name (different sorts are available). Writes do not have to read
  23. 23. CasBase motivation
  24. 24. Pseudo code on how CasBase would like to work: Define a table and indexes
  25. 25. new Table("mystuff").addColumn("a",string).addIndex("aidx",["a"]).create();
  26. 26. Insert data
  27. 27. client.insert("mystuff", "ed", { a=5,b=6 } );
  28. 28. Ask questions
  29. 29. List<Col> a=client.find("mystuff", "a", "5");
  30. 30. Things missing: Primary key enforcement
  31. 31. Unique index enforcement
  32. 32. Indexes of column names, i.e. rows with column.name=username (LDAP presence) *
  33. 33. Index on value, i.e. username='bob', age>4, age<8 *
  34. 34. * Well, not exactly: 0.7 added secondary indexes, but there are still reasons to make your own
  35. 35. Choosing features that matter to you: Primary key / row key enforcement
  36. 36. Inserts / overcerts
  37. 37. Specific column must exist for row on insert
  38. 38. Unique indexes
  39. 39. On deletes or updates, repair the index now or defer until read
  40. 40. “Auto-magically delicious” index building in CasBase
  41. 41. Background: Composite columns (link)
  42. 42. Indexes in Cassandra (ed enuff)
  43. 43. Composites: why do you need them? Looks like packing bytes is OK
  44. 44. Escaping?
  45. 45. Empties?
  46. 46. Case for composites: Not always byte order
  47. 47. Schema validators
  48. 48. Reasonable slicing
  49. 49. CLI support
  50. 50. Unique index: There goes write without read!
  51. 51. But if you want it, you want it.
  52. 52. Not atomic in CasBase; could be with ZooKeeper/Cages (maybe next month)
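
The cost of a unique index can be made concrete with a small in-memory sketch. It is not CasBase code: the class `UniqueIndexSketch`, its `insertUnique` method, and the `reads` counter are hypothetical names, and a `HashMap` stands in for the index column family. It shows only that a uniqueness check forces a read before every write, and that the check and the write are two separate steps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a unique reverse index (not CasBase code). */
class UniqueIndexSketch {
    // indexed value -> row key that owns it
    private final Map<String, String> index = new HashMap<>();

    /** Counts index lookups so the read-before-write cost is visible. */
    int reads = 0;

    /**
     * Claims the value for rowKey only if no other row already holds it.
     * The lookup and the put are separate steps, so two concurrent writers
     * could both pass the check; an external lock would be needed to close
     * that gap.
     */
    boolean insertUnique(String value, String rowKey) {
        reads++;
        String owner = index.get(value);          // the read a plain insert never pays for
        if (owner != null && !owner.equals(rowKey)) {
            return false;                         // value already taken by another row
        }
        index.put(value, rowKey);                 // the write
        return true;
    }
}
```

A second insert of the same value under a different row key is rejected, and every call costs one read, which is exactly what "there goes write without read" refers to.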
  53. 53. Non-unique index: Do not need read before write
  54. 54. Cardinality could be a challenge in some cases (not much different from relational)
  55. 55. Index implementation (hashed): One insert becomes two
  56. 56. set user['bsmith']['dog']='rover'
  57. 57. set userdogs['rover']['bsmith']=''
  58. 58. Hashed
  59. 59. Hashed
  60. 60. Hashed characteristics: Does equality searches dog='rover' (done with C* slice). Does exists / not exists (done with C* get_count). But cannot do ranges
  61. 61. dog >= 'rover' AND dog <= 'sinbad'
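
The hashed index described above can be sketched in memory as follows. This is an illustration under assumptions rather than CasBase's implementation: `HashedIndexSketch` and its method names are hypothetical, a `HashMap` of rows stands in for each column family, and a `TreeMap` mimics the sorted columns inside one row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of a data column family plus a hashed reverse index. */
class HashedIndexSketch {
    // row key -> (column name -> value); columns inside a row are sorted
    final Map<String, SortedMap<String, String>> user = new HashMap<>();
    // indexed value -> (source row key -> empty value)
    final Map<String, SortedMap<String, String>> userdogs = new HashMap<>();

    /** One logical insert becomes two writes; nothing is read first. */
    void setDog(String userKey, String dog) {
        user.computeIfAbsent(userKey, k -> new TreeMap<>()).put("dog", dog);
        userdogs.computeIfAbsent(dog, k -> new TreeMap<>()).put(userKey, "");
    }

    /** Equality search (dog = value): read the columns of a single index row. */
    List<String> usersWithDog(String dog) {
        SortedMap<String, String> row = userdogs.get(dog);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    /** Exists / not exists: count the columns of the index row. */
    int countUsersWithDog(String dog) {
        SortedMap<String, String> row = userdogs.get(dog);
        return row == null ? 0 : row.size();
    }
}
```

Because the index rows are keyed by the value itself and the rows are not ordered relative to each other, there is no way to walk from 'rover' to 'sinbad'; only exact-key lookups work.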
  62. 62. So how can we build indexes for range queries? Use a single key (columns are ordered): that makes a contention point
  63. 63. Row not sharded (C* replication unit is row)
  64. 64. Won't scale. Do not mention super columns (same fundamental problem)
  65. 65. Do not even mention order preserving partitioner (Mdennis will find you)
  66. 66. If you only remember one thing from this talk
  67. 67. Index implementation (ordered buckets): One insert becomes two
  68. 68. Create a fixed number of shards/buckets
  69. 69. set user['bsmith']['dog']='rover'
  70. 70. set userdogs[hash('rover') % buckets][composite('rover','bsmith')]=''
  71. 71. hash(value) mod buckets finds the shard key
  72. 72. Column name is composite(value, src_row)
  73. 74. Properties of ordered buckets: No read before write on insert
  74. 75. 1 lookup for equality/exists search
  75. 76. Each bucket is ordered
  76. 77. Getting all results requires optimized get_slice on all buckets. Bucket1: name > roger and name < sinbad
  77. 78. Bucket2: name > roger and name < sinbad ...
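
A minimal in-memory sketch of the ordered-bucket idea follows. It is not CasBase code: `OrderedBucketIndexSketch`, `Entry`, and the method names are hypothetical, each `TreeSet` stands in for one index row with sorted composite column names, and `String.hashCode()` is an assumed stand-in for whatever hash function picks the bucket. Range bounds are inclusive here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of a reverse index sharded into a fixed number of ordered buckets. */
class OrderedBucketIndexSketch {
    /** Stand-in for a composite column name: (indexed value, source row key). */
    static final class Entry implements Comparable<Entry> {
        final String value;
        final String srcRow;
        Entry(String value, String srcRow) { this.value = value; this.srcRow = srcRow; }
        @Override public int compareTo(Entry o) {
            int c = value.compareTo(o.value);
            return c != 0 ? c : srcRow.compareTo(o.srcRow);
        }
    }

    private final List<TreeSet<Entry>> buckets = new ArrayList<>();

    OrderedBucketIndexSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new TreeSet<>());
    }

    /** hash(value) % buckets picks the index row; floorMod keeps it non-negative. */
    int bucketFor(String value) {
        return Math.floorMod(value.hashCode(), buckets.size());
    }

    /** Index write: no read is needed first. */
    void add(String value, String srcRow) {
        buckets.get(bucketFor(value)).add(new Entry(value, srcRow));
    }

    /** Slice of one bucket covering from <= value <= to. */
    private static List<Entry> slice(TreeSet<Entry> bucket, String from, String to) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : bucket.tailSet(new Entry(from, ""))) {
            if (e.value.compareTo(to) > 0) break;
            out.add(e);
        }
        return out;
    }

    private static List<String> rowsOf(List<Entry> entries) {
        List<String> rows = new ArrayList<>();
        for (Entry e : entries) rows.add(e.srcRow);
        return rows;
    }

    /** Equality search: one bucket, one slice. */
    List<String> findEqual(String value) {
        return rowsOf(slice(buckets.get(bucketFor(value)), value, value));
    }

    /** Range search: the same slice against every bucket, merged on the client. */
    List<String> findRange(String from, String to) {
        List<Entry> hits = new ArrayList<>();
        for (TreeSet<Entry> bucket : buckets) hits.addAll(slice(bucket, from, to));
        Collections.sort(hits);
        return rowsOf(hits);
    }
}
```

The trade-off is visible in the two query methods: equality touches a single bucket, while a range query fans out to all of them, so the bucket count bounds both the write spread and the read cost.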
  78. 79. AnyType, because CasBase needs null
  79. 80. Dealing with nulls and '': Null is a pretty big part of life in RDBMS
  80. 81. C* does not allow null or '' as row key, column name or value
  81. 82. Types like LongType don't have a null
  82. 83. Argue that a non-existent column is null
  83. 84. But you cannot build a reverse index where 'value is null'
  84. 85. Solution: create abstract type AnyType. Any column could be null: int, string, etc.
  85. 86. Push metadata down to the column
  86. 87. byte[0] specifies type: 1=int, 2=string, 3=varint, 4=binary, 5=gson serialized obj
  87. 88. byte[1]-byte[n] is actual data.
  88. 89. Sorting: types sort first, 2nd sort is compareTo
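
The type-tag layout above can be illustrated with a short sketch. This is not the Cassandra-AnyType source: `TaggedValueSketch` and its members are hypothetical, only the int and string tags from the slide are covered, and the tag `0` for null is an assumption, since the slides do not say how null is encoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Hypothetical sketch of a "type byte + payload" encoding (not the real AnyType). */
class TaggedValueSketch {
    static final byte NULL_TAG = 0;    // ASSUMPTION: the slides do not define a null tag
    static final byte INT_TAG = 1;     // from the slides: 1 = int
    static final byte STRING_TAG = 2;  // from the slides: 2 = string

    /** byte[0] is the type tag, byte[1]..byte[n] is the data. */
    static byte[] encode(Object v) {
        if (v == null) return new byte[] { NULL_TAG };
        if (v instanceof Integer) {
            return ByteBuffer.allocate(5).put(INT_TAG).putInt((Integer) v).array();
        }
        if (v instanceof String) {
            byte[] s = ((String) v).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + s.length).put(STRING_TAG).put(s).array();
        }
        throw new IllegalArgumentException("unsupported type: " + v.getClass());
    }

    static Object decode(byte[] b) {
        switch (b[0]) {
            case NULL_TAG:   return null;
            case INT_TAG:    return ByteBuffer.wrap(b, 1, 4).getInt();
            case STRING_TAG: return new String(b, 1, b.length - 1, StandardCharsets.UTF_8);
            default: throw new IllegalArgumentException("unknown tag: " + b[0]);
        }
    }

    /** Types sort first; within one type the second sort is compareTo. */
    static final Comparator<byte[]> ORDER = (x, y) -> {
        if (x[0] != y[0]) return Byte.compare(x[0], y[0]);
        Object a = decode(x), b = decode(y);
        if (a == null) return 0;  // two nulls are equal
        @SuppressWarnings("unchecked")
        Comparable<Object> c = (Comparable<Object>) a;
        return c.compareTo(b);
    };
}
```

With a null tag that sorts ahead of every real type, a null becomes an ordinary value that can be stored and indexed, which is what a reverse index on 'value is null' requires.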
  89. 90. CasBase and AnyType: CasBase hides you from UGLY, UGLY ByteBuffers by forcing Any
  90. 91. Any a = new Any(String.class, "what up");
  91. 92. AnyType.instance.composeAny(a) -> Ugly ByteBuffer for Cassandra
  92. 93. AnyType.instance.decomposeAny(BB) -> ByteBuffer back to Any
  93. 94. CasBase currently: On GitHub, compiles, all tests pass :)
  94. 95. Pet project but great concrete implementation of index building, composite columns
  95. 96. Now'ish: Efficient map reduce
  96. 97. Future: locking w/ ZooKeeper/Cages
  97. 98. Far future: Query engine (right now API only)
  98. 99. Hack at it! http://github.com/edwardcapriolo/casbase
  99. 100. http://github.com/edwardcapriolo/Cassandra-AnyType
  100. 101. Questions?