Cassandra Basics
              Indexing

     Benjamin Black, b@b3k.us
Relational stores are
SCHEMA ORIENTED
Start from your SCHEMA &
WORK FORWARDS
Column stores are
QUERY ORIENTED
Start from your QUERIES &
WORK BACKWARDS
AT SCALE
AT SCALE
           Denormalization is
              THE NORM
AT SCALE
AT SCALE
           Everything depends on
               THE INDICES
Cassandra is an
INDEX CONSTRUCTION KIT
Column Family
Two-level Map

key: {
  column: value,
  column: value,
  ...
 }
Super Column Family
Three-level Map
key: {
   supercolumn: {
       column:value,
      column: value
   },
   supercolumn: {
     ...
   }
 }
column sorting defined by
         CompareWith/
CompareSubcolumnsWith
TimeUUIDType
  UTF8Type
                ASCIIType
LongType

     LexicalUUIDType
row placement determined by
             Partitioner
RandomPartitioner
Place based on MD5 of key




        OrderPreservingPartitioner
               Place based on actual key
Rows are sorted by key on each node
Regardless of partitioner
One example in
TWO ACTS
Prelude
A USER DATABASE
<ColumnFamily Name=”Users”
       CompareWith=”UTF8Type” />
“b”:    {“name”:”Ben”, “street”:”1234 Oak St.”,
        “city”:”Seattle”, “state”:”WA”}
“jason”: {”name”:”Jason”, “street”...
SELECT name FROM Users
WHERE state=”WA”
SELECT name FROM Users
               WHERE state=”WA”

How is WHERE clause
formed?
Act One
Supercolumn Indexing
<ColumnFamily Name=”LocationUserIndexSCF”
       CompareWith=”UTF8Type”
       CompareSubcolumnsWith=”UTF8Type”
       Col...
[state]: {
  [city1]: {[name1]:[user1], [name2]:[user2], ... },
  [city2]: {[name3]:[user3], [name4]:[user4], ... },
  ......
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”...
Row Key


“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Belli...
Row Key
                 Super Column

“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”:...
Row Key
                                     Colum
                 Super Column
                                     n
“C...
Row Key
                                     Colum
                 Super Column                Value
                    ...
Show me
EVERYONE IN WASHINGTON
get(:LocationUserIndexSCF, ‘WA’)
{

   “Bellingham”: {”Jason”: “jason”},

   “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Act Two
Composite Key Indexing
Order Preserving Partitioner
                          +
        Range Queries
<ColumnFamily Name=”LocationUserIndexCF”
       CompareWith=”UTF8Type” />
[state1]/[city1]:   {[name1]:[user1], [name2]:[user2], ... }
[state1]/[city2]:   {[name3]:[user3], [name4]:[user4], ... }
...
“CA/San Francisco”: {”Jennifer”: “jen1982”}
“MA/Boston”: {”Albert”: “albert”}
“WA/Bellingham”: {”Jason”: “jason”}
“WA/Seat...
Show me
EVERYONE IN WASHINGTON
get_range(:LocationUserIndexCF, {:start: 'WA',
                          :finish:'WB'})
{
    ”WA/Bellingham”: {”Jason”: “jason”},
    “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
}
Finale
BUILD SOMETHING AWESOME
(This part is up to you)
Appendix
EXAMPLE KEYSPACE
<Keyspace Name="UserDb">
  <ColumnFamily Name="Users"
          CompareWith="UTF8Type" />

  <ColumnFamily Name="LocationU...
Upcoming SlideShare
Loading in...5
×

Cassandra Basics: Indexing

28,010

Published on

An introduction to indexing with supercolumns and range queries in Cassandra.

Published in: Technology, Education, Business
4 Comments
39 Likes
Statistics
Notes
No Downloads
Views
Total Views
28,010
On Slideshare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
787
Comments
4
Likes
39
Embeds 0
No embeds

No notes for slide












































  • Cassandra Basics: Indexing

    1. 1. Cassandra Basics Indexing Benjamin Black, b@b3k.us
    2. 2. Relational stores are SCHEMA ORIENTED
    3. 3. Start from your SCHEMA & WORK FORWARDS
    4. 4. Column stores are QUERY ORIENTED
    5. 5. Start from your QUERIES & WORK BACKWARDS
    6. 6. AT SCALE
    7. 7. AT SCALE Denormalization is THE NORM
    8. 8. AT SCALE
    9. 9. AT SCALE Everything depends on THE INDICES
    10. 10. Cassandra is an INDEX CONSTRUCTION KIT
    11. 11. Column Family
    12. 12. Two-level Map key: { column: value, column: value, ... }
    13. 13. Super Column Family
    14. 14. Three-level Map key: { supercolumn: { column:value, column: value }, supercolumn: { ... } }
    15. 15. column sorting defined by CompareWith/ CompareSubcolumnsWith
    16. 16. TimeUUIDType UTF8Type ASCIIType LongType LexicalUUIDType
    17. 17. row placement determined by Partitioner
    18. 18. RandomPartitioner Place based on MD5 of key OrderPreservingPartitioner Place based on actual key
    19. 19. Rows are sorted by key on each node Regardless of partitioner
    20. 20. One example in TWO ACTS
    21. 21. Prelude A USER DATABASE
    22. 22. <ColumnFamily Name=”Users” CompareWith=”UTF8Type” />
    23. 23. “b”: {“name”:”Ben”, “street”:”1234 Oak St.”, “city”:”Seattle”, “state”:”WA”} “jason”: {”name”:”Jason”, “street”:”456 First Ave.”, “city”:”Bellingham”, “state”:”WA”} “zack”: {”name”: “Zack”, “street”: “4321 Pine St.”, “city”: “Seattle”, “state”: “WA”} “jen1982”: {”name”:”Jennifer”, “street”:”1120 Foo Lane”, “city”:”San Francisco”, “state”:”CA”} “albert”: {”name”:”Albert”, “street”:”2364 South St.”, “city”:”Boston”, “state”:”MA”}
    24. 24. SELECT name FROM Users WHERE state=”WA”
    25. 25. SELECT name FROM Users WHERE state=”WA” How is WHERE clause formed?
    26. 26. Act One Supercolumn Indexing
    27. 27. <ColumnFamily Name=”LocationUserIndexSCF” CompareWith=”UTF8Type” CompareSubcolumnsWith=”UTF8Type” ColumnType=”Super” />
    28. 28. [state]: { [city1]: {[name1]:[user1], [name2]:[user2], ... }, [city2]: {[name3]:[user3], [name4]:[user4], ... }, ... [cityX]: {[name5]:[user5], [name6]:[user6], ... } }
    29. 29. “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
    30. 30. Row Key “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
    31. 31. Row Key Super Column “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
    32. 32. Row Key Colum Super Column n “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
    33. 33. Row Key Colum Super Column Value n “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
    34. 34. Show me EVERYONE IN WASHINGTON
    35. 35. get(:LocationUserIndexSCF, ‘WA’)
    36. 36. { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
    37. 37. Act Two Composite Key Indexing
    38. 38. Order Preserving Partitioner + Range Queries
    39. 39. <ColumnFamily Name=”LocationUserIndexCF” CompareWith=”UTF8Type” />
    40. 40. [state1]/[city1]: {[name1]:[user1], [name2]:[user2], ... } [state1]/[city2]: {[name3]:[user3], [name4]:[user4], ... } [state2]/[city1]: {[name5]:[user5], [name6]:[user6], ... } ... [stateX]/[cityY]: {[name7]:[user7], [name8]:[user8], ... }
    41. 41. “CA/San Francisco”: {”Jennifer”: “jen1982”} “MA/Boston”: {”Albert”: “albert”} “WA/Bellingham”: {”Jason”: “jason”} “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
    42. 42. Show me EVERYONE IN WASHINGTON
    43. 43. get_range(:LocationUserIndexCF, {:start: 'WA', :finish:'WB'})
    44. 44. { ”WA/Bellingham”: {”Jason”: “jason”}, “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”} }
    45. 45. Finale BUILD SOMETHING AWESOME
    46. 46. (This part is up to you)
    47. 47. Appendix EXAMPLE KEYSPACE
    48. 48. <Keyspace Name="UserDb"> <ColumnFamily Name="Users" CompareWith="UTF8Type" /> <ColumnFamily Name="LocationUserIndexSCF" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" ColumnType="Super" /> <ColumnFamily Name="LocationUserIndexCF" CompareWith="UTF8Type" /> <ReplicaPlacementStrategy> org.apache.cassandra.locator.RackUnawareStrategy </ReplicaPlacementStrategy> <ReplicationFactor>1</ReplicationFactor> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch> </Keyspace>
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×