Cassandra Basics: Indexing

Benjamin Black, b@b3k.us
Relational stores are
SCHEMA ORIENTED
Start from your SCHEMA &
WORK FORWARDS
Column stores are
QUERY ORIENTED
Start from your QUERIES &
WORK BACKWARDS
AT SCALE
Denormalization is
THE NORM
AT SCALE
Everything depends on
THE INDICES
Cassandra is an
INDEX CONSTRUCTION KIT
Column Family
Two-level Map

key: {
  column: value,
  column: value,
  ...
}
Super Column Family
Three-level Map

key: {
  supercolumn: {
    column: value,
    column: value
  },
  supercolumn: {
    ...
  }
}
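As a rough mental model (not Cassandra's API), the two map shapes above can be mimicked with nested Python dicts. The row keys, column names, and the helpers `get_column` and `get_subcolumn` are illustrative assumptions, not part of the original deck.

```python
# Sketch only: plain dicts standing in for Cassandra's data model.

# A column family is a two-level map: row key -> column name -> value.
column_family = {
    "row1": {"colA": "v1", "colB": "v2"},
}

# A super column family adds one level:
# row key -> super column name -> column name -> value.
super_column_family = {
    "row1": {
        "super1": {"colA": "v1", "colB": "v2"},
        "super2": {"colC": "v3"},
    },
}


def get_column(cf, key, column):
    """Two-level lookup: row key, then column name."""
    return cf[key][column]


def get_subcolumn(scf, key, supercolumn, column):
    """Three-level lookup: row key, super column, then column name."""
    return scf[key][supercolumn][column]


print(get_column(column_family, "row1", "colB"))                      # v2
print(get_subcolumn(super_column_family, "row1", "super2", "colC"))   # v3
```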
column sorting defined by
CompareWith / CompareSubcolumnsWith
TimeUUIDType
UTF8Type
ASCIIType
LongType
LexicalUUIDType
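To see why the choice of comparator matters, here is a small sketch that sorts the same column names two ways. It only imitates the idea behind UTF8Type and LongType using Python sort keys; the helper names are hypothetical and this is not how Cassandra implements its comparators.

```python
# Sketch: the comparator decides the order in which a row's columns are kept,
# and therefore which columns are neighbours when you read a contiguous slice.
column_names = ["10", "9", "2", "100"]


def sort_like_utf8(names):
    """String comparison, in the spirit of UTF8Type (assumption: plain string order)."""
    return sorted(names)


def sort_like_long(names):
    """Numeric comparison, in the spirit of LongType."""
    return sorted(names, key=int)


print(sort_like_utf8(column_names))  # ['10', '100', '2', '9']
print(sort_like_long(column_names))  # ['2', '9', '10', '100']
```

The same four names come back in different orders, so the comparator should be chosen to match the order the queries need.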
row placement determined by
Partitioner
RandomPartitioner
Place based on MD5 of key

OrderPreservingPartitioner
Place based on actual key
Rows are sorted by key on each node
Regardless of partitioner
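The difference between the two partitioners can be sketched by ordering a handful of row keys around the ring both ways. This is a simplified illustration: `md5_token`, `ring_order_random`, and `ring_order_preserving` are hypothetical helpers, and the real token arithmetic in Cassandra differs in detail. The keys are borrowed from the user example later in the deck.

```python
import hashlib


def md5_token(key):
    """Token derived from the MD5 of the key, in the spirit of RandomPartitioner."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def ring_order_random(keys):
    """Order keys by MD5-derived token; keys that are close lexically get scattered."""
    return sorted(keys, key=md5_token)


def ring_order_preserving(keys):
    """Order keys by the key itself, in the spirit of OrderPreservingPartitioner."""
    return sorted(keys)


keys = ["b", "jason", "zack", "jen1982", "albert"]
print(ring_order_preserving(keys))  # alphabetical, so key ranges are contiguous
print(ring_order_random(keys))      # hash order, unrelated to alphabetical order
```

Hash-based placement tends to spread load evenly, while key-based placement keeps adjacent keys together, which is what makes key range queries meaningful.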
One example in
TWO ACTS
Prelude
A USER DATABASE
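<ColumnFamily Name="Users"
       CompareWith="UTF8Type" />
"b":       {"name": "Ben", "street": "1234 Oak St.",
            "city": "Seattle", "state": "WA"}
"jason":   {"name": "Jason", "street": "456 First Ave.",
            "city": "Bellingham", "state": "WA"}
"zack":    {"name": "Zack", "street": "4321 Pine St.",
            "city": "Seattle", "state": "WA"}
"jen1982": {"name": "Jennifer", "street": "1120 Foo Lane",
            "city": "San Francisco", "state": "CA"}
"albert":  {"name": "Albert", "street": "2364 South St.",
            "city": "Boston", "state": "MA"}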
SELECT name FROM Users
WHERE state="WA"

How is the WHERE clause formed?
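One way to see the problem: with only the Users rows, keyed by user id, answering this query means reading every row and filtering on the client. A minimal sketch with a dict standing in for the column family follows; `names_in_state` is a hypothetical helper, and the rows are the deck's sample users.

```python
# Sketch: the Users column family as a dict of row key -> columns.
users = {
    "b": {"name": "Ben", "street": "1234 Oak St.", "city": "Seattle", "state": "WA"},
    "jason": {"name": "Jason", "street": "456 First Ave.", "city": "Bellingham", "state": "WA"},
    "zack": {"name": "Zack", "street": "4321 Pine St.", "city": "Seattle", "state": "WA"},
    "jen1982": {"name": "Jennifer", "street": "1120 Foo Lane", "city": "San Francisco", "state": "CA"},
    "albert": {"name": "Albert", "street": "2364 South St.", "city": "Boston", "state": "MA"},
}


def names_in_state(users_cf, state):
    """Visit every row and filter: the cost grows with the total number of users."""
    return sorted(row["name"] for row in users_cf.values() if row["state"] == state)


print(names_in_state(users, "WA"))  # ['Ben', 'Jason', 'Zack']
```

The two acts that follow replace this scan with a lookup against a purpose-built index.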
Act One
Supercolumn Indexing
<ColumnFamily Name="LocationUserIndexSCF"
       CompareWith="UTF8Type"
       CompareSubcolumnsWith="UTF8Type"
       ColumnType="Super" />
[state]: {
  [city1]: {[name1]:[user1], [name2]:[user2], ... },
  [city2]: {[name3]:[user3], [name4]:[user4], ... },
  ...
  [cityX]: {[name5]:[user5], [name6]:[user6], ... }
}
"CA": {
  "San Francisco": {"Jennifer": "jen1982"}
}
"MA": {
  "Boston": {"Albert": "albert"}
}
"WA": {
  "Bellingham": {"Jason": "jason"},
  "Seattle": {"Ben": "b", "Zack": "zack"}
}
Row Key (state): {
  Super Column (city): { Column (name): Value (user key) }
}
Show me
EVERYONE IN WASHINGTON
get(:LocationUserIndexSCF, 'WA')
{
  "Bellingham": {"Jason": "jason"},
  "Seattle": {"Ben": "b", "Zack": "zack"}
}
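The supercolumn index is denormalized data, so something has to write it. The sketch below assumes the application builds the index from the user rows itself; `build_location_index_scf` and this dict-based `get` are hypothetical stand-ins, not the client library's API.

```python
# Sketch: derive the state -> city -> name -> user key index from user rows.
users = {
    "b": {"name": "Ben", "city": "Seattle", "state": "WA"},
    "jason": {"name": "Jason", "city": "Bellingham", "state": "WA"},
    "zack": {"name": "Zack", "city": "Seattle", "state": "WA"},
    "jen1982": {"name": "Jennifer", "city": "San Francisco", "state": "CA"},
    "albert": {"name": "Albert", "city": "Boston", "state": "MA"},
}


def build_location_index_scf(users_cf):
    """Row key = state, super column = city, column = name, value = user key."""
    index = {}
    for user_key, row in users_cf.items():
        cities = index.setdefault(row["state"], {})
        cities.setdefault(row["city"], {})[row["name"]] = user_key
    return index


def get(index, row_key):
    """Single-row read: everything indexed under one state."""
    return index.get(row_key, {})


index = build_location_index_scf(users)
print(get(index, "WA"))
```

One read of the "WA" row returns every city and user in the state, with no scan of the Users rows.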
Act Two
Composite Key Indexing
Order Preserving Partitioner + Range Queries
<ColumnFamily Name="LocationUserIndexCF"
       CompareWith="UTF8Type" />
[state1]/[city1]:   {[name1]:[user1], [name2]:[user2], ... }
[state1]/[city2]:   {[name3]:[user3], [name4]:[user4], ... }
[state2]/[city1]:   {[name5]:[user5], [name6]:[user6], ... }
...
[stateX]/[cityY]:   {[name7]:[user7], [name8]:[user8], ... }
"CA/San Francisco": {"Jennifer": "jen1982"}
"MA/Boston": {"Albert": "albert"}
"WA/Bellingham": {"Jason": "jason"}
"WA/Seattle": {"Ben": "b", "Zack": "zack"}
Show me
EVERYONE IN WASHINGTON
get_range(:LocationUserIndexCF, {:start => 'WA',
                          :finish => 'WB'})
{
    "WA/Bellingham": {"Jason": "jason"},
    "WA/Seattle": {"Ben": "b", "Zack": "zack"}
}
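The range from 'WA' to 'WB' works because every key of the form "WA/<city>" sorts after "WA" and before "WB". The sketch below imitates that with a sorted key list and `bisect`; `build_location_index_cf` and this `get_range` are hypothetical helpers, and treating `finish` as exclusive is an assumption of the sketch rather than a statement about Cassandra's API.

```python
from bisect import bisect_left

users = {
    "b": {"name": "Ben", "city": "Seattle", "state": "WA"},
    "jason": {"name": "Jason", "city": "Bellingham", "state": "WA"},
    "zack": {"name": "Zack", "city": "Seattle", "state": "WA"},
    "jen1982": {"name": "Jennifer", "city": "San Francisco", "state": "CA"},
    "albert": {"name": "Albert", "city": "Boston", "state": "MA"},
}


def build_location_index_cf(users_cf):
    """Row key = "state/city", column = name, value = user key."""
    index = {}
    for user_key, row in users_cf.items():
        composite_key = f'{row["state"]}/{row["city"]}'
        index.setdefault(composite_key, {})[row["name"]] = user_key
    return index


def get_range(index, start, finish):
    """Rows with start <= key < finish, relying on keys being kept in sorted order."""
    keys = sorted(index)
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, finish)
    return {key: index[key] for key in keys[lo:hi]}


index = build_location_index_cf(users)
print(get_range(index, "WA", "WB"))
```

Narrowing the range to start at "WA/Seattle" and finish at "WA/Seattle0" would, under the same assumption, return a single city, since "0" sorts just after "/".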
Finale
BUILD SOMETHING AWESOME
(This part is up to you)
Appendix
EXAMPLE KEYSPACE
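<Keyspace Name="UserDb">
  <ColumnFamily Name="Users"
          CompareWith="UTF8Type" />

  <ColumnFamily Name="LocationUserIndexSCF"
          CompareWith="UTF8Type"
          CompareSubcolumnsWith="UTF8Type"
          ColumnType="Super" />

  <ColumnFamily Name="LocationUserIndexCF"
          CompareWith="UTF8Type" />

  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>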

An introduction to indexing with supercolumns and range queries in Cassandra.

Cassandra Basics: Indexing

  1. Cassandra Basics Indexing Benjamin Black, b@b3k.us
  2. Relational stores are SCHEMA ORIENTED
  3. Start from your SCHEMA & WORK FORWARDS
  4. Column stores are QUERY ORIENTED
  5. Start from your QUERIES & WORK BACKWARDS
  6. AT SCALE
  7. AT SCALE Denormalization is THE NORM
  8. AT SCALE
  9. AT SCALE Everything depends on THE INDICES
  10. Cassandra is an INDEX CONSTRUCTION KIT
  11. Column Family
  12. Two-level Map key: { column: value, column: value, ... }
  13. Super Column Family
  14. Three-level Map key: { supercolumn: { column:value, column: value }, supercolumn: { ... } }
  15. column sorting defined by CompareWith/ CompareSubcolumnsWith
  16. TimeUUIDType UTF8Type ASCIIType LongType LexicalUUIDType
  17. row placement determined by Partitioner
  18. RandomPartitioner Place based on MD5 of key OrderPreservingPartitioner Place based on actual key
  19. Rows are sorted by key on each node Regardless of partitioner
  20. One example in TWO ACTS
  21. Prelude A USER DATABASE
  22. <ColumnFamily Name="Users" CompareWith="UTF8Type" />
  23. "b": {"name": "Ben", "street": "1234 Oak St.", "city": "Seattle", "state": "WA"} "jason": {"name": "Jason", "street": "456 First Ave.", "city": "Bellingham", "state": "WA"} "zack": {"name": "Zack", "street": "4321 Pine St.", "city": "Seattle", "state": "WA"} "jen1982": {"name": "Jennifer", "street": "1120 Foo Lane", "city": "San Francisco", "state": "CA"} "albert": {"name": "Albert", "street": "2364 South St.", "city": "Boston", "state": "MA"}
  24. SELECT name FROM Users WHERE state="WA"
  25. SELECT name FROM Users WHERE state="WA" How is the WHERE clause formed?
  26. Act One Supercolumn Indexing
  27. <ColumnFamily Name="LocationUserIndexSCF" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" ColumnType="Super" />
  28. [state]: { [city1]: {[name1]:[user1], [name2]:[user2], ... }, [city2]: {[name3]:[user3], [name4]:[user4], ... }, ... [cityX]: {[name5]:[user5], [name6]:[user6], ... } }
  29. "CA": { "San Francisco": {"Jennifer": "jen1982"} } "MA": { "Boston": {"Albert": "albert"} } "WA": { "Bellingham": {"Jason": "jason"}, "Seattle": {"Ben": "b", "Zack": "zack"} }
  30. Row Key "CA": { "San Francisco": {"Jennifer": "jen1982"} } "MA": { "Boston": {"Albert": "albert"} } "WA": { "Bellingham": {"Jason": "jason"}, "Seattle": {"Ben": "b", "Zack": "zack"} }
  31. Row Key, Super Column "CA": { "San Francisco": {"Jennifer": "jen1982"} } "MA": { "Boston": {"Albert": "albert"} } "WA": { "Bellingham": {"Jason": "jason"}, "Seattle": {"Ben": "b", "Zack": "zack"} }
  32. Row Key, Super Column, Column "CA": { "San Francisco": {"Jennifer": "jen1982"} } "MA": { "Boston": {"Albert": "albert"} } "WA": { "Bellingham": {"Jason": "jason"}, "Seattle": {"Ben": "b", "Zack": "zack"} }
  33. Row Key, Super Column, Column, Value "CA": { "San Francisco": {"Jennifer": "jen1982"} } "MA": { "Boston": {"Albert": "albert"} } "WA": { "Bellingham": {"Jason": "jason"}, "Seattle": {"Ben": "b", "Zack": "zack"} }
  34. Show me EVERYONE IN WASHINGTON
  35. get(:LocationUserIndexSCF, 'WA')
  36. { "Bellingham": {"Jason": "jason"}, "Seattle": {"Ben": "b", "Zack": "zack"} }
  37. Act Two Composite Key Indexing
  38. Order Preserving Partitioner + Range Queries
  39. <ColumnFamily Name="LocationUserIndexCF" CompareWith="UTF8Type" />
  40. [state1]/[city1]: {[name1]:[user1], [name2]:[user2], ... } [state1]/[city2]: {[name3]:[user3], [name4]:[user4], ... } [state2]/[city1]: {[name5]:[user5], [name6]:[user6], ... } ... [stateX]/[cityY]: {[name7]:[user7], [name8]:[user8], ... }
  41. "CA/San Francisco": {"Jennifer": "jen1982"} "MA/Boston": {"Albert": "albert"} "WA/Bellingham": {"Jason": "jason"} "WA/Seattle": {"Ben": "b", "Zack": "zack"}
  42. Show me EVERYONE IN WASHINGTON
  43. get_range(:LocationUserIndexCF, {:start => 'WA', :finish => 'WB'})
  44. { "WA/Bellingham": {"Jason": "jason"}, "WA/Seattle": {"Ben": "b", "Zack": "zack"} }
  45. Finale BUILD SOMETHING AWESOME
  46. (This part is up to you)
  47. Appendix EXAMPLE KEYSPACE
  48. <Keyspace Name="UserDb"> <ColumnFamily Name="Users" CompareWith="UTF8Type" /> <ColumnFamily Name="LocationUserIndexSCF" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" ColumnType="Super" /> <ColumnFamily Name="LocationUserIndexCF" CompareWith="UTF8Type" /> <ReplicaPlacementStrategy> org.apache.cassandra.locator.RackUnawareStrategy </ReplicaPlacementStrategy> <ReplicationFactor>1</ReplicationFactor> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch> </Keyspace>