Advanced Data Modeling and Bitmap Indexes

Matt Stump presents for the DataStax Cassandra South Bay Users group on advanced data modeling and bitmap indexes.

Published in: Technology

Transcript of "Advanced Data Modeling and Bitmap Indexes"

  1. ADVANCED DATA MODELING AND BITMAP INDEXES. Matt Stump, mstump@kissmetrics.com. Monday, May 6, 13
  2. WHO ARE YOUR Customers?
  3. WHERE DO THEY Hangout?
  4. HOW SHOULD YOU Engage?
  5. What is User Experience?
  6. What is my Data?
  7. Form Follows Function
  8. Data Follows Queries
  9. Primary Key
       CREATE TABLE users (
         username text PRIMARY KEY,
         first_name text,
         last_name text,
         postal_code text,
         last_login timestamp
       );
       INSERT INTO users (username, first_name, last_name, postal_code, last_login)
       VALUES ('cstar', 'Cassandra', 'Database', '11111', '2013-04-04');
       SELECT first_name, last_name FROM users WHERE username = 'cstar';
  10. Primary Key
       RowKey   username   first_name   last_name   postal_code
       cstar    cstar      Cassandra    Database    11111
       user2    user2      Some         Guy         22222
  11. Secondary Index
       CREATE INDEX user_zipcode ON users(postal_code);
       postal_code   usernames
       11111         cstar
       22222         user2 user3 user456 ...
  12. Where Secondary Indexes Break
       1. High cardinality data
       2. Only one index per query
       3. Indexes are distributed
       4. Only some datatypes; no counters
       5. Range queries are expensive
  13. Roll Your Own Using Wide Rows
       RowKey   05/02/2012   02/01/2013   05/02/2013   ...
       user2    JSON         JSON         JSON         JSON
       All events for "user2" indexed by time
  14. Limitations to Rolling Your Own
       1. Can't query across rows
       2. Only some datatypes; no counters
       3. Requires lots of work in the application
       4. No complex queries
  15. What do I need?
  16. A Query Engine Wishlist
       1. High cardinality data; counters
       2. Complex queries, multiple clauses
       3. Results in < 500ms for billions of rows
       4. Sub-field searching; regex
       5. Range queries
  17. First Iteration: Ginormous String Sets
       11111   cstar
       22222   user2 user3 user456 ...
       11111   22222
  18. Bitmaps
  19. Bitmaps
  20. Bitmaps: How do they Work?
       Value   0-7        8-15      16-23      24-31
       11111   11010011   1011011   1010000    00000000
       22222   00000000   0011011   00000000   00000000
  21. Bitmaps: Equality
       Value           0-7        8-15      16-23      24-31
       11111           11010011   1011011   1010000    00000000
       22222           00000000   0011011   00000000   00000000
       SELECT * FROM users WHERE postal_code IN (11111,22222);
       Value           0-7        8-15      16-23      24-31
       11111 & 22222   00000000   0011011   00000000   00000000
  22. Bitmaps: Range, or How Do I Query Counters?
       Field    Value   0-7        8-15      16-23      24-31
       Event2   1       11010011   1011011   1010000    00000000
       Event2   4       00000000   0011011   00000000   00000000
       Value    0-7        8-15      16-23      24-31
       1 & 4    00000000   0011011   00000000   00000000
       SELECT * FROM users WHERE Event2 > 0 AND Event2 < 5;
  23. Trigrams; AKA You Promised REGEX
       Field       Value   0-7        8-15      16-23      24-31
       last_name   "foo"   11010011   1011011   1010000    00000000
       last_name   "bar"   00000000   0011011   00000000   00000000
       Value             0-7        8-15      16-23      24-31
       "foo" & "bar"     00000000   0011011   00000000   00000000
       SELECT * FROM users WHERE last_name ~= 'f.*bar';
       INSERT INTO users (username, first_name, last_name, postal_code, last_login)
       VALUES ('foobar82', 'johnny', 'foobar', '94110', '2013-04-04');
  24. (no transcribed text)
  25. Not Everything is Roses and Honey
       1. Indexes can be huge
       2. Requires a read before write
       3. Requires synchronization
  26. Compression
  27. RLE Compression: How it Works
       Header   Fill, 11 blocks of 1s   Literal 15 bits   Fill, 18 blocks of 0s   Literal 15 bits
       1010     10000000001011          111010000100101   000000000010010         000000010000011
       Example taken from PWAH: http://www.sjvs.nl/?p=72
  28. Dealing with Read Before Write: Partition Index Using a Ring
       {"product": 124, "user": 22, "event": "event2", "value": "Name=Jonathan+Doe&Age=23"}
       Apply Hash to User Configured Field
       hash(:product) = c62fb32eadd5a0fcceb1ddf2697e2345c604f451
  29. Ring Partitioning
       1. Solves read before write
       2. Solves synchronization issues
       3. Ensures index locality
       4. Easy to isolate big customers
       5. Index size is limited to the largest customer
  30. Sparse Indexes
       Field    Offset 0x00        Offset 0x01        Offset 0xA0        Offset 0xF0
       Field1   0111010101101111   1001010100100101   0111010000100101   0111011100100101
       Only Store the Set Bits
  31. The Whole Enchilada: Query & Indexing Engine; Queries and Events
  32. Goals
       1. Core query and index engine, wrapped
       2. Extensible events and queries via Lua
       3. Equality, range and REGEX queries
       4. Distributed, <500ms for billions of rows
       5. No single point of failure
  33. Resources
       Lots of Papers on Bitmap Compression: http://www-users.cs.umn.edu/~kewu/annotated.html
       How Google Code Search Worked: http://swtch.com/~rsc/regexp/regexp4.html
  34. GOT ANY Questions?
  35. Thanks: Eric Tschetter of the Druid Project and Cassandra Devs for answering my questions
  36. THANK YOU! Matt Stump, www.matthewstump.com, @mattstump
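
Slides 20 and 21 describe keeping one bitmap per distinct column value, with bit N set when row N holds that value. The following is a minimal sketch of that idea, not the speaker's implementation: the `BitmapIndex` class, the `row_ids` helper and the sample rows are all hypothetical, and Python integers stand in for real bitmaps. Slide 21 pairs an `IN` query with a row labelled `&`; the sketch shows both combinations, assuming `IN` corresponds to a bitwise OR and a multi-clause `AND` to a bitwise AND.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one integer-backed bitmap per distinct column value."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # value -> bitmap stored as a Python int

    def add(self, row_id, value):
        # Set bit `row_id` in the bitmap that belongs to `value`.
        self.bitmaps[value] |= 1 << row_id

    def eq(self, value):
        # Unknown values match no rows, so their bitmap is all zeros.
        return self.bitmaps.get(value, 0)


def row_ids(bitmap):
    """Expand a bitmap into the sorted list of set bit positions."""
    ids = []
    pos = 0
    while bitmap:
        if bitmap & 1:
            ids.append(pos)
        bitmap >>= 1
        pos += 1
    return ids


# Hypothetical sample data: (row id, postal_code, last_name).
rows = [
    (0, "11111", "Database"),
    (1, "22222", "Guy"),
    (2, "22222", "Database"),
    (3, "11111", "Guy"),
]

postal = BitmapIndex()
last_name = BitmapIndex()
for rid, code, name in rows:
    postal.add(rid, code)
    last_name.add(rid, name)

# postal_code IN ('11111', '22222'): union of the two bitmaps.
in_result = postal.eq("11111") | postal.eq("22222")

# postal_code = '22222' AND last_name = 'Database': intersection across indexes.
and_result = postal.eq("22222") & last_name.eq("Database")

print(row_ids(in_result), row_ids(and_result))
```

Because each clause reduces to one bitwise operation, several indexed columns can be combined in a single query, which is the limitation of secondary indexes that slide 12 lists.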
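
Slide 22 applies the same structure to counters by keeping one bitmap per observed counter value. A hedged sketch of how a range predicate could then be answered: the `range_query` function and the sample `event2` data are hypothetical. The slide labels its combined row `1 & 4`; this sketch assumes the intended result for `Event2 > 0 AND Event2 < 5` is the union of the bitmaps whose value falls inside the range.

```python
def range_query(bitmaps, low, high):
    """OR together the bitmaps of every counter value v with low < v < high."""
    result = 0
    for value, bitmap in bitmaps.items():
        if low < value < high:
            result |= bitmap
    return result


# Hypothetical data: counter value -> bitmap of the rows holding that value.
event2 = {
    1: 0b0101,  # rows 0 and 2 saw Event2 once
    4: 0b0010,  # row 1 saw Event2 four times
    7: 0b1000,  # row 3 saw Event2 seven times
}

# Event2 > 0 AND Event2 < 5 selects the bitmaps for values 1 and 4.
matches = range_query(event2, 0, 5)
print(bin(matches))
```

The cost grows with the number of distinct values inside the range rather than with the number of rows, which is why this approach suits counters better than a scan.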
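
Slide 23 proposes trigrams as the route to regex queries, in the spirit of the Google Code Search article listed under Resources. The sketch below is one possible reading under stated assumptions: the `TrigramIndex` class is hypothetical, the caller supplies the literal fragments the regex must contain (a real engine would derive them from the pattern), and every candidate is re-checked with Python's `re` module because a trigram match is only a necessary condition.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy trigram index: one bitmap (Python int) per trigram."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # trigram -> bitmap of row ids
        self.values = {}                 # row id -> original string

    def add(self, row_id, value):
        self.values[row_id] = value
        for gram in trigrams(value):
            self.bitmaps[gram] |= 1 << row_id

    def search(self, pattern, required_literals):
        if not self.values:
            return []
        # Start with every row as a candidate, then AND in one bitmap per
        # trigram of each literal that any match must contain.
        candidates = (1 << (max(self.values) + 1)) - 1
        for literal in required_literals:
            for gram in trigrams(literal):
                candidates &= self.bitmaps.get(gram, 0)
        # Confirm the surviving candidates against the full regex.
        regex = re.compile(pattern)
        return [
            rid
            for rid in sorted(self.values)
            if (candidates >> rid) & 1 and regex.search(self.values[rid])
        ]


index = TrigramIndex()
for rid, name in enumerate(["foobar", "foo", "rebar", "fubar"]):
    index.add(rid, name)

# 'f.*bar' must contain the literal "bar"; a lone "f" is too short for a trigram.
hits = index.search(r"f.*bar", ["bar"])
print(hits)
```

In this sample the trigram filter narrows four rows to three candidates, and the final regex check discards "rebar".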
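
Slide 27 illustrates run-length compression with fill words (long runs of identical bits) and literal words. The sketch below shows only the underlying run-length idea on a bit string. It is a deliberate simplification and does not reproduce the PWAH word layout from the slide; `rle_encode`, `rle_decode` and the sample bitmap are hypothetical.

```python
from itertools import groupby


def rle_encode(bits):
    """Collapse a string of '0'/'1' characters into (bit, run_length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit string."""
    return "".join(bit * length for bit, length in runs)


# Hypothetical bitmap: a run of 11 ones, a short mixed section,
# a run of 18 zeros, and a final set bit.
bitmap = "1" * 11 + "0100101" + "0" * 18 + "1"
runs = rle_encode(bitmap)
print(runs)
```

Word-aligned schemes such as the one on the slide keep the mixed sections as literal words instead of splitting them into tiny runs, which lets bitwise operations work on the compressed form.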
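
Slides 28 and 29 partition the index by hashing a user-configured field of each event onto a ring, so that every event for a given product lands on the same node. A minimal sketch, assuming SHA-1 as the hash (the 40-hex-digit value on slide 28 is consistent with it, but the slide does not name the function) and a plain token ring without virtual nodes. The `Ring` class, `token` helper and node names are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Map a key to an integer position on the ring (assumed hash: SHA-1)."""
    return int(hashlib.sha1(str(key).encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each node owns the arc that ends at its token."""

    def __init__(self, nodes):
        self.tokens = sorted((token(node), node) for node in nodes)
        self.keys = [tok for tok, _ in self.tokens]

    def owner(self, event, field="product"):
        # Hash the configured field and walk clockwise to the next node token,
        # wrapping around to the first node at the end of the ring.
        tok = token(event[field])
        i = bisect.bisect_right(self.keys, tok) % len(self.keys)
        return self.tokens[i][1]


nodes = ["node-a", "node-b", "node-c"]
ring = Ring(nodes)

event = {
    "product": 124,
    "user": 22,
    "event": "event2",
    "value": "Name=Jonathan+Doe&Age=23",
}
other_event = {"product": 124, "user": 99, "event": "event7", "value": ""}

# Both events share product 124, so they are routed to the same node.
print(ring.owner(event), ring.owner(other_event))
```

Routing every write for a partition to one node is what lets that node serialize the read-before-write cycle locally, which is the benefit slide 29 claims.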
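
Slide 30 suggests storing only the words of a bitmap that contain set bits, keyed by offset. One way this could look, assuming 16-bit words as drawn on the slide and a dictionary as the sparse container. The `SparseBitmap` class is hypothetical.

```python
WORD_BITS = 16  # word width assumed from the 16-bit groups drawn on slide 30


class SparseBitmap:
    """Toy sparse bitmap: only words containing at least one set bit are stored."""

    def __init__(self):
        self.words = {}  # word offset -> 16-bit int; missing offsets are all zeros

    def set(self, bit):
        offset, pos = divmod(bit, WORD_BITS)
        self.words[offset] = self.words.get(offset, 0) | (1 << pos)

    def test(self, bit):
        offset, pos = divmod(bit, WORD_BITS)
        return bool((self.words.get(offset, 0) >> pos) & 1)

    def __and__(self, other):
        # Only offsets present in both operands can yield set bits.
        result = SparseBitmap()
        for offset in self.words.keys() & other.words.keys():
            word = self.words[offset] & other.words[offset]
            if word:
                result.words[offset] = word
        return result


a = SparseBitmap()
b = SparseBitmap()
a.set(3)
a.set(0xA0 * WORD_BITS + 5)
b.set(7)
b.set(0xA0 * WORD_BITS + 5)

both = a & b
print(both.words)
```

Memory use then follows the number of populated words rather than the highest row id, at the cost of a lookup per word.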