7. Distributed Database
• Rows are the unit of distribution
• C* pulls the entire row into memory
• Can read or write individual parts, but the row is still handled as a whole
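A minimal sketch of why the row is the unit of distribution: the node is chosen from the row key alone, so every column of a row lands in the same place. The node names and the modulo placement are assumptions for illustration; a real cluster maps tokens onto a ring.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]

def node_for_row(row_key: str) -> str:
    """Pick a node from a hash of the row key alone (columns play no part)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    # Simplification: modulo placement instead of a token ring.
    return NODES[token % len(NODES)]

# Every column of a given row is stored wherever its row key hashes to.
placement = {key: node_for_row(key) for key in ["bb@DAL", "steve@DARK", "89403270"]}
```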
8. Log Structured Updates
• Commitlog and sstables are log structured
• Oriented around appending (streaming at a known location)
• ==> Writes are quick
• And you want to avoid rewrites
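The append-only idea can be sketched in a few lines. This is a toy model, not Cassandra's actual storage code: a Python list stands in for the log file, and the class name is hypothetical.

```python
class AppendOnlyLog:
    """Toy log-structured store: writes only ever append."""

    def __init__(self):
        self.entries = []  # stands in for a commitlog or sstable file

    def write(self, row_key, column, value):
        # Always append at the end; never go back and rewrite an old entry.
        self.entries.append((row_key, column, value))

    def read(self, row_key, column):
        # The newest entry wins, so scan from the tail.
        for key, col, value in reversed(self.entries):
            if key == row_key and col == column:
                return value
        return None

log = AppendOnlyLog()
log.write("bb@DAL", "EMAIL", "bb@example.com")
log.write("bb@DAL", "EMAIL", "none")  # an "update" is just another append
```

The old value is still in the log after the second write; it is simply shadowed by the newer entry.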
9. Random Reads
• Data is scattered around the store (have to get the location, then do a random read to look it up)
• Some indexing helps, and hopefully it’s in the VFS page cache, but still.
• ==> Reads are “slower”
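A rough sketch of the two-step read described above: look up the location in an index, then seek to it. An in-memory `io.BytesIO` stands in for the data file, and the function names are hypothetical.

```python
import io

data_file = io.BytesIO()  # stands in for an on-disk data file
index = {}                # row key -> (offset, length)

def append_row(row_key: str, payload: bytes) -> None:
    # Writes stream to the end of the file and record where they landed.
    offset = data_file.seek(0, io.SEEK_END)
    data_file.write(payload)
    index[row_key] = (offset, len(payload))

def read_row(row_key: str) -> bytes:
    offset, length = index[row_key]  # step 1: get the location
    data_file.seek(offset)           # step 2: random seek into the file
    return data_file.read(length)

append_row("bb@DAL", b"Bobtholomew")
append_row("steve@DARK", b"Steve")
append_row("bb@DAL", b"Bob")  # the newer copy lands somewhere else in the file
```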
10. General Rules of Thumb
• De-normalize everything
• Duplicate your data
• Organize it for reading
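As a sketch of these rules, the same user data can be written twice, once per read path. Plain dicts stand in for column families here; the names `users_by_email` and `users_by_handle` are assumptions for illustration.

```python
users_by_email = {}   # read path: "who is this email?"
users_by_handle = {}  # read path: "who is this chat handle?"

def add_user(email, name, handle=None):
    users_by_email[email] = {"NAME": name}
    if handle is not None:
        users_by_email[email]["HANDLE"] = handle
        # Duplicate the data so a handle lookup is a single row read, no join.
        users_by_handle[handle] = {"EMAIL": email, "NAME": name}

add_user("bb@example.com", "Bobtholomew", handle="bb@DAL")
add_user("mac@mac.com", "mac")
```

The cost is paid at write time (two writes instead of one), which fits a store where writes are cheap and reads are the expensive part.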
14. Row
• Unique inside of a column family
• Key/Value where the Value is all of the columns in the row
• Can attach some additional meaning to the row name
• Typically “bucketing”
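One common form of bucketing folds a date into the row name so that each user gets one row per day instead of one ever-growing row. The key format and helper names below are assumptions for illustration.

```python
from datetime import datetime

def bucket_row_key(username: str, ts: datetime) -> str:
    # Row name carries extra meaning: who, plus which day's bucket.
    return f"{username}:{ts:%Y%m%d}"

events = {}  # row key -> {column name -> value}

def record_event(username: str, ts: datetime, what: str) -> None:
    row = events.setdefault(bucket_row_key(username, ts), {})
    row[f"{ts:%H:%M:%S}"] = what  # time of day becomes the column name

record_event("bb", datetime(2014, 2, 24, 9, 15, 0), "login")
record_event("bb", datetime(2014, 2, 24, 17, 30, 5), "logout")
record_event("bb", datetime(2014, 2, 25, 8, 0, 0), "login")
```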
20. Lookup by chat handle
ICQ CF
  89403270   | EMAIL: jsmith@mac.com | Name: John
IRC CF
  bb@DAL     | EMAIL: bb@example.com | Name: Bobtholomew
  steve@DARK | EMAIL: steve@my.net   | Name: Steve
23. How do I create these?
[default@userdb] create column family usersCF;
5ecec19a-3a43-3490-8c9a-3eb2901e2e97
Waiting for schema agreement...
... schemas agree across the cluster
[default@userdb] create column family handleCF;
df82135c-eb1f-3abf-b9df-02c605d571d5
Waiting for schema agreement...
... schemas agree across the cluster
24. How do I insert data?
[default@userdb] set handleCF[utf8('bb@DAL')]
… [utf8('NAME')] = utf8('Bobtholomew');
Value inserted.
Elapsed time: 22 msec(s).
[default@userdb] set handleCF[utf8('bb@DAL')]
… [utf8('EMAIL')] = utf8('bb@example.com');
Value inserted.
Elapsed time: 3.43 msec(s).
25. Users CF - TAGS
  mac@mac.com     | NAME: mac         | TWITTER: @macmceniry | TAGS: admin,super,cool
  jsmith@mac.com  | NAME: John        | ICQ: 89403270        | Employer: Smithco | Hobby: Miniature Horses
  bb@example.com  | NAME: Bobtholomew | IRC: bb@DAL
  liz@example.com | NAME: Elizabeth   | TWITTER: @liz        | Food: Cheesecake  | TAGS: admin
  steve@my.net    | NAME: Steven      | IRC: steve@DARK
26. Users CF - TAGS
  mac@mac.com     | NAME: mac         | TWITTER: @macmceniry | TAGS:admin | TAGS:cool | TAGS:super
  jsmith@mac.com  | NAME: John        | ICQ: 89403270        | Employer: Smithco | Hobby: Miniature Horses
  bb@example.com  | NAME: Bobtholomew | IRC: bb@DAL
  liz@example.com | NAME: Elizabeth   | TWITTER: @liz        | Food: Cheesecake  | TAGS:admin
  steve@my.net    | NAME: Steven      | IRC: steve@DARK
28. What’s in a name?
• Can use row names and column names to add meaning
• Row name meaning creates a new distribution bin
• Column name meaning can create a data hierarchy
• No real change to column family creation in the thrift interface (other than the types, depending on what you’re doing)
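A small sketch of column names carrying a hierarchy: with names like `TAGS:admin`, reading everything under `TAGS` becomes a prefix match on column names. The dict stands in for one row, and `columns_under` is a hypothetical helper.

```python
# One row of the users column family; the TAGS:* columns carry no value.
row = {
    "NAME": "mac",
    "TWITTER": "@macmceniry",
    "TAGS:admin": "",
    "TAGS:cool": "",
    "TAGS:super": "",
}

def columns_under(row: dict, prefix: str) -> list:
    """Return the child part of every column name that sits under prefix."""
    start = prefix + ":"
    return sorted(name[len(start):] for name in row if name.startswith(start))

tags = columns_under(row, "TAGS")
```

Adding or removing one tag touches a single column, rather than rewriting a comma-separated value.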
31. Now
• Same underlying structure; none of that has changed
  • Rows - reference quickly; use for searching
  • Columns - scan quickly; use for ordering
• But now we have usage patterns
  • Some have been codified into CQL
35. How does handle look here?
cqlsh:userdb> SELECT * FROM handles;
 handlename | email          | name
------------+----------------+-------------
 bb@DAL     | bb@example.com | Bobtholomew
cqlsh:userdb> SELECT * FROM handles WHERE
… handlename = 'bb@DAL';
 handlename | email          | name
------------+----------------+-------------
 bb@DAL     | bb@example.com | Bobtholomew
36. How do I change it?
cqlsh:userdb> UPDATE handles SET email='none'
… WHERE handlename = 'bb@DAL';
cqlsh:userdb> SELECT * FROM handles;
 handlename | email | name
------------+-------+-------------
 bb@DAL     | none  | Bobtholomew
37. upsert
• Update instead of Insert
  • Does the same thing (as long as it’s not a key)
• Insert instead of Update
  • Overwrites data if it’s already there
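The upsert behavior can be modeled with one function behind both names. This is a toy model with a dict standing in for the table; it is not how cqlsh or any driver is implemented.

```python
table = {}  # row key -> {column -> value}

def upsert(key, **columns):
    # Create the row if it is missing, then overwrite the given columns.
    table.setdefault(key, {}).update(columns)

# Both verbs end up doing the same thing.
insert = upsert
update = upsert

update("bb@DAL", name="Bobtholomew")      # "update" on a row that does not exist yet
insert("bb@DAL", email="bb@example.com")  # "insert" into a row that already exists
insert("bb@DAL", email="none")            # a second insert silently overwrites
```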
38. What about our event buckets from earlier?
• Can do the same thing
• Creating a composite key
  • USERNAME:DATE
• Creating a composite column
  • hh:mm:ss
41. … PRIMARY KEY ( (username,d), hr, min, sec ) );
(username,d) = ROW NAME (C* 1.2)
hr, min, sec = COLUMN NAME
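A sketch of how that key splits, assuming the first (parenthesized) part picks the storage row and the remaining parts order the columns inside it. The `event` value column and the sample data are hypothetical.

```python
from collections import defaultdict

cql_rows = [
    # (username, d, hr, min, sec, event); the event column is hypothetical
    ("bb", "2014-02-24", 9, 15, 0, "login"),
    ("bb", "2014-02-24", 8, 0, 30, "page"),
    ("bb", "2014-02-25", 7, 45, 10, "login"),
]

# (username, d) selects the storage row; (hr, min, sec) names the column.
storage = defaultdict(dict)
for username, d, hr, minute, sec, event in cql_rows:
    storage[(username, d)][(hr, minute, sec)] = event

def scan(username, d):
    """Read one bucket back in column-name (time) order."""
    row = storage[(username, d)]
    return [row[name] for name in sorted(row)]
```

Three CQL rows become two storage rows here, and a scan of one day comes back in time order regardless of insert order.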
42. Tags
• CQL has collections
  • map, list, set
• Collections are built similarly to small/special composite columns
• Can add to our existing handle table
43. cqlsh:userdb> ALTER TABLE handles ADD tags set<text>;
cqlsh:userdb> UPDATE handles
… SET tags = {'admin', 'foo'}
… WHERE handlename = 'bb@DAL';
  bb@DAL | email: bb@example.com | name: Bobtholomew | tags:admin | tags:foo
44. Design the data model so that it’s idempotent (eBay)
• Counter versus Collection (what question is being asked?)
[Diagram: counter model versus collection model. Counter: Likes A, Likes B, Likes C with Count 100, 200, 300. Collection: Likes A, Likes B, Likes C each hold timestamped entries such as +user11 1393287359, +user12 1393280912, -user11 1393281942, +user13 1393212345, +user14 1393287100.]
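A toy comparison of the two models when a client retries a write it believes failed. Dicts stand in for the counter and the collection, and the function names are hypothetical.

```python
counter = {"Likes A": 0}   # counter model: just a number
likers = {"Likes A": {}}   # collection model: user -> timestamp

def like_counter(item):
    counter[item] += 1          # not idempotent: every replay adds one more

def like_collection(item, user, ts):
    likers[item][user] = ts     # idempotent: a replay rewrites the same entry

# The client times out and sends the same "like" a second time.
for _ in range(2):
    like_counter("Likes A")
    like_collection("Likes A", "user11", 1393287359)
```

The collection also answers "who liked this?", while the counter can only answer "how many?", which is the point of asking what question is being asked.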
45. Go Forth and Model!
Thank You!
PS… Sony Network is hiring!