Needle Meet Haystack<br />Adapting your data models for Cassandra<br />Gary Dusbabek • Rackspace • ICOODB 2010<br />
Outline<br />First Things First<br />Column Families<br />Trade Offs<br />Procedures & Best Practices<br />Internals<br />
It’s all about scalability<br />
We can all be friends<br />
Column Families<br />
2. Trade Offs<br />
No Transactions<br />
No Ad Hoc Queries<br />
No Joins<br />
No Flexible Indexes<br />
Don’t<br />Panic!<br />
Scalability<br />Availability<br />Replication & Backup<br />
3. Procedures & Practices<br />
Relational Way<br />Define entities<br />Normalize<br />Identify Many-to-many<br />Query any way you want<br />
How Come?<br />Scarcity<br />Efficiency<br />
Cassandra Way<br />Know your app<br />Queries first<br />Denormalize<br />
Know Your App<br />
Queries First<br />
Nobody is Normal<br />
Relational Example<br />
Column Family Example<br />
Column Family Example<br />
Column Family Example<br />
Column Family Example<br />
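The column family slides can be approximated with a toy in-memory model. This is only a sketch, not Cassandra's API: the `ColumnFamily` class, the `users` and `timeline` names, and the sample data are all hypothetical. It illustrates three points from the talk: rows are sparse, columns come back sorted by name, and a composite column name (here a `(date, post_id)` tuple) lets a denormalized row answer one query directly.

```python
class ColumnFamily:
    """Toy model (hypothetical): row key -> {column name: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, column, value):
        # A row is created on first write; no schema says which columns exist.
        self.rows.setdefault(key, {})[column] = value

    def get(self, key):
        # Columns are returned sorted by column name.
        return sorted(self.rows.get(key, {}).items())


# Sparse rows: "ann" has no email column, and that is fine.
users = ColumnFamily()
users.insert("gary", "name", "Gary")
users.insert("gary", "email", "gary@example.com")
users.insert("ann", "name", "Ann")

# Denormalized row built for one query: "posts by this user, oldest first".
# The composite column name (date, post_id) controls the sort order.
timeline = ColumnFamily()
timeline.insert("gary", (20100928, "post-2"), "Queries first")
timeline.insert("gary", (20100927, "post-1"), "Know your app")

for column, value in timeline.get("gary"):
    print(column, value)
```

The post titles are copied into the `timeline` row rather than joined at read time, which is the trade described in the notes: fast reads and inserts, painful updates if a title changes.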
Does it feel strange?<br />
4. Internals<br />
Sequential Writes<br />Always<br />
Consistency Level<br />
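A small worked example of the arithmetic behind consistency levels, assuming the usual rule that a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. The function names are illustrative and are not part of any Cassandra client.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print("QUORUM with RF=3 means", q, "replicas")
print("QUORUM write + QUORUM read overlap:", read_sees_latest_write(q, q, rf))
print("ONE write + ONE read overlap:", read_sees_latest_write(1, 1, rf))
print("ONE write + ALL read overlap:", read_sees_latest_write(rf, 1, rf))
```

Lower levels trade that overlap guarantee for latency and availability, which is why the level is chosen per operation.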
Partitioning<br />
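Partitioning can be sketched as a token ring, assuming hash-based placement in the style of a random partitioner: each key is hashed to a token, the first node whose token is greater than or equal to it owns the key, and further replicas are the next nodes around the ring. The `Ring` class, node names, and token values below are hypothetical, and real replica placement strategies are more involved.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(key: str) -> int:
    """Hash a row key onto the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; keep the nodes sorted by token.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas_for_token(self, token, replication_factor=1):
        # First node with token >= the key's token, wrapping past the end.
        start = bisect_left(self.tokens, token) % len(self.entries)
        count = min(replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]

    def replicas(self, key, replication_factor=1):
        return self.replicas_for_token(token_for(key), replication_factor)


ring = Ring({"a": 0, "b": RING_SIZE // 3, "c": 2 * RING_SIZE // 3})
print(ring.replicas("gary", replication_factor=2))
```

Because placement depends only on the key's hash, any node can work out where a row lives without consulting a central index.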
Slices<br />Data Locality<br />
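Because the columns in a row are kept sorted, a slice is a contiguous range of one row. The sketch below assumes a row stored as a sorted list of `(column_name, value)` pairs; `get_slice` here is an illustrative helper, not the client call of the same name, and the event data is made up.

```python
from bisect import bisect_left, bisect_right


def get_slice(sorted_columns, start, finish, count=100):
    """Return up to `count` columns whose names fall in [start, finish]."""
    names = [name for name, _ in sorted_columns]
    lo = bisect_left(names, start)     # first column >= start
    hi = bisect_right(names, finish)   # one past the last column <= finish
    return sorted_columns[lo:hi][:count]


# One row of a hypothetical "events by user" column family.
# Column names are timestamps, so time order equals sort order.
events = [
    ("2010-09-27T09:00", "login"),
    ("2010-09-27T09:05", "view"),
    ("2010-09-27T09:30", "post"),
    ("2010-09-28T08:00", "login"),
]

print(get_slice(events, "2010-09-27T09:05", "2010-09-27T23:59"))
```

Choosing column names so that related data sorts together is what turns a common query into a single contiguous read.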

Data Modeling with Cassandra Column Families


Slide notes I used for my presentation at ICOODB 2010.

  • It’s not about one data model vs. another. It’s not about one storage engine vs. another. Cassandra excels at replicating data and achieving high sustained write throughput.
  • The right tool for the right job
  • Shaped by distribution model
  • Sparse: columns do not have to exist in every row.
  • Flexible column naming. You define the sort order. A row is not required to have a specific column just because another row does.
  • Look familiar?
  • These trade-offs arise from the distribution model, not the column family (CF) model.
  • Writes are atomic at the CF row level, but not isolated. Large transactional apps push transactions down to the node (shared nothing). Guaranteeing ACID constraints across nodes is a hard problem.
  • OTOH, you do get a lot of things: data redundancy, very fast writes, fast reads.
  • Relational > formally defined > correct. Query first > not formally defined > somehow incorrect. You get some things in exchange: scalability, availability, replication.
  • Focus on query & analysis. B+trees. Update once. *Cassandra typically becomes IO bound before becoming CPU bound.
  • Not set in stone. Your application may require a different approach.
  • Recognize non-starters: Is my dataset going to become very large? Will I need to sustain high write throughput? Also, what are the common operations? Optimize CFs for those operations.
  • Columns are sorted. Choose keys and columns: you need to think about how you plan to slice your data. Related data is kept close together to reduce IO.
  • Denormalize. Use the disk. Don’t be afraid to create another CF that duplicates some data.
  • Composite column names. Painful updates of denormalized parts. Fast reads & insertions.
  • Key
  • Normal attributes
  • Composite column names. Pulling in relationships. Painful updates. Denormalization is best when data doesn’t change.
  • Commit log (on a separate disk). Memtable. SSTable.
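The last note names the three pieces of the write path. The sketch below shows how they fit together, under simplifying assumptions: the `Node` class and its flush threshold are hypothetical, everything lives in memory, and commit log recycling and compaction are left out.

```python
class Node:
    """Toy write path: commit log append -> memtable -> flushed SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; stands in for the separate disk
        self.memtable = {}     # in-memory, holds the newest values
        self.sstables = []     # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, column, value):
        # Every write is a sequential append followed by a memory update.
        self.commit_log.append((key, column, value))
        self.memtable[(key, column)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable is written out once, sorted, and never modified again.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key, column):
        # Newest data wins: check the memtable, then SSTables newest first.
        if (key, column) in self.memtable:
            return self.memtable[(key, column)]
        for table in reversed(self.sstables):
            for name, value in table:
                if name == (key, column):
                    return value
        return None


node = Node(memtable_limit=2)
node.write("gary", "name", "Gary")
node.write("gary", "email", "old@example.com")   # reaches the limit, flushes
node.write("gary", "email", "new@example.com")   # newer value, in the memtable
print(node.read("gary", "email"), node.read("gary", "name"))
```

No write ever seeks into existing data, which is the sense in which writes are always sequential.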
Summary<br /><ul><li>The goal is to scale</li><li>ColumnFamilies != Relational tables</li><li>Trade-offs: you win some, you lose some</li><li>Know your application</li><li>Queries first</li><li>Denormalization is OK</li><li>Cassandra was built for this</li></ul>
Links<br />http://cassandra.apache.org<br />http://wiki.apache.org/cassandra<br />irc: #cassandra on freenode<br />gdusbabek@gmail.com<br />@gdusbabek<br />
Image Credits<br />haystack http://www.flickr.com/photos/james_lumb/3921968993<br />pyramids http://www.flickr.com/photos/gracewong/93631410<br />scales http://www.flickr.com/photos/eflon/3465042138<br />friends http://www.flickr.com/photos/ngmmemuda/4166182931<br />television http://www.flickr.com/photos/angelrravelor/314306023<br />columns http://www.flickr.com/photos/nostri-imago/3564300653<br />devil http://www.flickr.com/photos/52890443@N02/4887855756<br />angel http://www.flickr.com/photos/75001512@N00/4938623021<br />transaction http://www.flickr.com/photos/neubie/2273635564<br />queries http://www.flickr.com/photos/-bast-/349497988<br />rings http://www.flickr.com/photos/baldur/4395738741<br />indexes http://www.flickr.com/photos/waferboard/4137041591<br />panic http://www.flickr.com/photos/pasukaru76/3998981988<br />procedures "The Anatomy Lesson of Dr. Nicolaes Tulp" by Rembrandt<br />relational http://www.flickr.com/photos/35536700@N07/3292544674<br />desert http://www.flickr.com/photos/waldenpond/4252575735<br />jet http://www.flickr.com/photos/rmahle/709685<br />queries http://www.flickr.com/photos/andreanna/2812118063<br />blackboard http://www.flickr.com/photos/shonk/418180402<br />normal http://www.flickr.com/photos/infrogmation/3180606117<br />phonograph http://www.flickr.com/photos/shiyazuni/4770244591<br />dodo http://www.flickr.com/photos/wheatfields/2071347416<br />internals http://www.flickr.com/photos/37hz/4057856826<br />writing http://www.flickr.com/photos/stevendepolo/3877225152<br />consistency http://www.flickr.com/photos/betsyweber/4962297050<br />partitioning http://www.flickr.com/photos/featheredtar/3137028766<br />slices http://www.flickr.com/photos/free-stock/4899674517<br />summary http://www.flickr.com/photos/jkdsphotography/4061838798<br />links http://www.flickr.com/photos/creative_stock/3397559016<br />
