Data Modeling with Cassandra Column Families

Slide notes I used for my presentation at ICOODB 2010.


Published in: Technology
  • It’s not about one data model vs. another. It’s not about one storage engine vs. another. Cassandra excels at replicating data and achieving high sustained write throughput.
  • The right tool for the right job
  • Shaped by distribution model
  • Sparse – columns do not have to exist in every row.
  • Flexible column naming. You define the sort order. A row is not required to have a specific column just because another row does.
  • Look familiar?
  • Arise because of distribution model, not CF model.
  • Atomic at the CF row level, but not isolated. Large transactional apps push down to the node (shared nothing). Guaranteeing ACID constraints across nodes is a hard problem.
  • OTOH, you do get a lot of things: data redundancy, very fast writes, fast reads.
  • Relational > formally defined > correct. Query first > not formally defined > somehow incorrect. You get some things in exchange: scalability, availability, replication.
  • Focus on query & analysis. B+trees. Update once. Cassandra typically becomes IO bound before becoming CPU bound.
  • Not set in stone. Your application may require a different approach.
  • Recognize non-starters: Is my dataset going to become Very Large? Will I need to sustain high write throughput? Also, what are the common operations? Optimize CFs for those operations.
  • Columns are sorted. Choose keys and columns: you need to think about how you plan to slice your data. Related data is kept close together to reduce IO.
  • Denormalize. Use the disk. Don’t be afraid to create another CF that duplicates some data.
  • Composite column names. Painful updates of denormalized parts. Fast reads & insertions.
  • Key
  • Normal attributes
  • Composite column names. Pulling in relationships. Painful updates. Denormalization is best when data doesn’t change.
  • Commit log – separate disk. Memtable. SSTable.
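
The notes above describe column families as sparse rows whose columns are kept sorted, read in slices, and often named with composite column names. A minimal in-memory sketch of that idea follows. It is only an illustration, not Cassandra's client API: the `ColumnFamily` class, its `insert` and `slice` methods, and the sample timeline data are all hypothetical names chosen for this example, and composite column names are modeled as Python tuples.

```python
from bisect import bisect_left, bisect_right, insort


class ColumnFamily:
    """Toy model (hypothetical, not a Cassandra API): row key -> sorted columns."""

    def __init__(self):
        # row key -> (sorted list of column names, dict of column name -> value)
        self.rows = {}

    def insert(self, row_key, column_name, value):
        names, values = self.rows.setdefault(row_key, ([], {}))
        if column_name not in values:
            insort(names, column_name)  # keep column names in sorted order
        values[column_name] = value

    def slice(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in sorted order."""
        names, values = self.rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


# Denormalized timeline: one row per user, composite column names (date, post id).
timeline = ColumnFamily()
timeline.insert("alice", ("2010-09-28", "post-2"), "Slides are up")
timeline.insert("alice", ("2010-09-27", "post-1"), "Heading to ICOODB")
timeline.insert("bob", ("2010-09-28", "post-3"), "Hello")  # rows need not share columns

# Slice all of alice's columns for one day; related data sits next to each other.
one_day = timeline.slice("alice", ("2010-09-27",), ("2010-09-27", "\uffff"))
print(one_day)
```

Because the column names are sorted, the query for one day is a contiguous range read, which is the reason the notes advise choosing keys and columns around how the data will be sliced.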
  • Transcript

    • 1. Needle Meet Haystack: Adapting your data models for Cassandra. Gary Dusbabek • Rackspace • ICOODB 2010
    • 2. Outline: First Things First; Column Families; Trade Offs; Procedures & Best Practices; Internals
    • 3. It’s all about scalability
    • 4. We can all be friends
    • 5. Column Families
    • 6.
    • 7.
    • 8.
    • 9.
    • 10.
    • 11.
    • 12.
    • 13.
    • 14. 2. Trade Offs
    • 15. No Transactions
    • 16. No Ad Hoc Queries
    • 17. No Joins
    • 18. No Flexible Indexes
    • 19. Don’t Panic!
    • 20. Scalability; Availability; Replication & Backup
    • 21. 3. Procedures & Practices
    • 22. Relational Way: Define entities; Normalize; Identify many-to-many; Query any way you want
    • 23. How Come? Scarcity; Efficiency
    • 24. Cassandra Way: Know your app; Queries first; Denormalize
    • 25. Know Your App
    • 26. Queries First
    • 27. Nobody is Normal
    • 28. Relational Example
    • 29. Column Family Example
    • 30. Column Family Example
    • 31. Column Family Example
    • 32. Column Family Example
    • 33. Does it feel strange?
    • 34. 4. Internals
    • 35. Sequential Writes, Always
    • 36. Consistency Level
    • 37. Partitioning
    • 38. Slices; Data Locality
    • 39. Summary: The goal is to scale
    • 40. ColumnFamilies != Relational tables
    • 41. Trade-offs: you win some, you lose some
    • 42. Know your application
    • 43. Queries first
    • 44. Denormalization is OK
    • 45. Cassandra was built for this
    • Links: http://cassandra.apache.org; http://wiki.apache.org/cassandra; irc: #cassandra on freenode; gdusbabek@gmail.com; @gdusbabek
    • 46. Image Credits
        • haystack: http://www.flickr.com/photos/james_lumb/3921968993
        • pyramids: http://www.flickr.com/photos/gracewong/93631410
        • scales: http://www.flickr.com/photos/eflon/3465042138
        • friends: http://www.flickr.com/photos/ngmmemuda/4166182931
        • television: http://www.flickr.com/photos/angelrravelor/314306023
        • columns: http://www.flickr.com/photos/nostri-imago/3564300653
        • devil: http://www.flickr.com/photos/52890443@N02/4887855756
        • angel: http://www.flickr.com/photos/75001512@N00/4938623021
        • transaction: http://www.flickr.com/photos/neubie/2273635564
        • queries: http://www.flickr.com/photos/-bast-/349497988
        • rings: http://www.flickr.com/photos/baldur/4395738741
        • indexes: http://www.flickr.com/photos/waferboard/4137041591
        • panic: http://www.flickr.com/photos/pasukaru76/3998981988
        • procedures: "The Anatomy Lesson of Dr. Nicolaes Tulp" by Rembrandt
        • relational: http://www.flickr.com/photos/35536700@N07/3292544674
        • desert: http://www.flickr.com/photos/waldenpond/4252575735
        • jet: http://www.flickr.com/photos/rmahle/709685
        • queries: http://www.flickr.com/photos/andreanna/2812118063
        • blackboard: http://www.flickr.com/photos/shonk/418180402
        • normal: http://www.flickr.com/photos/infrogmation/3180606117
        • phonograph: http://www.flickr.com/photos/shiyazuni/4770244591
        • dodo: http://www.flickr.com/photos/wheatfields/2071347416
        • Internals: http://www.flickr.com/photos/37hz/4057856826
        • writing: http://www.flickr.com/photos/stevendepolo/3877225152
        • consistency: http://www.flickr.com/photos/betsyweber/4962297050
        • partitioning: http://www.flickr.com/photos/featheredtar/3137028766
        • slices: http://www.flickr.com/photos/free-stock/4899674517
        • summary: http://www.flickr.com/photos/jkdsphotography/4061838798
        • links: http://www.flickr.com/photos/creative_stock/3397559016
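
The "Sequential Writes, Always" slide and the slide note about the commit log, memtable, and SSTable describe the write path only in keywords. The sketch below is one rough way to picture it, under heavy simplifying assumptions: the `Node` class, its methods, and the `flush_threshold` parameter are hypothetical names for this illustration, the commit log is a plain Python list standing in for an append-only file, and details such as timestamps, replication, and merging of flushed files are left out entirely.

```python
class Node:
    """Toy write path (hypothetical names): commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only; stands in for a sequential file on its own disk
        self.memtable = {}    # recent writes, held in memory
        self.sstables = []    # immutable, sorted snapshots of flushed memtables (oldest first)
        self.flush_threshold = flush_threshold  # assumed knob, for illustration only

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. sequential append, no seek needed
        self.memtable[key] = value            # 2. update the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the memtable out once, sorted by key, and never modify it again
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need to be replayed

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest flushed data wins
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = Node(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)  # reaches the threshold, so the memtable is flushed
node.write("a", 3)  # newer value lives in the memtable
print(node.read("a"), node.read("b"), node.read("z"))
```

The point of the design is that every write is an append or an in-memory update, and flushed data is written once and left alone, which matches the "update once" remark in the notes.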
