Data Modeling with Cassandra Column Families
Slide notes I used for my presentation at ICOODB 2010.

  • It’s not about one data model vs. another. It’s not about one storage engine vs. another. Cassandra excels at replicating data and achieving high sustained write throughput.
  • The right tool for the right job
  • Shaped by distribution model
  • Sparse – do not have to exist in every row.
  • Flexible column naming. You define the sort order. A row is not required to have a specific column just because another row does.
  • Look familiar?
  • Arise because of distribution model, not CF model.
  • Atomic at the CF row level, but not isolated. Large transactional apps push work down to the node (shared nothing). Guaranteeing ACID constraints across nodes is a hard problem.
  • On the other hand, you do get a lot of things: data redundancy, very fast writes, and fast reads.
  • Relational: formally defined, which reads as correct. Query first: not formally defined, which reads as somehow incorrect. You get some things in exchange: scalability, availability, replication.
  • Focus on query & analysis. B+trees. Update once. Cassandra typically becomes IO bound before becoming CPU bound.
  • Not set in stone. Your application may require a different approach.
  • Recognize non-starters: Is my dataset going to become Very Large? Will I need to sustain high write throughput? Also, what are the common operations? Optimize CFs for those operations.
  • Columns are sorted. Choose keys and columns: you need to think about how you plan to slice your data. Related data is kept close together to reduce IO.
  • Denormalize. Use the disk. Don’t be afraid to create another CF that duplicates some data.
  • Composite column names. Painful updates of denormalized parts. Fast reads & insertions.
  • Key
  • Normal attributes
  • Composite column names. Pulling in relationships. Painful updates. Denormalization is best when data doesn’t change.
  • Commit log on a separate disk. Memtable. SSTable.
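
The notes above describe rows as sparse, columns as sorted by name, and composite column names as a way to pull related data into one row for slicing. A minimal sketch of that idea, assuming a toy in-memory model: the ColumnFamily class, its insert and slice methods, and the sample users and timeline data are all hypothetical illustrations and are not Cassandra's API.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ColumnFamily:
    """Toy model of a column family: row key -> sparse set of named columns."""

    def __init__(self):
        # Each row holds only the columns that were actually written to it.
        self._rows = defaultdict(dict)

    def insert(self, row_key, column_name, value):
        self._rows[row_key][column_name] = value

    def slice(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name <= finish, sorted by name."""
        columns = self._rows.get(row_key, {})
        names = sorted(columns)
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, columns[name]) for name in names[lo:hi]]


# Sparse rows: "bob" has a "city" column, "alice" does not.
users = ColumnFamily()
users.insert("alice", "email", "alice@example.com")
users.insert("bob", "email", "bob@example.com")
users.insert("bob", "city", "Austin")

# Denormalized, query-first CF: one row per user, composite column names
# (date, post_id), and a duplicated copy of each post title as the value.
timeline = ColumnFamily()
timeline.insert("alice", ("2010-10-02", "post-30"), "Denormalize")
timeline.insert("alice", ("2010-09-28", "post-17"), "Needle, meet haystack")
timeline.insert("alice", ("2010-09-29", "post-21"), "Queries first")

# One slice answers "alice's posts in September 2010" without a join.
september = timeline.slice("alice", ("2010-09-01",), ("2010-09-30", "\uffff"))
```

Because the column names sort by date first, the query the application cares about becomes a single contiguous range read. The cost is the one the notes mention: if a post title changes, every duplicated copy has to be rewritten.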
  • Transcript

    • 1. Needle Meet Haystack
      Adapting your data models for Cassandra
      Gary Dusbabek • Rackspace • ICOODB 2010
    • 2. Outline
      First Things First
      Column Families
      Trade Offs
      Procedures & Best Practices
      Internals
    • 3. It’s all about scalability
    • 4. We can all be friends
    • 5. Column Families
    • 14. 2. Trade Offs
    • 15. No Transactions
    • 16. No Ad Hoc Queries
    • 17. No Joins
    • 18. No Flexible Indexes
    • 19. Don’t Panic!
    • 20. Scalability
      Availability
      Replication & Backup
    • 21. 3. Procedures & Practices
    • 22. Relational Way
      Define entities
      Normalize
      Identify Many-to-many
      Query any way you want
    • 23. How Come?
      Scarcity
      Efficiency
    • 24. Cassandra Way
      Know your app
      Queries first
      Denormalize
    • 25. Know Your App
    • 26. Queries First
    • 27. Nobody is Normal
    • 28. Relational Example
    • 29. Column Family Example
    • 30. Column Family Example
    • 31. Column Family Example
    • 32. Column Family Example
    • 33. Does it feel strange?
    • 34. 4. Internals
    • 35. Sequential Writes
      Always
    • 36. Consistency Level
    • 37. Partitioning
    • 38. Slices
      Data Locality
    • 39. Summary
      • The goal is to scale
      • ColumnFamilies != Relational tables
      • Trade-offs: you win some, you lose some
      • Know your application
      • Queries first
      • Denormalization is OK
      • Cassandra was built for this
    • Links
      http://cassandra.apache.org
      http://wiki.apache.org/cassandra
      irc: #cassandra on freenode
      gdusbabek@gmail.com
      @gdusbabek
    • 46. Image Credits
      haystack http://www.flickr.com/photos/james_lumb/3921968993
      pyramids http://www.flickr.com/photos/gracewong/93631410
      scales http://www.flickr.com/photos/eflon/3465042138
      friends http://www.flickr.com/photos/ngmmemuda/4166182931
      television http://www.flickr.com/photos/angelrravelor/314306023
      columns http://www.flickr.com/photos/nostri-imago/3564300653
      devil http://www.flickr.com/photos/52890443@N02/4887855756
      angel http://www.flickr.com/photos/75001512@N00/4938623021
      transaction http://www.flickr.com/photos/neubie/2273635564
      queries http://www.flickr.com/photos/-bast-/349497988
      rings http://www.flickr.com/photos/baldur/4395738741
      indexes http://www.flickr.com/photos/waferboard/4137041591
      panic http://www.flickr.com/photos/pasukaru76/3998981988
      procedures "The Anatomy Lesson of Dr. Nicolaes Tulp" by Rembrandt
      relational http://www.flickr.com/photos/35536700@N07/3292544674
      desert http://www.flickr.com/photos/waldenpond/4252575735
      jet http://www.flickr.com/photos/rmahle/709685
      queries http://www.flickr.com/photos/andreanna/2812118063
      blackboard http://www.flickr.com/photos/shonk/418180402
      normal http://www.flickr.com/photos/infrogmation/3180606117
      phonograph http://www.flickr.com/photos/shiyazuni/4770244591
      dodo http://www.flickr.com/photos/wheatfields/2071347416
      Internals http://www.flickr.com/photos/37hz/4057856826
      writing http://www.flickr.com/photos/stevendepolo/3877225152
      consistency http://www.flickr.com/photos/betsyweber/4962297050
      partitioning http://www.flickr.com/photos/featheredtar/3137028766
      slices http://www.flickr.com/photos/free-stock/4899674517
      summary http://www.flickr.com/photos/jkdsphotography/4061838798
      links http://www.flickr.com/photos/creative_stock/3397559016