Data Modeling with Cassandra Column Families

Slide notes I used for my presentation at ICOODB 2010.

  • It’s not about one data model vs. another. It’s not about one storage engine vs. another. Cassandra excels at replicating data and achieving high sustained write throughput.
  • The right tool for the right job
  • Shaped by distribution model
  • Sparse: columns do not have to exist in every row.
  • Flexible column naming. You define the sort order. A row is not required to have a specific column just because another row does.
  • Look familiar?
  • These trade-offs arise from the distribution model, not from the column family (CF) model.
  • Atomic at the level of a CF row, but not isolated. Large transactional applications push transactions down to the node (shared nothing). Guaranteeing ACID constraints across nodes is a hard problem.
  • On the other hand, you do get a lot of things: data redundancy, very fast writes, and fast reads.
  • Relational: formally defined, therefore correct. Query first: not formally defined, therefore somehow incorrect. You get some things in exchange: scalability, availability, and replication.
  • Focus on query and analysis. B+trees. Update once. Cassandra typically becomes I/O bound before becoming CPU bound.
  • Not set in stone.Your application may require a different approach.
  • Recognize non-starters: Is my dataset going to become Very Large? Will I need to sustain high write throughput?Also, what are the common operations? Optimize CFs for those operations.
  • Columns are sorted. Choose keys and columns carefully: you need to think about how you plan to slice your data. Related data is kept close together to reduce I/O.
  • Denormalize. Use the disk. Don’t be afraid to create another CF that duplicates some data.
  • Composite column names. Painful updates of denormalized parts. Fast reads and insertions.
  • Key
  • Normal attributes
  • Composite column names. Pulling in relationships. Painful updates. Denormalization is best when data doesn’t change.
  • Commit log (on a separate disk), memtable, SSTable.
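The notes above describe a column family as a set of sparse rows whose columns are sorted by name, read back by slicing, and shaped (denormalized) around the queries you need. The following is a minimal in-memory sketch in Python, assuming a simplified model rather than Cassandra's real storage engine or client API: the `ColumnFamily` class, its `insert` and `get_slice` methods, and the users/posts data are all hypothetical. Python tuples stand in for composite column names because they sort element by element.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ColumnFamily:
    """Toy in-memory column family (hypothetical; not a Cassandra API).

    Each row key maps to its own columns, kept sorted by column name.
    Rows are sparse: a row stores only the columns written to it.
    """

    def __init__(self):
        self._values = defaultdict(dict)  # row key -> {column name: value}
        self._names = defaultdict(list)   # row key -> column names, sorted

    def insert(self, key, name, value):
        row = self._values[key]
        if name not in row:
            insort(self._names[key], name)  # keep the row's columns sorted
        row[name] = value

    def get_slice(self, key, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in sorted order."""
        names = self._names.get(key, [])
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, self._values[key][name]) for name in names[lo:hi]]


# Hypothetical "Users" CF: row key plus normal attributes. Rows need not share columns.
users = ColumnFamily()
users.insert("alice", "name", "Alice")
users.insert("alice", "email", "alice@example.com")
users.insert("bob", "name", "Bob")  # sparse: no "email" column for bob

# Hypothetical "UserPosts" CF, built for one query: a user's posts in date order.
# Composite column name (date, post_id); the title is duplicated here from
# wherever posts are primarily stored (denormalization).
user_posts = ColumnFamily()
user_posts.insert("alice", ("2010-10-02", "p9"), "Third post")
user_posts.insert("alice", ("2010-09-01", "p1"), "First post")
user_posts.insert("alice", ("2010-09-15", "p7"), "Second post")

# Slice one row: alice's posts dated in September 2010.
# ("2010-09-01",) sorts before every ("2010-09-01", ...) column, and the
# "\uffff" sentinel sorts after the post ids used here.
september = user_posts.get_slice("alice", ("2010-09-01",), ("2010-09-30", "\uffff"))
print(september)
# [(('2010-09-01', 'p1'), 'First post'), (('2010-09-15', 'p7'), 'Second post')]
```

Because `september` is read from a single row, the query touches one contiguous, sorted run of columns. The cost of this layout is the one the notes warn about: changing a post title means updating every CF that copied it.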

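The last note lists the pieces of the write path: a commit log (on a separate disk), a memtable, and SSTables. A minimal sketch in Python, assuming a heavily simplified model, shows why writes stay sequential: the log is only ever appended to, and each SSTable is written once, sorted, and never updated in place. The `ToyStore` class and its `flush_threshold` parameter are hypothetical; real Cassandra also deals with timestamps, compaction, and much more that is omitted here.

```python
from bisect import bisect_left


class ToyStore:
    """Simplified write path (hypothetical; not Cassandra's implementation).

    Each write is appended to a commit log and applied to an in-memory
    memtable. When the memtable holds `flush_threshold` keys, it is written
    out as an immutable SSTable sorted by key. Nothing is updated in place.
    """

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.commit_log = []  # append-only, standing in for a sequential log file
        self.memtable = {}    # latest in-memory value per key
        self.sstables = []    # sorted (key, value) lists, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort once and write the whole table; it is never modified afterwards.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # In this sketch the log is simply cleared once its writes are in an SSTable.
        self.commit_log = []

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer SSTables shadow older ones, so search from the latest flush back.
        for table in reversed(self.sstables):
            i = bisect_left(table, (key,))  # (key,) sorts just before (key, value)
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


store = ToyStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)  # memtable reaches 2 keys and is flushed to an SSTable
store.write("a", 3)  # newer value for "a" now lives in the memtable
print(store.read("a"), store.read("b"), len(store.sstables))  # 3 2 1
```

Reads check the memtable first and then the SSTables from newest to oldest, so the later write of `"a"` shadows the flushed one without the older SSTable ever being rewritten.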
Transcript

  • 1. Needle, Meet Haystack: Adapting your data models for Cassandra
    Gary Dusbabek • Rackspace • ICOODB 2010
  • 2. Outline
    First Things First
    Column Families
    Trade-Offs
    Procedures & Best Practices
    Internals
  • 3. It’s all about scalability
  • 4. We can all be friends
  • 5. Column Families
  • 14. 2. Trade-Offs
  • 15. No Transactions
  • 16. No Ad Hoc Queries
  • 17. No Joins
  • 18. No Flexible Indexes
  • 19. Don’t Panic!
  • 20. Scalability
    Availability
    Replication & Backup
  • 21. 3. Procedures & Practices
  • 22. Relational Way
    Define entities
    Normalize
    Identify Many-to-many
    Query any way you want
  • 23. How Come?
    Scarcity
    Efficiency
  • 24. Cassandra Way
    Know your app
    Queries first
    Denormalize
  • 25. Know Your App
  • 26. Queries First
  • 27. Nobody is Normal
  • 28. Relational Example
  • 29. Column Family Example
  • 30. Column Family Example
  • 31. Column Family Example
  • 32. Column Family Example
  • 33. Does it feel strange?
  • 34. 4. Internals
  • 35. Sequential Writes, Always
  • 36. Consistency Level
  • 37. Partitioning
  • 38. Slices
    Data Locality
  • 39. Summary
    • The goal is to scale
    • 40. ColumnFamilies != Relational tables
    • 41. Trade-offs: you win some, you lose some
    • 42. Know your application
    • 43. Queries first
    • 44. Denormalization is OK
    • 45. Cassandra was built for this
  • Links
    http://cassandra.apache.org
    http://wiki.apache.org/cassandra
    irc: #cassandra on freenode
    gdusbabek@gmail.com
    @gdusbabek
  • 46. Image Credits
    haystack http://www.flickr.com/photos/james_lumb/3921968993
    pyramids http://www.flickr.com/photos/gracewong/93631410
    scales http://www.flickr.com/photos/eflon/3465042138
    friends http://www.flickr.com/photos/ngmmemuda/4166182931
    television http://www.flickr.com/photos/angelrravelor/314306023
    columns http://www.flickr.com/photos/nostri-imago/3564300653
    devil http://www.flickr.com/photos/52890443@N02/4887855756
    angel http://www.flickr.com/photos/75001512@N00/4938623021
    transaction http://www.flickr.com/photos/neubie/2273635564
    queries http://www.flickr.com/photos/-bast-/349497988
    rings http://www.flickr.com/photos/baldur/4395738741
    indexes http://www.flickr.com/photos/waferboard/4137041591
    panic http://www.flickr.com/photos/pasukaru76/3998981988
    procedures "The Anatomy Lesson of Dr. Nicolaes Tulp" by Rembrandt
    relational http://www.flickr.com/photos/35536700@N07/3292544674
    desert http://www.flickr.com/photos/waldenpond/4252575735
    jet http://www.flickr.com/photos/rmahle/709685
    queries http://www.flickr.com/photos/andreanna/2812118063
    blackboard http://www.flickr.com/photos/shonk/418180402
    normal http://www.flickr.com/photos/infrogmation/3180606117
    phonograph http://www.flickr.com/photos/shiyazuni/4770244591
    dodo http://www.flickr.com/photos/wheatfields/2071347416
    internals http://www.flickr.com/photos/37hz/4057856826
    writing http://www.flickr.com/photos/stevendepolo/3877225152
    consistency http://www.flickr.com/photos/betsyweber/4962297050
    partitioning http://www.flickr.com/photos/featheredtar/3137028766
    slices http://www.flickr.com/photos/free-stock/4899674517
    summary http://www.flickr.com/photos/jkdsphotography/4061838798
    links http://www.flickr.com/photos/creative_stock/3397559016