Data Modeling with Cassandra Column Families

Slide notes I used for my presentation at ICOODB 2010.

  • It’s not about one data model vs. another. It’s not about one storage engine vs. another. Cassandra excels at replicating data and achieving high sustained write throughput.
  • The right tool for the right job
  • Shaped by distribution model
  • Sparse: columns do not have to exist in every row.
  • Flexible column naming. You define the sort order. A row is not required to have a specific column just because another row does.
  • Look familiar?
  • Arise because of distribution model, not CF model.
  • Atomic at the CF row level, but not isolated. Large transactional apps push down to the node (shared nothing). Guaranteeing ACID constraints across nodes is a hard problem.
  • OTOH, you do get a lot of things: data redundancy, very fast writes, and fast reads.
  • Relational > formally defined > correct. Query first > not formally defined > somehow incorrect. You get some things in exchange: scalability, availability, replication.
  • Focus on query & analysis. B+trees. Update once. Cassandra typically becomes I/O bound before becoming CPU bound.
  • Not set in stone. Your application may require a different approach.
  • Recognize non-starters: Is my dataset going to become very large? Will I need to sustain high write throughput? Also, what are the common operations? Optimize CFs for those operations.
  • Columns are sorted. Choose keys and columns: you need to think about how you plan to slice your data. Related data is kept close together to reduce I/O.
  • Denormalize. Use the disk. Don’t be afraid to create another CF that duplicates some data.
  • Composite column names. Painful updates of denormalized parts. Fast reads & insertions.
  • Key
  • Normal attributes
  • Composite column names. Pulling in relationships. Painful updates. Denormalization is best when data doesn’t change.
  • Commit log (on a separate disk), memtable, SSTable.
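
The notes above on sparse rows, sorted columns, composite column names, and slicing can be illustrated with a small in-memory sketch. This is not Cassandra's API: the `ColumnFamily` class, its `insert` and `get_slice` methods, and the timeline data are all hypothetical names made up for illustration, and composite column names are assumed to be plain Python tuples of `(date, post_id)`.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy model of a column family: row key -> columns sorted by name.

    Hypothetical illustration only; this is not Cassandra's client API.
    """

    def __init__(self):
        # row key -> list of (column name, value), kept sorted by column name
        self._rows = {}

    def insert(self, row_key, column_name, value):
        columns = self._rows.setdefault(row_key, [])
        names = [name for name, _ in columns]
        i = bisect_left(names, column_name)
        if i < len(columns) and names[i] == column_name:
            columns[i] = (column_name, value)  # overwrite an existing column
        else:
            columns.insert(i, (column_name, value))  # keep the row sorted

    def get_slice(self, row_key, start, finish):
        """Return columns with start <= name <= finish, in sorted order."""
        columns = self._rows.get(row_key, [])
        names = [name for name, _ in columns]
        return columns[bisect_left(names, start):bisect_right(names, finish)]


# Rows are sparse: each row holds only the columns that were written to it.
# Column names are composite (date, post_id) tuples, and each value is a
# denormalized copy of the post title, so a read needs no join.
timeline = ColumnFamily()
timeline.insert("alice", ("2010-10-02", "p9"), "Trade-offs")
timeline.insert("alice", ("2010-09-01", "p1"), "Hello Cassandra")
timeline.insert("alice", ("2010-09-28", "p7"), "Queries first")
timeline.insert("bob", ("2010-09-15", "p3"), "Know your app")

# Because columns are sorted, "alice's posts in September" is one
# contiguous slice of a single row.
september = timeline.get_slice(
    "alice", ("2010-09-01", ""), ("2010-09-30", "\uffff")
)
print([title for _, title in september])
```

The design choice mirrors the "queries first" advice: the column name is chosen so that the query the application needs most becomes a contiguous range read. The cost is that a changed post title would have to be rewritten in every row that duplicates it.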

Data Modeling with Cassandra Column Families: Presentation Transcript

  • Needle Meet Haystack: Adapting your data models for Cassandra
    Gary Dusbabek • Rackspace • ICOODB 2010
  • Outline
    First Things First
    Column Families
    Trade Offs
    Procedures & Best Practices
  • It’s all about scalability
  • We can all be friends
  • Column Families
  • 2. Trade Offs
  • No Transactions
  • No Ad Hoc Queries
  • No Joins
  • No Flexible Indexes
  • Don’t Panic
  • Scalability
    Replication & Backup
  • 3. Procedures & Practices
  • Relational Way
    Define entities
    Identify Many-to-many
    Query any way you want
  • How Come?
  • Cassandra Way
    Know your app
    Queries first
  • Know Your App
  • Queries First
  • Nobody is Normal
  • Relational Example
  • Column Family Example
  • Column Family Example
  • Column Family Example
  • Column Family Example
  • Does it feel strange?
  • 4. Internals
  • Sequential Writes
  • Consistency Level
  • Partitioning
  • Slices
    Data Locality
  • Summary
    • The goal is to scale
    • ColumnFamilies != Relational tables
    • Trade-offs: you win some, you lose some
    • Know your application
    • Queries first
    • Denormalization is OK
    • Cassandra was built for this
  • Links
    irc: #cassandra on freenode
  • Image Credits
    haystack http://www.flickr.com/photos/james_lumb/3921968993
    pyramids http://www.flickr.com/photos/gracewong/93631410
    scales http://www.flickr.com/photos/eflon/3465042138
    friends http://www.flickr.com/photos/ngmmemuda/4166182931
    television http://www.flickr.com/photos/angelrravelor/314306023
    columns http://www.flickr.com/photos/nostri-imago/3564300653
    devil http://www.flickr.com/photos/52890443@N02/4887855756
    angel http://www.flickr.com/photos/75001512@N00/4938623021
    transaction http://www.flickr.com/photos/neubie/2273635564
    queries http://www.flickr.com/photos/-bast-/349497988
    rings http://www.flickr.com/photos/baldur/4395738741
    indexes http://www.flickr.com/photos/waferboard/4137041591
    panic http://www.flickr.com/photos/pasukaru76/3998981988
    procedures "The Anatomy Lesson of Dr. Nicolaes Tulp" by Rembrandt
    relational http://www.flickr.com/photos/35536700@N07/3292544674
    desert http://www.flickr.com/photos/waldenpond/4252575735
    jet http://www.flickr.com/photos/rmahle/709685
    queries http://www.flickr.com/photos/andreanna/2812118063
    blackboard http://www.flickr.com/photos/shonk/418180402
    normal http://www.flickr.com/photos/infrogmation/3180606117
    phonograph http://www.flickr.com/photos/shiyazuni/4770244591
    dodo http://www.flickr.com/photos/wheatfields/2071347416
    Internals http://www.flickr.com/photos/37hz/4057856826
    writing http://www.flickr.com/photos/stevendepolo/3877225152
    consistency http://www.flickr.com/photos/betsyweber/4962297050
    partitioning http://www.flickr.com/photos/featheredtar/3137028766
    slices http://www.flickr.com/photos/free-stock/4899674517
    summary http://www.flickr.com/photos/jkdsphotography/4061838798
    links http://www.flickr.com/photos/creative_stock/3397559016