Real world capacity

Real world capacity Presentation Transcript

  • 1. Real world capacity planning: Cassandra on blades and big iron July 2011
  • 2. About me
    • Hadoop System Admin @ media6degrees
      • Watch cassandra servers as well
      • Write code (hadoop filecrusher)
    • Hive Committer
      • Variable substitution, UDFs like atan, rough draft of c* handler
    • Epic Cassandra Contributor (not!)
      • CLI should allow users to choose consistency level
      • NodeCmd should be able to view Compaction Statistics
    • Self proclaimed president of Cassandra fan club
      • Cassandra NYC User Group
      • High Performance Cassandra Cookbook
  • 3. Media6 Degrees
    • Social Targeting in online advertising
    • Real Time Bidding - A dynamic auction process where each impression is bid for in (near) real time
    • Cassandra @ work storing:
      • Visit Data
      • Ad History
      • Id Mapping
    • Multiple Data Centers (home brew replication)
    • Back end tools hadoop (Data mining, bulk loads)
    • Front end tomcat, mysql + cassandra (lookup data)
  • 4. What is this talk about?
    • Real World Capacity Planning
    • Been running c* in production > 1 year
    • Started with a handful of nodes also running tomcat and Replication Factor 2!
    • Grew data from 0 to 10 TB
    • Grew from 0 to 751,398,530 reads / day
    • All types of fun along the way
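A daily read count is easier to reason about as a per-second rate. The sketch below converts the figure from the slide; the peak factor is a hypothetical multiplier for illustration only, not a number from the talk.

```python
READS_PER_DAY = 751_398_530      # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(per_day: float) -> float:
    """Average requests per second, assuming traffic is spread over 24 hours."""
    return per_day / SECONDS_PER_DAY


def peak_rate(per_day: float, peak_factor: float) -> float:
    """Estimated peak rate; peak_factor is an assumed peak-to-average ratio."""
    return average_rate(per_day) * peak_factor


print(f"average: {average_rate(READS_PER_DAY):,.0f} reads/s")
```

The average works out to roughly 8,700 reads per second across the cluster; real traffic peaks during the day, so the busiest hours will sit well above that.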
  • 5. Using puppet, chef... from day 1
    • “I am going to choose Cassandra 0.6.0-beta-1 over 0.5.x so I am future proof” -- Famous quote by me
    • Cassandra is active
      • new versions are coming
      • Rolling restarts between minors
      • But much better to get all nodes to the same rev quickly
    • New nodes are coming; do not let them:
      • start with the wrong settings
      • fail because you forgot open file limits, etc
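A configuration management tool can run a small preflight check before a node starts. The sketch below uses Python's standard `resource` module (Unix only) to test the open file limit; the threshold `MIN_OPEN_FILES` is a hypothetical value chosen for illustration, not a recommendation from the talk.

```python
import resource

# Hypothetical minimum; choose a value that suits your own deployment.
MIN_OPEN_FILES = 32768


def open_file_limit_ok(minimum: int = MIN_OPEN_FILES) -> bool:
    """Return True if the soft open-file limit of this process is high enough."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always passes.
    return soft == resource.RLIM_INFINITY or soft >= minimum


if not open_file_limit_ok():
    print("open file limit is below", MIN_OPEN_FILES)
```

The check reflects the limits of the process that runs it, so it should run as the same user, and from the same init path, that starts Cassandra.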
  • 6. Calculating Data size on disk
    • SSTable format currently not compressed
    • Repairs, joins, and moves need “wiggle room”
    • Smaller keys and column names save space
    • Enough free space to compact your largest column family
    • Snapshots keep SSTables around after compaction
    • Most *nix file systems need free space to avoid performance loss from fragmentation!
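The points above can be folded into a rough per-node disk estimate. This is a minimal sketch, not a formula from the talk: it assumes the worst case in which compacting the largest column family temporarily needs as much extra space as that column family occupies, and the 10% file system reserve is an illustrative default.

```python
def required_disk_gb(live_data_gb: float,
                     largest_cf_gb: float,
                     snapshot_gb: float = 0.0,
                     fs_free_fraction: float = 0.10) -> float:
    """Rough disk size needed on one node.

    live_data_gb     -- SSTable data currently held by the node
    largest_cf_gb    -- size of the largest column family (compaction headroom)
    snapshot_gb      -- space pinned by snapshots
    fs_free_fraction -- share of the disk kept free to limit fragmentation (assumed)
    """
    if not 0 <= fs_free_fraction < 1:
        raise ValueError("fs_free_fraction must be in [0, 1)")
    needed = live_data_gb + largest_cf_gb + snapshot_gb
    return needed / (1.0 - fs_free_fraction)


# Example: 400 GB live, largest CF 250 GB, 50 GB of snapshots.
print(round(required_disk_gb(400, 250, 50), 1))
```

Repairs, joins, and moves also need temporary space, so treat the result as a floor rather than a target.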
  • 7. Speed of disk
    • The faster the better!
    • But faster + bigger gets expensive and challenging
    • RAID0
      • Faster for streaming
      • not necessarily seeking
      • Fragile: the larger the stripe, the higher the chance of failure
    • RAID5
      • Not as fast but survives disk failure
    • Battery backed cache helps but is $$$
    • The dedicated commit log decision
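The RAID0 fragility point can be made concrete with a simple probability sketch. It assumes disks fail independently and with equal probability, which is a simplification, and the 3% figure in the example is illustrative rather than measured.

```python
def raid0_failure_probability(disk_failure_probability: float, disks: int) -> float:
    """Probability that a RAID0 stripe loses data in a given period.

    A stripe fails when any single member disk fails, so with independent
    disks the chance is 1 - (1 - p) ** n.
    """
    if disks < 1:
        raise ValueError("a stripe needs at least one disk")
    return 1.0 - (1.0 - disk_failure_probability) ** disks


for n in (1, 2, 4, 8):
    print(n, "disks:", round(raid0_failure_probability(0.03, n), 4))
```

With Cassandra's replication a lost stripe means rebuilding one node rather than losing data, but the rebuild is still a heavy operation for the cluster.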
  • 8. Disk formatting
    • ext4 everywhere
    • Deletes are much better than ext3
    • Noticeably better performance as disks get full
    • A full async mode for risk takers
    • Obligatory noatime fstab setting
    • Using multiple file systems can result in multiple caches (check slabtop)
    • Mention XFS
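Whether the noatime setting actually took effect can be verified from the mount table. The sketch below parses text in the Linux `/proc/mounts` format; the sample devices and mount points are made up for illustration.

```python
def mounts_without_noatime(mounts_text: str,
                           fstypes=("ext3", "ext4", "xfs")) -> list:
    """Return mount points of the given file system types lacking noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Illustrative sample; on a real Linux host read open("/proc/mounts").read().
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/cassandra ext4 rw,noatime 0 0
proc /proc proc rw 0 0
"""
print(mounts_without_noatime(SAMPLE))
```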
  • 9. Memory
    • Garbage collection is on a separate thread(s)
    • Each request creates temporary objects
    • Cassandra's fast writes go to Memtables
      • You will never guess what they use :)
    • Bloom filter data is in memory
    • Key cache and Row cache
    • For low latency RAM must be some % of data
      • RAM not used by process is OS cache
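The "RAM as some % of data" idea can be written as a ratio. This is a minimal sketch that treats all RAM outside the JVM heap as available to the OS page cache, which ignores other processes and JVM overhead; the numbers in the example are illustrative.

```python
def os_cache_gb(ram_gb: float, heap_gb: float) -> float:
    """RAM left for the OS page cache once the JVM heap is set aside."""
    return max(ram_gb - heap_gb, 0.0)


def cache_to_data_ratio(ram_gb: float, heap_gb: float, data_gb: float) -> float:
    """Fraction of on-disk data that could be held in the OS cache."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return os_cache_gb(ram_gb, heap_gb) / data_gb


# Example: 32 GB of RAM, 8 GB heap, 200 GB of data on the node.
print(cache_to_data_ratio(32, 8, 200))
```

Tracking this ratio as data grows gives an early signal that read latency is about to shift from memory speed to disk speed.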
  • 10. CPU
    • Workload could be more disk than CPU bound
    • High load needs a CPU to clean up Java garbage
    • Other than serving requests, compaction uses resources
  • 11. Different workloads
    • Structured log format of C* has deep implications
    • Is data written once or does it change over time?
    • How high is data churn?
    • How random is the read/write pattern?
    • What is the write/read percentage?
    • What are your latency requirements?
  • 12. Large Disk / Big Iron key points
    • RAID0 mean time to failure drops with bigger stripes
    • Java cannot address large heaps well
    • Compactions/Joins/repairs take a long time
      • Lowers agility when joining a node could take hours
    • Maintaining a high RAM-to-data percentage is costly, e.g. 2 machines with 32 GB vs. 1 machine with 64 GB
    • Capacity heavily diminished with loss of one node
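The last point is simple arithmetic. The sketch below assumes an evenly balanced cluster of identical nodes, which real clusters only approximate.

```python
def capacity_lost_fraction(nodes: int) -> float:
    """Share of cluster capacity lost when one of `nodes` equal nodes fails."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return 1.0 / nodes


def survivor_load_multiplier(nodes: int) -> float:
    """How much more load each surviving node carries after one failure."""
    if nodes < 2:
        raise ValueError("need at least two nodes to have survivors")
    return nodes / (nodes - 1)


for n in (4, 20):
    print(n, "nodes:", capacity_lost_fraction(n), round(survivor_load_multiplier(n), 3))
```

A few big machines lose a large slice of capacity per failure, while many small ones lose a small slice, which is the trade-off between big iron and blades.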
  • 13. Blade server key points
    • Management software gives cloud computing vibe
    • Cassandra internode traffic on blade back plane
    • Usually support 1-2 on-board disks (SCSI/SSD)
    • Usually support RAM configurations up to 128 GB
    • Single and dual socket CPU
    • No exotic RAID options
  • 14. Schema lessons
    • “You only need one column family” is not always true
    • Infrequently read data in the same CF as frequently read data competes for “cache”
    • Separating allows employing multiple cache options
    • Rows that are written or updated get fragmented
  • 15. Capacity Planning rule #1 Know your hard drive limits
  • 16. Capacity Planning rule #2 Writes are fast, until c* flushes and compacts so much that they are not
  • 17. Capacity Planning rule #3
    • Row cache is fool's gold
      • Faster than a read from disk cache
      • Memory use (row key + columns and values)
      • Causes memory pressure (data in and out of mem)
      • Fails with large rows
      • Cold on startup
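The memory cost noted above (row key + columns and values) can be approximated before enabling the cache. This is a rough sketch; `overhead_bytes_per_row` stands in for JVM object overhead and is a hypothetical parameter that needs to be measured on a real system.

```python
def row_cache_bytes(rows: int,
                    key_bytes: int,
                    columns_per_row: int,
                    name_bytes: int,
                    value_bytes: int,
                    overhead_bytes_per_row: int = 0) -> int:
    """Approximate heap used by caching `rows` whole rows."""
    per_row = key_bytes + columns_per_row * (name_bytes + value_bytes)
    return rows * (per_row + overhead_bytes_per_row)


# Example: one million rows, 16-byte keys, 10 columns of 8 + 32 bytes each.
estimate = row_cache_bytes(1_000_000, 16, 10, 8, 32)
print(estimate / 1024 ** 2, "MiB before JVM overhead")
```

Because whole rows are cached, a handful of very large rows can dominate the estimate, which matches the "fails with large rows" warning.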
  • 18. Capacity Planning rule #4
    • Do not upgrade tomorrow what you can upgrade today
      • Joining nodes is intensive on the cluster
      • Do not wait till c* disks are 99% utilized
      • You do not get 100% benefit of new nodes until neighbors are cleaned up
      • Doubling nodes results in fewer move steps
      • Adding RAM is fast and takes heat off the hard disk
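Why doubling needs fewer moves can be seen from the token arithmetic. The sketch below assumes RandomPartitioner with manually assigned, evenly spaced tokens in a 2**127 token space; the slides do not name the partitioner, so that is an assumption.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed)


def balanced_tokens(node_count: int) -> list:
    """Evenly spaced initial tokens for a ring of node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def nodes_that_must_move(old_count: int, new_count: int) -> int:
    """Existing nodes whose token is not part of the new balanced ring."""
    new_tokens = set(balanced_tokens(new_count))
    return sum(1 for t in balanced_tokens(old_count) if t not in new_tokens)


print("4 -> 8 nodes:", nodes_that_must_move(4, 8), "existing nodes move")
print("4 -> 5 nodes:", nodes_that_must_move(4, 5), "existing nodes move")
```

When the node count doubles, every old token is still a valid balanced token and the new nodes simply bisect existing ranges. Adding a single node to a balanced four-node ring shifts every ideal position except token 0, so three existing nodes would have to move to restore balance.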
  • 19. Capacity Planning rule #5
    • Know your traffic patterns better than yourself
  • 20. The use case: Dr. Real Time and Mr. Batch
  • 21. Dr. Real Time
      • Real time bidding needs low latency
      • Peak traffic during the day
      • Need to keep a high cache hit rate
      • Avoid compact, repair, cleanup, joins
  • 22. Dr. Real Time's Lab
      • Experiments with Xmx vs VFS caching
      • Experiments with cache sizing
      • Studying graphs as new releases and features are added
      • Monitoring dropped messages, garbage collection
      • Dr. Real Time enjoys lots of memory for GB of data on disk
        • Enjoys reading (data), writing as well
        • Nice sized memtables help to not pollute vfs cache
  • 23. Mr. Batch
      • Night falls and users sleep
      • Batch/Back loading data (bulk inserts)
      • Finding and removing old data (range scanning)
      • Maintenance work (nodetool)
  • 24. Mr. Batch rampaging through the data
      • Bulk loading
        • Write at quorum, c* works harder on front end
        • Turning off compaction
          • For short burst fine, but we are pushing for hours
          • Forget to turn it back on and the SSTable count gets bad fast
      • Range scanning to locate and remove old data
      • Scheduling repairs and compaction
      • Mr. Batch enjoys tearing through data
        • Writes, tombstones, range scanning, repairs
        • Enjoys fast disks for compacting
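How fast the SSTable count grows with compaction turned off can be estimated with a simple model: every memtable flush writes one new SSTable and nothing merges them. The sketch below assumes a steady ingest rate and treats the memtable flush threshold as the amount of data per flush, which is a simplification; the numbers are illustrative.

```python
def sstables_created(ingest_mb_per_s: float,
                     memtable_mb: float,
                     hours: float) -> int:
    """SSTables written during a bulk load while compaction is disabled."""
    if memtable_mb <= 0:
        raise ValueError("memtable_mb must be positive")
    total_mb = ingest_mb_per_s * 3600 * hours
    return int(total_mb // memtable_mb)


# Example: 10 MB/s into one column family, 64 MB memtables, 6 hours.
print(sstables_created(10, 64, 6))
```

Reads may have to consult many of those SSTables until compaction catches up, which is why forgetting to turn it back on hurts so quickly.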
  • 25. Questions
      • ???