Cassandra at scale

A 30-minute talk I gave at Cassandra Dublin and Cassandra London. Just some things I've learned along the way as I've helped some of the largest users of Cassandra be successful. Learn from other people's mistakes!

Published in: Technology

  1. Apache Cassandra at Scale Patrick McFadin | Solution Architect | DataStax Saturday, July 13, 13
  3. Who is this dude? • Patrick McFadin • Solution Architect at DataStax • Cassandra MVP • User for years • Follow me for more: I talk about Cassandra and building scalable, resilient apps ALL THE TIME! @PatrickMcFadin
  4. What do you mean “at scale”? • Personally been involved in ~1000-node deployments • 0.5 PB total space • Millions of transactions per second • Critical lines of business • Multiple datacenters
  5. Time to scale: A few tips to help you get there
  6. Scaling busters: Disk IO • Cassandra is (almost) never CPU bound • Can your server do this? [Diagram: a disk system handling a long sequential read and a long sequential write at the same time] • No? You have trouble. • Shared storage (NAS, iSCSI) - Just no. See above. - IOPS aren’t going to help
  7. Scaling busters: Spinning disk considerations • Separate commit and data disks • Tune for reads and writes at the same time. - Quick test while watching iostat: • Start a long read using the dd command • Start a long write using the dd command • Did one of them drop to the floor? #fail • Think about using JBOD instead of RAID. - Each mount point is listed as its own data directory line in the config file (data_file_directories in cassandra.yaml)
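
The quick test on the slide above could be scripted roughly as follows. This is a minimal sketch, not a benchmark: `run_io_test`, its two arguments, and the file names are hypothetical, and it assumes GNU `dd` on Linux (`bs=1M`, `conv=fsync`). A freshly written file is served from the page cache, so for a meaningful result the size should be well above the machine's RAM, and `iostat` should be watched in a second terminal while it runs.

```shell
#!/bin/sh
# Sketch of the "long read + long write at the same time" test.
# run_io_test, its arguments, and the file names are hypothetical.
run_io_test() {
    dir="$1"        # directory on the disk under test
    size_mb="$2"    # size of each test file in MiB

    mkdir -p "$dir" || return 1

    # Create the file that the read pass will stream back.
    dd if=/dev/zero of="$dir/read-source" bs=1M count="$size_mb" 2>/dev/null || return 1

    # Long sequential read, in the background.
    dd if="$dir/read-source" of=/dev/null bs=1M 2>"$dir/read.log" &
    read_pid=$!

    # Long sequential write, started right after so the two overlap on large files.
    dd if=/dev/zero of="$dir/write-target" bs=1M count="$size_mb" conv=fsync 2>"$dir/write.log" &
    write_pid=$!

    wait "$read_pid" "$write_pid"

    # dd reports its throughput summary on stderr, captured in the logs.
    echo "read:  $(tail -n 1 "$dir/read.log")"
    echo "write: $(tail -n 1 "$dir/write.log")"

    # Remove the large test files; the logs are kept.
    rm -f "$dir/read-source" "$dir/write-target"
}
```

A call such as `run_io_test /var/lib/cassandra/io-test 65536` (path and size are placeholders) prints one throughput line per direction; if one of them collapses while the other holds up, the disk system fails the test described on the slide.
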
  8. Scaling busters: SSD considerations • Scheduler! CFQ is wrong. Use deadline or noop - EX: echo noop > /sys/block/sda/queue/scheduler • Turn rotational off - EX: echo 0 > /sys/block/sda/queue/rotational • Read ahead buffers - EX: echo 0 > /sys/block/sda/queue/read_ahead_kb - Start with 0 (better for random reads) - Walk it up while testing under your load • Commit and data can coexist • MLC drives, not SLC. Save your money
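
Before and after applying the `echo` commands on the slide above, it helps to read the current values back. The following is a small sketch for doing that; `show_queue_settings` and the `SYSFS_ROOT` override are hypothetical names introduced here so the function can be dry-run against a fake directory tree. It relies on the kernel marking the active scheduler with square brackets in the `scheduler` file. Values written through sysfs do not survive a reboot, so they need to be reapplied at boot.

```shell
#!/bin/sh
# Sketch: print the block-queue settings the slide tunes, for one device.
# show_queue_settings and SYSFS_ROOT are hypothetical names.
show_queue_settings() {
    dev="$1"
    queue="${SYSFS_ROOT:-/sys}/block/$dev/queue"

    # The active scheduler is the bracketed entry, e.g. "noop deadline [cfq]".
    scheduler=$(sed 's/.*\[\(.*\)\].*/\1/' "$queue/scheduler")
    rotational=$(cat "$queue/rotational")
    read_ahead=$(cat "$queue/read_ahead_kb")

    echo "$dev scheduler=$scheduler rotational=$rotational read_ahead_kb=$read_ahead"
}
```

Running `show_queue_settings sda` on an untuned host with the slide's era of kernel would typically show `scheduler=cfq rotational=1`, which are the values the slide says to change for SSDs.
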
  9. Scaling busters: OS tuning • Process limits > 10000 • Open files > unlimited • Memory and network • Turn swap off • Read this: Recommended production settings http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/install/installRecommendSettings.html
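
Two of the checks on the slide above (process limit and swap) are easy to script. The sketch below is one possible way to do it under stated assumptions: `limit_exceeds` and `swap_is_off` are hypothetical helper names, the 10000 threshold is the one from the slide, and the swap check assumes Linux, where `/proc/swaps` holds a header line followed by one line per active swap area.

```shell
#!/bin/sh
# Sketch: helpers for two checks from the slide. Names are hypothetical.

# Succeeds when a limit value ("unlimited" or a number) is above a minimum.
limit_exceeds() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -gt "$2" ]
}

# Succeeds when no swap area is active.
swap_is_off() {
    swaps_file="${1:-/proc/swaps}"
    # Anything after the header line is an active swap area.
    ! tail -n +2 "$swaps_file" | grep -q .
}

# Example (bash; `ulimit -u` is not available in every shell):
#   limit_exceeds "$(ulimit -u)" 10000 || echo "raise the process limit"
#   swap_is_off || echo "swap is still on"
```

The checks should run as the user that starts Cassandra, since limits are per user and per session.
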
  10. Scaling busters: Horrible use cases • Relational model projected. - Lots of tables needing a join - Normalized data everywhere - “How can I migrate my RDBMS data to C*” • Deep and perverse desire for a lock • Using secondary indexes to simulate an RDBMS • Row cache with a lot of small slices
  11. Great ideas from the real world • Proper TTLs with reverse comparators • GZIP blob data in column values • Load testing with production data model - And similar production data! • Engaging experts
  12. Success plan: Learn data modeling • The Data Model is Dead, Long Live the Data Model • Become a Super Modeler • Next top Data Model (My data modeling webinars on Planet Cassandra)
  13. Success plan: Learn CQL CREATE TABLE username_video_index ( username varchar, videoid uuid, upload_date timestamp, video_name varchar, PRIMARY KEY (username, videoid) ); SELECT video_name FROM username_video_index WHERE username = 'ctodd' AND videoid = '99051fe9';
  14. Success plan: Use DataStax drivers • Async IO. (Netty for Java) • Replace multi-get with executeAsync() • Token aware strategy • Java Driver • C# Driver • Python Driver (soon)
  15. Success plan: Great online resources! • Cassandra Summit 2013 SF online now! • Planet Cassandra (www.planetcassandra.org) • IRC #cassandra on irc.freenode.com • Users mailing list
  16. Cassandra Summit Europe 2013 • Call for papers • Sponsorship opportunity • Two days • 30+ sessions • Training day
  17. Thank you • Q&A
