Storage on EC2 (& Cassandra), Cassandra Workshop, Berlin Buzzwords

  1. Storage on EC2 (& Cassandra). Tom Wilkie, Cassandra Workshop, 8 June 2011
  2. ACHTUNG! Data only collected over the past 5 days; experiments not repeated (that much); EC2 is a moving target.
  3. Considering: ephemeral vs EBS; ... vs instance type; ... vs RAID level; ... vs # threads; (... vs storage engine). Not considering: cluster performance; internode latency, throughput; tuning...; CORRELATED FAILURES...
  4. Instance types: m1.large (7.5 GB RAM, 4 CU, 64-bit, ‘High’ IO), m1.xlarge (15 GB RAM, 8 CU, 64-bit, ‘High’ IO), c1.xlarge (7 GB RAM, 20 CU, 64-bit, ‘High’ IO). Software: Cassandra 0.7.6, CentOS 5.5, OpenJDK...
  5. Ephemeral Storage
  6. ephemeral [ih-fem-er-uhl], adjective: 1. lasting a very short time; short-lived; transitory: the ephemeral joys of childhood. 2. lasting but one day: an ephemeral flower. noun: 3. anything short-lived, as certain insects.
  7. Ephemeral Storage Seek Performance [chart: seeks/s vs # devices (1-4) for m1.large, m1.xlarge and c1.xlarge ephemeral storage]. 7,000 IOPS from a disk?? (cf. http://www.slideshare.net/davegardnerisme/running-cassandra-on-amazon-ec2)
  8. Ephemeral Storage Seek Performance [chart: seeks/s (0-1,000 scale) vs # devices (1-4) for m1.large, m1.xlarge and c1.xlarge ephemeral storage].
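The deck does not name the tool behind these seek numbers. As a minimal sketch, assuming fio is installed and /dev/sdb is one ephemeral device (both are assumptions, not what the author used), a comparable single-device random-read test looks like:

    # Random 4k reads against one ephemeral device; the IOPS figure fio
    # reports corresponds to the "Seek / s" axis above. Reads only, so
    # the device contents are untouched.
    fio --name=seeks --filename=/dev/sdb --readonly \
        --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=1 \
        --runtime=60 --time_based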
  9. Ephemeral Throughput, m1.xlarge [chart: throughput (MB/s) vs # devices (1-4) for write/read via RAID-0 dd and via random 10 MB chunks].
  10. # dd if=/dev/zero of=/dev/sdd bs=512k count=20000
      ... 10485760000 bytes (10 GB) copied, 201.995 seconds, 51.9 MB/s
      # dd if=/dev/zero of=/dev/sdd bs=512k count=20000
      ... 10485760000 bytes (10 GB) copied, 80.3673 seconds, 130 MB/s
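The two dd runs above are identical commands against the same device, yet the second is roughly 2.5x faster. That is consistent with the "indirection layer" suspicion on the next slide: the first write to each block appears to pay an allocation penalty. A small sketch to rule the page cache out of that comparison, using dd's direct-I/O flag (an addition, not shown in the deck):

    # Same 10 GB sequential write, but bypassing the page cache so the
    # first-run/second-run gap cannot be blamed on write buffering.
    dd if=/dev/zero of=/dev/sdd bs=512k count=20000 oflag=direct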
  11. Ephemeral storage observations:
      • Max 4 devices per instance
      • Data goes away when the instance is terminated (or crashes!)
      • Suspect there is some sort of indirection layer underneath: thin provisioning / dedupe / CoW or something
      • Linux software RAID sucks (see the mdadm sketch below)
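The deck never shows how the RAID-0 arrays were built. A minimal sketch with mdadm, assuming the four ephemeral devices appear as /dev/sdb through /dev/sde and that data lives in the usual Cassandra data directory:

    # Stripe the four ephemeral devices into one RAID-0 array.
    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
          /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # Filesystem choice is an assumption; the deck does not say.
    mkfs.xfs /dev/md0
    mount /dev/md0 /var/lib/cassandra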
  12. CORRELATED FAILURES... What happens if a bug in your software causes all your nodes to crash? I.e. say a memory leak causes an OOM... on all nodes.
  13. EBS
  14. EBS Seek Performance [chart: seeks/s vs # devices (up to 30) per instance type, EBS volumes].
  15. EBS Random Reads, m1.xlarge, RAID-0 [chart: total seeks/s vs # threads (1-10), one series per # devices (1-10)].
  16. EBS Random Reads, m1.xlarge, RAID-0 [chart: max seeks/s vs # devices (1-10)].
  17. EBS Random Reads, m1.xlarge, RAID-0 [chart: seeks per device per second (max/min/avg) vs # devices (1-10)].
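A sketch of how the thread-scaling runs above might be reproduced, again assuming fio (not named in the deck) and a RAID-0 md device built from the attached EBS volumes; numjobs stands in for the "# Threads" axis:

    # 8 concurrent random readers against the EBS RAID-0 array; vary
    # --numjobs from 1 to 10 to sweep the "# Threads" axis.
    fio --name=ebs-seeks --filename=/dev/md0 --readonly \
        --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=1 \
        --numjobs=8 --group_reporting \
        --runtime=60 --time_based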
  18. EBS Throughput, m1.xlarge [chart: throughput (MB/s) vs # devices (1-10) for RAID-0 dd writes/reads and random 10 MB chunk reads].
  19. EBS observations:
      • Limited to ~100 IOPS per device? Or just 10 ms latency? (see the iostat sketch below)
      • Seems to scale pretty linearly for random IO
      • Sequential IO limited by network bandwidth, independent of # devices; shared with other network traffic?
      • Linux software RAID sucks
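The ~100 IOPS / ~10 ms per-device hypothesis is easy to sanity-check while a random-read test is running; a sketch assuming sysstat's iostat is available on the instance:

    # Extended per-device statistics, refreshed every second. If the
    # per-device limit holds, each EBS volume's r/s column should sit
    # near ~100 and its await column near ~10 ms.
    iostat -x 1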
  20. CORRELATED FAILURES... What happens when EBS breaks? http://storagemojo.com/2011/04/29/amazons-ebs-outage/ http://status.heroku.com/incident/151
  21. ??? [image-only slide]
  22. “Use Elastic Block Storage” (http://stackoverflow.com/questions/4714879/deploy-cassandra-on-ec2); “Raid 0 EBS drives are the way to go” (http://coreyhulen.org/2010/10/03/%EF%BB%BFcassandra-performance-tests-on-ec2/); “we recommend using raid0 ephemeral disks” (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cold-boot-performance-problems-td5615829.html#a5615889)
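For completeness, a sketch of provisioning the EBS volumes those quotes recommend striping. This uses today's AWS CLI rather than the 2011-era ec2-api-tools, and all identifiers are placeholders:

    # Create one EBS volume and attach it; repeat per stripe member,
    # then build the RAID-0 array with mdadm as in the ephemeral sketch.
    aws ec2 create-volume --size 100 --availability-zone eu-west-1a
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 --device /dev/sdf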
  23. [screenshot] http://coreyhulen.org/2010/10/03/%EF%BB%BFcassandra-performance-tests-on-ec2/
  24. [screenshot] http://coreyhulen.org/2010/10/03/%EF%BB%BFcassandra-performance-tests-on-ec2/
  25. Insert Rates by Instance Type [chart: inserts/s for m1.large, m1.xlarge and c1.xlarge, ephemeral vs EBS]. 100 threads, batch mutate size 100, value length 10, 1 column per row, 300 million values.
  26. [image-only slide]
  27. [image-only slide]
  28. Get Rates by Instance Type [chart: gets/s for m1.xlarge ephemeral vs m1.xlarge EBS]. 100 threads, 700 thousand values.
  29. [image-only slide]
  30. Range Query Rates by Instance Type: too slow; no results.
  31. [image-only slide]
  32. TODO:
      • Repeat experiments
      • # threads vs # devices for ephemeral
      • Repeat experiments
      • Cluster performance: scaling, latency, throughput, etc.
      • Repeat experiments
      • Strategies for mixed EBS and ephemeral?
      • Repeat experiments
  33. Total cost: $470 (110 million IOs, 360 GB-months, 560 machine hours)
  34. Questions? http://github.com/acunu http://bitbucket.org/acunu http://www.slideshare.net/acunu
