C* Summit 2013: No moving parts. Taking advantage of Pure Speed by Matt Kennedy

Flash memory technology, deployed as server-side PCIe cards or solid-state disks (SSDs), is emerging as a critical tool for performance and efficiency in data centers of all scales. This presentation discusses how the use of flash affects Cassandra deployments in terms of configuration, DRAM requirements, and performance expectations. It also shares ideas on leveraging C*'s data-center awareness to blend flash and disk storage nodes for cost and workload efficiency. Flash media itself is examined from a physical perspective to understand endurance issues, and data on write amplification under bulk-load and operational workload conditions is presented to explain how C*'s log-structured merge-tree architecture and its associated compactions affect flash. Finally, we examine strategies to make Cassandra more flash-aware, using both conventional techniques and emerging non-volatile memory (NVM) programming capabilities. Lessons learned from real-world customer deployments round out the presentation.


  1. Cassandra With No Moving Parts. Matt Kennedy. Cassandra Summit: June 12, 2013. (Fusion-io Confidential. Copyright © 2013 Fusion-io, Inc. All rights reserved.)
  2. "Switch your database to flash now. Or you're doing it wrong." Brian Bulkowski, Aerospike founder and CTO. http://highscalability.com/blog/2012/12/10/switch-your-databases-to-flash-storage-now-or-youre-doing-it.html (Slide footer, repeated throughout the deck: June 18, 2013, #Cassandra13.)
  3. Why?
  4. Flash IOPS Drives Server Adoption. Disk vs. flash: capacity 4TB vs. 3TB; IOPS 150 vs. 200,000; cost per IOP $$$$ vs. ¢¢¢¢.
  5. What is flash?
  6. NAND Flash Memory. Flash is a persistent memory technology invented by Dr. Fujio Masuoka at Toshiba in 1980. (Diagram: floating-gate cell with bit line, source line, word line, control gate, and float gate over an N-P-N structure.)
  7. Consumer Volume Drives Economics.
  8. Flash in Servers.
  9. Direct Cut-Through Architecture. Legacy approach: the app and OS on the host CPU reach NAND through a RAID controller, a SAS link, and a drive controller with its own DRAM and supercapacitors. Fusion direct approach: the data path runs from host DRAM over PCIe straight to NAND. The goal of every I/O operation is to move data to/from DRAM and flash.
  10. How can we use it in Cassandra?
  11. Cassandra I/O: Writes. http://www.datastax.com/docs/1.2/dml/about_writes
  12. Cassandra I/O: Reads. http://www.datastax.com/docs/1.2/dml/about_reads
  13. DRAM Dictates Cassandra Scaling. Key design principle: working set < DRAM.
  14. Cost of DRAM Modules. (Chart: module price in dollars, rising steeply across 4GB, 8GB, 16GB, and 32GB modules; y-axis 0 to 1,600.)
  15. When do we scale out? A typical server: 32 CPU cores with HT, 128 GB memory. Is your working set > 128GB?
  16. Is there a better way? With NoSQL databases, we tend to scale out for DRAM. Combined resources of three such servers: 96 CPU cores, 384 GB memory. That is more cores than needed to serve reads and writes.
  17. Flash Offers a New Architectural Choice. Latency spectrum: disk drives in milliseconds (10^-3), server-based flash in microseconds (10^-6), DRAM and CPU cache in nanoseconds (10^-9).
  18. Three Deployment Options: 1. All flash. 2. Data placement (CASSANDRA-2749). 3. Use logical data centers.
  19. Cassandra with All-Flash Storage. Step 1: Mount ioMemory at /var/lib/cassandra/data. Step 2: (left blank).
  20. Data Placement. https://issues.apache.org/jira/browse/CASSANDRA-2749 (thanks, Marcus!). Takes advantage of the filesystem hierarchy: use mount points to pin keyspaces or column families to flash at /var/lib/cassandra/data/{Keyspace}/{CF}. Use flash for high-performance needs, disk for capacity needs.
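
The mount-point trick above works because Cassandra lays SSTables out in a predictable directory hierarchy. A minimal sketch of the idea (the keyspace and column-family names here are hypothetical, as is the helper itself):

```python
import os

# Default Cassandra data directory, as on the all-flash slide.
DATA_DIR = "/var/lib/cassandra/data"

def cf_path(keyspace: str, cf: str, data_dir: str = DATA_DIR) -> str:
    """Directory holding a column family's SSTables:
    <data_dir>/<Keyspace>/<CF>."""
    return os.path.join(data_dir, keyspace, cf)

def pinned_to_own_device(keyspace: str, cf: str) -> bool:
    """True if the CF directory is itself a mount point, i.e. it has
    been pinned to a separate device such as a flash card."""
    return os.path.ismount(cf_path(keyspace, cf))

# A hypothetical hot column family would be pinned by mounting an
# ioMemory device at this path before starting Cassandra:
print(cf_path("Metrics", "hot"))
```

Everything else in the keyspace stays on whatever device backs the parent data directory, which is how flash and disk end up blended on one node.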
  21. Data Centers for Storage Control. One Cassandra cluster split into logical data centers: DC1 (interactive requests), high performance; DC2 (Hadoop MR jobs), medium performance; DC3 (high-density replicas), low performance but high capacity per node.
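
Logical data centers like these are wired up through the keyspace's replication settings. A sketch that just builds the CQL statement (the keyspace name and replica counts are illustrative assumptions, not from the slides):

```python
def create_keyspace_cql(name: str, dc_replicas: dict) -> str:
    """Build a CREATE KEYSPACE statement that places replicas per
    logical data center via NetworkTopologyStrategy."""
    opts = ", ".join(f"'{dc}': {n}" for dc, n in sorted(dc_replicas.items()))
    return (f"CREATE KEYSPACE {name} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}};")

# Hypothetical split: flash-backed DC1 keeps the most replicas for
# interactive reads, disk-backed DC3 keeps a single dense replica.
print(create_keyspace_cql("metrics", {"DC1": 3, "DC2": 2, "DC3": 1}))
```

Clients then route interactive traffic to DC1 and batch jobs to DC2, so each tier's storage matches its workload.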
  22. The Numbers.
  23. YCSB Testing Setup. YCSB load generator (x4) driving a Cassandra cluster (x4 nodes; 10GB; 16 cores and 24GB DRAM per node). Workloads use uniform random key selection instead of Zipfian. 150 million 1KB records at RF=3: ~120GB of SSTables per node.
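
The slide's per-node figure can be sanity-checked with back-of-the-envelope arithmetic, assuming the "x4" annotation means a four-node cluster:

```python
# Sizing check for the YCSB setup (node count is an assumption).
records = 150_000_000
record_bytes = 1024        # 1KB records
rf = 3                     # replication factor
nodes = 4                  # assumed from the "x4" on the slide

raw_gb = records * record_bytes / 1e9      # ~153.6 GB of unique data
total_gb = raw_gb * rf                     # ~460.8 GB across the cluster
per_node_gb = total_gb / nodes
print(f"{per_node_gb:.0f} GB of SSTables per node")  # ~115 GB
```

That lands close to the slide's ~120GB/node, the gap being SSTable overhead and any not-yet-compacted duplicates.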
  24. YCSB: Bulk Load (CL=ALL). (Chart: YCSB inserts/sec over the run; y-axis 0 to 70,000.) Avg latency: 0.9 ms; 95th percentile: 1 ms; 99th percentile: 4 ms.
  25. 95/5 R/W, Uniform Distribution. (Chart: mixed ops/sec at 75, 200, and 300 client threads; y-axis 0 to 80,000.) Read/write latencies by thread count: 75 threads: avg 1.4/0.22 ms, 95th pctl 2/0 ms, 99th pctl 5/0 ms. 200 threads: avg 3.1/0.19 ms, 95th pctl 7/0 ms, 99th pctl 13/0 ms. 300 threads: avg 4.4/2.2 ms, 95th pctl 11/0 ms, 99th pctl 19/0 ms.
  26. 50/50 R/W, Uniform Distribution, 10 hrs. (Chart: mixed ops/sec over the 10-hour run; y-axis 0 to 70,000.) Update latency: average 511 µs, 95th pctl 1 ms, 99th pctl 2 ms. Read latency: average 7.0 ms, 95th pctl 18 ms, 99th pctl 42 ms.
  27. Write Amplification. Amplification factor = physical bytes written / workload bytes written. Cassandra: leveled compaction load (250MB tier-0), 0.8-1.2x; 24-hour mixed workloads, 1.2-2.1x; size-tiered with major compactions (old skool), 3-15x. For comparison, HBase: bulk load, 14.8; normal operations (80/20 update/insert split), 4.2. Cassandra compares favorably to HBase.
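
The slide's ratio is simple enough to state directly in code; the byte counts below are illustrative, not measurements from the talk:

```python
def amplification_factor(physical_bytes: float, workload_bytes: float) -> float:
    """Write amplification as defined on the slide: bytes actually
    written to the flash media per byte the workload asked to write."""
    return physical_bytes / workload_bytes

# Example: a bulk load that pushes 14.8 TB to NAND for every 1 TB of
# workload writes has an amplification factor of 14.8 (the HBase
# bulk-load figure from the comparison table).
print(amplification_factor(14.8e12, 1.0e12))
```

Compactions are what drive the factor up: every SSTable rewrite is physical I/O the workload never asked for, which is why leveled compaction's steady small rewrites score so much better than periodic major compactions.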
  28. Next Step in Flash Evolution: flash as disk → native flash APIs → flash as memory.
  29. Rethinking Cassandra I/O. (Diagram: the write path from http://www.datastax.com/docs/1.2/dml/about_writes, with flash added.)
  30. Rethinking Cassandra I/O. (Same write-path diagram, annotated with a "flashtable".)
  31. Accelerating Cassandra With Flash. Cassandra + NAND flash accelerator.
  32. Real-World Cassandra on Fusion-io.
  33. Thank You. fusionio.com | REDEFINE WHAT'S POSSIBLE
  34. Cassandra: ioDrive2 vs. 10-Disk RAID-0.
  35. 12-Hour Mixed Read/Write Workload. (Chart: ops/sec over the run for CL=1 reads, CL=Q reads, and CL=Q writes (throttled); y-axis 0 to 40,000.)
  36. 50/50 R/W, Uniform Distribution. (Chart: mixed ops/sec; y-axis 0 to 120,000.) Update latency: average 311 µs, 95th pctl 0 ms, 99th pctl 1 ms. Read latency: average 8.2 ms, 95th pctl 20 ms, 99th pctl 62 ms.
