
SSD for Your Database, Peter Zaitsev (Percona)

Peter Zaitsev's talk at HighLoad++ 2014.

  1. SSD/Flash for Modern Databases. Peter Zaitsev, CEO, Percona. November 1, 2014. Highload++ 2014, Moscow, Russia
  2. Percona: We love Open Source Software (Percona Server, Percona XtraBackup, Percona XtraDB Cluster, Percona Toolkit); We want to help you succeed with MySQL and beyond (Consulting, Support, Managed Services). www.percona.com
  3. In this Presentation: Flash technology overview; Review some of the available technology; What does this mean for databases?; Specific opportunities for MySQL
  4. Before SSDs
  5. There were HDDs: Good at sequential reads/writes; Response time (RT) = seek time + rotational latency; Reads and writes have similar latency; No specific write limits; Retain data for a long time; One IO request at a time (no parallelism); Low cost per GB
  6. RAID and SAN
  7. Using Many HDDs together: Caching reads; Buffering writes (writeback cache); Better sequential read/write speed; Better throughput at high concurrency; Higher IO latencies for uncached IO
  8. Flash Revolution: Use Flash chips instead of platters; No moving parts; No seeks
  9. NAND Flash: Cell; Page / Read Block; Erase Block; Write but no overwrite; Wears with writes (erases)
  10. Writing to the Flash: Erase sets all bits to 1 (“1111111…”); Write sets some of the bits to 0 (“0100111..”); Changing a 0 back to 1 is impossible in place: Erase first, then Write
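
The erase/write rules on slide 10 can be illustrated with a toy model. This is only a sketch, assuming a "block" is a small byte array and that programming behaves like a bitwise AND (bits can go from 1 to 0 but never back); the function names `erase`, `program`, and `overwrite` are hypothetical and do not correspond to any real flash controller interface.

```python
ERASED = 0xFF  # in this toy model an erased byte has every bit set to 1


def erase(block: bytearray) -> None:
    """Erase the whole block: every bit goes back to 1."""
    for i in range(len(block)):
        block[i] = ERASED


def program(block: bytearray, offset: int, data: bytes) -> None:
    """Write data: bits can only be cleared (1 -> 0), modeled as bitwise AND."""
    for i, value in enumerate(data):
        block[offset + i] &= value


def overwrite(block: bytearray, offset: int, data: bytes) -> None:
    """Changing 0 back to 1 needs a copy, a full erase, then a fresh write."""
    merged = bytearray(block)                  # read the current contents
    merged[offset:offset + len(data)] = data   # apply the change in the copy
    erase(block)                               # erase the entire block
    program(block, 0, bytes(merged))           # write everything back


block = bytearray(4)
erase(block)
program(block, 0, b"\x4e")         # 0100 1110 written to an erased byte
first_write = block[0]
program(block, 0, b"\xf0")         # tries to raise bits back to 1 in place
in_place_attempt = block[0]        # only the AND of old and new survives
overwrite(block, 0, b"\xf0")       # erase-then-write gives the intended value
after_overwrite = block[0]
print(hex(first_write), hex(in_place_attempt), hex(after_overwrite))
```

The in-place attempt silently produces a value that is neither the old nor the new data, which is why a real device has to relocate or erase before rewriting, and why erases (and therefore wear) accompany overwrites.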
  11. Types of NAND Flash (figure from AnandTech)
  12. Flash Storage Design: Cache; Battery/Super Capacitor; Controller + Complex Firmware; Built-in Parallelism
  13. Flash Controller Tasks: Write wear leveling; Garbage collection; Error correction; Bad block mapping; Read scrubbing; Read disturb management; Encryption
  14. Flash Properties: Lots of IOs per device (100K+); Less random IO penalty; Writes more expensive than reads (but can be faster); Limited by amount of writes; Limited retention; Concurrent execution on a single device; Fast write acknowledgement (safe or not); Can burst writes
  15. Flash Interface Designs: DIMM; PCI-E; SFF-8639; SATA/SAS; FC and Network
  16. Transitioning: AHCI to NVMe
  17. AHCI vs NVMe (source: AnandTech.com)
  18. Sandisk ULLtraDIMM
  19. HGST Virident
  20. Sandisk FusionIO
  21. Intel P3700
  22. Intel 730 (SATA)
  23. mSATA
  24. M.2 Interface
  25. Violin Memory
  26. “Consumer” vs “Enterprise”: Performance; Endurance; Durability; Retention; Encryption
  27. Not your HDD: All HDDs are the same, all SSDs are different
  28. Evaluation: Performance changes over time; Empty space matters; Complex internals; Watch stability carefully
  29. How Flash Fails: Clearly defined end of life (EOL) by amount written (but often can handle a lot more); One day… it’s gone; “Power Loss Protection”; Internal ECC and redundancy
  30. To RAID or not to RAID?: More valuable for consumer grade; Watch for good Flash support; RAID controller logic may slow things down; Use a redundant array of inexpensive servers instead?
  31. Redundancy: Device internal redundancy; Hardware RAID; Software RAID; Filesystem “RAID”
  32. OS Support: Flash support is actively being improved; TRIM; Sparse Files
  33. Flash And Databases
  34. Database History: Most were designed in the HDD era; Optimized for sequential IO; Count on cheap sequential writes; RAID, BBU to improve performance
  35. It’s time for Flash: Your OLTP database should live on Flash
  36. But What Flash?: Pick a flash type that is right for your application
  37. IO vs Memory
  38. Warmup: Much faster warmup times; Even if the database fits in memory, SSD might be justified
  39. Tolerate more IO bound load: HDD at 5ms per IO can do 20 IOs within a 100ms response time (non-parallel); Flash at 0.1ms per IO can do 1000 IOs within a 100ms response time (non-parallel)
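
The arithmetic behind slide 39 is a simple division of the response-time budget by the per-IO latency. A minimal sketch, assuming strictly serial IO and the latencies quoted on the slide; the function name `serial_ios_within` is hypothetical, and microseconds are used only to keep the division in integers.

```python
def serial_ios_within(budget_us: int, latency_us: int) -> int:
    """How many back-to-back (non-parallel) IOs fit into a response-time budget."""
    return budget_us // latency_us


BUDGET_US = 100_000          # 100 ms response-time target from the slide
HDD_LATENCY_US = 5_000       # 5 ms per IO
FLASH_LATENCY_US = 100       # 0.1 ms per IO

hdd_ios = serial_ios_within(BUDGET_US, HDD_LATENCY_US)
flash_ios = serial_ios_within(BUDGET_US, FLASH_LATENCY_US)
print(f"HDD: {hdd_ios} IOs, Flash: {flash_ios} IOs within 100 ms")
```

In other words, a query that has to wait on dozens of uncached page reads blows the budget on an HDD but stays comfortably inside it on flash.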
  40. Endurance: Might be a top consideration
  41. Endurance Math: HGST FlashMax III 2200GB: 4400GB/day over 5 years, 1400MB/sec peak writes, 66 days at peak write throughput; Crucial M500 960GB: 72TB total lifetime writes, 400MB/sec write, 52 hours at peak write throughput
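
The figures on slide 41 can be re-derived by dividing the rated lifetime writes by the peak write rate. This is a rough sketch using only the numbers on the slide; the helper name `seconds_to_exhaust` and the `mb_per_gb` parameter are assumptions introduced for illustration. With decimal units the Crucial M500 comes out at about 50 hours, and the slide's 52 hours is consistent with 1024-based units instead.

```python
def seconds_to_exhaust(total_writes_gb: float, peak_mb_per_s: float,
                       mb_per_gb: int = 1000) -> float:
    """Seconds of continuous peak-rate writing needed to use up the rated endurance."""
    return total_writes_gb * mb_per_gb / peak_mb_per_s


# HGST FlashMax III 2200GB: rated for 4400 GB/day over 5 years, 1400 MB/s peak.
hgst_total_gb = 4400 * 365 * 5
hgst_days = seconds_to_exhaust(hgst_total_gb, 1400) / 86_400

# Crucial M500 960GB: 72 TB total lifetime writes, 400 MB/s writes.
crucial_hours = seconds_to_exhaust(72 * 1000, 400) / 3_600
crucial_hours_binary = seconds_to_exhaust(72 * 1024, 400, mb_per_gb=1024) / 3_600

print(f"HGST: {hgst_days:.1f} days at peak")
print(f"Crucial: {crucial_hours:.1f} h (decimal), {crucial_hours_binary:.1f} h (binary)")
```

Either way, the gap between the two devices is roughly a factor of thirty in time-to-wear-out at full speed, which is the point of the comparison.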
  42. Databases and Flash: How do we optimize databases to use Flash best?
  43. “Torn Page” problem: Flash can avoid this at little cost due to internal design; FusionIO NVMFS (Atomic Writes); Copy-on-Write file systems (ZFS, BTRFS); Filesystem-level data journaling is less preferred (data=journal for EXT4); Skip-Innodb-double-write
  44. Fast IO Path: Bypass caching; O_DIRECT; Native asynchronous IO; Efficient checksumming; innodb_checksum_algorithm=crc32; innodb_flush_method=O_DIRECT
  45. IO Cost Accounting: Sequential vs random IO balance; IO vs CPU balance; Smaller page sizes might make sense (innodb_page_size=4K)
  46. Less Pre-fetching: Most pre-fetched data must be used; Often best to try it out
  47. Less merging on flushing: Do not assume flushing multiple sequential dirty pages costs the same as flushing one; innodb_flush_neighbors=0
  48. Less Space on Disk: InnoDB compression (2x typical); TokuDB compression (5-10x typical); Archiving data off the OLTP system
  49. Fewer Writes on Flash: Hybrid Flash/HDD system; Transactional logs and other logs on HDD with RAID and BBU; Small temporary objects on tmpfs; innodb_log_file_size=<LARGE>
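
The InnoDB settings named on slides 43 to 49 can be collected into one `[mysqld]` fragment. The sketch below only renders the text of such a fragment; the function names are hypothetical, the page size is written as 4096 bytes (the slide's 4K), and the log file size `2G` in the usage lines is purely an illustrative stand-in for the slide's `<LARGE>`, not a recommendation from the talk. Disabling the doublewrite buffer is left behind an explicit flag, on the assumption that it is only appropriate when the storage or file system guarantees atomic page writes, as slide 43 implies.

```python
import configparser
import io


def flash_mysqld_options(log_file_size: str, atomic_page_writes: bool = False) -> dict:
    """Collect the InnoDB options mentioned on the slides (values as strings)."""
    options = {
        "innodb_flush_method": "O_DIRECT",       # bypass the OS page cache
        "innodb_checksum_algorithm": "crc32",    # cheaper checksums
        "innodb_flush_neighbors": "0",           # no merging of adjacent dirty pages
        "innodb_page_size": "4096",              # 4K pages; set before initializing data
        "innodb_log_file_size": log_file_size,   # caller-chosen "large" value
    }
    if atomic_page_writes:
        # Assumption: only safe when torn pages cannot happen on this storage.
        options["innodb_doublewrite"] = "0"
    return options


def render_my_cnf(options: dict) -> str:
    """Render the options as the text of a [mysqld] section."""
    parser = configparser.ConfigParser()
    parser["mysqld"] = options
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


fragment = render_my_cnf(flash_mysqld_options("2G"))  # "2G" is illustrative only
print(fragment)
```

As slide 46 suggests for pre-fetching, each of these is best validated against the actual workload and device rather than applied blindly.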
  50. Logs on RAID can be fast
  51. Single Intel 730 Sysbench
  52. IOPS
  53. Consistency (graph by http://cloud.percona.com)
  54. Is Flash Too Fast?: Multiple instances might scale better
  55. Other Thoughts: Host hardware and OS matter, especially with high-end flash; Virtualization has higher relative overhead; Network has higher relative overhead
  56. Thank You! Peter Zaitsev, pz@percona.com, @PeterZaitsev, https://www.linkedin.com/in/peterzaitsev
