SSD Deployment Strategies for MySQL
Slides for MySQL Conference & Expo 2010: http://en.oreilly.com/mysql2010/public/schedule/detail/13519

Presentation Transcript

  • SSD Deployment Strategies for MySQL
    Yoshinori Matsunobu
    Lead of MySQL Professional Services APAC, Sun Microsystems
    Yoshinori.Matsunobu@sun.com
  • What do you need to consider? (H/W layer)
    • SSD or HDD?
    • Interface – SATA/SAS or PCI-Express?
    • RAID – H/W RAID, S/W RAID or JBOD?
    • Network – Is 1GbE enough?
    • Memory – Is 2GB RAM + PCI-E SSD faster than 64GB RAM + 8 HDDs?
    • CPU – Nehalem or older Xeon?
  • What do you need to consider?
    • Redundancy
      – RAID
      – DRBD (network mirroring)
      – Semi-Sync MySQL Replication
      – Async MySQL Replication
    • Filesystem – ext3, xfs, or raw device?
    • File location – data files, redo log files, etc.
    • SSD-specific issues
      – Write performance deterioration
      – Write endurance
  • Why SSD? IOPS!
    • IOPS: the number of (random) disk I/O operations per second
    • Almost all database operations require random access
      – Selecting records by index scan
      – Updating records
      – Deleting records
      – Modifying indexes
    • Regular SAS HDD: 200 IOPS per drive (disk seek & rotation is slow)
    • SSD: 2,000+ (writes) / 5,000+ (reads) per drive
      – Highly dependent on the SSD and device driver
    • Let's start with basic benchmarks
  • Tested HDD/SSD for this session
    • SSD
      – Intel X25-E (SATA, 30GB, SLC)
      – Fusion I/O (PCI-Express, 160GB, SLC)
    • HDD
      – Seagate 160GB SAS 15000RPM
  • Table of contents
    • Basic performance on SSD/HDD
      – Random reads
      – Random writes
      – Sequential reads
      – Sequential writes
      – fsync() speed
      – Filesystem difference
      – IOPS and I/O unit size
    • MySQL deployments
  • Random read benchmark
    [Chart: Direct random read IOPS (single drive, 16KB, xfs) vs. number of I/O threads (1-200) for HDD, Intel SSD and Fusion I/O]
    • HDD: 196 reads/s at 1 I/O thread, 443 reads/s at 100 I/O threads
    • Intel: 3,508 reads/s at 1 I/O thread, 14,538 reads/s at 100 I/O threads
    • Fusion I/O: 10,526 reads/s at 1 I/O thread, 41,379 reads/s at 100 I/O threads
    • Single-thread throughput on Intel is 16x better than on HDD; Fusion I/O is 25x better
    • SSD's concurrency gain (4x) is much better than HDD's (2.2x)
    • A very strong reason to use SSD
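    Measurements like this are easy to reproduce with fio. A minimal sketch, assuming a
    scratch directory on the filesystem under test (the paths, sizes and thread list are
    placeholders, not the talk's exact setup):

      # 16KB direct random reads, sweeping the thread count as in the chart
      for threads in 1 2 4 8 16 32 64 100 200; do
          fio --name=randread --directory=/ssd/bench --size=4G \
              --rw=randread --bs=16k --direct=1 --ioengine=libaio \
              --numjobs=$threads --runtime=60 --time_based --group_reporting
      done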
  • High concurrency
    • A single SSD drive has multiple NAND flash memory chips (e.g. 40 x 4GB flash memory = 160GB)
    • Highly dependent on the I/O controller and the application
      – A single-threaded application cannot gain the concurrency advantage
  • PCI-Express SSD
    [Diagram: a PCI-E SSD attaches to the North Bridge via PCI-Express x8 (2GB/s); a SATA/SAS SSD goes through the South Bridge's SAS/SATA controller (300MB/s)]
    • Advantage
      – PCI-Express is a much faster interface than SAS/SATA
    • (Current) disadvantages
      – Most motherboards have a limited number of PCI-E slots
      – No hot-swap mechanism
  • Write performance on SSD
    [Chart: Random write IOPS (16KB blocks), 1 vs. 100 I/O threads, for HDD (4-disk RAID10, xfs), Intel (xfs) and Fusion I/O (xfs)]
    • A very strong reason to use SSD
    • But wait... can we get high write throughput *anytime*?
      – Not always. Let's check how data is written to flash memory
  • Understanding how data is written to SSD (1)
    [Diagram: flash memory chips, each containing blocks, each block containing pages]
    • A single SSD drive consists of many flash memory chips (e.g. 2GB each)
    • A flash memory chip internally consists of many blocks (e.g. 512KB each)
    • A block internally consists of many pages (e.g. 4KB each)
    • It is *not* possible to overwrite a non-empty block
      – Reading from pages is possible
      – Writing to pages in an empty block is possible
      – Appending is possible
      – Overwriting pages in a non-empty block is *not* possible
  • Understanding how data is written to SSD (2)
    [Diagram: new data intended for a non-empty block is redirected to an empty block]
    • Overwriting a non-empty block is not possible
    • New data is written to an empty block instead
    • Writing to an empty block is fast (~200 microseconds)
    • Even though applications write to the same positions in the same files (e.g. the InnoDB log file), the written pages/blocks are distributed (wear-leveling)
  • Understanding how data is written to SSD (3)
    [Diagram: rewriting a fully used block: 1. read all pages, 2. erase the block, 3. write everything back together with the new data]
    • In the long run, almost all blocks will be fully used
      – e.g. when allocating 158GB of files on a 160GB SSD
    • An empty block must be allocated on writes
    • Basic steps to write new data:
      1. Read all pages from a block
      2. ERASE the block
      3. Write all data, including the new data, into the block
    • ERASE is a very expensive operation (it takes a few milliseconds)
    • At this stage, write performance becomes very slow because of massive ERASE operations
  • Reserved space
    [Diagram: data space vs. reserved space; a background job ERASEs unused blocks so that new writes can go straight to pre-erased blocks]
    • To keep write performance high enough, SSDs have a "reserved space" feature
    • The data size visible to applications is limited to the size of the data space
      – e.g. 160GB SSD, 120GB data space, 40GB reserved space
    • Fusion I/O can change the reserved space size:
      – # fio-format -s 96G /dev/fct0
  • Write performance deterioration
    [Chart: Write IOPS deterioration (16KB random writes) under continuous write-intensive workloads; "fastest" vs. "slowest" lines for Intel and for Fusion I/O at 150G, 120G, 96G and 80G data space]
    • At the beginning, write IOPS was close to the "fastest" line
    • When massive writes happened, write IOPS gradually deteriorated toward the "slowest" line (because massive ERASEs happened)
    • Increasing reserved space improves steady-state write throughput
    • Write IOPS recovered to "fastest" after writes were stopped for a long time (many blocks were ERASEd by the background job)
    • Highly dependent on the flash memory and I/O controller (TRIM support, ERASE scheduling, etc.)
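    To observe the deterioration curve yourself, run a long random-write job against the
    device and log IOPS over time. A hedged sketch (the device name and runtime are
    placeholders; writing to a raw device destroys its data):

      # Sustained 16KB random writes; IOPS typically starts near the "fastest"
      # line and decays as the supply of pre-erased blocks runs out
      fio --name=wearout --filename=/dev/fioa --rw=randwrite --bs=16k \
          --direct=1 --ioengine=libaio --iodepth=32 \
          --runtime=28800 --time_based --write_iops_log=wearout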
  • Sequential I/O
    [Chart: Sequential read/write throughput (1MB consecutive reads/writes), MB/s, for 4 HDD (RAID10, xfs), Intel (xfs) and Fusion I/O (xfs)]
    • Typical scenarios: full table scan (read), logging/journaling (write)
    • SSD outperforms HDD for sequential reads, but less significantly
    • HDD (4-disk RAID10) is fast enough for sequential I/O
    • The data volume transferred by sequential writes tends to be huge, so you need to care about write deterioration on SSD
    • No strong reason to use SSD for sequential writes
  • fsync() speed
    [Chart: fsync/sec for 1KB, 8KB and 16KB writes on HDD (xfs), Intel (xfs) and Fusion I/O (xfs)]
    • 10,000+ fsync/sec is fine in most cases
    • Fusion I/O was CPU bound (%system), not I/O bound (%iowait)
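    One way to approximate an fsync-heavy workload is sysbench's fileio test with an
    fsync after every write. A sketch in the sysbench 0.4-era syntax (file size and
    runtime are placeholder values):

      sysbench --test=fileio --file-total-size=1G prepare
      # Random 16KB writes, fsync after each write
      sysbench --test=fileio --file-total-size=1G --file-test-mode=rndwr \
               --file-block-size=16384 --file-fsync-freq=1 \
               --max-time=60 --max-requests=0 run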
  • HDD is fast for sequential writes / fsync
    • Best practice: writes can be boosted by using BBWC (battery-backed write cache), especially for REDO logs (because they are written sequentially)
    • No strong reason to use SSDs here
    [Diagram: with a battery-backed write cache in front of the disk, writes are acknowledged from the cache without paying seek & rotation time]
  • Filesystem matters
    [Chart: Random write IOPS (16KB blocks), 1 vs. 16 threads, on Fusion I/O with ext3, xfs and the raw device]
    • On xfs, multiple threads can write to the same file if it is opened with O_DIRECT; on ext* they can not
    • Good concurrency on xfs, close to the raw device
    • ext3 is less optimized for Fusion I/O
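    The xfs-vs-ext3 gap is easy to check with fio by pointing many jobs at one O_DIRECT
    file (a sketch; the path is a placeholder):

      # 16 threads writing to the SAME file with O_DIRECT; compare the result
      # on an xfs mount vs. an ext3 mount of the same device
      fio --name=samefile --filename=/ssd/bench/testfile --size=4G \
          --rw=randwrite --bs=16k --direct=1 --ioengine=libaio \
          --numjobs=16 --runtime=60 --time_based --group_reporting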
  • Changing I/O unit size
    [Chart: Read IOPS vs. concurrency (1-200) for 1KB, 4KB and 16KB I/O units on 4 HDD RAID10]
    • On HDD, a maximum 22% performance difference was found between 1KB and 16KB
    • No big difference when concurrency < 10
  • Changing I/O unit size on SSD
    [Chart: Read IOPS vs. concurrency (1-200) for 1KB, 4KB and 16KB I/O units on Fusion I/O]
    • Huge difference
    • On SSDs, not only IOPS but also the I/O transfer size matters
    • It's worth considering "configurable block size" support in storage engines
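    The block-size sweep behind these two charts can be scripted the same way as the
    earlier read benchmark, adding an outer loop over --bs (again a sketch with
    placeholder values):

      for bs in 1k 4k 16k; do
        for jobs in 1 4 16 64 200; do
          fio --name=bs-sweep --directory=/ssd/bench --size=4G \
              --rw=randread --bs=$bs --direct=1 --ioengine=libaio \
              --numjobs=$jobs --runtime=30 --time_based --group_reporting
        done
      done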
  • Let's start MySQL benchmarking
    • Base: a disk-bound application (DBT-2) running on:
      – Sun Fire X4270
      – Nehalem 8-core
      – 4 HDDs, RAID1+0, write cache with battery
    • What will happen if...
      – Replacing HDD with Intel SSD (SATA)
      – Replacing HDD with Fusion I/O (PCI-E)
      – Moving log files and ibdata to HDD
      – Not using Nehalem
      – Using two Fusion I/O drives with software RAID1
      – Deploying DRBD protocol B or C
        • Replacing 1GbE with 10GbE
      – Using MySQL 5.5.4
  • DBT-2 conditions
    • SuSE Enterprise Linux 11, xfs
    • MySQL 5.5.2M2 (InnoDB Plugin 1.0.6)
    • 200 warehouses (20GB – 25GB of hot data)
    • Buffer pool size: 1GB / 2GB / 5GB / 30GB (large enough to cache all data)
    • 1000 seconds of warm-up time
    • Running for 3600 seconds (1 hour)
    • Fusion I/O: 96GB data space, 64GB reserved space
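    The buffer pool variants above map to a single my.cnf knob. A minimal sketch of this
    kind of configuration (every value besides the buffer pool size is an illustrative
    assumption, not taken from the talk):

      [mysqld]
      innodb_buffer_pool_size = 5G       # varied per run: 1G / 2G / 5G / 30G
      innodb_flush_method     = O_DIRECT # bypass the OS page cache for data files
      innodb_log_file_size    = 512M     # assumed value
      innodb_file_per_table              # one .ibd file per table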
  • HDD vs. Intel SSD

                       HDD       Intel
    Buffer pool 1G     1125.44   5709.06

    (NOTPM: new-order transactions per minute)
    • Storing all data on HDD or on Intel SSD
    • Massive disk I/O happens
      – Random reads for all accesses
      – Random writes for updating rows and indexes
      – Sequential writes for REDO log files, etc.
    • SSD is very good at these kinds of workloads
    • 5.5x performance improvement, without any application change!
  • HDD vs. Intel SSD vs. Fusion I/O

                       HDD       Intel     Fusion I/O
    Buffer pool 1G     1125.44   5709.06   15122.75

    • Fusion I/O is a PCI-E based SSD
    • PCI-E is much faster than SAS/SATA
    • 14x improvement compared to 4 HDDs
  • Which should we spend money on, RAM or SSD?

                       HDD       Intel     Fusion I/O
    Buffer pool 1G     1125.44   5709.06   15122.75
    Buffer pool 2G     1863.19
    Buffer pool 5G     4385.18
    Buffer pool 30G    36784.76 (caching all hot data)

    • Increasing RAM (buffer pool size) reduces random disk reads
      – Because more data is cached in the buffer pool
    • If all data is cached, only disk writes (both random and sequential) happen
    • Disk writes happen asynchronously, so application queries can be much faster
    • Large enough RAM + HDD outperforms too little RAM + SSD
  • Which should we spend money on, RAM or SSD?

                       HDD       Intel     Fusion I/O
    Buffer pool 1G     1125.44   5709.06   15122.75
    Buffer pool 2G     1863.19   7536.55   20096.33
    Buffer pool 5G     4385.18   12892.56  30846.34
    Buffer pool 30G    36784.76  -         57441.64
    (30G buffer pool caches all hot data)

    • It is not always possible to cache all hot data
    • Fusion I/O + a good amount of memory (5GB) was pretty good
    • A basic rule can be:
      – If you can cache all active data: large enough RAM + HDD
      – If you can't, or if you need extremely high throughput: spend on both RAM and SSD
  • Let's think about MySQL file location
    • SSD is extremely good at random reads
    • SSD is very good at random writes
    • HDD is good enough at sequential reads/writes
    • No strong reason to use SSD for sequential writes
    • Random I/O oriented:
      – Data files (*.ibd)
        • Sequential reads when doing a full table scan
      – Undo log, insert buffer (ibdata)
        • UNDO tablespace (small in most cases, except when long-running batches are active)
        • On-disk insert buffer space (small in most cases, except when InnoDB cannot catch up with updating indexes)
    • Sequential write oriented:
      – Doublewrite buffer (ibdata)
        • Write volume is equal to the *.ibd files. Huge
      – Binary log (mysql-bin.XXXXXX)
      – Redo log (ib_logfile)
      – Backup files
  • Moving sequentially written files to HDD

                       Fusion I/O          Fusion I/O + HDD    Up
    Buffer pool 1G     15122.75            19295.94            +28%
                       (us=25%, wa=15%)    (us=32%, wa=10%)
    Buffer pool 2G     20096.33            25627.49            +28%
                       (us=30%, wa=12.5%)  (us=36%, wa=8%)
    Buffer pool 5G     30846.34            39435.25            +28%
                       (us=39%, wa=10%)    (us=49%, wa=6%)
    Buffer pool 30G    57441.64            66053.68            +15%
                       (us=70%, wa=3.5%)   (us=77%, wa=1%)

    • Moving ibdata, ib_logfile (+ binary logs) to HDD (see the my.cnf sketch below)
    • High impact on performance
      – The write volume to SSD is halved, because the doublewrite area is allocated on HDD
      – %iowait was significantly reduced
      – You can delay write performance deterioration
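    In my.cnf terms, the split looks roughly like this. A sketch, assuming /ssd and /hdd
    mount points (paths are illustrative):

      [mysqld]
      datadir                   = /ssd/mysql   # *.ibd files: random I/O on SSD
      innodb_file_per_table                    # keep table data out of ibdata
      innodb_data_home_dir      = /hdd/mysql   # ibdata: doublewrite, insert buffer, undo
      innodb_data_file_path     = ibdata1:1G:autoextend
      innodb_log_group_home_dir = /hdd/mysql   # redo logs: sequential writes
      log-bin = /hdd/mysql/mysql-bin           # binary logs on HDD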
  • Does CPU matter?
    [Diagram: Nehalem attaches memory directly to the CPU and reaches the North Bridge (IOH) over QPI at 25.6GB/s; older Xeons reach memory through the North Bridge (MCH) over a 10.6GB/s FSB; PCI-Express hangs off the North Bridge in both cases]
    • Nehalem has two big advantages
      1. Memory is directly attached to the CPU: faster for in-memory workloads
      2. The interface between the CPU and the North Bridge is 2.5x faster, and its traffic does not conflict with CPU<->memory traffic: faster for disk I/O workloads when using PCI-Express SSDs
  • Harpertown X5470 (older Xeon) vs. Nehalem X5570 (HDD)

    HDD                Harpertown X5470,   Nehalem X5570,      Up
                       3.33GHz             2.93GHz
    Buffer pool 1G     1135.37 (us=1%)     1125.44 (us=1%)     -1%
    Buffer pool 2G     1922.23 (us=2%)     1863.19 (us=2%)     -3%
    Buffer pool 5G     4176.51 (us=7%)     4385.18 (us=7%)     +5%
    Buffer pool 30G    30903.4 (us=40%)    36784.76 (us=40%)   +19%
    (us: userland CPU utilization)

    • The CPU difference matters on CPU-bound workloads
  • Harpertown X5470 vs. Nehalem X5570 (Fusion I/O)

    Fusion I/O+HDD     Harpertown X5470,    Nehalem X5570,       Up
                       3.33GHz              2.93GHz
    Buffer pool 1G     13534.06 (user=35%)  19295.94 (user=32%)  +43%
    Buffer pool 2G     19026.64 (user=40%)  25627.49 (user=37%)  +35%
    Buffer pool 5G     30058.48 (user=50%)  39435.25 (user=50%)  +31%
    Buffer pool 30G    52582.71 (user=76%)  66053.68 (user=76%)  +26%

    • The TPM difference was much bigger than on HDD
    • For disk I/O bound workloads (buffer pool 1G/2G), CPU utilization on Nehalem was lower, yet TPM was much higher
      – Verified that Nehalem is much more efficient for PCI-E workloads
      – It benefits from the high interface speed between the CPU and PCI-Express
    • Fusion I/O fits Nehalem much better than traditional CPUs
  • We need to think about redundancy overhead
    • A single server with no RAID is meaningless in the real database world
    • Redundancy options
      – RAID 1 / 5 / 10
      – Network mirroring (DRBD)
      – Replication (sync / async)
    • The relative overhead of redundancy will be (much) higher than in a traditional HDD environment
  • Fusion I/O + software RAID1
    • Fusion I/O itself has a RAID5 feature
      – Parity bits are written into the flash memory
      – The flash chips are not a single point of failure
      – The controller / PCI-E board is a single point of failure
    • Right now no H/W RAID controller is available for PCI-E SSDs
    • Use software RAID1 (or RAID10) instead
      – Two Fusion I/O drives in the same machine (see the sketch below)
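    Setting this up is standard Linux md. A sketch, assuming the two drives appear as
    /dev/fioa and /dev/fiob (verify the device names on your system):

      # Mirror two PCI-E SSDs with software RAID1, then format with xfs
      mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/fioa /dev/fiob
      mkfs.xfs /dev/md0
      mount -o noatime /dev/md0 /ssd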
  • Understanding how software RAID1 works
    [Diagram: with H/W RAID1, the application's write returns from the controller's battery-backed write cache and the controller writes to both disks in the background; with S/W RAID1, the md0_raid1 daemon writes to both disks in parallel before responding]
    • The response time of software RAID1 is max(time to write to disk1, time to write to disk2)
    • If either of the two drives is busy with ERASE, the response time becomes longer
    • On faster storage / faster writes (e.g. sequential write + fsync), the relative overhead of the software RAID process is higher
  • Random write IOPS, S/W RAID1 vs. No-RAID
    [Chart: Random write IOPS over running time in minutes (Fusion I/O 160GB SLC, 16KB I/O unit, xfs): No-RAID vs. S/W RAID1, each at 120GB and 96GB data space]
    • 120GB data space = 40GB additional reserved space
    • 96GB data space = 64GB additional reserved space
    • With S/W RAID1, IOPS deteriorated more quickly than with No-RAID
    • With S/W RAID1 at 96GB data space, the slowest line was lower than No-RAID's
    • A 20-25% performance drop can be expected on disk-write-bound workloads
  • What about reads?
    [Chart: Read IOPS (16KB blocks) vs. concurrency (1-200), No-RAID vs. S/W RAID1]
    • Theoretically, read IOPS can be doubled by RAID1
    • Peak IOPS was 43,636 with No-RAID and 75,627 with RAID1: 73% up
    • Good scalability
  • DBT-2, No-RAID vs. S/W RAID on Fusion I/O

                       Fusion I/O+HDD   RAID1 Fusion I/O+HDD   %iowait   Down
    Buffer pool 1G     19295.94         15468.81               10%       -19.8%
    Buffer pool 2G     25627.49         21405.23               8%        -16.5%
    Buffer pool 5G     39435.25         35086.21               6-7%      -11.0%
    Buffer pool 30G    66053.68         66426.52               0-1%      +0.56%
  • Intel SSDs with a traditional H/W RAID controller

                       Single raw Intel   Four Intel, RAID5   Down
    Buffer pool 1G     5709.06            2975.04             -48%
    Buffer pool 2G     7536.55            4763.60             -37%
    Buffer pool 5G     12892.56           11739.27            -9%

    • Raw SSD drives performed much better than drives behind a traditional H/W RAID controller
      – Even with RAID10, performance was worse than a single raw drive
      – The H/W RAID controller appeared to be a serious bottleneck
      – Make sure the SSD drives themselves have a write cache and a capacitor (the Intel X25-V/M/E has no capacitor)
        • Use JBOD + write cache + capacitor
    • Research appliances such as Schooner, Gear6, etc.
    • Or wait until H/W vendors release H/W RAID controllers that work well with SSDs
  • What about DRBD?
    • A single server is not highly available
      – The motherboard, RAID controller, etc. are single points of failure
    • Heartbeat + DRBD + MySQL is one of the most common HA (active/passive) solutions
    • The network might become a bottleneck
      – 1GbE -> 10GbE, InfiniBand, Dolphin Interconnect, etc.
    • Replication level
      – Protocol A (async)
      – Protocol B (sync to the remote DRBD receiver process)
      – Protocol C (sync to the remote disk)
    • A network channel is single-threaded (see the resource sketch below)
      – Storing all data under /data (a single DRBD partition) => single thread
      – Storing logs/ibdata under /hdd and *.ibd files under /ssd => two threads
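    A minimal DRBD 8.3-style resource definition for the single-partition layout might
    look like this (hostnames, devices and addresses are placeholders):

      resource r0 {
        protocol B;              # sync to the remote receiver process
        on db1 {
          device    /dev/drbd0;
          disk      /dev/fioa;   # backing device
          address   10.0.0.1:7788;
          meta-disk internal;
        }
        on db2 {
          device    /dev/drbd0;
          disk      /dev/fioa;
          address   10.0.0.2:7788;
          meta-disk internal;
        }
      }

    The two-channel layout in the last bullet would use two such resources (say, r0
    backed by the HDD volume and r1 by the SSD volume), each with its own TCP port.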
  • DRBD overhead on HDD

    HDD                No DRBD    DRBD Protocol B,   DRBD Protocol B,
                                  1GbE               10GbE
    Buffer pool 1G     1125.44    1080.8             1101.63
    Buffer pool 2G     1863.19    1824.75            1811.95
    Buffer pool 5G     4385.18    4285.22            4326.22
    Buffer pool 30G    36784.76   32862.81           35689.67

    • DRBD 8.3.7
    • The DRBD overhead (protocol B) was not big on disk I/O bound workloads
    • The network bandwidth difference was not big on disk I/O bound workloads
  • DRBD overhead on Fusion I/O

    Fusion I/O+HDD     No DRBD    DRBD Protocol B,   Down     DRBD Protocol B,   Down
                                  1GbE                        10GbE
    Buffer pool 1G     19295.94   5976.18            -69.0%   12107.88           -37.3%
    Buffer pool 2G     25627.49   8100.5             -68.4%   16776.19           -34.5%
    Buffer pool 5G     39435.25   16073.9            -59.2%   30288.63           -23.2%
    Buffer pool 30G    66053.68   37974              -42.5%   62024.68           -6.1%

    • The DRBD overhead was not negligible
    • 10GbE performed much better than 1GbE
    • Still 6-10 times faster than HDD
    • Note: DRBD supports faster interconnects such as InfiniBand SDP and Dolphin Interconnect
  • Misc topic: insert performance, InnoDB vs. MyISAM (HDD)
    [Chart: seconds to insert 1 million records (HDD) vs. existing records (millions); MyISAM degrades to about 250 rows/s]
    • MyISAM doesn't do any special I/O optimization like insert buffering, so a lot of random reads/writes happen, and performance depends heavily on the OS
    • Disk seek & rotation overhead is really serious on HDD
  • Note: insert buffering (an InnoDB feature)
    • If non-unique secondary index blocks are not in memory, InnoDB inserts the entries into a special buffer (the "insert buffer") to avoid random disk I/O operations
      – The insert buffer is allocated both in memory and in the InnoDB SYSTEM tablespace
    • Periodically, the insert buffer is merged into the secondary index trees in the database ("merge")
    • Pros: reduced I/O overhead
      – Fewer disk I/O operations, because I/O requests to the same block are merged
      – Some random I/O operations become sequential
    • Cons: additional operations are added, and merging might take a very long time
      – When many secondary indexes must be updated and many rows have been inserted
      – It may continue even after a server shutdown and restart
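    The insert buffer's size and merge activity are visible in the standard InnoDB
    status output, for example:

      # The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section reports the
      # buffer's size, free list length, segment size and merge counts
      mysql -e "SHOW ENGINE INNODB STATUS\G" | grep -A 4 "INSERT BUFFER"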
  • Insert performance: InnoDB vs. MyISAM (SSD)
    [Chart: seconds to insert 1 million records (SSD) vs. existing records (millions), with rates annotated at about 2,000 rows/s and 5,000 rows/s; inflection points where the index size exceeded the buffer pool size and where the filesystem cache was fully used and disk reads began]
    • MyISAM got much faster just by replacing HDD with SSD!
  • Try MySQL 5.5.4!

    Fusion I/O + HDD   MySQL 5.5.2   MySQL 5.5.4   Up
    Buffer pool 1G     19295.94      24019.32      +24%
    Buffer pool 2G     25627.49      32325.76      +26%
    Buffer pool 5G     39435.25      47296.12      +20%
    Buffer pool 30G    66053.68      67253.45      +1.8%

    • 20-26% improvement for disk I/O bound workloads on Fusion I/O
      – Both CPU %user and %iowait improved
        • %user: 36% (5.5.2) to 44% (5.5.4) with a 2G buffer pool
        • %iowait: 8% (5.5.2) to 5.5% (5.5.4) with a 2G buffer pool, while IOPS was 20% higher
      – 5.5.4 could handle far more concurrent I/O requests!
      – No big difference was found on 4 HDDs
    • Works very well on faster storage such as Fusion I/O or many disks
  • Conclusions for choosing H/W
    • Disks
      – PCI-E SSDs (e.g. Fusion I/O) perform very well
      – SAS/SATA SSDs (e.g. Intel X25)
      – Carefully research the RAID controller; many controllers do not scale with SSD drives
      – Keep enough reserved space if you need to handle massive write traffic
      – HDD is good at sequential writes
    • Use a fast network adapter
      – 1GbE will be saturated under DRBD
      – Use 10GbE or InfiniBand
    • Use Nehalem CPUs
      – Especially when using PCI-Express SSDs
  • Conclusions for database deployments
    • Put sequentially written files on HDD
      – ibdata, ib_logfile, binary log files
      – HDD is fast enough for sequential writes
      – Write performance deterioration can be mitigated
      – The SSD's life expectancy will be longer
    • Put randomly accessed files on SSD
      – *.ibd files, index files (MYI), data files (MYD)
      – SSD is 10x-100x faster than HDD for random reads
    • Archive less active tables/records to HDD
      – SSD is still much more expensive than HDD
    • Use the InnoDB Plugin
      – Its higher scalability & concurrency matter on faster storage
  • What will happen in the real database world?
    • These are just my thoughts...
    • Less demand for NoSQL
      – Isn't it enough for many applications to just replace HDD with Fusion I/O?
      – The relative importance of functionality will grow
    • Stronger demand for virtualization
      – A single server will have enough capacity to run two or more mysqld instances
    • I/O volume matters
      – Not just IOPS
      – Block size, disabling doublewrite, etc.
    • Concurrency matters
      – A single SSD scales as well as 8-16 HDDs
      – Concurrent ALTER TABLE, parallel query
  • Special thanks to
    • Koji Watanabe – Fusion I/O Japan
    • Hideki Endo – Sumisho Computer Systems, Japan
      – Lent me two Fusion I/O 160GB SLC drives
    • Daisuke Homma, Masashi Hasegawa – Sun Japan
      – Ran the benchmarks together
  • Thanks for attending!
    • Contact:
      – E-mail: Yoshinori.Matsunobu@sun.com
      – Blog: http://yoshinorimatsunobu.blogspot.com
      – @matsunobu on Twitter
  • Copyright 2010 Sun Microsystems, Inc. – The World's Most Popular Open Source Database