Database & Technology 1 _ Guy Harrison _ Making the most of SSD in Oracle11g.pdf
Presentation Transcript

  • Making the most of Solid State Disk in Oracle 11g. Guy Harrison, Director, R&D Melbourne. Email: guy.harrison@quest.com. Twitter: @guyharrison. Web: http://www.guyharrison.net. ©2011 Quest Software, Inc. All rights reserved.
  • Introductions
  • Star Trek shirt fatality analysis [bar chart: fatality percentage by shirt colour (Red, Yellow, Blue); x-axis Pct, 0 to 80]
  • Agenda
    • Brief history of magnetic disk
    • Solid State Disk (SSD) technologies
    • SSD internals
    • Oracle DB flash cache architecture
    • Performance comparisons
    • Recommendations and suggestions
  • A brief history of disk
  • 5MB HDD circa 1956
  • 28MB HDD, 1961, 1800 RPM
  • The more that things change....
  • Moore’s law
    • Transistor density doubles every 18 months
    • Exponential growth is observed in most electronic components:
      • CPU clock speeds
      • RAM
      • Hard disk drive storage density
    • But not in mechanical components:
      • Service time (seek latency): limited by actuator arm speed and disk circumference
      • Throughput (rotational latency): limited by speed of rotation, circumference and data density
  • Disk trends 2001-2009 [bar chart: percentage change for IO Rate, Disk Capacity, IO/Capacity, CPU and IO/CPU; y-axis -1,000 to 2,000; data labels in extraction order: 260, 1,635, 1,013, -630, -390]
  • Solid State Disk
  • SSD to the rescue? Seek time (us):
    • SSD DDR-RAM: 15
    • SSD PCI flash: 25
    • SSD SATA flash: 80
    • Magnetic disk: 4,000
  • Power consumption [bar chart: Flash SSD vs SATA HDD at start up, seek and idle; Watts on a logarithmic scale, 0.01 to 100; data labels in extraction order: 20, 0.15, 10, 0.08, 8]
  • Economics of SSD ($/GB and $/IOP):
    • FusionIO PCI SLC SSD: 53.44 $/GB, 0.06 $/IOP
    • FusionIO PCI MLC Duo SSD: 24.92 $/GB, 0.06 $/IOP
    • Intel SLC SATA SSD: 21.88 $/GB, 0.05 $/IOP
    • Intel MLC SATA SSD: 6.88 $/GB, 0.05 $/IOP
    • Seagate SAS HDD: 1.00 $/GB, 1.53 $/IOP
    • Seagate SATA HDD: 0.09 $/GB, 2.38 $/IOP
  • Tiered storage management [pyramid diagram with $/GB and $/IOP axes; tiers from top to bottom]
    • Main memory
    • DDR SSD
    • Flash SSD
    • Fast disk (SAS, RAID 0+1)
    • Slow disk (SATA, RAID 5)
    • Tape, flat files, Hadoop
  • SSD technology and internals
  • Flavours of Flash SSD: DDR RAM drive; SATA flash drive; PCI flash drive; SSD storage server
  • PCI SSD vs SATA SSD
    • SATA was designed for traditional disk drives with high latencies
    • PCI is designed for high speed devices
    • PCI SSD has latency about one third that of SATA
  • Booth 1107
  • Flash SSD Technology
    • Storage hierarchy:
      • Cell: one (SLC) or two (MLC) bits
      • Page: typically 4K
      • Block: typically 128-512K
    • Writes:
      • Read and first write require a single page IO
      • Overwriting a page requires an erase and overwrite of the block
    • Write endurance:
      • 100,000 erase cycles for SLC before failure
      • 5,000 to 10,000 erase cycles for MLC
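As a rough illustration of the storage hierarchy on this slide, the Python sketch below works out how many 4K pages sit in one erase block and how much larger a whole-block rewrite is than the single page being overwritten. The function names are illustrative, and the model assumes a naive in-place overwrite with no controller remapping or garbage collection, so the figures are worst-case arithmetic rather than measured behaviour.

```python
def pages_per_block(block_kb, page_kb=4):
    """Number of pages in one erase block (sizes in KB)."""
    if block_kb % page_kb != 0:
        raise ValueError("block size must be a multiple of the page size")
    return block_kb // page_kb


def worst_case_write_amplification(block_kb, page_kb=4):
    """Ratio of data rewritten to data changed when overwriting one page
    forces an erase and rewrite of the whole block (naive model)."""
    return block_kb / page_kb


if __name__ == "__main__":
    # Block sizes in the 128-512K range quoted on the slide.
    for block_kb in (128, 256, 512):
        print(block_kb, "KB block holds", pages_per_block(block_kb), "pages")
```

Under this simple model the two quantities are numerically equal: the more pages a block holds, the more data a single-page overwrite drags along with it.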
  • Flash SSD performance (microseconds):
    • Read (4K page seek): 25
    • First insert (4K page write): 250
    • Update (256K block erase): 2000
  • Flash disk write degradation [diagram: empty vs partially full blocks]
    • All blocks empty: write time = 250 us
    • 25% part full: write time = (¾ * 250 us + ¼ * 2000 us) ≈ 687 us
    • 75% part full: write time = (¼ * 250 us + ¾ * 2000 us) ≈ 1562 us
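The write-degradation arithmetic on this slide is a weighted average of the two write costs. A minimal Python sketch follows, assuming the slide's figures of 250 us for a first page write and 2000 us for a write that needs a block erase. The function name and the simplification that the "part full" fraction equals the share of writes needing an erase are assumptions made for illustration.

```python
def expected_write_us(fraction_needing_erase,
                      page_write_us=250.0,
                      block_erase_us=2000.0):
    """Average write time when a given fraction of writes hits a block
    that must be erased first and the rest land on empty pages."""
    if not 0.0 <= fraction_needing_erase <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    fast_share = 1.0 - fraction_needing_erase
    return fast_share * page_write_us + fraction_needing_erase * block_erase_us


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.75):
        print(fraction, expected_write_us(fraction))
```

The unrounded results for 25% and 75% are 687.5 us and 1562.5 us, which the slide shows truncated to whole microseconds.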
  • Data insert [diagram: an insert arrives at the SSD controller, which manages a free block pool and a used block pool; page states: empty, valid, invalid]
  • Data update [diagram: an update arrives at the SSD controller, which manages a free block pool and a used block pool; page states: empty, valid, invalid]
  • Garbage collection [diagram: the SSD controller with the free block pool and the used block pool; page states: empty, valid, invalid]
  • 11g DB flash cache
  • Oracle DB flash cache
    • Introduced in 11gR2 for OEL and Solaris only
    • Secondary cache maintained by the DBWR, but only when idle cycles permit
    • Architecture is tolerant of poor flash write performance
  • Buffer cache and free buffer waits [diagram: the Oracle process reads from and writes to the buffer cache and reads from disk (database files); DBWR writes dirty blocks to disk]. Free buffer waits often occur when reads are much faster than writes.
  • Flash cache [diagram: the Oracle process reads from and writes to the buffer cache, reads from the flash cache, and reads from disk (database files); DBWR writes clean blocks to the flash cache (time permitting) and dirty blocks to disk]. The DB flash cache architecture is designed to accelerate buffered reads.
  • Configuration
    • Create a filesystem from the flash device
    • Set DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE
    • Consider FILESYSTEMIO_OPTIONS=SETALL
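The configuration steps on this slide could be scripted roughly as follows. This Python sketch only renders the ALTER SYSTEM statements as text and does not connect to a database. The file path, the cache size and the choice of SCOPE=SPFILE (which implies an instance restart) are assumptions for illustration, not values from the presentation.

```python
def flash_cache_statements(cache_file, cache_size, set_io_options=True):
    """Render ALTER SYSTEM statements for the DB flash cache parameters.

    cache_file and cache_size are caller-supplied; the values used in the
    example below are hypothetical.
    """
    statements = [
        f"ALTER SYSTEM SET db_flash_cache_file='{cache_file}' SCOPE=SPFILE;",
        f"ALTER SYSTEM SET db_flash_cache_size={cache_size} SCOPE=SPFILE;",
    ]
    if set_io_options:
        # Optional, per the slide's "consider" wording.
        statements.append(
            "ALTER SYSTEM SET filesystemio_options=SETALL SCOPE=SPFILE;"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical mount point and size for a flash-backed filesystem.
    for stmt in flash_cache_statements("/flash/oracle_flash_cache.dat", "16G"):
        print(stmt)
```

Generating the statements rather than executing them keeps the sketch runnable anywhere and leaves the decision to apply them with the DBA.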
  • Flash KEEP pool
    • You can prioritise blocks for important objects using the FLASH_CACHE clause
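The slide's own example is not captured in this transcript. As a stand-in, the Python sketch below renders an ALTER TABLE statement that uses the FLASH_CACHE storage clause with one of its three settings (KEEP, NONE, DEFAULT). The helper name and the table name in the example are hypothetical.

```python
VALID_SETTINGS = ("KEEP", "NONE", "DEFAULT")


def flash_cache_clause(table_name, setting="KEEP"):
    """Render an ALTER TABLE statement that sets the FLASH_CACHE
    storage attribute for one table."""
    setting = setting.upper()
    if setting not in VALID_SETTINGS:
        raise ValueError(f"setting must be one of {VALID_SETTINGS}")
    return f"ALTER TABLE {table_name} STORAGE (FLASH_CACHE {setting});"


if __name__ == "__main__":
    # 'sales_hot' is a made-up table name used only for illustration.
    print(flash_cache_clause("sales_hot"))
```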
  • Oracle DB flash cache statistics: http://guyharrison.squarespace.com/storage/flash_insert_stats.sql
  • Flash cache efficiency: http://guyharrison.squarespace.com/storage/flash_time_savings.sql
  • Flash cache contents: http://guyharrison.squarespace.com/storage/flashContents.sql
  • Performance tests
  • Test systems
    • Low end system:
      • Dell Optiplex dual-core, 4GB RAM
      • 2 x Seagate 7500RPM Barracuda SATA HDD
      • Intel X-25E SLC SATA SSD
    • Higher end system:
      • Dell R510, 2 x quad core, 32GB RAM
      • 4 x 300GB 15K RPM, 6Gbps Dell SAS HDD
      • 1 x FusionIO ioDrive SLC PCI SSD
  • Performance: indexed reads (X-25), elapsed (s) [stacked bar chart; components: CPU, db file IO, flash cache IO, Other]
    • Flash tablespace: 48.17
    • Flash cache: 143.27
    • No flash: 529.7
  • Performance: read/write (X-25), elapsed time (s) [stacked bar chart; components: CPU, db file IO, write complete, free buffer, flash cache IO, Other]
    • Flash tablespace: 200
    • Flash cache: 1,693
    • No flash: 3,289
  • Random reads, FusionIO, elapsed time (s) [stacked bar chart; components: CPU, Other, DB file IO, flash cache IO]
    • Table on SSD: 121
    • SAS disk, flash cache: 583
    • SAS disk, no flash cache: 2,211
  • Updates, FusionIO, elapsed time (s) [stacked bar chart; components: DB CPU, db file IO, log file IO, flash cache, free buffer waits, Other]
    • Table on SSD: 529
    • SAS disk, flash cache: 1,934
    • SAS disk, no flash cache: 6,219
  • Full table scan, FusionIO, elapsed time (s) [stacked bar chart; components: CPU, Other, DB file IO, flash cache IO]
    • Table on SSD: 72
    • SAS disk, flash cache: 398
    • SAS disk, no flash cache: 418
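To compare the FusionIO results on a common scale, the Python sketch below turns the elapsed times from the three FusionIO charts into speedup factors relative to the "SAS disk, no flash cache" baseline. The dictionary keys and function names are illustrative; the numbers are the elapsed seconds quoted on the slides.

```python
# Elapsed times in seconds, copied from the FusionIO slides.
FUSIONIO_ELAPSED_S = {
    "random reads": {"no flash cache": 2211, "flash cache": 583, "table on SSD": 121},
    "updates": {"no flash cache": 6219, "flash cache": 1934, "table on SSD": 529},
    "full table scan": {"no flash cache": 418, "flash cache": 398, "table on SSD": 72},
}


def speedup(baseline_s, improved_s):
    """How many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


def speedups_vs_no_flash(results):
    """Speedup of each flash configuration over the no-flash baseline,
    rounded to two decimal places."""
    table = {}
    for workload, times in results.items():
        baseline = times["no flash cache"]
        table[workload] = {
            config: round(speedup(baseline, elapsed), 2)
            for config, elapsed in times.items()
            if config != "no flash cache"
        }
    return table


if __name__ == "__main__":
    for workload, row in speedups_vs_no_flash(FUSIONIO_ELAPSED_S).items():
        print(workload, row)
```

On these figures the flash cache barely moves the full table scan (about 1.05x), while placing the table directly on SSD helps all three workloads.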
  • Sorting: what we expect [diagram: time against PGA memory available (MB), with regions for multi-pass disk sort, single-pass disk sort and memory sort; components: table/index IO, CPU time, temp segment IO]
  • Disk sorts: temporary tablespace [line chart: elapsed time (s), 0 to 4000, against sort area size, 300 down to 0, for SAS based and SSD based temporary tablespaces (TTS); regions marked single-pass and multi-pass disk sort]
  • Redo performance, Fusion IO, elapsed time (s) [bar chart; components: CPU, Log IO]
    • SAS based redo log: 291.93
    • Flash based redo log: 292.39
  • Concurrent redo workload (x10), elapsed time (s) [stacked bar chart; data labels in order CPU, Other, Log file IO]
    • Flash based redo log: 1,637; 331; 1,681
    • SAS based redo log: 1,605; 397; 1,944
  • Buffer cache bottlenecks
    • Flash cache architecture avoids ‘free buffer waits’ due to flash IO, but write complete waits can still occur on hot blocks
    • Free buffer waits are still likely against the database files, due to high physical read rates created by the flash cache
  • Write degradation
    • In theory, high sustained write IO can lead to SSD degradation when garbage collection (GC) fails to cope with the block erase/update cycle
    • In practice, this is rarely noticeable from Oracle:
      • Oracle write IO is largely asynchronous (DBWR)
      • Almost all write activity has at least an equal amount of read activity
      • Garbage collection and wear levelling algorithms are sophisticated in decent SSD drives
  • Fusion IO direct cache [diagram: two storage stacks, each under File System / Raw Devices / ASM]
    • Regular block device (on ioMemory VSL), limited to the size of the SSD: temp tablespace, hot segments, hot partitions, DB flash cache
    • Caching block device (directCache on ioMemory VSL, in front of a LUN): read-intensive, potentially massive tablespaces
  • Fusion IO direct cache: table scans, elapsed time (s) [stacked bar chart; components: CPU, IO, Other]
    • direct cache on 2nd scan: 36
    • direct cache on 1st scan: 147
    • No cache, 2nd scan: 147
    • No cache, 1st scan: 147
  • Exadata
  • Exadata flash storage
    • 4 x 96GB PCI flash drives on each storage server
    • Flash can be configured as:
      • Exadata Smart Flash Cache (ESFC)
      • Solid State Disk available to ASM disk groups
    • ESFC is not the same as the DB flash cache:
      • Maintained by cellsrv, not DBWR
      • DOES support full table scans
      • DOES NOT support smart scans unless CELL_FLASH_CACHE=KEEP
      • Statistics accessed via the cellcli program
    • Considerations for cache vs SSD may be similar
  • Summary
  • Recommendations
    • Don’t wait for SSD to become as cheap as HDD
      • Magnetic HDD will always be cheaper per GB, SSD cheaper per IO
    • Consider a mixed or tiered storage strategy
      • Using DB flash cache, selective SSD tablespaces or partitions
      • Use SSD where your IO bottleneck is greatest and the SSD advantage is significant
    • DB flash cache offers an easy way to leverage SSD for OLTP workloads, but has few advantages for OLAP or data warehouse workloads
  • How to use SSD
    • Database flash cache
      • If your bottleneck is single block (indexed) reads and you are on OEL or Solaris 11gR2
    • Flash tablespace
      • Optimize reads/writes against “hot” segments or partitions
    • Flash temp tablespace
      • If multi-pass disk sorts or hash joins are your bottleneck
    • FusionIO direct cache
      • If you want to optimize both scans and index reads OR you are not on OEL/Solaris 11gR2
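The guidance on this slide can be read as a small decision table. The Python sketch below is one possible encoding of it. The bottleneck labels and the function name are invented for illustration, and the rules are a simplification of the slide rather than a complete sizing method.

```python
def recommend_ssd_use(bottleneck, db_flash_cache_supported):
    """Map an IO bottleneck to the SSD option suggested on the slide.

    bottleneck: one of the illustrative labels handled below.
    db_flash_cache_supported: True when running 11gR2 on OEL or Solaris.
    """
    if bottleneck == "single_block_reads":
        # The DB flash cache is only available on OEL or Solaris 11gR2.
        if db_flash_cache_supported:
            return "Database flash cache"
        return "FusionIO direct cache"
    if bottleneck == "hot_segments":
        return "Flash tablespace"
    if bottleneck == "multipass_sorts_or_hash_joins":
        return "Flash temp tablespace"
    if bottleneck == "scans_and_index_reads":
        return "FusionIO direct cache"
    raise ValueError(f"unknown bottleneck: {bottleneck}")


if __name__ == "__main__":
    print(recommend_ssd_use("single_block_reads", db_flash_cache_supported=True))
    print(recommend_ssd_use("multipass_sorts_or_hash_joins", False))
```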
  • References
    • Latest version of this presentation: http://www.slideshare.net/gharriso/ssd-and-the-db-flash-cache
    • Guy Harrison blog (guyharrison.net) postings:
      • All blog posts: http://guyharrison.squarespace.com/blog/tag/ssd
      • SSD guide (work in progress): http://guyharrison.squarespace.com/ssdguide/
    • Kevin Closson: http://kevinclosson.wordpress.com/2009/12/15/pardon-me-where-is-that-flash-cache-part-ii/
    • General articles on SSD:
      • http://www.anandtech.com/storage/showdoc.aspx?i=3631
      • http://en.wikipedia.org/wiki/Flash_memory
      • http://www.virident.com/downloads/Virident_Sustained_Performance_Whitepaper.pdf