Making the Most of SSD in Oracle 11g
 

Presentation given at IOUG Collaborate 2013



Presentation Transcript

  • Making the Most of Solid State Disk in Oracle. Guy Harrison, VP R&D Database Management. © 2012 Quest Software Inc. All rights reserved.
  • Making the Most of Solid State Disk in Oracle 11g. Guy Harrison, Executive Director, R&D, Business Intelligence Software
  • Introductions: www.guyharrison.net | guy.harrison@quest.com | http://twitter.com/guyharrison
  • Agenda
    − Brief history of magnetic disk
    − Solid State Disk (SSD) technologies
    − SSD internals
    − Oracle DB flash cache architecture
    − Performance comparisons
    − SSD in Exadata
    − Recommendations and suggestions
  • A brief history of disk
  • 5MB HDD, circa 1956
  • 28MB HDD, 1961 (1800 RPM)
  • The more that things change....
  • Moore's law
    − Transistor density doubles every 18 months
    − Exponential growth is observed in most electronic components: CPU clock speeds, RAM, hard disk drive storage density
    − But not in mechanical components:
      − Service time (seek latency): limited by actuator arm speed and disk circumference
      − Throughput (rotational latency): limited by speed of rotation, circumference, and data density
  • Solid State Disk
  • SSD to the rescue? Typical seek times (µs):
    − SSD DDR-RAM: 15
    − SSD PCI flash: 25
    − SSD SATA flash: 80
    − Magnetic disk: 4,000
  • Economics of SSD (approximate 2012 pricing):
    − FusionIO PCI SLC SSD: $53.44/GB, $0.06/IOP
    − FusionIO PCI MLC Duo SSD: $24.92/GB, $0.06/IOP
    − Intel SLC SATA SSD: $21.88/GB, $0.05/IOP
    − Intel MLC SATA SSD: $6.88/GB, $0.05/IOP
    − Seagate SAS HDD: $1.53/GB, $1.00/IOP
    − Seagate SATA HDD: $0.09/GB, $2.38/IOP
  • Tiered storage management (from highest $/GB and lowest $/IOP down):
    − Main memory
    − DDR SSD
    − Flash SSD
    − Fast disk (SAS, RAID 0+1)
    − Slow disk (SATA, RAID 5)
    − Tape, flat files, Hadoop
  • 12c Automatic Data Placement
    − Active: segment on an SSD tablespace
    − Frequent access: TIER TO a SAS tablespace AFTER 6 months of no access
    − Occasional access: OLTP compression on a SATA tablespace (COMPRESS FOR QUERY LOW AFTER 12 months of no access)
    − Dormant: archive compressed on RAID5 SATA
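The tiering steps above can be sketched as 12c ILM (ADO) policies. This is a hedged sketch: the table name sales and tablespace name tbs_sas are hypothetical, the exact availability of time-based TIER TO conditions varies by 12c release, and heat-map tracking (HEAT_MAP=ON) must be enabled for "no access" conditions to fire; check the syntax against the documentation for your version:

```sql
-- Tier the segment down to a SAS-based tablespace after 6 months of no access
ALTER TABLE sales ILM ADD POLICY
  TIER TO tbs_sas SEGMENT AFTER 6 MONTHS OF NO ACCESS;

-- Apply basic query compression after 12 months of no access
ALTER TABLE sales ILM ADD POLICY
  COMPRESS FOR QUERY LOW SEGMENT AFTER 12 MONTHS OF NO ACCESS;
```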
  • SSD technology and internals
  • Flavours of Flash SSD
    − DDR RAM drive
    − SATA flash drive
    − PCI flash drive
    − SSD storage server
  • PCI SSD vs SATA SSD
    − SATA was designed for traditional disk drives with high latencies
    − PCI is designed for high-speed devices
    − PCI SSD latency is roughly one third that of SATA SSD
  • 23
  • Dell Express Flash PCIe-SSD drives: "The power to do more"
    − Higher performance, durability, flexibility
    − Up to 1000x more IOPs than traditional HDD
    − Front loading, hot swappable
    − Maximum read and write lifespan
    − Improves workload processing; enhances virtual environments
  • PCIe SSD solution framework
    − Extender card (adapter form factor): frees up valuable PCIe slot real estate; one x16 slot supports four x4 PCIe SSDs
    − Backplane: modular and scalable, four drives per backplane
    − Storage device: HDD form factor, fits in a 2.5" carrier; SLC capacities of 175GB and 350GB
  • Flash SSD technology
    − Storage hierarchy: a cell holds one bit (SLC) or two bits (MLC); a page is typically 4K; a block is typically 128-512K
    − Writes: a read, or the first write to a page, requires a single page IO; overwriting a page requires an erase and rewrite of the entire block
    − Write endurance: about 100,000 erase cycles for SLC before failure; 5,000-10,000 erase cycles for MLC
  • Flash disk write degradation
    − All blocks empty: write time = 250 µs
    − 25% full: write time = (3/4 × 250 µs + 1/4 × 2,000 µs) ≈ 687 µs
    − 75% full: write time = (1/4 × 250 µs + 3/4 × 2,000 µs) ≈ 1,562 µs
  • Data insert (diagram): the SSD controller writes incoming data to empty pages in blocks drawn from the free block pool; blocks holding valid data pages sit in the used block pool
  • Data update (diagram): the controller writes updated pages to fresh blocks from the free block pool and marks the superseded pages in the used block pool as invalid
  • Garbage collection (diagram): the controller copies remaining valid pages out of partly invalid blocks, erases those blocks, and returns them to the free block pool
  • 11g DB flash cache
  • Oracle DB flash cache
    − Introduced in 11gR2, for OEL and Solaris only
    − A secondary cache maintained by the DBWR, but only when idle cycles permit
    − The architecture is tolerant of poor flash write performance
  • Buffer cache and free buffer waits (diagram): the Oracle process reads from and writes to the buffer cache, reading blocks from the database files as needed, while the DBWR writes dirty blocks back to disk; free buffer waits often occur when blocks are read in much faster than the DBWR can write dirty blocks out
  • Flash cache architecture (diagram): the Oracle process reads from the buffer cache, the flash cache, or disk, and writes to the buffer cache; the DBWR writes dirty blocks to the database files and, time permitting, writes clean blocks to the flash cache; the DB flash cache architecture is designed to accelerate buffered reads
  • Configuration
    − Create a filesystem on the flash device
    − Set DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE
    − Consider FILESYSTEMIO_OPTIONS=SETALL
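A minimal configuration sketch for the steps above, assuming the flash filesystem is mounted at the hypothetical path /flash and the instance uses an spfile (both flash cache parameters are static, so an instance restart is needed):

```sql
-- Back the flash cache with a file on the SSD filesystem
ALTER SYSTEM SET db_flash_cache_file = '/flash/flash_cache.dat' SCOPE=SPFILE;
-- Size the cache (often a multiple of the buffer cache size)
ALTER SYSTEM SET db_flash_cache_size = 32G SCOPE=SPFILE;
-- Enable direct and asynchronous IO against filesystem files
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE=SPFILE;
```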
  • Flash KEEP pool: you can prioritise blocks for important objects using the FLASH_CACHE storage clause.
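For example, the FLASH_CACHE storage clause accepts KEEP, DEFAULT, or NONE; the object names below are hypothetical:

```sql
-- Retain this table's blocks in the flash cache preferentially
ALTER TABLE orders STORAGE (FLASH_CACHE KEEP);

-- Prevent a less critical index from consuming flash cache space
ALTER INDEX orders_hist_ix STORAGE (FLASH_CACHE NONE);
```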
  • Oracle DB flash cache statistics: http://guyharrison.squarespace.com/storage/flash_insert_stats.sql
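Alongside that script, flash cache activity can be pulled straight from the standard statistics views; a quick sketch (exact statistic names may vary slightly between releases):

```sql
-- Flash-cache-related instance statistics
SELECT name, value
  FROM v$sysstat
 WHERE name LIKE 'flash cache%'
 ORDER BY name;
```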
  • Flash Cache Efficiency http://guyharrison.squarespace.com/storage/flash_time_savings.sql
  • Flash cache contents http://guyharrison.squarespace.com/storage/flashContents.sql
  • Performance tests
  • Test systems
    − Low-end system: Dell Optiplex, dual core, 4GB RAM; 2x Seagate 7200RPM Barracuda SATA HDD; Intel X-25E SLC SATA SSD
    − Higher-end system: Dell R510, 2x quad core, 32GB RAM; 4x 300GB 15K RPM 6Gbps Dell SAS HDD; 1x FusionIO ioDrive SLC PCI SSD
  • Buffer cache bottlenecks
    − The flash cache architecture avoids 'free buffer waits' due to flash IO, but write complete waits can still occur on hot blocks
    − Free buffer waits are still likely against the database files, due to the higher physical read rates the flash cache makes possible
  • Sorting, what we expect (diagram): as available PGA memory grows, elapsed time steps down from multi-pass disk sort to single-pass disk sort to in-memory sort; time is made up of table/index IO, CPU time, and temp segment IO
  • Disk sorts, temporary tablespace (chart): elapsed time (up to ~4,000 s) against sort area size for a SAS-based vs an SSD-based temporary tablespace, across the single-pass and multi-pass disk sort regions
  • Redo performance, Fusion IO (single stream): flash-based redo log 291.93 s vs SAS-based redo log 292.39 s elapsed (CPU plus log IO); essentially no difference for a single redo stream
  • Concurrent redo workload (x10): flash-based redo log 1,637 s CPU, 331 s other, 1,681 s log file IO; SAS-based redo log 1,605 s CPU, 397 s other, 1,944 s log file IO
  • Write degradation
    − In theory, high sustained write IO can lead to SSD degradation when garbage collection fails to keep up with the block erase/update cycle
    − In practice, this is rarely noticeable from Oracle: Oracle write IO is largely asynchronous (DBWR); almost all write activity is accompanied by at least an equal amount of read activity; and garbage collection and wear-levelling algorithms are sophisticated in decent SSD drives
  • OS-level direct cache (diagram): the ioMemory VSL presents the SSD as a block device beneath the filesystem/raw devices/ASM, acting as a read cache in front of a read-intensive, potentially massive LUN (limited to the size of the SSD); candidate uses include temp tablespace caching, hot segments, hot partitions, and the DB flash cache
  • Exadata
  • Exadata flash storage
    − 4x 96GB PCI flash cards on each storage server (X2)
    − Flash can be configured as Exadata Smart Flash Cache (ESFC) or as solid state disk available to ASM disk groups
    − ESFC is not the same as the DB flash cache: it is maintained by cellsrv, not the DBWR; it DOES support full table scans; it DOES NOT support smart scans unless CELL_FLASH_CACHE=KEEP; statistics are accessed via the cellcli program
    − Considerations for cache vs. SSD are similar
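For example, ESFC configuration and metrics can be inspected from each storage cell; a hedged sketch of a cellcli session (the database name ORCL is hypothetical, and real output lists many more attributes):

```
CellCLI> LIST FLASHCACHE DETAIL
CellCLI> LIST FLASHCACHECONTENT WHERE dbUniqueName = 'ORCL'
CellCLI> LIST METRICCURRENT WHERE objectType = 'FLASHCACHE'
```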
  • Exadata Smart Flash Log
  • Smart Flash Log
    − Designed to reduce "outlier" redo log sync waits
    − Redo is written simultaneously to disk and flash by cellsrv; the first write to complete wins
    − Introduced in Exadata storage software 11.2.2.4
  • All redo log writes (16M log writes), log file sync times in µs:
    − Smart Flash Log ON: min 1.0, median 650, mean 723, 99th percentile 1,656, max 75,740
    − Smart Flash Log OFF: min 1.0, median 627, mean 878, 99th percentile 4,662, max 291,800
  • Redo log outliers (trace excerpt; ela times in µs):
    WAIT #47124064145648: nam=log file sync ela= 710 buffer#=129938 sync scn=1266588258 p3=0 obj#=-1 tim=1347583167579790
    WAIT #47124064145648: nam=log file sync ela= 733 buffer#=130039 sync scn=1266588297 p3=0 obj#=-1 tim=1347583167580808
    [... further waits with ela between 507 and 2394 ...]
    WAIT #47124064145648: nam=log file sync ela= 291780 buffer#=102074 sync scn=1266588637 p3=0 obj#=-1 tim=1347583167884090
    [... further waits with ela between 639 and 957 ...]
    WAIT #47124064145648: nam=log file sync ela= 819 buffer#=102647 sync scn=1266588886 p3=0 obj#=-1 tim=1347583167890829
  • Top 10,000 waits
  • Exadata 12c Smart Flash Cache write-back
    − Database writes go to the flash cache, with LRU aging to magnetic disk; reads are serviced from flash prior to age-out
    − Similar restrictions to the flash cache (smart scans, etc.)
    − Performance claims: 1M random single-block write IOPS and 1.5M random single-block read IOPS on an X3 system; 500K write IOPS claimed for Exadata V2
    − Note: random write IO is less problematic for flash than sequential writes
  • Summary
  • Recommendations
    − Don't wait for SSD to become as cheap as HDD: magnetic HDD will always be cheaper per GB, SSD cheaper per IO, and Oracle's Exadata strategy suggests that the era of SSD for Oracle is here
    − Consider a mixed or tiered storage strategy, using the DB flash cache and selective SSD tablespaces or partitions; use SSD where your IO bottleneck is greatest and the SSD advantage is significant; Oracle 12c ILM offers a natural solution for data tablespaces
    − The DB flash cache offers an easy way to leverage SSD for OLTP workloads, but has fewer advantages for OLAP or data warehouse workloads
  • How to use SSD
    − Database flash cache: if your bottleneck is single-block (indexed) reads and you are on 11gR2 on OEL or Solaris
    − Flash tablespace: optimize reads and writes against "hot" segments or partitions
    − Flash temp tablespace: if multi-pass disk sorts or hash joins are your bottleneck
    − Exadata (especially X3): provides most of the advantages without much setup
    − 12c ILM: should provide an ideal solution for DIY tiering
  • References
    − Latest version of this presentation: http://www.slideshare.net/gharriso/ssd-and-the-db-flash-cache
    − Quest whitepaper: http://www.quest.com/documents/landing.aspx?id=15423
    − Guy's SSD guide: http://guyharrison.squarespace.com/ssdguide/
  • Thank you: guy.harrison@quest.com | www.guyharrison.net | @guyharrison