VMworld 2013: Extreme Performance Series: Storage in a Flash
VMworld 2013

Sankaran Sivathanu, VMware
Mark Achtemichuk, VMware

Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare

Published in: Technology

Transcript

  • 1. Extreme Performance Series: Storage in a Flash
      - Sankaran Sivathanu, VMware
      - Mark Achtemichuk, VMware
      - Session VSVC5603 (#VSVC5603)
  • 2. Flash Storage
      - Flash is everywhere
      - Used extensively in smartphones, tablets, laptop computers, storage arrays, etc.
      - Adopting flash in enterprise servers?
          • Presents an economical alternative to having a storage array
      - How does VMware embrace flash technology in vSphere 5.5?
          • Native support for provisioning of flash resources
          • Flash caching support in ESXi storage stack
          • vSAN leverages flash storage for high performance
      - Focus: Application performance on vSphere 5.5 when leveraging flash
  • 3. Agenda
      - vSphere Flash Read Cache (vFRC)
      - Virtual SAN (vSAN)
      - Configurations and Troubleshooting
  • 4. vFRC Overview
  • 5. vFRC – Overview
      - vSphere 5.5 introduces the vSphere Flash Infrastructure layer
          • Aggregates flash storage devices into a unified flash resource
          • Supports locally connected flash devices (PCIe, SAS/SATA drives, etc.)
      - Flash resource can be used for read caching of VM I/Os
          • vSphere Flash Read Cache (vFRC)
      - Write Policy
          • Write-through: write I/Os are written to persistent storage and vFRC simultaneously
          • Large writes are filtered, which avoids cache pollution with log/streaming data
      - Caches configured on a per-VMDK basis
          • Can be custom configured based on workload
  • 6. vFRC – Overview
      - [Diagram labels: Flash Read Cache, VMDKs; VM Layer, ESX Layer, Storage Layer]
  • 7. vFRC Tunables
  • 8. Performance Tunables in vFRC
      - What workloads can benefit from vFRC?
          • Read-dominated I/O pattern
          • High repeated access of data (e.g. 20% of working set accessed 80% of time)
          • Sufficient flash capacity to hold data that is accessed repeatedly
      - What impacts vFRC performance?
          • Cache Size: should be big enough to hold active working set of workload
          • Cache Block Size: should match the dominant I/O size of workload
          • Flash Device Types: PCIe flash cards vs. SSD drives
  • 9. a. Cache Size
      - Cache sizes are specified manually when enabling vFRC for a VMDK
          • Depends on working set of the application
          • Should be sized to hold active working set
      - Inadequate cache sizes lead to increased cache miss rate
      - Over-sized cache wastes flash resources and leads to suboptimal performance during vMotion
          • By default cache is migrated during vMotion
          • Over-sized cache increases vMotion time
      - How to determine the right working set size?
          • vscsiStats workload tracing
      - Cache size can be modified at run-time if necessary
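The working-set estimate mentioned on this slide can be sketched as follows. This is a minimal illustration, not the tooling used in the session: it assumes the I/O trace (for example one collected with vscsiStats) has already been parsed into `(lba, num_sectors, is_read)` tuples with 512-byte sectors, and it counts only reads, which is a simplification. The function name and the sample trace are hypothetical.

```python
from typing import Iterable, Tuple

SECTOR_BYTES = 512  # assumption: trace offsets and lengths are in 512-byte sectors


def working_set_bytes(trace: Iterable[Tuple[int, int, bool]],
                      block_bytes: int = 8 * 1024) -> int:
    """Estimate the read working set as the number of distinct
    block-sized regions touched by read I/Os, times the block size."""
    touched = set()
    for lba, num_sectors, is_read in trace:
        if not is_read or num_sectors <= 0:
            continue  # simplification: only reads are counted
        start = lba * SECTOR_BYTES
        end = start + num_sectors * SECTOR_BYTES  # exclusive end offset
        first_block = start // block_bytes
        last_block = (end - 1) // block_bytes
        touched.update(range(first_block, last_block + 1))
    return len(touched) * block_bytes


# Hypothetical trace: (starting LBA, length in sectors, is_read)
sample_trace = [
    (0, 16, True),     # 8 KB read at offset 0, block 0
    (0, 16, True),     # repeated access to the same block
    (16, 16, True),    # next 8 KB, block 1
    (1024, 8, False),  # write, ignored by this estimate
    (24, 16, True),    # 8 KB read straddling blocks 1 and 2
]
print(working_set_bytes(sample_trace))
```

Running the estimate over successive time windows of a trace, rather than the whole trace, would give a closer approximation of the active working set that the cache needs to hold.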
  • 10. b. Cache Block Size
      - Basic unit of cache fill and cache eviction operation
      - Affects effective utilization of cache capacity
      - Bigger cache blocks lead to internal fragmentation, but consume less memory
      - Smaller cache blocks consume more memory (up to 2% memory space overhead)
      - Default cache block size is 8KB
      - [Chart: Memory Overhead wrt. vFRC block size; x-axis: Cache Block Size (KB), 4 to 512; y-axis: Memory Consumed / Cache Size (%), 0 to 2.5]
      - [Chart: Performance Impact of Cache Block Size; x-axis: Cache Block Size, 4KB to 512KB, plus Baseline (no vFRC); y-axis: Latency in Microseconds, 0 to 1500]
  • 11. b. Cache Block Size
      - Larger Cache Block Size (Example: 512KB cache block size for workload I/O size of 8KB)
          • Internal Fragmentation
      - [Diagram: vFRC cache blocks, each holding only a small region of valid cached data]
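The internal fragmentation in the 512KB example above can be put into numbers. The sketch below assumes that each I/O is block-aligned and lands in cache blocks of its own (the worst case for a random workload); the function name is hypothetical.

```python
import math


def cache_block_utilization(io_bytes: int, block_bytes: int) -> float:
    """Fraction of the occupied cache blocks that holds valid data
    when one aligned I/O of io_bytes is cached in block_bytes units."""
    if io_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks_used = math.ceil(io_bytes / block_bytes)
    return io_bytes / (blocks_used * block_bytes)


KB = 1024
for block_kb in (4, 8, 64, 512):
    share = cache_block_utilization(8 * KB, block_kb * KB)
    print(f"{block_kb:>3} KB blocks, 8 KB I/O: {share:.1%} of occupied cache is valid data")
```

Under these assumptions an 8KB I/O fills a 4KB or 8KB block layout completely, while with 512KB blocks most of each occupied block holds no valid data.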
  • 12. c. Flash Device Type (vendor specifications)
      - Low cost
          • 30k – 40k random read IOPS
          • 200 – 270 MB/s read bandwidth
          • Read latency: 75 microseconds
          • Write latency: 90 microseconds
      - High performance
          • Up to 750k random read IOPS
          • Up to 3 GB/s read bandwidth
          • Read latency: 75 microseconds
          • Write latency: 15 microseconds
  • 13. vFRC Performance
  • 14. vFRC Performance – Applications
      - What workloads can benefit from vFRC?
          • Read-dominated I/O pattern
          • High repeated access of data (e.g. 20% of working set accessed 80% of time)
          • Sufficient flash capacity to hold data that is accessed repeatedly
      - Applications Considered:
          • Data Warehousing (Swingbench DSS)
          • Database Transactions (DVDstore)
          • Real-World Enterprise Server Workloads (publicly available I/O traces)
  • 15. 1. Data Warehousing Application
      - Decision Support System [TPC-H]
      - Benchmark: Swingbench 2.4 using ‘Sales History’ schema on Oracle 11g R2 database
      - [Diagram: Swingbench DSS benchmark on a RHEL 6.4 VM sends queries to, and receives results from, Oracle 11g R2 on a Windows 2008 Server VM with vFRC; storage is an EMC VNX 5700, 1TB LUN, RAID5 over 5 FC 15k RPM HDDs]
  • 16. 1. Data Warehousing Application
      - Workload: Read dominated, high re-access rate
      - vFRC Configuration: 8GB cache size and 8KB cache block size
      - [Chart: Transaction Count; x-axis: Transaction Type (SRMC, SCMC, PSCR, SMA, PPSC, TSQ, SQC); y-axis: # of transactions, 0 to 12000; series: Baseline, vFRC]
  • 17. 1. Data Warehousing Application
      - Workload: Read dominated, high re-access rate
      - vFRC Configuration: 8GB cache size and 8KB cache block size
      - Up to 84% improvement in average throughput
      - Up to 2X reduction in latency
      - [Chart: Transactions Per Minute (TPM): Baseline 61.7, vFRC 112.9]
      - [Chart: Average Response Time (s): Baseline 20.389, vFRC 10.859]
  • 18. 2. Database Transaction Application
      - Benchmark: DVDStore
          • Simulates online e-commerce site operations
          • Database: MS SQL Server 2008
          • Database size: 15 GB
      - Workload Characteristics
          • 60% reads
          • Mostly random I/Os
          • Predominant I/O size: 8KB
      - VM Configuration
          • 8 vCPUs, 8GB memory
          • 25GB database disk, 10GB log disk
      - Storage Array
          • VNX 5700, 1TB LUN, RAID5 over 5 FC 15k RPM disk drives
  • 19. 2. Database Transaction Application
      - Up to 39% improvement in application throughput
      - [Chart: Orders Per Minute: Baseline 8802, vFRC - 10GB 8937, vFRC - 15GB 12319]
  • 20. 3. Enterprise Server I/O Traces
      - a. Hardware Monitoring Server Workload
          • Trace from servers that log data from multiple hardware monitoring programs across a datacenter
          • Collected at Microsoft Research, Cambridge*
          • Trace replayed using IOAnalyzer
      - 95% reads
      - vFRC size: 4GB
      - vFRC block size: 4KB
      - vFRC hit percentage: 85%
      - [Chart: Average Latency (ms): Baseline 1.23, vFRC 0.321]
      - * Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).
  • 21. 3. Enterprise Server I/O Traces
      - b. Proxy Server Workload
          • Trace from a web proxy server
          • Collected at Microsoft Research, Cambridge*
          • Trace replayed using IOAnalyzer
      - 67% reads
      - vFRC size: 16GB
      - vFRC block size: 4KB
      - vFRC hit percentage: 83%
      - [Chart: Average Latency (ms): Baseline 1.357, vFRC 0.612]
      - * Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).
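The chart values on the preceding performance slides can be turned into relative changes with a few lines of arithmetic. The sketch below uses only the baseline and vFRC figures read from those charts; the helper name is hypothetical, and the results may differ slightly from the headline "up to" figures, which may refer to peak rather than average values.

```python
def percent_change(baseline: float, measured: float) -> float:
    """Relative change of measured versus baseline, in percent.
    Positive means an increase, negative means a reduction."""
    return (measured - baseline) / baseline * 100.0


# (baseline, with vFRC) pairs as read from the charts in the transcript
results = {
    "Swingbench DSS transactions per minute": (61.7, 112.9),
    "Swingbench DSS average response time (s)": (20.389, 10.859),
    "DVDStore orders per minute, 15GB cache": (8802, 12319),
    "Hardware monitoring trace latency (ms)": (1.23, 0.321),
    "Proxy server trace latency (ms)": (1.357, 0.612),
}

for name, (baseline, with_vfrc) in results.items():
    print(f"{name}: {percent_change(baseline, with_vfrc):+.1f}%")
```

Throughput metrics should come out positive and latency metrics negative when the cache helps.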
  • 22. vMotion Performance with vFRC
      - vFRC is fully supported by vMotion and other vSphere features
      - vMotion behavior of vFRC-enabled VM
          • VM caches are migrated by default
          • Option to drop cache during vMotion
      - Migrating cache preserves application performance gains
          • Consumes more network bandwidth
          • Increased vMotion time
      - Dropping cache during vMotion leads to a temporary dip in application performance gains
          • No extra overhead in vMotion
          • Cache is re-warmed at the destination
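The trade-off on this slide can be roughed out with back-of-envelope arithmetic. The sketch below is an upper-bound estimate only: it assumes the whole configured cache is transferred, uses decimal gigabytes, and treats the `efficiency` factor (the fraction of link bandwidth actually available to the transfer) as a hypothetical parameter. It does not model how vMotion actually schedules the copy.

```python
def cache_transfer_seconds(cache_gb: float, link_gbps: float,
                           efficiency: float = 1.0) -> float:
    """Rough extra transfer time for migrating a cache of cache_gb
    (decimal gigabytes) over a link of link_gbps (gigabits per second)."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    return cache_gb * 8.0 / (link_gbps * efficiency)


# Hypothetical comparison: the same 16 GB cache over 1 GbE and 10 GbE
for link in (1.0, 10.0):
    seconds = cache_transfer_seconds(16.0, link, efficiency=0.8)
    print(f"{link:>4} Gb/s link: about {seconds:.0f} s of additional transfer")
```

An estimate like this is one input to deciding whether to migrate or drop the cache; the other inputs are application criticality and how long the cache takes to re-warm.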
  • 23. vFRC Performance Best Practices
  • 24. vFRC Configuration Guidelines
      - Cache size may be configured based on working set of workload
          • Start with about 20% of VMDK size, and monitor vFRC stats to re-configure it
      - Cache block size must match dominant I/O size of workload
          • Workload I/O size is not equal to VM I/O size!
      - vFRC performs better with PCIe flash devices
      - Decide on cache migration behavior during vMotion based on:
          • Criticality of application performance
          • Time taken for vMotion
          • Network bandwidth availability
  • 25. Making sense of vscsiStats and vFRC Stats
      - vscsiStats can be used to learn more about the workload
          • I/O length histogram
          • Read/write ratio
          • I/O trace to compute working set size
      - vFRC stats provide information about cache effectiveness
          • numBlocks: total number of cache blocks for a VMDK
          • numBlocksCurrentlyCached: number of cache blocks that actually contain data
          • Evict:avgNumBlocksPerOp: average number of evictions
          • avgCacheLatency: average device latency of flash resource
          • maxCacheLatency: maximum device latency of flash resource
          • cacheHitPercentage: percentage of read cache hits
  • 26. vFRC Sizing Decisions based on vFRC Stats
      - Using vFRC stats to make sizing decisions
          • numBlocksCurrentlyCached < numBlocks: cache size may be reduced
          • numBlocksCurrentlyCached = numBlocks and Evict:avgNumBlocksPerOp is high: cache size may be inadequate
          • maxCacheLatency is very high: may be caused by a spike in device latency, which may mean the device has worn out
          • cacheHitPercentage is high and Evict:avgNumBlocksPerOp is low: cache is correctly configured
      - For more detailed information, please refer to our performance whitepaper: http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf
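The sizing rules above can be written down as a small decision helper. This is a sketch under stated assumptions: the counter names are adapted from the slide into Python field names, the counters are assumed to have been collected already, and the numeric thresholds for "high" evictions, latency, and hit rate are hypothetical placeholders that the slides do not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VfrcStats:
    """Subset of the vFRC counters named on the slide (names adapted to Python)."""
    num_blocks: int
    num_blocks_currently_cached: int
    evict_avg_num_blocks_per_op: float
    max_cache_latency_us: float
    cache_hit_percentage: float


def sizing_hints(stats: VfrcStats,
                 high_evictions: float = 1.0,      # hypothetical threshold
                 high_latency_us: float = 1000.0,  # hypothetical threshold
                 high_hit_pct: float = 80.0) -> List[str]:  # hypothetical threshold
    """Apply the slide's four rules and return the hints that match."""
    hints = []
    full = stats.num_blocks_currently_cached >= stats.num_blocks
    evicting = stats.evict_avg_num_blocks_per_op >= high_evictions
    if not full:
        hints.append("cache not full: size may be reduced")
    if full and evicting:
        hints.append("cache full with frequent evictions: size may be inadequate")
    if stats.max_cache_latency_us >= high_latency_us:
        hints.append("high max cache latency: check flash device health")
    if stats.cache_hit_percentage >= high_hit_pct and not evicting:
        hints.append("high hit rate with few evictions: cache looks correctly configured")
    return hints


# Hypothetical sample: a full cache that evicts on most operations
example = VfrcStats(num_blocks=1_048_576, num_blocks_currently_cached=1_048_576,
                    evict_avg_num_blocks_per_op=4.0, max_cache_latency_us=300.0,
                    cache_hit_percentage=55.0)
for hint in sizing_hints(example):
    print(hint)
```

A cache that is still warming up will also report fewer cached blocks than total blocks, so the first rule is best applied only after the workload has run for a while.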
  • 27. Agenda
      - vSphere Flash Read Cache (vFRC)
      - Virtual SAN (vSAN)
      - Configurations and Troubleshooting
  • 28. Virtual SAN – Architecture
      - Each ESXi host contributes:
          • Flash storage to absorb IOPS
          • Hard disk drives to provide capacity
      - Virtual SAN aggregates these resources from multiple servers in a vSphere cluster
          • Provides a global datastore for VMs in the cluster
      - HA/DRS ensures that the VM restarts on a host crash
      - Virtual SAN objects can be split into multiple components for performance and data protection
          • Governed by storage policies
      - [Diagram: VSAN cluster of three ESX hosts; a VM virtual disk is stored as a VSAN object made up of replica-1, replica-2 and a witness]
  • 29. Experiment Setup
      - Hardware
          • 16 core 2.9 GHz Dell R720 machines
          • 2 x Intel PCIe R910 SSD, 200GB (1 PCIe slot)
          • 12 x 10K RPM Seagate SAS disks
          • 10G vSAN dedicated network, 1G for VM network
      - VSAN Configuration
          • 2 x disk groups per machine, 6 x disks per disk group
          • hostFailuresToTolerate 1, stripeWidth 1
      - Workload Characteristics
          • ViewPlanner 3.0 Standard benchmark with 2 sec think-time (heavy user)
          • ViewPlanner Group A: CPU-intensive apps; Group B: I/O-intensive apps
          • 1900 x 1200 resolution, PCoIP
          • Windows 7 desktops and Windows XP clients
          • VDI workload is known to be CPU intensive but sensitive to I/O latency
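The policy values used in this setup (hostFailuresToTolerate 1, stripeWidth 1) map onto the replica-and-witness layout described on the architecture slide. The sketch below applies the commonly documented mirroring rule that tolerating n host failures takes n + 1 replicas and at least 2n + 1 hosts; the function name is hypothetical, the 40 GB virtual disk is an illustrative value that does not come from the session, and witness components are not counted.

```python
def mirror_layout(host_failures_to_tolerate: int, stripe_width: int,
                  vmdk_gb: float) -> dict:
    """Derive replica count, minimum host count, minimum number of data
    components, and raw capacity consumed for a mirrored vSAN object."""
    if host_failures_to_tolerate < 0 or stripe_width < 1 or vmdk_gb < 0:
        raise ValueError("invalid policy values")
    replicas = host_failures_to_tolerate + 1
    return {
        "replicas": replicas,
        "min_hosts": 2 * host_failures_to_tolerate + 1,
        # each replica is striped across at least stripe_width components
        "min_data_components": replicas * stripe_width,
        "raw_capacity_gb": replicas * vmdk_gb,
    }


# Policy from the experiment setup, applied to a hypothetical 40 GB virtual disk
print(mirror_layout(host_failures_to_tolerate=1, stripe_width=1, vmdk_gb=40.0))
```

With the policy used here, each virtual disk is stored twice, which is worth factoring in when comparing raw disk capacity against a shared array.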
  • 30. Virtual SAN Delivers IOPS Required by VDI
      - Virtual SAN can meet the IOPS required by VDI workload
  • 31. Virtual SAN Scale
      - [Chart: Virtual SAN scale; y-axis: Number of Heavy VDI Users; 3 node: 275, 5 node: 460, 7 node: 667; series: VSAN, with a linear trend line]
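One way to check the near-linear scaling shown in this chart is to divide the user counts by the node counts. The sketch below uses only the three data points from the chart; the helper name is hypothetical.

```python
# Heavy VDI users supported at each cluster size, as read from the chart
scale_points = {3: 275, 5: 460, 7: 667}


def users_per_node(points: dict) -> dict:
    """Users supported per node at each cluster size."""
    return {nodes: users / nodes for nodes, users in points.items()}


for nodes, per_node in users_per_node(scale_points).items():
    print(f"{nodes} nodes: {per_node:.1f} heavy users per node")
```

If the per-node figure stays roughly constant as nodes are added, capacity is scaling linearly with cluster size.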
  • 32. Group A Score Comparison
      - Impact to Group A application latencies is marginal
      - Virtual SAN uses very few cycles of host CPU
      - [Chart: Avg Application Latency, Group A; series: VSAN, SAN; y-axis: 0 to 1.2]
  • 33. Group B Score Comparison
      - Group B application latencies are close to All-Flash-SAN
      - Virtual SAN can meet the IOPS required by VDI workload
      - [Chart: Avg Application Latency, Group B; series: VSAN, All-Flash-SAN; y-axis: 0 to 7]
  • 34. VSAN VDI Consolidation compared to physical SAN array
      - VSAN performs better than a typical mid-range FC storage array
          • vSAN benefits from local flash storage that provides high performance
      - Impact of VSAN CPU consumption on application performance is low
      - Physical SAN array is not required to run VDI workload
      - [Chart: VDI Consolidation Ratio; x-axis: Number of Nodes (Servers): 3, 5, 7; y-axis: Number of VMs, 0 to 800; series: Mid-range FC Array, vSAN, All-Flash FC Array]
  • 35. Agenda
      - vSphere Flash Read Cache (vFRC)
      - Virtual SAN (vSAN)
      - Configurations and Troubleshooting
  • 36. Define the Performance Issue
      - Understand Application Function & Architecture
          • At a minimum know what your application does and what it’s dependent on
      - Select Application KPIs
          • Application performance must be measured using an application counter (tps, response time, etc.) and not virtual resource consumption
      - Define Success Criteria
          • With your app owner, define the level the application KPIs must reach for it to be considered performant
      - Comparisons must be Apples-to-Apples
          • Any changes to infrastructure (physical or virtual) create comparison challenges
      - Now the Gap is Identified, Begin Troubleshooting
          • With an understanding of the requirements and current deficiency, you can now begin to investigate and/or tune
  • 37. Disk I/O Latencies
      - [Diagram: I/O path from Application through Guest OS, VMM, vSCSI, ESX Storage Stack, Driver, HBA and Fabric to the Array SP, annotated with the latency counters GAVG, KAVG, QAVG and DAVG]
      - KAVG = GAVG – DAVG
      - Time spent in the ESX storage stack is minimal; for all practical purposes KAVG ~= QAVG
      - In a well-configured system QAVG should be zero
  • 38. Key Indicators – Investigative Thresholds
      - Kernel Latency Average (KAVG)
          • This counter tracks the latencies of I/O passing through the kernel
          • Investigation threshold: 1ms
      - Device Latency Average (DAVG)
          • This is the latency seen at the device driver level. It includes the round-trip time between the HBA and the storage
          • Investigation threshold: 15-20ms, lower is better, some spikes okay
      - Aborts (ABRT/s)
          • The number of commands aborted per second
          • Investigation threshold: 1
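The relationship KAVG = GAVG – DAVG and the investigation thresholds above can be combined into a simple check. This is a minimal sketch: it assumes the counter values have already been read (for example from esxtop output) and passed in as plain numbers, the function name is hypothetical, and treating a value equal to the threshold as worth investigating is an assumption.

```python
from typing import List


def latency_findings(gavg_ms: float, davg_ms: float, aborts_per_sec: float,
                     kavg_limit_ms: float = 1.0,
                     davg_limit_ms: float = 15.0,
                     abort_limit: float = 1.0) -> List[str]:
    """Flag counters that reach the investigation thresholds from the slide."""
    kavg_ms = gavg_ms - davg_ms  # KAVG = GAVG - DAVG
    findings = []
    if kavg_ms >= kavg_limit_ms:
        findings.append(f"KAVG {kavg_ms:.2f} ms: look at queuing in the kernel")
    if davg_ms >= davg_limit_ms:
        findings.append(f"DAVG {davg_ms:.2f} ms: look at the HBA, fabric and array")
    if aborts_per_sec >= abort_limit:
        findings.append(f"ABRT/s {aborts_per_sec:.2f}: commands are being aborted")
    return findings


# Hypothetical sample readings
for finding in latency_findings(gavg_ms=22.0, davg_ms=20.0, aborts_per_sec=0.0):
    print(finding)
```

The thresholds are starting points for investigation rather than hard limits, so occasional spikes above them are not necessarily a problem.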
  • 39. Disk I/O Queues
      - GQLEN: Guest Queue
      - AQLEN: Adapter Queue
      - WQLEN: World Queue
      - DQLEN: Device / LUN Queue
      - SQLEN: Array SP Queue
      - DQLEN can change dynamically when SIOC is enabled
      - [Diagram: queue positions along the I/O path (Application, Guest OS, VMM, vSCSI, ESX Storage Stack, Driver, HBA, Fabric, Array SP), with a "Reported in esxtop" annotation]
  • 40. Performance Technical Resources
      - Performance Technical Papers
          • http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf
          • http://www.vmware.com/resources/techresources/cat/91,96
      - Performance Best Practices
          • http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf
          • http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.1.pdf
          • http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
          • http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf
      - Troubleshooting Performance Related Problems in vSphere Environments
          • http://communities.vmware.com/docs/DOC-14905 (vSphere 4.1)
          • http://communities.vmware.com/docs/DOC-19166 (vSphere 5)
          • http://communities.vmware.com/docs/DOC-23094 (vSphere 5.x with vCOps)
  • 41. Performance Community Resources
      - Performance Technology Pages
          • http://www.vmware.com/technical-resources/performance/resources.html
      - Technical Marketing Blog
          • http://blogs.vmware.com/vsphere/performance/
      - Performance Engineering Blog VROOM!
          • http://blogs.vmware.com/performance
      - Performance Community Forum
          • http://communities.vmware.com/community/vmtn/general/performance
      - Virtualizing Business Critical Applications
          • http://www.vmware.com/solutions/business-critical-apps/
  • 42. Extreme Performance Series Sessions
      - Extreme Performance Series:
          • vCenter of the Universe – Session # VSVC5234
          • Monster Virtual Machines – Session # VSVC4811
          • Network Speed Ahead – Session # VSVC5596
          • Storage in a Flash – Session # VSVC5603
          • Big Data: Virtualized SAP HANA Performance, Scalability and Practices – Session # VAPP5591
      - Hands on Labs:
          • HOL-SDC-1304 – Optimize vSphere Performance (includes vFRC)
  • 43. Other VMware Activities Related to This Session
      - HOL: HOL-SDC-1308 Virtual Storage Solutions
      - Group Discussions: VSVC1001-GD Performance with Mark Achtemichuk
  • 44. THANK YOU
  • 45. Extreme Performance Series: Storage in a Flash
      - Sankaran Sivathanu, VMware
      - Mark Achtemichuk, VMware
      - Session VSVC5603 (#VSVC5603)