How to Fail at VDI

BriForum London 2012

Published in: Technology

  1. Welcome
     BriForum | © TechTarget
  2. How to Fail at VDI
     Dan Brinkmann
     @dbrinkmann
     blog.danbrinkmann.com
     Solutions Architect, VMware vExpert
     Lewan & Associates (Denver, CO)
  3. “What business problem are we solving?”
  4. Business/Expectation VDI Failures
     ● No business problem
     ● Desktop virtualization is not server virtualization
     ● Saving money
     ● Project in the hands of the vSphere administrator
     ● No success criteria
     ● Assume you know what users do
     ● The same or better experience remotely as locally
  5. Agenda
     To understand what causes VDI failures
     ● Compute
     ● Storage
     ● Guessing
  6. How to Fail at VDI
     The technology failure points
     ● Test with 5 users
     ● Using vendor-provided users/core sizing
     ● Using vendor-provided IOPS estimates
     ● Ignore anti-virus
     ● Ignore user profile management
     ● Use existing desktop images built for physical PCs
     ● Guess
  7. Compute
     It’s magic until it stops working
     ● Multi-threaded apps
     ● Latency-sensitive workloads
     ● Hyperthreading
     ● Latency = Health
  8. Compute
     CPU scheduler in vSphere
     ● The CPU scheduler in vSphere is entitlement/consumption based, not priority based (unlike Windows)
     ● There is no priority in the CPU scheduler
     ● Given equal entitlement, the more a VM/world consumes, the more likely it is to be preempted by another VM/world
     ● http://www.vmware.com/resources/techresources/10131
  9. Compute with a Physical PC
     [Diagram: one OS/Apps/Profile stack on CPU 1]
  10. Compute with Citrix XenApp
      [Diagram: eight OS/Apps/Profile stacks sharing CPU 1 and CPU 2]
  11. Compute with VDI
      [Diagram: CPU 1 and CPU 2]
  12. vSphere Compute
      This is poor performance monitoring
  13. vSphere Compute
      This is better performance monitoring: ESXTOP

      | Display | Metric | Threshold | Explanation |
      |---------|--------|-----------|-------------|
      | CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP, or a limit has been set (check %MLMTD). |
      | CPU | %CSTP | 3 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities. |
      | CPU | %SYS | 20 | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for a possible root cause. |
      | CPU | %MLMTD | 0 | The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the “CPU limit” settings. If larger than 0, the world is being throttled due to the limit on CPU. |
      | CPU | %SWPWT | 5 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment. |
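
The thresholds on slide 13 lend themselves to a simple automated check. The following is a minimal sketch, assuming the metrics have already been collected into a dictionary per VM (for example from an esxtop batch export); the function name `flag_cpu_metrics` and the sample values are hypothetical, and "exceeds the threshold" is an interpretation of the table rather than something the slide spells out for every row.

```python
# Thresholds copied from the esxtop CPU table on slide 13 (percent values).
CPU_THRESHOLDS = {
    "%RDY": 10,
    "%CSTP": 3,
    "%SYS": 20,
    "%MLMTD": 0,
    "%SWPWT": 5,
}


def flag_cpu_metrics(sample):
    """Return the metrics in `sample` whose value exceeds the slide-13 threshold."""
    return {
        name: value
        for name, value in sample.items()
        if name in CPU_THRESHOLDS and value > CPU_THRESHOLDS[name]
    }


# Hypothetical per-VM sample, not real measurements.
sample = {"%RDY": 14.2, "%CSTP": 4.1, "%SYS": 2.0, "%MLMTD": 0.0, "%SWPWT": 0.0}
print(flag_cpu_metrics(sample))
```

In this made-up sample both %RDY and %CSTP are flagged, which matches the pattern the deck describes next: co-stop from too many vCPUs showing up alongside high ready time.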
  14. vSphere Compute
  15. vSphere Compute
      %CSTP probably driving %RDY values
  16. vSphere Compute
      Now with fewer vCPUs
  17. Summary on Compute
      ● Multithreading, vSMP
      ● Not priority based
      ● % Utilization is not the complete picture
      ● Latency = Health
      ● http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017926
  18. Storage
      The wrath of the math
      ● #1 cause of performance issues in server virtualization
      ● #1 cause of performance issues in desktop virtualization
      ● Latency = Health
        - 20 ms: in trouble
        - 50 ms: your users hate you
  19. What You Need to Know
      ● Capacity vs performance
      ● Random vs sequential
      ● Average vs peak
      ● Where it’s coming from
      ● Most are guessing
  20. Storage
      Spinning disk

      | Device | Type | IOPS |
      |--------|------|------|
      | 7,200 rpm SATA drives | HDD | ~75-100 IOPS |
      | 10,000 rpm SATA drives | HDD | ~125-150 IOPS |
      | 10,000 rpm SAS drives | HDD | ~140 IOPS |
      | 15,000 rpm SAS drives | HDD | ~175-210 IOPS |
  21. RAID Penalty

      | RAID level | Read | Write |
      |------------|------|-------|
      | RAID 0 | 1 | 1 |
      | RAID 1 and 10 | 1 | 2 |
      | RAID 5 | 1 | 4 |
      | RAID 6 | 1 | 6 |
  22. The Math – RAID 5 50/50
      Some back of the napkin math
      ● 500 users, Windows 7, 20 IOPS avg, 50/50 read/write, RAID 5
      ● 500 * 20 = 10,000 IOPS: 5,000 read, 5,000 write
      ● 5,000 write * 4 = 20,000 + 5,000 read = 25,000 IOPS
      ● 25,000 IOPS / 200 IOPS per 15K spindle = 125 spindles
  23. The Math – RAID 10 50/50
      Some back of the napkin math
      ● 500 users, Windows 7, 20 IOPS avg, 50/50 read/write, RAID 10
      ● 500 * 20 = 10,000 IOPS: 5,000 read, 5,000 write
      ● 5,000 write * 2 = 10,000 + 5,000 read = 15,000 IOPS
      ● 15,000 IOPS / 200 IOPS per 15K spindle = 75 spindles
  24. The Math – RAID 10 20/80
      Some back of the napkin math
      ● 500 users, Windows 7, 20 IOPS avg, 20/80 read/write, RAID 10
      ● 500 * 20 = 10,000 IOPS: 2,000 read, 8,000 write
      ● 8,000 write * 2 = 16,000 + 2,000 read = 18,000 IOPS
      ● 18,000 IOPS / 200 IOPS per 15K spindle = 90 spindles
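
The napkin math on slides 22 to 24 follows one pattern: split front-end IOPS into reads and writes, multiply the writes by the RAID write penalty from slide 21, and divide by per-spindle IOPS. Here is a small sketch of that calculation; the function name `spindles_needed` and its parameters are illustrative, and the 200 IOPS per 15K spindle default is the figure the slides use.

```python
import math

# Write penalties from the RAID penalty table on slide 21 (read penalty is 1).
RAID_WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def spindles_needed(users, iops_per_user, write_pct, raid_level, iops_per_spindle=200):
    """Return (back-end IOPS, spindle count) for a given user load and RAID level."""
    frontend = users * iops_per_user            # e.g. 500 * 20 = 10,000
    writes = frontend * write_pct / 100         # share of front-end IOPS that are writes
    reads = frontend - writes
    backend = reads + writes * RAID_WRITE_PENALTY[raid_level]
    return backend, math.ceil(backend / iops_per_spindle)


# The three scenarios from the slides: 500 users at 20 IOPS average.
for write_pct, raid in [(50, "RAID 5"), (50, "RAID 10"), (80, "RAID 10")]:
    backend, spindles = spindles_needed(500, 20, write_pct, raid)
    print(f"{raid}, {100 - write_pct}/{write_pct} read/write: "
          f"{backend:,.0f} IOPS, {spindles} spindles")
```

Changing `write_pct` or `iops_per_user` shows how sensitive the spindle count is to the write ratio and to the per-user average, which is why the deck warns against vendor-provided estimates.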
  25. vSphere Storage Latency
      [Diagram: I/O path from the guest (Application, Filesystem, I/O Drivers, Device Queue) through the VMkernel (Virtual SCSI, Filesystem) to the device. Legend: A = Application Latency, R = Physical Disk “Disk Secs/Transfer”, G = Guest Latency, K = ESX Kernel, D = Device Latency]
  26. vSphere Storage
      Performance monitoring for storage

      | Display | Metric | Threshold | Explanation |
      |---------|--------|-----------|-------------|
      | DISK | GAVG | 20 | Look at “DAVG” and “KAVG”, as the sum of both is GAVG. |
      | DISK | DAVG | 20 | Disk latency most likely to be caused by the array. |
      | DISK | KAVG | 2 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check “QUED”. |
      | DISK | QUED | 1 | Queue maxed out. Possibly queue depth set too low. Check with the array vendor for the optimal queue depth value. |
      | DISK | ABRTS/s | 1 | Aborts issued by the guest (VM) because storage is not responding. Can be caused when paths have failed. |
      | DISK | RESETS/s | 1 | The number of commands reset per second. |
      | DISK | CONS/s | 20 | SCSI reservation conflicts per second. Can be caused by too many VMDKs on a datastore. |
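
Slide 26 states that GAVG is the sum of DAVG and KAVG, which makes it possible to attribute guest latency to either the array or the VMkernel. The sketch below illustrates that reasoning; `diagnose_latency` is a hypothetical helper, the inputs are assumed to be esxtop per-command averages in milliseconds, and the messages paraphrase the table's explanations.

```python
# Thresholds from the storage table on slide 26, assumed to be milliseconds.
GAVG_MS, DAVG_MS, KAVG_MS = 20, 20, 2


def diagnose_latency(davg_ms, kavg_ms):
    """Return (GAVG, findings) for one device given its DAVG and KAVG values."""
    gavg_ms = davg_ms + kavg_ms  # slide 26: GAVG is the sum of DAVG and KAVG
    findings = []
    if davg_ms > DAVG_MS:
        findings.append("DAVG high: latency most likely caused by the array")
    if kavg_ms > KAVG_MS:
        findings.append("KAVG high: VMkernel queuing likely, check QUED")
    if gavg_ms > GAVG_MS and not findings:
        findings.append("GAVG high: combined device and kernel latency over threshold")
    return gavg_ms, findings


# Hypothetical readings, not real measurements.
print(diagnose_latency(25, 1))
print(diagnose_latency(5, 4))
```

Checking DAVG and KAVG separately before GAVG keeps the output pointed at the layer to investigate first, rather than only reporting that the guest sees slow storage.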
  27. Building for Read IOPS
      Fairly easy
      ● Memory: storage controller cache, PVS
      ● Host/Hypervisor: CBRC, IntelliCache
      ● Storage: SSD tiering / flash cache
  28. Building for Write IOPS
      Much harder…and expensive
      ● Profiles/Apps
      ● Spinning disk
      ● SSD tiering
      ● Local disk
      ● IO optimization (dedupe, serializing IO)
  29. Storage Summary
      ● 25,000 IOPS, RAID 5, 50/50: 125 spindles
      ● 15,000 IOPS, RAID 10, 50/50: 75 spindles
      ● 18,000 IOPS, RAID 10, 20/80: 90 spindles
      ● Latency is the key metric
      ● Write IOPS and the things that cause them are the #1 focus
  30. How does this relate to VDI failure?
      ● Pilot performance is great, then terrible in production
      ● Boot storm vs login storm
      ● Applications in gold image vs streamed
      ● Read/write ratio is important
      ● Anti-virus software
      ● Existing desktop images
  31. Guessing
      You need to use tools to do this
      ● Initial sizing
      ● Determine peaks and when
      ● Baseline application impact
      ● Monitor application impact over time
      ● Application updates/changes
  32. Project testing
      Good to know what you are and aren’t doing
      ● Unit/system testing
      ● Application testing
      ● Performance/scalability testing
      ● Operational testing
      ● User acceptance testing
  33. Summary
      ● Understand your limited resources (compute/storage)
      ● Don’t guess
      ● 5 users = what kind of testing, what are you really accomplishing?