More IOPS Please
DRMC’s VMware View Implementation Using Nexenta

             Keith Brennan
             October 2011



Delano Regional Medical Center

S 156-bed community hospital in central California.

S Four satellite clinics.

S Only hospital within a 30-mile radius.

S Serves approximately 60,000 people spread over several communities.

S 80%+ of our patients are Medi-Cal or Medicare.
  S Government doesn’t pay well.
DRMC’s Clinical Staff
The Great Directive of 2009


S Need to deploy 150 new desktops in support
  of a Clinical Documentation implementation.
S Do it as cheaply as possible.

S Oh, by the way, you’re losing an FTE due to
  budget cuts.
“Never let a good crisis go to waste.” – Rahm Emanuel




S Used this “Opportunity” to justify moving to VDI.
  S Users were resistant to using something other than a traditional desktop.
     S Perceived lack of freedom.
     S Perceived increase in “Big Brother.”

S Why I wanted the transition to VDI:
  S Ease of management.
  S We had a set, well-defined, integrated desktop experience.
  S Wanted a way to deliver the same experience in a controlled manner to a myriad of devices (iOS, Android, etc.).
I Need Storage!

S My existing EMC CX500 was barely cutting it for 3 ESX hosts with a combined 32 VMs.

S Lots of people on the Virtualization forums liked NetApp.

S NetApp had just published a white paper on a View deployment of 750 virtual desktops on a FAS 2050a.
  S Near-normal desktop load times.
  S Seamless user experience.
Well That’s Timely!


S The next week another vendor calls letting me know that
  IBM is running a huge storage sale.

S It includes their N series of network-attached storage.
  S Rebadged NetApps.

S Three weeks later an N3600, a rebadged NetApp 2050a, arrives.

S It is set up identically to the configuration in the VDI white paper.
Implementation Guidelines

S Linked clones are to be used whenever possible.
   S Ease of maintenance
   S Ease of provisioning

S No user data is to be stored on the VMs.

S Significant patching shall be done through the Golden Image, and VMs will be re-provisioned using the updated image.

S AV will run on the VMs, but only in real-time scan mode. No scheduled system scans.
Initial Testing

S Two hosts with 25 VMs each.
   S One connected to the N3600 via iSCSI.
   S The other via NFS.

S Test lab of 25 thin clients.

S Good performance.
   S Equivalent to a desktop of the previous generation.
   S Quick user logins due to the VMs being always on and waiting.
   S The N3600 maintains low utilization.
   S NFS and iSCSI exhibit similar speed.
Go Live!

S Five additional ESX hosts are deployed.
  S Each hosts ~25 VMs.
  S Current setup gives me N+2 host redundancy.

S For the first week everything looks good.

S User complaints are primarily with the clinical application.

S The N3600 is handling it well, running at about 35% utilization.
  S ~1.5k IOPS of regular background chatter.
  S VMs report average latency of 12ms.
Disaster!
For me they seem to happen in threes.




S First AV engine update happens one week after go-live.
  S AV server pushes it to all clients at once.
  S The simultaneous update of all the View VMs brings the SAN to a crawl for 3 hours.
  S Users complain that the virtual desktops are unusable.
  S Temporarily corrected the problem by only allowing the AV to update 3 machines at once.
  S This worked like a champ until a dot-version update on the AV server a month later broke that setting.
     S Another 3-hour “downtime.”
Disaster (cont)


S Three days later a helpdesk tech forces the reprovisioning of 60 of the View VMs at once.
  S Was applying an application patch.
  S Was trained not to restart more than 5 VMs at once.
     S That obviously didn’t stick!
  S That was another hour of the SAN crawling.
  S Once again, users complain that the system is unusable during this time.
Disaster! (yet again)

S A .NET 3.5 service pack is approved for deployment.

S The SP is large: >100MB.

S Set to deploy starting at 2am and only on restart.
  S At 04:15 four VMs restart within one minute of each other.
  S The N3600 starts to lag.
  S Users, seeing their systems running slow, decide to restart.

S At 5am I get the call regarding the issue.

S I immediately disabled the SP deployment.
  S Still took an hour for the N3600 to catch up.
My Users Aren’t Happy
What’s Going On???


S Oh $#!+…
S General-use chatter is eating my bandwidth.
S N3600 CPU utilization is now regularly above 50%.
S Disk utilization rarely drops below 40%.
S Average disk latency >18ms.
I Have a Problem

S I’m maxing out performance with just day-to-day operations.
S IBM has verified that the appliance is functioning properly.
    S In other words, this is all I’m going to get out of it.
    S Adding disks might help some, but it’s too costly!
      S An additional tray would be $15k!
      S SAS drives to populate it are almost $1k each!
      S Still have CPU limitations.
      S NIC limitations (2× 1GbE links per head).

S Did I mention that I have no money left in the budget?
Nexenta to the Rescue


S Had just installed Nexenta Core for my home file server.

S Time to find some hardware:
  S Pulled a box out of the View cluster.
  S Installed six Intel SSDs.
  S Installed Nexenta Core. (Yeah, I know… EULA…)
  S Created the volume and shared it via NFS.
  S The next day my poor brain figured out that I could have just done a Nexenta VM. D’oh!

S Over the next week I migrated half the virtual desktops over.
It’s like Night and Day

S Average latency drops from 18ms to 2ms.
S Write throughput quadruples.
S Read throughput doubles.
S 20x improvement on 4k IOPS!
They Like Me Again!
Time for a Full Nexenta Implementation

S I was able to secure $45k in capital for the next year.
  S Normally this would just draw laughter when talking about storage.

S I also intend to replace the existing EMC.
  S Annual maintenance is too costly.
  S I despise the fact that I have to call them out every time I want to connect a new piece of hardware to it.

S Still some questioning from higher-ups on this whole open-storage thing.
Final Solution Hardware

S 2x Supermicro dual-Xeon servers with 96GB RAM.

S 1x DataOn 1600 JBOD
  S Houses twenty-one 1TB nearline SAS drives.

S 1x DataOn 1620 JBOD
  S Houses seventeen 300GB 10k RPM SAS drives.

S 2x STEC ZeusRAM

S 8x 160GB Intel 320 SSDs
Hardware Diagram
Why DataOn?


S Disk Shelf Manager
   S One thing Nexenta lacked was a way to monitor the JBODs.
   S How could one of my techs know which drive to pull?

S Intuitive slot lighting.

S They’re responsive even after the sale is made!
Why Nexenta?


S It’s good to have on-demand support.
  S I am the only member of our technical staff who has a basic understanding of storage architectures.
  S I like to have the ability to go on vacation from time to time!

S It’s good to have experts for unique problems.

S Regular, tested bug fixes.

S It’s always nice to have someone’s neck to wring!
The End Result

S 2ms latency.

S 500 MB/s reads.

S 200 MB/s writes.

S Happy users!

S Note: Benchmark was done on the production system with 175 active VMs.
To Dedup or Not to Dedup


S Dedup can give you huge storage savings.
  S I had a 14x dedup ratio on my VDI volume.

S Inline dedup saves on disk write I/O.
  S A write will still hit the ZIL, but won’t be written to disk if it is determined to be duplicate data.
     S Instead of a 4+KB write you get a sub-256-byte metadata write.
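
Conceptually, the inline dedup write path works something like the minimal Python sketch below. This is an illustration of the idea only, not ZFS code; the block size, hash choice, and in-memory table are assumptions made for the example.

    import hashlib

    # Hypothetical in-memory dedup table: checksum -> (block address, refcount).
    # In ZFS this role is played by the on-disk DDT (cached in ARC); a dict is
    # used here purely for illustration.
    dedup_table = {}

    def write_block(data: bytes) -> str:
        """Sketch of an inline-dedup write: hash the block, and if the checksum
        is already known, record only a small metadata reference instead of
        writing the full block again."""
        checksum = hashlib.sha256(data).hexdigest()
        if checksum in dedup_table:
            addr, refcount = dedup_table[checksum]
            dedup_table[checksum] = (addr, refcount + 1)  # tiny metadata update only
            return f"ref -> {addr} (refcount {refcount + 1})"
        addr = f"blk{len(dedup_table)}"                   # pretend allocation of a new block
        dedup_table[checksum] = (addr, 1)
        return f"wrote {len(data)} bytes at {addr}"

    # Two identical 4KB blocks: the second costs only a metadata update.
    block = b"\x00" * 4096
    print(write_block(block))
    print(write_block(block))
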
To Dedup or Not to Dedup

S RAM hog!
  S For good performance you need enough RAM to store the dedup table (see the sizing sketch after this slide).
      S ZFS uses the ARC for this, which means you will have less room for cached data.

S Potential for hash collisions.
  S The odds are astronomically small, but there is still a chance of data corruption.

S Dedup performance penalty.
  S Small IOPS suffer.
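
As a rough illustration of why the dedup table is a RAM hog, here is a back-of-the-envelope sizing sketch. The ~320 bytes per unique block is a commonly cited ZFS rule of thumb, and the data volume and block size are made-up example numbers rather than figures from this deployment.

    def ddt_ram_estimate_gib(unique_data_tib: float, avg_block_kib: float,
                             bytes_per_entry: int = 320) -> float:
        """Rough dedup-table memory estimate: one table entry per unique block.
        ~320 bytes per entry is a commonly cited ZFS rule of thumb, not a measurement."""
        unique_blocks = (unique_data_tib * 1024**3) / avg_block_kib  # KiB of data / KiB per block
        return unique_blocks * bytes_per_entry / 1024**3             # bytes -> GiB

    # Example: 1 TiB of unique data stored as 8 KiB blocks (small-block, VDI-style I/O).
    print(f"{ddt_ram_estimate_gib(1, 8):.0f} GiB of ARC just for the dedup table")

Against something like the 70GB ARC described later in this deck, that kind of overhead leaves far less room for cached data.
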
Dedup Performance Penalty

[Chart: small-block IOPS with dedup enabled vs. no dedup]
Is Dedup Worth it?

S If you’re using a “Golden Image” – no.
  S The VMDC plugin provides great efficiency by storing only one copy of the Golden Image vs. one for each pool of VMs.
  S Compression is virtually free and will do a good job of making up the difference in the “new” blocks.
  S Disk is cheap.

S If you’re doing a bunch of P2V desktop migrations – maybe.
  S If the desktops are poorly configured, or have other aspects that can cause excessive I/O, then no.
  S If the desktops are similar and large, then sure.
Compression


S Use it. Unless you’re using a 5-year-old processor, there will be no noticeable performance hit.
  S On by default in Nexenta 3.1.
  S Compresses before write. Saves disk bandwidth!
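
For reference, checking or enabling compression on a ZFS dataset can be done with something like the sketch below; the dataset name tank/vdi is just an example, and on a NexentaStor appliance you would more likely do this through NMC or the web GUI.

    import subprocess

    DATASET = "tank/vdi"   # example pool/filesystem name; substitute your own

    def zfs_get(prop: str) -> str:
        """Read a single ZFS property value (e.g. 'compression' or 'compressratio')."""
        out = subprocess.run(["zfs", "get", "-H", "-o", "value", prop, DATASET],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()

    # Report the current setting and the ratio actually being achieved.
    print("compression  :", zfs_get("compression"))
    print("compressratio:", zfs_get("compressratio"))

    # Turn on the lightweight default algorithm if compression is off.
    if zfs_get("compression") == "off":
        subprocess.run(["zfs", "set", "compression=on", DATASET], check=True)
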
Cache is Key!


S Between the 70GB of ARC and 640GB of L2ARC, the read cache is hit almost 98% of the time! (A weighted-latency sketch follows this slide.)

S This equates to sub-2ms average disk latency for the end user.

S Beats the crud out of the >15ms average latency of the N3600!

S Know your working set. You might get away with a much smaller cache, or need a much larger one.
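
To see why the hit rate matters so much for latency, here is a toy weighted-average calculation; the per-tier service times are illustrative assumptions, not measurements from this system.

    def effective_latency_ms(arc_hit, l2arc_hit, arc_ms=0.1, l2arc_ms=0.5, disk_ms=15.0):
        """Average read latency as a hit-rate-weighted mix of ARC, L2ARC, and disk.
        The per-tier latencies are illustrative guesses, not measured values."""
        disk_rate = 1.0 - arc_hit - l2arc_hit
        return arc_hit * arc_ms + l2arc_hit * l2arc_ms + disk_rate * disk_ms

    # ~98% of reads served from cache (say 80% ARC + 18% L2ARC) vs. a ~70% hit rate.
    print(f"{effective_latency_ms(0.80, 0.18):.2f} ms at ~98% cache hits")
    print(f"{effective_latency_ms(0.55, 0.15):.2f} ms at ~70% cache hits")
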
Latency Under Stress
Gig-E vs TenGig-E
Gig-E vs TenGig-E


S Obvious differences in maximum throughput.

S Small-block IOPS differences are mainly attributable to differences in network latency.

S If you’re stuck with Gig-E, use 802.3ad trunk groups.
  S Each host is still stuck with ~100 MB/s of throughput, but no single ESX host will saturate the link for the rest.
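
The reason a trunk group doesn’t raise single-host throughput is that 802.3ad hashes each conversation onto one member link. A simplified sketch of that placement logic follows; the hash inputs and four-link trunk are assumptions, and real switches offer several hash policies (MAC-, IP-, or L4-port-based).

    import zlib

    LINKS = 4  # member links in the example trunk group

    def member_link(src: str, dst: str) -> int:
        """Pick the trunk member for a src/dst conversation (simplified hash policy)."""
        return zlib.crc32(f"{src}->{dst}".encode()) % LINKS

    # Each ESX host's traffic to the storage head rides a single link, so one host
    # can't exceed ~1GbE, but the hosts spread themselves across the trunk.
    for host in [f"esx{i:02d}" for i in range(1, 8)]:
        print(host, "-> link", member_link(host, "nexenta-head1"))
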
Gig-E vs TenGig-E – User Perspective

S Average time from the “Power On VM” command being issued until the user is able to log in:
    S 10GbE: 23 seconds
    S 1GbE: 32 seconds

S Time from when the user presses the “login” button until the desktop is ready to use:
    S 10GbE: 5 seconds
    S 1GbE: 9 seconds

*Windows 7, 2 procs, 2GB RAM, DRMC’s Standard Clinical Image
Final Thought – All-SSD Goodness

S For deployments of Linked Clones or VMs off of a Golden Image.

S Allows you to get rid of the L2ARC.

S Use a good ZIL device (STEC ZeusRAM, DDRdrive).
  S Allows for sequential writes to the SSDs in the pool.
      S Saves on write wear, which is an SSD killer (a rough wear estimate follows this slide).
          S My first test box with the X25-M SSDs started suffering after about 3 months.

S If you want HA you have to use SAS drives.
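
As a rough way to reason about the write-wear point above, here is an illustrative endurance estimate; the daily write volume, write-amplification factor, and rated P/E cycles are all made-up example numbers, not figures from this deployment or from any vendor spec sheet.

    def months_to_wearout(capacity_gb: float, rated_pe_cycles: int,
                          host_writes_gb_per_day: float, write_amplification: float) -> float:
        """Very rough SSD lifetime estimate: total rateable NAND writes divided by
        the amplified daily write volume. All inputs are illustrative assumptions."""
        write_budget_gb = capacity_gb * rated_pe_cycles
        effective_daily_gb = host_writes_gb_per_day * write_amplification
        return write_budget_gb / effective_daily_gb / 30.0

    # Example: 160GB consumer MLC drive, assumed 3,000 P/E cycles, 1TB/day of
    # desktop write traffic landing on it, and 5x write amplification from random I/O.
    print(f"~{months_to_wearout(160, 3000, 1000, 5.0):.0f} months before the drive is cooked")
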
Takeaway:

Latency is Key!!
keith@drmc.com
 661-721-5650
  Feel free to contact me.




