Taneja “Next-Generation FC Arrays”:
- Clustered controller design
- Sub-disk virtualization
- Self-configuring and self-tuning storage
- Automated storage tiering
- Thin technologies
Up to 256 FC or iSCSI LUNs
ESX multipathing: load balancing, failover, failover between FC and iSCSI*
Beware of block sizes greater than 256 KB!
If you want virtual disks greater than 256 GB, you must use a VMFS block size larger than 1 MB
Align your virtual disk starting offset to your array (by booting the VM and using diskpart, Windows PE, or UNIX fdisk)*
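The block-size and alignment rules above can be checked with simple arithmetic. This is a minimal sketch, assuming the well-known VMFS-3 limits (each 1 MB of block size allows roughly 256 GB per file, up to 2 TB) and a 64 KB array stripe unit as an illustrative default; substitute your array's actual stripe size.

```python
# VMFS-3 maximum virtual disk (file) size scales with the datastore block size.
VMFS3_LIMITS_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}  # block size MB -> max file GB

def min_block_size_mb(vmdk_size_gb: int) -> int:
    """Smallest VMFS-3 block size (MB) that can hold a VMDK of the given size."""
    for block_mb in sorted(VMFS3_LIMITS_GB):
        if vmdk_size_gb <= VMFS3_LIMITS_GB[block_mb]:
            return block_mb
    raise ValueError("VMDK exceeds the 2 TB VMFS-3 file size limit")

def is_aligned(starting_offset_bytes: int, array_stripe_bytes: int = 64 * 1024) -> bool:
    """Check whether a partition's starting offset lines up with the array stripe."""
    return starting_offset_bytes % array_stripe_bytes == 0

print(min_block_size_mb(300))  # a 300 GB disk needs at least a 2 MB block size
print(is_aligned(32256))       # legacy 63-sector (31.5 KB) offset: misaligned
print(is_aligned(65536))       # 64 KB offset: aligned
```

The legacy 63-sector partition offset is exactly the misalignment that diskpart or fdisk is used to correct.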
Link Aggregation Control Protocol (LACP) for trunking/EtherChannel - use “fixed” path policy, not LRU
Up to 8 (or 32) NFS mount points
Turn off access time updates
Thin provisioning? Turn on AutoSize and watch out
Storage Virtualization Seminar
Stephen Foskett, Director of Data Practice, Contoural
Part 1: Breaking the Connections
Storage virtualization is here, breaking the connection between physical storage infrastructure and the logical way we use it
Agenda What is storage virtualization? Volume management Advanced file systems Virtualizing the SAN Virtual NAS
Poll: Who is Already Using Storage Virtualization?
We talk about virtualization like it is new or strange…
According to ESG, 52% have already implemented storage virtualization and 48% plan to! (ESG 2008)
SNIA Defines Storage Virtualization
The act of abstracting, hiding, or isolating the internal function of a storage (sub)system or service from applications, compute servers, or general network resources for the purpose of enabling application- and network-independent management of storage or data.
The application of virtualization to storage services or devices for the purpose of aggregating, hiding complexity, or adding new capabilities to lower-level storage resources. Storage can be virtualized simultaneously in multiple layers of a system, for instance to create HSM-like systems.
What and Why? Virtualization removes the hard connection between storage hardware and users Address space is mapped to logical rather than physical locations The virtualizing service consistently maintains this meta-data I/O can be redirected to a new physical location We gain by virtualizing Efficiency, flexibility, and scalability Stability, availability, and recoverability
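The mapping idea on this slide can be sketched in a few lines: a virtualization layer maintains metadata from logical addresses to physical locations, so I/O can be redirected to new hardware just by updating the map. The class and method names below are illustrative, not any product's API.

```python
# Toy sketch of the core of storage virtualization: a metadata map from
# logical block addresses to (device, physical block) locations.
class VirtualVolume:
    def __init__(self):
        self.map = {}  # logical block -> (device, physical block)

    def provision(self, logical, device, physical):
        self.map[logical] = (device, physical)

    def read(self, logical):
        return self.map[logical]  # resolve to the current physical location

    def migrate(self, logical, new_device, new_physical):
        # Copy the data, then atomically update the metadata;
        # applications addressing the logical block never notice.
        self.map[logical] = (new_device, new_physical)

vol = VirtualVolume()
vol.provision(0, "array-A", 1042)
print(vol.read(0))             # ('array-A', 1042)
vol.migrate(0, "array-B", 77)  # redirect I/O to new hardware
print(vol.read(0))             # ('array-B', 77)
```

Consistently maintaining that map under failure is what makes real implementations hard; the gain is that the physical location becomes a detail the application never sees.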
The Non-Revolution: Storage Virtualization Software
We’ve been talking about storage virtualization for 15 years!
Virtualization exists for both block and file storage networks
Can be located in server-based software, on network-based appliances, SAN switches, or integrated with the storage array
There is no clustering (until Sun adds Lustre)
Path Management Software Path management virtualizes the connection from a server to a storage system Failover Load balancing strategies A few choices Veritas DMP (cross-platform, with Storage Foundation) EMC PowerPath (supports EMC, HDS, IBM, HP) IBM SDD (free for IBM) HDS (HDLM) Microsoft MPIO (Windows, supports iSCSI and most FC) VMware Failover Paths
Can it replicate to and from thin provisioned volumes?
Thin provisioning is an abdication of our responsibilities!
The next frontier – efficiently storing duplicate content
More appropriate to some applications than others
Software or appliance (and now array!) analyzes files or blocks, saving duplicates just once
Block-based systems reduce capacity more by looking inside files
Once common only for archives, now available for production data
Serious implications for performance and capacity utilization
In-line devices process all data before it is written
Post-processing systems scan written data for duplicates
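The mechanism behind both in-line and post-process deduplication can be sketched simply: hash each fixed-size chunk and store unique chunks only once, keeping an ordered recipe of hashes to reconstruct the stream. A minimal illustration; production systems use variable-size chunking, collision handling, and reference counting.

```python
# Minimal sketch of block-level deduplication. An in-line device would do
# this before the write lands on disk; a post-process system would scan
# already-written blocks the same way.
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    store = {}    # chunk hash -> chunk data (stored once)
    recipe = []   # ordered hashes to reconstruct the original stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # repeated content
store, recipe = dedupe(data)
print(len(recipe))  # 4 logical chunks referenced...
print(len(store))   # ...but only 2 unique chunks stored
```

The performance implication is visible here too: in-line dedup puts the hashing on the write path, while post-processing defers it at the cost of temporarily storing the duplicates.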
“Cloud” Storage
Many companies are choosing managed services for servers and storage
Lots of managed archive and backup providers: Zantaz, Google Postini, EMC Mozy, Symantec SPN, etc.
Managed storage services are coming into their own (finally!): Amazon S3 and Nirvanix, EMC “Fortress”
The Next-Generation Data Center Virtualization of server and storage will transform the data center Clusters of capability host virtual servers Cradle to grave integrated management SAN/network convergence is next InfiniBand offers converged virtual connectivity today iSCSI and FCoE become datacenter Ethernet (DCE) with converged network adapters (CNAs)
Poll: Does Server Virtualization Improve Storage Utilization?
Why Use Virtual Storage For Virtual Servers? Mobility of virtual machines between physical servers for load balancing Improved disaster recovery Higher availability Enabling physical server upgrades Operational recovery of virtual machine images
Server virtualization has befuddled traditional backup, replication, and reporting
VMware Storage Options: Shared Storage
Shared storage - the common/workstation approach: stores the VMDK image in VMFS datastores on DAS or an FC/iSCSI SAN (Hyper-V VHD is similar)
Why? Traditional, familiar, common (~90%); prime features (Storage VMotion, etc.); multipathing, load balancing, failover*
But… overhead of two storage stacks (5-8%); harder to leverage storage features; often shares storage LUN and queue; difficult storage management
(Diagram: VM host with guest OS, VMFS, and VMDK on DAS or SAN storage)
VMware Storage Options: Shared Storage on NFS
Shared storage on NFS - skip VMFS and use NAS; the NFS export is the datastore
Wow! Simple - no SAN; multiple queues; flexible (on-the-fly changes); simple snap and replicate*; enables full VMotion; use fixed LACP for trunking
But… less familiar (3.0+); CPU load questions; default limited to 8 NFS datastores; will multi-VMDK snaps be consistent?
(Diagram: VM host with guest OS and a VMDK on NFS storage)
VMware Storage Options: Raw Device Mapping (RDM)
Raw device mapping (RDM) - guest VMs access storage directly over iSCSI or FC; VMs can even boot from raw devices (Hyper-V pass-through LUN is similar)
Great! Per-server queues for performance; easier measurement; the only method for clustering
But… tricky VMotion and DRS; no Storage VMotion; more management overhead; limited to 256 LUNs per data center
(Diagram: VM host with guest OS, I/O through a mapping file to SAN storage)
Physical vs. Virtual RDM
Virtual Compatibility Mode: appears the same as a VMDK on VMFS; retains file locking for clustering; allows VM snapshots, clones, VMotion; retains the same characteristics if storage is moved
Physical Compatibility Mode: appears as a LUN on a “hard” host; allows V-to-P clustering and VMware locking; no VM snapshots, VCB, or VMotion; all characteristics and SCSI commands (except “Report LUN”) are passed through, which is required for some SAN management software
Poll: Which VMware Storage Method Performs Best?
(Charts: mixed random I/O and CPU cost per I/O for VMFS, RDM (physical), and RDM (virtual))
Source: “Performance Characterization of VMFS and RDM Using a SAN”, VMware Inc., 2008
Which Storage Protocol is For You?
FC, iSCSI, and NFS all work well; most production VM data is on FC
Either/or? 50% use a combination (ESG 2008); leverage what you have and are familiar with
For IP storage: use TOE cards/iSCSI HBAs; use a separate network or VLAN; is your switch backplane fast? No VM cluster support with iSCSI*
For FC storage: 4 Gb FC is awesome for VMs; get NPIV (if you can)
Poll: Which Storage Protocol Performs Best?
(Charts: throughput by I/O size and CPU cost per I/O for Fibre Channel, NFS, software iSCSI, and TOE iSCSI)
Source: “Comparison of Storage Protocol Performance”, VMware Inc., 2008
Storage Configuration Best Practices
Separate operating system and application data: OS volumes (C: or /) on a different VMFS or LUN from applications (D: etc.); heavy apps get their own VMFS or raw LUN(s)
Optimize storage by application: consider different tiers or RAID levels for OS, data, and transaction logs (automated tiering can help)
No more than one VMFS per LUN; fewer than 16 production ESX VMDKs per VMFS
Get thin: deduplication can have a huge impact on VMDKs created from a template! Thin provisioning can be very useful - but thin disk is in VMware Server, not ESX!?!
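Why dedup and thin provisioning matter so much for template-cloned VMs is worth a back-of-the-envelope calculation: clones share most blocks with their template, and thin disks consume only what has been written. The numbers below are purely illustrative assumptions, not measurements.

```python
# Rough sketch: capacity consumed by N template-cloned VMs under thick
# provisioning, thin provisioning, and thin + dedup of the shared blocks.
def provisioned_vs_consumed(vm_count, disk_gb, written_gb, shared_fraction):
    thick = vm_count * disk_gb        # fully allocated up front
    thin = vm_count * written_gb      # only written blocks consume space
    shared = written_gb * shared_fraction   # template blocks, stored once
    unique = written_gb - shared            # per-VM unique blocks
    deduped = shared + vm_count * unique
    return thick, thin, deduped

thick, thin, deduped = provisioned_vs_consumed(
    vm_count=20, disk_gb=40, written_gb=12, shared_fraction=0.75)
print(thick)    # 800 GB thick-provisioned
print(thin)     # 240 GB thin-provisioned
print(deduped)  # 69 GB once the shared template blocks are stored once
```

The same arithmetic explains the "watch out": thin provisioning overcommits physical capacity, so growth in the unique-block portion must be monitored.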
Why NPIV Matters
N_Port ID Virtualization (NPIV) gives each server a unique WWN: easier to move and clone* virtual servers; better handling of fabric login; virtual servers can have their own LUNs, QoS, and zoning - just like a real server!
When looking at NPIV, consider: how many virtual WWNs does it support? (the T11 spec says “up to 256”); OS, virtualization software, HBA, FC switch, and array support and licensing; some old hardware, especially HBAs, can’t be upgraded for NPIV
(Diagram: without NPIV, three virtual servers share one WWN, 21:00:00:e0:8b:05:05:04; with NPIV, each gets its own: …05:05:05, …05:05:06, …05:05:07)
Virtualization-Enabled Disaster Recovery DR is a prime beneficiary of server and storage virtualization Fewer remote machines idling No need for identical equipment Quicker recovery (RTO) through preparation and automation Who’s doing it? 26% are replicating server images, an additional 39% plan to (ESG 2008) Half have never used replication before (ESG 2008) News: VMware Site Recovery Manager (SRM) integrates storage replication with DR
Enhancing Virtual Servers with Storage Virtualization
Mobility of server and storage images enhances load balancing, availability, and maintenance: SAN and NAS arrays can snap and replicate server images; VMotion moves the server, Storage VMotion (new in 3.5) moves the storage between shared storage locations
Virtualization-optimized storage: Pillar and HDS claim to tweak allocation per VM; many vendors are announcing compatibility with VMware SRM; most new arrays are NPIV-capable
Virtual storage appliances: LeftHand VSA, a storage virtualization array running as a VM; FalconStor CDP, a virtual CDP system
Enabling Virtual Backup Virtual servers cause havoc for traditional client/server backups I/O crunch as schedules kick off – load is consolidated instead of balanced Difficult to manage and administer (or even comprehend!) Storage virtualization can help Add disk to handle the load (VTL) Switch to alternative mechanisms (snapshots, CDP) Consider VMware consolidated backup (VCB) Snapshot-based backup of shared VMware storage Block-based backup of all VMDKs on a physical server
Larger systems have fewer capacity limitations
How Green Am I? Server virtualization can dramatically reduce power, cooling, and space requirements Fewer physical servers Better (any) power management Storage virtualization offers fewer green benefits Does not normally reduce equipment footprint Enterprise storage systems not very energy efficient Transformed storage systems might help De-duplication, tiered storage, and archiving can slow growth New MAID and spin-down devices offer power/cooling savings
Performance A battle royale between in- and out-of-band! In-band virtualization can improve performance with caching Out-of-band stays out of the way, relying on caching at the device level Split-path adds scalability to in-band Large arrays perform better (usually) than lots of tiny RAIDs or disks First rule of performance: Spindles Second rule of performance: Cache Third rule of performance: I/O Bottlenecks
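The three "rules of performance" above can be framed as a simple model: aggregate random IOPS scales with spindle count, and average latency depends on how often the cache absorbs an I/O. The latency and IOPS figures are illustrative assumptions, and this is a rough framing, not a sizing tool.

```python
# Rule 1 (spindles): aggregate random IOPS scales with spindle count.
# Rule 2 (cache): expected latency blends cache hits and disk misses.
def avg_latency_ms(hit_rate, cache_ms=0.5, disk_ms=8.0):
    """Expected I/O latency given a cache hit rate (illustrative numbers)."""
    return hit_rate * cache_ms + (1 - hit_rate) * disk_ms

def max_random_iops(spindles, iops_per_spindle=180):
    """Aggregate random IOPS across spindles (illustrative per-disk rate)."""
    return spindles * iops_per_spindle

print(avg_latency_ms(0.9))  # about 1.25 ms at a 90% cache hit rate
print(avg_latency_ms(0.5))  # about 4.25 ms when the cache helps less
print(max_random_iops(48))  # 8640 IOPS from 48 spindles
```

Rule 3 follows from the same model: once spindles and cache are adequate, the remaining latency comes from bottlenecks in the I/O path itself.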
Solid State Drives (and Myths) The new (old) buzz RAM vs. NAND flash vs. disk EMC added flash drives to the DMX (CX?) as “tier-0”, CEO Joe Tucci claims flash will displace high-end disk after 2010 Sun, HP adding flash to the server as a cache Gear6 caches NAS with RAM But… Are they reliable? Do they really perform that well? Will you be able to use them? Is the 10x-30x cost justified? Do they really save power? Notes: 1 – No one writes this fast 24x7 2 – Manufacturers claim 2x to 10x better endurance
Stability, Availability, and Recoverability Replication creates copies of storage in other locations Local replicas (mirrors and snapshots) are usually frequent and focused on restoring data in daily use Remote replicas are used to recover from disasters Virtualization can ease replication Single point of configuration and monitoring Can support different hardware at each location
We Love It! Efficiency, scalability, performance, availability, recoverability, etc… Without virtualization, none of this can happen!
Implementation Issues Many virtualization systems require additional software loaded on servers Device drivers, path managers, agents, “shims” Additional maintenance and configuration can offset “single pane” benefits Organizational issues can crop up Virtualization blurs the lines between who owns what Future datacenter combines server, storage, network What about application?
Cost Benefit Analysis Benefits Improved utilization Tiering lowers per-GB cost Reduced need for proprietary technologies Potential reduction of administrative/ staffing costs Flexibility boosts IT response time Performance boosts operational efficiency Costs Additional hardware and software cost Added complexity, vendors Training and daily management Reporting and incomprehensibility Possible negative performance impact Stability and reliability concerns
Closing Thought: What Is Virtualization Good For?
Virtualization is a technology, not a product
What will you get from using it? Better DR? Improved service levels and availability? Better performance? Shortened provisioning time?
The cost must be justified based on business benefit, not cool technology
Audience Response Questions? Stephen Foskett Contoural, Inc. firstname.lastname@example.org http://blog.fosketts.net