In the SDDC, all three core infrastructure components (compute, storage and networking) are virtualized. Virtualization software abstracts the underlying hardware and pools compute, network and storage resources to deliver better utilization, faster provisioning and simpler operations. The VM becomes the centerpiece of the operational model, providing the automation and agility to repurpose infrastructure according to business needs.
Today we will focus on storage, which has been growing at an extremely rapid pace and is a fast-changing aspect of the data center!
What we are trying to achieve is to simplify data center operations, and our primary focus will be storage and availability. Storage, as we all know, has traditionally been a pain point in many data centers: it is expensive and usually does not provide the performance and scalability one would want. By offering our customers choice we aim to change the world of IT and start a new revolution. But we cannot do this by ourselves; we need the help of you, the consultant / admin / architect.
vSphere is perfectly positioned for this as it abstracts physical resources and can provide them as a shared pooled construct to the administrator.
Because it sits directly in the I/O path, the hypervisor (through the notion of policies associated with virtual machines) has the unique ability to make optimal decisions around matching the demands of virtualized applications with the supply of underlying physical infrastructure.
On top of that, the platform provides the ability to assign service level agreements to workloads, which reduces operational complexity and as such significantly reduces the chance of making mistakes.
This is where it all starts; without Storage Policy Based Management many of the products and features we are about to talk about would not be possible! If there is one thing you need to remember when you walk away today, it is Storage Policy Based Management. It is the key enabler for Software Defined Storage and Availability!
SPBM provides the following benefits for customers: a stable, robust automation platform; intelligent placement and fine control of services at the VM level; and it shields automation and orchestration platforms from infrastructure changes by abstracting the underlying storage implementation.
When you deploy a virtual machine using the SPBM framework, VMs will show up as either compliant or non-compliant. If a failure has occurred and one of the VMs is impacted, you can easily see this as the VM will show up as non-compliant. Of course, if there are sufficient hosts available and there is sufficient disk space, the VM will be re-protected (self-healing) by Virtual SAN.
What is VSAN in a nutshell…
So, it follows a hyper-converged architecture for easy, streamlined management and scaling of both compute and storage. Hyper-converged represents a system architecture – one where compute and persistence are co-located. This system architecture is enabled by software.
It is an SDS product: a layer of software that runs on every ESXi host. It aggregates the local storage devices on ESXi hosts (SSDs and magnetic disks) and makes them look like a single pool of shared storage across all the hosts.
VSAN has a distributed architecture with no single point of failure.
VSAN goes a step further than other HCI products – VMware owns the most popular hypervisor in the industry. Strong integration of VSAN in the hypervisor means that we can optimize the data path and ensure optimal resource scheduling (compute, network, storage) according to the needs of each application. In the end, better resource utilization means better consolidation ratios, more bang for your buck! Resource utilization is one part of the story; the other part is the operational aspects of the product.
VSAN has been designed as a storage product to be used primarily by vSphere admins. So, we put a lot of effort in packaging the product in a way that is ideal for today’s use cases of virtualized environments. Specifically, the VSAN configuration and management workflows have been designed as extensions of the existing host and cluster management features of vSphere. That means easy, intuitive operational experience for vSphere admins. It also means native integration with key vSphere features unlike any other storage product out there, HCI or not.
VSAN is widely adopted, with over 3,000 customers since launch and some very interesting use cases ranging from oil platforms to trains, and it is now being planned for deployment on submarines and mobile deployment units out in the field.
The oil platform scenario is a ROBO deployment managed through a central vCenter Server leveraging a satellite connection.
As for the submarines and mobile deployment units, I can't reveal who this is, but it is very real. Dual data center setups in a ship are not uncommon, and Virtual SAN is a natural fit here.
We were very conservative when we initially launched VSAN – after all, this was customers' data we were talking about. However, even though we were conservative, our customers were not. There are plenty of other use cases; the ones listed on the slide are the most common. It is fair to say that Virtual SAN fits most scenarios. Of course customers started with test/dev workloads, just like they did when virtualization was first introduced. Business critical apps: we have customers running Exchange, SQL, SAP and billing systems on Virtual SAN. Virtual SAN is included in the Horizon Suite Advanced and Enterprise, so VDI/EUC is a natural fit. VSAN is also commonly used as a DR destination, as you can scale out and the cost is relatively low compared to a traditional storage system. Isolation workloads are also something VSAN is often used for; both DMZ and management clusters fit this bill. And of course there is ROBO: VSAN can start small and grow when desired, both scale-out and scale-up, and with 6.1 we even made things better by introducing a 2-node configuration, but we will get back to that!
When it comes to deploying VSAN there are 3 options. By far the most popular option is the VSAN Ready Node - pre-installed and configured ready nodes (Ready to Run).
These are pre-configured server models which have been fully certified for and tested with VSAN.
Another option is an integrated out-of-the-box experience – HCI nodes from EMC offer an “on rails” solution.
Lastly EVO:SDDC (Not released yet) offers the capability to deploy VSAN, NSX, vRO and other VMware solutions end to end. An SDDC in a rack, which scales from half a rack to many...
Virtual SAN enables both hybrid and all-flash architectures. Irrespective of the architecture, there is a flash-based caching tier which can be configured out of flash devices like SSDs, PCIe cards, Ultra DIMMs etc. The flash caching tier acts as the read cache/write buffer that dramatically improves the performance of storage operations.
In the hybrid architecture, server-attached magnetic disks are pooled to create a distributed shared datastore that persists the data. In this type of architecture, you can get up to 40K IOPS per server host.
In the all-flash architecture, the flash-based caching tier is intelligently used as a write buffer only, while another set of SSDs forms the persistence tier to store data. Since this architecture utilizes only flash devices, it delivers extremely high IOPS of up to 90K per host, with predictable low latencies.
Deployed, configured and managed from vCenter through the vSphere Web Client. Radically simple: configure a VMkernel interface for Virtual SAN and enable Virtual SAN by clicking Turn On.
Objects are divided and distributed into components based on policies. Components and policies will be covered shortly. VMs are no longer based on a set of files, like we have on traditional storage.
The first thing you do before you deploy a VM is define a policy. VSAN has “what if” APIs, so it will show what the result would be of applying such a policy to a VM of a certain size. Very useful, as it gives you an idea of the cost of certain attributes.
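To make that "cost" idea concrete, here is a minimal Python sketch (not the actual VSAN what-if API; the function name and structure are illustrative) of the raw capacity and host count implied by RAID-1 mirroring, where FTT=n means n+1 copies of the object and 2n+1 hosts contributing storage:

```python
# Illustrative sketch only (not the VSAN "what if" API): the raw capacity cost
# of RAID-1 mirroring under SPBM, where FTT=n means n+1 replicas of the object
# plus 2n+1 hosts contributing storage (replicas + witnesses).

def raid1_cost(vmdk_gb: float, failures_to_tolerate: int) -> dict:
    """Return the raw capacity and minimum host count for a mirrored object."""
    copies = failures_to_tolerate + 1        # n+1 replicas
    hosts = 2 * failures_to_tolerate + 1     # 2n+1 hosts contributing storage
    return {"raw_capacity_gb": vmdk_gb * copies, "min_hosts": hosts}

if __name__ == "__main__":
    # A 100GB VMDK with "Number of failures to tolerate = 1"
    print(raid1_cost(100, 1))   # {'raw_capacity_gb': 200, 'min_hosts': 3}
```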
Also note that a number of new capabilities were introduced in VSAN 6.2; these will be discussed in more detail later on.
RAID-0 and RAID-1 were the only distributed RAID options up to and including version 6.1. New techniques introduced in VSAN 6.2 will be discussed shortly.
RAID-5/6 used when Fault Tolerance Method set to Capacity
Note that in order to protect against a rack failure the minimum required number of failure domains is 3, this is similar to protecting against a host failure using FTT=1 where the minimum number of hosts is 3.
Stretched Cluster
Support for ROBO
Enhanced Replication
Support for SMP-FT
Support for Oracle RAC
Support for Windows Server Failover Clustering
Virtual SAN On-Disk Format Upgrade
Disk Group Bulk Claiming
Disk Claiming per Tier
Stretched Cluster Configuration
Stretched Cluster Health Monitoring
Health Check Plug-in (in-box)
vRealize Operations Manager Integration
Global data visualization
Capacity planning
Root-cause analysis
Stretched storage with Virtual SAN allows you to split the Virtual SAN cluster across 2 sites, so that if a site fails, you can seamlessly fail over to the other site without any loss of data. Virtual SAN in a stretched deployment accomplishes this by synchronously mirroring data across the 2 sites. Failover relies on a witness VM that resides in a central place, accessible by both sites.
Bandwidth to the witness is 10Mbps, or 2Mbps per 1000 components (worst case scenario – very little traffic is observed during steady state, but we need to size for owner migration or site failure).
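As a quick sanity check of those numbers, a tiny illustrative calculation assuming the ~2Mbps-per-1000-components rule of thumb quoted above:

```python
# Back-of-the-envelope check of the witness bandwidth rule of thumb
# (~2Mbps per 1000 components, sized for the worst case).

def witness_bandwidth_mbps(component_count: int, mbps_per_1000: float = 2.0) -> float:
    return component_count / 1000 * mbps_per_1000

# A cluster with 5000 components needs roughly 10Mbps to the witness,
# which matches the figure in the notes above.
print(witness_bandwidth_mbps(5000))  # 10.0
```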
Point-in-time view of the state of the cluster. Geared to hardware – ensuring that everything is functioning as expected (disks, network, objects, components).
All Flash Only.
“High level description”: Deduplication and compression happen during destaging from the caching tier to the capacity tier. You enable it at the cluster level, and deduplication/compression happens on a per-disk-group basis. Bigger disk groups will result in a higher deduplication ratio. After the blocks are deduplicated they are compressed. Compression alone is a significant saving; combined with deduplication the results achieved can be up to 7x space reduction, of course fully dependent on the workload and the type of VMs.
“Lower level description”: Compression (LZ4) is performed during destaging from the caching tier to the capacity tier. 4KB is the block size for deduplication. For each unique 4KB block, compression is performed, and if the output block size is less than or equal to 2KB, the compressed block is saved in place of the 4KB block. If the output block size is greater than 2KB, the block is written uncompressed and tracked as such. The reason is to avoid block alignment issues, as well as to reduce the CPU hit of decompression, which is greater than the cost of compression for data with low compression ratios. All of this data reduction happens after the write acknowledgement.
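Purely as an illustration of that per-4KB-block decision, here is a minimal Python sketch; the function name, the SHA-1 fingerprint and the zlib call (standing in for LZ4) are assumptions for the example, not the actual VSAN implementation:

```python
# Minimal sketch (assumed logic, not VSAN source) of the per-4KB-block decision:
# dedupe on the block fingerprint, then compress, keeping the compressed form
# only if it fits in 2KB.
import hashlib
import zlib  # stand-in for LZ4, which is what VSAN actually uses

BLOCK = 4096
dedup_map: dict[str, int] = {}   # fingerprint -> physical block address (per disk group)

def destage_block(data: bytes, next_free_addr: int) -> tuple[str, int]:
    """Return (action, physical address) for one 4KB block being destaged."""
    assert len(data) == BLOCK
    fp = hashlib.sha1(data).hexdigest()
    if fp in dedup_map:                          # duplicate: reference existing block
        return "dedup-hit", dedup_map[fp]
    dedup_map[fp] = next_free_addr
    compressed = zlib.compress(data)
    if len(compressed) <= 2048:                  # store a compressed 2KB block
        return "stored-compressed", next_free_addr
    return "stored-uncompressed", next_free_addr  # written as-is and tracked as such

print(destage_block(b"\x00" * BLOCK, 0))  # highly compressible -> stored-compressed
print(destage_block(b"\x00" * BLOCK, 1))  # same content        -> dedup-hit
```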
Deduplication domains are within each disk group. This avoids needing a global lookup table (significant resource overhead) and allows us to put those resources towards tracking a smaller, more meaningful block size. By purposefully avoiding deduplication of “write hot” data in the cache, and by not trying to compress incompressible data, significant CPU/memory resources are not wasted.
Note: the feature is supported with stretched clusters and the ROBO edition.
RAID-5 and RAID-6 over the network are sometimes also referred to as erasure coding. This is done inline; there is no post-processing required. Since VMware has a design goal of not relying on data locality, this implementation of erasure coding does not introduce any drawbacks by distributing the RAID-5/6 stripe across multiple hosts.
In this case RAID-5 requires 4 hosts at a minimum, as it uses a 3+1 logic. With 4 hosts, 1 can fail without data loss. This results in a significant reduction of required disk capacity: normally a 20GB disk would require 40GB of disk capacity, but in the case of RAID-5 over the network the requirement is only ~27GB. There is another option if higher availability is desired.
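The arithmetic behind those figures, as a small illustrative snippet (the scheme labels are made up for the example; the overhead factors are the 2x/1.33x/3x/1.5x figures from the slides):

```python
# Worked numbers behind the claim above: a 20GB disk protected with FTT=1
# costs 40GB with RAID-1 mirroring but only ~27GB with a 3+1 RAID-5 stripe.

def raw_capacity_gb(vmdk_gb: float, scheme: str) -> float:
    overhead = {
        "raid1-ftt1": 2.0,     # 2 full copies
        "raid5":      4 / 3,   # 3 data + 1 parity
        "raid1-ftt2": 3.0,     # 3 full copies
        "raid6":      1.5,     # 4 data + 2 parity
    }[scheme]
    return vmdk_gb * overhead

for scheme in ("raid1-ftt1", "raid5", "raid1-ftt2", "raid6"):
    print(scheme, round(raw_capacity_gb(20, scheme), 1))
# raid1-ftt1 40.0, raid5 26.7, raid1-ftt2 60.0, raid6 30.0
```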
Use case information: erasure codes offer guaranteed capacity reduction, unlike deduplication and compression. For customers who have “no thin provisioning” policies, have data that is already compressed and deduplicated, or have encrypted data, this offers known/fixed capacity gains. It can be applied on a granular basis (per VMDK) using the Storage Policy Based Management system.
30% savings. Note: All-Flash VSAN only. Note: not supported with stretched clusters. Note: this does not require the cluster size to be a multiple of 4, just 4 or more hosts.
Cluster-wide setting (default is on). Can be disabled on a per-object basis using storage policies.
Software checksum enables customers to detect corruptions that could be caused by hardware or software components, including memory, drives, etc., during read or write operations. In the case of drives, there are two basic kinds of corruption. The first is latent sector errors, which are typically the result of a physical disk drive malfunction. The other type is silent corruption, which can happen without warning (typically called silent data corruption). Undetected or completely silent errors can lead to lost or inaccurate data and significant downtime, and there is no effective means of detection without end-to-end integrity checking. During read/write operations VSAN checks the validity of the data based on the checksum. If the data is not valid, it takes the necessary steps to either correct the data or report it to the user so action can be taken. These actions are: fetch the data from another copy (RAID-1, RAID-5/6, etc.) – this is what we call recoverable data; if there is no valid copy of the data, the error is returned – this is what we call a non-recoverable error.
Reporting: in case of errors, the issues will be reported in the UI and logs. This includes the impacted blocks and their associated VMs. A customer will be able to see the list of VMs/blocks hit by non-recoverable errors, as well as the historical/trending errors on each drive.
CRC32 is the algorithm used (CPU offload support reduces overhead)
There will be two levels of scrubbing. Component-level scrubbing: every block of each component is checked; on a checksum mismatch, the scrubber tries to repair the block by reading the other components. Object-level scrubbing: for every block of the object, the data of each mirror (or the parity blocks in RAID-5/6) is read and checked; inconsistent data causes all data in that stripe to be marked as bad.
Repair can happen during normal I/O at the DOM Owner or by the scrubber. The repair paths for mirrors and RAID-5/6 are different. When checksum verification fails, the scrubber or DOM Owner will read the other copy of the data (or other data in the same stripe in the case of RAID-5/6), rebuild the correct data and write it out to the bad location.
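Purely as an illustration of this verify-and-repair flow (not the DOM Owner or scrubber code; the mirror layout and function names are assumptions), a short Python sketch using CRC32, as the notes state:

```python
# Hedged sketch of the read-path repair described above, using CRC32 checksums.
# The in-memory "mirrors" and the repair-in-place step are illustrative only.
import zlib

def crc(block: bytes) -> int:
    return zlib.crc32(block)

def read_with_repair(mirrors: list, offset: int, stored_crc: int,
                     block_size: int = 4096) -> bytes:
    """Read one block; on checksum mismatch, repair it from another mirror copy."""
    for i, mirror in enumerate(mirrors):
        block = bytes(mirror[offset:offset + block_size])
        if crc(block) == stored_crc:
            if i != 0:  # the primary copy was bad: rewrite the bad location
                mirrors[0][offset:offset + block_size] = block
            return block
    raise IOError("non-recoverable error: no valid copy of the block")

good = bytearray(b"A" * 4096)
bad = bytearray(b"A" * 4095 + b"X")          # simulated silent corruption
print(read_with_repair([bad, good], 0, crc(b"A" * 4096))[:4])  # b'AAAA'
print(bad == good)                            # True: bad copy repaired in place
```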
End-to-end checksum of the data to prevent data integrity issues that could be caused by silent disk errors (the checksum is calculated and stored on the write path). Silent corruptions are detected when reading the data by verifying it against the stored checksum.
When checksum verification fails, VSAN will read the other copy of the data (or other data in the same stripe in case of RAID-5/6), rebuild the correct data and write it out to the bad location
It is based on a 4KB block size.
This replaces the 1MB cache lines used for read-ahead with a larger cache (0.4% of host memory, up to 1GB). Preliminary testing with VDI shows some impressive numbers, and this will complement CBRC. Data locality is used for the memory cache (as we do with CBRC) since this is a read-only cache (so no need for a network ACK); memory latency is low enough that going over the network would become a concern. 4KB granularity of cache.
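A quick sizing illustration of that 0.4%-of-host-memory, capped-at-1GB rule (the host sizes are example values):

```python
# Quick sizing check for the in-memory read cache mentioned above:
# 0.4% of host memory, capped at 1GB per host.
def client_cache_gb(host_memory_gb: float) -> float:
    return min(host_memory_gb * 0.004, 1.0)

print(client_cache_gb(128))   # 0.512 GB
print(client_cache_gb(256))   # 1.0 GB (hits the 1GB cap)
```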
Sparse swap will be an advanced host-level option (swap is not managed by SPBM but by the kernel). This enables reclaiming the space reserved for VM memory swap. On a cluster with 256GB per host, this would yield TBs of capacity savings at scale. This should especially benefit linked-clone VDI storage utilization.
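Rough illustrative arithmetic behind the "TBs at scale" claim, assuming each host carries VMs whose swap reservations add up to roughly the host's 256GB of memory (the cluster size is an example value):

```python
# Without sparse swap, each powered-on VM reserves swap space equal to its
# unreserved memory, so a host packed with ~256GB of VM memory carries roughly
# 256GB of swap objects on the VSAN datastore.
def swap_reservation_tb(hosts: int, vm_memory_gb_per_host: float) -> float:
    return hosts * vm_memory_gb_per_host / 1024

print(round(swap_reservation_tb(64, 256), 1))   # 16.0 TB reclaimable at full scale
```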
The Performance Monitoring Service allows existing workloads to be monitored from vCenter. Customers needing access to tactical performance information will not need to go to vRealize Operations.
Performance monitor includes macro level views (Cluster latency, throughput, IOPS) as well as granular views (per disk, cache hit ratios, per disk group stats) without needing to leave vCenter.
The performance monitor allows aggregation of stats across the cluster into a “quick view” to see what load and latency look like, and that information can be shared externally with 3rd-party monitoring solutions directly via API.
The Performance monitoring service runs on a distributed database that is stored on VSAN and NOT vCenter (will use up to ~255GB, which is why it will ask for a policy).
Work is being done on SAP HANA. This may not make launch, but PE is working with SAP on this.
SAP Core apps are ready to be supported.
“Horizon should be deployed with VSAN”
Exchange DAG and Microsoft AlwaysOn were already supported. The PE team has put together some impressive transaction numbers for Oracle.
Of course we have a vision, and the vision isn’t too far out, it is just ahead
We are about to wrap up this session, but I want to leave you with one more thing. VSAN is being extended to serve as a generic storage platform: one which, in addition to the traditional virtualization use cases of VMs and vSCSI disks, can also serve storage through new abstractions – lightweight block drivers (perhaps using the NVMe protocol), files, and REST APIs. That is storage that can be made available to individual hosts or be shared, according to the protocol semantics, across many hosts and application instances in the infrastructure. Besides that, VMware has been prototyping a distributed file system that leverages Virtual SAN as its core storage provider and serves storage capacity in an easy and distributed fashion to thousands of clients. Yes, the future is bright, and this is just the beginning.
With that I would (click) like to thank you and open the floor for questions
Virtual SAN 6.2, hyper-converged infrastructure software
VMware Virtual SAN
Office of the CTO
Storage & Availability
Hyper-converged infrastructure software
2 Virtual SAN, what is it?
3 Virtual SAN, a bit of a deeper dive
4 Virtual SAN Recent Enhancements
5 Wrapping up
The Software Defined Data Center
Compute Networking Storage
• All infrastructure services virtualized:
compute, networking, storage
• Underlying hardware abstracted,
resources are pooled
• Control of data center automated by
software (management, security)
• Virtual Machines are first class citizens
of the SDDC
• Today’s session will focus on one
aspect of the SDDC - storage
The Hypervisor is the Strategic High Ground
SAN/NAS
x86 - HCI
Object Storage
Storage Policy-Based Management – App centric automation
• Intelligent placement
• Fine control of services at VM level
• Automation at scale through policy
• Need new services for VM?
• Change current policy on-the-fly
• Attach new policy on-the-fly
Virtual Machine Storage policy
Reserve Capacity 40GB
Availability 2 Failures to tolerate
Read Cache 50%
Stripe Width 6
Storage Policy-Based Management
Virtual SAN Virtual Volumes
Storage Policy Based Management – What does it look like?
If the storage can satisfy the VM
Storage Policy, the VM Summary tab
in the vSphere client will display the
VM as compliant.
If not, either due to failures, lack of
resources or other reasons, the VM
will be shown as non-compliant.
Virtual SAN, what is it?
Distributed, Scale-out Architecture
Integrated with vSphere platform
Ready for today’s vSphere use cases
vSphere & Virtual SAN
But what does that really mean?
Generic x86 hardware
VMware vSphere & Virtual SAN Integrated with your Hypervisor
Leveraging local storage resources
Exposing a single shared datastore
VSAN is the Most Widely Adopted HCI Product
Simplicity is key, on an oil
platform there are no
virtualization, storage or network
admins. The infrastructure is
managed over a satellite link via
a centralized vCenter Server.
Reliability, availability and
predictability is key.
Virtual SAN Use Cases
VMware vSphere + Virtual SAN
Critical Apps DR / DA
Broadest Deployment Options from HCI to SDDC
Built on Industry-Leading VMware Hyper-Converged Software (HCS)
Certified Solutions Engineered Appliances
Virtual SAN Ready Nodes
Virtual SAN + vSphere + vCenter
Virtual SAN + vSphere + vCenter
Virtual SAN + vSphere + vCenter
EVO SDDC Manager
Tiered Hybrid vs All-Flash
100K IOPS per Host
Writes cached first,
Reads from capacity tier
Reads go directly to capacity tier
SSD PCIe Ultra DIMM
40K IOPS per Host
Read and Write Cache
SAS / NL-SAS / SATA
SSD PCIe Ultra DIMM
Really Simple Setup
2 node or
Virtual Machine as a set of Objects on VSAN
• VM Home Namespace
• VM Swap Object
• Virtual Disk (VMDK) Object
• Snapshot (delta) Object
• Snapshot (delta) Memory Object
Define a policy first…
Virtual SAN currently surfaces multiple storage capabilities to vCenter Server
What If APIs
in VSAN 6.2
Virtual SAN Objects and Components
VSAN is an object store!
• Object Tree with Branches
• Each Object has multiple Components
– This allows you to meet availability and performance requirements
• Here is one example of “Distributed RAID” using
– Striping (RAID-0)
– Mirroring (RAID-1)
• Data is distributed based on VM Storage Policy
ESXi Host ESXi Host
Number of Failures to Tolerate/Failure Tolerance Method
• Defines the number of hosts, disk or network failures a storage object can tolerate.
• RAID-1 Mirroring used when Failure Tolerance Method set to Performance (default).
• For “n” failures tolerated, “n+1” copies of the object are created and “2n+1” hosts contributing
storage are required!
esxi-01 esxi-02 esxi-03 esxi-04
Virtual SAN Policy: “Number of failures to tolerate = 1”
~50% of I/O
~50% of I/O
Assign it to a new or existing VM
When the policy is selected, Virtual SAN uses it to place / distribute the VM to guarantee
availability and Performance
Fault Domains, increasing availability through rack awareness
• Create fault domains to increase availability
• 8 node cluster with 4 defined fault domains (2 nodes in each)
FD1 = esxi-01, esxi-02 FD3 = esxi-05, esxi-06
FD2 = esxi-03, esxi-04 FD4 = esxi-07, esxi-08
• To protect against one rack failure only 2 replicas are required and a witness across 3 failure domains!
FD2 FD3 FD4
vmdk vmdk witness
All Flash Configuration
64 node VSAN cluster
x2 Hybrid Performance
Deduplication and Compression
RAID 5/6 support
QoS via IOPS Limits
Enhanced Capacity Views
Replication - 5 Minutes RPO
Health Monitoring & Remediation
Virtual SAN – Stretched Cluster
Active-Active data centers
• Virtual SAN cluster split across 2 sites!
• Each site is a Fault Domain (FD)
• Site-level protection with zero data loss
and near-instantaneous recovery
• Support for up to 5ms RTT latency
between data sites
– 10Gbps bandwidth expectation
• Witness VM can reside anywhere
– 200ms RTT latency
– 100Mbps bandwidth required at most
• Automated failover
5ms RTT, 10GbE
VMware vSphere & Virtual SAN
vSphere & Virtual SAN
Site Recovery Manager
Advanced Troubleshooting with VSAN Health Check
• Cluster Health
• Network Health
• Data Health
• Limits Health
• Physical Disk Health
• Stretched Cluster
• Proactive Tests
Deduplication and Compression for Space Efficiency
• Nearline deduplication and compression per disk group level.
– Enabled on a cluster level
– Deduplicated when de-staging from cache tier to capacity tier
– Fixed block length deduplication (4KB Blocks)
• Compression after deduplication
– If block is compressed <= 2KB
– Otherwise full 4KB block is stored
esxi-01 esxi-02 esxi-03
vSphere & Virtual SAN
All Flash Only
Significant space savings achievable,
making the economics of an all-flash
VSAN very attractive
RAID-5/6 (Inline Erasure Coding)
• When Number of Failures to Tolerate = 1 and Failure Tolerance Method = Capacity RAID-5
– 3+1 (4 host minimum)
– 1.33x overhead for RAID-5 instead of 2x compared to FTT=1 with RAID-1
• When Number of Failures to Tolerate = 2 and Failure Tolerance Method = Capacity RAID-6
– 4+2 (6 host minimum)
– 1.5x overhead for RAID-6 instead of 3x compared to FTT=2 with RAID-1
All Flash Only
Software Checksum and disk scrubbing
• End-to-end checksum of the data to detect and resolve silent
disk errors due to faulty hardware/firmware
• Checksum is enabled by default (policy driven)
• If checksum verification fails on a read:
– VSAN fetches the data from another copy in RAID-1
– VSAN recreates the data from other components in RAID-5/6
• Disk scrubbing is run in the background
• Provide additional level of data integrity
• Automatic detection and resolution of silent disk errors
Virtual SAN Datastore
Other new improvements
• Write through read memory cache
– 0.4% of total host memory, up to 1GB per host
• “Local” to the virtual Machine
• Low overhead, big impact!
• Reclaim Space used by memory swap
• Host advanced option enables setting policy for swap to
no space reservation
IOPS limit on object
• Policy driven capability
• Limit IOPS per VM/Virtual Disk
• Eliminate noisy neighbor issues
• Manage performance SLAs
Enhanced Virtual SAN Management with New Health Service
Built-in performance monitoring
Health and performance APIs and SDK
Storage capacity reporting
And many more health checks…
Performance Monitoring Capacity Monitoring
Performance, Scale and Availability for Any Application
BUSINESS-CRITICAL APPLICATIONS
SAP Core Ready
Tested and validated
Tightly integrated cloud
Bundles Virtual SAN
licenses for lowest
cost VDI storage
Oracle RAC supported
Tested and validated