2. CLUSTER ROLLING UPGRADES
Decrease Time to Value
Ability to upgrade a cloud platform OS to WS.vNext
without interrupting cluster workloads, adopting new
capabilities with no downtime or SLA penalties.
3. WHAT IS THE CLUSTER OS ROLLING UPGRADE PROCESS?
Scenario:
• Start with a Hyper-V cluster with 7 WS2012R2 nodes and 15 VMs
• Support both storage topologies for Hyper-V
• Disaggregated – File Based Storage with SMB (Scale-out File Server)
• Converged – Local block storage and Cluster Shared Volumes
4. FOR EACH NODE IN CLUSTER:
STEP 1: PAUSE NODE | DRAIN ROLES
Node is paused and gracefully drained of all running virtual machines
VMs are live migrated to other nodes – with no downtime
5. STEP 2: EVICT A NODE, CLEAN INSTALL NEW OS
WS.vNext is installed
OS is wiped and a clean install of WS.vNext is done
6. STEP 3: REJOIN NODE TO CLUSTER
Done from WS.vNext or Windows 10 with RSAT
Node is added back to cluster
Cluster runs with “Mixed OS versions”
Cluster Functional Level stays WS2012R2
Enhancements of WS.vNext node will operate in compatibility mode
New features which impact downlevel compatibility on WS.vNext
node will not be enabled
7. STEP 4: REBALANCE THE CLUSTER WORKLOAD
VMs can fail over and live migrate anywhere in mixed mode
Uplevel to WS.vNext node
Downlevel to WS2012R2 node
8. STEP 5: REPEAT STEPS 1-4 FOR THE NEXT NODE
Process is repeated on the next node
UIs can manage downlevel nodes
Entire cluster is manageable from WS.vNext node
WS.vNext can manage downlevel WS2012R2 nodes
Cannot manage uplevel (WS.vNext) nodes from WS2012R2 node
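Steps 1-5 can be sketched as a small simulation. This is a toy model only, not the cluster service: the Node class, the least-loaded placement rule and the rebalance rule are illustrative assumptions, and the scenario (7 nodes, 15 VMs) is taken from the slide above.

```python
from dataclasses import dataclass, field

OLD, NEW = "WS2012R2", "WS.vNext"

@dataclass
class Node:
    name: str
    os: str = OLD
    vms: list = field(default_factory=list)

def drain(node, others):
    """Step 1: live migrate every VM to the least-loaded other node."""
    while node.vms:
        target = min(others, key=lambda n: len(n.vms))
        target.vms.append(node.vms.pop())

def rebalance(nodes):
    """Step 4: move VMs from the busiest to the idlest node until even."""
    while True:
        busiest = max(nodes, key=lambda n: len(n.vms))
        idlest = min(nodes, key=lambda n: len(n.vms))
        if len(busiest.vms) - len(idlest.vms) <= 1:
            return
        idlest.vms.append(busiest.vms.pop())

def rolling_upgrade(nodes):
    """Upgrade one node at a time; every VM stays hosted throughout."""
    total = sum(len(n.vms) for n in nodes)
    for node in list(nodes):          # iterate over a copy; 'nodes' is mutated
        others = [n for n in nodes if n is not node]
        drain(node, others)           # Step 1: pause and drain
        nodes.remove(node)            # Step 2: evict
        node.os = NEW                 # Step 2: clean install
        # All VMs are still running on the remaining (mixed-version) nodes.
        assert sum(len(n.vms) for n in nodes) == total
        nodes.append(node)            # Step 3: rejoin
        rebalance(nodes)              # Step 4: rebalance
    return nodes                      # Step 5 is the loop itself

if __name__ == "__main__":
    cluster = [Node(f"node{i}") for i in range(1, 8)]
    for v in range(15):
        cluster[v % 7].vms.append(f"vm{v}")
    for n in rolling_upgrade(cluster):
        print(n.name, n.os, len(n.vms))
```

The assertion inside the loop is the point of the sketch: at no step does the count of hosted VMs drop, which is the "no downtime" property the process relies on.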
10. UPGRADE CLUSTER FUNCTIONAL LEVEL
Once all nodes are upgraded to WS.vNext the Cluster Functional
Level is upgraded via Update-ClusterFunctionalLevel cmdlet
Cluster Functional Level considerations:
Cannot be upgraded until all nodes are running WS.vNext
Point of no return – no WS2012R2 nodes can be added after
All vNext features that compatibility mode had disabled are unlocked
Some features require additional steps to be done:
Spaces require Update-StoragePool cmdlet to upgrade the pool
VMs require Update-VMConfigurationVersion (VM is off) to unlock
some new features, like vTPM
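The two gating rules above (all nodes must be upgraded first, and no downlevel node may join afterwards) can be sketched as follows. The Cluster class and its method names are hypothetical and only model the rules; they are not the cluster API.

```python
OLD, NEW = "WS2012R2", "WS.vNext"

class Cluster:
    """Toy model of the functional-level rules; not the real cluster API."""

    def __init__(self, node_versions):
        self.node_versions = list(node_versions)
        self.functional_level = OLD

    def upgrade_functional_level(self):
        # Rule 1: cannot be upgraded until all nodes run WS.vNext.
        if any(v != NEW for v in self.node_versions):
            raise RuntimeError("all nodes must run WS.vNext first")
        self.functional_level = NEW

    def add_node(self, version):
        # Rule 2: point of no return, no WS2012R2 nodes after the upgrade.
        if self.functional_level == NEW and version != NEW:
            raise ValueError("downlevel nodes cannot join after the upgrade")
        self.node_versions.append(version)

if __name__ == "__main__":
    cluster = Cluster([NEW, NEW, NEW])
    cluster.upgrade_functional_level()
    print(cluster.functional_level)
```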
11. UPGRADE COMPLETE
Private Cloud Upgrade:
All nodes are running WS.vNext
Cluster Functional Level is vNext
No downtime to tenant VMs
Lower cost of adopting vNext
vNext features for the cluster:
Storage Replication
Cloud Witness in Azure
VM Resiliency
Node isolation
Quarantine
12. CLUSTER OS ROLLING UPGRADE GUIDANCE
Running in Mixed mode for more than two weeks is not recommended
Do not create or resize storage on WS.vNext nodes while in Mixed mode
Test an upgrade from WS2012R2 to Technical Preview now
Not supported in Technical Preview release:
Cluster OS Rolling Upgrade of cluster with data de-duplication
Cluster OS Rolling Upgrade of VMs with SCDPM backups
Cluster OS Rolling Upgrade of Shared VHDX guest clusters
13. HYPER-V CONFIGURATION VERSIONING
What is Configuration version?
Configuration files
Saved State and Snapshot Files
Which OS supports which configuration versions?
14. INTERACTING WITH HYPER-V CONFIGURATION VERSION
PowerShell: Update-VMConfigurationVersion cmdlet
UI will come in future builds
Virtual Machine must be off
Saved state and online checkpoints are discarded
Cluster functional level must be upgraded
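The preconditions and side effects listed above can be written down as a small check. This is a sketch under assumptions: the VM class, its fields and the function name are hypothetical and do not mirror the Hyper-V object model.

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    running: bool
    upgraded: bool = False
    saved_state: bool = False
    # Each checkpoint is a (name, is_online) tuple.
    checkpoints: list = field(default_factory=list)

def update_vm_configuration(vm, cluster_level_upgraded):
    """Model of the upgrade rules: refuse if preconditions fail, then discard."""
    if vm.running:
        raise RuntimeError("virtual machine must be off")
    if not cluster_level_upgraded:
        raise RuntimeError("cluster functional level must be upgraded first")
    vm.saved_state = False                      # saved state is discarded
    vm.checkpoints = [(n, online) for n, online in vm.checkpoints
                      if not online]            # online checkpoints are discarded
    vm.upgraded = True
    return vm

if __name__ == "__main__":
    vm = VM("tenant1", running=False, saved_state=True,
            checkpoints=[("before-patch", True), ("offline-copy", False)])
    print(update_vm_configuration(vm, cluster_level_upgraded=True))
```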
17. BLOCKS, NOT FILES
SR is not DFSR
Replicating storage blocks underneath the volume
Don’t care if files are in use
Write IOs are all that matter for Storage Replica
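A toy model of the idea, assuming nothing about the real driver: the classes below are hypothetical and simply forward every write IO, by offset and length, to a second volume. There is no notion of a file anywhere in the model, which is why open or locked files make no difference.

```python
class Volume:
    """A volume seen as raw bytes; files do not exist at this layer."""

    def __init__(self, size_bytes):
        self.data = bytearray(size_bytes)

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the volume")
        self.data[offset:offset + len(payload)] = payload

class ReplicatedVolume:
    """Applies each write IO to source and destination in the same order."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.log = []                      # (offset, length) of every write

    def write(self, offset, payload):
        self.log.append((offset, len(payload)))
        self.source.write(offset, payload)
        self.destination.write(offset, payload)

if __name__ == "__main__":
    replica = ReplicatedVolume(Volume(4096), Volume(4096))
    replica.write(0, b"header")
    replica.write(1000, b"part of some open file")
    print(replica.source.data == replica.destination.data)
```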
21. REQUIREMENTS
Kerberos (for SMB)
>= 1Gbps between servers
Disks:
GPT, not MBR
Yes: JBOD, iSCSI, Local SCSI or SATA, SAN
No: USB, thumb drives, tapes, floppy disks, etc
Same disk geometry
Free space for logs on Windows volume
No %systemroot% or page file
Firewall: SMB and WS-MAN
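Part of the checklist above can be expressed as a pre-flight check. This is a minimal sketch: the Disk class, its field names and the use of sector size as a stand-in for "disk geometry" are assumptions made for illustration, and the Kerberos, log free space and firewall items are not modeled.

```python
from dataclasses import dataclass

# Storage types the slide lists under "Yes".
SUPPORTED_STORAGE = {"JBOD", "iSCSI", "SCSI", "SATA", "SAN"}

@dataclass
class Disk:
    partition_style: str        # "GPT" or "MBR"
    storage_type: str           # e.g. "SAN", "USB"
    sector_bytes: int           # assumed proxy for disk geometry
    has_system_root: bool = False
    has_page_file: bool = False

def check_pair(source, destination, link_gbps):
    """Return a list of requirement violations; an empty list means OK."""
    problems = []
    if link_gbps < 1:
        problems.append("network link is below 1 Gbps")
    for label, disk in (("source", source), ("destination", destination)):
        if disk.partition_style != "GPT":
            problems.append(f"{label}: disk must be GPT")
        if disk.storage_type not in SUPPORTED_STORAGE:
            problems.append(f"{label}: {disk.storage_type} is not supported")
        if disk.has_system_root or disk.has_page_file:
            problems.append(f"{label}: holds %systemroot% or a page file")
    if source.sector_bytes != destination.sector_bytes:
        problems.append("source and destination geometry differ")
    return problems

if __name__ == "__main__":
    good = Disk("GPT", "SAN", 4096)
    bad = Disk("MBR", "USB", 512)
    print(check_pair(good, bad, link_gbps=10))
```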
22. RECOMMENDATIONS FOR SYNCHRONOUS
Latency:
<=5ms round trip average
Most cases 30-50km on 10GBASE or dark fibre
Write IO:
Perfmon (logical disk) and DISKSPD
Use micro benchmarks before and after SR
Log sizing and backing
SSD or bust
Larger logs allow faster recovery from larger outages, but cost space
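A quick worked calculation shows where the 5 ms budget goes. It assumes light travels through fibre at roughly 200 km per millisecond (about two thirds of its speed in vacuum) and models propagation delay only. Switching, protocol and storage latency are not included, which is why practical distances are far shorter than propagation alone would allow.

```python
FIBRE_KM_PER_MS = 200.0   # approximate speed of light in glass fibre

def propagation_rtt_ms(distance_km):
    """Round-trip propagation delay over fibre, ignoring all equipment."""
    return 2 * distance_km / FIBRE_KM_PER_MS

def remaining_budget_ms(distance_km, budget_ms=5.0):
    """What is left of the round-trip budget for everything else."""
    return budget_ms - propagation_rtt_ms(distance_km)

if __name__ == "__main__":
    for km in (30, 50):
        print(km, "km:", propagation_rtt_ms(km), "ms propagation,",
              remaining_budget_ms(km), "ms left")
```

At 50 km, propagation accounts for about 0.5 ms of the round trip, so most of the budget is consumed by equipment and storage rather than distance. Measuring the real path remains the only reliable check.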
23. RECOMMENDATIONS FOR ASYNCHRONOUS
Latency:
Doesn’t matter
Write IO:
Perfmon (logical disk) and DISKSPD
Use micro benchmarks before and after SR
Log sizing and backing
SSD or bust
Larger logs allow faster recovery from larger outages, but cost space
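The log size trade-off can be estimated with simple arithmetic. This sketch assumes that writes arrive at a steady average rate during an outage and that recovery is fast only while the backlog still fits in the log. Both are simplifying assumptions, not documented Storage Replica behaviour, and the function name is hypothetical.

```python
def outage_covered_seconds(log_gib, write_mib_per_s):
    """Seconds of writes a log of the given size can absorb at a steady rate."""
    if write_mib_per_s <= 0:
        raise ValueError("write rate must be positive")
    return log_gib * 1024 / write_mib_per_s

if __name__ == "__main__":
    # Example inputs only; take the write rate from Perfmon or DISKSPD runs.
    for log_gib in (8, 32):
        secs = outage_covered_seconds(log_gib, write_mib_per_s=20)
        print(f"{log_gib} GiB log at 20 MiB/s covers about {secs:.0f} s")
```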
24. PHILOSOPHY
Async crash consistency versus application consistency
SR guarantees mountable volume
App guarantees a readable file
Snapshots…
Are not working in Technical Preview
Async means some data loss
No RPO in Technical Preview
How much money is your data worth?
25. GOOD IDEAS TO FOLLOW
Drivers, drivers and… drivers
Filters!
Performance envelopes
27. WHAT ISN’T STORAGE REPLICA
Storage Replica is not “shared nothing” clustering
Storage Replica is not a backup
Will easily replicate deleted/damaged data
Storage Replica is not DFS-R
No file-level
No multi-master
No multi-endpoint
No low bandwidth
Storage Replica is therefore not a general branch office solution
28. TOPOLOGY
Stretch cluster
Server-to-server
No 1-to-many
No a->b->c
No a->b->c->d
Cluster-to-cluster is coming
Not working in Technical Preview
Fun fact: you can set up Storage Replica to replicate server to itself
Clone disks, or even volumes
Create a local mirror for future remote initial sync as seed
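The topology rules above can be checked mechanically. In this sketch every (source, destination) pair describes the same replicated data set and the names are placeholders. The function is hypothetical and encodes only the three rules stated on the slide: no 1-to-many, no chains, and a server may replicate to itself.

```python
def validate_topology(pairs):
    """Return rule violations for a list of (source, destination) pairs."""
    problems = []
    targets = {}
    for src, dst in pairs:
        targets.setdefault(src, set()).add(dst)

    # Rule: no 1-to-many.
    for src, dsts in targets.items():
        if len(dsts) > 1:
            problems.append(f"{src} replicates to more than one destination")

    # Rule: no a->b->c chains. A server replicating to itself is allowed.
    for src, dst in pairs:
        if src == dst:
            continue
        onward = targets.get(dst, set()) - {dst}
        if onward:
            problems.append(f"{src}->{dst} continues as a chain from {dst}")
    return problems

if __name__ == "__main__":
    print(validate_topology([("a", "b")]))
    print(validate_topology([("a", "b"), ("b", "c")]))
```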
29. STRETCH CLUSTERS
Synchronous only
Asymmetric storage only
Two sites
Two sets of shared storage
Cluster storage: CSV or role assigned
Configure and manage via Failover Cluster Manager or PowerShell
Designed to increase the DR capabilities of a cluster
Hyper-V and General Use File Server are main use cases in TP
Scale-out File Server is not supported as stretched solution with SR
31. CONFIGURE AND MANAGE
1. Add a source data disk to a role or CSV
2. Enable replication on that source disk
3. Select a source log
4. Select a destination data disk
5. Select a destination log
All these steps (except 1) will change in the Consumer Preview UI
32. SERVER TO SERVER
Synchronous or Asynchronous
Any type of fixed storage (same rules as previously)
Configure and manage via PowerShell (no UI)
Designed to increase DR capabilities of a server
Cluster to cluster is coming
File Server is the main use case in Technical Preview
39. REPLICATION METADATA
Hidden disk partition
Hidden logs
Always write through disk
Placed inside System Volume Information
Registry (real and cluster hive)
HKLM\Software\Microsoft\WVR
HKLM\Cluster\WVR
41. SUPPORTABILITY
Performance Counters
Dozens of counters
Many changes done after TP already
Event logs
Hundreds of clear, guided, low-noise events
Many changes done after TP already
42. KNOWN ISSUES IN TECHNICAL PREVIEW
Removal of replication in Failover Cluster Manager doesn’t work
PowerShell remoting doesn’t work
Performance
Failover Cluster Manager UI
The name… WVR
43. PLANS FOR CHANGES
Azure Site Recovery integration
Cluster-to-cluster
Ease of management
Performance
Migration
Inventory
44. GET THE GUIDE
Technical Preview Step-by-Step Guide: Storage Replica:
http://go.microsoft.com/fwlink/?LinkID=514902