Understanding
VMware Capacity
Phil Bell
Introducing
Syncsort
>7,000 customers
84 of the Fortune 100
Customers in >100 countries
Headquarters: Pearl River, NY
U.S. LOCATIONS
• Burlington, MA; Irvine, CA;
Oakbrook Terrace, IL; Rochester, MN
GLOBAL PRESENCE
• U.K., France, Germany, Netherlands,
Israel, Hong Kong & Japan
Big Iron to Big Data is a fast-growing
market segment composed of solutions
that optimize traditional data systems
and deliver mission-critical data from
these systems to next-generation
analytic environments.
Global leader in
Big Iron to Big Data
Optimize / Integrate / Assure
Making critical data useful
Improve performance
and control costs across
the full IT environment,
from legacy systems to
the cloud
Connect today’s data
infrastructure with
tomorrow’s technology –
and ensure data quality
– powering machine
learning, AI and
predictive analytics
Increase data availability
and provide security as
the world moves to
accessing data in 24x7
timeframes and data
protection becomes
mission critical
Strategic use cases and key partners
Optimize
• Mainframe optimization
• Cross-platform capacity
management
• Data warehouse
optimization
• Application modernization
Integrate
• Access & integrate machine data for
security & IT ops
• Access & integrate legacy data to the
data lake
• Change data capture
• High-performance ETL
• Data governance
• Customer 360
• Big data quality & integration
Assure
• High availability
• Disaster recovery
• Mission-critical data
migration
• Data security, audit &
encryption for IBM i
Best-in-class product portfolio
Athene™
Athene™ Cloud
MFX®
ZPSaver Suite
DL/2™
Zen™ Suite
EZ Suite
MIMIX® Availability™
MIMIX® DR
MIMIX® Move™
Quick-EDD/HA
iTERA®
Cilasoft Audit & Security Suite
Enforcive® Security Suite
Ironstream™
Ironstream™ for IBM i
Ironstream™ Transaction Tracing
DMX™ & DMX-h™
DMX™ Change Data Capture
MIMIX® Share™
Trillium® Software System
Trillium™ Quality for Big Data
Trillium™ Cloud
Trillium™ Global Locator
Understanding
VMware Capacity
Phil Bell
July 2018
Topics
• Why OS Monitoring Can be Misleading
• 5 Key VMware Metrics for Understanding
VMware Capacity
• How VMware processor scheduling impacts
CPU capacity measurements
• Measuring Memory Capacity
• Measuring Disk Storage Latency
• Calculating Headroom in VMs
Why OS Monitoring Can be Misleading:
Determining CPU usage of a VM
[Figure: CPU busy (0% to 100%) for two VMs over 1 second of CPU time]
VM1 (dormant/idle for part of the second): OS: 50% CPU Busy; VMware: 25% CPU Busy
VM2: OS: 50% CPU Busy; VMware: 50% CPU Busy
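One way to read the figure: the guest OS measures busy time against the time it was actually scheduled, while VMware measures against wall-clock time. The minimal Python sketch below assumes that reading and assumes VM1 was dormant for half of the second; the function name is hypothetical.

```python
def hypervisor_busy_pct(guest_busy_pct: float, scheduled_fraction: float) -> float:
    """Convert guest-reported CPU busy % to a wall-clock (hypervisor) view.

    Assumes the guest only "sees" the time it was scheduled, so its busy %
    is relative to that time; scheduled_fraction is the share of the
    interval the VM was actually running (0.0 to 1.0).
    """
    if not 0.0 <= scheduled_fraction <= 1.0:
        raise ValueError("scheduled_fraction must be between 0 and 1")
    return guest_busy_pct * scheduled_fraction


# Assumed: VM1 was dormant/idle for half of the second; VM2 ran for all of it.
print(hypervisor_busy_pct(50.0, 0.5))  # 25.0
print(hypervisor_busy_pct(50.0, 1.0))  # 50.0
```

Both VMs report the same 50% from inside the OS, but only the hypervisor view shows that VM1 consumed half as much physical CPU.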
Why OS Monitoring Can be Misleading:
OS vs VM CPU, Data Differences
Why OS Monitoring Can be Misleading: Time Slicing
• Cores are shared between vCPUs in time slices
• 1 vCPU to 1 core at any point in time
• More vCPUs = More time slicing
• More time slicing = less accurate data from the OS
• Ignore OS metrics that involve time
• (Disk Occupancy is probably OK)
[Figure: VM1 alternating between running and dormant/idle; fast clock vs. real-time clock]
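As a rough illustration of "more vCPUs = more time slicing", assume runnable vCPUs share the physical cores fairly, so each vCPU is on a core for about cores / vCPUs of wall-clock time. This is a deliberate simplification (the real scheduler also weighs shares, reservations, limits, and co-scheduling), and the function name is hypothetical.

```python
def fair_share_of_core(cores: int, runnable_vcpus: int) -> float:
    """Approximate fraction of wall-clock time each runnable vCPU is on a core.

    Simplified model: all runnable vCPUs share the host's cores equally,
    and one vCPU occupies at most one core at any point in time.
    """
    if cores <= 0 or runnable_vcpus <= 0:
        raise ValueError("cores and runnable_vcpus must be positive")
    return min(1.0, cores / runnable_vcpus)


print(fair_share_of_core(8, 4))   # 1.0: every vCPU can have its own core
print(fair_share_of_core(8, 16))  # 0.5: each vCPU is descheduled half the time
```

The smaller that fraction, the more of the guest's own "elapsed time" was spent descheduled, which is why OS metrics that involve time become less trustworthy.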
5 Key VMware Metrics
• CPU MHz
  • VM, Host, Cluster
• Ready Time
• Active Memory
  • VM, Cluster
• Ballooned Memory
• Host Disk Latency
  • Device, Kernel & Queue
5 Key VMware Metrics: CPU MHz vs CPU%
[Figure: Host 1 and Host 2, one with 16 CPUs (44.8 GHz) and one with 32 CPUs (89.6 GHz), each drawn as 100%, with VMs using 80%]
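CPU % is relative to each host's size, so the same percentage represents different amounts of work. Converting to GHz (or MHz) gives an absolute figure that can be compared and summed across VMs, hosts, and clusters. A minimal sketch using the host totals from the slide; the function name is hypothetical.

```python
def used_ghz(busy_pct: float, total_ghz: float) -> float:
    """Convert a CPU busy percentage into absolute GHz consumed."""
    return busy_pct / 100.0 * total_ghz


small_host = used_ghz(80.0, 44.8)  # 16 CPUs, 44.8 GHz in total
large_host = used_ghz(80.0, 89.6)  # 32 CPUs, 89.6 GHz in total
print(f"{small_host:.2f} GHz vs {large_host:.2f} GHz")  # 35.84 GHz vs 71.68 GHz
```

The same 80% represents twice as much CPU on the 32-CPU host.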
5 Key VMware Metrics:
CPUs with different GHz: VMs
[Figure: the same VM (APP on OS) on Host 1 with 2.4 GHz chips and on Host 2 with 3.2 GHz chips, each at 80% CPU busy; 80% busy on the 3.2 GHz chips equates to about 106% busy on the 2.4 GHz chips]
VMware Assumption:
Every GHz is equal
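Under the "every GHz is equal" assumption, a rough estimate of a VM's CPU busy % on a host with a different clock speed is to scale by the clock ratio. This is only an approximation: it ignores differences in work done per clock between chip generations, which is exactly the weakness of that assumption. The helper name is hypothetical.

```python
def rescale_busy_pct(busy_pct: float, from_ghz: float, to_ghz: float) -> float:
    """Estimate CPU busy % after moving a workload between clock speeds.

    Assumes work scales linearly with clock speed (every GHz is equal),
    ignoring per-clock performance differences between chips.
    """
    if from_ghz <= 0 or to_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return busy_pct * from_ghz / to_ghz


# 80% busy on 3.2 GHz chips needs more than a whole 2.4 GHz CPU.
print(f"{rescale_busy_pct(80.0, 3.2, 2.4):.1f}%")  # 106.7%
```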
5 Key VMware Metrics:
Ready Time
• The VM wants to process, but can’t
• Accumulated against the VM
• More of a stack than a queue
• Contention for CPUs
• Performance impact
How to avoid Ready Time
• Fewer vCPUs per VM
• Monitor CPU threads vs. vCPUs, and Ready Time
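vCenter reports CPU Ready as a summation in milliseconds per sample interval. A common conversion to a percentage divides by the interval length; the 20-second default below matches vCenter's real-time charts, and historical rollup intervals are longer. For a VM-level value summed across vCPUs, dividing by the vCPU count gives a per-vCPU average. A sketch; the function name is hypothetical.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    interval_s: sample interval length in seconds (20 s assumed, as in
    vCenter real-time charts). vcpus: divide a VM-level value summed
    across vCPUs to get the average per vCPU.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0) / vcpus


print(ready_percent(1000.0))           # 5.0
print(ready_percent(1000.0, vcpus=2))  # 2.5
```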
VMware Processor Scheduling:
Proportion of Time: 4 vCPU VM
VMware Processor Scheduling:
Proportion of Time: 2 vCPU VM
VMware Processor Scheduling:
vCPU Co-Scheduling & Ready Time
[Figure: VMs scheduled across cores 1 to 4; legend: Idle, Ready, Threads]
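A toy model of why VMs with more vCPUs accumulate more Ready Time: under strict co-scheduling, a VM can run in a time slot only if enough cores are free for all of its vCPUs at once. ESXi actually uses relaxed co-scheduling, so this model overstates the effect, but it shows the direction. The slot data below is invented for illustration.

```python
def ready_slots(free_cores_per_slot: list[int], vcpus: int) -> int:
    """Count time slots in which a VM must wait (accumulate Ready Time).

    Strict co-scheduling model: the VM runs in a slot only when at least
    `vcpus` cores are free simultaneously; otherwise it is ready but waiting.
    """
    return sum(1 for free in free_cores_per_slot if free < vcpus)


# Free cores on a hypothetical 4-core host across 8 time slots.
free_cores = [4, 3, 2, 3, 4, 1, 2, 3]
print(ready_slots(free_cores, 2))  # 1
print(ready_slots(free_cores, 4))  # 6
```

On the same host, the 2 vCPU VM waits in 1 slot out of 8, while the 4 vCPU VM waits in 6.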
VMware Processor Scheduling:
Ready Time - Recap
• Ready Time impacts performance
• Monitor Ready Time as well as CPU %
• Avoid high-vCPU VMs: more vCPUs increase
the potential for higher Ready Time
Measuring Memory Capacity: Not that simple
• Tightest headroom in most clusters
• Not just a question of % used
• Other VMware memory management techniques
  • Reservations
  • Limits
  • Ballooning
  • Shared Pages
• Active Memory
• Memory Available for VMs
Measuring Memory Capacity: VM Memory Occupancy
Measuring Memory Capacity: VM Memory Performance
Measuring Memory Capacity: Cluster Memory
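The memory slides look beyond % used. Ballooning and swapping are reclamation techniques VMware applies when a host is short of memory (or a VM reaches a configured limit), so any non-zero value is a warning sign. Active memory shows how much of a VM's configured memory is actually being touched. A minimal triage sketch; the function name, inputs, and check order are assumptions.

```python
def memory_state(configured_mb: float, active_mb: float,
                 ballooned_mb: float, swapped_mb: float) -> str:
    """Summarize a VM's memory position from VMware memory metrics.

    Checks the most severe condition first: hypervisor swapping usually
    hurts performance more than ballooning.
    """
    if configured_mb <= 0:
        raise ValueError("configured_mb must be positive")
    if swapped_mb > 0:
        return "swapping"
    if ballooned_mb > 0:
        return "ballooning"
    return f"active {active_mb / configured_mb:.0%} of configured"


print(memory_state(8192, 2048, 0, 0))    # active 25% of configured
print(memory_state(8192, 2048, 512, 0))  # ballooning
```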
Measuring Disk Storage Latency:
At the OS vs. within VMware
• Why not KB/s or IO time at the OS?
  • Time slicing
  • VMware has more detail
• 2 levels of interest
  • Device
  • Kernel
Measuring Disk Storage Latency:
Kernel I/O Processing on processor 0
Measuring Disk Storage Latency:
Total vs. Device Latency
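In VMware's terms, the total latency a guest sees is roughly device latency plus kernel latency, with queue time counted inside the kernel figure. Subtracting device latency from total latency therefore isolates the time spent in the hypervisor. A sketch; the default thresholds are placeholder rules of thumb, not values from this deck, and should be tuned for your storage. The function name is hypothetical.

```python
def diagnose_latency(total_ms: float, device_ms: float,
                     device_limit_ms: float = 20.0,
                     kernel_limit_ms: float = 2.0) -> tuple[float, list[str]]:
    """Split total disk latency and flag where the time is going.

    kernel_ms = total_ms - device_ms (kernel time includes queueing).
    The default limits are illustrative assumptions only.
    """
    kernel_ms = total_ms - device_ms
    findings = []
    if device_ms > device_limit_ms:
        findings.append("device: storage is slow to respond")
    if kernel_ms > kernel_limit_ms:
        findings.append("kernel: I/O is queueing in the hypervisor")
    return kernel_ms, findings


print(diagnose_latency(6.0, 5.5))    # (0.5, [])
print(diagnose_latency(25.0, 22.0))  # kernel 3.0 ms; both findings flagged
```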
Calculating Headroom in VMs
• Makes traditional Capacity Planners uncomfortable
• Easy number for the business to absorb
• Estimates are OK
• Your Mileage May Vary
“Allowing for future projects, there’s still 17 GHz and 100 GB of RAM available.”
“There’s space for roughly 10 more VMs before we need hardware changes.”
Headroom in Number of VMs
• (Size of the cluster – Used) / Average VM usage
• Do you have to cope with host failures (allow for
failover capacity)?
  • Which is the largest host?
• What are you sizing on?
  • vCPU-to-core ratio?
  • MHz and MB?
  • … Something else?
• Can you calculate your average VM?
  • Or do you prefer Small/Medium/Large?
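The headroom formula can be sketched for the MHz-and-MB case, sizing on GHz and GB, optionally removing the largest host for failover, and reserving capacity for known projects. The function name, parameters, and cluster figures are hypothetical.

```python
import math


def vm_headroom(host_ghz: list[float], host_gb: list[float],
                used_ghz: float, used_gb: float,
                avg_vm_ghz: float, avg_vm_gb: float,
                reserve_ghz: float = 0.0, reserve_gb: float = 0.0,
                allow_host_failure: bool = True) -> int:
    """Estimate how many more average VMs fit in a cluster.

    Applies (size of cluster - used) / average VM usage to CPU (GHz)
    and memory (GB) and returns the tighter of the two. If
    allow_host_failure is set, the largest host's capacity is removed
    first (N+1 failover). reserve_* holds capacity for known projects.
    """
    size_ghz, size_gb = sum(host_ghz), sum(host_gb)
    if allow_host_failure:
        # Conservative: the largest CPU and largest memory may be different hosts.
        size_ghz -= max(host_ghz)
        size_gb -= max(host_gb)
    free_ghz = size_ghz - used_ghz - reserve_ghz
    free_gb = size_gb - used_gb - reserve_gb
    fit_by_cpu = math.floor(free_ghz / avg_vm_ghz)
    fit_by_mem = math.floor(free_gb / avg_vm_gb)
    return max(0, min(fit_by_cpu, fit_by_mem))


hosts_ghz = [50.0] * 4   # hypothetical 4-host cluster
hosts_gb = [512.0] * 4
print(vm_headroom(hosts_ghz, hosts_gb, 120.0, 1300.0, 1.5, 16.0))  # 14
print(vm_headroom(hosts_ghz, hosts_gb, 120.0, 1300.0, 1.5, 16.0,
                  allow_host_failure=False))  # 46
```

In this example, memory rather than CPU limits the headroom, and allowing for a host failure cuts the answer from 46 VMs to 14.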
Data Sources
• From your capacity management tool
• Or collected manually from vCenter
• A good peak
  • Not when Windows updates are being applied
  and/or VMs are rebooting
• Future project requirements
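One way to pick a "good peak" is to drop samples that fall inside known maintenance windows, such as Windows update and reboot periods, and then take a high percentile of what remains. The nearest-rank 95th percentile is an assumption, not a rule from the deck, and the sample data is invented.

```python
import math


def good_peak(samples: list[tuple[float, float]],
              maintenance: list[tuple[float, float]],
              pct: float = 95.0) -> float:
    """Return a representative peak, ignoring maintenance windows.

    samples: (timestamp, value) pairs. maintenance: [start, end) windows
    such as Windows update/reboot periods. Uses the nearest-rank
    percentile of the remaining values.
    """
    kept = sorted(v for t, v in samples
                  if not any(start <= t < end for start, end in maintenance))
    if not kept:
        raise ValueError("no samples outside maintenance windows")
    rank = max(1, math.ceil(pct / 100.0 * len(kept)))
    return kept[rank - 1]


values = [10, 12, 100, 100, 15, 20, 18, 14, 16, 22]
samples = list(enumerate(values))        # hourly samples, hours 0 to 9
patch_window = [(2, 4)]                  # updates applied in hours 2 and 3
print(good_peak(samples, patch_window))  # 22
print(max(values))                       # 100, distorted by the patch window
```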
Good Peak
VMs Available
Including Known Plans
Trend
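The "VMs Available" and "Trend" charts project headroom forward. A minimal sketch, assuming a simple least-squares linear trend is adequate (it often is not for step changes such as new projects). It estimates when the number of available VMs reaches zero. The monthly figures and function names are invented.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def month_headroom_runs_out(months: list[float], vms_available: list[float]):
    """Month (same scale as `months`) at which the trend reaches zero VMs,
    or None if the trend is flat or rising."""
    slope, intercept = fit_line(months, vms_available)
    if slope >= 0:
        return None
    return -intercept / slope


months = [0, 1, 2, 3, 4]
vms_available = [30, 27, 25, 21, 18]
print(f"{month_headroom_runs_out(months, vms_available):.1f}")  # 10.1
```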
Roundup
• Ready Time
  • Time slicing, vCPUs
• Memory
  • Active, Balloon, Swap
• Disk Latency
• Define the size of your cluster
• Average VM usage
• Good peak
• Trend result
Athene™ - Capacity and Performance Management
Athene™
• Relied on by the world’s
leading companies
• Automates the capture and
storage of data and the
creation of capacity reports
• Provides predictive analysis
to help with sizing of
infrastructures today and in
the future
• Includes the mainframe,
IBM i, Unix, Windows,
storage, business, financial
data, and more
Syncsort Professional Services
• Provides capacity
management expertise to
help organizations best
manage capacity and
achieve maximum ROI
• Creates capacity reports,
capacity plans, and strategic
recommendations for
organizations needing that
expertise or staff
augmentation
• Leverage Syncsort’s
expertise – our consultants
have decades of experience
Athene™ Cloud
• World-class solution without
the need to provision,
maintain and manage
Athene hardware
• Secure transfer of data from
your environment to
Athene® in the cloud
• Ongoing management of
historical data
• Optional services can help
organizations start or
augment a Capacity
Management process
Why Athene™
Less Complexity
Capture and store data from the
entire infrastructure; automate
reporting and alerting; no detailed
system expertise required
Clearer Capacity Information
Identify existing and potential
capacity and performance threats;
prepare and visualize key data for
time-to-live and bottleneck analysis
Healthier IT Operations
Near real-time alerts identify
problems in all key environments
View latency, transactions per
second, exceptions, etc.
Effective Incident Resolution
Management
Near real-time views to identify real or
potential failures earlier; view detailed
data to support triage, repair, or
prevention
Higher Operational Efficiency
Enhanced process correlation across
systems; staff resolve problems faster;
“do more with less”
Eliminate Your Infrastructure
“Blind-Spots”
Get a complete view of Capacity and
Performance – technical, business,
financial – across the enterprise
Better Service,
Significant Cost
Savings at Global
Financial Services
Firm
“Athene allows us to automate our
processes and get the information that’s
vital to our business. We’ve saved
significant money and time using Athene.”
— Senior technical analyst
OBJECTIVE
• Save money while reducing the
number of capacity-related outages
and slowdowns
• Meet SLAs now and in the future
CHALLENGE
• Cost of IT skyrocketing
• Outages put business at risk
• Balancing IT spending and IT
performance
SOLUTION
• Athene for capacity planning and
management, delivering insights and
analysis so the organization can make
sound business decisions
BENEFIT
• Avoided over 100 outages and saved
almost $4M in spend in 3 years
• Informed hardware purchase decisions
• Faster resolution of incidents
• Implemented more strategic,
sustainable approach
Next Steps
VMware Capacity Management Assessment
• 2 days of consulting (1 on-site) with a presentation of
results and recommendations
• Formal CM process recommendations
• ITIL Capacity Plan template
• Gap analysis for tool capabilities
• A $5,000 value, free to the first 5 companies that
respond
• Contact us at rich.fronheiser@syncsort.com