Cisco POC Review Shows Automation Drives Efficiency & Growth
1. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
POC Results Review
July 2018
2. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
CIT Initiatives:
3. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Enhancing Technology to Drive Efficiency & Growth:
Improving IT Operations
• Software operational OOTB
• Unified decision making across teams (storage, platform, VDI, application, automation, cloud etc)
• Decision automation to preventatively maintain QoS
• Automated analysis of capacity planning projects
• Integration into ServiceNow & vRA
Improving Customer Experience
• Overall infrastructure estate 64% risk to QoS
• 4 production clusters >90% risk to performance
• 941 decisions to improve end user experience
• Preventatively deliver QoS to end users; not waiting for end user calls
• Assure QoS during Spectre patch upgrades
Drive Infrastructure Efficiency
• Support Spectre patch in VDI without additional hardware
• >1800 decisions to right-size workloads
• 58% improvement in resource utilization
• $4.6M in Capex deferment by 2021
• $1.1M cost savings in upcoming HW refresh
$3.7M Cost Savings
Decision Analysis Automation
64% Risk to QoS
4. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
CWOM - Trusted by 8/10 Largest Financial Institutions
World Leading Financial Services Firms Rely on Turbonomic
“Turbonomic helped us with migrations to our private cloud, and has been instrumental in increasing density while improving Credit Suisse application performance.
Turbonomic has also enabled us to move from an allocation based model to a more efficient consumption based model. We are now extending the use of Turbonomic in our OpenShift, CloudFoundry, and Public Cloud.”
Rob Gonzalez
Credit Suisse, Cloud Owner
5. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Evaluation Overview:
POC ran from 2/26/2018 - 5/30/2018
Deployed across 5 vCenters (Test & Prod)
4600 VMs spread across 146 vSphere Hosts
Enabled placement automation in Non-Prod SQL & NR-Prod clusters
Platform, Infrastructure, End User Computing, Storage & Cloud teams were involved in POC
6. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving End User Experience:
7. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving Customer Experience:
In under 60 minutes after installation, CWOM analyzed the environment and identified 941 decisions to prevent performance issues and remove risks in real time
Entire Infrastructure State of Health: 64% Risk to QoS
4 Production Clusters State of Health:
VA1Production_NR-VA1: 100% Risk to QoS
NJProduction-NJ1: 100% Risk to QoS
NJ1Database-NJ1: 100% Risk to QoS
NJProduction_NR-NJ1: 90% Risk to QoS
8. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving Customer Experience: NJ1Production-NJ1
Control Utilization Under 80%
The Production-NJ1 cluster is experiencing high wait times in the CPU ready queue on a host, which causes high database latency that directly impacts the end user experience.
High risk to QoS due to memory utilization > 90%
35 decisions to improve the QoS, reduce the ready queue and alleviate the memory congestion
9. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving Customer Experience: Production_NR-NJ1
Current Health of Cluster: Unbalanced Memory Utilization
Future State with CWOM: Memory Utilization safely controlled <70%
CWOM has identified 52 placement decisions that need to be taken to deliver QoS; safely driving up the underutilized hosts and driving down the highly utilized hosts so all workloads get the resources they need when they need them
10. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
VDI_04_05_06: Spectre Performance Impact
PROBLEM: CIT’s VDI performance suffered from 4/23 - 5/4 due to a 30% increase in demand from the Spectre patch applied during the storage upgrade
CIT thought they needed to purchase additional resources in order to restore performance
SOLUTION: No additional compute is necessary to deliver QoS in this VDI cluster if CIT leverages placement automation
Impact of CWOM controlling VDI cluster with Spectre (30% increase):
No additional resources are needed
Memory controlled <80%
11. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
VDI_03_04_05: Spectre Performance Impact
Storage IO peaking at 100% is causing latency or a delay to the VDI end user
Memory and CPU are peaking above 99% utilization
CWOM would have moved workloads to better accommodate the increase in demand and remove the QoS risk caused by the Spectre patch
[Chart: Peak Utilization vs. Avg Utilization]
12. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Respecting Business Policies
OOTB, CWOM understands HA, Affinity, Anti-Affinity and custom business policies
CWOM will adhere to these business policies when presenting decisions related to placement, sizing and capacity.
If a human violates one of CIT’s compliance policies, CWOM will identify the issue and present a decision on how to resolve the violation
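As an illustration of the kind of policy check described on this slide, here is a minimal sketch of detecting an anti-affinity violation. The VM names, host names, and the `find_violations` helper are hypothetical and are not taken from CWOM; the product's actual policy engine is not shown in this deck.

```python
from collections import defaultdict

# Hypothetical current placement: VM name -> host name.
placement = {"sql-1": "host-a", "sql-2": "host-a", "web-1": "host-b"}

# Hypothetical anti-affinity groups: VMs in the same group must not share a host.
anti_affinity_groups = [{"sql-1", "sql-2"}]


def find_violations(placement, groups):
    """Return (host, [vms]) pairs where an anti-affinity group shares a host."""
    violations = []
    for group in groups:
        by_host = defaultdict(list)
        for vm in sorted(group):
            by_host[placement[vm]].append(vm)
        for host, vms in by_host.items():
            if len(vms) > 1:
                violations.append((host, vms))
    return violations


violations = find_violations(placement, anti_affinity_groups)
for host, vms in violations:
    print(f"Anti-affinity violated on {host}: {', '.join(vms)}")
```

A real remediation step would then propose moving one of the listed VMs to a host that satisfies the policy and has enough capacity.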
13. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Automation Impact: Assure QoS while increasing Efficiency
[Chart labels: 48% unhealthy; 42% unhealthy; 100% healthy]
14. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving IT Operations:
15. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
CWOM Autonomic Decision Engine
Teams: DBA Team, Network Team, Platform Team, Infrastructure Team, Storage Team, End User Computing, Cloud Team
CWOM’s software provides a unified decision making platform across CIT’s infrastructure teams
Impact of Cohesive Infrastructure Analysis:
Improved performance
Improved Operational efficiency
Increased collaboration amongst various infrastructure teams
CWOM provides a cohesive analysis of the relationships and entities across the architecture and produces specific decisions to assure performance is delivered
16. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving IT operations: Focus on Innovation
[Table columns: Current Mode of Operations; Using CWOM; Positive Impact]
Performance MGMT: Tickets, MTTR, Consistent QoS, Focus on Innovation
Migrations (VM & Storage): Migration Completion, Risk, FTE Involvement (hrs)
17. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Accelerate Future Hardware Refresh: Automated Migration
Migration Details:
Objective: Move off end of life hardware onto new hosts
Without CWOM, CIT requires a significant amount of man hours to manually move VMs to new hosts & storage
With CWOM, CIT eliminates the manual migration process, which saves migration man hours and accelerates project completion
Benefit:
CWOM will assure performance through the entire migration while understanding the overall capacity of the clusters
Eliminate the guessing game / human error of placement & sizing of apps across new infrastructure
18. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Driving Infrastructure Efficiency:
19. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Improving Efficiency: Super Cluster
[Diagram: Current State, 49 hosts (Underutilized Hosts and Highly Utilized Hosts); Using CWOM; Desired State, 35 hosts (Highly Utilized Hosts plus Available Capacity)]
Super Cluster: Driving cross-cluster migrations based on real-time demand to improve both performance and utilization, which provides additional efficiency improvements
Impact of Super Cluster:
All 1,381 VMs can run safely on 35 hosts at 75% utilization, without re-sizing VMs.
VM-Host Density increases 41%, enabling CIT to gain 14 hosts’ worth of additional capacity (support growing application demands without making a capex investment)
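The Super Cluster figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on this slide (1,381 VMs, 49 current hosts, 35 desired hosts); the variable names are illustrative. Computed from these unrounded inputs, the density gain comes out at roughly 40%, close to the 41% quoted on the slide.

```python
# Figures quoted on the slide.
vms = 1381
current_hosts = 49
desired_hosts = 35

hosts_freed = current_hosts - desired_hosts            # hosts' worth of capacity gained
current_density = vms / current_hosts                  # VMs per host today
desired_density = vms / desired_hosts                  # VMs per host in the desired state
density_gain = desired_density / current_density - 1   # fractional improvement

print(f"hosts freed: {hosts_freed}")
print(f"density: {current_density:.1f} -> {desired_density:.1f} VMs/host")
print(f"density gain: {density_gain:.0%}")
```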
20. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Save $1.1M in Upcoming Hardware Refresh:
Current State, 49 hosts; Desired State, 17 hosts
Current State, 48 hosts; Desired State, 20 hosts
With CWOM, all 2883 VMs can run safely on 37 hosts
CIT to run on 22 fewer servers, or save $1.1M in the upcoming HW purchase
Simulation of how many new Cisco UCS B200 M5 are needed and which workloads should run where.
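A quick check of the refresh numbers, assuming the $50k per-server capex input mentioned in the editor's notes is the unit cost behind the $1.1M figure (the slide itself does not state the unit cost):

```python
# Desired-state host counts for the two clusters shown on the slide.
desired_hosts = [17, 20]
total_desired = sum(desired_hosts)   # the slide states 37 hosts

servers_avoided = 22                 # "22 fewer servers" from the slide
cost_per_server = 50_000             # assumption: capex input from the editor's notes

savings = servers_avoided * cost_per_server
print(f"desired hosts: {total_desired}")
print(f"refresh savings: ${savings:,}")
```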
21. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
CWOM Drives Efficiency on New HW
Project: Refresh the EOL hardware in NJ1Production_NR-NJ1 & VA1Production_NR-VA1
Plan: Production_NR-NJ1 cluster example
CWOM Plan Result: Production_NR-NJ1 cluster example
Cluster Name            Current Host Count (B200 M3)   New Host Count (B200 M5)   CWOM Efficiency Gain
NJ1Production_NR-NJ1    7                              3                          54% more VMs can be added (+145 VMs using CWOM)
VA1Production_NR-VA1    16                             5                          25% more VMs can be added (+164 VMs using CWOM)
22. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
$75k SQL Licensing Savings:
DC      Cluster Name    Hosts Today   VMs Today   Projected Hosts with CWOM Placement ONLY   Projected Hosts with CWOM Placement & Resize
NJ1     Database-NJ1    6             83          5                                          4
VA1     Database-VA1    5             90          5                                          4
Total                   11            173         10                                         8
Increasing SQL VM-Host Density enables CIT to save on 3 hosts’ worth of space, software licensing, power, and cooling
$25k/ESX host equates to $75k in savings/year
[Chart: # of SQL hosts in 2018. Hosts Today: 11; With CWOM: 10; With CWOM (Resize): 8]
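The $75k figure follows directly from the table. A minimal sketch of that arithmetic, using the per-cluster host counts and the $25k per-host cost quoted on this slide (dictionary and variable names are illustrative):

```python
# Host counts from the table: today vs. projected with placement & resize.
hosts_today = {"Database-NJ1": 6, "Database-VA1": 5}
hosts_with_resize = {"Database-NJ1": 4, "Database-VA1": 4}

cost_per_host_per_year = 25_000   # "$25k/ESX host" from the slide

hosts_saved = sum(hosts_today.values()) - sum(hosts_with_resize.values())
annual_savings = hosts_saved * cost_per_host_per_year

print(f"hosts saved: {hosts_saved}")
print(f"annual savings: ${annual_savings:,}")
```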
23. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Application Aware Infrastructure - Driving Efficiency
CWOM: Application Aware Infrastructure
• Understands application resource utilization (CPU, Mem, Cache, Thread Pool)
• Understands application composition (Web Srvr VM, App Srvr VM, DB VM)
• Understands application performance (response time)
Traditional Infrastructure Monitoring
• Application is a black box (CPU, Mem)
• Components are a black box
• No concept of application performance
CWOM provides trustworthy decisions as it understands demands from applications such as SQL DB, Oracle, WebLogic, etc., out of the box. Unlike VMware vRealize, CWOM analyzes the data and provides the best decision with an understanding of the application stack; not just the raw data. No agent or additional management packs are required.
24. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Driving Infrastructure Efficiency:
CWOM identified > 1,800 decisions to reclaim resources and allow CIT to run more efficiently (349 decisions for SQL machines)
• CIT end users/business units request and approve more resources than they actually need
• CIT’s most constrained resource is memory
• Memory congestion is causing a diminishing end user experience as additional resources are needed to fulfill the application demands
• Reclaiming unused memory from these VMs will allow CIT to repurpose these resources to VMs and clusters that are overutilized and suffering from poor performance as a result
Ability to reclaim 1.4TB unused memory or ~3 servers’ worth of memory
$45k/server; $135,000 cost savings using CWOM to automatically reclaim unused capacity
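The reclaim estimate can be reproduced from the three figures on this slide. In the sketch below, the implied memory per server is derived from the slide's "~3 servers" approximation and is not a stated hardware specification.

```python
reclaimable_tb = 1.4        # unused memory identified, from the slide
servers_equivalent = 3      # "~3 servers' worth of memory", from the slide
cost_per_server = 45_000    # "$45k/server", from the slide

# Derived value: memory per server implied by the slide's approximation.
memory_per_server_tb = reclaimable_tb / servers_equivalent
savings = servers_equivalent * cost_per_server

print(f"implied memory per server: {memory_per_server_tb:.2f} TB")
print(f"cost savings: ${savings:,}")
```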
25. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
CWOM: Driving Infrastructure Efficiencies
VM/Host Density Improvement with CWOM
Current: 29; With CWOM: 39 (34% Improvement with Placement); With CWOM (Resize): 43 (48% Improvement with Sizing & Placement)
Projected Growth - # of Hosts
                            2018   2019   2020   2021
Current                     144    173    208    249
Hosts with CWOM             136    136    152    182
Hosts with CWOM (Resize)    128    128    138    165
Improve density by 48% (with resizes)
Run on 84 fewer servers by 2021 ($4.2M cost deferment)
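The projection and deferment figures on this slide can be reproduced under two assumptions drawn from the editor's notes: a 20% annual growth rate and a $50k capex cost per host. Compounding from the 2018 base and rounding up to whole hosts is one reading that matches the "Current" series in the chart; the deck does not state how the series was actually computed.

```python
import math

base_hosts = 144         # 2018 host count from the chart
growth = 0.20            # assumption: growth rate from the editor's notes
cost_per_host = 50_000   # assumption: capex input from the editor's notes
years = [2018, 2019, 2020, 2021]

# Assumption: compound from the 2018 base and round up to whole hosts.
projected = [math.ceil(base_hosts * (1 + growth) ** i) for i in range(len(years))]

hosts_2021_with_resize = 165   # "Hosts with CWOM (Resize)" value for 2021
hosts_avoided = projected[-1] - hosts_2021_with_resize
deferment = hosts_avoided * cost_per_host

# Density improvements from the VM/host density chart (29 -> 39 -> 43).
gain_placement = 39 / 29 - 1
gain_resize = 43 / 29 - 1

print(dict(zip(years, projected)))
print(f"hosts avoided by 2021: {hosts_avoided}, deferment: ${deferment:,}")
print(f"density gain: {gain_placement:.0%} (placement), {gain_resize:.0%} (resize)")
```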
26. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Driving Infrastructure Efficiency - Super Clusters
Projected Growth - # of Hosts
                                   2018   2019   2020   2021
Current                            144    173    208    249
SuperCluster with CWOM             117    122    146    175
SuperCluster with CWOM (Resize)    110    110    131    157
VM/Host Density Improvement with CWOM
Current: 29; SuperCluster With CWOM: 41 (41% Improvement with Placement); SuperCluster With CWOM (Resize): 46 (58% Improvement with Sizing & Placement)
Run on 92 fewer servers by 2021 ($4.6M cost deferment)
• An additional 8 fewer servers by dismantling cluster boundaries
Improve density by 58% (with resizes)
• 10% additional increase by dismantling cluster boundaries
27. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
CWOM: Driving Infrastructure Efficiencies
CWOM Net Saving
                                        2018   2019       2020         2021
Net Cost Deferment with CWOM            $0     $495,702   $1,174,842   $1,399,810
Net Cost Deferment with CWOM (Resize)   $0     $895,702   $1,874,842   $2,249,810
$2.24M Net savings by 2021 after implementing CWOM
CWOM Net Saving - Super Clusters
                                                     2018       2019         2020         2021
Net Cost Deferment with CWOM SuperCluster            $0         $1,195,702   $1,474,842   $1,749,810
Net Cost Deferment with CWOM SuperCluster (Resize)   $571,418   $1,795,702   $2,224,842   $2,649,810
Using Super Clusters, $2.64M Net savings by 2021 after implementing CWOM
• $400k additional savings by dismantling cluster boundaries
28. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Licensing: Total: $1,840,775.63
3-Year
License
Services
29. © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Roadmap to Operationalizing CWOM:
Recommended Adoption Duration
Foundational Core Advanced
Goals and Projects
• User Training Completed
• Production Install and Configuration
• Best practices platform implementation
• Storage and Server Visibility
• VM Placement Automation enabled
• Identify resource reclamation opportunities
• Key Report and Dashboards Created
• H/A Review – risks and opportunities
• Capacity Planning and Forecasting (understand headroom and demand)
• Areas with excess capacity and areas with risks due to supply shortage
• Risk from Compute, storage and VM perspective
• Identify VM right sizing opportunities and its impact
• Storage control and resizing
• Migration Planning
• HW refresh planning
• Decommission zombie/idle/powered off VMs
• Plan for Superclusters
• Execute Cluster merge
• ServiceNow Integration – Change Management
• VM right sizing automation on-prem (Optimizing infrastructure)
• Right sizing planning and automation in public cloud
• Optimize public cloud spend
• Initial Workload Placement Workflow Enablement (vRA, ServiceNow)
• Container Visibility and Optimization (Kubernetes, PCF, OpenShift)
Value Outcomes
o Staff enabled
o Platform adopted into Customer Operations and Planning workflows
o Environments prepared for Optimization
o Improved performance
o Decrease infrastructure, software and labor spend without risking performance
o Savings by more efficient use of infrastructure
o Expose any H/A violations
o Capacity on-demand based on app fluctuations
o Plan to reduce infrastructure footprint and cost
o Public cloud cost reductions
o On-prem and public Cloud visibility
o Plan for increased infrastructure utilization
o Safely accelerate refresh
o Reclaim wasted resources (X VMs, X resources)
o Automated change management process
o Performance Improvement/Cost Reduction
o On-prem cost reductions
o Operational cost reduction
o Optimization of orchestration workflow to assure service delivery and performance
o Continuous optimization of performance and cost into emerging technologies
Progress/Metrics
o Cluster health markers improved (delay decreased, peaks and averages converged)
o Metrics tracked pre/post automation - increased availability, VM/host density increase, reduction in Host Utilization risks, ready queue congestion reduction
o Wasted storage files, reclaimable mem, memory reservation, CPU, CPU reservation
o CPU/Mem/Storage Utilization
o Number of H/A violations
o Headroom Analysis: VM Headroom/Host Suspension
o OpEx savings – Ops savings from reduced alerts, tickets and MTTR
o Identify wasted storage files, reclaimable mem, memory reservation, CPU, CPU reservation
o Document resources allocated & reclaimed and OpEx savings associated; reduction in monthly cloud bills
o Latency reduction; reclaimed resources
o Identify cluster merge opportunities
o CapEx savings – defer spend
o OpEx savings – software reduction, hours saved, cost per ticket reduction, M/S reduction, reduction in monthly public cloud bills
o VMs and associated resources reclaimed (Mem, CPU, storage)
o Document resources allocated & reclaimed (Mem, CPU, Storage)
o Labor hours saved = ($/hr); Reduction in MTTR
o Hosts Reduced (CapEx reduction/deferment)
o Container/VM density increase; Reduction in Container utilization risks; Reduction in VM utilization risks; Reduction/Prevention of application issues; Reduction in MTTR (cost per minute)
Editor's Notes
Presentation from CEO & CFO Feb 2018
Outcomes from POC that align to IT initiatives
Leverage decision automation to improve operational efficiency, allowing the Platform team to focus on innovation and infrastructure improvements
Preventatively assure performance through real time automation
Automate analysis of capacity planning projects
Deliver consistent performance to end users
Ability to integrate into ServiceNow & Puppet to determine where to deploy net new workloads
Automatically analyze and reclaim resources during scheduled change windows
Placement automation increases efficiency upwards of XYZ
Plan for VDI, Spectre Patch, hardware refresh
Implement Super Clusters to maximize available resources
Integrate into ServiceNow & vRA to enable scalability and leverage automation
It installs within 20 minutes or less through a virtual machine and single OVA file. Within one hour of installation, it begins to provide specific actions to improve performance. Over the next 72 hours it completes a full analysis to fully understand the demand pattern of workloads. At the end of the 72 hours, it provides actions that will improve efficiency, such as right sizing virtual machines or cloud instances.
CWOM includes full stack visibility to help data center staff understand its decisions, but the goal is to elevate people from the routine tasks of workload management. CWOM operates in real time to continuously ensure predictable performance across the IT environment.
With decision-automation, resources and infrastructure can be dynamically adjusted based on real-time workload demand.
But these are extremely complex decisions at scale.
Let’s consider the demands of the workload—they have to be performant (which means getting the resources they need when they need them), they should minimize costs (don’t use more resources than they need), and they have to be compliant (licensing, data sovereignty, affinity/anti-affinity, etc.)
These are all tradeoffs that have to be made in real time all the time. The only way to be successful here is if workloads can self manage (because no human can make these tradeoffs at scale and in real time).
Workloads need to make the decisions and these decisions come down to:
Initial Workload Placement
Increase Resources
Decrease Resources
Move Workload
Retire Resources
In other words, placement, scaling, and capacity decisions. How do you automate decision-making so that the stack can dynamically adjust to changing demand? So that workloads can make those tradeoffs?
You need the right abstraction so that the very real interdependences between workloads and across the stack are captured.
You need analytics to make the right decisions
And you need automation to execute those decisions—to execute the right placement, scaling, and capacity actions at the right time all the time.
CWOM abstracts relationships in the stack into a market of buyers and sellers.
Its analytics use economic principles (specifically supply, demand, and price) to make the right decisions.
It works through the APIs of what’s already in your environment to pull the data to make those decisions and execute them.
This is all about making sure that workloads get the resources they need when they need them.
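To make the market abstraction concrete, here is a toy sketch in which hosts act as sellers that quote a price based on utilization and a workload acts as a buyer that picks the cheapest quote. The `Host` class, the `price` curve, and the host names are all hypothetical; the deck does not describe CWOM's actual pricing function or data model.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    def utilization(self, extra_gb: float = 0.0) -> float:
        """Memory utilization if extra_gb of demand were placed here."""
        return (self.used_gb + extra_gb) / self.capacity_gb


def price(utilization: float) -> float:
    """Hypothetical price curve: cost rises sharply as utilization nears 100%."""
    if utilization >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - utilization) ** 2


def cheapest_host(hosts, demand_gb):
    """The buyer (workload) picks the seller (host) with the lowest quoted price."""
    return min(hosts, key=lambda h: price(h.utilization(demand_gb)))


hosts = [Host("host-a", 512, used_gb=460), Host("host-b", 512, used_gb=150)]
choice = cheapest_host(hosts, demand_gb=32)
choice.used_gb += 32   # execute the placement decision
print(f"placed 32 GB workload on {choice.name}")
```

Because the price climbs steeply on busy hosts, repeated placements under this curve naturally drift toward underutilized hosts, which is the balancing behavior the notes describe.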
Let’s dig into why we need to automate decisions—what’s the difference between process automation and decision automation?
When you just automate processes, problems are typically addressed after alerting; it’s reactive.
It’s also more labor intensive and we’re operating in a world where more data is just more noise.
By adding decision automation, problems can be addressed before alerting; it’s preventative.
There’s little to no human intervention—again, because software is able to make decisions.
And, because software can scale with massive environments and massive amounts of data, more data means better decisions (not more noise).
The most important difference is that when you add decision automation you can drive the environment towards continuous health—software can continuously determine what needs to be done when to drive performance, efficiency, and compliance at scale in dynamic environments.
NJ1\Production-NJ1: 3/20 Health
The goal of enabling placement automation in this cluster is to safely drive up the underutilized hosts and drive down the highly utilized hosts so all workloads get the resources they need when they need them
CWOM has identified 52 placement decisions that need to be taken to deliver QoS
Unlike DRS, CWOM analyzes the impact and outcome of each decision, assuring it has a net positive impact on the environment (assuring each placement move has enough capacity at the end destination and that the other VMs are not negatively impacted as well).
In this planning exercise, CIT simulated the impact of the Spectre patch in the 04_05_06 VDI cluster which caused the demand to increase by 30%.
Allowing software to make placement decisions in real time will assure performance and keep the utilization ≤ 80% (CIT’s targeted state)
Noticing risk to QoS and compute (mem/CPU); storage IO causing latency, peaking
Analysis - host section
Based on a 29:1 current (no CWOM) density
Modeling example – UCS B200 M5 / 1TB RAM / Xeon Gold 6152 (2 x 22 cores)
Production_NR-NJ1 cluster example with UCS B200 M5 / 1.5TB RAM / Gold 6152
Simulated additional headroom CIT can gain with automation remaining within cluster boundaries.
All results abide by HA as N-1 rules, or 75% utilization max
SQL is licensed per core – 320 cores between 2 production clusters, currently with 11 physical hosts
Does not leverage a chargeback model today
20% growth rate
$50k capex input