Automating IT Analytics to Optimize Service Delivery and Cost at Safeway - A #GartnerDC Presentation

Automating IT Analytics to
Optimize Service Delivery and
Cost at Safeway
David Wagner – TeamQuest Advocate
Chris Lynn - Safeway Capacity
Manager/Performance Analyst
December 11, 2013

TeamQuest and the TeamQuest logo are registered trademarks in the US, EU and
elsewhere. All other trademarks and service marks are the property of their
respective owners.
Copyright © 2012 TeamQuest Corporation. All Rights Reserved.

Agenda
• TeamQuest Perspectives
• Safeway Experiences

Desired State:
Continuous Optimization
• Continuously financially-optimized IT environment
– Always know where and when performance problems will
affect the bottom line
– Identify cost and performance inefficiencies in support of
business processes and eliminate them

• Continuously optimized customer experience
– Understand when, where and why customer experiences fail
– Resolve, predict and prevent customer dissatisfaction issues

Continuous IT Optimization
Results
• Significantly reduce initial CapEx, and ongoing OpEx
– Make, and keep making, more money!

• Optimize resources for systems of customer engagement
• Deploy and refresh new applications faster
– e.g. Retailers need to capture their share of mobile commerce as it
grows from $6 to $31B (2016)

• Respond faster to business spikes
• Prevent business impacting outages and slowdowns

How:
Aligned Business and IT Analytics
Big Data Collection
• Diagram labels (business side): Outage Management, Customer Operations, Distribution Automation, Asset Management
• Diagram labels (underlying IT infrastructure): Services, Applications, Server/OS, Network, Storage

Enterprise IT Optimization
• Correlate business & IT performance
• Insight into how business process changes impact IT
• Understand and optimize IT costs by business unit/process and technology
• Insight into business performance across technology stack

Business Intelligence
Aligned Business and IT Intelligence

TeamQuest’s Approach:
Federated IT Analytics
• Federates existing data/information into purpose-designed optimization process
– Technology data (e.g. server, network, storage, etc.)
– Service data (catalog, metrics, tickets, etc.)
– Financial data
– Business data (analytics, KPIs, plans, TXNs, etc.)

• Automates IT analytics across all data sources
– Flexible and adaptive to dynamic environments
– Raw (commodity) data -> actionable information for IT

• Single-pane-of-glass IT Optimization

Result:
Automated Application Financial Optimization
• Continuous Optimization
– Pre-purchase validation
– Re-purposing
– Consolidation

• Fully automated, low cost

• Integrated with Risk and Service Management
• Adding new VMware clusters went from every 6 weeks to:
– None for 18+ months
– Consolidated 1000s of VMs (saving $M)

Result:
IT Optimized... Future Assured
• Continuous IT Optimization
– Peak IT Performance
– Ideal Resource Capacity
– Optimized Resource Costs

• Automated IT Analytics
– Predictive
– Federated

• Aligned IT and Business Management
– Performance
– Capacity
– Financial

Automating IT Analytics to Optimize Service
Delivery and Cost at Safeway
Chris Lynn - Safeway
December 11, 2013 2:30-3:00

Topics

Background
Server Storage Forecasting and Optimization
Application Capacity Analysis – Dashboards to Details
Business KPI Analytics
VMware High-Level Analytics

Background
• Manager of Safeway Capacity and
Performance Team
• ChrisLynn@usa.com
• http://www.linkedin.com/pub/chris-lynn/2/65/309/

• Environment Supported
• ~4000 servers (~1700 physical)
• ~200 significant applications
• Unix, Windows, Mainframe, Teradata, Tandem,
etc.
• Thousands of internal IT Customers, and
millions of shoppers

Server Storage Forecasting and
Optimization
• Optimizing Availability (reducing incidents)
• Optimizing Enterprise Capacity
• Reducing Risk
• Automated replacing Manual
• Embedded expertise

Storage Capacity Incident Avoidance:
Old (on server) Manual Method
$ df
Filesystem                          size  used  avail  capacity  Mounted on
/dev/vx/dsk/rootvol                 3.9G  2.5G  1.4G   65%       /
/dev/vx/dsk/var                     1.9G  832M  1.0G   45%       /var
swap                                9.5G  16K   9.5G   1%        /var/run
swap                                1.0G  2.6M  1021M  1%        /tmp
/dev/vx/dsk/patrol                  1.9G  1.5G  227M   88%       /appl/patrol
/dev/vx/dsk/home                    486M  347M  91M    80%       /export/home
/dev/vx/dsk/openv                   1.4G  583M  717M   45%       /usr/openv
/dev/vx/dsk/performdg/usrlocal      1.9G  542M  1.3G   29%       /usr/local
/dev/vx/dsk/performdg/oracle        3.9G  1.1G  2.8G   28%       /appl/oracle
/dev/vx/dsk/performdg/apache        128M  27M   95M    22%       /appl/apache
/dev/vx/dsk/rootdg/opswarelv        241M  234K  216M   1%        /var/opt/opsware
/dev/vx/dsk/performdg/b1home        12G   432M  11G    4%        /appl/perform/best1home
/dev/vx/dsk/performdg/spool_apache  256M  145M  104M   59%       /appl/spool/apache
/dev/vx/dsk/performdg/manage        180G  122G  54G    70%       /appl/perform/manager
/dev/vx/dsk/performdg/workspace     90G   59G   29G    68%       /appl/perform/workspace
/dev/vx/dsk/performdg/collect       480G  442G  37G    93%       /appl/perform/collect

Automated Storage Forecasting:
File System Exceptions
• Weekly automated prioritized scan
• 4500 servers
• 45000 filesystems
• Focused on meaningful exceptions
• A proactive shift from find to fix
– Was: 50 minutes looking for potential problems, 10 minutes to fix
– Now: 5 minutes looking for potential problems, 55 minutes fixing them
• Impossible to do manually

New Automated (global exception)
File System Forecast Analytic Details
• Complex multi-level thresholds
1. Is file system utilization above 90% AND growing by >0.2% for the interval?
2. Is file system utilization above 75% AND growing by >2% for the interval?
3. Is the file system utilization above 15% AND growing by >15% for the interval?
4. Is /appl/patrol above 90% AND growing for the interval?
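
The four threshold rules above can be sketched in Python as follows. This is only an illustration, not the TeamQuest implementation: the `FsSample` type and `matching_rule` function are hypothetical names, and the sketch assumes "growing by >X%" means a change of more than X percentage points of utilization over the interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FsSample:
    """One file system's summary for the reporting interval (hypothetical type)."""
    mount: str
    util_pct: float    # utilization at the end of the interval, in percent
    growth_pct: float  # assumed: change in utilization over the interval, in percentage points


def matching_rule(fs: FsSample) -> Optional[int]:
    """Return the number of the first rule the file system trips, or None."""
    if fs.util_pct > 90 and fs.growth_pct > 0.2:
        return 1  # nearly full and still creeping up
    if fs.util_pct > 75 and fs.growth_pct > 2:
        return 2  # fairly full and growing steadily
    if fs.util_pct > 15 and fs.growth_pct > 15:
        return 3  # not full yet, but growing very fast
    if fs.mount == "/appl/patrol" and fs.util_pct > 90 and fs.growth_pct > 0:
        return 4  # special case: any growth at all on /appl/patrol
    return None


if __name__ == "__main__":
    for sample in (
        FsSample("/appl/perform/collect", 93.0, 0.5),
        FsSample("/appl/patrol", 91.0, 0.1),
        FsSample("/var", 45.0, 1.0),
    ):
        print(sample.mount, matching_rule(sample))
```

Checking the tightest rule first means a file system is reported under the most urgent rule it matches; rule 4 only adds the /appl/patrol cases that rule 1 misses (growth between 0 and 0.2).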

• Individual exclusions and special cases
• Physical and virtual servers in the same report, but each can be treated uniquely
• Sorted by the date/time each file system is most likely to fill up
• All candidates for a single server shown together (sorted by the highest one), to minimize the time for operations to respond
• Includes the historical trend, rather than just a point in time (e.g. df)
• Forecasts the utilization trend into the future (multiple statistical options)
• Requires more than 24 hours of data, to avoid temporary file systems
• Requires recent data, to avoid servers that have been shut down
• Final measured number must not be below the threshold
• A final number >99.5% catches the very full file system that might not be growing
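
A minimal sketch of the forecast-and-sort idea, using a plain least-squares trend line as one of the possible statistical options. The function names and the default values for `max_age_hours` and `min_final_pct` are assumptions made for illustration; only the 24-hour and 99.5% figures come from the slide.

```python
from datetime import datetime, timedelta


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def hours_until_full(samples, now, min_span_hours=24, max_age_hours=48,
                     min_final_pct=15.0, nearly_full_pct=99.5):
    """samples: time-sorted list of (datetime, util_pct).

    Returns forecast hours until 100% full, 0.0 if already nearly full,
    or None if the file system is not a candidate.
    """
    if not samples:
        return None
    first_t, last_t, last_util = samples[0][0], samples[-1][0], samples[-1][1]
    if now - last_t > timedelta(hours=max_age_hours):
        return None  # no recent data: server is probably shut down
    if last_util < min_final_pct:
        return None  # final measured number is below the threshold
    if last_util > nearly_full_pct:
        return 0.0   # very full, flag it even if it is not growing
    if last_t - first_t <= timedelta(hours=min_span_hours):
        return None  # 24 hours of history or less: likely a temporary FS
    xs = [(t - first_t).total_seconds() / 3600 for t, _ in samples]
    slope, intercept = linear_fit(xs, [u for _, u in samples])
    if slope <= 0:
        return None  # flat or shrinking trend never reaches 100%
    return max((100.0 - intercept) / slope - xs[-1], 0.0)


def rank_by_fill_time(history, now):
    """history: {mount: samples}. Returns (hours, mount) pairs, soonest first."""
    forecasts = ((hours_until_full(s, now), m) for m, s in history.items())
    return sorted((h, m) for h, m in forecasts if h is not None)
```

Sorting on the forecast hours puts the file system expected to fill first at the top of the report, which is the ordering the slide describes.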

Executive Capacity Dashboards
Capacity Risk Indicators
[Dashboard figure; categories: Under Used, Well Used, Stressed, Highly Stressed]

Application Capacity Dashboards
Capacity Risk Indicators
[Chart figure; axis 0% to 100%; categories: Under Used, Well Used, Stressed, Highly Stressed]

Application Capacity Analysis
• Automated Application Triage
• All relevant metrics
• Embedded expertise
• Enterprise perspective of true capacity

Capacity Risk Candidates (OS -- # of systems)
[Chart figure; axis 0% to 100%; categories: Under Used, Well Used, Stressed, Highly Stressed]
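
As a rough sketch of how a chart like this could be produced, the Python below sorts systems into the four risk categories and computes each category's share per OS. The band boundaries (20%, 70%, 90%) and the function names are invented for illustration; the presentation does not state the actual thresholds.

```python
from collections import Counter, defaultdict

# Hypothetical upper bounds (percent utilization) for each band; not from the slides.
BANDS = [(20.0, "Under Used"), (70.0, "Well Used"), (90.0, "Stressed")]


def classify(peak_util_pct):
    """Map a utilization figure to one of the four capacity risk categories."""
    for upper, label in BANDS:
        if peak_util_pct < upper:
            return label
    return "Highly Stressed"


def risk_mix_by_os(systems):
    """systems: iterable of (os_name, peak_util_pct).

    Returns {os_name: {category: percent of that OS's systems}}.
    """
    counts = defaultdict(Counter)
    for os_name, util in systems:
        counts[os_name][classify(util)] += 1
    return {
        os_name: {label: 100.0 * n / sum(c.values()) for label, n in c.items()}
        for os_name, c in counts.items()
    }


if __name__ == "__main__":
    sample = [("Unix", 10), ("Unix", 50), ("Unix", 85), ("Unix", 95),
              ("Windows", 5), ("Windows", 15)]
    for os_name, mix in risk_mix_by_os(sample).items():
        print(os_name, mix)
```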

Integration With Business Metrics
System/Platform capacity data:
• Physical servers
• Virtual servers
• Tandem capacity systems
• Teradata capacity systems
• Datacenter facilities
Business perspective:
• Business transaction volumes
• Resource utilization

VMware Aggregate Capacity

Aggregate shows the worst individual status

VMware Aggregate Capacity

Aggregate shows the worst individual metric status
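
The "worst individual status" roll-up can be sketched in a few lines of Python. The severity ordering below (Under Used as least severe, Highly Stressed as most severe) and the metric and cluster names are assumptions for illustration, not taken from the slides.

```python
# Assumed ordering, least to most severe.
SEVERITY = ["Under Used", "Well Used", "Stressed", "Highly Stressed"]


def worst_status(statuses):
    """Return the most severe status; raises ValueError on an empty input."""
    return max(statuses, key=SEVERITY.index)


def rollup(cluster_metrics):
    """cluster_metrics: {cluster: {metric: status}}.

    Returns (per-cluster worst status, overall worst status).
    """
    per_cluster = {
        name: worst_status(metrics.values())
        for name, metrics in cluster_metrics.items()
    }
    return per_cluster, worst_status(per_cluster.values())


if __name__ == "__main__":
    example = {
        "cluster-a": {"cpu": "Well Used", "memory": "Stressed"},
        "cluster-b": {"cpu": "Under Used", "memory": "Well Used"},
    }
    print(rollup(example))
```

A worst-of roll-up is deliberately pessimistic: one stressed metric is enough to flag the whole aggregate, so a problem is not averaged away.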

Lessons Learned / Value Gained
• Reduced service risk
• More proactive, less reactive
• Established a baseline to optimize capacity, and a mechanism to measure the progress
• Business and IT alignment
– Performance and capacity to the business
– Management and technical personnel
• Launch slowly, in phases, so as not to overwhelm the groups
• People really do care about formatting and color choice, not just content
