VMworld 2013
Chris Nakagaki, Cox Communications
Jason Davis, Cox Communications
Himanshu Kumar Singh, VMware
VMworld 2013: Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter Operations Management Suite
1. Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter Operations Management Suite
Chris Nakagaki, Cox Communications
Jason Davis, Cox Communications
Himanshu Kumar Singh, VMware
VCM5034
#VCM5034
4. Agenda
Background
Why vCOPs and Log Insight?
vCOPs
Capacity Planning Demo
Custom Dashboarding Demo
Heat Maps Demo
Log Insight – What is it? How did Cox use it?
Storage Deeper Dive Demo
VM Backup Failures Demo
Q&A
How to Play
5. Background
Cox Communications, Inc. (Atlanta)
100+ Hosts, 3,000+ VMs
2,800+ GHz Compute Capacity
13.5 TB Memory Capacity
200 TB SAN Storage
Chris Nakagaki
vExpert 2011, 2012, 2013
10 years @ Cox Communications
Started w/ ESX 2.5
@zsoldier
Jason Davis
15 years Windows experience
12 years @ Cox Communications
Started w/ ESX 2.0
Credits?
6. Why vCOPs and Log Insight?
!?
Dynamic Thresholds (vCOPs)
Easy Deployment (vCOPs/Log Insight)
Capacity Planning (vCOPs)
Cloud Suite Cost Savings (vCOPs)
Log Aggregation (Log Insight)
Pretty Pictures (vCOPs)
Because we like to have a strong upper and lower body.
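Dynamic Thresholds, the first reason listed above, means alerting on deviation from each metric's learned normal behavior rather than on a fixed number. The actual vCOPs analytics are not described in this deck, so the sketch below swaps in a much simpler technique as a stand-in: a band of mean ± k standard deviations over recent history. The sample data and k = 3 are hypothetical.

```python
from statistics import mean, stdev


def dynamic_band(history, k=3.0):
    """Return (low, high) bounds: mean of history +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two samples
    return mu - k * sigma, mu + k * sigma


def is_anomalous(history, value, k=3.0):
    """True if value falls outside the band learned from history."""
    low, high = dynamic_band(history, k)
    return value < low or value > high


# Hypothetical CPU usage samples (percent) for one VM
cpu_history = [40, 42, 38, 41, 39, 43, 40, 41]
print(is_anomalous(cpu_history, 41))  # False
print(is_anomalous(cpu_history, 90))  # True
```

Unlike a static "alert above 80%" rule, the band moves with each VM's own history, which is the property the slide is pointing at.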
7. vCOPs – Is there capacity?
1UP!
Network switch maintenance
Multiple hosted production VMs potentially affected
Can we place affected hosts in maintenance mode and maintain uptime?
9. vCOPs – Is there capacity?
1UP!
Conclusion:
Yes, there is capacity
Network maintenance can proceed
Demonstrated:
vCOPs Capacity Planning Tool
The bottleneck is disk space, not any other resource
VMs can continue to run
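The capacity question on these slides reduces to a check: with some hosts in maintenance mode, can the remaining hosts and shared storage still carry current demand, and which resource is tightest? A minimal sketch of that check follows. The host sizes, demand figures, resource names, and 80% headroom target are hypothetical, and the sketch is not meant to reproduce the model used by the vCOPs Capacity Planning tool.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_ghz: float
    mem_gb: float


def check_capacity(hosts, in_maintenance, demand, shared, headroom=0.8):
    """Return (fits, bottleneck, utilization) with the listed hosts evacuated.

    Host CPU and memory shrink when hosts enter maintenance mode; shared
    resources such as datastore space (passed in `shared`) do not.
    """
    remaining = [h for h in hosts if h.name not in in_maintenance]
    capacity = {
        "cpu_ghz": sum(h.cpu_ghz for h in remaining),
        "mem_gb": sum(h.mem_gb for h in remaining),
        **shared,
    }
    utilization = {
        r: demand[r] / capacity[r] if capacity[r] else float("inf")
        for r in demand
    }
    fits = all(u <= headroom for u in utilization.values())
    bottleneck = max(utilization, key=utilization.get)  # most utilized resource
    return fits, bottleneck, utilization


# Hypothetical four-host cluster; two hosts go into maintenance mode
hosts = [Host(f"esx0{i}", cpu_ghz=40.0, mem_gb=512.0) for i in range(1, 5)]
fits, bottleneck, util = check_capacity(
    hosts,
    in_maintenance={"esx01", "esx02"},
    demand={"cpu_ghz": 30.0, "mem_gb": 500.0, "disk_tb": 150.0},
    shared={"disk_tb": 200.0},
)
print(fits, bottleneck)  # True disk_tb
```

With these made-up numbers the evacuation fits, and the tightest resource is shared disk space, which is the same shape of answer the slide reports.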
10. vCOPs - How do we monitor streaming servers?
Sim
Infrastructure
Live streaming event w/ CEO and CTO
Monitor VMs associated w/ streaming service live!
Key Metrics?
CPU
Memory
Network
12. vCOPs - How do we monitor streaming servers?
Sim
Infrastructure
Conclusion:
vCOPs custom dashboarding is useful!
We demonstrated:
Grouping all streaming VMs as an application object
Creating a custom dashboard
Focusing on 3 key metrics
Health Tree to show who’s being lazy
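The dashboard described above groups the streaming VMs as one application object and focuses on CPU, memory, and network. The sketch below shows that roll-up in plain Python: per-metric group averages plus the busiest and idlest member. VM names and values are hypothetical; vCOPs dashboards are built in its UI, not with code like this.

```python
from dataclasses import dataclass

# The three key metrics from the streaming dashboard
KEY_METRICS = ("cpu_pct", "mem_pct", "net_mbps")


@dataclass
class VmSample:
    vm: str
    cpu_pct: float
    mem_pct: float
    net_mbps: float


def summarize_group(samples):
    """Roll up one sample per VM into group-level stats for each key metric."""
    summary = {}
    for metric in KEY_METRICS:
        values = {s.vm: getattr(s, metric) for s in samples}
        summary[metric] = {
            "avg": sum(values.values()) / len(values),
            "busiest": max(values, key=values.get),
            "idlest": min(values, key=values.get),
        }
    return summary


# Hypothetical members of a "streaming" application group
streaming = [
    VmSample("stream-01", cpu_pct=70, mem_pct=60, net_mbps=400),
    VmSample("stream-02", cpu_pct=20, mem_pct=30, net_mbps=50),
    VmSample("stream-03", cpu_pct=60, mem_pct=55, net_mbps=350),
]
for metric, stats in summarize_group(streaming).items():
    print(metric, stats)
```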
13. vCOPs – Why are VMs slow?
POW!
Receiving reports that VMs are performing slowly.
No immediately discernible pattern
vCOPs to the rescue!
15. vCOPs – Why are VMs slow?
POW!
Determined that one array had severe latency.
This raised questions about VMware NMP (Native Multipathing Plug-in).
On to Log Insight for deeper analysis…
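The slow-VM investigation came down to finding one array with severe latency. A minimal sketch of that narrowing step: average per-datastore latency samples by backing array and flag arrays above a threshold. The datastore-to-array mapping, the samples, and the 20 ms threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def latency_by_array(samples, datastore_to_array):
    """Average (datastore, latency_ms) samples per backing array."""
    per_array = defaultdict(list)
    for datastore, latency_ms in samples:
        per_array[datastore_to_array[datastore]].append(latency_ms)
    return {array: sum(v) / len(v) for array, v in per_array.items()}


def slow_arrays(samples, datastore_to_array, threshold_ms=20.0):
    """Arrays whose average latency exceeds the (assumed) threshold, sorted by name."""
    averages = latency_by_array(samples, datastore_to_array)
    return sorted(a for a, avg in averages.items() if avg > threshold_ms)


# Hypothetical mapping and latency samples
mapping = {"ds01": "array-A", "ds02": "array-A", "ds03": "array-B"}
samples = [("ds01", 5.0), ("ds02", 7.0), ("ds03", 45.0), ("ds03", 55.0)]
print(slow_arrays(samples, mapping))  # ['array-B']
```

Aggregating by array rather than by VM is what turns scattered "this VM is slow" reports into a single suspect.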
20. Problem: Operate and Troubleshoot a Complex System
VMware Logs
OS and App Logs
Physical Infrastructure Logs
21. VMware’s Approach to Log Management
Extend Analytics to Log Data
• With vC Ops, VMware introduced an analytics-based operations management solution for structured data (metrics, KPIs, events, alerts)
• Log Insight extends our analytics-based approach to logs and unstructured, machine-generated data
Easy to Use and Accessible
• Existing solutions are highly specialized and often too expensive
• Log Insight has an intuitive, easy-to-use interface
• A predictable pricing model with an unlimited amount of log data makes it accessible to all
Optimized for VMware Environments
• Log Insight comes with built-in knowledge and native support for vSphere
• Integration with vC Ops maximizes ROI and value, providing a complete cloud operations management solution
22. VMware Cloud Ops Mgmt = Log Insight and vCenter Operations
Cloud Operations Management
• vCenter Log Insight and vCenter Operations complement each other
• Together they deliver best-of-breed capabilities for performance, capacity, and configuration management
• Tight integration enables a seamless transition from monitoring to troubleshooting
• Log Insight and vC Ops together provide a complete solution for Cloud Operations Management
23. Key vCenter Log Insight Use Cases
IT Operations
• Troubleshooting and Root Cause Analysis
I observe a problem (e.g. slowness), troubleshoot it, and identify the part of the stack that is responsible (e.g. network delay vs. storage)
Follow the trail from vC Ops to logs to get to the root cause of an observed problem
• Monitoring Using Logs
Monitor metrics and events (performance and change) that are visible only in logs
Collect all the data in one place without the need for custom parsing or transformation of data
Security and Compliance
• Security Forensics
• Comprehensive Audit (who, when) / Compliance Reporting
Business Transaction Monitoring
• Collect and correlate transaction logs with infrastructure performance
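The Monitoring Using Logs use case means turning unstructured lines into something countable. Log Insight is pitched as doing this without custom parsing; to show what that saves you, here is a hand-rolled equivalent: a regex pulls out timestamp, host, and message, then matching messages are counted per host per minute. The line format and messages are hypothetical and do not reproduce any real ESXi log format.

```python
import re
from collections import Counter

# Hypothetical line format: "<ISO-8601 timestamp> <host> <message>"
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\S* (?P<host>\S+) (?P<msg>.*)$"
)


def count_per_minute(lines, keyword):
    """Count lines whose message contains keyword, bucketed by (minute, host)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and keyword in m.group("msg"):
            counts[(m.group("minute"), m.group("host"))] += 1
    return counts


logs = [
    "2013-08-27T10:01:05Z esx01 storage command timed out",
    "2013-08-27T10:01:40Z esx01 storage command timed out",
    "2013-08-27T10:02:10Z esx02 storage command timed out",
    "2013-08-27T10:02:15Z esx02 heartbeat ok",
]
print(count_per_minute(logs, "timed out"))
```

Every new log source would need its own regex in this approach, which is the maintenance burden the slide is contrasting against.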
28. Log Insight – Was Round Robin causing issues?
Were paths being marked dead?
Were the paths remaining dead?
Did the paths come back when expected?
LET’S SEE …
Leeroy Jenkins!
30. Log Insight – Was Round Robin causing issues?
Conclusion:
No, round robin was not causing issues!
We Demonstrated:
Paths were marked DEAD.
Paths remained DEAD.
Paths came back ON when expected.
Leeroy Jenkins!
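The three Round Robin questions (were paths marked dead, did they stay dead, did they come back when expected) can be answered by pairing each DEAD event with the next ON event for the same path. The sketch below assumes the path-state changes have already been extracted from the logs as (timestamp, path, state) tuples; that input format, the path names, and the 30-minute expectation are hypothetical, not actual vmkernel log syntax.

```python
from datetime import datetime, timedelta


def path_outages(events):
    """Pair each DEAD event with the next ON event for the same path.

    events: (iso_timestamp, path, state) tuples in time order, state "DEAD" or "ON".
    Returns (path, dead_at, back_on_at) tuples; back_on_at is None if the
    path never came back within the events supplied.
    """
    dead_since = {}
    outages = []
    for ts, path, state in events:
        t = datetime.fromisoformat(ts)
        if state == "DEAD" and path not in dead_since:
            dead_since[path] = t
        elif state == "ON" and path in dead_since:
            outages.append((path, dead_since.pop(path), t))
    outages.extend((path, t, None) for path, t in dead_since.items())
    return outages


def unexpected(outages, max_down):
    """Outages that never recovered or lasted longer than max_down."""
    return [o for o in outages if o[2] is None or o[2] - o[1] > max_down]


# Hypothetical events during a switch maintenance window
events = [
    ("2013-08-27T22:00:00", "vmhba2:C0:T1:L5", "DEAD"),
    ("2013-08-27T22:00:01", "vmhba2:C0:T2:L5", "DEAD"),
    ("2013-08-27T22:20:00", "vmhba2:C0:T1:L5", "ON"),
    ("2013-08-27T22:20:02", "vmhba2:C0:T2:L5", "ON"),
]
outages = path_outages(events)
print(len(outages), unexpected(outages, timedelta(minutes=30)))  # 2 []
```

An empty `unexpected` list corresponds to the slide's conclusion: paths went DEAD, stayed DEAD during the window, and came back ON when expected.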
31. Log Insight – What’s causing VM backup failures?
NetBackup has snapshot errors (status code 156).
Symantec article HOWTO70949 states there are multiple possible causes.
Which is the most probable cause?
Does VMware have correlating logs?
LET’S SEE …
Paku-Man?
33. Log Insight – What’s causing VM backup failures?
Conclusion:
The most probable cause is the inability to create VM snapshots due to timeouts.
We Demonstrated:
Correlating errors in VMware logs stating:
“The guest OS has reported an error during quiescing.”
VMware KB 1018194 provides additional troubleshooting steps:
Reboot the VM
Reduce I/O
Etc.
Paku-Man?
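The backup analysis correlates NetBackup status 156 failures with VMware-side errors for the same VM at about the same time. A minimal sketch of that time-window join follows. The VM names, timestamps, and the plus-or-minus 10-minute window are hypothetical; only the quiescing error text comes from the slide.

```python
from datetime import datetime, timedelta


def correlate(backup_failures, vmware_errors, window=timedelta(minutes=10)):
    """Map each (vm, failed_at) backup failure to VMware error messages
    logged for the same VM within +/- window of the failure."""
    return {
        (vm, failed_at): [
            msg
            for err_vm, err_at, msg in vmware_errors
            if err_vm == vm and abs(err_at - failed_at) <= window
        ]
        for vm, failed_at in backup_failures
    }


QUIESCE_ERR = "The guest OS has reported an error during quiescing."

# Hypothetical NetBackup status 156 failures and VMware-side errors
failures = [
    ("app01", datetime(2013, 8, 20, 1, 5)),
    ("db02", datetime(2013, 8, 20, 1, 30)),
]
errors = [
    ("app01", datetime(2013, 8, 20, 1, 2), QUIESCE_ERR),
    ("db02", datetime(2013, 8, 20, 3, 0), QUIESCE_ERR),
]
result = correlate(failures, errors)
matched = sum(1 for msgs in result.values() if msgs)
print(f"{matched} of {len(result)} failures have a correlating VMware error")
```

The fraction of failures with a nearby quiescing error is one way to rank the candidate causes the Symantec article lists.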
35. Other VMware Activities Related to This Session
HOL:
HOL-SDC-1301
Applied Cloud Operations
Group Discussions:
VCM1002-GD, VCM1004-GD
Cloud Operations with Hicham Mourad or Sam McBride
Breakout Session – repeat by popular demand:
VCM4528 – Thursday, 2 pm, Moscone West, Room 3001
Tips and Tricks with vCenter Log Insight
Follow us:
@VMLogInsight and get 5 free licenses
Hang with us:
Booth 2020 – Cloud Management Lounge