Carolina mini-cl-2014

  1. Local Edition
  2. Cisco UCS Troubleshooting and Best Practices (Local Edition)
     Jose Martinez, Technical Leader, Services, @jose_at_csco
     © 2013 Cisco and/or its affiliates. All rights reserved. Cisco Public
  3. Agenda
     •  Cisco UCS Troubleshooting
        ‒ Things all UCS admins should know and do
        ‒ Case studies: straight from the TAC queue
        ‒ DIY: what resources are available to you?
     •  Cisco UCS Best Practices
        ‒ The basics
        ‒ Wait… you are upgrading this weekend?
        ‒ Day-to-day operations
     •  Miscellaneous
     •  Q&A
  4. Cisco UCS Troubleshooting Basics
  5. Cisco UCS Troubleshooting: The Basics
     •  Most issues that affect Cisco UCS are investigated through the logs collected by the system
     •  Even performance and authentication investigations start with the logs
     •  Every component collects its own logs
     •  Depending on the issue, logs from several components may be needed
     •  Collect the UCSM and chassis tech-support as soon as possible
     •  Current behavior is to overwrite the last log
        ‒ Changing in an upcoming release: CSCuj56943
  6. Cisco UCS Troubleshooting: The Basics
     •  Both the log size and the logging level can be changed
        ‒ Default log level is info
        ‒ Default size is 5232880
     •  Change these values under scope monitoring → sysdebug → mgmt-logging
  7. Cisco UCS Troubleshooting: UCSM Internal Overview
     [Diagram: GUI, CLI, standards (SNMP, IPMI) and XML API; Management Information Tree; Data Management Engine (DME); Application Gateways (AG); managed endpoints]
  8. Cisco UCS Troubleshooting: Logs Example
  9. Cisco UCS Troubleshooting: Navigating the CLI
     •  scope: enters the specified mode
        ‒ Examples: scope adapter, scope bios-settings, scope org
        ‒ Each mode has its own set of commands
        ‒ The commands allowed depend on the assigned role and locale (RBAC)
        ‒ Configuration changes are usually made in this mode and applied with commit-buffer
     •  connect: connects to a specific component
        ‒ Examples: connect adapter, connect iom, connect local-mgmt
        ‒ Provides better troubleshooting options
        ‒ Does not allow configuration changes
  10. Cisco UCS Troubleshooting: Navigating the CLI
      •  scope example:
      •  connect example:
  11. Cisco UCS Troubleshooting: Navigating the CLI
      •  connect iom only reaches the IOMs directly attached to the FI you are logged in to
      •  To reach the other side:
         ‒ connect local-mgmt to the other side (peer FI)
         ‒ connect iom <chassis #>
      [Diagram: FI-B (primary), admin, FI-A (secondary)]
  12. Cisco UCS Troubleshooting: Navigating the CLI – connect nxos
      •  Best location for troubleshooting traffic
      •  Debug capability
      •  Displays the switch running configuration (non-server configuration)
      •  Access to ethanalyzer
      •  Typical data center switch commands:
         ‒ show interface brief
         ‒ show vlan
         ‒ show mac address vlan <vlan id>
         ‒ show port-channel summary
         ‒ show lacp neighbor
  13. Cisco UCS Troubleshooting: Navigating the CLI – connect adapter
      •  connect adapter X/Y/Z
         ‒ X is the chassis #
         ‒ Y is the blade #
         ‒ Z is the adapter #
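When scripting troubleshooting steps, the X/Y/Z argument can be built and checked programmatically before it is pasted into a session. This is a minimal sketch with hypothetical helper names (`AdapterPath`, `connect_adapter_cmd`, `parse_adapter_path`), assuming chassis, blade and adapter numbers are positive integers; it only produces the command text and does not talk to UCSM.

```python
from typing import NamedTuple


class AdapterPath(NamedTuple):
    chassis: int  # X: chassis number
    blade: int    # Y: blade (slot) number
    adapter: int  # Z: adapter number on the blade


def connect_adapter_cmd(path: AdapterPath) -> str:
    """Return the 'connect adapter X/Y/Z' command text for the given adapter."""
    # Assumption: all three IDs are 1-based positive integers.
    for field, value in zip(path._fields, path):
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{field} must be a positive integer, got {value!r}")
    return "connect adapter {}/{}/{}".format(*path)


def parse_adapter_path(text: str) -> AdapterPath:
    """Parse 'X/Y/Z' (for example '1/3/1') into an AdapterPath."""
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected X/Y/Z, got {text!r}")
    return AdapterPath(*(int(p) for p in parts))


if __name__ == "__main__":
    print(connect_adapter_cmd(parse_adapter_path("1/3/1")))
```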
  14. Cisco UCS Troubleshooting: Navigating the CLI – connect adapter
      •  The vnic table provides logical interface (lif) IDs that can be used to collect more information via the lif and lifstats output
  15. Cisco UCS Troubleshooting: Navigating the CLI – connect adapter
      •  Logs from the ASIC can be viewed directly on the adapter with the show-log command
  16. Cisco UCS Troubleshooting: Case Study #1
  17. Cisco UCS Troubleshooting: Case Study #1
      •  After a Cisco UCS upgrade, Fibre Channel performance was severely degraded
      •  Many ABTS (Abort Sequence) frames were observed in the VIC adapter logs:
  18. Cisco UCS Troubleshooting: Case Study #1
      •  Why the ABTS?
  19. Cisco UCS Troubleshooting: Case Study #1
      •  The errors indicate buffer starvation (no credits) and traffic being received when it was not expected
      •  No drops or congestion upstream
      •  Let's look back at the VIC ASIC logs for more information
  20. Cisco UCS Troubleshooting: Case Study #1 – Conclusion
      •  FC traffic on the network was affected because the pause (PFC/PG) negotiation occurred in the wrong order. The pause configuration on the adapter was incorrect, so the adapter kept sending traffic after the IOM told it to stop (pause). This extra traffic was dropped, which caused the aborts seen.
      •  Defect tracking: CSCuh61202
      •  Resolved in: 2.2(1b) and 2.1(3a)
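The failure mechanism in this conclusion (a sender that keeps transmitting after being told to pause, so the receiver drops the excess) can be illustrated with a toy model. This is a hypothetical simulation, not a model of the actual VIC or IOM implementation: one frame is offered per tick, the receiver drains one frame every `drain_every` ticks, and it asserts pause whenever its buffer is full.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    delivered: int
    dropped: int


def simulate(frames: int, buffer_size: int, drain_every: int, honors_pause: bool) -> Outcome:
    """Toy pause/flow-control model (hypothetical, for illustration only)."""
    if buffer_size < 1 or drain_every < 1:
        raise ValueError("buffer_size and drain_every must be >= 1")
    buffered = delivered = dropped = sent = tick = 0
    while sent < frames:
        tick += 1
        # The receiver drains one frame every `drain_every` ticks.
        if tick % drain_every == 0 and buffered:
            buffered -= 1
            delivered += 1
        paused = buffered >= buffer_size  # receiver asserts pause when full
        if paused and honors_pause:
            continue  # well-behaved sender holds the frame for a later tick
        if paused:
            dropped += 1  # misconfigured sender transmits anyway: frame is lost
        else:
            buffered += 1
        sent += 1
    delivered += buffered  # whatever is still queued is eventually delivered
    return Outcome(delivered, dropped)


if __name__ == "__main__":
    print("honors pause: ", simulate(100, 8, 2, honors_pause=True))
    print("ignores pause:", simulate(100, 8, 2, honors_pause=False))
```

With pause honored, every frame is eventually delivered; with pause ignored, every frame sent into a full buffer is lost, which in real FC traffic surfaces as aborted exchanges.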
  21. Cisco UCS Troubleshooting: Case Study #2
  22. Cisco UCS Troubleshooting: Case Study #2 – UCSM Login Issue
      •  The customer has two almost identical domains
      •  Both domains allow access via the CLI
      •  On one UCS domain, running 2.1(1a) or 2.1(2a), users were unable to log in via the GUI
      •  The customer tried the following to resolve the issue:
         ‒ Creating a new local user account
         ‒ Cluster lead failover
         ‒ Rebooting each FI, one at a time
         ‒ Upgrading
  23. Cisco UCS Troubleshooting: Case Study #2
      •  Confirm that authentication is working
      •  Debugs are available at the NX-OS level to confirm authentication
         ‒ debug aaa all
      •  The error says that the user is not known
  24. Cisco UCS Troubleshooting: Case Study #2
      •  Confirm the user via sam_techsupportinfo
      •  There is no tac.testadm user
      •  Over the many iterations of testing, a simple mistake was made
  25. Cisco UCS Troubleshooting: Case Study #2
      •  A different approach is to capture a trace with ethanalyzer
      •  Collects all data in and out of the mgmt port (like a sniffer)
      •  Captures default to summary output
  26. Cisco UCS Troubleshooting: Case Study #2
      •  Ethanalyzer can also be set to display detailed output
      •  Hypertext (HTTP) information is visible
  27. Cisco UCS Troubleshooting: Case Study #2
      •  After the credentials were corrected, the problem was still present
      •  There are two options for seeing what is happening in Java; the first is collecting information through the Java Console:
  28. Cisco UCS Troubleshooting: Case Study #2
      •  Collecting the information from the client log:
      •  C:\Users\<username>\AppData\LocalLow\Sun\Java\Deployment\log\.ucsm
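To grab the most recent client logs quickly, a short script can list that folder. This sketch assumes the default Windows profile layout shown above (with `Path.home()` resolving to `C:\Users\<username>`); the function names are hypothetical, and it sorts whatever files are present by modification time rather than assuming specific log file names.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def ucsm_client_log_dir(home: Optional[Path] = None) -> Path:
    """Return the .ucsm client log folder under the given (or current) user profile."""
    base = Path.home() if home is None else home
    return base / "AppData" / "LocalLow" / "Sun" / "Java" / "Deployment" / "log" / ".ucsm"


def newest_logs(log_dir: Path, count: int = 5) -> list[Path]:
    """Return up to `count` files from log_dir, most recently modified first."""
    if not log_dir.is_dir():
        return []  # no UCSM client logs for this profile
    files = [p for p in log_dir.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


if __name__ == "__main__":
    for log in newest_logs(ucsm_client_log_dir()):
        print(log)
```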
  29. Cisco UCS Troubleshooting: Case Study #2
      •  Centrale (client) logs
  30. Cisco UCS Troubleshooting: Case Study #2
      •  One more tool… Visore!
      •  Review the information for the listed classId directly from the UCSM database
      •  http://<vip.address>/visore.html
  31. Cisco UCS Troubleshooting: Case Study #2
      •  Visore output
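Visore browses the same object model that the UCSM XML API exposes, so the class data it shows can also be pulled from a script with the XML API method `configResolveClass`, POSTed (after an `aaaLogin`) to `http://<vip.address>/nuova`. The sketch below only builds the request and parses a response offline; the helper names are hypothetical, and reading `errorCode`/`errorDescr` from the response root is an assumption to verify against the XML API reference for your release.

```python
import xml.etree.ElementTree as ET


def resolve_class_request(cookie: str, class_id: str, hierarchical: bool = False) -> bytes:
    """Build a configResolveClass request body (for example class_id='vmInstance')."""
    elem = ET.Element("configResolveClass", {
        "cookie": cookie,       # session cookie returned by aaaLogin
        "classId": class_id,    # same classId you would browse in Visore
        "inHierarchical": "true" if hierarchical else "false",
    })
    return ET.tostring(elem)


def parse_out_configs(response_xml: str) -> list:
    """Return the attributes of every object listed under <outConfigs>."""
    root = ET.fromstring(response_xml)
    if root.get("errorCode"):
        raise RuntimeError(f"UCSM error {root.get('errorCode')}: {root.get('errorDescr')}")
    out = root.find("outConfigs")
    return [] if out is None else [dict(child.attrib) for child in out]
```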
  32. Cisco UCS Troubleshooting: Case Study #2 – Conclusion
      •  After reviewing the different classIds requested via Visore, classId vmInstance was found to be causing the problem
      •  The UCS domain was configured with the VM-FEX feature
      •  Some VMs had special characters in their names that prevented the client from parsing the XML properly, so users could not connect
      •  These characters were interpreted as escape sequences in XML
      •  Moving the VMs from the dVS to the vSwitch allowed access to the GUI again
      •  Defect tracking: CSCui80882
      •  Workaround: rename the offending VMs
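The slide does not say which characters were involved or how the client assembled its XML, so the following is only a general illustration of this class of bug, using `&` and `<` as example characters and hypothetical helper names: a VM name pasted into XML without escaping yields a document the parser rejects, while escaping it with `xml.sax.saxutils.quoteattr` round-trips cleanly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def naive_xml(vm_name: str) -> str:
    # Bug pattern: the raw name is pasted into the attribute value unescaped.
    return f'<vmInstance name="{vm_name}"/>'


def escaped_xml(vm_name: str) -> str:
    # quoteattr() escapes &, < and > and returns the value already quoted.
    return f"<vmInstance name={quoteattr(vm_name)}/>"


def parses(document: str) -> bool:
    """True if the document is well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    name = "web&db<01>"
    print(parses(naive_xml(name)), parses(escaped_xml(name)))  # False True
```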
  33. Cisco UCS Troubleshooting: Case Study #3
  34. Cisco UCS Troubleshooting: Case Study #3
      •  After Cisco UCS was upgraded to 2.2(1b), IP communication from some blades in the domain to Cisco UCS Manager stopped working
      •  In other cases, IP communication with Cisco UCS Manager worked, but not with both Fabric Interconnects
      •  This caused problems for applications running on those blades that need to communicate with UCSM (for example, UCS Director or SNMP tools)
  35. Cisco UCS Troubleshooting: Case Study #3
      •  A ping test was run to find the pattern (what the failing cases had in common)
      •  The test revealed the following:
         ‒ Blade traffic switched through Fabric B, destination mgmt0 of FI-B (same VLAN as blade) -> FAIL
         ‒ Blade traffic switched through Fabric B, destination mgmt0 of FI-A (same VLAN as blade) -> OK
         ‒ Blade traffic switched through Fabric A, destination mgmt0 of FI-B (same VLAN as blade) -> OK
         ‒ Blade traffic switched through Fabric A, destination mgmt0 of FI-A (same VLAN as blade) -> FAIL
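Tabulating the four results makes the pattern explicit: every failure, and only the failures, is a case where the blade traffic is switched by the same FI whose mgmt0 is the destination. A minimal sketch of that check, with the data transcribed from this slide and illustrative names:

```python
# (fabric switching the blade traffic, FI whose mgmt0 is pinged) -> ping succeeded?
RESULTS = {
    ("B", "B"): False,
    ("B", "A"): True,
    ("A", "B"): True,
    ("A", "A"): False,
}


def failing_cases(results: dict) -> list:
    """Return the (fabric, destination FI) pairs whose ping failed."""
    return sorted(pair for pair, ok in results.items() if not ok)


def same_fi_explains(results: dict) -> bool:
    """True if a ping fails exactly when the switching fabric equals the destination FI."""
    return all((fabric == dest) == (not ok) for (fabric, dest), ok in results.items())


if __name__ == "__main__":
    print("failures:", failing_cases(RESULTS))
    print("same-FI hypothesis fits:", same_fi_explains(RESULTS))
```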
  36. Cisco UCS Troubleshooting: Case Study #3
      •  The MAC addresses learned on the upstream switches showed no errors
      •  An ethanalyzer capture on mgmt0 of the Fabric Interconnect showed the server's ARP request reaching the FI mgmt interface, and the FI sending an ARP reply
      •  Is the Fabric Interconnect dropping the frame?
  37. Cisco UCS Troubleshooting: Case Study #3
      •  What is causing the drops?
      •  The RPF (Reverse Path Forwarding) counter is increasing!
  38. Cisco UCS Troubleshooting: Case Study #3
      •  Why is that counter increasing?
      •  Is that mgmt MAC address seen/learned somewhere else?
  39. Cisco UCS Troubleshooting: Case Study #3 – Conclusion
      •  Because mgmt0 and the FCF (Fibre Channel Forwarder) use the same MAC address, the Fabric Interconnect does not forward the frames to the vethernet
      •  This only happens when the blade traffic is switched by the same FI that the blade is trying to reach
      •  Going through an L3 device (router) changes the MAC address, avoiding this issue
      •  Defect tracking: CSCun19289
      •  Workaround: configuring mgmt0 in a different VLAN from the blades forces traffic through an L3 device
  40. Cisco UCS Troubleshooting: On Your Own
  41. Cisco UCS Troubleshooting: DIY – Blade Diagnostics
      •  Standalone offline diagnostics for UCS compute blades
      •  Not a UCS Manager solution
      •  The blade has to boot from the Server Diagnostics ISO
      •  ucs-blade-server-diags.1.0.1a.iso released Oct 2013
      •  Available from Cisco.com: Cisco UCS B-Series Blade Server Software
      •  Independent of any UCS Manager version
      •  The ISO image can be booted from vMedia, USB or SD card
  42. Cisco UCS Troubleshooting: DIY – Blade Diagnostics
      •  Use cases
         ‒ Sanity check after a hardware fix or replacement
         ‒ Burn-in before deployment in production
      •  GUI and CLI interface options
         ‒ The GUI has the same look and feel as SCU Diagnostics for C-Series
         ‒ Memtest86+ integrated in the tool
      •  Server inventory, sensor information and logs available from the tool
      •  Log files can be saved to an attached USB device
  43. Cisco UCS Troubleshooting: DIY – Blade Diagnostics
      •  Diagnostic tests
         ‒ Memory: options include the memory size to test and the number of loops
         ‒ Adapter
         ‒ CIMC: tests communication to the CIMC
         ‒ CPU: stress, stream, cache and register
         ‒ Storage: S.M.A.R.T. report and LSI MegaCLI controller test
         ‒ Video: GUI only
  44. Cisco UCS Troubleshooting: DIY – Blade Diagnostics
  45. Cisco UCS Troubleshooting: DIY – Resources Available
      •  Cisco Communities
         ‒ Tech Talks
         ‒ Best Practices
         ‒ Platform Emulator
         ‒ Script Samples
  46. Cisco UCS Troubleshooting: DIY – Resources Available
      •  Support Forums
         ‒ Technical discussions
         ‒ TAC and BU participation
         ‒ Partner participation
         ‒ Other customers like you
  47. Cisco UCS Best Practices: The Basics
  48. Cisco UCS Best Practices: The Basics
      •  Hardware & Software Support Matrices
         ‒ The support matrix and guidelines are established by the Data Center Group (development & QA teams)
         ‒ TAC adheres to the releases listed in those documents/tools
         ‒ Most common “out of matrix” software? eNIC and fNIC drivers
         ‒ Most common question: does TAC support combination X?
         ‒ Biggest concern: does running combination X invalidate my support contract?
  49. Cisco UCS Best Practices: The Basics
  50. Cisco UCS Best Practices: The Basics
  51. Cisco UCS Best Practices: The Basics
      •  Release Notes
         ‒ Mixed version support matrix
         ‒ Minimum version for the different hardware and features
         ‒ Catalog PID updates
         ‒ List of new features
         ‒ List of resolved caveats (fixed bugs)
         ‒ List of open caveats (bugs in the wild)
         ‒ Lots of transparency (the latest release has a total of 12 resolved and 5 open caveats)
      •  Release Bundle Content
         ‒ Started in the 2.0(1) release
         ‒ All related firmware and BIOSes for all UCS components associated with the
  52. Cisco UCS Best Practices: The Basics
      •  Mixed Release Support Matrix
  53. Cisco UCS Best Practices: Upgrades
  54. Cisco UCS Best Practices: Upgrades
      •  TAC can assist with questions related to the upgrade procedure:
         ‒ Am I following the proper procedure?
         ‒ Do I understand a caveat properly?
      •  TAC can review any faults currently present in the system
         ‒ Do not upgrade a system with Critical/Major/Minor faults!
      •  TAC can confirm whether a particular defect is fixed in the target version
  55. Cisco UCS Best Practices: Upgrades
      •  Back up your systems!
         ‒ Many customers do not have a backup of their Cisco UCSM
      •  The system database residing in the Fabric Interconnect holds the configuration for the entire system (pools, service profiles, VLANs, VSANs, etc.)
      •  UCSM offers four backup types: Full State, All Configuration, System Configuration and Logical Configuration. Run “Logical Configuration” backups on a regular basis to keep up with changes to service profiles, VLANs, VSANs, pools or policies. Run a “System Configuration” backup every time usernames, roles, locales or the system IP address change
      •  Store backups outside the Cisco UCS domain
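The backup guidance above boils down to a lookup from the kind of change made to the backup type that should follow it. A minimal sketch that encodes only the two rules stated on this slide; the change-category strings and the function name are hypothetical:

```python
# Change categories named on the slide, mapped to the backup type to run afterwards.
LOGICAL_CHANGES = {"service profile", "vlan", "vsan", "pool", "policy"}
SYSTEM_CHANGES = {"username", "role", "locale", "system ip address"}


def backups_to_run(changes: list) -> list:
    """Return the UCSM backup types to run after the given kinds of change."""
    needed = set()
    for change in changes:
        key = change.strip().lower()
        if key in LOGICAL_CHANGES:
            needed.add("Logical Configuration")
        elif key in SYSTEM_CHANGES:
            needed.add("System Configuration")
        else:
            raise ValueError(f"no rule for change type {change!r}")
    return sorted(needed)


if __name__ == "__main__":
    print(backups_to_run(["VLAN", "Service Profile", "Role"]))
```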
  56. Cisco UCS Best Practices: Day to Day Operations
  57. Cisco UCS Best Practices: Day to Day Operations
      •  Starting in 2.1(1), UCSM can schedule automated backups
      [Screenshot: Full State and All Configuration backup schedules]
  58. Cisco UCS Best Practices: Day to Day Operations
      •  Enable Smart Call Home (SCH)
      •  Administrators should not share the “admin” user ID. Each should use their own user ID and take advantage of the RBAC feature
      •  Scripts should not use the “admin” user ID to log in
      •  More than one domain? Take advantage of Cisco UCS Central
      •  SDN ready? Yes, we are! Programmable infrastructure through XML APIs
      •  Collect tech-support as soon as a problem is reported
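Combining two of these points (scripts with their own accounts, and programmability through the XML API), a script would open its session with the XML API method `aaaLogin` using a dedicated RBAC account and close it with `aaaLogout`. A minimal sketch, assuming the `http://<vip.address>/nuova` endpoint (HTTPS may be required depending on the domain's settings); the helper names and the check that refuses the shared "admin" account are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def login_request(user: str, password: str) -> bytes:
    """Build an aaaLogin body for a dedicated (non-admin) scripting account."""
    if user.lower() == "admin":
        # Policy from the slide: scripts should not log in as "admin".
        raise ValueError("use a dedicated RBAC account for scripts, not 'admin'")
    return ET.tostring(ET.Element("aaaLogin", {"inName": user, "inPassword": password}))


def logout_request(cookie: str) -> bytes:
    """Build an aaaLogout body that ends the session identified by `cookie`."""
    return ET.tostring(ET.Element("aaaLogout", {"inCookie": cookie}))


def parse_cookie(response_xml: str) -> str:
    """Extract outCookie from an aaaLogin response, or raise on failure."""
    root = ET.fromstring(response_xml)
    cookie = root.get("outCookie")
    if root.get("errorCode") or not cookie:
        raise RuntimeError(f"login failed: {root.get('errorCode')} {root.get('errorDescr')}")
    return cookie


def post_xml(vip: str, body: bytes, timeout: float = 30.0) -> str:
    """POST an XML API request to http://<vip>/nuova and return the reply text."""
    req = urllib.request.Request(f"http://{vip}/nuova", data=body,
                                 headers={"Content-Type": "application/xml"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A session would then be `cookie = parse_cookie(post_xml(vip, login_request(user, pw)))`, followed by queries that carry `cookie`, and finally `post_xml(vip, logout_request(cookie))` so sessions are not left open.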
  59. Miscellaneous
  60. Register for Cisco Live – San Francisco
      Cisco Live, May 18 – 22, 2014
      www.ciscolive.com/us
  61. Cisco Live San Francisco
      •  BRKCOM-3008 Unraveling UCS Manager Features, Policies and Mechanics
      •  BRKCOM-2006 Cisco UCS Administration and RBAC
      •  LTRVIR-2999 Deploying Nexus 1000v on ESXi, Hyper-V and OpenStack
