Ha nsf notes

  1. NX-OS: High Availability (NX-OS, SAN-OS, IOS)
  2. NX-OS modular architecture
     [Diagram: Linux kernel hosting separate services (BGP, OSPF, PIM, TCP/UDP, IPv6, STP, HSRP, LACP, etc.), the HA manager ("Restart process!"), the PSS, and the NX7K data plane with its data plane streams.]
     • NX-OS services checkpoint their runtime state to the PSS so that it can be recovered after a failure.
     • The HA manager comprises the system manager (comparable to the Linux init process), the Message and Transaction Service (MTS, an inter-process communication channel), and the Persistent Storage Service (PSS, a database of last known state, e.g. a process checkpoint).
         N7010B-Dist# show system internal ?
           mts     MTS statistics
           pss     Display pss information
           sysmgr  Internal state of System Manager
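The checkpoint-and-recover idea described above can be illustrated with a toy model. This is only a sketch: the `Service` class, the `restart` helper, and the plain dictionary standing in for the PSS are hypothetical and are not NX-OS APIs.

```python
# Toy model only: a dict stands in for the PSS; Service and restart()
# are hypothetical names, not NX-OS interfaces.
class Service:
    def __init__(self, name, pss):
        self.name = name
        self.pss = pss
        # Recover the last checkpointed state if one exists, else start clean.
        self.state = dict(pss.get(name, {}))

    def update(self, key, value):
        self.state[key] = value
        # Checkpoint a copy of the runtime state after every change.
        self.pss[self.name] = dict(self.state)


def restart(service):
    """Start a fresh instance backed by the same store, as a system manager might."""
    return Service(service.name, service.pss)


pss = {}
ospf = Service("ospf", pss)
ospf.update("neighbors", 3)
ospf = restart(ospf)      # simulate a process crash and restart
print(ospf.state)         # prints {'neighbors': 3}
```

The restarted instance picks up where the failed one left off because its state lives outside the process, which is the property the slide attributes to the PSS.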
  3. NX-OS: High Availability
     www.silantia.com
     • Highly granular modularity in software components: provides better fault isolation and control, and streamlined feature sets and resource usage.
     • Granular in-service software upgrades: provide better flexibility in upgrades and minimize service interruption.
     • Control-plane / data-plane functional separation: Non-Stop Forwarding for each of the 3 fundamental technologies helps minimize data forwarding disruption.
     • Integrated virtualization support for L2 and L3: better manageability in complex environments.
     • Licensing: provides flexibility in deployments and growth; most features have their own license.
  4. NX-OS modular architecture
     • Message and Transaction Service (MTS): an inter-process communication channel.
     • Persistent Storage Service (PSS): a database that stores the health and state of each process; if a process fails, its last state can be queried from the PSS.
     • System manager: initializes system processes.
     • Each VDC has its own HA policy, which can be configured when the VDC is defined.
         N7010A-Dist(config)# vdc N7010A-Core
         N7010A-Dist(config-vdc)# show vdc N7010A-Core detail
         vdc id: 2
         vdc name: N7010A-Core
         vdc state: active
         vdc mac address: 00:26:98:07:ea:c2
         vdc ha policy: RESTART
         vdc dual-sup ha policy: SWITCHOVER
         vdc boot Order: 1
         vdc create time: Sun Jul 31 17:39:25 2011
         vdc reload count: 0
         vdc restart count: 0
         vdc type: Ethernet
         vdc supported linecards: m1 m1xl
  5. Non-Stop Forwarding or Graceful Restart
     • NX-OS has clear control-plane and data-plane separation.
     • This allows a Nexus device to remain in the data path while a routing process restarts.
     • Neighbors have to be NSF-aware for this to work.
     • The restarting router signals its neighbors that it is going through a temporary failure condition. The neighboring routers start a grace period and continue to forward traffic toward the router that has the failure condition.
  6. Graceful Restart for OSPFv2
     • When OSPFv2 needs to perform a graceful restart, it sends a link-local opaque (type 9) grace LSA. The grace LSA includes a grace period, which is the time for which the neighboring OSPFv2 interfaces hold on to the LSAs from the restarting OSPFv2 interface.
     • The participating neighbors, which are called NSF helpers, keep all LSAs that originate from the restarting OSPFv2 interface as if the interface were still adjacent. Note that the interface has to be up and operational during the grace period.
  7. Graceful Restart for OSPFv2
     Stateful restart is used in the following scenarios:
     • First recovery attempt after the process experiences problems
     • In-service software upgrade (ISSU)
     • User-initiated switchover using the system switchover command
     Graceful restart is used in the following scenarios:
     • Second recovery attempt after the process experiences problems within a 4-minute interval
     • Manual restart of the process using the restart ospf command
     • Active supervisor removal
     • Active supervisor reload using the reload module active-sup command
         N7K(config)# router ospf CCEIDC
         N7K(config-router)# graceful-restart
         N7K(config-router)# graceful-restart grace-period 120
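The process-failure cases in the two lists above can be sketched as a small decision function. This is an illustration under assumptions, not NX-OS logic: the function name is hypothetical, the 4-minute boundary is treated as inclusive, and a failure after the window has passed is assumed to count as a first attempt again.

```python
from datetime import datetime, timedelta

RESTART_WINDOW = timedelta(minutes=4)  # interval quoted on the slide


def recovery_type(previous_failure, now):
    """Pick the restart type for a failed OSPFv2 process (hypothetical helper).

    previous_failure is the time of the last failure, or None if there was none.
    """
    # Second recovery attempt inside the window: graceful restart.
    if previous_failure is not None and now - previous_failure <= RESTART_WINDOW:
        return "graceful"
    # First recovery attempt (or, by assumption, one after the window expired).
    return "stateful"


t0 = datetime(2011, 9, 3, 17, 30)
print(recovery_type(None, t0))                        # prints stateful
print(recovery_type(t0, t0 + timedelta(minutes=2)))   # prints graceful
```

The other triggers on the slide (ISSU, system switchover, restart ospf, supervisor removal or reload) are operator or hardware events and are not modelled here.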
  8. Graceful Restart for EIGRP
     Before restarting EIGRP, the graceful restart-capable router uses Hello messages to notify its neighbors that a graceful restart operation has started. During a switchover, EIGRP uses nonstop forwarding to continue forwarding traffic based on the information in the FIB, and the system is not taken out of the network topology. Note again that interfaces running EIGRP should be operational during this period.
         N7K(config)# router eigrp CCIEDC
         N7K(config-router)# address-family ipv6 unicast
         N7K(config-router-af)# graceful-restart
         N7K(config-router-af)# timers nsf converge 180
  9. NX-OS Troubleshooting
     • Always gather problem-specific show techs.
         N7010B-Dist# show tech ?
     • To collect a full show tech, use tac-pac.
         N7010B-Dist# tac-pac
       This command collects the show tech output and saves it as a .gz file in bootflash; you can use TFTP to retrieve the file.
     • Always record the timestamp of the problem. Compress all files with the NX-OS gzip command before shipping them.
     • Use the built-in Linux-style tools and filters (e.g. grep, egrep, last, less, sed, wc, sort, diff, redirect, exclude, include, pipe) to look for specific information.
     • Most useful commands:
         show version
         show module
         show log | last 100
         show running-config ?
         show system resource
         show inventory
         show interface transceiver
         show core
         show process log
         dir bootflash:
         show accounting log start-time 2011 Sep 20 00:00:00
         show proc cpu sorted
         show cli syntax | egrep "vpc"
  10. NX-OS Troubleshooting crashes
     • In 2008 SAN-OS was rebranded as NX-OS (from release 4.0). NX-OS for Nexus products is a combination of software components from IOS and SAN-OS.
     • NX-OS is a modular OS, so the crash of a single process does not impact the overall operation of the switch.
     • NX-OS evolved from SAN-OS; the necessary routing and switching protocols were imported from IOS, with a completely rewritten command-line parser.
     • A failed process creates a core dump.
         N7010A-Dist# show core
         VDC Module Instance Process-name    PID  Date(Year-Month-Day Time)
         --- ------ -------- --------------- ---- -------------------------
     • To fetch the core dump from the supervisor:
         N7010B-Dist# copy core:?
           core: Enter URL "core://<module-number>/<process-id>[/instance-num]"
         N7010B-Dist# copy core: tftp:
  11. NX-OS Troubleshooting crashes
         N7010A-Dist# show system internal sysmgr service name l2fm
         Service "l2fm" ("l2fm", 90):
         UUID = 0x19A, PID = 4980, SAP = 221
         State: SRV_STATE_HANDSHAKED (entered at time Sat Sep 3 17:49:00 2011).
         Restart count: 1
         Time of last restart: Sat Sep 3 17:34:05 2011.
         The service never crashed since the last reboot.
         Tag = N/A
         Plugin ID: 1

         N7010A-Dist# show system internal sysmgr service pid 4433
         Service "urib" ("urib", 173):
         UUID = 0x111, PID = 4433, SAP = 427
         State: SRV_STATE_HANDSHAKED (entered at time Sat Sep 3 17:49:00 2011).
         Restart count: 1
         Time of last restart: Sat Sep 3 17:33:25 2011.
         The service never crashed since the last reboot.
         Tag = N/A
         Plugin ID: 0
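When many services need checking, output like the above can be reduced to a few fields with a short script. The following is a minimal sketch: `parse_sysmgr` and the field names are hypothetical, and the line breaks in the sample are an assumption because the transcript flattens the original output. The regular expressions do not depend on where the lines break.

```python
import re

# Sample taken from the slide; the line layout is assumed.
SAMPLE = '''Service "l2fm" ("l2fm", 90):
UUID = 0x19A, PID = 4980, SAP = 221
State: SRV_STATE_HANDSHAKED (entered at time Sat Sep 3 17:49:00 2011).
Restart count: 1
Time of last restart: Sat Sep 3 17:34:05 2011.
The service never crashed since the last reboot.
'''


def parse_sysmgr(text):
    """Extract name, PID, state and restart count from sysmgr service output."""
    info = {}
    m = re.search(r'Service "([^"]+)"', text)
    if m:
        info["name"] = m.group(1)
    m = re.search(r'PID = (\d+)', text)
    if m:
        info["pid"] = int(m.group(1))
    m = re.search(r'State: (\S+)', text)
    if m:
        info["state"] = m.group(1)
    m = re.search(r'Restart count: (\d+)', text)
    if m:
        info["restart_count"] = int(m.group(1))
    # Flag a crash only if the "never crashed" line is absent.
    info["crashed"] = "never crashed" not in text
    return info


print(parse_sysmgr(SAMPLE))
```

A restart count above zero combined with `crashed` being false, as in this sample, suggests the service was restarted without a crash being recorded since the last reboot.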
  12. Troubleshooting NX-OS software upgrades
     • There are 3 types of software images: kickstart, system and EPLD.
     • During the upgrade, the supervisor is upgraded to the new version of code first, then each line card is upgraded, because line cards also run a lightweight version of NX-OS.
     • A fully loaded 10-slot chassis takes about an hour to upgrade, with no traffic loss during this period. Configuration is locked on both supervisors.
     • Before upgrading, check the impact of the software installation:
         N7010B-Dist# show install all impact kickstart bootflash:n7000-s1-kickstart.5.2.1.bin system bootflash:n7000-s1-dk9.5.2.1.bin
     • If the software upgrade fails:
         N7010B-Dist# show install all failure-reason
         N7010A-Dist# show process log
         VDC Process         PID    Normal-exit Stack Core  Log-create-time
         --- --------------- ------ ----------- ----- ----- ---------------
         1   installer       1497   N           N     N     Sat Jul 16 22:29:59 2012
         N7010B-Dist# show system internal log install | no-more
         N7010B-Dist# show system internal log install details | no-more
     • If available, gather bootup logs from the console or from the CMP while the software upgrade is in progress.
         N7010A-Dist-cmp5# attach cp
  13. F1 and M1 line card interactions
     • All 32 ports on an F1 series line card can run at 10 Gbps for local switching (switching within the line card), but the total backplane bandwidth available today is only 230 Gbps.
     • An F1 series line card requires M1 line cards to route a packet. All SVIs for VLANs on an F1 line card are stored on M1 line cards (proxy routing). The ports on the M1 line card do not need to be up.
     • F1 series line cards see the M1 series line cards as one large port-channel per VDC and use it for L3 lookups.
     • The F1 line card has connections between its forwarding engines, so it can switch locally without going through the switching fabric.
     • The 8-port M1 10G line card cannot switch locally from its first 4 ports to its other ports without going through the switching fabric, because it does not have connections between its forwarding engines.
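The gap between front-panel capacity and backplane bandwidth in the first bullet can be worked out directly. This sketch assumes the 230 Gbps figure applies to the fabric connection of a single F1 card; the variable names are illustrative.

```python
PORTS = 32
PORT_SPEED_GBPS = 10
BACKPLANE_GBPS = 230  # figure quoted on the slide, assumed to be per line card

front_panel_gbps = PORTS * PORT_SPEED_GBPS              # 32 * 10 = 320
oversubscription = front_panel_gbps / BACKPLANE_GBPS    # 320 / 230

print(f"{front_panel_gbps} Gbps front panel vs {BACKPLANE_GBPS} Gbps fabric: "
      f"{oversubscription:.2f}:1")
# prints 320 Gbps front panel vs 230 Gbps fabric: 1.39:1
```

Under that assumption the card is oversubscribed by roughly 1.39:1 for traffic that has to cross the fabric, while locally switched traffic is not subject to this limit.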