Ipso vrrp troubleshooting
Published in: Technology

Transcript

The purpose of this article is to help in troubleshooting VRRP-related issues on Nokia Check Point firewalls. One of the most common problems in Nokia VRRP implementations is that interfaces on both the active and the standby firewall go into the Master state (master/master). The main reason is that the matching VRIDs on the master and backup firewalls cannot see each other's VRRP multicast advertisements.

The first step is to check the VRRP state of the interfaces. This is how you can check that:

    PrimaryFW-A[admin]# iclid
    PrimaryFW-A> show vrrp
    VRRP State
    Flags: On
    6 interface enabled
    6 virtual routers configured
    0 in Init state
    0 in Backup state
    6 in Master state
    PrimaryFW-A>
    PrimaryFW-A> exit
    Bye.
    PrimaryFW-A[admin]#

    SecondaryFW-B[admin]# iclid
    SecondaryFW-B> sh vrrp
    VRRP State
    Flags: On
    6 interface enabled
    6 virtual routers configured
    0 in Init state
    4 in Backup state
    2 in Master state
    SecondaryFW-B>
    SecondaryFW-B> exit
    Bye.
    SecondaryFW-B[admin]#

In this example, two interfaces are in the Master state on both firewalls at the same time.

The next step is to run tcpdump to see whether the VRRP multicast advertisements are reaching the affected interface.

As the first troubleshooting measure, run tcpdump on the problematic interface of both the master and the backup firewall. If you do not know which interface is the problematic one, "echo sh vrrp int | iclid" should give you the answer: it is the interface on the backup firewall that is in the Master state.

    PrimaryFW-A[admin]# tcpdump -i eth-s4p2c0 proto vrrp
    tcpdump: listening on eth-s4p2c0
    00:46:11.379961 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:12.399982 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:13.479985 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:14.560007 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
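The comparison of the two "show vrrp" summaries can also be done off-box on saved output. The following is a minimal Python sketch, not a Nokia or Check Point tool: the function names are hypothetical, and it assumes both firewalls are configured with the same set of virtual routers and that the summary lines look exactly like the samples above.

```python
import re

# Sample summaries copied from the "show vrrp" output in this article.
PRIMARY = """6 virtual routers configured
0 in Init state
0 in Backup state
6 in Master state"""

SECONDARY = """6 virtual routers configured
0 in Init state
4 in Backup state
2 in Master state"""


def parse_vrrp_summary(text):
    """Extract the configured count and per-state counts from a summary."""
    summary = {"configured": 0, "init": 0, "backup": 0, "master": 0}
    configured = re.search(r"(\d+) virtual routers configured", text)
    if configured:
        summary["configured"] = int(configured.group(1))
    for count, state in re.findall(r"(\d+) in (Init|Backup|Master) state", text):
        summary[state.lower()] = int(count)
    return summary


def master_master_lower_bound(primary_text, secondary_text):
    """Minimum number of virtual routers that must be Master on both boxes.

    Assumption: both firewalls carry the same virtual routers, so if the
    Master counts add up to more than the configured total, the excess
    must be virtual routers that are Master on both sides.
    """
    primary = parse_vrrp_summary(primary_text)
    secondary = parse_vrrp_summary(secondary_text)
    excess = primary["master"] + secondary["master"] - primary["configured"]
    return max(0, excess)


if __name__ == "__main__":
    print(master_master_lower_bound(PRIMARY, SECONDARY))
```

A non-zero result only says that a master/master condition exists; the per-interface view ("echo sh vrrp int | iclid") is still needed to find which interface is affected.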
The capture on the primary firewall shows VRRP advertisements leaving the interface (direction flag O). Next, run tcpdump on the secondary firewall.

    SecondaryFW-B[admin]# tcpdump -i eth-s4p2c0 proto vrrp
    tcpdump: listening on eth-s4p2c0
    00:19:38.507294 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:39.527316 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:40.607328 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:41.687351 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:42.707364 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]

The interfaces on both the primary and the secondary firewall are sending VRRP advertisements, and neither capture shows an inbound advertisement from the peer. The advertisements are not reaching the other firewall's interface, which means there is a communication breakdown, possibly caused by network issues.

Once the network issue is resolved, communication becomes possible again and the interface with the lower priority goes into the Backup state.

Now let us discuss another scenario in which the firewall interfaces are in the master/master state.

Again, run tcpdump on both of the interfaces in question:

    PrimaryFW-A[admin]# tcpdump -i eth-s4p2c0 proto vrrp
    tcpdump: listening on eth-s4p2c0
    00:46:11.206994 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
    00:46:11.379961 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:12.286990 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
    00:46:12.399982 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:13.307014 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
    00:46:13.479985 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:14.387098 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
    00:46:14.560007 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    00:46:15.467064 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
    00:46:15.580010 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]

    SecondaryFW-B[admin]# tcpdump -i eth-s4p2c0 proto vrrp
    tcpdump: listening on eth-s4p2c0
    00:19:38.507294 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:38.630075 I 10.10.10.2 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 100 [tos 0xc0]
    00:19:39.527316 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:39.710131 I 10.10.10.2 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 100 [tos 0xc0]
    00:19:40.607328 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:40.790142 I 10.10.10.2 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 100 [tos 0xc0]
    00:19:41.687351 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
    00:19:41.810150 I 10.10.10.2 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 100 [tos 0xc0]

In the above example, look at the VRID numbers of the incoming (I) and outgoing (O) packets. The VRIDs do not match: each interface sends advertisements for VRID 102 but receives advertisements for VRID 103. This is an indication that the cabling is not correct. The cables going to VRID 102 and VRID 103 are not connected correctly and need to be swapped to fix this issue.

Swap the cables and the issue will be resolved. The firewall with the higher priority will go into the Master state.

A properly functioning pair of firewalls looks like this:

    PrimaryFW-A[admin]# iclid
    PrimaryFW-A> sh vrrp
    VRRP State
    Flags: On
    6 interface enabled
    6 virtual routers configured
    0 in Init state
    0 in Backup state
    6 in Master state
    PrimaryFW-A> exit
    Bye.
    PrimaryFW-A[admin]#

    SecondaryFW-B[admin]# iclid
    SecondaryFW-B> sh vrrp
    VRRP State
    Flags: On
    6 interface enabled
    6 virtual routers configured
    0 in Init state
    6 in Backup state
    0 in Master state
    SecondaryFW-B> exit
    Bye.
    SecondaryFW-B[admin]#

If you were to run tcpdump on the healthy interface, this is how it would look. The master only sends advertisements and the backup only receives them:

    PrimaryFW-A[admin]# tcpdump -i eth-s4p2c0 proto vrrp
    tcpdump: listening on eth-s4p2c0
    18:25:44.015711 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:25:45.095726 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:25:46.175751 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:25:47.195770 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:25:48.275819 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:25:49.355812 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    ^C
    97 packets received by filter
    0 packets dropped by kernel
    PrimaryFW-A[admin]#

    SecondaryFW-B[admin]# tcpdump -i eth-s4p2c0 proto vrrp
    tcpdump: listening on eth-s4p2c0
    18:26:07.415446 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:26:08.495451 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:26:09.515480 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:26:10.595486 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:26:11.675485 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:26:12.695522 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    18:26:13.775590 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
    ^C
    14 packets received by filter
    0 packets dropped by kernel
    SecondaryFW-B[admin]#

VRRP transitions can happen due to several causes:

The first (and most common) cause is that one or more of the monitored interfaces loses link state.

The next cause is that, due to network issues, the VRRP hello packets originating from the master VRRP member are not seen on the backup.

The third cause is that one of the Check Point critical devices fails to check in its state to the kernel within the specified timeout.

Solution

VRRP Transitions due to loss of link state

It is often difficult to determine whether a VRRP transition occurred because one of the monitored interfaces lost link state. To find out whether a link transition on one of the monitored interfaces caused the failover, do the following:

Gather switch statistics from the devices directly connected to the VRRP pair and check whether they show that a link transition occurred.

Run the following commands to determine which interface is losing link state and causing the transition.

(NOTE: This command shows Up to Down Transitions only. The counter does not increment when the link state goes from Down to Up.)

    ipso[admin]# clish -c "show interfacemonitor"
    Interface Monitor
    Interface eth1c0
      Status up
      Logical Name eth1c0
      State PhysAvail,LinkAvail,Up,Broadcast,Multicast,AutoLink
      MTU 1518
      Up to Down Transitions 1
    Interface eth2c0
      Status up
      Logical Name eth2c0
      State PhysAvail,LinkAvail,Up,Broadcast,Multicast,AutoLink
      MTU 1518
      Up to Down Transitions 1
    Interface eth3c0
      Status up
      Logical Name eth3c0
      State PhysAvail,LinkAvail,Up,Broadcast,Multicast,AutoLink
      MTU 1518
      Up to Down Transitions 1
    Interface eth4c0
      Status down
      Logical Name eth4c0
    Interface loop0c0
      Status up
      Logical Name loop0c0
      State PhysAvail,LinkAvail,Up,Loopback,Multicast
      MTU 0
      Up to Down Transitions 0

    ipso[admin]# clish -c "show vrrp interfaces"
    VRRP Interfaces
    Interface eth1c0
      Number of virtual routers: 1
      Flags: MonitoredCircuitMode
      Authentication: NoAuthentication
      VRID 10
        State: Master
        Time since transition: 85236
        BasePriority: 110
        Effective Priority:110
        Master transitions: 3
        Flags:
        Advertisement interval: 1
        Router Dead Interval:3
        VMAC Mode: VRRP
        VMAC:00:00:5e:00:01:0a
        Primary address: 10.207.159.5
        Next advertisement:
        Number of Addresses: 1
          10.207.159.88
        Monitored circuits
          eth3c0 (priority 10)
    Interface eth3c0
      Number of virtual routers: 1
      Flags: MonitoredCircuitMode
      Authentication: NoAuthentication
      VRID 10
        State: Master
        Time since transition: 85236
        BasePriority: 110
        Effective Priority:110
        Master transitions: 3
        Flags:
        Advertisement interval: 1
        Router Dead Interval:3
        VMAC Mode: VRRP
        VMAC:00:00:5e:00:01:0a
        Primary address: 192.168.159.4
        Next advertisement:
        Number of Addresses: 1
          192.168.159.88
        Monitored circuits
          eth1c0 (priority 10)

VRRP Transitions due to not receiving VRRP hello packets

To determine whether the VRRP hello packets from the master are seen on the backup, run tcpdump on each interface configured for VRRP and look for the inbound hello packets. The following command shows all VRRP hello packets:

    ipso[admin]# tcpdump -vv -i eth1c0 proto vrrp
    tcpdump: listening on eth1c0
    18:18:20.605420 I 10.207.159.5 > 224.0.0.18: VRRPv2-adver 20: vrid 10 pri 110 int 1 sum 9684 naddrs 1 10.207.159.88 [tos 0xc0] (ttl 255, id 14906)
    36 packets received by filter
    0 packets dropped by kernel

When analyzing a VRRP hello packet, there are several things to look at:

VRID: make sure that the packets you are looking at belong to the VRID in question.

pri: this is the effective priority that is being announced to the other VRRP member.

VRRP Transitions due to a failure of a Check Point Critical Device

VRRP monitors the state of the Check Point processes only if "FW Monitoring" is selected in the VRRP configuration. For troubleshooting purposes this can be disabled from Voyager to rule out a critical device failure. Nokia does not recommend that customers run with this setting disabled in a production environment.

A Check Point Critical Device is a process that is monitored by the cpha daemon. These devices must report their state to the kernel within the specified timeout. If a device fails to report its state to the kernel within the specified timeout, the kernel assumes that there is a problem with the process and forces a VRRP failover.

Note: When "FW Monitoring" is enabled on VRRP, any backward clock move will cause fwd to go into a problem state, and as a result a VRRP failover will occur.

To obtain a list of the Check Point Critical Devices and their timeouts, run the following command:

    ipso[admin]# cphaprob -i list

    Built-in Devices:

    Device Name: IPSO member status
    Current state: OK

    Registered Devices:

    Device Name: Synchronization
    Registration number: 0
    Timeout: none
    Current state: OK
    Time since last report: 102563 sec

    Device Name: Filter
    Registration number: 1
    Timeout: none
    Current state: OK
    Time since last report: 102548 sec

    Device Name: cphad
    Registration number: 2
    Timeout: 5 sec
    Current state: OK
    Time since last report: 0.2 sec

    Device Name: fwd
    Registration number: 3
    Timeout: 5 sec
    Current state: OK
    Time since last report: 0.6 sec

To enable debugging (which writes an event to the messages file and to the console upon a critical device failure), run the following command:

    ipso[admin]# ipsctl -w net:log:partner:status:debug 1

That will log to the console and to /var/log/messages. To turn off logging to the console:

    ipso[admin]# ipsctl -w net:log:sink:console 0

After enabling debugging, analyze the /var/log/messages file and look for lines containing "noksr". The log events look like the following:

    Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/cphad expired
    Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/fwd expired
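Searching a large messages file for these events can be scripted. The sketch below is a hedged Python example meant to run on a copy of /var/log/messages taken off the firewall. The regular expression is inferred only from the two sample lines above, so the exact log layout on other IPSO releases is an assumption, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines in this article (an assumption):
#   ... kernel: netlog:noksr_timeout .. Firewall-1/<device> expired
NOKSR_RE = re.compile(r"netlog:noksr_timeout\s+\.\.\s+(\S+)/(\S+) expired")

SAMPLE_LOG = [
    "Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/cphad expired",
    "Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/fwd expired",
]


def expired_devices(log_lines):
    """Count how often each critical device missed its report timeout."""
    counts = Counter()
    for line in log_lines:
        match = NOKSR_RE.search(line)
        if match:
            # group(2) is the device name after the slash, e.g. "fwd".
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    for device, hits in expired_devices(SAMPLE_LOG).items():
        print(device, hits)
```

The device names it reports are the ones to compare against the timeouts shown by "cphaprob -i list".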
By analyzing this information you can determine exactly which critical device has failed. You should then look at the timeout value for this critical device to determine whether the value is high enough.

In relatively high CPU usage situations, a failover may occur because the critical device does not get the CPU time it needs to check its state in with the kernel. It is recommended to increase the timeout to 600 seconds if the machine is under heavy load.

If the above does not improve the situation, use the following command to completely remove fwd from the list of monitored devices:

    ipso[admin]# cphaprob -d fwd unregister

Take into consideration that this means a failover will not occur if the fwd daemon crashes during normal operation.

To change a timeout to a higher value, use the following command:

    ipso[admin]# cphaprob -d [device] -t [timeout] -s [state] -p register

Example:

    ipso[admin]# cphaprob -d fwd -t 120 -s ok -p register

This command registers the fwd process with the state "OK" and a timeout value of 120 seconds.

(NOTE: This setting does not survive a reboot, so the command needs to be added to the fwstart script or to rc.local, with a 60 second sleep, to make it persistent across reboots.)

Command summary

clish -c "show vrrp interfaces"
Displays the detailed configuration of VRRP, including priority, hello interval, and VRID.

clish -c "show interfacemonitor"
Displays interface transitions.

cphaprob -i list
Displays the Check Point critical processes and their timeouts.

To log critical process failures:
ipsctl -w net:log:partner:status:debug 1
That will log to the console and to /var/log/messages. To turn off logging to the console:
ipsctl -w net:log:sink:console 0

To change the timeout value of a monitored process:
cphaprob -d [device] -t [timeout] -s [state] -p register
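The tcpdump checks described earlier in this article (outbound-only advertisements, or inbound and outbound advertisements carrying different VRIDs) can be sketched as a small classifier over a saved capture. This is an illustrative Python sketch, not part of IPSO: the function name and the result labels are hypothetical, and the line pattern assumes the IPSO tcpdump layout shown in this article, with an I or O direction flag after the timestamp.

```python
import re

# Matches lines such as:
#   00:46:11.206994 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
ADVERT_RE = re.compile(
    r"^\S+\s+(?P<dir>[IO])\s+(?P<src>\S+)\s+>\s+224\.0\.0\.18:\s+"
    r"VRRPv2-adver.*?vrid\s+(?P<vrid>\d+)\s+pri\s+(?P<pri>\d+)"
)


def classify_capture(capture_lines):
    """Classify one interface's capture by advertisement direction and VRID."""
    inbound, outbound = set(), set()
    for line in capture_lines:
        match = ADVERT_RE.match(line.strip())
        if not match:
            continue  # prompt, banner, or packet-count lines
        target = inbound if match["dir"] == "I" else outbound
        target.add(int(match["vrid"]))
    if not inbound and not outbound:
        return "no-adverts"
    if outbound and not inbound:
        # Interface is acting as Master and sees nothing from the peer.
        return "sending-only"
    if inbound and not outbound:
        # Interface is acting as Backup and hears the Master.
        return "receiving-only"
    if inbound.isdisjoint(outbound):
        # Sends one VRID, receives another: the cabling case above.
        return "vrid-mismatch"
    return "sending-and-receiving"


if __name__ == "__main__":
    miscabled = [
        "00:46:11.206994 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]",
        "00:46:11.379961 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]",
    ]
    print(classify_capture(miscabled))
```

A "sending-only" result on the master alone is healthy. It points to a network problem only when the capture taken on the backup's matching interface is also "sending-only".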