How to Troubleshoot VLAN and Switch Problems, Part 1



Published in: Technology

In this article we cover the essentials of troubleshooting VLAN and switch problems: common general switch issues, VLAN-related issues, and spanning-tree issues. We’ll also cover VLAN/switch troubleshooting techniques.

Common General Switch Issues

Some problems can occur on any switch, whatever its VLAN configuration. One example is a physical or connectivity-related issue.

Physical Interface/Connectivity Issues

Symptoms
  • Interface is down/down: it’s not receiving keepalives and it’s not physically connected.
  • Interface is up/down: it’s physically up but the Layer 2 protocol is down.
  • Interface is administratively down.

Solutions
  • Check your cabling. Always start by assuming the problem is with the cable, and swap it with a known good cable. This may not always be the case, but in some instances you might have to substitute a crossover cable. If the port doesn’t have the auto-MDIX crossover function, you may have to do this manually.

You can also verify that the hardware is functional. Use the show controllers command to see if there’s something physically wrong with the port, or try a different port on the switch to see if the same problem occurs.

  • Check your interface. Verify that the interface is operational and use the no shutdown command. That takes care of “administratively down” cases, as well as cases where the port has been put into the error-disabled state by one of the Layer 2 protocols.

Physical Interface Speed/Duplex Issues

Other problems that occur frequently between two interfaces are speed and duplex mismatches. This is particularly true if you have a gigabit connection on one side and a 10/100 connection on the other.

Symptoms
  • You’ll see a syslog message that says %CDP-4-DUPLEX_MISMATCH, which tells you that there’s a duplex mismatch.
  • If one side is hard-coded and the other is set to auto, or both sides are hard-coded but to different values, the link can’t auto-sense anything, and you can end up with a speed and duplex mismatch.

Solutions
  • Set the speed and duplex settings to autonegotiate on both ends.
  • Alternatively (for example, if one device has issues with autonegotiation), manually configure the speed and duplex settings on both ends so that they’re the same.

Common VLAN Related Issues

VLAN-specific Issues

Symptom
  • You notice interface flapping on a port set for access-only mode.
Solutions
  • Execute the show running-config command.
  • Examine the output and verify that the following entries are on the affected port: switchport mode access, and switchport access vlan followed by the VLAN ID.

If something’s missing, add what you need. Some of the more automated trunking mechanisms can create this type of issue if the port isn’t explicitly set to access mode and to a specific VLAN.

Another reason a VLAN could be down is that no physical port is associated with that particular VLAN. On a Layer 3 switch, this typically isn’t as big an issue. On Layer 2 switches, it can be.

Symptom
  • A VLAN is created on the switch but is in a down state.

Solution
  • Execute the show vlan command. If it shows “down,” make sure at least one port is assigned to the specified VLAN, or that a switch virtual interface exists in that VLAN.

VLAN Trunking Issues

Symptom
  • You’ve connected the cables but a trunk is still not establishing across the configured link.

Solutions
  • If you’re using ISL trunking, make sure the switch on the other side supports ISL. If it doesn’t, you need to change the trunking type.
  • If you’re using 802.1Q trunking, you may have different native VLANs configured on either side. If that’s the case, change the native VLANs to match.
  • Verify that the trunking settings on both ends of the link are the same (e.g., DTP mode, encapsulation).

VLAN Trunking Protocol (VTP) Issues

Symptom
  • VLANs are not propagating from servers to clients the way they should.

Solutions
  • First make sure that the links between the client and the server are configured as trunks on both sides and that their trunking types match.
  • Verify that the VTP domains match and adjust if necessary.
  • Verify that the switch you intend to serve as master is not in transparent mode or client mode. Make sure it’s in server mode and that the other switch is in client mode.

Inter-VLAN Routing Issues

Symptom
  • VLANs cannot reach one another. For instance, in the figure above, VLAN 1 and VLAN 11 cannot connect.

Solutions
  • If you’re using an external router, first make sure that the router is reachable. Going back to our figure, if the workstation on VLAN 1 can’t reach the VLAN 1 interface on Router 1, there may be a connectivity or misconfiguration issue.

If you’re having some other issue, you may have to troubleshoot routing. But if the VLAN 1 workstation can reach Router 1’s VLAN 1 interface and the VLAN 11 workstation can reach Router 1’s VLAN 11 interface, then there may be something in the router itself you need to look at.

  • If you’re using a Layer 3 route processor, make sure that the switch virtual interfaces (SVIs) have been configured with the correct VLAN ID and IP subnet information.
  • Verify that a default gateway exists on the switch.

Common Spanning Tree Issues

802.1D Spanning Tree Issues

Symptom
  • A port has gone into an error-disabled state or has become non-functional after a configuration event.

Solutions
  • If you’re using PortFast and you have any of the guard features (such as BPDU guard) enabled, make sure no other devices are sending bridge protocol data units (BPDUs) to that port.
  • Make sure no unidirectional (one-way) links exist.
  • In a worst-case scenario, issue a shutdown command followed by a no shutdown command to reset the port.
Another spanning tree issue involves EtherChannel.

Symptom
  • EtherChannel is not forming a port channel between configured links.

Solutions
  • Make sure the EtherChannel parameters match at both ends. The member ports on a given switch have to be the same type (e.g., FastEthernet, Gigabit Ethernet).

You can have a FastEthernet port on one switch going into a Gigabit port on the other, but if you bundle a FastEthernet port and a Gigabit Ethernet port on the same switch to go to the other switch, it’s not going to work.

  • Verify that the same protocol (e.g., PAgP, LACP) has been configured on all ports. Make sure it’s the same on both ends.
  • Make sure you use identical trunking configurations, including native VLANs, when using 802.1Q.

Troubleshooting VLAN/Switch Problems

Now that we’ve covered some common problems, here are some basic ideas on how to troubleshoot switches and VLANs.

  • Always start with the Physical Layer. Confirm that the interface is Up/Up and verify that the cabling is operational. People often spend a lot of time troubleshooting other things, only to realize the problem is just the cable.
  • Use the Cisco Discovery Protocol (CDP) to verify Layer 2 connectivity. If you have it turned off, turn it on just for testing purposes. Execute the show cdp neighbors command and verify that the device names and device types you’re expecting to see on both ends of the links are actually there. If no neighbors are shown and you think everything is configured the way it should be, then you may have a Layer 2 issue of some kind. In that case, you’ve isolated the problem to a specific layer in the OSI model.
  • Look at your ARP mappings. Use the show arp command on both devices and watch for entries listing incorrect MAC addresses or marked Incomplete. An Incomplete entry means an ARP request was sent but no reply came back, so you may have some other kind of issue on the link.

Also, to verify ARP mappings, issue a ping command to the IP address on the opposite end of the link. If the ping fails or the ARP entries appear incorrect, examine the possible causes.

VLAN/Switch Lab Troubleshooting Exercises

Now it’s time to look at how this works in a simulated environment. We’ll start with some general background on a situation that could actually exist. Three trouble tickets are involved. You’ll get them from the system and use them for troubleshooting and resolution.

The three trouble tickets are: Internet is Down, No Connectivity, and Network is Slow.

As we walk you through each step of the simulated troubleshooting process, we’ll present it as if you’re the one doing the troubleshooting, the way an expert would.

Here’s the basic layout. Let’s call it our Site 1 Topology:

It consists of a large campus with 300 employees spread across three separate buildings. Internet connectivity is across the WAN. In other words, this campus environment gets its Internet access from another location.

Two routers, R1-1 and R1-2, provide redundancy to both the WAN and the Internet. Both connect to the Wide Area Network.
Now here’s the situation.

Building 3, which is serviced by R1-3, has been experiencing a number of service outages. Your role as the Tier 1 help desk technician on duty is to receive the trouble ticket, diagnose the issue, and ultimately resolve it.

Trouble Ticket: Internet is Down

You arrive at work to find a high-priority trouble ticket assigned to you, saying that the Internet is down. The problem has been going on for over an hour without any resolution. After some investigation, you discover that someone on the network team has made an undocumented configuration change.

Your task is to pick up the ticket, assign it to yourself, contact the requestor to say that you are now actively working on the problem, and then proceed with troubleshooting and resolution.

Here’s what greets you the moment you arrive at work:

[Screenshots: the trouble ticket and the requestor’s messages]
While these messages may sound harsh (see the last one), it’s normal for tensions to run high when something isn’t working and a person’s job depends on it. So even if you don’t particularly like the way this person is talking to you, you have to take all that into account.

Note in the upper-right corner of that last screenshot that the Status is Open and the Priority is High. The first thing you do is send the requestor a message confirming that you are already working on the issue. After that, you proceed to troubleshooting.

To begin troubleshooting, you bring up your console. Because R1-3 is the one experiencing problems, you right-click on it and select Telnet/SSH to device.
First, you check for connectivity. The trouble ticket from the manager indicates that although the Internet is down, everything else seems to be working, at least locally, so you assume that the workstations are still able to reach you.

You proceed by issuing the command:

show ip interface brief

to check the Physical Layer and see what it tells you. From the screenshot below, two items stand out.

The first one, enclosed in a box marked #1, is something that would normally require deeper inspection. However, it’s not being used, so you skip it. The second one (marked #2) is a group of LAN interfaces, and they’re Up. That means they’re working the way they should. In other words, the Physical Layer is working.
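The check on the show ip interface brief output can be sketched in a few lines of Python. This is only an illustration: the sample text, interface names, and addresses below are hypothetical and are not taken from the lab, and the helper names parse_ip_interface_brief and not_up_up are made up for this example.

```python
# Hypothetical sample output; interface names and addresses are invented
# for illustration and do not come from the lab in this article.
SAMPLE = """\
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        10.1.1.3        YES NVRAM  up                    up
FastEthernet0/1        unassigned      YES NVRAM  administratively down down
Serial0/0/0            unassigned      YES NVRAM  down                  down
Vlan11                 10.1.11.3       YES NVRAM  up                    up
"""


def parse_ip_interface_brief(text):
    """Return {interface: (status, protocol)} from 'show ip interface brief' text."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        # Status can be two words ("administratively down"), so join
        # everything between the Method column and the Protocol column.
        status = " ".join(fields[4:-1])
        states[fields[0]] = (status, fields[-1])
    return states


def not_up_up(states):
    """List the interfaces whose status/protocol pair is anything but up/up."""
    return sorted(
        name for name, state in states.items() if state != ("up", "up")
    )


states = parse_ip_interface_brief(SAMPLE)
print(not_up_up(states))
```

Anything the second helper returns is a candidate for closer inspection, unless, as in the walkthrough, the interface is simply not in use.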
Next, you execute the show interfaces command to see if everything’s working as expected. In the screenshot below, FastEthernet is showing Up/Up. That’s a good sign.

While you’re doing all this, you’re following a plan. Here’s the plan you drew up and filled out for this particular troubleshooting activity:
Next, you run show cdp neighbors.

The output lists Switch 1-3 (SW1-3), the upstream switch, so you know that link is functional. At this point, you consider ruling out both Layer 1 and Layer 2.

Next, you conduct some ping tests on VLAN 1 (the Management VLAN) and VLAN 11 (the Production VLAN).

Everything looks fine on the Management VLAN:
However, on the Production VLAN, you experience some problems:

You want to find out whether the upstream switch can be pinged, so you try to obtain the IP addresses by executing the show cdp neighbors detail command.
It’s not listing an IP address here, so you try pinging the switches.

Unlike Switch 1 and Switch 2, which are doing fine, Switch 3 is experiencing connectivity problems.

You try pinging the Internet, and you still can’t get outside on VLAN 11. That could be the reason why the Internet is down.
So you’ve got successful connectivity on VLAN 1 to Router 1-1 and everything in between. However, you can’t get out on VLAN 11.

Another thing you consider looking into is routing. To check routing, you execute the command:

show ip route

Seeing signs that you may have a routing problem, you investigate further by executing show ip eigrp interfaces.

It reveals that you have zero peers even though you can get out on VLAN 1, which is the Management VLAN. The Production VLAN isn’t getting any routing. At this point you can’t be sure, but judging from the way things are working, it’s logical to suspect a switch-related problem rather than a problem on this router.
When you run show cdp neighbors, you see that the next upstream device is Switch 1-3, so you take a look at that next.

There, you again execute show cdp neighbors. The output includes Router 1-3 as well as an EtherChannel to Switch 1-2 across two interfaces, so you know that you have Layer 2 connectivity.

Next, you execute show interfaces trunk. You notice that the native VLAN settings match on both the link back to the router (Fa0/1) and the port channel (Po4) up to the next upstream switch, SW1-2. Everything appears to be in order here.
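The comparison of trunk settings across the two ends of a link can be expressed as a small Python sketch. It assumes you have already collected the encapsulation and native VLAN from each end; the TrunkEnd type, the trunk_mismatches helper, and all of the values below are hypothetical and serve only to illustrate the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrunkEnd:
    """Trunk settings seen on one end of a link (illustrative fields only)."""
    device: str
    port: str
    encapsulation: str
    native_vlan: int


def trunk_mismatches(a, b):
    """Return a list of human-readable differences between two trunk ends."""
    problems = []
    if a.encapsulation != b.encapsulation:
        problems.append(f"encapsulation: {a.encapsulation} vs {b.encapsulation}")
    if a.native_vlan != b.native_vlan:
        problems.append(f"native VLAN: {a.native_vlan} vs {b.native_vlan}")
    return problems


# Hypothetical values: a matching pair, then a native VLAN mismatch.
near = TrunkEnd("SW1-3", "Po4", "802.1q", 1)
far_ok = TrunkEnd("SW1-2", "Po4", "802.1q", 1)
far_bad = TrunkEnd("SW1-2", "Po4", "802.1q", 99)

print(trunk_mismatches(near, far_ok))
print(trunk_mismatches(near, far_bad))
```

An empty list corresponds to the “everything appears to be in order” result in the walkthrough; any entry in the list is a setting to align before looking further up the stack.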
After that, you issue the show spanning-tree vlan 11 command. There you see your root port (Po4) and your designated port (Fa0/1).

So far, everything here appears to be functional, but because you want to make sure that all the necessary configuration has been carried out, you run show vlan. The results show that both VLAN 1 and VLAN 11 have indeed been configured.
You then execute the command: show vtp status

It shows that the configuration has been successfully sent, the domain is correct, the switch is operating in client mode, and there are 7 existing VLANs.

At this point, you eliminate Switch 1-3 from your list of possible culprits and proceed to Switch 1-2.
You try executing a show ip interface brief command. Everything looks good there.

Then you try show cdp neighbors. Same story there.
You also try show spanning-tree vlan 11.

Still, you see that everything’s functioning the way it’s supposed to.

To make sure the VLANs are there, you issue the show vlan command.
VLAN 1 and VLAN 11, which are the critical ones, are there.

Next, you run show vtp status.

Again, the information shown suggests that everything should be working properly, until you take a much closer look. Closer inspection reveals that some of the letters of the VTP domain name are in lower case.

That may not sound like a big deal, but VTP domain names are case-sensitive, so to this switch it is a different domain. Now you have what looks like a potential issue. Since everything else is working, you want to eliminate every possible cause, however negligible it may seem.

Having found a potential issue, you investigate further in that direction. You remember to make only one change at a time, knowing full well that if you make multiple changes simultaneously, you run the risk of not knowing which one actually worked.
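Because a case-only difference is easy to miss by eye, it can help to compare the domain names programmatically. The following Python sketch is only an illustration: the compare_vtp_domains helper is made up for this example, and the mistyped value CCNP-Tshoot is hypothetical, since the article does not show the exact misspelling.

```python
def compare_vtp_domains(expected, actual):
    """Classify how two VTP domain names relate, treating case as significant."""
    if expected == actual:
        return "match"
    # Same letters but different case: easy to overlook in 'show vtp status'.
    if expected.casefold() == actual.casefold():
        return "case mismatch"
    return "different domain"


# "CCNP-Tshoot" is a hypothetical mistyping used only for illustration.
print(compare_vtp_domains("CCNP-TSHOOT", "CCNP-TSHOOT"))
print(compare_vtp_domains("CCNP-TSHOOT", "CCNP-Tshoot"))
print(compare_vtp_domains("CCNP-TSHOOT", "LAB"))
```

Reporting the case-only mismatch separately is a deliberate choice: it points straight at a typing error rather than at a switch that was placed in the wrong domain.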
The next thing you do is issue the configure terminal command, followed by vtp domain CCNP-TSHOOT.

You then go back to Router 1-3 and ping both addresses: the one that was successful earlier and the one that wasn’t. Now you find them both reachable.

You issue configure terminal here and then execute logging on (just in case logging got turned off), followed by show ip route.

Next, you run show ip eigrp neighbors. Surprisingly, you still don’t see any neighbors even though connectivity is back up.

So you follow that with show running-config to see if something’s out of order.
After scrolling through the results, you notice one particular interface with an error: MD5 authentication for EIGRP has been put in place.

To take that out, you execute the following on that interface:

no ip authentication mode eigrp 100 md5

After that, things start coming back up.

You try show ip eigrp neighbors one more time. This time, you’re shown the three neighbors you were expecting.
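Scrolling through a long running configuration is error-prone, so the search for the offending line can be sketched in Python. This is a minimal illustration that assumes the configuration has been saved as text. The sample configuration, interface names, and addresses are hypothetical (the article does not say which interface was affected), and interfaces_with_eigrp_auth is a made-up helper name.

```python
# Hypothetical running-config excerpt; the interfaces and addresses are
# invented for illustration and are not taken from the lab.
SAMPLE_CONFIG = """\
interface FastEthernet0/0
 ip address 10.1.1.3 255.255.255.0
!
interface FastEthernet0/1
 ip address 10.1.11.3 255.255.255.0
 ip authentication mode eigrp 100 md5
!
router eigrp 100
 network 10.0.0.0
"""


def interfaces_with_eigrp_auth(config_text):
    """Map each interface to its 'ip authentication mode eigrp' lines, if any."""
    found = {}
    current = None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            # A new interface block begins.
            current = line.split(None, 1)[1].strip()
        elif not line.startswith(" "):
            # Any other unindented line ("!", "router eigrp 100") ends the block.
            current = None
        elif current and line.strip().startswith("ip authentication mode eigrp"):
            found.setdefault(current, []).append(line.strip())
    return found


print(interfaces_with_eigrp_auth(SAMPLE_CONFIG))
```

Running the same check against the neighbor’s configuration would show whether authentication is configured on only one side of the link, which is the kind of mismatch that keeps EIGRP neighbors from forming.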
You try pinging the Internet. It’s now back up as well.

At this point, you do a little analysis and put together the information you’ve been able to gather so far.

  • The fault was identified on device SW1-2.
  • The fault was Layer 2 (Data Link Layer) in nature, specifically VLAN Trunking Protocol.
  • More specifically, the fault was due to a mistyped VTP domain name (a human error).
  • It was resolved by executing the vtp domain CCNP-TSHOOT command, with CCNP-TSHOOT all in capital letters.

Since the problem has been resolved, you go back to the trouble ticket sent by the requestor, change the status to Resolved, and add the necessary notes.
When you go back to the Home tab, you see that the number of Requests Overdue is now down to two.
Note: Your day has just started and you still have two more trouble tickets to resolve. We will go over those in Part 2 of this post.