Transcend NCS Network Troubleshooting Guide
Transcript

  1. 3Com Transcend® Network Control Services, Version 5.0 for UNIX®. Network Troubleshooting Guide. Network Management. Part No. 09-1500-000. Businesses run on networks and networks run with management.
  2. 3Com Corporation, 5400 Bayfront Plaza, Santa Clara, California 95052-8145
     Copyright © 1999, 3Com Corporation. All rights reserved. No part of this documentation may be reproduced in any form or by any means or used to make any derivative work (such as translation, transformation, or adaptation) without written permission from 3Com Corporation.
     3Com Corporation reserves the right to revise this documentation and to make changes in content from time to time without obligation on the part of 3Com Corporation to provide notification of such revision or change.
     3Com Corporation provides this documentation without warranty, term, or condition of any kind, either implied or expressed, including, but not limited to, the implied warranties, terms or conditions of merchantability, satisfactory quality, and fitness for a particular purpose. 3Com may make improvements or changes in the product(s) and/or the program(s) described in this documentation at any time.
     If there is any software on removable media described in this documentation, it is furnished under a license agreement included with the product as a separate document, in the hard copy documentation, or on the removable media in a directory file named LICENSE.TXT or !LICENSE.TXT. If you are unable to locate a copy, please contact 3Com and a copy will be provided to you.
     UNITED STATES GOVERNMENT LEGEND
     If you are a United States government agency, then this documentation and the software described herein are provided to you subject to the following: All technical data and computer software are commercial in nature and developed solely at private expense. Software is delivered as “Commercial Computer Software” as defined in DFARS 252.227-7014 (June 1995) or as a “commercial item” as defined in FAR 2.101(a) and as such is provided with only such rights as are provided in 3Com’s standard commercial license for the Software.
Technical data is provided with limited rights only as provided in DFAR 252.227-7015 (Nov 1995) or FAR 52.227-14 (June 1987), whichever is applicable. You agree not to remove or deface any portion of any legend provided on any licensed program or documentation contained in, or delivered to you in conjunction with, this User Guide. Portions of this documentation are reproduced in whole or in part with permission from (as appropriate). Unless otherwise indicated, 3Com registered trademarks are registered in the United States and may or may not be registered in other countries. 3Com, the 3Com logo, Boundary Routing, EtherDisk, EtherLink, EtherLink II, LinkBuilder, Net Age, NETBuilder, NETBuilder II, OfficeConnect, Parallel Tasking, SmartAgent, SuperStack, TokenDisk, TokenLink, LinkSwitch® 1000, LinkSwitch® 3000,Transcend, and ViewBuilder are registered trademarks of 3Com Corporation. ATMLink, AutoLink, CoreBuilder, DynamicAccess, FDDILink, NetProbe, and PACE are trademarks of 3Com Corporation. 3ComFacts is a service mark of 3Com Corporation. Artisoft and LANtastic are registered trademarks of Artisoft, Inc. Banyan and VINES are registered trademarks of Banyan Systems Incorporated. CompuServe is a registered trademark of CompuServe, Inc. DEC and PATHWORKS are registered trademarks of Digital Equipment Corporation. Intel and Pentium are registered trademarks of Intel Corporation. AIX, AT, IBM, NetView, and OS/2 are registered trademarks and Warp is a trademark of International Business Machines Corporation. Microsoft, MS-DOS, Windows, and Windows NT are registered trademarks of Microsoft Corporation. Novell and NetWare are registered trademarks of Novell, Inc. PictureTel is a registered trademark of PictureTel Corporation. UNIX is a registered trademark of X/Open Company, Ltd. in the United States and other countries. All other company and product names may be trademarks of the respective companies with which they are associated. 
Guide written by Patricia Johnson, Chris Flisher, Sarah Newman, and Adam Bell. Edited by Ben Mann Jr. Technical information provided by Dan Bailey, Bob McTague, Graeme Robertson, and Andrew Ward.
  3. CONTENTS
     ABOUT THIS GUIDE: Finding Specific Information in This Guide 13; What to Expect from This Guide 14; Conventions 14; 3Com Device Name Changes 16; Related Documentation 16; 3Com Publications 16; User Guides 16; Help Systems 18; 3Com World Wide Web (WWW) 18; Year 2000 Compliance 19
     PART I BEFORE TROUBLESHOOTING
     1 NETWORK TROUBLESHOOTING OVERVIEW: Introduction to Network Troubleshooting 23; About Connectivity Problems 23; About Performance Problems 24; Solving Connectivity and Performance Problems 24; Network Troubleshooting Framework 25; Troubleshooting Strategy 26; Recognizing Symptoms 27; User Comments 27; Network Management Software Alerts 28; Analyzing Symptoms 28; Understanding the Problem 29; Identifying and Testing the Cause of the Problem 29; Sample Problem Analysis 30; Equipment for Testing 31; Solving the Problem 32
  4. 2 YOUR NETWORK TROUBLESHOOTING TOOLBOX: Transcend Applications 33; Transcend Central 34; Status Watch 34; Web Reporter 34; Address Tracker 34; LANsentry Manager 35; Traffix Manager 35; Device View 36; Network Management Platforms 36; 3Com SmartAgent Embedded Software 37; Other Commonly Used Tools 39; Ping 39; Strategies for Using Ping 39; Tips on Interpreting Ping Messages 40; Telnet 41; FTP and TFTP 41; Analyzers 41; Probes 42; Cable Testers 42
     3 STEPS TO ACTIVELY MANAGING YOUR NETWORK: Designing Your Network for Troubleshooting 43; Positioning Your SNMP Management Station 44; Using Probes 45; Monitoring Business-critical Networks 47; FDDI Backbone Monitoring 48; Internet WAN Link Monitoring 48; Switch Management Monitoring 48; Using Telnet, Serial Line, and Modem Connections 49; Using Communications Servers 50; Setting Up Redundant Management 51; Other Tips on Network Design 52; Management Station Configuration 52; More Tips 52; Preparing Devices for Management 52; Configuring Management Parameters 53
  5. Configuring Traps 53; Configuring Transcend NCS 53; Monitoring Devices 53; Setting Thresholds and Alarms 54; Setting Thresholds in Status Watch 54; Setting Thresholds and Alarms in LANsentry Manager 55; Refining Alarm Settings 55; Setting Alarms Based on a Baseline 56; Other Tips for Setting Thresholds and Alarms 57; Knowing Your Network 57; Knowing Your Network’s Configuration 57; Site Network Map 58; Logical Connections 60; Device Configuration Information 60; Other Important Data About Your Network 61; Identifying Your Network’s Normal Behavior 62; Baselining Your Network 62; Identifying Background Noise 63
     PART II NETWORK CONNECTIVITY PROBLEMS AND SOLUTIONS
     4 MANAGER-TO-AGENT COMMUNICATION: Manager-to-Agent Communication Overview 67; Understanding the Problem 67; Identifying the Problem 67; Solving the Problem 68; Verifying Management Configurations 68; Manager-to-Agent Communication Reference 69; IP Address 69; Gateway Address 69; Subnet Mask 69; SNMP Community Strings 69; SNMP Traps 71
     5 FDDI CONNECTIVITY: FDDI Connectivity Overview 73
  6. Understanding the Problem 73; Identifying the Problem 75; Solving the Problem 76; Monitoring FDDI Connections 76; Status Watch 76; Making Your FDDI Connections More Resilient 77; Implementing Dual Homing 77; Installing an Optical Bypass Unit 78; FDDI Connectivity Reference 79; Peer Wrap Condition 79; Twisted Ring Condition 79; Undesired Connection Attempt Event 80
     6 TOKEN RING CONNECTIVITY AND ERRORS: Token Ring Overview 81; Using Transcend Applications to Identify Problems and Symptoms 82; Using Token Ring Statistics Tool 82; Using LANsentry Manager 84; Using the Ring Station View 85; Using TR Network Analyzer Tool 86; Network Graphs 87; Active Station and Error Statistics List 87; Token Ring Status Tool 88; Token Ring Utilization Tool 88; Identifying and Solving Ring Errors 89; Troubleshooting Notes 90
     7 ATM AND LANE CONNECTIVITY: ATM and LANE Connectivity Overview 93; Color Status and Propagation 94; Device Level Troubleshooting 95; LANE Level Troubleshooting 95; ATM Network Level Troubleshooting 97; Virtual LANs Level Troubleshooting 97; Identifying VLAN Splits 98; Indications in the VLAN Map 98; Indications in the Backbone and Services Window 98
  7. Path Assistants for Identifying Connectivity and Performance Problems 99; LE Path Assistant 99; ATM Path Assistant 99; Tracing a VC Path Between Two ATM End Nodes 100; Examining Virtual Channels Across Layer 2 Topologies 100; Tracing the LAN Emulation Control VCCs Between Two LANE Clients 100
     PART III NETWORK PERFORMANCE PROBLEMS AND SOLUTIONS
     8 BANDWIDTH UTILIZATION: Bandwidth Utilization Overview 103; Understanding the Problem 103; Identifying the Problem 103; Solving the Problem 104; Identifying Utilization Problems 104; Status Watch 104; Generating Historical Utilization Reports 106; Web Reporter 106; Bandwidth Utilization Reference 106; ATM Utilization 106; Ethernet Utilization 107; FDDI Utilization 108; Token Ring Utilization 108
     9 BROADCAST STORMS: Broadcast Storms Overview 109; Understanding the Problem 109; Identifying the Problem 109; Solving the Problem 110; Identifying a Broadcast Storm 110; Status Watch 110; Traffix Manager 111; Disabling the Offending Interface 113; Address Tracker 113; Correcting Spanning Tree Misconfigurations 113
  8. Device View 113; Broadcast Storms Reference 114; Broadcast Packets 114; Multicast Packets 114
     10 DUPLICATE ADDRESSES: Duplicate Addresses Overview 115; Understanding the Problem 115; Identifying the Problem 115; Solving the Problem 115; Finding Duplicate MAC Addresses 116; Status Watch 116; Finding Duplicate IP Addresses 116; Address Tracker 116; LANsentry Manager 117; Duplicate Addresses Reference 117; Duplicate MAC Addresses 117; Duplicate IP Addresses 118
     11 ETHERNET PACKET LOSS: Ethernet Packet Loss Overview 119; Understanding the Problem 119; Identifying the Problem 120; Solving the Problem 120; Searching for Packet Loss 120; Status Watch 121; LANsentry Manager Network Statistics Graph 122; Device View 125; Ethernet Packet Loss Reference 127; Alignment Errors 127; Collisions 127; CRC Errors 127; Excessive Collisions 128; FCS Errors 128; Late Collisions 128; Nonstandard Ethernet Problems 129; Receive Discards 129
  9. Too Long Errors 129; Too Short Errors 130; Transmit Discards 130
     12 FDDI RING ERRORS: FDDI Ring Errors Overview 131; Understanding the Problem 131; Identifying the Problem 131; Solving the Problem 132; Identifying Ring Errors 132; Status Watch 132; FDDI Ring Errors Reference 133; Elasticity Buffer Error Condition 133; Frame Error Condition 133; Frames Not Copied Condition 133; Link Error Condition 134; MAC Neighbor Change Event 134
     13 NETWORK FILE SERVER TIMEOUTS: Network File Server Timeout Overview 135; Understanding the Problem 135; Identifying the Problem 135; Solving the Problem 136; Looking for Obvious Errors 136; Ping and Telnet 136; LANsentry Manager Alarms View 136; LANsentry Manager Statistics View 137; LANsentry Manager History View 137; Reproducing the Fault While Monitoring the Network 138; LANsentry Manager Top-N Graph 138; LANsentry Manager Packet Capture 138; LANsentry Manager Packet Decode 139; Address Tracker 139; LANsentry Manager Packet Decode 140; Correcting the Fault 140; Network File Server Timeouts Reference 141; Jabbering 141
  10. Network File System (NFS) Protocol 141
      14 MEASURING ATM NETWORK PERFORMANCE: Measuring Traffic Performance 143; Utilization Map 143; Displaying Link Traffic 144; Displaying Node Configuration 144; Configuring the Utilization Tool 144; Map Configuration 144; Polling Configuration 145; Communication Configuration 145; Measuring Device Level Performance 145; Using the History Graph 145; Displaying Statistics 146; Measuring Port Level Performance 146; Traffic 146; Utilization 146; Total Frames 147; Good Frames 147; Errored Frames 147; LANE Component Statistics 148; LES Statistics 148; LEC Statistics 148; LANE User 149
      PART IV REFERENCE
      15 SNMP IN NETWORK TROUBLESHOOTING: SNMP Operation 153; Manager/Agent Operation 153; SNMP Messages 154; Trap Reporting 154; Security 155; SNMP MIBs 155; MIB Tree 155; MIB-II 157
  11. RMON MIB 158; RMON2 MIB 159; 3Com Enterprise MIBs 160
      16 INFORMATION RESOURCES: Books 161; URLs 162
      INDEX
  13. ABOUT THIS GUIDE
      This guide helps you to troubleshoot connectivity and performance problems on your network using Transcend® Network Management Software and other tools.
      This guide is intended for network administrators who understand networking technologies and how to integrate networking devices. You should have a working knowledge of:
        • Transmission Control Protocol/Internet Protocol (TCP/IP)
        • Simple Network Management Protocol (SNMP)
        • Network management platforms
        • 3Com devices on your network
      You should also be familiar with the interface and features of the Transcend Network Management Software that you have installed.
      With subsequent releases of Transcend management software, this guide will be updated with new troubleshooting information and additional Transcend troubleshooting tools. The most current version of this guide is on the 3Com Web site under Support: http://www.3com.com
      Finding Specific Information in This Guide
      This guide, which is available online in Portable Document Format (PDF) and HyperText Markup Language (HTML) formats and on paper, is designed to be used online. In the online version, cross-references to other sections appear as links in blue, underlined text, which you can click. You can print any pages as needed.
  14. Table 1 provides guidelines for navigating through this document.
      Table 1 Guidelines for Finding Specific Information in This Guide
      If you are looking for | See
      An introduction to network troubleshooting, information about troubleshooting tools, and guidelines for getting ready for management | Part I: “Before Troubleshooting”. Note: This part is recommended reading for users who are new to network management.
      Specific troubleshooting scenarios to help you solve real network problems | Part II: “Network Connectivity Problems and Solutions”; Part III: “Network Performance Problems and Solutions”
      Useful background information to help you with troubleshooting tasks | Part IV: “Reference”
      What to Expect from This Guide
      This guide demonstrates how to troubleshoot problems on your network with the help of Transcend and other tools. It also shows you how to use Transcend to move beyond day-to-day troubleshooting to proactive network management.
      This guide is not intended to help you identify and correct problems with installation and use of Transcend software. For that type of troubleshooting, see:
        • The Transcend Network Control Services Installation Guide (for help with installation and startup problems)
        • The Help or user guide for a specific application (for information about troubleshooting application problems)
      This guide focuses on technologies to troubleshoot your network and demonstrates how these technologies are applied using Transcend management software.
      Conventions
      Table 2 and Table 3 list conventions that are used throughout this guide.
      Table 2 Notice Icons
      Notice Type | Description
      Information note | Information that describes important features or instructions
  15. Table 2 Notice Icons (continued)
      Notice Type | Description
      Caution | Information that alerts you to potential loss of data or potential damage to an application, system, or device
      Warning | Information that alerts you to potential personal injury
      Table 3 Text Conventions
      Convention | Description
      Screen displays | This typeface represents information as it appears on the screen.
      Syntax | The word “syntax” means that you must evaluate the syntax provided and then supply the appropriate values for the placeholders that appear in angle brackets. Example: To enable RIPIP, use the following syntax: SETDefault !<port> -RIPIP CONTrol = Listen. In this example, you must supply a port number for <port>.
      Commands | The word “command” means that you must enter the command exactly as shown and then press Return or Enter. Commands appear in bold. Example: To remove the IP address, enter the following command: SETDefault !0 -IP NETaddr = 0.0.0.0
      The words “enter” and “type” | When you see the word “enter” in this guide, you must type something, and then press Return or Enter. Do not press Return or Enter when an instruction simply says “type.”
      Keyboard key names | If you must press two or more keys simultaneously, the key names are linked with a plus sign (+). Example: Press Ctrl+Alt+Del
      Words in italics | Italics are used to emphasize a point, to denote a new term at the place where it is defined in the text, and to identify menu names, menu commands, and software button names. Examples: From the Help menu, select Contents. Click OK.
  16. 3Com Device Name Changes
      The CoreBuilder™ family includes a number of 3Com devices that previously belonged to different 3Com brands. These devices are known by their new CoreBuilder names in the Transcend® NCS software. See Table 4.
      Table 4 3Com Device Name Changes
      Previous name | New name
      Cellplex® 7000 | CoreBuilder™ 7000
      LANplex 2500 | CoreBuilder 2500
      LANplex 6000 | CoreBuilder 6000
      ONcore hubs | CoreBuilder 5000 hubs
      ONcore Controller and Management modules | CoreBuilder 5000 Controller and Management modules
      ONcore FastModule | CoreBuilder 5000 FastModule
      ONcore SwitchModule | CoreBuilder 5000 SwitchModule
      Related Documentation
      The following documents provide background and related information about local-area networking and internetworking, SNMP-based network management, and 3Com enterprise computing technology. Most user guides and release notes are available in Adobe Acrobat Reader Portable Document Format (PDF) or HTML on the 3Com World Wide Web site: http://www.3com.com/
      3Com Publications
      This guide is complemented by other 3Com documents, Help systems, and World Wide Web (WWW) documents.
      User Guides
      The following documents are shipped with your Transcend NCS software as printed books:
        • Transcend Network Control Services Introduction to Transcend Network Management, Version 5.0 for UNIX
        • Transcend Network Control Services Installation Guide, Version 5.0 for UNIX
        • Transcend Network Control Services Network Administration Guide, Version 5.0 for UNIX
  17.   • Transcend Management Software Network Troubleshooting Guide, Version 5.0 for UNIX
        • Transcend Network Control Services Release Notes, Version 5.0 for UNIX
        • Transcend Network Control Services on the Web Quick Tour, Version 5.0 for UNIX
      The following documents are shipped with your Transcend NCS software on the CD-ROM entitled Transcend Network Control Services Online Documentation Set:
        • Inventory Management
            • Transcend Network Control Services Transcend Central User Guide, Version 5.0 for UNIX
        • Configuration Management
            • Transcend Network Control Services Network Admin Tools User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services Device View User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services NETBuilder Management Application Suite User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services Token Ring Manager User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services Enterprise VLAN Manager User Guide, Version 5.0 for UNIX
            • Transcend Network Control Service PathBuilder Switch Manager User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services Total Control Manager/SNMP User Guide
        • Monitoring and Reporting
            • Transcend Network Control Services Status Watch User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services LANsentry Manager User Guide, Version 5.0 for UNIX
            • Transcend Network Control Services LANsentry Reporter User Guide, Version 5.0 for UNIX
  18. Help Systems
      Each Transcend NCS application contains a Help system that describes how to use all the features of the application. Help includes window descriptions, step-by-step instructions, conceptual information, and troubleshooting tips for that application.
      You can access Help from:
        • The Help menu in any application by selecting Help Topics (in the Help Topics window, you can view the Contents and Index)
        • A Help button in windows and dialog boxes
        • Your 3Com/Transcnd/Help directory (or the directory that you have set for your Transcend software installation)
      3Com World Wide Web (WWW)
      The following 3Com Web resources provide additional information about Transcend Network Control Services:
        • 3Com Network Management Solution Center: Contains a range of information about 3Com’s network management solutions including Transcend Network Control Services, Total Control™ Manager, Transcend Traffix™ Manager, Transcend dRMON Edge Monitor, InfoVista, and Transcend Enterprise Monitor hardware probes for Ethernet and Token Ring networks. http://www.3com.com/products/trans_net_man.html
        • 3Com Support: Provides access to technical support and includes data sheets, support tips, Frequently Asked Questions (FAQ) documents, user guides, release notes, and software downloads. http://infodeli.3com.com/
        • Document Center: Contains useful links to news, technical briefs, case studies, solutions guides, and product data sheets. http://www.3com.com/util/dcenter.html
        • Technology Center: Contains up-to-the-minute white papers, strategic overviews, and in-depth tutorials about networking technologies and innovations. http://www.3com.com/technology/index.html
        • Networking Glossary: Explains networking terms and acronyms. http://www.3com.com/nsc/glossary/main.htm
  19. Year 2000 Compliance
      For information on Year 2000 compliance and 3Com products, visit the 3Com Year 2000 Web page: http://www.3com.com/products/yr2000.html
  21. PART I BEFORE TROUBLESHOOTING
      Chapter 1 Network Troubleshooting Overview
      Chapter 2 Your Network Troubleshooting Toolbox
      Chapter 3 Steps to Actively Managing Your Network
  22. 1 NETWORK TROUBLESHOOTING OVERVIEW
      These sections introduce you to the concepts and practice of network troubleshooting:
        • Introduction to Network Troubleshooting
        • Network Troubleshooting Framework
        • Troubleshooting Strategy
      Introduction to Network Troubleshooting
      Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network’s performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.
      About Connectivity Problems
      Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:
        • Loss of connectivity: When users cannot access areas of your network, your organization’s effectiveness is impaired. Immediately correct any connectivity breaks.
        • Intermittent connectivity: Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.
        • Timeout problems: Timeouts cause loss of connectivity, but are often associated with poor network performance.
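The three kinds of connectivity problem listed above can be told apart from a series of reachability probes, such as repeated pings to one end station. The following Python fragment is a minimal sketch under stated assumptions, not part of Transcend: the function name, the category labels, and the 500 ms "slow" limit are all hypothetical, and a round-trip time of None stands for a probe that received no reply.

```python
from typing import Optional, Sequence


def classify_connectivity(rtts_ms: Sequence[Optional[float]],
                          slow_ms: float = 500.0) -> str:
    """Classify probe results; None means the probe got no reply.

    slow_ms is an illustrative limit, not a value from this guide.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    if not answered:
        # No probe was answered at all.
        return "loss of connectivity"
    if len(answered) < len(rtts_ms):
        # Some probes answered, some did not.
        return "intermittent connectivity"
    if max(answered) > slow_ms:
        # Every probe answered, but at least one was slow enough
        # that applications may start to time out.
        return "slow responses (timeout risk)"
    return "normal"
```

For example, classify_connectivity([12.0, None, 15.0]) reports intermittent connectivity, which the text above treats as a reason to investigate immediately.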
  23. About Performance Problems
      Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.
      If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users’ productivity.
      Solving Connectivity and Performance Problems
      When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as “Ping”, and network devices, such as “Analyzers”, to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.
      Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With “Transcend Applications”, you can:
        • Baseline your network’s normal status to use as a basis for comparison when the network operates abnormally
        • Precisely monitor network events
        • Be notified immediately of critical problems on your network, such as a device losing connectivity
        • Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
        • Resolve problems by disabling ports or reconfiguring devices
      See “Your Network Troubleshooting Toolbox” for details about each troubleshooting tool.
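The ideas of baselining and alert thresholds mentioned above can be illustrated outside any particular tool. The Python fragment below is a rough sketch, assuming that "normal" is summarized by the mean and standard deviation of past samples (for example, utilization percentages) and that an alert fires when a new reading exceeds the mean by k standard deviations. The function names and the default k = 3 are hypothetical and do not describe how Transcend computes its thresholds.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def build_baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Summarize normal behavior as (mean, sample standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return mean(samples), stdev(samples)


def alert_threshold(samples: Sequence[float], k: float = 3.0) -> float:
    """Return mean + k * standard deviation (k is an illustrative choice)."""
    average, spread = build_baseline(samples)
    return average + k * spread


def exceeds_threshold(value: float, threshold: float) -> bool:
    """True when a new reading is above the alert threshold."""
    return value > threshold
```

With past readings of 10, 12, 11, 13, and 9, the baseline mean is 11 and the threshold is a little under 16, so a reading of 20 would raise an alert while a reading of 12 would not. A smaller k warns earlier at the cost of more false alarms.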
  24. Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the foundation of all network communications. This seven-layer structure provides a clear picture of how network communications work. Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS errors and alignment errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer.

Figure 1 shows how to troubleshoot the layers of the OSI model. Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.

Table 5 Network Data and the OSI Model Layers

Layer                        Data Collected                          Transcend NCS Tool Used
Application, Presentation,   Protocol information and other Remote   LANsentry Manager
Session, Transport           Monitoring (RMON) and RMON2 data        Traffix Manager™ (for more detail)
Network                      Routing information                     Status Watch
                                                                     LANsentry Manager (for more detail)
                                                                     Traffix Manager (for more detail)
Data Link                    Traffic counts and other packet         Status Watch
                             breakdowns                              LANsentry Manager (for more detail)
Physical                     Error counts                            Status Watch
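As a rough illustration of the reasoning above, the following sketch encodes Table 5 as a lookup and applies the FCS and alignment error example to an unreliable-delivery symptom. The counter names, the error-rate representation, and the 1 percent cutoff are hypothetical; they are not Status Watch field names or recommended thresholds.

```python
# Table 5 as a lookup: OSI layer -> Transcend NCS tools that collect data at that layer.
UPPER_LAYER_TOOLS = ["LANsentry Manager", "Traffix Manager"]
TOOLS_BY_LAYER = {
    "Application": UPPER_LAYER_TOOLS,
    "Presentation": UPPER_LAYER_TOOLS,
    "Session": UPPER_LAYER_TOOLS,
    "Transport": UPPER_LAYER_TOOLS,
    "Network": ["Status Watch", "LANsentry Manager", "Traffix Manager"],
    "Data Link": ["Status Watch", "LANsentry Manager"],
    "Physical": ["Status Watch"],
}

# Hypothetical counter names for the two physical-layer error types named in the text.
PHYSICAL_ERROR_COUNTERS = ("fcs_errors", "alignment_errors")


def suspect_layer(error_rates, high_rate=0.01):
    """For unreliable packet delivery, pick the layer to examine first.

    error_rates maps a counter name to the fraction of frames affected.
    high_rate is an assumed cutoff, not a vendor recommendation.
    """
    for name in PHYSICAL_ERROR_COUNTERS:
        if error_rates.get(name, 0.0) > high_rate:
            return "Physical"
    # No sign of media trouble: look at routing configuration next.
    return "Network"


layer = suspect_layer({"fcs_errors": 0.05, "alignment_errors": 0.0})
print(layer, TOOLS_BY_LAYER[layer])
```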
  25. Figure 1 OSI Reference Model and Network Troubleshooting

[Figure: the seven OSI layers, from Application (Layer 7) down to Physical (Layer 1), with example protocols (SNMP manager, agent, and proxy agent; Telnet, rlogin, and FTP; TCP and UDP; IP and IPX; LLC and MAC; PHY and PMD for Ethernet, Token Ring, and FDDI) shown alongside the troubleshooting tools that apply: SNMP console managers, analyzers, probes, Traffix™ Manager, LANsentry® Manager, Status Watch, and cable testing tools.]

For information about network troubleshooting tools, see “Your Network Troubleshooting Toolbox”.

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site’s network configuration and on your network’s normal behavior. See “Knowing Your Network” for more information.
  26. If you notice changes on your network, ask the following questions:
- Is the change expected or unusual?
- Has this event ever occurred before?
- Does the change involve a device or network path for which you already have a backup solution in place?
- Does the change interfere with vital network operations?
- Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:
- Recognizing Symptoms
- Understanding the Problem
- Identifying and Testing the Cause of the Problem
- Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:
- They cannot print.
- They cannot access the application server.
  27. - It takes them much longer to copy files across the network than it usually does.
- They cannot log on to a remote server.
- When they send e-mail to another site, they get a routing error message.
- Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in “Your Network Troubleshooting Toolbox”, can alert you to areas of your network that need attention. For example:
- The application displays red (Warning) icons.
- Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
- You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:
- To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
- On what subnetwork is the user located?
- Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
- Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
- Are many users reporting network logon failures?
- Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.
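The Top-N utilization report mentioned under “Network Management Software Alerts” boils down to ranking ports by utilization. The sketch below shows that ranking only; the port names and figures are invented, and the reporting tools in Transcend NCS gather and present this data in their own way.

```python
import heapq


def top_n_ports(utilization, n=10):
    """Return the n (port, utilization) pairs with the highest utilization, highest first."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])


# Hypothetical weekly average utilization per port, in percent.
weekly_utilization = {
    "hub1/port3": 12.0,
    "hub1/port7": 71.5,
    "switch2/port1": 34.2,
    "switch2/port4": 8.9,
}

for port, percent in top_n_ports(weekly_utilization, n=3):
    print(f"{port}: {percent:.1f}%")
```

Comparing each week’s ranking with the previous one is what reveals a port whose utilization is much higher than normal.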
  28. Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:
- The physical connection breaks (that is, a cable is unplugged or broken).
- A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:
- Valid service: Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
- Restricted access: Is an end station supposed to be able to connect with a specific device or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
- Correct configuration: Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See “Manager-to-Agent Communication” for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:
- If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
- If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.
  29. For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See “Configuring Transcend NCS” for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in “Solving the Problem” or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident. On your network, a user cannot access the mail server. You need to establish two areas of information:
- What you know: In this case, the user’s workstation cannot communicate with the mail server.
- What you do not know and need to test:
  - Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  - Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  - If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1 Can the workstation communicate with any other device on the subnetwork?
  - If no, then go to step 2.
  - If yes, determine if only the server is unreachable.
    - If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
  30.     - If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2 Can other workstations communicate with the server?
  - If no, then most likely it is a server problem. Go to step 3.
  - If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3 Can other workstations communicate with other network devices?
  - If no, then the problem is likely a network problem.
  - If yes, the problem is likely a server problem.

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:
- For a problem with the server: Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
- For a problem with the subnetwork: Examine any device on the path between the users and the server.
- For a problem with the workstation: Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:
- A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
- A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
- A single port probe to insert in the network if you are having a problem where you do not have management capability.
- Console cables for each type of connector, labeled and stored in a secure place.
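The three numbered steps under “Sample Problem Analysis” can be read as a small decision procedure. The sketch below follows the steps as written and returns the area to examine next. The function and parameter names are hypothetical, and each argument stands for the result of a connectivity test that you run yourself (for example, with Ping).

```python
def locate_problem(reaches_subnet, only_server_unreachable,
                   others_reach_server, others_reach_network):
    """Follow the three analysis steps and return 'server', 'network', or 'workstation'.

    reaches_subnet: the workstation can reach some other device on its subnetwork.
    only_server_unreachable: the server is the only device the workstation cannot reach
        (only meaningful when reaches_subnet is True).
    others_reach_server: other workstations can reach the server.
    others_reach_network: other workstations can reach other network devices.
    """
    def step_3():
        # Step 3: can other workstations communicate with other network devices?
        return "server" if others_reach_network else "network"

    # Step 1: can the workstation communicate with any other device on the subnetwork?
    if reaches_subnet and not only_server_unreachable:
        # Other devices are unreachable too: suspect the network and confirm with step 3.
        return step_3()

    # Step 2: can other workstations communicate with the server?
    if others_reach_server:
        return "workstation"
    return step_3()


# The user cannot reach the mail server, and neither can anyone else,
# although other workstations still reach the rest of the network.
print(locate_problem(True, True, False, True))
```

The result only tells you where to look next; the follow-up checks for each area are the ones listed after the numbered steps.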
  31. Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:
- Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
- Balancing your network load by analyzing:
  - What users communicate with which servers
  - What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
- Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
- Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:
- Spare hardware equipment (such as modules and power supplies), especially for your critical devices
- A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Use the Transcend NCS application suite Network Admin Tools to save and reload your software configurations to devices.
  32. CHAPTER 2: YOUR NETWORK TROUBLESHOOTING TOOLBOX

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:
- Transcend Applications
- Network Management Platforms
- 3Com SmartAgent Embedded Software
- Other Commonly Used Tools

Transcend Applications

Transcend® management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry® Manager, can manage any vendor’s networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:
- Transcend Central
- Status Watch
- Address Tracker
- LANsentry Manager
- Traffix Manager
- Device View

This guide primarily focuses on using these applications to troubleshoot your network.
  33. Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:
- Display an inventory of device, module, and port information.
- Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
- Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  - Status Watch, which includes Web Reporter (from the Java version)
  - Address Tracker
  - LANsentry Manager
  - Traffix Manager
  - Device View

Status Watch

Status Watch is a performance-monitoring application for 3Com devices and their attached networks. It primarily polls for MIB-II data and allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter. See the Status Watch Help to learn which 3Com devices are supported.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:
- Polls managed devices for all MAC addresses
  34. - Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
- Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:
- Monitor current performance of network segments
- See trends over time
- Spot signs of current problems
- Configure alarms to monitor for specific events
- Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix™ Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network.

Traffix Manager:
- Monitors all the stations that the RMON2-compliant probes encounter on your network
- Captures and stores RMON and RMON2 data for your network’s protocols and applications
- Displays traffic between stations in user-defined views of the network
- Graphs current or historical data on the devices selected
- Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
- Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations
  35. You can use Traffix Manager to:
- Know your network: Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
- Optimize your network: Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device’s configuration. You can also use Device View to look at a device’s statistics and to set alarms.

Device View manages only 3Com devices. See the Device View Help for which 3Com devices are supported by Device View.

You can also use Transcend Upgrade Manager, which is one of the Network Admin Tools applications, to perform bulk software upgrades on devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users’ work.

The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments. The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.
  36. Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors. You can monitor network performance and diagnose network performance and connectivity problems.

You can also:
- Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
- Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
- Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
- Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
- Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds.
Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.
  37. For more information about traditional SNMP management, see “SNMP Operation”.

As a useful companion to traditional network management methods, 3Com’s SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception; that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station.

Devices that contain SmartAgent software may be able to:
- Perform broadcast throttling to minimize the flow of broadcast traffic on your network
- Monitor the ratio of good frames to bad frames
- Switch a resilient link pair to the standby path if the primary path corrupts frames
- Report if traffic on vital segments drops below minimum usage levels
- Disable a port for five seconds to clear problems, and then automatically reconnect it

To configure these advanced SmartAgent software features, see your device documentation.

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.
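The difference between polling everything and management by exception can be shown with a toy example. This is not SmartAgent code; it is a minimal sketch, assuming an agent that samples a counter locally and reports only when a sample first rises above a threshold. The function name, the sample values, and the threshold are hypothetical.

```python
def exceptions_only(samples, threshold):
    """Yield (index, value) each time a sample rises above the threshold.

    Samples that stay above the threshold do not produce further reports
    until the value has dropped back to or below the threshold.
    """
    above = False
    for index, value in enumerate(samples):
        if value > threshold and not above:
            yield index, value
        above = value > threshold


# Hypothetical broadcast-packet rates sampled by an agent, one per interval.
rates = [120, 130, 900, 950, 140, 1100]

# A polling station would transfer all six samples; the agent reports only the crossings.
for index, value in exceptions_only(rates, threshold=500):
    print(f"interval {index}: rate {value} exceeded threshold")
```

Reporting only on a crossing, rather than on every sample above the threshold, keeps a sustained problem from flooding the management station with repeated notifications.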
  38. Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:
- Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
- Network monitoring devices, such as Analyzers and Probes.
- Tools, such as Cable Testers, for working on physical problems.

Many of the tools that are discussed in this section are only useful in TCP/IP networks.

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network, and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:
- A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
- Slower response times than normal can indicate that the path is congested or obstructed.
- A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See “Tips on Interpreting Ping Messages”.

Some network devices, like the CoreBuilder™ 5000, must be configured to be able to respond to Ping messages. If you are not receiving responses from a device, first make sure that it is set up to be a Ping responder.

Strategies for Using Ping

Follow these strategies for using Ping:
- Ping devices when your network is operating normally so that you have a performance baseline for comparison. See “Identifying Your Network’s Normal Behavior” for more information.
- Ping by IP address when:
  - You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  39.   - Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
- Ping by hostname when you want to identify DNS server problems.
- To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
- To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
- To determine a route taken to a destination, use the trace route function (tracert on Windows, traceroute on UNIX).
- Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
- Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>
Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable
Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway
Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
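The Ping script suggested under “Strategies for Using Ping” might look like the following minimal sketch. It assumes that the system ping command accepts -c for a packet count (as on Linux, macOS, and BSD) or -n on Windows; other UNIX variants may use different options. The function names and device addresses are hypothetical, and the notification step is left as a placeholder.

```python
import platform
import subprocess


def ping_command(host, count=1):
    """Build a ping command line for the local operating system."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, timeout_s=5):
    """Return True when the system ping exits successfully within timeout_s seconds."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or no ping program available on this system.
        return False
    return result.returncode == 0


def failed_hosts(hosts, probe=is_reachable):
    """Return the hosts for which the probe reports no response."""
    return [host for host in hosts if not probe(host)]


# Example use (hypothetical addresses; replace with your own key devices):
#   for host in failed_hosts(["192.0.2.1", "192.0.2.2"]):
#       print(f"Ping failed: {host}")  # hook a pager or e-mail notification here
```

To run the check periodically, call failed_hosts from a scheduler such as cron. The exit status of ping is a coarse signal, so treat a reported failure as a prompt to run Ping by hand and read the message it prints.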
  40. Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device’s console. For more information about setting up an out-of-band connection, see “Using Telnet, Serial Line, and Modem Connections”.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:
- Accepting many file formats, such as ASCII and binary
- Using data compression
- Providing Read and Write access so that you can display, create, and delete files and directories
- Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing.
Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.
  41. You usually use analyzers for reactive troubleshooting: when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.
  42. CHAPTER 3: STEPS TO ACTIVELY MANAGING YOUR NETWORK

These sections describe the steps that you can take to effectively troubleshoot your network when the need arises:
- Designing Your Network for Troubleshooting
- Preparing Devices for Management
- Configuring Transcend NCS
- Knowing Your Network

Designing Your Network for Troubleshooting

By designing your network for troubleshooting, you can access key devices on your network when your network is experiencing connectivity or performance problems. Having adequate management access depends on these design criteria:
- Position of the management station so that it can gather the greatest amount of network data through Simple Network Management Protocol (SNMP) polling
- Position of probes for distributed management of critical networks
- Ability to communicate with each device even when your management station cannot access the network

The following sections discuss how to design your network with the preceding criteria in mind:
- Positioning Your SNMP Management Station
- Using Probes
- Monitoring Business-critical Networks
- Using Telnet, Serial Line, and Modem Connections
- Using Communications Servers
- Setting Up Redundant Management
  43. - Other Tips on Network Design

Positioning Your SNMP Management Station

In a typical LAN, locate your management station directly off the backbone where it can conduct SNMP polling and manage network devices. The backbone is usually the optimum location for the management station because:
- The backbone is not subject to the failures of individual subnetworked routers or switches.
- In a partial network outage, the information collected by a backbone management station is probably more accurate than that from a station in a routed subnetwork.
- The backbone is usually protected with redundant power and technologies, like Fiber Distributed Data Interface (FDDI), that correct their own problems. This redundancy ensures that the backbone remains operational, even when other areas of the network are having problems.
- The backbone is typically faster and has a higher bandwidth than other areas of your network, making it a more efficient location for a management station.

Make sure that the capacity of your backbone can accommodate the SNMP traffic that the management applications generate. Figure 2 shows a management station that is set up at the network backbone and polling network devices.
  44. [Figure 2: SNMP Management at the Backbone. Labels: management workstation; NIC card or network device; backbone; x = network devices that you want to poll.]

Although SNMP management from the backbone is a good way to keep track of what is happening on your network, do not rely on it exclusively. Because SNMP management occurs in-band (that is, SNMP traffic shares network bandwidth with data traffic), network troubleshooting using SNMP can become a problem in these ways:
- Very heavy data traffic or a break in the network can make it difficult or impossible for the management station to poll a device.
- Traffic that SNMP polling adds to the network may contribute to networking problems.

Using Probes

To minimize the frequency of SNMP traffic on your network, set up one or more probes to collect Remote Monitoring (RMON) data from the network devices. In the distributed model illustrated in Figure 3, the management station uses SNMP polling to collect data from the probes rather than from all the network devices. Distributing the management over the network ensures some continued data collection even if you have network problems.

Many management applications support data from MIBs other than the RMON MIBs. For this reason, even if you are using RMON probes, some SNMP polling to individual devices from a key management station is always useful for a complete picture of your network.
  45. [Figure 3: Management at the Backbone with an Attached Probe. Labels: management workstation; probes; NIC card or network device; backbone; x = network devices that you want to poll.]

To extend your remote monitoring capabilities, use embedded RMON probes or roving analysis (monitoring one port for a period of time, moving on to another port for a while, and so on). However, with roving analysis, you cannot see a historical analysis of the ports because the probe is moving from one port to another.

Some probes, like 3Com’s Enterprise Monitor, are designed to support the large number of interfaces that are found in switched environments. The probe’s high port density supports this multi-segmented switched environment. You can also use the probe’s interfaces to monitor mirror (or copy) ports on the switch, which means that all data received and transmitted on a port is also sent to the probe.

Probes do not indicate which port has caused an error. Only a managed hub (a hub or switch with an onboard management module) can provide that level of detail. Probes and a hub’s own management module complement each other.
  46. Monitoring Business-critical Networks

On business-critical networks, you need to increase your level of management by dedicating probes to the essential areas of your network. For detailed network management, it is not enough to gather raw performance figures: you need to know, at the network and conversation level, what is generating the traffic and when it is being generated. For this type of analysis, use reporting tools, such as Traffix Manager, and low-level, fault diagnostic tools, such as LANsentry Manager®.

The three critical areas to monitor on this type of network are discussed in these sections and shown in Figure 4:
- FDDI Backbone Monitoring
- Internet WAN Link Monitoring
- Switch Management Monitoring

[Figure 4: Probes Monitoring a Business-critical Network. Labels: management workstation; direct connection to the management workstation; SuperStack® II Enterprise Monitor with FDDI module; SuperStack II Enterprise Monitor, inline monitoring on Fast Ethernet; FDDI backbone; WAN; x = network devices that you want to poll; possible probe attachment to a switch’s roving analysis port.]
  47. FDDI Backbone Monitoring

On the FDDI backbone, you need to continually monitor whether it is being overutilized, and, if so, by what type of traffic. By placing the SuperStack® II Enterprise Monitor with an FDDI media module directly at the backbone, you can gather utilization and host matrix information. Traffix Manager uses these data to provide regular segment utilization reports and Top-N host reports. In addition, the probe provides a full range of FDDI performance statistics that LANsentry Manager can record or that SNMP traps can report to the management station.

To ensure management access to the probe, provide a direct connection to the probe from your management station. This connection lets you access probe data even if the ring is unusable, and it keeps management traffic off the main ring.

Internet WAN Link Monitoring

The Internet link is a concern for dedicated network management because it:
- Represents an external cost to the company
- Requires budgeting
- Is a possible security problem

In a way that is similar to monitoring the FDDI backbone, Traffix Manager reports can indicate whether you are paying for too much bandwidth or whether you need to purchase more. Traffix Manager can also indicate the level of use on a workgroup basis for internal billing and highlight the top sites that users visit. Similarly, you can monitor for unexpected conversations and protocols.

You also need to know the error rates on this link and whether you are experiencing congestion because of circumstances on the Internet provider’s network. LANsentry Manager can record and display these statistics and provide a detailed real-time view.

Switch Management Monitoring

The third area of interest in this network is the large number of switch-to-end station links.
When detailed analysis of these devices is required (for example, if one of the ports on the network suddenly reports much higher traffic than normal), you need to track the source of the problem and decide whether you can optimize the traffic path. In this case, you need a way to view the traffic on the switch port at a conversation level.
  48. By placing a SuperStack II Enterprise Monitor in a central location, you can easily attach it to the switches that have the most Ethernet ports as the need arises. By using the roving analysis feature of many 3Com devices, you can copy data from a monitored port to the port on the switch that is connected to the SuperStack II. When a problem arises, roving analysis is activated for a particular switch and LANsentry Manager or Traffix Manager collects the data from the SuperStack II Enterprise Monitor. These applications can then monitor the network data for the devices that are connected to that switch.

Using Telnet, Serial Line, and Modem Connections

To minimize your dependency on SNMP management, set up a way to reach the console of your key networking devices. Through the console, you can often view Ethernet, FDDI, Asynchronous Transfer Mode (ATM), and token ring statistics, view routing and bridging tables, and determine and modify device configurations.

Out-of-band (that is, management using a dedicated line to a device) console connections are also key to network troubleshooting. If the network goes down, your console connections are still available. The types of console connections include:
- Telnet: Out-of-band and in-band access using a network connection. For example, on 3Com’s CoreBuilder™ 6000 switch, using Telnet you can access the management console by using a dedicated Ethernet connection to the management module (out-of-band) and from any network attached to the device (in-band).
- Serial line: Direct, out-of-band access using a terminal connection. This type of connection allows you to maintain your connections to a device if it reboots.
- Modem: Remote, out-of-band access using a modem connection.

Figure 5 shows management of a device through the serial line and modem ports.
  49. [Figure 5: Out-of-band Management Using the Serial and Modem Ports. Labels: management workstations; modems; wiring closet; network switch with modem port and serial line port; attached LAN.]

Sometimes, direct access to network devices through out-of-band management is the only way to examine a network problem. For example, if your network connections are down, you can Telnet to one of your key routers and examine its routing table. The routing table lists the devices that the router can reach, allowing you to narrow the area of the problem. You can also Ping from this device to further investigate which areas of the network are down.

Using Communications Servers

Although out-of-band management keeps you in contact with a particular device during a network problem, it does not inform you about all the areas of your network from a central point. You must access each device separately. To manage devices more centrally, you can set up a communications server (often called a comm server). See Figure 6.
  50. [Figure 6: Out-of-band Management with a Communications Server. Labels: management workstation; communications server (“comm” server); wiring closets; network switches with serial line ports; attached LAN.]

For optimal benefit, provide two management connections to the comm server:
- Connect the comm server to the network (an in-band connection) so that you can access the devices from anywhere on the network using reverse Telnet.
- Connect your management workstation directly to one of the serial ports of the comm server (an out-of-band connection) so that you can access the devices when the network is down.

Setting Up Redundant Management

To ensure that a management station can always access the backbone, set up a redundancy system of management. In this setup, management applications (often different ones) run on separate management workstations, which are connected to the backbone through separate network devices or by using a network card. This setup allows the management workstations to monitor each other and report any problems with their attached network devices. The redundancy system also provides a backup management connection to your network if one management station loses connectivity.
  51. Other Tips on Network Design

This section provides some additional tips for designing your network for troubleshooting.

Management Station Configuration
- Configure the management station so that it can run without any network connection, which includes not depending on NIS, NFS, or DNS lookups. Do not install Transcend® on a network drive.
- Have more than one interface available on the management station, an arrangement called dual hosting. Connect vital probes to the second interface to create a private monitoring LAN (one without regular network traffic) on which network problems do not impair communication.
- Do not give the management station privileges on the network, such as the ability to log in with no passwords (rsh). Hackers can easily spot management stations.
- Connect the management station to an uninterruptible power supply (UPS) to protect the station from events that interrupt power, such as blackouts, power surges, and brownouts.
- Regularly back up the management station.
- Provide remote access through a modem to the management station so that you can keep track of your network’s activity remotely.

More Tips
- Use managed hubs to narrow which link is causing an error. Even if your budget does not allow you to manage all hubs, strategically install one managed hub for error tracking.
- Keep copies of all configurations on a file server and on the management station. See “Knowing Your Network’s Configuration” for more information.

Preparing Devices for Management

Before Transcend (or any other management software) can work with the devices on your network, make sure that the devices are configured appropriately for management communication. If you have a problem establishing a management connection, see “Manager-to-Agent Communication” for more information about solving this problem.
  52. Configuring Management Parameters

Before you attempt to manage the supported devices with Transcend NCS applications, ensure that each device conforms with these prerequisites:
- The device must have an IP hostname and IP address. When you manage modular devices, use the IP address of the device’s management module, if one is present.
- The device and your network management platform must use the same SNMP read (get) and write (set) community strings. See “Security” for more information about community strings.

Configuring Traps

SNMP trap reporting means that management agents send unsolicited messages to management stations, relaying events that have occurred at the device, such as a system reboot. Traps include an object identifier (OID) that passes integer values or strings that the management software decodes.

Configure each device to send to the management station the SNMP traps that the network management applications require. You can set SNMP traps using the device’s console program or Device View, a Transcend NCS application. For more information about traps, see “Trap Reporting”.

Configuring Transcend NCS

Configure Transcend to monitor your network most effectively, identify when thresholds are exceeded, and alert you to problems or potential problems.

Monitoring Devices

For Transcend to monitor your devices:
- Use your platform’s autodiscovery feature to detect all manageable devices on your network and to create a network map. Transcend NCS applications use this data for their operation. For Transcend NCS applications to recognize 3Com devices from the platform, the device icons must be 3Com device icons.
- Add 3Com devices to an inventory database using Transcend Central. You can import devices from your platform’s database. The Transcend Central database defines the devices that many of the Transcend NCS applications manage and allows you to group devices for easier management and faster troubleshooting.
  53. - Create logical and physical groups of the devices in your database using Transcend Central.

Setting Thresholds and Alarms

Thresholds are the upper and lower limits that you set for the network conditions and events that you are monitoring with network management software. When these limits are exceeded, the management software reports that a threshold has been exceeded (usually by icons changing color). Alarms add to this reporting functionality by allowing you to configure an action to be taken (such as disabling ports or sending e-mail) if the threshold is exceeded.

Alarms that are configured correctly can prevent inconvenient or even catastrophic network failures. The main advantage of alarms is that you can specify at exactly which point an action should take place, and you can tailor them to suit the normal operating conditions of your network.

The first time that you use the Transcend NCS applications, use the default thresholds to see how they apply to your network. After you assess your network’s normal behavior, you can adjust the thresholds and alarms to make them more useful for your particular network. See “Identifying Your Network’s Normal Behavior” for more information.

Setting Thresholds in Status Watch

You can set a rising threshold and a falling threshold for most Status Watch tools. The rising threshold triggers a status severity change when the threshold is exceeded. The falling threshold causes a status severity change when the excessive activity or abnormal condition has returned to normal.

For example, your Ethernet network may normally accommodate 50 percent utilization. If it exceeds 60 percent for an extended time, your network slows considerably. You want to know when and for how long your network exceeds the threshold of 60 percent.

Status Watch also allows you to set status severity levels for events in the FDDI Status and the System Status tools.
You can set the severity level for the conditions and events. For some conditions and events, you can specify severity levels for the individual values of the variables. For more information about setting thresholds in Status Watch, see the Status Watch User Guide and Status Watch Help.
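The 60 percent example above asks when, and for how long, utilization stays over a threshold. A minimal Python sketch of that bookkeeping follows. It is not how Status Watch works internally; the function name `exceedance_periods`, the sample format, and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

def exceedance_periods(samples: List[Tuple[int, float]],
                       threshold: float = 60.0) -> List[Tuple[int, int]]:
    """Return (start, end) timestamps of runs where utilization is above threshold.

    samples holds (timestamp_seconds, utilization_percent) pairs sorted by time.
    A run ends at the first sample at or below the threshold; a run that is
    still open at the end of the data is closed at the last timestamp.
    """
    periods = []
    start = None
    for timestamp, utilization in samples:
        if utilization > threshold and start is None:
            start = timestamp                     # run begins
        elif utilization <= threshold and start is not None:
            periods.append((start, timestamp))    # run ends
            start = None
    if start is not None:
        periods.append((start, samples[-1][0]))
    return periods

# Hypothetical one-minute utilization samples (percent).
samples = [(0, 48.0), (60, 55.0), (120, 63.0), (180, 71.0),
           (240, 58.0), (300, 62.0), (360, 50.0)]

for start, end in exceedance_periods(samples):
    print(f"above 60% from t={start}s to t={end}s ({end - start}s)")
```

With real data, the samples would come from whatever utilization history your management tools export.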
  54. Setting Thresholds and Alarms in LANsentry Manager

Much of network management involves monitoring for specific network events. With LANsentry Manager, you can specify these events in advance and then know as soon as they occur. This process is known as setting alarms. Consider the following examples of alarms:
- Example A: The router on your network, which is capable of forwarding data at 3,000 packets per second (pps), appears to have problems forwarding at the top of its specification. You configure an alarm to notify you as soon as the traffic approaches this rate.
- Example B: Your network is running at 1,400 pps. Typically, a Cyclic Redundancy Check (CRC) error rate of more than 1 percent of network traffic is considered excessive. You configure an alarm to notify you as soon as the CRC error rate climbs above the threshold of 14 pps.

Over time, you build up a library of alarms for your own network.

Refining Alarm Settings

You can refine your alarms for more exact monitoring by setting the hysteresis zone and defining Start and Stop events.

Hysteresis zone

For more control over the conditions that trigger an alarm, you can also specify a hysteresis zone around the specified value. The hysteresis zone ensures that alarms are not triggered due to small fluctuations around the threshold value. The hysteresis zone is the area where a value has fallen below the upper threshold (also called the rising threshold) but has not yet reached a lower threshold (also called the falling threshold). After a rising threshold generates an alarm, the value must fall below the falling threshold before another alarm is generated. For alarms that are set on falling thresholds, the rule is reversed. Figure 7 shows an example of this alarm mechanism.
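The hysteresis rule for a rising-threshold alarm can be sketched in a few lines of Python. This illustrates the rule as described above and is not LANsentry Manager code: the class name `RisingAlarm`, the falling threshold of 10 pps, and the sample readings are assumptions. The rising threshold reuses Example B (1 percent of 1,400 pps).

```python
class RisingAlarm:
    """Rising-threshold alarm with a hysteresis zone (illustrative sketch)."""

    def __init__(self, rising: float, falling: float) -> None:
        if falling > rising:
            raise ValueError("falling threshold must not exceed rising threshold")
        self.rising = rising
        self.falling = falling
        self.armed = True  # an alarm can fire only while armed

    def update(self, value: float) -> bool:
        """Feed one sample; return True if this sample generates an alarm."""
        if self.armed and value > self.rising:
            self.armed = False   # stay quiet until the value falls far enough
            return True
        if not self.armed and value < self.falling:
            self.armed = True    # value left the hysteresis zone; re-arm
        return False

# Example B: 1 percent of 1,400 pps gives a rising threshold of 14 pps.
traffic_pps = 1400
rising = traffic_pps * 1 / 100
alarm = RisingAlarm(rising=rising, falling=10.0)  # falling value is assumed

# Hypothetical CRC error rates (pps) sampled over time.
readings = [9, 13, 15, 16, 13, 15, 9, 12, 15]
fired = [i for i, value in enumerate(readings) if alarm.update(value)]
print("alarm generated at sample indexes:", fired)
```

The second crossing of 14 pps (sample index 5) is suppressed because the rate had dropped only to 13 pps, which is still inside the hysteresis zone; the alarm re-arms once the rate falls below 10 pps.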
  55. [Figure 7: Alarm Triggering Mechanism. Labels: hysteresis zone; time; alarm event generated.]

Stop and Start events

In addition to using alarms on their own, in LANsentry Manager, you can use them as Start or Stop events when capturing packets with the Capture application. In Example A, you can start capturing all packets the router transmits whenever the traffic rate rises above 2,800 packets per second and then stop capturing when it drops below this level. In this way, you can capture packets leading up to the event and immediately after. By combining alarms and the Capture application, you have powerful troubleshooting capabilities.

For more information about setting alarms with LANsentry Manager, see the LANsentry Manager User Guide and Help.

Setting Alarms Based on a Baseline

When you determine the baselines of your network’s normal activity with Traffix Manager, you can use the Alarms View in LANsentry Manager to set alarms that trigger when network activity deviates from the baseline. See “Baselining Your Network” for more information.
  56. When determining the baseline for setting utilization alarms, use either of these approaches:
- Set alarms for any peaks in network utilization. Pick a baseline value that covers most of your network traffic, ignoring any obvious one-time-only peaks. For example, as users log on at the start of the day, you see a large peak in network utilization. The alarm is triggered whenever such peaks occur.
- Set alarms for exceptional peaks in network utilization. Pick a baseline value that covers the highest possible peak seen when service was still provided. The alarm is triggered at levels higher than this peak, alerting you to the most serious utilization on your network.

When you choose the baseline for error alarms, pick the lowest possible baseline so that the alarm is triggered by any peaks.

Other Tips for Setting Thresholds and Alarms

For SNMP traps to be effective, their thresholds must be high enough so that they do not generate false alarms. On the other hand, high thresholds also mean that small numbers of errors can escape detection. A very small error rate that regularly occurs (such as four per minute) can cause major problems with protocols with large retry delays. For example, some MAC-level errors corrupt packets so that a switch does not forward them.

Knowing Your Network

You can better troubleshoot the problems on your network by:
- Knowing Your Network’s Configuration
- Identifying Your Network’s Normal Behavior

Knowing Your Network’s Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:
- Which devices are on your network
- How the devices are configured
- Which devices are attached to the backbone
- Which devices connect your network to the outside world (WAN)

To keep track of your network’s configuration, gather the following information:
  57. - Site Network Map
- Logical Connections
- Device Configuration Information
- Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:
- Know exactly where each device is physically located
- Easily identify the users and applications that are affected by a problem
- Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version.

Figure 8 shows an example of a network map of 3Com devices.
  58. [Figure 8: Example of a Site Network Map. Labels include: Floor 1; Floor 2; data center; FDDI backbone; IP subnetworks 138.6.1.xxx, 138.6.12.xxx, and 138.6.13.xxx; Ethernet and Fast Ethernet segments; Internet, ISDN, and modem connections; CoreBuilder 3500; CoreBuilder 9000 with SwitchModules; NETBuilder II® 8-slot; AccessBuilder® 5000 7-slot; CS/2500; SuperStack II Hub 100 TX; SuperStack II Switch 3300; SuperStack® II Switch 2200; SuperStack II Enterprise Monitor with FDDI module; network management station with FDDI card; server farm (Web server, mail server, NetWare servers); UNIX, Windows NT, and Windows 95 workstations; printers.]

Consider including the following information on your network map:
- Location of important devices and workgroups (by floor, building, or area)
- Location of the network backbone, data center, and wiring closets, as appropriate for your network
- Location of your network management stations
- Location and type of remote connections
- IP subnetwork addresses for all managed switches and hubs
- Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
  59. - Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
- Virtual workgroups, which you can show with colors or shaded areas
- Redundant links, which you can indicate with gray or dashed lines
- Types of network applications that are used in different areas of your network
- Types of end stations that are connected to the switches and hubs

Complete data about end station connections is usually too detailed for the network map. Instead, maintain tables that detail which end stations are connected to which devices, along with the MAC addresses of each end station. Use tools like “Address Tracker” to generate the MAC address information.

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection. The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site’s regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

The Transcend NCS Network Admin Tools include applications that allow you to save device configurations.
  60. Follow these guidelines for saving configuration information:
- Because the easiest way to recover a device’s configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
- For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick “rebuild” guide that explains the quickest way to configure the device from a fresh install.
- For devices that store information to diskette, store this data as part of your site’s regular backup.
- For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
- For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:
- All passwords: Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
- Device inventory: The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
- MAC address-to-port number list: If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Using Address Tracker, generate and keep a paper copy of this list, which is crucial for deciphering captured packets.
Do not rely on Address Tracker getting an up-to-date list of MAC addresses because the network may be down, which prevents SNMP polling. If the network is down, an exported copy of Address Tracker’s data is invaluable (online or on paper).
  61. - Log book: Document your interactions, no matter how trivial, with each device that is critical to your network’s operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
- Change control: Maintain a change control system for all critical systems. Permanently store change control records.
- Contact details: Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
- LANsentry Reporter: Use LANsentry Reporter to generate reports from the database.

To be ready to remotely access your network, store the network maps, contact details, and important network addresses at the homes of those who support the network.

Identifying Your Network’s Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile “normal” data to compare against the results that you get when your network is in trouble.
For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network. Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison.
  62. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:
- Web Reporter generates daily or weekly reports from data collected by Status Watch.
- Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
- LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes, for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network’s background noise so that you can recognize “real” data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network, traffic that occurs for little value. If background noise is high, redesign your network.
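When you calculate a utilization baseline manually from sampled data, the two approaches described under “Setting Alarms Based on a Baseline” (a value that covers most traffic, or the highest peak seen while service was still provided) might be sketched in Python as follows. The function names, the 95 percent coverage figure, and the sample values are assumptions for illustration; the guide does not prescribe a formula.

```python
import math
from typing import Sequence

def coverage_baseline(samples: Sequence[float], coverage_percent: int = 95) -> float:
    """Smallest sample value that covers coverage_percent of all samples.

    This nearest-rank percentile ignores the few highest one-time peaks.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < coverage_percent <= 100:
        raise ValueError("coverage_percent must be in 1..100")
    ordered = sorted(samples)
    rank = math.ceil(coverage_percent * len(ordered) / 100)
    return ordered[rank - 1]

def peak_baseline(samples: Sequence[float]) -> float:
    """Highest value seen while service was still provided."""
    return max(samples)

# Hypothetical utilization samples (percent); 88 is a one-time logon peak.
utilization = [12, 15, 14, 18, 22, 35, 41, 38, 36, 33,
               30, 28, 25, 21, 19, 17, 16, 14, 13, 88]

print("baseline covering most traffic:", coverage_baseline(utilization))
print("baseline at highest peak:", peak_baseline(utilization))
```

An alarm set just above the first value fires on any unusual peak, while one set above the second fires only on utilization worse than anything the network has already survived.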
  64. II NETWORK CONNECTIVITY PROBLEMS AND SOLUTIONS
Chapter 4 Manager-to-Agent Communication
Chapter 5 FDDI Connectivity
Chapter 6 Token Ring Connectivity and Errors
Chapter 7 ATM and LANE Connectivity
  65. CHAPTER 4: MANAGER-TO-AGENT COMMUNICATION
Use these sections to identify and correct problems with communication between the management station and network devices:
• Manager-to-Agent Communication Overview
• Verifying Management Configurations
See “Manager-to-Agent Communication Reference” for additional conceptual and problem analysis detail.

Manager-to-Agent Communication Overview
If your management workstation cannot communicate with devices on the network, examine your management configurations for the devices and your management station configurations. For information about Simple Network Management Protocol (SNMP), see “SNMP Operation”.

Understanding the Problem
If your management station or the devices that you manage are incorrectly configured for management, then the management station, which includes your Transcend® applications, cannot perform autodiscovery, polling, or SNMP Get and Set requests on the device. If you have not configured port connections (including a possible out-of-band serial or modem connection) and have not created an administration password for access to the management agent, do so before you continue.

Identifying the Problem
Examine your management configurations for any device that your management station cannot reach. Also examine your management station setup. If you can reach a device but are not receiving traps, first examine the trap configurations (the trap destination address and the traps configured to send). See “Configuring Traps” for more information.
  66. Solving the Problem
Either modify device configurations so that they are the same as your management station’s, or modify the management station to match the configurations of your devices.

Verifying Management Configurations
Verify that the following management configurations are correct:
• IP Address
• Gateway Address
• Subnet Mask
• SNMP Community Strings
• SNMP Traps
How these parameters are configured can vary by device. For more information, see the user guide for each device.
Follow these steps:
1 Ping the device.
• If the device is accessible by Ping, then its IP address is valid and you may have a problem with the SNMP setup. Go to step 5.
• If the device is not accessible by Ping, then there is a problem with either the path or the IP address.
2 To test the IP address, Telnet into the device using an out-of-band connection. If Telnet works, then your IP address is working.
3 If Telnet does not work, connect to the device’s console using a serial line connection and ensure that your device’s IP address setting is correct. If your management station is on a separate subnetwork, make sure that the gateway address and subnet mask are set correctly.
4 Using a management application, perform an SNMP Get and an SNMP Set (that is, try to poll the device or change a configuration using management software).
5 If you cannot reach the device using SNMP, access the device’s console and make sure that your SNMP community strings and traps are set correctly. You can access the console using Telnet, a serial connection, or a Web management interface.
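Step 3 depends on knowing whether the management station and the device share a subnetwork. The sketch below, using Python's standard `ipaddress` module, shows one way to check that and to split an address into its network ID and host part. The addresses and function names are made up for illustration.

```python
import ipaddress

def same_subnet(station_ip, device_ip, subnet_mask):
    """Return True if the device falls inside the station's subnetwork."""
    station_net = ipaddress.ip_network(f"{station_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(device_ip) in station_net

def split_address(ip, subnet_mask):
    """Return (network ID, host part) for an address under a given mask."""
    iface = ipaddress.ip_interface(f"{ip}/{subnet_mask}")
    network_id = str(iface.network.network_address)
    host_part = int(iface.ip) & int(iface.hostmask)  # bits where the mask is 0
    return network_id, host_part

# Hypothetical addresses: one device is local, the other needs the gateway.
print(same_subnet("192.168.10.5", "192.168.10.77", "255.255.255.0"))
print(same_subnet("192.168.10.5", "192.168.20.77", "255.255.255.0"))
print(split_address("192.168.10.5", "255.255.255.0"))
```

When `same_subnet` returns False, traffic between the two must pass through the default gateway, so the gateway address and subnet mask on the device matter.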
  67. Manager-to-Agent Communication Reference 69 Manager-to-Agent This section explains management configuration terms and provides Communication additional conceptual and problem analysis detail. Reference IP Address Devices use IP addresses to communicate with the management station and to perform routing tasks. Assign a unique IP address to each device in your network. Choose each IP address from the range of addresses that are assigned to your organization. Gateway Address The default gateway IP address identifies the gateway (for example, a router) that receives and forwards those packets whose addresses are unknown to the local network. The agent uses the default gateway address when sending alert packets to the management workstation on a network other than the local network. Assign the gateway address on each device. Subnet Mask The subnet mask is a 32-bit number in the same format and representation as IP addresses. The subnet mask determines which bits in the IP address are interpreted as the network number, which as the subnetwork number, and which as the host number. Each IP address bit that corresponds to a 1 in the subnet mask is in the network/subnetwork part of the address. This group of numbers is also called the Network ID. Each IP address bit that corresponds to a 0 is in the host part of the IP address. The subnet mask is specific to each type of Internet class. The subnet mask must match the subnet mask that you used when you configured your TCP/IP software. SNMP Community An SNMP community string is a text string that acts as a password. It is Strings used to authenticate messages that are sent between the management station (the SNMP manager) and the device (the SNMP agent). The community string is included in every packet that is transmitted between the SNMP manager and the SNMP agent. After receiving an SNMP request, the SNMP agent compares the community string in the request to the community strings that are
  68. 70 CHAPTER 4: MANAGER-TO-AGENT COMMUNICATION configured for the agent. The requests are valid under these circumstances: s Only SNMP Get and Get-next requests are valid if the community string in the request matches the read-only community. s SNMP Get, Get-next, and Set requests are valid if the community string in the request matches the agent’s read-write community. For more information about SNMP requests and community strings, see “SNMP Operation” A device is difficult or impossible to manage if: s The device is not using the correct community strings. s Your management station uses community strings that do not match those of the devices it manages. If community strings do not match, either modify the community string at the device so that it is the string that the management station expects, or modify the management station so that it uses the device’s community strings. Table 6 lists the default community strings for some common 3Com devices. Modify these default strings when you install a new device. You can use “Device View” to change community strings of most 3Com devices. Community string settings are case-sensitive for all devices. Table 6 Default Security Settings for Common 3Com Devices Read-Only Read-Write Device Community Community AccessBuilder® 7000 BRI Card and PRI Card public private CoreBuilder™ 2500 public private CoreBuilder 3500 public private CoreBuilder 5000 public private CoreBuilder 6000 public private CoreBuilder 7000 public private CoreBuilder 9000 public private CoreBuilder 9400 public private NETBuilder® public *
  69. Manager-to-Agent Communication Reference 71 Table 6 Default Security Settings for Common 3Com Devices Read-Only Read-Write Device Community Community NETBuilder II® public * OfficeConnect® products monitor security OfficeConnect Remote 511, 521, and 531 public private ONline™ hubs public * SuperStack® II Desktop Switch public security SuperStack II Hub TR Network Management public private Module SuperStack II Enterprise Monitor public admin SuperStack II PS Hub monitor security SuperStack II Switch 1000 public security SuperStack II Switch 2000 TR public private SuperStack II Switch 2200 public private SuperStack II Switch 3000 (all variations) public security SuperStack II Switch 3900 public private SuperStack II Switch 9300 public private SuperStack II Token Ring Monitor public admin Transcend® Enterprise Monitor 540 public admin Transcend Enterprise Monitor 542 public admin Transcend Enterprise Monitor 570 public admin * By default, no setting exists or is needed for initial access on this device. Although community strings are SNMP’s way to secure management communication, these strings appear in the SNMP packet header unencrypted and are visible if the packet data is analyzed. For this reason, change community string settings frequently to improve management security. SNMP Traps If your platform or management applications do not report events for some devices, then SNMP trap reporting may not be configured correctly for those devices. If you find that traps are overwhelming your management workstation, you can filter out (disable) some common traps so that the management station does not receive them. Most devices allow you to select which traps to send to a management station IP address.
  70. You can use “Device View” to change the trap reporting configuration of most 3Com devices. See “Trap Reporting” for more information.
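The community-string rules given in the reference section of this chapter (read-only strings permit Get and Get-next; read-write strings also permit Set; matching is case-sensitive) can be modelled as a small function. This is a simplified model of the agent's decision, not code from any SNMP library. The function name and request labels are assumptions, and `public` and `private` are the defaults that Table 6 lists for several devices.

```python
READ_REQUESTS = {"get", "get-next"}
WRITE_REQUESTS = {"set"}

def request_is_valid(request_type, community, read_only, read_write):
    """Model how an agent accepts or rejects a request by community string.

    String comparison is case-sensitive, as the guide states.
    """
    if community == read_write:
        return request_type in READ_REQUESTS | WRITE_REQUESTS
    if community == read_only:
        return request_type in READ_REQUESTS
    return False  # unknown community string: the request is not valid

print(request_is_valid("get", "public", "public", "private"))
print(request_is_valid("set", "public", "public", "private"))
print(request_is_valid("set", "Private", "public", "private"))
```

The last call is rejected because `Private` differs in case from the configured read-write string.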
  71. FDDI CONNECTIVITY 5 Use these sections to identify and correct connectivity errors on an FDDI ring: s FDDI Connectivity Overview s Monitoring FDDI Connections s Making Your FDDI Connections More Resilient See “FDDI Connectivity Reference” for additional conceptual and problem analysis detail. FDDI Connectivity Fiber Distributed Data Interface (FDDI), which is a self-correcting Overview technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity. Understanding As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring the Problem have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented. Figure 9 Thru Ring thru thru thru thru Wrapped ring By monitoring the “Peer Wrap Condition”, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations
  72. 74 CHAPTER 5: FDDI CONNECTIVITY on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link. Figure 10 Wrapped LAN thru thru wrap_A wrap_B Segmented ring In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition. Figure 11 Segmented Ring wrap_A wrap_A wrap_B wrap_B When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect again. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See “Making Your FDDI Connections More Resilient” for information about keeping a dual-attachment station connection from wrapping. Isolated station Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.
  73. FDDI Connectivity Overview 75 Figure 12 Wrapped Ring with Isolated Station thru thru wrap_A isolated wrap_B If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13. Figure 13 Segmented Ring with Isolated Stations 2nd down thru wrap_A thru isolated wrap_A wrap_B thru wrap_A isolated isolated wrap_B wrap_B 1st down Twisted ring In a twisted ring, an A port is connected to an A port and a B port is connected to a B port instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the “Twisted Ring Condition” and “Undesired Connection Attempt Event” for evidence of twisted ring and other connection problems. Identifying To identify the problem, follow this process: the Problem 1 At the FDDI LAN level, verify that your network is operating. If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network. 2 Determine if a ring is in a Thru, Wrap, or Segmented state. If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented
  74. 76 CHAPTER 5: FDDI CONNECTIVITY but you still have connectivity to the stations in question, move to a more specific level in your network. See “Monitoring FDDI Connections” for more information. 3 Determine if the poorly performing station is an Ethernet or FDDI station. If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its “Bandwidth Utilization”. Also determine if the station has a high frame error rate by looking at the “FDDI Ring Errors”. If the problem is an Ethernet station, look for congestion by examining “Ethernet Packet Loss” and “Bandwidth Utilization”. Solving the Problem Identify the station that is causing the disconnection and take the appropriate steps: s If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station. s If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion. s If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance. You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See “Making Your FDDI Connections More Resilient” for more information. Monitoring FDDI Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status Connections tool. 
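The ring states from the FDDI Connectivity Overview (Thru when no station is wrapped, Wrapped when two stations are wrapped, Segmented when more than two are) can be expressed as a short classification. The sketch assumes that each trunk-ring station's Configuration State has already been collected as a string such as `thru`, `wrap_A`, `wrap_B`, or `isolated`, as labelled in Figures 9 through 13. The function name and return values are illustrative.

```python
def ring_state(config_states):
    """Classify a trunk ring from its stations' Configuration States."""
    wrapped = sum(1 for state in config_states if state.startswith("wrap"))
    isolated = sum(1 for state in config_states if state == "isolated")
    if wrapped == 0 and isolated == 0:
        return "thru"
    if wrapped == 2:
        return "wrapped"
    if wrapped > 2:
        return "segmented"
    return "unknown"  # combinations that the guide does not describe

print(ring_state(["thru", "thru", "thru", "thru"]))          # Figure 9
print(ring_state(["thru", "thru", "wrap_A", "wrap_B"]))      # Figure 10
print(ring_state(["wrap_A", "wrap_A", "wrap_B", "wrap_B"]))  # Figure 11
```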
Status Watch Use Status Watch to identify these FDDI connectivity errors:
  75. Making Your FDDI Connections More Resilient 77 s Peer Wrap Condition s Twisted Ring Condition s Undesired Connection Attempt Event Follow these steps: 1 In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem. 2 Monitor the FDDI Status tool for the currently selected device. Here are some pointers for monitoring: s If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring. s If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition. Making Your FDDI When devices are removed from an FDDI ring, there is a break in the fiber Connections More path that causes the ring to wrap until the ring is made whole again. To Resilient prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU). Implementing Dual When the operation of a dual attachment node is critical to your Homing network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a “standby” link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link. Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. 
If one concentrator fails, the DAS
  76. 78 CHAPTER 5: FDDI CONNECTIVITY enables the standby link to another concentrator to become the active link. See Figure 14. If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways: s With both links active s With one link active and one connection withheld as a backup, only becoming active when one link fails Figure 14 Dual Homing Configuration M A Concentrator #1 M Standby link set by SMT configuration policy B M SAS A A server FDDI Dual-homed dual switch ring B B Active link M SAS A M Concentrator #2 M B M Installing an Optical You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were Bypass Unit a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15. If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.
  77. FDDI Connectivity Reference 79 Figure 15 Optical Bypass Unit Configuration OBU FDDI dual A ring A A MIC DAS receptacles B B B Power/control cable connected to the optical bypass interface of the DAS FDDI Connectivity This section explains terms that are relevant to FDDI connectivity and Reference provides additional conceptual and problem analysis detail. Peer Wrap Condition A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity. When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include: s Faulty FDDI port hardware s Faulty cables or connectors s Unplugged connectors s Powered-down stations You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition. Twisted Ring A Twisted Ring condition occurs when certain undesirable connection Condition types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.
  78. 80 CHAPTER 5: FDDI CONNECTIVITY Undesired An Undesired Connection Attempt event occurs when a port tries to Connection connect to another port of a type that may result in an undesirable Attempt Event network topology. Whether the connection attempt is successful depends on the current setting of the station’s connection policies. Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations. Table 7 Undesirable Connection Types Connection Type* Reason That the Connection Is Undesirable A-A Twisted primary and secondary rings A-S A wrapped ring B-B Twisted primary and secondary rings B-S A wrapped ring S-A A wrapped ring S-B A wrapped ring M-M A tree of rings topology (illegal connection) * SuperStack® II Monitor series and Transcend® Enterprise Monitor series use type 1 to represent connection type A and type 2 to represent connection type B. Table 8 lists FDDI connections that create valid topologies. Table 8 Valid Connection Types Connection Type Reason That the Connection Is Valid A-B A normal trunk peer connection A-M A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M. B-A A normal trunk ring peer connection B-M A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M. S-S A single ring of two slave stations S-M A normal tree connection M-A A tree connection that provides possible redundancy M-B A tree connection that provides possible redundancy M-S A normal tree connection
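Tables 7 and 8 together cover every pairing of the four port types (A, B, S, M), so they can be encoded as a lookup. The sketch below copies the table contents. The function name and return format are assumptions for illustration.

```python
# Table 7: connection types that the FDDI standard defines as undesirable.
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Table 8: connection types that create valid topologies.
VALID = {
    ("A", "B"), ("A", "M"), ("B", "A"), ("B", "M"), ("S", "S"),
    ("S", "M"), ("M", "A"), ("M", "B"), ("M", "S"),
}

def classify_connection(local_port, remote_port):
    """Return ("valid", None) or ("undesirable", reason) for a port pairing."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", None
    raise ValueError(f"unknown port types: {pair}")

print(classify_connection("A", "B"))
print(classify_connection("A", "A"))
```

Whether a managed device actually permits an undesirable connection still depends on its connection policies, as noted above.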
  79. TOKEN RING CONNECTIVITY AND 6 ERRORS Use these sections to identify and correct token ring errors: s Token Ring Overview s Using Transcend Applications to Identify Problems and Symptoms s Identifying and Solving Ring Errors s Troubleshooting Notes Token Ring Token Ring’s ring topology uses a token passing method for ring access Overview and data transmission. The term “ring” is derived from the logical adjacency of the links between the adapter cards in a token ring network. Each adapter card has physical links to an upstream neighbor and to a downstream neighbor. Each device on the ring transmits onto the downstream link and receives data from the upstream link. In this way, each node acts as a repeater, passing traffic from neighbor to neighbor. The term “token” refers to a special data sequence that is continuously sent around the ring. Any node that has data to send waits to receive the token before sending that data. Only a station in possession of a token can transmit new data on the ring. Unlike Ethernet and other contention-oriented protocols, token passing resolves network access conflicts without collisions. This arrangement ensures that every station on a token ring always has access to the network within a predictable time interval, even under heavy traffic load. Token Ring connects up to 260 nodes in a star topology at 4 Mbps or 16 Mbps. Token Ring is a data link protocol or MAC layer protocol and functions at layers 1 and 2 of the 7-layer Open Systems Interconnection (OSI) model.
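The token passing method described above can be illustrated with a toy simulation: the token travels downstream from station to station, and only the station that holds it may send. The sketch makes a simplifying assumption that a station sends at most one frame each time it holds the token. The station names and frames are made up.

```python
from collections import deque

def circulate(stations, pending, rotations=1):
    """Simulate token rotations and return frames in the order they are sent.

    stations: station names in ring (downstream) order.
    pending:  mapping of station name to the frames it is waiting to send.
    """
    queues = {name: deque(pending.get(name, [])) for name in stations}
    sent = []
    for _ in range(rotations):
        for holder in stations:      # the token moves to the next station
            if queues[holder]:       # only the token holder may transmit
                sent.append((holder, queues[holder].popleft()))
    return sent

order = circulate(["A", "B", "C"], {"C": ["c1"], "A": ["a1", "a2"]}, rotations=2)
print(order)
```

Because every station gets the token once per rotation, no station waits longer than one rotation to transmit, which mirrors the predictable access time described above.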
  80. Using Transcend Applications to Identify Problems and Symptoms
Troubleshooting your Token Ring network is a basic elimination process. The design of Token Ring makes it easier to isolate the causes of poor network performance or network failure because of the nearest active upstream neighbor (NAUN) concept. However, locating the actual cause of the symptom may not be that obvious. Correctly isolating the symptom and cause eases your troubleshooting tasks and decreases the time your network is down or experiencing poor performance.
Use these tools to troubleshoot ring errors:
• Token Ring Manager’s Statistics Tool (Windows and UNIX)
• TR Analyzer (Windows only)
• LANsentry Manager (Windows and UNIX)
• Status Watch’s Token Ring Status Tool
• Status Watch’s Token Ring Utilization Tool

Using Token Ring Statistics Tool
To view general performance statistics of your token ring network, you can use Token Ring Manager’s Statistics Tool. This tool, available for Windows and UNIX, displays high-level statistics of your network. See Figure 16.
  81. Figure 16 Token Ring Manager’s Statistics Tool
  82. 84 CHAPTER 6: TOKEN RING C ONNECTIVITY AND ERRORS Token Ring Manager’s Statistics Tool shows top level statistics of your token ring network, such as total utilization, soft errors, and hard errors. You can view these statistics on three levels: s Port s Stack s Unit For more information on Token Ring Manager’s Statistics Tool, see the Token Ring Manager User Guide. Soft errors are less serious types of Token Ring errors that usually only temporarily disturb normal ring performance. However, high occurrences of certain types of soft errors can impact your network. You can review the soft errors that have occurred by using LANsentry® Manager (Figure 17) or TR Analyzer. These soft errors include: s Internal error s Burst error s Line error s Abort delimiter transmitted error s AC error s Lost frame error s Receiver congestion error s Frame copied error s Frequency error s Token error Using LANsentry LANsentry Manager consists of an integrated set of applications that you Manager can use to display and explore the real-time and historical data captured by RMON-1 and RMON-2 compliant devices on the network. You can use LANsentry Manager to collect statistics to identify and deal with imminent problems. Use LANsentry Manager to: s Capture and display packets using filtering and decode functions. s Configure alarms to monitor for specific events on a segment. s Monitor current performance of LAN segments.
  83. Using Transcend Applications to Identify Problems and Symptoms 85 s Spot signs of current problems s View trends over time At specified levels, LANsentry Manager polls remote network devices to retrieve essential network data. LANsentry Manager processes and displays the collected data in the main window. From the main window you can monitor the health of a segment, its current performance and recent trends. You can also open new windows to monitor different segments at the same time. The following graphs appear for Token Ring in LANsentry Manager’s main window: s Packet Size Distribution s Packet Rates s Network Statistics s Top 10 Hosts (Packet Rate) s Top 10 Hosts (Error Rate) s Token Ring Status s Event Distribution For in-depth investigation, you can launch LANsentry Manager’s RMON Views and Applications from the main window. The RMON Views and Applications allow you to: s Capture and display specific packets. s Compare statistics from different segments in the same graph. s Look at statistical and historical data. s Monitor conversations between stations on the network. s Set up alarm conditions. Using the Ring Station View The Ring Station View generates a table of statistics and status information associated with each station on the ring, including station status and last entered and last exited times. For example, a disruption is caused every time a station inserts onto the ring. This results in a ring purge event and a new token is issued by the
  84. 86 CHAPTER 6: TOKEN RING C ONNECTIVITY AND ERRORS active monitor. Use the Ring Station View to track which station is doing this and to discover the active monitor issuing the token. Use the Ring Station View to: s Spot patterns on the token ring. s Review isolating errors and non-isolating errors. s See which devices are currently active on the ring. Figure 17 LANSentry Manager Main Window For more information on Ring Station View or LANsentry Manager, see the LANsentry Manager User Guide. Using TR Network The TR Network Analyzer Tool provides an interface to the most Analyzer Tool important performance statistics that Token Ring management agents monitor. The interface provides a summary of the critical status and performance information for any monitored Token Ring network at a glance. The TR Network Analyzer Tool window displays summary network configuration information and performance statistics. The right side of the window displays network graphs. The left side of the window displays an active station and error statistics list.
  85. Using Transcend Applications to Identify Problems and Symptoms 87 Network Graphs By default, the TR Network Analyzer window shows the following graphs: s Utilization — Mac and Non-Mac s Soft Errors — Isolating and Non-isolating s Recoveries — Claim Tokens, Beacons, and Ring Purges You can change the network graphs that appear. The following network graphs are available: s Utilization s Soft Errors s Recoveries s Line Errors s Internal Errors s Burst Errors s AC Bit Errors s Abort Xmits s Lost Frames s RX Congest s Frame Copys s Frequency Errors s Token Errors Active Station and Error Statistics List The Active Station and Error Statistics List lists all active stations on the managed network with corresponding station error information. This spotlights error trends and lets you quickly identify problem nodes. This list is divided into two columns. The first column contains the ASCII characters that represent a station address. The second column is the number of errors that the station has. These errors are broken down into specific errors in a box below. These errors can be: s Line errors
  86. 88 CHAPTER 6: TOKEN RING C ONNECTIVITY AND ERRORS s Internal errors s Burst errors s AC Bit errors s Aborted transmits s Lost Frame errors s Congestion errors s Frame Copied s Frequency errors s Token errors s Soft errors s Isolating errors s Non-Isolating errors For more information about the TR Network Analyzer Tool, see the Token Ring Manager User Guide for Windows. Transcend’s Status Watch application provides two tools for viewing and monitoring Token Ring segments. You can use these tools to: s Monitor events of token ring stations s View and configure utilization of bandwidth on the ring Token Ring Status The Token Ring Status tool monitors conditions and events of the token Tool ring stations in a managed device. When you select a device in the report, the conditions and events for that device are sorted by severity. When you select a condition or event, the related variables and current values appear in the far right column. Token Ring Utilization The Token Ring Utilization tool monitors the amount of traffic on token Tool ring segments and shows how the bandwidth is being allocated on your network. You can use the collected information to determine which ports have excessively high or low utilization. If necessary, you can redistribute network traffic accordingly. For example, if utilization is 40 percent or
  87. higher on a shared token ring segment, you need to reconfigure the network to better balance the load. For more information about Status Watch, see the Status Watch User Guide.

Identifying and Solving Ring Errors
This section contains five sample problems that you may encounter on your network. These examples show ways to begin your troubleshooting task using Transcend®.

Example 1
A user cannot access the network.
1 Launch Token Ring Manager’s Network Analyzer (Windows) or LANsentry Manager (UNIX).
2 Look up the user’s address, which can be a MAC address, IP address, or NetWare name. If you find the correct address, then you know that the user is on the ring; however, the user may not be able to access a particular server. See Example 2. If you cannot find the user’s address, go to step 3.
3 If you did not find the user’s address:
• Reboot the machine to determine if this “fixes” any connection problems.
• Examine the physical wiring from the station’s adapter to the hub.

Example 2
A user cannot access the server.
1 Launch Token Ring Manager’s Network Analyzer (Windows) or LANsentry Manager (UNIX).
2 Look up the user’s address, which can be a MAC address, IP address, or NetWare name.
3 Launch LANsentry Manager (Windows and UNIX).
4 Perform a packet capture. You may want to perform a packet capture from the user end, the server end, or on any devices in between to find the source of the problem. Packet captures can be filtered on protocols or addresses.
5 Analyze the various packet captures.
  88. Example 3
A user declares that the network is slow.
1 Launch LANsentry Manager.
2 Review the “Top 10 Errors”. View which station is reporting the errors and the type of errors being reported.
3 Take the decided course of action on the NAUN (nearest active upstream neighbor).
If you notice that the problem is due to a high amount of traffic (greater than 40 or 50% utilization), do the following:
1 Review which stations or ports have high or excessive utilization.
2 Reconfigure the ring so that network traffic is distributed optimally in the network.

Example 4
A group of users cannot access the network.
Verify the integrity of the main ring path cabling, and/or do the following:
1 Launch Token Ring Manager’s Device Manager.
2 Examine RI/RO cards for devices on the ring.

Example 5
None of the users on the ring can access the network.
• Check the status of your switch, bridge, or router.
• Ping the device from stations on the ring.

Troubleshooting Notes
Here are some recommendations for easing troubleshooting tasks:

Documentation
The first thing a network administrator should do is document the network topology. Documenting your network is the first step in understanding and administering your network. This topology should include all devices on the network. You can document topologies of your complete LAN and of subnets in a LAN. Documenting saves you time when you begin analyzing network problems. It will ease your troubleshooting task in general.
  89. You can document token ring networks with Token Ring Manager’s Map Manager. For more information on Token Ring Map Manager, see the Token Ring Manager User Guide for UNIX.

Analyzing Failures

There are two types of soft errors: normal and abnormal. Many normal soft errors occur when a station inserts into the ring or exits from the ring. These normal soft errors can usually be overlooked, especially when you are diagnosing a potentially more serious network problem.

Don’t overlook the basics or any “obvious” problems. Remember to confirm the integrity of the main ring path. Check for bad or loose cable connectors and for bad or damaged cabling. Also, make sure that any peripheral devices are configured correctly. For example, verify that your Network Interface Card (NIC) is correctly set to 16 Mbps or 4 Mbps.

Know Your Network

A network administrator should have a good understanding of the network’s baselines. Baselines are usually identified as average network utilization for a set time period. For example, peak utilization on a network may occur first thing in the morning, when a majority of users are starting their systems and launching various applications. Baselining gives you a better understanding of how your network functions normally. This knowledge can also help you better analyze problems or failures in your network.
  91. CHAPTER 7: ATM AND LANE CONNECTIVITY

Ensuring Asynchronous Transfer Mode (ATM) and LAN Emulation (LANE) connectivity is a vital step in troubleshooting your ATM network. Use these tools to establish a baseline from which to measure future performance variations.

Use these sections to identify and correct ATM and LANE connectivity problems:

  - ATM and LANE Connectivity Overview
  - Color Status and Propagation
  - Device Level Troubleshooting
  - LANE Level Troubleshooting
  - ATM Network Level Troubleshooting
  - Virtual LANs Level Troubleshooting
  - Identifying VLAN Splits
  - Path Assistants for Identifying Connectivity and Performance Problems

ATM and LANE Connectivity Overview

ATM differs from conventional LAN technologies because it is based on a connection-oriented model. A connection in ATM is a point-to-point or point-to-multipoint link from one end of a system to another end. ATM is also based on cells. To complete the connection, the cells must traverse a series of ATM switches in the network. This methodology simplifies the delivery of cells because station destination and source addresses do not need to be carried in each cell. Once a connection identifier establishes the connection, the connection remains open. Information is received in the same order in which it was sent, so there is no need to disassemble and then reassemble cells. This delivery mechanism is especially useful and effective in voice and video applications.
  92. Before data can be sent through an ATM network, a connection must be established between end stations using either a preestablished, fixed path or a protocol that determines the signalling. Because ATM is connection-oriented, troubleshooting the network begins with the connection.

Color Status and Propagation

An extensive context status notification feature is supported in the Enterprise VLAN Management software. The same network event may cause different statuses on different logical maps or icons. For example, a LAN Emulation Client (LEC) that cannot join its LAN Emulation Server (LES) is considered a critical event in the LAN Emulation map and not necessarily a fault in the Enterprise map. The severity depends on the context or the logical domain.

Colors propagate upwards to the parent icon, so that the next highest level window’s color is influenced. Transcend icons use high-end platform-configured icon status colors. Each status has a default color that the user can change. You need to:

1 Identify the icon status color.
2 Locate the event.
3 Identify the cause of the color status according to the tables below and fix the problem if possible.

Table 9 lists the icon statuses according to the severity of the fault.

Table 9  Color Coding Key

Status          Color
Critical        Red
Major           Orange
Minor           Yellow
Normal          Green
User-definable  Brown
Unknown         Blue
Disabled        White
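The upward propagation of status colors can be sketched in a few lines of Python. This is only an illustration of the idea, not the Enterprise VLAN Management implementation: the function name `propagated_status` is hypothetical, the ranking simply follows the order of Table 9, and both the relative priority of the non-fault statuses and the result for an icon with no children are assumptions.

```python
# Status ranking in the order given by Table 9 (most severe first).
# Assumption: the non-fault statuses rank below the fault statuses
# in the same order as they appear in the table.
SEVERITY_ORDER = [
    "Critical", "Major", "Minor", "Normal",
    "User-definable", "Unknown", "Disabled",
]

# Default colors from Table 9 (the user can change them).
DEFAULT_COLORS = {
    "Critical": "Red", "Major": "Orange", "Minor": "Yellow",
    "Normal": "Green", "User-definable": "Brown",
    "Unknown": "Blue", "Disabled": "White",
}


def propagated_status(child_statuses):
    """Return the highest-priority status among the child icons.

    An icon with no children is reported as "Unknown" (an assumption).
    """
    if not child_statuses:
        return "Unknown"
    return min(child_statuses, key=SEVERITY_ORDER.index)


if __name__ == "__main__":
    status = propagated_status(["Normal", "Minor", "Critical"])
    print(status, DEFAULT_COLORS[status])
```

A parent icon would call `propagated_status` on the statuses of the icons below it, which is why a single critical LEC can turn the top-level icon red.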
  93. Device Level Troubleshooting

See Table 10 for the color and status of devices. Using this information, you can determine where bottlenecks are occurring and then take appropriate steps to relieve the congestion.

Table 10  Color Key for Root Window and Devices

Map         Icon    Status    Status Cause
Root                          Each icon reflects the highest priority status of the maps below it.
Enterprise  Device  Critical  Does not respond to SNMP.
                    Major     Hardware problem in the device.
                    Minor     Device ports are enabled but in down state.
                    Normal    Device operating normally.

If one or more parts of the logical entity of a device are in a critical state, the device appears in Major state. For example, a CoreBuilder appears in a Major state if one or more of the LESs attached to it is in a critical state.

LANE Level Troubleshooting

The LAN Emulation map shows a comprehensive view of the real-time status of all edge devices in the network. Use this map to quickly determine the status of all LECs in the network before the start of a new work day. If any LEC in any switch is not in operational state, the critical status propagates up to the edge device icon level and further to the LAN Emulation icon at the highest level. This status propagation allows you to isolate the problem and fix it before users call to report it. Table 11 lists the status states of the LAN Emulation icons.

Table 11  Color Key for LANE Level

Map                    Icon  Status/Color  Status Cause
LAN Emulation                              All icons reflect the highest priority status of the maps below them.
Backbone and Services  LES   Critical      Not defined for this version.
                             Major         Does not respond to SNMP.
  94. Table 11  Color Key for LANE Level (continued)

Map        Icon     Status/Color  Status Cause
                    Minor         There is a user-defined name for this VLAN ID, but there is no LEC connected.
                    Brown         There is no LEC connected.
                    Normal        In operational state.
           LEC      Critical      The LEC is not connected to the LES. It may be in join, configure, or LECS connect state.
                    Major         Does not respond to SNMP.
                    Minor         In initial state.
                    Normal        In operational state.
           LECS     Major         Does not respond to SNMP.
                    Minor         Enabled but not active.
                    Unknown       The LES is enabled but the LECS is disabled on the CoreBuilder device.
                    Normal        Enabled and operational.
LANE User  LEC      Critical      The LEC is not connected to the LES. It may be in join, configure, or LECS connect state.
                    Major         Does not respond to SNMP.
                    Minor         In initial state.
                    Normal        In operational state.
           Segment  Major         The device connecting this segment does not respond to SNMP.
                    Brown         The first segment on the device may appear in this status. All other segments on the device are operating normally.
                    Unknown       Device (all segments) operating normally.
  95. ATM Network Level Troubleshooting

Table 12 lists the status states of the ATM Network icons. These maps are hidden by default because the Topology view contains the same information. If you want to display the ATM Network map, select Edit, then select Show Hidden Object from the HP OpenView menu.

Table 12  Color Key for Network Icons

Map                Icon           Status    Status Cause
ATM Network        Switch Domain  Critical  One or more of the lower level devices has an error of highest severity.
                                  Major     One or more of the lower level devices has a hardware problem.
                                  Minor     One or more lower level devices has device ports that are enabled but in down state.
                                  Normal    Device operating normally.
ATM Switch Domain                           This icon shows the highest priority status of the edge devices below it.

Virtual LANs Level Troubleshooting

Table 13 lists the status states of the Virtual LANs icons.

Table 13  Color Key for Virtual LANs Icons

Map           Icon         Status/Color  Status Cause
Virtual LANs  Virtual LAN  Critical      The LES is in major state.
                           Major         One of the devices configured to use this VLAN does not respond to SNMP.
                           Minor         There is a user-defined name for this VLAN ID, but there is no LEC connected.
                           Brown         There is no segment connected.
VLAN          LES          Critical      Not defined for this version.
                           Major         Does not respond to SNMP.
  96. Table 13  Color Key for Virtual LANs Icons (continued)

Map  Icon     Status/Color  Status Cause
              Minor         There is a user-defined name for this VLAN ID, but there is no LEC connected.
              Brown         There is no LEC connected.
              Normal        Device operating normally.
     Segment  Major         The device connecting this segment does not respond to SNMP.
              Brown         The first segment on the device may appear in this status. All other segments on the device are operating normally.
              Unknown       Device (all segments) operating normally.

Identifying VLAN Splits

After the redundancy in the LAN Emulation Server has taken effect, the LAN Emulation Clients (LECs) are moved to the backup services. In some circumstances, some of the LECs remain connected to the primary LES and are not moved to the backup LES. This condition creates a VLAN (ELAN) split. The split occurs because several LECs that belong to the same ELAN are bound to different LAN Emulation Servers. The split may occur when a LAN Emulation Server (LES) fails and the Network Management Station (NMS) changes the LAN Emulation Configuration Server database.

Indications in the VLAN Map

VLAN splits appear in the VLAN Map when the icon for the primary VLAN is green. This condition indicates that LECs are still attached. Under normal circumstances, only one ELAN, either primary or backup, should be green.

Indications in the Backbone and Services Window

VLAN splits appear in the Backbone and Services window when different LAN Emulation Clients (LECs) that belong to the same ELAN are bound to both the primary and the backup LAN Emulation Server (LES) of an ELAN.
  97. General Process

To unify the split VLANs, you need to:

1 Ensure that all the LAN Emulation Configuration Servers have the same LAN that is displaying the split.
2 Using the Network Management Station, move the ports that are displayed in the primary VLAN to a temporary VLAN. Then move the ports from the temporary VLAN into the backup VLAN.

Empty ELANs in the network are indicated with a brown color key.

Path Assistants for Identifying Connectivity and Performance Problems

You can use the Enterprise VLAN Management Path assistants to display the paths between ATM devices and network elements that are part of LAN Emulation.

LE Path Assistant

Use LE Path Assistant to select any two LE Clients or two ports and to obtain the following information:

  - Address resolution through the LE Server
  - Control distributed path (direct)
  - Multicast forward addressing through the BUS
  - Data direct path

The Path Assistant displays the corresponding ports, their LAN Emulation Clients, and the LES/BUS service used for the connection. The color of the icons isolates the problem area. You can use Path Assistant to isolate user-user or user-server connectivity problems.

To access the LAN Emulation Path Assistant window, click on two Ethernet ports and then click the Path icon.

ATM Path Assistant

Use the Path option to select any two ATM User Network Interface (UNI) or Network to Network Interface (NNI) endpoints across the network and to see the physical path as well as the Virtual Channel Connections (VCCs) established between the two end points. The following information can be obtained from this assistant window:

  - The physical path, including all the intermediate switch nodes and the physical links between them
  98.
  - The ports at the ends of the physical links
  - The VCCs established between the end points

You can also use the Path option to set up Private Virtual Channels between the selected endpoints.

Tracing a VC Path Between Two ATM End Nodes

To trace a VC path between two ATM nodes, perform the following:

  - Select two ATM end nodes in the Topology Browser or management maps, and then select the Path icon.

Examining Virtual Channels Across Layer 2 Topologies

You can easily examine Virtual Channels in the Network-Network Interface (NNI) and User-Network Interface (UNI) even if the icons are located in two different maps. To examine Virtual Channels:

1 In the management map, click on a device and select Path from the VLAN menu.
2 Select any two devices in the maps.
3 Select Find VCI/VPI to display the path between the two devices.

Tracing the LAN Emulation Control VCCs Between Two LANE Clients

To trace the LAN Emulation control VCCs between two LANE clients, perform the following steps:

1 In the LAN Emulation map or the Topology Tree, select two LECs that are attached to the same LES.
2 Select the Path icon. The LE-Assist window is displayed, showing the control VCCs between the two LECs and the LANE services (LES/BUS).
  99. III NETWORK PERFORMANCE PROBLEMS AND SOLUTIONS

Chapter 8 Bandwidth Utilization
Chapter 9 Broadcast Storms
Chapter 10 Duplicate Addresses
Chapter 11 Ethernet Packet Loss
Chapter 12 FDDI Ring Errors
Chapter 13 Network File Server Timeouts
Chapter 14 Measuring ATM Network Performance
  100. CHAPTER 8: BANDWIDTH UTILIZATION

Use these sections to identify and correct problems that are indicated by changes in bandwidth utilization:

  - Bandwidth Utilization Overview
  - Identifying Utilization Problems
  - Generating Historical Utilization Reports

See “Bandwidth Utilization Reference” for additional conceptual and problem analysis detail.

Bandwidth Utilization Overview

To determine how your network is operating on a day-to-day basis, examine its bandwidth utilization. Changes in utilization can alert you to actual or potential problems.

Understanding the Problem

Utilization varies depending on the media and on how your network is configured and used. Become aware of your network’s normal behavior so that you know when to examine utilization levels more closely. See “Identifying Your Network’s Normal Behavior” for more information.

Identifying the Problem

Determine the current utilization of all media on your network (Ethernet, Fiber Distributed Data Interface, token ring, and Asynchronous Transfer Mode) to find out whether utilization rates are exceeding thresholds that you have set in the management software.

On most networks, utilization gradually increases as users begin using more network resources, such as electronic mail, network printing, and file sharing. Be concerned with utilization peaks that do not follow this pattern of use. The process of identifying immediate utilization levels is discussed in “Identifying Utilization Problems”.
  101. Examine your network’s historical trends (its typical utilization over time) and note whether your network has experienced a gradual or sudden increase in utilization. Here are ways to assess trends:

  - A sharp increase in utilization indicates an abnormal condition. Search the area of the network where the increase occurred. For example, a device might be causing “Broadcast Storms”.
  - A sustained high or low level of utilization indicates an increasing or decreasing load on your network. Balance your network’s load by adding or redistributing segments.

The process of identifying historical trends is discussed in “Generating Historical Utilization Reports”.

A high rate of utilization can lead to high rates of packet fragments. As utilization exceeds the alarm threshold, packet fragments become common. See “Ethernet Packet Loss” for information about identifying when packet fragments are occurring.

Solving the Problem

Narrow the utilization problem to the ports that have excessively high or low utilization. If necessary, redistribute network traffic accordingly by segmenting your LAN with a bridge, router, or switch.

Sometimes, a hardware problem can cause abnormal utilization rates. In this case, see “Ethernet Packet Loss” and “FDDI Ring Errors” for troubleshooting information.

Identifying Utilization Problems

First, determine utilization levels on your current network. Try to locate the segments that are experiencing high or low utilization levels.

Use Status Watch, which collects MIB-II data using Simple Network Management Protocol polling, to determine bandwidth utilization.

Status Watch

The Status Watch utilization tools monitor the amount of traffic on network segments and show how the bandwidth is being allocated. These tools provide a real-time report of utilization data on the selected device or group of devices. Table 14 describes the Status Watch tools that monitor your network’s utilization.
  102. Table 14  Status Watch Tools Used for Examining Utilization

Tool                    What It Indicates
Ethernet Utilization    The aggregate percentage of utilization of an Ethernet segment (calculated by tracking the receive and transmit utilizations of Ethernet ports)
FDDI Utilization        The percentage of utilization of the primary, secondary, and local FDDI rings (calculated by tracking the percent utilization of FDDI ports)
Token Ring Utilization  The percentage of utilization of a token ring segment
ATM Utilization         The percentage of utilization of supported ATM interfaces

Follow these steps:

1 Select the group that you suspect has a performance problem. The color-coded icons (for groups, devices, and tools) can guide you to the areas of your network that are experiencing problems. For example, red icons mean that you should examine a problem immediately. If a group is red, click the group to see all devices in that group, and locate the device that is red. Select the device and examine which tool icons are also red.
2 Select the utilization tool icon that indicates a problem. The tool report displays all the interfaces in the group or device. Determine which interfaces reflect high rates.

If some interfaces are experiencing excessively high utilization rates, look for broadcast storms and other conditions that cause packet loss, as described in:

  - Broadcast Storms
  - Ethernet Packet Loss

If an increase in utilization causes an increase in Error rates (other than collisions), look for MAC and physical layer problems (for example, faulty network cards, illegal repeater hops, and cables that are too long).

Additionally, monitor Collision rates as utilization rises, looking for large increases that are out of the ordinary. In particular, search devices on the
  103. segment for “Excessive Collisions”. While Collisions are normal, Excessive Collisions indicate network delays.

Generating Historical Utilization Reports

Use real-time utilization data to see how your network is operating at the moment. To gauge whether utilization is at a critical point for your network, look at historical data.

Use Web Reporter to generate a historical report that shows the utilization trends for a specific set of devices on your network.

Web Reporter

Using Web Reporter, you can save days and weeks of network data, save a baseline week of “normal” data, and determine when utilization is constantly high.

Follow these steps:

1 Access Web Reporter. As the uniform resource locator (URL), use the directory where you installed Transcend® NCS on the Web.
2 Generate a weekly Historical report to see utilization rates for the whole week.
3 Compare your weekly Historical report to a baseline of historical utilization data.

See “Identifying Your Network’s Normal Behavior” and the Web Reporter Help for more information about setting a baseline.

Bandwidth Utilization Reference

This section explains terms that are relevant to bandwidth utilization and provides additional conceptual and problem analysis detail.

ATM Utilization

Over time, if a port has experienced increased, sustained utilization levels, then you need to balance the load of your ATM segments.

Status Watch calculates ATM utilization in this way:

  greater of (in_util, out_util)

where:

  in_util = ( ((rate of ifInOctets)*8) / ((linespeed)*0.9875) )*100
  out_util = ( ((rate of ifOutOctets)*8) / ((linespeed)*0.9875) )*100
  104. The factor of 8 converts octets to bits. The factor of 0.9875 offsets the interframe gap.

Ethernet Utilization

Over time, if a port has experienced increased utilization levels (often a sustained level of over 40 percent), then you need to rebalance the load of your Ethernet segments.

Typically, the larger the frame size, the more utilization your network can accommodate. You may recognize utilization problems with certain protocols before other protocols because some protocols have less tolerance for high rates of traffic.

The point at which utilization becomes a problem also depends on users. For example, you may allow higher utilization rates on an engineering network, yet you want greater bandwidth availability on a financial network where data delivery is critical.

As general guidelines, your network is healthy in these conditions:

  - Utilization is running up to 15 percent most of the time.
  - Utilization is peaking at 30 to 35 percent for a few seconds at a time, with large gaps of time between peaks.
  - Utilization is peaking at 60 percent for a few seconds, with large gaps of time between peaks. However, in this instance, locate the reason for the peak. Determine if the problem might get worse or if you can isolate it.

If the 30 percent utilization peaks start occurring very close together, your network starts showing signs of degraded performance.

Status Watch calculates Ethernet utilization in this way:

  in_util + out_util

where:

  in_util = ( ((rate of ifInOctets)*8) / ((linespeed)*0.9875) )*100
  out_util = ( ((rate of ifOutOctets)*8) / ((linespeed)*0.9875) )*100

The factor of 8 converts octets to bits. The factor of 0.9875 offsets the interframe gap.
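The ATM and Ethernet formulas can be tried out with a short Python sketch. It is an illustration under stated assumptions, not Status Watch code: the function names are hypothetical, the line speed is assumed to be expressed in bits per second, the "rate" of a counter is assumed to be the change between two polls divided by the polling interval, and counter wrap is ignored.

```python
INTERFRAME_FACTOR = 0.9875  # factor from the formulas that offsets the interframe gap


def octet_rate(prev_count, curr_count, seconds):
    """Octets per second between two polls of ifInOctets or ifOutOctets.

    Assumption: the counter did not wrap between the two polls.
    """
    return (curr_count - prev_count) / seconds


def directional_util(rate_octets_per_s, linespeed_bps):
    """Percent utilization in one direction (in_util or out_util)."""
    return (rate_octets_per_s * 8) / (linespeed_bps * INTERFRAME_FACTOR) * 100


def ethernet_util(in_rate, out_rate, linespeed_bps):
    """Ethernet utilization: in_util + out_util."""
    return (directional_util(in_rate, linespeed_bps)
            + directional_util(out_rate, linespeed_bps))


def atm_util(in_rate, out_rate, linespeed_bps):
    """ATM utilization: the greater of in_util and out_util."""
    return max(directional_util(in_rate, linespeed_bps),
               directional_util(out_rate, linespeed_bps))


if __name__ == "__main__":
    # Hypothetical 10 Mbps port polled twice, 10 seconds apart.
    in_rate = octet_rate(1_000_000, 2_234_375, 10)   # 123437.5 octets/s
    out_rate = octet_rate(5_000_000, 7_468_750, 10)  # 246875.0 octets/s
    print(round(ethernet_util(in_rate, out_rate, 10_000_000), 2))
    print(round(atm_util(in_rate, out_rate, 10_000_000), 2))
```

With these sample counters the receive direction works out to about 10 percent and the transmit direction to about 20 percent, so the Ethernet formula reports roughly 30 percent while the ATM formula reports roughly 20 percent for the same traffic.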
  105. FDDI Utilization

FDDI accepts utilization levels that are equivalent to its rated speed. Unlike Ethernet, FDDI does not have the delays and problems that collisions cause. The best way to determine high FDDI utilization is to know the normal capacity of your FDDI network. Generally, if your FDDI network is consistently reporting 90 percent or more utilization, plan to balance the load on your network.

Status Watch calculates FDDI utilization in this way:

  (1 - (delta(token_count)*latency) / delta(time) )*100

Token Ring Utilization

Token ring media accepts utilization levels equivalent to its rated speed. Unlike Ethernet, token ring does not have delays and problems caused by collisions. The best way to determine high token ring utilization is to know the normal capacity of your token ring network. Generally, if your token ring network is consistently reporting 90 percent or more utilization, plan to balance the load on your network.

Status Watch calculates token ring utilization in this way:

  ( (rate*8) / (speed) )*100

where:

  rate = ifInOctets / delta(time)
  speed = line speed of 4 or 16 Mbps

The factor of 8 converts octets to bits.
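A small Python sketch of the FDDI and token ring formulas follows. The function names are hypothetical, and the units are assumptions that the formulas do not state explicitly: `latency` is taken to be the ring latency in seconds, `delta(time)` is in seconds, the token ring octet count is the number of octets received during the interval, and the ring speed is converted from Mbps to bits per second.

```python
def fddi_util(delta_token_count, latency_s, delta_time_s):
    """FDDI utilization: (1 - (delta(token_count)*latency) / delta(time)) * 100.

    Assumption: latency_s is the ring latency in seconds, so the product
    delta_token_count * latency_s is the time the ring spent idle.
    """
    return (1 - (delta_token_count * latency_s) / delta_time_s) * 100


def token_ring_util(octets_in_interval, delta_time_s, speed_mbps):
    """Token ring utilization: ((rate*8) / speed) * 100.

    Assumption: speed_mbps is 4 or 16 and is converted to bits per second.
    """
    if speed_mbps not in (4, 16):
        raise ValueError("token ring line speed must be 4 or 16 Mbps")
    rate = octets_in_interval / delta_time_s  # octets per second
    return (rate * 8) / (speed_mbps * 1_000_000) * 100


if __name__ == "__main__":
    # Hypothetical samples: 50,000 token rotations in 1 s on a ring with
    # 10 microseconds of latency, and 200,000 octets in 1 s on a 16 Mbps ring.
    print(round(fddi_util(50_000, 0.00001, 1.0), 2))
    print(round(token_ring_util(200_000, 1.0, 16), 2))
```

In the FDDI case, the more often the token completes an idle rotation during the interval, the lower the computed utilization, which is why the token count is subtracted from 1.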
  106. CHAPTER 9: BROADCAST STORMS

Use these sections to identify and eliminate broadcast storms:

  - Broadcast Storms Overview
  - Identifying a Broadcast Storm
  - Disabling the Offending Interface
  - Correcting Spanning Tree Misconfigurations

See “Broadcast Storms Reference” for additional conceptual and problem analysis detail.

Broadcast Storms Overview

A broadcast storm means that your network is overwhelmed with constant broadcast or multicast traffic. Broadcast storms can eventually lead to a complete loss of network connectivity as the packets proliferate.

Some devices, like the CoreBuilder™ 2500 and CoreBuilder 3500, have firewall protection against broadcast storms. If a certain broadcast transmit threshold is reached, the port drops all broadcast traffic. Firewalls are one of the best ways to protect your network against broadcast storms. Determine whether your network devices support this functionality.

Understanding the Problem

“Broadcast Packets” and “Multicast Packets” are a normal part of your network’s operation. To recognize a storm, you must be able to identify when broadcast and multicast traffic is abnormal for your network.

Identifying the Problem

You may suspect that a broadcast storm is occurring when your network response times become extremely slow and network operations are timing out. As a broadcast storm progresses, users cannot log in to servers or access e-mail. As the storm worsens, the network becomes unusable.
  107. When your network is operating normally, monitor the percentage of broadcast and multicast traffic. You can then use this data as a baseline to determine when broadcast and multicast traffic is too high. The process of identifying the problem is discussed in “Identifying a Broadcast Storm”.

Solving the Problem

Storms can occur if network equipment is faulty or configured incorrectly, if the Spanning Tree Protocol is not implemented correctly, or if poorly designed programs that generate broadcast or multicast traffic are used. The process for solving the problem is discussed in these sections:

  - Disabling the Offending Interface
  - Correcting Spanning Tree Misconfigurations

Identifying a Broadcast Storm

When identifying broadcast storms, use the following applications:

  - Status Watch: To recognize when broadcast and multicast traffic exceeds the normal rates for your network
  - Traffix Manager: To monitor all broadcast traffic over time

Status Watch

Using the Status Watch tools in Table 15, you can identify when and where a broadcast storm is occurring.

Table 15  Status Watch Tools Used for Identifying Broadcast Storms

Tool                  What It Indicates
Broadcast Receive     The percentage of broadcast and multicast traffic received on an Ethernet port or token ring port
Broadcast Transmit    The percentage of broadcast and multicast traffic transmitted from an Ethernet port or token ring port
Ethernet Utilization  The aggregate percentage of utilization of an Ethernet segment as calculated by tracking the receive and transmit utilizations of Ethernet ports

For the Broadcast Receive and Broadcast Transmit tools, if the value for receive utilization is less than 10 percent, Status Watch ignores the high
  108. rate of broadcast traffic. This way, a broadcast problem is not falsely triggered in Status Watch for a segment on which a majority of traffic is spanning tree or Routing Information Protocol (RIP) packets.

Follow these steps:

1 Use the Summary View window to examine the Broadcast Transmit tool and Broadcast Receive tool to determine if any thresholds have been exceeded on your monitored devices. These tools work together in this way:
  - If the thresholds for both the Broadcast Transmit tool and Broadcast Receive tool are exceeded on a device, then a broadcast storm is occurring on your network, and this device is receiving and transmitting the broadcast traffic.
  - If the threshold for the Broadcast Receive tool is exceeded but the Broadcast Transmit tool reports normal data on a device, then a broadcast storm is probably occurring on the segment that is attached to the interface that reports the excessive traffic, but this device might have a filter (such as a multicast packet firewall) that prevents the storm from propagating.
  - If the threshold for the Broadcast Transmit tool is exceeded but the Broadcast Receive tool reports normal data on a device, then the device is responsible for the broadcast storm.
2 Examine the Asynchronous Transfer Mode (ATM), Ethernet, Fiber Distributed Data Interface (FDDI), and token ring utilization tools to determine if their reported rates are abnormally high. If so, traffic is flooding the network. See “Bandwidth Utilization” for more information.
3 Look for packet loss (see “Ethernet Packet Loss”) as an additional indicator that a broadcast storm is occurring. Increased collisions occur as the network becomes saturated.

After you set a baseline for normal network activity, you can set the Broadcast Transmit tool and Broadcast Receive tool thresholds to alert you when broadcast and multicast traffic is heavier than normal.
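The way the Broadcast Transmit and Broadcast Receive tools work together can be written as a small decision function. The Python below is a sketch of the rules described in step 1, not Status Watch code: the names `Diagnosis`, `threshold_exceeded`, and `diagnose` are hypothetical, and applying the 10 percent receive-utilization floor inside `threshold_exceeded` is an assumption about how that rule is evaluated.

```python
from enum import Enum


class Diagnosis(Enum):
    NORMAL = "no broadcast threshold exceeded"
    STORM_THROUGH_DEVICE = "storm in progress; device receives and transmits it"
    STORM_ON_ATTACHED_SEGMENT = (
        "probable storm on the attached segment; device may be filtering it"
    )
    DEVICE_IS_SOURCE = "device is responsible for the storm"


MIN_RECEIVE_UTIL_PCT = 10.0  # below this receive utilization, high broadcast rates are ignored


def threshold_exceeded(broadcast_pct, threshold_pct, receive_util_pct):
    """True if a broadcast tool should report an exceeded threshold.

    Assumption: the 10 percent floor is checked before the threshold.
    """
    if receive_util_pct < MIN_RECEIVE_UTIL_PCT:
        return False
    return broadcast_pct > threshold_pct


def diagnose(tx_exceeded, rx_exceeded):
    """Combine the Broadcast Transmit and Broadcast Receive results for one device."""
    if tx_exceeded and rx_exceeded:
        return Diagnosis.STORM_THROUGH_DEVICE
    if rx_exceeded:
        return Diagnosis.STORM_ON_ATTACHED_SEGMENT
    if tx_exceeded:
        return Diagnosis.DEVICE_IS_SOURCE
    return Diagnosis.NORMAL


if __name__ == "__main__":
    # Hypothetical readings: 80% broadcast received, 5% transmitted, 20% thresholds.
    rx = threshold_exceeded(80.0, 20.0, receive_util_pct=45.0)
    tx = threshold_exceeded(5.0, 20.0, receive_util_pct=45.0)
    print(diagnose(tx, rx).value)
```

The threshold values themselves are placeholders; in practice they would come from the baseline that you set for normal network activity.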
Traffix Manager

Using Traffix Manager, you can monitor all broadcast traffic to identify exactly which devices are generating broadcast traffic.
  109. Follow these steps:

1 Using the Select Database Traffic to Load dialog box, retrieve data to the Map using the 6-Hourly or Hourly data resolution. Finer resolutions take longer to load from the database to the Map. However, they are more suitable for in-depth analysis of network traffic than the daily or weekly resolutions. For quicker retrieval of finer resolution data, select a shorter time range.
2 Open the Protocol Selection dialog box and set all protocols to appear as Other:
  a Click Clear All to deselect all protocols.
  b Click the Other checkbox to select it without selecting any child protocols.
  c Set the Protocol Filter Mode to Unselected protocols are added to parent.
3 In the Map, select MAC Labels to display devices by their MAC addresses.
4 Use the Find Objects tool to locate the broadcast MAC address ff:ff:ff:ff:ff:ff and select it from the Object List or Map.
5 From the Display menu, select Show Conversations To and From to display all traffic that is going to and from the broadcast MAC address.
6 Set the Map all objects button to Map connected objects.
7 To create a list of the devices that are sending broadcast traffic to the broadcast address, right-click the Traffix group and select Visible Device List….
8 To generate a baseline of broadcast traffic:
  a Right-click the Traffix root group and select Protocol Distribution.
  b Select Packets and the timeline graph format.
9 To generate a list of the Top-N sources of broadcast traffic:
  a Right-click the Traffix root group and select Child Top N.
  b Select Packets and the bar graph format.
  c Set Top N to an appropriate value.

The Top-N list can indicate which interface is starting the storm and which interfaces are propagating the storm.
  110. Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1 In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic. You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker’s Enter the Address You Want to Find field.
2 Click Find Now. Search displays the device name.
3 Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device’s console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

  - Someone disables Spanning Tree on a port
  - You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder™ 6000.
  111. To disable the STP port state for a port on a SuperStack II switch:

1 Select a port and click the right mouse button.
2 From the shortcut menu, select Configure.
3 In the Port section, click the STP tab.
4 From the STP Port State list box, select Disabled.
5 Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1 Double-click the module.
2 From the shortcut menu, select Configure Bridge.
3 In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP); IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol.

Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.
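When you review captured traffic, it can help to sort destination MAC addresses into broadcast, multicast, and unicast. The Python sketch below is an illustration only: the function name `classify_mac` is hypothetical, and it assumes addresses written in canonical (Ethernet) notation, where the group bit is the least significant bit of the first octet. Token ring and FDDI tools may display addresses in a different bit order.

```python
BROADCAST = b"\xff" * 6  # ff:ff:ff:ff:ff:ff


def classify_mac(mac):
    """Classify a destination MAC address as broadcast, multicast, or unicast.

    Assumption: the address is in canonical (Ethernet) notation, with
    octets separated by ':' or '-'.
    """
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError("expected six octets, got %r" % mac)
    octets = bytes(int(part, 16) for part in parts)
    if octets == BROADCAST:
        return "broadcast"
    if octets[0] & 0x01:  # group (multicast) bit set
        return "multicast"
    return "unicast"


if __name__ == "__main__":
    for address in ("ff:ff:ff:ff:ff:ff",   # broadcast address
                    "01:80:c2:00:00:00",   # Spanning Tree bridge group address
                    "00:a0:24:12:34:56"):  # hypothetical station address
        print(address, classify_mac(address))
```

Counting packets per class over a normal day gives a simple baseline against which broadcast and multicast storms stand out.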
  112. CHAPTER 10: DUPLICATE ADDRESSES
Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:
- Duplicate Addresses Overview
- Finding Duplicate MAC Addresses
- Finding Duplicate IP Addresses
See “Duplicate Addresses Reference” for additional conceptual and problem analysis detail.
Duplicate Addresses Overview
Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.
Understanding the Problem
Duplicate MAC addresses are caused by data link layer problems with the Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:
- Duplicate MAC Addresses
- Duplicate IP Addresses
Identifying the Problem
Identify duplicate MAC and IP addresses by following the instructions in these sections:
- Finding Duplicate MAC Addresses
- Finding Duplicate IP Addresses
Solving the Problem
Identify the cause of the duplicate address (such as user error or a hardware problem), and fix the problem, if possible.
  113. 116 CHAPTER 10: DUPLICATE ADDRESSES Finding Duplicate To find out if duplicate MAC addresses are occurring, monitor your MAC Addresses network using Status Watch. Status Watch The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition). Follow these steps: 1 In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window. Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded. 2 Select a device. s If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window. s If you selected the device from the main window, also select the FDDI Status tool to view the real-time report. 3 Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device. In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition. Finding Duplicate IP To find out if duplicate IP addresses are occurring, monitor your network Addresses using these applications: s Address Tracker — To find duplicate IP addresses on 3Com devices and their attached networks. s LANsentry Manager® — To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices. Address Tracker Use Address Tracker to determine when and where duplicate IP addresses occur.
  114. Duplicate Addresses Reference 117 Follow these steps: 1 From the Find Address menu, select Find Duplicate IP Addresses. 2 Click Find Now to start your search. LANsentry Manager Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software. Follow these steps: 1 From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table. 2 To export the contents of the table, click Export to launch the Data Export dialog box. Duplicate This section explains terms that are relevant to duplicate addresses and Addresses provides additional conceptual and problem analysis detail. Reference Duplicate MAC Each device on your network has a unique MAC address. This address Addresses identifies a single device on the network, allowing packets to be delivered to correct destinations. Packets are delivered to their destinations by means of MAC-address-to-IP address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other. Duplicate MAC addresses can occur for the following reasons: s Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device. s In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.
  115. - On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.
A router that maps the same MAC address to more than one IP address creates a valid network configuration. These MAC address assignments are not considered duplicate MAC addresses. Burned-in addresses (BIAs), which are MAC addresses that a vendor permanently gives to a device, are always unique.
Duplicate IP Addresses
Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately. Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:
- Dynamic Host Configuration Protocol (DHCP): Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.
- Bootstrap Protocol (BootP): Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.
- Reverse ARP (RARP): Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
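The duplicate-address rules in this chapter can be illustrated with a short sketch. It is not how Address Tracker or LANsentry Manager work internally; the function name `find_duplicate_ips` and the sample address pairs are hypothetical. It flags an IP address claimed by more than one MAC address, and it deliberately ignores one MAC address that maps to several IP addresses, because the text treats that as a valid router configuration.

```python
from collections import defaultdict


def find_duplicate_ips(bindings):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `bindings` is an iterable of (ip, mac) pairs, for example rows exported
    from an address table. The input format is an assumption of this sketch.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in bindings:
        # Normalize case so the same MAC written two ways is counted once.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations: 10.0.0.5 is claimed by two different MACs.
# The last two rows are one MAC with two IPs (a router), which is valid.
observed = [
    ("10.0.0.5", "00:A0:24:11:22:33"),
    ("10.0.0.5", "00:a0:24:44:55:66"),
    ("10.0.0.1", "00:a0:24:aa:bb:cc"),
    ("10.0.1.1", "00:a0:24:aa:bb:cc"),
]
duplicates = find_duplicate_ips(observed)
```

Grouping by IP address rather than by MAC address is what keeps the router case out of the result.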
  116. ETHERNET PACKET LOSS 11 Use these sections to identify and correct Ethernet packet loss: s Ethernet Packet Loss Overview s Searching for Packet Loss See “Ethernet Packet Loss Reference” for additional conceptual and problem analysis detail. Ethernet Packet If your Ethernet network shows signs of congestion, it may be Loss Overview experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to “Collisions” are often at the heart of packet loss. Understanding the Collisions are normal in Ethernet networks. In many cases, Collision rates Problem of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable. When the Collision rates increase, so do “Excessive Collisions”, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as “FCS Errors”, are probably increasing. The real packet problems to watch for, however, are undetected collisions that show up as “Late Collisions”. If small packets are colliding, you do not necessarily see a rise in utilization, but you may still have a problem. Capture packets to determine their size.
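As a rough illustration of the thresholds mentioned above, the following sketch labels a Collision rate against the 50 percent and 70 percent figures from the text. The way the rate is computed (collisions divided by packets sent), the function names, and the labels are assumptions made for the example; your management tools may calculate the rate differently.

```python
def collision_rate_percent(collisions, packets_sent):
    """Collision rate as a percentage (assumed definition: collisions / packets sent)."""
    if packets_sent <= 0:
        return 0.0
    return 100.0 * collisions / packets_sent


def assess_collision_rate(rate, upper_limit=70.0, watch_level=50.0):
    """Label a Collision rate using the approximate figures quoted in the text.

    Above `upper_limit` the network can become unreliable. Rates around
    `watch_level` often do not reduce throughput much but are worth watching.
    """
    if rate > upper_limit:
        return "unreliable"
    if rate >= watch_level:
        return "watch"
    return "normal"


# Hypothetical counters: 800 collisions for 1000 packets sent is an 80 percent rate.
status = assess_collision_rate(collision_rate_percent(800, 1000))
```

Both limits are parameters because the text describes them as typical values, not fixed rules.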
  117. 120 CHAPTER 11: ETHERNET PACKET LOSS Identifying the To identify that your network’s problem is related to packet loss, verify Problem that frames are being dropped on your network by examining this packet loss data: s Alignment Errors s Collisions s Excessive Collisions s FCS Errors s CRC Errors s Late Collisions s Receive Discards s Too Long Errors s Too Short Errors s Transmit Discards The process of identifying the problem is discussed in “Searching for Packet Loss”. Solving the Problem If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network’s utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include: s Faulty connectors or improper cabling s Excessive numbers of repeaters between network devices s Defective Ethernet transceivers or controllers Possible solutions to these problems are explained in the procedures in “Searching for Packet Loss”. Searching for When you look for packet loss, use the following applications: Packet Loss s Status Watch — For Ethernet and MIB-II data collection using SNMP polling s LANsentry Manager Network Statistics Graph — For RMON data collection using an RMON probe
  118. - Device View: On a per-device basis, you can evaluate statistics for any port on the device.
Status Watch
Status Watch monitors:
- Alignment Errors
- Excessive Collisions
- FCS Errors
- Receive Discards
- Transmit Discards
Follow these steps:
1 Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded. Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see “Nonstandard Ethernet Problems”.
Table 16 Alignment Errors, FCS Errors, and CRC Errors Data
Possible Problem | Possible Action
Faulty cabling | Examine the cable and cable connections for breaks or damage.
Network noise | Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).
Faulty transceiver | Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.
Fault at the transmitting end station | 1 Locate the source of the errors by looking at the module and port statistics. 2 Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port. 3 If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.
  119. Table 16 Alignment Errors, FCS Errors, and CRC Errors Data (continued)
Possible Problem | Possible Action
Station powering up or down | None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.
Faulty adapter | Replace the adapter.
2 Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.
Table 17 Collisions and Excessive Collisions Data
Possible Problem | Possible Action
Busy network | Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.
Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions. | Isolate each adapter to see if the problem stops.
Network loop | Ensure that no redundant connections to the same station have both connections active simultaneously.
3 Determine if the Receive Discards and Transmit Discards tool thresholds are being exceeded. If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.
LANsentry Manager Network Statistics Graph
Use the LANsentry® Manager Network Statistics graph to view data for:
- Collisions
- Late Collisions
  120. Searching for Packet Loss 123 s Bandwidth Utilization s CRC Errors s Too Long Errors s Too Short Errors Follow these steps: 1 Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance. This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment. 2 Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component. s If Utilization rates are high — The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load. s If Utilization rates are stable and appear normal — The collisions are probably caused by faulty components. In this case, do the following: s If the network consists of repeaters — Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters “repeat” traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions. s On other networks — Determine the segment cable length. 3 Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.
  121. Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.
Table 18 Late Collisions Data
Possible Problem | Possible Action
Cabling problems: | Correct the cabling problem by doing one or more of the following:
- Segment too long | - Reduce the segment length.
- Failing cable | - Replace the cable.
- Segment not grounded properly (noise) | - Ground the cable.
- Improper termination | - Terminate the cable correctly.
- Taps too close (10BASE-5 and 10BASE-2 only) | - Check the taps.
- Noisy cable | - Check for cables too close to equipment that emits electromagnetic interference.
Component problems: | Correct the component problem by doing one of the following:
- Deaf or partially deaf node | - Trace the failing component and replace it.
- Failing repeater, transceiver, or controller cards | - Replace the NIC or the transceiver.
4 Trace Too Short Errors and Too Long Errors to the sender. These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.
Table 19 Too Long Errors and Too Short Errors Data
Possible Problem | Possible Action
A transceiver on your network is adding bits to the packets that are transmitted by the attached station. | 1 Use a network analyzer to identify the problematic transceiver. 2 If necessary, replace the transceiver, network adapter, or station.
The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station. | Replace the network card.
  122. Table 19 Too Long Errors and Too Short Errors Data (continued)
Possible Problem | Possible Action
Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise. | Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed, and you have ruled out other problems, manually configure the network speed.
Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions) | Notify the manufacturer.
Faulty LAN driver | Replace the driver.
A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder™ 5000 FastModule. If you use maximum-sized, 1518-byte Ethernet frames, the device’s VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. | These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
Device View
Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.
  123. 126 CHAPTER 11: ETHERNET PACKET LOSS : Table 20 Activity and Error Statistics in Device View Statistics Group Description Use in Troubleshooting Activity Displays the total This data shows readable packets, broadcast network activity packets, “Collisions”, total errors, and runts, and errors on the which cause “Too Short Errors”. You can selected port. interpret this data in the following ways: s The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers). s Runts can also be caused by a badly terminated coax cable. s Large numbers of runts, not associated with high levels of collisions, can indicate a transmission problem (examine the cable). s Particularly high numbers of Collisions, compared to the total number of readable packets, can point to a hardware problem (a bad adapter) or to a data loop. s A high proportion of “Broadcast Packets” (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network. Errors Displays the The significance of errors depends on number of frames accompanying errors and prevailing network with errors on the conditions. See the following error data for more selected port. information: s Alignment Errors, Table 16 s FCS Errors, Table 16 s Too Long Errors, Table 19 s Too Short Errors or runts, Table 19 s Late Collisions, Table 18 To display Activity and Errors statistics for a device or port, follow these steps: 1 Select the required port or device. 2 From the shortcut menu, select Activity or Errors.
  124. Ethernet Packet Loss Reference 127 The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information. You may not be able to access these statistics on some devices using Device View. See the Device View documentation for additional information. Ethernet Packet This section explains terms that are relevant to Ethernet packet loss and Loss Reference provides additional conceptual and problem analysis detail. Alignment Errors An Alignment Error indicates a received frame in which both are true: s The number of bits received is an uneven byte count (that is, not an integral multiple of 8) s The frame has a Frame Check Sequence (FCS) error. Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See “FCS Errors” for more information about interpreting Alignment Errors. Collisions Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations. The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded. CRC Errors A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines “FCS Errors” and “Alignment Errors”. These errors indicate that packets were received with: s A bad FCS and an integral number of octets (FCS Errors) s A bad FCS and a non-integral number of octets (Alignment Errors)
  125. CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station’s network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
Excessive Collisions
Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network’s performance. See “Knowing Your Network’s Configuration” for more information.
FCS Errors
Frame Check Sequence (FCS) Errors, a type of CRC error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame’s bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors. Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.
Late Collisions
Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision.
Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other’s transmission until the entire packet is on the network. Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
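The timing argument above can be worked through numerically. The sketch below follows the simplified comparison in the text: a collision goes undetected when the propagation time across the network is longer than the time needed to put the whole packet on the wire. The 10 Mbps bit rate and the 60 µs propagation delay are illustrative assumptions, and the function names are hypothetical.

```python
def transmit_time_us(frame_octets, bit_rate_mbps=10.0):
    """Time in microseconds to put a frame on the wire (1 Mbps = 1 bit per microsecond)."""
    return frame_octets * 8 / bit_rate_mbps


def collision_goes_undetected(frame_octets, propagation_delay_us, bit_rate_mbps=10.0):
    """True when the sender finishes transmitting before the other signal can arrive."""
    return propagation_delay_us > transmit_time_us(frame_octets, bit_rate_mbps)


# A minimum-size 64-octet frame takes 51.2 microseconds at 10 Mbps.
# A maximum-size 1518-octet frame takes 1214.4 microseconds.
delay_us = 60.0  # hypothetical delay on an over-long segment
small_lost = collision_goes_undetected(64, delay_us)
large_lost = collision_goes_undetected(1518, delay_us)
```

With these assumed numbers the small frame is lost silently while the sender of the large frame is still transmitting when the collision arrives, which matches the point that small packets are the first to suffer on an over-long segment.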
  126. Nonstandard Ethernet Problems
Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.
Table 21 Symptoms of Common Ethernet Network Problems
Symptoms | Problem | Notes
“FCS Errors” and “Alignment Errors” increase significantly. | Network cabling is too long. | If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.
FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits). | Network segment is noisy. | Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called “Too Short Errors”, the number of runt packets should be significantly higher than normal.
FCS and Alignment Errors are much higher than normal. | Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD). | Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.
Receive Discards
Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.
Too Long Errors A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.
  127. 130 CHAPTER 11: ETHERNET PACKET LOSS Too Short Errors A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed. Transmit Discards Transmit Discards indicate that packets were not transmitted because of network congestion.
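The frame-level definitions in this reference section can be summarized in a small classifier. This is only a sketch of the definitions as stated here, not the logic of any particular probe or agent. The function names are hypothetical, and the handling of a frame with a good FCS and leftover bits (its whole-octet length is used) is an assumption, because the text does not define that case.

```python
MIN_OCTETS = 64    # shorter frames are Too Short Errors (runts), FCS octets included
MAX_OCTETS = 1518  # longer frames are Too Long Errors, FCS octets included


def classify_frame(bit_length, fcs_ok):
    """Classify a received frame from its length in bits and its FCS check result."""
    whole_octets, extra_bits = divmod(bit_length, 8)
    if not fcs_ok:
        # Bad FCS: a non-integral octet count makes it an Alignment Error,
        # an integral octet count makes it an FCS Error.
        return "Alignment Error" if extra_bits else "FCS Error"
    # Good FCS: only the length is checked (assumption: extra bits are ignored).
    if whole_octets > MAX_OCTETS:
        return "Too Long Error"
    if whole_octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"


def is_crc_error(label):
    """The RMON CRC Errors statistic combines FCS Errors and Alignment Errors."""
    return label in ("FCS Error", "Alignment Error")


labels = [
    classify_frame(64 * 8, True),
    classify_frame(1519 * 8, True),
    classify_frame(63 * 8, True),
    classify_frame(100 * 8, False),
    classify_frame(100 * 8 + 3, False),
]
```

Checking the FCS result before the length mirrors the "otherwise well formed" wording in the Too Long and Too Short definitions.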
  128. FDDI RING ERRORS 12 Use these sections to identify and correct FDDI ring errors: s FDDI Ring Errors Overview s Identifying Ring Errors See “FDDI Ring Errors Reference” for additional conceptual and problem analysis detail. FDDI Ring Errors Fiber Distributed Data Interface (FDDI) often corrects its own problems. Overview However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors. Understanding FDDI ring errors that you should monitor include: the Problem s Elasticity Buffer Error Condition s Frame Error Condition s Frames Not Copied Condition s Link Error Condition Identifying First determine the type of FDDI ring errors and where they are occurring. the Problem Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring. Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.
  129. FDDI deals with port-related errors as follows:
- The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port. The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.
- When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A “Link Error Condition” is also generated.
FDDI deals with MAC-related errors as follows:
- When MAC frame errors reach a certain threshold, a “Frame Error Condition” is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.
- For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
See “Identifying Ring Errors” for more information.
Solving the Problem
To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.
Identifying Ring Errors
Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.
Status Watch
Use Status Watch to identify FDDI ring errors.
  130. FDDI Ring Errors Reference 133 Follow these steps: 1 Monitor the FDDI Status tool for the currently selected device. 2 Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device. FDDI Ring Errors This section provides additional conceptual and problem analysis detail. Reference Elasticity Buffer Error The Elasticity Buffer Error condition occurs when a port’s elasticity buffer Condition overflows or underflows. This condition usually indicates that a port’s hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port. Frame Error The Frame Error condition occurs when the percentage of frames that Condition contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space and packets are being dropped from the device’s backbone port. The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring. Frames Not Copied The Frames Not Copied condition occurs when the percentage of frames Condition that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion: s Add more capacity to the station s Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch s Filter out certain traffic
  131. 134 CHAPTER 12: FDDI RING ERRORS Link Error Condition The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a “MAC Neighbor Change Event” (which also occurs if a ring wraps). Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter. MAC Neighbor The MAC Neighbor Change event occurs when a MAC’s upstream or Change Event downstream neighbor changes. This event indicates either: s A network reconfiguration s Another station that is leaving or joining the ring
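The port and MAC thresholds described in this chapter (PORTlerAlarm, PORTlerCutoff, and a sustained MACFrameErrorRatio above 0.1 percent) can be sketched as simple checks. The sketch treats every value as a plain error rate or percentage for illustration; how a given device encodes these variables is not covered here. The function names and the three-sample window used for "sustained" are assumptions.

```python
def port_ler_action(ler, alarm, cutoff):
    """Return the action described in the text for a port's link error rate (LER).

    The alarm threshold must be lower than the cutoff threshold so that a
    warning arrives before the port is removed from the ring.
    """
    if alarm >= cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"  # SMT breaks the connection; Link Error Condition
    if ler > alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"


def sustained_frame_error(ratios_percent, limit=0.1, min_samples=3):
    """True when the last `min_samples` frame error ratios all exceed `limit`.

    Short spikes (up to 50 percent during network configuration) are ignored
    because only a sustained ratio above 0.1 percent points to a problem.
    """
    recent = ratios_percent[-min_samples:]
    return len(recent) == min_samples and all(r > limit for r in recent)


# Hypothetical thresholds and readings.
actions = [port_ler_action(ler, alarm=1e-8, cutoff=1e-7) for ler in (1e-9, 5e-8, 1e-7)]
problem = sustained_frame_error([50.0, 0.05, 0.2, 0.3, 0.4])
```

Requiring several consecutive samples is one simple way to separate a configuration-time spike from a real fault between the reporting station and its upstream MAC.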
  132. NETWORK FILE SERVER TIMEOUTS 13 Use these sections to identify and correct timeouts on network file servers: s Network File Server Timeout Overview s Looking for Obvious Errors s Reproducing the Fault While Monitoring the Network s Correcting the Fault See “Network File Server Timeouts Reference” for additional conceptual and problem analysis detail. Network File Server A network file server can time out if your network gets congested or if Timeout Overview your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem. Understanding the When users log in, their stations make network file server calls, either to Problem determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server. EXAMPLE: UNIX users notice that it takes a long time — over 30 seconds in some cases — to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.
  133. Identifying the Problem

First, rule out the obvious causes. Ask these questions:
- Can you access the network file server with Telnet?
- Have any alarms been triggered?
- Are there any new errors?

The process of identifying the problem is developed in “Looking for Obvious Errors”.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem. The solutions to the network file server timeout are identified in these sections:
- Reproducing the Fault While Monitoring the Network
- Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:
- Ping and Telnet: To check connectivity to the network file server nodes
- LANsentry Manager Alarms View: To search for triggered alarms
- LANsentry Manager Statistics View: To look for errors
- LANsentry Manager History View: To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using “Ping” and “Telnet”. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered. Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.
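The Ping and Telnet check described above can be approximated from a script. The following is a minimal sketch, not part of the Transcend tools: it times a TCP connection to the Telnet port and sorts the result into the outcomes the text discusses. The function names, the 1-second `slow_after` threshold, and the extra "unreachable" outcome are assumptions chosen for illustration.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure.

    Port 23 is the usual Telnet port; the timeout is an arbitrary choice.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def classify(connect_seconds, slow_after=1.0):
    """Map a connect time onto the outcomes discussed in the text.

    "slow"   -> suspect the connections to the node
    "normal" -> the delay is probably occurring elsewhere
    "unreachable" is an extra case added for this sketch.
    """
    if connect_seconds is None:
        return "unreachable"
    if connect_seconds > slow_after:
        return "slow"
    return "normal"
```

A call such as `classify(tcp_connect_time("fileserver.example"))` (the host name is a placeholder) gives a quick first answer before you move on to the LANsentry Manager views.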
  134. Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low levels of errors.

Before you monitor your network with LANsentry Manager, you should have already set up alarms for obvious errors related to MAC events and loading problems. See “Setting Thresholds and Alarms” for more information.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics. Follow these steps:
1 Set up a graph that shows utilization and errors on all your major segments.
2 Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting “Too Long Errors” and “FCS Errors” roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: The history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

If you notice low error rates that are not triggering alarms, use a recent history of the network to see if the errors occur in regular bursts and to estimate the average number of errors.
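The last point, checking a recent history for regular bursts and estimating the average number of errors, can be sketched in a few lines. This is an illustration only: it assumes you have exported the per-sample error counts from a history table into a plain list (oldest first), and the function and key names are invented for the example.

```python
def summarize_errors(samples):
    """Summarize a list of per-sample error counts, oldest first.

    Returns the average errors per sample, the number of samples that
    contained errors, and the spacing between error samples if that
    spacing is perfectly regular (otherwise None).
    """
    total = sum(samples)
    # Indexes of the samples in which at least one error was recorded.
    with_errors = [i for i, count in enumerate(samples) if count > 0]
    # Distance, in samples, between consecutive error samples.
    gaps = [later - earlier for earlier, later in zip(with_errors, with_errors[1:])]
    # Require at least two equal gaps before calling the pattern regular.
    regular = gaps[0] if len(gaps) >= 2 and len(set(gaps)) == 1 else None
    return {
        "average_per_sample": total / len(samples) if samples else 0.0,
        "samples_with_errors": len(with_errors),
        "regular_interval": regular,
    }
```

For a segment that reports errors roughly every second sample, as HUB3 does in the EXAMPLE, `regular_interval` would come back as 2 even though no single sample is high enough to trip an alarm.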
  135. Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:
- LANsentry Manager Top-N Graph: To locate a quiet node to use for reproducing the fault
- LANsentry Manager Packet Capture: To capture packets from the hub to which the quiet node is attached
- LANsentry Manager Packet Decode: To analyze the packets to assess network file server traffic and delays
- Address Tracker: To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node, Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See “Network File System (NFS) Protocol” for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions. Follow these steps:
1 Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.
  136. 2 Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.
3 When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer. By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content. Follow these steps:
1 Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred. When you Telnet into a node, the traffic that the Telnet operation generates appears in the capture buffer. Expect this traffic when you read the buffer.
2 Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that roughly matches the delay you experienced when recreating the problem).
3 Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
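The "gap in the conversation" that step 2 asks you to look for can also be searched for mechanically once the decoded packets are available as records. The sketch below is a hypothetical illustration, not a LANsentry Manager feature: the `Packet` fields (including `request_id`, standing in for whatever identifier ties a reply to its request) and the 1-second `min_gap` are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical decoded-packet record; field names are assumptions.
Packet = namedtuple("Packet", "number time src dst request_id is_reply")


def find_retry_gaps(packets, min_gap=1.0):
    """Find requests that were resent because no reply was seen.

    Returns a list of (first_packet_number, retry_packet_number, gap_seconds)
    for every request repeated at least `min_gap` seconds after the original.
    """
    pending = {}  # (src, dst, request_id) -> most recent unanswered request
    gaps = []
    for pkt in packets:
        if pkt.is_reply:
            # A reply travels from the original destination back to the source.
            pending.pop((pkt.dst, pkt.src, pkt.request_id), None)
            continue
        key = (pkt.src, pkt.dst, pkt.request_id)
        earlier = pending.get(key)
        if earlier is not None and pkt.time - earlier.time >= min_gap:
            gaps.append((earlier.number, pkt.number, pkt.time - earlier.time))
        pending[key] = pkt
    return gaps
```

Applied to a capture like the one in the EXAMPLE, the function would report the packet numbers of the original NFS request and its repeat, together with a gap of about 30 seconds.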
Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.
  137. EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

If you need to take immediate action to resolve this problem for your users, move all the network file servers to different hubs. This quick fix reduces the number of timeouts.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode. Follow these steps:
1 To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in “LANsentry Manager Packet Capture”. Because a delay may occur on a different node, use two capture buffers without stopping the first one. Note the rough packet count where the delay appears.
2 Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node’s reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors.
To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.
  138. EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a “Jabbering” node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

NFS is a distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.
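To make the FCS failure in the EXAMPLE more concrete, the following sketch shows how a frame check sequence catches data that another transmitter has written over a valid packet. It is a simplified illustration under stated assumptions: it uses Python's `zlib.crc32` (a CRC-32 of the kind used for the Ethernet FCS), appends the checksum in little-endian order as an assumed byte ordering, and models jabber as a short repeated byte pattern. The function names and the payload are invented.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the frame (byte order assumed for this sketch)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the frame body and compare it with the trailer."""
    if len(frame) < 4:
        return False
    body, received = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == received


def overwrite(frame: bytes, start: int, pattern: bytes) -> bytes:
    """Simulate another node writing a repeated pattern over part of the frame."""
    damaged = bytearray(frame)
    damaged[start:start + len(pattern)] = pattern
    return bytes(damaged)


# Made-up payload standing in for an NFS reply.
good_frame = append_fcs(b"NFS reply payload" * 4)
bad_frame = overwrite(good_frame, 10, b"\x55" * 4)
```

Here `fcs_ok(good_frame)` holds while `fcs_ok(bad_frame)` does not, which is why a device that checks the FCS drops the damaged reply and the requester eventually times out and retries.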
  140. CHAPTER 14: MEASURING ATM NETWORK PERFORMANCE

Measuring the performance of your Asynchronous Transfer Mode (ATM) network is an important step in establishing appropriate metrics for the desired level of operation. Using these metrics, you can establish a baseline and measure future performance.

This chapter describes how to use the Enterprise VLAN Manager to measure ATM network performance. The following topics are described:
- Measuring Traffic Performance
- Measuring Device Level Performance
- Measuring Port Level Performance
- LANE Component Statistics

Measuring Traffic Performance

Use the Utilization tool icon to launch and view traffic statistics between two or more switches. The Utilization tool can be configured to collect, display, and store information about good or bad traffic patterns across the network. In addition, information on new switches is automatically collected as soon as they are added to the network.

The tool’s browser displays the ATM switch and Ethernet switch hierarchies separately. You can add ATM or Ethernet switches to the Utilization Map using the Add button. After you add the switches to the Utilization map, data collection starts automatically.

The Configure option allows you to customize the traffic polling, communications, and map configuration settings. These settings provide the base of information reported by the history portion of the tool. You can view historical data collected and stored by the Utilization Tool as line graphs, pie charts, and bar graphs.

Utilization Map

The Utilization maps display the traffic patterns between switches. ATM switches are displayed as circular icons that are also pie charts that
  141. represent the in and out User-Network Interface (UNI) traffic that corresponds to that switch. The upper portion of the pie represents the maximum percentage of bandwidth utilization of the in/out UNI traffic. The lower portion of the circle represents the maximum percentage of speed of the in UNI traffic and appears in magenta. The IP address is displayed below the switch icon.

Displaying Link Traffic

The lines between the switch icons represent switch-to-switch links. The traffic load on each link is dynamically updated and is represented by a unique color. To view the legend information, select the Map Legend from the Map menu of the Utilization map. The links are color coded according to the following legend:
- 0 to 1 percent: White
- 1 to 3 percent: Green
- 3 to 10 percent: Blue
- 10 to 20 percent: Yellow
- 20 to 100 percent: Red

Displaying Node Configuration

Select the switch in the Utilization Map and then select Node Configuration from the Map menu. The following static parameters of the switch are shown:
- Name
- IP address
- ATM address

Configuring the Utilization Tool

The Utilization tool has a complete set of configuration options. To configure the Utilization tool, select the Configuration option from the Map menu of the Utilization Tool. You must restart the Utilization tool for the changes to take effect.

Map Configuration

Use the Map tab to configure the size of the switch icon as well as the layout of the map itself. Included in this form is the Switch Radius option, which allows you to modify the switch icon radius. The default radius is 32. You can select from one of three layout options:
  142.
- None: Disables the automatic map layout.
- Rectangular: The map icons display in a rectangle.
- Circular: The map icons display in a circle.

The Max% Traffic option allows you to set the maximum percentage traffic rate represented on a switch-to-switch link.

Polling Configuration

The Polling tab allows you to configure the polling interval for data collection of the Utilization tool. Select the following:
- Map Enable: Check to enable the dynamic updating of traffic on the Utilization map.
- Chart Enable: Check to enable the dynamic updating of node and link performance charts.
- Polling, Seconds: Select a polling interval for data collection.

Communication Configuration

Use the Communication Configuration tab to configure the type of data that is monitored and collected. Select the following:
- Good Cells: Configures the Utilization tool to collect data on Good Cells.
- Bad Cells: Configures the Utilization tool to collect data on Bad Cells, such as errored (BIP) and unrecognized ATM cells.

Measuring Device Level Performance

The Performance Statistics windows display performance statistics for different objects in the network. The Performance Statistics windows are “live” and updated automatically by continuous polling of the system. An object can be a device (for example, a SuperStack II Switch 2700 or CoreBuilder module), device port (Ethernet or ATM), Emulated LAN entity (LEC or LES), or Virtual Channel. The windows use history graphs, bar charts, pie charts, and dials to display the performance information. Polling and logging features can be accessed using the Options menu.

Using the History Graph

Use the history graph to track device performance over a specified period of time. This metric is useful for spotting trends in performance and isolating downturns in device operation. Position the cross-hairs at a desired point on the history graph and click the left mouse button. The
  143. detailed information about this sample point appears in the lower left corner of the graph. This information includes sample number, sample time, sample graph, and sample value. When you are in the individual sample display mode, click the right mouse button to return to the normal display mode.

Displaying Statistics

To display statistics:
1 Select a device in one of the management maps, or click a branch of the subtree in the Component View of the Topology tool.
2 Select Graph from the Enterprise VLAN menu or click the Graph icon.

Measuring Port Level Performance

Port level statistics are useful for isolating heavy-traffic ports. By knowing this information, you can determine bottlenecks and reshape network traffic on a port-by-port basis. This information is also useful for determining Virtual LAN (VLAN) structure and prioritization rationale.

Identify high traffic ports so that you can take the proper steps to either isolate or integrate alternative traffic shaping patterns. Doing so is a valuable and necessary step in troubleshooting for peak network performance.

To display port statistics:
1 Select an element in the Enterprise branch subtree in the Topology tool, or select a port on the device view front panel display.
2 Select Graph from the Enterprise VLAN menu or select the Graph icon.

Traffic

A History graph shows frames per second through the port. The performance window contains four separate sub-graphs:

Table 22 Traffic Graphs
Graph      Meaning
inGood     All valid frames received at the port
inError    Errored frames received at the port
outGood    All valid frames transmitted from the port
outError   Errored frames transmitted from the port

Utilization

A Dial graph shows maximum Utilization (10Mbps) of the port.
  144. Total Frames

A Pie chart shows the distribution of all received and transmitted frames:

Table 23 Total Frames
Graph      Meaning
inGood     All valid frames received at the port
inError    Errored frames received at the port
outGood    All valid frames transmitted from the port
outError   Errored frames transmitted from the port

Good Frames

A Pie chart shows the distribution of valid received and transmitted frames:

Table 24 Good Frames
Graph        Meaning
inUcast      Unicast frames received at the port excluding discards
inNonUcast   Broadcast and multicast frames received at the port excluding discards
outUcast     Unicast frames transmitted from the port including discards
outNonUcast  Broadcast and multicast frames transmitted from the port including discards

Errored Frames

A Pie chart shows the distribution of errored received and transmitted frames:

Table 25 Errored Frames
Graph        Meaning
inDiscards   Frames received at the port but discarded for internal reasons
inErrors     Frames received at the port but discarded due to errors
inUnknown    Frames received at the port but discarded due to unknown protocols
outDiscards  Frames discarded from being transmitted from the port for internal reasons
outErrors    Frames discarded from being transmitted from the port due to errors
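The dial and pie charts above are derived from simple arithmetic on port counters. The sketch below shows one plausible way to compute them; it is not taken from the Enterprise VLAN Manager. It assumes you have the change in an octet counter over a known interval and a dictionary of frame counters keyed by the names in Tables 22 through 25, and the function names are invented.

```python
def utilization_percent(octets_delta, seconds, link_bps=10_000_000):
    """Utilization of a link over one interval, as a percentage.

    octets_delta: octets counted during the interval
    seconds:      length of the interval
    link_bps:     link speed in bits per second (10 Mbps assumed, as on the dial)
    """
    return 100.0 * octets_delta * 8 / (seconds * link_bps)


def shares(counters):
    """Convert raw frame counters into pie-chart percentages."""
    total = sum(counters.values())
    if total == 0:
        # No traffic in the interval: every slice is empty.
        return {name: 0.0 for name in counters}
    return {name: 100.0 * value / total for name, value in counters.items()}
```

For example, 1,250,000 octets in 10 seconds on a 10 Mbps port works out to 10 percent utilization, and counters of 900 inGood, 100 inError, 950 outGood, and 50 outError give Total Frames slices of 45, 5, 47.5, and 2.5 percent.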
  145. LANE Component Statistics

The LANE Component Statistics allow you to measure the performance of LAN Emulation Servers (LES) and LAN Emulation Clients (LEC) in the network. You can display statistics for the following LAN Emulation components:
- LES
- LEC
- LANE User

LES Statistics

The LES performance statistics show the type of load that exists on the LAN Emulation Servers. Use this information for load balancing when required. The LES performance statistics are as follows:
- Data: History graph of the transmission rate of Broadcast and Unknown data (BUS) in the Emulated LAN.
- Data Utilization: Utilization of the transmission rate of the BUS service relative to the maximum possible.
- Control Frames: A Pie graph of the quantity of LE ARPs and other LAN Emulation control frames handled by the LES.
- Errored Control Frames: A Pie graph of errored control frames.
- Data/Control Octets: A Pie graph of the ratio between the LES transmission rate and the BUS transmission rate.

To display performance statistics for a LAN Emulation Server:
1 Select an LES icon from the LAN Emulation map or an LES device component (found in the Backbone and Services subtree) in the Component View of the Topology tool.
2 Select Graph from the Enterprise VLAN menu or select the Graph icon.

LEC Statistics

The LEC Graph displays statistics of the message traffic through the LEC. The LEC Statistics are:
- Data frames/sec: History graph of the transmission rate of data frames through the LEC.
- Data Frames: Pie graph of the distribution of different types of data frames through the LEC.
  146.
- Data Utilization: Utilization of the LEC data transmission rate relative to the maximum possible rate.
- Control frames/sec: History graph of the transmission rate of control frames through the LEC.
- Control Frames: Pie graph of the ratio of transmission of different types of LEC control frames.
- Data/Control Frames: Pie graph of the ratio between LEC data frame transmission and LEC control frame transmission.

To display performance statistics for a LAN Emulation Client:
1 Select an LEC icon from the management maps or an LEC device component in the Component View of the Topology tool.
2 Select Graph from the Enterprise VLAN menu or use the Graph icon.

LANE User

The LANE User statistics parameters show the in traffic and out traffic on the LEC and its segments. You can choose to display all or some of the LEC groups in the LANE User statistics.

To display performance statistics for a LANE User:
1 Select the LANE User icon from the management maps or a LANE User device component in the Component View of the Topology tool.
2 Select Graph from the Enterprise VLAN menu or select the Graph icon.

Double-click the graph to zoom into one or more of the graphs.
  148. PART IV: REFERENCE

Chapter 15 SNMP in Network Troubleshooting
Chapter 16 Information Resources
  149. CHAPTER 15: SNMP IN NETWORK TROUBLESHOOTING

The Simple Network Management Protocol (SNMP) and the Management Information Bases (MIBs) it uses are important for troubleshooting your network. These sections provide information about:
- SNMP Operation
- SNMP MIBs

SNMP Operation

SNMP, which is one of the most widely used management protocols, allows management communication between network devices and your management workstation across TCP/IP internets. Most management applications, including the Status Watch and Address Tracker applications, require SNMP to perform their management functions.

Manager/Agent Operation

SNMP communication requires a manager (the station that is managing network devices) and an agent (the software in the devices that communicates with the management station). SNMP provides the language and the rules that the manager and agent use to communicate.

Managers can discover agents:
- Through autodiscovery tools on “Network Management Platforms” (such as HP OpenView Network Node Manager)
- When you manually enter IP addresses of the devices that you want to manage

For agents to discover their managers, you must provide the agents with the IP addresses of the management stations.

Managers send requests to agents (either to send information or to set a parameter), and agents provide the requested data or set the parameter.
  150. Agents can also notify the managers independently through unsolicited trap messages, which indicate that certain events have occurred.

SNMP Messages

SNMP supports queries (called messages) that allow the protocol to transmit information between the managers and the agents. Types of SNMP messages:
- Get and Get-next: The management station requests an agent to report information.
- Set: The management station requests an agent to change one of its parameters.
- Get Responses: The agent responds to a Get, Get-next, or Set operation.
- Trap: The agent sends an unsolicited message to notify the management station that an event has occurred.

MIBs define what can be monitored and controlled within a device (that is, what the manager can Get and Set). An agent can implement one or more groups from one or more MIBs. See “SNMP MIBs” for more information.

Trap Reporting

Traps are unsolicited, asynchronous events that devices generate to indicate status changes. Every agent supports some trap reporting. You must configure trap reporting at the devices so that these events are reported to your management station to be used by the “Network Management Platforms” (such as HP OpenView Network Node Manager) and the “Transcend Applications”.

Not all traps are important for your management tasks. To decrease the burden on the management station and on your network, you can limit the number and type of traps reported to the management station.

MIBs are not required to document traps. SNMP supports the limited number of traps defined in Table 26. More traps may be defined in vendors’ private MIBs.

Table 26 Traps Supported by SNMP
Trap        Indication
Cold Start  The agent has started or been restarted.
Warm Start  The agent’s configuration has changed.
  151. Table 26 Traps Supported by SNMP (continued)
Trap                    Indication
Link Down               The status of an attached communication interface has changed from up to down.
Link Up                 The status of an attached communication interface has changed from down to up.
Authentication Failure  The agent received a request from an unauthorized manager.
EGP Neighbor Loss       In routers running the Exterior Gateway Protocol (EGP), an EGP Neighbor has changed to a down state.

To minimize SNMP traffic on your network, you can implement trap-based polling. Trap-based polling allows the management station to start polling only when it receives certain traps. Your management applications must support trap-based polling for you to take advantage of this feature.

Security

SNMP uses community strings as a form of management security. To enable management communication, the manager must use the same community strings that are configured on the agent. You can define both read and read/write community strings.

Because community strings are sent unencrypted in User Datagram Protocol (UDP) packets, packet capture tools can easily access this information. As you would with any password, change the community strings frequently. See “SNMP Community Strings” for more information.

SNMP MIBs

SNMP MIBs include MIB-II, other standard MIBs (such as the RMON MIB), and vendors’ private MIBs (such as enterprise MIBs from 3Com). These MIBs and their objects are part of the MIB tree.

MIB Tree

The MIB tree is a structure that groups MIB objects in a hierarchy and uses an abstract syntax notation to define manageable objects. Each item on the tree is assigned a number (shown in parentheses after each item), which creates the path to objects in the MIB. See Figure 18. This path of numbers is called the object identifier (OID). Each object is uniquely and unambiguously identified by the path of numeric values.
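Returning to the trap-based polling idea described after Table 26, the sketch below shows how a management station might decide which agents to poll based on the traps it receives. It is an illustration only, not how any Transcend application is implemented. The numeric values are the generic-trap codes from the SNMPv1 specification; the class name, the choice of trigger traps, and the example addresses are assumptions.

```python
from enum import IntEnum


class GenericTrap(IntEnum):
    """The six standard traps of Table 26 with their SNMPv1 generic-trap codes."""
    COLD_START = 0
    WARM_START = 1
    LINK_DOWN = 2
    LINK_UP = 3
    AUTHENTICATION_FAILURE = 4
    EGP_NEIGHBOR_LOSS = 5


# Assumed policy for this sketch: only these traps start a polling cycle.
POLL_TRIGGERS = {GenericTrap.COLD_START, GenericTrap.LINK_DOWN, GenericTrap.LINK_UP}


class TrapBasedPoller:
    """Queue an agent for polling only when it sends a trap of interest."""

    def __init__(self, triggers=POLL_TRIGGERS):
        self.triggers = set(triggers)
        self.to_poll = []  # agent addresses waiting to be polled, in arrival order

    def on_trap(self, agent_address, trap_code):
        """Record a received trap; return True if it triggers polling."""
        trap = GenericTrap(trap_code)
        if trap not in self.triggers:
            return False
        if agent_address not in self.to_poll:
            self.to_poll.append(agent_address)
        return True
```

With this policy, a Link Down trap from an agent queues that agent for polling, while an Authentication Failure trap is noted but does not generate any polling traffic.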
  152. When you perform an SNMP Get operation, the manager sends the OID to the agent, which in turn determines whether the OID is supported. If the OID is supported, the agent returns information about the object.

For example, to retrieve an object from the RMON MIB, the software uses this OID:
1.3.6.1.2.1.16
which indicates this path:
iso(1).indent-org(3).dod(6).internet(1).mgmt(2).mib(1).RMON(16)
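The relationship between the labeled path and the dotted OID, and the way an agent answers a Get, can be sketched as follows. This is a toy model for illustration, not a real SNMP implementation: `oid_from_path`, `dotted`, and `ToyAgent` are invented names, and the stored values are made up.

```python
import re


def oid_from_path(path):
    """Extract the numbers in parentheses from a labeled MIB path."""
    return tuple(int(n) for n in re.findall(r"\(\s*(\d+)\s*\)", path))


def dotted(oid):
    """Render an OID tuple in dotted notation."""
    return ".".join(str(n) for n in oid)


class ToyAgent:
    """A stand-in agent holding a few objects keyed by OID tuple."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def get(self, oid):
        """Return the value for a supported OID, or None if unsupported."""
        return self.objects.get(tuple(oid))

    def get_next(self, oid):
        """Return (oid, value) for the first object after `oid`, or None.

        Python compares tuples element by element, which matches the
        ordering of OIDs in the MIB tree.
        """
        later = sorted(o for o in self.objects if o > tuple(oid))
        return (later[0], self.objects[later[0]]) if later else None


rmon = oid_from_path(
    "iso(1).indent-org(3).dod(6).internet(1).mgmt(2).mib(1).RMON(16)"
)
agent = ToyAgent({
    (1, 3, 6, 1, 2, 1, 1, 1, 0): "example device description",  # made-up value
    (1, 3, 6, 1, 2, 1, 1, 5, 0): "HUB3",                        # made-up value
})
```

Here `dotted(rmon)` reproduces the OID quoted above, `agent.get(...)` returns a value only for OIDs the agent holds, and `agent.get_next(...)` walks to the next supported object, which is how a manager can browse a subtree without knowing its contents in advance.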
  153. Figure 18 MIB Tree Showing Key SNMP MIBs

ROOT
  ccit(0)
  iso(1)
    standard(0)
    reg-authority(1)
    member-body(2)
    indent-org(3)
      dod(6)
        internet(1)
          directory(1)
          mgmt(2)
            mib(1)
              MIB-II (1-11): system(1), interfaces(2), at(3), ip(4), icmp(5), tcp(6), udp(7), egp(8), transmission(10), snmp(11)
              RMON(16)
              RMON2(17)
          experimental(3)
          private(4)
            enterprises(1)
              3Com® enterprise MIBs: a3Com(43), chipcom(49), retix(72), synernetics(114), onstream(135), startek(260)
  joint(2)

MIB-II

MIB-II defines various groups of manageable objects that contain device statistics as well as information about the device, device status, and the number and status of interfaces.

The MIB-II data is collected from network devices using SNMP. This data is collected in its raw form. To be useful, the data must be interpreted by a management application, such as Status Watch.
  154. MIB-II, the only MIB that has reached Internet Engineering Task Force (IETF) standard status, is the one MIB that all SNMP agents are likely to support.

Table 27 lists the MIB-II object groups. The number in parentheses indicates the group’s branch in the MIB subtree. MIB-I supports groups 1 through 8; MIB-II supports groups 1 through 8, plus two additional groups.

Table 27 SNMP MIB-II Group Descriptions
MIB-II Group      Purpose
system(1)         Operates on the managed node
interfaces(2)     Operates on the network interface (for example, a port or MAC) that attaches the device to the network
at(3)             Was used for address translation in MIB-I but is no longer needed in MIB-II
ip(4)             Operates on the Internet Protocol (IP)
icmp(5)           Operates on the Internet Control Message Protocol (ICMP)
tcp(6)            Operates on the Transmission Control Protocol (TCP)
udp(7)            Operates on the User Datagram Protocol (UDP)
egp(8)            Operates on the Exterior Gateway Protocol (EGP)
transmission(10)  Applies to media-specific information (implemented in MIB-II only)
snmp(11)          Operates on SNMP (implemented in MIB-II only)

RMON MIB

RMON is an SNMP MIB that enables the collection of data about the network itself, rather than about devices on the network. A typical RMON system consists of two components:
- Probe: Connects to a LAN segment, examines all the LAN traffic on that segment, and keeps a summary of statistics (including historical data) in the probe’s local memory. The probe can stand alone or be embedded within the agent software. See “Other Commonly Used Tools” and “3Com SmartAgent Embedded Software” for more information.
- Management station: Communicates with the probe and collects the summarized data from it. The station can be on a different network from the probe and can manage the probe through either in-band or out-of-band connections.
  155. The IETF definition for the RMON MIB specifies several groups of information. These groups are described in Table 28.

Table 28 RMON Group Descriptions
RMON Group      Description
Statistics(1)   Total LAN statistics
History(2)      Time-based statistics for trend analysis
Alarm(3)        Notices that are triggered when statistics reach predefined thresholds
Hosts(4)        Statistics stored for each station’s MAC address
HostTopN(5)     Stations ranked by traffic or errors
Matrix(6)       Map of traffic communication among devices (that is, who is talking to whom)
Filter(7)       Packet selection mechanism
Capture(8)      Traces of packets according to predefined filters
Event(9)        Reporting mechanisms for alarms
Token Ring(10)  Includes the following:
                - Ring Station: Statistics and status information associated with each token ring station on the local ring, which also includes status information for each ring being monitored
                - Ring Station Order: Location of stations on monitored rings
                - Source Routing Statistics: Utilization statistics derived from source routing information optionally present in token ring packets

RMON2 MIB

RMON and RMON2 are complementary MIBs. The RMON2 MIB extends the capability of the original RMON MIB to include protocols above the MAC level. Because network-layer protocols (such as IP) are included, a probe can monitor traffic through routers attached to the local subnetwork.

Use RMON2 data to identify traffic patterns and slow applications. The RMON2 probe can monitor:
- The sources of traffic arriving through a router from another network
- The destination of traffic leaving through a router to another network

Because it includes higher-layer protocols (such as those at the application level), an RMON2 probe can provide a detailed breakdown of traffic by application.
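The Matrix and HostTopN groups in Table 28 are easier to picture with a small model of what a probe accumulates. The following sketch is a simplified illustration, not the RMON data model itself: it assumes each observed frame is reduced to a (source, destination, octets) tuple, and the function names are invented.

```python
from collections import Counter


def build_matrix(frames):
    """Accumulate octets per (source, destination) pair: who is talking to whom."""
    matrix = Counter()
    for src, dst, octets in frames:
        matrix[(src, dst)] += octets
    return matrix


def top_n_hosts(matrix, n):
    """Rank stations by total octets sent plus received, busiest first."""
    per_host = Counter()
    for (src, dst), octets in matrix.items():
        per_host[src] += octets
        per_host[dst] += octets
    return per_host.most_common(n)
```

Given frames between stations A, B, and C, `build_matrix` yields the conversation totals and `top_n_hosts` the busiest stations. Reading the same ranking from the bottom is one way to find a quiet node of the kind used earlier to reproduce the file server timeout.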
  156. 160 CHAPTER 15: SNMP IN NETWORK TROUBLESHOOTING

Table 29 lists the additional MIB groups that are available with RMON2.

Table 29 RMON2 Group Descriptions

RMON2 Group                     Description
Protocol Directory(11)          Lists the inventory of protocols that the probe can monitor
Protocol Distribution(12)       Collects the number of octets and packets for protocols detected on a network segment
Address Map(13)                 Lists MAC-address-to-network-address bindings discovered by the probe, and the interface on which the bindings were last seen
Network Layer Host(14)          Counts the amount of traffic sent from and to each network address discovered by the probe
Network Layer Matrix(15)        Counts the amount of traffic sent between each pair of network addresses discovered by the probe
Application Layer Host(16)      Counts the amount of traffic, by protocol, sent from and to each network address discovered by the probe
Application Layer Matrix(17)    Counts the amount of traffic, by protocol, sent between each pair of network addresses discovered by the probe
User History(18)                Periodically samples user-specified variables and logs the data based on user-defined parameters
Probe Configuration(19)         Defines standard configuration parameters for RMON probes

3Com Enterprise MIBs

3Com Enterprise MIBs allow you to manage unique and advanced functionality of 3Com devices. MIB names and numbers are usually retained when organizations restructure their businesses; therefore, many of the 3Com Enterprise MIB names do not contain the word “3Com.” Figure 18 shows some of the 3Com Enterprise MIB names and numbers.
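The Protocol Distribution and Application Layer Host groups in Table 29 can be pictured as simple tallies. The sketch below counts packets and octets per protocol, and octets per network address and protocol in each direction. It is a simplified model under stated assumptions: the packet records, the addresses, the protocol labels, and the function names are all hypothetical, and a real RMON2 probe identifies protocols from its Protocol Directory rather than from a ready-made label.

```python
from collections import defaultdict

# Hypothetical packet summaries: (protocol label, source address,
# destination address, octets). All values are invented for illustration.
packets = [
    ("http", "10.0.0.5", "10.0.1.9", 1200),
    ("http", "10.0.1.9", "10.0.0.5", 300),
    ("nfs", "10.0.0.7", "10.0.1.9", 800),
]


def protocol_distribution(packets):
    """Tally packets and octets per protocol (Protocol Distribution idea)."""
    dist = defaultdict(lambda: {"packets": 0, "octets": 0})
    for protocol, _src, _dst, octets in packets:
        dist[protocol]["packets"] += 1
        dist[protocol]["octets"] += octets
    return dict(dist)


def application_layer_host(packets):
    """Tally octets sent and received per (address, protocol) pair."""
    hosts = defaultdict(lambda: {"out": 0, "in": 0})
    for protocol, src, dst, octets in packets:
        hosts[(src, protocol)]["out"] += octets
        hosts[(dst, protocol)]["in"] += octets
    return dict(hosts)
```

With the sample records, the `http` entry in the distribution holds 2 packets and 1500 octets, and the address `10.0.1.9` shows 800 octets of incoming `nfs` traffic. This is the sort of per-application breakdown that helps identify traffic patterns and slow applications.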
  157. CHAPTER 16: INFORMATION RESOURCES

This section lists the information resources that can help you troubleshoot problems with your network. It contains:

- Books
- URLs

Books

The books listed in Table 30 can help you with network troubleshooting.

Table 30 Reference Books

IBM’s Token-Ring Networking Handbook (J. Ranade Series on Computer Communications)
    Author: George C. Sackett
    Publisher: McGraw Hill Text
    ISBN: 0070544182
    Publish Date: June 1993

Interconnections: Bridges and Routers (Addison-Wesley Professional Computing Series)
    Author: Radia Perlman
    Publisher: Addison-Wesley Publishing Co.
    ISBN: 0201563320
    Publish Date: May 1992

Internetworking with TCP/IP: Design, Implementation, and Internals
    Authors: Douglas E. Comer, David L. Stevens
    Publisher: Prentice Hall
    Edition: 2nd
    ISBN: 0131255274
    Publish Date: June 1994

Internetworking with TCP/IP: Principles, Protocols, and Architecture
    Author: Douglas E. Comer
    Publisher: Prentice-Hall
    Edition: 3rd
    ISBN: 0132169878
    Publish Date: April 1995

(continued)
  158. 162 CHAPTER 16: INFORMATION RESOURCES

Table 30 Reference Books (continued)

Managing Switched Local Area Networks.
    Author: Darryl Black
    Publisher: Addison Wesley Longman, Inc.
    ISBN: 0201185547
    Publish Date: November 1997

Network Management Standards: SNMP, CMIP, TMN, MIBs, and Object Libraries (McGraw-Hill Computer Communications)
    Author: Uyless Black
    Publisher: McGraw Hill Text
    Edition: 2nd
    ISBN: 007005570X
    Publish Date: November 1994

The Complete Guide to Netware LAN Analysis
    Authors: Laura A. Chappell, Dan E. Hakes
    Publisher: Sybex
    Edition: 3rd
    ISBN: 0782119034
    Publish Date: July 1996

The Simple Book: An Introduction to Networking Management
    Author: Marshall Rose
    Publisher: Prentice-Hall
    Edition: 2nd
    ISBN: 0134516591
    Publish Date: 1996

Token Ring Network Design (Data Communications and Networks)
    Author: David Bird
    Publisher: Addison-Wesley Publishing Co.
    ISBN: 0201627604
    Publish Date: July 1994

Troubleshooting TCP/IP (Network Troubleshooting Library)
    Author: Mark A. Miller
    Publisher: M & T Books
    Edition: 2nd
    ISBN: 1558514503
    Publish Date: June 1996

URLs

The following uniform resource locators (URLs) lead to Web sites that are useful for network troubleshooting:

- www.3Com.com - 3Com Corporation’s Web site, which contains:
    - The latest release notes and documentation for all 3Com products. Documents are organized in the Support area by product type.
    - White papers and other technical documents about networking technology and solutions.
    - 3Com product information.
    - The 3Com Shopping Network.
  159. URLs 163

- wwwhost.ots.utexas.edu/ethernet/ethernet-home.html - Charles Spurgeon’s Ethernet Web Site, which includes Ethernet troubleshooting information.
- techweb.cmp.com/nc/netdesign/series.htm - Network Computing Online’s Interactive Network Design Manual, which helps you to design and troubleshoot networks.
- www.nmf.org - Network Management Forum (NMF), a nonprofit global consortium that promotes and accelerates the worldwide acceptance and implementation of a common, service-based approach to the management of networked information systems.
- www.ovforum.org - HP OpenView Forum’s Web site. HP OpenView Forum is a nonprofit corporation formed by the largest licensees of Hewlett-Packard OpenView to represent the interests of HP OpenView users and developers world-wide. The Forum is an independent corporation, not affiliated with Hewlett-Packard Company.
- hpcc920.external.hp.com/openview/index.html - HP OpenView home page.
- www.iol.unh.edu/index.html - University of New Hampshire InterOperability Lab (IOL) web site. Information on IOL consortiums, test suites, and technology tutorials.
- www.3com.com/nsc/500251.html - Location of the document RMON Methodology: Towards Successful Deployment for Distributed Enterprise Management by John McConnell of McConnell Consulting, published in 1997.

These URLs are known to work; however, URLs are subject to change without notice.
