Network Troubleshooting Tools
By Joseph D. Sloan

Publisher: O'Reilly
Pub Date: August 2001
ISBN: 0-596-00186-X
Pages: 364

Network Troubleshooting Tools helps you sort through the thousands of tools that have been developed for debugging TCP/IP networks and choose the ones that are best for your needs. It also shows you how to approach network troubleshooting using these tools, how to document your network so you know how it behaves under normal conditions, and how to think about problems when they arise so you can solve them more effectively.
Table of Contents

Preface
  Audience
  Organization
  Conventions
  Acknowledgments
Chapter 1. Network Management and Troubleshooting
  1.1 General Approaches to Troubleshooting
  1.2 Need for Troubleshooting Tools
  1.3 Troubleshooting and Management
Chapter 2. Host Configurations
  2.1 Utilities
  2.2 System Configuration Files
  2.3 Microsoft Windows
Chapter 3. Connectivity Testing
  3.1 Cabling
  3.2 Testing Adapters
  3.3 Software Testing with ping
  3.4 Microsoft Windows
Chapter 4. Path Characteristics
  4.1 Path Discovery with traceroute
  4.2 Path Performance
  4.3 Microsoft Windows
Chapter 5. Packet Capture
  5.1 Traffic Capture Tools
  5.2 Access to Traffic
  5.3 Capturing Data
  5.4 tcpdump
  5.5 Analysis Tools
  5.6 Packet Analyzers
  5.7 Dark Side of Packet Capture
  5.8 Microsoft Windows
Chapter 6. Device Discovery and Mapping
  6.1 Troubleshooting Versus Management
  6.2 Device Discovery
  6.3 Device Identification
  6.4 Scripts
  6.5 Mapping or Diagramming
  6.6 Politics and Security
  6.7 Microsoft Windows
Chapter 7. Device Monitoring with SNMP
  7.1 Overview of SNMP
  7.2 SNMP-Based Management Tools
  7.3 Non-SNMP Approaches
  7.4 Microsoft Windows
Chapter 8. Performance Measurement Tools
  8.1 What, When, and Where
  8.2 Host-Monitoring Tools
  8.3 Point-Monitoring Tools
  8.4 Network-Monitoring Tools
  8.5 RMON
  8.6 Microsoft Windows
Chapter 9. Testing Connectivity Protocols
  9.1 Packet Injection Tools
  9.2 Network Emulators and Simulators
  9.3 Microsoft Windows
Chapter 10. Application-Level Tools
  10.1 Application-Protocols Tools
  10.2 Microsoft Windows
Chapter 11. Miscellaneous Tools
  11.1 Communications Tools
  11.2 Log Files and Auditing
  11.3 NTP
  11.4 Security Tools
  11.5 Microsoft Windows
Chapter 12. Troubleshooting Strategies
  12.1 Generic Troubleshooting
  12.2 Task-Specific Troubleshooting
Appendix A. Software Sources
  A.1 Installing Software
  A.2 Generic Sources
  A.3 Licenses
  A.4 Sources for Tools
Appendix B. Resources and References
  B.1 Sources of Information
  B.2 References by Topic
  B.3 References
Colophon
Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved.

Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a basilisk and network troubleshooting is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
Preface

This book is not a general introduction to network troubleshooting. Rather, it is about one aspect of troubleshooting—information collection. This book is a tutorial introduction to tools and techniques for collecting information about computer networks. It should be particularly useful when dealing with network problems, but the tools and techniques it describes are not limited to troubleshooting. Many can and should be used on a regular basis regardless of whether you are having problems.

Some of the tools I have selected may be a bit surprising to many. I strongly believe that the best approach to troubleshooting is to be proactive, and the tools I discuss reflect this belief. Basically, if you don't understand how your network works before you have problems, you will find it very difficult to diagnose problems when they occur. Many of the tools described here should be used before you have problems. As such, these tools could just as easily be classified as network management or network performance analysis tools.

This book does not attempt to catalog every possible tool. There are simply too many tools already available, and the number is growing too rapidly. Rather, this book focuses on the tools that I believe are the most useful, a collection that should help in dealing with almost any problem you see. I have tried to include pointers to other relevant tools when there wasn't space to discuss them. In many cases, I have described more than one tool for a particular job. It is extremely rare for two tools to have exactly the same features. One tool may be more useful than another, depending on circumstances. And, because of the differences in operating systems, a specific tool may not be available on every system. It is worth knowing the alternatives.

The book is about freely available Unix tools. Many are open source tools covered by GNU- or BSD-style licenses. In selecting tools, my first concern has been availability. I have given the highest priority to the standard Unix utilities. Next in priority are tools available as packages or ports for FreeBSD or Linux. Tools requiring separate compilation or available only as binaries were given a lower priority since these may be available on fewer systems. In some cases, PC-only tools and commercial tools are noted but are not discussed in detail. The bulk of the book is specific to Ethernet and TCP/IP, but the general approach and many of the tools can be used with other technologies.

While this is a book about Unix tools, at the end of most of the chapters I have included a brief section for Microsoft Windows users. These sections are included since even small networks usually include a few computers running Windows. These sections are not, even in the wildest of fantasies, meant to be definitive. They are provided simply as starting points—a quick overview of what is available.

Finally, this book describes a wide range of tools. Many of these tools are designed to do one thing and are often overlooked because of their simplicity. Others are extremely complex tools or sets of tools. I have not attempted to provide a comprehensive treatment for each tool discussed. Some of these tools can be extremely complex when used to their fullest. Some have manuals and other documentation that easily exceed the size of this book. Most have additional documentation that you will want to retrieve once you begin using them.

My goal is to make you aware of the tools and to provide you with enough information that you can decide which ones may be the most useful to you and in what context so that you can get started using the tools. Each chapter centers on a collection of related tasks or problems and tools useful for dealing with these tasks. The discussion is limited to features that are relevant to the problem being discussed. Consequently, the same tool may be discussed in several places throughout the book.
Please be warned: the suitability or behavior of these tools on your system cannot be guaranteed. While the material in this book is presented in good faith, neither the author nor O'Reilly & Associates makes any explicit or implied warranty as to the behavior or suitability of these tools. We strongly urge you to assess and evaluate these tools as appropriate for your circumstances.

Audience

This book is written primarily for individuals new to network administration. It should also be useful to those of you who have inherited responsibility for existing systems and networks set up by others. This book is designed to help you acquire the additional information you need to do your job.

Unfortunately, the book may also appeal to crackers. I truly regret this and wish there were a way to present this material to limit its worth to crackers. I never met a system manager or network administrator who wasn't overworked. Time devoted to security is time stolen from providing new services to users or improving existing services. There simply is no valid justification for cracking. I can only hope that the positive uses for the information I provide will outweigh the inevitable malicious uses to which it may be put. I would feel much better if crackers would forgo buying this book.

In writing this book, I attempted to write the sort of book I often wished I had when I was learning. Certainly, there are others who are more knowledgeable and better prepared to write this book. But they never seemed to get around to it. They have written pieces of this book, a chapter here or a tutorial there, for which I am both immensely thankful and greatly indebted.

I see this book as a work in progress. I hope that the response to it will make future expanded editions possible. You can help by sending me your comments and corrections. I would particularly like to hear about new tools and about how you have used the tools described here to solve your problems. Perhaps some of the experts who should have written this book will share their wisdom! While I can't promise to respond to your email, I will read it. You can contact me through O'Reilly Book Support at booktech@oreilly.com.

Organization

There are 12 chapters and 2 appendixes in this book. The book begins with individual network hosts, discusses network connections next, and then considers networks as a whole.

It is unlikely that every chapter in the book will be of equal interest to you. The following outline will give you an overview of the book so you can select the chapters of greatest interest and either skim or skip over the rest.

Chapter 1

This chapter attempts to describe network management and troubleshooting in an administrative context. It discusses the need for network analysis and probing tools, their appropriate and inappropriate uses, professionalism in general, documentation practices, and the economic ramifications of troubleshooting. If you are familiar with the general aspects of network administration, you may want to skip this chapter.

Chapter 2

Chapter 2 is a review of tools and techniques used to configure or determine the configuration of a networked host. The primary focus is on built-in utilities. If you are well versed in Unix system administration, you can safely skip this chapter.

Chapter 3

Chapter 3 describes tools and techniques to test basic point-to-point and end-to-end network connectivity. It begins with a brief discussion of cabling. A discussion of ping, ping variants, and problems with ping follows. Even if you are very familiar with ping, you may want to skim over the discussion of the ping variants.

Chapter 4

This chapter focuses on assessing the nature and quality of end-to-end connections. After a discussion of traceroute, a tool for decomposing a path into individual links, the primary focus is on tools that measure link performance. This chapter covers some lesser known tools, so even a seasoned network administrator may find a few useful tools and tricks.

Chapter 5

This chapter describes tools and techniques for capturing traffic on a network, primarily tcpdump and ethereal, although a number of other utilities are briefly mentioned. Using this chapter requires the greatest understanding of Internet protocols. But, in my opinion, this is the most important chapter in the book. Skip it at your own risk.

Chapter 6

This chapter begins with a general discussion of management tools. It then focuses on a few tools, such as nmap and arpwatch, that are useful in piecing together information about a network. After a brief discussion of network management extensions provided for Perl and Tcl/Tk, it concludes with a discussion of route and network discovery using tkined.

Chapter 7

Chapter 7 focuses on device monitoring. It begins with a brief review of SNMP. Next, a discussion of NET-SNMP (formerly UCD SNMP) demonstrates the basics of SNMP. The chapter continues with a brief description of using scotty to collect SNMP information. Finally, it describes additional features of tkined, including network monitoring. In one sense, this chapter is a hands-on tutorial for using SNMP. If you are not familiar with SNMP, you will definitely want to read this chapter.

Chapter 8

This chapter is concerned with monitoring and measuring network behavior over time. The stars of this chapter are ntop and mrtg. I also briefly describe using SNMP tools to retrieve RMON data. This chapter assumes that you have a thorough knowledge of SNMP. If you don't, go back and read Chapter 7.

Chapter 9

This chapter describes several types of tools for examining the behavior of low-level connectivity protocols, protocols at the data link and network levels, including tools for custom packet generation and load testing. The chapter concludes with a brief discussion of emulation and simulation tools. You probably will not use these tools frequently and can safely skim this chapter the first time through.

Chapter 10

Chapter 10 looks at several of the more common application-level protocols and describes tools that may be useful when you are faced with a problem with one of these protocols. Unless you currently face an application-level problem, you can skim this chapter for now.

Chapter 11

This chapter describes a number of different tools that are not really network troubleshooting or management tools but rather are tools that can ease your life as a network administrator. You'll want to read the sections in this chapter that discuss tools you aren't already familiar with.

Chapter 12

When dealing with a complex problem, no single tool is likely to meet all your needs. This last chapter attempts to show how the different tools can be used together to troubleshoot and analyze performance. No new tools are introduced in this chapter. Arguably, this chapter should have come at the beginning of the book. I included it at the end so that I could name specific tools without too many forward references. If you are familiar with general troubleshooting techniques, you can safely skip this chapter. Alternately, if you need a quick review of troubleshooting techniques and don't mind references to tools you aren't familiar with, you might jump ahead to this chapter.

Appendix A

This appendix begins with a brief discussion of installing software and general software sources. This discussion is followed by an alphabetical listing of those tools mentioned in this book, with Internet addresses when feasible. Beware, many of the URLs in this section will be out of date by the time you read this. Nonetheless, these URLs will at least give you a starting point on where to begin looking.

Appendix B

This appendix begins with a discussion of different sources of information. Next, it discusses books by topic, followed by an alphabetical listing of those books mentioned in this book.
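As a taste of the workflow Chapters 3 and 4 describe, a minimal sketch of the usual progression is to verify the local TCP/IP stack against the loopback address first, then test reachability farther out, then decompose the path. The remote host names below are placeholders, not hosts from the book, and the remote commands are shown commented out since they require network access:

```shell
# Verify the local stack first, if ping is installed at all.
if command -v ping >/dev/null; then
  ping -c 3 127.0.0.1              # loopback: is the local TCP/IP stack up?
fi
# ping -c 3 host.example.com       # then end-to-end: is the remote host reachable?
# traceroute -n host.example.com   # then the path itself, hop by hop (Chapter 4)
```

The `-c` flag limits ping to a fixed number of echo requests, and `-n` suppresses DNS lookups in traceroute so a broken name service doesn't stall the trace.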
Conventions

This book uses the following typographical conventions:

Italics
  For program names, filenames, system names, email addresses, and URLs and for emphasizing new terms when first defined

Constant width
  In examples showing the output from programs, the contents of files, or literal information

Constant-width italics
  General syntax and items that should be replaced in expressions

Indicates a tip, suggestion, or general note.

Indicates a warning or caution.

Acknowledgments

This book would not have been possible without the help of many people. First on the list are the toolsmiths who created the tools described here. The number and quality of the tools that are available is truly remarkable. We all owe a considerable debt to the people who selflessly develop these tools.

I have been very fortunate that many of my normal duties have overlapped significantly with tasks related to writing this book. These duties have included setting up and operating Lander University's networking laboratory and evaluating tools for use in teaching. For their help with the laboratory, I gratefully acknowledge Lander's Department of Computing Services, particularly Anthony Aven, Mike Henderson, and Bill Screws. This laboratory was funded in part by a National Science Foundation grant, DUE-9980366. I gratefully acknowledge the support the National Science Foundation has given to Lander. I have also benefited from conversations with the students and faculty at Lander, particularly Jim Crabtree. I would never have gotten started on this project without the help and encouragement of Jerry Wilson. Jerry, I owe you lunch (and a lot more).

This book has benefited from the help of numerous people within the O'Reilly organization. In particular, the support given by Robert Denn, Mike Loukides, and Rob Romano, to name only a few, has been exceptional. After talking with authors working with other publishers, I consider myself very fortunate in working with technically astute people from the start. If you are thinking about writing a technical book, O'Reilly is a publisher to consider.
The reviewers for this book have done an outstanding job. Thanks go to John Archie, Anthony Aven, Jon Forrest, and Kevin and Diana Mullet. They cannot be faulted for not turning a sow's ear into a silk purse.

It seems every author always acknowledges his or her family. It has almost become a cliché, but that doesn't make it any less true. This book would not have been possible without the support and patience of my family, who have endured more than I should have ever asked them to endure. Thank you.
  • 11. Chapter 1. Network Management and TroubleshootingThe first step in diagnosing a network problem is to collect information. This includes collectinginformation from your users as to the nature of the problems they are having, and it includes collectingdata from your network. Your success will depend, in large part, on your efficiency in collecting thisinformation and on the quality of the information you collect. This book is about tools you can use andtechniques and strategies to optimize their use. Rather than trying to cover all aspects oftroubleshooting, this book focuses on this first crucial step, data collection.There is an extraordinary variety of tools available for this purpose, and more become available daily.Very capable people are selflessly devoting enormous amounts of time and effort to developing thesetools. We all owe a tremendous debt to these individuals. But with the variety of tools available, it iseasy to be overwhelmed. Fortunately, while the number of tools is large, data collection need not beoverwhelming. A small number of tools can be used to solve most problems. This book centers on acore set of freely available tools, with pointers to additional tools that might be needed in somecircumstances.This first chapter has two goals. Although general troubleshooting is not the focus of the book, itseems worthwhile to quickly review troubleshooting techniques. This review is followed by anexamination of troubleshooting from a broader administrative context—using troubleshooting tools inan effective, productive, and responsible manner. This part of the chapter includes a discussion of Ydocumentation practices, personnel management and professionalism, legal and ethical concerns, and FLeconomic considerations. General troubleshooting is revisited in Chapter 12, once we have discussedavailable tools. If you are already familiar with these topics, you may want to skim or even skip thischapter. 
1.1 General Approaches to Troubleshooting

Troubleshooting is a complex process that is best learned through experience. This section looks briefly at how troubleshooting is done in order to see how these tools fit into the process. But while every problem is different, a key step is collecting information.

Clearly, the best way to approach troubleshooting is to avoid it. If you never have problems, you will have nothing to correct. Sound engineering practices, redundancy, documentation, and training can help. But regardless of how well engineered your system is, things break. You can avoid troubleshooting, but you can't escape it.

It may seem unnecessary to say, but go for the quick fixes first. As long as you don't fixate on them, they won't take long. Often the first thing to try is resetting the system. Many problems can be resolved in this way. Bit rot, cosmic rays, or the alignment of the planets may result in the system entering some strange state from which it can't exit. If the problem really is a fluke, resetting the system may resolve the problem, and you may never see it again. This may not seem very satisfying, but you can take your satisfaction in going home on time instead.

Keep in mind that there are several different levels in resetting a system. For software, you can simply restart the program, or you may be able to send a signal to the program so that it reloads its initialization file. From your users' perspective, this is the least disruptive approach. Alternately, you might restart the operating system but without cycling the power, i.e., do a warm reboot. Finally, you might try a cold reboot by cycling the power.

You should be aware, however, that there can be some dangers in resetting a system. For example, it is possible to inadvertently make changes to a system so that it can't reboot. If you realize you have done this in time, you can correct the problem. Once you have shut down the system, it may be too late. If you don't have a backup boot disk, you will have to rebuild the system. These are, fortunately, rare circumstances and usually happen only when you have been making major changes to a system.

When making changes to a system, remember that scheduled maintenance may involve restarting a system. You may want to test changes you have made, including their impact on a system reset, prior to such maintenance to ensure that there are no problems. Otherwise, the system may fail when restarted during the scheduled maintenance. If this happens, you will be faced with the difficult task of deciding which of several different changes are causing problems.

Resetting the system is certainly worth trying once. Doing it more than once is a different matter. With some systems, this becomes a way of life. An operating system that doesn't provide adequate memory protection will frequently become wedged so that rebooting is the only option.[1] Sometimes you may want to limp along resetting the system occasionally rather than dealing with the problem. In a university setting, this might get you through exam week to a time when you can be more relaxed in your efforts to correct the underlying problem. Or, if the system is to be replaced in the near future, the effort may not be justified. Usually, however, when rebooting becomes a way of life, it is time for more decisive action.

[1] Do you know what operating system I'm tactfully not naming?

Swapping components and reinstalling software is often the next thing to try.
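Before moving on to component swapping, the lightest-weight reset mentioned above, signaling a program to reload its initialization file, can be sketched in shell. SIGHUP is the conventional "reread your configuration" signal for many (not all) Unix daemons; check the daemon's manpage before relying on it. The PID-file paths below are placeholders, not any particular service:

```shell
#!/bin/sh
# Graduated reset, level 1: ask a running program to reload its
# configuration rather than restarting it. SIGHUP is the conventional
# reload signal for many (not all) Unix daemons.

reload_daemon() {
    pidfile=$1
    if [ -r "$pidfile" ]; then
        # Send SIGHUP to the process recorded in the PID file.
        kill -HUP "$(cat "$pidfile")" && echo "reload signal sent"
    else
        echo "no PID file at $pidfile" >&2
        return 1
    fi
}

# Demonstration against a throwaway background process; a real
# invocation would point at something like /var/run/mydaemon.pid.
sleep 30 &
echo $! > /tmp/reload-demo.pid
reload_daemon /tmp/reload-demo.pid
rm -f /tmp/reload-demo.pid
```

Only if a reload does not clear the problem would you escalate to a full restart of the program, then a warm reboot, then a cold one.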
If you have the spare components, this can often resolve problems immediately. Even if you don't have spares, switching components to see if the problem follows the equipment can be a simple first test. Reinstalling software can be much more problematic. This can often result in configuration errors that will worsen problems. The old, installed version of the software can make getting a new, clean installation impossible. But if the install is simple or you have a clear understanding of exactly how to configure the software, this can be a relatively quick fix.

While these approaches often work, they aren't what we usually think of as troubleshooting. You certainly don't need the tools described in this book to do them. Once you have exhausted the quick solutions, it is time to get serious. First, you must understand the problem, if possible. Problems that are not understood are usually not fixed, just postponed.

One standard admonition is to ask the question "has anything changed recently?" Overwhelmingly, most problems relate to changes to a working system. If you can temporarily change things back and the problem goes away, you have confirmed your diagnosis.

Admittedly, this may not help with an installation where everything is new. But even a new installation can and should be grown. Pieces can be installed and tested. New pieces of equipment can then be added incrementally. When this approach is taken, the question of what has changed once again makes sense.

Another admonition is to change only one thing at a time and then to test thoroughly after each change. This is certainly good advice when dealing with routine failures. But this approach will not apply if you are dealing with a system failure. (See the upcoming sidebar on system failures.) Also, if you do find something that you know is wrong but fixing it doesn't fix your problem, do you really want to change it back? In this case, it is often better to make a note of the additional changes you have made and then proceed with your troubleshooting.

A key element to successful debugging is to control the focus of your investigation so that you are really dealing with the problem. You can usually focus better if you can break the problem into pieces. Swapping components, as mentioned previously, is an example of this approach. This technique is known by several names—problem decomposition, divide and conquer, binary search, and so on. This approach is applicable to all kinds of troubleshooting. For example, when your car won't start, first decide whether you have an electrical or fuel supply problem. Then proceed accordingly. Chapter 12 outlines a series of specific steps you might want to consider.

System Failures

The troubleshooting I have described so far can be seen roughly as dealing with normal failures (although there may be nothing terribly normal about them). A second general class of problems is known as system failures. System failures are problems that stem from the interaction of the parts of a complex system in unexpected ways. They are most often seen when two or more subsystems fail at about the same time and in ways that interact. However, system failures can result through interaction of subsystems without any ostensible failure in any of the subsystems.

A classic example of a system failure can be seen in the movie China Syndrome. In one scene the reactor scrams, the pumps shut down, and the water-level indicator on a strip-chart recorder sticks. The water level in the reactor becomes dangerously low due to the pump shutdown, but the problem is not recognized because the indicator gives misleading information. These two near-simultaneous failures conceal the true state of the reactor.

System failures are most pernicious in systems with tight coupling between subsystems and subsystems that are linked in nonlinear or nonobvious ways.
Debugging a system failure can be extremely difficult. Many of the more standard approaches simply don't work. The strategy of decomposing the system into subsystems becomes difficult, because the symptoms misdirect your efforts. Moreover, in extreme cases, each subsystem may be operating correctly—the problem stems entirely from the unexpected interactions.

If you suspect you have a system failure, the best approach, when feasible, is to substitute entire subsystems. Your goal should not be to look for a restored functioning system, but to look for changes in the symptoms. Such changes indicate that you may have found one of the subsystems involved. (Conversely, if you are working with a problem and the symptoms change when a subsystem is replaced, this is a strong indication of a system failure.)

Unfortunately, if the problem stems from unexpected interaction of nonfailing systems, even this approach will not work. These are extremely difficult problems to diagnose. Each problem must be treated as a unique, special problem. But again, an important first step is collecting information.

1.2 Need for Troubleshooting Tools
The best time to prepare for problems is before you have them. It may sound trite, but if you don't understand the normal behavior of your network, you will not be able to identify anomalous behavior. For the proper management of your system, you must have a clear understanding of the current behavior and performance of your system. If you don't know the kinds of traffic, the bottlenecks, or the growth patterns for your network, then you will not be able to develop sensible plans. If you don't know the normal behavior, you will not be able to recognize a problem's symptoms when you see them. Unless you have made a conscious, aggressive effort to understand your system, you probably don't understand it. All networks contain surprises, even for the experienced administrator. You only have to look a little harder.

It might seem strange to some that a network administrator would need some of the tools described in this book, and that he wouldn't already know the details that some of these tools provide. But there are a number of reasons why an administrator may be quite ignorant of his network.

With the rapid growth of the Internet, turnkey systems seem to have grown in popularity. A fundamental assumption of these systems is that they are managed by an inexperienced administrator or an administrator who doesn't want to be bothered by the details of the system. Documentation is almost always minimal. For example, early versions of Sun Microsystems' Netra Internet servers, by default, did not install the Unix manpages and came with only a few small manuals. Print services were disabled by default.

This is not a condemnation of turnkey systems. They can be a real blessing to someone who needs to go online quickly, someone who never wants to be bothered by such details, or someone who can outsource the management of her system. But if at some later time she wants to know what her turnkey system is doing, it may be up to her to discover that for herself.
This is particularly likely if she ever wants to go beyond the basic services provided by the system or if she starts having problems.

Other nonturnkey systems may be customized, often heavily. Of course, all these changes should be carefully documented. However, an administrator may inherit a poorly documented system. (And, of course, sometimes we do this to ourselves.) If you find yourself in this situation, you will need to discover (or rediscover) your system for yourself.

In many organizations, responsibilities may be highly partitioned. One group may be responsible for infrastructure such as wiring, another for network hardware, and yet another for software. In some environments, particularly universities, networks may be a distributed responsibility. You may have very little control, if any, over what is connected to the network. This isn't necessarily bad—it's the way universities work. But rogue systems on your network can have annoying consequences. In this situation, probably the best approach is to talk to the system administrator or user responsible for the system. Often he will be only too happy to discuss his configuration. The implications of what he is doing may have completely escaped him. Developing a good relationship with power users may give you an extra set of eyes on your network. And, it is easier to rely on the system administrator to tell you what he is doing than to repeatedly probe the network to discover changes. But if this fails, as it sometimes does, you may have to resort to collecting the data yourself.

Sometimes there may be some unexpected, unauthorized, or even covert changes to your network. Well-meaning individuals can create problems when they try to help you out by installing equipment themselves. For example, someone might try installing a new computer on the network by copying the network configuration from another machine, including its IP address.
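A duplicate address like the one just described can often be confirmed from the shell before it causes real trouble. This sketch assumes the iputils version of arping is installed; its -D flag runs duplicate-address detection, succeeding only if no other host answers ARP for the address. The interface and address shown are placeholders for your own network:

```shell
#!/bin/sh
# Check whether an IP address is already claimed on the local segment.
# Assumes iputils arping; -D is its duplicate-address-detection mode,
# which exits 0 only when no other host replies. Interface and address
# below are placeholders.
IFACE=eth0
ADDR=192.168.1.10

if arping -D -c 2 -I "$IFACE" "$ADDR" > /dev/null 2>&1; then
    echo "$ADDR appears unused on $IFACE"
else
    echo "$ADDR may already be claimed by another host (or arping is unavailable)"
fi
```

Running such a check before assigning an address, or when two machines start behaving erratically, can turn a mystifying intermittent failure into a five-minute fix.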
At other times, some "volunteer administrator" simply has her own plans for your network.

Finally, almost to a person, network administrators must teach themselves as they go. Consequently, for most administrators, these tools have an educational value as well as an administrative value. They provide a way for administrators to learn more about their networks. For example, protocol analyzers like ethereal provide an excellent way to learn the inner workings of a protocol like TCP/IP. Often, more than one of these reasons may apply. Whatever the reason, it is not unusual to find yourself reading your configuration files and probing your systems.

1.3 Troubleshooting and Management

Troubleshooting does not exist in isolation from network management. How you manage your network will determine in large part how you deal with problems. A proactive approach to management can greatly simplify problem resolution. The remainder of this chapter describes several important management issues. Coming to terms with these issues should, in the long run, make your life easier.

1.3.1 Documentation

As a new administrator, your first step is to assess your existing resources and begin creating new resources. Software sources, including the tools discussed in this book, are described and listed in Appendix A. Other sources of information are described in Appendix B.

The most important source of information is the local documentation created by you or your predecessor. In a properly maintained network, there should be some kind of log about the network, preferably with sections for each device. In many networks, this will be in an abysmal state. Almost no one likes documenting or thinks he has the time required to do it. It will be full of errors, out of date, and incomplete. Local documentation should always be read with a healthy degree of skepticism. But even incomplete, erroneous documentation, if treated as such, may be of value. There are probably no intentional errors, just careless mistakes and errors of omission. Even flawed documentation can give you some sense of the history of the system. Problems frequently occur due to multiple conflicting changes to a system. Software that may have been only partially removed can have lingering effects.
Homegrown documentation may be the quickest way to discover what may have been on the system.

While the creation and maintenance of documentation may once have been someone else's responsibility, it is now your responsibility. If you are not happy with the current state of your documentation, it is up to you to update it and adopt policies so the next administrator will not be muttering about you the way you are muttering about your predecessors.

There are a couple of sets of standard documentation that, at a minimum, you will always want to keep. One is purchase information, the other a change log. Purchase information includes sales information, licenses, warranties, service contracts, and related information such as serial numbers. An inventory of equipment, software, and documentation can be very helpful. When you unpack a system, you might keep a list of everything you receive and date all documentation and software. (A changeable rubber date stamp and ink pad can help with this last task.) Manufacturers can do a poor job of distinguishing one version of software and its documentation from the next. Dates can be helpful in deciding which version of the documentation applies when you have multiple systems or upgrades. Documentation has a way of ending up in someone's personal library, never to be seen again, so a list of what you should have can be very helpful at times.

Keep in mind, there are a number of ways software can enter your system other than through purchase orders. Some software comes through CD-ROM subscription services, some comes in over the Internet, some is bundled with the operating system, some comes in on a CD-ROM in the back of a book, some is brought from home, and so forth. Ideally, you should have some mechanism to track software. For example, for downloads from the Internet, be sure to keep a log identifying filenames, dates, and sources.

You should also keep a change log for each major system. Record every significant change or problem you have with the system. Each entry should be dated. Even if some entries no longer seem relevant, you should keep them in your log. For instance, if you have installed and later removed a piece of software on a server, there may be lingering configuration changes that you are not aware of that may come back to haunt you years later. This is particularly true if you try to reinstall the program but could even be true for a new program as well.

Beyond these two basic sets of documentation, you can divide the documentation you need to keep into two general categories—configuration documentation and process documentation. Configuration documentation statically describes a system. It assumes that the steps involved in setting up the system are well understood and need no further comments, i.e., that configuration information is sufficient to reconfigure or reconstruct the system. This kind of information can usually be collected at any time. Ironically, for that reason, it can become so easy to put off that it is never done.

Process documentation describes the steps involved in setting up a device, installing software, or resolving a problem. As such, it is best written while you are doing the task. This creates a different set of collection problems. Here the stress from the task at hand often prevents you from documenting the process.

The first question you must ask is what you want to keep. This may depend on the circumstances and which tools you are using.
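Returning for a moment to the download log suggested above: nothing elaborate is needed. A minimal sketch, with a placeholder log path and made-up example entries, is a shell function that appends one dated, tab-separated line per item:

```shell
#!/bin/sh
# Append one dated entry per acquisition to a software log.
# The log path and the example filenames/sources are placeholders;
# keep the real log somewhere that gets backed up.
LOG=${LOG:-/tmp/software-$$.log}

log_download() {
    # $1 = filename, $2 = source (URL, CD-ROM, vendor, ...)
    printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d')" "$1" "$2" >> "$LOG"
}

# Example entries; in practice, call log_download right after each fetch.
log_download "netcat-1.10.tar.gz" "ftp://example.org/pub/netcat/"
log_download "analyzer-2.3.tgz" "CD-ROM subscription, disc 14"
cat "$LOG"
```

The tab-separated format keeps the log trivially greppable and sortable, which matters more than how it looks.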
Static configuration information might include lists of IP addresses and Ethernet addresses, network maps, copies of server configuration files, switch configuration settings such as VLAN partitioning by ports, and so on.

When dealing with a single device, the best approach is probably just a simple copy of the configuration. This can be either printed or saved as a disk file. This will be a personal choice based on which you think is easiest to manage. You don't need to waste time prettying this up, but be sure you label and date it.

When the information spans multiple systems, such as a list of IP addresses, management of the data becomes more difficult. Fortunately, much of this information can be collected automatically. Several tools that ease the process are described in subsequent chapters, particularly in Chapter 6.

For process documentation, the best approach is to log and annotate the changes as you make them and then reconstruct the process at a later time. Chapter 11 describes some of the common Unix utilities you can use to automate documentation. You might refer to this chapter if you aren't familiar with utilities like tee, script, and xwd.[2]

[2] Admittedly these guidelines are ideals. Does anyone actually do all of this documenting? Yes, while most administrators probably don't, some do. But just because many administrators don't succeed in meeting the ideal doesn't diminish the importance of trying.

1.3.2 Management Practices

A fundamental assumption of this book is that troubleshooting should be proactive. It is preferable to avoid a problem than to have to correct it. Proper management practices can help. While some of this section may, at first glance, seem unrelated to troubleshooting, there are fundamental connections.
Management practices will determine what you can do and how you do it. This is true both for avoiding problems and for dealing with problems that can't be avoided. The remainder of this chapter reviews some of the more important management issues.

1.3.2.1 Professionalism

Administering a system effectively requires a high degree of professionalism. This includes personal honesty and ethical behavior. You should learn to evaluate yourself in an honest, objective manner. (See The Peter Principle Revisited.) It also requires that you conform to the organization's mission and culture. Your network serves some higher purpose within your organization. It does not exist strictly for your benefit. You should manage the network with this in mind. This means that everything you do should be done from the perspective of a cost-benefit trade-off. It is too easy to get caught in the trap of doing something "the right way" at a higher cost than the benefits justify. Performance analysis is the key element.

The organization's mind-set or culture will have a tremendous impact on how you approach problems in general and the use of tools in particular. It will determine which tools you can use, how you can use the tools, and, most important, what you can do with the information you obtain. Within organizations, there is often a battle between openness and secrecy. The secrecy advocate believes that details of the network should be available only on a need-to-know basis, if then. She believes, not without justification, that this enhances security. The openness advocate believes that the details of a system should be open and available. This allows users to adapt and make optimal use of the system and provides a review process, giving users more input into the operation of the network.

Taken to an extreme, the secrecy advocate will suppress information that is needed by the user, making a system or network virtually unusable. Openness, taken to an extreme, will leave a network vulnerable to attack.
Most people's views fall somewhere between these two extremes but often favor one position over the other. I advocate prudent openness. In most situations, it makes no sense to shut down a system because it might be attacked. And it is asinine not to provide users with the information they need to protect themselves. Openness among those responsible for the different systems within an organization is absolutely essential.

1.3.2.2 Ego management

We would all like to think that we are irreplaceable, and that no one else could do our jobs as well as we do. This is human nature. Unfortunately, some people take steps to make sure this is true. The most obvious way an administrator may do this is to hide what he actually does and how his system works.

This can be done in many ways. Failing to document the system is one approach—leaving comments out of code or configuration files is common. The goal of such an administrator is to make sure he is the only one who truly understands the system. He may try to limit others' access to a system by restricting accounts or access to passwords. (This can be done to hide other types of unprofessional activities as well. If an administrator occasionally reads other users' email, he may not want anyone else to have standard accounts on the email server. If he is overspending on equipment to gain experience with new technologies, he will not want any technically literate people knowing what equipment he is buying.)

This behavior is usually well disguised, but it is extremely common. For example, a technician may insist on doing tasks that users could or should be doing. The problem is that this keeps users dependent on the technician when it isn't necessary. This can seem very helpful or friendly on the surface. But, if you repeatedly ask for details and don't get them, there may be more to it than meets the eye.

Common justifications are security and privacy. Unless you are in a management position, there is often little you can do other than accept the explanations given. But if you are in a management position, are technically competent, and still hear these excuses from your employees, beware! You have a serious problem.

No one knows everything. Whenever information is suppressed, you lose input from individuals who don't have the information. If an employee can't control her ego, she should not be turned loose on your network with the tools described in this book. She will not share what she learns. She will only use it to further entrench herself.

The problem is basically a personnel problem and must be dealt with as such. Individuals in technical areas seem particularly prone to these problems. It may stem from enlarged egos or from insecurity. Many people are drawn to technical areas as a way to seem special. Alternately, an administrator may see information as a source of power or even a weapon. He may feel that if he shares the information, he will lose his leverage. Often individuals may not even recognize the behavior in themselves. It is just the way they have always done things, and it is the way that feels right.

If you are a manager, you should deal with this problem immediately. If you can't correct the problem in short order, you should probably replace the employee. An irreplaceable employee today will be even more irreplaceable tomorrow. Sooner or later, everyone leaves—finds a better job, retires, or runs off to Poughkeepsie with an exotic dancer. In the meantime, such a person only becomes more entrenched, making the eventual departure more painful.
It will be better to deal with the problem now rather than later.

1.3.2.3 Legal and ethical considerations

From the perspective of tools, you must ensure that you use tools in a manner that conforms not just to the policies of your organization, but to all applicable laws as well. The tools I describe in this book can be abused, particularly in the realm of privacy. Before using them, you should make certain that your use is consistent with the policies of your organization and all applicable laws. Do you have the appropriate permission to use the tools? This will depend greatly on your role within the organization. Do not assume that just because you have access to tools that you are authorized to use them. Nor should you assume that any authorization you have is unlimited.

Packet capture software is a prime example. It allows you to examine every packet that travels across a link, including application data and each and every header. Unless data is encrypted, it can be decoded. This means that passwords can be captured and email can be read. For this reason alone, you should be very circumspect in how you use such tools.

A key consideration is the legality of collecting such information. Unfortunately, there is a constantly changing legal morass with respect to privacy in particular and technology in general. Collecting some data may be legitimate in some circumstances but illegal in others.[3] This depends on factors such as the nature of your operations, what published policies you have, what assurances you have given your users, new and existing laws, and what interpretations the courts give to these laws.

[3] As an example, see the CERT Advisory CA-92.19 Topic: Keystroke Logging Banner at http://www.cert.org/advisories/CA-1992-19.html for a discussion of keystroke logging and its legal implications.
It is impossible for a book like this to provide a definitive answer to the questions such considerations raise. I can, however, offer four pieces of advice:

• First, if the information you are collecting can be tied to the activities of an individual, you should consider the information highly confidential and should collect only the information that you really need. Be aware that even seemingly innocent information may be sensitive in some contexts. For example, source/destination address pairs may reveal communications between individuals that they would prefer not be made public.

• Second, place your users on notice. Let them know that you collect such information, why it is necessary, and how you use the information. Remember, however, if you give your users assurances as to how the information is used, you are then constrained by those assurances. If your management policies permit, make their prior acceptance of these policies a requirement for using the system.

• Third, you must realize that with monitoring comes obligations. In many instances, your legal culpability may be less if you don't monitor.

• Finally, don't rely on this book or what your colleagues say. Get legal advice from a lawyer who specializes in this area. Beware: many lawyers will not like to admit that they don't know everything about the law, but many aren't current with the new laws relating to technology. Also, keep in mind that even if what you are doing is strictly legal and you have appropriate authority, your actions may still not be ethical.

The Peter Principle Revisited

In 1969, Laurence Peter and Raymond Hull published the satirical book The Peter Principle. The premise of the book was that people rise to their level of incompetence. For example, a talented high school teacher might be promoted to principal, a job requiring a quite different set of skills. Even if ill suited for the job, once she has this job, she will probably remain with it.
She just won't earn any new promotions. However, if she is adept at the job, she may be promoted to district superintendent, a job requiring yet another set of skills. The process of promotions will continue until she reaches her level of incompetence. At that point, she will spend the remainder of her career at that level.

While hardly a rigorous sociological principle, the book was well received because it contained a strong element of truth. In my humble opinion, the Peter Principle usually fails miserably when applied to technical areas such as networking and telecommunications. The problem is the difficulty in recognizing incompetence. If incompetence is not recognized, then an individual may rise well beyond his level of incompetence. This often happens in technical areas because there is no one in management who can judge an individual's technical competence.

Arguably, unrecognized incompetence usually takes the form of overengineering. Networking, a field of engineering, is always concerned with trade-offs between costs and benefits. An underengineered network that fails will not go unnoticed. But an overengineered network will rarely be recognizable as such. Such networks may cost many times what they should, drawing resources from other needs. But to the uninitiated, it appears as a normal, functioning network.

If a network engineer really wants the latest in new equipment when it isn't needed, who, outside of the technical personnel, will know? If this is a one-person department, or if all the members of the department can agree on what they want, no one else may ever know. It is too easy to come up with some technical mumbo jumbo if they are ever questioned.

If this seems far-fetched, I once attended a meeting where a young engineer was arguing that a particular router needed to be replaced before it became a bottleneck. He had picked out the ideal replacement, a hot new box that had just hit the market. The problem with all this was that I had recently taken measurements on the router and knew the average utilization of that "bottleneck" was less than 5%, with peaks that rarely hit 40%.

This is an extreme example of why collecting information is the essential first step in network management and troubleshooting. Without accurate measurements, you can easily spend money fixing imaginary problems.

1.3.2.4 Economic considerations

Solutions to problems have economic consequences, so you must understand the economic implications of what you do. Knowing how to balance the cost of the time used to repair a system against the cost of replacing a system is an obvious example. Cost management is a more general issue that has important implications when dealing with failures.

One particularly difficult task for many system administrators is to come to terms with the economics of networking. As long as everything is running smoothly, the next biggest issue to upper management will be how cost-effectively you are doing your job. Unless you have unlimited resources, when you overspend in one area, you take resources from another area. One definition of an engineer that I particularly like is that "an engineer is someone who can do for a dime what a fool can do for a dollar." My best guess is that overspending and buying needlessly complex systems is the single most common engineering mistake made when novice network administrators purchase network equipment.

One problem is that some traditional economic models do not apply in networking. In most engineering projects, incremental costs are less than the initial per-unit cost.
For example, if a 10,000-square-foot building costs $1 million, a 15,000-square-foot building will cost somewhat less than $1.5 million. It may make sense to buy additional footage even if you don't need it right away. This is justified as "buying for the future."

This kind of reasoning, when applied to computers and networking, leads to waste. Almost no one would go ahead and buy a computer now if they won't need it until next year. You'll be able to buy a better computer for less if you wait until you need it. Unfortunately, this same reasoning isn't applied when buying network equipment. People will often buy higher-bandwidth equipment than they need, arguing that they are preparing for the future, when it would be much more economical to buy only what is needed now and buy again in the future as needed.

Moore's Law lies at the heart of the matter. Around 1965, Gordon Moore, one of the founders of Intel, made the empirical observation that the density of integrated circuits was doubling about every 12 months, which he later revised to 24 months. Since the cost of manufacturing integrated circuits is relatively flat, this implies that, in two years, a circuit can be built with twice the functionality with no increase in cost. And, because distances are halved, the circuit runs at twice the speed—a fourfold improvement. Since the doubling applies to previous doublings, we have exponential growth.

It is generally estimated that this exponential growth with chips will go on for another 15 to 20 years. In fact, this growth is nothing new. Raymond Kurzweil, in The Age of Spiritual Machines: When Computers Exceed Human Intelligence, collected information on computing speeds and functionality from the beginning of the twentieth century to the present. This covers mechanical, electromechanical
(relay), vacuum tube, discrete transistor, and integrated circuit technologies. Kurzweil found that exponential growth has been the norm for the last hundred years. He believes that new technologies will be developed that will extend this rate of growth well beyond the next 20 years. It is certainly true that we have seen even faster growth in disk densities and fiber-optic capacity in recent years, neither of which can be attributed to semiconductor technology.

What does this mean economically? Clearly, if you wait, you can buy more for less. But usually, waiting isn't an option. The real question is how far into the future should you invest? If the price is coming down, should you repeatedly buy for the short term or should you "invest" in the long term?

The general answer is easy to see if we look at a few numbers. Suppose that $100,000 will provide you with network equipment that will meet your anticipated bandwidth needs for the next four years. A simpleminded application of Moore's Law would say that you could wait and buy similar equipment for $25,000 in two years. Of course, such a system would have a useful life of only two additional years, not the original four. So, how much would it cost to buy just enough equipment to make it through the next two years? Following the same reasoning, about $25,000. If your growth is tracking the growth of technology,[4] then two years ago it would have cost $100,000 to buy four years' worth of technology. That will have fallen to about $25,000 today. Your choice: $100,000 now or $25,000 now and $25,000 in two years. This is something of a no-brainer. It is summarized in the first two lines of Table 1-1.

[4] This is a pretty big if, but it's reasonable for most users and organizations. Most users and organizations have selected a point in the scheme of things that seems right for them—usually the latest technology they can reasonably afford. This is why that new computer you buy always seems to cost $2500.
You are buying the latest in technology, and you are trying to reach about the same distance into the future.

Table 1-1. Cost estimates

                                              Year 1     Year 2    Year 3    Year 4    Total
Four-year plan                                $100,000   $0        $0        $0        $100,000
Two-year plan                                 $25,000    $0        $25,000   $0        $50,000
Four-year plan with maintenance               $112,000   $12,000   $12,000   $12,000   $148,000
Two-year plan with maintenance                $28,000    $3,000    $28,000   $3,000    $62,000
Four-year plan with maintenance and 20% MARR  $112,000   $10,000   $8,300    $6,900    $137,200
Two-year plan with maintenance and 20% MARR   $28,000    $2,500    $19,500   $1,700    $51,700

If this argument isn't compelling enough, there is the issue of maintenance. As a general rule of thumb, service contracts on equipment cost about 1% of the purchase price per month. For $100,000, that is $12,000 a year. For $25,000, this is $3,000 per year. Moore's Law doesn't apply to maintenance for several reasons:

  • A major part of maintenance is labor costs and these, if anything, will go up.
  • The replacement parts will be based on older technology and older (and higher) prices.
  • The mechanical parts of older systems, e.g., fans, connectors, and so on, are all more likely to fail.
  • There is more money to be made selling new equipment so there is no incentive to lower maintenance prices.

Thus, the $12,000 a year for maintenance on a $100,000 system will cost $12,000 a year for all four years. The third and fourth lines of Table 1-1 summarize these numbers.
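The rows of Table 1-1 reduce to a few lines of arithmetic. Here is a minimal sketch that reproduces them, assuming the costs stated in the text: $100,000 of equipment for four years' capacity or $25,000 for two, maintenance at 1% of purchase price per month, and (for the last two rows) each year's outlay discounted at the 20% MARR shown in the table. The book rounds its MARR figures to the nearest $100, so the exact totals differ slightly.

```python
# Sketch: reproducing Table 1-1. Outlay lists give each year's spending;
# maintenance rows add 12%/year of the purchase price; the MARR rows
# discount each year's outlay back to year 1.

def present_value(outlays, marr=0.0):
    """Discount a list of yearly outlays back to year 1 at rate marr."""
    return sum(cost / (1 + marr) ** year for year, cost in enumerate(outlays))

four_year = [100_000, 0, 0, 0]                   # buy once, up front
two_year = [25_000, 0, 25_000, 0]                # buy twice, as needed
four_maint = [112_000, 12_000, 12_000, 12_000]   # plus maintenance
two_maint = [28_000, 3_000, 28_000, 3_000]

print(round(present_value(four_year)))           # 100000
print(round(present_value(two_year)))            # 50000
print(round(present_value(four_maint)))          # 148000
print(round(present_value(two_maint)))           # 62000
print(round(present_value(four_maint, 0.20)))    # 137278 (book rounds to $137,200)
print(round(present_value(two_maint, 0.20)))     # 51681 (book rounds to $51,700)
```

Whichever way the rounding goes, the two-year plan comes in at well under half the cost of the four-year plan.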
Yet another consideration is the time value of money. If you don't need the $25,000 until two years from now, you can invest a smaller amount now and expect to have enough to cover the costs later. So the $25,000 needed in two years is really somewhat less in terms of today's dollars. How much less depends on the rate of return you can expect on investments. For most organizations, this number is called the minimal acceptable rate of return (MARR). The last two lines of Table 1-1 use a MARR of 20%. This may seem high, but it is not an unusual number. As you can see, buying for the future is more than two and a half times as expensive as going for the quick fix.

Of course, all this is a gross simplification. There are a number of other important considerations even if you believe these numbers. First and foremost, Moore's Law doesn't always apply. The most important exception is infrastructure. It is not going to get any cheaper to pull cable. You should take the time to do infrastructure well; that's where you really should invest in the future.

Most of the other considerations seem to favor short-term investing. First, with short-term purchasing, you are less likely to invest in dead-end technology since you are buying later in the life cycle and will have a clearer picture of where the industry is going. For example, think about the difference two years might have made in choosing between Fast Ethernet and ATM for some organizations. For the same reason, the cost of training should be lower. You will be dealing with more familiar technology, and there will be more resources available. You will have to purchase and install equipment more often, but the equipment you replace can be reused in your network's periphery, providing additional savings.

On the downside, the equipment you buy won't have a lot of excess capacity or a very long, useful lifetime. It can be very disconcerting to nontechnical management when you keep replacing equipment.
And, if you experience sudden unexpected growth, this is exactly what you will need to do. Take the time to educate upper management. If frequent changes to your equipment are particularly disruptive or if you have funding now, you may need to consider long-term purchases even if they are more expensive. Finally, don't take the two-year time frame presented here too literally. You'll discover the appropriate time frame for your network only with experience.

Other problems come when comparing plans. You must consider the total economic picture. Don't look just at the initial costs, but consider ongoing costs such as maintenance and the cost of periodic replacement. As an example, consider the following plans. Plan A has an estimated initial cost of $400,000, all for equipment. Plan B requires $150,000 for equipment and $450,000 for infrastructure upgrades. If you consider only initial costs, Plan A seems to be $200,000 cheaper. But equipment needs to be maintained and, periodically, replaced. At 1% per month, the equipment for Plan A would cost $48,000 a year to maintain, compared to $18,000 per year with Plan B. If you replace equipment a couple of times in the next decade, that will be an additional $800,000 for Plan A but only $300,000 for Plan B. As this quick, back-of-the-envelope calculation shows, the 10-year cost for Plan A was $1.68 million, while only $1.08 million for Plan B. What appeared to be $200,000 cheaper was really $600,000 more expensive. Of course, this was a very crude example, but it should convey the idea.

You shouldn't take this example too literally either. Every situation is different. In particular, you may not be comfortable deciding what is adequate surplus capacity in your network. In general, however, you are probably much better off thinking in terms of scalability than raw capacity.
If you want to hedge your bets, you can make sure that high-speed interfaces are available for the router you are considering without actually buying those high-speed interfaces until needed.

How does this relate to troubleshooting? First, don't buy overly complex systems you don't really need. They will be much harder to maintain, as you can expect the complexity of troubleshooting to grow with the complexity of the systems you buy. Second, don't spend all your money on the system and
forget ongoing maintenance costs. If you don't anticipate operational costs, you may not have the funds you need.
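The Plan A versus Plan B comparison above is easy to reproduce. A minimal sketch, assuming only what the example states: the 1%-per-month maintenance rate and two equipment replacements over the decade:

```python
# Rough 10-year cost comparison from the Plan A / Plan B example:
# initial equipment and infrastructure, 1%/month maintenance on the
# equipment, and two equipment replacements over the decade.

def ten_year_cost(equipment, infrastructure, replacements=2, years=10):
    maintenance = equipment * 0.01 * 12 * years  # 1% of price per month
    return equipment + infrastructure + maintenance + replacements * equipment

plan_a = ten_year_cost(400_000, 0)        # all equipment
plan_b = ten_year_cost(150_000, 450_000)  # mostly infrastructure

print(plan_a)  # 1680000.0, i.e., $1.68 million
print(plan_b)  # 1080000.0, i.e., $1.08 million
```

Because maintenance and replacement scale with the equipment cost, not the infrastructure cost, the equipment-heavy plan ends up $600,000 more expensive despite its lower initial price.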
Chapter 2. Host Configurations

The goal of this chapter is to review system administration from the perspective of the individual hosts on a network. This chapter presumes that you have a basic understanding of system administration. Consequently, many of the more basic issues are presented in a very cursory manner. The intent is more to jog your memory, or to fill an occasional gap, than to teach the fundamentals of system administration. If you are new to system administration, a number of the books listed in Appendix B provide excellent introductions. If, on the other hand, you are a knowledgeable system administrator, you will probably want to skim or even skip this chapter.

Chapter 1 lists several reasons why you might not know the details of your network and the computers on it. This chapter assumes that you are faced with a networked computer and need to determine or reconstruct its configuration. It should be obvious that if you don't understand how a system is configured, you will not be able to change its configuration or correct misconfigurations. The tools described in this chapter can be used to discover or change a host's configuration.

As discussed in Chapter 1, if you have documentation for the system, begin with it. The assumption here is that such documentation does not exist or that it is incomplete. The primary focus is network configuration, but many of the techniques can easily be generalized.

If you have inherited a multiuser system that has been in service for several years with many undocumented customizations, reconstructing its configuration can be an extremely involved and extended process. If your system has been compromised, the intruder has taken steps to hide her activity, and you aren't running an integrity checker like tripwire, it may be virtually impossible to discover all her customizations. (tripwire is discussed briefly in Chapter 11.) While it may not be feasible, you should at least consider reinstalling the system from scratch.
While this may seem draconian, it may ultimately be much less work than fighting the same battles over and over, as often happens with compromised systems. The best way to do this is to set up a replacement system in parallel and then move everyone over. This, of course, requires a second system.

If rebuilding the system is not feasible, or if your situation isn't as extreme as that just described, then you can use the techniques described in this chapter to reconstruct the system's configuration.

Whatever your original motivation, you should examine your system's configuration on a regular basis. If for no other reason, this will help you remember how your system is configured. But there are other reasons as well. As you learn more, you will undoubtedly want to revisit your configuration to correct problems, improve security, and optimize performance. Reviewing configurations is a necessary step to ensure that your system hasn't been compromised. And, if you share management of a system, you may be forced to examine the configuration whenever communications falter.

Keep a set of notes for each system, giving both the configuration and directions for changing the configuration. Usually the best place to start is by constructing a list of what can be found where in the vendor documentation you have. This may seem pointless since this information is in the documentation. But the information you need will be spread throughout this documentation. You won't want to plow through everything every time you need to check or change something. You must create your own list. I frequently write key page numbers inside the front covers of manuals and specifics in the margins throughout the manual. For example, I'll add device names to the manpages for the mount command, something I always seem to need but often can't remember. (Be warned that this has the disadvantage of tying manuals to specific hardware, which could create other problems.)
When reconstructing a host's configuration, there are two basic approaches. One is to examine the system's configuration files. This can be a very protracted approach. It works well when you know what you are looking for and when you are looking for a specific detail. But it can be difficult to impossible to find all the details of the system, particularly if someone has taken steps to hide them. And some parameters are set dynamically and simply can't be discovered just from configuration files.

The alternative is to use utilities designed to give snapshots of the current state of the system. Typically, these focus on one aspect of the system, for example, listing all open files. Collectively, these utilities can give you a fairly complete picture. They tend to be easy to use and give answers quickly. But, because they may focus on only one aspect of the system, they may not provide all the information you need if used in isolation.

Clearly, by itself, neither approach is totally adequate. Where you start will depend in part on how quickly you must be up to speed and what specific problems you are facing. Each approach will be described in turn.

2.1 Utilities

Reviewing system configuration files is a necessary step that you will have to address before you can claim mastery of a system. But this can be a very time-consuming step. It is very easy to overlook one or more key files. If you are under time pressure to resolve a problem, configuration files are not the best place to start.

Even if you plan to jump into the configuration files, you will probably want a quick overview of the current state of the system before you begin. For this reason, we will examine status and configuration utilities first. This approach has the advantage of being pretty much the same from one version of Unix to the next. With configuration files, the differences among the various flavors of Unix can be staggering.
Even when the files have the same functionality and syntax, they can go by different names or be in different directories. Certainly, using these utilities is much simpler than looking at kernel configuration files.

The output provided by these utilities may vary considerably from system to system and will depend heavily on which options are used. In practice, this should present no real problem. Don't be alarmed if the output on your system is formatted differently.

2.1.1 ps

The first thing any system administrator should do on a new system is run the ps command. You are probably already familiar with ps so I won't spend much time on it. The ps command lists which processes are running on the system. Here is an example:

bsd4# ps -aux
USER     PID %CPU %MEM  VSZ  RSS  TT  STAT STARTED    TIME COMMAND
root    6590 22.0  2.1  924  616  ??  R    11:14AM 0:09.80 inetd: chargen [2
root       1  0.0  0.6  496  168  ??  Ss   Fri09AM 0:00.03 /sbin/init --
root       2  0.0  0.0    0    0  ??  DL   Fri09AM 0:00.52  (pagedaemon)
root       3  0.0  0.0    0    0  ??  DL   Fri09AM 0:00.00  (vmdaemon)
root       4  0.0  0.0    0    0  ??  DL   Fri09AM 0:44.05  (syncer)
root     100  0.0  1.7  820  484  ??  Ss   Fri09AM 0:02.14 syslogd
daemon   109  0.0  1.5  828  436  ??  Is   Fri09AM 0:00.02 /usr/sbin/portmap
root     141  0.0  2.1  924  616  ??  Ss   Fri09AM 0:00.51 inetd
root     144  0.0  1.7  980  500  ??  Is   Fri09AM 0:03.14 cron
root     150  0.0  2.8 1304  804  ??  Is   Fri09AM 0:02.59 sendmail: accepti
root     173  0.0  1.3  788  368  ??  Is   Fri09AM 0:01.84 moused -p /dev/ps
root     213  0.0  1.8  824  508  v1  Is+  Fri09AM 0:00.02 /usr/libexec/gett
root     214  0.0  1.8  824  508  v2  Is+  Fri09AM 0:00.02 /usr/libexec/gett
root     457  0.0  1.8  824  516  v0  Is+  Fri10AM 0:00.02 /usr/libexec/gett
root    6167  0.0  2.4 1108  712  ??  Ss    4:10AM 0:00.48 telnetd
jsloan  6168  0.0  0.9  504  252  p0  Is    4:10AM 0:00.09 -sh (sh)
root    6171  0.0  1.1  464  320  p0  S     4:10AM 0:00.14 -su (csh)
root       0  0.0  0.0    0    0  ??  DLs  Fri09AM 0:00.17  (swapper)
root    6597  0.0  0.8  388  232  p0  R+   11:15AM 0:00.00 ps -aux

In this example, the first and last columns are the most interesting since they give the owners and the processes, along with their arguments. In this example, the lines, and consequently the arguments, have been truncated, but this is easily avoided. Running processes of interest include portmap, inetd, sendmail, telnetd, and chargen.

There are a number of options available to ps, although they vary from implementation to implementation. In this example, run under FreeBSD, the parameters used were -aux. This combination shows all users' processes (-a), including those without controlling terminals (-x), in considerable detail (-u). The options -ax will provide fewer details but show more of the command-line arguments. Alternately, you can use the -w option to extend the displayed information to 132 columns. With AT&T-derived systems, the options -ef do pretty much the same thing. Interestingly, Linux supports both sets of options. You will need to precede AT&T-style options with a hyphen. This isn't required for BSD options. You can do it either way with Solaris.
/usr/bin/ps follows the AT&T conventions, while /usr/ucb/ps supports the BSD options.

While ps quickly reveals individual processes, it gives a somewhat incomplete picture if interpreted naively. For example, the inetd daemon is one source of confusion. inetd is used to automatically start services on a system as they are needed. Rather than start a separate process for each service that might eventually be run, the inetd daemon runs on their behalf. When a connection request arrives, inetd will start the requested service. Since some network services like ftp, telnet, and finger are usually started this way, ps will show processes for them only when they are currently running. If ps doesn't list them, it doesn't mean they aren't available; they just aren't currently running.

For example, in the previous listing, chargen was started by inetd. We can see chargen in this instance because it was a running process when ps was run. But, this particular test system was configured to run a number of additional services via inetd (as determined by the /etc/inetd.conf configuration file). None of these other services show up under ps because, technically, they aren't currently running. Yet, these other services will be started automatically by inetd, so they are available services.

In addition to showing what is running, ps is a useful diagnostic tool. It quickly reveals defunct processes or multiple instances of the same process, thereby pointing out configuration problems and similar issues. %MEM and %CPU can tell you a lot about resource usage and can provide crucial information if you have resource starvation. Or you can use ps to identify rogue processes that are spawning other processes by looking at processes that share a common PPID. Once you are comfortable with the usual uses, it is certainly worth revisiting ps periodically to learn more about its other capabilities, as this brief discussion just scratches the surface of ps.

2.1.2 top
Although less ubiquitous, the top command, a useful alternative to ps, is available on many systems. It was written by William LeFebvre. When running, top gives a periodically updated listing of processes ranked in order of CPU usage. Typically, only the top 10 processes are given, but this is implementation dependent, and your implementation may let you select other values. Here is a single instance from our test system:

15 processes: 2 running, 13 sleeping
CPU states: 0.8% user, 0.0% nice, 7.4% system, 7.8% interrupt, 84.0% idle
Mem: 6676K Active, 12M Inact, 7120K Wired, 2568K Cache, 3395K Buf, 1228K Free
Swap: 100M Total, 100M Free

  PID USERNAME PRI NICE  SIZE  RES STATE    TIME   WCPU    CPU COMMAND
 6590 root      35    0  924K 616K RUN      0:15 21.20% 20.75% inetd
  144 root      10    0  980K 500K nanslp   0:03  0.00%  0.00% cron
  150 root       2    0 1304K 804K select   0:03  0.00%  0.00% sendmail
  100 root       2    0  820K 484K select   0:02  0.00%  0.00% syslogd
  173 root       2    0  788K 368K select   0:02  0.00%  0.00% moused
  141 root       2    0  924K 616K select   0:01  0.00%  0.00% inetd
 6167 root       2    0 1108K 712K select   0:00  0.00%  0.00% telnetd
 6171 root      18    0  464K 320K pause    0:00  0.00%  0.00% csh
 6168 jsloan    10    0  504K 252K wait     0:00  0.00%  0.00% sh
 6598 root      28    0 1556K 844K RUN      0:00  0.00%  0.00% top
    1 root      10    0  496K 168K wait     0:00  0.00%  0.00% init
  457 root       3    0  824K 516K ttyin    0:00  0.00%  0.00% getty
  214 root       3    0  824K 508K ttyin    0:00  0.00%  0.00% getty
  213 root       3    0  824K 508K ttyin    0:00  0.00%  0.00% getty
  109 daemon     2    0  828K 436K select   0:00  0.00%  0.00% portmap

Output is interrupted with a q or a Ctrl-C. Sometimes system administrators will leave top running on the console when the console is not otherwise in use. Of course, this should be done only in a physically secure setting.

In a sense, ps is a more general top since it gives you all running processes. The advantage to top is that it focuses your attention on resource hogs, and it provides a repetitive update. top has a large number of options and can provide a wide range of information.
For more information, consult its Unix manpage.[1]

[1] Solaris users may want to look at process management utilities included in /usr/proc/bin.

2.1.3 netstat

One of the most useful and diverse utilities is netstat. This program reports the contents of kernel data structures related to networking. Because of the diversity in networking data structures, many of netstat's uses may seem somewhat unrelated, so we will be revisiting netstat at several points in this book.

One use of netstat is to display the connections and services available on a host. For example, this is the output for the system we just looked at:

bsd4# netstat -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address  Foreign Address      (state)
tcp        0      0  bsd4.telnet    205.153.60.247.3473  TIME_WAIT
tcp        0  17458  bsd4.chargen   sloan.1244           ESTABLISHED
tcp        0      0  *.chargen      *.*                  LISTEN
tcp        0      0  *.discard      *.*                  LISTEN
tcp        0      0  *.echo         *.*                  LISTEN
tcp        0      0  *.time         *.*                  LISTEN
tcp        0      0  *.daytime      *.*                  LISTEN
tcp        0      0  *.finger       *.*                  LISTEN
tcp        0      2  bsd4.telnet    sloan.1082           ESTABLISHED
tcp        0      0  *.smtp         *.*                  LISTEN
tcp        0      0  *.login        *.*                  LISTEN
tcp        0      0  *.shell        *.*                  LISTEN
tcp        0      0  *.telnet       *.*                  LISTEN
tcp        0      0  *.ftp          *.*                  LISTEN
tcp        0      0  *.sunrpc       *.*                  LISTEN
udp        0      0  *.1075         *.*
udp        0      0  *.1074         *.*
udp        0      0  *.1073         *.*
udp        0      0  *.1072         *.*
udp        0      0  *.1071         *.*
udp        0      0  *.1070         *.*
udp        0      0  *.chargen      *.*
udp        0      0  *.discard      *.*
udp        0      0  *.echo         *.*
udp        0      0  *.time         *.*
udp        0      0  *.daytime      *.*
udp        0      0  *.sunrpc       *.*
udp        0      0  *.syslog       *.*
Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
c3378e80 dgram       0      0        0 c336efc0        0 c3378f80
c3378f80 dgram       0      0        0 c336efc0        0 c3378fc0
c3378fc0 dgram       0      0        0 c336efc0        0        0
c336efc0 dgram       0      0 c336db00        0 c3378e80        0 /var/run/log

The first column gives the protocol. The next two columns give the sizes of the send and receive queues. These should be 0 or near 0. Otherwise, you may have a problem with that particular service. The next two columns give the socket or IP address and port number for each end of a connection. This socket pair uniquely identifies one connection. The socket is presented in the form hostname.service. Finally, the state of the connection is given in the last column for TCP services. This is blank for UDP since it is connectionless. The most common states are ESTABLISHED for current connections, LISTEN for services awaiting a connection, and TIME_WAIT for recently terminated connections. Any of the TCP states could show up, but you should rarely see the others. An excessive number of SYN_RECEIVED, for example, is an indication of a problem (possibly a denial-of-service attack). You can safely ignore the last few lines of this listing.

A couple of examples should clarify this output.
The following line shows a Telnet connection between bsd4 and sloan using port 1082 on sloan:

tcp        0      2  bsd4.telnet    sloan.1082           ESTABLISHED

The next line shows that there was a second connection to sloan that was recently terminated:

tcp        0      0  bsd4.telnet    205.153.60.247.3473  TIME_WAIT

Terminated connections remain in this state for a couple of minutes, during which time the socket pair cannot be reused.

Name resolution can be suppressed with the -n option if you would rather see numeric entries. There are a couple of reasons you might want to do this. Typically, netstat will run much faster without name resolution. This is particularly true if you are having name resolution problems and have to wait
for requests to time out. This option can help you avoid confusion if your /etc/services or /etc/hosts files are inaccurate.

The remaining TCP entries in the LISTEN state are services waiting for a connection request. Since a request could come over any available interface, its IP address is not known in advance. The * in the entry *.echo acts as a placeholder for the unknown IP address. (Since multiple addresses may be associated with a host, the local address is unknown until a connection is actually made.) The *.* entries indicate that both the remote address and port are unknown. As you can see, this shows a number of additional services that ps was not designed to display. In particular, all the services that are under the control of inetd are shown.

Another use of netstat is to list the routing table. This may be essential information in resolving routing problems, e.g., when you discover that a host or a network is unreachable. Although it may be too long or volatile on many systems to be very helpful, the routing table is sometimes useful in getting a quick idea of what networks are communicating with yours. Displaying the routing table requires the -r option.

There are four main ways entries can be added to the routing table—by the ifconfig command when an interface is configured, by the route command, by an ICMP redirect, or through an update from a dynamic protocol like RIP or OSPF.
If dynamic protocols are used, the routing table is an example of a dynamic structure that can't be discovered by looking at configuration files.

Here is an example of a routing table from a FreeBSD system:

bsd1# netstat -rn
Routing tables

Internet:
Destination     Gateway            Flags  Refs  Use  Netif  Expire
default         205.153.60.2       UGSc      0    0  xl0
127.0.0.1       127.0.0.1          UH        0    0  lo0
172.16.1/24     172.16.2.1         UGSc      0    7  xl1
172.16.2/24     link#2             UC        0    0  xl1
172.16.2.1      0:10:7b:66:f7:62   UHLW      2    0  xl1     913
172.16.2.255    ff:ff:ff:ff:ff:ff  UHLWb     0   18  xl1
172.16.3/24     172.16.2.1         UGSc      0    2  xl1
205.153.60      link#1             UC        0    0  xl0
205.153.60.1    0:0:a2:c6:e:42     UHLW      4    0  xl0     906
205.153.60.2    link#1             UHLW      1    0  xl0
205.153.60.5    0:90:27:9c:2d:c6   UHLW      0   34  xl0     987
205.153.60.255  ff:ff:ff:ff:ff:ff  UHLWb     1   18  xl0
205.153.61      205.153.60.1       UGSc      0    0  xl0
205.153.62      205.153.60.1       UGSc      0    0  xl0
205.153.63      205.153.60.1       UGSc      2    0  xl0

At first glance, output from other systems may be organized differently, but usually the same basic information is present. In this example, the -n option was used to suppress name resolution.

The first column gives the destination, while the second gives the interface or next hop to that destination. The third column gives the flags. These are often helpful in interpreting the first two columns. A U indicates the path is up or available, an H indicates the destination is a host rather than a network, and a G indicates a gateway or router. These are the most useful. Others shown in this table include b, indicating a broadcast address; S, indicating a static or manual addition; and W and c, indicating a route that was generated as a result of cloning. (These and other possibilities are described in detail in the Unix manpage for some versions of netstat.) The fourth column gives a reference count,
i.e., the number of active uses for each of the routes. This is incremented each time a connection is built over the route (e.g., a Telnet connection is made using the route) and decremented when the connection is torn down. The fifth column gives the number of packets sent using this entry. The last entry is the interface that will be used.

If you are familiar with the basics of routing, you have seen these tables before. If not, an explanation of the first few lines of the table should help. The first entry indicates the default route. This was added statically at startup. The second entry is the loopback address for the machine. The third entry is for a remotely attached network. The destination network is a subnet from a Class B address space. The /24 is the subnet mask. Traffic to this network must go through 172.16.2.1, a gateway that is defined with the next two entries. The fourth entry indicates that the network gateway, 172.16.2.1, is on a network that has a direct attachment through the second interface xl1. The entry that follows gives the specifics, including the Ethernet address of the gateway's interface.

In general, it helps to have an idea of the interfaces and how they are configured before you get too deeply involved in routing tables. There are two quick ways to get this information—use the -i option with netstat or use the ifconfig command. Here is the output for the interfaces that netstat generates. This corresponds to the routing table just examined.

bsd1# netstat -i
Name  Mtu   Network      Address            Ipkts Ierrs Opkts Oerrs Coll
xl0   1500  <Link>       00.10.5a.e3.37.0c   2123     0   612     0    0
xl0   1500  205.153.60   205.153.60.247      2123     0   612     0    0
xl1   1500  <Link>       00.60.97.92.4a.7b    478     0    36     0    0
xl1   1500  172.16.2/24  172.16.2.13          478     0    36     0    0
lp0*  1500  <Link>                               0     0     0     0    0
tun0* 1500  <Link>                               0     0     0     0    0
sl0*  552   <Link>                               0     0     0     0    0
ppp0* 1500  <Link>                               0     0     0     0    0
lo0   16384 <Link>                               6     0     6     0    0
lo0   16384 127          localhost               6     0     6     0    0

For our purposes, we are interested in only the first four entries.
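The way a destination is matched against a table like this can be illustrated with Python's ipaddress module. This is only a sketch of longest-prefix matching over a few routes taken from the netstat -rn output above; the kernel uses its own, far more efficient data structures:

```python
# Sketch: longest-prefix matching against a few routes from the
# netstat -rn output. Illustrative only; not how the kernel stores
# or searches its routing table.
import ipaddress

# (prefix, gateway or link) pairs from the table above
routes = [
    ("0.0.0.0/0", "205.153.60.2"),    # the default route
    ("172.16.1.0/24", "172.16.2.1"),
    ("172.16.2.0/24", "link#2"),
    ("172.16.3.0/24", "172.16.2.1"),
    ("205.153.60.0/24", "link#1"),
]

def lookup(destination):
    """Return the gateway for the most specific matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(net), gw) for net, gw in routes
               if dest in ipaddress.ip_network(net)]
    # the longest matching prefix (largest prefixlen) wins
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("172.16.1.5"))    # 172.16.2.1
print(lookup("192.168.10.1"))  # 205.153.60.2 (falls through to the default)
```

Every address matches the 0.0.0.0/0 default route, which is why traffic for an unknown network still has somewhere to go; a more specific /24 entry simply outranks it.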
(The other interfaces include the loopback, lo0, and unused interfaces like ppp0*, the PPP interface.) The first two entries give the Ethernet address and IP address for the xl0 interface. The next two are for xl1. Notice that this also gives the number of input and output packets and errors as well. You can expect to see very large numbers for these. The very low numbers indicate that the system was recently restarted.

The format of the output may vary from system to system, but all will provide the same basic information. There is a lot more to netstat than this introduction shows. For example, netstat can be run periodically like top. We will return to netstat in future chapters.

2.1.4 lsof

lsof is a remarkable tool that is often overlooked. Written by Victor A. Abell, lsof lists open files on a Unix system. This might not seem a particularly remarkable service until you start thinking about the implications. An application that uses the filesystem, networked or otherwise, will have open files at some point. lsof offers a way to track that activity.

The program is available for a staggering variety of Unix systems, often in both source and binary formats. Although I will limit this discussion to networking-related tasks, lsof is more properly an operating system tool than a networking tool. You may want to learn more about lsof than described here.
In its simplest form, lsof produces a list of all open files. You'll probably be quite surprised at the number of files that are open on a quiescent system. For example, on a FreeBSD system with no one else logged on, lsof listed 564 open files.

Here is an example of the first few lines of output from lsof:

    bsd2# lsof
    COMMAND PID USER  FD   TYPE     DEVICE SIZE/OFF NODE NAME
    swapper   0 root cwd   VDIR 116,131072      512    2 /
    swapper   0 root rtd   VDIR 116,131072      512    2 /
    init      1 root cwd   VDIR 116,131072      512    2 /
    init      1 root rtd   VDIR 116,131072      512    2 /
    init      1 root txt   VREG 116,131072   255940  157 /sbin/init
    ...

The most useful fields are the obvious ones, including the first three: the name of the command, the process ID, and its owner. The other fields and the codes used in the fields are explained in the manpage for lsof, which runs about 30 pages.

It might seem that lsof returns too much information to be useful. Fortunately, it provides a number of options that will allow you to tailor the output to your needs. You can use lsof with the -p option to specify a specific process number or with the -c option to specify the name of a process. For example, the command lsof -csendmail will list all the files opened by sendmail. You only need to give enough of the name to uniquely identify the process. The -N option can be used to list files opened for the local computer on an NFS server. That is, when run on an NFS client, lsof shows files opened by the client. When run on a server, lsof will not show the files the server is providing to clients.

The -i option limits output to Internet and X.25 network files.
If no address is given, all such files will be listed, effectively showing all open socket files on your network:

    bsd2# lsof -i
    COMMAND    PID   USER FD  TYPE     DEVICE SIZE/OFF NODE NAME
    syslogd    105   root  4u IPv4 0xc3dd8f00      0t0  UDP *:syslog
    portmap    108 daemon  3u IPv4 0xc3dd8e40      0t0  UDP *:sunrpc
    portmap    108 daemon  4u IPv4 0xc3e09d80      0t0  TCP *:sunrpc (LISTEN)
    inetd      126   root  4u IPv4 0xc3e0ad80      0t0  TCP *:ftp (LISTEN)
    inetd      126   root  5u IPv4 0xc3e0ab60      0t0  TCP *:telnet (LISTEN)
    inetd      126   root  6u IPv4 0xc3e0a940      0t0  TCP *:shell (LISTEN)
    inetd      126   root  7u IPv4 0xc3e0a720      0t0  TCP *:login (LISTEN)
    inetd      126   root  8u IPv4 0xc3e0a500      0t0  TCP *:finger (LISTEN)
    inetd      126   root  9u IPv4 0xc3dd8d80      0t0  UDP *:biff
    inetd      126   root 10u IPv4 0xc3dd8cc0      0t0  UDP *:ntalk
    inetd      126   root 11u IPv6 0xc3e0a2e0      0t0  TCP *:ftp
    inetd      126   root 12u IPv6 0xc3e0bd80      0t0  TCP *:telnet
    inetd      126   root 13u IPv6 0xc3e0bb60      0t0  TCP *:shell
    inetd      126   root 14u IPv6 0xc3e0b940      0t0  TCP *:login
    inetd      126   root 15u IPv6 0xc3e0b720      0t0  TCP *:finger
    lpd        131   root  6u IPv4 0xc3e0b500      0t0  TCP *:printer (LISTEN)
    sendmail   137   root  4u IPv4 0xc3e0b2e0      0t0  TCP *:smtp (LISTEN)
    httpd      185   root 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd      198 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd      199 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd      200 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd      201 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd      202 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd    10408 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd    10409 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd    10410 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd    25233 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    httpd    25236 nobody 16u IPv4 0xc3e0b0c0      0t0  TCP *:http (LISTEN)
    telnetd  58326   root  0u IPv4 0xc3e0eb60      0t0  TCP bsd2.lander.edu:telnet->sloan.lander.edu:1184 (ESTABLISHED)
    telnetd  58326   root  1u IPv4 0xc3e0eb60      0t0  TCP bsd2.lander.edu:telnet->sloan.lander.edu:1184 (ESTABLISHED)
    telnetd  58326   root  2u IPv4 0xc3e0eb60      0t0  TCP bsd2.lander.edu:telnet->sloan.lander.edu:1184 (ESTABLISHED)
    perl     68936   root  4u IPv4 0xc3dd8c00      0t0  UDP *:eicon-x25
    ping     81206 nobody  3u IPv4 0xc3e98f00      0t0  ICMP *:*

As you can see, this is not unlike the -a option with netstat. Apart from the obvious differences in the details reported, the big difference is that lsof will not report connections that do not have files open. For example, if a connection is being torn down, all files may already be closed. netstat will still report this connection while lsof won't. The preferred behavior will depend on what information you need.

If you specify an address, then only those files related to the address will be listed:

    bsd2# lsof -i@sloan.lander.edu
    COMMAND   PID USER FD  TYPE     DEVICE SIZE/OFF NODE NAME
    telnetd 73825 root  0u IPv4 0xc3e0eb60      0t0  TCP bsd2.lander.edu:telnet->sloan.lander.edu:1177 (ESTABLISHED)
    telnetd 73825 root  1u IPv4 0xc3e0eb60      0t0  TCP bsd2.lander.edu:telnet->sloan.lander.edu:1177 (ESTABLISHED)
    telnetd 73825 root  2u IPv4 0xc3e0eb60      0t0  TCP bsd2.lander.edu:telnet->sloan.lander.edu:1177 (ESTABLISHED)

One minor problem with this output is the identification of the telnet user as root, a consequence of root owning telnetd, the server's daemon. On some systems, you can use the PID with the -p option to track down the device entry and then use lsof on the device to discover the owner. Unfortunately, this won't work on many systems.

You can also use lsof to track an FTP transfer. You might want to do this to see if a transfer is making progress. You would use the -p option to see which files are open to the process.
You can then use -ad to specify the device file descriptor along with -r to specify repeat mode. lsof will be run repeatedly, and you can see if the size of the file is changing.

Other uses of lsof are described in the manpage, the FAQ, and a quick-start guide supplied with the distribution. The latter is probably the best place to begin.

2.1.5 ifconfig

ifconfig is usually thought of as the command used to alter the configuration of the network interfaces. But, since you may need to know the current configuration of the interfaces before you make changes, ifconfig provides a mechanism to retrieve interface configurations. It will report the configuration of all the interfaces when called with the -a option or of a single interface when used with the interface's name.

Here are the results for the system we just looked at:

    bsd1# ifconfig -a
    xl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 205.153.60.247 netmask 0xffffff00 broadcast 205.153.60.255
            ether 00:10:5a:e3:37:0c
            media: 10baseT/UTP <half-duplex>
            supported media: autoselect 100baseTX <full-duplex>
            100baseTX <half-duplex> 100baseTX 10baseT/UTP <full-duplex>
            10baseT/UTP <half-duplex> 10baseT/UTP
    xl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 172.16.2.13 netmask 0xffffff00 broadcast 172.16.2.255
            ether 00:60:97:92:4a:7b
            media: 10baseT/UTP <half-duplex>
            supported media: autoselect 100baseTX <full-duplex>
            100baseTX <half-duplex> 100baseTX 10baseT/UTP <full-duplex>
            10baseT/UTP 10baseT/UTP <half-duplex>
    lp0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> mtu 1500
    tun0: flags=8010<POINTOPOINT,MULTICAST> mtu 1500
    sl0: flags=c010<POINTOPOINT,LINK2,MULTICAST> mtu 552
    ppp0: flags=8010<POINTOPOINT,MULTICAST> mtu 1500
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
            inet 127.0.0.1 netmask 0xff000000

You can see that for the interfaces xl0 and xl1, we are given a general status report. UP indicates that the interface is operational. If UP is missing, the interface is down and will not process packets. For Ethernet, the combination of BROADCAST, SIMPLEX, and MULTICAST is not surprising. The mtu is the largest frame size the interface will handle. Next, we have the IP number, address mask, and broadcast address. The Ethernet address comes next, although some systems (Solaris, for example) will suppress this if you aren't running the program as root. Finally, we see information about the physical interface connections.

You can ignore the entries for lp0, tun0, sl0, and ppp0. In fact, if you don't want to see these, you can use the combination -au to list just the interfaces that are up. Similarly, -d is used to list just the interfaces that are down.

While netstat allows you to get basic information on the interfaces, if your goal is configuration information, ifconfig is a better choice.
First, as you can see, ifconfig supplies more of that sort of information. Second, on some systems, netstat may skip interfaces that haven't been configured. Finally, ifconfig also allows you to change parameters such as the IP addresses and masks. In particular, ifconfig is frequently used to shut down an interface. This is roughly equivalent to disconnecting the interface from the network. To shut down an interface, you use the down option. For example, ifconfig xl1 down will shut down the interface xl1, and ifconfig xl1 up will bring it back up. Of course, you must have root privileges to use ifconfig to change configurations.

Since ifconfig is used to configure interfaces, it is typically run automatically by one of the startup scripts when the system is booted. This is something to look for when you examine startup scripts. The use of ifconfig is discussed in detail in Craig Hunt's TCP/IP Network Administration.

2.1.6 arp

The ARP table on a system maps network addresses into MAC addresses. Of course, the ARP table applies only to directly connected devices, i.e., devices on the local network. Remote devices, i.e., devices that can be reached only by sending traffic through one or more routers, will not be added to the ARP table since you can't communicate with them directly. (However, the appropriate router interface will be added.)
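The local-versus-remote distinction comes down to the netmask arithmetic a host performs before deciding whether to ARP for a destination. Here is a minimal sketch of that test in shell; the ip_to_int and on_same_subnet helpers are hypothetical, and the addresses come from this chapter's examples.

```shell
# Sketch of the test a host applies before ARPing: a destination is on
# the local network only when (dest & mask) == (local & mask).
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

on_same_subnet() {   # usage: on_same_subnet LOCAL_IP DEST_IP MASK
    m=$(ip_to_int "$3")
    if [ $(( $(ip_to_int "$1") & m )) -eq $(( $(ip_to_int "$2") & m )) ]; then
        echo local    # host would send an ARP request directly
    else
        echo remote   # traffic must go through a router instead
    fi
}

on_same_subnet 172.16.2.13 172.16.2.1 255.255.255.0       # prints "local"
on_same_subnet 172.16.2.13 205.153.60.247 255.255.255.0   # prints "remote"
```

Only destinations that test "local" here will ever show up in the ARP table; everything else resolves to the router's entry instead.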
Typically, addresses are added or removed automatically. If your system needs to communicate with another system on the local network whose MAC address is unknown, your system sends an ARP request, a broadcast packet with the destination's IP address. If the system is accessible, it will respond with an ARP reply that includes its MAC address. Your system adds this to its ARP table and then uses this information to send packets directly to the destination. (A simple way to add an entry for a directly connected device to the ARP table is to ping the device you want added. ping is discussed in detail in Chapter 3.) Most systems are configured to drop entries from the ARP table if they aren't being used, although the length of the timeout varies from system to system.

At times, you may want to examine or even change entries in the ARP table. The arp command allows you to do this. When arp is invoked with the -a option, it reports the current contents of the ARP table. Here is an example from a Solaris system:

    sol1# arp -a
    Net to Media Table
    Device  IP Address             Mask            Flags Phys Addr
    ------  --------------------   --------------- ----- ---------------
    elxl0   205.153.60.1           255.255.255.255       00:00:a2:c6:0e:42
    elxl0   205.153.60.53          255.255.255.255       00:e0:29:21:3c:0b
    elxl0   205.153.60.55          255.255.255.255       00:90:27:43:72:70
    elxl0   mail.lander.edu        255.255.255.255       00:90:27:9c:2d:c6
    elxl0   sol1                   255.255.255.255 SP    00:60:97:58:71:b7
    elxl0   pm3.lander.edu         255.255.255.255       00:c0:05:04:2d:78
    elxl0   BASE-ADDRESS.MCAST.NET 240.0.0.0       SM    01:00:5e:00:00:00

The format or details may vary from system to system, but the same basic information should be provided.

For Solaris, the first column gives the interface for the connection. The next two are the IP address and its mask. (You can get just IP numbers by using the -n option.) There are four possible flags that may appear in the flags column. An S indicates a static entry, one that has been manually set rather than discovered. A P indicates an address that will be published.
That is, this machine will provide this address should it receive an ARP request. In this case, the P flag is for the local machine, so it is natural that the machine would respond with this information. The flags U and M are used for unresolved and multicast addresses, respectively. The final column is the actual Ethernet address.

This information can be useful in several ways. It can be used to determine the Ethernet hardware in this computer, as well as the hardware in directly connected devices. The IEEE assigns to the manufacturers of Ethernet adapters unique identifiers to be used as the first three bytes of their Ethernet addresses. These addresses, known as Organizationally Unique Identifiers (OUI), can be found at the IEEE web page at http://standards.ieee.org/regauth/oui/index.html. In other words, the first three bytes of an Ethernet address identify the manufacturer. In this case, by entering 00 60 97, i.e., the first three bytes of the address 00 60 97 58 71 b7, on this web page, we find that the host sol1 has a 3Com Ethernet adapter. In the same manner we can discover that the host 205.153.60.1 is Bay Networks equipment.

OUI designations are not foolproof. The MAC address of a device may have been changed and may not have the manufacturer's OUI. And even if you can identify the manufacturer, in today's world of merger mania and takeovers, you may see an OUI of an acquired company that you don't recognize.
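Mechanically, the OUI check is just a prefix match on the first three bytes of the address. A sketch, using a tiny hand-made table limited to the two assignments identified above; a real lookup would consult the full IEEE registry.

```shell
# Map a MAC address to a vendor via its OUI (first three bytes).
# The table below is an illustrative two-entry subset, not the IEEE registry.
oui_vendor() {
    oui=$(printf '%s' "$1" | cut -d: -f1-3 | tr 'A-F' 'a-f')
    case $oui in
        00:60:97) echo "3Com"         ;;   # sol1's adapter
        00:00:a2) echo "Bay Networks" ;;   # the router at 205.153.60.1
        *)        echo "unknown"      ;;
    esac
}

oui_vendor 00:60:97:58:71:b7   # prints "3Com"
oui_vendor 00:00:a2:c6:0e:42   # prints "Bay Networks"
```

The same helper could be fed every address from the arp -a output in a loop to produce a rough hardware inventory of the local network.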
If some machines on your network are reachable but others aren't, or connectivity comes and goes, ARP problems may be the cause. (For an example of an ARP problem, see Chapter 12.) If you think you might have a problem with IP-to-Ethernet address resolution on your local network, arp is the logical tool to use to diagnose the problem. First, look to see if there is an entry for the destination and if it is correct. If it is missing, you can attempt to add it using the -s option. (You must be root.) If the entry is incorrect, you must first delete it with the -d option. Entries added with the -s option will not time out but will be lost on reboot. If you want to permanently add an entry, you can create a startup script to do this. In particular, in a script, arp can use the -f option to read entries from a file.

The usual reason for an incorrect entry in an arp table is a duplicated IP address somewhere on your network. Sometimes this is a typing mistake. Sometimes when setting up their computers, people will copy the configuration from other computers, including the supposedly unique IP number. A rogue DHCP server is another possibility. If you suspect one of your hosts is experiencing problems caused by a duplicate IP number on the network, you can shut down the interface on that computer or unplug it from the network. (This is less drastic than shutting down the computer, but that will also work.) Then you can ping the IP address in question from a second computer. If you get an answer, some other computer is using your IP address. Your arp table should give you the Ethernet address of the offending machine. Using its OUI will tell you the type of hardware. This usually won't completely locate the problem machine, but it is a start, particularly for unusual hardware.[2]

[2] You can also use arp to deliberately publish a bad address.
This will shut up a connection request that won't otherwise stop.

2.1.7 Scanning Tools

We've already discussed one reason why ps may not give a complete picture of your system. There is another much worse possibility. If you are having security problems, your copy of ps may be compromised. Crackers sometimes will replace ps with their own version that has been patched to hide their activities. In this event, you may have an additional process running on your system that provides a backdoor that won't show up under ps.

One way of detecting this is to use a port scanner to see which ports are active on your system. You could choose to do this from the compromised system, but you are probably better off doing this from a remote system known to be secure. This assumes, however, that the attacker hasn't installed a trapdoor on the compromised host that is masquerading as a legitimate service on a legitimate port.

There are a large number of freely available port scanners. These include programs like gtkportscan, nessus, portscan, and strobe, to name just a few. They generally work by generating a connection request for each port number in the range being tested. If they receive a reply from the port, they add it to their list of open ports. Here is an example using portscan:

    bsd1# portscan 205.153.63.239 1 10000 -vv
    This is a portscanner - Rafael Barrero, Jr.
    Email me at rbarrero@polymail.calpoly.edu
    For further information. Enjoy!
    Port: 7 --> echo
    Port: 9 --> discard
    Port: 13 --> daytime
    Port: 19 --> chargen
    Port: 21 --> ftp
    Port: 23 --> telnet
    Port: 25 --> smtp
    Port: 37 --> time
    Port: 79 --> finger
    Port: 111 --> sunrpc
    Port: 513 --> login
    Port: 514 --> shell

The arguments are the destination address and beginning and ending port numbers. The result is a list of port numbers and service names for ports that answered.

Figure 2-1 shows another example of a port scanner running under Windows NT. This particular scanner is from Mentor Technologies, Inc., and can be freely downloaded from http://www.mentortech.com/learn/tools/tools.shtml. It is written in Java, so it can be run on both Windows and Unix machines but will require a Java runtime environment. It can also be run in command-line mode. Beware, this scanner is very slow when used with Windows.

Figure 2-1. Chesapeake Port Scanner

Most administrators look on such utilities as tools for crackers, but they can have legitimate uses as shown here. Keep in mind that the use of these tools has political implications. You should be safe scanning your own system, but you are on very shaky ground if you scan other systems. These two tools make no real effort to hide what they are doing, so they are not difficult to detect. Stealth port scanners, however, send the packets out of order over extended periods of time and are, consequently, more difficult to detect. Some administrators consider port scans adequate justification for cutting connections or blocking all traffic from a site. Do not use these tools on a system without authorization. Depending on the circumstances, you may want to notify certain colleagues before you do a port scan even if you are authorized. In Chapter 12, we will return to port scanners and examine other uses, such as testing firewalls.

One last word about these tools. Don't get caught up in using tools and overlook simpler tests. For example, you can check to see if sendmail is running by trying to connect to the SMTP port using telnet. In this example, the test not only tells me that sendmail is running, but it also tells me what version of sendmail is running:

    lnx1# telnet 205.153.63.239 25
    Trying 205.153.63.239...
    Connected to 205.153.63.239.
    Escape character is '^]'.
    220 bsd4.lander.edu ESMTP Sendmail 8.9.3/8.9.3; Wed, 8 Mar 2000 09:38:02 -0500 (EST)
    quit
    221 bsd4.lander.edu closing connection
    Connection closed by foreign host.

In the same spirit:

    bsd1# ipfw list
    ipfw: getsockopt(IP_FW_GET): Protocol not available

clearly shows ipfw is not running on this system. All I did was try to use it. This type of application-specific testing is discussed in greater detail in Chapter 10.

2.2 System Configuration Files

A major problem with configuration files under Unix is that there are so many of them in so many places. On a multiuser system that provides a variety of services, there may be scores of configuration files scattered among dozens of directories. Even worse, it seems that every implementation of Unix is different. Even different releases of the same flavor of Unix may vary. Add to this the complications that multiple applications contribute and you have a major undertaking. If you are running a number of different platforms, you have your work cut out for you.

For these reasons, it is unrealistic to attempt to give an exhaustive list of configuration files. It is possible, however, to discuss configuration files by categories. The categories can then serve as a guide or reminder when you construct your own lists so that you don't overlook an important group of files. Just keep in mind that what follows is only a starting point. You will have to discover your particular implementations of Unix one file at a time.

2.2.1 Basic Configuration Files

There are a number of fairly standard configuration files that seem to show up on most systems. These are usually, but not always, located in the /etc directory.
(For customization, you may see a number of files in the /usr/local or /usr/opt directories or their subdirectories.) When looking at files, this is clearly the first place to start. Your system will probably include many of the following: defaultdomain, defaultroute, ethers, gateways, host.conf, hostname, hosts, hosts.allow, hosts.equiv, inetd.conf, localhosts, localnetworks, named.boot, netmasks, networks, nodename, nsswitch.conf, protocols, rc, rc.conf, rc.local, resolv.conf, and services. You won't find all of these on a single system. Each version and release will have its own conventions. For example, Solaris puts the host's name in nodename.[3] With BSD, it is set in rc.conf. Customizations may change these as well. Thus, the locations and names of files will vary from system to system.

[3] The hostname may be used in other files as well, so don't try to change the hostname by editing these files. Use the hostname command instead.
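Most of the files just listed are line-oriented, with # marking comments. When working through them, a short filter that drops comments and blank lines makes the live entries easy to see. This sketch runs on a fabricated inetd.conf-style sample rather than a real file:

```shell
# Print the service names from the active (uncommented, nonblank) lines
# of an inetd.conf-style file. The sample text here is fabricated; on a
# real system you would read /etc/inetd.conf instead.
sample='# telnet stream tcp nowait root /usr/libexec/telnetd telnetd
ftp     stream  tcp  nowait  root    /usr/libexec/ftpd    ftpd -l

#shell  stream  tcp  nowait  root    /usr/libexec/rshd    rshd
finger  stream  tcp  nowait  nobody  /usr/libexec/fingerd fingerd -s'

printf '%s\n' "$sample" | awk '!/^[ \t]*#/ && NF { print $1 }'
# prints:
# ftp
# finger
```

The same one-line awk filter works on most of the other files named above, since they share the # comment convention.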
One starting point might be to scan all the files in /etc and its subdirectories, trying to identify which ones are relevant. In the long run, you may want to know the role of all the files in /etc, but you don't need to do this all at once.

There are a few files or groups of files that will be of particular interest. One of the most important is inetd.conf. While we can piece together what is probably being handled by inetd by using ps in combination with netstat, an examination of inetd.conf is usually much quicker and safer. On an unfamiliar system, this is one of the first places you will want to look. Be sure to compare this to the output provided by netstat. Services that you can't match to running processes or inetd are a cause for concern.

You will also want to examine files like host.conf, resolv.conf, and nsswitch.conf to discover how name resolution is done. Be sure to examine files that establish trust relationships like hosts.allow. This is absolutely essential if you are having, or want to avoid, security problems. (There is more on some of these files in the discussion of tcpwrappers in Chapter 11.)

Finally, there is one group of these files, the rc files, that deserves particular attention. These are discussed separately in the later section on startup files and scripts.

2.2.2 Configuration Programs

Over the years, Unix has been heavily criticized because of its terse command-line interface. As a result, many GUI applications have been developed. System administration has not escaped this trend. These utilities can be used to display as well as change system configurations.

Once again, every flavor of Unix will be different. With Solaris, admintool was the torchbearer for years. In recent years, this has been superseded by Solstice AdminSuite. With FreeBSD, select the configure item from the menu presented when you run /stand/sysinstall. With Linux you can use linuxconf. Both the menu and GUI versions of this program are common.
The list goes on.

2.2.3 Kernel

It's natural to assume that examining the kernel's configuration might be an important first step. But while it may, in fact, be essential in resolving some key issues, in general, it is usually not the most productive place to look. You may want to postpone this until it seems absolutely necessary or you have lots of free time.

As you know, the first step in starting a system is loading and initializing the kernel. Network services rely on the kernel being configured correctly. Some services will be available only if first enabled in the kernel. While examining the kernel's configuration won't tell you which services are actually being used, it can give some insight into what is not available. For example, if the kernel is not configured to forward IP packets, then clearly the system is not being used as a router, even if it has multiple interfaces. On the other hand, it doesn't immediately follow that a system is configured as a firewall just because the kernel has been compiled to support filtering.

Changes to the kernel will usually be required only when building a new system, installing a new service or new hardware, or tuning system performance. Changing the kernel will not normally be needed to simply discover how a system is configured. However, changes may be required to use some of the tools described later in this book. For example, some versions of FreeBSD have not, by default, enabled the Berkeley packet filter pseudodriver. Thus, it is necessary to recompile the kernel to enable this before some packet capture software, such as tcpdump, can be run on these systems.
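When you do locate a kernel configuration file, the enabled options can be pulled out mechanically. A sketch, run against a fabricated BSD-style fragment rather than a real file:

```shell
# List enabled option names from a BSD-style kernel config fragment.
# The fragment is fabricated for illustration; lines commented out with
# a leading # are ignored, matching the file format discussed here.
config='machine  i386
cpu      I686_CPU
ident    GENERIC
# Firewall options
options  IPFIREWALL
options  IPFIREWALL_VERBOSE_LIMIT=25
#options IPDIVERT'

printf '%s\n' "$config" | awk '$1 == "options" { print $2 }'
# prints:
# IPFIREWALL
# IPFIREWALL_VERBOSE_LIMIT=25
```

Diffing the resulting option list against a stock GENERIC configuration is a quick way to spot the unusual options worth investigating.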
To recompile a kernel, you'll need to consult the documentation for your operating system for the specifics. Usually, recompiling a kernel first requires editing configuration files. This may be done manually or with the aid of a utility created for this task. For example, with Linux, the command make config runs an interactive program that sets appropriate parameters.[4] BSD uses a program called config. If you can locate the configuration files used, you can see how the kernel was configured. But, if the kernel has been rebuilt a number of times without following a consistent naming scheme, this can be surprisingly difficult.

[4] You can also use make xconfig or make menuconfig. These are more interactive, allowing you to go back and change parameters once you have moved on. make config is unforgiving in this respect.

As an example, on BSD-derived systems, the kernel configuration files are usually found in the directory /sys/arch/conf/kernel, where arch corresponds to the architecture of the system and kernel is the name of the kernel. With FreeBSD, the file might be /sys/i386/conf/GENERIC if the kernel has not been recompiled. In Linux, the configuration file is .config in whatever directory the kernel was unpacked in, usually /usr/src/linux/.

As you might expect, lines beginning with a # are comments. What you'll probably want to look for are lines specifying unusual options. For example, it is not difficult to guess that the following lines from a FreeBSD system indicate that the machine may be used as a firewall:

    ...
    # Firewall options
    options IPFIREWALL
    options IPFIREWALL_VERBOSE_LIMIT=25
    ...

Some entries can be pretty cryptic, but hopefully there are some comments. The Unix manpages for a system may describe some options.

Unfortunately, there is very little consistency from one version of Unix to the next on how such files are named, where they are located, what information they may contain, or how they are used.
For example, Solaris uses the file /etc/system to hold some directives, although there is little of interest in this file for our purposes. IRIX keeps its files in the /var/sysgen/system directory. For Linux, take a look at /etc/conf.modules. The list goes on.[5]

[5] While general configuration parameters should be in a single file, a huge number of files are actually involved. If you have access to FreeBSD, you might look at /sys/conf/files to get some idea of this. This is a list of the files FreeBSD uses.

It is usually possible to examine or change selected system parameters for an existing kernel. For example, Solaris has the utilities sysdef, prtconf, and ndd. For our purposes, ndd is the most interesting and should provide the flavor of how such utilities work.

Specifically, ndd allows you to get or set driver configuration parameters. You will probably want to begin by listing configurable options. Specifying the driver (i.e., /dev/arp, /dev/icmp, /dev/ip, /dev/tcp, and /dev/udp) with the ? option will return the parameters available for that driver. Here is an example:

    sol1# ndd /dev/arp ?
    ?                        (read only)
    arp_cache_report         (read only)
    arp_debug                (read and write)
    arp_cleanup_interval     (read and write)
This shows three parameters that can be examined, although only two can be changed. We can examine an individual parameter by using its name as an argument. For example, we can retrieve the ARP table as shown here:

    sol1# ndd /dev/arp arp_cache_report
    ifname   proto addr      proto mask      hardware addr     flags
    elxl0    205.153.060.053 255.255.255.255 00:e0:29:21:3c:0b
    elxl0    205.153.060.055 255.255.255.255 00:90:27:43:72:70
    elxl0    205.153.060.001 255.255.255.255 00:00:a2:c6:0e:42
    elxl0    205.153.060.005 255.255.255.255 00:90:27:9c:2d:c6
    elxl0    205.153.060.248 255.255.255.255 00:60:97:58:71:b7 PERM PUBLISH MYADDR
    elxl0    205.153.060.150 255.255.255.255 00:c0:05:04:2d:78
    elxl0    224.000.000.000 240.000.000.000 01:00:5e:00:00:00 PERM MAPPING

In this instance, it is fairly easy to guess the meaning of what's returned. (This output is for the same ARP table that we looked at with the arp command.) Sometimes, what's returned can be quite cryptic. This example returns the value of the IP forwarding parameter:

    # ndd /dev/ip ip_forwarding
    0

It is far from obvious how to interpret this result. In fact, 0 means never forward, 1 means always forward, and 2 means forward only when two or more interfaces are up. I've never been able to locate a definitive source for this sort of information, although a number of the options are described in an appendix to W. Richard Stevens' TCP/IP Illustrated, vol. 1. If you want to change parameters, you can invoke the program interactively.

Other versions of Unix will have their own files and utilities. For example, BSD has the sysctl command. This example shows that IP forwarding is disabled:

    bsd1# sysctl net.inet.ip.forwarding
    net.inet.ip.forwarding: 0

The manpages provide additional guidance, but to know what to change, you may have to delve into the source code. With AIX, there is the no utility.
As I have said before, the list goes on.

This brief description should give you a general idea of what's involved in gleaning information about the kernel, but you will want to go to the appropriate documentation for your system. It should be clear that it takes a fair degree of experience to extract this kind of information. Occasionally, there is a bit of information that can be obtained only this way, but, in general, this is not the most profitable place to start.

One last comment: if you are intent on examining the behavior of the kernel, you will almost certainly want to look at the messages it produces when booting. On most systems, these can be retrieved with the dmesg command. These can be helpful in determining what network hardware your system has and what drivers it uses. For hardware, however, I generally prefer opening the case and looking inside. Accessing the CMOS is another approach for discovering the hardware that doesn't require opening the box.

2.2.4 Startup Files and Scripts
Once the kernel is loaded, the swapper or scheduler is started, and then the init process runs. This process will, in turn, run a number of startup scripts that will start the various services and do additional configuration chores.

After the standard configuration files, these are the next group of files you might want to examine. These will primarily be scripts but may include configuration files read by the scripts. In general, it is a bad idea to bury configuration parameters within these scripts, but this is still done at times. You should also be prepared to read fairly cryptic shell code. It is hoped that most of these will be either in their pristine state, heavily commented, or both.

Look for three things when examining these files. First, some networking parameters may be buried in these files. You will not want to miss these. Next, there may be calls to network configuration utilities such as route or ifconfig. These are frequently customizations, so read these with a critical eye. Finally, networking applications such as sendmail may be started from these files. I strongly urge that you create a list of all applications that are run automatically at startup.

For systems derived from BSD, you should look for files in /etc beginning with rc. Be sure to look at rc.conf and any rc files with extensions indicating a networking function of interest, e.g., rc.firewall. Realize that many of these will be templates for services that you may not be using. For example, if you see the file rc.atm, don't be too disappointed when you can't find your ATM connection.

Unix systems can typically be booted in one of several different states or run levels that determine which services are started. For example, run level 1 is single-user mode and is used for system maintenance. The services started by the different run levels vary somewhat among the different flavors of Unix. If your system is derived from System V, then the files will be in a half dozen or so directories in /etc.
These are named rc1.d, rc2.d, and so forth. The digit indicates the run level of the system when booted. Networking scripts are usually in rc2.d. In each directory, there will be scripts starting with an S or a K and a two-digit number. The rest of the name should give some indication of the function of the file. Files with names beginning with an S are started in numerical order when the system is rebooted. When the system shuts down, the files with K are run. (Some versions of Linux, such as Red Hat, follow this basic approach but group these directories together in the /etc/rc.d directory. Others, such as Debian, follow the System V approach.)

There is one serious catch with all this. When versions of operating systems change, sometimes the locations of files change. For backward compatibility, links may be created to old locations. For example, on recent versions of Solaris, the network configuration file /etc/hosts is actually a link to /etc/inet/hosts. There are other important network configuration files that are really in /etc/inet, not /etc. Similarly, some of the startup scripts are really links to files in /etc/init.d. If the link is somehow broken, you may find yourself editing the wrong version of a file and wondering why the system is ignoring your changes.

2.2.5 Other Files

There are several other categories of files that are worth mentioning briefly. If you have been following the steps just described, you will already have found most of these, but it may be worth mentioning them separately just in case you have overlooked something.

2.2.5.1 Application files
Once you have your list of applications that are started automatically, investigate how each application is configured. When it comes to configuration files, each application will follow its own conventions. The files may be grouped together, reside in a couple of directories, or have some distributed structure that spans a number of directories. For example, sendmail usually keeps configuration files together, usually in /etc or in /etc/mail. DNS may have a couple of files in /etc to get things started, with the database files grouped together somewhere else. A web server like apache may have an extensive set of files distributed across a number of directories, particularly if you consider content. But beware: your particular implementation may vary from the norm, and in that case, all bets are off. You will need to look for these on an application-by-application and a system-by-system basis.

2.2.5.2 Security files

It is likely you will have already discovered relevant security files at this point, but if you are having problems, this is something worth revisiting. There are several different categories to consider:

Trust relationships
    Some files such as /etc/hosts.equiv set up trust relationships with other computers. You will definitely want to review these. Keep in mind that users can establish their own trust relationships, so don't forget the .rhosts files in home directories if you are having problems tied to specific users.

Traffic control
    A number of files may be tied to general access or the control of traffic. These include configuration files for applications like tcpwrappers or firewall configuration files.

Application specific
    Don't forget that individual applications may have security files as well. For example, the file /etc/ftpusers may be used by ftp to restrict access. These are very easy to overlook.

2.2.5.3 Log files

One last category of files you might want to consider is log files. Strictly speaking, these are not configuration files.
Apart from an occasional startup message, these may not tell you very much about your system's configuration. But occasionally, these will provide the missing puzzle piece for resolving a problem. Log files are described in much greater detail in Chapter 11.

2.3 Microsoft Windows

Networking with Windows can be quite complicated, since it may involve Microsoft's proprietary enhancements. Fortunately, Microsoft's approach to TCP/IP is pretty standard. As with Unix, you can approach the various versions of Windows by looking at configuration parameters or by using utilities to examine the current configuration. For the most part, you won't be examining files directly under Windows, at least for versions later than Windows for Workgroups. Rather, you'll use the utilities that Windows provides. (There are exceptions. For example, like Unix, Windows has hosts, protocol, and services files.)
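Those exception files are plain text on both platforms, so they can be inspected with the same habits you use on Unix. A small sketch of pulling the address for a host out of a hosts-style file; the entries below are invented sample data, not a real system's file:

```shell
# Invented hosts-file contents standing in for the real file
# (e.g., C:\WINNT\system32\drivers\etc\hosts on NT, /etc/hosts on Unix).
hosts_sample='127.0.0.1       localhost
205.153.63.30   jsloan.lander.edu   jsloan'

# Print the address associated with a given name.
printf '%s\n' "$hosts_sample" | awk '/jsloan/ { print $1 }'
```

The point is only that a static name-to-address mapping is easy to audit by eye or by script, which matters when a stale hosts entry is overriding DNS.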
If you are looking for basic information quickly, Microsoft provides one of two programs for this purpose, depending on which version of Windows you use. The utility winipcfg is included with Windows 95/98. A command-line program, ipconfig, is included with Windows NT and Windows 2000 and in Microsoft's TCP/IP stack for Windows for Workgroups. Both programs provide the same information. winipcfg produces a pop-up window giving the basic parameters such as the Ethernet address, the IP address, the default route, the name server's address, and so on (see Figure 2-2). You can invoke the program by entering the program name from Run on the Start menu or in a DOS window. The most basic parameters will be displayed. Additional information can be obtained by using the /all option or by clicking on the More Info >> button.

Figure 2-2. winipcfg

For ipconfig, start a DOS window. You can use the command switch /all to get the additional details.

As in Unix, the utilities arp, hostname, and netstat are available. All require a DOS window to run. There are a few differences in syntax, but they work basically the same way and provide the same sorts of information. For example, arp -a will list all the entries in the ARP table:

    C:\>arp -a

    Interface: 205.153.63.30 on Interface 2
      Internet Address      Physical Address      Type
      205.153.63.1          00-00-a2-c6-28-44     dynamic
      205.153.63.239        00-60-97-06-22-22     dynamic

The command netstat -r gives the computer's routing table:

    C:\>netstat -r

    Route Table
    ===========================================================================
    Interface List
    0x1 ........................... MS TCP Loopback interface
    0x2 ...00 10 5a a1 e9 08 ...... 3Com 3C90x Ethernet Adapter
    0x3 ...00 00 00 00 00 00 ...... NdisWan Adapter
    ===========================================================================
    Active Routes:
    Network Destination          Netmask          Gateway        Interface  Metric
              0.0.0.0          0.0.0.0     205.153.63.1    205.153.63.30       1
            127.0.0.0        255.0.0.0        127.0.0.1        127.0.0.1       1
         205.153.63.0    255.255.255.0    205.153.63.30    205.153.63.30       1
        205.153.63.30  255.255.255.255        127.0.0.1        127.0.0.1       1
       205.153.63.255  255.255.255.255    205.153.63.30    205.153.63.30       1
            224.0.0.0        224.0.0.0    205.153.63.30    205.153.63.30       1
      255.255.255.255  255.255.255.255    205.153.63.30    205.153.63.30       1
    ===========================================================================
    Active Connections

      Proto  Local Address    Foreign Address         State
      TCP    jsloan:1025      localhost:1028          ESTABLISHED
      TCP    jsloan:1028      localhost:1025          ESTABLISHED
      TCP    jsloan:1184      205.153.60.247:telnet   ESTABLISHED
      TCP    jsloan:1264      mail.lander.edu:pop3    TIME_WAIT

As you can see, the format is a little different, but it supplies the same basic information. (You can also use the command route print to list the routing table.) You can use netstat -a to get the active connections and services. There really isn't an option that is analogous to -i in Unix's netstat (the option to display attached interfaces). For a listing of the basic syntax and available commands, try netstat /?.

While Windows does not provide ps, both Windows NT and Windows 2000 provide the Task Manager (taskmgr.exe), a utility that can be used to see or control what is running. If you have the Windows Resource Kit, three additional utilities, process viewer (pviewer.exe), process explode (pview.exe), and process monitor (pmon.exe), are worth looking at. All four can be started by entering their names at Start → Run.
The Task Manager can also be started by pressing Ctrl-Alt-Delete and selecting Task Manager from the menu, or by right-clicking on a vacant area on the task bar at the bottom of the screen and selecting Task Manager from the menu.

You won't need NT's administrator privileges to use the DOS-based commands just described. If you want to reconfigure the system, or if you need additional details, you will need to turn to the utilities provided by Windows. For NT, this will require administrator privileges. (You'll also need administrative privileges to make changes with arp or route.) This is available from Start → Settings → Control Panel → Network or by following a similar path from My Computer. Select the appropriate tab and fields as needed.

If you are interested in port scanners, a number are available. I have already mentioned that the Chesapeake Port Scanner will run under Windows. Scan the Internet for others.

Finally, for the really brave of heart, you can go into the registry. But that's a subject for another book. (See Paul Robichaux's Managing the Windows 2000 Registry or Steven Thomas's Windows NT 4.0 Registry.)
Chapter 3. Connectivity Testing

This chapter describes simple tests for individual network links and for end-to-end connectivity between networked devices. The tools described in this chapter are used to show that there is a functioning connection between two devices. These tools can also be used for more sophisticated testing, including the discovery of path characteristics and general performance measurements. These additional uses are described in Chapter 4. Tools used for testing protocol issues related to connectivity are described in Chapter 9. You may want to turn next to these chapters if you need additional information in either of these areas.

This chapter begins with a quick review of cabling practices. If your cabling isn't adequate, that's the first thing you need to address. Next, there is a lengthy discussion of using ping to test connectivity, along with issues that might arise when using ping, such as security problems. Next, I describe alternatives to ping. Finally, I discuss alternatives that run on Microsoft Windows platforms.

3.1 Cabling

For most managers, cabling is the most boring part of a network. Even administrators who are normally control freaks will often jump at the opportunity to delegate or cede responsibility for cabling to somebody else. It has none of the excitement of new equipment or new software. It is often hidden away in wiring closets, walls, and ceilings. When it is visible, it is usually in the way or an eyesore. The only time most managers think about cabling is when it is causing problems. Yet, unless you are one of a very small minority running a wireless network, it is the core of your network. Without adequate cabling, you don't have a network.

Although this is a book about software tools, not cabling, the topics are not unrelated. If you have a cabling problem, you may need to turn to the tools described later in this chapter to pinpoint the problem.
Conversely, to properly use these tools, you can't ignore cabling, as it may be the real source of your problems.

If a cable is damaged, it won't be difficult to recognize the problem. But intermittent cabling problems can be a nightmare to solve. The problem may be difficult to recognize as a cabling problem. It may come and go, working correctly most of the time. The problem may arise in cables that have been in use for years. For example, I once watched a technician try to deal with a small classroom LAN that had been in use for more than five years and would fail only when the network was heavily loaded, i.e., if and only if there was a scheduled class in the room. The problem took weeks before what proved to be a cabling problem was resolved. In the meantime, several classes were canceled.

A full discussion of cabling practices, standards, and troubleshooting has been the topic of several books, so this coverage will be very selective. I am assuming that you are familiar with the basics. If not, several references in Appendix B provide a general but thorough introduction to cabling.

With cabling, as with most things, it is usually preferable to prevent problems than to have to subsequently deal with them. The best way to avoid cabling problems is to take a proactive approach. While some of the following suggestions may seem excessive, the costs are minimal when compared to what can be involved in solving a problem.
3.1.1 Installing New Cabling

If you are faced with a new installation, take the time to be sure it is done correctly from the start. While it is fairly straightforward to wire a few machines together in a home office, cabling should not generally be viewed as a do-it-yourself job. Large cabling projects should be left to trained professionals whenever possible.

Cabling is usually a large investment. Correcting cabling problems can be very costly in lost time, both for diagnosing the problem and for correcting it. Also, cabling must conform to all applicable building and fire codes. For example, using nonplenum cabling in plenum spaces can, in the event of a fire, greatly endanger the safety of you and your fellow workers. (Plenum cabling is cabling designed to be used in plenum spaces, spaces used to recirculate air in a building. It uses materials that have low flame-spread and low smoke-producing properties.)

Cabling can also be very sensitive to its physical environment. Cable that runs too near fluorescent lights or large motors, e.g., elevator motors, can be problematic. Proximity to power lines can also cause problems. The network cable acts like an antenna, picking up other nearby electrical activity and introducing unwanted signals or noise onto the network. This can be highly intermittent and very difficult to identify. Concerns such as these should be enough to discourage you from doing the job yourself unless you are very familiar with the task.

Unfortunately, sometimes budget or organizational policies are such that you will have no choice but to do the job yourself or use internal personnel. If you must do the job yourself, take the time to learn the necessary skills before you begin. Get formal training if at all possible. Invest in the appropriate tools and test equipment to do the job correctly.
And make sure you aren't violating any building or fire codes.

If the wiring is handled by others, you will need to evaluate whether those charged with the task really have the skill to complete the job. Most electricians and telephone technicians are not familiar with data cabling practices. Worse still, many don't realize this. So, if asked, they will reassure you they can do the job. If possible, use an installer who has been certified in data cabling. Once you have identified a likely candidate, follow up on her references. Ask for the names of some past customers and call those customers. If possible, ask to see some of her work.

When planning a project, you should install extra cable whenever feasible. It is much cheaper to pull extra cable as you go than to go back and install new cable or replace a faulty cable. You should also consider technologies that will support higher speeds than you are currently using. For example, if you are using 10-Mbps Ethernet to the desktop, you should install cable that will support 100 Mbps. In the past, it has been a common recommendation to install fiber-optic cables to the desk as well, even if you aren't using fiber technologies at the desk at this time. Recent developments with copper cables have made this more of a judgment call. Certainly, you will want to pull spare fiber to any point your backbone may eventually include.

If at all feasible, cabling should be certified. This means that each cable is tested to ensure that it meets appropriate performance standards for the intended application. This can be particularly important for spare cabling. When it is time to use that cable, you don't want any nasty surprises.

Adequate documentation is essential. Maintenance will be much simpler if you follow cabling standards and use one of the more common structured cable schemes. More information can be found in the sources given in Appendix B.
3.1.2 Maintaining Existing Cabling

For existing cabling, you won't have as much latitude as with a new installation. You certainly won't want to go back and replace working cable just because it does not follow some set of standards. But there are several things you can do to make your life simpler when you eventually encounter problems.

The first step in cable management is knowing which cable is which and where each cable goes. Perhaps the most important tool for the management and troubleshooting of cabling is a good label maker. Even if you weren't around when the cable was originally installed, you should be able, over time, to piece together this information. You will also want to collect basic information about each cable, such as cable types and lengths.

You will want to know which of your cables don't meet standards. If you have one of the more sophisticated cable testers, you can self-certify your cabling plant. You probably won't want to do either of these tasks all at once, but you may be able to do a little at a time. And you will definitely want to do it for any changes or additions you make.

Labeling Cables

This should be a self-explanatory topic. Unfortunately for some, this is not the case. I have very vivid memories of working with a wiring technician with years of experience. The individual had worked for major organizations and should have been quite familiar with labeling practices.

We were installing a student laboratory. The laboratory has a switch mounted in a box on the wall. Cabling went from the box into the wall and then through cable raceways down the length of the room. Along the raceway, it branched into raceways built into computer tables going to the individual computers. The problem should be clear.
Once the cable disappears into the wall and raceways, it is impossible to match the end at the switch with the corresponding end that emerges at the computer.

While going over what needed to be done, I mentioned, needlessly I thought, that the cable should be clearly labeled. This was just one part of my usual lengthy litany. He thought for a moment and then said, "I guess I can do that." Then a puzzled expression came over his face and he added in dead earnest, "Which end do you want labeled?" I'd like to think he was just putting me on, but I don't think so.

You should use some method of attaching labels that is reasonably permanent. It can be very discouraging to find several labels lying on the floor beneath your equipment rack. Also, you should use a meaningful scheme for identifying your cables. TIA/EIA-606, Administration Standard for Telecommunications Infrastructure of Commercial Buildings, provides one possibility. (See Appendix B for more information on TIA/EIA standards.) And, at the risk of stating the obvious, unless you can see the entire cable at the same time, it should be labeled at both ends.

3.1.3 Testing Cabling

Cable testing can be a simple, quick check for continuity or a complex set of measurements that carefully characterizes a cable's electrical properties. If you are in a hurry to get up and running, you may be limited to simple connectivity tests, but the more information you collect, the better prepared
you will be to deal with future problems. If you must be up quickly, make definite plans to return and finish the job, and stick to those plans.

3.1.3.1 Link lights

Perhaps the simplest test is to rely on the network interface's link lights. Almost all networking equipment now has status lights that show, when lit, that you have functioning connections. If these do not light when you make a connection, you definitely have a problem somewhere. Keep in mind, however, a lit link light does not necessarily indicate the absence of a problem.

Many devices have additional indicators that give you more information. It is not uncommon to have a transmit light that blinks each time a packet is sent, a receive light that blinks each time a packet is received, and a collision light that blinks each time the device detects a collision. To get an idea of what is normal, look at the lights on other computers on the same network.

Typically, you would expect to see the receive light blinking intermittently as soon as you connect the device to an active network. Generally, anomalous behavior with the receive light indicates a problem somewhere else on your network. If it doesn't ever light, you may have a problem with your connection to the network. For example, you could be plugged into a hub that is not connected to the network. If the light is on all or most of the time, you probably have an overloaded network.

The transmit light should come on whenever you access the network but should remain off otherwise. You may be surprised, however, how often a computer will access the network. It will almost certainly blink several times when your computer is booted. If in doubt, try some basic networking tasks while watching for activity. If it does not light when you try to access the network, you have problems on that computer. If it stays lit, you not only have a problem but also are probably flooding the network with packets, thereby causing problems for others on the network as well.
You may want to isolate this machine until the problem is resolved.

In the ideal network, from the user's perspective at least, the collision light should remain relatively inactive. However, excessive collision light flashing, or even one that remains on most of the time, may not indicate a problem. A collision is a very brief event. If the light only remained on for the length of the event, the flash would be too brief to be seen. Consequently, these lights are designed to remain on much longer than the actual event. A collision light that remains on doesn't necessarily mean that your network is saturated with collisions. On the other hand, this is something you'll want to investigate further.

For any of the cases in which you have an indication of a network overload, unless your network is completely saturated, you should be able to get some packets through. And you should see similar problems on every computer on that network segment. If your network is completely saturated, then you may have a malfunctioning device that is continuously transmitting. Usually, this can be easily located by turning devices off one at a time until the problem suddenly disappears.

If you have an indication of a network overload, you should look at the overall behavior and structure of your network. A good place to start is with netstat, as discussed in Chapter 4. For a more thorough discussion of network performance monitoring, turn to Chapter 8.

One last word of warning: you may see anomalous behavior with any of these lights if your interface is misconfigured or has the wrong driver installed.
3.1.3.2 Cable testers

A wide variety of cable testers are available. Typically, you get what you pay for. Some check little more than continuity and the cable's pin-out (that the individual wires are connected to the appropriate pins). Others are extremely sophisticated and fully characterize the electrical properties of your cabling. These can easily cost thousands of dollars. Better testers typically consist of a pair of units: the actual tester and a termination device that creates a signal loop. These devices commonly check the following:

Wire-map (or pin-outs)
    This checks to see if the corresponding pins on each end of a cable are correctly paired. Failure indicates an improperly terminated cable, such as crossed wires or faulty connections.

Near End Cross-Talk (NEXT)
    This is a measure of how much a signal on one wire interferes with other signals on adjacent wires. High values can indicate improper termination or the wrong type of cable or connectors.

Attenuation
    This measures how much of the original signal is lost over the length of the cable. As this is frequency dependent, it should be measured at a number of different frequencies over the range used. It will determine the maximum data rates the cable can support. Problem causes include the wrong cable type, faulty connectors, and excessive lengths.

Impedance
    This is the opposition to changes in current and arises from the resistance and the inductance of the cable. Impedance measurements may be useful for finding an impedance mismatch that may cause reflected signals at the point where cables are joined. It can also be useful in ascertaining whether or not you are using the right type of cable.

Attenuation to Cross-talk Ratio (ACR)
    This is a comparison of signal strength to noise. Values that are too low indicate excessive cable length or poor connections.

Capacitance
    This is the electrical field energy that can be stored in the cable.
Anomalous values can indicate problems with the cable, such as shorts or broken wires.

Length
    By timing the return of a signal injected onto the cable, the length of a cable can be discovered. This can reveal how much cable is hidden in the walls, allowing you to verify that cable lengths are not exceeding the maximum allowed by the applicable standards.
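Attenuation figures from a tester are normally quoted in decibels. As a rough illustration of the arithmetic behind such a reading (the voltage values here are invented for the example), losing half the signal voltage works out to about 6 dB:

```shell
# dB of loss = 20 * log10(Vin/Vout); awk supplies the floating-point math.
awk 'BEGIN { vin = 1.0; vout = 0.5
             printf "%.1f dB\n", 20 * log(vin/vout) / log(10) }'
```

Since 20·log10(2) is about 6.02, every further halving of the voltage adds roughly another 6 dB, which is why small-looking dB differences between cables can matter.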
The documentation with your cable tester will provide more details on understanding and using these tests.

The better cable testers may be preprogrammed with appropriate values for different types of cable, allowing you to quickly identify parameters that are out of specification. A good tester should also allow you to print or upload measurements into a database. This allows you to easily compare results over time to identify changes.

3.1.3.3 Other cable tests

In general, moving cables around is a poor way to test them. You may jiggle a nearby poor connection, changing the state of the problem. But if you can't afford a cable tester, you may have little choice.

If the cable in question is not installed in the wall, you can try to test it by swapping it with a cable known to be good. However, it is usually better to replace a working cable with a questionable cable and see if things continue to work, rather than the other way around. This method is more robust to multiple failures. You will immediately know the status of the questionable cable. If you replace a questionable cable with a good cable and you still have problems, you clearly have a problem other than the cable. But you don't know if it is just a different problem or an additional problem. Of course, this approach ties up more systems.

Remember, electrical connectivity does not equate to network connectivity. I've seen technicians plug different subnets into the same hub and then wonder why the computers can't communicate.[1]

[1] There are also circumstances in which this will work, but mixing subnets this way is an extremely bad idea.

3.2 Testing Adapters

While most problems with adapters, such as Ethernet cards, are configuration errors, sometimes adapters do fail. Without getting into the actual electronics, there are generally three simple tests you can make with adapters.
However, each has its drawbacks:

  • If you have some doubts about whether the problem is in the adapter or the network, you might try eliminating the bulk of the network from your tests. The easiest approach is to create a two-computer network using another working computer. If you use coaxial cable, simply run a cable known to be good between the computers and terminate each end appropriately. For twisted pair, use a crossover cable, i.e., a patch cable with send and receive crossed. If all is well, the computers should be able to communicate. If they don't, you should have a pretty clear idea of where to look next. The crossover-cable approach is analogous to setting up a serial connection using a null modem. You may want to first try this method with two working computers just to verify you are using the right kind of cable. You should also be sure IP numbers and masks are set appropriately on each computer. Clearly, the drawbacks with this approach are shuffling computers around and finding the right cable. But if you have a portable computer available, the shuffling isn't too difficult.
  • A second alternative is to use the configuration and test software provided by the adapter's manufacturer. If you bought the adapter as a separate purchase, you probably already have this software. If your adapter came with your computer, you may have to go to the manufacturer's web page and download the software. This approach can be helpful, particularly with configuration errors. For example, a combination adapter might be configured for coaxial cable while you are trying to use it with twisted pair. You may be able to change interrupts, DMA channels, memory locations, bus mastering configuration, and framing types with this software. Using diagnostic software has a couple of limitations. First, the software may not check for some problems and may seemingly absolve a faulty card. Second, the software may not be compatible with the operating system you are using. This is particularly likely if you are using something like Linux or FreeBSD on an Intel platform.

  • The third alternative is to swap the card for one that is known to work. This presumes that you have a spare card or are willing to remove one from another machine. It also presumes that you aren't having problems that may damage some other component in the computer or the new card. Even though I generally keep spare cards on hand, I usually leave this test until last whenever possible.

3.3 Software Testing with ping

Thus far, I have described ways to examine electrical and mechanical problems. The tools described in this section, ping and its variants, focus primarily on software problems and the interaction of software with hardware. When these tools successfully communicate with remote systems, you have established basic connectivity. Your problem is almost certainly at a higher level in your system.

With these tools, you begin with the presumption that your hardware is working correctly. If the link light is out on the local host, these tools will tell you nothing you don't already know.
But if you simply suspect a hardware problem somewhere on your network, these tools may help you locate the problem. Once you know the location of the problem, you will use the techniques previously described to resolve it. These tools can also provide insight when your hardware is marginal or when you have intermittent failures.

3.3.1 ping

While there are several useful programs for analyzing connectivity, unquestionably ping is the most commonly used program. As it is required by the IP RFC, it is almost always available as part of the networking software supplied with any system. In addition, numerous enhanced versions of ping are available at little or no cost. There are even web sites that will allow you to run ping from their sites. Moreover, the basic idea has been adapted from IP networks to other protocols. For example, Cisco's implementation of ping has an optional keyword to check connectivity among routers using AppleTalk, DECnet, or IPX. ping is nearly universal.

ping was written by Mike Muuss.[2] Inspired by echo location, the name comes from the sounds sonar makes. The name ping is frequently described as an acronym for Packet InterNet Groper. But, according to Muuss's web page, the acronym was applied to the program after the fact by someone else.
[2] For more on the background of ping, as well as a review of the book The Story About Ping, an alleged allegory of the ping program, visit Muuss's web page at http://ftp.arl.mil/~mike/ping.html.

3.3.2 How ping Works

It is, in essence, a simple program based on a simple idea. (Muuss describes it as a 1000-line hack that was completed in about one evening.) One network device sends a request for a reply to another device and records the time the request was sent. The device receiving the request sends a packet back. When the reply is received, the round-trip time for packet propagation can be calculated. The receipt of a reply indicates a working connection. This elapsed time provides an indication of the length of the path. Consistency among repeated queries gives an indication of the quality of the connection. Thus, ping answers two basic questions. Do I have a connection? How good is that connection? In this chapter, we will focus on the first question, returning to the second question in the next chapter.

Clearly, for the program to work, the networking protocol must support this query/response mechanism. The ping program is based on the Internet Control Message Protocol (ICMP), part of the TCP/IP protocol suite. ICMP was designed to pass information about network performance between network devices and exchange error messages. It supports a wide variety of message types, including this query/response mechanism.

The normal operation of ping relies on two specific ICMP messages, ECHO_REQUEST and ECHO_REPLY, but it may respond to ICMP messages other than ECHO_REPLY when appropriate. In theory, all TCP/IP-based network equipment should respond to an ECHO_REQUEST by returning the packet to the source, but this is not always the case.

3.3.2.1 Simple examples

The default behavior of ping will vary among implementations. Typically, implementations have a wide range of command-line options so that the behavior discussed here is generally available.
For example, implementations may default to sending a single packet, a small number of packets, or a continuous stream of packets. They may respond with a set of round-trip transmission times or with a simple message. The version of ping that comes with the Solaris operating system sends, by default, a single ICMP packet. It responds that the destination is alive or that no answer was received. In this example, an ECHO_REPLY was received:

sol1# ping 205.153.63.30
205.153.63.30 is alive
sol1#

In this example, no response was received before the program timed out:

sol1# ping www.microsoft.com
no answer from microsoft.com
sol1#

Note that ping can be used with an IP number or with a hostname, as shown by these examples. Other implementations will, by default, repeatedly send ECHO_REQUESTs until interrupted. FreeBSD is an example:

bsd1# ping www.bay.com
PING www.bay.com (204.80.244.66): 56 data bytes
64 bytes from 204.80.244.66: icmp_seq=0 ttl=112 time=180.974 ms
64 bytes from 204.80.244.66: icmp_seq=1 ttl=112 time=189.810 ms
64 bytes from 204.80.244.66: icmp_seq=2 ttl=112 time=167.653 ms
^C
--- www.bay.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 167.653/179.479/189.810/9.107 ms
bsd1#

The execution of the program was interrupted with a Ctrl-C, at which point the summary statistics were printed. Without an interrupt, the program will continue indefinitely. With the appropriate command-line option, -s, similar output can be obtained with Solaris.

3.3.2.2 Interpreting results

Before I go into the syntax of ping and the ways it might be used, it is worth getting a clear understanding of what results might be returned by ping. The simplest results are seen with Solaris: a message simply stating, in effect, that the reply packet was received or was not received. With FreeBSD, we receive a great deal more information. It repeatedly sends packets and reports results for each packet, as well as providing a summary of results. In particular, for each packet we are given the size and source of each packet, an ICMP sequence number, a Time-To-Live (TTL) count, and the round-trip time. (The TTL field is explained later.) Of these, the sequence number and round-trip times are the most revealing when evaluating basic connectivity.

When each ECHO_REQUEST packet is sent, the time the packet is sent is recorded in the packet. This is copied into the corresponding ECHO_REPLY packet by the remote host. When an ECHO_REPLY packet is received, the elapsed time is calculated by comparing the current time to the time recorded in the packet, i.e., the time the packet was sent. This difference, the elapsed time, is reported, along with the sequence number and the TTL, which comes from the packet's header. If no ECHO_REPLY packet is received that matches a particular sequence number, that packet is presumed lost.
The size and the variability of elapsed times will depend on the number and speed of intermediate links, as well as the congestion on those links. An obvious question is "What values are reasonable?" Typically, this is highly dependent on the networks you cross and the amount of activity on those networks. For example, these times are taken from a PPP link with a 28.8-Kbps modem:

64 bytes from 205.153.60.42: icmp_seq=0 ttl=30 time=225.620 ms
64 bytes from 205.153.60.42: icmp_seq=1 ttl=30 time=213.652 ms
64 bytes from 205.153.60.42: icmp_seq=2 ttl=30 time=215.306 ms
64 bytes from 205.153.60.42: icmp_seq=3 ttl=30 time=194.782 ms
64 bytes from 205.153.60.42: icmp_seq=4 ttl=30 time=199.562 ms
...

The following times were for the same link only moments later:

64 bytes from 205.153.60.42: icmp_seq=0 ttl=30 time=1037.367 ms
64 bytes from 205.153.60.42: icmp_seq=1 ttl=30 time=2119.615 ms
64 bytes from 205.153.60.42: icmp_seq=2 ttl=30 time=2269.448 ms
64 bytes from 205.153.60.42: icmp_seq=3 ttl=30 time=2209.715 ms
64 bytes from 205.153.60.42: icmp_seq=4 ttl=30 time=2493.881 ms
...

There is nothing wrong here. The difference is that a file download was in progress on the link during the second set of measurements.
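Transcripts like those above lend themselves to simple post-processing. The sketch below is a hypothetical illustration (the helper name and sample lines are invented, not part of ping itself): it parses FreeBSD-style output, flags missing sequence numbers as lost packets, and summarizes the round-trip times.

```python
import re

def summarize(ping_lines):
    """Parse 'icmp_seq=N ttl=T time=X ms' lines; report RTT stats and losses."""
    seqs, times = [], []
    for line in ping_lines:
        m = re.search(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms", line)
        if m:
            seqs.append(int(m.group(1)))
            times.append(float(m.group(2)))
    # A sequence number that never appears corresponds to a lost packet.
    lost = sorted(set(range(seqs[0], seqs[-1] + 1)) - set(seqs)) if seqs else []
    return {"received": len(times), "lost": lost,
            "min": min(times), "avg": sum(times) / len(times), "max": max(times)}

sample = [
    "64 bytes from 204.80.244.66: icmp_seq=0 ttl=112 time=180.974 ms",
    "64 bytes from 204.80.244.66: icmp_seq=1 ttl=112 time=189.810 ms",
    "64 bytes from 204.80.244.66: icmp_seq=3 ttl=112 time=167.653 ms",
]
print(summarize(sample))
```

The jump from icmp_seq=1 to icmp_seq=3 in the sample is reported as a loss of packet 2, mirroring the gap-spotting described in the text.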
In general, you can expect very good times if you are staying on a LAN. Typically, values should be well under 100 ms and may be less than 10 ms. Once you move onto the Internet, values may increase dramatically. A coast-to-coast round-trip will take at least 60 ms when following a mythical straight-line path with no congestion. For remote sites, times of 200 ms may be quite good, and times up to 500 ms may be acceptable. Much larger times may be a cause for concern. Keep in mind these are very rough numbers.

You can also use ping to calculate a rough estimate of the throughput of a connection. (Throughput and related concepts are discussed in greater detail in Chapter 4.) Send two packets with different sizes across the path of interest. This is done with the -s option, which is described later in this chapter. The difference in times will give an idea of how much longer it takes to send the additional data in the larger packet. For example, say it takes 30 ms to ping with 100 bytes and 60 ms with 1100 bytes. Thus, it takes an additional 30 ms round trip, or 15 ms in one direction, to send the additional 1000 bytes, or 8000 bits. The throughput is roughly 8000 bits per 15 ms, or about 540,000 bps. The difference between two measurements is used to eliminate overhead. This is extremely crude. It makes no adjustment for other traffic and gives a composite picture for all the links on a path. Don't try to make too much out of these numbers.

It may seem that the TTL field could be used to estimate the number of hops on a path. Unfortunately, this is problematic. When a packet is sent, the TTL field is initialized and is subsequently decremented by each router along the path. If it reaches zero, the packet is discarded. This imposes a finite lifetime on all packets, ensuring that, in the event of a routing loop, the packet won't remain on the network indefinitely.
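Returning to the throughput estimate: the arithmetic reduces to a few lines of code. This is only a sketch of the back-of-the-envelope calculation (the function name is invented for illustration), not a substitute for a real throughput measurement.

```python
def rough_throughput_bps(rtt_small_ms, rtt_large_ms, extra_bytes):
    """Crude throughput estimate from two ping RTTs with different payloads.

    Differencing the two measurements cancels the fixed overhead; halving
    converts the round trip to a one-way time.  No adjustment is made for
    other traffic, and the result is a composite over every link on the path.
    """
    one_way_s = (rtt_large_ms - rtt_small_ms) / 2.0 / 1000.0
    return extra_bytes * 8 / one_way_s

# The chapter's example: 30 ms with 100 bytes, 60 ms with 1100 bytes.
print(rough_throughput_bps(30, 60, 1000))
```

8000 bits in 15 ms works out to roughly 533,000 bps, which the text rounds to about 540,000 bps.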
Unfortunately, the TTL field may or may not be reset at the remote machine and, if reset, there is little consistency in what it is set to. Thus, you need to know very system-specific information to use the TTL field to estimate the number of hops on a path.

A steady stream of replies with reasonably consistent times is generally an indication of a healthy connection. If packets are being lost or discarded, you will see jumps in the sequence numbers, the missing numbers corresponding to the lost packets. Occasional packet loss probably isn't an indication of any real problem. This is particularly true if you are crossing a large number of routers or any congested networks. It is particularly common for the first packet in a sequence to be lost or to have a much higher elapsed time. This behavior is a consequence of the need to do ARP resolution at each link along the path for the first packet. Since the ARP data is cached, subsequent packets do not have this overhead. If, however, you see a large portion of the packets being lost, you may have a problem somewhere along the path.

The program will also report duplicate and damaged packets. Damaged packets are a cause for real concern. You will need to shift into troubleshooting mode to locate the source of the problem. Unless you are trying to ping a broadcast address, you should not see duplicate packets. If your computers are configured to respond to ECHO_REQUESTs sent to broadcast addresses, you will see lots of duplicate packets. With normal use, however, duplicate responses could indicate a routing loop. Unfortunately, ping will only alert you to the problem; its underlying mechanism cannot explain the cause of such problems.

In some cases you may receive other ICMP error messages. Typically coming from routers, these can be very informative and helpful.
For example, in the following, an attempt is made to reach a device on a nonexistent network:

bsd1# ping 172.16.4.1
PING 172.16.4.1 (172.16.4.1): 56 data bytes
36 bytes from 172.16.2.1: Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 5031   0 0000  fe  01 0e49 172.16.2.13  172.16.4.1
36 bytes from 172.16.2.1: Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 5034   0 0000  fe  01 0e46 172.16.2.13  172.16.4.1
^C
--- 172.16.4.1 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

Since the router has no path to the network, it returns the ICMP DESTINATION_HOST_UNREACHABLE message. In general, you will receive a Destination Host Unreachable warning or a Destination Network Unreachable warning if the problem is detected on the machine where ping is being run. If the problem is detected on a device trying to forward a packet, you will receive only a Destination Host Unreachable warning.

In the next example, an attempt is being made to cross a router that has been configured to deny traffic from the source:

bsd1# ping 172.16.3.10
PING 172.16.3.10 (172.16.3.10): 56 data bytes
36 bytes from 172.16.2.1: Communication prohibited by filter
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 5618   0 0000  ff  01 0859 172.16.2.13  172.16.3.10
36 bytes from 172.16.2.1: Communication prohibited by filter
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 561b   0 0000  ff  01 0856 172.16.2.13  172.16.3.10
^C
--- 172.16.3.10 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

The warning Communication prohibited by filter indicates the packets are being discarded. Be aware that you may be blocked by filters without seeing this message. Consider the following example:

bsd1# ping 172.16.3.10
PING 172.16.3.10 (172.16.3.10): 56 data bytes
^C
--- 172.16.3.10 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

The same filter was used on the router, but it was applied to traffic leaving the network rather than to inbound traffic. Hence, no messages were sent. Unfortunately, ping will often be unable to tell you why a packet is unanswered.

While these are the most common ICMP messages you will see, ping may display a wide variety of messages.
A listing of ICMP messages can be found in RFC 792. A good discussion of the more common messages can be found in Eric A. Hall's Internet Core Protocols: The Definitive Guide. Most ICMP messages are fairly self-explanatory if you are familiar with TCP/IP.

3.3.2.3 Options

A number of options are generally available with ping. These vary considerably from implementation to implementation. Some of the more germane options are described here.
Several options control the number of packets sent or the rate at which they are sent. The -c option allows you to specify the number of packets you want to send. For example, ping -c10 would send 10 packets and stop. This can be very useful if you are running ping from a script.

The -f and -l options are used to flood packets onto a network. The -f option says that packets should be sent as fast as the receiving host can handle them. This can be used to stress-test a link or to get some indication of the comparative performance of interfaces. In this example, the program is run for about 10 seconds against each of two different destinations:

bsd1# ping -f 172.16.2.12
PING 172.16.2.12 (172.16.2.12): 56 data bytes
..^C
--- 172.16.2.12 ping statistics ---
27585 packets transmitted, 27583 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.303/0.310/0.835/0.027 ms
bsd1# ping -f 172.16.2.20
PING 172.16.2.20 (172.16.2.20): 56 data bytes
.^C
--- 172.16.2.20 ping statistics ---
5228 packets transmitted, 5227 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.535/1.736/6.463/0.363 ms

In the first case, the destination was a 200-MHz Pentium with a PCI adapter. In the second, the destination was a 50-MHz 486 with an ISA adapter. It is not surprising that the first computer was more than five times faster. But remember, it may not be clear whether the limiting factor is the source or the receiver unless you do multiple tests. Clearly, use of this option could cripple a host. Consequently, the option requires root privileges to run and may not be included in some implementations.

The -l option takes a count and sends out that many packets as fast as possible. It then falls back to normal mode. This could be used to see how a router handles a flood of packets. Use of this command is also restricted to root.

The -i option allows the user to specify the amount of time in seconds to wait between sending consecutive packets.
This can be a useful way to space out packets for extended runs or for use with scripts. In general, the effect of an occasional ping packet is negligible when compared to the traffic already on all but the slowest of links. Repeated packets or packet flooding can, however, add considerably to traffic and congestion. For that reason, you should be very circumspect in using any of these options (and perhaps ping in general).

The amount and form of the data can be controlled to a limited extent. The -n option restricts output to numeric form. This is useful if you are having DNS problems. Implementations also typically include options for more detailed output, typically -v for verbose output, and for fewer details, typically -q and -Q for quiet output.

The amount and nature of the data in the frame can be controlled using the -s and -p options. The packet size option, -s, allows you to specify how much data to send. If set too small, less than 8, there won't be space in the packet for a timestamp. Setting the packet size can help in diagnosing a problem caused by path Maximum Transmission Unit (MTU) settings (the largest frame size that can be sent on the path) or fragmentation problems. (Fragmentation is dividing data among multiple frames when a single packet is too large to cross a link. It is handled by the IP portion of the protocol stack.) The general approach is to increase packet sizes up to the maximum allowed to see if at some point you have problems. When this option isn't used, ping defaults to 64 bytes, which may be too small a
packet to reveal some problems. Also remember that ping does not count the IP or ICMP headers in the specified length, so your packets will be 28 bytes larger than you specify.

You could conceivably see MTU problems with protocols, such as PPP, that use escaped characters as well.[3] With escaped characters, a single character may be replaced by two characters. The expansion of escaped characters increases the size of the data frame and can cause problems with MTU restrictions or fragmentation.

[3] Generally there are better ways to deal with problems with PPP. For more information, see Chapter 15 in Using and Managing PPP, by Andrew Sun.

The -p option allows you to specify a pattern for the data included within the packet after the timestamp. You might use this if you think you have data-dependent problems. The FreeBSD manpage for ping notes that this sort of problem might show up if you lack sufficient "transitions" in your data, i.e., your data is all or almost all ones or all or almost all zeros. Some serial links are particularly vulnerable to this sort of problem.

There are a number of other options not discussed here. These provide control over which interfaces are used, the use of multicast packets, and so forth. The flags presented here are from FreeBSD and are fairly standard. Be aware, however, that different implementations may use different flags for these options. Be sure to consult your documentation if things don't work as expected.

3.3.2.4 Using ping

To isolate problems using ping, you will want to run it repeatedly, changing your destination address so that you work your way through each intermediate device to your destination. You should begin with your loopback interface. Use either localhost or 127.0.0.1. Next, ping your interface by IP number. (Run ifconfig -a if in doubt.) If either of these fails, you know that you have a problem with the host.

Next, try a host on a local network that you know is operational. Use its IP address rather than its hostname.
If this fails, there are several possibilities. If other hosts are able to communicate on the local network, then you likely have problems with your connection to the network. This could be your interface, the cable to your machine, or your connection to a hub or switch. Of course, you can't rule out configuration errors such as the media type on the adapter or a bad IP address or mask.

Next, try to reach the same host by name rather than by number. If this fails, you almost certainly have problems with name resolution. Even if you have this problem, you can continue using ping to check your network, but you will need to use IP addresses.

Try reaching the near and far interfaces of your router. This will turn up any basic routing problems you may have on your host or connectivity problems getting to your router.

If all goes well here, you are ready to ping remote computers. (You will need to know the IP addresses of the intermediate devices to do this test. If in doubt, read the section on traceroute in the next chapter.) Realize, of course, that if you start having failures at this point, the problem will likely lie beyond your router. For example, your ICMP ECHO_REQUEST packets may reach the remote machine, but it may not have a route to your machine to use for the ICMP ECHO_REPLY packets.

When faced with failure at this point, your response will depend on who is responsible for the machines beyond your router. If this is still part of your network, you will want to shift your tests to machines on the other side of the router and try to work in both directions.
If these machines are outside your responsibility or control, you will need to enlist the help of the appropriate person. Before you contact this person, however, you should collect as much information as you can. There are three things you may want to do. First, go back to using IP numbers if you have been using names. As said before, if things start working, you have a name resolution problem.

Second, if you were trying to ping a device several hops beyond your router, go back to closer machines and try to zero in on exactly where you first encountered the problem.

Finally, be sure to probe from more than one machine. While you may have a great deal of confidence in your local machine at this point, your discussion with the remote administrator may go much more smoothly if you can definitely say that you are seeing this problem from multiple machines instead of just one. In general, this stepwise approach is the usual approach for this type of problem.

Sometimes, you may be more interested in investigating connectivity over time. For example, you might have a connection that seems to come and go. By running ping in the background or from a script, you may be able to collect useful information. For example, with some routing protocols, updates have a way of becoming synchronized, resulting in periodic loading on the network. If you see increased delays, for example every 30 seconds, you might be having that sort of problem. Or, if you lose packets every time someone uses the elevator, you might look at the path your cable takes.

If you are looking at performance over a long period of time, you will almost certainly want to use the -i option to space out your packets in a more network-friendly manner. This is a reasonable approach to take if you are experiencing occasional outages and need to document the time and duration of the outages.
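One way to do that bookkeeping is to record each probe's outcome with a timestamp and then reduce the log to outage intervals. The sketch below is a hypothetical illustration (the function and the log format are invented; a real script would feed it the results of repeated ping -c1 runs):

```python
from datetime import datetime, timedelta

def outages(log):
    """Reduce (timestamp, reply_received) pairs to (start, duration) outages.

    An outage runs from the first failed probe to the next successful one;
    an outage still in progress when the log ends is not reported.
    """
    found, start = [], None
    for ts, ok in log:
        if not ok and start is None:
            start = ts                         # outage begins
        elif ok and start is not None:
            found.append((start, ts - start))  # outage ends
            start = None
    return found

t0 = datetime(2024, 1, 1)                      # hypothetical probes, 10 s apart
log = [(t0 + timedelta(seconds=10 * i), ok)
       for i, ok in enumerate([True, True, False, False, True, True])]
print(outages(log))
```

With the sample log, the two failed probes collapse into a single outage starting 20 seconds into the run and lasting about 20 seconds, exactly the kind of time-and-duration record worth handing to a remote administrator.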
You should also be aware that over extended periods of time, you may see changes in the paths the packets follow.

3.3.3 Problems with ping

Up to this point, I have been describing how ping is normally used. I now describe some of the complications faced when using ping.

First, the program does not exist in isolation, but depends on the proper functioning of other elements of the network. In particular, ping usually depends upon ARP and DNS. As previously noted, if you are using a hostname rather than an IP address as your destination, the name of the host will have to be resolved before ping can send any packets. You can bypass DNS by using IP addresses.

It is also necessary to discover the link-level address for each host along the path to the destination. Although this is rarely a problem, should ARP resolution fail, then ping will fail. You could avoid this problem, in part, by using static ARP entries to ensure that the ARP table is correct. A more common problem is that the time reported by ping for the first packet sent will often be distorted, since it reflects both transit time and ARP resolution time. On some networks, the first packet will often be lost. You can avoid this problem by sending more than one packet and ignoring the results for the first packet.

The correct operation of your network will depend on considerations that do not affect ping. In such situations, ping will work correctly, but you will still have link problems. For example, if there are problems with the configuration of the path MTU, smaller ping packets may zip through the network while larger application packets may be blocked. S. Lee Henry described a problem in which she could ping remote systems but could not download web pages.[4] While her particular problem was highly unusual, it does point out that a connection can appear to be working but still have problems.
[4] "Systems Administration: You Can't Get There from Here," Server/Workstation Expert, May 1999. This article can be found in PDF format at http://sw.expert.com/C4/SE.C4.MAY.99.pdf.

The opposite can be true as well. Often ping will fail when the connection works for other uses. For various reasons, usually related to security, some system administrators may block ICMP packets in general or ECHO_REQUEST packets in particular. Moreover, this practice seems to be increasing. I've even seen a site block ping traffic at its DNS server.

3.3.3.1 Security and ICMP

Unfortunately, ping in particular, and ICMP packets in general, have been implicated in several recent denial-of-service attacks. But while these attacks have used ping, they are not inherently problems with ping. Nonetheless, network administrators have responded as though ping were the problem (or at least the easiest way to deal with the problem), and this will continue to affect how, and even if, ping can be used in some contexts.

3.3.3.2 Smurf Attacks

In a Smurf Attack, ICMP ECHO_REQUEST packets are sent to the broadcast address of a network. Depending on how hosts are configured on the network, some may attempt to reply to the ECHO_REQUEST. The resulting flood of responses may degrade the performance of the network, particularly at the destination host.

With this attack, there are usually three parties involved: the attacker, who generates the original request; an intermediary, sometimes called a reflector or multiplier, that delivers the packet onto the network; and the victim. The attacker uses a forged source address so that the ECHO_REPLY packets are returned, not to the attacker, but to a "spoofed" address, i.e., the victim. The intermediary may be either a router or a compromised host on the destination network.

Because there are many machines responding to a single request, little of the attacker's bandwidth is used, while much of the victim's bandwidth may be used.
Attackers have developed tools that allow them to send ECHO_REQUESTs to multiple intermediaries at about the same time. Thus, the victim will be overwhelmed by ECHO_REPLY packets from multiple sources. Notice also that the congestion is not limited to just the victim but may extend through its ISP all the way back to the intermediaries' networks.

The result of these attacks is that many sites are now blocking ICMP ECHO_REQUEST traffic into their networks. Some have gone as far as to block all ICMP traffic. While understandable, this is not an appropriate response. First, it blocks legitimate uses of these packets, such as checking basic connectivity. Second, it may not be effective. In the event of a compromised host, the ECHO_REQUEST may originate within the network. At best, blocking pings is only a temporary solution.

A more appropriate response requires taking several steps. First, you should configure your routers so they will not forward broadcast traffic onto your network from other networks. How you do this will depend on the type of router you have, but solutions are available from most vendors.

Second, you may want to configure your hosts so they do not respond to ECHO_REQUESTs sent to broadcast addresses. It is easy to get an idea of which hosts on your network respond to these broadcasts. First, examine your ARP table, then ping your broadcast address, and then look at your ARP table again for new entries.[5]
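That before-and-after comparison of the ARP table can be scripted. The sketch below is purely illustrative (the sample arp -a lines, hostnames, and addresses are invented): it diffs two captured listings to show which hosts answered the broadcast ping.

```python
import re

def arp_ips(arp_output):
    """Pull the IP addresses out of 'arp -a'-style output."""
    return set(re.findall(r"\((\d+\.\d+\.\d+\.\d+)\)", arp_output))

# Hypothetical captures taken before and after pinging the broadcast address.
before = "gw (172.16.2.1) at 0:10:7b:66:f7:62\n"
after = ("gw (172.16.2.1) at 0:10:7b:66:f7:62\n"
         "host12 (172.16.2.12) at 0:60:97:6:22:22\n"
         "host13 (172.16.2.13) at 0:60:97:6:22:23\n")

# Entries present only in the second listing replied to the broadcast ping.
print(sorted(arp_ips(after) - arp_ips(before)))
```

Here the two new entries identify the hosts that should probably be reconfigured not to answer broadcast ECHO_REQUESTs.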
[5] At one time, you could test your site by going to http://www.netscan.org, but this site seems to have disappeared.

Finally, as a good network citizen, you should install filters on your access router to prevent packets that have a source address not on your network from leaving your network. This limits not only Smurf Attacks but also other attacks based on spoofed addresses from originating on your network. These filters should be applied to internal routers as well as access routers. (This assumes you are providing forwarding for other networks!)

If you follow these steps, you should not have to disable ICMP traffic. For more information on Smurf Attacks, including information on making these changes, visit http://www.cert.org/advisories/CA-1998-01.html. You might also look at RFC 2827.

3.3.3.3 Ping of Death

The specifications for TCP/IP set a maximum packet size of 65,535 octets, or bytes. Unfortunately, some operating systems behave in unpredictable ways if they receive a larger packet. Systems may hang, crash, or reboot. With a Ping of Death (or Ping o' Death) Attack, the packet size option for ping is used to send a slightly oversized packet to the victim's computer. For example, on some older machines, the command ping -s 65510 172.16.2.1 (use -l rather than -s on old Windows systems) will send a packet, once headers are added, that causes this problem on the host 172.16.2.1. (Admittedly, I have some misgivings about giving an explicit command, but this has been widely published and some of you may want to test your systems.)

This is basically an operating system problem. Large packets must be fragmented when sent. The destination will put the pieces in a buffer until all the pieces have arrived and the packet can be reassembled. Some systems simply don't do adequate bounds checking, allowing memory to be trashed.

Again, this is not really a problem with ping.
Any oversized packet, whether it is an ICMP packet, a TCP packet, or a UDP packet, will cause the same problem in susceptible operating systems. (Even IPX has been mentioned.) All ping does is supply a trivial way to exploit the problem. The correct way to deal with this problem is to apply the appropriate patch to your operating system. Blocking ICMP packets at your router will not protect you from other oversized packets. Fortunately, most systems have corrected this problem, so you are likely to see it only if you are running older systems.[6]

[6] For more information on this attack, see http://www.cert.org/advisories/CA-1996-26.html.

3.3.3.4 Other problems

Of course, there may be other perceived problems with ping. Since it can be used to garner information about a network, it can be seen as a threat to networks that rely on security through obscurity. It may also be seen as generating unwanted or unneeded traffic. For these and previously cited reasons, ICMP traffic is frequently blocked at routers.

Blocking is not the only difficulty that routers may create. Routers may assign extremely low priorities to ICMP traffic rather than simply block such traffic. This is particularly true for routers implementing quality-of-service protocols. The result can be much higher variability in traffic patterns.

Network Address Translation (NAT) can present other difficulties. Cisco's implementation has the router responding to ICMP packets for the first address in the translation pool regardless of whether it is being used. This might not be what you would have expected.
In general, blocking ICMP packets, even just ECHO_REQUEST packets, is not desirable. You lose a valuable source of information about your network and inconvenience users who may have a legitimate need for these messages. This is often done as a stopgap measure in the absence of a more comprehensive approach to security.

Interestingly, even if ICMP packets are being blocked, you can still use ping to see if a host on the local subnet is up. Simply clear the ARP table (typically arp -ad), ping the device, and then examine the ARP table. If the device has been added to the ARP table, it is up and responding.

One final note about ping. It should be obvious, but ping checks only connectivity, not the functionality of the end device. During some network changes, I once used ping to check whether a networked printer had been reconnected yet. When I was finally able to ping the device, I sent a job to the printer. However, my system kept reporting that the job hadn't printed. I eventually got up and walked down the hall to the printer to see what was wrong. It had been reconnected to the network, but someone had left it offline. Be warned, it is very easy to read too much into a successful ping.

3.3.4 Alternatives to ping

Variants of ping fall into two general categories: those that add to ping's functionality and those that are alternatives to ping. An example of the first is fping, and an example of the second is echoping.

3.3.4.1 fping

Written by Roland Schemers of Stanford University, fping extends ping to support multiple hosts in parallel. Typical output is shown in this example:

bsd1# fping 172.16.2.10 172.16.2.11 172.16.2.12 172.16.2.13 172.16.2.14
172.16.2.13 is alive
172.16.2.10 is alive
172.16.2.12 is alive
172.16.2.14 is unreachable
172.16.2.11 is unreachable

Notice that five hosts are being probed at the same time and that the results are reported in the order replies are received.

fping works the same way ping works, by sending and receiving ICMP messages.
It is primarily designed to be used with files. Several command-line options are available, including the -f option for reading a list of devices to probe from a file and the -u option for printing only those systems that are unreachable. For example:

bsd1# fping -u 172.16.2.10 172.16.2.11 172.16.2.12 172.16.2.13 172.16.2.14
172.16.2.14
172.16.2.11

The utility of this form in a script should be self-evident.

3.3.4.2 echoping

Several tools similar to ping don't use ICMP ECHO_REQUEST and ECHO_REPLY packets. These may provide an alternative to ping in some contexts.
One such program is echoping. It is very similar to ping. It works by sending packets to one of several services that may be offered over TCP and UDP: ECHO, DISCARD, CHARGEN, and HTTP. Particularly useful when ICMP messages are being blocked, echoping may work where ping fails.

If none of these services is available, echoping cannot be used. Unfortunately, ECHO and CHARGEN have been used in Fraggle denial-of-service attacks. By sending the output from CHARGEN (a character-generation protocol) to ECHO, the network can be flooded. Consequently, many operating systems are now shipped with these services disabled. Thus, the program may not be as useful as ping. With Unix, these services are controlled by inetd and could be enabled if desired and if you have access to the destination machine. But these services have limited value, and you are probably better off disabling them.

In this example, I have previously enabled ECHO on lnx1:

bsd1# echoping -v lnx1
This is echoping, version 2.2.0.
Trying to connect to internet address 205.153.61.177 to transmit 256 bytes...
Connected...
Sent (256 bytes)...
256 bytes read from server.
Checked
Elapsed time: 0.004488 seconds

This provides basically the same information as ping. The -v option simply provides a few more details. The program defaults to TCP and ECHO. Command-line options allow UDP packets or the other services to be selected.

When ping was first introduced in this chapter, we saw that www.microsoft.com could not be reached by ping. Nor can it be reached using echoping in its default mode. But, as a web server, port 80 should be available. This is in fact the case:

bsd1# echoping -v -h /ms.htm www.microsoft.com:80
This is echoping, version 2.2.0.
Trying to connect to internet address 207.46.130.14 (port 80) to transmit 100 bytes...
Connected...
Sent (100 bytes)...
2830 bytes read from server.
Elapsed time: 0.269319 seconds

Clearly, Microsoft is blocking ICMP packets. In this example, we could just as easily have turned to our web browser.
Sometimes, however, this is not the case.

An obvious question is "Why would you need such a tool?" If you have been denied access to a network, should you be using such probes? On the other hand, if you are responsible for the security of a network, you may want to test your configuration. What can users outside your network discover about your network? If this is the case, you'll need these tools to test your network.

3.3.4.3 arping
Another interesting and useful variant of ping is arping. arping uses ARP requests and replies instead of ICMP packets. Here is an example:

bsd2# arping -v -c3 00:10:7b:66:f7:62
This box: Interface: ep0 IP: 172.16.2.236 MAC address: 00:60:97:06:22:22
ARPING 00:10:7b:66:f7:62
60 bytes from 172.16.2.1 (00:10:7b:66:f7:62): icmp_seq=0
60 bytes from 172.16.2.1 (00:10:7b:66:f7:62): icmp_seq=1
60 bytes from 172.16.2.1 (00:10:7b:66:f7:62): icmp_seq=2
--- 00:10:7b:66:f7:62 statistics ---
3 packets transmitted, 3 packets received, 0% unanswered

In this case, I've used the MAC address, but the IP address could also be used. The -v option is for verbose, while -c3 limits the run to three probes. Verbose doesn't really add a lot to the default output, just the first line identifying the source. If you just want the packets sent, you can use the -q, or quiet, option.

This tool has several uses. First, it is a way to find which IP addresses are being used. It can also be used to work backward, i.e., to discover IP addresses given MAC addresses. For example, if you have captured non-IP traffic (e.g., IPX, etc.) and you want to know the IP address for the traffic's source, you can use arping with the MAC address. If you just want to check connectivity, arping is also a useful tool. Since ARP packets won't be blocked, this should work even when ICMP packets are blocked. You could also use this tool to probe for ARP entries in a router. Of course, due to the nature of ARP, there is not a lot that this tool can tell you about devices not on the local network.

3.3.4.4 Other programs

There are other programs that can be used to check connectivity. Two are described later in this book. nmap is described in Chapter 6, and hping is described in Chapter 9. Both are versatile tools that can be used for many purposes.

A number of ping variants and extended versions of ping are also available, both freely and commercially.
Some extend ping's functionality to the point that the original functionality seems little more than an afterthought. Although only a few examples are described here, don't be fooled into believing that these are all there are. A casual web search should turn up many, many more.

Finally, don't forget the obvious. If you are interested in checking only basic connectivity, you can always try programs like telnet or your web browser. While this is generally not a recommended approach, each problem is different, and you should use whatever works. (For a discussion of the problems with this approach, see Using Applications to Test Connectivity.)

Using Applications to Test Connectivity

One all-too-common way of testing a new installation is to see if networking applications are working. The cable is installed and connected, the TCP/IP stack is configured, and then a web browser is started to see if the connection is working. If you can hit a couple of web sites, then everything is alright and no further testing is needed.

This is understandably an extremely common way to test a connection. It can be particularly gratifying to see a web page loading on a computer you have just connected to your network. But it is also an extremely poor way to test a connection.

One problem is that the software stack you use to test the connection is designed to hide problems from users. If a packet is lost, the stack will transparently have the lost packet resent without any indication to the user. You could have a connection that is losing 90% of its packets. The problem would be immediately obvious when using ping. But with most applications, this would show up only as a slow response. Other problems include locally cached information or the presence of proxy servers on the network.

Unfortunately, web browsers seem to be the program of choice for testing a connection. This, of course, is the worst possible choice. The web's slow response is an accepted fact of life. What technician is going to blame a slow connection on his shoddy wiring when the alternative is to blame the slow connection on the Web? What technician would even consider the possibility that a slow web response is caused by a cable being too close to a fluorescent light?

The only thing testing with an application will really tell you is whether a connection is totally down. If you want to know more than that, you will have to do real testing.

3.4 Microsoft Windows

The various versions of Windows include implementations of ping. With the Microsoft implementation, there are a number of superficial differences in syntax and somewhat less functionality. Basically, however, it works pretty much as you might expect. The default is to send four packets, as shown in the two following examples.
In the first, we successfully ping the host www.cabletron.com:

C:\>ping www.cabletron.com

Pinging www.cabletron.com [204.164.189.90] with 32 bytes of data:

Reply from 204.164.189.90: bytes=32 time=100ms TTL=239
Reply from 204.164.189.90: bytes=32 time=100ms TTL=239
Reply from 204.164.189.90: bytes=32 time=110ms TTL=239
Reply from 204.164.189.90: bytes=32 time=90ms TTL=239

In the next example, we are unable to reach www.microsoft.com for reasons previously explained:

C:\>ping www.microsoft.com

Pinging microsoft.com [207.46.130.149] with 32 bytes of data:

Request timed out.
Request timed out.
Request timed out.
Request timed out.

Note that this is run in a DOS window. If you use ping without an argument, you will get a description of the basic syntax and a listing of the various options:
C:\>ping

Usage: ping [-t] [-a] [-n count] [-l size] [-f] [-i TTL] [-v TOS]
            [-r count] [-s count] [[-j host-list] | [-k host-list]]
            [-w timeout] destination-list

Options:
    -t             Ping the specified host until interrupted.
    -a             Resolve addresses to hostnames.
    -n count       Number of echo requests to send.
    -l size        Send buffer size.
    -f             Set Don't Fragment flag in packet.
    -i TTL         Time To Live.
    -v TOS         Type Of Service.
    -r count       Record route for count hops.
    -s count       Timestamp for count hops.
    -j host-list   Loose source route along host-list.
    -k host-list   Strict source route along host-list.
    -w timeout     Timeout in milliseconds to wait for each reply.

Notice that the flooding options, fortunately, are absent and that the -t option is used to get an output similar to that used in most of our examples. The implementation does not provide a summary at the end, however.

In addition to Microsoft's implementation of ping, numerous other versions—as well as more generic tools or toolkits that include a ping-like utility—are available. Most are free or modestly priced. Examples include tjping, trayping, and winping, but many more are available, including some interesting variations. For example, trayping monitors a connection in the background. It displays a small heart in the system tray as long as the connection is up. As availability changes frequently, if you need another version of ping, search the Web.
Chapter 4. Path Characteristics

In the last chapter, we attempted to answer a fundamental question, "Do we have a working network connection?" We used tools such as ping to verify basic connectivity. But simple connectivity is not enough for many purposes. For example, an ISP can provide connectivity but not meet your needs or expectations. If your ISP is not providing the level of service you think it should, you will need something to base your complaints on. Or, if the performance of your local network isn't adequate, you will want to determine where the bottlenecks are located before you start implementing expensive upgrades. In this chapter, we will try to answer the question, "Is our connection performing reasonably?"

We will begin by looking at ways to determine which links or individual connections compose a path. This discussion focuses on the tool traceroute. Next, we will turn to several tools that allow us to identify those links along a path that might cause problems. Once we have identified individual links of interest, we will examine some simple ways to further characterize the performance of those links, including estimating the bandwidth of a connection and measuring the available throughput.

4.1 Path Discovery with traceroute

This section describes traceroute, a tool used to discover the links along a path. While this is the first step in investigating a path's behavior and performance, it is useful for other tasks as well. In the previous discussion of ping, it was suggested that you work your way, hop by hop, toward a device you can't reach to discover the point of failure. This assumes that you know the path.

Path discovery is also an essential step in diagnosing routing problems.
While you may fully understand the structure of your network and know what path you want your packets to take through your network, knowing the path your packets actually take is essential information and may come as a surprise.

Once packets leave your network, you have almost no control over the path they actually take to their destination. You may know very little about the structure of adjacent networks. Path discovery can provide a way to discover who their ISP is, how your ISP is connected to the world, and other information such as peering arrangements. traceroute is the tool of choice for collecting this kind of information.

The traceroute program was written by Van Jacobson and others. It is based on a clever use of the Time-To-Live (TTL) field in the IP packet's header. The TTL field, described briefly in the last chapter, is used to limit the life of a packet. When a router fails or is misconfigured, a routing loop or circular path may result. The TTL field prevents packets from remaining on a network indefinitely should such a routing loop occur. A packet's TTL field is decremented each time the packet crosses a router on its way through a network. When its value reaches 0, the packet is discarded rather than forwarded. When discarded, an ICMP TIME_EXCEEDED message is sent back to the packet's source to inform the source that the packet was discarded. By manipulating the TTL field of the original packet, the program traceroute uses information from these ICMP messages to discover paths through a network.

traceroute sends a series of UDP packets with the destination address of the device you want a path to.[1] By default, traceroute sends sets of three packets to discover each hop. traceroute sets the TTL field in the first three packets to a value of 1 so that they are discarded by the first router on the path. When the ICMP TIME_EXCEEDED messages are returned by that router, traceroute records the source IP address of these ICMP messages. This is the IP address of the first hop on the route to the destination.

[1] tracert, a Windows variant of traceroute, uses ICMP rather than UDP. tracert is discussed later in this chapter.

Next, three packets are sent with their TTL field set to 2. These will be discarded by the second router on the path. The ICMP messages returned by this router reveal the IP address of the second router on the path. The program proceeds in this manner until a set of packets finally has a TTL value large enough so that the packets reach their destination.

Typically, when the probe packets finally have an adequate TTL and reach their destination, they will be discarded and an ICMP PORT_UNREACHABLE message will be returned. This happens because traceroute sends all its probe packets with what should be invalid port numbers, i.e., port numbers that aren't usually used. To do this, traceroute starts with a very large port number, typically 33434, and increments this value with each subsequent packet. Thus, each of the three packets in a set will have three different unlikely port numbers. The receipt of ICMP PORT_UNREACHABLE messages is the signal that the end of the path has been reached.
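The probe schedule just described — three probes per TTL, with the destination port incremented for every packet — can be sketched as a small generator. This only illustrates the bookkeeping; an actual implementation also needs raw sockets to send the UDP probes and to read the ICMP replies:

```python
def probe_schedule(max_ttl=30, nqueries=3, base_port=33434):
    """Yield (ttl, port) for each probe a traceroute-style tool sends.

    The TTL starts at 1 so the first router discards the packet and
    reports itself; the port increments with every packet so each of
    the three probes in a set carries a distinct, unlikely port.
    """
    port = base_port
    for ttl in range(1, max_ttl + 1):
        for _ in range(nqueries):
            yield ttl, port
            port += 1

# Three probes at TTL 1, then TTL 2, then TTL 3, ports 33434 through 33442:
probes = list(probe_schedule(max_ttl=3))
```

The defaults mirror the values mentioned above: 30 hops maximum, three queries per hop, and a starting port of 33434.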
Here is a simple example of using traceroute:

bsd1# traceroute 205.160.97.122
traceroute to 205.160.97.122 (205.160.97.122), 30 hops max, 40 byte packets
 1  205.153.61.1 (205.153.61.1)  1.162 ms  1.068 ms  1.025 ms
 2  cisco (205.153.60.2)  4.249 ms  4.275 ms  4.256 ms
 3  165.166.36.17 (165.166.36.17)  4.433 ms  4.521 ms  4.450 ms
 4  e0.r01.ia-gnwd.Infoave.Net (165.166.36.33)  5.178 ms  5.173 ms  5.140 ms
 5  165.166.125.165 (165.166.125.165)  13.171 ms  13.277 ms  13.352 ms
 6  165.166.125.106 (165.166.125.106)  18.395 ms  18.238 ms  18.210 ms
 7  atm12-0-10-mp.r01.ia-clma.infoave.net (165.166.126.3)  18.816 ms  18.934 ms  18.893 ms
 8  Serial5-1-1.GW1.RDU1.ALTER.NET (157.130.35.69)  26.658 ms  26.484 ms  26.855 ms
 9  Fddi12-0-0.GW2.RDU1.ALTER.NET (137.39.40.231)  26.692 ms  26.697 ms  26.490 ms
10  smatnet-gw2.customer.ALTER.NET (157.130.36.94)  27.736 ms  28.101 ms  27.738 ms
11  rcmt1-S10-1-1.sprintsvc.net (205.244.203.50)  33.539 ms  33.219 ms  32.446 ms
12  rcmt3-FE0-0.sprintsvc.net (205.244.112.22)  32.641 ms  32.724 ms  32.898 ms
13  gwd1-S3-7.sprintsvc.net (205.244.203.13)  46.026 ms  50.724 ms  45.960 ms
14  gateway.ais-gwd.com (205.160.96.102)  47.828 ms  50.912 ms  47.823 ms
15  pm3-02.ais-gwd.com (205.160.97.41)  63.786 ms  48.432 ms  48.113 ms
16  user58.ais-gwd.com (205.160.97.122)  200.910 ms  184.587 ms  202.771 ms

The results should be fairly self-explanatory. This particular path was 16 hops long. Reverse name lookup is attempted for the IP address of each device, and, if successful, these names are reported in addition to IP addresses. Times are reported for each of the three probes sent. They are interpreted in the same way as times with ping. (However, if you just want times for one hop, ping is generally a better choice.)

Although no packets were lost in this example, should a packet be lost, an asterisk is printed in the place of the missing time. In some cases, all three times may be replaced with asterisks. This can happen for several reasons. First, the router at this hop may not return ICMP TIME_EXCEEDED messages.
Second, some older routers may incorrectly forward packets even though the TTL is 0. A third possibility is that ICMP messages may be given low priority and may not be returned in a timely manner. Finally, beyond some point of the path, ICMP packets may be blocked.

Other routing problems may exist as well. In some instances traceroute will append additional messages to the end of lines in the form of an exclamation point and a letter. !H, !N, and !P indicate, respectively, that the host, network, or protocol is unreachable. !F indicates that fragmentation is needed. !S indicates a source route failure.

4.1.1 Options

Two options control how much information is printed. Name resolution can be disabled with the -n option. This can be useful if name resolution fails for some reason or if you just don't want to wait on it. The -v option is the verbose flag. With this flag set, the source and packet sizes of the probes will be reported for each packet. If other ICMP messages are received, they will also be reported, so this can be an important option when troubleshooting.

Several options may be used to alter the behavior of traceroute, but most are rarely needed. An example is the -m option. The TTL field is an 8-bit number allowing a maximum of 255 hops. Most implementations of traceroute default to trying only 30 hops before halting. The -m option can be used to change the maximum number of hops tested to any value up to 255.

As noted earlier, traceroute usually receives a PORT_UNREACHABLE message when it reaches its final destination because it uses a series of unusually large port numbers as the destination ports. Should the number actually match a port that has a running service, the PORT_UNREACHABLE message will not be returned. This is rarely a problem since three packets are sent with different port numbers, but, if it is, the -p option lets you specify a different starting port so these ports can be avoided.

Normally, traceroute sends three probe packets for each TTL value with a timeout of three seconds for replies.
The default number of packets per set can be changed with the -q option. The default timeout can be changed with the -w option.

Additional options support how packets are routed. See the manpage for details on these if needed.

4.1.2 Complications with traceroute

The information traceroute supplies has its limitations. In some situations, the results returned by traceroute have a very short shelf life. This is particularly true for long paths crossing several networks and ISPs.

You should also recall that a router, by definition, is a computer with multiple network interfaces, each with a different IP address. This raises an obvious question: which IP address should be returned for a router? For traceroute, the answer is dictated by the mechanism it uses to discover the route. It can report only the address of the interface receiving the packet. This means a quite different path will be reported if traceroute is run in the reverse direction.

Here is the output when the previous example is run again from what was originally the destination to what was originally the source, i.e., with the source and destination exchanged:

C:\>tracert 205.153.61.178
Tracing route to 205.153.61.178 over a maximum of 30 hops

  1   132 ms   129 ms   129 ms  pm3-02.ais-gwd.com [205.160.97.41]
  2   137 ms   130 ms   129 ms  sprint-cisco-01.ais-gwd.com [205.160.97.1]
  3   136 ms   129 ms   139 ms  205.160.96.101
  4   145 ms   150 ms   140 ms  rcmt3-S4-5.sprintsvc.net [205.244.203.53]
  5   155 ms   149 ms   149 ms  sl-gw2-rly-5-0-0.sprintlink.net [144.232.184.85]
  6   165 ms   149 ms   149 ms  sl-bb11-rly-2-1.sprintlink.net [144.232.0.77]
  7   465 ms   449 ms   399 ms  sl-gw11-dc-8-0-0.sprintlink.net [144.232.7.198]
  8   155 ms   159 ms   159 ms  sl-infonet-2-0-0-T3.sprintlink.net [144.228.220.6]
  9   164 ms   159 ms   159 ms  atm4-0-10-mp.r01.ia-gnvl.infoave.net [165.166.126.4]
 10   164 ms   169 ms   169 ms  atm4-0-30.r1.scgnvl.infoave.net [165.166.125.105]
 11   175 ms   179 ms   179 ms  165.166.125.166
 12   184 ms   189 ms   195 ms  e0.r02.ia-gnwd.Infoave.Net [165.166.36.34]
 13   190 ms   179 ms   180 ms  165.166.36.18
 14   185 ms   179 ms   179 ms  205.153.60.1
 15   174 ms   179 ms   179 ms  205.153.61.178

Trace complete.

There are several obvious differences. First, the format is slightly different because this example was run using Microsoft's implementation of traceroute, tracert. This, however, should present no difficulty.

A closer examination shows that there are more fundamental differences. The second trace is not simply the first trace in reverse order. The IP addresses are not the same, and the number of hops is different.

There are two things going on here. First, as previously mentioned, traceroute reports the IP number of the interface where the packet arrives. The reverse path will use different interfaces on each router, so different IP addresses will be reported. While this can be a bit confusing at first glance, it can be useful. By running traceroute at each end of a connection, a much more complete picture of the connection can be created.

Figure 4-1 shows the first six hops on the path starting from the source for the first trace as reconstructed from the pair of traces. We know the packet originates at 205.153.61.178.
The first trace shows us the first hop is 205.153.61.1. It leaves this router on interface 205.153.60.1 for 205.153.60.2. The second of these addresses is just the next hop in the first trace. The first address comes from the second trace. It is the last hop before the destination. It is also reasonable in that we have two addresses that are part of the same class C network. With IP networks, the ends of a link are part of the link and must have IP numbers consistent with a single network.

Figure 4-1. First six hops on path
From the first trace, we know packets go from 205.153.60.2 to 165.166.36.17. From the reverse trace, we are able to deduce that the other end of the 165.166.36.17 link is 165.166.36.18. Or, equivalently, the outbound interface for the 205.153.60.2 router has the address 165.166.36.18.

In the same manner, the next router's inbound interface is 165.166.36.17, and its outbound interface is 165.166.36.34. This can be a little confusing since it appears that these last three addresses should be on the same network. On closer examination of this link and adjacent links, it appears that this class B address is using a subnet mask of /20. With this assumption, the addresses are consistent.

We can proceed in much the same manner to discover the next few links. However, when we get to the seventh entry in the first trace (or to the eighth entry working backward in the second trace), the process breaks down. The reason is simple—we have asymmetric paths across the Internet. This also accounts for the difference in the number of hops between the two traces.

In much the same way we mapped the near end of the path, the remote end can be reconstructed as well. The paths become asymmetric at the seventh router when working in this direction. Figure 4-2 shows the first four hops. We could probably fill in the remaining addresses for each direction by running traceroute to the specific machine where the route breaks down, but this probably isn't worth the effort.

Figure 4-2. First four hops on reverse path
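Incidentally, the subnet reasoning used in these reconstructions — the two ends of a link must have addresses in the same subnet — is easy to check mechanically. This sketch uses Python's ipaddress module on addresses taken from the traces above; the /30 and /20 prefix lengths are illustrative assumptions:

```python
import ipaddress

def same_subnet(addr1, addr2, prefixlen):
    """True if both addresses fall within the same network of the given size."""
    net1 = ipaddress.ip_interface(f"{addr1}/{prefixlen}").network
    net2 = ipaddress.ip_interface(f"{addr2}/{prefixlen}").network
    return net1 == net2

# The two ends of a point-to-point link must share a subnet:
same_subnet("165.166.36.17", "165.166.36.18", 30)   # consistent as a /30
# .17 and .34 fall in different /30s...
same_subnet("165.166.36.17", "165.166.36.34", 30)   # not the same /30
# ...but under the /20 mask deduced above, all these addresses agree:
same_subnet("165.166.36.17", "165.166.36.34", 20)
```

Checks like this are a quick way to confirm (or refute) a guessed subnet mask when piecing a path together from forward and reverse traces.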
One possible surprise in Figure 4-2 is that we have the same IP number, 205.160.97.41, on each interface at the first hop. The explanation is that dial-in access is being used. The IP number 205.160.97.122 is assigned to the host when the connection is made. 205.160.97.41 must be the access router. This numbering scheme is normal for an access router.

Although we haven't constructed a complete picture of the path(s) between these two computers, we have laid out the basic connection to our network through our ISP. This is worth working out well in advance of any problems. When you suspect problems, you can easily ping these intermediate routers to pinpoint the exact location of a problem. This will tell you whether it is your problem or your ISP's problem. This can also be nice information to have when you call your ISP.

To construct the bidirectional path using the technique just described, you need access to a second, remote computer on the Internet from which you can run traceroute. Fortunately, this is not a problem. There are a number of sites on the Internet, which, as a service to the network community, will run traceroute for you. Often called looking glasses, such sites can provide a number of other services as well. For example, you may be able to test how accessible your local DNS setup is by observing how well traceroute works. A list of such sites can be found at http://www.traceroute.org. Alternately, the search string "web traceroute" or "traceroute looking glass" will usually turn up a number of such sites with most search engines.

In theory, there is an alternative way to find this type of information with some implementations of traceroute. Some versions of traceroute support loose source routing, the ability to specify one or more intermediate hops that the packets must go through. This allows a packet to be diverted through a specific router on its way to its destination. (Strict source routing may also be available.
This allows the user to specify an exact path through a network. While loose source routing can take any path that includes the specified hops, strict source routing must exactly follow the given path.)

To construct a detailed list of all devices on a path, the approach is to use traceroute to find a path from the source host to itself, specifying a route through a remote device. Packets leave the host with the remote device as their initial destination. When the packets arrive at the remote device, that device replaces the destination address with the source's address, and the packets are redirected back to the source. Thus, you get a picture of the path both coming and going. (Of course, source routing is not limited to just this combination of addresses.)

At least, that is how it should work in theory. In practice, many devices no longer support source routing. Unfortunately, source routing has been used in IP spoofing attacks. Packets sent with a spoofed source address can be diverted so they pass through the spoofed device's network. This approach will sometimes slip packets past firewalls since the packet seems to be coming from the right place.

This is shown in Figure 4-3. Without source routing, the packet would come into the firewall on the wrong interface and be discarded. With source routing, the packet arrives on the correct interface and passes through the firewall. Because of problems like this, source routing is frequently disabled.

Figure 4-3. IP source spoofing

One final word of warning regarding traceroute—buggy or nonstandard implementations exist. Nonstandard isn't necessarily bad; it just means you need to watch for differences. For example, see the discussion of tracert later in this chapter. Buggy implementations, however, can really mislead you.

4.2 Path Performance

Once you have a picture of the path your traffic is taking, the next step in testing is to get some basic performance numbers. Evaluating path performance will mean doing three types of measurements. Bandwidth measurements will give you an idea of the hardware capabilities of your network, such as the maximum capacity of your network. Throughput measurements will help you discover what capacity your network provides in practice, i.e., how much of the maximum is actually available. Traffic measurements will give you an idea of how the capacity is being used.

My goal in this section is not a definitive analysis of performance. Rather, I describe ways to collect some general numbers that can be used to see if you have a reasonable level of performance or if you need to delve deeper. If you want to go beyond the quick-and-dirty approaches described here, you might consider some of the more advanced tools described in Chapter 9. The tools mentioned here should help you focus your efforts.

4.2.1 Performance Measurements

Several terms are used, sometimes inconsistently, to describe the capacity or performance of a link. Without getting too formal, let's review some of these terms to avoid potential confusion.

Two factors determine how long it takes to send a packet or frame across a single link. The amount of time it takes to put the signal onto the cable is known as the transmission time or transmission delay. This will depend on the transmission rate (or interface speed) and the size of the frame. The amount of time it takes for the signal to travel across the cable is known as the propagation time or propagation delay. Propagation time is determined by the type of media used and the distance involved. It often comes as a surprise that a signal transmitted at 100 Mbps will have the same propagation delay as a signal transmitted at 10 Mbps. The first signal is being transmitted 10 times as fast, but, once it is on a cable, it doesn't propagate any faster. That is, the difference between 10 Mbps and 100 Mbps is not the speed the bits travel, but the length of the bits.

Once we move to multihop paths, a third consideration enters the picture—the delay introduced from processing packets at intermediate devices such as routers and switches. This is usually called the queuing delay since, for the most part, it arises from the time packets spend in queues within the device.
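The two fixed components are straightforward to compute. The numbers below are illustrative — a 1500-byte frame on a 100-meter cable, assuming a propagation speed of roughly 2 x 10^8 m/s, typical of copper and fiber. Note that changing the rate from 10 Mbps to 100 Mbps shrinks only the transmission term:

```python
def transmission_delay(frame_bits, rate_bps):
    """Time to place the frame on the wire: frame size / transmission rate."""
    return frame_bits / rate_bps

def propagation_delay(distance_m, speed_mps=2e8):
    """Time for the signal to cross the cable; independent of the rate."""
    return distance_m / speed_mps

frame = 1500 * 8                 # a 1500-byte frame, in bits
cable = 100.0                    # a 100 m cable run
for rate in (10e6, 100e6):       # 10 Mbps versus 100 Mbps
    tx = transmission_delay(frame, rate)
    prop = propagation_delay(cable)
    print(f"{rate/1e6:g} Mbps: transmit {tx*1e6:.1f} us, propagate {prop*1e6:.1f} us")
```

At 10 Mbps the frame takes 1200 microseconds to transmit; at 100 Mbps, 120 microseconds. The 0.5-microsecond propagation term is identical in both cases — the bits get shorter, not faster.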
The total delay in delivering a packet is the sum of these three delays. Transmission and propagation delays are usually quite predictable and stable. Queuing delays, however, can introduce considerable variability.

The term bandwidth is typically used to describe the capacity of a link. For our purposes, this is the transmission rate for the link.[2] If we can transmit onto a link at 10 Mbps, then we say we have a bandwidth of 10 Mbps.

[2] My apologies to any purist offended by my somewhat relaxed, pragmatic definition of bandwidth.

Throughput is a measure of the amount of data that can be sent over a link in a given amount of time. Throughput estimates, typically obtained through measurements based on the bulk transfer of data, are usually expressed in bits per second or packets per second. Throughput is frequently used as an estimate of the bandwidth of a network, but bandwidth and throughput are really two different things. Throughput measurement may be affected by considerable overhead that is not included in bandwidth measurements. Consequently, throughput is a more realistic estimator of the actual performance you will see.

Throughput is generally an end-to-end measurement. When dealing with multihop paths, however, the bandwidths may vary from link to link. The bottleneck bandwidth is the bandwidth of the slowest link on a path, i.e., the link with the lowest bandwidth. (While introduced here, bottleneck analysis is discussed in greater detail in Chapter 12.)
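By this definition, the bottleneck bandwidth is simply the minimum over the links of a path, and it bounds the best case for any bulk transfer. A trivial sketch, with hypothetical link speeds:

```python
# Hypothetical per-link bandwidths along a path, in bits per second
link_bandwidths = {
    "host -> router1":    10e6,   # Ethernet
    "router1 -> router2": 1.5e6,  # T1 line: the slowest link
    "router2 -> server":  45e6,   # T3 line
}

bottleneck = min(link_bandwidths.values())
slowest = min(link_bandwidths, key=link_bandwidths.get)

# Best case for moving a 10 MB file over this path is set by the
# bottleneck, ignoring all protocol overhead and queuing:
best_case_seconds = 10e6 * 8 / bottleneck
```

Actual throughput will always fall short of this figure, which is exactly the distinction between bandwidth and throughput drawn above.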
Additional metrics will sometimes be needed. The best choice is usually task dependent. If you are sending real-time audio packets over a long link, you may want to minimize both delay and variability in the delay. If you are using FTP to do bulk transfers, you may be more concerned with the throughput. If you are evaluating the quality of your link to the Internet, you may want to look at bottleneck bandwidth for the path. The development of reliable metrics is an active area of research.

4.2.2 Bandwidth Measurements

We will begin by looking at ways to estimate bandwidth. Bandwidth really measures the capabilities of our hardware. If bandwidth is not adequate, you will need to reexamine your equipment.

4.2.2.1 ping revisited

The preceding discussion should make clear that the times returned by ping, although frequently described as propagation delays, really are the sum of the transmission, propagation, and queuing delays. In the last chapter, we used ping to calculate a rough estimate of the bandwidth of a connection and noted that this treatment is limited since it gives a composite number.

We can refine this process and use it to estimate the bandwidth for a link along a path. The basic idea is to first calculate the path behavior up to the device on the closest end of the link and then calculate the path behavior to the device at the far end of the link. The difference is then used to estimate the bandwidth for the link in question. Figure 4-4 shows the basic arrangement.

Figure 4-4. Link traffic measurements

This process requires using ping four times. First, ping the near end of a link with two different packet sizes. The difference in the times will eliminate the propagation and queuing delays along the path (assuming they haven't changed too much), leaving the time required to transmit the additional data in the larger packet. Next, use the same two packet sizes to ping the far end of the link. The difference in the times will again eliminate the overhead.
Finally, the difference in these two differences will be the amount of time to send the additional data over the last link in the path. This is the round-trip time. Divide this number by two and you have the time required to send the additional data in one direction over the link. The bandwidth is simply the amount of additional data sent divided by this last calculated time.[3]

[3] The formula for bandwidth is BW = 16 x (Pl-Ps)/(t2l-t2s-t1l+t1s). The larger and smaller packet sizes are Pl and Ps bytes, t1l and t1s are the ping times for the larger and smaller packets to the nearer interface in seconds, and t2l and t2s are the ping times for the larger and smaller packets to the distant interface in seconds. The result is in bits per second.

Table 4-1 shows the raw data for the second and third hops along the path shown in Figure 4-1. Packet sizes are 100 and 1100 bytes.
Table 4-1. Raw data

IP address       Time for 100 bytes   Time for 1100 bytes
205.153.61.1     1.380 ms             5.805 ms
205.153.60.2     4.985 ms             12.823 ms
165.166.36.17    8.621 ms             26.713 ms

Table 4-2 shows the calculated results. The time difference was divided by two (RTT correction), then divided into 8000 bits (the size of the additional data in bits), and then multiplied by 1000 (milliseconds-to-seconds correction). The results, in bps, were then converted to Mbps. If several sets of packets are sent, the minimums of the times can be used to improve the estimate.

Table 4-2. Calculated bandwidth

Near link       Far link        Time difference   Estimated bandwidth
205.153.61.1    205.153.60.2    3.413 ms          4.69 Mbps
205.153.60.2    165.166.36.17   10.254 ms         1.56 Mbps

Clearly, doing this manually is confusing, tedious, and prone to errors. Fortunately, several tools based on this approach greatly simplify the process. These tools also improve accuracy by using multiple packets.

4.2.2.2 pathchar

One tool that automates this process is pathchar. This tool, written by Van Jacobson several years ago, seems to be in a state of limbo. It has, for several years, been available as an alpha release, but nothing seems to have been released since. Several sets of notes or draft notes are available on the Web, but there appears to be no manpage for the program. Nonetheless, the program remains available and has been ported to several platforms. Fortunately, a couple of alternative implementations of the program have recently become available. These include bing, pchar, clink, and tmetric.

One strength of pathchar and its variants is that they can discover the bandwidth of each link along a path using software at only one end of the path. The method used is basically that described earlier for ping, but pathchar uses a large number of packets of various sizes.
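Before looking at pathchar's output, note that the manual calculation above is easy to script. This sketch implements the formula from the earlier footnote and reproduces the Table 4-2 estimates from the Table 4-1 measurements:

```python
def link_bandwidth(pl, ps, t1l, t1s, t2l, t2s):
    """Estimate a link's bandwidth in bps from four ping times.

    pl, ps: larger and smaller packet sizes, in bytes.
    t1l, t1s: times for the larger/smaller packets to the near end, in ms.
    t2l, t2s: times for the larger/smaller packets to the far end, in ms.
    Implements BW = 16 * (Pl - Ps) / (t2l - t2s - t1l + t1s).
    """
    delta_s = (t2l - t2s - t1l + t1s) / 1000.0  # extra round-trip time, seconds
    return 16 * (pl - ps) / delta_s

# Table 4-1: (time for 100 bytes, time for 1100 bytes), in ms
hop1 = (1.380, 5.805)     # 205.153.61.1
hop2 = (4.985, 12.823)    # 205.153.60.2
hop3 = (8.621, 26.713)    # 165.166.36.17

for near, far in ((hop1, hop2), (hop2, hop3)):
    bw = link_bandwidth(1100, 100, near[1], near[0], far[1], far[0])
    print(f"{bw / 1e6:.2f} Mbps")   # prints 4.69, then 1.56
```

The factor of 16 is just 8 bits per byte times 2 for halving the round trip, so this agrees with the step-by-step arithmetic described under Table 4-2.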
Here is an example of running pathchar:

bsd1# pathchar 165.166.0.2
pathchar to 165.166.0.2 (165.166.0.2)
 mtu limited to 1500 bytes at local host
 doing 32 probes at each of 45 sizes (64 to 1500 by 32)
 0 205.153.60.247 (205.153.60.247)
 |   4.3 Mb/s,   1.55 ms (5.88 ms)
 1 cisco (205.153.60.2)
 |   1.5 Mb/s,   -144 us (13.5 ms)
 2 165.166.36.17 (165.166.36.17)
 |   10 Mb/s,   242 us (15.2 ms)
 3 e0.r01.ia-gnwd.Infoave.Net (165.166.36.33)
 |   1.2 Mb/s,   3.86 ms (32.7 ms)
 4 165.166.125.165 (165.166.125.165)
 |   ?? b/s,   2.56 ms (37.7 ms)
 5 165.166.125.106 (165.166.125.106)
 |   45 Mb/s,   1.85 ms (41.6 ms), +q 3.20 ms (18.1 KB) *4
 6 atm1-0-5.r01.ncchrl.infoave.net (165.166.126.1)
 |   17 Mb/s,   0.94 ms (44.3 ms), +q 5.83 ms (12.1 KB) *2
 7 h10-1-0.r01.ia-chrl.infoave.net (165.166.125.33)
 |   ?? b/s,   89 us (44.3 ms), 1% dropped
 8 dns1.InfoAve.Net (165.166.0.2)
8 hops, rtt 21.9 ms (44.3 ms), bottleneck 1.2 Mb/s, pipe 10372 bytes

As pathchar runs, it first displays a message describing how the probing will be done. From the third line of output, we see that pathchar is using 45 different packet sizes ranging from 64 to 1500 bytes. (1500 is the local host's MTU.) It uses 32 different sets of these packets for each hop. Thus, this eight-hop run generated 11,520 test packets plus an equal number of replies.

The bandwidth and delay for each link is given. pathchar may also include information on the queuing delay (links 5 and 6 in this example). As you can see, pathchar is not always successful in estimating the bandwidth (see the links numbered 4 and 7) or the delay (see the link numbered 1). With this information, we could go back to Figure 4-1 and fill in link speeds for most links.

As pathchar runs, it shows a countdown as it sends out each packet. It will display a line that looks something like this:

1: 31  288  0  3

The 1: refers to the hop count and will be incremented for each successive hop on the path. The next number counts down, giving the number of sets of probes remaining to be run for this link. The third number is the size of the current packet being sent. Both the second and third numbers should be changing rapidly. The last two numbers give the number of packets that have been dropped so far on this link and the average round-trip time for this link.

When the probes for a hop are complete, this line is replaced with a line giving the bandwidth, incremental propagation delay, and round-trip time. pathchar uses the minimum of the observed delays to improve its estimate of bandwidth.

Several options are available with pathchar. Of greatest interest are those that control the number and size of the probe packets used. The option -q allows the user to specify the number of sets of packets to send.
The options -m and -M control the minimum and maximum packet sizes, respectively. The option -Q controls the step size from the smallest to largest packet sizes. As a general rule of thumb, more packets are required for greater accuracy, particularly on busy links. The option -n turns off DNS resolution, and the option -v provides for more output.

pathchar is not without problems. One problem for pathchar is hidden or unknown transmission points. The first link reports a bandwidth of 4.3 Mbps. From traceroute, we only know of the host and the router at the end of the link. This is actually a path across a switched LAN with three segments and two additional transmission points at the switches. The packet is transmitted onto a 10-Mbps network, then onto a 100-Mbps backbone, and then back onto a 10-Mbps network before reaching the first router. Consequently, there are three sets of transmission delays rather than just one, and a smaller than expected bandwidth is reported.

You will see this problem with store-and-forward switches, but it is not appreciable with cut-through switches. (See "Types of Switches" if you are unfamiliar with the difference between cut-through and store-and-forward switches.) In a test in which another switch, configured for cut-through, was added to this network, almost no change was seen in the estimated bandwidth with pathchar. When the switch was reconfigured as a store-and-forward switch, the reported bandwidth on the first link dropped to 3.0 Mbps.
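The effect of those hidden transmission points can be worked out with a little arithmetic. The sketch below (an illustration, not part of pathchar) adds up the serialization delay a packet sees on each store-and-forward segment; the apparent bandwidth is then the packet size divided by the total delay.

```python
# Apparent bandwidth across a path of store-and-forward segments.
# Each store-and-forward device must receive the entire packet before
# retransmitting it, so the serialization delays add up.

def apparent_bandwidth(segment_rates_bps, packet_bits=1500 * 8):
    total_delay = sum(packet_bits / rate for rate in segment_rates_bps)
    return packet_bits / total_delay

# The first link in the example: 10 Mbps, then a 100-Mbps backbone,
# then 10 Mbps again, crossing two store-and-forward switches.
path = [10e6, 100e6, 10e6]
print(round(apparent_bandwidth(path) / 1e6, 2))   # ~4.76 Mbps
```

This is in the same neighborhood as the 4.3 Mbps pathchar reported for the first link. With a cut-through switch, the middle segment's delay largely disappears, which is why the cut-through test above showed almost no change.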
Types of Switches

Devices may minimize queuing delays by forwarding frames as soon as possible. In some cases, a device may begin retransmitting a frame before it has finished receiving that frame. With Ethernet frames, for example, the destination address is the first field in the header. Once this has been read, the outbound interface is known and transmission can begin even though much of the original frame is still being received. Devices that use this scheme are called cut-through devices.

The alternative is to wait until the entire frame has arrived before retransmitting it. Switches that use this approach are known as store-and-forward devices.

Cut-through devices have faster throughput than store-and-forward switches because they begin retransmitting sooner. Unfortunately, cut-through devices may forward damaged frames, frames that a store-and-forward switch would have discarded. The problem is that the damage may not be discovered by the cut-through device until after retransmission has already begun. Store-and-forward devices introduce longer delays but are less likely to transmit damaged frames since they can examine the entire frame before retransmitting it. Store-and-forward technology is also required if interfaces operate at different speeds. Often devices can be configured to operate in either mode.

This creates a problem if you are evaluating an ISP. For example, it might appear that the fourth link is too slow if the contract specifies T1 service. This might be the case, but it could just be a case of a hidden transmission point. Without more information, this isn't clear.

Finally, you should be extremely circumspect about running pathchar. It can generate a huge amount of traffic. The preceding run took about 40 minutes to complete. It was run from a host on a university campus while the campus was closed for Christmas break and largely deserted.
If you are crossing a slow link and have a high path MTU, the amount of traffic can effectively swamp the link. Asymmetric routes, routes in which the path to a device is different from the path back, changing routes, links using tunneling, or links with additional padding added can all cause problems.

4.2.2.3 bing

One alternative to pathchar is bing, a program written by Pierre Beyssac. Where pathchar gives the bandwidth for every link along a path, bing is designed to measure point-to-point bandwidth. Typically, you would run traceroute first if you don't already know the links along a path. Then you would run bing, specifying the near and far ends of the link of interest on the command line. This example measures the bandwidth of the third hop in Figure 4-1:

bsd1# bing -e10 -c1 205.153.60.2 165.166.36.17
BING 205.153.60.2 (205.153.60.2) and 165.166.36.17 (165.166.36.17)
  44 and 108 data bytes
1024 bits in 0.835ms: 1226347bps, 0.000815ms per bit
1024 bits in 0.671ms: 1526080bps, 0.000655ms per bit
1024 bits in 0.664ms: 1542169bps, 0.000648ms per bit
1024 bits in 0.658ms: 1556231bps, 0.000643ms per bit
1024 bits in 0.627ms: 1633174bps, 0.000612ms per bit
1024 bits in 0.682ms: 1501466bps, 0.000666ms per bit
1024 bits in 0.685ms: 1494891bps, 0.000669ms per bit
1024 bits in 0.605ms: 1692562bps, 0.000591ms per bit
1024 bits in 0.618ms: 1656958bps, 0.000604ms per bit
--- 205.153.60.2 statistics ---
bytes   out    in   dup  loss   rtt (ms): min    avg    max
   44    10    10    0%         3.385  3.421  3.551
  108    10    10    0%         3.638  3.684  3.762

--- 165.166.36.17 statistics ---
bytes   out    in   dup  loss   rtt (ms): min    avg    max
   44    10    10    0%         3.926  3.986  4.050
  108    10    10    0%         4.797  4.918  4.986

--- estimated link characteristics ---
estimated throughput 1656958bps
minimum delay per packet 0.116ms (192 bits)

average statistics (experimental):
packet loss: small 0%, big 0%, total 0%
average throughput 1528358bps
average delay per packet 0.140ms (232 bits)
weighted average throughput 1528358bps

resetting after 10 samples.

The output begins with the addresses and packet sizes, followed by lines for each pair of probes. Next, bing returns round-trip times and packet loss data. Finally, it returns several estimates of throughput.[4]

[4] The observant reader will notice that bing reported throughput, not bandwidth. Unfortunately, there is a lot of ambiguity and inconsistency surrounding these terms.

In this particular example, we have specified the options -e10 and -c1, which limit the probe to one cycle using 10 pairs of packets. Alternatively, you can omit these options and watch the output. When the process seems to have stabilized, enter a Ctrl-C to terminate the program. The summary results will then be printed. Interpretation of these results should be self-explanatory.

bing allows for a number of fairly standard options. These options allow controlling the number of packet sizes, suppressing name resolution, controlling routing, and obtaining verbose output. See the manpage if you have need of these options.

Because bing uses the same mechanism as pathchar, it will suffer the same problems with hidden transmission points. Thus, you should be circumspect when using it if you don't fully understand the topology of the network.
While bing does not generate nearly as much traffic as pathchar, it can still place strains on a network.

4.2.2.4 Packet pair software

One alternative approach that is useful for measuring bottleneck bandwidth is the packet pair or packet stretch approach. With this approach, two packets that are the same size are transmitted back-to-back. As they cross the network, whenever they come to a slower link, the second packet will have to wait while the first is being transmitted. This increases the time between the transmission of the packets at this point on the network. If the packets go onto another faster link, the separation is preserved. If the packets subsequently go onto a slower link, then the separation will increase. When the packets arrive at their destination, the bandwidth of the slowest link can be calculated from the amount of separation and the size of the packets.

It would appear that getting this method to work requires software at both ends of the link. In fact, some implementations of packet pair software work this way. However, using software at both ends is
not absolutely necessary, since the acknowledgment packets provided with some protocols should preserve the separation.

One assumption of this algorithm is that packets will stay together as they move through the network. If other packets are queued between the two packets, the separation will increase. To avoid this problem, a number of packet pairs are sent through the network with the assumption that at least one pair will stay together. This will be the pair with the minimum separation.

Several implementations of this algorithm exist. bprobe and cprobe are two examples. At the time this was written, these were available only for the IRIX operating system on SGI computers. Since the source code is available, this may have changed by the time you read this.

Compared to the pathchar approach, the packet pair approach will find only the bottleneck bandwidth rather than the bandwidth of an arbitrary link. However, it does not suffer from the hidden hop problem. Nor does it create the levels of traffic characteristic of pathchar. This is a technology to watch.

4.2.3 Throughput Measurements

Estimating bandwidth can provide a quick overview of hardware performance. But if your bandwidth is not adequate, you are limited in what you can actually do: install faster hardware or contract for faster service. In practice, it is often not the raw bandwidth of the network but the bandwidth that is actually available that is of interest. That is, you may be more interested in the throughput that you can actually achieve.

Poor throughput can result not only from inadequate hardware but also from architectural issues such as network design. For example, a broadcast domain that is too large will create problems despite otherwise adequate hardware. The solution is to redesign your network, breaking apart or segmenting such domains once you have a clear understanding of traffic patterns.

Equipment configuration errors may also cause poor performance.
For example, some Ethernet devices may support full duplex communication if correctly configured but will fall back to half duplex otherwise. The first step toward a solution is recognizing the misconfiguration. Throughput tests are the next logical step in examining your network.

Throughput is typically measured by timing the transfer of a large block of data. This may be called the bulk transfer capacity of the link. There are a number of programs in this class besides those described here. The approach typically requires software at each end of the link. Because the software usually works at the application level, it tests not only the network but also your hardware and software at the endpoints.

Since performance depends on several parts, when you identify that a problem exists, you won't immediately know where the problem is. Initially, you might try switching to a different set of machines with different implementations to localize the problem. Before you get too caught up in your testing, you'll want to look at the makeup of the actual traffic as described later in this chapter. In extreme cases, you may need some of the more advanced tools described later in this book.

One simple quick-and-dirty test is to use an application like FTP. Transfer a file with FTP and see what numbers it reports. You'll need to convert these to a bit rate, but that is straightforward. For example, here is the final line for a file transfer:
1294522 bytes received in 1.44 secs (8.8e+02 Kbytes/sec)

Convert 1,294,522 bytes to bits by multiplying by 8 and then dividing by the time, 1.44 seconds. This gives about 7,191,789 bps.

One problem with this approach is that the disk accesses required may skew your results. There are a few tricks you can use to reduce this, but if you need the added accuracy, you are better off using a tool that is designed to deal with such a problem. ttcp, for example, overcomes the disk access problem by repeatedly sending the same data from memory so that there is no disk overhead.

4.2.3.1 ttcp

One of the oldest bulk capacity measurement tools is ttcp. This was written by Mike Muuss and Terry Slattery. To run the program, you first need to start the server on the remote machine using, typically, the -r and -s options. Then the client is started with the options -t and -s and the hostname or address of the server. Data is sent from the client to the server, performance is measured, the results are reported at each end, and then both client and server terminate. For example, the server might look
For example, the server might looksomething like this:bsd2# ttcp -r -sttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcpttcp-r: socketttcp-r: accept from 205.153.60.247ttcp-r: 16777216 bytes in 18.35 real seconds = 892.71 KB/sec +++ttcp-r: 11483 I/O calls, msec/call = 1.64, calls/sec = 625.67ttcp-r: 0.0user 0.9sys 0:18real 5% 15i+291d 176maxrss 0+2pf 11478+28cswThe client side would look like this:bsd1# ttcp -t -s 205.153.63.239ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp ->205.153.63.239ttcp-t: socketttcp-t: connectttcp-t: 16777216 bytes in 18.34 real seconds = 893.26 KB/sec +++ttcp-t: 2048 I/O calls, msec/call = 9.17, calls/sec = 111.66ttcp-t: 0.0user 0.5sys 0:18real 2% 16i+305d 176maxrss 0+2pf 3397+7cswThe program reports the amount of information transferred, indicates that the connection is beingmade, and then gives the results, including raw data, throughput, I/O call information, and executiontimes. The number of greatest interest is the transfer rate, 892.71 KB/sec (or 893.26 KB/sec). This isabout 7.3 Mbps, which is reasonable for a 10-Mbps Ethernet connection. (But it is not very differentfrom our quick-and-dirty estimate with FTP.)These numbers reflect the rate at which data is transferred, not the raw capacity of the line. Relatingthese numbers to bandwidth is problematic since more bits are actually being transferred than thesenumbers would indicate. The program reports sending 16,777,216 bytes in 18.35 seconds, but this isjust the data. On Ethernet with an MTU of 1500, each buffer will be broken into 6 frames. The firstwill carry an IP and TCP header for 40 more bytes. Each of the other 5 will have an IP header for 20more bytes each. And each will be packaged as an Ethernet frame costing an additional 18 bytes each.And dont forget the Ethernet preamble. All this additional overhead should be included in acalculation of raw capacity. 70
Poor throughput numbers typically indicate congestion, but that may not always be the case. Throughput will also depend on configuration issues such as the TCP window size for your connection. If your window size is not adequate, it will drastically affect performance. Unfortunately, this problem is not uncommon for older systems on today's high-speed links.

The -u option allows you to check UDP throughput. A number of options give you some control over the amount and the makeup of the information transferred. If you omit the -s option, the program uses standard input and output. This option allows you to control the data being sent.[5]

[5] In fact, ttcp can be used to transfer files or directories between machines. At the destination, use ttcp -r | tar xvpf - and, at the source, use tar cf - directory | ttcp -t dest_machine.

The nice thing about ttcp is that a number of implementations are readily available. For example, it is included as an undocumented command in the Enterprise version of Cisco IOS 11.2 and later. At one time, a Java version of ttcp was freely available from Chesapeake Computer Consultants, Inc. (now part of Mentor Technologies, Inc.). This program would run on anything with a Java interpreter, including Windows machines. The Java version supported both a Windows and a command-line interface. Unfortunately, this version does not seem to be available anymore, but you might want to try tracking down a copy.

4.2.3.2 netperf

Another program to consider is netperf, which had its origin in the Information Networks Division of Hewlett-Packard. While not formally supported, the program does appear to have informal support. It is freely available, runs on a number of Unix platforms, and has reasonable documentation. It has also been ported to Windows. While not as ubiquitous as ttcp, it supports a much wider range of tests.

Unlike with ttcp, the client and server are two separate programs. The server is netserver and can be started independently or via inetd.
The client is known as netperf. In the following example, the server and client are started on the same machine:

bsd1# netserver
Starting netserver at port 12865

bsd1# netperf
TCP STREAM TEST to localhost : histogram
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 16384  16384  16384    10.00     326.10

This tests the loop-back interface, which reports a throughput of 326 Mbps.

In the next example, netserver is started on one host:

bsd1# netserver
Starting netserver at port 12865

Then netperf is run with the -H option to specify the address of the server:

bsd2# netperf -H 205.153.60.247
TCP STREAM TEST to 205.153.60.247 : histogram
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 16384  16384  16384    10.01      6.86

This is roughly the same throughput we saw with ttcp. netperf performs a number of additional tests. In the next test, the transaction rate of a connection is measured:

bsd2# netperf -H 205.153.60.247 -t TCP_RR
TCP REQUEST/RESPONSE TEST to 205.153.60.247 : histogram
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  16384  1        1       10.00    655.84
16384  16384

The program contains several scripts for testing. It is also possible to do various stream tests with netperf. See the documentation that accompanies the program if you have these needs.

4.2.3.3 iperf

If ttcp and netperf don't meet your needs, you might consider iperf. iperf comes from the National Laboratory for Applied Network Research (NLANR) and is a very versatile tool. While beyond the scope of this chapter, iperf can also be used to test UDP bandwidth, loss, and jitter. A Java frontend is included to make iperf easier to use.
This utility has also been ported to Windows.

Here is an example of running the server side of iperf on a FreeBSD system:

bsd2# iperf -s -p3000
------------------------------------------------------------
Server listening on TCP port 3000
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  4] local 172.16.2.236 port 3000 connected with 205.153.63.30 port 1133
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   5.6 MBytes   4.5 Mbits/sec
^C

Here is the client side under Windows:

C:\>iperf -c205.153.60.236 -p3000
------------------------------------------------------------
Client connecting to 205.153.60.236, TCP port 3000
TCP window size: 8.0 KByte (default)
------------------------------------------------------------
[ 28] local 205.153.63.30 port 1133 connected with 205.153.60.236 port 3000
[ ID] Interval       Transfer     Bandwidth
[ 28]  0.0-10.0 sec   5.6 MBytes   4.5 Mbits/sec

Notice the use of Ctrl-C to terminate the server side. In TCP mode, iperf is compatible with ttcp, so it can be used as the client or server.
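Whether a given window size is adequate depends on the bandwidth-delay product of the path: TCP can have at most one window of unacknowledged data in flight, so throughput is capped at roughly the window size divided by the round-trip time. The sketch below works through that ceiling; the link figures are illustrative, not taken from the examples above.

```python
# Maximum TCP throughput imposed by a fixed window size:
# at most one window of data can be outstanding per round-trip time.

def window_limited_bps(window_bytes, rtt_seconds):
    return window_bytes * 8 / rtt_seconds

# An 8 KB window on a path with a 50 ms round-trip time caps
# throughput at about 1.3 Mbps, no matter how fast the link is.
print(round(window_limited_bps(8192, 0.050) / 1e6, 2))   # 1.31

# The window needed to fill a 10 Mbps link at the same RTT:
print(10e6 * 0.050 / 8)   # 62500.0 bytes
```

Stepping iperf's -w option through values like these shows directly when the window, rather than the network, is the bottleneck.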
iperf is a particularly convenient tool for investigating whether your TCP window is adequate. The -w option sets the socket buffer size. For TCP, this is the window size. Using the -w option, you can step through various window sizes and see how they impact throughput. iperf has a number of other strengths that make it worth considering.

4.2.3.4 Other related tools

You may also want to consider several similar or related tools. treno uses a traceroute-like approach to calculate bulk capacity, path MTU, and minimum RTT. Here is an example:

bsd2# treno 205.153.63.30
MTU=8166  MTU=4352  MTU=2002  MTU=1492 ..........
Replies were from sloan.lander.edu [205.153.63.30]
    Average rate: 3868.14 kbp/s (3380 pkts in + 42 lost = 1.2%) in 10.07 s
Equilibrium rate:      0 kbp/s (0 pkts in + 0 lost = 0%) in 0 s
Path properties: min RTT was 13.58 ms, path MTU was 1440 bytes
XXX Calibration checks are still under construction, use -v

treno is part of a larger Internet traffic measurement project at NLANR. treno servers are scattered across the Internet.

In general, netperf, iperf, and treno offer a wider range of features, but ttcp is generally easier to find.

Evaluating Internet Service Providers

When you sign a contract with an ISP to provide a level of service, say T1 access, what does this mean? The answer is not obvious.

ISPs sell services based, in some sense, on the total combined expected usage of all users. That is, they sell more capacity than they actually have, expecting levels of usage by different customers to balance out. If everyone tries to use their connection at once, there won't be enough capacity. But the idea is that this will rarely happen. To put it bluntly, ISPs oversell their capacity.

This isn't necessarily bad. Telephone companies have always done this. And, apart from Mother's Day and brief periods following disasters, you can almost always count on the phone system working.
When you buy T1 Internet access, the assumption is that you will not be using that line to its full capacity all the time. If everyone used their connection to full capacity all the time, the price of those connections would be greatly increased. If you really need some guaranteed level of service, talk to your ISP. They may be able to provide guarantees if you are willing to pay for them.

But for the rest of us, the question is "What can we reasonably expect?" At a minimum, a couple of things seem reasonable. First, the ISP should have a connection to the Internet that well exceeds the largest connections that they are selling. For example, if they are selling multiple T1 lines, they should have a connection that is larger than a T1 line, e.g., a T3 line. Otherwise, if more than one customer is using the link, then no one can operate at full capacity. Since two customers using the link at the same time is very likely, having only a T1 line would violate the basic assumption that the contracted capacity is available.

Second, the ISP should be able to provide a path through their network to their ISP that operates in excess of the contracted speed. If you buy T1 access that must cross a 56-Kbps
line to reach the rest of the Internet, you don't really have T1 access.

Finally, ISPs should have multiple peering arrangements (connections to the global Internet) so that if one connection goes down, there is an alternative path available.

Of course, your ISP may feel differently. And, if the price is really good, your arrangement may make sense. Clearly, not all service arrangements are the same. You'll want to come to a clear understanding with your ISP if you can. Unfortunately, with many ISPs, the information you will need is a closely guarded secret. As always, caveat emptor.

4.2.4 Traffic Measurements with netstat

In the ideal network, throughput numbers, once you account for overhead, will be fairly close to your bandwidth numbers. But few of us have our networks all to ourselves. When throughput numbers are lower than expected, which is usually the case, you'll want to account for the difference. As mentioned before, this could be hardware or software related. But usually it is just the result of the other traffic on your network. If you are uncertain of the cause, the next step is to look at the traffic on your network.

There are three basic approaches you can take. First, the quickest way to get a summary of the activity on a link is to use a tool such as netstat. This approach is described here. Or you can use packet capture to look at traffic. This approach is described in Chapter 5. Finally, you could use SNMP-based tools like ntop. SNMP tools are described in Chapter 7. Performance analysis tools using SNMP are described in Chapter 8.

The program netstat was introduced in Chapter 2. Given that netstat's role is to report network data structures, it should come as no surprise that it might be useful in this context. To get a quick picture of the traffic on a network, use the -i option.
For example:

bsd2# netstat -i
Name  Mtu   Network     Address               Ipkts Ierrs    Opkts Oerrs  Coll
lp0*  1500  <Link>                                0     0        0     0     0
ep0   1500  <Link>      00.60.97.06.22.22  13971293     0  1223799     1     0
ep0   1500  205.153.63  bsd2               13971293     0  1223799     1     0
tun0* 1500  <Link>                                0     0        0     0     0
sl0*  552   <Link>                                0     0        0     0     0
ppp0* 1500  <Link>                                0     0        0     0     0
lo0   16384 <Link>                              234     0      234     0     0
lo0   16384 127         localhost               234     0      234     0     0

The output shows the number of packets processed for each interface since the last reboot. In this example, interface ep0 has received 13,971,293 packets (Ipkts) with no errors (Ierrs), has sent 1,223,799 packets (Opkts) with 1 error (Oerrs), and has experienced no collisions (Coll). A few errors are generally not a cause for alarm, but the percentage of either error should be quite low, certainly much lower than 0.1% of the total packets. Collisions can be higher but should be less than 10% of the traffic. The collision count includes only those involving the interface. A high number of collisions is an indication that your network is too heavily loaded, and you should consider segmentation. This particular computer is on a switch, which explains the absence of collisions. Collisions are seen only on shared media.

If you want output for a single interface, you can specify this with the -I option. For example:

bsd2# netstat -Iep0
Name  Mtu   Network     Address               Ipkts Ierrs    Opkts Oerrs  Coll
ep0   1500  <Link>      00.60.97.06.22.22  13971838     0  1223818     1     0
ep0   1500  205.153.63  bsd2               13971838     0  1223818     1     0

(This was run a couple of minutes later, so the numbers are slightly larger.)

Implementations vary, so your output may look different but should contain the same basic information. For example, here is output under Linux:

lnx1# netstat -i
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 7366003      0      0      0   93092      0      0      0 BMRU
eth1   1500   0  289211      0      0      0   18581      0      0      0 BRU
lo     3924   0     123      0      0      0     123      0      0      0 LRU

As you can see, Linux breaks down lost packets into three categories: errors, drops, and overruns.

Unfortunately, the numbers netstat returns are cumulative from the last reboot of the system. What is really of interest is how these numbers have changed recently, since a problem could develop and it would take a considerable amount of time before the actual numbers would grow enough to reveal the problem.[6]

[6] System Performance Tuning by Mike Loukides contains a script that can be run at regular intervals so that differences are more apparent.

One thing you may want to try is stressing the system in question to see if this increases the number of errors you see. You can use either ping with the -l option or the spray command. (spray is discussed in greater detail in Chapter 9.)

First, run netstat to get a current set of values:

bsd2# netstat -Iep0
Name  Mtu   Network     Address               Ipkts Ierrs    Opkts Oerrs  Coll
ep0   1500  <Link>      00.60.97.06.22.22  13978296     0  1228137     1     0
ep0   1500  205.153.63  bsd2               13978296     0  1228137     1     0

Next, send a large number of packets to the destination. In this example, 1000 UDP packets were sent:

bsd1# spray -c1000 205.153.63.239
sending 1000 packets of lnth 86 to 205.153.63.239 ...
        in 0.09 seconds elapsed time
        464 packets (46.40%) dropped
Sent:   11267 packets/sec, 946.3K bytes/sec
Rcvd:    6039 packets/sec, 507.2K bytes/sec

Notice that this exceeded the capacity of the network, as 464 packets were dropped.
This may indicate a congested network. More likely, the host is trying to communicate with a slower machine. When spray is run in the reverse direction, no packets are dropped. This indicates the latter explanation. Remember, spray is sending packets as fast as it can, so don't make too much out of dropped packets.

Finally, rerun netstat to see if any problems exist:

bsd2# netstat -Iep0
Name  Mtu   Network     Address               Ipkts Ierrs    Opkts Oerrs  Coll
ep0   1500  <Link>      00.60.97.06.22.22  13978964     0  1228156     1     0
ep0   1500  205.153.63  bsd2               13978964     0  1228156     1     0

No problems are apparent in this example.

If problems are indicated, you can get a much more detailed report with the -s option. You'll probably want to pipe the output to more so it doesn't disappear off the top of the screen. The amount of output data can be intimidating but can give a wealth of information. The information is broken down by protocol and by error types such as bad checksums or incomplete headers.

On some systems, such as FreeBSD, a summary of the nonzero values can be obtained by using the -s option twice, as shown in this example:

bsd2# netstat -s -s
ip:
        255 total packets received
        255 packets for this host
        114 packets sent from this host
icmp:
        ICMP address mask responses are disabled
igmp:
tcp:
        107 packets sent
                81 data packets (8272 bytes)
                26 ack-only packets (25 delayed)
        140 packets received
                77 acks (for 8271 bytes)
                86 packets (153 bytes) received in-sequence
        1 connection accept
        1 connection established (including accepts)
        77 segments updated rtt (of 78 attempts)
        2 correct ACK header predictions
        62 correct data packet header predictions
udp:
        115 datagrams received
        108 broadcast/multicast datagrams dropped due to no socket
        7 delivered
        7 datagrams output

A summary for a single protocol can be obtained with the -p option to specify the protocol. The next example shows the nonzero statistics for TCP:

bsd2# netstat -p tcp -s -s
tcp:
        147 packets sent
                121 data packets (10513 bytes)
                26 ack-only packets (25 delayed)
        205 packets received
                116 acks (for 10512 bytes)
                122 packets (191 bytes) received in-sequence
        1 connection accept
        1 connection established (including accepts)
        116 segments updated rtt (of 117 attempts)
        2 correct ACK header predictions
        88 correct data packet header predictions
This can take a bit of experience to interpret. Begin by looking for statistics showing a large number of errors. Next, identify the type of errors. Typically, input errors are caused by faulty hardware. Output errors are a problem on or at the local host. Data corruption, such as faulty checksums, frequently occurs at routers. And, as noted before, congestion is indicated by collisions. Of course, these are generalizations, so don't read too much into them.

4.3 Microsoft Windows

Most of the tools we have been discussing are available in one form or another for Windows platforms. Microsoft's implementation of traceroute, known as tracert, has both superficial and fundamental differences from the original implementation. Like ping, tracert requires a DOS window to run. We have already seen an example of its output. tracert has fewer options, and there are some superficial differences in their flags. But most of traceroute's options are rarely used anyway, so this isn't much of a problem.

A more fundamental difference between Microsoft's tracert and its Unix relative is that tracert uses ICMP packets rather than UDP packets. This isn't necessarily bad, just different. In fact, if you have access to both traceroute and tracert, you may be able to use this to your advantage in some unusual circumstances. Its behavior may be surprising in some cases. One obvious implication is that routers that block ICMP messages will block tracert, while traceroute's UDP packets will be passed.

As noted earlier in this chapter, Mentor's Java implementation of ttcp runs under Windows if you can find it. Both netperf and iperf have also been ported to Windows. Another freely available program worth considering is Qcheck from Ganymede Software, Inc. This program requires that Ganymede's Performance Endpoints software be installed on systems at each end of the link. This software is also provided at no cost and is available for a wide variety of systems ranging from Windows to MVS.
In addition to supporting IP, the software supports SPX and IPX protocols. The software provides ping-like connectivity checks, as well as response time and throughput measurements.

As noted in Chapter 2, Microsoft also provides its own version of netstat. The options of interest here are -e and -s. The -e option gives a brief summary of activity on any Ethernet interface:

C:\>netstat -e
Interface Statistics

                          Received            Sent
Bytes                     9840233             2475741
Unicast packets           15327               16414
Non-unicast packets       9268                174
Discards                  0                   0
Errors                    0                   0
Unknown protocols         969

The -s option gives the per-protocol statistics:

C:\>netstat -s

IP Statistics

  Packets Received                  = 22070
  Received Header Errors            = 0
  Received Address Errors           = 6
  Datagrams Forwarded               = 0
  Unknown Protocols Received        = 0
  Received Packets Discarded        = 0
  Received Packets Delivered        = 22064
  Output Requests                   = 16473
  Routing Discards                  = 0
  Discarded Output Packets          = 0
  Output Packet No Route            = 0
  Reassembly Required               = 0
  Reassembly Successful             = 0
  Reassembly Failures               = 0
  Datagrams Successfully Fragmented = 0
  Datagrams Failing Fragmentation   = 0
  Fragments Created                 = 0

ICMP Statistics

                           Received  Sent
  Messages                 20        8
  Errors                   0         0
  Destination Unreachable  18        8
  Time Exceeded            0         0
  Parameter Problems       0         0
  Source Quenchs           0         0
  Redirects                0         0
  Echos                    0         0
  Echo Replies             0         0
  Timestamps               0         0
  Timestamp Replies        0         0
  Address Masks            0         0
  Address Mask Replies     0         0

TCP Statistics

  Active Opens               = 489
  Passive Opens              = 2
  Failed Connection Attempts = 69
  Reset Connections          = 66
  Current Connections        = 4
  Segments Received          = 12548
  Segments Sent              = 13614
  Segments Retransmitted     = 134

UDP Statistics

  Datagrams Received = 8654
  No Ports           = 860
  Receive Errors     = 0
  Datagrams Sent     = 2717

Interpretation is basically the same as with the Unix version.
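A few of these counters can be combined into quick sanity checks. For example, the ratio of segments retransmitted to segments sent gives a rough retransmission rate. Using the figures from the sample output above, a shell one-liner computes it (the idea of watching this ratio is a general rule of thumb, not something specific to these tools):

```shell
# Rough TCP retransmission rate from the statistics above:
# 134 segments retransmitted out of 13614 segments sent.
awk 'BEGIN { printf "%.2f%%\n", 100 * 134 / 13614 }'
# prints: 0.98%
```

A rate under a percent or so, as here, is rarely cause for concern; a sustained rate much higher than that usually merits a closer look at the network path.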
Chapter 5. Packet Capture

Packet capture and analysis is the most powerful technique that will be discussed in this book—it is the ultimate troubleshooting tool. If you really want to know what is happening on your network, you will need to capture traffic. No other tool provides more information.

On the other hand, no other tool requires the same degree of sophistication to use. If misused, it can compromise your system's security and invade the privacy of your users. Of the software described in this book, packet capture software is the most difficult to use to its full potential and requires a thorough understanding of the underlying protocols to be used effectively. As noted in Chapter 1, you must ensure that what you do conforms to your organization's policies and any applicable laws. You should also be aware of the ethical implications of your actions.

This chapter begins with a discussion of the types of tools available and various issues involved in traffic capture. Next I describe tcpdump, a ubiquitous and powerful packet capture tool. This is followed by a brief description of other closely related tools. Next is a discussion of ethereal, a powerful protocol analyzer that is rapidly gaining popularity. Next I describe some of the problems created by traffic capture. The chapter concludes with a discussion of packet capture tools available for use with Microsoft Windows platforms.

5.1 Traffic Capture Tools

Packet capture is the real-time collection of data as it travels over networks. Tools for the capture and analysis of traffic go by a number of names, including packet sniffers, packet analyzers, protocol analyzers, and even traffic monitors. Although there is some inconsistency in how these terms are used, the primary difference is in how much analysis or interpretation is provided after a packet is captured. Packet sniffers generally do the least amount of analysis, while protocol analyzers provide the greatest level of interpretation.
Packet analyzers typically lie somewhere in between. All have the capture of raw data as a core function. Traffic monitors typically are more concerned with collecting statistical information, but many support the capture of raw data. Any of these may be augmented with additional functions such as graphing utilities and traffic generators. This chapter describes tcpdump, a packet sniffer; several analysis tools; and ethereal, a protocol analyzer.

While packet capture might seem like a low-level tool, it can also be used to examine what is happening at higher levels, including the application level, because of the way data is encapsulated. Since application data is encapsulated in a generally transparent way by the lower levels of the protocol stack, the data is basically intact when examined at a lower level.[1] By examining network traffic, we can examine the data generated at the higher levels. (In general, however, it is usually much easier to debug an application using a tool designed for that application. Tools specific to several application-level protocols are described in Chapter 10.)

[1] There are two obvious exceptions. The data may be encrypted, or the data may be fragmented among multiple packets.

Packet capture programs also require the most technical expertise of any program we will examine. A thorough understanding of the underlying protocol is often required to interpret the results. For this reason alone, packet capture is a tool that you want to become familiar with well before you need it.
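One way to build that familiarity is to automate a short capture on a regular schedule and review it while the network is healthy. The crontab fragment below is only a sketch; the interface name, packet count, schedule, and output path are all assumptions you would adapt to your own site:

```shell
# Hypothetical crontab entry: collect a 1000-packet sample at 2 a.m. daily.
# The interface (ep0) and output directory are placeholders, and tcpdump
# must run with root privileges for this to work.
# m h dom mon dow  command
0 2 * * * /usr/sbin/tcpdump -i ep0 -c 1000 -w /var/captures/baseline.pcap
```

Rotating or timestamping the output file would keep successive samples from overwriting each other.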
When you are having problems, it will also be helpful to have comparison systems so you can observe normal behavior. The time to learn how your system works is before you have problems. This technique cannot be stressed enough—do a baseline run for your network periodically and analyze it closely so you know what traffic you expect to see on your network before you have problems.

5.2 Access to Traffic

You can capture traffic only on a link that you have access to. If you can't get traffic to an interface, you can't capture it with that interface. While this might seem obvious, it may be surprisingly difficult to get access to some links on your network. On some networks, this won't be a problem. For example, 10Base2 and 10Base5 networks have shared media, at least between bridges and switches. Computers connected to a hub are effectively on a shared medium, and the traffic is exposed. But on other systems, watch out!

Clearly, if you are trying to capture traffic from a host on one network, it will never see the local traffic on a different network. But the problem doesn't stop there. Some networking devices, such as bridges and switches, are designed to contain traffic so that it is seen only by parts of the local network. On a switched network, only a limited amount of traffic will normally be seen at any interface.[2] Traffic will be limited to traffic to or from the host or to multicast and broadcast traffic. If this includes the traffic you are interested in, so much the better. But if you are looking at general network traffic, you will need to use other approaches.

[2] This assumes the switches have been running long enough to have a reasonably complete address table. Most switches forward traffic onto all ports if the destination address is unknown. So when they are first turned on, switches look remarkably like hubs.

Not being able to capture data on an interface has both positive and negative ramifications.
The primary benefit is that it is possible to control access to traffic with an appropriate network design. By segmenting your network, you can limit access to data, improving security and enhancing privacy. Lack of access to data can become a serious problem, however, when you must capture that traffic. There are several basic approaches to overcome this problem. First, you can try to physically go to the traffic by using a portable computer to collect the data. This has the obvious disadvantage of requiring that you travel to the site. This may not be desirable or possible. For example, if you are addressing a security problem, it may not be feasible to monitor at the source of the suspected attack without revealing what you are doing. If you need to collect data at multiple points simultaneously, being at different places at the same time is clearly not possible by yourself.

Another approach is to have multiple probe computers located throughout your network. For example, if you have computers on your network that you can reach using telnet, ssh, X Window software, or vnc, you can install the appropriate software on each. Some software has been designed with remote probing in mind. For example, Microsoft's netmon supports the use of a Windows platform as a probe for collecting traffic. Data from the agents on these machines can be collected by a central management station. Some RMON probes will also do this. (vnc and ssh are described in Chapter 11. netmon is briefly described later in this chapter, and RMON is described in Chapter 8.)

When dealing with switches, there are two common approaches you can take. (Several other techniques that I can't recommend are described later in this chapter.) One approach is to augment the switch with a spare hub. Attach the hub to the switch and move from the switch to the hub only the connections that need to be examined. You could try replacing the switch with a hub, but this can be
disruptive and, since a hub inherently has a lower capacity, you may have more traffic than the hub can handle. Augmenting the switch with a hub is a better solution.

Buying a small portable hub to use in establishing a probe point into your network is certainly worth the expense. Because you will be connecting a hub to a switch, you will be using both crossover and patch cables. Be sure you work out the details of the cabling well before you have to try this approach on a problematic network. Alternately, there are several commercially available devices designed specifically for patching into networks. These devices include monitoring switches, fiber splitters, and devices designed to patch into 100-Mbps links or links with special protocols. If your hardware dictates such a need, these devices are worth looking into.

Here is a riddle for you—when is a hub not a hub? In recent years, the distinction between hubs and switches has become blurred. For example, a 10/100 autoswitching hub may be implemented, internally, as a 10-Mbps hub and a 100-Mbps hub connected by a dual-port switch. With such a device, you may not be able to see all the traffic. In the next few years, true hubs may disappear from the market. You may want to keep this in mind when looking for a hub for traffic monitoring.

A second possibility with some switches is to duplicate the traffic from one port onto another port. If your switch supports this, it can be reconfigured dynamically to copy traffic to a monitoring port. Other ports continue functioning normally, so the monitoring appears transparent to the rest of the switch's operation. This technique is known by a variety of names. With Bay Network products, this is known as conversation steering. Cisco refers to this as monitoring or using a spanning port. Other names include port aliasing and port mirroring.

Unfortunately, many switches either don't support this behavior or place limitations on what can be done.
For instance, some switches will allow traffic to be redirected only to a high-speed port. Implementation details determining exactly what can be examined vary greatly. Another problem is that some types of errors will be filtered by the switch, concealing possible problems. For example, if there are any framing errors, these will typically be discarded rather than forwarded. Normally, discarding these packets is exactly what you want the switch to do, just not in this context. You'll have to consult the documentation with your switch to see what is possible.

5.3 Capturing Data

Packet capture may be done by software running on a networked host or by hardware/software combinations designed specifically for that purpose. Devices designed specifically for capturing traffic often have high-performance interfaces that can capture large amounts of data without loss. These devices will also capture frames with framing errors—frames that are often silently discarded with more conventional interfaces. More conventional interfaces may not be able to keep up with high traffic levels, so packets will be lost. Programs like tcpdump give summary statistics, reporting the number of packets lost. On moderately loaded networks, however, losing packets should not be a problem. If dropping packets becomes a problem, you will need to consider faster hardware or, better yet, segmenting your network.

Packet capture software works by placing the network interface in promiscuous mode.[3] In normal operations, the network interface captures and passes on to the protocol stack only those packets with
the interface's unicast address, packets sent to a multicast address that matches a configured address for the interface, or broadcast packets. In promiscuous mode, all packets are captured regardless of their destination address.

[3] On a few systems you may need to manually place the interface in promiscuous mode with the ifconfig command before running the packet capture software.

While the vast majority of interfaces can be placed in promiscuous mode, a few are manufactured not to allow this. If in doubt, consult the documentation for your interface. Additionally, on Unix systems, the operating system software must be configured to allow promiscuous mode. Typically, placing an interface in promiscuous mode requires root privileges.

5.4 tcpdump

The tcpdump program was developed at the Lawrence Berkeley Laboratory at the University of California, Berkeley, by Van Jacobson, Craig Leres, and Steven McCanne. It was originally developed to analyze TCP/IP performance problems. A number of features have been added over time, although some options may not be available with every implementation. The program has been ported to a wide variety of systems and comes preinstalled on many systems.

For a variety of reasons, tcpdump is an ideal tool to begin with. It is freely available, runs on many Unix platforms, and has even been ported to Microsoft Windows. Features of its syntax and its file format have been used or supported by a large number of subsequent programs. In particular, its capture software, libpcap, is frequently used by other capture programs. Even when proprietary programs with additional features exist, the universality of tcpdump makes it a compelling choice. If you work with a wide variety of platforms, being able to use the same program on all or most of the platforms can easily outweigh small advantages proprietary programs might have. This is particularly true if you use the programs on an irregular basis or don't otherwise have time to fully master them.
It is better to know a single program well than several programs superficially. In such situations, special features of other programs will likely go unused.

Since tcpdump is text based, it is easy to run remotely using a Telnet connection. Its biggest disadvantage is a lack of analysis, but you can easily capture traffic, move it to your local machine, and analyze it with a tool like ethereal. Typically, I use tcpdump in text-only environments or on remote computers. I use ethereal in a Microsoft Windows or X Window environment and to analyze tcpdump files.

5.4.1 Using tcpdump

The simplest way to run tcpdump is interactively by simply typing the program's name. The output will appear on your screen. You can terminate the program by typing Ctrl-C. But unless you have an idle network, you are likely to be overwhelmed by the amount of traffic you capture. What you are interested in will likely scroll off your screen before you have a chance to read it.

Fortunately, there are better ways to run tcpdump. The first question is how you plan to use tcpdump. Issues include whether you also plan to use the host on which tcpdump is running to generate traffic in addition to capturing traffic, how much traffic you expect to capture, and how you will determine that the traffic you need has been captured.
There are several very simple, standard ways around the problem of being overwhelmed by data. The Unix commands tee and script are commonly used to allow a user to both view and record output from a Unix session. (Both tee and script are described in Chapter 11.) For example, script could be started, tcpdump run, and script stopped to leave a file that could be examined later.

The tee command is slightly more complicated, since tcpdump must be placed in line mode to display output with tee. This is done with the -l option. The syntax for capturing a file with tee is:

bsd1# tcpdump -l | tee outfile

Of course, additional arguments would probably be used.

Using multiple Telnet connections to a host or multiple windows in an X Window session allows you to record in one window while taking actions to generate traffic in another window. This approach can be very helpful in some circumstances.

An alternative is to use telnet to connect to the probe computer. The session could be logged with many of the versions of telnet that are available. Be aware, however, that the Telnet connection will generate considerable traffic that may become part of your log file unless you are using filtering. (Filtering, which is discussed later in this chapter, allows you to specify the type of traffic you want to examine.) The additional traffic may also overload the connection, resulting in lost packets.

Another alternative is to run tcpdump as a detached process by including an & at the end of the command line. Here is an example:

bsd1# tcpdump -w outfile &
[1] 70260
bsd1# tcpdump: listening on xl0

The command starts tcpdump, prints a process number, and returns the user prompt along with a message that tcpdump has started. You can now enter commands to generate the traffic you are interested in.
(You really have a prompt at this point; the message from tcpdump just obscures it.) Once you have generated the traffic of interest, you can terminate tcpdump by issuing a kill command using the process number reported when tcpdump was started. (You can use the ps command if you have forgotten the process number.)

bsd1# kill 70260
153 packets received by filter
0 packets dropped by kernel
[1]    Done                   tcpdump -w outfile

You can now analyze the capture file. (Running tcpdump as a detached process can also be useful when you are trying to capture traffic that might not show up for a while, e.g., RADIUS or DNS exchanges. You might want to use the nohup command to run it in the background.)

Yet another approach is to use the -w option to write the captured data directly to a file. This option has the advantage of collecting raw data in binary format. The data can then be replayed with tcpdump using the -r option. The binary format decreases the amount of storage needed, and different filters can be applied to the file without having to recapture the traffic. Using previously captured traffic is an excellent way of fine-tuning filters to be sure they work as you expect. Of course, you can selectively analyze data captured as text files in Unix by using the many tools Unix provides, but you can't use tcpdump filtering on text files. And you can always generate a text file from a tcpdump file
for subsequent analysis with Unix tools by simply redirecting the output. To capture data you might type:

bsd1# tcpdump -w rawfile

The data could be converted to a text file with:

bsd1# tcpdump -r rawfile > textfile

This approach has several limitations. Because the data is being written directly to a file, you must know when to terminate recording without actually seeing the traffic. Also, if you limit what is captured with the original run, the data you exclude is lost. For these reasons, you will probably want to be very liberal in what you capture, offsetting some of the storage gains of the binary format.

Clearly, each approach has its combination of advantages and disadvantages. If you use tcpdump very much, you will probably need each from time to time.

5.4.2 tcpdump Options

A number of command-line options are available with tcpdump. Roughly speaking, options can be separated into four broad categories—commands that control the program operations (excluding filtering), commands that control how data is displayed, commands that control what data is displayed, and filtering commands. We will consider each category in turn.

5.4.2.1 Controlling program behavior

This class of command-line options affects program behavior, including the way data is collected. We have already seen two examples of control commands, -r and -w. The -w option allows us to redirect output to a file for later analysis, which can be extremely helpful if you are not sure exactly how you want to analyze your data. You can subsequently play back captured data using the -r option. You can repeatedly apply different display options or filters to the data until you have found exactly the information you want. These options are extremely helpful in learning to use tcpdump and are essential for documentation and sharing.

If you know how many packets you want to capture or if you just have an upper limit on the number of packets, the -c option allows you to specify that number.
The program will terminate automatically when that number is reached, eliminating the need to use a kill command or Ctrl-C. In the next example, tcpdump will terminate after 100 packets are collected:

bsd1# tcpdump -c100

While limiting packet capture can be useful in some circumstances, it is generally difficult to predict accurately how many packets need to be collected.

If you are running tcpdump on a host with more than one network interface, you can specify which interface you want to use with the -i option. Use the command ifconfig -a to discover what interfaces are available and what networks they correspond to if you aren't sure. For example, suppose you are using a computer with two class C interfaces, xl0 with an IP address of 205.153.63.238 and xl1 with an IP address of 205.153.61.178. Then, to capture traffic on the 205.153.61.0 network, you would use the command:

bsd1# tcpdump -i xl1
Without an explicitly identified interface, tcpdump defaults to the lowest numbered interface.

The -p option says that the interface should not be put into promiscuous mode. This option would, in theory, limit capture to the normal traffic on the interface—traffic to or from the host, multicast traffic, and broadcast traffic. In practice, the interface might be in promiscuous mode for some other reason. In this event, -p will not turn promiscuous mode off.

Finally, -s controls the amount of data captured. Normally, tcpdump defaults to some maximum byte count and will only capture up to that number of bytes from individual packets. The actual number of bytes depends on the pseudodevice driver used by the operating system. The default is selected to capture appropriate headers, but not to collect packet data unnecessarily. By limiting the number of bytes collected, privacy can be improved. Limiting the number of bytes collected also decreases processing and buffering requirements.

If you need to collect more data, the -s option can be used to specify the number of bytes to collect. If you are dropping packets and can get by with fewer bytes, -s can be used to decrease the number of bytes collected. The following command will collect the entire packet if its length is less than or equal to 200 bytes:

bsd1# tcpdump -s200

Longer packets will be truncated to 200 bytes.

If you are capturing files using the -w option, you should be aware that the number of bytes collected will be what is specified by the -s option at the time of capture. The -s option does not apply to files read back with the -r option. Whatever you captured is what you have. If it was too few bytes, then you will have to recapture the data.

5.4.2.2 Controlling how information is displayed

The -a, -n, -N, and -f options determine how address information is displayed.
The -a option attempts to force network addresses into names, the -n option prevents the conversion of addresses into names, the -N option prevents domain name qualification, and the -f option prevents remote name resolution. In the following, the remote site www.cisco.com (192.31.7.130) is pinged from sloan.lander.edu (205.153.63.30) without an option, with -a, with -n, with -N, and with -f, respectively. (The options -c1 host 192.31.7.130 restrict capture to one packet to or from the host 192.31.7.130.)

bsd1# tcpdump -c1 host 192.31.7.130
tcpdump: listening on xl0
14:16:35.897342 sloan.lander.edu > cio-sys.cisco.com: icmp: echo request
bsd1# tcpdump -c1 -a host 192.31.7.130
tcpdump: listening on xl0
14:16:14.567917 sloan.lander.edu > cio-sys.cisco.com: icmp: echo request
bsd1# tcpdump -c1 -n host 192.31.7.130
tcpdump: listening on xl0
14:17:09.737597 205.153.63.30 > 192.31.7.130: icmp: echo request
bsd1# tcpdump -c1 -N host 192.31.7.130
tcpdump: listening on xl0
14:17:28.891045 sloan > cio-sys: icmp: echo request
bsd1# tcpdump -c1 -f host 192.31.7.130
tcpdump: listening on xl0
14:17:49.274907 sloan.lander.edu > 192.31.7.130: icmp: echo request

Clearly, the -a option is the default.
Not using name resolution can eliminate the overhead and produce terser output. If the network is broken, you may not be able to reach your name server and will find yourself with long delays while name resolution times out. Finally, if you are running tcpdump interactively, name resolution will create more traffic that will have to be filtered out.

The -t and -tt options control the printing of timestamps. The -t option suppresses the display of the timestamp, while -tt produces unformatted timestamps. The following shows the output for the same packet using tcpdump without an option, with the -t option, and with the -tt option, respectively:

12:36:54.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF)
sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF)
934303014.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF)

The -t option produces a more terse output, while the -tt output can simplify subsequent processing, particularly if you are writing scripts to process the data.

5.4.2.3 Controlling what's displayed

The verbose modes provided by the -v and -vv options can be used to print some additional information. For example, the -v option will print TTL fields. For less information, use the -q, or quiet, option. Here is the output for the same packet presented with the -q option, without options, with the -v option, and with the -vv option, respectively:

12:36:54.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: tcp 0 (DF)
12:36:54.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF)
12:36:54.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF) (ttl 128, id 45836)
12:36:54.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF) (ttl 128, id 45836)

This additional information might be useful in a few limited contexts, while the quiet mode provides shorter output lines.
In this instance, there was no difference between the results with -v and -vv, butthis isnt always the case.The -e option is used to display link-level header information. For the packet from the previousexample, with the -e option, the output is:12:36:54.772066 0:10:5a:a1:e9:8 0:10:5a:e3:37:c ip 60:sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack 3259091394 win 8647 (DF)0:10:5a:a1:e9:8 is the Ethernet address of the 3Com card in sloan.lander.edu, while 0:10:5a:e3:37:cis the Ethernet address of the 3Com card in 205.153.63.238. (We can discover the types of adaptersused by looking up the OUI portion of these addresses, as described in Chapter 2.)For the masochist who wants to decode packets manually, the -x option provides a hexadecimal dumpof packets, excluding link-level headers. A packet displayed with the -x and -vv options looks like this: 86
13:57:12.719718 bsd1.lander.edu.1657 > 205.153.60.5.domain: 11587+ A? www.microsoft.com. (35) (ttl 64, id 41353)
                         4500 003f a189 0000 4011 c43a cd99 3db2
                         cd99 3c05 0679 0035 002b 06d9 2d43 0100
                         0001 0000 0000 0000 0377 7777 096d 6963
                         726f 736f 6674 0363 6f6d 0000 0100 01

Please note that the amount of information displayed will depend on how many bytes are collected, as determined by the -s option. Such hex listings are typical of what might be seen with many capture programs.

Describing how to do such an analysis in detail is beyond the scope of this book, as it requires a detailed understanding of the structure of packets for a variety of protocols. Interpreting this data is a matter of taking packets apart byte by byte or even bit by bit, realizing that the interpretation of the results at one step may determine how the next steps will be done. For header formats, you can look to the appropriate RFC or in any number of books. Table 5-1 summarizes the analysis for this particular packet, but every packet is different. This particular packet was a DNS lookup for www.microsoft.com. (For more information on decoding packets, see Eric A. Hall's Internet Core Protocols: The Definitive Guide.)

Table 5-1. Packet analysis summary

Raw data in hex            Interpretation
--- IP header ---
First 4 bits of 45         IP version—4
Last 4 bits of 45          Length of header multiplier—5 (times 4 or 20 bytes)
00                         Type of service
00 3f                      Packet length in hex—63 bytes
a1 89                      ID
First 3 bits of 00         000—flags, none set
Last 13 bits of 00 00      Fragmentation offset
40                         TTL—64 hops
11                         Protocol number in hex—UDP
c4 3a                      Header checksum
cd 99 3d b2                Source IP—205.153.61.178
cd 99 3c 05                Destination IP—205.153.60.5
--- UDP header ---
06 79                      Source port
00 35                      Destination port—DNS
00 2b                      UDP packet length—43 bytes
06 d9                      Header checksum
--- DNS message ---
2d 43                      ID
01 00                      Flags—query with recursion desired
00 01                      Number of queries
00 00                      Number of answers
00 00                      Number of authority RRs
00 00                      Number of additional RRs
--- Query ---
03                         Length—3
77 77 77                   String—"www"
09                         Length—9
6d 69 63 72 6f 73 6f 66 74 String—"microsoft"
03                         Length—3
63 6f 6d                   String—"com"
00                         Length—0
00 01                      Query type—IP address
00 01                      Query class—Internet

This analysis was included here primarily to give a better idea of how packet analysis works. Several programs that analyze packet data from a tcpdump trace file are described later in this chapter. Unix utilities like strings, od, and hexdump can also make the process easier. For example, in the following output, hexdump makes it easier to pick out www.microsoft.com in the data:

bsd1# hexdump -C tracefile
00000000  d4 c3 b2 a1 02 00 04 00  00 00 00 00 00 00 00 00  |................|
00000010  c8 00 00 00 01 00 00 00  78 19 06 38 66 fb 0a 00  |........x..8f...|
00000020  4d 00 00 00 4d 00 00 00  00 00 a2 c6 0e 43 00 60  |M...M........C.`|
00000030  97 92 4a 7b 08 00 45 00  00 3f a1 89 00 00 40 11  |..J{..E..?....@.|
00000040  c4 3a cd 99 3d b2 cd 99  3c 05 06 79 00 35 00 2b  |.:..=...<..y.5.+|
00000050  06 d9 2d 43 01 00 00 01  00 00 00 00 00 00 03 77  |..-C...........w|
00000060  77 77 09 6d 69 63 72 6f  73 6f 66 74 03 63 6f 6d  |ww.microsoft.com|
00000070  00 00 01 00 01                                    |.....|
00000075

The -vv option could also be used to get as much information as possible.

Hopefully, you will have little need for the -x option. But occasionally you may encounter a packet that is unknown to tcpdump, and you have no choice. For example, some of the switches on my local network use a proprietary implementation of a spanning tree protocol to implement virtual local area networks (VLANs). Most packet analyzers, including tcpdump, won't recognize these. Fortunately, once you have decoded one unusual packet, you can usually easily identify similar packets.

5.4.2.4 Filtering

To effectively use tcpdump, it is necessary to master the use of filters. Filters permit you to specify what traffic you want to capture, allowing you to focus on just what is of interest.
This can be absolutely essential if you need to extract a small amount of traffic from a massive trace file. Moreover, tools like ethereal use the tcpdump filter syntax for capturing traffic, so you'll want to learn the syntax if you plan to use these tools.

If you are absolutely certain that you are not interested in some kinds of traffic, you can exclude traffic as you capture. If you are unclear about what traffic you want, you can collect the raw data to a file and apply the filters as you read back the file. In practice, you will often alternate between these two approaches.

Filters at their simplest are keywords added to the end of the command line. However, extremely complex commands can be constructed using logical and relational operators. In the latter case, it is usually better to save the filter to a file and use the -F option. For example, if testfilter is a text file
  • 99. containing the filter host 205.153.63.30, then typing tcpdump -Ftestfilter is equivalentto typing the command tcpdump host 205.153.63.30. Generally, you will want to use this feature withcomplex filters only. However, you cant combine filters on the command line with a filters file in thesame command.5.4.2.4.1 Address filtering.It should come as no surprise that filters can select traffic based on addresses. For example, considerthe command:bsd1# tcpdump host 205.153.63.30This command captures all traffic to and from the host with the IP address 205.153.63.30. The hostmay be specified by IP number or name. Since an IP address has been specified, you might incorrectlyguess that the captured traffic will be limited to IP traffic. In fact, other traffic, such as ARP traffic,will also be collected by this filter. Restricting capture to a particular protocol requires a morecomplex filter. Nonintuitive behavior like this necessitates a thorough testing of all filters.Addresses can be specified and restricted in several ways. Here is an example that uses the Ethernetaddress of a computer to select traffic:bsd1# tcpdump ether host 0:10:5a:e3:37:cCapture can be further restricted to traffic flows for a single direction, either to a host or from a host,using src to specify the source of the traffic or dst to specify the destination. The next example showsa filter that collects traffic sent to the host at 205.153.63.30 but not from it:bsd1# tcpdump dst 205.153.63.30Note that the keyword host was omitted in this example. Such omissions are OK in several instances,but it is always safer to include these keywords.Multicast or broadcast traffic can be selected by using the keyword multicast or broadcast,respectively. Since multicast and broadcast traffic are specified differently at the link level and thenetwork level, there are two forms for each of these filters. 
The filter ether multicast captures trafficwith an Ethernet multicast address, while ip multicast captures traffic with an IP multicast address.Similar qualifiers are used with broadcast traffic. Be aware that multicast filters may capture broadcasttraffic. As always, test your filters.Traffic capture can be restricted to networks as well as hosts. For example, the following commandrestricts capture to packets coming from or going to the 205.153.60.0 network:bsd1# tcpdump net 205.153.60The following command does the same thing:bsd1# tcpdump net 205.153.60.0 mask 255.255.255.0Although you might guess otherwise, the following command does not work properly due to thefinal .0: 89
  • 100. bsd1# tcpdump net 205.153.60.0Be sure to test your filters!5.4.2.4.2 Protocol and port filtering.It is possible to restrict capture to specific protocols such as IP, Appletalk, or TCP. You can alsorestrict capture to services built on top of these protocols, such as DNS or RIP. This type of capturecan be done in three ways—by using a few specific keywords known by tcpdump, by protocol usingthe proto keyword, or by service using the port keyword.Several of these protocol names are recognized by tcpdump and can be identified by keyword. Thefollowing command restricts the traffic captured to IP traffic:bsd1# tcpdump ipOf course, IP traffic will include TCP traffic, UDP traffic, and so on.To capture just TCP traffic, you would use:bsd1# tcpdump tcpRecognized keywords include ip, igmp, tcp, udp, and icmp.There are many transport-level services that do not have recognized keywords. In this case, you canuse the keywords proto or ip proto followed by either the name of the protocol found in the/etc/protocols file or the corresponding protocol number. For example, either of the following willlook for OSPF packets:bsd1# tcpdump ip proto ospfbsd1# tcpdump ip proto 89Of course, the first works only if there is an entry in /etc/protocols for OSPF.Built-in keywords may cause problems. In these examples, the keyword tcp must either be escaped orthe number must be used. For example, the following is fine:bsd#1 tcpdump ip proto 6On the other hand, you cant use tcp with proto.bsd#1 tcpdump ip proto tcpwill generate an error.For higher-level services, services built on top of the underlying protocols, you must use the keywordport. Either of the following will collect DNS traffic:bsd#1 tcpdump port domainbds#1 tcpdump port 53 90
In the former case, the keyword domain is resolved by looking in /etc/services. When there may be ambiguity between transport-layer protocols, you may further restrict ports to a particular protocol. Consider the command:

bsd1# tcpdump udp port domain

This will capture DNS name lookups using UDP but not DNS zone transfers using TCP. The two previous commands would capture both.

5.4.2.4.3 Packet characteristics.

Filters can also be designed based on packet characteristics such as packet length or the contents of a particular field. These filters must include a relational operator. To use length, the keyword less or greater is used. Here is an example:

bsd1# tcpdump greater 200

This command collects packets longer than 200 bytes.

Looking inside packets is a little more complicated in that you must understand the structure of the packet's header. But despite the complexity, or perhaps because of it, this technique gives you the greatest control over what is captured. (If you are charged with creating a firewall using a product that requires specifying offsets into headers, practicing with tcpdump could prove invaluable.)

The general syntax is proto[expr:size]. The field proto indicates which header to look into: ip for the IP header, tcp for the TCP header, and so forth. The expr field gives an offset into the header indexed from 0. That is, the first byte in a header is number 0, the second byte is number 1, and so forth. Alternately, you can think of expr as the number of bytes in the header to skip over. The size field is optional. It specifies the number of bytes to use and can be 1, 2, or 4.

bsd1# tcpdump "ip[9] = 6"

looks into the IP header at the tenth byte, the protocol field, for a value of 6. Notice that this must be quoted. Either an apostrophe or double quotes should work, but a backquote will not work.

bsd1# tcpdump tcp

is an equivalent command since 6 is the protocol number for TCP.

This technique is frequently used with a mask to select specific bits.
Values should be in hex. Comparisons are specified using the syntax & followed by a bit mask. The next example extracts the first byte from the Ethernet header (i.e., the first byte of the destination address), extracts the low-order bit, and makes sure the bit is not 0:[4]

bsd1# tcpdump "ether[0] & 1 != 0"

This will match multicast and broadcast packets.

[4] The astute reader will notice that this test could be more concisely written as =1 rather than !=0. While it doesn't matter for this example, using the second form simplifies testing in some cases and is a common idiom. In the next command, the syntax is simpler since you are testing to see if multiple bits are set.
  • 102. With both of these examples, there are better ways of matching the packets. For a more realisticexample, consider the command:bsd1# tcpdump "tcp[13] & 0x03 != 0"This filter skips the first 13 bytes in the TCP header, extracting the flag byte. The mask 0x03 selectsthe first and second bits, which are the FIN and SYN bits. A packet is captured if either bit is set. Thiswill capture setup or teardown packets for a TCP connection.It is tempting to try to mix in relational operators with these logical operators. Unfortunately,expressions like tcp src port > 23 dont work. The best way of thinking about it is that the expressiontcp src port returns a value of true or false, not a numerical value, so it cant be compared to a number.If you want to look for all TCP traffic with a source port with a value greater than 23, you must extractthe port field from the header using syntax such as "tcp[0:2] & 0xffff > 0x0017".5.4.2.4.4 Compound filters.All the examples thus far have consisted of simple commands with a single test. Compound filters canbe constructed in tcpdump using logical operator and, or, and not. These are often abbreviated &&,||, and ! respectively. Negation has the highest precedence. Precedence is left to right in the absenceof parentheses. While parentheses can be used to change precedence, remember that they must beescaped or quoted.Earlier it was noted that the following will not limit capture to just IP traffic:bsd1# tcpdump host 205.153.63.30If you really only want IP traffic in this case, use the command:bsd1# tcpdump host 205.153.63.30 and ipOn the other hand, if you want all traffic to the host except IP traffic, you could use:bsd1# tcpdump host 205.153.63.30 and not ipIf you need to capture all traffic to and from the host and all non-IP traffic, replace the and with an or.With complex expressions, you have to be careful of the precedence. 
Consider the two commands:bsd1# tcpdump host lnx1 and udp or arpbsd1# tcpdump "host lnx1 and (udp or arp)"The first will capture all UDP traffic to or from lnx1 and all ARP traffic. What you probably want isthe second, which captures all UDP or ARP traffic to or from lxn1. But beware, this will also captureARP broadcast traffic. To beat a dead horse, be sure to test your filters.I mentioned earlier that running tcpdump on a remote station using telnet was one way to collect dataacross your network, except that the Telnet traffic itself would be captured. It should be clear now thatthe appropriate filter can be used to avoid this problem. To eliminate a specific TCP connection, youneed four pieces of information—the source and destination IP addresses and the source and 92
  • 103. destination port numbers. In practice, the two IP addresses and the well-known port number is oftenenough.For example, suppose you are interested in capturing traffic on the host lnx1, you are logged onto thehost bsd1, and you are using telnet to connect from bsd1 to lnx1. To capture all the traffic at lnx1,excluding the Telnet traffic between bsd1 and lnx1, the following command will probably workadequately in most cases:lnx1# tcpdump -n "not (tcp port telnet and host lnx1 and host bsd1)"We cant just exclude Telnet traffic since that would exclude all Telnet traffic between lnx1 and anyhost. We cant just exclude traffic to or from one of the hosts because that would exclude non-Telnettraffic as well. What we want to exclude is just traffic that is Telnet traffic, has lnx1 as a host, and hasbsd1 as a host. So we take the negation of these three requirements to get everything else.While this filter is usually adequate, this filter excludes all Telnet sessions between the two hosts, notjust yours. If you really want to capture other Telnet traffic between lnx1 and bsd1, you would need toinclude a fourth term in the negation giving the ephemeral port assigned by telnet. Youll need to runtcpdump twice, first to discover the ephemeral port number for your current session since it will bedifferent with every session, and then again with the full filter to capture the traffic you are interestedin.One other observation—while we are not reporting the traffic, the traffic is still there. If you areinvestigating a bandwidth problem, you have just added to the traffic. You can, however, minimizethis traffic during the capture if you write out your trace to a file on lnx1 using the -w option. This istrue, however, only if you are using a local filesystem. Finally, note the use of the -n option. This isrequired to prevent name resolution. 
Otherwise, tcpdump would be creating additional network trafficin trying to resolve IP numbers into names as noted earlier.Once you have mastered the basic syntax of tcpdump, you should run tcpdump on your own systemwithout any filters. It is worthwhile to do this occasionally just to see what sorts of traffic you have onyour network. There are likely to be a number of surprises. In particular, there may be router protocols,switch topology information exchange, or traffic from numerous PC-based protocols that you arentexpecting. It is very helpful to know that this is normal traffic so when you have problems you wontblame the problems on this strange traffic.This has not been an exhaustive treatment of tcpdump, but I hope that it adequately covers the basics.The manpage for tcpdump contains a wealth of additional information, including several detailedexamples with explanations. One issue I have avoided has been how to interpret tcpdump data.Unfortunately, this depends upon the protocol and is really beyond the scope of a book such as this.Ultimately, you must learn the details of the protocols. For TCP/IP, Richard W. Stevens TCP/IPIllustrated, vol. 1, The Protocols has extensive examples using tcpdump. But the best way to learn isto use tcpdump to examine the behavior of working systems.5.5 Analysis ToolsAs previously noted, one reason for using tcpdump is the wide variety of support tools that areavailable for use with tcpdump or files created with tcpdump. There are tools for sanitizing the data,tools for reformatting the data, and tools for presenting and analyzing the data. 93
  • 104. 5.5.1 sanitizeIf you are particularly sensitive to privacy or security concerns, you may want to consider sanitize, acollection of five Bourne shell scripts that reduce or condense tcpdump trace files and eliminateconfidential information. The scripts renumber host entries and select classes of packets, eliminatingall others. This has two primary uses. First, it reduces the size of the files you must deal with,hopefully focusing your attention on a subset of the original traffic that still contains the traffic ofinterest. Second, it gives you data that can be distributed or made public (for debugging or networkanalysis) without compromising individual privacy or revealing too much specific information aboutyour network. Clearly, these scripts wont be useful for everyone. But if internal policies constrainwhat you can reveal, these scripts are worth looking into.The five scripts included in sanitize are sanitize-tcp, sanitize-syn-fin, sanitize-udp, sanitize-encap, andsanitize-other. Each script filters out inappropriate traffic and reduces the remaining traffic. Forexample, all non-TCP packets are removed by sanitize-tcp and the remaining TCP traffic is reduced tosix fields—an unformatted timestamp, a renumbered source address, a renumbered destination address,the source port, a destination address, and the number of data bytes in the packet.934303014.772066 205.153.63.30.1174 > 205.153.63.238.23: . ack 3259091394 win8647 (DF) 4500 0028 b30c 4000 8006 2d84 cd99 3f1e cd99 3fee 0496 0017 00ff f9b3 c241 c9c2 5010 21c7 e869 0000 0000 0000 0000would be reduced to 934303014.772066 1 2 1174 23 0. Notice that the IP numbers havebeen replaced with 1 and 2, respectively. This will be done in a consistent manner with multiplepackets so you will still be able to compare addresses within a single trace. The actual data reportedvaries from script to script. 
Here is an example of the syntax:bsd1# sanitize-tcp tracefileThis runs sanitize-tcp over the tcpdump trace file tracefile. There are no arguments.5.5.2 tcpdprivThe program tcpdpriv is another program for removing sensitive information from tcpdump files.There are several major differences between tcpdpriv and sanitize. First, as a shell script, sanitizeshould run on almost any Unix system. As a compiled program, this is not true of tcpdpriv. On theother hand, tcpdpriv supports the direct capture of data as well as the analysis of existing files. Thecaptured packets are written as a tcpdump file, which can be subsequently processed.Also, tcpdpriv allows you some degree of control over how much of the original data is removed orscrambled. For example, it is possible to have an IP address scrambled but retain its class designation.If the -C4 option is chosen, an IP address such as 205.153.63.238 might be replaced with 193.0.0.2.Notice that address classes are preserved—a class C address is replaced with a class C address.There are a variety of command-line options that control how data is rewritten, several of which aremandatory. Many of the command-line options will look familiar to tcpdump users. The program doesnot allow output to be written to a terminal, so it must be written directly to a file or redirected. Whilea useful program, the number of required command-line options can be annoying. There is someconcern that if the options are not selected properly, it may be possible to reconstruct the original datafrom the scrambled data. In practice, this should be a minor concern. 94
  • 105. As an example of using tcpdpriv, the following command will scramble the file tracefile:bsd1# tcpdpriv -P99 -C4 -M20 -r tracefile -w outfileThe -P99 option preserves (doesnt scramble) the port numbers, -C4 preserves the class identity of theIP addresses, and -M20 preserves multicast addresses. If you want the data output to your terminal,you can pipe the output to tcpdump:bsd1# tcpdpriv -P99 -C4 -M20 -r tracefile -w- | tcpdump -r-The last options look a little strange, but they will work.5.5.3 tcpflowAnother useful tool is tcpflow, written by Jeremy Elson. This program allows you to captureindividual TCP flows or sessions. If the traffic you are looking at includes, say, three different Telnetsessions, tcpflow will separate the traffic into three different files so you can examine eachindividually. The program can reconstruct data streams regardless of out-of-order packets orretransmissions but does not understand fragmentation.tcpflow stores each flow in a separate file with names based on the source and destination addressesand ports. For example, SSH traffic (port 22) between 172.16.2.210 and 205.153.63.30 might have thefilename 172.016.002.210.00022-205.153.063.030.01071, where 1071 is the ephemeral port createdfor the session.Since tcpflow uses libpcap, the same packet capture library tcpdump uses, capture filters areconstructed in exactly the same way and with the same syntax. It can be used in a number of ways.For example, you could see what cookies are being sent during an HTTP session. Or you might use itto see if SSH is really encrypting your data. Of course, you could also use it to capture passwords orread email, so be sure to set permissions correctly.5.5.4 tcp-reduceThe program tcp-reduce invokes a collection of shell scripts to reduce the packet capture informationin a tcpdump trace file to one-line summaries for each connection. That is, an entire Telnet sessionwould be summarized by a single line. 
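To get a feel for what class-preserving scrambling means, here is a toy Python sketch of the idea. This is my own illustration of the concept, not tcpdpriv's actual algorithm: each distinct address is renumbered consistently, and the replacement's first octet is drawn from the same class range (A, B, or C).

```python
# Toy illustration of class-preserving address scrambling: each address is
# renumbered consistently, and a class C address stays a class C address.
# This is NOT tcpdpriv's real algorithm, just a sketch of the concept.
def scramble(ip, _table={}, _count=[0]):
    if ip in _table:                  # same input always maps to same output
        return _table[ip]
    first = int(ip.split(".", 1)[0])
    # choose a replacement first octet from the same class range
    if first < 128:
        base = 1                      # class A
    elif first < 192:
        base = 128                    # class B
    else:
        base = 193                    # class C
    _count[0] += 1
    n = _count[0]
    out = f"{base}.{(n >> 16) & 255}.{(n >> 8) & 255}.{n & 255}"
    _table[ip] = out
    return out

print(scramble("205.153.63.238"))   # class C in, class C out: 193.0.0.1
print(scramble("205.153.63.30"))    # next distinct host: 193.0.0.2
print(scramble("205.153.63.238"))   # repeated input maps consistently
```

Because the mapping is consistent within a run, scrambled traces still let you compare addresses within a single trace, which is the property the text describes.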
This could be extremely useful in getting an overall picture ofhow the traffic over a link breaks down or for looking quickly at very large files.The syntax is quite simple.bsd1# tcp-reduce tracefile > outfilewill reduce tracefile, putting the output in outfile. The program tcp-summary, which comes with tcp-reduce, will further summarize the results. For example, on my system I traced a system briefly withtcpdump. This process collected 741 packets. When processed with tcp-reduce, this revealed 58 TCPconnections. Here is an example when results were passed to tcp-summary :bsd1# tcp-reduce out-file | tcp-summaryThis example produced the following five-line summary: 95
  • 106. proto # conn KBytes % SF % loc % ngh----- ------ ------ ---- ----- -----www 56 35 25 0 0telnet 1 1 100 0 0pop-3 1 0 100 0 0In this instance, this clearly shows that the HTTP traffic dominated the local network traffic.5.5.5 tcpshowThe program tcpshow decodes a tcpdump trace file. It represents an alternative to using tcpdump todecode data. The primary advantage of tcpshow is much nicer formatting for output. For example,here is the tcpdump output for a packet:12:36:54.772066 sloan.lander.edu.1174 > 205.153.63.238.telnet: . ack3259091394 win 8647 (DF) bHere is corresponding output from tcpshow for the same packet:-----------------------------------------------------------------------Packet 1TIME: 12:36:54.772066LINK: 00:10:5A:A1:E9:08 -> 00:10:5A:E3:37:0C type=IP IP: sloan -> 205.153.63.238 hlen=20 TOS=00 dgramlen=40 id=B30C MF/DF=0/1 frag=0 TTL=128 proto=TCP cksum=2D84 TCP: port 1174 -> telnet seq=0016775603 ack=3259091394 hlen=20 (data=0) UAPRSF=010000 wnd=8647 cksum=E869 urg=0DATA: <No data>-----------------------------------------------------------------------The syntax is:bsd1# tcpshow < trace-fileThere are numerous options.5.5.6 tcpsliceThe program tcpslice is a simple but useful program for extracting pieces or merging tcpdump files.This is a useful utility for managing larger tcpdump files. You specify a starting time and optionally anending time for a file, and it extracts the corresponding records from the source file. If multiple filesare specified, it extracts packets from the first file and then continues extracting only those packetsfrom the next file that have a later timestamp. This prevents duplicate packets if you have overlappingtrace files.While there are a few options, the basic syntax is quite simple. For example, consider the command:bsd1# tcpslice 934224220.0000 in-file > out-fileThis will extract all packets with timestamps after 934224220.0000. Note the use of an unformattedtimestamp. 
This is the same format displayed with the -tt option with tcpdump. Note also the use ofredirection. Because it works with binary files, tcpslice will not allow you to send output to yourterminal. See the manpage for additional options. 96
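These unformatted timestamps are ordinary Unix epoch seconds with a fractional part, so they are easy to translate into calendar time. A short sketch using Python's standard library, applied to the timestamp from the tcpslice example above:

```python
# Convert an unformatted tcpdump/tcpslice timestamp (seconds since the
# Unix epoch) into a human-readable UTC date and time.
from datetime import datetime, timezone

ts = 934224220.0000              # the value from the tcpslice example
when = datetime.fromtimestamp(ts, tz=timezone.utc)
print(when.isoformat())          # 1999-08-09T18:43:40+00:00
```

Going the other way (from a date to an epoch value for tcpslice's command line) is just as easy with datetime.timestamp().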
  • 107. 5.5.7 tcptraceThis program is an extremely powerful tcpdump file analysis tool. The program tcptrace is strictly ananalysis tool, not a capture program, but it works with a variety of capture file formats. The toolsprimary focus is the analysis of TCP connections. As such, it is more of a network management toolthan a packet analysis tool. The program provides several levels of output or analysis ranging fromvery brief to very detailed.While for most purposes tcptrace is used as a command-line tool, tcptrace is capable of producingseveral types of output files for plotting with the X Window program xplot. These include timesequence graphs, throughput graphs, and graphs of round-trip times. Time sequence graphs (-S option)are plots of sequence numbers over time that give a picture of the activity on the network. Throughputgraphs (-T option), as the name implies, plot throughput in bytes per second against time. Whilethroughput gives a picture of the volume of traffic on the network, round-trip times give a betterpicture of the delays seen by individual connections. Round-trip time plots (-R option) displayindividual round-trip times over time. For other graphs and graphing options, consult thedocumentation.For normal text-based operations, there are an overwhelming number of options and possibilities. Oneof the most useful is the -l option. This produces a long listing of summary statistics on a connection-by-connection basis. 
What follows is an example of the information provided for a single brief Telnetconnection:TCP connection 2: host c: sloan.lander.edu:1230 host d: 205.153.63.238:23 complete conn: yes first packet: Wed Aug 11 11:23:25.151274 1999 last packet: Wed Aug 11 11:23:53.638124 1999 elapsed time: 0:00:28.486850 total packets: 160 filename: telnet.trace c->d: d->c: total packets: 96 total packets: 64 ack pkts sent: 95 ack pkts sent: 64 pure acks sent: 39 pure acks sent: 10 unique bytes sent: 119 unique bytes sent: 1197 actual data pkts: 55 actual data pkts: 52 actual data bytes: 119 actual data bytes: 1197 rexmt data pkts: 0 rexmt data pkts: 0 rexmt data bytes: 0 rexmt data bytes: 0 outoforder pkts: 0 outoforder pkts: 0 pushed data pkts: 55 pushed data pkts: 52 SYN/FIN pkts sent: 1/1 SYN/FIN pkts sent: 1/1 mss requested: 1460 bytes mss requested: 1460 bytes max segm size: 15 bytes max segm size: 959 bytes min segm size: 1 bytes min segm size: 1 bytes avg segm size: 2 bytes avg segm size: 23 bytes max win adv: 8760 bytes max win adv: 17520 bytes min win adv: 7563 bytes min win adv: 17505 bytes zero win adv: 0 times zero win adv: 0 times avg win adv: 7953 bytes avg win adv: 17519 bytes initial window: 15 bytes initial window: 3 bytes initial window: 1 pkts initial window: 1 pkts ttl stream length: 119 bytes ttl stream length: 1197 bytes missed data: 0 bytes missed data: 0 bytes truncated data: 1 bytes truncated data: 1013 bytes truncated packets: 1 pkts truncated packets: 7 pkts 97
  • 108. data xmit time: 28.479 secs data xmit time: 27.446 secs idletime max: 6508.6 ms idletime max: 6709.0 ms throughput: 4 Bps throughput: 42 BpsThis was produced by using tcpdump to capture all traffic into the file telnet.trace and then executingtcptrace to process the data. Here is the syntax required to produce this output:bsd1# tcptrace -l telnet.traceSimilar output is produced for each TCP connection recorded in the trace file. Obviously, a protocol(like HTTP) that uses many different sessions may overwhelm you with output.There is a lot more to this program than covered in this brief discussion. If your primary goal isanalysis of network performance and related problems rather than individual packet analysis, this is avery useful tool.5.5.8 trafshowThe program trafshow is a packet capture program of a different sort. It provides a continuous displayof traffic over the network, giving repeated snapshots of traffic. It displays the source address,destination address, protocol, and number of bytes. This program would be most useful in looking forsuspicious traffic or just getting a general idea of network traffic.While trafshow can be run on a text-based terminal, it effectively takes over the display. It is best usedin a separate window of a windowing system. There are a number of options, including support forpacket filtering using the same filter format as tcpdump.5.5.9 xplotThe xplot program is an X Windows plotting program. While it is a general purpose plotting program,it was written as part of a thesis project for TCP analysis by David Clark. As a result, some support forplotting TCP data (oriented toward network analysis) is included with the package. It is also used bytcptrace. While a powerful and useful program, it is not for the faint of heart. 
Due to the lack ofdocumentation, the program is easiest to use with tcptrace rather than as a standalone program.5.5.10 Other Packet Capture ProgramsWe have discussed tcpdump in detail because it is the most widely available packet capture programfor Unix. Many implementations of Unix have proprietary packet capture programs that arecomparable to tcpdump. For example, Sun Microsystems Solaris provides snoop. (This is areplacement for etherfind, which was supplied with earlier versions of the Sun operating system.)Here is an example of using snoop to capture five packets:sol1> snoop -c5Using device /dev/elxl (promiscuous mode)172.16.2.210 -> sol1 TELNET C port=28863 sol1 -> 172.16.2.210 TELNET R port=28863 /dev/elxl (promiscuo172.16.2.210 -> sol1 TELNET C port=28863172.16.2.210 -> sloan.lander.edu TCP D=1071 S=22 Ack=143990 Seq=3737542069Len=60 Win=17520 98
  • 109. sloan.lander.edu -> 172.16.2.210 TCP D=22 S=1071 Ack=3737542129 Seq=143990Len=0 Win=7908snoop: 5 packets capturedAs you can see, it is used pretty much the same way as tcpdump. (Actually, the output has a slightlymore readable format.) snoop, like tcpdump, supports a wide range of options and filters. You shouldhave no trouble learning snoop if you have ever used tcpdump.Other systems will provide their own equivalents (for example, AIX provides iptrace ). While thesyntax is different, these tools are used in much the same way.5.6 Packet AnalyzersEven with the tools just described, the real limitation with tcpdump is interpreting the data. For manyuses, tcpdump may be all you need. But if you want to examine the data within packets, a packetsniffer is not enough. You need a packet analyzer. A large number of packet analyzers are available attremendous prices. But before you start spending money, you should consider ethereal.5.6.1 etherealethereal is available both as an X Windows program for Unix systems and as a Microsoft Windowsprogram. It can be used as a capture tool and as an analysis tool. It uses the same capture engine andfile format as tcpdump, so you can use the same filter syntax when capturing traffic, and you can useethereal to analyze tcpdump files. Actually, ethereal supports two types of filters, capture filters basedon tcpdump and display filters used to control what you are looking at. Display filters use a differentsyntax and are described later in this section.5.6.1.1 Using etherealUsually ethereal will be managed entirely from a windowing environment. While it can be run withcommand-line options, Ive never encountered a use for these. (There is also a text-based version,tethereal.) When you run ethereal, you are presented with a window with three initially empty panes.The initial screen is similar to Figure 5-1 except the panes are empty. 
(These figures are for theWindows implementation of ethereal, but these windows are almost identical to the Unix version.) Ifyou have a file you want to analyze, you can select File Open. You can either load a tcpdump filecreated with the -w option or a file previously saved from ethereal. Figure 5-1. ethereal 99
  • 110. To capture data, select Capture Start. You will be presented with a Capture Preferences screen likethe one shown in Figure 5-2. If you have multiple interfaces, you can select which one you want to usewith the first field. The Count: field is used to limit the number of packets you will collect. You canenter a capture filter, using tcpdump syntax, in the Filter: field. If you want your data automaticallysaved to a file, enter that in the File: field. The fifth field allows you to limit the number of bytes youcollect from the packet. This can be useful if you are interested only in header information and want tokeep your files small. The first of the four buttons allows you to switch between promiscuous andnonpromiscuous mode. With the latter, youll collect only traffic sent to or from your machine ratherthan everything your machine sees. Select the second button if you want to see traffic as it is captured.The third button selects automatic scrolling. Finally, the last button controls name resolution. Nameresolution really slows ethereal down. Dont enable name resolution if you are going to displaypackets in real time! Once you have everything set, click on OK to begin capturing data. Figure 5-2. ethereal Capture Preferences 100
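Since the Filter: field accepts tcpdump capture syntax, any filter from earlier in this chapter can be typed there directly. For example (my own illustration, reusing the chapter's sample address), this expression would capture one host's traffic while excluding its Telnet sessions:

```text
host 205.153.63.30 and not tcp port telnet
```

The same string would work on the tcpdump command line, which is one of the conveniences of ethereal sharing tcpdump's capture engine.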
While you are capturing traffic, ethereal will display a Capture window that will give you counts for the packets captured in real time. This window is shown in Figure 5-3. If you didn't say how many frames you wanted to capture on the last screen, you can use the Stop button to end capture.

Figure 5-3. ethereal Capture

Once you have finished capturing data, you'll want to go back to the main screen shown in Figure 5-1. The top pane displays a list of the captured packets. The lower panes display information for the packet selected in the top pane. The packet to be dissected is selected in the top pane by clicking on it. The second pane then displays a protocol tree for the packet, while the bottom pane displays the raw data in hex and ASCII. The layout of ethereal is shown in Figure 5-1. You'll probably want to scroll through the top pane until you find the traffic of interest. Once you have selected a packet, you can resize the windows as needed. Alternately, you can select Display → Show Packet in New Window to open a separate window, allowing you to open several packets at once.

The protocol tree basically displays the structure of the packet by analyzing the data and determining the header type and decoding accordingly. Fields can be expanded or collapsed by clicking on the plus or minus next to the field, respectively. In the figure, the Internet Protocol header has been expanded
and the Type-Of-Service (TOS) field in turn has been expanded to show the various values of the TOS flags. Notice that the raw data for the field selected in the second pane is shown in bold in the bottom pane. This works well for most protocols, but if you are using some unusual protocol, ethereal, like other programs, will not know what to do with it.

ethereal has several other useful features. For example, you can select a TCP packet from the main pane and then select Tools → Follow TCP Stream. This tool collects information from all the packets in the TCP session and displays it. Unfortunately, while convenient at times, this feature makes it just a little too easy to capture passwords or otherwise invade users' privacy.

Tools → Summary gives you the details for the data you are looking at. An example is shown in Figure 5-4.

Figure 5-4. ethereal Summary

There are a number of additional features that I haven't gone into here. But what I have described is more than enough for most simple tasks.

5.6.1.2 Display filters

Display filters allow you to selectively display data that has been captured. At the bottom of the window shown in Figure 5-1, there is a box for creating display filters. As previously noted, display filters have their own syntax. The ethereal documentation describes this syntax in great detail. In this case, I have entered http to limit the displayed traffic to web traffic. I could just as easily enter any number of other protocols—ip, udp, icmp, arp, dns, etc.
The real power of ethereal's display filters comes when you realize that you don't really need to understand the syntax of display filters to start using them. You can select a field from the center pane and then select Display → Match Selected, and ethereal will construct and apply the filter for you. Of course, not every field is useful, but it doesn't take much practice to see what works and what doesn't.

The primary limitation of this approach comes in constructing compound filters. If you want to capture all the traffic to or from a computer, you won't be able to match a single field. But you should be able to discover the syntax for each of the pieces. Once you know that ip.src==205.153.63.30 matches all IP traffic with 205.153.63.30 as its source and that ip.dst==205.153.63.30 matches all IP traffic to 205.153.63.30, it isn't difficult to come up with the filter you need: ip.src==205.153.63.30 or ip.dst==205.153.63.30. Display filters are really very intuitive, so you should have little trouble learning to use them.

Perhaps more than any other tool described in this book, ethereal is constantly being changed and improved. While this book was being written, new versions were appearing at the rate of about once a month. So you should not be surprised if ethereal looks a little different from what is described here. Fortunately, ethereal is a well-developed program that is very intuitive to use. You should have little trouble going on from here.

5.7 Dark Side of Packet Capture

What you can do, others can do. Pretty much anything you can discover through packet capture can be discovered by anyone else using packet capture in a similar manner. Moreover, some technologies that were once thought to be immune to packet capture, such as switches, are not as safe as once believed.

5.7.1 Switch Security

Switches are often cited as a way to protect traffic from sniffing. And they really do provide some degree of protection from casual sniffing.
Unfortunately, there are several ways to defeat the protection that switches provide.

First, many switches will operate as hubs, forwarding traffic out on every port, whenever their address tables are full. When first initialized, this is the default behavior until the address table is built. Unfortunately, tools like macof, part of the dsniff suite of tools, will flood switches with MAC addresses, overflowing a switch's address table. If your switch is susceptible, all you need to do to circumvent security is run the program.

Second, if two machines have the same MAC address, some switches will forward traffic to both machines. So if you want copies of traffic sent to a particular machine on your switch, you can change the MAC address on your interface to match the target device's MAC address. This is easily done on many Unix computers with the ifconfig command.

A third approach, sometimes called ARP poisoning, is to send a forged ARP packet to the source device. This can be done with a tool like arpredirect, also part of dsniff. The idea is to substitute the packet capture device's MAC address for the destination's MAC address. Traffic will be sent to the packet capture device, which can then forward the traffic to its destination. Of course, the forged ARP packets can be sent to any number of devices on the switch.
The result, with any of these three techniques, is that traffic will be copied to a device that can capture it. Not all switches are susceptible to all of these attacks. Some switches provide various types of port security, including static ARP assignments. You can also use tools like arpwatch to watch for suspicious activity on your network. (arpwatch is described in Chapter 6.) If sniffing is a concern, you may want to investigate what options you have with your switches.

While these techniques could be used to routinely capture traffic as part of normal management, the techniques previously suggested are preferable. Flooding the address table can significantly degrade network performance. Duplicating a MAC address will allow you to watch traffic only to a single host. ARP poisoning is a lot of work when monitoring more than one host and can introduce traffic delays. Consequently, these aren't techniques that you'll want to use if you have a choice.

5.7.2 Protecting Yourself

Because of the potential for abuse, you should be very circumspect about who has access to packet capture tools. If you are operating in a Unix-only environment, you may have some success in restricting access to capture programs. Packet capture programs should always be configured as privileged commands. If you want to allow access to a group of users, the recommended approach is to create an administrative group, restrict execution of packet capture programs to that group, and give group membership only to a small number of trusted individuals. This amounts to setting the SUID bit for the program but limiting execution to the owner and any group members.

With some versions of Unix, you might even consider recompiling the kernel so the packet capture software can't be run on machines where it isn't needed. For example, with FreeBSD, it is very straightforward to disable the Berkeley packet filter in the kernel. (With older versions of FreeBSD, you needed to explicitly enable it.)
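The group-restriction approach amounts to a couple of chgrp/chmod commands. The following is only a sketch: the group name netadmin is illustrative, and it is demonstrated on a scratch copy so the real binary is untouched.

```shell
# Sketch: restrict a packet capture binary to a trusted admin group.
# The group name 'netadmin' is illustrative; demonstrated on a scratch
# copy of a harmless binary rather than the real tcpdump.
cp /bin/ls /tmp/capture-demo         # stand-in for the capture program
# chgrp netadmin /tmp/capture-demo   # requires the group to exist (root)
chmod 4750 /tmp/capture-demo         # SUID set; owner and group only
ls -l /tmp/capture-demo              # mode appears as -rwsr-x---
```

With mode 4750, the owner and members of the group can run the program with the owner's privileges, while everyone else is denied execute permission entirely.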
Another possibility is to use interfaces that don't support promiscuous mode. Unfortunately, these can be hard to find.

There is also software that can be used to check whether your interface is in promiscuous mode. You can do this manually with the ifconfig command. Look for PROMISC in the flags for the interface. For example, here is the output for one interface in promiscuous mode:

bsd2# ifconfig ep0
ep0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        inet 172.16.2.236 netmask 0xffffff00 broadcast 172.16.2.255
        inet6 fe80::260:97ff:fe06:2222%ep0 prefixlen 64 scopeid 0x2
        ether 00:60:97:06:22:22
        media: 10baseT/UTP
        supported media: 10baseT/UTP

Of course, you'll want to check every interface.

Alternately, you could use a program like cpm, Check Promiscuous Mode, from CERT/CC. lsof, described in Chapter 11, can be used to look for large open files that might be packet sniffer output. But if you have Microsoft Windows computers on your network or allow user-controlled computers on your network, this approach isn't enough.

While it may appear that packet capture is a purely passive activity that is undetectable, this is often not the case. There are several techniques and tools that can be used to detect packet capture or to test remote interfaces to see if they are in promiscuous mode. One of the simplest techniques is to turn your packet capture software on, ping an unused IP address, and watch for DNS queries trying to resolve that IP address. An unused address should be ignored. If someone is trying to resolve the address, it is likely they have captured a packet.
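Checking each interface by hand gets tedious. A short sketch (the function name is mine, and flag formats vary a bit across platforms) filters ifconfig-style output for interfaces whose flags include PROMISC:

```shell
# check_promisc: read `ifconfig -a` style output on stdin and print the
# name of any interface whose flags line contains PROMISC.
check_promisc() {
  awk '/^[A-Za-z]/ { iface = $1; sub(/:$/, "", iface) }
       /PROMISC/   { print iface }'
}

# Typical use (may require root):  ifconfig -a | check_promisc
```

The first awk rule remembers the interface name whenever a new flags line starts; the second prints that name if PROMISC appears on the line.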
Another possibility is the tool antisniff from L0pht Heavy Industries. This is a commercial tool, but a version is available for noncommercial uses. There are subtle changes in the behavior of an interface when it is placed in promiscuous mode. This tool is designed to look for those changes. It can probe the systems on a network, examine their responses, and usually determine which devices have an interface in promiscuous mode.

Another approach is to restructure your network for greater security. To the extent that you can limit access to traffic, you can reduce packet capture. Use of virtual LANs can help, but no approach is really foolproof. Ultimately, strong encryption is your best bet. This won't stop sniffing, but it will protect your data. Finally, it is always helpful to have clearly defined policies. Make sure your users know that unauthorized packet capture is not acceptable.

5.8 Microsoft Windows

In general, it is inadvisable to leave packet capture programs installed on Windows systems unless you are quite comfortable with the physical security you provide for those machines. Certainly, packet capture programs should never be installed on publicly accessible computers using consumer versions of Windows.

The programs WinDump95 and WinDump are ports of tcpdump to Windows 95/98 and Windows NT, respectively. Each requires the installation of the appropriate drivers. They are run in DOS windows and have the same basic syntax as tcpdump. As tcpdump has already been described, there is little to add here.

ethereal is also available for Windows and, on the whole, works quite well. The one area in which the port doesn't seem to work is in sending output directly to a printer. However, printing to files works nicely, so you can save any output you want and then print it.

One of the more notable capture programs available for Windows platforms is netmon (Network Monitor), a basic version of which is included with Windows NT Server.
The netmon program was originally included with Windows NT 3.5 as a means of collecting data to send to Microsoft's technical support. As such, it was not widely advertised. Figure 5-5 shows the packet display window.

Figure 5-5. netmon for Windows
The basic version supplied with Windows NT Server is quite limited in scope. It restricts capture to traffic to or from the server and severely limits the services it provides. The full version is included as part of the Systems Management Server (SMS), part of the BackOffice suite, and is an extremely powerful program. Of concern with any capture and analysis program is which protocols can be effectively decoded. As might be expected, netmon is extremely capable when dealing with Microsoft protocols but offers only basic decoding of Novell protocols. (For Novell protocols, consider Novell's LANalyzer.)

One particularly nice feature of netmon is the ability to set up collection agents on any Windows NT workstation and have them collect data remotely. The collected data resides with the agent until needed, thus minimizing traffic over the network.

The program is not installed by default. It can be added as a service under network configuration in the setup window. Once installed, it is found under Administrative Tools (Common). The program, once started, is very intuitive and has a strong help system.
Chapter 6. Device Discovery and Mapping

The earlier chapters in this book focused on collecting information on the smaller parts of a network, such as the configuration of an individual computer or the path between a pair of computers. Starting with this chapter, we will broaden our approach and look at tools more suited to collecting information on IP networks as a whole. The next three closely related chapters deal with managing and troubleshooting devices distributed throughout a network. This chapter focuses on device discovery and mapping. Additional techniques and tools for this purpose are presented in Chapter 7, once the Simple Network Management Protocol (SNMP) has been introduced. Chapter 8 focuses on the collection of information on traffic patterns and device utilization throughout the network.

This chapter begins with a brief discussion of the relationship between network management and troubleshooting. It then discusses ways to map out the IP addresses being used on your network and ways to find which IP addresses correspond to which hosts, followed by a description of ways to discover more information about these hosts based on the network services they support and other forensic information. The chapter briefly discusses scripting tools, then describes the network mapping and monitoring tool tkined. It concludes with a brief description of related tools for use with Microsoft Windows platforms.

6.1 Troubleshooting Versus Management

Some of the tools in the next few chapters may seem only marginally related to troubleshooting. This is not a totally unfair judgment. Of course, troubleshooting is an unpredictable business, and any tool that can provide information may be useful in some circumstances. Often you will want to use tools that were designed with another purpose in mind.

But these tools were not included just on the off chance they might be useful.
Many of the tools described here, while typically used for management, are just as useful for troubleshooting. In a very real sense, troubleshooting and management are just different sides of the same coin. Ideally, management deals with problems before they happen, while troubleshooting deals with problems after the fact. With this in mind, it is worth reviewing management software with an eye toward how it can be used as troubleshooting software.

6.1.1 Characteristics of Management Software

Everyone seems to have a different idea of exactly what management software should do. Ideally, network management software will provide the following:

Discovery and mapping
    Discovery includes both the automatic detection of all devices on a network and the collection of basic information about each device, such as the type of each device, its MAC address and IP address, the type of software being used, and, possibly, the services it provides. Mapping is the creation of a graphical representation of the network showing individual interconnections as well as overall topology.

Event monitoring
    Once a picture of the network has been created, each device may be monitored to ensure continuous operation. This can be done passively, by waiting for the device to send an update or alert, or by actively polling the device.

Remote configuration
    You should be able to connect to each device and then examine and change its configuration. It should also be possible to collectively track configuration information, such as which IP addresses are in use.

Metering and performance management
    Information on resource utilization should be collected. Ideally, this information should be available in a usable form for purposes such as trend analysis and capacity planning.

Software management
    Being able to install and configure software remotely is rapidly becoming a necessity in larger organizations. Being able to track licensing can be essential to avoid legal problems. Version management is also important.

Security and accounting
    Depending on the sensitivity of data, the organization's business model, and access and billing policies, it may be necessary to control or track who is using what on the network.

It doesn't take much imagination to see how most of these functions relate to troubleshooting. This chapter focuses on discovery and mapping. Chapter 7 will discuss event monitoring and the remote configuration of hardware and software. Metering and performance management are discussed in Chapter 8. Security is discussed throughout the next three chapters as appropriate.

6.1.2 Discovery and Mapping Tools

A wide range of tools is available. At the low end are point tools -- tools designed to deal with specific tasks or closely related tasks. Several of the tools we will examine, such as arpwatch and nmap, fall in this category. Such tools tend to be well focused and do their job well.
Typically, they are very easy to learn to use and are usually free or quite inexpensive.

Also found at the low end are toolkits and scripting languages for creating your own applications. Unlike most prebuilt tools, these can be extremely difficult both to learn and to use, but they often give you the greatest degree of control. The quality of the final tool will ultimately depend on how much effort and skill you put into its creation. The initial outlay may be modest, but the development time can be extremely costly. Nonetheless, some people swear by this approach. The idea is that time is spent once to develop a tool that saves time each time it is used. We will look very briefly at the scripting language Tcl and its extensions. The primary goal here will be to describe the issues and provide information on how to get started.

In the middle of the range are integrated packages. This type of software addresses more than one aspect of network management. Such packages typically include network discovery, mapping, and monitoring programs but may include other functionality as well. Typically they are straightforward to use but don't perform well with very large, diverse networks.
Finally, at the high end are frameworks. Roughly, these are packages that can be easily extended. Since you can extend functionality by adding modules, frameworks are better suited for larger, diverse networks. But be warned, the dividing lines among these last categories are not finely drawn.

Unfortunately, at the time of this writing, there aren't many freely available packages at these higher levels. The leading contenders are really works in progress. tkined is described in this chapter and the next because it seemed, at the time this was written, to be further along and fairly stable. But there are at least two other projects making rapid progress in this area that are worth considering. The work of Open Network Management Systems (http://www.opennms.org) is truly outstanding and making terrific progress. The other is the GxSNMP SNMP Manager (http://www.gxsnmp.org), a part of the GNOME project. Both are open source (http://opensource.org) projects, and both appear to have a committed base of supporters and are likely to be successful. At the time this was written, both had begun to release viable tools, particularly the Open Network Management Systems folks. (Linux users may also want to consider Cheops.)

6.1.3 Selecting a Product

It may seem strange that a book devoted to noncommercial software would recommend buying software, but network management is one area in which you should at least consider the possibility. Commercial products are not without problems, but noncommercial mapping and management tools are relatively scarce. Depending on the size of the network you are dealing with, you may have little choice but to consider commercial products at this time.

The key factors are the size of your network, the size of your budget, and the cost of a nonfunctioning network. With point tools, you will be forced to put the pieces together. Certainly, this is something you can do with a small network.
If you are responsible for a single LAN or a small number of LANs and if you can tolerate being down for a few hours at a time, then you can probably survive with the noncommercial tools described here. But if you are responsible for a larger network or one that is rapidly changing, then you should consider commercial tools. While these may be quite expensive, they may be essential for a large network. And if you are really dealing with a large number of machines, the cost per machine may not be that high.

Even if you feel compelled to buy commercial management software, you should read the rest of this chapter. Several of the point tools described here can be used in conjunction with commercial tools. Some of these tools, because they are designed for a single function, will perform better than commercial tools that attempt to do everything. In a few instances, noncommercial tools address issues not addressed by commercial tools.

6.2 Device Discovery

The first step in managing a network is discovering which devices are on the network. There are some fairly obvious reasons why this is important. You will need to track address usage to manage services such as DNS. You may need this information to verify licensing information. From a security perspective, you will want to know if there are any devices on your network that shouldn't be there. And one particularly compelling reason for a complete picture of your network is IP address management.

6.2.1 IP Address Management
Management of IP addresses is often cited as the most common problem faced in the management of an IP network. There are two goals in IP management—keeping track of the addresses in use so you know what is available and keeping track of the devices associated with each assigned IP address.

Several developments over the last few years have helped to lessen the problems of IP management. First, DHCP servers, systems that automatically allocate and track IP addresses, help when dynamic allocation is appropriate. But there are a number of reasons why a system may require a static IP address. Any resource or server—time server, name server, and so on—should be given a static address. Network devices like switches and routers require static addresses. Some sites require reverse DNS lookup before allowing access. The easiest way to provide this is with a static IP address and an appropriate DNS entry.[1] Even when such issues don't apply, the cost and complexity of DHCP services may prevent their use. And even if you use DHCP, there is nothing to prevent a user from incorrectly assigning a static IP address in the middle of the block of addresses you have reserved for DHCP assignment.

    [1] Strictly speaking, static addresses are not mandatory in every case. Support for dynamic DNS, or DDNS, has been available for several years. With DDNS, DNS entries can be mapped to dynamically assigned IP addresses. Unfortunately, many sites still do not use it.

Another development that has helped is automatic testing of newly assigned addresses. While earlier implementations of TCP/IP stacks sometimes neglected to test whether an IP address was being used, most systems, when booted, now first check to see if an IP address is in use before using it. The test, known as gratuitous ARP, sends out an ARP request for the IP address about to be used. If anyone replies, the address must already be in use. Of course, this test works only when the other machine is turned on.
You may set up a machine with everything appearing to work correctly, only to get a call later in the day. Once such a problem has been detected, you will need to track it down.

While these and similar developments have gone a long way toward lessening the problems of IP management and duplicate IP addresses, IP management remains a headache on many networks. Ideally, you will keep careful records as IP addresses are assigned, but mistakes are unavoidable. Thus, an automated approach is often desirable.

The simplest way to collect MAC/IP address pairs is to ping the address and then examine your ARP table. The ping is necessary since most ARP tables are flushed frequently. At one time, it was possible to ping a broadcast address and get a number of replies at once. Most hosts are now configured to ignore ICMP requests sent to broadcast addresses. (See the discussion of Smurf Attacks in Chapter 3.) You will need to repeat ping scans very frequently if you want to get a picture over time. It is a simple matter to create a script that automates the process of pinging a range of IP addresses, particularly if you use a tool like fping. You'll need the output from the arp command if you want the MAC addresses. And you certainly will want to do some cleanup with sort or sed.

Fortunately, there is a class of tools that simplifies this process—IP scanners, or ping scanners. These are usually very simple tools that send ICMP ECHO_REQUEST packets in a systematic manner to each IP address in a range of IP addresses and then record any replies. (These tools are not limited to using just ECHO_REQUEST packets.)

6.2.2 nmap

The program nmap is a multifunction tool that supports IP scanning. It also provides port scanning and stack fingerprinting. (Stack fingerprinting is described later in this chapter.) nmap is an extremely
feature-rich program with lots of versatility. For many of its uses, root privileges are required, although some functions work without root privileges.

nmap certainly could have been described in Chapter 2, when port scanners were introduced. But if all you want is a port scan for a single machine, using nmap is overkill.[2] Nonetheless, if you want as few programs as possible and you need some of the other functionality that nmap provides, then you can probably get by with just nmap.

    [2] There are also reasons, as will become evident, why you might not want nmap too freely available on your network.

To use nmap as a port scanner, the only information you need is the IP address or hostname of the target:

bsd1# nmap sol1

Starting nmap V. 2.12 by Fyodor (fyodor@dhp.com, www.insecure.org/nmap/)
Interesting ports on sol1.lander.edu (172.16.2.233):
Port    State       Protocol  Service
21      open        tcp       ftp
23      open        tcp       telnet
25      open        tcp       smtp
37      open        tcp       time
111     open        tcp       sunrpc
515     open        tcp       printer
540     open        tcp       uucp
6000    open        tcp       X11

Nmap run completed—1 IP address (1 host up) scanned in 1 second

The results should be self-explanatory. You can specify several IP addresses, or you can span a segment by specifying an address with a mask if you want to scan multiple devices or addresses. The next example will scan all the addresses on the same subnet as lnx1 using a class C network mask:

bsd1# nmap lnx1/24

While nmap skips addresses that don't respond, this can still produce a lot of output. Fortunately, nmap will recognize a variety of address range options. Consider:

bsd1# nmap 172.16.2.230-235,240

This will scan seven IP addresses—those from 172.16.2.230 through 172.16.2.235 inclusive and 172.16.2.240. You can use 172.16.2.* to scan everything on the subnet. Be warned, however, that the shell you use may require you to use an escape sequence for the * to work correctly. For example, with C-shell, you could use 172.16.2.\*.
You should also note that the network masks do not have to align with a class boundary. For example, /29 would scan eight hosts by working through the possibilities generated by changing the three low-order bits of the address.

If you just want to do an IP scan to discover which addresses are currently in use, you can use the -sP option. This will do a ping-like probe for each address on the subnet:

bsd1# nmap -sP lnx1/24

Starting nmap V. 2.12 by Fyodor (fyodor@dhp.com, www.insecure.org/nmap/)
Host (172.16.2.0) seems to be a subnet broadcast address (returned 3 extra pings). Skipping host.
Host cisco.lander.edu (172.16.2.1) appears to be up.
Host (172.16.2.12) appears to be up.
Host (172.16.2.230) appears to be up.
Host bsd2.lander.edu. (172.16.2.232) appears to be up.
Host sol1.lander.edu (172.16.2.233) appears to be up.
Host lnx1.lander.edu (172.16.2.234) appears to be up.
Host (172.16.2.255) seems to be a subnet broadcast address (returned 3 extra pings). Skipping host.
Nmap run completed—256 IP addresses (6 hosts up) scanned in 1 second

You should be warned that this particular scan uses both an ordinary ICMP packet and a TCP ACK packet to port 80 (HTTP). This second packet will get past routers that block ICMP packets. If an RST packet is received, the host is up and the address is in use. Unfortunately, some intrusion detection software that will ignore the ICMP packet will flag the TCP ACK as an attack. If you want to use only ICMP packets, use the -PI option. For example, the previous scan could have been done using only ICMP packets with the command:

bsd1# nmap -sP -PI lnx1/24

In this case, since the devices are on the same subnet and there is no intervening firewall, the same machines are found.

Unfortunately, nmap stretches the limits of what might be considered appropriate at times. In particular, nmap provides a number of options for stealth scanning. There are two general reasons for using stealth scanning. One is to probe a machine without being detected. This can be extremely difficult if the machine is actively watching for such activity.

The other reason is to slip packets past firewalls. Because firewall configuration can be quite complex and because it can be very difficult to predict traffic patterns, many firewalls are configured in ways that allow or block broad, generic classes of traffic. This minimizes the number of rules that need to be applied and improves the throughput of the firewall.
But blocking broad classes of traffic also means that it may be possible to sneak packets past such firewalls by making them look like legitimate traffic. For example, external TCP connections may be blocked by discarding the external SYN packets used to set up a connection. If a SYN/ACK packet is sent from the outside, most firewalls will assume the packet is a response to a connection that was initiated by an internal machine. Consequently, the firewall will pass the packet. With these firewalls, it is possible to construct such a packet and slip it through the firewall to see how an internal host responds.

nmap has several types of scans that are designed to do stealth probes. These include -sF, -sX, and -sN. (You can also use the -f option to break stealth probes into lots of tiny fragments.) But while these stealth packets may slip past firewalls, they should all be detected by any good intrusion detection software running on the target. You may want to try these on your network just to see how well your intrusion detection system works or to investigate how your firewall responds. But if you are using these to do clandestine scans, you should be prepared to be caught and to face the consequences.

Another questionable feature of nmap is the ability to do decoy scans. This option allows you to specify additional forged IP source addresses. In addition to the probe packets that are sent with the correct source address, other similar packets are sent with forged source addresses. The idea is to make it more difficult to pinpoint the real source of the attack, since only a few of the packets will have the correct source address. Not only does this create unnecessary network traffic, but it can create problems for hosts whose addresses are spoofed. If the probed site automatically blocks traffic from probing sites, it will cut off the spoofed sites as well as the site where the probe originated.
Clearly, this is not what you really want to do. This calls into question any policy that simply blocks sites without further investigation. Such systems are also extremely vulnerable to denial-of-service attacks. Personally, I can see no legitimate use for this feature and would be happy to see it dropped from nmap.

But while there are some questionable options, they are easily outnumbered by useful options. If you want your output in greater detail, you might try the -v or the -d option. If information is streaming past you on the screen too fast for you to read, you can log the output to a file in human-readable or machine-parseable form. Use, respectively, the -o or -m options along with a filename. The -h option will give a brief summary of nmap's many options. You may want to print this to use while you learn nmap.

If you are using nmap to do port scans, you can use the -p option to specify a range of ports. Alternatively, the -F, or fast scan, option can be used to limit scans to ports in your services file. You'll certainly want to consider using one or the other of these. Scanning every possible port on a network can take a lot of time and generate a lot of traffic. A number of other options are described in nmap's documentation.

Despite the few negative things I have mentioned, nmap really is an excellent tool. You will definitely want to add it to your collection.

6.2.3 arpwatch

Active scans, such as those we have just seen with nmap, have both advantages and disadvantages. They allow scans of remote networks and give a good snapshot of the current state of the network. The major disadvantage is that these scans will identify only machines that are operational when you do the scan. If a device is on for only short periods at unpredictable times, it can be virtually impossible to catch by scanning.
Tools that run constantly, like arpwatch, provide a better picture of activity over time.

For recording IP addresses and their corresponding MAC addresses, arpwatch is my personal favorite. It is a very simple tool that does this very well. Basically, arpwatch places an interface in promiscuous mode and watches for ARP packets. It then records IP/MAC address pairs. The primary limitation to arpwatch comes from being restricted to local traffic. It is not a tool that can be used across networks. If you need to watch several networks, you will need to start arpwatch on each of those networks.

The information can be recorded in one of four ways. Data may be written directly to the system console, to the system's syslog file, or to a user-specified text file, or it can be sent as an email to root. (syslog is described in Chapter 11.) Output to the console or the syslog file is basically the same. An entry will look something like:

Mar 30 15:16:29 bsd1 arpwatch: new station 172.16.2.234 0:60:97:92:4a:6

Of course, with the syslog file, these messages will be interspersed with many other messages, but you can easily use grep to extract them. For example, to write all the messages from arpwatch that were recorded in /var/log/messages into the file /tmp/arp.list, you can use the command:

bsd1# grep arpwatch /var/log/messages > /tmp/arp.list
If your syslog file goes by a different name or you want output in a different file, you will need to adjust names accordingly. This approach will include other messages from arpwatch as well, but you can easily delete those that are not of interest.

Email looks like:

From: arpwatch (Arpwatch)
To: root
Subject: new station (lnx1.lander.edu)

  hostname: lnx1.lander.edu
  ip address: 172.16.2.234
  ethernet address: 0:60:97:92:4a:6
  ethernet vendor: 3Com
  timestamp: Thursday, March 30, 2000 15:16:29 -0500

Email output has the advantage of doing name resolution for the IP address, and it gives the vendor for the MAC address. The vendor name is resolved using information in the file ethercodes.dat. This file, as supplied with arpwatch, is not particularly complete or up-to-date, but you can always go to the IEEE site as described in Chapter 2 if you need this data for a particular interface. If you do this, don't forget to update the ethercodes.dat file on your system.

arpwatch can also record raw data to a file. This is typically the file arp.dat, but you can specify a different file with the -f option. The default location for arp.dat seems to vary with systems. The manpage for arpwatch specifies /usr/operator/arpwatch as the default home directory, but this may not be true for some ports. If you use an alternative file, be sure to give its full pathname. Whether you use arp.dat or another file, the file must exist before you start arpwatch. The format is pretty sparse:

0:60:97:92:4a:6 172.16.2.234 954447389 lnx1

Expect a lot of entries the first few days after you start arpwatch as it learns your network. This can be a little annoying at first, but once most machines are recorded, you shouldn't see much traffic—only new or changed addresses. These should be very predictable. Of particular concern are frequently changing addresses. The most likely explanation for a single address change is that a computer has been replaced by another.
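The raw arp.dat format is easy to post-process. As a minimal sketch, the following uses awk to reorder the fields for quick review; the sample entries are fabricated, modeled on the format just shown (MAC, IP, Unix epoch timestamp, hostname), and the second MAC address is purely hypothetical.

```shell
# Create a small sample file in arp.dat's format:
# MAC address, IP address, epoch timestamp, hostname
cat > /tmp/arp.dat <<'EOF'
0:60:97:92:4a:6 172.16.2.234 954447389 lnx1
0:10:5a:e8:71:2 172.16.2.236 954450000 bsd2
EOF

# List hostname, IP, and MAC, one station per line
awk '{ printf "%-10s %-15s %s\n", $4, $2, $1 }' /tmp/arp.dat
```

The same pattern extends naturally to sorting by timestamp (the third field) to see the most recently learned stations first.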
Although less likely, a new adapter would also explain the change. Frequent or unexplained changes deserve greater scrutiny. It could simply mean someone is using two computers. Perhaps a user is unplugging his desktop machine in order to plug in his portable. But it can also mean that someone is trying to hide something they are doing. On many systems, both the MAC and IP addresses can be easily changed. A cracker will often change these addresses to cover her tracks. Or a cracker could be using ARP poisoning to redirect traffic.

Here is an example of an email report for an address change:

From: arpwatch (Arpwatch)
To: root
Subject: changed ethernet address

  hostname: <unknown>
  ip address: 205.153.63.55
  ethernet address: 0:e0:29:21:88:83
  ethernet vendor: <unknown>
  old ethernet address: 0:e0:29:21:89:d9
  old ethernet vendor: <unknown>
  timestamp: Monday, April 3, 2000 4:57:16 -0400
  previous timestamp: Monday, April 3, 2000 4:52:33 -0400
  delta: 4 minutes

Notice that the subject line will alert you to the nature of the change. This change was followed shortly by another change, as shown here:

From: arpwatch (Arpwatch)
To: root
Subject: flip flop

  hostname: <unknown>
  ip address: 205.153.63.55
  ethernet address: 0:e0:29:21:89:d9
  ethernet vendor: <unknown>
  old ethernet address: 0:e0:29:21:88:83
  old ethernet vendor: <unknown>
  timestamp: Monday, April 3, 2000 9:40:47 -0400
  previous timestamp: Monday, April 3, 2000 9:24:07 -0400
  delta: 16 minutes

This is basically the same sort of information, but arpwatch labels the first as a changed address and subsequent changes as flip-flops.

If you are running DHCP and find arpwatch's output particularly annoying, you may want to avoid arpwatch. But if you are having problems with DHCP, arpwatch might, in limited circumstances, be useful.

6.3 Device Identification

At times it can be helpful to identify the operating system used on a remote machine. For example, you may need to identify systems vulnerable to some recently disclosed security hole. Or if you are faced with a duplicate IP address, identifying the type of machine is usually the best first step in locating it. Using arp to discover the type of hardware may be all that you will need to do. If you have identified the interface as a Cisco interface and you have only a half dozen Cisco devices on your network, you should be able to easily find the one with the duplicate address. If, on the other hand, you can identify it only as one of several hundred PCs, you'll want more information. Knowing the operating system on the computer may narrow your search.

The obvious, simple strategies are usually the best place to start, since these are less likely to offend anyone. Ideally, you will have collected additional information as you set systems up, so all you'll need to do is consult your database, DHCP records, or DNS files or, perhaps, give the user a call.
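The arp-based hardware identification mentioned above amounts to matching the first three bytes of a MAC address (the vendor prefix) against an OUI table such as arpwatch's ethercodes.dat. The sketch below is illustration only: it hard-codes a single prefix-to-vendor mapping (0:60:97 is 3Com, as seen in the arpwatch report earlier); a real lookup should consult the current IEEE listing.

```shell
# Toy vendor lookup keyed on the MAC's three-byte OUI prefix.
lookup_vendor() {
    # Normalize: lowercase, keep the first three colon-separated bytes
    prefix=$(echo "$1" | tr 'A-F' 'a-f' | cut -d: -f1-3)
    case "$prefix" in
        0:60:97|00:60:97) echo "3Com" ;;   # mapping taken from the arpwatch report
        *) echo "<unknown>" ;;             # everything else: consult the IEEE OUI list
    esac
}

lookup_vendor 0:60:97:92:4a:6
```

This mirrors what arpwatch's email reports do for you automatically, but it can be handy when all you have is a raw address from arp output.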
But if your records are incomplete, you'll need to probe the device.

Begin by using telnet to connect to the device to check for useful banners. Often login banners are changed or suppressed, so don't restrict yourself to just the Telnet port. Here is an example of trying the SMTP port (25):

bsd1# telnet 172.16.2.233 25
Trying 172.16.2.233...
Connected to 172.16.2.233.
Escape character is '^]'.
220 sol1. ESMTP Sendmail 8.9.1b+Sun/8.9.1; Fri, 2 Jun 2000 09:02:45 -0400 (EDT)
quit
221 sol1. closing connection
Connection closed by foreign host.

This simple test tells us the host is sol1, and it is using a Sun port of sendmail. The most likely ports to try are FTP (21), Telnet (23), SMTP (25), HTTP (80), POP2 (109), POP3 (110), and NNTP (119), but, depending on the systems, others may be informative as well.

Often, you don't even have to get the syntax correct to get useful information. Here is an example of an ill-formed GET request (the REQUEST_URI is omitted) sent using telnet:

bsd1# telnet 172.16.2.230 80
Trying 172.16.2.230...
Connected to 172.16.2.230.
Escape character is '^]'.
GET HTTP/1.0

HTTP/1.1 400 Bad Request
Server: Microsoft-IIS/4.0
...

Additional output has been omitted, but the system has been identified in the last line shown. (See Chapter 10 for other examples.)

Port scanning is one of the tools described in Chapter 2 that can also be used here. To do the tests described in Chapter 2, you need change only the host address. The interpretation of the results is the same. The only thing you need worry about is the possibility that some of the services you are testing may be blocked by a firewall. Of course, the presence or absence of a service may provide insight into the role of the device. An obvious example is an open HTTP port. If it is open, you are looking at a web server (or, possibly, a machine misconfigured as a web server) and can probably get more information by using your web browser on the site.

When these obvious tests fail, as they often will, you'll need a more sophisticated approach such as stack fingerprinting.

6.3.1 Stack Fingerprinting

The standards that describe TCP/IP stack implementations are incomplete in the sense that they sometimes do not address how the stack should respond in some degenerate or pathological situations. For example, there may be no predefined way for dealing with a packet with contradictory flags or with a meaningless sequence of inconsistent packets.
Since these situations should not normally arise, implementers are free to respond in whatever manner they see fit. Different implementations respond in different ways.

There are also optional features that stack implementers may or may not choose to implement. The presence or absence of such support is another useful clue to the identity of a system. Even when behavior is well defined, some TCP/IP stacks do not fully conform to standards. Usually, the differences are minor inconsistencies that have no real impact on performance or interoperability. For example, if an isolated FIN packet is sent to an open port, the system should ignore the packet. Microsoft Windows, among others, will send a RESET instead of ignoring the packet. This doesn't create any problems for either of the devices involved, but it can be used to distinguish systems.

Collectively, these different behaviors can be exploited to identify which operating system (OS) is being used on a remote system. A carefully chosen set of packets is sent and the responses are
examined. It is necessary only to compare the responses seen against a set of known behaviors to deduce the remote system. This technique is known as stack fingerprinting or OS fingerprinting.

A fingerprinting program will be successful only if it has a set of anomalies or, to mix metaphors, a signature that distinguishes the device of interest from other devices. Since devices change and new devices are introduced, it is not uncommon for a stack fingerprinting program not to know the signature for some devices. Ideally, the program will have a separate signature file or database so that it can be easily updated. From the user's perspective, it may also be helpful to have more than one program, since each may be able to identify devices unknown to the other. Consequently, both queso and the stack fingerprinting option for nmap are described here.

It should also be noted that passive fingerprinting is possible. With passive fingerprinting, the idea is to examine the initialization packets that come into your machine. Of course, this will only identify systems that try to contact you, but this can be a help in some circumstances, particularly with respect to security. In some ways, this approach is more reliable. When a remote machine sends the first packet, it must fill in all the fields in the headers. When you probe a remote machine, many of the fields in the headers in the reply packet will have been copied directly from your probe packets. If you are interested in this approach, you might want to look at siphon or p0f.

When using stack fingerprinting, whether active or passive, you must realize that you are fingerprinting the machine you are actually communicating with. Normally, that is exactly what you want. But if there is a proxy server between your machine and the target, you will fingerprint the proxy server, not the intended target.

6.3.2 queso

A number of programs do stack fingerprinting. One simple program that works well is queso.
Its sole function is stack fingerprinting. The syntax is straightforward:

bsd1# queso 172.16.2.230
172.16.2.230:80 * Windoze 95/98/NT

By default, queso probes the HTTP port (80). If that port is not in use, queso will tell you to try another port:

bsd1# queso 172.16.2.1
172.16.2.1:80 *- Not Listen, try another port

You can do this with the -p option. In this example, the Telnet port is being checked:

bsd1# queso -p23 172.16.2.1
172.16.2.1:23 * Cisco 11.2(10a), HP/3000 DTC, BayStack Switch

This is not a definitive answer, but it has certainly narrowed down the field.

You can call queso with multiple addresses by simply putting all the addresses on the command line. You can also use subnet masks, as shown in the following:

bsd1# queso -p23 172.16.2.232/29
172.16.2.233:23 * Solaris 2.x
172.16.2.234:23 * Linux 2.1.xx
172.16.2.235:23 *- Not Listen, try another port
172.16.2.236:23 * Dead Host, Firewalled Port or Unassigned IP
172.16.2.237:23 * Dead Host, Firewalled Port or Unassigned IP
172.16.2.238:23 * Dead Host, Firewalled Port or Unassigned IP

Notice from this example that mask selection doesn't have to fall on a class boundary.

queso maintains a separate configuration file. If it doesn't recognize a system, it will prompt you to update this file:

bsd1# queso -p23 205.153.60.1
205.153.60.1:23 *- Unknown OS, pleez update /usr/local/etc/queso.conf

You can update this file with the -w option. queso can identify a hundred or so different systems. It is not a particularly fast program but gives acceptable results. It can take several seconds to scan each machine on the same subnet. If you invoke queso without any argument, it will provide a brief summary of its options.

6.3.3 nmap Revisited

You can also do stack fingerprinting with nmap by using the -O option:

bsd1# nmap -O 172.16.2.230

Starting nmap V. 2.12 by Fyodor (fyodor@dhp.com, www.insecure.org/nmap/)
WARNING: OS didn't match until the 2 try
Interesting ports on (172.16.2.230):
Port    State    Protocol  Service
21      open     tcp       ftp
80      open     tcp       http
135     open     tcp       loc-srv
139     open     tcp       netbios-ssn
443     open     tcp       https
1032    open     tcp       iad3
6666    open     tcp       irc-serv
7007    open     tcp       afs3-bos

TCP Sequence Prediction: Class=trivial time dependency
                         Difficulty=0 (Trivial joke)
Remote operating system guess: Windows NT4 / Win95 / Win98
Nmap run completed -- 1 IP address (1 host up) scanned in 5 seconds

You can suppress most of the port information by specifying a particular port. For example:

bsd1# nmap -p80 -O 172.16.2.230

Starting nmap V. 2.12 by Fyodor (fyodor@dhp.com, www.insecure.org/nmap/)
Interesting ports on (172.16.2.230):
Port    State    Protocol  Service
80      open     tcp       http

TCP Sequence Prediction: Class=trivial time dependency
                         Difficulty=0 (Trivial joke)
Remote operating system guess: Windows NT4 / Win95 / Win98
Nmap run completed -- 1 IP address (1 host up) scanned in 1 second
You will probably want to do this if you are scanning a range of machines, to save time. However, if you don't restrict nmap to a single port, you are more likely to get a useful answer.

Results can be vague at times. This is what nmap returned on one device:

...
Remote OS guesses: Cisco Catalyst 1900 switch or Netopia 655-U/POTS ISDN Router,
Datavoice TxPORT PRISM 3000 T1 CSU/DSU 6.22/2.06, MultiTech CommPlete Controller,
IBM MVS TCP/IP stack V. 3.2, APC MasterSwitch Network Power Controller, AXIS or
Meridian Data Network CD-ROM server, Meridian Data Network CD-ROM Server
(V4.20 Nov 26 1997), WorldGroup BBS (MajorBBS) w/TCP/IP

The correct answer is none of the above. A system that may not be recognized by nmap may be recognized by queso or vice versa.

6.4 Scripts

Since most networks have evolved over time, they are frequently odd collections of equipment for which no single tool may be ideal. And even when the same tool can be used, differences in equipment may necessitate minor differences in how the tool is used. Since many of the tasks may need to be done on a regular basis, it should come as no surprise that scripting languages are a popular way to automate these tasks. Getting started can be labor intensive, but if your current approach is already labor intensive, it can be justified.

You will want to use a scripting language with extensions that support the collection of network data. To give an idea of this approach, Tcl and its extensions are briefly described here. Even if you don't really want to write your own tools, you may want to consider one of the tools based on Tcl that are freely available, most notably tkined.

Tcl was selected because it provides a natural introduction to tkined. Of course, there are other scripting languages that you may want to consider. Perl is an obvious choice. Several packages and extensions are available for system and network administration. For example, you may want to look at spidermap.
This is a set of Perl scripts that do network scans. For SNMP-based management, you'll probably want to get Simon Leinen's SNMP extensions, SNMP_Session.pm and BER.pm. (Other tools you might also look at include mon and nocol.)

6.4.1 Tcl/Tk and scotty

Tool Command Language, or Tcl (pronounced "tickle"), is a scripting language that is well suited for network administration. Tcl was developed in the late 1980s by John Ousterhout, then a faculty member at UC Berkeley. Tcl was designed to be a generic, embeddable, and extensible interpreted language. Users frequently cite studies showing Tcl requires one-tenth the development time required by C/C++. Its major weakness is that it is not well suited for computationally intensive tasks, but that shouldn't pose much of a problem for network management. You can also write applets or tclets (pronounced "tik-lets") in Tcl.

Tcl can be invoked interactively using the shell tclsh (pronounced "ticklish") or with scripts. You may need to include a version number as part of the name. Here is an example:
bsd2# tclsh8.2
%

This really is a shell. You can change directories, print the working directory, copy files, remove files, and so forth, using the usual Unix commands. You can use the exit command to leave the program.

One thing that makes Tcl interesting is the number and variety of extensions that are available. Tk is a set of extensions that provides the ability to create GUIs in an X Window environment. These extensions make it easy to develop graphical interfaces for tools. Tk can be invoked interactively using the windowing shell wish. Both Tcl and Tk are implemented as C library packages that can be included in programs if you prefer.

scotty, primarily the work of Jürgen Schönwälder, adds network management extensions to Tcl/Tk. The tnm portion of scotty adds network administration support. The tkined portion of scotty, described in the next section, is a graphical network administration program. What tnm adds is a number of network management commands. These include support for a number of protocols, including ICMP, UDP, DNS, HTTP, Sun's RPC, NTP, and, most significantly, SNMP. In addition, there are several sets of commands that simplify writing network applications. The netdb command gives access to local network databases such as /etc/hosts, the syslog command supports sending messages to the system logging facilities, and the job command simplifies scheduling tasks. A few examples should give an idea of how these commands could be used.

You can invoke the scotty interpreter directly as shown here. In this example, the netdb command is used to list the /etc/hosts table on a computer:

bsd4# scotty
% netdb hosts
{localhost.lander.edu 1.0.0.127} {bsd4.lander.edu 239.63.153.205}
{bsd4.lander.edu. 239.63.153.205} {bsd1.lander.edu 231.60.153.205}
{sol1.lander.edu 233.60.153.205} {lnx1.lander.edu 234.60.153.205}
% exit

The results are returned with each entry reduced to the canonical name and IP address in brackets. Here is the host table for the same system:

bsd4# cat /etc/hosts
127.0.0.1       localhost.lander.edu localhost
205.153.63.239  bsd4.lander.edu bsd4
205.153.63.239  bsd4.lander.edu.
205.153.60.231  bsd1.lander.edu bsd1
205.153.60.233  sol1.lander.edu sol1
205.153.60.234  lnx1.lander.edu lnx1

Note that there is not a separate entry for the alias bsd4.

Here are a few examples of other commands. In the first example, the name of the protocol with a value of 1 is looked up in /etc/protocols using the netdb command:

% netdb protocols name 1
icmp

In the second example, a reverse DNS lookup is done for the host at 205.153.63.30:
% dns name 205.153.63.30
sloan.lander.edu

Finally, an ICMP ECHO_REQUEST is sent to www.cisco.com:

% icmp echo www.cisco.com
{www.cisco.com 321}

The response took 321 ms. Other commands, such as snmp, require multiple steps to first establish a session and then access information. (Examples are given in Chapter 7.) If you are interested in using these tools in this manner, you will first want to learn Tcl. You can then consult the manpages for these extensions. A number of books and articles describe Tcl, some of them listed in Appendix B. The source is freely available for all these tools.

6.5 Mapping or Diagramming

At this point, you should have a good idea of how to find out what is on your network. The next step is to put together a picture of how everything interconnects. This is usually referred to as mapping but may go by other names such as network drawing or diagramming. This can be absolutely essential if you are dealing with topology-related problems.

A wide spectrum of approaches may be taken. At one extreme, you could simply use the collected data and some standard drawing utility to create your map. Clearly, some graphics software is better suited than others for this purpose. For example, special icons for different types of equipment are particularly nice. But almost any software should be usable to a degree. I have even put together passable diagrams using the drawing features in Microsoft Excel.

Manual diagramming is usually practical only for a single segment or a very small network. But there might be times when this will be desirable for larger networks—for example, you may be preparing graphics for a formal presentation. This, however, should be an obvious exception, not a routine activity.

In the middle of the spectrum are programs that will both discover and draw the network. When using tools with automatic discovery, you will almost certainly want to clean up the graphics.
It is extremely hard to lay out a graph in an aesthetically pleasing manner when doing it manually. You can forget about a computer doing a good job automatically.

Another closely related possibility is to use scripting tools to update the files used by a graphing utility. The graphic utility can then display the new or updated map with little or no additional interaction. While this is a wonderful learning opportunity, it really isn't a practical solution for most people with real time constraints.

At the other extreme, mapping tools are usually part of more comprehensive management packages. Automatic discovery is the norm for these. Once the map is created, additional management functions—including basic monitoring to ensure that devices and connections still work and to collect performance data—are performed.
Ideally, these programs will provide a full graphic display that is automatically generated, includes every device on the network, provides details of the nature and state of the devices, updates the map in real time, and requires a minimum of user input. Some tools are well along the path to this goal.

There are problems with automatic discovery. First, you'll want to be careful when you specify the networks to be analyzed and keep an eye on things whenever you change this. It is not that uncommon to make an error and find that you are mapping devices well beyond your network. And, as explained later in this chapter, not everyone will be happy about this.

Also, many mapping programs do a poor job of recognizing topology. For example, in a virtual LAN, a single switch may be logically part of two different networks. Apart from proprietary tools, don't expect many map programs to recognize and handle these devices correctly. Each logical device may be drawn as a separate device. If you are relying solely on ICMP ECHO_REQUEST packets, unmanaged hubs and switches will not be recognized at all, while managed hubs and switches will be drawn as just another device on the network without any indication of the role they play in the network topology.

Even with automatic discovery, network mapping and management tools may presuppose that you know the basic structure of your network. At a minimum, you must know the address range for your network. It seems very unlikely that a legitimate administrator would not have this information. If for some bizarre reason you don't have this information, you might begin by looking at the routing tables and NAT tables in your router, DNS files, DHCP configurations, or InterNIC registration information. You might also use traceroute to identify intermediate segments and routers.

6.5.1 tkined

An excellent example of a noncommercial, open source mapping program is tkined.
This is a network editor that can be used as a standalone tool or as a framework for an extensible network management system. At its simplest, it can be used to construct a network diagram. Figure 6-1 is an example of a simple network map that has been constructed using tkined tools. (Actually, as will be explained, this map was "discovered" rather than drawn, but don't worry about this distinction for now.)

Figure 6-1. A network map constructed with tkined
6.5.1.1 Drawing maps with tkined

Manually drawing a map like this is fairly straightforward, although somewhat tedious for all but the smallest networks. You begin by starting tkined under an X Window session. (This discussion assumes you are familiar with using an X Window application.) You should see the menu bar across the top window just under the titlebar, a toolbar to the left, and a large, initially blank work area called the canvas.

To create a map, follow these steps:

1. Add the devices to the canvas. Begin by clicking[3] on the machine icon on the toolbar on the left. This is the icon with the question mark in the middle. With this tool selected, each time you click over the canvas, a copy of this icon will be inserted on the canvas at the cursor.

   [3] Unless otherwise noted, clicking means clicking with the left mouse button.

   You can change the appearance of each of these icons to reflect the type of device it represents. First, click on Select on the toolbar (not Select on the menu). Next, select the icon or icons you want to change. You select single icons by clicking on them. Multiple icons can be selected by Shift-clicking on each in turn. As you select devices, small boxes are displayed at the corners of the icon. Once you have selected the icons of interest, go to the icon pull-down menu and select the icon you want from the appropriate submenu. Notice that the icon on the toolbar changes. (You could make this change before inserting devices if you wish and insert the selected icon that way.)

2. Label each device. Right-click on each device in turn. From the pop-up menu, select Edit All Attributes..., enter the appropriate name and IP address for each device, and then select Set Values. Once you have done this, right-click on the icon again and select Label with Attribute..., select either name or address depending on your preference, and then click on Accept.

3. Add the networks.
This is done with the tool below the machine icon (the thick bar). Select this tool by clicking on it. Click where you want the bar to begin on the canvas. Move the mouse to where you want the network icon to end and click a second time. You can label networks in the same way you label nodes.

4. Connect devices to the networks. You can join devices to a network using the next tool on the toolbar, the thin line with little boxes at either end. Select this tool, click on the device you want to join to the network, and then click on the appropriate network icon. As you move the mouse, a line from the icon to the mouse pointer will be shown. When you click on the network, the line should be attached to both the device and the network. If it disappears, your aim was off. Try again.

   At this point, you will probably want to rearrange your drawing to tidy things up. You can move icons by dragging them with the middle mouse button. (If your mouse doesn't have three buttons, try holding down both the left and right buttons simultaneously.)

5. Group devices and networks. This allows you to collapse a subnet into a single icon. You can open whichever subnets you need to work with at the moment and leave the rest closed. For large networks, this is essential. Otherwise, the map becomes too cluttered to use effectively. To combine devices, use the Select tool to select the devices and the network. Then select Group from the Structure menu. You can use this same menu to select Ungroup, Expand, and Collapse for your groups. You can edit the group label as desired in the previously discussed manner.
6.5.1.2 Autodiscovery with tkined

For a small network, manually drawing a diagram doesn't take very long. But for large networks, this can be a very tedious process. Fortunately, tkined provides tools for the automatic discovery of nodes and the automatic layout of maps.

You begin with IP-Discover on the Tools menu. What this does is add the IP Discover menu to the menu bar. The first two items on this menu are Discover IP Network and Discover Route. These tools will attempt to discover either the devices on a network or the routers along a path to a remote machine. When one of these is selected, a pop-up box queries you for the network number or remote device of interest. Unfortunately, tkined seems to support only class-based discovery, so you must specify a class B or a class C address (although you can specify a portion of a class B network by giving a class C style subnet address, e.g., 172.16.1.0). It also tends to be somewhat unpredictable or quirky when trying to discover multiple networks. If you are using subnets on a class B address, what seems to work best is to run separate discovery sessions and then cut and paste the results together. This is a little bit of a nuisance, but it is not too bad. This was what was actually done to create Figure 6-1.

Figure 6-2 shows the output generated in discovering a route across the network and one of the subnets for the network shown in Figure 6-1. This window is automatically created by tkined and shows its progress during the discovery process. Note that it is sending out a flood of ICMP ECHO_REQUEST packets in addition to the traceroute-style discovery packets, the ICMP network mask queries, and the SNMP queries shown here.

Figure 6-2. Route and network discovery with tkined

If you do end up piecing together a network map, other previously discussed tools, such as traceroute, can be very helpful. You might also want to look at your routing tables with netstat.

There are a couple of problems in using tkined.
Foremost is the problem of getting everything installed correctly. You will need to install Tcl, then Tk, and then scotty. scotty can be very particular about which versions of Tcl and Tk are installed. You will also need to make sure everything is in the default location or that the environment variables are correctly set. Fortunately, packages are available for
some systems, such as Linux, that take care of most of these details automatically. Also, tkined will not warn you if you exit without saving any changes you have made.

6.6 Politics and Security

You should have a legitimate reason and the authority to use the tools described here. Some of these tools directly probe other computers on the network. Even legitimate uses of these tools can create surprises for users and may, in some instances, result in considerable ill will and mistrust. For example, doing security probes to discover weaknesses in your network may be a perfectly reasonable thing to do, provided that is your responsibility. But you don't want these scans to come as a surprise to your users. I, for one, strongly resent unexpected probing of my computer regardless of the reason. Often, a well-meaning individual has scanned a network only to find himself with a lot of explaining to do. The list of people who have made this mistake includes several big names in the security community.

With the rise of personal firewalls and monitoring tools, more and more users are monitoring what is happening on their local networks and at their computers. Not all of these users really understand the results returned by these tools, so you should be prepared to deal with misunderstandings. Reactions can be extreme, even from people who should know enough to put things in context.

The first time I used CiscoWorks for Windows, the program scanned the network with, among others, CMIP packets. This, of course, is a perfectly natural thing to do. Unfortunately, another machine on the network had been configured in a manner that, when it saw the packet, it began blocking all subsequent packets from the management station. It then began logging all subsequent traffic from the management station as attacks. This included the Server Message Blocks (SMB) that are a normal part of the network background noise created by computers running Microsoft Windows.
A couple of days later I received a very concerned email regarding a 10-page log of attacks originating from the management station. To make matters worse, the clock on the "attacked" computer was off by a couple of hours. The times recorded for the alleged attacks didn't fall in the block of time I had run CiscoWorks. It did include, however, blocks of time I knew the management station was offline. Before it was all sorted out, my overactive imagination had turned it into a malicious attack with a goal of casting blame on the management station when it was nothing more than a misunderstanding.[4]

[4] This problem could have been lessened if both had been running NTP. NTP is discussed in Chapter 11.

It is best to deal with such potential problems in advance by clearly stating what you will be doing and why. If you can't justify it, then perhaps you should reconsider exactly why you are doing it. A number of sites automatically block networks or hosts they receive scans from. And within some organizations, unauthorized scanning may be grounds for dismissal. You should consider developing a formal policy clearly stating when and by whom scanning may and may not be done.

This leads to an important point: you really should have a thorough understanding of how scanning tools work before you use them. For example, some SNMP tools have you enter a list of the various SNMP passwords (community strings) you use on your network. In the automatic discovery mode, such a tool will probe for SNMP devices by trying each of these passwords in turn on each machine on the network. This is intended to save the network manager from having to enter this information for each individual device. However, it is a simple matter for scanned machines to capture these passwords. Tools like dsniff are designed specifically for that purpose. I strongly recommend watching the behavior of whatever scanning tools you use with a tool like tcpdump or ethereal to see what they are actually doing.
Unfortunately, some of the developers of these tools can't seem to decide whether they are writing for responsible users or crackers. As previously noted, some tools include questionable features, such as support for stealth scans or forged IP addresses. In general, I have described only those features for which I can see a legitimate use. However, sometimes there is no clear dividing line. For example, forged IP addresses can be useful in testing firewalls. When I have described such features, I assume that you will be able to distinguish between appropriate and inappropriate uses.

6.7 Microsoft Windows

Traditionally, commercial tools for network management have typically been developed for Unix platforms rather than Windows. Those available under Windows tended not to scale well. In the last few years this has been changing rapidly, and many of the standard commercial tools are now available for Windows platforms.

A number of packages support IP scanning under Windows. These include freeware, shareware, and commercial packages. Generally, these products are less sophisticated than similar Unix tools. For example, stealth scanning is usually lacking under Windows. (Personally, I'm not sure this is something to complain about.)

Nonetheless, there are a number of very impressive noncommercial tools for Windows. In fact, considering the quality and functionality of some of these free packages, it is surprising that the commercial packages are so successful. But free software, particularly in network management, seems to have a way of becoming commercial software over time—once it has matured and developed a following.

6.7.1 Cyberkit

One particularly impressive tool is Luc Neijens' cyberkit. The package works well, has a good help system, and implements a wide range of functions in one package.
In addition to IP scanning, the program includes, among others, ping, traceroute, finger, whois, nslookup, and NTP synchronization. With cyberkit, you can scan a range of addresses within an address space or you can read a set of addresses from a file. Figure 6-3 shows an example of such a scan.

Figure 6-3. IP scan with cyberkit
Here you can see how to specify a range of IP addresses. The button to the right of the Address Range field will assist you in specifying an address range or entering a filename. If you want to use a file, you need enter only the path and name of a text file containing a set of addresses, one address per line. Notice that you can use the same tab to resolve addresses or do port scans of each address.

There are a number of other tools you might consider. getif, which makes heavy use of SNMP, is described in Chapter 7. You might also want to look at Sam Spade. (Sam Spade is particularly helpful when dealing with spamming and other email-related problems.)

6.7.2 Other Tools for Windows

The good news is that Tcl, Tk, scotty, and tkined are all available for Windows platforms. Tcl and Tk seem to be pretty stable ports. tkined is usually described as an early alpha port but seems to work fairly well. You'll want a three-button mouse. The interface is almost identical to the Unix version, and I have moved files between Windows and Unix platforms without problems. For example, you could create maps on one and move them to another for monitoring. Moreover, the tnm extensions have been used as the basis for additional tools available for Windows.

If you use Microsoft Exchange Server, a topology diagramming tool called emap can be downloaded from Microsoft. It will read an Exchange directory and automatically generate a Visio diagram for your site topology. Of course, you'll need Visio to view the results.

Finally, if you are using NetBIOS, you might want to look at the nbtstat utility. This command displays protocol statistics and current TCP connections using NetBIOS over TCP/IP (NBT). You can use this command to poll remote NetBIOS name tables, among other things. The basic syntax is returned if you call the program with no options.
Chapter 7. Device Monitoring with SNMP

This chapter is about monitoring devices with the Simple Network Management Protocol (SNMP). It describes how SNMP can be used to retrieve information from remote systems, to monitor systems, and to alert you to problems. While other network management protocols exist, SNMP is currently the most commonly used. While SNMP has other uses, our primary focus will be on monitoring systems to ensure that they are functioning properly and to collect information when they aren't. The material in this chapter is expanded upon in Chapter 8.

This chapter begins with a brief review of SNMP. This description is somewhat informal but should serve to convey enough of the basic ideas to get you started if you are unfamiliar with SNMP. If you are already familiar with the basic concepts and vocabulary, you can safely skip over this section. Next I describe NET SNMP—a wonderful tool for learning about SNMP that can be used for many simple tasks. Network monitoring using tkined is next, followed by a few pointers to tools for Microsoft Windows.

7.1 Overview of SNMP

SNMP is a management protocol that allows a management program to communicate with, configure, or control remote devices that have embedded SNMP agents. The basic idea behind SNMP is to have a program or agent running on the remote system that you can communicate with over the network. This agent can then monitor systems and collect information. Software on a management station sends messages to the remote agent requesting information or directing it to perform some specific task. While communication is usually initiated by the management station, under certain conditions the agent may send an unsolicited message, or trap, back to the management station.

SNMP provides a framework for network management. While SNMP is not the only management protocol or, arguably, even the best management protocol, SNMP is almost universal.
It has a small footprint, can be implemented fairly quickly, is extensible, is well documented, and is an open standard. It resides at the application level of the TCP/IP protocol suite. On the other hand, SNMP, particularly Version 1, is not a secure protocol; it is poorly suited for real-time applications, and it can return an overwhelming amount of information.

SNMP is an evolving protocol with a confusing collection of abbreviations designating the various versions. Only the major versions are mentioned here. Understanding the major distinctions among versions can be important, because there are a few things you can't do with earlier versions and because of differences in the security provided by the different versions. However, the original version, SNMPv1, is still widely used and will be the primary focus of this chapter. Generally, the later versions are backward compatible, so differences in versions shouldn't cause too many operational problems.

The second version has several competing variants. SNMPv2 Classic has been superseded by community-based SNMPv2, or SNMPv2c. Two more secure supersets of SNMPv2c are SNMPv2u and SNMPv2*. SNMPv2c is the most common of the second versions and is what is usually meant when you see a reference to SNMPv2. SNMPv2 has not been widely adopted, but its use is growing. SNMP-NG, or SNMPv3, attempts to resolve the differences between SNMPv2u and SNMPv2*. It is too soon to predict how successful SNMPv3 will be, but it also appears to be growing in popularity.
Although there are usually legitimate reasons for the choice of terms, the nomenclature used to describe SNMP can be confusing. For example, parameters that are monitored are frequently referred to as objects, although variables might have been a better choice and is sometimes used. Basically, objects can be thought of as data structures.

Sometimes, the specialized nomenclature doesn't seem to be worth the effort. For example, SNMP uses community strings to control access. In order to gain access to a device, you must give the community string. If this sounds a lot like a password to you, you are not alone. The primary difference is the way community strings are used. The same community strings are often shared by a group or community of devices, something frowned upon with passwords. Their purpose is more to logically group devices than to provide security.

An SNMP manager, software on a central management platform, communicates with an SNMP agent, software located in the managed device, through SNMP messages. With SNMPv1 there are five types of messages. GET_REQUEST, GET_NEXT_REQUEST, and SET_REQUEST are sent by the manager to the agent to request an action. In the first two cases, the agent is asked to supply information, such as the value of an object. The SET_REQUEST message asks the agent to change the value of an object.

The remaining messages, GET_RESPONSE and TRAP, originate at the agent. The agent replies to the first three messages with the GET_RESPONSE message. In each case, the exchange is initiated by the manager. With the TRAP message, the action is initiated by the agent. Like a hardware interrupt on a computer, the TRAP message is the agent's way of getting the attention of the manager. Traps play an essential role in network management in that they alert you to problems needing attention. Knowing that a device is down is, of course, the first step to correcting the problem.
And it always helps to be able to tell a disgruntled user that you are aware of the problem and are working on it. Traps are as close as SNMP gets to real-time processing. Unfortunately, for many network problems (such as a crashed system) traps may not be sent. Even when traps are sent, they could be discarded by a busy router. UDP is the transport protocol, so there is no error detection for lost packets. Figure 7-1 summarizes the direction messages take when traveling between the manager and agent.

Figure 7-1. SNMP messages

For a management station to send a packet, it must know the IP address of the agent, the appropriate community string or password used by the agent, and the name of the identifier for the variable or object referenced. Unfortunately, SNMPv1 is very relaxed about community strings. These are sent in clear text and can easily be captured by a packet sniffer. One of the motivating factors for SNMPv2 was to provide greater security. Be warned, however, that SNMPv2c uses plain text community strings.

Most systems, by default, use public for the read-only community string and private for the read/write community string. When you set up SNMP access on a device, you will be given the opportunity to change these. If you don't want your system to be reconfigurable by anyone on the Internet,
you should change these. When communicating with devices, use read-only community strings whenever possible and read/write community strings only when necessary. Use filters to block all SNMP traffic into or out of your network. Most agents will also allow you to restrict which devices you can send and receive SNMP messages to and from. Do this! For simplicity and clarity, the examples in this chapter have been edited to use public and private. These are not the community strings I actually use.

Another advantage of SNMPv2 is that two additional messages have been added. GET_BULK_REQUEST will request multiple pieces of data with a single query, whereas GET_REQUEST generates a separate request for each piece of data. This can considerably improve performance. The other new message, INFORM_REQUEST, allows one manager to send unsolicited information to another.

Collectively, the objects are variables defined in the Management Information Base (MIB). Unfortunately, MIB is an overused term that means slightly different things in different contexts. There are some formal rules for dealing with MIBs—MIB formats are defined by the Structure of Management Information (SMI), the syntax rules for MIB entries are described in Abstract Syntax Notation One (ASN.1), and how the syntax is encoded is given by the Basic Encoding Rules (BER). Unless you are planning to delve into the implementation of SNMP or decode hex dumps, you can postpone learning SMI, ASN.1, and BER. And because of the complexity of these rules, I advise against looking at hex dumps. Fortunately, programs like ethereal do a good job of decoding these packets, so I won't discuss these rules in this book.

The actual objects that are manipulated are identified by a unique, authoritative object identifier (OID). Each OID is actually a sequence of integers separated by decimal points, sometimes called dotted notation.
For example, the OID for a system's description is 1.3.6.1.2.1.1.1. This OID arises from the standardized organization of all such objects, part of which is shown in Figure 7-2. The actual objects are the leaves of the tree. To eliminate any possibility of ambiguity among objects, they are named by giving their complete path from the root of the tree to the leaf.

Figure 7-2. Partial OID structure
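The correspondence between the tree in Figure 7-2 and dotted notation can be sketched in a few lines of code. The following is purely illustrative (not part of any SNMP library): each node carries its number, and resolving a name path simply collects the numbers along the path.

```python
# Illustrative sketch of the partial OID tree from Figure 7-2.
# Each entry maps a node name to (node number, children).
OID_TREE = {
    "iso": (1, {
        "org": (3, {
            "dod": (6, {
                "internet": (1, {
                    "mgmt": (2, {
                        "mib-2": (1, {
                            "system": (1, {
                                "sysDescr": (1, {}),
                                "sysObjectID": (2, {}),
                                "sysUpTime": (3, {}),
                            }),
                        }),
                    }),
                    "private": (4, {
                        "enterprises": (1, {}),
                    }),
                }),
            }),
        }),
    }),
}

def name_to_oid(path):
    """Translate a dotted name path into its numeric OID string."""
    subtree, numbers = OID_TREE, []
    for name in path.split("."):
        number, subtree = subtree[name]
        numbers.append(str(number))
    return ".".join(numbers)
```

For example, name_to_oid("iso.org.dod.internet.mgmt.mib-2.system.sysDescr") yields "1.3.6.1.2.1.1.1", the system description OID just mentioned.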
As you can see from the figure, nodes are given both names and numbers. Thus, the OID can also be given by specifying the names of each node, or object descriptor. For example, iso.org.dod.internet.mgmt.mib-2.system.sysDescr is the object descriptor that corresponds to the object identifier 1.3.6.1.2.1.1.1. The more concise numerical names are used within the agents and within messages. The nonnumeric names are used at the management station for the convenience of users. Objects are coded directly into the agents and manipulated by object descriptors. While management stations can mechanically handle object descriptors, they must be explicitly given the mappings between object descriptors and object identifiers if you want to call objects by name. This is one role of the MIB files that ship with devices and load onto the management station. These files also tell the management station which identifiers are valid.

As you might guess from Figure 7-2, this is not a randomly created tree. Through the standardization process, a number of identifiers have been specified. In particular, the mib-2 subtree has a number of subtrees, or groups, of interest. The system group, 1.3.6.1.2.1.1, has nodes used to describe the system such as sysDescr(1), sysObjectID(2), sysUpTime(3), and so on. These should be pretty self-explanatory. Although not shown in the figure, the ip(4) group has a number of objects such as ipForwarding(1), which indicates whether IP packets will be forwarded, and ipDefaultTTL(2), which gives the default TTL when it isn't specified by the transport layer. The ip group also has three tables, including the ipRouteTable(20). While this information can be gleaned from RFC 1213, which defines the MIB, several books that present this material in a more accessible form are listed in Appendix B. Fortunately, there are tools that can be used to investigate MIBs directly.

In addition to standard entries, companies may register private or enterprise MIBs.
These have extensions specific to their equipment. Typically, these MIBs must be added to those on the management station if they are not already there. They are usually shipped with the device or can be downloaded over the Internet. Each company registers for a node under the enterprises node (1.3.6.1.4.1). These extensions are under their respective registered nodes.

If you are new to SNMP, this probably seems pretty abstract. Appendix B also lists and discusses a number of sources that describe the theory and architecture of SNMP in greater detail. But you should know enough at this point to get started. The best way to come to terms with SNMP and the structure
of managed objects is by experimentation, and that requires tools. I will try to clarify some of these concepts as we examine SNMP management tools.

7.2 SNMP-Based Management Tools

There are several extremely powerful and useful noncommercial SNMP tools. Tools from the NET SNMP project, scotty, and tkined are described here.

7.2.1 NET SNMP (UCD SNMP)

The University of California at Davis implementation of SNMP (UCD SNMP) has its origin in a similar project at Carnegie Mellon University under Steve Waldbusser (CMU SNMP). In the mid-nineties, the CMU project languished. During this period, the UCD project was born. The UCD project has greatly expanded the original CMU work and is flourishing, thanks to the work of Wes Hardaker. The CMU project reemerged for a while with a somewhat different focus and has seen a lot of support in the Linux community. Both are excellent. While only UCD SNMP will be described here, the basics of each are so similar that you should have no problem using CMU SNMP once you are familiar with UCD SNMP. Very recently, UCD SNMP has been renamed NET SNMP to reflect some organizational changes.

NET SNMP is actually a set of tools, an SNMP library, and an extensible agent. The source code is available and runs on a number of systems. Binaries are also available for some systems, including Microsoft Windows. NET SNMP supports SNMPv1, SNMPv2c, and SNMPv3.

Admittedly, the NET SNMP toolset is not ideal for the routine management of a large network. But it is ideal for learning about SNMP, is not an unreasonable toolset for occasional tasks on smaller networks, and can be particularly useful in debugging SNMP problems, in part because it separates SNMP functions into individual utilities. The agent software is a logical choice for systems using Linux or FreeBSD and is extensible.
Most, but not all, of the utilities will be described.

7.2.1.1 snmpget

In the last section, it was stated that there are three messages that can be sent by a management station: GET_REQUEST, GET_NEXT_REQUEST, and SET_REQUEST. NET SNMP provides utilities to send each of these messages—snmpget, snmpgetnext, and snmpset, respectively. In order to retrieve the value of an object, it is necessary to specify the name or IP address of the remote host, a community string for the host, and the OID of the object. For example:

bsd4# snmpget 172.16.1.5 public .1.3.6.1.2.1.1.1.0
system.sysDescr.0 = "APC Embedded PowerNet SNMP Agent (SW v2.2, HW vB2, Mod:
AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)"

There are a couple of points to make about the OID. First, notice the 0 at the end. This is an offset into the data. It is a common error to omit this. If you are looking at a table, you would use the actual offset into the table instead of a 0. For example, the description of the third interface in the interface table would have the OID ifDescr.3.
Second, the leading dot is important. NET SNMP will attempt to attach a prefix to any OIDs not beginning with a dot. By default, the prefix is 1.3.6.1.2.1, but you can change this by setting the environment variable PREFIX. In this example, we have specified the OID explicitly. Without the leading dot, snmpget would have added the prefix to what we had, giving an OID that was too long. On the other hand, you could just use 1.1.0 without the leading dot and you would get the same results. Initially, using the prefix can be confusing, but it can save a lot of typing once you are used to it.

Of course, you can also use names rather than numbers, provided the appropriate MIB is available. This is shown in the next two examples:

bsd4# snmpget 172.16.1.5 public iso.org.dod.internet.mgmt.mib-2.system.sysDescr.0
system.sysDescr.0 = "APC Embedded PowerNet SNMP Agent (SW v2.2, HW vB2, Mod:
AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)"

bsd4# snmpget 172.16.1.5 public system.sysDescr.0
system.sysDescr.0 = "APC Embedded PowerNet SNMP Agent (SW v2.2, HW vB2, Mod:
AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)"

In the first case, the full path was given, and in the second the prefix was used. (Don't forget the trailing 0.) Numbers and names can be mixed:

bsd4# snmpget 172.16.1.5 public .1.3.6.internet.2.1.system.1.0
system.sysDescr.0 = "APC Embedded PowerNet SNMP Agent (SW v2.2, HW vB2, Mod:
AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)"

(Frankly, I can't see much reason for doing this.)

Also, if the MIB is known, you can do a random-access lookup for unique node names:

bsd4# snmpget 172.16.1.5 public upsBasicIdentModel.0
enterprises.apc.products.hardware.ups.upsIdent.upsBasicIdent.upsBasicIdentModel.0 = "APC Smart-UPS 700 "

In this example, only the final identifier in the OID, upsBasicIdentModel.0, is given, and the MIB is searched to construct the full OID. This can be particularly helpful if you want to query several objects with a single snmpget.
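The leading-dot rule just described can be summarized in a short sketch. This is an assumption-laden illustration, not the actual NET SNMP source: an OID beginning with a dot is taken as given, while any other OID has the default prefix attached.

```python
# Illustrative sketch of the NET SNMP prefix rule described above
# (not actual NET SNMP code).
DEFAULT_PREFIX = "1.3.6.1.2.1"

def normalize_oid(oid, prefix=DEFAULT_PREFIX):
    """Return the full numeric OID the way the tools interpret the argument."""
    if oid.startswith("."):
        return oid.lstrip(".")      # explicit OID: used as given
    return prefix + "." + oid       # relative OID: prefix is attached
```

Both normalize_oid(".1.3.6.1.2.1.1.1.0") and normalize_oid("1.1.0") produce 1.3.6.1.2.1.1.1.0, mirroring the two equivalent snmpget invocations shown earlier.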
You can also use multiple OIDs in the same snmpget command to retrieve the values of several objects.

7.2.1.2 Configuration and options

Before we look further at the NET SNMP commands, let's discuss configuration and options. For the most part, these tools share the same configuration files and options. (A few exceptions will be noted when appropriate.) The general configuration file is snmp.conf and is typically in the /usr/local/share/snmp, /usr/local/lib/snmp, or $HOME/.snmp directory. This search path can be overridden by setting the SNMPCONFPATH environment variable. Further documentation can be found in the snmp.conf Unix manpage. This manpage also describes environment variables.

One particular concern in configuring the software is the proper installation of MIBs. As noted earlier, use of the name form of OIDs works only if the appropriate MIB[1] is loaded. Devices may have more than one MIB associated with them. In the examples just presented, we have been interacting with an
SNMP-controlled uninterruptible power supply (UPS) manufactured by APC Corp. With this device, we can use the standard default MIB-II defined in RFC 1213. This standard MIB defines objects used by most devices. If you have correctly installed the software, this MIB should be readily available. There are two additional MIBs that may be installed for this particular device. The second is the IETF MIB, which defines a generic UPS. This is the UPS-MIB defined by RFC 1628. The third MIB, PowerNet-MIB, contains APC Corp.'s custom extensions. These last two MIBs came on a diskette with the SNMP adapter for this particular UPS.

[1] When a MIB is loaded, it becomes part of the MIB. Don't say I didn't warn you.

To install these MIBs, the files are first copied to the appropriate directory, /usr/local/share/snmp in this case. (You may also want to rename them so that all your MIB files have consistent names.) Next, the environment variable MIBS is set so the MIBs will be loaded. This can be a colon-delimited list of individual MIB names, but setting MIBS to ALL is usually simpler. On a Windows computer, use the command:

C:\usr\bin> set MIBS=ALL

On a Unix system using the Bash shell, you would use:

export MIBS=ALL

For the C-shell, use:

setenv MIBS ALL

Of course, this may vary depending on the shell you use. Alternately, you can use the environment variable MIBFILES to specify filenames. There is also a command-line option with most of these utilities, -m, to load specific MIBs. If the MIBs are not installed correctly, you will not be able to use names from the MIB, but you can still access objects by their numerical OIDs.

The NET SNMP commands use the same basic syntax and command-line options. For example, the earlier discussion on OID usage applies to each command. This is described in the variables manpage. The manpages for the individual commands are a little sparse. This is because the descriptions of the options have been collected together on the snmpcmd manpage.
Options applicable to a specific command can be displayed by using the -h option.

Let's return to snmpget and look at some of the available options. The -O options control how output is formatted. The default is to print the text form of the OID:

bsd4# snmpget 172.16.1.5 public .1.3.6.1.4.1.318.1.1.1.1.1.1.0
enterprises.apc.products.hardware.ups.upsIdent.upsBasicIdent.upsBasicIdentModel.0 = "APC Smart-UPS 700 "

-On forces the OID to be printed numerically:

bsd4# snmpget -On 172.16.1.5 public .1.3.6.1.4.1.318.1.1.1.1.1.1.0
.1.3.6.1.4.1.318.1.1.1.1.1.1.0 = "APC Smart-UPS 700 "
Sometimes the value of an object will be a cryptic numerical code. By default, a description will be printed. For example:

bsd4# snmpget 172.16.1.5 public ip.ipForwarding.0
ip.ipForwarding.0 = not-forwarding(2)

Here, the actual value of the object is 2. This description can be suppressed with the -Oe option:

bsd4# snmpget -Oe 172.16.1.5 public ip.ipForwarding.0
ip.ipForwarding.0 = 2

This could be useful in eliminating any confusion about the actual stored value, particularly if you are going to use the value subsequently with a SET command.

Use the -Os, -OS, and -Of options to control the amount of information included in the OID. The -Os option displays the final identifier only:

bsd4# snmpget -Os 172.16.1.5 public enterprises.318.1.1.1.1.1.1.0
upsBasicIdentModel.0 = "APC Smart-UPS 700 "

The -OS option is quite similar to -Os except that the name of the MIB is placed before the identifier:

bsd4# snmpget -OS 172.16.1.5 public enterprises.318.1.1.1.1.1.1.0
PowerNet-MIB::upsBasicIdentModel.0 = "APC Smart-UPS 700 "

-Of forces the display of the full OID:

bsd4# snmpget -Of 172.16.1.5 public enterprises.318.1.1.1.1.1.1.0
.iso.org.dod.internet.private.enterprises.apc.products.hardware.ups.upsIdent.upsBasicIdent.upsBasicIdentModel.0 = "APC Smart-UPS 700 "

This leaves no question about what you are looking at.

There are a number of additional options. The -V option will return the program's version. The version of SNMP used can be set with the -v option, either 1, 2c, or 3. The -d option can be used to dump all SNMP packets. You can set the number of retries and timeouts with the -r and -t options. These few options just scratch the surface. The syntax for many of these options has changed recently, so be sure to consult the snmpcmd manpage for more options and details for the version you use.

7.2.1.3 snmpgetnext, snmpwalk, and snmptable

Sometimes you will want to retrieve several related values that are stored together within the agent. Several commands facilitate this sort of retrieval.
The snmpgetnext command is very similar to the snmpget command. But while snmpget returns the value of the specified OID, snmpgetnext returns the value of the next object in the MIB tree:

bsd4# snmpget -Os 172.16.1.5 public sysDescr.0
sysDescr.0 = APC Embedded PowerNet SNMP Agent (SW v2.2, HW vB2, Mod: AP9605,
Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)
bsd4# snmpgetnext -Os 172.16.1.5 public sysDescr.0
sysObjectID.0 = OID: smartUPS700
bsd4# snmpgetnext -Os 172.16.1.5 public sysObjectID.0
sysUpTime.0 = Timeticks: (77951667) 9 days, 0:31:56.67
bsd4# snmpgetnext -Os 172.16.1.5 public sysUpTime.0
sysContact.0 = Sloan

As you can see from this example, snmpgetnext can be used to walk through a sequence of values. Incidentally, this is one of the few cases in which it is OK to omit the trailing 0. This command can be particularly helpful if you don't know the next identifier.

If you want all or most of the values of adjacent objects, the snmpwalk command can be used to retrieve a subtree. For example:

bsd4# snmpwalk 172.16.1.5 public system
system.sysDescr.0 = APC Embedded PowerNet SNMP Agent (SW v2.2, HW vB2, Mod:
AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)
system.sysObjectID.0 = OID: enterprises.apc.products.system.smartUPS.smartUPS700
system.sysUpTime.0 = Timeticks: (78093618) 9 days, 0:55:36.18
system.sysContact.0 = Sloan
system.sysName.0 = Equipment Rack APC
system.sysLocation.0 = Network Laboratory
system.sysServices.0 = 72

Be prepared to be overwhelmed if you don't select a small subtree. You probably wouldn't want to walk the mib-2 or enterprises subtree:

bsd4# snmpwalk 172.16.2.1 public enterprises | wc
    3320   10962  121987

In this example, the enterprises subtree is 3320 lines long. Nonetheless, even with large subtrees this can be helpful to get a quick idea of what is out there. For example, you might pipe output from a subtree you aren't familiar with to head or more so you can skim it.

Some objects are stored as tables. It can be painful to work with these tables one item at a time, and once you have them, they can be almost unreadable. snmptable is designed to address this need.
Here is an example of a small route table from a Cisco 3620 router:

bsd4# snmptable -Cb -Cw 80 172.16.2.1 public ipRouteTable
SNMP table: ip.ipRouteTable

         Dest IfIndex Metric1 Metric2 Metric3 Metric4        NextHop     Type
      0.0.0.0       0       0      -1      -1      -1   205.153.60.2 indirect
   172.16.1.0       2       0      -1      -1      -1     172.16.1.1   direct
   172.16.2.0       3       0      -1      -1      -1     172.16.2.1   direct
   172.16.3.0       4       0      -1      -1      -1     172.16.3.1   direct
 205.153.60.0       1       0      -1      -1      -1 205.153.60.250   direct
 205.153.61.0       0       0      -1      -1      -1   205.153.60.1 indirect
 205.153.62.0       0       0      -1      -1      -1   205.153.60.1 indirect
 205.153.63.0       0       0      -1      -1      -1   205.153.60.1 indirect

SNMP table ip.ipRouteTable, part 2

 Proto Age          Mask Metric5           Info
 local  33       0.0.0.0      -1 .ccitt.nullOID
 local   0 255.255.255.0      -1 .ccitt.nullOID
 local   0 255.255.255.0      -1 .ccitt.nullOID
 local   0 255.255.255.0      -1 .ccitt.nullOID
 local   0 255.255.255.0      -1 .ccitt.nullOID
 local  33 255.255.255.0      -1 .ccitt.nullOID
 local  33 255.255.255.0      -1 .ccitt.nullOID
 local  33 255.255.255.0      -1 .ccitt.nullOID

Even with snmptable, it can be a little tricky to get readable output. In this case, I have used two options to help. -Cb specifies a brief header. -Cw 80 defines a maximum column width of 80 characters, resulting in a multipart table. You can also specify the column delimiter with the -Cf option, and you can suppress headers altogether with the -CH option. (There are also a snmpbulkget and a snmpbulkwalk if you are using SNMPv2.)

7.2.1.4 snmpset

The snmpset command is used to change the value of objects by sending SET_REQUEST messages. The syntax of this command is a little different from that of the previous commands since you must also specify a value and a type for the value. You will also need to use a community string that provides read/write access:

bsd4# snmpset 172.16.1.5 private sysContact.0 s "el Zorro"
system.sysContact.0 = el Zorro

In this example, the system contact was set using a quote-delimited string. Legitimate types include integers (i), strings (s), hex strings (x), decimal strings (d), null objects (n), object IDs (o), time ticks (t), and IP addresses (a), among others.

People often think of SNMP as being appropriate only for collecting information, not as a general configuration tool, since SNMP only allows objects to be retrieved or set. However, many objects are configuration parameters that control the operation of the system. Moreover, agents can react to changes made to objects by running scripts, and so on. With the appropriate agent, virtually any action can be taken.[2] For example, you could change entries in an IP routing table, enable or disable a second interface on a device, or enable or disable IP forwarding. With an SNMP-controlled UPS, you could shut off power to a device. What you can do, and will want to do, will depend on both the device and the context. You will need to study the documentation for the device and the applicable MIBs to know what is possible on a case-by-case basis.
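The argument convention snmpset uses, each object given as an OID, a type letter, and a value, can be sketched as follows. The helper name build_set_args and the table of type names are my own illustrative inventions, not part of NET SNMP; the type letters themselves are the ones listed above.

```python
# Illustrative sketch of the snmpset (oid, type letter, value) convention.
# The type letters match the list in the text; everything else is hypothetical.
SNMPSET_TYPES = {
    "i": "integer",
    "s": "string",
    "x": "hex string",
    "d": "decimal string",
    "n": "null object",
    "o": "object ID",
    "t": "time ticks",
    "a": "IP address",
}

def build_set_args(*triples):
    """Turn (oid, type, value) triples into an snmpset-style argument list."""
    args = []
    for oid, type_letter, value in triples:
        if type_letter not in SNMPSET_TYPES:
            raise ValueError("unknown snmpset type: " + type_letter)
        args.extend([oid, type_letter, str(value)])
    return args
```

For instance, build_set_args(("sysContact.0", "s", "el Zorro")) reproduces the tail of the snmpset command shown above, and passing several triples mirrors setting several objects in one command.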
[2] In an extremely interesting interview of John Romkey by Carl Malamud on this topic, Romkey describes an SNMP-controlled toaster. The interview was originally on the Internet radio program Geek of the Week (May 29, 1993). At one time, it was available on audio tape from O'Reilly & Associates (ISBN 1-56592-997-7). Visit http://town.hall.org/radio/Geek and follow the link to Romkey.

7.2.1.5 snmptranslate

In all the preceding examples, I have specified an OID. An obvious question is how I knew the OID. Available OIDs are determined by the design of the agent and are described by its MIB. There are several different approaches you can take to discover the contents of a MIB. The most direct approach is to read the MIB. This is not a difficult task if you don't insist on understanding every detail. You'll be primarily interested in the object definitions.

Here is an example of the definition of the system contact (sysContact) taken from MIB-II (RFC 1213):

sysContact OBJECT-TYPE
    SYNTAX  DisplayString (SIZE (0..255))
    ACCESS  read-write
    STATUS  mandatory
    DESCRIPTION
        "The textual identification of the contact person for this
        managed node, together with information on how to contact
        this person."
    ::= { system 4 }

The object name is in the first line. The next line says the object's type is a string and specifies its maximum size. The third line tells us that this object can be read or written. In addition to read-write, an object may be designated read-only or not-accessible. While some objects may not be implemented in every agent, this object is required, as shown in the next line. Next comes the description. The last line tells where the object fits into the MIB tree: it is the fourth node in the system group.

With an enterprise MIB, there is usually some additional documentation that explains what is available. With standard MIBs like this one, numerous descriptions in books on SNMP describe each value in detail. These can be very helpful since they are usually accompanied by tables or diagrams that can be scanned quickly. See Appendix B for specific suggestions.

NET SNMP provides two tools that can be helpful. We have already discussed snmpwalk. Another useful tool is snmptranslate. This command is designed to present a MIB in a human-readable form. snmptranslate can be used in a number of different ways. First, it can be used to translate between the text and numeric forms of an object. For example:

bsd4# snmptranslate system.sysContact.0
.1.3.6.1.2.1.1.4.0

Other output forms are controlled by options such as -On and -Ofn, as shown in the next two examples:

bsd4# snmptranslate -On .1.3.6.1.2.1.1.4.0
system.sysContact.0
bsd4# snmptranslate -Ofn system.sysContact.0
.iso.org.dod.internet.mgmt.mib-2.system.sysContact.0

snmptranslate can be a little particular about prefixes. In the previous example, sysContact.0 alone would not have been sufficient. You can get around this with the -IR option. (This is usually the default for most NET SNMP commands.)

bsd4# snmptranslate -IR sysContact.0
.1.3.6.1.2.1.1.4.0

You can also use regular expression matching.
For example:

bsd4# snmptranslate -On -Ib 'sys.*ime'
system.sysUpTime

Notice the use of single quotes. (This option can return a few surprises at times as well.)

You get extended information by using the -Td option:

bsd4# snmptranslate -Td system.sysContact
.1.3.6.1.2.1.1.4
sysContact OBJECT-TYPE
  -- FROM SNMPv2-MIB, RFC1213-MIB
  -- TEXTUAL CONVENTION DisplayString
  SYNTAX  OCTET STRING (0..255)
  DISPLAY-HINT "255a"
  MAX-ACCESS read-write
  STATUS current
  DESCRIPTION "The textual identification of the contact person for
              this managed node, together with information on how to
              contact this person. If no contact information is known,
              the value is the zero-length string."
::= { iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1) 4 }

This is basically what we saw in the MIB but in a little more detail. (By the way, the lines starting with -- are just comments embedded in the MIB.)

We can use snmptranslate to generate a tree representation of a subtree by using the -Tp option. For example:

bsd4# snmptranslate -Tp system
+--system(1)
   |
   +-- -R-- String    sysDescr(1)
   |        Textual Convention: DisplayString
   |        Size: 0..255
   +-- -R-- ObjID     sysObjectID(2)
   +-- -R-- TimeTicks sysUpTime(3)
   +-- -RW- String    sysContact(4)
   |        Textual Convention: DisplayString
   |        Size: 0..255
   +-- -RW- String    sysName(5)
   |        Textual Convention: DisplayString
   |        Size: 0..255
   +-- -RW- String    sysLocation(6)
   |        Textual Convention: DisplayString
   |        Size: 0..255
   +-- -R-- Integer   sysServices(7)
   +-- -R-- TimeTicks sysORLastChange(8)
   |        Textual Convention: TimeStamp
   |
   +--sysORTable(9)
      |
      +--sysOREntry(1)
         |
         +-- ---- Integer   sysORIndex(1)
         +-- -R-- ObjID     sysORID(2)
         +-- -R-- String    sysORDescr(3)
         |        Textual Convention: DisplayString
         |        Size: 0..255
         +-- -R-- TimeTicks sysORUpTime(4)
                  Textual Convention: TimeStamp

Don't forget the final argument or you'll get the entire MIB. There are also options to print all objects in labeled form (-Tl), numeric form (-To), or symbolic form (-Tt), but frankly, I've never found much use for these. These options simply give too much data. One last word of warning: if you have trouble using snmptranslate, the first thing to check is whether your MIBs are correctly loaded.

7.2.1.6 snmpnetstat

snmpnetstat is an SNMP analog to netstat. Using SNMP, it will provide netstat-like information from remote systems. Many of the major options are the same as with netstat. A few examples will show how this tool is used.
The -an option will show the sockets in open mode:

bsd4# snmpnetstat 172.16.2.234 public -an
Active Internet (tcp) Connections (including servers)
Proto  Local Address        Foreign Address      (state)
tcp    *.ftp                *.*                  LISTEN
tcp    *.telnet             *.*                  LISTEN
tcp    *.smtp               *.*                  LISTEN
tcp    *.http               *.*                  LISTEN
tcp    *.sunrpc             *.*                  LISTEN
tcp    *.printer            *.*                  LISTEN
tcp    *.659                *.*                  LISTEN
tcp    *.680                *.*                  LISTEN
tcp    *.685                *.*                  LISTEN
tcp    *.690                *.*                  LISTEN
tcp    *.1024               *.*                  LISTEN
tcp    172.16.2.234.telnet  sloan.1135           ESTABLISHED
Active Internet (udp) Connections
Proto  Local Address
udp    *.sunrpc
udp    *.snmp
udp    *.who
udp    *.657
udp    *.668
udp    *.678
udp    *.683
udp    *.688
udp    *.1024
udp    *.nfsd

Notice that with snmpnetstat, the options are listed at the end of the command.

The -r option gives the route table. Here is a route table from a Cisco 3620 router:

bsd4# snmpnetstat 172.16.2.1 public -rn
Routing tables
Destination   Gateway         Flags  Interface
default       205.153.60.2    UG     if0
172.16.1/24   172.16.1.1      U      Ethernet0/1
172.16.2/24   172.16.2.1      U      Ethernet0/2
172.16.3/24   172.16.3.1      U      Ethernet0/3
205.153.60    205.153.60.250  U      Ethernet0/0
205.153.61    205.153.60.1    UG     if0
205.153.62    205.153.60.1    UG     if0
205.153.63    205.153.60.1    UG     if0

In each of these examples, the -n option is used to suppress name resolution.

Here are the packet counts for the interfaces on the same router:

bsd4# snmpnetstat 172.16.2.1 public -i
Name         Mtu   Network      Address         Ipkts   Ierrs  Opkts   Oerrs  Queue
Ethernet0/1  1500  172.16.1/24  172.16.1.1      219805  0      103373  0      0
Ethernet0/0  1500  205.153.60   205.153.60.250  406485  0      194035  0      0
Ethernet0/2  1500  172.16.2/24  172.16.2.1      177489  1      231011  0      0
Ethernet0/3  1500  172.16.3/24  172.16.3.1      18175   0      97954   0      0
Null0        1500                               0       0      0       0      0

As with netstat, the -i option is used.
As a final example, the -s option is used with the -P option to get general statistics with output restricted to a single protocol, in this case IP:

bsd4# snmpnetstat 172.16.2.1 public -s -P ip
ip:
        533220 total datagrams received
        0 datagrams with header errors
        0 datagrams with an invalid destination address
        231583 datagrams forwarded
        0 datagrams with unknown protocol
        0 datagrams discarded
        301288 datagrams delivered
        9924 output datagram requests
        67 output datagrams discarded
        4 datagrams with no route
        0 fragments received
        0 datagrams reassembled
        0 reassembly failures
        0 datagrams fragmented
        0 fragmentation failures
        0 fragments created

This should all seem very familiar to netstat users.

7.2.1.7 snmpstatus

The snmpstatus command is a quick way to get a few pieces of basic information from an agent:

bsd4# snmpstatus 172.16.2.1 public
[172.16.2.1]=>[Cisco Internetwork Operating System Software
IOS (tm) 3600 Software (C3620-IO3-M), Version 12.0(7)T, RELEASE SOFTWARE (fc2)
Copyright (c) 1986-1999 by Cisco Systems, Inc.
Compiled Wed 08-Dec-99 10:08 by phanguye] Up: 11 days, 1:31:43.66
Interfaces: 5, Recv/Trans packets: 1113346/629074 | IP: 533415/9933

It gets the IP address, text description, time since the system was booted, total received and transmitted packets, and total received and transmitted IP packets.

7.2.1.8 Agents and traps

In addition to management software, NET SNMP also includes the agent snmpd. As with any agent, snmpd responds to SNMP messages, providing basic management for the host on which it is run. snmpd uses the snmpd.conf configuration file (not to be confused with snmp.conf, the configuration file for the utilities). snmpd's functionality will depend, in part, on what is enabled by its configuration file. The distribution comes with the MIB UCD-SNMP-MIB.txt and the file EXAMPLE.conf, an example configuration file that is fairly well documented.
The manpage for snmpd.conf provides additional information.

At a minimum, you'll want to edit the security entries. The com2sec entry is used to set the community names for a host or network. The group entry defines an access class. For example, consider these three lines from a configuration file:

com2sec local 172.16.2.236 private
...
group MyRWGroup v1 local
...
access MyRWGroup "" any noauth prefix all all none

The first line sets the community string to private for the single host 172.16.2.236. The last two lines establish that this host is using SNMPv1 and has both read and write privileges.

Even without further editing of the configuration file, the agent provides a number of useful pieces of information. These include things like information on processes (prTable), memory usage (memory), processor load (laTable), and disk usage (dskTable). For example, here is the disk information from a Linux system:

bsd4# snmpwalk 172.16.2.234 public dskTable
enterprises.ucdavis.dskTable.dskEntry.dskIndex.1 = 1
enterprises.ucdavis.dskTable.dskEntry.dskPath.1 = /
enterprises.ucdavis.dskTable.dskEntry.dskDevice.1 = /dev/sda1
enterprises.ucdavis.dskTable.dskEntry.dskMinimum.1 = 10000
enterprises.ucdavis.dskTable.dskEntry.dskMinPercent.1 = -1
enterprises.ucdavis.dskTable.dskEntry.dskTotal.1 = 202182
enterprises.ucdavis.dskTable.dskEntry.dskAvail.1 = 133245
enterprises.ucdavis.dskTable.dskEntry.dskUsed.1 = 58497
enterprises.ucdavis.dskTable.dskEntry.dskPercent.1 = 31
enterprises.ucdavis.dskTable.dskEntry.dskErrorFlag.1 = 0
enterprises.ucdavis.dskTable.dskEntry.dskErrorMsg.1 =

Most of the entries are just what you would guess. The dskPath entry says we are looking at the root partition. The dskDevice entry gives the path to the partition being examined, /dev/sda1. The next two items are parameters for triggering error messages. The dskTotal entry is the size of the partition in kilobytes; this partition is about 202 MB. The next two entries, dskAvail and dskUsed, give the amounts of available and used space; 31% of the disk is in use. Here is the output from df for the same system:

lnx1# df -k /
Filesystem  1k-blocks   Used    Available  Use%  Mounted on
/dev/sda1   202182      58497   133245     31%   /

The last two entries are objects used to signal errors. By editing the configuration file, you can get information on other partitions.
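Notice that the 31% figure matches df's Use% rather than the raw ratio of dskUsed to dskTotal (which would be about 29%): the percentage appears to be computed against used plus available blocks, excluding the filesystem's reserved space, and rounded up to the next whole percent. A quick sketch of the arithmetic (an illustration of the relationship, not NET SNMP's or df's actual code):

```python
import math

def use_percent(used_kb, avail_kb):
    """Percentage in use as df reports it: used blocks over
    used + available blocks, rounded up to a whole percent."""
    return math.ceil(100 * used_kb / (used_kb + avail_kb))

# Values from the dskTable walk above (dskUsed and dskAvail).
print(use_percent(58497, 133245))   # 31, matching dskPercent and df's Use%
```

The raw ratio 58497/202182 is only about 28.9%; the gap is the space the filesystem reserves for root.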
Brief descriptions of each object are included within the MIB, UCD-SNMP-MIB.txt. Directions for changing the configuration file are given in the example file.

It is also possible to extend the agent. This will allow you to run external programs or scripts. The output, in its simplest form, is limited to a single line and an exit code that can be retrieved as MIB objects. For example, the following line could be added to the configuration file:

exec datetest /bin/date -j -u

Here, exec is a keyword, datetest is a label, /bin/date is the command, and the rest of the line is treated as a set of arguments and parameters to the command. The -j option prevents an attempt to set the date, and -u specifies Coordinated Universal Time. The command is run by the agent each time you try to access the object. For example, snmpwalk could be used to retrieve the following information:

bsd4# snmpwalk 172.16.2.236 private extTable
enterprises.ucdavis.extTable.extEntry.extIndex.1 = 1
enterprises.ucdavis.extTable.extEntry.extNames.1 = datetest
enterprises.ucdavis.extTable.extEntry.extCommand.1 = /bin/date -j -u
enterprises.ucdavis.extTable.extEntry.extResult.1 = 0
enterprises.ucdavis.extTable.extEntry.extOutput.1 = Mon Jun 26 14:10:50 GMT 2000
enterprises.ucdavis.extTable.extEntry.extErrFix.1 = 0
enterprises.ucdavis.extTable.extEntry.extErrFixCmd.1 =

You should be able to recognize the label, the command with its options, the exit code, and the output in this table. The command will be run each time you retrieve a value from this table.

Running snmpd on a system is straightforward. As root, type snmpd, and it will immediately fork and return the prompt. There are several options you can use. If you don't want it to fork, you can use the -f option. This is useful with options that return additional runtime information. I've found that it is also useful when testing the configuration file. I'll start snmpd in one window and test the configuration in another. When I'm ready to change configurations, I jump back to the original window and kill and restart the process. Of course, you can always use ps to look up the process and then send it a -HUP signal. Or you could use snmpset to set the OID versionUpdateConfig to 1 to force a reload of the configuration file:

bsd4# snmpset 172.16.2.236 private versionUpdateConfig.0 i 1
enterprises.ucdavis.version.versionUpdateConfig.0 = 1

Take your pick, but you must reload the file before changes will take effect.

It is possible to use snmpd options in a couple of ways to trace packet exchanges. You can use the options -f, -L, and -d, respectively, to prevent forking, to redirect messages to standard output, and to dump packets.
Here is an example:

bsd4# snmpd -f -L -d
UCD-SNMP version 4.1.2
Received 49 bytes from 205.153.63.30:1055
0000: 30 82 00 2D 02 01 00 04  06 70 75 62 6C 69 63 A0    0..-.....public.
0016: 82 00 1E 02 02 0B 78 02  01 00 02 01 00 30 82 00    ......x......0..
0032: 10 30 82 00 0C 06 08 2B  06 01 02 01 01 06 00 05    .0.....+........
0048: 00                                                  .

Received SNMP packet(s) from 205.153.63.30
  GET message
    -- system.sysLocation.0
    >> system.sysLocation.0 = 303 Laura Lander Hall
Sending 70 bytes to 205.153.63.30:1055
0000: 30 82 00 42 02 01 00 04  06 70 75 62 6C 69 63 A2    0..B.....public.
0016: 82 00 33 02 02 0B 78 02  01 00 02 01 00 30 82 00    ..3...x......0..
0032: 25 30 82 00 21 06 08 2B  06 01 02 01 01 06 00 04    %0..!..+........
0048: 15 33 30 33 20 4C 61 75  72 61 20 4C 61 6E 64 65    .303 Laura Lande
0064: 72 20 48 61 6C 6C                                   r Hall

This is probably more information than you want. As previously noted, you probably don't want to delve into the hex. You can replace the -d option with the -V option to get a verbose display but without the dump:

bsd4# snmpd -f -L -V
UCD-SNMP version 4.1.2
Received SNMP packet(s) from 205.153.63.30
  GET message
    -- system.sysLocation.0
    >> system.sysLocation.0 = 303 Laura Lander Hall
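If you ever do need to delve into the hex, it helps to know that the first dump above is a complete SNMPv1 GET request in BER encoding. The sketch below decodes just that packet's fields; it is a teaching aid handling only what this packet needs (definite short- and long-form lengths, single-byte OID subidentifiers), not a general SNMP parser:

```python
# The 49 request bytes from the snmpd -d dump above.
REQUEST = bytes.fromhex(
    "3082002D020100"        # SEQUENCE, then version INTEGER 0 (SNMPv1)
    "04067075626C6963"      # OCTET STRING "public" (the community)
    "A082001E"              # context tag 0xA0: GetRequest PDU
    "02020B78020100020100"  # request-id 0x0B78, error-status 0, error-index 0
    "308200103082000C"      # variable-binding list, one varbind
    "06082B06010201010600"  # OID 1.3.6.1.2.1.1.6.0 (sysLocation.0)
    "0500"                  # NULL placeholder value
)

def read_tlv(buf, pos):
    """Return (tag, value bytes, next position) for one BER element."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:               # long form: low bits give length-octet count
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length

def decode_oid(value):
    """Decode a BER OID (single-byte subidentifiers only, enough here)."""
    subids = [value[0] // 40, value[0] % 40] + list(value[1:])
    return ".".join(str(s) for s in subids)

# message ::= SEQUENCE { version, community, pdu }
_, msg, _ = read_tlv(REQUEST, 0)
_, version, pos = read_tlv(msg, 0)      # INTEGER 0 -> SNMPv1
_, community, pos = read_tlv(msg, pos)  # OCTET STRING "public"
pdu_tag, pdu, _ = read_tlv(msg, pos)    # 0xA0 -> GET_REQUEST
_, req_id, pos = read_tlv(pdu, 0)
_, _, pos = read_tlv(pdu, pos)          # error-status
_, _, pos = read_tlv(pdu, pos)          # error-index
_, vblist, _ = read_tlv(pdu, pos)
_, vb, _ = read_tlv(vblist, 0)
_, oid, _ = read_tlv(vb, 0)

print(community.decode())               # public
print(decode_oid(oid))                  # 1.3.6.1.2.1.1.6.0 (sysLocation.0)
```

This recovers exactly what snmpd's own summary printed: a GET for system.sysLocation.0 with community public.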
This should give you an adequate idea of what is going on for most troubleshooting needs. See the manpage for other options.

NET SNMP also includes two applications for dealing with traps. snmptrapd starts a daemon to receive and respond to traps. It uses the configuration file snmptrapd.conf. The snmptrap application is used to generate traps. While these can be useful in troubleshooting, their use is arcane, to say the least. You will need to edit the appropriate MIB files before using them. There are simpler ways to test traps.

7.2.2 scotty

scotty was introduced in Chapter 6. Now that we've talked a little about SNMP, here are a few more examples of using scotty. These are based on examples given in one of the README files that comes with scotty. Since you will have to install scotty to get tkined, it is helpful to know a few scotty commands to test your setup. These scotty commands also provide a quick-and-dirty way of getting a few pieces of information.

To use SNMP with scotty, you must first establish an SNMP session:

lnx1# scotty
% set s [snmp session -address 172.16.1.5 -community private]
snmp0

Once you have a session, you can retrieve a single object, multiple objects, the successor of an object, or subtrees.
Here are some examples:

% $s get sysDescr.0
{1.3.6.1.2.1.1.1.0 {OCTET STRING} {APC Embedded PowerNet SNMP Agent (SW v2.2,
HW vB2, Mod: AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)}}
% $s get "sysDescr.0 sysContact.0"
{1.3.6.1.2.1.1.1.0 {OCTET STRING} {APC Embedded PowerNet SNMP Agent (SW v2.2,
HW vB2, Mod: AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)}}
{1.3.6.1.2.1.1.4.0 {OCTET STRING} {Sloan <jsloan@lander.edu>}}
% $s getnext sysUpTime.0
{1.3.6.1.2.1.1.4.0 {OCTET STRING} {Sloan <jsloan@lander.edu>}}
% $s getnext [mib successor system]
{1.3.6.1.2.1.1.1.0 {OCTET STRING} {APC Embedded PowerNet SNMP Agent (SW v2.2,
HW vB2, Mod: AP9605, Mfg 08/10/96, SN: WA9632270847, Agent Loader v1.0)}}
{1.3.6.1.2.1.1.2.0 {OBJECT IDENTIFIER} PowerNet-MIB!smartUPS700}
{1.3.6.1.2.1.1.3.0 TimeTicks {4d 22:27:07.42}}
{1.3.6.1.2.1.1.4.0 {OCTET STRING} {Joe Sloan}}
{1.3.6.1.2.1.1.5.0 {OCTET STRING} {APC UPS}}
{1.3.6.1.2.1.1.6.0 {OCTET STRING} {214 Laura Lander Hall, Equipment Rack}}
{1.3.6.1.2.1.1.7.0 INTEGER 72}
{1.3.6.1.2.1.2.1.0 INTEGER 1}

Once you know the syntax, it is straightforward to change the value of objects, as can be seen here:

% $s set [list [list sysContact.0 "OCTET STRING" "Joe Sloan"] ]
{1.3.6.1.2.1.1.4.0 {OCTET STRING} {Joe Sloan}}
% $s get sysContact.0
{1.3.6.1.2.1.1.4.0 {OCTET STRING} {Joe Sloan}}

Notice that after the object is set, I have retrieved it to verify the operation. I strongly recommend doing this each time you change something.

If you aren't familiar with Tcl, defining a trap handler will seem arcane. Here is an example:

% proc traphandler {ip list} {
    set msg "SNMP trap from $ip:"
    foreach vb $list {
        append msg " [mib name [lindex $vb 0]]=\"[lindex $vb 2]\""
    }
    puts stderr $msg
}
% set t [snmp session -port 162]
snmp1
% $t bind "" trap {traphandler %A "%V"}

Once the trap handler is defined, we can test it by interrupting power to the UPS, that is, by unplugging the UPS.[3] This test generated the following trap messages:

[3] This is OK with this particular UPS. In fact, it's suggested in the documentation. However, you don't want to do this with just any UPS. While UPSs are designed to deal with power interruptions, some are not necessarily designed to deal with the ground being removed, as happens when you unplug a UPS.

% SNMP trap from 172.16.1.5: sysUpTime.0="2d 21:15:50.44"
snmpTrapOID.0="PowerNet-MIB!upsOnBattery"
smartUPS700="57:41:52:4E:49:4E:47:3A:20:54:68:65:20:55:50:53:20:6F:6E:20:73:
65:72:69:61:6C:20:70:6F:72:74:20:31:20:69:73:20:6F:6E:20:62:61:74:74:65:72:
79:20:62:61:63:6B:75:70:20:70:6F:77:65:72:2E"
snmpTrapEnterprise.0="apc"
SNMP trap from 172.16.1.5: sysUpTime.0="2d 21:15:50.55"
snmpTrapOID.0="1.3.6.1.2.1.33.2.0.1" upsEstimatedMinutesRemaining="31"
upsSecondsOnBattery="0" upsConfigLowBattTime="2"
snmpTrapEnterprise.0="upsTraps"
SNMP trap from 172.16.1.5: sysUpTime.0="2d 21:15:50.66"
snmpTrapOID.0="1.3.6.1.2.1.33.2.0.3" upsAlarmId="12"
upsAlarmDescr="UPS-MIB!upsAlarmInputBad" snmpTrapEnterprise.0="upsTraps"
SNMP trap from 172.16.1.5: sysUpTime.0="2d 21:15:55.27"
snmpTrapOID.0="1.3.6.1.2.1.33.2.0.4" upsAlarmId="11"
upsAlarmDescr="UPS-MIB!upsAlarmOnBattery" snmpTrapEnterprise.0="upsTraps"
SNMP trap from 172.16.1.5: sysUpTime.0="2d 21:15:55.38"
snmpTrapOID.0="1.3.6.1.2.1.33.2.0.4"
upsAlarmId="12" upsAlarmDescr="UPS-MIB!upsAlarmInputBad"
snmpTrapEnterprise.0="upsTraps"
SNMP trap from 172.16.1.5: sysUpTime.0="2d 21:15:55.50"
snmpTrapOID.0="PowerNet-MIB!powerRestored"
smartUPS700="49:4E:46:4F:52:4D:41:54:49:4F:4E:3A:20:4E:6F:72:6D:61:6C:20:70:
6F:77:65:72:20:68:61:73:20:62:65:65:6E:20:72:65:73:74:6F:72:65:64
:20:74:6F:20:74:68:65:20:55:50:53:20:6F:6E:20:73:65:72:69:61:6C:20:70:6F:72:
74:20:31:2E" snmpTrapEnterprise.0="apc"

From this example, you can see the sequence of traps as power is lost and restored. Most messages should be self-explanatory (the colon-separated octet strings are hex-encoded ASCII messages from the UPS), and all are explained in the UPS documentation.

Generating traps is much simpler. In this example, a session is started and a trap is sent to that session:

% set u [snmp session -port 162 -address 172.16.2.234]
snmp2
% $u trap coldStart ""

You can terminate a session without exiting scotty with the destroy command:

% $u destroy

If you are thinking about writing Tcl scripts, this should give you an idea of the power of the tnm extensions supplied by scotty. If you aren't familiar with the syntax of Tcl, these examples will seem fairly opaque but should still give you an idea of what is possible. You could try them on your system as presented here, but if you are really interested in doing this sort of thing, you'll probably want to learn some Tcl first. Several sources of information are given in Appendix B.

7.2.3 tkined

tkined was introduced in the last chapter. Here we will look at how it can be used to retrieve information and do basic monitoring. tkined is a versatile tool, and only some of the more basic features will be described here. This should be enough to get you started and help you decide if tkined is the right tool for your needs. A small test network is shown in Figure 7-3. (We will be looking at this network, along with minor variations, in the following examples.)

Figure 7-3. Demo network
7.2.3.1 ICMP monitoring

ICMP monitoring periodically sends an ECHO_REQUEST packet to a remote device to see if the connection is viable. (We've seen examples of this before.) SNMP monitoring is superior when available since it can be used to retrieve additional information. But if the device doesn't support SNMP, or if you don't have SNMP access, ICMP monitoring may be your only option. Your ISP, for example, probably won't give you SNMP access to their routers even though you depend on them.

To use ICMP monitoring with tkined, use Tools → IP-Monitor. This will add an IP-Monitor menu to the menu bar. Next, select a device on your map by clicking on the Select tool and then the device's icon. Now, use IP-Monitor → Check Reachability. (See Figure 7-4.) Since the idea of monitoring is to alert you to problems, if your device is reachable, you shouldn't see any changes. If the device is nearby and it won't create any problems, you can test your setup by disconnecting the device from the network. The device's icon should turn red and start flashing. A message will also be displayed on the map under the icon.

Figure 7-4. IP-Monitor menu

If the device is in a collapsed group, the icon for the group will flash. Thus, you don't have to have an icon displayed for every device you are monitoring. You could start a monitor on each device of interest, put related devices into a group, and collapse the group. By creating a number of groups, all collapsed, you can monitor a large number of machines from a small, uncluttered map and still be able to drill down on a problem.

When you reconnect the device, the icon should turn black and then stop flashing. It may take a minute to see these changes. By default, the system polls devices every 60 seconds. You can check which devices are being monitored by selecting IP-Monitor → Monitor Job Info. A pop-up box will display a list of the monitors that are running.
If you want to change parameters, select IP-Monitor → Modify Monitor Job. This will bring up a box displaying a list of running jobs. Select the job of interest by clicking on it, then click on the Modify button. The box listing jobs will be replaced by a box giving the job's parameters, as shown in Figure 7-5.

Figure 7-5. Monitor job parameters

You can reset the polling rate by changing the Intervaltime field. The next two radio buttons allow you to suspend or restart a suspended job. The two Threshold fields allow you to establish limits on response times. If your system normally responds within, say, 100 ms, you could set Rising Threshold to 200 ms. If the quality of the connection degrades so that the response time rises above 200 ms, the system will alert you. The Threshold Action buttons allow you to say how you want to be notified when thresholds are crossed. Finally, you can commit to the changes, terminate the job, or cancel any changes.

If you are really interested in tracking how response time is changing, you can select IP-Monitor → Round Trip Time. A small box will appear on the map, partially obscuring the icon. (You can drag it to a more convenient location.) This is called a stripchart, and it plots round-trip times against time. You can change parameters using IP-Monitor → Modify Monitor Job. You can change labels and scale by right-clicking on the chart.

Figure 7-6 shows two stripcharts. The chart in the upper right really isn't very revealing since the device is on the local network and everything is working OK. The latest round-trip time is displayed below the stripchart and is updated dynamically. A device does not have to be integrated into the map. The site www.infoave.net, an ISP at the bottom of the figure, has been added to the map and is being monitored. This icon is partially obscured by a slider used to adjust the scale. Other ICMP monitoring options, shown in Figure 7-4, are available.

Figure 7-6. Map with stripcharts
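The rising-threshold behavior described above is easy to model: an alert fires on the poll where the round-trip time first rises above the configured limit. A minimal sketch of that logic (illustrative only; tkined's actual implementation differs, and the 200 ms value is just the example from the text):

```python
RISING_THRESHOLD_MS = 200   # example value from the text

def rising_alerts(samples_ms, threshold_ms=RISING_THRESHOLD_MS):
    """Return the indices of polls that crossed the rising threshold:
    the previous sample was at or below it, the current one is above."""
    alerts = []
    for i in range(1, len(samples_ms)):
        if samples_ms[i - 1] <= threshold_ms < samples_ms[i]:
            alerts.append(i)
    return alerts

# A link that normally answers in ~100 ms, then degrades past 200 ms.
rtts = [95, 102, 98, 110, 240, 260, 150, 90]
print(rising_alerts(rtts))    # [4]: the first poll above 200 ms
```

Only the crossing triggers an alert; polls that stay above the threshold do not fire again until the value drops back below it.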
7.2.3.2 SNMP traps

Before you begin using tkined for SNMP-based monitoring, you want to make sure the appropriate MIBs are installed. These will usually be located in a common mibs directory under the tnm library directory, e.g., /usr/lib/tnm2.1.10/mibs or /usr/local/lib/tnm2.1.10/mibs. You will want to copy any enterprise MIB you plan to use to that directory. Next, you should verify that the files are compatible. Try loading them into scotty with the mib load command, e.g., mib load toaster.mib. If the file loads without comment, you are probably OK. Finally, you will want to edit the init.tcl file to automatically load the MIBs. Ideally, you will have a site-specific version of the file for changes, but you can edit the standard default file. You will want to add a line that looks something like lappend tnm(mibs) toaster.mib. You are now ready to start tkined and do SNMP-based monitoring.

The first step is to go to Tools → SNMP-Monitor. This will add the SNMP-Monitor menu to the menu bar. This menu is shown in Figure 7-7. To receive traps, select SNMP-Monitor → Trap Sink. A pop-up box will give you the option of listening to or ignoring traps. Select the Listen button and click on Accept to start receiving traps. At this point, the station is configured to receive traps.

Figure 7-7. SNMP-Monitor menu
To test that this is really working, we need to generate some traps for the system to receive. If you are a scotty user, you might use the code presented in the last section. For this example, a UPS that was being monitored was unplugged. Regardless of how the trap is generated, tkined responds in the same way. The device's icon blinks, a message is written on the map, and a new window, shown in Figure 7-8, is displayed with the trap messages generated by the UPS. Note that the duration of this problem was under 5 seconds. It is likely this event would have been missed with polling.

Figure 7-8. SNMP monitor report

7.2.3.3 Examining MIBs

Tools → SNMP Tree provides one way of examining MIBs. Or, if you prefer, you can use Tools → SNMP-Browser. The SNMP Tree command displays a graphical representation of a subtree of the MIB. This is shown in Figure 7-9.

Figure 7-9. SNMP tree
Menu items allow you to focus in on a particular subtree. For example, the MIB-2 menu shows the various subtrees under the MIB-2 node. The Enterprises menu shows the various enterprise MIBs that have been loaded. You simply select the MIB of interest from the menu, and it will be displayed in the window. You can click on an item in the tree, and a pop-up window will give you the option of displaying a description of the item, retrieving its value, changing its value, or displaying just the subtree of the node in question. Of course, you will need to select a system before you can retrieve system-specific information.

The SNMP-Browser option provides much the same functionality but displays information in a different format. If you select SNMP-Browser → MIB Browser, you will be given a text box listing the nodes below the internet node (.1.3.6.1) of the MIB tree. If you click on any of these nodes, the text box will be replaced with one listing the nodes under the selected node. In this manner, you can move down the MIB tree. After the first box, you will also be given the option to move up the tree or, if appropriate, to the previous or next node in the subtree. If you reach a leaf, you will be given a description of the object, as shown in Figure 7-10. If the object can be changed, you will be given that choice as well.

Figure 7-10. MIB Browser
You are also given the option to walk a subtree. This option will attempt to retrieve all the object values for leaves under the current node. This can take quite a while, depending on where you are in the tree. Figure 7-11 shows the last few entries under ip. Most of the values have scrolled off the window.

Figure 7-11. Walk for IP

SNMP Tree provides a nice visual display, but it can be a little easier to move around with the MIB Browser. Take your choice.

7.2.3.4 Monitoring SNMP objects

In much the same way you monitor devices, you can monitor SNMP objects. First, you will need to identify the object you want to monitor. This can be done using the techniques just described. With the MIB Browser, you can select monitoring at a leaf. Alternately, you can select SNMP-Monitor → Monitor Variable. This is a little easier if you already know the name of the object you want to monitor. A pop-up box will request the name of the object to monitor. Type in the name of the object and click on Start. (Don't forget to select a system first.) A stripchart will be created on your map displaying the values of the monitored object.

7.2.3.5 Other commands

Tools → SNMP Trouble installs the SNMP-Trouble menu. The name is somewhat misleading. Generally, the SNMP-Trouble menu provides quick ways to collect common, useful information. First, it can be used to locate SNMP-aware devices on your network. By selecting multiple devices on the map and then choosing SNMP-Trouble → SNMP Devices, tkined will poll each of the devices. The output for the test network is shown in Figure 7-12.

Figure 7-12. SNMP devices
Please note that noResponse does not necessarily mean that the device is down or that it doesn't support SNMP. For example, it may simply mean that you are not using the correct community string.

The SNMP-Trouble menu also provides menu options that will return some of the more commonly needed pieces of information, such as system information, ARP tables, IP routing tables, interface information, or TCP connections. A few of these reports are shown in Figure 7-13.

Figure 7-13. SNMP-Trouble reports

7.2.3.6 Caveats

tkined is a fine program, but it does have a couple of problems. As noted in the last chapter, it will let you exit without saving changes. Another problem is that it doesn't recover well from one particular type of user error. When you are through with a window or display, you should shrink the window rather than closing it. If you close the window, tkined will not automatically reopen it for you. When you later use a command that needs the closed window, it will appear that tkined has simply ignored your command. Usually, you can simply unload and then reload the menu that contains the selection used to initially create the window. Typically, the last item on a menu (for example, see Figure 7-4 and Figure 7-7) will remove or delete the menu and unload the subsystem. Then go to the Tools menu and reload the menu. The appropriate subsystem will be reloaded, correcting the problem. This can be very frustrating when you first encounter it, but it is easy to work around or avoid once you know to look for it.
One other problem with tkined is that it uses a single community string when talking with devices. This can be changed with Set SNMP Parameters, which is available on several menus. But if you are using different community strings within your network or prefer using read-only strings most of the time but occasionally need to change something, changing the community string can be a nuisance. Overall, these few problems seem to be minor inconveniences for an otherwise remarkably useful program. The program has a number of additional features—such as sending reports to the syslog system—that were not discussed here. You should, however, have a pretty good idea of how to get started using tkined from this discussion.

7.3 Non-SNMP Approaches

Of course, SNMP is not the only way to retrieve information or monitor systems. For example, a number of devices now have small HTTP servers built in that allow remote configuration and management. These can be particularly helpful in retrieving information. With Unix, it is possible to remotely log on to a system using telnet or ssh over a network connection and reconfigure the host. There is probably very little I can say about using these approaches that you don't already know or that isn't obvious. There is one thing that you undoubtedly know, but that is all too easy to forget—don't make any changes that will kill your connection.[4]

[4] One precaution that some administrators use is connecting the console port of crucial devices to another device that should remain reachable—a port on a terminal server, a modem, or even a serial port on a nearby server. If you take this "milking-machine" approach, be sure this portal is secure.

Some remote-access programs provide a greater degree of control than others. In a Microsoft Windows environment, where traditionally there is only one user on a system, a remote control program may take complete control of the remote system.
On a multiuser system such as a Unix-based system, the same software may simply create another session on the remote host. Although these programs are not specifically designed with network management in mind, they work well as management tools.

While these approaches will allow you to actively retrieve information or reconfigure devices, the remote systems are basically passive entities. There are, however, other monitoring tools that you could consider. Big Brother (bb) is one highly regarded package. It is a web-based, multiplatform monitor. It is available commercially and, for some uses, noncommercially.

7.4 Microsoft Windows

SNMP is implemented as a Win32 service. It is available for the more recent versions of Windows but must be installed from the distribution CD-ROM. Installation and setup is very straightforward but varies from version to version.

7.4.1 Windows SNMP Setup

With NT, SNMP is installed from the Network applet under the Control Panel. Select Add under the Services tab, then select SNMP Services from the Select Network Service pop-up box. You will then be prompted for your distribution CD-ROM. Once it is installed, a pop-up box called Microsoft
SNMP Properties will appear. You use the three tabs on this box to configure SNMP. The Agent tab is used to set the contact and location. The Traps tab is used to set the community name and address of the management station that will receive the traps. Use the Add button in the appropriate part of the box. The Security tab is used to set the community strings, privileges, and addresses for the management stations. Be sure to select the radio button Accept SNMP Packets from These Hosts if you want to limit access. If you experience problems running SNMP, try reinstalling the latest service pack from Microsoft.

Installation with Windows 98 is similar, but at the Select Network Service prompt, you must click Have Disk. The SNMP agent can be found in the \Tools\Reskit\Netadmin\SNMP directory on the installation disk. SNMP is not included with the original distribution of Windows 95 but can be installed from the Resource Kit or downloaded from Microsoft. On later releases, it can be found on the distribution disk in \Admin\Ntools\SNMP.

With Windows 2000, instead of using the Network applet, you will use the Add/Remove Programs applet. Select Add/Remove Windows Components. From the Windows Components Wizard, select Management and Monitoring Tools. Click on Next to install SNMP. To configure SNMP, start the Administrative Tools applet, and select Services and then SNMP Services. You'll be given more choices, but you can limit yourself to the same three tabs as with Windows NT.

For further details on installation and configuration of SNMP on Windows platforms, look first to the Windows help system. You might also look at James D. Murray's Windows NT SNMP.

7.4.2 SNMP Tools

NET SNMP is available both in source and binary form for Windows. With the binary version I downloaded, it was necessary to move all the subdirectories up to C:\usr to get things to work. Although the program still needs a little polish, it works well enough.
As noted in Chapter 6, tkined is also available under Windows.

One very nice freeware program for Windows, written by Philippe Simonet, is getif. This provides both SNMP services as well as other basic network services. It is intuitively organized as a window with a tab for each service.

To begin using getif, you must begin with the Parameters tab. You identify and set the community strings for the remote host here. Having done this, clicking on Start will retrieve the basic information contained in the system group. This is shown in Figure 7-14. Even if you know this information, it is a good idea to get it again just to make sure everything is working correctly.

Figure 7-14. getif Parameters tab
Once this has been done, many of the other services simply require selecting the appropriate tab and clicking on Start. For example, you can retrieve the device's interface, address, routing, and ARP tables this way.

The Reachability tab will allow you to send an ICMP ECHO_REQUEST and will also test whether several common TCP ports, such as HTTP, TELNET, SMTP, and so on, are open. The Traceroute tab does both a standard ICMP traceroute and an SNMP traceroute. An SNMP traceroute constructs the route from the route tables along the path. Of course, all the intervening routers must be SNMP accessible using the community strings set under the Parameters tab. The NSLookup tab does a name service lookup. The IP Discovery tab does simple IP scanning.

The MBrowser tab provides a graphical interface to NET SNMP. This is shown in Figure 7-15. In the large pane in the upper left, the MIB tree is displayed. You can expand and collapse subtrees as needed. You can select a subtree by clicking on its root node. If you click on Walk, all readable objects in the subtree will be queried and displayed in the lower pane. You can also use this display to set objects.

Figure 7-15. getif MBrowser tab
The Graph tab will be discussed in Chapter 8.

7.4.3 Other Options

Apart from SNMP, there are a number of remote administration options including several third-party commercial tools. If remote access is the only consideration, vnc is an excellent choice. In particular, the viewer requires no installation. It is under 200KB, so it can be run from a floppy disk. It provides a very nice way to access an X Window session on a Unix system from a PC even if you don't want to use it for management. Installation of the server binary is very straightforward. However, vnc will not provide multiuser access to Windows and can be sluggish over low-bandwidth connections such as dial-up lines. Under these circumstances, you might consider Microsoft Terminal Server, Microsoft Corporation's thin client architecture, which supports remote access. (See Chapter 11 for more information on vnc.)

For other administrative tasks, there are a number of utilities that are sold as part of Microsoft's Resource Kits. While not free, these are generally modestly priced, and many of the tools can be downloaded from the Web at no cost. Some tools, while not specifically designed for remote troubleshooting, can be used for that purpose if you are willing to allow appropriate file sharing. These include the System Policy Editor, Registry Editor, System Monitor, and Net Watcher, among others. These are all briefly described by the Windows help system and more thoroughly in Microsoft published documentation.
Chapter 8. Performance Measurement Tools

Everything on your network may be working, but using it can still be a frustrating experience. Often, a poorly performing system is worse than a broken system. As a user on a broken system, you know when to give up and find something else to do. And as an administrator, it is usually much easier to identify a component that isn't working at all than one that is still working but performing poorly. In this chapter, we will look at tools and techniques used to evaluate network performance.

This chapter begins with a brief overview of the types of tools available. Then we look at ntop, an excellent tool for watching traffic on your local network. Next, I describe mrtg, rrd, and cricket—tools for collecting traffic data from remote devices over time. RMON, monitoring extensions to SNMP, is next. We conclude with tools for use on Microsoft Windows systems.

Don't overlook the obvious! Although we will look at tools for measuring traffic, user dissatisfaction is probably the best single indicator of the health of your network. If users are satisfied, you needn't worry about theoretical problems. And if users are screaming at your door, then it doesn't matter what the numbers prove.

8.1 What, When, and Where

Network performance will depend on many things—on the applications you are using and how they are configured, on the hosts running these applications, on the networking devices, on the structure and design of the network as a whole, and on how these pieces interact with one another. Even though the focus of this chapter is restricted to network performance, you shouldn't ignore the other pieces of the puzzle. Problems may arise from the interaction of these pieces, or a problem with one of the pieces may look like a problem with another piece. A misconfigured or poorly designed application can significantly increase the amount of traffic on a network.
For example, Version 1.1 of the HTTP protocol provides for persistent connections that can significantly reduce traffic. Not using this particular feature is unlikely to be a make-or-break issue. My point is, if you look only at the traffic on a network without considering software configurations, you may seem to have a hardware capacity problem when a simple change in software might lessen the problem and, at a minimum, buy you a little more time.

This chapter will focus on tools used to collect information on network performance. The first step in analyzing performance is measuring traffic. In addition to problem identification and resolution, this should be done as part of capacity planning and capacity management (tuning). Several books listed in Appendix B provide general discussions of application and host performance analysis.

Of the issues related to measuring network traffic, the most important ones are what to measure, how often, and where. Although there are no simple answers to any of these questions, what to measure is probably the hardest of the three. It is extremely easy to end up with so much data that you don't have time to analyze it. Or you may collect data that doesn't match your needs or that is in an unusable format. If you keep at it, eventually you will learn from experience what is most useful. Take the time to think about how you will use the data before you begin. Be as goal directed as possible. Just realize that, even with the most careful planning, when faced with a new, unusual problem, you'll probably think of something you wish you had been measuring.
If you are looking at the performance of your system over time, then data at just one point in time will be of little value. You will need to collect data periodically. How often you collect will depend on the granularity or frequency of the events you want to watch. For many tasks, the ideal approach is one that periodically condenses and eventually discards older data.

Unless your network is really unusual, the level of usage will vary with the time of day, the day of the week, and the time of the year. Most performance-related problems will be most severe at the busiest times. In telephony, the hour when traffic is heaviest is known as the busy hour, and planning centers around traffic at this time. In a data network, for example, the busy hour may be first thing in the morning when everyone is logging on and checking their email, or it could be at noon when everyone is web surfing over their lunch hour.

Knowing usage patterns can simplify data collection since you'll need to do little collecting when the network is underutilized. Changes in usage patterns can indicate fundamental changes in your network that you'll want to be able to identify and explain. Finally, knowing when your network is least busy should give you an idea of the most convenient times to do maintenance.

I have divided traffic-measurement tools into three rough categories based on where they are used within a network. Tools that allow you to capture traffic coming into or going out of a particular machine are called host-monitoring tools. Tools that place an interface in promiscuous mode and allow you to capture all the traffic at an interface are called point-monitoring tools. Finally, tools that build a global picture of network traffic by querying other hosts (which are in turn running either host-monitoring or point-monitoring tools) are called network-monitoring tools. Both host monitoring and point monitoring should have a minimal impact on network traffic.
With the exception of DNS traffic, they shouldn't be generating additional traffic. This is not true for network-monitoring tools.

Because of their roles within a network, devices such as switches and routers don't easily fit into this classification scheme. If a single switch interconnects all devices in a subnet, then it will see all the local traffic. If, however, multiple switches are used and you aren't mirroring traffic, each switch will see only part of the traffic. Routers will see only traffic moving between networks. While this is ideal for measuring traffic between local and remote devices, it is not helpful in understanding strictly local traffic. The problem should be obvious. If you monitor the wrong device, you may easily miss bottlenecks or other problems. Before collecting data, you need to understand the structure of your network so you can understand what traffic is actually being seen. This is one reason the information in Chapter 6 is important.

Finally, you certainly won't want to deal with raw data on a routine basis. You will want tools that present the data in a useful manner. For time-series data, graphs and summary statistics are usually the best choice.

8.2 Host-Monitoring Tools

We have already discussed host-monitoring tools in several different parts of this book, particularly Chapter 2 and Chapter 4. An obvious example of a host-monitoring tool is netstat. You will recall that the -i option will give a cumulative picture of the traffic into and out of a computer.

Although easy to overlook, any tool that logs traffic is a host-monitoring tool of sorts. These are generally not too useful after the fact, but you may be able to piece together some information from them. A better approach is to configure the software to collect what you need. Don't forget
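As a quick refresher, a netstat check looks something like the following. This is only a sketch; the interface names and counts shown are illustrative, and the exact column layout varies among Unix variants:

```shell
# Cumulative per-interface packet counts on the local host.
$ netstat -i
Name  Mtu   Network      Address      Ipkts  Ierrs  Opkts  Oerrs  Coll
ed0   1500  172.16.2     bsd2         27846      0  24024      0     0
lo0   16384 127          localhost      194      0    194      0     0
```

Watching how these counters grow over time, rather than their absolute values, is usually what is of interest.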
applications, like web servers, that collect data. Accounting tools and security tools provide other possibilities. Tools like ipfw, ipchains, and tcpwrappers all support logging. (Log files are discussed in greater detail in Chapter 11.)

Host-monitoring tools can be essential in diagnosing problems related to host performance, but they give very little information about the performance of the network as a whole. Of course, if you have this information for every host, you'll have the data you need to construct a complete picture. Constructing that picture is another story.

8.3 Point-Monitoring Tools

A point-monitoring tool puts your network interface in promiscuous mode and allows you to collect information on all traffic seen at the computer's interface. The major limitation to point monitoring is that it gives you only a local view of your network. If your focus is on host performance, this is probably all that you will need. Or, if you are on a shared media network such as a hub, you will see all of the local traffic. But, if you are on a switched network, you will normally be able to see only traffic to or from the host or broadcast traffic. And as more and more networks shift to switches for efficiency, this problem will worsen.

The quintessential point-monitoring tools are network sniffers. In Chapter 5, we saw several utilities that capture traffic and generate traffic summaries. These included tcp-reduce, tcptrace, and xplot. In general, sniffers are not really designed for traffic measurement—they are too difficult to use for this purpose, provide too much information, and provide information in a format ill-suited to this purpose. But if you really want to understand a problem, packet capture gives you the most complete picture, if you can wade through all the data.

8.3.1 ntop

ntop, the work of Luca Deri, is an excellent example of just how useful a point-monitoring tool can be. ntop is usually described as the network equivalent of the Unix utility top.
Actually, it is a lot more. ntop is based on the libpcap library that originated at the Lawrence Berkeley National Laboratory and on which tcpdump is based. It puts the network interface in promiscuous mode so that all traffic at the interface is captured. It will then begin to collect data, periodically creating summary statistics. (It will also use lsof and other plug-ins to collect data if available.)

ntop can be run in two modes: as a web-based utility using a built-in web server or in interactive mode, i.e., as a text-based application on a host. It closely resembles top when run in interactive mode. This was the default mode with earlier versions of ntop but is now provided by a separate command, intop. Normally, you will want to use a separate window when using interactive mode.

8.3.1.1 Interactive mode

Here is an example of the output with intop:

intop 0.0.1 (Sep 19 2000) listening on [eth0]
379 Pkts/56.2 Kb [IP 50.5 Kb/Other 5.7 Kb]       Thpt: 6.1 Kbps/24.9 Kbps
 Host                      Act  -Rcvd-  -Sent-     TCP     UDP    ICMP
 sloan                      B   69.0%   16.7%   38.8 Kb      0      0
 lnx1a                      B   16.7%   69.4%    9.4 Kb      0      0
 rip2-routers.mcast.net     R    3.7%    0.0%         0  2.1 Kb     0
 172.16.3.1                 B    2.1%    6.5%         0      0      0
 Cisco CDPD/VTP [MAC]       I    4.7%    0.0%         0      0      0
 172.16.3.3                 B    2.2%    6.1%         0      0      0

Interpretation of the data is straightforward. The top two lines show the program name and version, date, interface, number of packets, total traffic, and throughput. The first column lists hosts by name or IP number. The second column reflects activity since the last update—Idle, Send, Receive, or Both. The next two columns are the amount of traffic sent and received, while the last three columns break traffic down as TCP, UDP, or ICMP traffic.

intop should be started with the -i option to specify which interface to use. For example:

lnx1# intop -i eth0

If your computer is multihomed, you can specify several interfaces on the command line, each with a separate -i. Once started, it prints an annoying 20 lines or so of general information about the program and then gives you a prompt. At this point, you can enter ? to find out what services are available:

intop@eth0> ?
Commands enclosed in <> are not yet implemented.
Commands may be abbreviated. Commands are:
  ?        <warranty>   filter    swap      nbt
  help     <copying>    sniff     top       <dump>
  exit     history      uptime    lsdev     <last>
  quit     open         <hash>    hosts     <nslookup>
  prompt   <close>      info      arp
intop@eth0>

As you can see, a number of commands are planned but had not been implemented at the time this was written. Most are exactly what you would expect. You use the top command to get a display like the one just shown. The info command reports the interface and number of packets captured. With the filter command, you can set packet-capture filters. You use the same syntax as explained in Chapter 5 with tcpdump. (Filters can also be specified on the command line when intop is started.) The lsdev command lists interfaces.
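For example, to restrict collection to web traffic, you might enter a tcpdump-style expression with the filter command. This is a sketch only; the exact prompt behavior may differ from version to version:

```shell
# Setting a capture filter at the intop prompt; the expression uses the
# same syntax as tcpdump. "tcp port 80" here is just an example filter.
intop@eth0> filter tcp port 80
```

From that point on, only packets matching the expression are counted in the display.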
The swap command is used to jump between data collection on two different interfaces.

You can change how the data is displayed on-the-fly using your keyboard. For example, the d key will allow you to toggle between showing all hosts or only active hosts. The l key toggles between showing or not showing only local hosts. The p key can be used to show or suppress showing data as percentages. The y key is used to change the sorting order among the columns. The n key is used to toggle between hostnames and IP addresses. The r key can be used to reset or zero statistics. The q key is used to stop the program.

8.3.1.2 Web mode

Actually, you'll probably prefer web mode to interactive mode, as it provides considerably more information and a simpler interface. Since ntop uses a built-in web server, you won't need to have a separate web server running on your system. By default, ntop uses port 3000, so this shouldn't interfere with any existing web servers. If it does, or if you are paranoid about using default ports, you can use the -w option to select a different port. The only downside is that the built-in web server uses
frames and displays data as tables, which still seems to confuse some browsers, particularly when printing.

There are a number of options, some of which are discussed next, but the defaults work well enough to get you started. Once you start ntop, point your browser to the machine and port it runs on. Figure 8-1 shows what the initial screen looks like.

Figure 8-1. ntop's home page

As you can see, on startup ntop provides you with a brief description of the program in the larger frame to the right. The real area of interest is the menu on the left. By clicking on the triangles, each menu expands to give you a number of choices. This is shown to the left in Figure 8-2.

Figure 8-2. ntop's All Protocols page
Figure 8-2 shows the All Protocols page, which groups traffic by protocol and host. This is available for both received and transmitted data. A number of statistics for other protocols—such as AppleTalk, OSPF, NetBIOS, and IGMP—have scrolled off the right of this window. You can click on a column header to sort the data based on that column. By default, this screen will be updated every two minutes, but this can be changed.

The IP option displays received or transmitted data grouped by individual IP protocols such as FTP, HTTP, DNS, and Telnet. The Throughput option gives a table organized by host and by throughput, average throughput, and peak throughput for both bits and packets.

The Stats submenu offers a number of options. Multicast gives a table of multicast traffic. Traffic provides you with a number of tables and graphs showing how traffic breaks down. Figure 8-3 shows one of these graphs.

Figure 8-3. ntop's Traffic page under Stats
Figures and tables break down traffic by broadcast versus unicast versus multicast packets, by packet size categories, by IP versus non-IP traffic, by protocol category such as TCP versus UDP versus AppleTalk versus Other, and by application protocols such as FTP versus Telnet. Either bar graphs or pie charts are used to display the data. The tables give the data in both kilobytes and percentages. These graphs can save you a lot of work in analyzing data and discovering how your network is being used.

The Host option under Stats gives basic host information including hostnames, IP addresses, MAC addresses for local hosts, transmit bandwidth, and vendors for MAC addresses when known. By clicking on a hostname, additional data will be displayed as shown in Figure 8-4.

Figure 8-4. Host information
The host shown here is on a different subnet from the host running ntop, so less information is available. For example, there is no way for ntop to discover the remote host's MAC address or to track traffic to or from the remote host that doesn't cross the local network. Since this displays connections between hosts, its use has obvious privacy implications.

The Throughput option gives a graph of the average throughput over the last hour. Domain gives a table of traffic grouped by domain. Plug-ins provide a way to extend the functionality of ntop by adding other applications. Existing plug-ins provide support for such activities as tracking new ARP entries, NFS traffic, and WAP traffic and tracking and classifying ICMP traffic.

An important issue in capacity planning is what percentage of traffic is purely local and what percentage has a remote network for its source or destination (see Local Versus Remote Traffic). The IP Traffic menu gives you options to collect this type of information. The Distribution option on the IP Protocols menu gives you plots and tables for local and remote IP traffic. For example, Figure 8-5 shows a graph and tables for local and remote-to-local traffic. There is a local-to-remote table that is not shown. The Usage option shows IP subnet usage by port. Sessions shows active TCP sessions, and Routers identifies routers on the local subnet.

Figure 8-5. Measuring local and remote traffic
Local Versus Remote Traffic

Before the Internet became popular, most network traffic stayed on the local network. This was often summarized as the 90-10 Rule (or sometimes the 80-20 Rule), a heuristic that says that roughly 90% of network traffic will stay on the local network. The Internet has turned the old 90-10 Rule on its head by providing a world of reasons to leave the local network; now most traffic does just that. Today the 90-10 Rule says that 90% of traffic on the local network will have a remote site as its source or destination.

Clearly, the 90-10 Rule is nothing more than a very general rule of thumb. It may be an entirely inappropriate generalization for your network. But knowing the percentage of local and remote traffic can be useful in understanding your network in a couple of ways. First, whatever the numbers, they really shouldn't be changing a lot over time unless something fundamental is changing in the way your network is being used. This is something you'll want to know about.

Second, local versus remote traffic provides a quick sanity check for network design. If 90% of your traffic is entering or leaving your network over a 1.544-Mbps T1 line, you should probably think very carefully about why you need to upgrade your backbone to
gigabit speeds.

The last menu, Admin, is used to control the operation of ntop. Switch NIC allows you to capture on a different interface, and Reset Stats zeros all cumulative statistics. Shutdown shuts down ntop. Users and URLs allow you to control access to ntop.

A number of command-line options allow you to control how ntop runs. These can be listed with the -h option. As noted previously, -w is used to change the port it listens to, and -i allows you to specify which interface to listen to. -r sets the delay between screen updates in seconds. The -n option is used to specify numeric IP addresses rather than hostnames. Consult the documentation for other options.

ntop has other features not discussed here. It can be used as a lightweight intrusion detection system. It provides basic access control and can be used with secure HTTP. It also provides facilities to log data, including logging to a SQL database.

As previously noted, the real problem with point monitoring is that it doesn't really work well with segmented or switched networks. Unless you are mirroring all traffic to your test host, many of these numbers can be meaningless. If this is the case, you'll want to collect information from a number of sources.

8.4 Network-Monitoring Tools

It should come as no surprise that SNMP can be used to collect performance information. We have already seen simple examples in Chapter 7. Using the raw statistics gathered with a tool like NET SNMP or even the stripcharts in tkined is alright if you need only a little data, but in practice you will want tools designed to deal specifically with performance data. Which tool you use will depend on what you want to do. One of your best choices from this family of tools is mrtg. (Although it is not discussed here, you also may want to look at scion.
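Putting the options just described together, a typical invocation for web mode might look like the following. The interface name and port number here are examples only:

```shell
# Start ntop on interface eth0, serve its pages on port 8080 instead of
# the default 3000, and report numeric IP addresses rather than hostnames.
lnx1# ntop -i eth0 -w 8080 -n
```

You would then point a browser at port 8080 on that host.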
This is from Merit Networks, Inc., and will run under Windows as well as Unix.)

8.4.1 mrtg

mrtg (Multirouter Traffic Grapher) was originally developed by Tobias Oetiker with the support of numerous people, most notably Dave Rand. This tool uses SNMP to collect statistics from network equipment and creates web-accessible graphs of the statistics. It is designed to be run periodically to provide a picture of traffic over time. mrtg is ideally suited for identifying busy-hour traffic. All you need to do is scan the graph looking for the largest peaks.

mrtg is most commonly used to graph traffic through router interfaces but can be configured for other uses. For example, since NET SNMP can be used to collect disk usage data, mrtg could be used to retrieve and graph the amount of free space on the disk drive over time for a system running snmpd. Because the graphs are web-accessible, mrtg is well suited for remote measurement. mrtg uses SNMP's GET command to collect information. With the current implementation, collection is done by a Perl module supplied as part of mrtg. No separate installation of SNMP is needed.

mrtg is designed to be run regularly by cron, typically every five minutes. However, mrtg can be run as a standalone program, or the sampling interval can be changed. Configuration files, generally created with the cfgmaker utility, determine the general appearance of the web pages and what data is
collected. mrtg generates graphs of traffic in GIF format and HTML pages to display these graphs. Typically, these will be made available by a web server running on the same computer as mrtg, but the files can be viewed with a web browser running on the same computer or the files can be moved to another computer for viewing. This could be helpful when debugging mrtg since the web server may considerably complicate the installation, particularly if you are not currently running a web server or are not comfortable with web server configuration.

Figure 8-6 shows a typical web page generated by mrtg. In this example, you can see some basic information about the router at the top of the page and, below it, two graphs. One shows traffic for the last 24 hours and the other shows traffic for the last two weeks, along with summary statistics for each. The monthly and yearly graphs have scrolled off the page. This is the output for a single interface. Input traffic is shown in green and output traffic is shown in blue, by default, on color displays.

Figure 8-6. mrtg interface report

It is possible to have mrtg generate a summary web page with a graph for each interface. Each graph is linked to the more complete traffic report such as the one shown in Figure 8-6. The indexmaker utility is used to generate this page once the configuration file has been created.

8.4.1.1 mrtg configuration file
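Scheduling the five-minute runs mentioned above is just a crontab entry. A sketch follows; the paths to the mrtg binary and configuration file are examples and will depend on your installation (the explicit minute list is used because some older crons do not support the */5 shorthand):

```
# Run mrtg every five minutes against the router's configuration file.
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/mrtg /etc/mrtg/mrtg.cfg
```

Each run collects one sample from every interface described in the configuration file and regenerates the graphs.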
To use mrtg, you will need a separate configuration file for each device. Each configuration file will describe all the interfaces within the device. Creating these files is the first step after installation. While a sample configuration file is supplied as part of the documentation, it is much easier to use the cfgmaker script. An SNMP community string and hostname or IP number must be supplied as parts of a compound argument:

bsd2# cfgmaker public@172.16.2.1 > mrtg.cfg

Since the script writes the configuration to standard output, you'll need to redirect the output to a file. If you want to measure traffic at multiple devices, then you simply need to create a different configuration file for each. Just give each a different (but meaningful) name.

Once you have a basic configuration file, you can further edit it as you see fit. As described next, this can be an involved process. Fortunately, cfgmaker does a reasonable job. In many cases, this will provide all you need, so further editing won't be necessary.

Here is the first part of a fairly typical configuration file. (You may want to compare this to the sample output shown in Figure 8-6.)

# Add a WorkDir: /some/path line to this file
WorkDir: /usr/local/share/doc/apache/mrtg
######################################################################
# Description: Cisco Internetwork Operating System Software
#   IOS (tm) 3600 Software (C3620-IO3-M), Version 12.0(7)T,
#   RELEASE SOFTWARE (fc2)
#   Copyright (c) 1986-1999 by cisco Systems, Inc.
#    Compiled Wed 08-Dec-99 10:08 by phanguye
# Contact: "Joe Sloan"
# System Name: NLRouter
# Location: "LL 214"
#.....................................................................

Target[C3600]: 1:public@172.16.2.1
MaxBytes[C3600]: 1250000
Title[C3600]: NLRouter (C3600): Ethernet0/0
PageTop[C3600]: <H1>Traffic Analysis for Ethernet0/0 </H1>
 <TABLE>
   <TR><TD>System:</TD><TD>NLRouter in "LL 214"</TD></TR>
   <TR><TD>Maintainer:</TD><TD>"Joe Sloan"</TD></TR>
   <TR><TD>Interface:</TD><TD>Ethernet0/0 (1)</TD></TR>
   <TR><TD>IP:</TD><TD>C3600 (205.153.60.250)</TD></TR>
   <TR><TD>Max Speed:</TD>
     <TD>1250.0 kBytes/s (ethernetCsmacd)</TD></TR>
 </TABLE>
#---------------------------------------------------------------
Target[172.16.2.1.2]: 2:public@172.16.2.1
MaxBytes[172.16.2.1.2]: 1250000
Title[172.16.2.1.2]: NLRouter (No hostname defined for IP address): Ethernet0/1
PageTop[172.16.2.1.2]: <H1>Traffic Analysis for Ethernet0/1 </H1>
 <TABLE>
   <TR><TD>System:</TD><TD>NLRouter in "LL 214"</TD></TR>
   <TR><TD>Maintainer:</TD><TD>"Joe Sloan"</TD></TR>
   <TR><TD>Interface:</TD><TD>Ethernet0/1 (2)</TD></TR>
   <TR><TD>IP:</TD><TD>No hostname defined for IP address (172.16.1.1)</TD></TR>
   <TR><TD>Max Speed:</TD>
     <TD>1250.0 kBytes/s (ethernetCsmacd)</TD></TR>
 </TABLE>
#---------------------------------------------------------------

As you can see from the example, the general format of a directive is Keyword[Label]: Arguments. Directives always start in the first column of the configuration file. Their arguments may extend over multiple lines, provided the additional lines leave the first column blank. In the example, the argument to the first PageTop directive extends for 10 lines.

In this example, I've added the second line, specifying a directory where the working files will be stored. This is a mandatory change. It should be set to a directory that is accessible to the web server on the computer. It will contain log files, home pages, and graphs for the most recent day, week, month, and year for each interface. The interface label, explained shortly, forms the first part of each filename. Filename extensions identify the function of each file.

Everything else, including the files just described, is automatically generated. As you can see, cfgmaker uses SNMP to collect some basic information from the device, e.g., sysName, sysLocation, and sysContact, for inclusion in the configuration file. This information is used both in the initial comment (lines beginning with #) and in the HTML code under the PageTop directive. As you might guess, PageTop determines what is displayed at the top of the page in Figure 8-6.

cfgmaker also determines the type of interface by retrieving ifType and its maximum operating speed by retrieving ifSpeed (ethernetCsmacd and 1250.0 kBytes/s in this example). The interface type is used by the PageTop directive. The speed is used by both PageTop and the MaxBytes directive. The MaxBytes directive determines the maximum value that a measured variable is allowed to reach. If a larger number is retrieved, it is ignored.
This value is given in bytes per second, so if you think in bits per second, don't be misled.

cfgmaker collects information on each interface and creates a section in the configuration file for each. Only two interfaces are shown in this fragment, but the omitted sections are quite similar. Each section begins with the Target directive. In this example, the first interface is identified with the directive Target[C3600]: 1:public@172.16.2.1. The interface was identified by the initial scan by cfgmaker. The label was obtained by doing name resolution on the IP address; in this case, it came from an entry in /etc/hosts.[1] If name resolution fails, the IP and port numbers will be used as a label. The argument to Target is a combination of the port number, SNMP community string, and IP address of the interface. You should be aware that adding or removing an interface in a monitored device without updating the configuration file can lead to bogus results.

[1] In this example, a different system name and hostname are used to show where each is used. This is not recommended.

The only other directive in this example is Title, which determines the title displayed for the HTML page. These examples are quite adequate for a simple page, but mrtg provides both additional directives and additional arguments that allow a great deal of flexibility.

By default, mrtg collects the SNMP objects ifInOctets and ifOutOctets for each interface. This can be changed with the Target command. Here is an example of a small test file (the recommended way to test mrtg) that is used to collect the number of unicast and nonunicast packets at an interface:

bsd2# cat test.cfg
WorkDir: /usr/local/share/doc/apache/mrtg
Target[Testing]: ifInUcastPkts.1&ifInNUcastPkts.1:public@172.16.2.1
MaxBytes[Testing]: 1250000
Title[Testing]: NLRouter: Ethernet0/0
PageTop[Testing]: <H1>Traffic Analysis for Ethernet0/0 </H1>
 <TABLE>
   <TR><TD>System:</TD><TD>NLRouter in "LL 214"</TD></TR>
   <TR><TD>Maintainer:</TD><TD>"Joe Sloan"</TD></TR>
   <TR><TD>Interface:</TD><TD>Ethernet0/0 (1)</TD></TR>
   <TR><TD>IP:</TD><TD>C3600 (205.153.60.250)</TD></TR>
   <TR><TD>Max Speed:</TD>
     <TD>1250.0 kBytes/s (ethernetCsmacd)</TD></TR>
 </TABLE>

mrtg knows a limited number of OIDs. These are described in the mibhelp.txt file that comes with mrtg. Fortunately, you can use dotted notation as well, so you aren't limited to objects with known identifiers. Nor do you have to worry about MIBs. You can also use an expression in place of an identifier, e.g., the sum of two OIDs, or you can specify an external program if you wish to collect data not available through SNMP. There are a number of additional formats and options available with Target.

Other keywords are available that will allow you to customize mrtg's behavior. For example, you can use the Interval directive to change the reported frequency of sampling. You'll also need to change your crontab file to match. If you don't want to use cron, you can use the RunAsDaemon directive, in conjunction with the Interval directive, to set mrtg up to run as a standalone program. Interval takes an argument in minutes; for example, Interval: 10 would sample every 10 minutes. To enable mrtg to run as a standalone program, the syntax is RunAsDaemon: yes.

Several directives are useful for controlling the appearance of your graphs. If you don't want all four graphs, you can suppress the display of selected graphs with the Suppress directive. For example, Suppress[Testing]: my will suppress the monthly and yearly graphs. Use d and w for daily and weekly graphs. You may use whatever combination you want.
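Putting several of these directives together, here is a sketch of a minimal standalone configuration. It simply reuses the WorkDir path and the Testing target from the examples above; substitute your own paths, labels, and community string:

```
WorkDir: /usr/local/share/doc/apache/mrtg
RunAsDaemon: yes
Interval: 5
Target[Testing]: ifInUcastPkts.1&ifInNUcastPkts.1:public@172.16.2.1
MaxBytes[Testing]: 1250000
Title[Testing]: NLRouter: Ethernet0/0
Suppress[Testing]: my
```

With RunAsDaemon set, mrtg stays resident and samples every Interval minutes, so no crontab entry is needed.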
One annoyance with mrtg is that it scales each graph to the largest value that has to be plotted. mrtg shouldn't be faulted for this; it is simply using what information it has. But the result can be graphs with some very unusual vertical scales and sets of graphs that you can't easily compare. This is something you'll definitely want to adjust.

You can work around this problem with several of the directives mrtg provides, but the approach you choose will depend, at least in part, on the behavior of the data you are collecting. The Unscaled directive suppresses automatic scaling of data. It uses the value from MaxBytes as the maximum on the vertical scale. You can edit MaxBytes if you are willing to have data go off the top of the graph. If you change this, you should use AbsMax to set the largest value that you expect to see.

Other commands allow you to change the color, size, shape, and background of your graphs. You can also change the direction in which graphs grow. Here is an example that changes the display of data to bits per second, has the display grow from left to right, displays only the daily and weekly graphs, and sets the vertical scale to 4000 bits per second:

Options[Testing]: growright,bits
Suppress[Testing]: my
MaxBytes[Testing]: 500
AbsMax[Testing]: 1250000
Unscaled[Testing]: dw
Notice that you still need to give MaxBytes and AbsMax in bytes.

Many more keywords are available. Only the most common have been described here, but these should be more than enough to meet your initial needs. See the mrtg sample configuration file and documentation for others.

Once you have the configuration file, use indexmaker to create a main page for all the interfaces on a device. In its simplest form, you merely give the configuration file and the destination file:

bsd2# indexmaker mrtg.cfg > /usr/local/www/data/mrtg/index.html

You may specify a router name and a regular expression that will match a subset of the interfaces if you want to limit what you are looking at. For example, if you have a switch with a large number of ports, you may want to monitor only the uplink ports.

You'll probably want to run mrtg manually a couple of times. Here is an example using the configuration file test.cfg:

bsd2# mrtg test.cfg
Rateup WARNING: .//rateup could not read the primary log file for testing
Rateup WARNING: .//rateup The backup log file for testing was invalid as well
Rateup WARNING: .//rateup Can't remove testing.old updating log file
Rateup WARNING: .//rateup Can't rename testing.log to testing.old updating log file

The first couple of runs will generate warning messages about missing log files and the like. These should go away after a couple of runs and can be safely ignored.

Finally, you'll want to make an appropriate entry in your crontab file. For example, this entry will run mrtg every five minutes on a FreeBSD system:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/ports/net/mrtg/work/mrtg-2.8.12/run/mrtg /usr/ports/net/mrtg/work/mrtg-2.8.12/run/mrtg.cfg > /dev/null 2>&1

This should be all on a single line. The syntax is different on some systems, such as Linux, so be sure to check your local manpages.

8.4.2 rrd and the Future of mrtg

The original version of mrtg had two deficiencies: a lack of both scalability and portability.
Originally, mrtg was able to support only about 20 routers or switches. It used external utilities to perform SNMP queries and create GIF images—snmpget from CMU SNMP and pnmtogif from the PBM package, respectively.

These issues were addressed by MRTG-2, the second and current version of mrtg. Performance was improved when Dave Rand contributed rateup to the project. Written in C, rateup improved both graph generation and handling of the log files.

The portability problem was addressed by two changes. First, Simon Leinen's Perl script for collecting SNMP data is now used, eliminating the need for CMU SNMP. Second, Thomas Boutell's GD library is
now used to generate graphics directly. At this point, mrtg is said to reasonably support querying 500 ports on a regular basis.

As an ongoing project, the next goal is to further improve performance and flexibility. Toward this goal, Tobias Oetiker has written rrd (Round Robin Database), a program to further optimize the database and the graphing portion of mrtg. Although MRTG-3, the next version of mrtg, is not complete, rrd has been completed and is available as a standalone program. MRTG-3 will be built on top of rrd.

rrd is designed to store and display time-series data. It is written in C and is available under the GNU General Public License. rrd stores data in a round-robin fashion so that older data is condensed and eventually discarded. Consequently, the size of the database stabilizes and will not continue to grow over time.

8.4.3 cricket

A number of frontends are available for rrd, including Jeff Allen's cricket. Allen, working at WebTV, was using mrtg but found that it really wasn't adequate to support the 9000 targets he needed to manage. Rather than wait for MRTG-3, he developed cricket. At least superficially, cricket has basically the same uses as mrtg, but cricket has been designed to be much more scalable. cricket is organized around the concept of a configuration tree. The configuration files for devices are organized in a hierarchical manner so that general device properties can be defined once at a higher level and inherited, while exceptions can simply be defined at a lower level of the hierarchy. This makes cricket much more manageable for larger organizations with large numbers of devices. Since it is designed around rrd, cricket is also much more efficient.

cricket does a very nice job of organizing the pages that it displays. To access the pages, you begin by executing the grapher.cgi script on the server.
For example, if the server were at 172.16.2.236 and CGI scripts were in the cgi-bin directory, you would point your browser to the URL http://172.16.2.236/cgi-bin/grapher.cgi. This will present you with a page organized around types of devices, e.g., routers, router interfaces, switches, along with descriptions of each. From this you select the type of device you want to monitor. Depending on your choice, you may be presented with a list of monitored devices or with another subhierarchy such as that shown in Figure 8-7.

Figure 8-7. cricket router interfaces
You can quickly drill down to the traffic graph for the device of interest. Figure 8-8 shows an example of a traffic graph for a router interface during a period of very low usage (but you get the idea, I hope).

Figure 8-8. Traffic on a single interface
As you can see, this looks an awful lot like the graphs from mrtg. Unlike with mrtg, you have some control over which graphs are displayed from the web page. Short-Term displays both hourly and daily graphs, Long-Term displays both weekly and monthly graphs, and Hourly, Daily, and All are just what you would expect.[2]

[2] mrtg uses Daily to mean an hour-by-hour plot for 24 hours. cricket uses Hourly to mean the same thing. This shouldn't cause any problems.

Of course, you will need to configure each option for cricket to work correctly. You will need to go through the hierarchy and identify the appropriate targets, set SNMP community strings, and add any descriptions that you want. Here is the interfaces file in the router-interfaces subdirectory of the cricket-config directory, the directory that contains the configuration tree. (This file corresponds to the output shown in Figure 8-8.)

target --default--
    router = NLCisco
    snmp-community = public

target Ethernet0_0
    interface-name = Ethernet0/0
    short-desc = "Gateway to Internet"

target Ethernet0_1
    interface-name = Ethernet0/1
    short-desc = "172.16.1.0/24 subnet"

target Ethernet0_2
    interface-name = Ethernet0/2
    short-desc = "172.16.2.0/24 subnet"

target Ethernet0_3
    interface-name = Ethernet0/3
    short-desc = "172.16.3.0/24 subnet"

target Null0
    interface-name = Null0
    short-desc = ""

While this may look simpler than an mrtg configuration file, you'll be dealing with a large number of these files. If you make a change to the configuration tree, you will need to recompile the configuration tree before you run cricket. As with mrtg, you will need to edit your crontab file to execute the collector script on a regular basis.

On the whole, cricket is considerably more difficult to learn and to configure than mrtg. One way that cricket gains efficiency is by using CGI scripts to generate web pages only when they are needed rather than after each update. The result is that the pages are not available unless you have a web server running on the same computer that cricket is running on. Probably the most difficult part of the cricket installation is setting up your web server and the cricket directory structure so that the scripts can be executed by the web server without introducing any security holes. Setting up a web server and web security are beyond the scope of this book.

Unless you have such a large installation that mrtg doesn't meet your needs, my advice would be to start with mrtg. It's nice to know that cricket is out there, and if you really need it, it is a solid package worth learning. But mrtg is easier to get started with and will meet most people's needs.

8.5 RMON

As we saw in the last chapter, SNMP can be used to collect network traffic at an interface. Unfortunately, SNMP is not a very efficient mechanism in some circumstances. Frequent collection of data over an overused, low-bandwidth WAN link can create the very problems you are using SNMP to avoid.
Even after you have the data, a significant amount of processing may still be needed before the data is in a useful form.

A better approach is to do some of the processing and data reduction remotely and retrieve data selectively. This is one of the ideas behind the remote monitoring (RMON) extensions to SNMP. RMON is basically a mechanism to collect and process data at the point of collection. RMON provides both continuous and offline data collection. Some implementations can even provide remote packet capture. The RMON mechanism may be implemented in software on an existing device, in dedicated hardware such as an add-on card for a device, or even as a separate device. Hardware implementations are usually called RMON probes.
Data is organized and retrieved in the same manner as SNMP data. Data organization is described in an RMON MIB, identified by OIDs, and retrieved with SNMP commands. To users, RMON will seem to be little more than an expanded or super MIB. To implementers, there are significant differences between RMON and traditional SNMP objects, resulting from the need for continuous monitoring and remote data processing.

Originally, RMON data was organized in nine groups (RFCs 1271 and 1757) and later expanded to include a tenth group (RFC 1513) for token rings:

Statistics group
    Offers low-level utilization and error statistics

History group
    Provides trend analysis data based on the data from the statistics group

Alarm group
    Allows the user to configure alarms

Event group
    Logs and generates traps for user-defined rising thresholds, falling thresholds, and matched packets

Host group
    Collects statistics based on MAC addresses

Top N Hosts group
    Collects host statistics for the busiest hosts

Packet Capture group
    Controls packet capture

Traffic Matrix group
    Collects and returns errors and utilization data based on pairs of addresses

Filter group
    Collects information based on definable filters

Token-ring group
    Collects low-level token-ring statistics
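As an illustration of what the Statistics group's counters support, percent utilization can be estimated from two successive polls of etherStatsOctets and etherStatsPkts. This sketch is my own, not part of the RMON specification; it assumes Ethernet, where each frame also occupies 96 bit-times of preamble and interframe gap on the wire:

```python
def utilization_pct(octets_delta, pkts_delta, interval_s, link_bps=10_000_000):
    """Estimate percent utilization from RMON counter deltas.

    octets_delta, pkts_delta: increases in etherStatsOctets and
    etherStatsPkts between two polls taken interval_s seconds apart.
    Each packet adds 96 bit-times of framing overhead on the wire.
    """
    bits_on_wire = octets_delta * 8 + pkts_delta * 96
    return 100.0 * bits_on_wire / (interval_s * link_bps)

# e.g., 3,750,000 octets and 5,000 packets in 30 s on 10-Mbit/s Ethernet
print(round(utilization_pct(3_750_000, 5_000, 30), 2))  # → 10.16
```

The 96-bit overhead term is why utilization computed from octet counts alone slightly understates actual wire usage, especially with many small packets.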
RMON implementations are often limited to a subset of these groups. This isn't unrealistic, but you should be aware of what you are getting when paying the premium prices often required for RMON support.

Provided you have the RMON MIB loaded, you can use snmptranslate to explore the structure of these groups. For example, here is the structure of the statistics group:

bsd2# snmptranslate -Tp rmon.statistics
+--statistics(1)
   |
   +--etherStatsTable(1)
      |
      +--etherStatsEntry(1)
         |
         +-- -R-- Integer   etherStatsIndex(1)
         |        Range: 1..65535
         +-- -RW- ObjID     etherStatsDataSource(2)
         +-- -R-- Counter   etherStatsDropEvents(3)
         +-- -R-- Counter   etherStatsOctets(4)
         +-- -R-- Counter   etherStatsPkts(5)
         +-- -R-- Counter   etherStatsBroadcastPkts(6)
         +-- -R-- Counter   etherStatsMulticastPkts(7)
         +-- -R-- Counter   etherStatsCRCAlignErrors(8)
         +-- -R-- Counter   etherStatsUndersizePkts(9)
         +-- -R-- Counter   etherStatsOversizePkts(10)
         +-- -R-- Counter   etherStatsFragments(11)
         +-- -R-- Counter   etherStatsJabbers(12)
         +-- -R-- Counter   etherStatsCollisions(13)
         +-- -R-- Counter   etherStatsPkts64Octets(14)
         +-- -R-- Counter   etherStatsPkts65to127Octets(15)
         +-- -R-- Counter   etherStatsPkts128to255Octets(16)
         +-- -R-- Counter   etherStatsPkts256to511Octets(17)
         +-- -R-- Counter   etherStatsPkts512to1023Octets(18)
         +-- -R-- Counter   etherStatsPkts1024to1518Octets(19)
         +-- -RW- String    etherStatsOwner(20)
         |        Textual Convention: OwnerString
         +-- -RW- EnumVal   etherStatsStatus(21)
                  Textual Convention: EntryStatus
                  Values: valid(1), createRequest(2), underCreation(3), invalid(4)

You retrieve the number of Ethernet packets on each interface exactly as you might guess:

bsd2# snmpwalk 172.16.1.9 public rmon.1.1.1.5
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.1 = 36214
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.2 = 0
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.3 = 3994
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.4 =
242
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.5 = 284
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.6 = 292
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.7 = 314548
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.8 = 48074
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.9 = 36861
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.10 = 631831
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.11 = 104
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.12 = 457157
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.25 = 0
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.26 = 0
rmon.statistics.etherStatsTable.etherStatsEntry.etherStatsPkts.27 = 0
(This is data from a recently installed 12-port switch. The last three interfaces are currently unused uplink ports.)

The primary problem with RMON, as described, is that it is limited to link-level traffic. This issue is being addressed with RMON2 (RFC 2021), which adds another 10 groups. In order to collect network-level information, however, it is necessary to delve into packets. This is processing intensive, so it is unlikely that RMON2 will become common in the near future. For most purposes, the first few RMON groups should be adequate.

One final word of warning: while RMON may lessen network traffic, it can be CPU intensive. Make sure you aren't overloading your system when collecting RMON data. It is ironic that tools designed to analyze traffic in order to avoid poor performance can actually cause that poor performance. To make truly effective use of an RMON probe, you should consider using a commercial tool designed specifically for your equipment and goals.

8.6 Microsoft Windows

Apart from basic text-based tools such as netstat, Microsoft doesn't really include many useful utilities with the consumer versions of Windows. But if you are using Windows NT or Windows 2000, you have more options. The netmon tool is included with the server versions. A brief description of how this tool can be used to capture traffic was included in Chapter 5. netmon can also be used to capture basic traffic information.

Figure 8-9 shows netmon's basic capture screen. The upper-left pane shows five basic graphs for real-time traffic: network utilization, frames per second, bytes per second, broadcasts per second, and multicasts per second. The second pane on the left lists current connections between this and other hosts. The details of these connections are provided in the bottom pane. The pane on the right gives overall network statistics. To use netmon in this fashion, just start the program and select Capture → Start.
In standalone mode, netmon functions as a point-monitoring tool, but as noted in Chapter 5, it can be used with agents to collect traffic throughout the network.

Figure 8-9. netmon traffic monitoring
For general systems monitoring, perfmon (Performance Monitor) is a better choice. It is supplied with both the workstation and server versions. perfmon is a general performance-monitoring tool, not just a network-monitoring tool. You can use it to measure system performance (including CPU utilization) and I/O performance, as well as basic network performance. If appropriately configured, it will also monitor remote machines.

Data collected is organized by object type, i.e., groups of counters. For example, with the UDP object, there are counters for the number of datagrams sent per second, datagrams received per second, datagrams received errors, etc. For network monitoring, the most interesting objects include ICMP, IP, Network Interface, RAS Ports, RAS Total, TCP, and UDP.

perfmon provides four views: alert, chart, log, and report. With alert view, you can set a threshold and be notified when a counter exceeds or drops below it. Chart view gives a real-time graph for selected counters; you can customize the sampling rate and scale. Log view periodically logs all the counters for an object to a file. Finally, report view displays numerical values in a window for selected counters. Each view is independent of the others. Figure 8-10 shows the process of adding a monitored object to the chart view in the Windows NT version.

Figure 8-10. Windows NT perfmon
The Windows 2000 version has received a slight face-lift but seems to be the same basic program. perfmon can be particularly useful if you aren't sure whether you have a host problem or a network problem. Both netmon and perfmon are described in the Windows help files as well as several books described in Appendix B.

8.6.1 ntop, mrtg, and cricket on Windows

All three major packages described in this chapter—ntop, mrtg, and cricket—are available for Windows systems.

The developers of ntop have provided you with two choices. You can compile it yourself for free; both the Unix and Windows versions share the same source tree. Or, if you can't easily compile it, you can buy a precompiled binary directly from them. Since ntop is basically a point-monitoring tool, you'll likely want to run it on multiple machines if you have a switched network or multiple subnetworks.

Since mrtg and cricket are primarily written in Perl, it is not surprising that they will run under Windows. You'll find mrtg fairly straightforward to set up. While cricket is said to work, at the time this was written there were no published directions on how to set it up, and the Unix directions don't generalize well.

Setting up mrtg for Windows is not that different from setting it up under Unix. To get mrtg running, you'll need to download a copy of mrtg with the binary for rateup. This was included with the copy of mrtg I downloaded, but the mrtg web page for NT has a separate link should you need it. You will need a copy of Perl along with anything else you may need to get Perl running. The mrtg site has links to the Active Perl site. Installing Active Perl requires an updated version of the Windows Installer, available at their site. You'll need to provide some mechanism for running mrtg on a regular basis. The file fiveminute.zip provides a program to add mrtg to the Windows NT scheduler. Finally, you'll want to provide some mechanism to view the output from mrtg.
This could be a web server or, at a minimum, a web browser.
Once you have unpacked everything, you'll need to edit the mrtg script so that NT rather than Unix is the operating system. This amounts to commenting out the fourth line of the script and uncommenting the fifth:

#$main::OS = 'UNIX';
$main::OS = 'NT';

Also, make sure rateup is in the same directory as mrtg.

Creating the configuration file and running the script is basically the same as with the Unix version. You'll want to run cfgmaker and indexmaker. And, as with the Unix version, you'll need to edit the configuration file to set WorkDir:. You will need to invoke Perl explicitly and use explicit paths with these scripts. For example, here are the commands to run indexmaker and mrtg on my system:

D:\mrtgrun>perl d:\mrtgrun\indexmaker d:\mrtgrun\mrtg.cfg > d:\apache\htdocs\mrtg

D:\mrtgrun>perl d:\mrtgrun\mrtg d:\mrtgrun\mrtg.cfg

On my system, D:\mrtgrun is the directory where mrtg is installed, and D:\apache\htdocs\mrtg is where the output is put so it can be accessed by the web server.

Finally, you'll need to make some provision to run mrtg periodically. As noted, you can use the supplied code to add it to the scheduler. Alternately, you can edit the configuration file to have it run as a daemon. For example, you could add the following to your configuration file:

RunAsDaemon: yes
Interval: 5

You'll want to add mrtg to the startup group so that it will be run automatically each time the system is rebooted.

8.6.2 getif revisited

In Chapter 7, we introduced getif but did not discuss the Graph tab. Basically, the Graph tab provides for two types of graphs: graphs of ping round-trip delays and graphs of SNMP objects. The latter allows us to use getif as a traffic-monitoring tool.

Graphing SNMP objects is a three-step process. First, you'll need to go back to the Parameters tab to identify the remote system and set its SNMP community strings. Next, you'll need to visit the MBrowser tab and select the objects you want to graph.
Locate the objects of interest by working your way down the MIB tree in the large pane on the upper left of the window. Visit the object by clicking the Walk button. The object and its value should be added to the large lower pane. Finally, select the item from the large pane and click on the Add to Graph button. (Both of these tabs were described in Chapter 7.)

You can now go to the Graph tab. Each of the selected variables should have been added to the legend to the right of the chart. You can begin collecting data by clicking on the Start button. Figure 8-11 shows one such graph.

Figure 8-11. getif graph
The controls along the bottom of the page provide some control over the appearance of the chart and over the sampling rate.
Chapter 9. Testing Connectivity Protocols

This chapter and the next describe tools used to investigate protocol-specific behavior. In this chapter, I describe tools used to explore connectivity protocols, i.e., protocols that work at the network and transport levels to provide connectivity. Chapter 10 focuses on tools used in testing protocols at the application level.

I begin with a description of packet generation tools. Custom packet generators, like hping and nemesis, will allow you to create custom packets to test protocols. Load generators, like MGEN, will let you flood your network with packets to see how your network responds to the additional traffic. We conclude with a brief discussion of network emulators and simulators.

Many of the tools described in this chapter and the next are not tools that you will need often, if ever. But should the need arise, you will want to know about them. Some of these tools are described quite briefly; my goal is to familiarize you with the tools rather than to provide a detailed introduction. Unless you have a specific need for one of these tools, you'll probably want to just skim these chapters initially. Should the need arise, you'll know the appropriate tool exists and can turn to the references for more information.

9.1 Packet Injection Tools

This first group of tools generates and injects packets into your network. Basically, there are two different purposes for generating packets, each with its own general approach and its own set of tools. First, to test software configuration and protocols, it may be necessary to control the content of individual fields within packets. For example, customized packets can be essential to test whether a firewall is performing correctly. They can also be used to investigate problems with specific protocols or to collect information such as path MTU.
They are wonderful learning tools, but using them can be a lot of work and will require a very detailed knowledge of the relevant protocols.

The second reason for generating packets is to test performance. For this purpose, you typically generate a large number of packets to see how your network or devices on the network respond to the increased load. We have already done some of this. In Chapter 4, we looked at tools that generated streams of packets to analyze link and path performance. Basically, any network benchmark will have a packet generator as a component. Typically, however, you won't have much control over this component. The tools described here give you much greater control over the number, size, and spacing of packets. Unlike custom packet generators, load generators typically won't provide much control over the contents of the packets.

These two uses are best thought of as extremes on a continuum rather than mutually exclusive categories. Some programs lie somewhere between these two extremes, providing a moderate degree of control over packet contents and the functionality to generate multiple packets. There is no one ideal tool, so you may want to become familiar with several, depending on your needs.

9.1.1 Custom Packet Generators
A number of different programs will construct custom packets for you. The utilities vary considerably in the amount of control you actually have. As all require a thorough understanding of the underlying protocols, none of these tools is particularly easy to use. All of the ones I am familiar with are command-line programs. This is really a plus since, if you find yourself using these programs heavily, you will want to call them from scripts.

Two programs, hping and nemesis, are briefly described here. A number of additional tools are cited at the end of this section in case these utilities don't provide the exact functionality you want or aren't easily ported to your system. Of the two, hping is probably the better known, but nemesis has features that recommend it. Neither is perfect.

Generally, once you have the idea of how to use one of these tools, learning another is simply a matter of identifying the options of interest. Most custom packet generators have a reasonable set of defaults that you can start with. Depending on what you want to do, you select the appropriate options to change just what is necessary, ideally as little as possible.

Custom packet tools have a mixed reputation. They are extremely powerful tools and, as such, can be abused. And some of their authors seem to take great pride in this potential. These are definitely tools that you should use with care. For some purposes, such as testing firewalls, they can be indispensable. Just make sure it is your firewall, and not someone else's, that you are testing.

9.1.1.1 hping

hping, or hping2 as it is sometimes called, was written by Salvatore Sanfilippo. The documentation is a little rough at times and suggests uses that are inappropriate.
Nonetheless, it is a powerful, versatile program.

When run with the default parameters, it looks a lot like ping and is useful for checking connectivity:

lnx1# hping 205.153.63.30
eth0 default routing interface selected (according to /proc)
HPING 205.153.63.30 (eth0 205.153.63.30): NO FLAGS are set, 40 headers + 0 data bytes
46 bytes from 205.153.63.30: flags=RA seq=0 ttl=126 id=786 win=0 rtt=4.4 ms
46 bytes from 205.153.63.30: flags=RA seq=1 ttl=126 id=1554 win=0 rtt=4.5 ms
46 bytes from 205.153.63.30: flags=RA seq=2 ttl=126 id=2066 win=0 rtt=4.6 ms
46 bytes from 205.153.63.30: flags=RA seq=3 ttl=126 id=2578 win=0 rtt=5.5 ms
46 bytes from 205.153.63.30: flags=RA seq=4 ttl=126 id=3090 win=0 rtt=4.5 ms

--- 205.153.63.30 hping statistic ---
5 packets tramitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 4.4/4.7/5.5 ms

At first glance, the output looks almost identical to ping's. Actually, by default, hping does not send ICMP packets. It sends TCP packets to port 0. (You can change ports with the -p option.) Since this port is almost never used, most systems will reply with a RESET message. Consequently, hping will sometimes get responses from systems that block ping. On the other hand, it may trigger intrusion detection systems as well. If you want to mimic ping, you can use the -1 argument, which specifies ICMP. Or, if you prefer, you can use -2 to send UDP packets.

When using ICMP, this is what one of the replies from the output looks like:

46 bytes from 205.153.63.30: icmp_seq=0 ttl=126 id=53524 rtt=2.2 ms
Otherwise, the output will be almost identical to the default behavior.

If you want more information, you can use -V for verbose mode. Here is what a reply looks like with this option:

46 bytes from 172.16.2.236: flags=RA seq=0 ttl=63 id=12961 win=0 rtt=1.0 ms
 tos = 0 len = 40 seq = 0 ack = 108515096 sum = a5bc urp = 0

There is also a debug mode if you are having problems with hping.

Other options that control the general behavior of hping include -c to set the number of packets to send, -i to set the time between packets, -n for numeric output (no name resolution), and -q for quiet output (just summary lines when done).

Another group of options allows you to control the contents of the packet header. For example, the -a option can be used to specify an arbitrary source address for a packet. Here is an example:

lnx1# hping2 -a 205.153.63.30 172.16.2.236
eth0 default routing interface selected (according to /proc)
HPING 172.16.2.236 (eth0 172.16.2.236): NO FLAGS are set, 40 headers + 0 data bytes

--- 172.16.2.236 hping statistic ---
4 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

In this case, the packet has been sent from a computer whose actual source address is 172.16.3.234. The packet, however, will have 205.153.63.30 in its IP header as the source address. Of course, any reply from the destination will go back to the spoofed source address, not the actual source address. If this is a valid address that belongs to someone else, they may not look kindly on your testing.

Spoofing source addresses can be useful when testing router and firewall setup, but you should do this in a controlled environment. All routers should be configured to drop any packets with invalid source addresses. That is, if a packet claims to have a source that is not on the local network or that is not from a device for which the local network should be forwarding a packet, then the source address is illegal and the packet should be dropped.
By creating packets with illegal source addresses, you can test your routers to be sure they are, in fact, dropping these packets. Of course, you need to use a tool like ethereal or tcpdump to see what is getting through and what is blocked.[1]

 [1] If this is all you are testing, you may prefer to use a specialized tool like egressor.

The source port can be changed with the -s option. The TTL field can be set with the -t option. There are options to set the various TCP flags: -A for ACK, -F for FIN, -P for PUSH, -R for RST, -S for SYN, and -U for URG. Oddly, although you can set the urgent flag, there doesn't seem to be a way to set the urgent pointer. You can set the packet size with the -d option, set the TCP header length with the -O option, and read the packet's data from a file with the -E option. Here is an example of sending a DNS packet using data in the file data.dns:

bsd2# hping -2 -p 53 -E data.dns -d 31 205.153.63.30

hping generated an error on my system with this command, but the packet was sent correctly.
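If you would rather not write C, a few lines of a scripting language can build the same file. Here is a hedged Python sketch (my own, not from the original text) that writes the 31-byte DNS query payload for www.cisco.com used in this example; the field values mirror the data the chapter constructs, but the helper name is mine.

```python
# Build the DNS query payload used in the hping example above:
# a standard A-record query for www.cisco.com. The field values
# (ID 1, flags 0x0100, one question, type A, class IN) mirror
# the data.dns file the chapter builds.
def dns_query(name):
    header = bytes([0x00, 0x01,   # identification
                    0x01, 0x00,   # flags: standard query, recursion desired
                    0x00, 0x01,   # one question
                    0x00, 0x00,   # no answer RRs
                    0x00, 0x00,   # no authority RRs
                    0x00, 0x00])  # no additional RRs
    question = b""
    for label in name.split("."):
        # each label is preceded by its one-byte length
        question += bytes([len(label)]) + label.encode("ascii")
    question += b"\x00"      # end of name
    question += b"\x00\x01"  # query type A
    question += b"\x00\x01"  # query class IN
    return header + question

with open("data.dns", "wb") as fp:
    fp.write(dns_query("www.cisco.com"))
```

The resulting file is 31 bytes long, which is why the hping command above uses -d 31.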
Be warned, constructing a usable data file is nontrivial. Here is a crude C program that will construct the data needed for this DNS example:

#include <stdio.h>

int main(void)
{
    FILE *fp;
    fp = fopen("data.dns", "w");
    fprintf(fp, "%c%c%c%c", 0x00, 0x01, 0x01, 0x00);
    fprintf(fp, "%c%c%c%c", 0x00, 0x01, 0x00, 0x00);
    fprintf(fp, "%c%c%c%c", 0x00, 0x00, 0x00, 0x00);
    fprintf(fp, "%c%s", 0x03, "www");
    fprintf(fp, "%c%s", 0x05, "cisco");
    fprintf(fp, "%c%s%c", 0x03, "com", 0x00);
    fprintf(fp, "%c%c%c%c", 0x00, 0x01, 0x00, 0x01);
    fclose(fp);
    return 0;
}

Even if you don't use C, it should be fairly clear how this works. The fopen command creates the file, and the fprintf commands write out the data. %c and %s are used to identify the datatype when formatting the output. The remaining arguments are the actual values for the data. (I'm sure there are cleaner ways to create this data, but this will work.)

Finally, hping can also be put in dump mode so that the contents of the reply packets are displayed in hex:

bsd2# hping -c 1 -j 172.16.2.230
HPING 172.16.2.230 (ep0 172.16.2.230): NO FLAGS are set, 40 headers + 0 data bytes
46 bytes from 172.16.2.230: flags=RA seq=0 ttl=128 id=60017 win=0 rtt=2.1 ms
 0060 9706 2222 0060 088f 5f0e 0800 4500
 0028 ea71 0000 8006 f26b ac10 02e6 ac10
 02ec 0000 0a88 0000 0000 1f41 a761 5014
 0000 80b3 0000 0000 0000 0000

--- 172.16.2.230 hping statistic ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.1/2.1/2.1 ms

Numerous other options are described in hping's documentation. You can get a very handy summary of options if you run hping with the -h option. I strongly recommend you print this to use while you are learning the program.

9.1.1.2 nemesis

nemesis, whose author is identified only as Obecian in the documentation, is actually a family of closely related command-line tools designed to generate packets. They are nemesis-arp, nemesis-dns, nemesis-icmp, nemesis-igmp, nemesis-ospf, nemesis-rip, nemesis-tcp, and nemesis-udp.
Each, as you might guess, is designed to construct and send a particular type of packet. The inclusion of support for protocols like OSPF or IGMP really sets nemesis apart from similar tools.

Here is an example that sends a TCP packet:

bsd2# nemesis-tcp -v -D 205.153.63.30 -S 205.153.60.236
TCP Packet Injection -=- The NEMESIS Project 1.1
(c) 1999, 2000 obecian <obecian@celerity.bartoli.org>

[IP] 205.153.60.236 > 205.153.63.30
[Ports] 42069 > 23
[Flags]
[TCP Urgent Pointer] 2048
[Window Size] 512
[IP ID] 0
[IP TTL] 254
[IP TOS] 0x18
[IP Frag] 0x4000
[IP Options]

Wrote 40 bytes

TCP Packet Injected

The -v option is for verbose mode. Without this option, the program sends the packet but displays nothing on the screen. Use this option to test your commands and then omit it when you embed the commands in scripts. The -S and -D options give the source and destination addresses. You can use the -x and -y options to set source and destination ports. If you want to specify flags, you can use the -f option. For example, if you add -fS -fA to the command line, the SYN and ACK flags will be set. (Many firewalls will block packets with some combinations of SYN and ACK flags but will pass packets with different combinations. Being able to set the SYN and ACK flags can be useful in testing these firewalls.)

Here is an example setting the SYN and ACK flags and the destination port:

bsd2# nemesis-tcp -S 172.16.2.236 -D 205.153.63.30 -fS -fA -y 22

Notice the program performs silently without the -v option. A number of additional options are described in the Unix manpages.

The other programs in the nemesis suite work pretty much the same way. Here is an example for sending an ICMP ECHO REQUEST:

bsd2# nemesis-icmp -v -S 172.16.2.236 -D 205.153.63.30 -i 8
ICMP Packet Injection -=- The NEMESIS Project 1.1
(c) 1999, 2000 obecian <obecian@celerity.bartoli.org>

[IP] 172.16.2.236 > 205.153.63.30
[Type] ECHO REQUEST
[Sequence number] 0
[IP ID] 0
[IP TTL] 254
[IP TOS] 0x18
[IP Frag] 0x4000

Wrote 48 bytes

ICMP Packet Injected

The -i option specifies the type field in the ICMP header. In this case, the 8 is the code for an ECHO_REQUEST message. The destination should respond with an ECHO_REPLY.
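If you want a feel for what these generators are doing under the hood, an ICMP ECHO REQUEST is easy to construct by hand. Here is a hedged Python sketch (mine, not from the original text) that builds the 8-byte ICMP header, type 8 and code 0, with the standard Internet checksum. It only builds the bytes; actually injecting them would require a raw socket and root privileges, which is exactly the work tools like nemesis do for you.

```python
import struct

def inet_checksum(data: bytes) -> int:
    # Standard Internet checksum (RFC 1071): one's-complement sum
    # of 16-bit words, with end-around carry folded in as we go.
    if len(data) % 2:
        data += b"\x00"          # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def icmp_echo_request(ident=0, seq=0, payload=b""):
    # ICMP header: type 8 (ECHO REQUEST), code 0, checksum,
    # identifier, sequence number. Checksum is computed over the
    # header (with checksum field zeroed) plus the payload.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

A quick property check: the Internet checksum of a correctly checksummed packet is zero, which makes the function easy to sanity-test.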
The -P option can be used to read the data for the packet from a file. For example, here is the syntax to send a DNS query:

bsd2# nemesis-dns -v -S 172.16.2.236 -D 205.153.63.30 -q 1 -P data.dns
DNS Packet Injection -=- The NEMESIS Project 1.1
(c) 1999, 2000 obecian <obecian@celerity.bartoli.org>

[IP] 172.16.2.236 > 205.153.63.30
[Ports] 42069 > 53
[# Questions] 1
[# Answer RRs] 0
[# Authority RRs] 0
[# Additional RRs] 0
[IP ID] 420
[IP TTL] 254
[IP TOS] 0x18
[IP Frag] 0x4000
[IP Options]

00 01 01 00 00 01 00 00 00 00 00 00 03 77 77  .............ww
77 05 63 69 73 63 6F 03 63 6F 6D 00 00 01 00  w.cisco.com....
01                                             .

Wrote 40 bytes

DNS Packet Injected

Although it appears the data has been sent correctly, I have seen examples when the packets were not correctly sent despite appearances. So, be warned! It is always a good idea to check the output of a packet generator with a packet sniffer just to make sure you are getting what you expect.

9.1.1.3 Other tools

There are a number of other choices. ipfilter is a suite of programs for creating firewalls. Supplied with some operating systems, including FreeBSD, ipfilter has been ported to a number of other platforms. One of the tools ipfilter includes is ipsend. Designed for testing firewalls, ipsend is yet another tool to construct packets. Here is an example:

bsd2# ipsend -v -i ep0 -g 172.16.2.1 -d 205.153.63.30
Device:  ep0
Source:  172.16.2.236
Dest:    205.153.63.30
Gateway: 172.16.2.1
mtu:     1500

ipsend is not the most versatile of tools, but depending on what system you are using, you may already have it installed.

Yet another program worth considering is sock. sock is described in the first volume of W. Richard Stevens' TCP/IP Illustrated and is freely downloadable. While sock doesn't give the range of control some of these other programs give, it is a nice pedagogical tool for learning about TCP/IP. Beware, there are other totally unrelated programs called sock.
Finally, some sniffers and analyzers support the capture and retransmission of packets. Look at the documentation for the sniffer you are using, particularly if it is a commercial product. If you decide to use this feature, proceed with care. Retransmission of traffic, if used indiscriminately, can create some severe problems.

socket and netcat

While they don't fit cleanly into this or the next category, netcat (or nc) and Juergen Nickelsen's socket are worth mentioning. (The netcat documentation identifies the author only as Hobbit.) Both are programs that can be used to establish a connection between two machines. They are useful for debugging, moving files, and exploring and learning about TCP/IP. Both can be used from scripts.

You'll need to start one copy as a server (in listen mode) on one computer:

bsd1# nc -l -p 2000

Then start another as a client on a second computer:

bsd2# nc 172.16.2.231 2000

Here is the equivalent command for socket as a server:

bsd1# socket -s 2000

Here is the equivalent command for a client:

bsd2# socket 172.16.2.231 2000

In all examples, 2000 is an arbitrarily selected port number.

Here is a simple example using nc to copy a file from one system to another. The server is opened with output redirected to a file:

bsd1# nc -l -p 2000 > tmp

Then the file is piped to the client:

bsd2# cat README | nc 172.16.2.231 2000
^C punt!

Finally, nc is terminated with a Ctrl-C. The contents of README on bsd2 have been copied to the file tmp on bsd1. These programs can be cleaner than telnet in some testing situations since, unlike telnet, they don't attempt any session negotiations when started. Play with them, and you are sure to find a number of other uses.

9.1.2 Load Generators

When compared to custom packet generators, load generators are at the opposite extreme of the continuum for packet injectors. These are programs that generate traffic to stress-test a network or
devices on a network. These tools can help you judge the performance of your network or diagnose problems. They can also produce a considerable strain on your network. You should use these tools to test systems offline, perhaps in a testing laboratory prior to deployment or during scheduled downtime. Extreme care should be taken before using these tools on a production network. Unless you are absolutely convinced that what you are doing is safe and reasonable, don't use these tools on production networks.

Almost any application can be used to generate traffic. A few tools, such as ping and ttcp, are particularly easy to use for this purpose. For example, by starting multiple ping sessions in the background, by varying the period between packets with the -i option, and by varying the packet sizes with the -s option, you can easily generate a wide range of traffic loads. Unfortunately, this won't generate the type of traffic you may need for some types of tests. Two tools, spray and mgen, are described here. The better known of these is probably spray. (It was introduced in Chapter 4.) It is also frequently included with systems, so you may already have a copy. mgen is one of the most versatile.

9.1.2.1 spray

spray is useful in getting a rough idea of a computer's network performance, particularly its interface. spray, on the local computer, communicates with the rpc.sprayd daemon on the remote system being tested. (You'll need to make sure this is running on the remote system.) It effectively floods the remote system with a large number of fixed-length UDP packets. The remote daemon, generally started by inetd, receives and counts these packets. The local copy of spray queries the remote daemon to determine the number of packets that were successfully received. By comparing the number of packets sent to the number received, spray can calculate the number of packets lost.
Here is an example of spray using default values:

bsd2# spray sol1
sending 1162 packets of lnth 86 to 172.16.2.233 ...
        in 0.12 seconds elapsed time
        191 packets (16.44%) dropped
Sent:   9581 packets/sec, 804.7K bytes/sec
Rcvd:   8006 packets/sec, 672.4K bytes/sec

Command-line options allow you to set the number of packets sent (-c), the length of the packets sent (-l), and a delay between packets in microseconds (-d).

You should not be alarmed that packets are being dropped. The idea is to send packets as fast as possible so that the interface will be stressed and packets will be lost. spray is most useful in comparing the performance of two machines. For example, you might want to see if your server can keep up with your clients. To test this, you'll want to use spray to send packets from the client to the server. If the number of packets dropped is about the same, the machines are fairly evenly matched. If a client is able to overwhelm a server, then you may have a potential problem.

In the previous example, spray was run on bsd2, flooding sol1. Here are the results of running spray on sol1, flooding bsd2:

sol1# spray bsd2
sending 1162 packets of length 86 to 172.16.2.236 ...
        610 packets (52.496%) dropped by 172.16.2.236
        36 packets/sec, 3144 bytes/sec

Clearly, sol1 is faster than bsd2 since bsd2 is dropping a much larger percentage of packets.
Unfortunately, while spray can alert you to a problem, it is unable to differentiate among the various reasons why a packet was lost: collision, slow interface, lack of buffer space, and so on. The obvious things to look at are the speed of the computer and its interfaces.

9.1.2.2 MGEN

The Multi-Generator Toolset, or MGEN, is actually a collection of tools for generating traffic, receiving traffic, and analyzing results. The work of Brian Adamson at the Naval Research Laboratory, this sophisticated set of tools will give you a high degree of control over the shape of the traffic you generate. However, you aren't given much control over the actual UDP packets the utility sends; that's not the intent of the tool. For its intended uses, however, you have all the control you are likely to need.

The traffic generation tool is mgen. It can be run in command-line mode or, by using the -g option, in graphical mode. At its simplest, it can be used with command-line options to generate traffic. Here is a simple example:

bsd2# mgen -i ep0 -b 205.153.63.30:2000 -r 10 -s 64 -d 5
MGEN: Version 3.1a3
MGEN: Loading event queue ...
MGEN: Seeding random number generator ...
MGEN: Beginning packet generation ... (Hit <CTRL-C> to stop)
Trying to set IP_TOS = 0x0
MGEN: Packets Txd        : 50
MGEN: Transmission period: 5.018 seconds.
MGEN: Ave Tx pkt rate    : 9.964 pps.
MGEN: Interface Stats    : ep0
      Frames Txd         : 55
      Tx Errors          : 0
      Collisions         : 0
MGEN: Done.

In this case, 10 packets per second for 5 seconds yields 50 packets.

Other options for mgen include setting the interface (-i), the destination address and port (-b), the packet rate (-r), the packet size (-s), and the duration of the flow in seconds (-d). There are a number of other options described in the documentation, such as the type of service and TTL fields.

The real strength of mgen comes when you use it with a script.
Here is a very simple example of a script called demo:

START NOW
00001 1 ON 205.153.63.30:5000 PERIODIC 5 64
05000 1 MOD 205.153.63.30:5000 POISSON 20 64
15000 1 OFF

The first line tells mgen to start generating traffic as soon as the program is started. (An absolute start time can also be specified.) The second line creates a flow with an ID of 1 that starts 1 millisecond into the run and that has port 5000 on 205.153.63.30 as its destination. The traffic is 5 packets per second, and each packet is 64 bytes in length. The third line tells mgen to modify the flow with ID 1. 5,000 milliseconds (or 5 seconds) into the flow, packet generation should switch to a Poisson distribution with a rate of 20 packets per second. The last line terminates the flow at 15,000 milliseconds. While this script has only one flow, a script can contain many.
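Scripts like this are easy to generate programmatically, which is handy when you want to sweep a range of rates or run many flows. Here is a hedged Python sketch of a hypothetical helper (mine, not part of MGEN) that emits events in the layout of the demo script above: time in milliseconds, flow ID, command, and any remaining fields. Anything beyond that layout is an assumption on my part.

```python
# Hypothetical helper: emit an MGEN-style script from a list of
# (milliseconds, flow_id, command, *fields) tuples, following the
# line layout of the demo script in the text.
def mgen_script(events):
    lines = ["START NOW"]
    for ms, flow_id, command, *rest in events:
        fields = [f"{ms:05d}", str(flow_id), command] + [str(r) for r in rest]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"

# Reproduce the demo script from the text.
script = mgen_script([
    (1,     1, "ON",  "205.153.63.30:5000", "PERIODIC", 5,  64),
    (5000,  1, "MOD", "205.153.63.30:5000", "POISSON",  20, 64),
    (15000, 1, "OFF"),
])
print(script)
```

Writing the result to a file and passing the filename to mgen would then drive the run, just as with the hand-written demo script.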
Here is an example of the invocation of mgen with a script:

bsd2# mgen -i ep0 demo
MGEN: Version 3.1a3
MGEN: Loading event queue ...
MGEN: Seeding random number generator ...
MGEN: Beginning packet generation ...
MGEN: Packets Txd        : 226
MGEN: Transmission period: 15.047 seconds.
MGEN: Ave Tx pkt rate    : 15.019 pps.
MGEN: Interface Stats    : ep0
      Frames Txd         : 234
      Tx Errors          : 0
      Collisions         : 0
MGEN: Done.

Since a Poisson distribution was used for part of the flow, we can't expect to see exactly 225 packets in exactly 15 seconds.

For many purposes, mgen is the only tool from the MGEN tool set that you will need. But for some purposes, you will need more. drec is a receiver program that can log received data. mgen and drec can be used with RSVP (with ISI's rsvpd). You will recall that with RSVP, the client must establish the session. drec has this capability. Like mgen, drec has an optional graphical interface. In addition to mgen and drec, the MGEN tool set includes a number of additional utilities that can be used to analyze the data collected by drec.

One last note on load generators: software load generators assume that the systems they run on are fast enough to generate enough traffic to adequately load the system being tested. In some circumstances, this will not be true. For some applications, dedicated hardware load generators must be used.

9.2 Network Emulators and Simulators

Basically, an emulator is a device that sits on a network and mimics the behavior of network devices or the behavior of part of a system, e.g., subnets. Actual traffic measurements are made on a network whose behavior is controlled, in part, by the emulator. Simulators are software systems that model with software the behavior of the system or networks. A simulator is a totally artificial or synthetic environment.

At best, network emulators and simulators are very unlikely troubleshooting tools. But for the extremely ambitious (or desperate), it is possible to investigate the behavior of a network using these tools.
Neither of these approaches is for the fainthearted or novice. While both are generally expensive and complex propositions, there are two projects that are making these approaches more accessible. If you are really interested in making the investment in time and effort needed to use emulators or simulators, read on.

There is a continuum of approaches to investigating the behavior of a network, ranging from direct measurement at one extreme through emulation to simulation at the opposite extreme. It's not unusual for emulators to provide limited simulation features or for simulators to have emulation features. This is certainly true for the two tools briefly described here.
We have already discussed measurement techniques. But while real measurements have an unquestionable authenticity, a number of problems are associated with real measurements. Lack of reproducibility is one problem. Scale problems, such as the cost of increasing the size of the test network, are another concern. If you are considering implementation issues, then direct measurement can only be done late in the development cycle, compounding the cost of mistakes. Emulation and simulation offer lower-cost alternatives.

Simulators have the advantages of being relatively cheap, providing highly reproducible results, scaling very well and inexpensively, and giving results quickly. It is generally very straightforward to customize the degree of detail in reports so you can focus on just what is of interest. Simulations vary in degree of abstraction. The greater the degree of abstraction, the easier it is to focus on what is of interest, at the cost of lost realism. However, if a simulation is poorly designed, the results can have little basis in reality. Also, some simulators may be implemented primarily for one type of use and may not be appropriate for other uses. From a troubleshooting perspective, you might use a simulator to further investigate a hypothesis. Simulators would provide a way to closely examine behavior to confirm or refute the hypothesis without creating problems on a production network.

Emulators lie between simulators and live systems. They allow controlled experiments with a high degree of reproducibility. They make it much easier to create the type of traffic or events of interest. They also provide a mechanism to test real systems effectively. For example, an emulator might duplicate or approximate the behavior of an attached device or network. A router emulator might drop traffic or inject traffic into the actual test network. On the downside, some emulators tend to be very specialized and are usually platform specific.
For troubleshooting, an emulator could be used to stress a network.

9.2.1 NISTNet

NIST Network Emulation Tool (NISTNet) is a general-purpose tool that can be used to emulate the dynamics in an IP network. It was developed by the National Institute of Standards and Technology (NIST) and is implemented as an extension to the Linux operating system through a kernel module. Unlike many emulators, NISTNet supports a fairly heterogeneous approach to emulation. And since it will run on a fairly standard platform, it is remarkably inexpensive to set up and use.

NISTNet allows you to use a Linux system configured as a router, through an X Window interface, to model or emulate a number of different scenarios. For example, you can program both fixed and variable packet delays and random reordering of packets. Packets can be dropped either randomly (uniform distribution) or based on congestion.[2] Random duplication of packets, bandwidth limitations, or asymmetric bandwidth can all be programmed into NISTNet. You can also program in jitter and do basic quality-of-service measurements. NISTNet can be driven by traces from measurements of existing networks. User-defined packet handlers can be added to the system to add timestamps, do data collection, generate responses for emulated clients, and so forth.

 [2] Gateway emulators that support this kind of behavior are sometimes less charitably called flakeways.

9.2.2 ns and nam

If you want to consider simulations, you should first look into a pair of programs, ns and nam. ns is a network simulator, while nam is a network visualization tool. Both are under development by the Virtual InterNetwork Testbed (VINT) project, a DARPA-funded research project whose goal is to produce a simulator for studying scale and protocol interactions. VINT is a collaborative project that involves USC, Xerox PARC, LBNL, and UCB.
ns is derived from earlier simulation projects such as REAL and has gone through a couple of incarnations. The kernel is written in C++, while user scripts are written in MIT's Object Tool Command Language (OTcl), an object-oriented version of Tcl. With any simulation software, you should expect a steep learning curve, and ns is no exception. You'll need to learn how to use the product, and you will also need a broad knowledge of simulations in general. To use ns, you'll need to learn how to write scripts in OTcl.

Fortunately, the ns project provides a wealth of documentation. The Unix manpage is more than 30 pages and displays the typical unreadable terseness associated with Unix manual pages: great for looking up something you already know (arguably the intended use) but abysmal for learning something new. There is also a downloadable manual that runs more than 300 pages. However, the best place to start is with Marc Greis's tutorial. It is a more manageable 50 pages and introduces the scripting language in a series of readable examples.

One problem with simulations is that they can produce an overwhelming amount of information. Even worse, simulation results describe dynamic events that are difficult to interpret when viewed statically. nam is a visualization tool that animates network simulations. It is hard to convey the real flavor of nam from a single black-and-white snapshot, but Figure 9-1 should give you some idea of its value.

Figure 9-1. nam example

This is output from one of the sample scripts that comes with the program. The basic topology of the network should be obvious. Packets are drawn as colored rectangles. Different colors are used for different sources. As the animation is played, you see the packets generated, queued at devices, move across the network, and, occasionally, dropped from the network. Node 6 in the figure shows a stack of packets that have been queued and one packet below the node that has been dropped.
(Dropped packets fall to the bottom of the screen.) The control buttons at the top are used just as you would expect: to play, stop, or rewind the animation.

NISTNet, ns, and nam are all described as ongoing projects. But all three are more polished than many completed projects.

9.3 Microsoft Windows
Few of the tools described in this chapter are available for Windows. Those that are available include some of the more ambitious tools, however. In particular, ns and nam have downloadable binaries for Windows. According to the mgen documentation, a Windows "version may appear shortly." (netcat has also been ported to Windows.)

If you are interested in traffic generation for loading purposes, you might look to ipload. This is a very simple program that will flood a remote device with UDP packets. You can specify the destination address, destination port, packet rate, and packet payload. As the program runs, it will display a window with the elapsed time, the number of packets sent, the packet rate, and the number of bytes per second. ipload comes from BTT Software in the U.K. and requires no installation.

Several network-oriented benchmark programs available for Windows might also be of interest. In particular, you may want to look at NetBench, which can be downloaded from Ziff Davis's web site, http://etestinglabs.com/benchmarks/netbench/netbench.asp. It is designed to test client/server performance. You'll need to download both client and server versions of the software.
Chapter 10. Application-Level Tools

This chapter briefly surveys some additional tools that might be of interest: tools that are useful when setting up and debugging programs that use application-level protocols. The chapter is organized around different application protocols. You will not need the tools described here often. The goal of this chapter is to make you aware of what is available should the need arise, and the approach described here may be more useful than the specific tools mentioned. Unless you have a specific problem, you'll probably want to just skim this chapter the first time through.

10.1 Application-Protocols Tools

Many network applications are built upon application-level protocols rather than being built directly upon network- or transport-level protocols. For example, email readers typically use SMTP to send email and POP2, POP3, or IMAP to receive email. For some applications, it is difficult to distinguish the application from the underlying protocol. NFS is a prime example. But when an implementation separates the application from its underlying protocol, a number of advantages can be realized. First, the separation helps to ensure interoperability. A client developed on one platform can communicate effectively with server software running on a different system. For example, your web browser can communicate with any web server because it uses a standardized protocol, HTTP. Tools based on the underlying protocol can be used to obtain basic information regardless of the specific application being used.

Most of the tools described in this chapter collect information at the protocol level. While it is unlikely that any of these tools will provide the detailed information you would want for a problem with a specific application, they should help you identify where the problem lies and will help if the problem is with the protocol. Most applications will have their own approaches to solving problems, e.g., debug modes and log files.
But you'll want to be sure the problem is with the application before you start with these. If the problem is with the application, you'll need to consult the specific documentation for the application.

If you are having trouble setting up a network application for the first time, you are probably better off rereading the documentation than investing time learning a new tool. But if you've read the directions three or four times in several different books, or if you have used an application many times and it has suddenly stopped working, then it's probably time to look at tools. For many of the protocols, you'll have a number of choices. You won't need every tool, so pick the most appropriate, convenient tool and start there.

Providing a detailed description of all available tools is beyond the scope of any reasonable book. This would require both a detailed review of the protocol and a description of the tool. For example, Hal Stern's 400-page book, Managing NFS and NIS, has three chapters totaling about 125 pages on tools, debugging, and tuning NIS and NFS. What I'm trying to do here is provide you with enough information to get started and handle simple problems. If you need more information, you should consider looking at one of the many books, like Stern's, devoted to the specific protocol in question. A number of such books are described in Appendix B.

Generally, these applications are based on a client/server model. The approach you'll take in debugging a client may be different from that used to debug a server. The first step, in general, is to decide if the problem is with the client application, the server application, or the underlying protocols.
If any client on any machine can connect to a server, the server and protocols are probably operating correctly. So when communications fail, the first thing you may want to try is a different client program or a similar client on a different computer. With many protocols, you don't even need a client program. Many protocols are based on the exchange of commands in NVT ASCII[1] over a TCP connection. You can interact with these servers using any program that can open a TCP connection using NVT ASCII. Examples include telnet and netcat.

[1] Network Virtual Terminal (NVT) ASCII is a 7-bit U.S. variant of the common ASCII character code. It is used throughout the TCP/IP protocol suite. It uses 7 bits to encode a character that is transmitted as an 8-bit byte with the high-order bit set to 0.

10.1.1 Email

Email protocols such as SMTP, POP2, and POP3 are perfect examples of protocols where telnet is the optimal tool to begin with. Here is an example using telnet to send a brief message via SMTP. (Depending on your system, you may need to enable local echoing so that what you type will be visible.)

bsd2# telnet mail.lander.edu 25
Trying 205.153.62.5...
Connected to mail.lander.edu.
Escape character is '^]'.
220 mail.lander.edu ESMTP Sendmail 8.9.3/8.9.3; Wed, 22 Nov 2000 13:22:15 -0500
helo 205.153.60.236
250 mail.lander.edu Hello [205.153.60.236], pleased to meet you
mail from:<jsloan@205.153.60.236>
250 <jsloan@205.153.60.236>... Sender ok
rcpt to:<jsloan@lander.edu>
250 jsloan@lander.edu... Recipient ok
data
354 Enter mail, end with "." on a line by itself
This is the body of a message.
.
250 NAA28089 Message accepted for delivery
quit
221 mail.lander.edu closing connection
Connection closed by foreign host.

The process is very simple. telnet is used to connect to port 25, the SMTP port, on the email server in question. The next four lines were returned by the server. At this point, we can see that the server is up and that we are able to communicate with it.
To send email, use the commands helo to identify yourself, mail from: to specify the email source, and rcpt to: to specify the destination. Use names, not IP addresses, to specify the destination. Notice that no password is required to send email. (The server is responding with the lines starting with numbers or codes.) The data command was used to signal the start of the body of the message. The body is one line long here but can be as long as you like. When you are done entering the body, it is terminated with a new line that has a single period on it. The session was terminated with the quit command. Clearly, the server is up and can be reached in this example. Any problems you may be having must be with your email client.

As noted, you had a pretty good idea the server was working as soon as it replied and could have quit at this point. There are a couple of reasons for going through the process of sending a message. First, it gives a nice warm feeling seeing that everything is truly working. More important, it confirms that the recipient is known to the server. For example, consider the following:

rcpt to:<jsloane@lander.edu>
550 <jsloane@lander.edu>... User unknown

This reply lets us know that the user is unknown to the system. If you have doubts about a recipient, you can use the vrfy and expn commands. The vrfy command will confirm the recipient address is valid, as shown in the following example:

vrfy jsloan
250 Joseph Sloan <jsloan@mail.lander.edu>
vrfy freddy
550 freddy... User unknown

expn fully expands an alias, giving a list of all the recipients named in the alias. Be warned, expn and vrfy are often seen as security holes and may be disabled. (Prudence would dictate using vrfy and expn only on your own systems.) There are other commands, but these are enough to verify that the server is available.

Another reason for sending the email is that it gives you something to retrieve, the next step in testing your email connection. The process of retrieving email with telnet is similar, although the commands will vary with the specific protocol being used. Here is an example using a POP3 server:

bsd2# telnet mail.lander.edu 110
Trying 205.153.62.5...
Connected to mail.lander.edu.
Escape character is '^]'.
+OK POP3 mail.lander.edu v7.59 server ready
user jsloan
+OK User name accepted, password please
pass xyzzy
+OK Mailbox open, 1 messages
retr 1
+OK 347 octets
Return-Path: <jsloan@205.153.60.236>
Received: from 205.153.60.236 ([205.153.60.236])
        by mail.lander.edu (8.9.3/8.9.3) with SMTP id NAA28089;
        Wed, 22 Nov 2000 13:23:14 -0500
Date: Wed, 22 Nov 2000 13:23:14 -0500
From: jsloan@205.153.60.236
Message-Id: <200011221823.NAA28089@mail.lander.edu>
Status:

This is the body of a message.
.
dele 1
+OK Message deleted
quit
+OK Sayonara
Connection closed by foreign host.

As you can see, telnet is used to connect to port 110, the POP3 port. As soon as the first message comes back, you know the server is up and reachable. Next, you identify yourself using the user and pass commands. This is a quick way to make sure that the account exists and you have the right password.
Often, email readers give cryptic error messages when you use a bad account or password. The system has informed us that there is one message waiting for this user. Next, retrieve that message with the retr command. The argument is the message number. This is the message we just sent. Delete the message and log off with the dele and quit commands, respectively. (As an aside, sometimes mail clients will hang with overlarge attachments. You can use the dele command to delete the offending message.)
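Sessions like the two above can also be replayed non-interactively with netcat, mentioned earlier as an alternative to telnet. The helper below is my own sketch, not a standard tool; the example hostnames and addresses are placeholders you would replace with your own.

```shell
#!/bin/sh
# Emit the SMTP command sequence from the session above; pipe the result
# to netcat (nc) to probe a server without typing the commands by hand.
smtp_probe() {  # usage: smtp_probe helo-name from-addr to-addr
  printf 'helo %s\r\n' "$1"
  printf 'mail from:<%s>\r\n' "$2"
  printf 'rcpt to:<%s>\r\n' "$3"
  printf 'quit\r\n'
}

# Against a real server (placeholder names):
#   smtp_probe 205.153.60.236 jsloan@205.153.60.236 jsloan@lander.edu | nc mail.lander.edu 25
```

A 550 reply in the output flags an unknown recipient, just as in the interactive session.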
Of course, this is how a system running POP3 or SMTP is supposed to work. If it works this way, any subsequent problems are probably with the client, and you need to turn to the client documentation. You can confirm this with packet capture software. If your system doesn't work properly, the problem could be with the server software or with communications. You might try logging onto the server and verifying that the appropriate software is listening, using ps, or, if it is started by inetd, using netstat. Or you might try using telnet to connect to the server directly from the server, i.e., telnet localhost 25. If this succeeds, you may have routing problems, name service problems, or firewall problems. If it fails, then look to the documentation for the software you are using on the server.

The commands used by most email protocols are described in the relevant RFCs. For SMTP, see RFC 821; for POP2, see RFC 937; for POP3, see RFC 1939; and for IMAP, see RFC 1176.

10.1.2 HTTP

HTTP is another protocol that is based on commands in NVT ASCII sent over a TCP session. It can be fairly complicated to figure out the correct syntax, but even an error message will tell you that the server is running and the connection works. Try typing HEAD / HTTP/1.0 followed by two carriage returns. Here is an example:

bsd2# telnet localhost http
Trying 127.0.0.1...
Connected to localhost.lander.edu.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Sun, 22 Apr 2001 13:27:32 GMT
Server: Apache/1.3.12 (Unix)
Content-Location: index.html.en
Vary: negotiate,accept-language,accept-charset
TCN: choice
Last-Modified: Tue, 29 Aug 2000 09:14:16 GMT
ETag: "a4cd3-55a-39ab7ee8;3a4a1b39"
Accept-Ranges: bytes
Content-Length: 1370
Connection: close
Content-Type: text/html
Content-Language: en
Expires: Sun, 22 Apr 2001 13:27:32 GMT

Connection closed by foreign host.

In this example, I've checked to see if the server is responding from the server itself.
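If typing the request into telnet is awkward, the same check can be scripted with printf and netcat. This is a sketch of my own; the host is a placeholder, and netcat's flags vary slightly between versions.

```shell
#!/bin/sh
# Build the minimal HTTP/1.0 HEAD request shown above. A Host header is
# not required by HTTP/1.0, but most servers accept it harmlessly.
http_head_request() {  # usage: http_head_request host
  printf 'HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n' "$1"
}

# Check a server (placeholder host; -w sets a timeout on most nc versions):
#   http_head_request www.example.com | nc -w 5 www.example.com 80
```

Any response at all, even an error status, confirms the server is running and reachable.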
In general, however, using telnet is probably not worth the effort, since it is usually very easy to find a working web browser that you can use somewhere on your network.

Most web problems, in my experience, stem from incorrectly configured security files or are performance problems. For security configuration problems, you'll need to consult the appropriate documentation for your software. For a quick performance profile of your server, you might visit Patrick Killelea's web site, http://patrick.net. If you have problems, you probably want to look at his book, Web Performance Tuning.

10.1.3 FTP and TFTP
FTP is another protocol that uses NVT ASCII and can be checked, to a very limited extent, with telnet. Here is a quick check to see if the server is up and can be reached:

lnx1# telnet bsd2 ftp
Trying 172.16.2.236...
Connected to bsd2.lander.edu.
Escape character is '^]'.
220 bsd2.lander.edu FTP server (Version 6.00LS) ready.
user jsloan
331 Password required for jsloan.
pass xyzzy
230 User jsloan logged in.
stat
211- bsd2.lander.edu FTP server status:
     Version 6.00LS
     Connected to 172.16.3.234
     Logged in as jsloan
     TYPE: ASCII, FORM: Nonprint; STRUcture: File; transfer MODE: Stream
     No data connection
211 End of status
quit
221 Goodbye.
Connection closed by foreign host.

Once you know the server is up, you'll want to switch over to a real FTP client. Because FTP opens a reverse connection when transferring information, you are limited in what you can do with telnet. Fortunately, this is enough to verify that the server is up, communication works, and you can successfully log on to the server.

Unlike FTP, TFTP is UDP based. Consequently, TCP-based tools like telnet are not appropriate. You'll want to use a TFTP client to test for connectivity. Fortunately, TFTP is a simple protocol and usually works well.

10.1.4 Name Services

Since name resolution is based primarily on UDP, you won't be able to debug it with telnet. Name resolution can be a real pain since problems are most likely to show up when you are using other programs or services. Name service applications are applications that you'll want to be sure are working on your system. For clients, it is one of the easiest protocols to test. For servers, however, ferreting out that last error can be a real chore. Fortunately, there are a number of readily available tools, particularly for DNS.

If you suspect name resolution is not working on a client, try using ping, alternating between hostnames and IP addresses.
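This alternating test is easy to wrap in a small function. The sketch below is my own; you supply a hostname and the address you believe it should resolve to.

```shell
#!/bin/sh
# Ping a host by address, then by name, and report which case fails.
# Reachability by address but not by name points at name resolution.
resolv_check() {  # usage: resolv_check hostname ipaddress
  if ! ping -c 2 "$2" >/dev/null 2>&1; then
    echo "unreachable even by address; not a name-service problem"
  elif ! ping -c 2 "$1" >/dev/null 2>&1; then
    echo "reachable by address but not by name; suspect name resolution"
  else
    echo "name resolution and connectivity look fine"
  fi
}

# e.g. resolv_check www.example.com 10.0.0.1   (placeholder host and address)
```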
If you are consistently able to reach remote hosts with IP addresses but not with names, then you are having a problem with name resolution. If you have a problem with name resolution on the client side, start by reviewing the configuration files. It is probably easiest to start with /etc/hosts and then look at DNS. Leave NIS until last.

10.1.4.1 nslookup and dig

There are several tools, such as nslookup, dig, dnsquery, and host, that are used to query DNS servers. These are most commonly used to retrieve basic domain information such as what name goes with what IP address, aliases, or how a domain is organized. With this information, you can map out a network, for example, at least to the extent the DNS entries reflect the structure of the network. When troubleshooting on the client side, these tools can be used to ensure the client can reach the appropriate DNS
server. The real value for troubleshooting, however, is being able to examine the information returned by servers. This allows you to check that information for consistency, correctness, and completeness.

For most purposes, there is not much difference among these programs. Your choice will largely be a matter of personal preference. However, you should be aware that some other programs may be built on top of dig, so be sure to keep it around even if you prefer one of the other tools.

Of these, nslookup, written by Andrew Cherenson, is the most ubiquitous and the most likely to be installed by default. It is even available under Windows. It can be used either in command-line mode or interactively. In command-line mode, you use the name or IP address of interest as an argument:

sol1# nslookup 205.153.60.20
Server:  lab.lander.edu
Address:  205.153.60.5

Name:    ntp.lander.edu
Address:  205.153.60.20

bsd2# nslookup www.lander.edu
Server:  lab.lander.edu
Address:  205.153.60.5

Name:    web.lander.edu
Address:  205.153.60.15
Aliases:  www.lander.edu

As you can see, it returns both the name and IP address of the host in question, the identity of the server supplying the information, and, in the second example, the fact that the queried name is an alias. You can specify the server you want to use as well as other options on the command line. You should be aware, however, that it is not unusual for reverse lookups to fail, usually because the DNS database is incomplete.

Earlier versions of nslookup required a special format for finding the names associated with IP addresses. For example, to look up the name associated with 205.153.60.20, you would have used the command nslookup 20.60.153.205.in-addr.arpa. Fortunately, unless you are using a very old version of nslookup, you won't need to bother with this.

While command-line mode is adequate for an occasional quick query, if you want more information, you'll probably want to use nslookup in interactive mode.
If you know the right combination of options, you could use command-line options. But if you are not sure, it is easier to experiment step by step in interactive mode.

Interactive mode is started by typing nslookup without any arguments:

sol1# nslookup
Default Server:  lab.lander.edu
Address:  205.153.60.5

>

As you can see, nslookup responds with the name of the default server and a prompt. A ? will return a list of available options. You can change the server you want to query with the server command. You can get a listing of all machines in a domain with the ls command. For example, ls netlab.lander.edu would list all the machines in the netlab.lander.edu domain. Use the ls command with caution; it can return a lot of information. You can use the -t option to specify a query type, i.e., a particular type of
record. For example, ls -t mx lander.edu will return the mail entries from lander.edu. Query types can include cname to list canonical names for aliases, hinfo for host information, ns for name servers for named zones, soa for the zone authority record, and so on. For more information, start with the manpage for nslookup.

One useful trick is to retrieve the soa record for local and authoritative servers. Here is part of one such record retrieved in interactive mode:

> ls -t soa lander.edu
[lab.lander.edu]
$ORIGIN lander.edu.
@                       1D IN SOA       lab root (
                                        960000090       ; serial

The entry labeled serial is a counter that should be incremented each time the DNS records are updated. If the serial number on your local server, when compared to the authoritative server, is off by more than 1 or 2, the local server is not updating its records in a timely manner. One possible cause is an old version of bind.

Many administrators prefer dig to nslookup. While not quite as ubiquitous as nslookup, it is included as a tool with bind and is also available as a separate tool. dig is a command-line tool that is quite easy to use. It seems to have a few more options and, since it is command-line oriented, it is more suited for shell scripts. On the other hand, using nslookup interactively may be better if you are groping around and not really sure what you are looking for.

dig, short for Domain Internet Groper, was written by Steve Hotz. Here is an example of using dig to do a simple query:

bsd2# dig @lander.edu www.lander.edu

; <<>> DiG 8.3 <<>> @lander.edu www.lander.edu
; (1 server found)
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 1, ADDITIONAL: 1
;; QUERY SECTION:
;;      www.lander.edu, type = A, class = IN

;; ANSWER SECTION:
www.lander.edu.         1D IN CNAME     web.lander.edu.
web.lander.edu.         1D IN A         205.153.60.15

;; AUTHORITY SECTION:
lander.edu.             1D IN NS        lander.edu.

;; ADDITIONAL SECTION:
lander.edu.             1D IN A         205.153.60.5

;; Total query time: 9 msec
;; FROM: bsd2.lander.edu to SERVER: lander.edu  205.153.60.5
;; WHEN: Tue Nov  7 10:26:42 2000
;; MSG SIZE  sent: 32  rcvd: 106

The first argument, in this case @lander.edu, is optional. It gives the name of the name server to be queried. The second argument is the name of the host you are looking up.
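Because dig suits shell scripts, the serial-number check described under nslookup is easy to automate. This is my own sketch: the server and zone names are placeholders, and it assumes dig's +short output for an SOA record, where the serial is the third field.

```shell
#!/bin/sh
# Print the SOA serial number a given server reports for a zone.
soa_serial() {  # usage: soa_serial server zone
  dig @"$1" "$2" soa +short | awk '{print $3}'
}

# Compare a local server against the authoritative one (placeholder names):
#   soa_serial lab.lander.edu lander.edu
#   soa_serial authoritative.example.com lander.edu
# If the two serials differ by more than 1 or 2, the local server is stale.
```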
As you can see, a simple dig provides a lot more information, by default at least, than does nslookup. It begins with information about the name server and the resolver flags used. (The flags are documented in the manpage for bind's resolver.) Next come the header fields and flags, followed by the query being answered. These are followed by the answer, authority records, and additional records. The format is the domain name, TTL field, type code for the record, and the data field. Finally, summary information about the exchange is included.

You can also use dig to get other types of information. For example, the -x option is used to do a reverse name lookup:

bsd2# dig -x 205.153.63.30

; <<>> DiG 8.3 <<>> -x
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUERY SECTION:
;;      30.63.153.205.in-addr.arpa, type = ANY, class = IN

;; ANSWER SECTION:
30.63.153.205.in-addr.arpa.  1D IN PTR  sloan.lander.edu.

;; AUTHORITY SECTION:
63.153.205.in-addr.arpa.  1D IN NS  lander.edu.

;; ADDITIONAL SECTION:
lander.edu.             1D IN A         205.153.60.5

;; Total query time: 10 msec
;; FROM: bsd2.lander.edu to SERVER: default -- 205.153.60.5
;; WHEN: Mon Nov  6 10:54:17 2000
;; MSG SIZE  sent: 44  rcvd: 127

The mx option (no hyphen) will return mail records, the soa option will return zone authority records, and so on. See the manpage for details.

nslookup and dig are not unique. For example, host and dnsquery are other alternatives you may want to look at. host is said to be designed as a successor to nslookup and dig. But it does everything online and can generate a lot of traffic as a result. While all of these are very useful tools, they rely on your ability to go back and analyze the information returned. There are other tools that help to fill this gap.

10.1.4.2 doc, dnswalk, and lamers

doc is one such tool.
It was originally written by Steve Hotz and Paul Mockapetris, with later modifications by Brad Knowles. Built on top of dig, doc is a script that attempts to validate the consistency of information within a domain:

bsd2# doc lander.edu.
Doc-2.1.4: doc lander.edu.
Doc-2.1.4: Starting test of lander.edu.   parent is edu.
Doc-2.1.4: Test date - Mon Nov  6 11:55:07 EST 2000
;; res_nsend to server g.root-servers.net.  192.112.36.4: Operation timed out
DIGERR (UNKNOWN): dig @g.root-servers.net. for SOA of parent (edu.) failed
Summary:
   ERRORS found for lander.edu. (count: 3)
   WARNINGS issued for lander.edu. (count: 1)
Done testing lander.edu.  Mon Nov  6 11:55:40 EST 2000

The results are recorded in a log file; in this case, log.lander.edu. is the filename. (Note its trailing period.)

dnswalk, written by David Barr, is a similar tool. It is a Perl script that does a zone transfer and checks the database for internal consistency. (Be aware that more and more systems are disabling zone transfers from unknown sites.)

bsd2# dnswalk lander.edu.
Checking lander.edu.
BAD: lander.edu. has only one authoritative nameserver
Getting zone transfer of lander.edu. from lander.edu...done.
SOA=lab.lander.edu      contact=root.lander.edu
WARN: bookworm.lander.edu A 205.153.62.205: no PTR record
WARN: library.lander.edu A 205.153.61.11: no PTR record
WARN: wamcmaha.lander.edu A 205.153.62.11: no PTR record
WARN: mrtg.lander.edu CNAME elmer.lander.edu: unknown host
0 failures, 4 warnings, 1 errors.

Be sure to include the period at the end of the domain name. This can produce a lot of output, so you may want to redirect output to a file. A number of options are available; consult the manpage.

You'll want to take the output from these tools with a grain of salt. Even though these tools do a lot of work for you, you'll need a pretty good understanding of DNS to make sense of the error messages. And, as you can see, for the same domain, one found three errors and one warning while the other found one error and four warnings for a fully functional DNS domain. There is no question that this domain's database, which was being updated when this was run, has a few minor problems. But it does work. The moral is, don't panic when you see an error message.

Another program you might find useful is lamers. This was written by Bryan Beecher and requires both doc and dig. It is used to find lame delegations, i.e., a name server that is listed as authoritative for a domain but is not actually performing that service for the listed domain.
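The check lamers automates can be sketched with dig alone: ask each listed name server for the zone's SOA and see whether it answers authoritatively (the aa flag in dig's header output). This is my own illustration, not the lamers script itself; the zone name is a placeholder.

```shell
#!/bin/sh
# Flag possibly lame name servers for a zone by checking whether each
# listed NS answers authoritatively for the zone's SOA.
check_ns() {  # usage: check_ns zone
  for ns in $(dig "$1" ns +short); do
    if dig @"$ns" "$1" soa +norecurse | grep -q 'flags:[^;]* aa'; then
      echo "$ns answers authoritatively"
    else
      echo "$ns possibly lame"
    fi
  done
}

# e.g. check_ns lander.edu
```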
This problem most often arises when name services are moved from one machine to another, but the parent domain is not updated. lamers is a simple script that can be used to identify this problem.

10.1.4.3 Other tools

In addition to these debugging tools, there are a number of additional tools that are useful in setting up DNS in the first place. Some, such as make-zones, named-bootconf, and named-xfer, come with bind. Be sure to look over your port carefully. Others, often scripts or collections of scripts, are available from other sources. Examples include h2n and dnsutl. There are a number of good tools out there, so be sure to look around.

10.1.4.4 NIS and NIS+

NIS and its variants bring their own set of difficulties. If you are running both DNS and NIS, the biggest problem may be deciding where the problem lies. Unfortunately, there is no easy way to do this that will work in every case. The original implementation of nslookup completely bypassed NIS. If it failed, you could look to DNS. If it succeeded, your problems were probably with NIS. Unfortunately, the new, "improved" version of nslookup now queries NIS, so this simple test is unreliable. (For other suggestions, see Managing NFS and NIS by Hal Stern or DNS and BIND by Liu et al.)
If you are setting up NIS, your best strategy is to fully test DNS first. If you are having problems with NIS, there are a number of simple utilities supplied with NIS. ypcat lists an entire map, ypmatch matches a single key and prints an entry, and ypwhich identifies client bindings. But if you have read the NIS documentation, you are already familiar with these.

10.1.5 Routing

If you are having routing problems, e.g., receiving error messages saying the host or network is unreachable, then the first place to look is at the routing tables. On the local machine, you'll use the netstat -r command as previously discussed. For remote machines, you can use SNMP if you have SNMP access.

If you are using RIP, rtquery and ripquery are two tools that can be used to retrieve routing tables from remote systems. rtquery is supplied as part of the routed distribution, while ripquery comes with gated. The advantage of these tools is that they use the RIP query and response mechanism to retrieve the route information. Thus, you can use either of these tools to confirm that the RIP exchange mechanism is really working, as well as to retrieve the routing tables to check for correctness.

It really doesn't matter which of these you use, as the output from the two is basically the same. Here is the output from ripquery:

bsd2# ripquery 172.16.2.1
84 bytes from NLCisco.netlab.lander.edu(172.16.2.1) to 172.16.2.236 version 2:
        172.16.1.0/255.255.255.0    router 0.0.0.0   metric 1   tag 0000
        172.16.3.0/255.255.255.0    router 0.0.0.0   metric 1   tag 0000
        172.16.5.0/255.255.255.0    router 0.0.0.0   metric 2   tag 0000
        172.16.7.0/255.255.255.0    router 0.0.0.0   metric 2   tag 0000

Here is the output from rtquery:

bsd2# rtquery 172.16.2.1
NLCisco.netlab.lander.edu (172.16.2.1): RIPv2 84 bytes
        172.16.1.0/24      metric 1
        172.16.3.0/24      metric 1
        172.16.5.0/24      metric 2
        172.16.7.0/24      metric 2

You'll notice that these are not your usual routing tables. Rather, these are the tables used by RIP's distance vector algorithm.
They give reachable networks and the associated costs. Of course, you could always capture a RIP update with tcpdump or ethereal or use SNMP, but the tools discussed here are a lot easier to use.

If you are using Open Shortest Path First (OSPF) (regretfully, I don't at present), gated provides ospf_monitor. This interactive program provides a wealth of statistics, including I/O statistics and error logs in addition to OSPF routing tables. (For more information on routing protocols, you might consult Routing in the Internet by Christian Huitema or Interconnections by Radia Perlman.)

10.1.6 NFS
With time, Network File System (NFS) has become fairly straightforward to set up. At one time, there were a number of utilities for debugging NFS problems, but finding current ports has become difficult. At the risk of repeating myself, if you are having trouble setting up NFS, reread your documentation. Keep in mind that the various implementations of NFS all seem to be different, sometimes a lot different. By themselves, generic directions for NFS don't work; be sure to consult the specific documentation for your operating system!

Unlike most other protocols where a single process is started, NFS relies on a number of different programs or daemons that vary from client to server and, to some extent, from system to system. If you are having problems with NFS, the first step is to consult your documentation to determine which daemons need to be running on your system. Next, make sure they are running. Be warned, the daemons you need and the names they go by vary from operating system to operating system. For example, on most systems, mountd and nfsd, respectively, mount filesystems and access files. On some systems they go by the names rpc.mountd and rpc.nfsd. Since these rely on portmap, sometimes called rpcbind, you'll need to make sure it is running as well. (NFS daemons are typically based on RPC and use the portmapper daemon to provide access information.) The list of daemons will be different for the client and the server. For example, nfsiod (or biod) will typically be running on the client but not the server. Keep in mind, however, that a computer may be both a client and a server.

There are a couple of ways to ensure the appropriate processes are available. You could log on to both machines and use ps to discover what is running. This has the advantage of showing you everything that is running. Another approach is to use rpcinfo to do a portmapper dump.
Here is an example of querying a server from a client:

bsd2# rpcinfo -p bsd1
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   udp   1023  mountd
    100005    3   tcp   1023  mountd
    100005    1   udp   1023  mountd
    100005    1   tcp   1023  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp   1011  status
    100024    1   tcp   1022  status

This has the advantage of showing that these services are actually reachable across the network.

Once you know that everything is running, you should check the access files, typically /etc/dfs/dfstab or /etc/exports, to make sure the client isn't being blocked. You can't just edit these files and expect to see the results immediately. Consult your documentation on how to inform your NFS implementation of the changes. Be generous with privileges if you are having problems, but don't forget to tighten security once everything is working.

Finally, check your syntax. Make sure the mount point exists and has appropriate permissions. Mount the remote system manually and verify that it is mounted with the mount command. You should see something recognizable. Here are mount table entries returned, respectively, by FreeBSD, Linux, and Solaris:

bsd1:/ on /mnt/nfs type nfs (rw,addr=172.16.2.231,addr=172.16.2.231)
172.16.2.231:/ on /mnt/nfs (nfs)
/mnt/nfs on 172.16.2.231:/usr read/write/remote on Thu Nov 30 09:49:52 2000
While the formats are not very similar, you should see a recognizable change to the mount table before and after mounting a remote filesystem.

If you are having intermittent problems or if you suspect performance problems, you might want to use the nfsstat command. It provides a wealth of statistics about your NFS connection and its performance. You can use it to query the client, the server, or both. When called without any options, it queries both client and server. With the -c option, it queries the client. With the -s option, it queries the server. Here is an example of querying a client:

bsd2# nfsstat -c
Client Info:
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
        0         0        33         2         0        21         4         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0         0         0         0         8         0        66
    Mknod    Fsstat    Fsinfo  PathConf    Commit    GLease    Vacate     Evict
        0        13         3         0         2         0         0         0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0         0         0       152
Cache Info:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
      232        36        74        33         0         0         0        21
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
       13         2        18         8        13         0

Unfortunately, it seems that every operating system has its own implementation of nfsstat, and each implementation returns a different set of statistics labeled in a different way. What you'll be most interested in is the number of problems in relation to the total number of requests. For example, a large number of timeouts is no cause for concern if it is a small percentage of a much larger number of total requests. If the timeouts are less than a couple of percent, they are probably not a cause for concern. But if the percentage of timeouts is large, you need to investigate. You'll need to sort out the meaning of the various numbers returned by your particular implementation of nfsstat. And, unfortunately, the labels aren't always intuitive.

Several other NFS tools were once popular but seem to have languished in recent years. You probably won't have much luck in finding these or getting them running.
Two of the ones that were once more popular are nhfsstone and nfswatch. nhfsstone is a benchmark tool for NFS, which seems to have been superseded by SPEC's rather pricey SFS tool. nfswatch is a tool that allows you to watch NFS traffic. tcpdump or ethereal, when used with the appropriate filters, provide a workable alternative to nfswatch.

10.2 Microsoft Windows

Many of the services described in this chapter are traditionally provided by Unix systems. While more and more are becoming available, there aren't a lot of tools that currently run under Windows. One exception is nslookup, which is nearly identical to its Unix counterpart. Of course, the telnet-based testing will work as shown. And you can always test a Windows server from a Unix client. If you want Windows-based tools, the best place to start looking is in the appropriate Windows Resource Kit from Microsoft.
Chapter 11. Miscellaneous Tools

This chapter contains odds and ends that don't really fit any of the categories described in previous chapters. Most of the software presented here isn't really designed with network troubleshooting in mind, but it is, nonetheless, quite useful. These are tools that will make your life easier. With a few notable exceptions, you should already be familiar with most of the tools described here. Consequently, the descriptions of the tools are, for the most part, fairly brief. Feel free to jump around in this chapter as needed.

11.1 Communications Tools

If you are going to effectively administer remote systems, you will need to log on remotely. Even with small networks, it isn't reasonable to jump up and run to the remote system every time you need to do this. This section has three subsections. First is a quick review of techniques you can use to record or log your activities when using familiar tools like telnet, rlogin, and X Windows. Next comes a discussion of vnc, a tool that allows you to view a computer's graphical display remotely. Then I briefly discuss security concerns for these tools, including a short description of ssh.

11.1.1 Automating Documentation

This book has assumed that you are familiar with tools like telnet, rlogin, and X Windows. To use these tools effectively, you'll want to be able to record or log your activities from time to time. Arguably, one reason documentation is so often flawed is that it is usually written after the fact. This is often done from memory or an incomplete set of notes several days after changes have been made. While the best time to write documentation is as you go, often this simply isn't possible. When your network is down and management is calling every five minutes asking if it's fixed yet, you probably won't be pausing to write much down.

There are a few things you can do to help simplify writing documentation after the fact.
First, get copious printouts at every stage, preferably with some kind of time and date stamp. When a production system is down, it is not the time to worry about the cost of paper. Several commands you are probably already familiar with may be easy to overlook with the stress of dealing with a dead system.

If you are using X Windows, you can use the xwd command to capture windows. To use this command, in an xterm window, type:

bsd1# xwd -out xwdfile

You can then click on the window you want to capture. In this example, the file xwdfile will be created in the current directory. The file can be examined later or printed using tools such as xv or gimp. Be sure to give these files meaningful names so that you can sort things out later.

If you are using a text-based interface and are interested in capturing the output of a single command, you may be able to use the tee command. This command allows you to send output from a command to both the screen and a file. For example, the following command will display the output of the command arp -a on the screen and write it to the file outfile:
bsd1# arp -a | tee outfile

The tee command may require special measures to work. For example, you must use the option -l with tcpdump if you want to use tee. An example was given in Chapter 5. As with xwd, you should be careful to use meaningful filenames, particularly if you are capturing windows on the fly.

An alternative to tee is script. It can be used to capture the output of a single command or a series of commands. To capture a series of commands, you start script and then issue the commands of interest. For example, the following command will create the file scriptfile and return to the system prompt:

bsd1# script scriptfile
Script started, output file is scriptfile

Everything that is displayed on your terminal will be logged to the file scriptfile. One advantage of logging a series of commands is that you can embed documentation into the file as you go. Simply type the comment character for your shell, and everything else you type on the line will be ignored. For example, with the Bourne shell, you might type something like:

bsd1# #Well, the foo program didn't work. \
>Let's try the bar program.

The "\" character was used to continue the comment on a new line.

When you are done logging a session, type exit or press Ctrl+D as in:

bsd1# exit
Script done, output file is scriptfile

You can now print or edit the file as desired.

One option that is often overlooked is to include a command with the script command. For example:

bsd1# script scriptfile ifconfig -a

will run the program ifconfig -a, writing the output to the file scriptfile and displaying the output on the screen as well. This file will include two time and date stamps, one at the beginning and one at the end of the file.

You should be aware of a few problems with using script. First, the file can get very big very quickly. This shouldn't be much of a problem unless you are pressed for disk space, but it can be painful to read after the fact. Second, it is all too easy to lose the file.
For example, if a system crashes or is halted, the file may be lost in the process. Third, commands that directly control the screen, such as vi, tend to fill the output file with garbage. Finally, since a new shell is started by script, environmental changes made while script is running may be lost.

If you are connecting to a remote system using a variant of telnet, you may be able to log the session or print the screen. This is particularly true for PC implementations of telnet. See the documentation for the version you are using.

11.1.2 vnc
vnc, short for virtual network computing, was developed by what is now the AT&T Laboratories at Cambridge. vnc is actually a pair of programs. One is a server, which generates and sends the local display's contents to another computer. The other is a viewer, which reconstructs the server's display. You use the computer running the viewer program to control the remote computer running the server program. An application, for example, would actually be running on the server's CPU but controlled by the station running the viewer.

The program's implementation is based on the concept of a remote frame buffer (i.e., remote video display memory). The server maintains the frame buffer, a picture of the server's display, and sends it to the viewer. The viewer recreates the display on the local host. The updates to the remote frame buffer may be the complete contents of the frame buffer or, to minimize the impact on bandwidth, just what has changed since the last update.

In a Unix environment, vnc provides a way to deliver an X Windows session to a host that may not support a native X Windows connection. On the surface, a vnc connection probably seems a lot like an X Windows connection. There are, however, some fundamental differences. vnc is designed so the viewer is a very thin client. Unlike X Windows, almost no work is done at the viewer, and the client software is stateless. And vnc is freely available on some non-Unix systems where X Windows isn't.

vnc can run in one of two modes. In view only mode, the screen is displayed, but the viewer is not given control of the server's mouse or keyboard. If view only mode is not selected, the viewer will share control of the mouse and keyboard. Please note, the mouse and keyboard will not necessarily be disabled at the server.

To use vnc in a Unix environment, telnet to the remote computer and start the vnc server with the vncserver command.
The first time you run it, it will create a .vnc directory under your home directory and will query you for a connection password that will be used for all future sessions. (You can change this with the vncpasswd command.)

lnx1$ vncserver

You will require a password to access your desktops.

Password:
Verify:

New X desktop is lnx1.lander.edu:1

Creating default startup script /home/jsloan/.vnc/xstartup
Starting applications specified in /home/jsloan/.vnc/xstartup
Log file is /home/jsloan/.vnc/lnx1.lander.edu:1.log

The command returns an address or hostname and a display number for the newly created display, in this instance lnx1.lander.edu:1. (Alternately, you could start the vnc server while seated at the machine and then go to the client. This will be necessary if you want to run the server on a Microsoft Windows platform.)

Next, connect a viewer to the display. To start the viewer on a Unix system, start an X Window session and then use the vncviewer command with the host and display number returned by the vncserver program as an argument to the command. By default, vncserver uses the twm X Window manager, but this can be reconfigured.[1] If you are used to all the clutter that usually comes with gnome or something similar, the display may seem a little austere at first but will perform better. The basic functionality you need will be there, and you will be able to run whichever X programs you need.
[1] To change the window manager, edit the file xstartup in the .vnc directory. For example, if you use gnome, you would change twm to exec gnome-session.

vnc starts a number of processes; you'll want to be sure that they are all stopped when you are done. You can stop vnc with the -kill option as shown here:

lnx1$ vncserver -kill :1
Killing Xvnc process ID 6171

Note that you need to specify only the display number, in this case :1. You should also be aware that this sometimes misses a process on some systems. You may need to do a little extra housekeeping now and then.

Once running, vnc supports sending special keystroke combinations such as Ctrl-Alt-Del. If both systems support it, you can cut and paste ASCII data between windows.

vnc also provides a reasonable level of security. Once the password has been set, it is not transmitted over the network. Rather, a challenge response system is used. In addition to the password, the Microsoft Windows version of vncserver can be configured to accept connections from only a specific list of hosts. It can also be configured to use a secure shell (SSH) session. The default port can be reassigned to simplify configuration with firewalls.

The viewer and server can be on the same or different machines or can even be used on different architectures. vnc will run on most platforms. In particular, the viewer will run on just about any Microsoft Windows machine, including Windows CE. It will run under an X Window session, on Macintoshes, and as plug-ins for web browsers. vnc is available in Java, and the server contains a small web server that can be accessed by some Java-aware browsers. To do this, you simply add 5800 to the window number for the HTTP port number.
In the previous example, the window was :1, so the HTTP port number would be 5801, and the URL would be http://lnx1.lander.edu:5801.

There is substantial documentation available at the AT&T Laboratories web site, http://www.uk.research.att.com/vnc.

11.1.3 ssh

One of the problems with telnet, rlogin, rsh, and the like is a lack of security. Passwords are sent in clear text and can be easily captured by any computer they happen to pass. And with the r-services, it can be very easy to mimic a trusted system. Attach a laptop to the network, set the IP address appropriately, and there is a good chance you can mimic a trusted host.

One alternative is ssh, written by Tatu Ylönen, a replacement for the r-services that uses encryption. While the original version is free, with Version 2 ssh has evolved into a commercial product, marketed by SSH Communications Security, Inc. However, Version 2 is freely available for academic and noncommercial use. Recently, the OpenSSH project, a spin-off of the OpenBSD project, released a free port that is compatible with both versions of ssh and is covered by the standard BSD license.

ssh is actually a set of programs that uses encryption to both authenticate users and provide encrypted sessions. It provides four levels of authentication, ranging from trusted users and systems, like rsh and rlogin, to RSA-based authentication. By doing host authentication as well as user authentication, DNS, IP, and route spoofing attacks can be circumvented.
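As a concrete illustration, per-user client behavior is typically controlled through a configuration file, ~/.ssh/config in most ports. The fragment below is hypothetical; the host alias, hostname, and username are placeholders, and exact option support varies between ssh versions.

```
# Hypothetical ~/.ssh/config fragment: one entry per administrative host.
# Names below are placeholders, not real systems.
Host loghost
    HostName loghost.example.com
    User admin
    Protocol 2
```

With an entry like this in place, "ssh loghost" connects as the named user using the SSH 2 protocol, so the safer settings do not depend on remembering command-line flags.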
On the downside, ssh provides minimal protection once your systems have been compromised. Version 1 of the SSH protocol has also been compromised by man-in-the-middle attacks when incorrectly configured. Also, some of its authentication methods can be relatively insecure. ssh is not trivial to configure correctly, but fortunately, there is a fair amount of documentation available for ssh, including two books devoted exclusively to ssh. If you need particularly robust security, pay close attention to how you configure it or consider Version 2.

The legality of ssh is yet another question. Since encryption is sometimes the subject of peculiar laws in some countries, using or exporting ssh may not be legal. The OpenBSD and OpenSSH projects avoid some of these problems by developing code outside of the United States. Consequently, the distribution of their code is not subject to the United States' peculiar munitions export laws, since it can be obtained outside the United States.

Despite these concerns, ssh is something you should definitely consider if security is an issue.

11.2 Log Files and Auditing

A primary source of information on any system is its log files. Of course, log files are not unique to networking software. They are simply another aspect of general systems management that you must master.

Some applications manage their own log files. Web servers and accounting software are prime examples. Many of these applications have specific needs that aren't well matched to a more general approach. In dealing with these, you will have to consult the documentation and deal with each on a case-by-case basis. Fortunately, most Unix software is now designed to use a central logging service, syslog, which greatly simplifies management.

11.2.1 syslog

You are probably already familiar with syslog, a versatile logging tool written by Eric Allman. What is often overlooked is that syslog can be used across networks. You can log events from your Cisco router to your Unix server.
There are even a number of Windows versions available. Here is a quick review of syslog.

An early and persistent criticism of Unix was that every application seemed to have its own set of log files hidden away in its own directories. syslog was designed to automate and standardize the process of maintaining system log files. The main program is the daemon syslogd, typically started as a separate process during system initialization. Messages can be sent to the daemon either through a set of library routines or by a user command, logger. logger is particularly useful for logging messages from scripts or for testing syslog, e.g., checking file permissions.

11.2.1.1 Configuring syslog

syslogd's behavior is initialized through a configuration file, which by default is /etc/syslog.conf. An alternative file can be specified with the -f option when the daemon is started. If changes are made to the configuration file, syslogd must be restarted for the changes to take effect. The easiest way to do this is to send it a HUP signal using the kill command. For example:

bsd1# kill -HUP 127
where 127 is the PID for syslogd, found using the ps command. (Alternately, the PID is written to the file /var/run/syslogd.pid on some systems.)

The configuration file is a text file with two fields separated by tabs, not spaces! Blank lines are ignored. Lines beginning with # in the first column are comments. The first field is a selector, and the second is an action. The selector identifies the program or facility sending the message. It is composed of both a facility name and a security level. The facility names must be selected from a short list of facilities defined for the kernel. You should consult the manpage for syslogd for a complete list and description of facilities, as these vary from implementation to implementation. The security level is also taken from a predefined list: emerg, alert, crit, err, warning, notice, info, or debug. Their meanings are just what you might guess. emerg is the most severe. You can also use * for all or none for nothing. Multiple facilities can be combined on a single line if you separate them with commas. Multiple selectors must be separated with semicolons.

The Action field tells where to send the messages. Messages can be sent to files, including device files such as the console or printers, logged-in users, or remote hosts. Pathnames must be absolute, and the file must exist with the appropriate permissions. You should be circumspect in sending too much to the console. Otherwise, you may be overwhelmed by messages when you are using the console, particularly when you need the console the most. If you want multiple actions, you will need multiple lines in the configuration file.

Here are a few lines from a syslog.conf file that should help to clarify this:

mail.info           /var/log/maillog
cron.*              /var/log/cron
security.*          @loghost.netlab.lander.edu
*.notice;news.err   root
*.err               /dev/console
*.emerg             *

The first line says that all informational messages from sendmail and other mail-related programs should be appended to the file /var/log/maillog.
The second line says all messages from cron, regardless of severity, should be appended to the file /var/log/cron. The next line says that all security messages should be sent to a remote system, loghost.netlab.lander.edu. Either a hostname or an IP address can be used. The fourth line says that all notice-level messages and any news error messages should be sent to root if root is logged on. The next to last line says that all error messages, including news error messages, should be displayed on the system console. Finally, the last line says emergency messages should be sent to all users. It is easy to get carried away with configuration files, so remember to keep yours simple.

One problem with syslog on some systems is that, by default, the log files are world readable. This is a potential security hole. For example, if you log mail transactions, any user can determine who is sending mail to whom—not necessarily something you want.

11.2.1.2 Remote logging

For anything but the smallest of networks, you really should consider remote logging, for two reasons. First, there is simply the issue of managing and checking everything on a number of different systems. If all your log files are on a single system, this task is much easier. Second, should a system become compromised, one of the first things crackers alter is the log files. With remote logging, future entries to log files may be stopped, but you should still have the initial entries for the actual break-in.
To do remote logging, you will need to make appropriate entries in the configuration files for two systems. On the system generating the message, you'll need to specify the address of the remote logging machine. On the system receiving the message, you'll need to specify a file for the messages. Consider the case in which the source machine is bsd1 and the destination is bsd2. In the configuration file for bsd1, you might have an entry like:

local7.*    @bsd2.netlab.lander.edu

bsd2's configuration file might have an entry like:

local7.*    /var/log/bsd1

Naming the file for the remote system makes it much easier to keep messages straight. Of course, you'll need to create the file and enable bsd2 to receive remote messages from bsd1.

You can use the logger command to test your configuration. For example, you might use the following to generate a message:

bsd1# logger -p local7.debug "testing"

This is what the file looks like on bsd2:

bsd2# cat bsd1
Dec 26 14:22:08 bsd1 jsloan: testing

Notice that both a timestamp and the source of the message have been included in the file.

There are a number of problems with remote logging. You should be aware that syslog uses UDP. If the remote host is down, the messages will be lost. You will need to make sure that your firewalls pass appropriate syslog traffic. syslog messages are in clear text, so they can be captured and read. Also, it is very easy to forge a syslog message.

It is also possible to overwhelm a host with syslog messages. For this reason, some versions of syslog provide options to control whether information from a remote system is allowed. For example, with FreeBSD, the -s option can be used to enter secure mode so logging requests are ignored. Alternately, the -a option can be used to control the hosts from which messages are accepted. With some versions of Linux, the -r option is used to enable a system to receive messages over the network.
While you will need to enable your central logging systems to receive messages, you should probably disable this on all other systems to avoid potential denial-of-service attacks. Be sure to consult the manpage for syslogd to find the particulars for your system.

Both Linux and FreeBSD have other enhancements that you may want to consider. If security is a major concern, you may want to investigate secure syslog (ssyslog) or modular syslog (msyslog). For greater functionality, you may also want to look at syslog-ng.

11.2.2 Log File Management

Even after you have the log files, whether created by syslog or some other program, you will face a number of problems. The first is keeping track of all the files so they don't fill your filesystem. It is easy to forget fast-growing files, so I recommend keeping a master list for each system. You'll want to develop a policy of what information to keep and how long to keep it. This usually comes down to
some kind of log file rotation system in which older files are discarded or put on archival media. Be aware that what you save and for how long may have legal implications, depending on the nature of your organization.

Another issue is deciding how much information you want to record in the first place. Many authors argue, with some justification, that you should record anything and everything that you might want, no matter how remote the possibility. In other words, it is better to record too much than to discover, after the fact, that you don't have something you need. Of course, if you start with this approach, you can cut back as you gain experience.

The problem with this approach is that you are likely to be so overwhelmed with data that you won't be able to find what you need. syslog goes a long way toward addressing this problem with its support for different security levels—you can send important messages one place and everything else somewhere else. Several utilities are designed to further simplify and automate this process, each with its own set of strengths. These utilities may condense or display log files, often in real time. They can be particularly useful if you are managing a number of devices.

Todd Atkins' swatch (simple watcher) is one of the best known. Designed with security monitoring in mind, the program is equally well suited to monitoring general system activity. swatch can be run in three different ways—making a pass over a log file, monitoring messages as they are appended to a log file, or examining the output from a program. You might scan a log file initially to come up-to-date on your system, but the second usage is the most common.

swatch's actions include ignoring the line, echoing the line on the controlling terminal, ringing the bell, sending the message to someone by write or mail, or executing a command using the line as an argument. Behavior is determined based on a configuration file composed of up to four tab-separated fields.
The first and second fields, the pattern expression and actions, are the most interesting. The pattern is a regular expression used to match messages. swatch is written in Perl, so the syntax used for the regular expressions is fairly straightforward.

While it is a powerful program, you are pretty much on your own in setting up the configuration files. Deciding what you will want to monitor is a nontrivial task that will depend on what you think is important. Since this could be almost anything—errors, full disks, security problems such as privilege violations—you'll have a lot of choices if you select swatch. The steps are to decide what is of interest, identify the appropriate files, and then design your filters.

swatch is not unique. xlogmaster is a GTK+-based program for monitoring log files, devices, and status-gathering programs. It was written by Georg Greve and is available under the GNU General Public License. It provides filtering and displays selected events with color and audio. Although xlogmaster is no longer being developed, it is a viable program that you should consider. Its successor is GNU AWACS. AWACS is new code, currently under development, that expands on the capabilities of xlogmaster.

Another program worth looking at is logcheck. This began as a shell script written by Craig Rowland. logcheck is now available under the GNU license from Psionic Software, Inc., a company founded by Rowland. logcheck can be run by cron rather than continuously.

You should be able to find a detailed discussion of log file management in any good book on Unix system administration. Be sure to consult Appendix B for more information.

11.2.3 Other Approaches to Logging
Unfortunately, many services traditionally don't do logging, either through the syslog facility or otherwise. If these services are started by inetd, you have a couple of alternatives.

Some implementations of inetd have options that will allow connection logging. That is, each time a connection is made to one of these services, the connection is logged. With inetd on Solaris, the -t option traces all connections. On FreeBSD, the -l option records all successful connections. The problem with this approach is that it is rather indiscriminate.

One alternative is to replace inetd with Panos Tsirigotis's xinetd. xinetd is an expanded version of inetd that greatly expands inetd's functionality, particularly with respect to logging. Another program to consider is tcpwrappers.

11.2.3.1 tcpwrappers

The tcpwrappers program was developed to provide additional security, including logging. Written by Wietse Venema, a well-respected security expert, tcpwrappers is a small program that sits between inetd (or inetd-like programs) and the services started by inetd. When a service is requested, inetd calls the wrapper program, tcpd, which checks permission files, logs its actions, and then, if appropriate, starts the service. For example, if you want to control access to telnet, you might change the line in /etc/inetd.conf that starts the telnet daemon from:

telnet stream tcp nowait root /usr/libexec/telnetd telnetd

to:

telnet stream tcp nowait root /usr/sbin/tcpd telnetd

Now, the wrapper daemon tcpd is started initially instead of telnetd, the telnet daemon. You'll need to make similar changes for each service you want to control. If the service is not where tcpd expects it, you can give an absolute path as an argument to tcpd in the configuration file.

Actually, there is an alternative way of configuring tcpwrappers. You can leave the inetd configuration file alone, move each service to a new location, and replace the service at its default location with tcpd.
I strongly discourage this approach, as it can create maintenance problems, particularly when you upgrade your system.

As noted, tcpwrappers is typically used for two functions—logging and access control.[2] Logging is done through syslog. The particular facility used will depend on how tcpwrappers is compiled. Typically, mail or local2 is used. You will need to edit /etc/syslog.conf and recompile tcpwrappers if you want to change how logging is recorded.

[2] tcpwrappers provides additional functionality not described here, such as login banners.

Access is typically controlled through the file /etc/hosts.allow, though some systems may also have an /etc/hosts.deny file. These files specify which systems can access which services. These are a few potential rules based on the example configuration:

ALL : localhost : allow
sendmail : nice.guy.example.com : allow
sendmail : .evil.cracker.example.com : deny
sendmail : ALL : allow

tcpwrappers uses a first match wins approach. The first rule allows all services from the local machine without further testing. The next three rules control the sendmail program. The first rule allows a specific host, nice.guy.example.com. All hosts on the domain .evil.cracker.example.com are blocked. (Note the leading dot.) Finally, all other hosts are permitted to use sendmail.

There are a number of other forms for rules that are permitted, but these are all pretty straightforward. The distribution comes with a very nice example file. But, should you have problems, tcpwrappers comes with two utilities for testing configuration files. tcpdchk looks for general syntax errors within the file. tcpdmatch can be used to check how tcpd will respond to a specific action. (Kudos to Venema for including these!)

The primary limitation to tcpwrappers is that, since it disappears after it starts the target service, its control is limited to the brief period while it is running. It provides no protection from attacks that begin after that point.

tcpwrappers is a ubiquitous program. In fact, it is installed by default on many Linux systems. Incidentally, some versions of inetd now have wrappers technology built in. Be sure to review your documentation.

11.3 NTP

One problem with logging events over a network is that differences in system clocks can make correlating events on different systems very difficult. It is not unusual for the clock on a system to have drifted considerably. Thus, there may be discrepancies among timestamps for the same events listed in different log files. Fortunately, there is a protocol you can use to synchronize the clocks on your systems.

Network Time Protocol (NTP) provides a mechanism so that one system can compare and adjust its clock to match another system's clock. Ideally, you should have access to a very accurate clock as your starting point. In practice, you will have three choices. The best choice is an authoritative reference clock.
These devices range from atomic clocks to time servers that set their clocks based on time signals from radios or GPS satellites.

The next best source is a system that gets its clock setting from one of these reference clocks. Such systems are referred to as stratum 1 servers. If you can't get your signal from a stratum 1 server, the next best choice is to get it from a system that does, a stratum 2 server. As you might guess, there is a whole hierarchy of servers, with the stratum number incrementing with each step you take away from a reference clock. There are public time servers available on the Internet with fairly low stratum numbers that you can synchronize to occasionally, but courtesy dictates that you ask before using these systems.

Finally, if you are not attached to the Internet, you can elect to simply designate one of your systems as the master system and coordinate all your other systems to that system. Your clocks won't be very accurate, but they will be fairly consistent, and you will be able to compare system logs.

NTP works in one of several ways. You can set up a server to broadcast time messages periodically. Clients then listen for these broadcasts and adjust their clocks accordingly. Alternately, the server can
be queried by the client. NTP uses UDP, typically port 123. Over the years, NTP has gone through several versions. Version 4 is the current one, but Version 3 is probably more commonly used at this point. There is also a lightweight time protocol, Simple Network Time Protocol (SNTP), used by clients that need less accuracy. SNTP is interoperable with NTP.

For Unix systems, the most common implementation is ntpd, formerly xntpd, which is described here. This is actually a collection of related programs, including the daemon ntpd and support programs such as ntpq, ntpdate, and ntptrace. You'll want to start ntpd automatically each time you boot your system. ntpd uses a configuration file, /etc/ntp.conf, to control its operation. This configuration file can get quite complicated depending on what you want to do, but a basic configuration file is fairly simple. Here is a simple three-line example:

server 205.153.60.20
logconfig =syncevents +peerevents +sysevents +allclock
driftfile /etc/ntp.drift

The first line identifies the server. This is the minimum you'll need. The second establishes which events will be logged. The last line identifies a drift file. This is used by ntpd to store information about how the clock on the system drifts. If ntpd is stopped and restarted, it can use the old drift information to help keep the clock aligned rather than waiting to calculate new drift information.

One minor warning about ntpd is in order. If your clock is too far off, ntpd will not reset it. (Among other things, this prevents failures from propagating throughout a network.) This is rarely a problem with computers, but it is not unusual to have a networking device whose clock has never been set.
Justremember that you may need to manually set your clock to something reasonable before you run ntpd.ntpdate can be used to do a onetime clock set:bsd2# ntpdate 205.153.60.20 4 Jan 10:07:36 ntpdate[13360]: step time server 205.153.60.20 offset 11.567081secntpdate cannot be used if ntpd is running, but there shouldnt be any need for it if that is the case.ntpq can be used to query servers about their state:bsd2# ntpq -p 172.16.2.1 remote refid st t when poll reach delay offset jitter==============================================================================*ntp.lander.edu .GPS. 1 u 18 64 173 5.000 -1.049 375.210 CHU_AUDIO(1) CHU_AUDIO(1) 7 - 34 64 177 0.000 0.000 125.020 172.16.3.3 0.0.0.0 16 - - 64 0 0.000 0.000 16000.0 172.16.2.2 0.0.0.0 16 u - 64 0 0.000 0.000 16000.0In this example, we have queried a system for a list of its peers.ntptrace can be used to discover the chain of NTP servers, i.e., who gets their signal from whom:bsd2# ntptrace 172.16.2.1NLCisco.netlab.lander.edu: stratum 2, offset 0.009192, synch distance 0.00526ntp.lander.edu: stratum 1, offset 0.007339, synch distance 0.00000, refid GPSOnly two servers were involved in this example, but you should get the basic idea. 219
Each of these tools has other features that are documented in their manpages. NTP can be an involved protocol if used to its fullest. Fortunately, a lot of documentation is available. Whatever you want (information, software, a list of public NTP servers), the best place to start is http://www.eecis.udel.edu/~ntp. The work of Dave Mills and others, this is a remarkable site.

11.4 Security Tools

A final group of tools that should not be overlooked is security tools. Security, of course, is an essential part of systems management. While this isn't a book on network security, security is so broad a topic that there is considerable overlap between it and the issues addressed in this book. Strictly speaking, a number of the tools described in this book (such as portscan, nmap, and tcpwrappers) are frequently described as security tools.

Basically, any tool that provides information about a network has both security implications and management potential. So don't overlook the tools in your security toolbox when addressing other networking problems. For example, security scanners like satan, cops, and iss can tell you a lot about how your system is configured.

One particularly useful group of tools is system integrity checkers. This class of programs tracks the state of your system and allows you to determine what is changing, such as files, permissions, or timestamps. While the security implications should be obvious, the management and troubleshooting implications should also be clear. Often described as tools to identify files that intruders have changed, they can be used to identify files that have been changed or corrupted for any reason. For example, they can be used to determine exactly what is changed when you install a new program.

The best known of these is tripwire.
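The core idea behind an integrity checker is easy to sketch. Here is a toy illustration in Python of recording file attributes and detecting changes; it is my own sketch of the approach, not tripwire's actual database or policy format:

```python
import hashlib
import os

def snapshot(paths):
    """Record size, permission bits, and a SHA-256 digest for each file."""
    db = {}
    for p in paths:
        st = os.stat(p)
        with open(p, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        db[p] = {"size": st.st_size, "mode": st.st_mode, "sha256": digest}
    return db

def compare(old, new):
    """Return (changed, missing, added) file lists between two snapshots."""
    changed = [p for p in old if p in new and old[p] != new[p]]
    missing = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    return changed, missing, added
```

A real checker records many more attributes (inode, link count, timestamps, multiple checksums), stores the database offline where it can't be tampered with, and provides a policy language for deciding what to watch.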
It is a considerable stretch to call tripwire a networking tool, but it is an administrative tool that can make managing a system, whether networked or not, much easier.

11.4.1 tripwire

tripwire was originally written by Eugene Spafford and Gene Kim. It is another product that has evolved into a commercial product. It is now marketed by Tripwire, Inc. The original free version is still available at the company's web site as the Academic Source Release. The current version, in a slightly modified form, is also available for free download for Linux. The current version is much easier to use, but the older version is usable if you are willing to take the time to learn it.

tripwire creates a database of information about files on the system, including cryptographic checksums. A configuration file is used to determine what information is collected and for which files it is collected. If security is a concern, the collected information should be stored offline to prevent tampering.

As a security tool, tripwire is used to identify any changes that have been made to a compromised host. It doesn't prevent an attack, but it shows the scope of the attack and the changes to the system. As a troubleshooting tool, it can be used to track any changes to a system, regardless of the cause: hacker, virus, or bit rot. It can also be used to verify the integrity of transferred files or the consistency of configurations for multiple installations.

If all you want is a checksum, you might consider just using the siggen program, which comes with tripwire. siggen will generate a number of checksums for a file. Here is an example:
bsd2# siggen siggen
sig0: nullsig : 0
sig1: md5    : 0EpNJLBbf7JJgh1yUdAPgZ
sig2: snefru : 25I3DS:thJ3N:16UchVdNR
sig3: crc32  : 0jeUpK
sig4: crc16  : 00056o
sig5: md4    : 02x6dNiYw7GwjSssW7IeLW
sig6: md2    : 30s7ugrC1gLhk129Zo1BXW
sig7: sha    : EWed2qYLHGcK.i7P7bVDO2mtKvr
sig8: haval  : 1cqs7t9CwipMcuWPM3eRF1
sig9: nullsig : 0

You can use an optional argument to limit which checksums you want. For example, the option -13 will calculate just the first and third checksums, the MD5 digest and the 32-bit CRC checksum.

I certainly wouldn't recommend that you install tripwire just for troubleshooting. But if you have installed it as a security tool, something I would strongly recommend, then don't forget that you can use it for these other purposes. Incidentally, with some systems, such as OpenBSD, integrity checking is an integral part of the system.

11.5 Microsoft Windows

When documenting problems with Windows, the usual approach is to open a word processing file and copy and paste as needed. Unfortunately, some tools, such as Event Viewer, will not allow copying. If this is the case, you should look to see if there is a Save option. With Event Viewer, you can save the messages to a text file and then copy and paste as needed.

If this is not possible, you can always get a screen dump. Unfortunately, the way to do this seems to change with every version of Windows. Typically, if an individual window is selected, only that window is captured. If a window is not selected, the screen is copied. For Windows 95 and NT, Shift-PrintScreen (or Ctrl-PrintScreen) will capture the contents of the screen, while Alt-PrintScreen will capture just the current window. For Windows 98, use Alt-PrintScreen. The screen is copied to the system's clipboard. It can be viewed with ClipBook Viewer. While it is included with the basic Windows distribution, ClipBook Viewer may not be installed on all systems. You may need to go to your distribution disks to install it.
With Windows NT, be sure to select Clipboard on the Windows menu. Unfortunately, this gives a bitmapped copy of the screen that is difficult to manipulate, but it is better than nothing.

As previously noted, vnc is available for Windows. The viewer is a very small program; an executable will fit on a floppy, so it is very easy to take with you.

There are a number of implementations of ssh for Windows. You might look at Metro State College of Denver's mssh, Simon Tatham's putty, or Robert O'Callahan's ttssh extensions to Takashi Teranishi's teraterm communications program. If these don't meet your needs, there are a number of similar programs available over the Web.

Although I have not used them, there are numerous commercial, shareware, and freeware versions of syslog for Windows. Your best bet is to search the Web for such tools. You might look at http://www.loop-back.com/syslog.htm or search for kiwis_syslogd.exe.
ntpd can be compiled for Windows NT. Binaries, however, don't seem to be generally available. If you just want to occasionally set your clock, you might also consider cyberkit. cyberkit was described in Chapter 6. Go to the Time tab, fill in the address of your time server, select the radio button SNTP, make sure the Synchronize Local Clock checkbox is selected, and click on the Go button. The output will look something like this:

Time - Thursday, December 28, 2000 09:02:59
Generated by CyberKit Version 2.5
Copyright © 1996-2000 by Luc Neijens

Time Server: ntp.netlab.lander.edu
Protocol: SNTP Protocol
Synchronize Local Clock: Yes

Leap Indicator 0, NTP Version 1, Mode 4
Stratum Level 1 (Primary reference, e.g. radio clock)
Poll Interval 6 (64 seconds), Precision -8 (3.90625 ms)
Root Delay 0.00 ms, Root Dispersion 0.00 ms
Reference Identifier GPS
Time server clock was last synchronized on Thursday, December 28, 2000 09:02:38
Server Date & Time: Thursday, December 28, 2000 09:02:38
Delta (Running slow): 1.590 ms
Round Trip Time 29 ms
Local clock synchronized with time server

The last line is the one of interest. It indicates that synchronization was successful. The help system includes directions for creating a shortcut that you can click on to automatically update your clock. Go to the index and look under tips and tricks for adding cyberkit to the startup menu and under command-line parameters for time client parameters.

A commercial version of tripwire is available for Windows NT.
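Incidentally, the SNTP exchange that cyberkit performs above is simple enough to sketch. The following minimal client is an illustration of the protocol's 48-byte packets, not cyberkit's implementation; substitute whatever time server you would use locally:

```python
import socket
import struct

# seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01)
NTP_EPOCH_OFFSET = 2208988800

def build_request():
    """A 48-byte SNTP request: LI=0, version 3, mode 3 (client)."""
    packet = bytearray(48)
    packet[0] = (0 << 6) | (3 << 3) | 3  # 0x1B
    return bytes(packet)

def parse_transmit_time(packet):
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2 ** 32

def sntp_query(server, port=123, timeout=5):
    """Send one SNTP request and return the server's clock as Unix time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_request(), (server, port))
        data, _ = s.recvfrom(512)
    return parse_transmit_time(data)
```

A full client would also record when the request was sent and received so it could correct for round-trip delay, which is what the "Round Trip Time" and "Delta" lines in the cyberkit output reflect.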
Chapter 12. Troubleshooting Strategies

While many of the tools described in this book are extremely powerful, no one tool does everything. If you have been downloading and installing these tools as you have read this book, you now have an extensive, versatile set of tools. When faced with a problem, you should be equipped to select the best tool or tools for the particular job, augmenting your selection with other tools as needed.

This chapter outlines several strategies that show how these tools can be used together. When troubleshooting, your approach should be to look first at the specific task and then select the most appropriate tool(s) based on the task. I do not describe the details of using the tools or show output in this chapter. You should already be familiar with these from the previous chapters. Rather, this chapter focuses on the selection of tools and the overall strategy you should take in using them. If you feel confident in your troubleshooting skills, you may want to skip this chapter.

12.1 Generic Troubleshooting

Any troubleshooting task is basically a series of steps. The actual steps you take will vary from problem to problem. Later steps in the process may depend on the results from earlier steps. Still, it is worth thinking about and mapping out the steps, since doing this will help you remain focused and avoid needless steps. In watching others troubleshoot, I have been astonished at how often people perform tests with no goal in mind. Often the test has no relation to the problem at hand. It is just something easy to do. When your car won't start, what is the point of checking the air pressure of the tires?

For truly difficult problems, you will need to become formal and systematic. A somewhat general, standard series of steps you can go through follows, along with a running example. Keep in mind, this set of steps is only a starting point.

1. Document. Before you do anything else, start documenting what you are doing.
This is a real test of willpower and self-discipline. It is extremely difficult to force yourself to sit down and write a problem description or take careful notes when your system is down or crackers are running rampant through your system.[1] This is not just you; everyone has this problem. But it is an essential step for several reasons.

[1] Compromised hosts are a special problem requiring special responses. Documentation can be absolutely essential, particularly if you are contemplating legal action or have liability concerns. Documentation used in legal actions has special requirements. For more information you might look at Simson Garfinkel and Gene Spafford's Practical UNIX & Internet Security or visit http://www.cert.org/nav/recovering.html.

Depending on your circumstances, management may require a written report. Even if this isn't the usual practice, if an outage becomes prolonged or if there are other consequences, it might become necessary. This is particularly true if there are legal consequences of the problem. An accurate log can be essential in such cases.

If you have a complex problem, you are likely to forget at some point what you have actually done. This often means starting over. It can be particularly frustrating if you appear to have
found a solution, but you can't remember exactly what you did. A seemingly insignificant step may prove to be a key element in a solution.

2. Collect information and identify symptoms. Actually, this is two intertwined steps. But they are often so intertwined that you usually can't separate them. You must collect information while filtering that information for indications of anomalous behavior. These two steps will be repeated throughout the troubleshooting process. This is easiest when you have a clear sense of direction.

As you identify symptoms, try to expand and clarify the problem. If the problem was reported by someone else, then you will want to try to recreate the problem so that you can observe the symptoms directly. Keep in mind, if you can't recognize normal behavior, you won't be able to recognize anomalous behavior. This has been a recurring theme in this book and a reason you should learn how to use these tools before you need them.

As an example, the first indication of a problem might be a user complaining that she cannot telnet from host bsd1 to host lnx1. To expand and clarify the problem, you might try different applications. Can you connect using ftp? You might look to see if bsd1 and lnx1 are on the same network or different networks. You might see if lnx1 can reach bsd1. You might include other local and remote hosts to see the extent of the problem.

3. Define the problem. Once you have a clear idea, you can begin coming to terms with the problem. This is not the same as identifying the symptoms but is the process of combining the symptoms and making generalizations. You are looking for common elements that allow you to succinctly describe the anomalous behavior of a system. Your problem definition may go through several refinements.

Continuing with the previous problem, you might, over time, generate the following series of problem definitions:

o bsd1 can't telnet to lnx1.
o bsd1 can't connect to lnx1.
o bsd1 can't connect to lnx1, but lnx1 can connect to other hosts, including bsd1.
o Hosts on the same network as lnx1 can't connect to lnx1.
o Hosts on the same network as lnx1 can't connect to lnx1, but hosts on remote networks can connect to lnx1.

(Yes, this was a real problem, and no, I didn't get that last one backward.)

It is natural to try to define the problem as quickly as possible, but you shouldn't be too tied to your definition. Try to keep an open mind and be willing to redefine your problem as your information changes.

4. Identify systems or subsystems involved. As you collect information, as seen in the previous example, you will define and refine not only the nature of the problem, but also the scope of the problem. This is the step in which we divide and hopefully conquer our problem. In this example, we have worked outward from one system to include a number of systems.

Usually troubleshooting tries to narrow the scope of the problem, but as seen from this example, in networking just the opposite may happen. You must discover the full scope of the problem before you can narrow your focus. In this running example, realizing that remote hosts could connect was a key discovery.
5. Develop a testable hypothesis. Of course, what you can test will depend on what tools you have, the rationale for this book. But don't let tools drive your approach. With the definition of the problem and its continual refinement comes the generation of hypotheses as to the cause or nature of the problem. Such generalizations are relatively worthless unless they can be verified. (Remember those lectures on the scientific method in high school?) In this sense, developing a set of tests is more important than having an exact definition of a problem.

In many instances, if you know the source of the problem, you can correct it without fully understanding the problem. For example, if you know an Ethernet card is failing, you can replace it without ever worrying about which chip on the card malfunctioned. I'm not suggesting that you don't want to understand the problem, but that there are levels of understanding. Your hypotheses must be guided by what you can test. As in science, an untestable hypothesis is worthless. In general, you want tests that will reduce the size of the search space (i.e., identify the subsystem involved), that are easy to apply, that do not create further problems, and so on.

In our running example, a necessary first step in making a connection is doing address resolution. This suggests that there might be some problem with the ARP mechanism. Notice that this is not a full hypothesis, but rather a point of further investigation. Having expanded the scope of the problem, we are attempting to focus in on subsystems to reduce the problem. Also notice that I haven't used any fancy tools up to this point. Keep it simple as long as you can.

6. Select and apply tests. Not all tests are created equal. Some will be much easier to apply, while others will provide more information. Determining the optimal order for a set of tests is largely a judgment call. Clearly, the simple tests that answer questions decisively are the best.
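Simple decisive tests often amount to reading a command's output, and that reading can be automated when you need to repeat it across many hosts. As an illustration, here is a sketch that parses ARP cache listings; the BSD-style `arp -a` format assumed here is an example only, and other systems format the output differently:

```python
import re
import subprocess

def arp_cache(text):
    """Parse BSD-style `arp -a` output into {ip: mac_or_None}.
    Lines look like: 'lnx1 (172.16.2.234) at 0:10:5a:a1:e9:8 on ep0';
    unresolved entries show 'at (incomplete)'."""
    cache = {}
    for line in text.splitlines():
        m = re.search(r"\((\d+\.\d+\.\d+\.\d+)\) at (\S+)", line)
        if m:
            ip, mac = m.groups()
            cache[ip] = None if mac == "(incomplete)" else mac
    return cache

# live use (requires the arp command):
# cache = arp_cache(subprocess.run(["arp", "-a"],
#                                  capture_output=True, text=True).stdout)
```

A missing or incomplete entry for a host you just tried to reach is exactly the kind of decisive observation the next section's example turns on.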
Returning to our example, there are several ways we could investigate whether the ARP mechanism is functioning correctly. One way would be to use tcpdump or ethereal to capture traffic on the network to see if the ARP requests and responses are present. A simpler test, however, is to use the arp command to see if the appropriate entries are in the ARP cache on the hosts that are trying to connect to lnx1.

In this instance, it was observed that the entries were missing from all the hosts attempting to connect to lnx1. The exception was the router on the network, which had a much longer cache timeout than the local hosts did. This also explained why remote hosts could connect but local hosts could not. The remote hosts always went through the router, which had cached the Ethernet address, bypassing the ARP mechanism. Note that this was not a definitive test but was done first because it was much easier.

7. Assess results. As you perform tests, you will need to assess the results, refine your tests, and repeat the process. You will want new tests that confirm your results. This is clearly an iterative process.

With our extended example, two additional tests were possible. One was to manually add the address of lnx1 to bsd1's ARP table using the arp command. When this was done, connectivity was restored. When the entry was deleted, connectivity was lost. A more revealing but largely unnecessary test, using packet-capture software to watch the exchange of packets between bsd1 and lnx1, revealed that bsd1's ARP requests were being ignored by lnx1.

8. Develop and assess solutions. Once you have clearly identified the problem, you must develop and assess possible solutions. With many problems, there will be several possible
solutions to consider. You should not hastily implement a solution until you have thought out the consequences. With lnx1, solutions ranged from rebooting the system to reinstalling software. I chose the simplest first and rebooted the system.

9. Implement and evaluate your solution. Once you have decided on a solution and have implemented it, you should confirm the proper operation of your system. Depending on the scope of the changes needed, this may mean extensive testing of the system and all related systems.

With our running problem, this was not necessary. Connectivity was fully restored when the system was rebooted. What caused the problem? That was never fully resolved, but since the problem never recurred, it really isn't an issue. If restarting the system hadn't solved the problem, what would have been the next step? In this case, the likely problem was corrupted system software. If you are running an integrity checker like tripwire, you might try locating anything that has changed and do a selective reinstallation. Otherwise, you may be faced with reinstalling the operating system.

One last word of warning. It is often tempting to seize on an overly complex explanation and ignore simpler ones. Frequently, problems really are complex, but not always. It is worth asking yourself if there is a simpler solution. Often, this will save a tremendous amount of time.

12.2 Task-Specific Troubleshooting

The guidelines just given are a general or generic overview of troubleshooting. Of course, each problem will be different, and you will need to vary your approach as appropriate. The remainder of this chapter consists of guidelines for a number of the more common troubleshooting tasks you might face. It is hoped that these will give you further insight into the process.

12.2.1 Installation Testing

Ironically, one of the best ways to save time and avoid troubleshooting is to take the time to do a thorough job of testing when you install software or hardware.
You will be testing the system when you are most familiar with the installation process, and you will avoid disruptions to service that can happen when a problem isn't discovered until the software or hardware is in use.

This is a somewhat broad interpretation of troubleshooting, but in my experience, there is very little difference between the testing you will do when you install software and the testing you will do when you encounter a problem. Overwhelmingly, the only difference for most people is the scope of the testing done. Most people will test until they believe that a system is working correctly and then stop. Failures, particularly multiple failures, may leave you skeptical, while some people tend to be overly optimistic when installing new software.

12.2.1.1 Firewall testing

Because of the complexities, firewall testing is an excellent example of the problems that installation testing may present. Troubleshooting a firewall is a demanding task for several reasons. First, to avoid disruptions in service, initial firewall testing should be done in an isolated environment before moving on to a production environment.
Second, you need to be very careful to develop an appropriate set of tests so that you don't leave gaping holes in your security. You'll need to go through a firewall rule by rule. You won't be able to check every possibility, but you should be able to test each general type of traffic. For example, consider a rule that passes HTTP traffic to your web server. You will want to pass traffic to port 80 on that server. If you are taking the approach of denying all traffic that is not explicitly permitted, you will potentially want to block traffic to that host at all other ports. You will also want to block traffic to port 80 on other hosts.[2] Thus, you should develop a set of three tests for this one action. Although there will be some duplicated tests, you'll want to take the same approach for each rule. Developing an explicit set of tests is the key step in this type of testing.

[2] If you doubt the need for this last test, read RFC 3093, a slightly tongue-in-cheek description of how to use port 80 to bypass a firewall.

The first step in testing a firewall is to test the environment in which the firewall will function without the firewall. It can be extraordinarily frustrating to try to debug anomalous firewall behavior only to discover that you had a routing problem before you began. Thus, the first thing you will want to do is turn off any filtering and test your routing. You could use tools like ripquery to retrieve routing tables and examine entries, but it is probably much simpler to use ping to check connectivity, assuming ICMP ECHO_REQUEST packets aren't being blocked. (If this is the case, you might try tools like nmap or hping.)

You'll also want to verify that all concomitant software is working. This will include all intrusion detection software, accounting and logging software, and testing software.
For example, you'll probably use packet capture software like tcpdump or ethereal to verify the operation of your firewall, and you'll want to make sure that software is working properly. I hate to admit it, but I've started packet capture software on a host that I forgot was attached to a switch and banged my head wondering why I wasn't seeing anything. Clearly, if I had used this setup to make sure packets were blocked without first testing it, I could have been severely misled.

Test the firewall in isolation. If you are adding filtering to a production router, admittedly this is going to be a problem. The easiest way to test in isolation is to connect each interface to an isolated host that can both generate and capture packets. You might use hping, nemesis, or any of the other custom packet generation software discussed in Chapter 9. Work through each of your tests for each rule with the rule disabled and enabled. Be sure you explicitly document all your tests, particularly the syntax.

Once you are convinced that the firewall is working, it is time to move it online. If you can schedule offline testing, that is the best approach. Work through your tests again with and without the filters enabled. If offline testing isn't possible, you can still go through your tests with the filters enabled. Finally, don't forget to come back and go through these tests periodically. In particular, you'll want to reevaluate the firewall every time you change rules.

12.2.2 Performance Analysis and Monitoring

If a system simply isn't working, then you know troubleshooting is needed. But in many cases, it may not be clear that you even have a problem. Performance analysis is often the first step to getting a handle on whether your system is functioning properly. And it is often the case that careful performance analysis will identify the problem so that no further troubleshooting is needed.

Performance analysis is another management task that hinges on collecting information.
It is a task that you will never complete, and it is important at every stage in the system's life cycle. The most
successful network administrator will take a proactive approach, addressing issues before they become problems. Chapter 7 and Chapter 8 discussed the use of specific tools in greater detail.

For planning, performance analysis is used to compare systems, establish system requirements, and do capacity planning and forecasting. For management, it provides guidance in configuring and tuning the system. In particular, the identification of bottlenecks can be essential for management, planning, and troubleshooting.

There are three general approaches to performance analysis: analytical modeling, simulation, and measurement. Analytical models are mathematical models usually based on queuing theory. Simulations are computer models that attempt to mimic the behavior of the system through computer programs. Measurement is, of course, the collection of data from an existing network. This book has focused primarily on measurement (although simulation tools were mentioned in Chapter 9).

Each approach has its role. In practice, there can be considerable overlap in using these approaches. Analytical models can serve as the basis for simulations, or direct measurements may be needed to supply parameters used with analytical models or simulations.

Measurement has its limitations. Obviously, the system must exist before measurements can be made, so it may not be a viable tool for planning. Measurements tend to produce the most variable results. And many things can go wrong with measurements. On the positive side, measurement carries a great deal of authority with most people. When you say you have measured something, this is treated as irrefutable evidence by many, often unjustifiably.

12.2.2.1 General steps

Measuring performance is something of an art. It is much more difficult to decide what to measure and how to make the actual measurements than it might appear at first glance.
And there are many ways to waste time collecting data that will not be useful for your purposes.

What follows is a fairly informal description of the steps involved in performance analysis. As I said before, listing the steps can be very helpful in focusing attention on parts of the process that might otherwise be ignored.[3] Of course, every situation is different, so these steps are only an approximation. Designing performance analysis tests is an iterative process. You should go back through these steps as you proceed, refining each step as needed.

[3] If you would like a more complete discussion of the steps in performance analysis, you should get Raj Jain's exceptional book, The Art of Computer Systems Performance Analysis. Jain's book considers performance analysis from a broader perspective than this book.

1. State your goal. This is the question you want to answer. At this point, it may be fairly vague, but you will refine it as you progress. You need a sense of direction to get started. A common mistake is to allow a poorly defined goal to remain vague throughout the process, so be sure to revisit this step often. Also, try to avoid goals that bias your approach. For instance, set out to compare systems rather than to show that one system is better than another.

As an example, a network administrator might ask if the network backbone is adequate to support current levels of traffic. While an extremely important question, it is quite vague at this point. But stating the goal allows you to start focusing on the problem. For example, formally stating this problem may lead you to ask what adequate really means. Or you might go on to consider what the relevant time frame is, i.e., what current means.
2. Define your system. The definition of your system will vary with your goal. You will need to decide what parts of the system to include and in what detail. You may want to exclude those parts outside your control. If you are interested in server performance, you will undoubtedly want to consider the various subsystems of the server separately, such as disks, memory, CPU, and network interfaces.

With the backbone example, what exactly is the backbone? Certainly it will include equipment such as routers and switches, but does it include servers? If you do include servers, you will want to view each server as a single entity, a source or sink for network traffic perhaps, but not component by component.

3. Identify possible outcomes. This step consists of identifying possible answers to the question you want to answer. It is a refinement of Step 1 but should be addressed after the parts of the system are identified. Identifying outcomes establishes the level of your interest, how much detail you might need, and how much work you are going to have to do. You are determining the granularity of your measurements with this step.

For example, possible outcomes for the question of backbone performance might be that performance is adequate, that the system suffers minor congestion during the periods of heaviest load, or that the system is usually suffering serious congestion with heavy packet loss. For many purposes, just selecting one of these three answers might be adequate. However, in some cases, you may want a much more descriptive answer. For example, you may want some estimation of the average utilization, maximum utilization, percent of time at maximum utilization, or number of lost packets. Ultimately, the degree of detail required by the answer will determine the scope of the project. You need to make this decision early, or you may have to repeat the project to gather additional information.

4. Identify and select what you will measure.
Metrics are those system characteristics that can be quantitatively measured. The choice of a metric will depend on the services you are examining. Be careful in your selection. It is often tempting to go with metrics based on how easy the data is to collect rather than on how relevant the data is to the goal. For a network backbone, metrics might include throughput, delay, utilization, number of packets sent, number of packets discarded, or average packet size.

5. If appropriate, identify test parameters and factors.[4] Parameters and factors are characteristics of the system that affect performance and that can be changed. You'll change these to see what effect they have on the system. Parameters include both system and load (or traffic) parameters. Try to be as systematic as possible in identifying and evaluating parameters to avoid arbitrary decisions. It is very easy to overlook relevant parameters or include irrelevant ones.

[4] Further distinctions between parameters and factors are sometimes made but don't seem relevant when considered solely from the perspective of measurements.

For a network backbone, system parameters may include interface speeds and link speeds or the use of load sharing. For traffic, you might use a tool like mgen to add an additional load. But for simple performance measurement, you may elect to change nothing.

6. Select tools. Once you have a clear picture of what you want to do, it is time to select the tools of interest. It is all too easy to do this too soon. Don't let the tools you have determine what you are going to do. Tools for backbone performance might include using ntop on a link or SNMP-based tools.
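Several of these metrics are derived rather than read directly. Utilization, for instance, is usually computed from successive interface octet counters of the sort SNMP tools poll. Here is a sketch of the arithmetic; the 32-bit counter width is an assumption matching the standard ifInOctets/ifOutOctets objects, and wider counters would change the modulus:

```python
def utilization(octets_t0, octets_t1, interval, link_bps, counter_bits=32):
    """Percent utilization from two octet-counter samples taken
    `interval` seconds apart; the modulus allows for one counter wrap."""
    delta = (octets_t1 - octets_t0) % (2 ** counter_bits)
    bits = delta * 8
    return 100.0 * bits / (interval * link_bps)
```

For example, 7,500,000 octets in a five-minute sample on a 10 Mbps link works out to 2% utilization. Note that a counter that wraps more than once per polling interval, easy to do on fast links, silently understates the result, which is one reason polling intervals matter.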
7. Establish measurement constraints. On a production network, establishing constraints usually means deciding when and where to make your measurements. You will also need to decide on the frequency and duration of your measurements. This is often more a matter of intuition than engineering, and it is something you will have to do iteratively, adjusting your approach based on the results you get. Unless you have a very compelling reason, measurements should be taken under representative conditions. For backbone performance, for example, router interfaces are the obvious places to look. Server interfaces are another reasonable choice. You may also need to look at individual links, particularly in a switched network. You will also need to sample at different times, including in particular those times when the load is heaviest. (Use mrtg or cricket to determine this.) You will need to ensure that your measurements have the appropriate level of detail. If you have isochronous applications, such as video conferencing, that are extremely sensitive to delay, five-minute averages will not provide adequate information.

8. Review your experimental design. Once you have decided what you want to measure and how, look back over the process before you begin. Are there any optimizations you can make to minimize the amount of work you will have to do? Will the measurements you make really answer your questions? It is wise to review these questions before you invest large amounts of time.

9. Collect data. The single most important consideration in collecting data is that you adequately document what you are doing. It is an all too common experience to discover that you have a wonderful collection of data but don't fully know or remember the circumstances surrounding its collection. Consequently, you don't know how to interpret it. If this happens, the only thing you can do is discard the data and start over.
Remember, collecting data is an iterative process. You must examine your results and make adjustments as needed. It is too easy to continue collecting worthless data when even a cursory examination would have revealed you were on the wrong track.

10. Analyze data. Once the data is collected, you must analyze, interpret, and act upon your results. This analysis will, of course, depend heavily on the context and goals of the investigation. But an essential element is to condense the data and extract the needed information, presenting it in a concise form. Measurements often create massive amounts of data that are meaningless until carefully analyzed. Don't get too carried away: the simplest analyses are often of greater value than overly complex ones, because they can be more easily understood. But whatever you conclude, you'll need to do it all again. System performance analysis is a never-ending task.

12.2.2.2 Bottleneck analysis

Since networks are composed of a number of pieces, if the pieces are not well matched, poor performance may depend on the behavior of a single component. Bottleneck analysis is the process of identifying this component.

When looking at performance, you'll need to be sure you get a complete picture. Generally, one bottleneck will dominate performance statistics. Many systems, however, will have multiple bottlenecks; it's just that one bottleneck is a little worse than the others. Correcting one bottleneck will simply shift the problem: the bottleneck will move from one component to another. When doing performance monitoring, your goal should be to discover as many bottlenecks as possible.

Often identifying a bottleneck is easy. Once you have a clear picture of your network's architecture, topology, and uses, bottlenecks will be obvious. For example, if 90% of your network traffic is to the
Internet and you have a gigabit backbone and a 56-Kbps WAN connection, you won't need a careful analysis to identify your bottleneck.

Identifying bottlenecks is process dependent. What may be a bottleneck for one process may not be a problem for another. For example, if you are moving small files, the delay in making a connection will be the primary bottleneck. If you are moving large files, the speed of the link may be more important.

Bottleneck analysis is essential in planning because it will tell you which improvements will provide the greatest benefit to your network. The only real way to escape bottlenecks is to grossly overengineer your network, not something you'll normally want to do. Thus, your goal should not be to completely eliminate bottlenecks but to minimize their impact to the point that they don't cause any real problems. Upgrading the network in a way that doesn't address bottlenecks will provide very little benefit. If the bottlenecks on your network are a slow WAN connection and slow servers, upgrading from Fast Ethernet to Gigabit Ethernet will be a foolish waste of money. The key consideration here is utilization. If you are seeing 25% utilization with Fast Ethernet, don't be surprised to see utilization drop below 3% with Gigabit Ethernet. But you should be aware that even if utilization is low, increasing the capacity of a line will shorten download times for large files. Whether this is worthwhile will depend on your organization's mission and priorities.

Here is a rough outline of the steps you might go through to identify a bottleneck:

1. Map your network. The first step is to develop a clear picture of your network's topology. To do this, you can use the tools described in Chapter 6; tkined might be a good choice. Often potential bottlenecks are obvious once you have a clear picture of your network.
At the very least, you may be able to distinguish the parts of the network that are likely to have bottlenecks from parts that don't need to be examined, reducing the work you will have to do.

2. Identify time-dependent behavior. The problems bottlenecks cause, unless they are really severe, tend to come and go. The next logical step is to locate the most heavily used devices and the times when they are in greatest use. You'll want to use a tool like mrtg or cricket to identify time-dependent behavior. (Understanding time-dependent behavior can also be helpful in identifying when you can work on the problem with the least impact on users.)

3. Pinpoint the problems. At this point, you should have narrowed your focus to a few key parts of the network and a few key times. Now you will want to drill down on specific devices and links. ntop is a likely choice at this point, but any SNMP-based tool may be useful.

4. Select the tool. How you proceed from here will depend on what you have discovered. It is likely that you will be able to classify the problem as stemming from an edge device, such as a server, or from a path between devices. Doing so will simplify the decision of what to do next. If you believe the problem lies with a path, you can use the tools described in Chapter 4 to drill down to a specific device or single link. You'll probably want to get an idea of the nature of the traffic over the link. ntop is one choice, or you could use a tool like tcpdump or ethereal, or one of the tools that analyzes tcpdump traffic. For a link device like a router or switch, you'll need to look at basic performance. SNMP-based tools are the best choice here. For end devices, you need to look at the performance of the device at each level of the communications architecture. You could use spray to examine interface performance. For the stack, you might compare the time between SYN and ACK packets with the time between application packets.
(Use ethereal or tcpdump to collect this information.) The setup times should be independent of the application, depending only on the stack. If the stack responds quickly and the application doesn't, you'll need to focus on the application.
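As a sketch of the stack-versus-application comparison just described, you might capture a handshake with tcpdump and pull the SYN to SYN-ACK delay out of the decoded output. The host, port, and sample decode lines below are fabricated for illustration, and tcpdump's -ttt delta-timestamp format is assumed; the exact decode format varies between tcpdump versions.

```shell
#!/bin/sh
# Hypothetical capture (not run here): record a TCP handshake to a server.
#   tcpdump -ttt -nn 'host server and tcp port 80' > handshake.txt
# With -ttt, tcpdump prints the delta since the previous packet, so the
# line carrying the SYN-ACK flags [S.] shows the setup delay directly.
syn_ack_delay() {
  # Print the inter-packet delta on the first SYN-ACK line of a decode.
  awk '/\[S\.\]/ { print $1; exit }' "$1"
}
# Fabricated sample decode in the assumed -ttt format:
cat > handshake.txt <<'EOF'
00:00:00.000000 IP 10.0.0.1.40000 > 10.0.0.2.80: Flags [S], seq 1
00:00:00.000450 IP 10.0.0.2.80 > 10.0.0.1.40000: Flags [S.], seq 9, ack 2
EOF
syn_ack_delay handshake.txt    # prints 00:00:00.000450
```

Comparing this setup delay against the gaps between subsequent application packets in the same decode is one crude way to decide whether the stack or the application is the slow party.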
5. Fix the problem. Once you have an idea of the source of the problem, you can decide how to deal with it. For poor link performance, you have several choices. You can upgrade the link bandwidth or alter the network topology to change the load on the link. Adding interfaces to a server is one very simple solution; attaching a server to multiple subnets is a quick way to decrease traffic between those subnets. Policy-based routing is yet another approach: you can use routing priorities to ensure that important traffic is handled preferentially. For an edge device such as an attached server, you'll want to distinguish among hardware problems, operating system problems, and application problems, then upgrade accordingly.

Bottleneck analysis is something you should do on an ongoing basis. The urgency will depend on user perceptions. If users are complaining, it doesn't matter what the numbers say: you have a problem. If users aren't complaining, your analysis is less pressing but should still be done.

12.2.2.3 Capacity planning

Capacity planning is an extremely important task. Done correctly, it is also an extremely complex and difficult task, both to learn and to do. But this shouldn't keep you from attempting it. The description here can best be described as a crude, first-order approximation of capacity planning. But it will give you a place to start while you are learning.

Capacity planning is really an umbrella that describes several closely related activities. Capacity management is the process of allocating resources in a cost-efficient way. It is concerned with the resources that you currently have. (As you might guess, this is closely related to bottleneck analysis.) Trend analysis is the process of looking at system performance over time, trying to identify how it has changed in the past, with the goal of predicting future changes. Capacity planning attempts to combine capacity management and trend analysis.
The goal is to predict future needs to provide for effective planning.

The basic steps are fairly straightforward to describe, just difficult to carry out. First, decide what you need to measure. That means looking at your system in much the same way you did with bottleneck analysis, but augmenting your analysis with anything you know about the future growth of your system. You'll need to think about your system in context to do this.

Next, select appropriate tools to collect the information you'll need. (mrtg and cricket are the most obvious tools among those described in this book, but there are a number of other viable tools if you are willing to do the work to archive the data.) With the tools in place, begin monitoring your system, recording and archiving appropriate data. Deciding what to keep and how to organize it is a tremendously difficult problem. Every situation is different, and each is largely a question of balancing the amount of work involved in keeping the data organized and accessible against the likelihood that you will actually use it. Judging this balance comes only with experience.

Once you have the measurements, you will need to analyze them. In general, focus on the areas that show the greatest change. Collecting and analyzing data will be an iterative process. If little is different from one measurement to the next, collect data less frequently. When there is high variability, collect more often.

Finally, you'll make your predictions and adjust your system accordingly.

There are a number of difficulties in capacity planning. Perhaps the greatest comes with unanticipated, fundamental changes in the way your network is used. If you will be offering new
services, predictions based on trends that predate those services will not adequately predict new needs. For example, if you are introducing new technologies such as Internet telephony or video, trend analysis before the fact will be of limited value. There is a saying that you can't predict how many people will use a bridge by counting how many people are currently swimming across the river. If this is the case, about the best you can do is look to others who have built similar bridges over similar rivers.

Another closely related problem is differential growth. If your network, like most, provides a variety of different services, they are probably growing at different rates. This makes it very difficult to predict aggregate performance or need if you haven't collected the data needed to analyze individual trends.

Yet another difficulty is motivation. The key to trend analysis is keeping adequate records, i.e., measuring and recording information in a way that makes it accessible and usable. This is difficult for many people, since the records won't have much immediate utility. Their worth comes from being able to look back over them for trends. It is difficult to invest the time needed to collect and maintain this data when there will be no immediate return on the effort and when fundamental changes can destroy the utility of the data.

You should be aware of these difficulties, but you should not let them discourage you. The cost of not doing capacity planning is much greater.
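Where mrtg or cricket is not already in place, the record keeping that trend analysis depends on can start as something as simple as a cron-driven script appending timestamped readings to a log. The log path and the fixed placeholder reading below are assumptions; a real script would obtain the reading from a counter via netstat or snmpget.

```shell
#!/bin/sh
# Sketch: append one timestamped traffic reading to a log for later
# trend analysis. A hypothetical cron entry such as
#   */5 * * * * /usr/local/sbin/logtraffic
# would collect one sample every five minutes.
LOG=${LOG:-traffic.log}
# Placeholder value standing in for a real counter reading
# (e.g., from netstat -i or snmpget).
reading=12345
printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$reading" >> "$LOG"
tail -1 "$LOG"
```

The point is less the mechanism than the habit: a plain, consistently formatted log of timestamped samples is enough raw material for trend analysis months later, when the question finally arises.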
Appendix A. Software Sources

This appendix begins with a brief discussion of retrieving and installing software tools. It then provides a list of potential sources for the software. First I describe several excellent general sources for tools, then I list specific sources.

Much of this software requires root privileges and could contain dangerous code. Be sure you get your code from reliable sources. Considerable effort has been made to provide canonical sources, but no guarantee can be made for the trustworthiness of the code or the sources listed here. Most of these programs are available as FreeBSD ports or Linux packages. I have used them, when available, for the testing done for this book.

A.1 Installing Software

I have not tried to describe how to install individual tools in this book. First, in my experience, a set of directions that is accurate for one version of a piece of software may not be accurate for the next version. Even more likely, directions for one operating system may fail miserably on another. This is frequently true even for different versions of the same operating system. Consequently, trying to develop a reasonable set of directions for each tool for a variety of operating systems was considered unfeasible. In general, the best source of information, i.e., the only information that is likely to be reliable, is the information that comes with the software itself. Read the directions!

Having said this, I have tried to give some generic directions for installing software. At best, these are meant to augment the existing directions. They may help clarify matters when the included directions are a little too brief. These instructions are not meant as replacements.

Installing software has gotten much easier in the last few years, thanks in part to several developments. First, the GNU configure and build tools have had a tremendous impact in erasing the differences created by different operating systems.
Second, there have been improvements in file transfer and compression tools, as well as increased standardization of the tools used. Finally, several operating systems now include mechanisms to automate the process. If you can use these, your life will be much simpler. I have briefly described three here: the Solaris package system, the Red Hat package manager, and the FreeBSD port system. Please consult the appropriate documentation for the details of each.

A.1.1 Generic Installs

Here is a quick review of the basic steps you will go through in installing a program. Not every step will be needed in every case. If you have specific directions for a product, use those directions, not these! (Although slightly dated, a very comprehensive discussion can be found in Porting Unix Software by Greg Lehey.)

1. Locate a reliable, trustworthy source for both the software and directions. Usually, the best sources are listed on a web page managed by the author or her organization.

2. If you can locate directions before you begin, read them first. Typically, basic directions can be found at the software's home page. Frequently, however, the most complete directions are
included with the software distribution, so you may need to retrieve and unpack the software to get at them.

3. Download the tool using FTP. You may be able to do this with your web browser. Be certain you use a binary transfer if you are doing this manually.

4. Uncompress the software if needed. If the filename ends in .tgz or .gz, use gunzip. These are the two most common formats, but there are other possibilities. Lehey's book contains a detailed list of formats and the appropriate tools.

5. Use tar to unpack the software if needed, i.e., if the filename ends in .tar. Typically, I use the -xvf options.

6. Read any additional documentation included with the distribution.

7. If the file is a precompiled binary, you need only move it to the correct location. In general, though, it is safer to download the source code and compile it yourself. It is much harder to hide Trojan horses in source code (but not impossible).

8. If you have a very simple utility, you may need to compile it directly. This means calling the compiler with the appropriate options. But for all but the simplest programs, a makefile should be provided. If you see a file named Makefile, you will use the make command to build the program. It may be necessary to customize the Makefile before you can proceed. If you are lucky, the distribution will include a configure script, a file that, when executed, will automatically make any needed changes to the Makefile. Look for this script first. If you don't find it, look back over your directions for any needed changes. If you don't find anything, examine the makefile for embedded directions. If all else fails, you can try running make without making any changes.

9. Finally, you may also need to run make with one or more arguments to finish the installation, e.g., make install to move the files to the appropriate directories or make clean to remove unused files such as object modules after linking.
Look at your directions, or look for comments embedded in the makefile.

Hopefully, each of these steps will be explained in detail in the documentation that comes with the software.

A.1.2 Solaris Packages

In Solaris, packages are directories of the files needed to build or run a program. This is the mechanism Sun Microsystems uses to distribute software. If you are installing from a CD-ROM, the files will typically be laid out just the way you need them; you will only need to mount the CD-ROM so you can get to them. If you are downloading packages, you will typically need to unpack them first, usually with the tar command. You may want to do this under the default directory /var/spool/pkg, but you can override this location with command options when installing the package.

Once you have the appropriate package on your system, you can use one of several closely related commands to manage it. To install a package, use the pkgadd command. Without any arguments, pkgadd will list the packages on your system and give you the opportunity to select the package of interest. Alternately, you can name the package you want to install. You can use the -d option to specify a different directory.

Other commands include pkgrm to remove a package, pkginfo to display information on which packages are already installed on your system, and pkgchk to check the integrity of a package.

For other software in package format, you might begin by looking at http://sunfreeware.com or searching the Web for Sun's university alliance software repositories. Use the string "sunsite" in your search.
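The Solaris package commands described above fit together as sketched below. The package name SMCtop is hypothetical, and nothing here actually touches a package database; the fragment merely assembles the pkgadd command line so the option usage is visible.

```shell
#!/bin/sh
# Sketch of Solaris package management usage. SMCtop is a made-up
# package name; the function only builds the command string.
pkgadd_cmd() {
  # $1 = spool directory, $2 = package name
  printf 'pkgadd -d %s %s\n' "$1" "$2"
}
pkgadd_cmd /var/spool/pkg SMCtop   # prints: pkgadd -d /var/spool/pkg SMCtop
# Related commands, as described in the text:
#   pkginfo           list installed packages
#   pkgchk SMCtop     verify the integrity of an installed package
#   pkgrm SMCtop      remove the package
```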
A.1.3 Red Hat Package Manager

Different versions of Linux have taken the idea of packages and expanded on it. Several different package formats are available, but the Red Hat format is probably the most common. There are several programs for installing software in the RPM format. Of these, the Red Hat Package Manager (rpm) is what I generally use. Two other package management tools that provide GUIs are glint and gnorpm.

First, download the package in question. Then, to install it, call rpm with the options -ivh and the name of the package. If all goes well, that is all there is to it. You can use the -e option to remove a package.

A variety of packages come with many Linux distributions, and numerous sites on the Web offer extensive collections of Linux software in RPM format. If you are using Red Hat Linux, try http://www.redhat.com. Many of the repositories will provide you with a list of dependencies, which you'll need to install first.

A.1.4 FreeBSD Ports

Another approach to automating software installation is the port collection used by FreeBSD. This is by far the easiest approach to use and has been adapted to other systems, including OpenBSD and Debian Linux. The FreeBSD port collection is basically a set of directions for installing software. Literally thousands of programs are available.

Software is grouped by category in subdirectories of the /usr/ports directory. You change to the appropriate directory for the program of interest and type make install. At that point, you sit back and watch the magic. The port system will attempt to locate the appropriate file in the /usr/ports/distfiles directory. If the file is not there, it will try downloading it from an appropriate site via FTP. Usually the port system knows about several sites, so if it can't reach one, it will try another. Once it has the file, it will calculate and verify a checksum for it. It next applies appropriate patches and checks dependencies.
It will automatically install other ports as needed. Once everything is in place, it will compile the software. Finally, it installs the software and documentation. When it works, which is almost always, it is simply extraordinary. The port collection is an installation option with FreeBSD. Alternately, you can visit http://www.freebsd.org. The process is described in the FreeBSD Handbook.

When evaluating a new piece of software, I have the luxury of testing it on several different platforms. In general, I find the FreeBSD port system the easiest approach to use. If I have trouble with a FreeBSD port, I'll look for a Linux package next. If that fails, I generally fall back on a generic source install. In my experience, Solaris packages tend to be hard to find.

A.2 Generic Sources

The Cooperative Association for Internet Data Analysis (CAIDA) maintains an extensive listing of measurement tools on the Web. The page at http://www.caida.org/tools/measurement has a number of tables grouping tools by function. Brief descriptions of each tool, including links to relevant sites, follow the tables. This listing includes both free and commercial tools and seems to be updated on a regular basis. Another CAIDA page, http://www.caida.org/tools/taxonomy/, provides a listing of tools by taxonomy.
Another web site maintaining a list of network-monitoring tools is http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html. In general, there are several collaborative Internet measurement projects that regularly introduce or discuss measurement tools. These include CAIDA and the Stanford Linear Accelerator Center (SLAC), among others.

Other sites you might want to look at include those that develop tools, such as http://moat.nlanr.net, http://www-nrg.ee.lbl.gov/, and http://www.merit.edu. Don't forget special-purpose sites. Security sites like http://www.cert.org and http://www.ciac.org/ciac/ may have links to useful tools. Keep your eyes open.

Finally, several RFCs discuss tools. The most comprehensive is RFC 1470. Unfortunately, it is quite dated. RFC 1713, also somewhat dated, deals with DNS tools, and RFC 2398 deals with tools for testing TCP implementations.

A.3 Licenses

Although some commercial software has been mentioned, this book has overwhelmingly focused on freely available software. But "freely available" is a very vague expression that covers a lot of ground. At one extreme is software that is released without any restrictions whatsoever. You can use it as you see fit, modify it, and, in some cases, even try to sell your enhanced versions. Most of the software described here, however, comes with some limitations on what you can do with it, particularly with respect to reselling it.

Some of this software is freely available to some classes of users but not to others. For example, some software distinguishes between commercial and noncommercial users or between commercial and academic users. For some tools, binaries are available, but source code is either not available or requires a license. Some of the software exists in multiple forms; for example, there may be both free and commercial versions of a tool. Other tools restrict what you do with them.
For example, you may be free to use the tool, but you may be expected to share any improvements you make.

You should also be aware that licensing may change over time. It is not uncommon for a tool to move from the free category to the commercial category, particularly as new, improved versions are released. This seems to be a fairly common business model.

I have not attempted to describe the licensing for individual tools. I am not a lawyer and do not fully understand all the subtleties of license agreements. Different licenses will apply to different organizations in different ways. In some cases, such as when encryption is involved, different countries have laws that affect licenses in unusual ways. Finally, license agreements change so frequently that anything I write could be inaccurate by the time you read this.

The bottom line, then, is that you should be sure to check the appropriate licensing agreements whenever you retrieve any software. Ultimately, it is your responsibility to ensure that your use of these tools is permissible.

A.4 Sources for Tools
This section gives basic information on each tool discussed in this book. I have not included built-in tools like ps. The tools are listed alphabetically. I have tried to note which tools are specific to Windows, but I did not list Windows tools separately, since many tools are available for both Unix and Windows.

A few tools discussed in the book, particularly older tools, seem to have no real home but may be available in some archives. This is generally an indication that the tool is fading into oblivion and should be used only as a last alternative. (Some of these tools, however, are alive and well as Linux packages or FreeBSD ports.) While I was writing this book, a number of home pages for tools changed. Also, several of the sites seem to be down more than they are up. I have supplied the most recent information I have, but many of the tools will have m