Network troubleshoots

Network Troubleshooting Tools
By Joseph D. Sloan
Publisher: O'Reilly
Pub Date: August 2001
ISBN: 0-596-00186-X
Pages: 364

Network Troubleshooting Tools helps you sort through the thousands of tools that have been developed for debugging TCP/IP networks and choose the ones that are best for your needs. It also shows you how to approach network troubleshooting using these tools, how to document your network so you know how it behaves under normal conditions, and how to think about problems when they arise so you can solve them more effectively.
Table of Contents

Preface
  Audience
  Organization
  Conventions
  Acknowledgments
Chapter 1. Network Management and Troubleshooting
  1.1 General Approaches to Troubleshooting
  1.2 Need for Troubleshooting Tools
  1.3 Troubleshooting and Management
Chapter 2. Host Configurations
  2.1 Utilities
  2.2 System Configuration Files
  2.3 Microsoft Windows
Chapter 3. Connectivity Testing
  3.1 Cabling
  3.2 Testing Adapters
  3.3 Software Testing with ping
  3.4 Microsoft Windows
Chapter 4. Path Characteristics
  4.1 Path Discovery with traceroute
  4.2 Path Performance
  4.3 Microsoft Windows
Chapter 5. Packet Capture
  5.1 Traffic Capture Tools
  5.2 Access to Traffic
  5.3 Capturing Data
  5.4 tcpdump
  5.5 Analysis Tools
  5.6 Packet Analyzers
  5.7 Dark Side of Packet Capture
  5.8 Microsoft Windows
Chapter 6. Device Discovery and Mapping
  6.1 Troubleshooting Versus Management
  6.2 Device Discovery
  6.3 Device Identification
  6.4 Scripts
  6.5 Mapping or Diagramming
  6.6 Politics and Security
  6.7 Microsoft Windows
Chapter 7. Device Monitoring with SNMP
  7.1 Overview of SNMP
  7.2 SNMP-Based Management Tools
  7.3 Non-SNMP Approaches
  7.4 Microsoft Windows
Chapter 8. Performance Measurement Tools
  8.1 What, When, and Where
  8.2 Host-Monitoring Tools
  8.3 Point-Monitoring Tools
  8.4 Network-Monitoring Tools
  8.5 RMON
  8.6 Microsoft Windows
Chapter 9. Testing Connectivity Protocols
  9.1 Packet Injection Tools
  9.2 Network Emulators and Simulators
  9.3 Microsoft Windows
Chapter 10. Application-Level Tools
  10.1 Application-Protocols Tools
  10.2 Microsoft Windows
Chapter 11. Miscellaneous Tools
  11.1 Communications Tools
  11.2 Log Files and Auditing
  11.3 NTP
  11.4 Security Tools
  11.5 Microsoft Windows
Chapter 12. Troubleshooting Strategies
  12.1 Generic Troubleshooting
  12.2 Task-Specific Troubleshooting
Appendix A. Software Sources
  A.1 Installing Software
  A.2 Generic Sources
  A.3 Licenses
  A.4 Sources for Tools
Appendix B. Resources and References
  B.1 Sources of Information
  B.2 References by Topic
  B.3 References
Colophon
Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved.

Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a basilisk and network troubleshooting is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
Preface

This book is not a general introduction to network troubleshooting. Rather, it is about one aspect of troubleshooting: information collection. This book is a tutorial introduction to tools and techniques for collecting information about computer networks. It should be particularly useful when dealing with network problems, but the tools and techniques it describes are not limited to troubleshooting. Many can and should be used on a regular basis regardless of whether you are having problems.

Some of the tools I have selected may be a bit surprising to many. I strongly believe that the best approach to troubleshooting is to be proactive, and the tools I discuss reflect this belief. Basically, if you don't understand how your network works before you have problems, you will find it very difficult to diagnose problems when they occur. Many of the tools described here should be used before you have problems. As such, these tools could just as easily be classified as network management or network performance analysis tools.

This book does not attempt to catalog every possible tool. There are simply too many tools already available, and the number is growing too rapidly. Rather, this book focuses on the tools that I believe are the most useful, a collection that should help in dealing with almost any problem you see. I have tried to include pointers to other relevant tools when there wasn't space to discuss them. In many cases, I have described more than one tool for a particular job. It is extremely rare for two tools to have exactly the same features. One tool may be more useful than another, depending on circumstances. And, because of the differences in operating systems, a specific tool may not be available on every system. It is worth knowing the alternatives.

The book is about freely available Unix tools. Many are open source tools covered by GNU- or BSD-style licenses. In selecting tools, my first concern has been availability.
I have given the highest priority to the standard Unix utilities. Next in priority are tools available as packages or ports for FreeBSD or Linux. Tools requiring separate compilation or available only as binaries were given a lower priority since these may be available on fewer systems. In some cases, PC-only tools and commercial tools are noted but are not discussed in detail. The bulk of the book is specific to Ethernet and TCP/IP, but the general approach and many of the tools can be used with other technologies.

While this is a book about Unix tools, at the end of most of the chapters I have included a brief section for Microsoft Windows users. These sections are included since even small networks usually include a few computers running Windows. These sections are not, even in the wildest of fantasies, meant to be definitive. They are provided simply as starting points: a quick overview of what is available.

Finally, this book describes a wide range of tools. Many of these tools are designed to do one thing and are often overlooked because of their simplicity. Others are extremely complex tools or sets of tools. I have not attempted to provide a comprehensive treatment for each tool discussed. Some of these tools can be extremely complex when used to their fullest. Some have manuals and other documentation that easily exceed the size of this book. Most have additional documentation that you will want to retrieve once you begin using them.

My goal is to make you aware of the tools and to give you enough information to decide which ones may be most useful to you, and in what context, so that you can get started using them. Each chapter centers on a collection of related tasks or problems and the tools useful for dealing with these tasks. The discussion is limited to features that are relevant to the problem being discussed. Consequently, the same tool may be discussed in several places throughout the book.
Please be warned: the suitability or behavior of these tools on your system cannot be guaranteed. While the material in this book is presented in good faith, neither the author nor O'Reilly & Associates makes any explicit or implied warranty as to the behavior or suitability of these tools. We strongly urge you to assess and evaluate these tools as appropriate for your circumstances.

Audience

This book is written primarily for individuals new to network administration. It should also be useful to those of you who have inherited responsibility for existing systems and networks set up by others. This book is designed to help you acquire the additional information you need to do your job.

Unfortunately, the book may also appeal to crackers. I truly regret this and wish there were a way to present this material to limit its worth to crackers. I never met a system manager or network administrator who wasn't overworked. Time devoted to security is time stolen from providing new services to users or improving existing services. There simply is no valid justification for cracking. I can only hope that the positive uses for the information I provide will outweigh the inevitable malicious uses to which it may be put. I would feel much better if crackers would forgo buying this book.

In writing this book, I attempted to write the sort of book I often wished I had when I was learning. Certainly, there are others who are more knowledgeable and better prepared to write this book. But they never seemed to get around to it. They have written pieces of this book, a chapter here or a tutorial there, for which I am both immensely thankful and greatly indebted.

I see this book as a work in progress. I hope that the response to it will make future expanded editions possible. You can help by sending me your comments and corrections.
I would particularly like to hear about new tools and about how you have used the tools described here to solve your problems. Perhaps some of the experts who should have written this book will share their wisdom! While I can't promise to respond to your email, I will read it. You can contact me through O'Reilly Book Support at booktech@oreilly.com.

Organization

There are 12 chapters and 2 appendixes in this book. The book begins with individual network hosts, discusses network connections next, and then considers networks as a whole.

It is unlikely that every chapter in the book will be of equal interest to you. The following outline will give you an overview of the book so you can select the chapters of greatest interest and either skim or skip over the rest.

Chapter 1
This chapter attempts to describe network management and troubleshooting in an administrative context. It discusses the need for network analysis and probing tools, their appropriate and inappropriate uses, professionalism in general, documentation practices, and the economic ramifications of troubleshooting. If you are familiar with the general aspects of network administration, you may want to skip this chapter.

Chapter 2
Chapter 2 is a review of tools and techniques used to configure or determine the configuration of a networked host. The primary focus is on built-in utilities. If you are well versed in Unix system administration, you can safely skip this chapter.

Chapter 3
Chapter 3 describes tools and techniques to test basic point-to-point and end-to-end network connectivity. It begins with a brief discussion of cabling. A discussion of ping, ping variants, and problems with ping follows. Even if you are very familiar with ping, you may want to skim over the discussion of the ping variants.

Chapter 4
This chapter focuses on assessing the nature and quality of end-to-end connections. After a discussion of traceroute, a tool for decomposing a path into individual links, the primary focus is on tools that measure link performance. This chapter covers some lesser-known tools, so even a seasoned network administrator may find a few useful tools and tricks.

Chapter 5
This chapter describes tools and techniques for capturing traffic on a network, primarily tcpdump and ethereal, although a number of other utilities are briefly mentioned. Using this chapter requires the greatest understanding of Internet protocols. But, in my opinion, this is the most important chapter in the book. Skip it at your own risk.

Chapter 6
This chapter begins with a general discussion of management tools. It then focuses on a few tools, such as nmap and arpwatch, that are useful in piecing together information about a network. After a brief discussion of network management extensions provided for Perl and Tcl/Tk, it concludes with a discussion of route and network discovery using tkined.

Chapter 7
Chapter 7 focuses on device monitoring. It begins with a brief review of SNMP. Next, a discussion of NET SNMP (formerly UCD SNMP) demonstrates the basics of SNMP. The chapter continues with a brief description of using scotty to collect SNMP information. Finally, it describes additional features of tkined, including network monitoring. In one sense, this chapter is a hands-on tutorial for using SNMP. If you are not familiar with SNMP, you will definitely want to read this chapter.

Chapter 8
This chapter is concerned with monitoring and measuring network behavior over time. The stars of this chapter are ntop and mrtg. I also briefly describe using SNMP tools to retrieve RMON data. This chapter assumes that you have a thorough knowledge of SNMP. If you don't, go back and read Chapter 7.

Chapter 9
This chapter describes several types of tools for examining the behavior of low-level connectivity protocols, protocols at the data link and network levels, including tools for custom packet generation and load testing. The chapter concludes with a brief discussion of emulation and simulation tools. You probably will not use these tools frequently and can safely skim this chapter the first time through.

Chapter 10
Chapter 10 looks at several of the more common application-level protocols and describes tools that may be useful when you are faced with a problem with one of these protocols. Unless you currently face an application-level problem, you can skim this chapter for now.

Chapter 11
This chapter describes a number of different tools that are not really network troubleshooting or management tools but rather are tools that can ease your life as a network administrator. You'll want to read the sections in this chapter that discuss tools you aren't already familiar with.

Chapter 12
When dealing with a complex problem, no single tool is likely to meet all your needs. This last chapter attempts to show how the different tools can be used together to troubleshoot and analyze performance. No new tools are introduced in this chapter. Arguably, this chapter should have come at the beginning of the book. I included it at the end so that I could name specific tools without too many forward references. If you are familiar with general troubleshooting techniques, you can safely skip this chapter. Alternatively, if you need a quick review of troubleshooting techniques and don't mind references to tools you aren't familiar with, you might jump ahead to this chapter.

Appendix A
This appendix begins with a brief discussion of installing software and general software sources. This discussion is followed by an alphabetical listing of those tools mentioned in this book, with Internet addresses when feasible. Beware: many of the URLs in this section will be out of date by the time you read this. Nonetheless, these URLs will at least give you a place to begin looking.

Appendix B
This appendix begins with a discussion of different sources of information. Next, it discusses books by topic, followed by an alphabetical listing of those books mentioned in this book.
Conventions

This book uses the following typographical conventions:

Italics
For program names, filenames, system names, email addresses, and URLs and for emphasizing new terms when first defined

Constant width
In examples showing the output from programs, the contents of files, or literal information

Constant-width italics
General syntax and items that should be replaced in expressions

Two icons are also used: one indicates a tip, suggestion, or general note; the other indicates a warning or caution.

Acknowledgments

This book would not have been possible without the help of many people. First on the list are the toolsmiths who created the tools described here. The number and quality of the tools that are available is truly remarkable. We all owe a considerable debt to the people who selflessly develop these tools.

I have been very fortunate that many of my normal duties have overlapped significantly with tasks related to writing this book. These duties have included setting up and operating Lander University's networking laboratory and evaluating tools for use in teaching. For their help with the laboratory, I gratefully acknowledge Lander's Department of Computing Services, particularly Anthony Aven, Mike Henderson, and Bill Screws. This laboratory was funded in part by a National Science Foundation grant, DUE-9980366. I gratefully acknowledge the support the National Science Foundation has given to Lander. I have also benefited from conversations with the students and faculty at Lander, particularly Jim Crabtree. I would never have gotten started on this project without the help and encouragement of Jerry Wilson. Jerry, I owe you lunch (and a lot more).

This book has benefited from the help of numerous people within the O'Reilly organization. In particular, the support given by Robert Denn, Mike Loukides, and Rob Romano, to name only a few, has been exceptional. After talking with authors working with other publishers, I consider myself very fortunate in working with technically astute people from the start.
If you are thinking about writing a technical book, O'Reilly is a publisher to consider.
The reviewers for this book have done an outstanding job. Thanks go to John Archie, Anthony Aven, Jon Forrest, and Kevin and Diana Mullet. They cannot be faulted for not turning a sow's ear into a silk purse.

It seems every author always acknowledges his or her family. It has almost become a cliché, but that doesn't make it any less true. This book would not have been possible without the support and patience of my family, who have endured more than I should ever have asked them to endure. Thank you.
Chapter 1. Network Management and Troubleshooting

The first step in diagnosing a network problem is to collect information. This includes collecting information from your users as to the nature of the problems they are having, and it includes collecting data from your network. Your success will depend, in large part, on your efficiency in collecting this information and on the quality of the information you collect. This book is about the tools you can use and about techniques and strategies to optimize their use. Rather than trying to cover all aspects of troubleshooting, this book focuses on this first crucial step, data collection.

There is an extraordinary variety of tools available for this purpose, and more become available daily. Very capable people are selflessly devoting enormous amounts of time and effort to developing these tools. We all owe a tremendous debt to these individuals. But with the variety of tools available, it is easy to be overwhelmed. Fortunately, while the number of tools is large, data collection need not be overwhelming. A small number of tools can be used to solve most problems. This book centers on a core set of freely available tools, with pointers to additional tools that might be needed in some circumstances.

This first chapter has two goals. Although general troubleshooting is not the focus of the book, it seems worthwhile to quickly review troubleshooting techniques. This review is followed by an examination of troubleshooting from a broader administrative context: using troubleshooting tools in an effective, productive, and responsible manner. This part of the chapter includes a discussion of documentation practices, personnel management and professionalism, legal and ethical concerns, and economic considerations. General troubleshooting is revisited in Chapter 12, once we have discussed available tools. If you are already familiar with these topics, you may want to skim or even skip this chapter.
1.1 General Approaches to Troubleshooting

Troubleshooting is a complex process that is best learned through experience. This section looks briefly at how troubleshooting is done in order to see how these tools fit into the process. But while every problem is different, a key step is collecting information.

Clearly, the best way to approach troubleshooting is to avoid it. If you never have problems, you will have nothing to correct. Sound engineering practices, redundancy, documentation, and training can help. But regardless of how well engineered your system is, things break. You can avoid troubleshooting, but you can't escape it.

It may seem unnecessary to say, but go for the quick fixes first. As long as you don't fixate on them, they won't take long. Often the first thing to try is resetting the system. Many problems can be resolved in this way. Bit rot, cosmic rays, or the alignment of the planets may result in the system entering some strange state from which it can't exit. If the problem really is a fluke, resetting the system may resolve the problem, and you may never see it again. This may not seem very satisfying, but you can take your satisfaction in going home on time instead.

Keep in mind that there are several different levels in resetting a system. For software, you can simply restart the program, or you may be able to send a signal to the program so that it reloads its initialization file. From your users' perspective, this is the least disruptive approach. Alternatively, you
might restart the operating system but without cycling the power, i.e., do a warm reboot. Finally, you might try a cold reboot by cycling the power.

You should be aware, however, that there can be some dangers in resetting a system. For example, it is possible to inadvertently make changes to a system so that it can't reboot. If you realize you have done this in time, you can correct the problem. Once you have shut down the system, it may be too late. If you don't have a backup boot disk, you will have to rebuild the system. These are, fortunately, rare circumstances and usually happen only when you have been making major changes to a system.

When making changes to a system, remember that scheduled maintenance may involve restarting a system. You may want to test changes you have made, including their impact on a system reset, prior to such maintenance to ensure that there are no problems. Otherwise, the system may fail when restarted during the scheduled maintenance. If this happens, you will be faced with the difficult task of deciding which of several different changes are causing problems.

Resetting the system is certainly worth trying once. Doing it more than once is a different matter. With some systems, this becomes a way of life. An operating system that doesn't provide adequate memory protection will frequently become wedged so that rebooting is the only option.[1] Sometimes you may want to limp along resetting the system occasionally rather than dealing with the problem. In a university setting, this might get you through exam week to a time when you can be more relaxed in your efforts to correct the underlying problem. Or, if the system is to be replaced in the near future, the effort may not be justified. Usually, however, when rebooting becomes a way of life, it is time for more decisive action.

[1] Do you know what operating system I'm tactfully not naming?

Swapping components and reinstalling software is often the next thing to try.
If you have the spare components, this can often resolve problems immediately. Even if you don't have spares, switching components to see if the problem follows the equipment can be a simple first test. Reinstalling software can be much more problematic. This can often result in configuration errors that will worsen problems. The old, installed version of the software can make getting a new, clean installation impossible. But if the install is simple or you have a clear understanding of exactly how to configure the software, this can be a relatively quick fix.

While these approaches often work, they aren't what we usually think of as troubleshooting. You certainly don't need the tools described in this book to do them. Once you have exhausted the quick solutions, it is time to get serious. First, you must understand the problem, if possible. Problems that are not understood are usually not fixed, just postponed.

One standard admonition is to ask the question "Has anything changed recently?" The overwhelming majority of problems relate to changes to a working system. If you can temporarily change things back and the problem goes away, you have confirmed your diagnosis.

Admittedly, this may not help with an installation where everything is new. But even a new installation can and should be grown. Pieces can be installed and tested. New pieces of equipment can then be added incrementally. When this approach is taken, the question of what has changed once again makes sense.

Another admonition is to change only one thing at a time and then to test thoroughly after each change. This is certainly good advice when dealing with routine failures. But this approach will not apply if you are dealing with a system failure. (See the upcoming sidebar on system failures.) Also, if you do find something that you know is wrong but fixing it doesn't fix your problem, do you really want to
change it back? In this case, it is often better to make a note of the additional changes you have made and then proceed with your troubleshooting.

A key element of successful debugging is to control the focus of your investigation so that you are really dealing with the problem. You can usually focus better if you can break the problem into pieces. Swapping components, as mentioned previously, is an example of this approach. This technique is known by several names: problem decomposition, divide and conquer, binary search, and so on. This approach is applicable to all kinds of troubleshooting. For example, when your car won't start, first decide whether you have an electrical or a fuel supply problem. Then proceed accordingly. Chapter 12 outlines a series of specific steps you might want to consider.

System Failures

The troubleshooting I have described so far can be seen roughly as dealing with normal failures (although there may be nothing terribly normal about them). A second general class of problems is known as system failures. System failures are problems that stem from the interaction of the parts of a complex system in unexpected ways. They are most often seen when two or more subsystems fail at about the same time and in ways that interact. However, system failures can also result from the interaction of subsystems without any ostensible failure in any of the subsystems.

A classic example of a system failure can be seen in the movie The China Syndrome. In one scene the reactor scrams, the pumps shut down, and the water-level indicator on a strip-chart recorder sticks. The water level in the reactor becomes dangerously low due to the pump shutdown, but the problem is not recognized because the indicator gives misleading information. These two near-simultaneous failures conceal the true state of the reactor.

System failures are most pernicious in systems with tight coupling between subsystems and in subsystems that are linked in nonlinear or nonobvious ways.
Debugging a system failure can be extremely difficult. Many of the more standard approaches simply don't work. The strategy of decomposing the system into subsystems becomes difficult, because the symptoms misdirect your efforts. Moreover, in extreme cases, each subsystem may be operating correctly; the problem stems entirely from the unexpected interactions.

If you suspect you have a system failure, the best approach, when feasible, is to substitute entire subsystems. Your goal should not be to look for a restored functioning system, but to look for changes in the symptoms. Such changes indicate that you may have found one of the subsystems involved. (Conversely, if you are working with a problem and the symptoms change when a subsystem is replaced, this is a strong indication of a system failure.) Unfortunately, if the problem stems from unexpected interaction of nonfailing systems, even this approach will not work. These are extremely difficult problems to diagnose. Each problem must be treated as a unique, special problem. But again, an important first step is collecting information.

1.2 Need for Troubleshooting Tools
The best time to prepare for problems is before you have them. It may sound trite, but if you don't understand the normal behavior of your network, you will not be able to identify anomalous behavior. For the proper management of your system, you must have a clear understanding of the current behavior and performance of your system. If you don't know the kinds of traffic, the bottlenecks, or the growth patterns for your network, then you will not be able to develop sensible plans. If you don't know the normal behavior, you will not be able to recognize a problem's symptoms when you see them. Unless you have made a conscious, aggressive effort to understand your system, you probably don't understand it. All networks contain surprises, even for the experienced administrator. You only have to look a little harder.

It might seem strange to some that a network administrator would need some of the tools described in this book, and that he wouldn't already know the details that some of these tools provide. But there are a number of reasons why an administrator may be quite ignorant of his network.

With the rapid growth of the Internet, turnkey systems seem to have grown in popularity. A fundamental assumption of these systems is that they are managed by an inexperienced administrator or an administrator who doesn't want to be bothered by the details of the system. Documentation is almost always minimal. For example, early versions of Sun Microsystems' Netra Internet servers, by default, did not install the Unix manpages and came with only a few small manuals. Print services were disabled by default.

This is not a condemnation of turnkey systems. They can be a real blessing to someone who needs to go online quickly, someone who never wants to be bothered by such details, or someone who can outsource the management of her system. But if at some later time she wants to know what her turnkey system is doing, it may be up to her to discover that for herself.
This is particularly likely if she ever wants to go beyond the basic services provided by the system or if she starts having problems.

Other, nonturnkey systems may be customized, often heavily. Of course, all these changes should be carefully documented. However, an administrator may inherit a poorly documented system. (And, of course, sometimes we do this to ourselves.) If you find yourself in this situation, you will need to discover (or rediscover) your system for yourself.

In many organizations, responsibilities may be highly partitioned. One group may be responsible for infrastructure such as wiring, another for network hardware, and yet another for software. In some environments, particularly universities, networks may be a distributed responsibility. You may have very little control, if any, over what is connected to the network. This isn't necessarily bad; it's the way universities work. But rogue systems on your network can have annoying consequences. In this situation, probably the best approach is to talk to the system administrator or user responsible for the system. Often he will be only too happy to discuss his configuration. The implications of what he is doing may have completely escaped him. Developing a good relationship with power users may give you an extra set of eyes on your network. And it is easier to rely on the system administrator to tell you what he is doing than to repeatedly probe the network to discover changes. But if this fails, as it sometimes does, you may have to resort to collecting the data yourself.

Sometimes there may be some unexpected, unauthorized, or even covert changes to your network. Well-meaning individuals can create problems when they try to help you out by installing equipment themselves. For example, someone might try installing a new computer on the network by copying the network configuration from another machine, including its IP address, leaving two hosts claiming the same address.
At other times, some "volunteer administrator" simply has her own plans for your network.

Finally, almost to a person, network administrators must teach themselves as they go. Consequently, for most administrators, these tools have an educational value as well as an administrative value. They
provide a way for administrators to learn more about their networks. For example, protocol analyzers like ethereal provide an excellent way to learn the inner workings of a protocol like TCP/IP. Often, more than one of these reasons may apply. Whatever the reason, it is not unusual to find yourself reading your configuration files and probing your systems.

1.3 Troubleshooting and Management

Troubleshooting does not exist in isolation from network management. How you manage your network will determine in large part how you deal with problems. A proactive approach to management can greatly simplify problem resolution. The remainder of this chapter describes several important management issues. Coming to terms with these issues should, in the long run, make your life easier.

1.3.1 Documentation

As a new administrator, your first step is to assess your existing resources and begin creating new resources. Software sources, including the tools discussed in this book, are described and listed in Appendix A. Other sources of information are described in Appendix B.

The most important source of information is the local documentation created by you or your predecessor. In a properly maintained network, there should be some kind of log about the network, preferably with sections for each device. In many networks, this log will be in an abysmal state. Almost no one likes documenting or thinks he has the time required to do it. It will be full of errors, out of date, and incomplete. Local documentation should always be read with a healthy degree of skepticism.

But even incomplete, erroneous documentation, if treated as such, may be of value. There are probably no intentional errors, just careless mistakes and errors of omission. Even flawed documentation can give you some sense of the history of the system. Problems frequently occur due to multiple conflicting changes to a system. Software that may have been only partially removed can have lingering effects.
Homegrown documentation may be the quickest way to discover what may have been on the system.

While the creation and maintenance of documentation may once have been someone else's responsibility, it is now your responsibility. If you are not happy with the current state of your documentation, it is up to you to update it and adopt policies so the next administrator will not be muttering about you the way you are muttering about your predecessors.

There are a couple of sets of standard documentation that, at a minimum, you will always want to keep. One is purchase information, the other a change log. Purchase information includes sales information, licenses, warranties, service contracts, and related information such as serial numbers. An inventory of equipment, software, and documentation can be very helpful. When you unpack a system, you might keep a list of everything you receive and date all documentation and software. (A changeable rubber date stamp and ink pad can help with this last task.) Manufacturers can do a poor job of distinguishing one version of software and its documentation from the next. Dates can be helpful in deciding which version of the documentation applies when you have multiple systems or upgrades. Documentation has a way of ending up in someone's personal library, never to be seen again, so a list of what you should have can be very helpful at times.

Keep in mind that there are a number of ways software can enter your system other than through purchase orders. Some software comes through CD-ROM subscription services, some comes in over the
Internet, some is bundled with the operating system, some comes in on a CD-ROM in the back of a book, some is brought from home, and so forth. Ideally, you should have some mechanism to track software. For example, for downloads from the Internet, be sure to keep a log identifying filenames, dates, and sources.

You should also keep a change log for each major system. Record every significant change or problem you have with the system. Each entry should be dated. Even if some entries no longer seem relevant, you should keep them in your log. For instance, if you have installed and later removed a piece of software on a server, there may be lingering configuration changes that you are not aware of that may come back to haunt you years later. This is particularly true if you try to reinstall the program, but it could be true for a new program as well.

Beyond these two basic sets of documentation, you can divide the documentation you need to keep into two general categories: configuration documentation and process documentation. Configuration documentation statically describes a system. It assumes that the steps involved in setting up the system are well understood and need no further comment, i.e., that configuration information is sufficient to reconfigure or reconstruct the system. This kind of information can usually be collected at any time. Ironically, for that reason, it can become so easy to put off that it is never done.

Process documentation describes the steps involved in setting up a device, installing software, or resolving a problem. As such, it is best written while you are doing the task. This creates a different set of collection problems. Here the stress from the task at hand often prevents you from documenting the process.

The first question you must ask is what you want to keep. This may depend on the circumstances and which tools you are using.
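Here is a minimal illustration of the dated change log and download log described above. The Python sketch below appends one tab-separated line per entry (ISO date, system name, note) to a plain-text file, and it can pull back the history for one system. The file layout and the function names `log_entry` and `entries_for` are hypothetical choices made for this sketch. The text does not prescribe any particular format.

```python
import datetime
import tempfile
from pathlib import Path


def log_entry(logfile, system, note, when=None):
    """Append one dated entry to a plain-text change log and return the line written.

    Each line is tab-separated: ISO date, system name, free-form note.
    The note must not contain a newline; tabs inside the note are tolerated.
    """
    when = when or datetime.date.today()
    line = f"{when.isoformat()}\t{system}\t{note}\n"
    with Path(logfile).open("a", encoding="utf-8") as f:
        f.write(line)
    return line


def entries_for(logfile, system):
    """Return (date, note) pairs for one system, in the order they were logged."""
    results = []
    with Path(logfile).open(encoding="utf-8") as f:
        for line in f:
            # maxsplit=2 keeps any tabs inside the note as part of the note
            date, name, note = line.rstrip("\n").split("\t", 2)
            if name == system:
                results.append((date, note))
    return results


if __name__ == "__main__":
    # Demo in a throwaway directory; a real log would live somewhere permanent.
    log = Path(tempfile.mkdtemp()) / "changelog.txt"
    # Hypothetical entries: a software install and an Internet download.
    log_entry(log, "mailhost", "Installed package foo 2.1 from vendor CD-ROM")
    log_entry(log, "mailhost", "Downloaded bar-1.0.tar.gz from ftp.example.org")
    for date, note in entries_for(log, "mailhost"):
        print(date, note)
```

Plain text keeps the log readable with any editor or `grep`, even when the tools that wrote it are gone.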
Static configuration information might include lists of IP addresses and Ethernet addresses, network maps, copies of server configuration files, switch configuration settings such as VLAN partitioning by ports, and so on.

When dealing with a single device, the best approach is probably just a simple copy of the configuration. This can be either printed or saved as a disk file. This will be a personal choice based on which you think is easiest to manage. You don't need to waste time prettying this up, but be sure you label and date it.

When the information spans multiple systems, such as a list of IP addresses, management of the data becomes more difficult. Fortunately, much of this information can be collected automatically. Several tools that ease the process are described in subsequent chapters, particularly in Chapter 6.

For process documentation, the best approach is to log and annotate the changes as you make them and then reconstruct the process at a later time. Chapter 11 describes some of the common Unix utilities you can use to automate documentation. You might refer to that chapter if you aren't familiar with utilities like tee, script, and xwd.[2]

[2] Admittedly these guidelines are ideals. Does anyone actually do all of this documenting? Yes, while most administrators probably don't, some do. But just because many administrators don't succeed in meeting the ideal doesn't diminish the importance of trying.

1.3.2 Management Practices

A fundamental assumption of this book is that troubleshooting should be proactive. It is preferable to avoid a problem than to have to correct it. Proper management practices can help. While some of this section may, at first glance, seem unrelated to troubleshooting, there are fundamental connections.
Management practices will determine what you can do and how you do it. This is true both for avoiding problems and for dealing with problems that can't be avoided. The remainder of this chapter reviews some of the more important management issues.

1.3.2.1 Professionalism

Effectively administering a system requires a high degree of professionalism. This includes personal honesty and ethical behavior. You should learn to evaluate yourself in an honest, objective manner. (See "The Peter Principle Revisited.") It also requires that you conform to the organization's mission and culture. Your network serves some higher purpose within your organization. It does not exist strictly for your benefit. You should manage the network with this in mind. This means that everything you do should be done from the perspective of a cost-benefit trade-off. It is too easy to get caught in the trap of doing something "the right way" at a higher cost than the benefits justify. Performance analysis is the key element.

The organization's mind-set or culture will have a tremendous impact on how you approach problems in general and the use of tools in particular. It will determine which tools you can use, how you can use the tools, and, most important, what you can do with the information you obtain. Within organizations, there is often a battle between openness and secrecy. The secrecy advocate believes that details of the network should be available only on a need-to-know basis, if then. She believes, not without justification, that this enhances security. The openness advocate believes that the details of a system should be open and available. This allows users to adapt and make optimal use of the system and provides a review process, giving users more input into the operation of the network.

Taken to an extreme, the secrecy advocate will suppress information that is needed by the user, making a system or network virtually unusable. Openness, taken to an extreme, will leave a network vulnerable to attack.
Most people's views fall somewhere between these two extremes but often favor one position over the other. I advocate prudent openness. In most situations, it makes no sense to shut down a system because it might be attacked. And it is asinine not to provide users with the information they need to protect themselves. Openness among those responsible for the different systems within an organization is absolutely essential.

1.3.2.2 Ego management

We would all like to think that we are irreplaceable, and that no one else could do our jobs as well as we do. This is human nature. Unfortunately, some people take steps to make sure this is true. The most obvious way an administrator may do this is to hide what he actually does and how his system works.

This can be done in many ways. Failing to document the system is one approach; leaving comments out of code or configuration files is common. The goal of such an administrator is to make sure he is the only one who truly understands the system. He may try to limit others' access to a system by restricting accounts or access to passwords. (This can be done to hide other types of unprofessional activities as well. If an administrator occasionally reads other users' email, he may not want anyone else to have standard accounts on the email server. If he is overspending on equipment to gain experience with new technologies, he will not want any technically literate people knowing what equipment he is buying.)

This behavior is usually well disguised, but it is extremely common. For example, a technician may insist on doing tasks that users could or should be doing. The problem is that this keeps users dependent on the technician when it isn't necessary. This can seem very helpful or friendly on the
surface. But if you repeatedly ask for details and don't get them, there may be more to it than meets the eye.

Common justifications are security and privacy. Unless you are in a management position, there is often little you can do other than accept the explanations given. But if you are in a management position, are technically competent, and still hear these excuses from your employees, beware! You have a serious problem.

No one knows everything. Whenever information is suppressed, you lose input from individuals who don't have the information. If an employee can't control her ego, she should not be turned loose on your network with the tools described in this book. She will not share what she learns. She will only use it to further entrench herself.

The problem is basically a personnel problem and must be dealt with as such. Individuals in technical areas seem particularly prone to these problems. It may stem from enlarged egos or from insecurity. Many people are drawn to technical areas as a way to seem special. Alternately, an administrator may see information as a source of power or even a weapon. He may feel that if he shares the information, he will lose his leverage. Often individuals may not even recognize the behavior in themselves. It is just the way they have always done things and it is the way that feels right.

If you are a manager, you should deal with this problem immediately. If you can't correct the problem in short order, you should probably replace the employee. An irreplaceable employee today will be even more irreplaceable tomorrow. Sooner or later, everyone leaves: finds a better job, retires, or runs off to Poughkeepsie with an exotic dancer. In the meantime, such a person only becomes more entrenched, making the eventual departure more painful.
It will be better to deal with the problem now rather than later.

1.3.2.3 Legal and ethical considerations

From the perspective of tools, you must ensure that you use tools in a manner that conforms not just to the policies of your organization, but to all applicable laws as well. The tools I describe in this book can be abused, particularly in the realm of privacy. Before using them, you should make certain that your use is consistent with the policies of your organization and all applicable laws. Do you have the appropriate permission to use the tools? This will depend greatly on your role within the organization. Do not assume that you are authorized to use tools just because you have access to them. Nor should you assume that any authorization you have is unlimited.

Packet capture software is a prime example. It allows you to examine every packet that travels across a link, including application data and each and every header. Unless data is encrypted, it can be decoded. This means that passwords can be captured and email can be read. For this reason alone, you should be very circumspect in how you use such tools.

A key consideration is the legality of collecting such information. Unfortunately, there is a constantly changing legal morass with respect to privacy in particular and technology in general. Collecting some data may be legitimate in some circumstances but illegal in others.[3] This depends on factors such as the nature of your operations, what published policies you have, what assurances you have given your users, new and existing laws, and what interpretations the courts give to these laws.

[3] As an example, see the CERT Advisory CA-92.19 Topic: Keystroke Logging Banner at http://www.cert.org/advisories/CA-1992-19.html for a discussion on keystroke logging and its legal implications.
It is impossible for a book like this to provide a definitive answer to the questions such considerations raise. I can, however, offer four pieces of advice:

• First, if the information you are collecting can be tied to the activities of an individual, you should consider the information highly confidential and should collect only the information that you really need. Be aware that even seemingly innocent information may be sensitive in some contexts. For example, source/destination address pairs may reveal communications between individuals that they would prefer not be made public.
• Second, place your users on notice. Let them know that you collect such information, why it is necessary, and how you use the information. Remember, however, that if you give your users assurances as to how the information is used, you are then constrained by those assurances. If your management policies permit, make their prior acceptance of these policies a requirement for using the system.
• Third, you must realize that with monitoring come obligations. In many instances, your legal culpability may be less if you don't monitor.
• Finally, don't rely on this book or what your colleagues say. Get legal advice from a lawyer who specializes in this area. Beware: many lawyers will not like to admit that they don't know everything about the law, but many aren't current with the new laws relating to technology.

Also, keep in mind that even if what you are doing is strictly legal and you have appropriate authority, your actions may still not be ethical.

The Peter Principle Revisited

In 1969, Laurence Peter and Raymond Hull published the satirical book The Peter Principle. The premise of the book was that people rise to their level of incompetence. For example, a talented high school teacher might be promoted to principal, a job requiring a quite different set of skills. Even if ill suited for the job, once she has this job, she will probably remain with it.
She just won't earn any new promotions. However, if she is adept at the job, she may be promoted to district superintendent, a job requiring yet another set of skills. The process of promotions will continue until she reaches her level of incompetence. At that point, she will spend the remainder of her career at that level.

While hardly a rigorous sociological principle, the book was well received because it contained a strong element of truth. In my humble opinion, the Peter Principle usually fails miserably when applied to technical areas such as networking and telecommunications. The problem is the difficulty in recognizing incompetence. If incompetence is not recognized, then an individual may rise well beyond his level of incompetence. This often happens in technical areas because there is no one in management who can judge an individual's technical competence.

Arguably, unrecognized incompetence usually takes the form of overengineering. Networking, a field of engineering, is always concerned with trade-offs between costs and benefits. An underengineered network that fails will not go unnoticed. But an overengineered network will rarely be recognizable as such. Such networks may cost many times what they should, drawing resources from other needs. But to the uninitiated, such a network appears to be a normal, functioning network.

If a network engineer really wants the latest in new equipment when it isn't needed, who, outside of the technical personnel, will know? If this is a one-person department, or if all the members of the department can agree on what they want, no one else may ever know. It is
too easy to come up with some technical mumbo jumbo if they are ever questioned.

If this seems far-fetched, consider this: I once attended a meeting where a young engineer was arguing that a particular router needed to be replaced before it became a bottleneck. He had picked out the ideal replacement, a hot new box that had just hit the market. The problem with all this was that I had recently taken measurements on the router and knew the average utilization of that "bottleneck" was less than 5%, with peaks that rarely hit 40%.

This is an extreme example of why collecting information is the essential first step in network management and troubleshooting. Without accurate measurements, you can easily spend money fixing imaginary problems.

1.3.2.4 Economic considerations

Solutions to problems have economic consequences, so you must understand the economic implications of what you do. Knowing how to balance the cost of the time used to repair a system against the cost of replacing a system is an obvious example. Cost management is a more general issue that has important implications when dealing with failures.

One particularly difficult task for many system administrators is to come to terms with the economics of networking. As long as everything is running smoothly, the next biggest issue to upper management will be how cost-effectively you are doing your job. Unless you have unlimited resources, when you overspend in one area, you take resources from another area. One definition of an engineer that I particularly like is that "an engineer is someone who can do for a dime what a fool can do for a dollar." My best guess is that overspending and buying needlessly complex systems is the single most common engineering mistake made when novice network administrators purchase network equipment.

One problem is that some traditional economic models do not apply in networking. In most engineering projects, incremental costs are less than the initial per-unit cost.
For example, if a 10,000-square-foot building costs $1 million, a 15,000-square-foot building will cost somewhat less than $1.5 million. It may make sense to buy additional footage even if you don't need it right away. This is justified as "buying for the future."

This kind of reasoning, when applied to computers and networking, leads to waste. Almost no one would go ahead and buy a computer now if they won't need it until next year. You'll be able to buy a better computer for less if you wait until you need it. Unfortunately, this same reasoning isn't applied when buying network equipment. People will often buy higher-bandwidth equipment than they need, arguing that they are preparing for the future, when it would be much more economical to buy only what is needed now and buy again in the future as needed.

Moore's Law lies at the heart of the matter. Around 1965, Gordon Moore, one of the founders of Intel, made the empirical observation that the density of integrated circuits was doubling about every 12 months, which he later revised to 24 months. Since the cost of manufacturing integrated circuits is relatively flat, this implies that, in two years, a circuit can be built with twice the functionality with no increase in cost. And, because distances are halved, the circuit runs at twice the speed, for a fourfold improvement. Since the doubling applies to previous doublings, we have exponential growth.

It is generally estimated that this exponential growth with chips will go on for another 15 to 20 years. In fact, this growth is nothing new. Raymond Kurzweil, in The Age of Spiritual Machines: When Computers Exceed Human Intelligence, collected information on computing speeds and functionality from the beginning of the twentieth century to the present. This covers mechanical, electromechanical
(relay), vacuum tube, discrete transistor, and integrated circuit technologies. Kurzweil found that exponential growth has been the norm for the last hundred years. He believes that new technologies will be developed that will extend this rate of growth well beyond the next 20 years. It is certainly true that we have seen even faster growth in disk densities and fiber-optic capacity in recent years, neither of which can be attributed to semiconductor technology.

What does this mean economically? Clearly, if you wait, you can buy more for less. But usually, waiting isn't an option. The real question is how far into the future you should invest. If the price is coming down, should you repeatedly buy for the short term or should you "invest" in the long term?

The general answer is easy to see if we look at a few numbers. Suppose that $100,000 will provide you with network equipment that will meet your anticipated bandwidth needs for the next four years. A simpleminded application of Moore's Law would say that you could wait and buy similar equipment for $25,000 in two years. Of course, such a system would have a useful life of only two additional years, not the original four. So, how much would it cost to buy just enough equipment to make it through the next two years? Following the same reasoning, about $25,000. If your growth is tracking the growth of technology,[4] then two years ago it would have cost $100,000 to buy four years' worth of technology. That will have fallen to about $25,000 today. Your choice: $100,000 now, or $25,000 now and $25,000 in two years. This is something of a no-brainer. It is summarized in the first two lines of Table 1-1.

[4] This is a pretty big if, but it's reasonable for most users and organizations. Most users and organizations have selected a point in the scheme of things that seems right for them, usually the latest technology they can reasonably afford. This is why that new computer you buy always seems to cost $2500.
You are buying the latest in technology, and you are trying to reach about the same distance into the future.

Table 1-1. Cost estimates

Plan                                          Year 1    Year 2    Year 3    Year 4    Total
Four-year plan                                $100,000  $0        $0        $0        $100,000
Two-year plan                                 $25,000   $0        $25,000   $0        $50,000
Four-year plan with maintenance               $112,000  $12,000   $12,000   $12,000   $148,000
Two-year plan with maintenance                $28,000   $3,000    $28,000   $3,000    $62,000
Four-year plan with maintenance and 20% MARR  $112,000  $10,000   $8,300    $6,900    $137,200
Two-year plan with maintenance and 20% MARR   $28,000   $2,500    $19,500   $1,700    $51,700

If this argument isn't compelling enough, there is the issue of maintenance. As a general rule of thumb, service contracts on equipment cost about 1% of the purchase price per month. For $100,000, that is $12,000 a year. For $25,000, this is $3,000 per year. Moore's Law doesn't apply to maintenance for several reasons:

• A major part of maintenance is labor costs, and these, if anything, will go up.
• The replacement parts will be based on older technology and older (and higher) prices.
• The mechanical parts of older systems, e.g., fans, connectors, and so on, are all more likely to fail.
• There is more money to be made selling new equipment, so there is no incentive to lower maintenance prices.

Thus, the $12,000 a year for maintenance on a $100,000 system will cost $12,000 a year for all four years. The third and fourth lines of Table 1-1 summarize these numbers.
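The arithmetic behind Table 1-1 can be checked with a few lines of Python. This is a minimal sketch under the text's own simplifying assumptions:

* Price/performance improves fourfold every two years, so the two-year plan's purchase costs a quarter of the four-year plan's.
* Maintenance costs 1% of the purchase price per month.
* For the last two rows, each later year's cost is discounted at a 20% MARR, with year 1 left undiscounted.

The function and parameter names are hypothetical. The table rounds each yearly figure, so the exact discounted totals this computes differ slightly: $137,277.78 and $51,680.56 rather than $137,200 and $51,700.

```python
MONTHLY_MAINTENANCE_PERCENT = 1  # rule of thumb: about 1% of purchase price per month
MARR = 0.20                      # minimal acceptable rate of return used in Table 1-1


def moores_law_price(price, years):
    """Price of equivalent equipment after `years`, assuming a fourfold
    price/performance gain every two years (the text's simple reading of Moore's Law)."""
    return price / 4 ** (years / 2)


def plan_costs(purchases, installed, maintenance=False, marr=0.0):
    """Yearly costs of a plan, expressed in today's dollars when marr > 0.

    purchases[i] -- money spent on equipment in year i + 1
    installed[i] -- purchase price of the equipment under service contract in year i + 1
    """
    costs = []
    for year, (bought, base) in enumerate(zip(purchases, installed)):
        cost = bought
        if maintenance:
            cost += base * 12 * MONTHLY_MAINTENANCE_PERCENT / 100
        # Year 1 (year == 0) is not discounted; later years are divided by (1 + marr)**year.
        costs.append(cost / (1 + marr) ** year)
    return costs


four_year_price = 100_000
two_year_price = moores_law_price(four_year_price, 2)  # 25,000.0

plans = {
    "Four-year plan": ([four_year_price, 0, 0, 0], [four_year_price] * 4),
    "Two-year plan": ([two_year_price, 0, two_year_price, 0], [two_year_price] * 4),
}
variants = [
    ("", {}),
    (" with maintenance", {"maintenance": True}),
    (" with maintenance and 20% MARR", {"maintenance": True, "marr": MARR}),
]

for label, (purchases, installed) in plans.items():
    for suffix, options in variants:
        costs = plan_costs(purchases, installed, **options)
        yearly = " ".join(f"{c:>10,.0f}" for c in costs)
        print(f"{label + suffix:46} {yearly} {sum(costs):>10,.0f}")
```

The sketch keeps purchases and maintained equipment as separate lists. That way, a plan that buys new equipment and keeps the old equipment under contract can be modeled simply by changing `installed`.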
Yet another consideration is the time value of money. If you don't need the $25,000 until two years from now, you can invest a smaller amount now and expect to have enough to cover the costs later. So the $25,000 needed in two years is really somewhat less in terms of today's dollars. How much less depends on the rate of return you can expect on investments. For most organizations, this number is called the minimum acceptable rate of return (MARR). Discounting a cost C paid n years from now gives C / (1 + MARR)^n in today's dollars; for example, the $12,000 due in Year 3 is worth about $12,000 / 1.2^2 ≈ $8,300 today. The last two lines of Table 1-1 use a MARR of 20%. This may seem high, but it is not an unusual number. As you can see, buying for the future is more than two and a half times as expensive as going for the quick fix.

Of course, all this is a gross simplification. There are a number of other important considerations even if you believe these numbers. First and foremost, Moore's Law doesn't always apply. The most important exception is infrastructure. It is not going to get any cheaper to pull cable. You should take the time to do infrastructure well; that's where you really should invest in the future.

Most of the other considerations seem to favor short-term investing. First, with short-term purchasing, you are less likely to invest in dead-end technology, since you are buying later in the life cycle and will have a clearer picture of where the industry is going. For example, think about the difference two years might have made in choosing between Fast Ethernet and ATM for some organizations. For the same reason, the cost of training should be lower. You will be dealing with more familiar technology, and there will be more resources available. You will have to purchase and install equipment more often, but the equipment you replace can be reused in your network's periphery, providing additional savings.

On the downside, the equipment you buy won't have a lot of excess capacity or a very long useful lifetime. It can be very disconcerting to nontechnical management when you keep replacing equipment.
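The discounting behind the last two rows of Table 1-1 can be checked with a few lines of code. The sketch below is a minimal illustration, not the author's method. It assumes that maintenance is billed at 12% of the purchase price per year (the 1%-per-month rule of thumb), that Year 1 costs are paid now and are not discounted, and that each later year is discounted by (1 + MARR) raised to the number of years elapsed. The helper names `plan_costs` and `present_value` are hypothetical.

```python
def plan_costs(purchase, lifetime, horizon=4):
    """Yearly outlays for a plan that buys new equipment every `lifetime` years.

    Assumes maintenance of 1% of the purchase price per month (12% per year),
    paid every year, plus the full purchase price in each year a purchase occurs.
    """
    maintenance = purchase * 12 // 100
    return [maintenance + (purchase if year % lifetime == 0 else 0)
            for year in range(horizon)]


def present_value(costs, marr):
    """Discount each year's cost to today's dollars.

    costs[0] is paid now (Year 1); costs[n] is divided by (1 + marr) ** n.
    """
    return [cost / (1 + marr) ** n for n, cost in enumerate(costs)]


for name, costs in [("Four-year plan", plan_costs(100_000, lifetime=4)),
                    ("Two-year plan", plan_costs(25_000, lifetime=2))]:
    discounted = present_value(costs, marr=0.20)
    print(f"{name}: nominal ${sum(costs):,}, discounted ${sum(discounted):,.0f}")
```

Run as is, this reports nominal totals of $148,000 and $62,000 and discounted totals of about $137,278 and $51,681. Table 1-1 shows rounded figures, so its totals ($137,200 and $51,700) differ slightly from these unrounded sums. Either way, the ratio is roughly 2.65.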
And, if you experience sudden unexpected growth, this is exactly what you will need to do. Take the time to educate upper management. If frequent changes to your equipment are particularly disruptive or if you have funding now, you may need to consider long-term purchases even if they are more expensive. Finally, don't take the two-year time frame presented here too literally. You'll discover the appropriate time frame for your network only with experience.

Other problems come when comparing plans. You must consider the total economic picture. Don't look just at the initial costs, but consider ongoing costs such as maintenance and the cost of periodic replacement. As an example, consider the following plans. Plan A has an estimated initial cost of $400,000, all for equipment. Plan B requires $150,000 for equipment and $450,000 for infrastructure upgrades. If you consider only initial costs, Plan A seems to be $200,000 cheaper. But equipment needs to be maintained and, periodically, replaced. At 1% per month, the equipment for Plan A would cost $48,000 a year to maintain, compared to $18,000 per year with Plan B. If you replace equipment a couple of times in the next decade, that will be an additional $800,000 for Plan A but only $300,000 for Plan B. As this quick, back-of-the-envelope calculation shows, the 10-year cost for Plan A is $1.68 million ($400,000 + 10 × $48,000 + $800,000), while the cost for Plan B is only $1.08 million ($600,000 + 10 × $18,000 + $300,000). What appeared to be $200,000 cheaper was really $600,000 more expensive. Of course, this was a very crude example, but it should convey the idea.

You shouldn't take this example too literally either. Every situation is different. In particular, you may not be comfortable deciding what is adequate surplus capacity in your network. In general, however, you are probably much better off thinking in terms of scalability than raw capacity.
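The Plan A versus Plan B comparison uses the same kind of arithmetic. The sketch below encodes the assumptions stated in the text:

- maintenance costs 1% of the equipment price per month and applies to equipment only;
- there are two replacements over ten years, each costing the same as the original equipment;
- infrastructure is a one-time cost.

No discounting is applied, and `ten_year_cost` is a hypothetical helper name.

```python
def ten_year_cost(equipment, infrastructure, replacements=2, years=10):
    """Undiscounted total cost of a plan over `years` years.

    Assumptions (from the back-of-the-envelope example):
      * maintenance is 1% of the equipment price per month (12% per year),
      * each replacement costs the same as the original equipment,
      * infrastructure is paid once and needs no maintenance.
    """
    initial = equipment + infrastructure
    maintenance = equipment * 12 // 100 * years
    replacement = equipment * replacements
    return initial + maintenance + replacement


plan_a = ten_year_cost(equipment=400_000, infrastructure=0)
plan_b = ten_year_cost(equipment=150_000, infrastructure=450_000)
print(f"Plan A: ${plan_a:,}")
print(f"Plan B: ${plan_b:,}")
print(f"Plan A costs ${plan_a - plan_b:,} more")
```

This reproduces the text's figures of $1,680,000 and $1,080,000. Changing `replacements` or `years` shows how sensitive the comparison is to those assumptions.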
If you want to hedge your bets, you can make sure that high-speed interfaces are available for the router you are considering without actually buying those high-speed interfaces until needed.

How does this relate to troubleshooting? First, don't buy overly complex systems you don't really need. They will be much harder to maintain, as you can expect the complexity of troubleshooting to grow with the complexity of the systems you buy. Second, don't spend all your money on the system and forget ongoing maintenance costs. If you don't anticipate operational costs, you may not have the funds you need.
Chapter 2. Host Configurations

The goal of this chapter is to review system administration from the perspective of the individual hosts on a network. This chapter presumes that you have a basic understanding of system administration. Consequently, many of the more basic issues are presented in a very cursory manner. The intent is more to jog your memory, or to fill an occasional gap, than to teach the fundamentals of system administration. If you are new to system administration, a number of the books listed in Appendix B provide excellent introductions. If, on the other hand, you are a knowledgeable system administrator, you will probably want to skim or even skip this chapter.

Chapter 1 lists several reasons why you might not know the details of your network and the computers on it. This chapter assumes that you are faced with a networked computer and need to determine or reconstruct its configuration. It should be obvious that if you don't understand how a system is configured, you will not be able to change its configuration or correct misconfigurations. The tools described in this chapter can be used to discover or change a host's configuration.

As discussed in Chapter 1, if you have documentation for the system, begin with it. The assumption here is that such documentation does not exist or that it is incomplete. The primary focus is network configuration, but many of the techniques can easily be generalized.

If you have inherited a multiuser system that has been in service for several years with many undocumented customizations, reconstructing its configuration can be an extremely involved and extended process. If your system has been compromised, the intruder has taken steps to hide her activity, and you aren't running an integrity checker like tripwire, it may be virtually impossible to discover all her customizations. (tripwire is discussed briefly in Chapter 11.) While it may not be feasible, you should at least consider reinstalling the system from scratch.
This may seem draconian, but it may ultimately be much less work than fighting the same battles over and over, as often happens with compromised systems. The best way to do this is to set up a replacement system in parallel and then move everyone over. This, of course, requires a second system.

If rebuilding the system is not feasible, or if your situation isn't as extreme as that just described, then you can use the techniques described in this chapter to reconstruct the system's configuration.

Whatever your original motivation, you should examine your system's configuration on a regular basis. If for no other reason, this will help you remember how your system is configured. But there are other reasons as well. As you learn more, you will undoubtedly want to revisit your configuration to correct problems, improve security, and optimize performance. Reviewing configurations is a necessary step to ensure that your system hasn't been compromised. And, if you share management of a system, you may be forced to examine the configuration whenever communications falter.

Keep a set of notes for each system, giving both the configuration and directions for changing the configuration. Usually the best place to start is by constructing a list of what can be found where in the vendor documentation you have. This may seem pointless, since this information is in the documentation. But the information you need will be spread throughout this documentation. You won't want to plow through everything every time you need to check or change something. You must create your own list. I frequently write key page numbers inside the front covers of manuals and specifics in the margins throughout the manual. For example, I'll add device names to the manpages for the mount command, something I always seem to need but often can't remember. (Be warned that this has the disadvantage of tying manuals to specific hardware, which could create other problems.)
When reconstructing a host's configuration, there are two basic approaches. One is to examine the system's configuration files. This can be a very protracted approach. It works well when you know what you are looking for and when you are looking for a specific detail. But it can be difficult or impossible to find all the details of the system, particularly if someone has taken steps to hide them. And some parameters are set dynamically and simply can't be discovered just from configuration files.

The alternative is to use utilities designed to give snapshots of the current state of the system. Typically, these focus on one aspect of the system, for example, listing all open files. Collectively, these utilities can give you a fairly complete picture. They tend to be easy to use and give answers quickly. But, because they may focus on only one aspect of the system, they may not provide all the information you need if used in isolation.

Clearly, by itself, neither approach is totally adequate. Where you start will depend in part on how quickly you must be up to speed and what specific problems you are facing. Each approach will be described in turn.

2.1 Utilities

Reviewing system configuration files is a necessary step that you will have to address before you can claim mastery of a system. But this can be a very time-consuming step. It is very easy to overlook one or more key files. If you are under time pressure to resolve a problem, configuration files are not the best place to start.

Even if you plan to jump into the configuration files, you will probably want a quick overview of the current state of the system before you begin. For this reason, we will examine status and configuration utilities first. This approach has the advantage of being pretty much the same from one version of Unix to the next. With configuration files, the differences among the various flavors of Unix can be staggering.
Even when the files have the same functionality and syntax, they can go by different names or be in different directories. Certainly, using these utilities is much simpler than looking at kernel configuration files.

The output provided by these utilities may vary considerably from system to system and will depend heavily on which options are used. In practice, this should present no real problem. Don't be alarmed if the output on your system is formatted differently.

2.1.1 ps

The first thing any system administrator should do on a new system is run the ps command. You are probably already familiar with ps, so I won't spend much time on it. The ps command lists which processes are running on the system. Here is an example:

    bsd4# ps -aux
    USER     PID %CPU %MEM  VSZ  RSS TT STAT STARTED    TIME COMMAND
    root    6590 22.0  2.1  924  616 ?? R    11:14AM 0:09.80 inetd: chargen [2
    root       1  0.0  0.6  496  168 ?? Ss   Fri09AM 0:00.03 /sbin/init --
    root       2  0.0  0.0    0    0 ?? DL   Fri09AM 0:00.52 (pagedaemon)
    root       3  0.0  0.0    0    0 ?? DL   Fri09AM 0:00.00 (vmdaemon)
    root       4  0.0  0.0    0    0 ?? DL   Fri09AM 0:44.05 (syncer)
    root     100  0.0  1.7  820  484 ?? Ss   Fri09AM 0:02.14 syslogd
    daemon   109  0.0  1.5  828  436 ?? Is   Fri09AM 0:00.02 /usr/sbin/portmap
    root     141  0.0  2.1  924  616 ?? Ss   Fri09AM 0:00.51 inetd
    root     144  0.0  1.7  980  500 ?? Is   Fri09AM 0:03.14 cron
    root     150  0.0  2.8 1304  804 ?? Is   Fri09AM 0:02.59 sendmail: accepti
    root     173  0.0  1.3  788  368 ?? Is   Fri09AM 0:01.84 moused -p /dev/ps
    root     213  0.0  1.8  824  508 v1 Is+  Fri09AM 0:00.02 /usr/libexec/gett
    root     214  0.0  1.8  824  508 v2 Is+  Fri09AM 0:00.02 /usr/libexec/gett
    root     457  0.0  1.8  824  516 v0 Is+  Fri10AM 0:00.02 /usr/libexec/gett
    root    6167  0.0  2.4 1108  712 ?? Ss    4:10AM 0:00.48 telnetd
    jsloan  6168  0.0  0.9  504  252 p0 Is    4:10AM 0:00.09 -sh (sh)
    root    6171  0.0  1.1  464  320 p0 S     4:10AM 0:00.14 -su (csh)
    root       0  0.0  0.0    0    0 ?? DLs  Fri09AM 0:00.17 (swapper)
    root    6597  0.0  0.8  388  232 p0 R+   11:15AM 0:00.00 ps -aux

In this example, the first and last columns are the most interesting, since they give the owners and the processes, along with their arguments. In this example, the lines, and consequently the arguments, have been truncated, but this is easily avoided. Running processes of interest include portmap, inetd, sendmail, telnetd, and chargen.

There are a number of options available to ps, although they vary from implementation to implementation. In this example, run under FreeBSD, the parameters used were -aux. This combination shows all users' processes (-a), including those without controlling terminals (-x), in considerable detail (-u). The options -ax will provide fewer details but show more of the command-line arguments. Alternately, you can use the -w option to extend the displayed information to 132 columns. With AT&T-derived systems, the options -ef do pretty much the same thing. Interestingly, Linux supports both sets of options. You will need to precede AT&T-style options with a hyphen. This isn't required for BSD options. You can do it either way with Solaris.
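Because ps output is plain columns, it is easy to post-process when you want more than a glance. The following Python sketch parses BSD-style `ps -aux` output like the listing above into records. It then pulls out the two columns the text calls most interesting, the owner and the command.

The sketch assumes the 11-column layout shown (USER through COMMAND). It splits each line at most 10 times so that commands containing spaces stay intact. Other ps implementations or option sets will need different parsing. `parse_ps_aux` and `Process` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Process:
    user: str
    pid: int
    cpu: float
    mem: float
    stat: str
    command: str


def parse_ps_aux(text):
    """Parse BSD-style `ps -aux` output (header line first) into Process records."""
    processes = []
    for line in text.strip().splitlines()[1:]:      # skip the USER ... COMMAND header
        fields = line.split(None, 10)               # COMMAND may contain spaces
        if len(fields) < 11:
            continue                                # ignore malformed lines
        user, pid, cpu, mem, _vsz, _rss, _tt, stat, _started, _time, command = fields
        processes.append(Process(user, int(pid), float(cpu), float(mem), stat, command))
    return processes


# A few lines copied from the listing above.
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 6590 22.0 2.1 924 616 ?? R 11:14AM 0:09.80 inetd: chargen [2
daemon 109 0.0 1.5 828 436 ?? Is Fri09AM 0:00.02 /usr/sbin/portmap
jsloan 6168 0.0 0.9 504 252 p0 Is 4:10AM 0:00.09 -sh (sh)
"""

for p in parse_ps_aux(SAMPLE):
    print(f"{p.user:<8} {p.command}")
```

On a live system you could feed it the output of `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`. Which option spelling works depends on your ps, as described above.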
/usr/bin/ps follows the AT&T conventions, while /usr/ucb/ps supports the BSD options.

While ps quickly reveals individual processes, it gives a somewhat incomplete picture if interpreted naively. For example, the inetd daemon is one source of confusion. inetd is used to automatically start services on a system as they are needed. Rather than start a separate process for each service that might eventually be run, the inetd daemon runs on their behalf. When a connection request arrives, inetd will start the requested service. Since some network services like ftp, telnet, and finger are usually started this way, ps will show processes for them only when they are currently running. If ps doesn't list them, it doesn't mean they aren't available; they just aren't currently running.

For example, in the previous listing, chargen was started by inetd. We can see chargen in this instance because it was a running process when ps was run. But this particular test system was configured to run a number of additional services via inetd (as determined by the /etc/inetd.conf configuration file). None of these other services show up under ps because, technically, they aren't currently running. Yet these other services will be started automatically by inetd, so they are available services.

In addition to showing what is running, ps is a useful diagnostic tool. It quickly reveals defunct processes or multiple instances of the same process, thereby pointing out configuration problems and similar issues. %MEM and %CPU can tell you a lot about resource usage and can provide crucial information if you have resource starvation. Or you can use ps to identify rogue processes that are spawning other processes by looking at processes that share a common PPID (parent process ID). Once you are comfortable with the usual uses, it is certainly worth revisiting ps periodically to learn more about its other capabilities, as this brief discussion just scratches the surface of ps.

2.1.2 top
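Two of the diagnostic uses just described, spotting defunct processes and finding a parent that is spawning many children, can be sketched the same way. This version assumes you asked ps for just the PID, PPID, STAT, and command columns. On many systems this is something like `ps -axo pid,ppid,stat,comm`, but the option spelling varies, so treat it as an assumption.

It also assumes that defunct (zombie) processes show a STAT beginning with Z. The sample input, including the process name `rogued`, is made up for illustration, and the threshold is arbitrary. `find_suspects` is a hypothetical name.

```python
from collections import Counter


def find_suspects(text, threshold=10, ignore_parents=(0, 1)):
    """Return (busy_parents, zombies) from 'PID PPID STAT COMMAND' output.

    busy_parents maps a PPID to its child count when that count reaches
    `threshold`; PIDs 0 and 1 (swapper and init) are skipped by default,
    since they normally parent many processes.
    zombies lists (pid, ppid, command) for processes whose STAT starts with Z.
    """
    children = Counter()
    zombies = []
    for line in text.strip().splitlines()[1:]:      # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, command = int(fields[0]), int(fields[1]), fields[2], fields[3]
        children[ppid] += 1
        if stat.startswith("Z"):
            zombies.append((pid, ppid, command))
    busy = {ppid: n for ppid, n in children.items()
            if n >= threshold and ppid not in ignore_parents}
    return busy, zombies


# Hypothetical output: one parent (PID 500) with several children, one defunct.
SAMPLE = """\
  PID  PPID STAT COMMAND
    1     0 Ss   init
  141     1 Ss   inetd
  500     1 Ss   rogued
  501   500 S    rogued
  502   500 S    rogued
  503   500 Z    rogued
"""

busy, zombies = find_suspects(SAMPLE, threshold=3)
print("parents with many children:", busy)
print("defunct processes:", zombies)
```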
Although less ubiquitous, the top command, a useful alternative to ps, is available on many systems. It was written by William LeFebvre. When running, top gives a periodically updated listing of processes ranked in order of CPU usage. Typically, only the top 10 processes are given, but this is implementation dependent, and your implementation may let you select other values. Here is a single instance from our test system:

    15 processes: 2 running, 13 sleeping
    CPU states: 0.8% user, 0.0% nice, 7.4% system, 7.8% interrupt, 84.0% idle
    Mem: 6676K Active, 12M Inact, 7120K Wired, 2568K Cache, 3395K Buf, 1228K Free
    Swap: 100M Total, 100M Free

      PID USERNAME PRI NICE  SIZE    RES STATE   TIME   WCPU    CPU COMMAND
     6590 root      35    0  924K   616K RUN     0:15 21.20% 20.75% inetd
      144 root      10    0  980K   500K nanslp  0:03  0.00%  0.00% cron
      150 root       2    0 1304K   804K select  0:03  0.00%  0.00% sendmail
      100 root       2    0  820K   484K select  0:02  0.00%  0.00% syslogd
      173 root       2    0  788K   368K select  0:02  0.00%  0.00% moused
      141 root       2    0  924K   616K select  0:01  0.00%  0.00% inetd
     6167 root       2    0 1108K   712K select  0:00  0.00%  0.00% telnetd
     6171 root      18    0  464K   320K pause   0:00  0.00%  0.00% csh
     6168 jsloan    10    0  504K   252K wait    0:00  0.00%  0.00% sh
     6598 root      28    0 1556K   844K RUN     0:00  0.00%  0.00% top
        1 root      10    0  496K   168K wait    0:00  0.00%  0.00% init
      457 root       3    0  824K   516K ttyin   0:00  0.00%  0.00% getty
      214 root       3    0  824K   508K ttyin   0:00  0.00%  0.00% getty
      213 root       3    0  824K   508K ttyin   0:00  0.00%  0.00% getty
      109 daemon     2    0  828K   436K select  0:00  0.00%  0.00% portmap

Output is interrupted with a q or a Ctrl-C. Sometimes system administrators will leave top running on the console when the console is not otherwise in use. Of course, this should be done only in a physically secure setting.

In a sense, ps is a more general top, since it gives you all running processes. The advantage of top is that it focuses your attention on resource hogs, and it provides a repetitive update. top has a large number of options and can provide a wide range of information.
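To see what top adds, here is a rough imitation built on ps. It ranks processes by the %CPU column, keeps the top n, and offers an optional loop for periodic refresh. It is only a sketch: real top computes CPU usage over its own sampling interval and reports much more.

The sketch assumes BSD-style `ps aux` output with %CPU in the third column, and that your system accepts `ps aux`. `top_processes` and `watch` are hypothetical names.

```python
import heapq
import subprocess
import time


def top_processes(ps_aux_text, n=10):
    """Return up to n (cpu, pid, user, command) tuples, highest %CPU first."""
    rows = []
    for line in ps_aux_text.strip().splitlines()[1:]:   # skip the header row
        fields = line.split(None, 10)
        if len(fields) == 11:
            rows.append((float(fields[2]), int(fields[1]), fields[0], fields[10]))
    # nlargest with a key behaves like a stable descending sort, so ties keep ps order.
    return heapq.nlargest(n, rows, key=lambda row: row[0])


def watch(n=10, interval=5.0, rounds=3):
    """Print a fresh top-n list every `interval` seconds (assumes `ps aux` works)."""
    for _ in range(rounds):
        output = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
        for cpu, pid, user, command in top_processes(output, n):
            print(f"{pid:>6} {user:<8} {cpu:5.1f}% {command}")
        print("-" * 40)
        time.sleep(interval)


# Lines copied from the ps listing earlier in this section.
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 100 0.0 1.7 820 484 ?? Ss Fri09AM 0:02.14 syslogd
root 6590 22.0 2.1 924 616 ?? R 11:14AM 0:09.80 inetd: chargen [2
root 6597 0.0 0.8 388 232 p0 R+ 11:15AM 0:00.00 ps -aux
"""

for cpu, pid, user, command in top_processes(SAMPLE, n=2):
    print(pid, user, cpu, command)
```

`heapq.nlargest` avoids sorting the whole list when only a handful of entries are wanted, which is the same "focus on the hogs" idea top is built around.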
For more information, consult its Unix manpage.[1]

[1] Solaris users may want to look at process management utilities included in /usr/proc/bin.

2.1.3 netstat

One of the most useful and diverse utilities is netstat. This program reports the contents of kernel data structures related to networking. Because of the diversity in networking data structures, many of netstat's uses may seem somewhat unrelated, so we will be revisiting netstat at several points in this book.

One use of netstat is to display the connections and services available on a host. For example, this is the output for the system we just looked at:

    bsd4# netstat -a
    Active Internet connections (including servers)
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp        0      0  bsd4.telnet            205.153.60.247.3473    TIME_WAIT
    tcp        0  17458  bsd4.chargen           sloan.1244             ESTABLISHED
    tcp        0      0  *.chargen              *.*                    LISTEN
    tcp        0      0  *.discard              *.*                    LISTEN
    tcp        0      0  *.echo                 *.*                    LISTEN
    tcp        0      0  *.time                 *.*                    LISTEN
    tcp        0      0  *.daytime              *.*                    LISTEN
    tcp        0      0  *.finger               *.*                    LISTEN
    tcp        0      2  bsd4.telnet            sloan.1082             ESTABLISHED
    tcp        0      0  *.smtp                 *.*                    LISTEN
    tcp        0      0  *.login                *.*                    LISTEN
    tcp        0      0  *.shell                *.*                    LISTEN
    tcp        0      0  *.telnet               *.*                    LISTEN
    tcp        0      0  *.ftp                  *.*                    LISTEN
    tcp        0      0  *.sunrpc               *.*                    LISTEN
    udp        0      0  *.1075                 *.*
    udp        0      0  *.1074                 *.*
    udp        0      0  *.1073                 *.*
    udp        0      0  *.1072                 *.*
    udp        0      0  *.1071                 *.*
    udp        0      0  *.1070                 *.*
    udp        0      0  *.chargen              *.*
    udp        0      0  *.discard              *.*
    udp        0      0  *.echo                 *.*
    udp        0      0  *.time                 *.*
    udp        0      0  *.daytime              *.*
    udp        0      0  *.sunrpc               *.*
    udp        0      0  *.syslog               *.*
    Active UNIX domain sockets
    Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
    c3378e80 dgram       0      0        0 c336efc0        0 c3378f80
    c3378f80 dgram       0      0        0 c336efc0        0 c3378fc0
    c3378fc0 dgram       0      0        0 c336efc0        0        0
    c336efc0 dgram       0      0 c336db00        0 c3378e80        0 /var/run/log

The first column gives the protocol. The next two columns give the sizes of the receive and send queues. These should be 0 or near 0. Otherwise, you may have a problem with that particular service. The next two columns give the socket, that is, the IP address and port number, for each end of a connection. This socket pair uniquely identifies one connection. The socket is presented in the form hostname.service. Finally, the state of the connection is given in the last column for TCP services. This is blank for UDP, since it is connectionless. The most common states are ESTABLISHED for current connections, LISTEN for services awaiting a connection, and TIME_WAIT for recently terminated connections. Any of the TCP states could show up, but you should rarely see the others. An excessive number of SYN_RECEIVED, for example, is an indication of a problem (possibly a denial-of-service attack). You can safely ignore the last few lines of this listing.

A couple of examples should clarify this output.
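The rules of thumb in the preceding paragraph are easy to check mechanically: queues should be near 0, and an excessive number of SYN_RECEIVED sockets is a warning sign. The following sketch tallies TCP states and flags sockets whose queues exceed a cutoff.

It assumes BSD-style `netstat -a` lines like those above. The `min_queue` and `syn_threshold` defaults are arbitrary stand-ins for "near 0" and "excessive". The SYN_RECV check covers Linux, whose netstat spells that state differently. `summarize_netstat` is a hypothetical name.

```python
from collections import Counter


def summarize_netstat(text, min_queue=100, syn_threshold=50):
    """Tally TCP states and flag large queues in BSD-style `netstat -a` output.

    Returns (states, busy, warnings): states counts TCP connection states,
    busy lists (local, foreign, recv_q, send_q) for sockets with a queue of at
    least min_queue bytes, and warnings holds any human-readable alerts.
    Header lines and UNIX domain socket lines are skipped.
    """
    states = Counter()
    busy = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        proto, local, foreign = fields[0], fields[3], fields[4]
        recv_q, send_q = int(fields[1]), int(fields[2])
        if proto.startswith("tcp") and len(fields) > 5:
            states[fields[5]] += 1          # UDP lines have no state column
        if recv_q >= min_queue or send_q >= min_queue:
            busy.append((local, foreign, recv_q, send_q))
    warnings = []
    # FreeBSD prints SYN_RECEIVED; Linux netstat prints SYN_RECV.
    half_open = states["SYN_RECEIVED"] + states["SYN_RECV"]
    if half_open >= syn_threshold:
        warnings.append(f"{half_open} half-open connections: possible SYN flood")
    return states, busy, warnings


# Lines copied from the listing above.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 bsd4.telnet 205.153.60.247.3473 TIME_WAIT
tcp 0 17458 bsd4.chargen sloan.1244 ESTABLISHED
tcp 0 0 *.chargen *.* LISTEN
tcp 0 2 bsd4.telnet sloan.1082 ESTABLISHED
udp 0 0 *.syslog *.*
"""

states, busy, warnings = summarize_netstat(SAMPLE)
print(dict(states))
print(busy)
print(warnings)
```

In this sample, the chargen connection is flagged. That is expected, because chargen streams data continuously. It is a reminder that a nonzero queue is a clue, not a verdict.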
The following line shows a Telnet connection between bsd4 and sloan using port 1082 on sloan:

    tcp        0      2  bsd4.telnet            sloan.1082             ESTABLISHED

The next line shows that there was a second connection to sloan that was recently terminated:

    tcp        0      0  bsd4.telnet            205.153.60.247.3473    TIME_WAIT

Terminated connections remain in this state for a couple of minutes, during which time the socket pair cannot be reused.

Name resolution can be suppressed with the -n option if you would rather see numeric entries. There are a couple of reasons you might want to do this. Typically, netstat will run much faster without name resolution. This is particularly true if you are having name resolution problems and have to wait for requests to time out. This option can also help you avoid confusion if your /etc/services or /etc/hosts files are inaccurate.

The remaining TCP entries in the LISTEN state are services waiting for a connection request. Since a request could come over any available interface, its IP address is not known in advance. The * in the entry *.echo acts as a placeholder for the unknown IP address. (Since multiple addresses may be associated with a host, the local address is unknown until a connection is actually made.) The *.* entries indicate that both the remote address and port are unknown. As you can see, this shows a number of additional services that ps was not designed to display. In particular, all the services that are under the control of inetd are shown.

Another use of netstat is to list the routing table. This may be essential information in resolving routing problems, e.g., when you discover that a host or a network is unreachable. Although it may be too long or volatile on many systems to be very helpful, the routing table is sometimes useful in getting a quick idea of what networks are communicating with yours. Displaying the routing table requires the -r option.

There are four main ways entries can be added to the routing table: by the ifconfig command when an interface is configured, by the route command, by an ICMP redirect, or through an update from a dynamic protocol like RIP or OSPF.
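Because both host names and IP addresses contain dots, the hostname.service notation has to be split at the last dot. The sketch below does that and uses it to list the wildcard (*) listeners, the services waiting for connections on any interface.

It assumes BSD-style output. Linux netstat separates the port with a colon instead, and IPv6 addresses would need different handling. `split_endpoint` and `listening_services` are hypothetical names.

```python
def split_endpoint(endpoint):
    """Split 'host.service' at the last dot: '205.153.60.247.3473' -> ('205.153.60.247', '3473')."""
    host, _, service = endpoint.rpartition(".")
    return host, service


def listening_services(netstat_text):
    """Return sorted (proto, service) pairs whose local address is the '*' wildcard.

    TCP entries must be in the LISTEN state; UDP entries have no state,
    so any wildcard-bound UDP socket is included.
    """
    found = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        host, service = split_endpoint(fields[3])
        if host != "*":
            continue                    # a specific local address: an active connection
        if fields[0].startswith("udp") or fields[-1] == "LISTEN":
            found.add((fields[0], service))
    return sorted(found)


# Lines copied from the netstat -a listing above.
SAMPLE = """\
tcp 0 2 bsd4.telnet sloan.1082 ESTABLISHED
tcp 0 0 *.echo *.* LISTEN
tcp 0 0 *.telnet *.* LISTEN
udp 0 0 *.syslog *.*
"""

print(split_endpoint("bsd4.telnet"))
print(listening_services(SAMPLE))
```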
If dynamic protocols are used, the routing table is an example of a dynamic structure that can't be discovered by looking at configuration files.

Here is an example of a routing table from a FreeBSD system:

    bsd1# netstat -rn
    Routing tables

    Internet:
    Destination        Gateway            Flags   Refs     Use  Netif Expire
    default            205.153.60.2       UGSc       0       0    xl0
    127.0.0.1          127.0.0.1          UH         0       0    lo0
    172.16.1/24        172.16.2.1         UGSc       0       7    xl1
    172.16.2/24        link#2             UC         0       0    xl1
    172.16.2.1         0:10:7b:66:f7:62   UHLW       2       0    xl1    913
    172.16.2.255       ff:ff:ff:ff:ff:ff  UHLWb      0      18    xl1
    172.16.3/24        172.16.2.1         UGSc       0       2    xl1
    205.153.60         link#1             UC         0       0    xl0
    205.153.60.1       0:0:a2:c6:e:42     UHLW       4       0    xl0    906
    205.153.60.2       link#1             UHLW       1       0    xl0
    205.153.60.5       0:90:27:9c:2d:c6   UHLW       0      34    xl0    987
    205.153.60.255     ff:ff:ff:ff:ff:ff  UHLWb      1      18    xl0
    205.153.61         205.153.60.1       UGSc       0       0    xl0
    205.153.62         205.153.60.1       UGSc       0       0    xl0
    205.153.63         205.153.60.1       UGSc       2       0    xl0

At first glance, output from other systems may be organized differently, but usually the same basic information is present. In this example, the -n option was used to suppress name resolution.

The first column gives the destination, while the second gives the interface or next hop to that destination. The third column gives the flags. These are often helpful in interpreting the first two columns. A U indicates the path is up or available, an H indicates the destination is a host rather than a network, and a G indicates a gateway or router. These are the most useful. Others shown in this table include b, indicating a broadcast address; S, indicating a static or manual addition; and W and c, indicating a route that was generated as a result of cloning. (These and other possibilities are described in detail in the Unix manpage for some versions of netstat.) The fourth column gives a reference count,
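The flag column is compact enough to decode programmatically. The sketch below maps only the letters explained in the text. Any other letters, such as the L and C that also appear in the listing, are reported as unexplained rather than guessed at. Because the text groups W and c together, both are labeled loosely as cloning-related.

The sketch then uses the G flag to list routes that go through a gateway. It assumes the FreeBSD column order shown above (Destination, Gateway, Flags, ...). `FLAG_MEANINGS`, `decode_flags`, and `gateway_routes` are hypothetical names.

```python
# Meanings as described in the text; check your netstat manpage for others.
FLAG_MEANINGS = {
    "U": "up",
    "H": "host (not network) destination",
    "G": "via gateway/router",
    "b": "broadcast address",
    "S": "static (manually added)",
    "W": "cloning-related",
    "c": "cloning-related",
}


def decode_flags(flags):
    """Return (meanings, unexplained_letters) for a flags string such as 'UGSc'."""
    meanings = [FLAG_MEANINGS[f] for f in flags if f in FLAG_MEANINGS]
    unexplained = [f for f in flags if f not in FLAG_MEANINGS]
    return meanings, unexplained


def gateway_routes(netstat_rn_text):
    """Return (destination, gateway) pairs for routes whose flags include G."""
    routes = []
    for line in netstat_rn_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] != "Destination" and "G" in fields[2]:
            routes.append((fields[0], fields[1]))
    return routes


# Lines copied from the routing table above.
SAMPLE = """\
Destination Gateway Flags Refs Use Netif Expire
default 205.153.60.2 UGSc 0 0 xl0
172.16.2.1 0:10:7b:66:f7:62 UHLW 2 0 xl1 913
172.16.3/24 172.16.2.1 UGSc 0 2 xl1
205.153.61 205.153.60.1 UGSc 0 0 xl0
"""

print(decode_flags("UHLWb"))
print(gateway_routes(SAMPLE))
```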
