A Web Based Network Monitoring Tool
  • This page describes the motivations for this work. The major objective is to provide some method of testing to the desktop. The speaker should note that it is difficult or impossible for a campus admin to run repetitive tests to every desktop on site. Even if it were done, running enough tests to get a statistically valid baseline is extremely difficult. A better approach is needed to allow testing on an as-needed basis. The NDT tester meets these design goals.
  • This page gives some additional background: how the underlying Web100 project came into being and how the NDT uses it as the basic data gathering methodology. These are Web100 project goals, not NDT goals.
  • This is the web based Java client. Web based means the applet downloads automatically into the browser, eliminating the need to pre-install software on the client machine. This is a plus when a new user wants to test or complain: he or she doesn’t need to pre-load any software before a test can begin. It is also important to define the NDT strengths and deficiencies (next slide). Note that the performance tuning info is based on getting to the NDT server, not the real application host. Thus the tuning info may be suspect, but it should provide the right trends in setting buffer sizes.
  • As noted in the previous slide, it’s important to define what the NDT can’t do. There is enough variation in the Internet and in individual hosts that running a test to one desktop will not help in determining how another computer will operate. Nor does it tell you how your desktop will operate when talking to a different server: system load and file system constraints all play a role in the wall clock time required to complete a specific task.
  • This slide and the next show how the NDT fits into the rest of the piPEs architecture.
  • Boxes in black are working and deployed (either released or in prototype form). Boxes in red are under development. These are the software components that make up the piPEs measurement framework. Some are released (BWCTL, OWAMP), some are in “prototype format” (Database, Traceroute, PMP, PMC, web service, network monitoring), and some are under development (“Detective Applet”, Discovery module, Analysis module, MDI, NDT). The Measurement Domain Interface (MDI) is a web services interface that speaks the GGF NMWG Request/Report schema and handles authentication and authorization. It is being designed to be interoperable with other measurement frameworks (current and future). The Network Diagnostic Tool (NDT) is an existing tool that the original author is integrating into piPEs. It is designed to detect common problems in the first mile (the common case for most network “issues”).
  • This is the first major task. The issue is: what is the bottleneck link speed? For example, suppose you have a 10/100/1000 interface card and the intra-building network is GigE based, but you get plugged into a FastE network port. The NDT will tell you that the bottleneck is a FastE link somewhere in the path. Another example: suppose the path takes you through a slow exchange point and there is a backup Ethernet link being used while the normal FastE link is down for some reason. The NDT will report that a bottleneck Ethernet link exists. The NDT uses packet dispersion techniques, i.e., it measures the inter-packet arrival times for all data and ACK packets sent or received. It also knows the packet size, so it can calculate the speed for each pair of packets sent or received. The results are then quantized, meaning that the NDT doesn’t recognize fractional link speeds. It’s either Ethernet, T3, or FastE. It wouldn’t detect a bonded EtherChannel interface.
  • Improving the detection of this problem has been the focus of recent work. We have an analytical model, and a detection algorithm was created based on it.
  • Finally, a list of basic features. Configuration files: the admin can store run-time options in a config file. FIFO scheduling: the NDT handles multiple requests in a first-come, first-served manner; other users wait in a queue for service. Simple discovery protocol: allows multiple servers to find each other when operating in Federated mode. Federated mode: allows multiple servers to redirect clients to the ‘closest’ NDT server. Command line client: allows an admin to run tests remotely without access to a web browser. The speaker should note that this is a SourceForge project.
  • Finally, where you can go to get the source and email support.
  • This is the basic flow chart for the NDT program. The process starts with the user opening a browser and entering the NDT server’s URL. An optional step is to point to a well-known server and accept a redirect message (Federated mode). Otherwise the URL points to the NDT server itself (either an Apache server or the fakewww process answers the request). The web server responds by returning the page with an embedded Java applet (the class or jar file is also returned). The user must then manually request a test by clicking the “start” button. The applet then opens a connection back to the server’s testing engine (the web100srv process). A child process is created to handle the test, and the parent goes back to listening for more test requests. The parent also keeps the FIFO queue needed to process multiple requests. A control channel is then created between the server and client to control the client’s actions and synchronize the start of the various tests. The client then opens 2 new data channels back to the server; having the client open the connections allows the tests to get past client-side firewall boxes. The client opens and closes a connection to perform the middlebox test. The client then streams data to the server to measure the client’s upload speed. The client then opens another connection and the server streams data back to the client, measuring the client’s download speed. The server then extracts the Web100 data and analyzes the connection for faults. The results are recorded in the server’s log file and returned to the client for display to the user.
  • This is a list of servers on the Abilene network, and other public servers. Note that this is not a complete list; more are being added as they become available. The latest public server is located in Russia, and there is a server located at StarLight. In addition, several institutions run private servers, notably DOD and possibly DOE NNSA. There are no restrictions on use, just the University of Chicago public license requirement.
  • Other topics and observations found after running a public server for several years.
  • This slide shows why it’s important to test to the user’s desktop and why having the network staff show up with a ‘good’ laptop doesn’t help much. In this case one laptop client with a 10 Mbps Ethernet NIC saw 7 Mbps (70% utilization), which is good for a half-duplex connection. Note that some timeouts and retransmissions occurred (probably due to the half-duplex nature of the link). When informed of this loss, the network admin came in with a ‘tuned’ laptop and ran a test with a 100 Mbps NIC: good throughput (85% utilization) and no loss. His conclusion was that there is no network problem and that I should not have reported one. What is a typical user going to think? (I see a problem and the network staff says no problem found.)
  • This example comes from a lab setup: 12 desktop computers all connected to the same Cisco switch with 2 VLANs and a Cisco router between the VLANs. Note that these tests are VLAN to VLAN, i.e., through the router, and in the first 4 cases everything is 100 Mbps full duplex. Note the order of magnitude change in RTT, a factor of 4 change in speed, and no correlation between speed and RTT. Also note that loss never reaches 1%. In the last 2 cases, one of the hosts was changed to a 10 Mbps link. Note the order of magnitude change in RTT, but speed remains constant and loss is again below 1%. The next page describes the network conditions present during the test.
  • Test results. Case 1: everything is operating normally with 100 Mbps full duplex links. Case 2: the router had a bad interface module, and it was reporting these errors in the router logs; note the loss/sec rate. Case 3: the TCP traffic is flowing in the opposite direction, but the bad router interface is still present. (Who would report a problem?) Case 4: three pairs of hosts are testing at once, causing congestion on the shared router links (should be reported as normal). Case 5: one of the hosts is set to 10 Mbps (normal operation). Case 6: the faulty router interface is again in the path. Note the increased loss/sec rate, but speed is still good. Imagine what happens with GigE attached servers and FastE attached clients. Would anyone complain?
  • This formula describes the normal operating mode for a Reno TCP connection. As noted, the NDT server is reporting that some connections don’t conform to this model. It isn’t clear why this discrepancy exists.
  • These are NDT goals. An analogy is that repetitive tests build up a historical record that can point out when changes occur (a depth of measurement data). The NDT relies on multiple data variables (a breadth of measurement data) to achieve similar results.
  • These are some of the benefits of the NDT system. Providing hard evidence is an important part of making the user feel that something can be done to improve things.
  • This introduces the audience to the NDT operation methodology. The next few slides provide the details.
  • There has been some preliminary work done on detecting this problem. At one point I did find a bad router interface in a test network.
  • This is also an area where more work needs to be performed. The issue is max performance, where a half duplex link will not achieve as high a speed as a full duplex link. Note that old Ethernet hubs require half-duplex operation.
  • This is another area where more work is required. The issue is to detect when your traffic is sharing the network infrastructure with other users. In this case you should get 1/Nth of the bottleneck link speed. It would also be nice to know when TCP is entering the congestion avoidance phase.

Transcript

  • 1. Developing the Web100 Based Network Diagnostic Tool (NDT) Internet2 piPEs Tutorial Rich Carlson [email_address]
  • 2. Demo
    • http://ndt-newyork.abilene.ucaid.edu:7123
  • 3. Normal operation in campus
  • 4. Duplex Mismatch Detected
  • 5. Low throughput from remote host
  • 6. Increase TCP buffer size
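A rough sketch of the arithmetic behind a "increase TCP buffer size" suggestion, assuming the standard bandwidth-delay product rule (buffer ≈ bottleneck bandwidth × round-trip time). The function name and the example path values are illustrative assumptions, not values taken from the NDT output.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return int(round(bandwidth_bps * rtt_seconds / 8))

# Illustrative path (assumed values): 100 Mbps bottleneck, 10 ms round-trip time.
buffer_needed = bdp_bytes(100e6, 0.010)
print(buffer_needed)  # 125000
```

A TCP buffer smaller than this value would cap throughput below the bottleneck rate on such a path, which is why the tuning advice trends toward larger buffers as speed or RTT grows.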
  • 7. Motivation for work
    • Measure performance to user’s desktop
    • Develop “single shot” diagnostic tool that doesn’t use historical data
    • Combine numerous Web100 variables to analyze connection
    • Develop network signatures for ‘typical’ network problems
  • 8. Web100 Project
    • Joint PSC/NCAR project funded by NSF
    • ‘First step’ to gather TCP data
      • Kernel Instrument Set (KIS)
    • Requires patched Linux kernel
    • Geared toward wide area network performance
    • Future steps will automate tuning to improve application performance
  • 9. Web Based Performance tool
    • Operates on any client with a Java enabled Web browser
    • What it can do
      • Positively state if Sender, Receiver, or Network is operating properly
      • Provide accurate application tuning info
      • Suggest changes to improve performance
  • 10. Web Based Performance tool
    • What it can’t do
      • Tell you where in the network the problem is
      • Tell you how other servers perform
      • Tell you how other clients will perform
  • 11. Internet2 piPEs Project
    • Develop E2E measurement infrastructure capable of finding network problems
    • Tools include
      • BWCTL: Bandwidth Control wrapper for NLANR Iperf
      • OWAMP: One-Way Active Measurement
      • NDT: Network Diagnostic Tool
  • 12. piPEs Integration
  • 13. Bottleneck Link Detection
    • What is the slowest link in the end-2-end path?
      • Monitors packet arrival times using libpcap routine
      • Use TCP dynamics to create packet pairs
      • Quantize results into link type bins (no fractional or bonded links)
    • Cisco URP grant work
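The packet-pair idea on this slide can be sketched as follows. This is a minimal illustration, not the NDT's actual code: the bin list, the log-scale nearest-bin rule, and the "most common bin wins" vote are all assumptions made for the example, and the timestamps would in practice come from a capture library such as libpcap.

```python
import math

# Nominal link rates in Mbps. This bin list is an assumption for illustration.
LINK_BINS = {
    "Ethernet": 10.0,
    "T3": 45.0,
    "FastE": 100.0,
    "GigE": 1000.0,
}

def pair_speed_mbps(packet_bytes: int, gap_seconds: float) -> float:
    """Speed implied by one packet pair: packet size divided by inter-arrival gap."""
    if gap_seconds <= 0:
        raise ValueError("gap must be positive")
    return packet_bytes * 8 / gap_seconds / 1e6

def quantize(speed_mbps: float) -> str:
    """Pick the link type whose nominal rate is closest on a log scale."""
    return min(LINK_BINS, key=lambda name: abs(math.log(speed_mbps / LINK_BINS[name])))

def bottleneck(pairs) -> str:
    """Quantize every (size, gap) pair and report the most common link type."""
    counts = {}
    for size, gap in pairs:
        label = quantize(pair_speed_mbps(size, gap))
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)

# Hypothetical samples: 1500-byte packets with measured gaps in seconds.
samples = [(1500, 0.000125), (1500, 0.000118), (1500, 0.0012)]
print(bottleneck(samples))
```

Because every estimate is forced into a bin, a fractional or bonded link would be reported as the nearest standard type, which matches the limitation noted on the slide.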
  • 14. Duplex Mismatch Detection
    • Developed analytical model to describe how Ethernet responds (no prior art?)
    • Expanding model to describe UDP and TCP flows
    • Develop practical detection algorithm
    • Test models in LAN, MAN, and WAN environments
    • NIH/NLM grant funding
  • 15. Future enhancements
    • WiFi detection
    • Faulty Hardware detection
    • Congestion modification
    • Full/Half duplex detection
  • 16. Additional Functions and Features
    • Provide basic tuning information
    • Basic Features
      • Basic configuration file
      • FIFO scheduling of tests
      • Simple server discovery protocol
      • Federation mode support
      • Command line client support
    • Created sourceforge.net project page
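FIFO scheduling of tests can be pictured with a small queue sketch. This assumes nothing about how the NDT server implements its queue; the class and method names are hypothetical and only illustrate first-come, first-served ordering.

```python
from collections import deque

class RequestQueue:
    """First-come, first-served queue of pending test requests (illustrative only)."""

    def __init__(self):
        self._waiting = deque()

    def request(self, client_id: str) -> int:
        """Add a client and return its position in line (1 = next to run)."""
        self._waiting.append(client_id)
        return len(self._waiting)

    def next_client(self):
        """Return the longest-waiting client, or None if nobody is waiting."""
        return self._waiting.popleft() if self._waiting else None

queue = RequestQueue()
queue.request("client-a")
queue.request("client-b")
print(queue.next_client())  # client-a
```

Serving one test at a time keeps concurrent tests from competing for the server's link, at the cost of making later users wait.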
  • 17. Availability
    • Open Source Development project
      • http://www.sourceforge.net/projects/ndt
    • Tools available from
      • http://e2epi.internet2.edu/ndt/download.html
      • Contains source code
    • Email discussion list [email_address]
      • Go to http://e2epi.internet2.edu/ndt web site and click
        • ndt-users – General discussion on NDT tool
        • ndt-announce – Announcements on new features
  • 18. NDT Flow Chart
    • Diagram components: Client (Web Browser, Java Applet); NDT Server (Web Server, Testing Engine, Child Test Engine); Well Known NDT Server
    • Diagram messages: Web Request, Redirect msg, Web Page Request, Web page response, Test Request, Spawn child, Control Channel, Specific test channels
  • 19. NDT servers
  • 20. Results and Observations
    • Changing desktop affects performance
    • Faulty Hardware identification
    • Mathis et al. formula fails
  • 21. Different Host, Same Switch Port
    • 10 Mbps NIC
      • Throughput 6.8/6.7 Mbps send/receive
      • RTT 20 ms
      • Retransmission/Timeouts 25/3
    • 100 Mbps NIC
      • Throughput 84/86 Mbps send/receive
      • RTT 10 ms
      • Retransmission/Timeouts 0/0
  • 22. LAN Testing Results
    • 100 Mbps FD
      • Speed 94.09, Ave RTT 5.41, %loss 0.00
      • Speed 22.50, Ave RTT 1.38, %loss 0.78
      • Speed 82.66, Ave RTT 6.16, %loss 0.00
      • Speed 33.61, Ave RTT 14.82, %loss 0.00
    • 10 Mbps
      • Speed 6.99, Ave RTT 72.80, %loss 0.01
      • Speed 7.15, Ave RTT 8.84, %loss 0.75
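  • 23. LAN Testing Results
    • 100 Mbps FD
      • Speed 94.09, Ave RTT 5.41, %loss 0.00, loss/sec 0.03: Good
      • Speed 22.50, Ave RTT 1.38, %loss 0.78, loss/sec 15.11: Bad NIC
      • Speed 82.66, Ave RTT 6.16, %loss 0.00, loss/sec 0.03: Bad reverse
      • Speed 33.61, Ave RTT 14.82, %loss 0.00, loss/sec 0.10: Congestion
    • 10 Mbps
      • Speed 6.99, Ave RTT 72.80, %loss 0.01, loss/sec 0.03: Good
      • Speed 7.15, Ave RTT 8.84, %loss 0.75, loss/sec 4.65: Bad NIC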
  • 24. Mathis et al. Formula fails
    • Estimate = (K * MSS) / (RTT * sqrt(loss))
      • old-loss = (Retrans - FastRetran) / (DataPktsOut - AckPktsOut)
      • new-loss = CongestionSignals / PktsOut
    • Estimate < Measured (K = 1)
      • old-loss 91/443 (20.54%)
      • new-loss 35/443 (7.90%)
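The slide's arithmetic can be worked through as a short sketch. The two loss ratios are the slide's own numbers; the MSS and RTT values below are NOT given on the slide and are assumed purely so the formula can be evaluated, so the resulting estimates are illustrative only.

```python
import math

def mathis_estimate_bps(mss_bytes: int, rtt_seconds: float, loss: float, k: float = 1.0) -> float:
    """Estimate = (K * MSS) / (RTT * sqrt(loss)), converted to bits per second."""
    return k * mss_bytes * 8 / (rtt_seconds * math.sqrt(loss))

# Loss ratios taken from the slide.
old_loss = 91 / 443   # (Retrans - FastRetran) / (DataPktsOut - AckPktsOut)
new_loss = 35 / 443   # CongestionSignals / PktsOut

# Assumed values, not from the slide: a 1460-byte MSS and a 20 ms RTT.
MSS = 1460
RTT = 0.020

old_est = mathis_estimate_bps(MSS, RTT, old_loss)
new_est = mathis_estimate_bps(MSS, RTT, new_loss)
print(f"old-loss {old_loss:.2%} -> {old_est / 1e6:.2f} Mbps")
print(f"new-loss {new_loss:.2%} -> {new_est / 1e6:.2f} Mbps")
```

Since the estimate scales with 1/sqrt(loss), the lower new-loss figure yields a higher estimate than old-loss; the slide reports that even so, the estimate stays below the measured throughput when K = 1.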
  • 25. NDT Hardware Requirements
    • Minimum requirements
      • 500 MHz Intel or AMD CPU
      • 64 MB of RAM
      • Fast Ethernet
    • Buying something now
      • 2 GHz or better processor
      • 256 MB of RAM
      • Gigabit Ethernet
    • Disk space for executables and log files
      • No disk I/O involved during test
  • 26. NDT Software Requirements
    • Web100 enhancements
      • Linux kernel
      • User library
    • Other 3rd party SW needed to compile source
      • Java SDK
      • pcap library
      • Client uses Java JRE (beware of version mismatch)
    • NDT source file
      • Test engine (web100srv) requires root authority
  • 27. Recommended Settings
    • There are no settings or options for the Web based java applet.
      • It allows the user to run a fixed set of tests for a limited time period
    • Test engine settings
      • Turn on admin view ( -a option)
      • If multiple network interfaces exist use –i option to specify correct interface to monitor ( ethx )
    • Simple Web server (fakewww)
      • Use –l fn option to create log file
  • 28. Potential Risks
    • Non-standard kernel required
      • GUI tools can be used to monitor other ports
    • Public servers generate trouble reports from remote users
      • Respond or ignore emails
    • Test streams can trigger IDS alarms
      • Configure IDS to ignore NDT server
  • 29. Possible Alternatives
    • Other tools that can perform client testing
      • Several web sites offer the ability for a user to check PC upload/download speed.
      • Internet2/Surfnet Detective
      • NCSA Advisor
  • 30.
    • Supplemental information
  • 31. NDT’s Web100 Based Approach
    • Simple bi-directional test to gather E2E data
    • Gather multiple data variables from server
    • Compare measured performance to analytical values
    • Translate network values into plain text messages
    • Geared toward campus area network
  • 32. NDT Benefits
    • End-user based view of network
    • Can identify configuration problems
    • Can identify performance bottlenecks
    • Provides some ‘hard evidence’ to users and network administrators to reduce finger pointing
    • Doesn’t rely on historical data
  • 33. NDT methodology
    • Identify specific problem(s) that affect end users
    • Analyze problem to determine ‘Network Signature’ for this problem
    • Provide testing tool to automate detection process
  • 34. IEEE 802.11 (WiFi) Detection
    • Detect when host is connected via wireless (wifi) link
      • Radio signal changes strength
      • NICs implement power saving features
      • Multiple standards (a/b/g/n)
    • Some data has been collected
  • 35. Faulty Hardware/Link Detection
    • Detect non-congestive loss due to
      • Faulty NIC/switch interface
      • Bad Cat-5 cable
      • Dirty optical connector
    • Preliminary work shows that it is possible to distinguish between congestive and non-congestive loss
  • 36. Full/Half Link Duplex setting
    • Detect half-duplex link in E2E path
      • Identify when throughput is limited by half-duplex operations
    • Preliminary work shows detection possible when link transitions between blocking states
  • 37. Normal congestion detection
    • Shared network infrastructures will cause periodic congestion episodes
      • Detect/report when TCP throughput is limited by cross traffic
      • Detect/report when TCP throughput is limited by own traffic