Your SlideShare is downloading. ×
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide
  • I’m Matt Zekauskas and you’re not. Correct, right off the bat.
  • Before we really start, I want to say that we don’t have all the answers. I’m going to tell you about common problems, and tools and techniques we’ve found useful. You’ll learn more about specific tools over the course of the day. However, if you have a common problem, or a particularly difficult problem, we’d like to hear about it. In fact, we collect “war stories” for publication on our web site. In addition, if you have a tool or technique that we don’t talk about today, please do speak up during the day, or send us details after the workshop.
  • I’ll start off with a list of network problems that we find affect performance of most applications. Packet loss slows TCP (bulk data transfer), and causes dropouts with voice and “jaggies” with video. Jitter, or the change in the rate that packets arrive, can also cause TCP to slow down, or at least react to problems more slowly; excessive jitter in a real-time application can cause some packets to be treated as if they were lost, causing dropouts and video problems. If you have an interactive application, say remote control of a scanning-tunneling microscope, jitter makes it hard for humans to react. We can learn how to deal with latency, but we can’t adjust for arbitrary changes in latency. An Ohio state study of h.323 codecs found that jitter caused more problems than loss (up to some point). Out-of-order packets can be viewed as extreme jitter; beside the problems already listed, many applications are not written well (for example, early MPEG-2 and HDTV codecs), and out of order packets can cause more problems than lost packets. Applications should be written to be tolerant of out-of-order packets (since parallelism in the network can generate them naturally, and they actually occur quite frequently), but for now, reducing the number of out of order packets will improve application efficiency. Duplicate packets waste bandwidth, and in extreme cases can cause TCP to slow down or confuse real-time applications. Excessive latency makes it difficult for interactive applications (video conferencing, remote instrument control), although humans can compensate to some extent. TCP’s control system also has more trouble as latency increases, since it reacts more slowly. In general, it is best to engineer paths so that latency is minimized.
  • Let’s look at how “vanilla” Reno TCP (still the most commonly deployed TCP stack) reacts to losses to see how important limiting loss along a path is. If a path is congested, it is obvious (at least once the problem link is found) because link utilization is high. However losses can be caused for other reasons (which we will get to in a moment), and these “non-congestive” losses are especially hard to track down. Let’s say our goal is modest (for modern workstations): send 100 megabits from coast-to-coast. With full size Ethernet packets (1500 bytes for 100Mbps interfaces) you need a probability of packet loss on the order of one in a million. That’s one loss every 83 seconds. How about if you have gigabit Ethernet? Then the loss probability must be less than one in ten to the negative eighth power, or one loss every 497 seconds. The situation gets better if you can use so-called “jumbo-frames”, or 9000 byte packets, it goes back to close to the 100 megabit case. That’s one reason to try and make high-performance paths “9000-byte clean”. The situation also gets better with some of the newer TCP algorithms (high-speed TCP, BIC TCP, FAST TCP) which is why there’s a lot of research into new bulk-transfer control algorithms. But we would still like a way to find and remove non-congestive losses as much as possible. RC. I’ve re-arranged this slide to make another point. While eliminating loss is a worthy goal, it may not be possible. Looking at the optical links themselves, you will find error rates in the 10^-10 to -12 range. Much better that we supposedly need. Why do IP network engineers have this worry and why are the loss rates higher? The answer is that for TCP, most losses aren’t due to link BER’s. They come from 2 areas. TCP tries to use as much of the link as it can. TCP also tries to fair share the link with other users. So it is possible for TCP to overdrive the link by sending more data into the network than a specific link can handle. This can a router buffer to overflow and loss occurs. In the fair share case, it is possible for the aggregate sum of several hosts to overflow a router’s buffer which also leads to congestive loss. The individual links on the Internet are build using different technologies from wireless to optics. Each technology has different BER loss probabilities so the TCP loss rate is the product of the probabilities of each link in the End-to-End path. The Mathis formula describes how the path loss rate effects TCP performance. This is not necessarily what a link specific engineer thinks about.
  • OK, we talked about network problems. However, you also should know that the number one reason for TCP not running at “full-speed” is for it to be starved for buffer space. Vendors ship TCP stacks with buffers that are tuned for the commercial Internet. If the buffer is too small, TCP, which uses a “sliding window” for flow control, must wait for packets to be acknowledged in order to advance the window and send more data. Essentially the sender is forced to stop and wait. You need to be able to buffer the number of bits you can send in one round-trip time at your desired speed. For example, with a 70 millisecond round-trip time (more-or-less trans-continental North America), to sustain one gigabit per second you need 8.4 megabytes of buffer space. For 100Mbps at the same distance you need 855 kilobytes. Many stacks default to 64 kilobytes, which only allows 7.4 Mbps. One word of caution: network kilobits, megabits, gigabits are powers of 10. Memory kilobytes and megabytes are in powers of two, a kilobyte being 1024 bytes (2^10) and a megabyte being 1,048,576 bytes (2^20).
  • I also want to mention that the same problem carries up to the applications themselves. We won’t be speaking more about this today, but for video and audio (streaming media) the lack of buffer space in the application (in our world, MPEG-2 based applications are especially bad) means the application is very sensitive to packet loss or reordering. Of course, if your application is interactive, then increased buffering can lead to lag in response, which is not desirable, either. This generalizes to bad network application behavior, so that they are not robust to network changes or anomalies. Drops will occur. Reordering will occur. Even if only very occasionally. Even applications that would like to use TCP to do bulk transfer can do things like not hand enough data to TCP to allow it to stream over long distances. One that was brought to light recently is scp; popular versions of scp do not provide large enough buffers for TCP to stream. (There is a pointer to a good version of scp off a tcp tuning page at PSC mentioned later.)
  • So what causes these problems? Here’s a laundry-list of the “usual suspects”. First on the list, and most common is a bad host configuration. As we just mentioned, this is usually because operating systems ship tuned to the commercial internet, and we have very different paths over the Internet2 infrastructure (in particular the “bandwidth delay product” is much greater). Second is duplex mismatch, usually due to autoconfiguration failure, with one side believing it is full-duplex (can send and receive simultaneously), and the other side believing it is half-duplex (can only send or receive one at a time). This is a legacy of how the Ethernet standard has evolved. This is the major cause of “non-congestive” packet losses. Wiring or fiber problems can cause non-congestive packet losses. Bad equipment (anything from host interfaces that cannot run full-speed, to host, switch, router, or fiber equipment failure) can cause excessive delays, jitter, or non-congestive packet loss. Bad routing can cause exessive latency, or sometimes jitter due to multiple different length paths being used. Congestion causes varying delays and packet loss.
  • Here’s an example that is at least three years old now, but still illustrates the difficulty in debugging problems. Folks at JPL were trying to do a demo to the Goddard Space Flight Center near Washington DC. Here you have real rocket scientists working on the problem. And they were smart… they made sure they were using Abilene. They had their hosts tuned well. Things worked fine locally… therefore the problem HAD to be in the backbone network, right? (Here the target was a 50mbps TCP flow.) We had test equipment inside Abilene. We were able to demonstrate 80Mbps easily from Los Angeles to Washington, DC. We offered them the use of our equipment, but this caused them to go ask friends at NCAR in Colorado… and they found the problem was on the path from LA to Boulder. This set the local network folks off to check, and in the end the problem was a bad fiber connection in California. Having intermediate test points to segment the problem is crucial. Having periodic tests running on common paths (we have such tests in the backbone) allows you to refute quickly “problem is in the middle of the network” finger pointing.
  • OK. This slide illustrates one way to attack a performance problem. First and foremost, if you are planning a demo, or other event, do test ahead of time. If you have a concerned application community, this may mean periodic testing among points close to key equipment. For example, all of the VLBI sites may test among each other. It may also mean periodic testing within your network to points in Abilene, or other campuses you talk to frequently. Now say you have a problem that the periodic testing did not pick up (there are just too many paths to test them all). The first question – do you have connectivity and reasonable latency? Ping will give you round-trip times, assuming it isn’t blocked along the way. We’ll describe a tool, owamp, that measures one-way delay, which allows you to disambiguate problems that might occur asymmetrically – asymmetric routing, asymmetric traffic queuing, a dirty fiber can cause asymmetric problems (since each fiber transmits light in one direction). Are you seeing many losses with these low-rate tests? If so, there’s something terribly wrong. If the latency is not what you expect, there may be a routing problem. The best-known tool is traceroute, and you can use that to make sure the path looks reasonable. It goes through your campus, possibly through a gigapop, across Abilene and down to the other side in a reasonable fashion (not taking a scenic tour of the US, for example). Remember that you have to test in the opposite direction; the Abilene router proxy and traceroute servers can help. Has the host been tuned? Is there potentially a duplex-mismatch one of the local Ethernet connections? Here, running NDT, also to be described today, can point out a series of common problems. NDT itself relies on web100, which instruments the Linux kernel. You might consider installing a web100 machine (or using machines with web100 code); there are additional diagnostics you can run using the web100-provided variables, and the kernel itself is better “out of the box”: it can automatically tune buffers on some TCP connections. If routing looks reasonable, and the host is reasonable, you may have a problem in the path. (Large losses in the low rate tests also indicate path problems, assuming it isn’t a duplex mismatch problem, local congestion (perhaps a denial of service attack) or even broken network hardware on the end system.) Iperf is a tool to run synthetic TCP streams (memory-to-memory) between two machines. Bwctl is a tool that we will talk about today that adds authentication and scheduling to iperf, and allows you to test to multiple points, including midpoints within Abilene.
  • And, for path problems, the best strategy is usually to “divide and conquer”; test to a midpoint, and see which side the problem is on, and then test to a midpoint on one side, until you’ve exhausted your midpoints and have localized the problem as much as you can. We’re working on tools to automate this process, but for now it’s manual. This picture shows testing to a bunch of points that you have access to.
  • Probably want to move these to the end… interrupts the flow. References for debugging strategies and application design. Flip through these quickly, they are here so participants can look later.
  • Add slide probably: 1.4ghz PIII, dual bank. Whatever the chipset. Fairly slow, but fastest we could get with a 48VDC supply off the shelf when we were building Abilene.
  • Add separate slide for application communities?
  • Just because my name ‘ll probably come off the front if it becomes standard course material
  • Transcript

    • 1. Finding Network Problems that Influence Applications: Measurement Tools Matt Zekauskas, Rich Carlson, Spring Member Meeting May 2, 2005
    • 2. Outline
      • Problems, typical causes, diagnostic strategies
      • Tools: First mile, host issues
      • Tools: Path issues
      • Tools: Others to be aware of
      • Tools within Abilene
      • Wrap-up
    • 3. We Would Like Your Help
      • What problems are you experiencing?
      • Have you used a good tool?
      • Give us the benefit of your experience: successful problem resolution?
    • 4. What Are The Problems?
      • Packet loss
      • Jitter
      • Out-of-order packets (extreme jitter)
      • Duplicated packets
      • Excessive latency
        • Interactive applications
        • TCP’s control system
    • 5. Link vs Path loss rates
      • Eliminating loss has been the goal
        • TCP: 100 Mbit Ethernet coast-to-coast:
          • Full size packets… need 10 -6 P loss [Mathis]
          • Less than 1 loss every 83 seconds
        • GigE: 10 -8 , 1 loss every 497 seconds
      • Not all loss is avoidable
        • Fair sharing can lead to congestive loss
        • Non-congestive losses especially tricky
    • 6. What Are The Problems?
      • TCP: lack of buffer space
        • Forces protocol into stop-and-wait
        • Number one TCP-related performance problem.
        • 70ms * 1Gbps = 70*10^6 bits, or 8.4MB
        • 70ms * 100Mbps = 855KB
        • Many stacks default to 64KB, or 7.4Mbps
    • 7. What Are The Problems?
      • Video/Audio: lack of buffer space
        • Makes broadcast streams very sensitive to previous problems
      • Application behaviors
        • Stop-and-wait behavior; Can’t stream
        • Lack of robustness to network anomalies
    • 8. The Usual Suspects
      • Host performance problems (TCP buffers)
      • Duplex mismatch (Ethernet)
      • Wiring/Fiber problem
      • Bad equipment
      • Bad routing
      • Congestion
        • “ Real” traffic
        • Unnecessary traffic (broadcasts, multicast, denial of service attacks)
    • 9. JPL/Caltech – GSFC
      • The situation
        • Using Abilene
        • Tuned hosts
        • Things work locally
      • Therefore it MUST be Abilene
        • Tests show good flows router-router
        • Intermediate tests point towards CA
      • Bad fiber connection!
    • 10. Strategy
      • Most problems are local…
      • Test ahead of time!
      • Is there connectivity & reasonable latency? (ping -> OWAMP)
      • Is routing reasonable (traceroute)
      • Is host reasonable (NDT; web100)
      • Is path reasonable (iperf -> BWCTL)
    • 11. One Technique: Problem Isolation via Divide and Conquer
    • 12. Strategy (references)
      • See also
        • Look at stories, documents, tools
        • Pointer to the tool, and using it for debugging the last mile
    • 13. Strategy (references)
        • How to tweak OS parameters (also scp pointer)
        • TCP debugging the detailed way
        • Tips for app writers
        • And some checking to do by hand & debugging.
    • 14. Outline
      • Problems, typical causes, diagnostic strategies
      • Tools: First mile, host issues
      • Tools: Path issues
      • Tools: Others to be aware of
      • Tools within Abilene
      • Wrap-up
    • 15. Internet2 Detective
      • A simple “is there any hope” tool
        • Windows “tray” application
        • Red/green lights, am I on Internet2
        • Multicast available
        • IPv6 available
    • 16. NLANR Performance Advisor
      • Geared for the naive user
      • Run at both ends, and see if a standard problem is detected.
      • Can also work with intermedate servers
    • 17. NDT
      • Network Debugging Tool
      • Java applet
      • Connects to server in middle, runs tests, and evaluates hueristics looking for host and first mile problems.
      • Has detailed output.
      • You’ll see lots of detail later today.
      • A commercial tool that tests for TCP buffer problems:
    • 18. Host/OS Tuning: Web100
      • Goal: TCP stack, tuning not bottleneck
      • Large measurement component
        • TCP performance not what you expect? Ask TCP why!
          • Receiver bottleneck (out of receiver window)
          • Sender bottleneck (no data to send)
          • Path bottleneck (out of congestion window)
          • Path anomalies (duplicate, out of order, loss)
    • 19. Reference Servers (Beacons)
      • H.323 conferencing
        • Goal: portable machines that tell you if system likely to work (and if not, why?)
        • Moderate-rate UDP of interest
        • E.g., H.323 Beacon
        • ViDeNet Scout,
    • 20. Outline
      • Problems, typical causes, diagnostic strategies
      • Tools: First mile, host issues
      • Tools: Path issues
      • Tools: Others to be aware of
      • Tools within Abilene
      • Wrap-up
    • 21. OWAMP
      • One-Way Active Measurement Protocol
      • Requires NTP-Synchronized clocks
      • Look for one-way latency, loss
      • Authentication and Scheduling
      • Again, lots more later today
    • 22. BWCTL
      • A tool for throughput testing that includes scheduling and authentication.
      • Currently uses iperf for actual tests.
      • Can assign users (or IP addresses) to classes, give classes different throughput limits or time limits.
      • Periodic and on-demand testing.
      • Lots more later today.
    • 23. A commercial tool
      • Apparent Networks has a tool that can diagnose path properties from a single vantage point. Finds things like duplex mismatches, MTU black holes, rate limiting links. Depends on ICMP (or a reflector).
    • 24. Outline
      • Problems, typical causes, diagnostic strategies
      • Tools: First mile, host issues
      • Tools: Path issues
      • Tools: Others to be aware of
      • Tools within Abilene
      • Wrap-up
    • 25. Some Commercial Tools
      • Caveat: only a partial list, give me more!
      • Spirent (nee Netcom/Adtech):
        • working on a box for ‘end-to-end’ measurements
        • SmartBits: test at low & high rates, QoS; test components or end-to-end path
      • NetIQ: Chariot/Pegasus
      • Agilent (like SmartBits, and FireHunter)
      • Ixia (like SmartBits/Spirent)
      • Brix Networks (like AMP/Owamp, for ‘QoS’)
      • Apparent Networks: path debugger
    • 26. Some Noncommercial Tools
      • Iperf:
        • See also
      • Flowscan:
      • SLAC’s traceroute perl script:
      • One large list:
    • 27. Outline
      • Problems, typical causes, diagnostic strategies
      • Tools: First mile, host issues
      • Tools: Path issues
      • Tools: Others to be aware of
      • Tools within Abilene
      • Wrap-up
    • 28. Abilene: Measurements from the Center
      • Active (latency, throughput)
        • Measurement within Abilene
        • Measurements to the edge
      • Passive
        • SNMP stats (esp. core Abilene links)
        • Variables via router proxy
        • Router configuration
        • Route state
        • Characterization of traffic
          • Netflow; OCxMON
    • 29. Goal
      • Abilene goal to be an exemplar
        • Measurements open
        • Tests possible to router nodes
        • Throughput tests routinely through backbone
        • …as well as existing utilization, etc.
    • 30. Abilene: Machines
      • GigE connected high-performance tester
        • bwctl, “nms1”, 9000 byte MTU
      • Latency tester
        • owamp, “nms4”, 100bT
      • Stats collection
        • SNMP, flow-stats, “nms3”, 100bT
      • Ad-hoc tests
        • NDT server, “nms2”, gigE, 1500 byte MTU
    • 31. Throughput
      • Take tests 1/hr, 20 seconds each
        • IPv4 TCP
        • IPv6 TCP (no discernable difference)
        • IPv4 UDP (on our platforms flakey at 1G)
        • IPv6 UDP (ditto)
      • Others test to our nodes
      • Others test amongst themselves
      • Net result: 25% of traffic (NOT capacity) is measurement
    • 32. Latency
      • CDMA used to synchronize NTP
      • Test among all router node pairs
      • 10/sec
      • IPv4 and IPv6
      • Minimal sized packets
      • Poisson schedule
    • 33. Passive - Utilization
      • The Abilene NOC takes
        • Packets in,out
        • Bytes in,out
        • Drops/Errors
        • ..for all interfaces, publishes internal links & peering points (at 5 min intervals)
        • ..via SNMP polling – every 60 sec
    • 34.  
    • 35. Abilene Pointers
        • Monitoring
        • Tools
      • (summaries)
    • 36. Outline
      • Problems, typical causes, diagnostic strategies
      • Tools: First mile, host issues
      • Tools: Path issues
      • Tools: Others to be aware of
      • Tools within Abilene
      • Wrap-up
    • 37. End-to-End Measurement Infrastructure vision
      • Ongoing monitoring to test major elements, and (some, important) end-to-end paths.
        • Elements: gigaPoP links, peering, …
        • Utilization
        • Delay
        • Loss
        • Occasional throughput
        • Multicast connectivity
    • 38. End-to-End Measurement Infrastructure Vision II
      • There are many more paths end to end than can be monitored.
      • Diagnostic tools available on-demand (with authorization)
        • Show routes
        • Perform flow tests (perhaps app tests)
        • Parse/debug flows (a-la tcpdump or OCXmon with heuristic tools)
    • 39. What Campuses Can Do
      • Export SNMP data
        • I have an “Internet2 list”, can add you
        • Monitor loss as well as throughput
      • Performance test point at campus edge
        • Hopefully, the result of today’s workshop
        • Possibly also traceroute “looking glass”
        • Commercial (e.g., NetIQ) complements
        • We have a master list
    • 40. Acknowledgements
      • The original presentation by Matt   Zekauskas using ideas inspired by material from NLANR DAST, Matt   Mathis, and others.
      • Copyright Internet2 2005, All Rights Reserved.
      • Your mileage may vary. Caveat Emptor. It’s a desert topping and a floor wax. They all do that. It’s a feature. It’s wafer-thin. Sleep is for the weak. Coffee won’t hurt you, look what it’s done for meeee…
    • 41.