  1. Finding Network Problems that Influence Applications: Measurement Tools
     Internet2 Performance Workshop
     22 Mar 2005 v0.4
  2. Outline
     Problems, typical causes, diagnostic strategies
     Examples showing usage of the tools we’ll be talking about today
     End-to-End Measurement Infrastructure
  3. We Would Like Your Help
     What problems are you experiencing?
     Have you used a good tool?
     Give us the benefit of your experience: successful problem resolution!
  4. What Are The Problems? (1)
     Packet loss
     Jitter
     Out-of-order packets (extreme jitter)
     Duplicated packets
     Excessive latency
       • Interactive applications
       • TCP’s control system
  5. For TCP
     Eliminating loss is the goal
     Non-congestive losses especially tricky
     TCP: 100 Mbit Ethernet coast-to-coast:
       • Full size packets… need 10^-6 Ploss [Mathis]
       • Less than 1 loss every 83 seconds
     http://www.psc.edu/~mathis/papers/JTechs200105/
     GigE: 10^-8, 1 loss every 497 seconds
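The loss targets on the slide above follow from the Mathis model, rate ≤ (MSS / RTT) × C / sqrt(p). The sketch below inverts that model to get the largest tolerable loss probability. The RTT (70 ms), the MSS (1460 bytes) and the constant C (0.7) are assumptions chosen for illustration; the slide does not state the values it used, so the results should only be expected to land in the same order of magnitude as the slide's figures, and the time between losses in particular depends on those assumed inputs.

```python
def mathis_required_loss(rate_bps, rtt_s, mss_bytes=1460, c=0.7):
    """Largest loss probability at which the Mathis model still predicts rate_bps.

    Model: rate = (MSS / RTT) * C / sqrt(p)  =>  p = (MSS * C / (RTT * rate))^2
    The defaults for mss_bytes and c are assumptions, not values from the slide.
    """
    mss_bits = mss_bytes * 8
    return (mss_bits * c / (rtt_s * rate_bps)) ** 2


def seconds_between_losses(rate_bps, loss_probability, mss_bytes=1460):
    """Average time between lost packets when sending full-size packets at rate_bps."""
    packets_per_second = rate_bps / (mss_bytes * 8)
    return 1.0 / (loss_probability * packets_per_second)


ASSUMED_RTT_S = 0.070  # assumed coast-to-coast round-trip time

p_fast_ethernet = mathis_required_loss(100e6, ASSUMED_RTT_S)
p_gig_ethernet = mathis_required_loss(1e9, ASSUMED_RTT_S)

print(f"100 Mbps: p <= {p_fast_ethernet:.2e}")
print(f"1 Gbps:   p <= {p_gig_ethernet:.2e}")
print(f"100 Mbps: one loss every {seconds_between_losses(100e6, p_fast_ethernet):.0f} s")
```

Because the tolerable loss probability scales with the inverse square of the target rate, a tenfold faster link needs a hundredfold cleaner path, which is why the slide moves from 10^-6 to 10^-8.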
  6. What Are The Problems? (2)
     TCP: lack of buffer space
       • Forces protocol into stop-and-wait
       • Number one TCP-related performance problem.
       • 70ms * 1Gbps = 70*10^6 bits, or 8.4MB
       • 70ms * 100Mbps = 855KB
       • Many stacks default to 64KB, or 7.4Mbps (at a 70ms RTT)
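The buffer figures above are bandwidth-delay products. A minimal sketch of the arithmetic follows, assuming the slide's KB and MB are binary units (1024-based), which reproduces its numbers to within rounding; the function names are illustrative only.

```python
KIB = 1024
MIB = 1024 * 1024


def bdp_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes, rtt_s):
    """Best-case throughput when at most window_bytes can be in flight per RTT."""
    return window_bytes * 8 / rtt_s


RTT_S = 0.070  # the 70 ms round-trip time used on the slide

gige_buffer_mib = bdp_bytes(1e9, RTT_S) / MIB
fast_ethernet_buffer_kib = bdp_bytes(100e6, RTT_S) / KIB
default_window_mbps = window_limited_rate_bps(64 * KIB, RTT_S) / 1e6

print(f"1 Gbps needs about {gige_buffer_mib:.2f} MiB of buffer")
print(f"100 Mbps needs about {fast_ethernet_buffer_kib:.0f} KiB of buffer")
print(f"A 64 KiB window caps throughput near {default_window_mbps:.1f} Mbps")
```

The same two functions answer both questions a tuner usually asks: how large a buffer a path needs, and how fast a given buffer can go.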
  7. What Are The Problems? (3)
     Video/Audio: lack of buffer space
       • Makes broadcast streams very sensitive to previous problems
     Application behaviors
       • Stop-and-wait behavior; can’t stream
       • Lack of robustness to network anomalies
  8. The Usual Suspects
     Host configuration errors (TCP buffers)
     Duplex mismatch (Ethernet)
     Wiring/Fiber problem
     Bad equipment
     Bad routing
     Congestion
       • “Real” traffic
       • Unnecessary traffic (broadcasts, multicast, denial of service attacks)
  9. Strategy
     Most problems are local…
     Test ahead of time!
     Is there connectivity & reasonable latency? (ping -> OWAMP)
     Is routing reasonable? (traceroute)
     Is host reasonable? (NDT; Web100)
     Is path reasonable? (iperf -> BWCTL)
  10. One Technique: Problem Isolation via Divide and Conquer
  11. Outline
     Problems, typical causes, diagnostic strategies
     Examples showing usage of the tools we’ll be talking about today
     End-to-End Measurement Infrastructure
  12. Tool Examples
     When to use NDT
       • NDT in action at SC’04
     When to use BWCTL
       • BWCTL in action with e-VLBI
     When to use OWAMP
       • OWAMP in action with Abilene
     Putting it all together
  13. When to use NDT
     When you want to know about last mile and host problems
     When you want a quick and easy test to provide clues to the possible cause of a problem
     When you want to understand large segments of the path from the host viewpoint
     When a user wants to test their own host
  14. Technique
     Start by testing to the nearest NDT server from each end of the problem path
     This will help you with a majority of problems
     If both tests indicate good performance, test to a distant NDT server
     If tests still indicate good performance, suspect a problem in the application, not the host or network.
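The procedure on the slide above can be read as a small decision function. The sketch below is one such reading, with hypothetical names; the final branch, covering a distant test that performs badly, is not on the slide and is marked as an assumption in the code.

```python
def next_ndt_step(near_ok_a, near_ok_b, distant_ok=None):
    """Return the next action suggested by the NDT procedure.

    near_ok_a, near_ok_b: whether the test to the nearest NDT server looked
    good from each end of the problem path.
    distant_ok: result of the distant-server test, or None if not yet run.
    """
    if not near_ok_a or not near_ok_b:
        bad_ends = [end for end, ok in (("A", near_ok_a), ("B", near_ok_b)) if not ok]
        return "fix host/last-mile problems at end(s): " + ", ".join(bad_ends)
    if distant_ok is None:
        return "test to a distant NDT server"
    if distant_ok:
        return "suspect the application, not the host or network"
    # Assumption (not stated on the slide): a poor distant test points at the
    # longer path, which can then be examined segment by segment.
    return "investigate the longer path segment by segment"


print(next_ndt_step(True, False))
print(next_ndt_step(True, True))
print(next_ndt_step(True, True, distant_ok=True))
```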
  15. SC’04 Real Life Example
     Booth having trouble getting application to run from Amsterdam to Pittsburgh
     Tests between Amsterdam SGI and Pittsburgh PC showed throughput limited to < 20 Mbps
     Assumption is: PC buffers too small
     Question: How do we set the WinXP send/receive buffers?
  16. SC’04 Determine WinXP info
     http://www.dslreports.com/drtcp
  17. SC’04 Confirm PC settings
     DrTCP reported 16 MB buffers, but test program still slow. Q: How to confirm?
     Run test to SCInet NDT server (PC has Fast Ethernet connection)
       • Client-to-Server: 90 Mbps
       • Server-to-Client: 95 Mbps
       • PC Send/Recv Buffer size: 16 Mbytes (wscale 8)
       • NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
       • Reported TCP average RTT: 46.2 msec – approximately 600 Kbytes of data in TCP buffer
       • Min buffer size / RTT: 1.3 Gbps
  18. SC’04 Local PC Configured OK
     No problem found
     Able to run at line rate
     Confirmed that PC’s TCP buffers were set correctly
  19. SC’04 Amsterdam SGI
     Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected).
     Downloaded and built command line tool on SGI IRIX
       • Client-to-Server: 17 Mbps
       • Server-to-Client: 16 Mbps
       • SGI Send/Recv Buffer size: 256 Kbytes (wscale 3)
       • NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
       • Average RTT: 106.7 msec
       • Min Buffer size / RTT: 19 Mbps
  20. SC’04 Amsterdam SGI (tuned)
     Re-run test from remote SGI to SC show floor with –b # option.
       • Client-to-Server: 107 Mbps
       • Server-to-Client: 109 Mbps
       • SGI Send/Recv Buffer size: 2 Mbytes (wscale 5)
       • NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
       • Reported average RTT: 104 msec
       • Min Buffer size / RTT: 153.8 Mbps
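The "Min buffer size / RTT" lines in the three NDT reports above can be re-derived from the buffer sizes and RTTs shown. The sketch assumes that "min" means the smaller of the two endpoints' buffers and that Kbytes and Mbytes are decimal units; under those assumptions the results agree with the reported 19 Mbps, 153.8 Mbps and (roughly) 1.3 Gbps. The function name is illustrative.

```python
KB = 1000
MB = 1000 * 1000


def buffer_limited_mbps(host_buffer_bytes, server_buffer_bytes, rtt_ms):
    """Throughput ceiling set by the smaller TCP buffer and the round-trip time."""
    window_bytes = min(host_buffer_bytes, server_buffer_bytes)
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


sgi_untuned = buffer_limited_mbps(256 * KB, 8 * MB, 106.7)  # slide 19
sgi_tuned = buffer_limited_mbps(2 * MB, 8 * MB, 104)        # slide 20
show_floor_pc = buffer_limited_mbps(16 * MB, 8 * MB, 46.2)  # slide 17

print(f"SGI, 256 KB buffers: {sgi_untuned:.1f} Mbps ceiling")
print(f"SGI, 2 MB buffers:   {sgi_tuned:.1f} Mbps ceiling")
print(f"PC, 16 MB buffers:   {show_floor_pc:.0f} Mbps ceiling")
```

Read this way, the untuned SGI's measured 16 to 17 Mbps sits just under its buffer ceiling, which is the signature of a window-limited transfer, while the PC's ceiling is far above its Fast Ethernet line rate.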
  21. SC’04 Debugging Results
     Team spent over 1 hour looking at Win XP config, trying to verify buffer size
       • 2 tools used gave different results
     Single NDT test verified this in under 30 seconds
     10 minutes to download and install NDT client on SGI
     15 minutes to discuss options and run client test with set buffer option
  22. SC’04 Debugging Results
     8 minutes to find SGI limits and determine maximum allowable buffer setting (2 MB)
     Total time 34 minutes to verify problem was with remote servers’ TCP send/receive buffer size
     Network path verified but application still performed poorly until it was also tuned
  23. When to use BWCTL
     You want to understand segments of the path
     You want to know if each segment can handle flows of a specific size
     You want to know parameters such as bandwidth, packet loss and latency
     To help design or tune an application based on available performance
  24. Technique
     Divide and Conquer!
     Look for segments with performance less than required by the application
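Once each segment of a path has a measured throughput, the divide-and-conquer step reduces to a filter. The sketch below assumes the measurements have already been collected (for example from BWCTL runs) into a dictionary; the segment names and numbers are made up for illustration and are not results from the talk.

```python
def segments_below_requirement(segment_mbps, required_mbps):
    """Return the segments whose measured throughput is below the application's need."""
    return [name for name, mbps in segment_mbps.items() if mbps < required_mbps]


# Hypothetical measurements, keyed by "from>to" segment name.
measured = {
    "campus-A>gigapop-A": 940.0,
    "gigapop-A>backbone": 910.0,
    "backbone>gigapop-B": 35.0,
    "gigapop-B>campus-B": 880.0,
}

suspects = segments_below_requirement(measured, required_mbps=500.0)
print(suspects)
```

Keeping the requirement as a parameter matters: a segment that is fine for a 10 Mbps video stream may still be the bottleneck for a bulk data transfer.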
  25. e-VLBI Case Study
     The e-VLBI project needed to move massive amounts of data between a number of sites around the world
     They found that performance from some sites was only in the 1 Mbps range
     They needed to understand why
  26. e-VLBI test infrastructure
     David Lapsley, one of the research engineers, established BWCTL servers at the sites of the project.
       • Japan: Kashima Observatory
       • Sweden: Onsala Observatory
       • US: Haystack (BOS)
     He performed a full mesh of tests between all of the servers
  27. e-VLBI Results #1
     They used Abilene nodes to divide the problem path
     David found that there was considerable packet loss in the area of Haystack Observatory
     Working with network folk from the area, the problem was isolated and resolved
  28. e-VLBI Results #2
     For one site that was using the commodity Internet, only 1 Mbps was regularly seen
     The application was changed to locate caching to reduce dependence on that site.
  29. e-VLBI Regular Testing
     They found the testing to be very useful in understanding the network status
     They established a regular testing schedule
     They established a web site for reporting the results
     All researchers can check the network status
     http://web.haystack.mit.edu/staff/dlapsley/tsev7.html
  30. When to use OWAMP
     Want baseline “heartbeat” information
     Asymmetric routes can make problem location more difficult
     OWAMP can provide detailed performance on one direction in the path
     When you want to know precise latency information
     Good for helping real-time applications
  31. Why use OWAMP
     It is very sensitive to minor network changes
       • Route changes
       • Packet queuing
     It tells you about one direction of the path
  32. OWAMP Case Study: Queuing on Abilene
     Tuesday, 2004-08-17, 16:05-16:20 UTC
     That’s 11:05 to 11:20 EDT
     Caltech to CERN performing 10GE throughput experiment
       • Single adapter to date, PCI-X
       • Theoretical limit of ~8.5 Gbps
       • Practical limit closer to 7.5 Gbps
       • Exactly what was tested at that time is unknown
     “Worst 10” delay list had some larger than normal variances… to date, software issues
  33. One Link’s History
     The Denver to KSCY Link
  34. What It Shows
     Only paths that traverse DNVR>KSCY showed additional delay
     Some delayed by ~ an extra 35msec
     Probable cause: the router started queuing packets, creating a small delay
     It tells you that there is congestion on the link.
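The reasoning on the slide above (only paths crossing DNVR>KSCY were delayed) can be sketched as set arithmetic over one-way paths: a suspect link is one that every delayed path shares and that no undelayed path uses. The path list below is hypothetical and is not the real Abilene topology or the measured data; only the DNVR>KSCY conclusion comes from the slide.

```python
def suspect_links(paths, delayed):
    """Links common to all delayed one-way paths and absent from undelayed ones.

    paths: dict mapping a path name to its ordered list of node names.
    delayed: set of path names that showed extra one-way delay.
    """
    def links(nodes):
        # Directed hops, since one-way measurements distinguish direction.
        return set(zip(nodes, nodes[1:]))

    delayed_link_sets = [links(paths[name]) for name in delayed]
    if not delayed_link_sets:
        return set()
    shared = set.intersection(*delayed_link_sets)
    clean = set().union(*(links(n) for name, n in paths.items() if name not in delayed))
    return shared - clean


# Hypothetical one-way paths between router nodes.
example_paths = {
    "STTL>CHIN": ["STTL", "DNVR", "KSCY", "IPLS", "CHIN"],
    "SNVA>IPLS": ["SNVA", "DNVR", "KSCY", "IPLS"],
    "STTL>DNVR": ["STTL", "DNVR"],
    "SNVA>DNVR": ["SNVA", "DNVR"],
    "KSCY>CHIN": ["KSCY", "IPLS", "CHIN"],
}

result = suspect_links(example_paths, delayed={"STTL>CHIN", "SNVA>IPLS"})
print(result)
```

This only works because the measurements form a mesh: the undelayed paths are what rule out the other shared hops.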
  35. Example 2 – SCP file transfer
     Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every ½ hour.
     Bob and Carol are 2,000 miles apart. How long should each transfer take?
       • 5 minutes?
       • 1 minute?
       • 5 seconds?
  36. What should we expect?
     Assumptions:
       • 100 Mbps Fast Ethernet is the slowest link
       • 50 msec round trip time
     Bob & Carol calculate:
       • 50 MB * 8 = 400 Mbits
       • 400 Mb / 100 Mb/sec = 4 seconds
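Bob and Carol's estimate is a single division, shown here as a small helper so other file sizes and link speeds can be tried. Like the slide, it uses decimal megabytes and ignores protocol overhead and TCP slow start, so it is a lower bound rather than a prediction; the function name is illustrative.

```python
def ideal_transfer_seconds(size_mbytes, slowest_link_mbps):
    """Best-case transfer time: file size in megabits divided by the bottleneck rate."""
    size_mbits = size_mbytes * 8
    return size_mbits / slowest_link_mbps


expected_seconds = ideal_transfer_seconds(50, 100)
print(f"50 MB over 100 Mbps: {expected_seconds:.0f} seconds at best")
```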
  37. Initial SCP Test Results
  38. Initial Test Results
     This is unacceptable!
     First look for network infrastructure problem
       • Use NDT tester to examine both hosts
  39. Initial NDT testing shows Duplex Mismatch at one end
  40. NDT Found Duplex Mismatch
     Investigating this, it is found that the switch port is configured for 100 Mbps Full-Duplex operation.
       • Network administrator corrects configuration and asks for re-test
  41. Duplex Mismatch Corrected
  42. SCP results after Duplex Mismatch Corrected
  43. Intermediate Results
     Time dropped from 18 minutes to 40 seconds.
     But our calculations said it should take 4 seconds!
       • 400 Mb / 40 sec = 10 Mbps
       • Why are we limited to 10 Mbps?
       • Are you satisfied with 1/10th of the possible performance?
  44. Default TCP window settings
  45. Calculating the Window Size
     Remember Bob found the round-trip time was 50 msec
     Calculate window size limit
       • 85.3KB * 8 b/B = 698777 b
       • 698777 b / .050 s = 13.98 Mbps
     Calculate new window size
       • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB
       • Use 1MB as a minimum
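The two window calculations above can be checked with a few lines. The sketch assumes the slide's KB is 1024 bytes, which is the reading that reproduces 698777 bits and 610.3 KB; the function names are illustrative.

```python
KIB = 1024


def window_limit_mbps(window_kib, rtt_s):
    """Throughput ceiling for a given TCP window and round-trip time."""
    return window_kib * KIB * 8 / rtt_s / 1e6


def needed_window_kib(link_mbps, rtt_s):
    """Window needed to fill a link of link_mbps at the given round-trip time."""
    return link_mbps * 1e6 * rtt_s / 8 / KIB


RTT_S = 0.050  # Bob's measured round-trip time

default_limit = window_limit_mbps(85.3, RTT_S)
required_window = needed_window_kib(100, RTT_S)

print(f"Default 85.3 KB window limits TCP to {default_limit:.2f} Mbps")
print(f"Filling 100 Mbps needs at least {required_window:.1f} KB of window")
```

The default window's ceiling is close to the 10 Mbps Bob observed, and the required window is under the 1MB minimum the slide recommends, which leaves headroom.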
  46. Resetting Window Value
  47. With TCP windows tuned
  48. Steps so far
     Found and fixed Duplex Mismatch
       • Network Infrastructure problem
     Found and fixed TCP window values
       • Host configuration problem
     Are we done yet?
  49. SCP results with tuned windows
  50. Intermediate Results
     SCP still runs slower than expected
       • Hint: SCP uses internal buffers
       • Patch available from PSC
  51. SCP Results with tuned SCP
  52. Final Results
     Fixed infrastructure problem
     Fixed host configuration problem
     Fixed application configuration problem
       • Achieved target time of 4 seconds to transfer 50 MB file over 2000 miles
  53. Why is it hard to Find/Fix Problems?
     Network infrastructure is complex
     Network infrastructure is shared
     Network infrastructure consists of multiple components
  54. Outline
     Problems, typical causes, diagnostic strategies
     Examples showing usage of the tools we’ll be talking about today
     End-to-End Measurement Infrastructure
  55. End-to-End Measurement Infrastructure Vision
     Ongoing monitoring to test major elements, and end-to-end paths.
       • Elements: gigaPoP links, peering, …
       • Utilization
       • Delay
       • Loss
       • Occasional throughput
       • Multicast connectivity
  56. End-to-End Measurement Infrastructure Vision II
     Many more end to end paths than can be monitored.
     Diagnostic tools available on-demand (with authorization)
       • Show routes
       • Perform flow tests (perhaps app tests)
       • Parse/debug flows (a-la tcpdump or OCxMON with heuristic tools)
  57. What Campuses Can Do
     Export SNMP data
       • I have an “Internet2 list”, can add you
       • Monitor loss as well as throughput
     Performance test point at campus edge
       • Hopefully, the result of today’s workshop
       • Possibly also traceroute “looking glass”
       • Commercial (e.g., NetIQ) complements
       • We have a master list
  58. Strategy (references) (1)
     See also
       • http://e2epi.internet2.edu/
         Look at stories, documents, tools
       • http://e2epi.internet2.edu/ndt/
         Pointer to the tool, and using it for debugging the last mile
  59. Strategy (references) (2)
       • http://www.psc.edu/networking/projects/tcptune/
         How to tweak OS parameters (also scp pointer)
       • http://www.ncne.org/research/tcp/
         TCP debugging the detailed way
       • http://dast.nlanr.net/Guides/WritingApps/
         Tips for app writers
       • http://dast.nlanr.net/Guides/GettingStarted
         And some checking to do by hand & debugging.
  60. www.internet2.edu
  61. Acknowledgements
     The original presentation by Matt Zekauskas using ideas inspired by material from NLANR DAST, Matt Mathis, and others.
     Copyright Internet2 2005, All Rights Reserved.
  62. Background: Detailed Tools Discussion
     22 Mar 2005 v0.4
  63. Background: Tools Outline
     Tools: First mile, host issues
     Tools: Path issues
     Tools: Others to be aware of
     Tools within Abilene
  64. Internet2 Detective
     A simple “is there any hope” tool
       • Windows “tray” application
       • Red/green lights, am I on Internet2
       • Multicast available
       • IPv6 available
     http://detective.internet2.edu/
  65. NLANR Performance Advisor
     Geared for the naive user
     Run at both ends, and see if a standard problem is detected.
     Can also work with intermediate servers
     http://dast.nlanr.net/Projects/Advisor
  66. NDT
     Network Diagnostic Tool
     Java applet
     Connects to server in middle, runs tests, and evaluates heuristics looking for host and first mile problems.
     Has detailed output.
     You’ll see lots of detail later today.
     A commercial tool that tests for TCP buffer problems: http://www.dslreports.com/tweaks/
  67. Host/OS Tuning: Web100
     Goal: TCP stack, tuning not bottleneck
     Large measurement component
       • TCP performance not what you expect? Ask TCP why!
         – Receiver bottleneck (out of receiver window)
         – Sender bottleneck (no data to send)
         – Path bottleneck (out of congestion window)
         – Path anomalies (duplicate, out of order, loss)
     www.web100.org
  68. Reference Servers (Beacons)
     H.323 conferencing
       • Goal: portable machines that tell you if system likely to work (and if not, why?)
       • Moderate-rate UDP of interest
       • E.g., H.323 Beacon http://www.osc.edu/oarnet/itecohio.net/beacon/
       • ViDeNet Scout, http://scout.video.unc.edu/
  69. Background: Tools Outline
     Tools: First mile, host issues
     Tools: Path issues
     Tools: Others to be aware of
     Tools within Abilene
  70. OWAMP – Latency/Loss
     One-Way Active Measurement Protocol
     Requires NTP-synchronized clocks
     Look for one-way latency, loss
     Authentication and scheduling
     Again, lots more later today
  71. BWCTL – Throughput
     A tool for throughput testing that includes scheduling and authentication.
     Currently uses iperf for actual tests.
     Can assign users (or IP addresses) to classes, give classes different throughput limits or time limits.
     Periodic and on-demand testing.
     Lots more later today.
  72. Background: Tools Outline
     Tools: First mile, host issues
     Tools: Path issues
     Tools: Others to be aware of
     Tools within Abilene
  73. Some Commercial Tools
     Caveat: only a partial list, give me more!
     Spirent (nee Netcom/Adtech):
       • SmartBits: test at low & high rates, QoS; test components or end-to-end path
     NetIQ: Chariot/Pegasus
     Agilent (like SmartBits, and FireHunter)
     Ixia (like SmartBits/Spirent)
     Brix Networks (like AMP/OWAMP, for ‘QoS’)
     Apparent Networks: path debugger
  74. Some Noncommercial Tools
     Iperf: dast.nlanr.net/Projects/iperf
       • See also http://www-itg.lbl.gov/nettest/
       • http://www-didc.lbl.gov/NCS/
     Flowscan:
       • http://www.caida.org/tools/utilities/flowscan/
       • http://net.doit.wisc.edu/~plonka/FlowScan/
     SLAC’s traceroute perl script:
       • http://www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
     One large list:
       • http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
  75. Background: Tools Outline
     Tools: First mile, host issues
     Tools: Path issues
     Tools: Others to be aware of
     Tools within Abilene
  76. Abilene: Measurements from the Center
     Active (latency, throughput)
       • Measurement within Abilene
       • Measurements to the edge
     Passive
       • SNMP stats (esp. core Abilene links)
       • Variables via router proxy
       • Router configuration
       • Route state
       • Characterization of traffic
         – Netflow; OCxMON
  77. Goal
     Abilene goal to be an exemplar
       • Measurements open
       • Tests possible to router nodes
       • Throughput tests routinely through backbone
       • …as well as existing utilization, etc.
       • The “Abilene Observatory”
     http://abilene.internet2.edu/observatory
  78. Abilene: Machines
     GigE connected high-performance tester
       • bwctl, “nms1”, 9000 byte MTU
     Latency tester
       • owamp, “nms4”, 100bT
     Stats collection
       • SNMP, flow-stats, “nms3”, 100bT
     Ad-hoc tests
       • NDT server, “nms2”, gigE, 1500 byte MTU
  79. Throughput
     Take tests 1/hr, 20 seconds each
       • IPv4 TCP
       • IPv6 TCP (no discernible difference)
       • IPv4 UDP (on our platforms flaky at 1G)
       • IPv6 UDP (ditto)
     Others test to our nodes
     Others test amongst themselves
     Net result: 25% of traffic (NOT capacity) is measurement
  80. Latency
     CDMA used to synchronize NTP
       • www.endruntechnologies.com
     Test among all router node pairs
     10/sec
     IPv4 and IPv6
     Minimal sized packets
     Poisson schedule
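A Poisson schedule means the gaps between probe packets are drawn from an exponential distribution, so probes average 10 per second without falling into lockstep with periodic network behaviour. The sketch below generates such send times with Python's standard library; it illustrates the scheduling idea only and is not taken from the OWAMP implementation.

```python
import random


def poisson_send_times(rate_per_s, duration_s, seed=None):
    """Send times (seconds from start) for a Poisson stream of probe packets.

    Gaps between packets are exponentially distributed with mean 1 / rate_per_s.
    """
    rng = random.Random(seed)
    send_times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return send_times
        send_times.append(t)


schedule = poisson_send_times(rate_per_s=10, duration_s=60, seed=1)
print(f"{len(schedule)} probes scheduled in 60 s")
print("first gaps (s):", [round(b - a, 3) for a, b in zip(schedule, schedule[1:4])])
```

The number of probes in any one minute varies around 600; only the long-run average is fixed at the configured rate.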
  81. Passive - Utilization
     The Abilene NOC takes
       • Packets in,out
       • Bytes in,out
       • Drops/Errors
       • ..for all interfaces, publishes internal links & peering points (at 5 min intervals)
       • ..via SNMP polling – every 60 sec
     http://loadrunner.uits.iu.edu/weathermaps/abilene/abilene.html
  82. [Slide with no recoverable text]
  83. Abilene Pointers
     http://www.abilene.iu.edu/
       • Monitoring
       • Tools
     http://www.itec.oar.net/abilene-netflow
     http://netflow.internet2.edu/weekly/ (summaries)
