Published in: Technology, Business
  • Trafficclusters

    1. 1. Automatically Inferring Patterns of Resource Consumption in Network Traffic. Cristian Estan, Stefan Savage, George Varghese. University of California, San Diego
    2. 2. Who is using my link?
    3. 3. Looking at the traffic: too much data for a human. Do something smarter!
    4. 4. Looking at traffic aggregates
        • Aggregating on individual packet header fields gives useful results but
            • Traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.)
            • Cannot show aggregates defined over multiple fields (e.g. which network uses which application)
        • The traffic analysis tool should automatically find aggregates over the right fields at the right granularity
        Callouts: Most traffic goes to the dorms … What apps are used? Where does the traffic come from? … Which network uses web and which one kazaa?
        Table (Destination IP, Traffic): rank 1: 11.9%; rank 2: 3.12%; rank 3: 2.83%
        Table (Destination network, Traffic): rank 1: 27.5%; rank 2: 18.1%; rank 3: 17.8%
        Table (Source port, Traffic): rank 1: Web, 42.1%; rank 2: Kazaa, 6.7%; rank 3: Ssh, 6.3%
        Fields in the diagram: Dest. IP, Dest. net, Source port; Src. IP, Src. port, Src. net, Dest. port, Dest. IP, Dest. net, Protocol
    5. 5. Ideal traffic report
        Table (Traffic aggregate, Traffic, annotation):
            • Web traffic: 42.1% (Web is the dominant application)
            • Web traffic to: 26.7% (The library is a heavy user of web)
            • Web traffic from: 13.4% (That’s a big flash crowd!)
            • ICMP traffic from to: 11.9% (This is a Denial of Service attack!!)
        This paper is about giving the network administrator insightful traffic reports
    6. 6. Contributions of this paper: Approach, Definitions, Algorithms, System, Experience
    7. 7. Approach
        • Characterize the traffic mix by describing all important traffic aggregates
            • Multidimensional aggregates (e.g. flash crowd described by protocol, port number and IP address)
            • Aggregates at the right level of granularity (e.g. computer, subnet, ISP)
            • Traffic analysis is automated: it finds insightful data without human guidance
    8. 8. Definition: traffic clusters
        • Traffic clusters are the multidimensional traffic aggregates identified by our reports
        • A cluster is defined by a range for each field
        • The ranges are from natural hierarchies (e.g. IP prefix hierarchy) – meaningful aggregates
        • Example
            • Traffic aggregate: incoming web traffic for CS Dept.
            • Traffic cluster: (SrcIP=*, DestIP in, Proto=TCP, SrcPort=80, DestPort in [1024,65535])
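The cluster definition above (one range per header field) can be sketched in a few lines of Python. This is only an illustration, not the AutoFocus implementation: the `Cluster` class, the packet dictionary layout, and the destination prefix `192.0.2.0/24` (a documentation range standing in for the prefix that is missing from the transcript) are all assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    """One range per field; names and layout are hypothetical."""
    src_net: ipaddress.IPv4Network
    dst_net: ipaddress.IPv4Network
    proto: Optional[str]   # None plays the role of "*"
    src_ports: range
    dst_ports: range

    def matches(self, pkt: dict) -> bool:
        # A packet belongs to the cluster if every field falls in its range.
        return (
            ipaddress.ip_address(pkt["src_ip"]) in self.src_net
            and ipaddress.ip_address(pkt["dst_ip"]) in self.dst_net
            and (self.proto is None or pkt["proto"] == self.proto)
            and pkt["src_port"] in self.src_ports
            and pkt["dst_port"] in self.dst_ports
        )


ANY_NET = ipaddress.ip_network("0.0.0.0/0")  # SrcIP=*

# "Incoming web traffic" for a hypothetical department prefix.
incoming_web = Cluster(
    src_net=ANY_NET,
    dst_net=ipaddress.ip_network("192.0.2.0/24"),  # assumed prefix
    proto="TCP",
    src_ports=range(80, 81),        # SrcPort=80
    dst_ports=range(1024, 65536),   # DestPort in [1024,65535]
)

pkt = {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.10",
       "proto": "TCP", "src_port": 80, "dst_port": 40000}
print(incoming_web.matches(pkt))
```

Using prefixes and port ranges rather than arbitrary sets mirrors the slide's point that ranges come from natural hierarchies.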
    9. 9. Definition: traffic report
        • Traffic reports give the volume of chosen traffic clusters
        • To keep the report size manageable, describe only clusters above a threshold (e.g. H = total traffic/20)
        • To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping, more specific clusters in the report
        • To highlight non-obvious aggregates, prioritize by using an unexpectedness label
            • Example
                • 50% of all traffic is web
                • Prefix B receives 20% of all traffic
                • The web traffic received by prefix B is 15% instead of 50%*20%=10%, so the unexpectedness label is 15%/10%=150%
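The threshold and the unexpectedness label from the slide reduce to simple arithmetic. The sketch below reproduces the slide's example; the function names `threshold` and `unexpectedness` are made up for illustration, and the label is computed as the actual share divided by the product of the per-field shares, which is how the slide's 15%/(50%*20%) example reads.

```python
def threshold(total_traffic, divisor=20):
    """H = total traffic / 20, as in the slide's example."""
    return total_traffic / divisor


def unexpectedness(actual_share, *field_shares):
    """Actual share divided by the share expected if fields were independent."""
    expected = 1.0
    for share in field_shares:
        expected *= share
    return actual_share / expected


# Slide example: web is 50% of traffic, prefix B receives 20%,
# and web traffic to prefix B is 15% of all traffic.
label = unexpectedness(0.15, 0.50, 0.20)
print(f"{label:.0%}")  # prints "150%"
```

A label above 100% means the combination carries more traffic than its individual fields would suggest; a label below 100% means less.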
    10. 10. Contributions of this paper: Approach, Definitions, Algorithms, System, Experience
    11. 11. Algorithms and theory
        • Algorithms and theoretical bounds in the paper
            • Unidimensional reports are easy to compute
            • Multidimensional reports are exponentially harder as we add more fields
        • Next few slides
            • Example of unidimensional compression
            • Example for the structure of the multidimensional cluster space
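    12. 12. Unidimensional report example (figure: hierarchy of traffic volumes, Threshold=100)
        • Leaf traffic: 15, 35, 30, 40, 160, 110, 35, 75
        • Aggregate nodes: 50, 70, 270, 35, 75, 120, 305, 75, 380, 500
        • Nodes highlighted by thresholding: 500, 120, 380, 305, 270, 160, 110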
    13. 13. Unidimensional report example: Compression
        • 305-270 < 100, so the node with traffic 305 is omitted
        • 380-270 ≥ 100, so the node with traffic 380 is kept
        • Resulting report (Source IP, Traffic): 120, 380, 160, 110
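The two example slides above can be replayed with a small post-order traversal. This is a sketch of one way to get the slide's result, not necessarily the algorithm from the paper: the `Node` class is hypothetical, the tree shape is inferred from the numbers on the slide (e.g. 305 = 270 + 35), and a node is reported when its traffic exceeds what the already reported, more specific nodes below it account for by at least the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    traffic: int
    children: list = field(default_factory=list)


def compress(node, threshold, report):
    """Post-order walk; returns the traffic explained by reported nodes."""
    explained = sum(compress(c, threshold, report) for c in node.children)
    # Report the node only if it passes the threshold and cannot be
    # inferred (within the threshold) from reported descendants.
    if node.traffic >= threshold and node.traffic - explained >= threshold:
        report.append(node.traffic)
        return node.traffic
    return explained


# Tree shape inferred from the slide's numbers (an assumption).
tree = Node(500, [
    Node(120, [
        Node(50, [Node(15), Node(35)]),
        Node(70, [Node(30), Node(40)]),
    ]),
    Node(380, [
        Node(305, [
            Node(270, [Node(160), Node(110)]),
            Node(35),
        ]),
        Node(75),
    ]),
])

report = []
compress(tree, 100, report)
print(sorted(report, reverse=True))  # prints [380, 160, 120, 110]
```

The nodes with traffic 270, 305 and 500 pass the threshold but are dropped because their reported descendants already account for them to within 100.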
    14. 14. Multidimensional structure ex.
        • Nodes (clusters) have multiple parents
        • Nodes (clusters) overlap
        • Figure: Source net hierarchy (All traffic; US, EU; CA, NY, GB, DE) and Application hierarchy (All traffic; Web, Mail); example cluster: US Web
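To see why multidimensional clusters have multiple parents, consider a small sketch over the two hierarchies named on the slide. The parent maps below are an assumed reading of the figure (CA and NY under US, GB and DE under EU), and `parents` is a hypothetical helper: a cluster is generalized by moving exactly one field one step up its hierarchy.

```python
# Assumed reading of the slide's hierarchies; "All" stands for "All traffic".
SOURCE_PARENT = {"US": "All", "EU": "All",
                 "CA": "US", "NY": "US", "GB": "EU", "DE": "EU"}
APP_PARENT = {"Web": "All", "Mail": "All"}


def parents(cluster):
    """Return the clusters one generalization step above (source, app)."""
    src, app = cluster
    result = []
    if src in SOURCE_PARENT:
        result.append((SOURCE_PARENT[src], app))   # generalize the source net
    if app in APP_PARENT:
        result.append((src, APP_PARENT[app]))      # generalize the application
    return result


print(parents(("CA", "Web")))  # prints [('US', 'Web'), ('CA', 'All')]
```

With d fields a cluster can have up to d parents, so the cluster space is a lattice rather than a tree, which is what makes compression harder than in the unidimensional case.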
    15. 15. Contributions of this paper: Approach, Definitions, Algorithms, System, Experience
    16. 16. System: AutoFocus (architecture diagram; components: Traffic parser, Cluster miner, Grapher, Web based GUI; inputs: Packet header trace, categories, names)
    17. 20. Contributions of this paper: Approach, Definitions, Algorithms, System, Experience
    18. 21. Structure of regular traffic mix (SD-NAP)
        • Backups from CAIDA to tape server
            • Semi-regular time pattern
        • FTP from SLAC Stanford
        • Scripps web traffic
        • Web & Squid servers
        • Large ssh traffic
        • Steady ICMP probing from CAIDA
    19. 22. Analysis of unusual events (Site 2)
        • UCSD to UCLA route change
        • Sapphire/SQL Slammer worm
    20. 23. Conclusions
    21. 24. Conclusions
        • Multidimensional traffic clusters using natural hierarchies describe traffic aggregates
        • Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity
        • Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates
        • Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events
    22. 25. Thank you!
        • Alpha version of AutoFocus downloadable from
        • Any questions?
        • Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker
    23. 26. Bounds and running times linear linear ≤ T/H 1dim. report O(m(d-1)) O(n+m(d-1)) ≤ 1+(d-1)T/H unc. 1dim. rep. ≈ e result ≈ result*n Running time O(m+result) ≤ T/H ∏d i unc. +dim. rep. ≤ T/H ∏d i /max(d i ) +dim. rep. +dim. Δ report linear Memory usage ≤ T 1 /H+T 2 /H Report size 1dim. Δ report
    24. 27. Open questions
        • Are there tighter bounds for the size of the reports?
        • Are there algorithms that produce smaller results?
        • Are there algorithms that compute traffic reports more efficiently? In streaming fashion?
    25. 28. Delta reports
        • Why repeat the same traffic report if the traffic doesn’t change from one day to the other?
        • Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the other
        • On related traffic mixes, delta reports are much smaller than traffic reports
        • Multidimensional compression is very hard for delta reports
            • We have only an exponential algorithm for the cluster delta
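The basic idea of a delta report, before any compression, can be sketched as a comparison of per-cluster volumes from two intervals. The function name, the flat dictionary representation, and the sample numbers are assumptions for illustration; the hard part the slide mentions, multidimensional compression of the delta, is not attempted here.

```python
def delta_report(before, after, threshold):
    """Clusters whose traffic changed by more than the threshold."""
    changes = {}
    for cluster in before.keys() | after.keys():
        diff = after.get(cluster, 0) - before.get(cluster, 0)
        if abs(diff) > threshold:
            changes[cluster] = diff
    return changes


# Made-up volumes for two consecutive intervals.
day1 = {"web": 300, "ssh": 50, "icmp": 10}
day2 = {"web": 310, "ssh": 50, "icmp": 200}

print(delta_report(day1, day2, threshold=100))  # prints {'icmp': 190}
```

When the traffic mix barely changes between intervals, most clusters fall below the threshold and the delta report stays short, which is the motivation given on the slide.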
    26. 29. Greedy compression algorithm
    27. 30. Multidimensional report example Thresholding Compression
    28. 31. System details
        Table (Part: Language, LoC, Status):
            • Backend: C++, 5400 LoC, stable
            • GUI: HTML, Javascript, 1000 LoC, functional
            • Glue: perl, 350 LoC, evolving