SlideShare a Scribd company logo
1 of 7
Download to read offline
LI LAYOUT   4/27/11   2:24 PM    Page 2




             Measurement and Diagnosis of Address
                   Misconfigured P2P Traffic
                                      Zhichun Li, NEC Laboratories America, Inc.
                                            Anup Goyal, Yahoo! Search
                            Yan Chen and Aleksandar Kuzmanovic, Northwestern University


                                                                   Abstract
                             Through measurement study, we discover an interesting phenomenon, P2P address
                             misconfiguration, in which a large number of peers send P2P file downloading
                             requests to a “random” target on the Internet. Through measuring three large
                             datasets spanning four years and across five different /8 networks, we find
                             address-misconfigured P2P traffic on average contributes 38.9 percent of Internet
                             background radiation, increasing by more than 100 percent every year. To detect
                             and diagnose such unwanted traffic, we design the P2PScope, a measurement
                             tool. After analyzing about 2 Tbytes of data and tracking millions of peers, we
                             found that in all the P2P systems, address misconfiguration is caused by resource
                             mapping contamination: the sources returned for a given file ID through P2P index-
                             ing are not valid. Different P2P systems have different reasons for such contamina-
                             tion. For eMule, we find that the root cause is mainly a network byte-order
                             problem in the eMule Source Exchange protocol. For BitTorrent misconfiguration,
                             one reason is that anti-P2P companies actively inject bogus peers into the P2P sys-
                             tem. Another reason is that the KTorrent implementation has a byte-order problem.




      P          eer-to-peer (P2P) traffic has grown to be one of the
                 dominant sources of Internet traffic. Given the
                 unprecedented amount of P2P traffic flowing through
                 the Internet, misconfiguration, due to software bugs
       or attackers with malicious motives, is a serious problem of
       both the Internet and P2P systems.
          Usually it is believed that so-called Internet background
                                                                              Gb/s intercontinental link (e.g., between Los Angeles and
                                                                              Tokyo) costs $1.4 million per year to lease, ISPs might want to
                                                                              remove such traffic for a more “green” Internet.
                                                                                 Our first contribution, discovering and measuring address-
                                                                              misconfigured P2P traffic, further motivates us to build tools
                                                                              for detection and diagnosis of such unwanted traffic. Here
                                                                              detection refers to attributing the misconfiguration to a partic-
       radiation [1] — unsolicited traffic sent to all IP addresses           ular variant or version of the P2P software that contains bugs,
       including unused ones — is mainly scanning traffic or denial-          or a particular peer group (e.g., peers from anti-P2P compa-
       of-service (DoS) backscatter traffic. As part of the contribu-         nies who are hired by the movie and music industries to pro-
       tions of this article, however, after we analyze about 2 Tbytes        tect their copyrights). Detection is of crucial importance,
       of data from three institutions spanning four years on five dif-       because without it, nobody will take responsibility for fixing
       ferent /8 networks, we discover that address-misconfigured             the problem, given the large number of P2P software variants.
       P2P traffic is one of the major sources (an average of 38.9            The root causes can be either internal (bugs or software mis-
       percent in terms of the number of connections) of Internet             behavior) or external (e.g., injection attacks). Diagnosis means
       background radiation. Address-misconfigured P2P traffic is             inferences of the root causes, say, by locating the software
       caused by a large number of peers sending P2P file download-           code that contains the bug or identifying the anti-P2P peers
       ing requests to a target that has never been part of the P2P           that trigger the misconfiguration.
       network. In addition, we observe that from 2004 to 2007,                  We design two general principles:
       address-misconfigured P2P traffic increased more than 100              • P2P software testing by tracking its information flow in a
       percent each year (as shown in Fig. 1). To the best of our                controlled environment
       knowledge, we are the first to discover and study address-mis-         • P2P traffic measurement including both passive monitoring
       configured P2P traffic with large-scale datasets.                         and active backtracking to identify misconfiguration events
          For end users, such unwanted traffic affects their P2P sys-            in the wild
       tem performance and can involve innocent users in distributed             Applying the principles above, we design and implement
       DoS (DDoS) attacks unconsciously [2].                                  P2PScope. Our analysis shows that all address misconfigura-
          From the Internet service providers’ (ISPs’) perspective,           tions are caused by resource mapping contamination, that is,
       such misconfigured traffic wastes Internet resources. As a first       the peers returned through indexing are invalid, thus causing
       order estimation, extrapolating the address-misconfigured P2P          a large amount of incorrect file downloading requests. Differ-
       traffic observed in our datasets to the whole Internet, we find        ent systems have different reasons for contamination. We
       it consumes modest bandwidth, 7.9 Gb/s globally. Moreover,             have performed root cause analysis for the two largest P2P
       this traffic is mostly intercontinental. Therefore, given a 10         systems, eMule and BitTorrent.


       2                                    0890-8044/11/$25.00 © 2011 IEEE                                   IEEE Network • March/April 2011
LI LAYOUT   4/27/11                                  2:24 PM   Page 3




             Number of connections (millions)
                                                14
                                                12
                                                                                      scheme, we use the GQ trace only for prevalence analysis.
                                                10
                                                7                                     Address Misconfiguration Event Identification
                                                6
                                                                                      To identify address-misconfigured P2P traffic, we first filter
                                                                                      out the scanning traffic caused by botnets or worms based on
                                                4                                     whether they have malicious payloads that match known
                                                2                                     worm/botnet signatures or they scan a large number of IP
                                                                                      addresses (≥ 10). We have also checked the removed traffic
                                                0
                                                     2004      2005     2006   2007   against P2P protocol signatures, and have not found any
                                                                                      matches. For remaining traffic, we detect P2P connections
        Figure 1. The total number of connections that match the P2P                  based on whether their payloads match P2P protocol signa-
          payload signatures from 2004 to 2007 (LBL).                                 tures.
                                                                                         After that, we aggregate the address-misconfigured P2P
                                                                                      traffic to events. We find all the peers target a few hotspots in
         For eMule, one major root cause is a network byte-order                      Honeynets. Ninety-seven percent of peers only contact one
       problem in the source exchange protocol. We confirm that                       Honeynet IP per day (all contact less than four per day).
       aMule (an eMule variant) has the byte-order bug. For BitTor-                   Therefore, we group the P2P connections into address-mis-
       rent, address misconfiguration is mostly disseminated by the                   configuration events based on their target destination. We
       Peer Exchange (PEX) protocol implemented by uTorrent-                          mainly analyze the events with more than 100 unique sources
       compatible clients. We find two reasons:                                       seen in a six hour interval. For smaller events, it is hard to
       • Some anti-P2P companies actively inject bogus peers                          profile their patterns. The event time boundaries are the time
         through the PEX protocol using modified Azureus.                             intervals that the number of unique sources reduces to less
       • KTorrent has a byte ordering problem. We have confirmed                      than one-third of the peak.
         the bug with the developers.                                                    The above method only conservatively estimates the lower
                                                                                      bound of the number of P2P connections and number of
                                                                                      events seen in the Honeynets. We address this by adding a
       Passive Measurement Analysis                                                   “Likely” category in Table 2 for the event in which we cannot
       When monitoring the Internet background radiation, we dis-                     identify which protocol is being used, but its traffic pattern is
       cover an interesting traffic pattern that has not been described               similar to that of an address-misconfiguration event. These
       in the literature before. We call it P2P address misconfigura-                 “likely” events follow the same pattern—a large number of
       tion. In this section, we study its basic behavior through pas-                sources contact a few hotspots. We believe that they are actu-
       sive measurement. In the next section, we further design the                   ally address-misconfiguration events, because there are no
       P2PScope system to diagnose its root causes.                                   similar patterns known for Internet background radiation. The
                                                                                      likely cases account for only 9.6 percent of the total events,
       Data Collections                                                               showing that we are able to identify most cases accurately.
       In our study, we mainly use three datasets collected at differ-
       ent Honeynet/Honeyfarm sensors. Honeynets are unused                           P2P System Diversity
       address blocks on which we deploy honeypot responders in                       Table 2 shows the percentage of P2P connections that match
       order to elicit information about incoming probes. Those hon-                  P2P signatures and the number of address-misconfiguration
       eypots are protocol simulators, which respond to incoming                      events found. Each row of the table is for one P2P protocol.
       probes with faked responses as if the responses are from real                  The “other P2P” category is for connections that matched
       machines. Honeyfarm goes one step further, channeling the                      P2P protocols other than the six protocols listed in Table 2.
       incoming traffic to virtual machines as if real machines are                      We find that address-misconfiguration events mainly come
       running at the addresses (Table 1).                                            from six different P2P systems. BitTorrent and eMule are the
                                                                                      most popular P2P systems, and thus also generate the most
       The LBL Honeynet Dataset — The LBL Honeynet sensor is at                       traffic and events. We also find a number of address miscon-
       the Lawrence Berkeley National Laboratory with five continu-                   figurations from other P2P software, such as Gnutella, Soriba-
       ous /24 IP blocks in one /16 network. We have data from 2004                   da, Xunlei, and VAgaa.
       through 2007. The honeynet simulates the common protocols                         These P2P systems include centralized, decentralized dis-
       and acts as an echo server (sending back the same requests to                  tributed hash table (DHT)-based, and decentralized unstruc-
       the clients) for other port numbers. In 2008, traffic collection               tured P2P systems. This diversity suggests that the prevalence
       was stopped because the Motion Picture Association of Amer-                    of address misconfiguration is not confined to any single type
       ica (MPAA) repeatedly complained that the Honeynet IPs                         of P2P system.
       were hosting their latest movies (actually false positives).

       The NU Honeynet Dataset — The NU Honeynet sensor at
       Northwestern University has 10 discontinuous /24 IP blocks                                       LBL             NU             GQ
       within three different /8 IP prefixes. It used the same configu-
       ration as the LBL Honeynet until 23 August, 2007. After that,                   Sensor size      5 /24           10 /24         4 /16
       we developed the P2P-enabled Honeynet for the P2PScope
       system, which simulates P2P protocols on all port numbers.                      Trace size       901 Gbytes      916 Gbytes     49 Gbytes

       The GQ Honeyfarm Dataset — We have 26 days of traffic                           Starting time    2004/03/07      2006/09/01     2006/01/03
       from the GQ honeyfarm at the International Computer Sci-
       ence Institute, which was originally designed to capture worms                  End time         2008/01/21      2007/12/31     2006/01/28
       and botnets. The GQ honeyfarm has four /16 networks in one
       /8 network. Since the GQ honeyfarm runs a different response                   Table 1. Data description.


       IEEE Network • March/April 2011                                                                                                               3
LI LAYOUT   4/27/11   2:25 PM      Page 4




                      event (≥ uniq     P2P connections
                        srcs) 100         by payloads
                                                             comments
                                                                                             sion or peer group):
                      LBL     NU       LBL       NU                                          • Measurement, including both passive monitor-
                                                                                                ing and active backtracking
         eMule        143      416     5.96%      76.52%       Popular, especially in        • Tracking the information flow for the suspi-
                                                               Europe                           cious P2P software in a controlled environment
         BitTorrent   74       211     75.1%      19.79%       Popular                       The first approach helps quickly identify which
         Gnutella     4        3       2.48%      3.65%        Popular, unstructured
                                                                                             software version, communication mechanism, or
         Soribada     6        0       15.8%      .0001%       Popular in Korea
                                                                                             peer groups is/are potentially the sources of the
         Xunlei       12       0       0.34%      0%           Popular in China
                                                                                             misconfiguration. This approach also helps iden-
         VAgaa        1        1       0.14%      0.013%       Popular in China
         Other P2P    0        0       0.17%      0.018%
                                                                                             tify the possible attacks or misconfiguration
                                                                                             injected, such as those from anti-P2P companies.
                                                                                             The second approach can further validate mis-
         Likely       73       20      N/A        N/A
                                                                                             configuration caused by software bugs.
          Table 2. Address-misconfigured P2P traffic distribution and event distribu-           After detection, we want to further diagnose
          tion (LBL and NU Honeynet datasets).                                               the root causes of the misconfiguration. This
                                                                                             diagnosis step is harder than detection, and usu-
                                                                                             ally requires manual inspection. Information flow
       Characteristics of Address Misconfiguration Events                      tracking is still very useful in this stage. Another useful
                                                                               approach is to form the hypotheses regarding the root causes
       We also discover that such events are quite heterogeneous.              and then to conduct experiments for validation.
       The scale of events (in terms of duration and the number of                 Per the principles above, P2PScope has two major subsys-
       unique sources involved) varies substantially. Some events are          tems, detection and diagnosis, as shown in Fig. 2.
       as short as one to three hours; some are as long as one month.
       The average number of peers in eMule events is 1404 for LBL             Detection Subsystem
       and 1028 for NU. BitTorrent events are on a smaller scale:              The detection system has three modules:
       the average number of peers in BitTorrent events is 894 for             • Passive monitoring module
       LBL and 291 for NU.                                                     • Backtracking module
                                                                               • Information flow tracking module
       Prevalence in Time and Space                                                The passive monitoring module detects the address-miscon-
       We use the four-year LBL data to analyze the temporal trend             figured P2P traffic from Honeynets. Further, the backtracking
       of address-misconfiguration. Since the scale of the events dif-         module collects more information about the misconfigured
       fers substantially, instead of using the number of events, we           peers and the files related to address misconfiguration. When
       use the total number of connections that match the P2P signa-           these two measurement modules detect some suspicious P2P
       tures in a year as the metric for studying temporal trend. This         software version through protocol analysis, the software will
       metric gives us the lower bound of the number of P2P con-               be passed to the information flow tracking module. In the
       nections. Figure 1 shows the increasing trend of address-mis-           information flow tracking module, we install that version of
       configured P2P traffic. Additionally, we analyze the annual             P2P software and check its peer propagation. Finally, all col-
       trend of the total volume of Internet background radiation in           lected information will be output to root cause diagnosis sub-
       the LBL sensor. The total traffic is stable while the address-          system.
       misconfigured P2P traffic increases quickly.
          We also observe that address misconfiguration is prevalent           Passive Monitoring Module — We find that Honeynets are
       across the IP space. As we mentioned earlier, the LBL and               useful for monitoring the address-misconfigured P2P traffic
       NU Honeynets are in four different /8. In all the four /8 pre-          on unused IP blocks. The Honeynets respond to the incoming
       fixes, we find misconfiguration traffic. We also observe similar        connections and generate “faked” responses. We use Honeyd-
       behavior in the GQ Honeyfarm traces. Therefore, we believe              based [3] lightweight Honeynets. Honeyd uses port numbers
       the behavior is not confined to a single subspace of the IP             to identify protocols, but P2P traffic is on random port num-
       space.                                                                  bers. To overcome this limitation, we apply a payload signature
                                                                               based approach. We first identify the P2P protocol based on
       Global Address-Misconfigured P2P Traffic Estimation                     payload signatures and then call appropriate P2P responders
       We try to estimate the percentage of address-misconfigured              for further processing. Currently, we have developed BitTor-
       P2P traffic in Internet background radiation traffic. It is very        rent and eMule protocol responders.
       challenging, given the limited data we have. We use the data
       (June–December 2007) from the LBL and NU Honeynets for                  Backtracking Module — To understand how peers get miscon-
       this analysis. We treat the percentage estimated in each /8             figured, the backtracking module tracks the peers in real time.
       prefix of the Honeynet sensors as an independent sample. We             In P2P systems, a peer needs to download from a list of peers
       then use the average percentage from all the samples as a               that have the given resource. Three methods can facilitate the
       more representative result. As a first order estimation, we find        resource to peer mapping: central index servers (tracker in
       the average percentage of address-misconfigured P2P traffic             BitTorrent terminology), distributed hash indexing, and peer
       in the Internet background radiation is 38.9 percent.                   exchange (query more peers from known peers). For eMule
                                                                               and BitTorrent, we implement all three approaches. In the
       System Design                                                           backtracking module, for a given resource we periodically
                                                                               query the index servers or DHT for the peer lists. Then, for
       General Design Principles                                               any peer, we use peer exchange protocols to query more
       In general, two reasons can potentially cause P2P misconfigu-           peers. The whole process stops when the total number of
       rations: software bugs, and external attack or misconfiguration         peers remains stable.
       injection. Two approaches are useful for detecting misconfigu-
       ration (attributing the problem to a particular software ver-           Information Flow Tracking Module — Whenever a certain P2P


       4                                                                                                     IEEE Network • March/April 2011
LI LAYOUT   4/27/11     2:25 PM      Page 5




                  Honeynet traffic

                                            Detection                        Data Plane Traffic Radiation
                                            subsystem
             Passive monitoring module                                       Usually, there are a number of different protocol messages
                                                                             involved in a P2P file sharing network. Address misconfigura-
                                                            Root cause       tion in all the six P2P systems that we explored has a common
                                         Information flow    diagnosis
                                          tracking module   subsystem        feature: the traffic is caused by file downloading requests, i.e.,
               Backtracking module                                           data plane traffic which is directly related to data transfer.
                                                                             Different P2P systems use different ways for exchanging con-
                   Index servers
                       DHT
                                                                             trol plane information (e.g., index server, Peer Exchange Pro-
                  Peer exchange                                              tocol, and DHT), but all have the same problem of giving
                                                                             bogus IPs to peers. These bogus IPs propagate quickly in P2P
                                                                             systems and cause large amounts of address-misconfigured
        Figure 2. Architecture of the P2PScope system.                       P2P traffic.

                                                                             eMule Misconfiguration Diagnosis
                                                                             Which Peers Spread Misconfiguration? — Hypothetically, the
       client version is suspicious, we install it in our controlled envi-   misconfigured peers that target the Honeynets might include
       ronment. In the controlled environment, by interpreting the           anti-P2P peers. However, among 1,771,296 misconfigured
       protocols (index server, DHT, and peer exchange protocol)             peers we observed, 99.90 percent of peers are normal peers
       used for peer propagation, we monitor the peers in and out            (non-anti-P2P peers).
       through protocol analysis. If the software gives out a peer that
       never comes in, we know the software will cause address mis-          What Is the Root Cause? — When analyzing the network-level
       configuration.                                                        behavior of the eMule misconfiguration, we notice that the IP
                                                                             addresses formed by reversing the byte order of the targets
       Diagnosis Subsystem                                                   were mostly alive. With the real-time P2PScope system, we
       Tracking down the root causes of address misconfiguration is          have done two experiments to verify the byte-order problem.
       a challenging problem. We have developed the following tools          Furthermore, we have confirmed the bug in source code.
       for understanding the root causes:                                       In the first experiment, we tested 13 targeted destinations
       • Track the information flow within the suspicious P2P soft-          in real time, and observed that 61 percent of reverse IPs were
          ware.                                                              indeed running eMule with the same port number of the tar-
       • Trace the Honeynet IP addresses propagated in the P2P               gets. In the second experiment, we checked whether the
          systems.                                                           reverse IPs of unroutable peers observed in the peer list of a
       • Check the routability of the peers returned.                        given file ran eMule. We aggregated the result of 100 files
       • Check whether the peers are on the anti-P2P IP lists and            found in real time. 10.3 percent of backtracked peer IPs are
          analyze their behavior.                                            unroutable. Among them, we observe 69.6 percent whose
       • Check the reverse byte order of peer IPs, because it is one         reverse IPs run eMule. This, again, is more strong evidence in
          of the most frequent mistakes in networked system imple-           favor of the byte order hypothesis.
          mentations.                                                           We further located the bug in the source code of aMule, a
          To understand why the bogus peers can be produced by the           popular eMule alternative. eMule has a complex client ID to
       suspicious P2P software, we trace how the bogus peers are             IP mapping system called HybridID. In this mapping system,
       produced. Another useful toolkit is the routability checking.         to represent IP in index servers or DHTs different byte orders
       In [4], Cooke et al. report 66.8 percent of all possible IPv4         are used. The source (peer) exchange protocol needs to con-
       addresses are unroutable. If we assume that the bogus peers           sider both cases. However, in aMule with versions before
       are produced randomly, we would find a reasonably large per-          2.1.0, this problem was ignored, causing the byte-order bug.
       cent of unroutable bogus peers. If the percentage is small, it
       implies that bogus peers are not produced randomly. The per-          How Is the Misconfiguration Disseminated? — eMule offers
       centage of unroutable peers can serve as a low bound of the           three ways to obtain peers for a file: index servers, eMule
       bogus peers. Furthermore, since we suspect the anti-P2P com-          source exchange protocol, and eMule Kademlia DHT. We
       panies, which try to take down the P2P networks, might be             would like to know which one disseminates the misconfigura-
       related to address misconfiguration, we also check whether            tion. Since we have not observed any misconfigured DHT
       the peers belong to the anti-P2P IP list.                             traffic in the approximately 2-Tbyte trace we have, we believe
          With the above tools, we mainly try to answer the following        the DHT does not have the byte-order problem. To attribute
       questions:                                                            the problem to index servers or the source exchange protocol,
       • Which peers spread misconfiguration?                                we queried 100 files and got a total of 37,079 unique peers.
       • What is the root cause?                                             We find that 10.7 percent of the peers from the source
       • How is misconfiguration disseminated?                               exchange have Honeynet IPs in their neighbor list, but the
       • What is the percentage of bogus peers in misconfigured P2P          index servers never return Honeynet IPs. We have also done a
          networks?                                                          routability check, and found that 12.8 percent of the peers
                                                                             returned by the source exchange protocol are unroutable;
       Detection and Diagnosis Results                                       none of the peers returned by the index servers is unroutable.
       We have deployed P2PScope in the NU Honeynet since                    Also, 15.7 percent of peers from source exchange have reverse
       August 2007.                                                          IPs that run eMule; none of the peers returned by index
          We first study the general characteristics that are related to     servers has such behavior. Therefore, it is not an index server
       the root causes of all P2P systems. Then we focus on two case         but the source exchange protocol that is used for disseminat-
       studies for eMule and BitTorrent, the most popular P2P soft-          ing the misconfiguration.
       ware generating most of the address-misconfigured P2P traf-
       fic.                                                                  What Is the Percentage of Bogus Peers in the Misconfigured
                                                                             P2P Networks? — We try to estimate the percentage of peers


       IEEE Network • March/April 2011                                                                                                       5
LI LAYOUT     4/27/11       2:25 PM      Page 6




                                           Normal peer events   Anti-P2P peer events

            Number of events               136-NU, 36-LBL       75-NU, 39-LBL             a small number of networks, which belong to
                                                                                          server hosting companies. The metric source cor-
                                           90% uTorrent                                   relation measures the degree to which the sources
            Client software — NU                                100% Azureus
                                           10% others                                     of different events are overlapped. Ninety-seven
                                                                                          percent of pairs of anti-P2P peer events share
                                           57% Bit Spirit,                                more than 25 percent of sources. However, the
            Client software — LBL          31% BitComet         100% Azureus              normal peer events are little coordinated.
                                           12% others
                                                                                          What Is the Root Cause? — We have not cap-
            Message length                 Variable lengths     Similar lengths           tured any real-time anti-P2P peer events. Thus,
                                                                                          we cannot diagnose their root causes. In this sec-
            Files queried                  Diverse              Latest music/movies       tion and later, we mainly focus on normal peer
                                                                                          events. Normal peer events can still be caused by
            Avg. no. of connections/IP     25                   400                       anti-P2P peers, as discussed next.

            Source correlation             Little coordinated   Highly coordinated        Root Cause I: Anti-P2P Companies Deliberately
                                                                                          Inject Bogus Peers in P2P Systems — Our
         Arrival and departure         Gradually              All together                P2PScope system tracks the misconfigured peers
                                                                                          in real time. We detect some interesting anti-P2P
         Average event duration        106.1 hours            4.5 hours                   hosts coming from the same server farm,
                                                                                          72.172.90/24, owned by an anti-P2P company
         IP distribution               Diverse                Small number of /24         called Artist-direct. Through further analysis on
                                                                                          the server farm, we discover that each of these IPs
          Table 3. Comparison between normal peer events and anti-P2P peer events.        runs hundreds of BitTorrent clients simultaneous-
                                                                                          ly. All of the clients run a version of Azureus
                                                                                          2500. Moreover, in the normal case, the peers
       (from the source exchange protocol) that have the byte-order         only respond to a peer exchange message when they download
       problem. This is challenging because there are certain cases,        (or own) the file so that they can return the other peers from
       as we show below, in which we cannot determine whether               which they download. However, these anti-P2P clients respond
       they have a byte-order problem. Instead, we estimate the             to any arbitrary file; that is, they declare they have any file the
       lower and upper bounds.                                              remote peer wants and exchange the bogus peers with it. Actu-
          If a peer is not running eMule but its reverse IP is, we          ally, they also respond to any random file ID even when no file
       believe it definitely has a byte-order problem. Such peers are       is associated with that value. Artist-direct might have modified
       12.7 percent of the total peers (37,079 peers for 100 files).        the Azureus client to implement this.
          If a peer is unroutable but its reverse IP is routable, we           We have checked the neighbor lists returned by these anti-
       believe it is likely to have a byte-order problem. Because for       P2P IPs. Most returned peers are bogus peers. Among the
       those peers we cannot verify whether their reverse IPs run           peers, 56.7 percent are unroutable, 1.8 percent are from the
       eMule directly, we only count them in the upper bound, which         same serverfarm, and we could not connect to any of the
       is 25 percent of the total peers.                                    remaining 41.5 percent of the peers. We suspect anti-P2P
                                                                            companies want to slow down the downloading process by
       BitTorrent Misconfiguration Diagnosis                                supplying bogus peers randomly.
       Which Peers Spread Misconfiguration? — When analyzing the               The anti-P2P related normal peers are the normal peers that
       events discussed earlier, we found two groups of BitTorrent          query the files in which anti-P2P peers are interested. Most of
       events. In one group, the majority of peers (> 90 percent) are       these peers (> 90 percent) are within two IP hops of anti-
       anti-P2P peers. We call them anti-P2P events. In the other           P2P peers. The anti-P2P related normal peers are 19.7 per-
       group, the majority of peers are normal peers. We call them          cent of the total misconfigured peers and are responsible for
       normal events. There are no other events between these.              20.7 percent of the probes sent to the Honeynet. Therefore,
       Anti-P2P events surprise us, since they show that the peers          the anti-P2P companies are responsible for 1/5 of the traffic
       from the anti-P2P companies somehow get themselves solely            observed. Since it is impossible to identify all the anti-P2P
       misconfigured.                                                       peers, this estimate is quite conservative.
          As shown in Table 3, we compare the properties of these
       two types of events and discover that their characteristics are      Root Cause II: The Byte-Order Problem of KTorrent Clients —
       indeed different. Peers from anti-P2P peer events all use            We further actively track peers related to the files requested
       Azureus. We conjecture that anti-P2P companies modify                by the misconfigured peers. We find that a large portion of
       Azureus and add their functionalities because Azureus is a           unroutable peers are supplied by the KTorrent peers, which
       popular open source BitTorrent client with a nice plug-in            are suspicious. We also study uTorrent and Azureus, since
       architecture. On the other hand, peers from normal peer              they are the most popular clients. We set up a controlled envi-
       events run on a diverse set of clients. In addition, the peers in    ronment to test these three clients. We evaluate the top seven
       anti-P2P peer events are highly coordinated. They start con-         torrent files seen in Honeynets.
       tacting the Honeynet IPs at almost the same time. They also             We classify the peers seen by clients into two types:
       depart at the same time. By contrast, the peers in normal peer       • Incoming peers: peers that come to the client
       events arrive and depart randomly and gradually. The mes-            • Outgoing peers: peers that are sent out by the client
       sages sent in anti-P2P peer events are almost the same; they         Outgoing — incoming peers are the peers that go out from the
       have a similar message length and even a similar bitmap that         client but never come in. These are the bogus peers created
       annotates the content availability distribution. To some extent,     by the client itself. Among all outgoing KTorrent peers, 56
       the group of anti-P2P peers is like a clone army with identical      percent of them are bogus. Thus, it shows that KTorrent is
       soldiers. Moreover, the peers in anti-P2P peer events are from       one of the sources of misconfigurations. µTorrent and Azureus


       6                                                                                                      IEEE Network • March/April 2011
LI LAYOUT     4/27/11    2:25 PM     Page 7




            void UTPex::encode(…) // UTPEX.CPP
            {     ...
                  // Use KDE API to get an IP in Network Byte Order.
                                                                                      bilities and misconfiguration in Border Gateway Proto-
                  // WriteUint32 swap the byte order again!!!
                  WriteUint32(buf,size,addr.ipAddress().IPv4Addr());
                                                                                      col (BGP) [6]. However, we find little literature on the
                  …}                                                                  study of P2P misconfiguration.
            void WriteUint32(Uint8* buf,Uint32 off,Uint32 val) //FUNCTIONS.CPP
            {
                                                                                      P2P Measurement
                  // swap the byte order                                               Most existing measurement studies of P2P networks are
                  buf[off + 0] = (Uint8) ((val & 0xFF000000) >> 24);                   based on the measurement of normal P2P traffic [7]. On
                  buf[off + 1] = (Uint8) ((val & 0x00FF0000) >> 16);                   the other hand, we measure the abnormal P2P traffic
                  buf[off + 2] = (Uint8) ((val & 0x0000FF00) >> 8);                    observed in Honeynets. This is also observed by Yeg-
                  buf[off + 3] = (Uint8) (val & 0x000000FF);                           neswaran et al. in [1]. Their work focuses on the compar-
            }                                                                          ison of network-level characteristics of P2P address
                                                                                       misconfiguration with botnets and worms, and places an
        Figure 3. Code snippet of KTorrent byte order bug.                             emphasis on accurately classifying botnet sweeps and
                                                                                       worm outbreaks. We mainly study the prevalence and
                                                                                       trend of address-misconfigured P2P traffic, and its root
       do not have any outgoing — incoming peers.                                causes. Liang et al. show that P2P systems like Fasttrack and
          After that, we study the source code of KTorrent and pin-              Overnet are vulnerable to index poisoning attacks [8]. In con-
       point the exact bug. The problem is also a byte-order prob-               trast, we investigate the root causes for both intentional and
       lem. In Fig. 3, we show the code snippet related to this bug.             unintentional index misconfiguration.
       The uTorrent PEX protocol requires the IP addresses in peer
       exchange messages to be in network byte order. When KTor-                 Distributed System Debugging
       rent sends out the peer exchange messages, in its encode func-            Distributed systems are extremely hard to debug. People have
       tion it reverses the byte order twice, so finally IP addresses            proposed various ways to debug distributed systems, including
       become host byte order again. addr.ipAddress().IPv4Addr() is              replay-based predicate checking [9], online monitoring [10],
       a KDE library method that returns IP addresses in network                 log forensic analysis [11], network trace based inference [12],
       byte order. However, in the WriteUint32 function implemented              and model checking [13]. Although some of the approaches
       by KTorrent, it reverses the byte order again, and thus causes            might be useful for the root cause diagnosis of P2P address
       the problem. To our knowledge, we have not found anyone                   misconfiguration, the following properties make leveraging
       else reporting this bug.                                                  any of the above existing techniques hard. First, the P2P sys-
                                                                                 tems we diagnose are heterogeneous. Multiple different
       How Is the Misconfiguration Disseminated? —Similar to the                 implementations exist for a given P2P protocol such as Bit-
       eMule study, we find that BitTorrent DHT and Index servers                Torrent. Therefore, it is hard to apply model checking tech-
       (trackers) are unlikely to spread or to generate misconfigura-            niques. Second, the peers are out of our control. We cannot
       tions. Unlike eMule, BitTorrent has several incompatible peer             modify the peers to add the logging functionality required by
       exchange protocols proposed by popular BitTorrent clients,                replay-based predicate checking, online monitoring, and log
       such as µTorrent, Azureus, and BitComet. The µTorrent PEX                 forensic analysis. It is also very difficult to log network traces
       protocol is the most popular, and we find it is the source for            from remote peers. Third, the P2P systems we monitor are
       misconfiguration dissemination.                                           very large, consisting of millions of peers. All the existing
          To understand which µTorrent PEX compatible clients                    techniques cannot handle this scale yet.
       propagate the misconfigurations, we study the behavior of dif-
       ferent clients. We mainly study, when a client obtains peers
       from uTorrent PEX protocol, how it checks the availability of
                                                                                 Conclusions
       the peers before it further propagates the peers. We classify             In this article, we first discover a new traffic pattern, P2P
       the behavior into three categories:                                       address misconfiguration. Its traffic is about 38.9 percent of
        • No checking                                                            Internet background radiation and is growing quickly. Such
       • Requiring peers to respond to SYN-ACK (only active port                 traffic is harmful for both end users and ISPs. As the second
          regardless of whether it is running BitTorrent)                        contribution, we design P2PScope to understand the root
       • Requiring peers to respond to BT-HANDSHAKE messages                     causes. We find that data plane traffic radiation is the com-
       KTorrent propagates anything that comes to it without even                mon characteristic of the six P2P systems. For eMule, the
       checking for an open port. µTorrent propagates anything that              problem is mainly caused by a byte-order problem. For Bit-
       has an open port. Azureus only propagates peers that are                  Torrent, we find two reasons:
       active BitTorrent clients. Therefore, KTorrent will fully prop-           • Anti-P2P companies deliberately injecting invalid peers
       agate the bogus peers. µTorrent will propagate the bogus                  • KTorrent byte-order bug
       peers if they happen to have the port open. Azureus usually
       will not propagate any bogus peers. To reduce bogus peers,                Acknowledgments
       we believe all the clients should have strict checking like               Our thanks to Vern Paxson and the Lawrence Berkeley
       Azureus. In this way, the address-misconfigured P2P traffic               National Laboratory for support of the LBL Honeynet opera-
       can be greatly reduced.                                                   tions. This work was supported in part by NSF Awards
                                                                                 0627715, DoD Young Investigator Award FA9550-07-1-0074,
       Related Work                                                              and DoE Career award DE-FG02-05ER25692/A001. Opin-
                                                                                 ions, findings, and conclusions or recommendations are those
       Misconfiguration Studies                                                  of the authors, and do not necessarily reflect the views of the
       Misconfiguration is widely spread across different network                funding sources.
       systems on the Internet. Labovitz et al. studied wide-area
       backbone failures and concluded that misconfiguration could
       be responsible for 12 percent of the incidents [5]. There has
       also been much recent work that considers the various insta-


       IEEE Network • March/April 2011                                                                                                           7
LI LAYOUT   4/27/11       2:25 PM        Page 8




       References
       [1] V. Yegneswaran et al., “Using Honeynets for Internet Situational Awareness,”
           Proc. ACM Hotnets IV, 2005.
       [2] Ruben Torres et al., “Inferring Undesirable Behavior from P2P Traffic Analy-
           sis,” Proc. ACM SIGMETRICS, 2009.
       [3] N. Provos, “A Virtual Honeypot Framework,” Proc. USENIX Security, 2004.
       [4] E. Cooke et al., “The Dark Oracle: Perspective-Aware Unused and Unreach-
           able Address Discovery,” Proc. USENIX NSDI, 2006.
       [5] C. Labovitz et al., “Experimental Study of Internet Stability and Backbone
           Failures,” Proc. FTCS: Int’l. Symp. Fault-Tolerant Computing, 1999.
       [6] R. Mahajan, D. Wetherall, and T. Anderson, “Understanding BGP Misconfig-
           uration,” Proc. ACM SIGCOMM, 2002.
       [7] K. Gummadi et al., “Measurement, Modeling, and Analysis of a Peer-to-Peer
           File-Sharing Workload,” Proc. ACM SOSP, 2003.
       [8] J. Liang et al., “The Index Poisoning Attack in P2P File-Sharing Systems,”
           Proc. IEEE INFOCOM, 2006.
       [9] D. Geels et al., “Friday: Global Comprehension for Distributed Replay,” Proc.
           NSDI, 2007.
       [10] X. Liu et al., “D3s: Debugging Deployed Distributed Systems,” Proc. NSDI,
           2008.
       [11] M. K. Aguilera et al., “Performance Debugging for Distributed Systems of
           Black Boxes,” Proc. ACM SOSP, 2003.
       [12] P. Bahl et al., “Towards Highly Reliable Enterprise Network Services Via
           Inference of Multi-Level Dependencies,” Proc. ACM SIGCOMM, 2007.
       [13] C. Killian et al., “Finding Liveness Bugs in System Code,” Proc. NSDI,
           2007.

       Biographies
       ZHICHUN LI (zhichun@nec-labs.com) is a research staff member with NEC Labora-
       tories America, Inc. He received his Ph.D. in computer science from Northwest-
       ern University in December 2009. His research focuses on network security,
       system security, network measurement and monitoring, and distributed system
       diagnosis.

       A NUP G OYAL is an engineer with Yahoo Search since October 2008. He
       received his M.S. degree in 2008 from the Computer Science and Engineering
       Department at Northwestern University and his B.Tech degree from the Indian
       Institute of Technology, Kharagur.

       YAN CHEN is an associate professor in the Department of Electrical Engineering
       and Computer Science at Northwestern University, Evanston, Illinois. He got his
       Ph.D. in computer science from the University of California at Berkeley in 2003.
       His research interests include network measurement, monitoring, and security,
       and P2P systems. He won the DOE Early CAREER award in 2005, and Microsoft
       Trustworthy Computing Awards in 2004 and 2005.

       ALEKSANDAR KUZMANOVIC is an associate professor in the Department of Electri-
       cal Engineering and Computer Science at Northwestern University. He received
       his B.S. and M.S. degrees from the University of Belgrade, Serbia, in 1996 and
       1999, respectively. He received the Ph.D. degree from Rice University in 2004.
       His research interests are in the area of computer networking with emphasis on
       design, measurements, analysis, denial-of-service resiliency, and prototype imple-
       mentation of protocols and algorithms for the Internet. He received the National
       Science Foundation CAREER Award in 2008.




       8                                                                                    IEEE Network • March/April 2011

More Related Content

What's hot

Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...Zhenyun Zhuang
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationIJERA Editor
 
ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...
ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...
ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...IJCNCJournal
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryptionAcad
 
Copyright Protection in the Internet
Copyright Protection in the InternetCopyright Protection in the Internet
Copyright Protection in the Internetipoque
 
AN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACK
AN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACKAN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACK
AN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACKIJCNCJournal
 
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...Zac Darcy
 
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
 
Bit sync personal_cloud
Bit sync personal_cloudBit sync personal_cloud
Bit sync personal_cloudFlavio Vit
 
A Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno Performance
A Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno PerformanceA Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno Performance
A Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno PerformanceCSCJournals
 

What's hot (10)

Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classification
 
ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...
ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...
ON THE OPTIMIZATION OF BITTORRENT-LIKE PROTOCOLS FOR INTERACTIVE ON-DEMAND ST...
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryption
 
Copyright Protection in the Internet
Copyright Protection in the InternetCopyright Protection in the Internet
Copyright Protection in the Internet
 
AN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACK
AN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACKAN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACK
AN EXPERIMENTAL STUDY OF IOT NETWORKS UNDER INTERNAL ROUTING ATTACK
 
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
 
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
 
Bit sync personal_cloud
Bit sync personal_cloudBit sync personal_cloud
Bit sync personal_cloud
 
A Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno Performance
A Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno PerformanceA Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno Performance
A Cross-Layer Packet Loss Identification Scheme to Improve TCP Veno Performance
 

Similar to Measurement and diagnosis of address

EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMSEFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMSijp2p
 
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS ijp2p
 
Analytical Modelling of Localized P2P Streaming Systems under NAT Consideration
Analytical Modelling of Localized P2P Streaming Systems under NAT ConsiderationAnalytical Modelling of Localized P2P Streaming Systems under NAT Consideration
Analytical Modelling of Localized P2P Streaming Systems under NAT ConsiderationIJCNCJournal
 
A study of index poisoning in peer topeer
A study of index poisoning in peer topeerA study of index poisoning in peer topeer
A study of index poisoning in peer topeerIJCI JOURNAL
 
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATIONSECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATIONIJNSA Journal
 
A Brief Note On Peer And Peer ( P2P ) Applications Have No...
A Brief Note On Peer And Peer ( P2P ) Applications Have No...A Brief Note On Peer And Peer ( P2P ) Applications Have No...
A Brief Note On Peer And Peer ( P2P ) Applications Have No...Brenda Thomas
 
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH  THE RISE OF CLOUD COMPUTINGFUTURE OF PEER-TO-PEER TECHNOLOGY WITH  THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTINGijp2p
 
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTINGFUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTINGijp2p
 
Guarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkGuarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkEditor IJCATR
 
Copyright Protection in Peer To Peer Network
Copyright Protection in Peer To Peer NetworkCopyright Protection in Peer To Peer Network
Copyright Protection in Peer To Peer NetworkIJERA Editor
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)theijes
 
Peer To Peer.pptx
Peer To Peer.pptxPeer To Peer.pptx
Peer To Peer.pptxHananShk
 
Exploring Peer-To-Peer Data Mining
Exploring Peer-To-Peer Data MiningExploring Peer-To-Peer Data Mining
Exploring Peer-To-Peer Data Miningcsandit
 
EXPLORING PEER-TO-PEER DATA MINING
EXPLORING PEER-TO-PEER DATA MININGEXPLORING PEER-TO-PEER DATA MINING
EXPLORING PEER-TO-PEER DATA MININGcscpconf
 
ipoque Internet Study 2008/2009
ipoque Internet Study 2008/2009ipoque Internet Study 2008/2009
ipoque Internet Study 2008/2009ipoque
 
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docxPeer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docxherbertwilson5999
 
Bandwidth Management Solutions for Network Operators
Bandwidth Management Solutions for Network OperatorsBandwidth Management Solutions for Network Operators
Bandwidth Management Solutions for Network Operatorsipoque
 
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer NetworksScale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer NetworksIOSR Journals
 
Textual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networksTextual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networksUvaraj Shan
 

Similar to Measurement and diagnosis of address (20)

EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMSEFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
 
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
EFFECTIVE TOPOLOGY-AWARE PEER SELECTION IN UNSTRUCTURED PEER-TO-PEER SYSTEMS
 
Analytical Modelling of Localized P2P Streaming Systems under NAT Consideration
Analytical Modelling of Localized P2P Streaming Systems under NAT ConsiderationAnalytical Modelling of Localized P2P Streaming Systems under NAT Consideration
Analytical Modelling of Localized P2P Streaming Systems under NAT Consideration
 
A study of index poisoning in peer topeer
A study of index poisoning in peer topeerA study of index poisoning in peer topeer
A study of index poisoning in peer topeer
 
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATIONSECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
 
A Brief Note On Peer And Peer ( P2P ) Applications Have No...
A Brief Note On Peer And Peer ( P2P ) Applications Have No...A Brief Note On Peer And Peer ( P2P ) Applications Have No...
A Brief Note On Peer And Peer ( P2P ) Applications Have No...
 
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH  THE RISE OF CLOUD COMPUTINGFUTURE OF PEER-TO-PEER TECHNOLOGY WITH  THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
 
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTINGFUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING
 
Non Path-Based Mutual Anonymity Protocol for Decentralized P2P System
Non Path-Based Mutual Anonymity Protocol for Decentralized P2P SystemNon Path-Based Mutual Anonymity Protocol for Decentralized P2P System
Non Path-Based Mutual Anonymity Protocol for Decentralized P2P System
 
Guarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkGuarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social Network
 
Copyright Protection in Peer To Peer Network
Copyright Protection in Peer To Peer NetworkCopyright Protection in Peer To Peer Network
Copyright Protection in Peer To Peer Network
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
 
Peer To Peer.pptx
Peer To Peer.pptxPeer To Peer.pptx
Peer To Peer.pptx
 
Exploring Peer-To-Peer Data Mining
Exploring Peer-To-Peer Data MiningExploring Peer-To-Peer Data Mining
Exploring Peer-To-Peer Data Mining
 
EXPLORING PEER-TO-PEER DATA MINING
EXPLORING PEER-TO-PEER DATA MININGEXPLORING PEER-TO-PEER DATA MINING
EXPLORING PEER-TO-PEER DATA MINING
 
ipoque Internet Study 2008/2009
ipoque Internet Study 2008/2009ipoque Internet Study 2008/2009
ipoque Internet Study 2008/2009
 
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docxPeer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
 
Bandwidth Management Solutions for Network Operators
Bandwidth Management Solutions for Network OperatorsBandwidth Management Solutions for Network Operators
Bandwidth Management Solutions for Network Operators
 
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer NetworksScale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
 
Textual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networksTextual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networks
 

More from ingenioustech

Supporting efficient and scalable multicasting
Supporting efficient and scalable multicastingSupporting efficient and scalable multicasting
Supporting efficient and scalable multicastingingenioustech
 
Monitoring service systems from
Monitoring service systems fromMonitoring service systems from
Monitoring service systems fromingenioustech
 
Locally consistent concept factorization for
Locally consistent concept factorization forLocally consistent concept factorization for
Locally consistent concept factorization foringenioustech
 
Impact of le arrivals and departures on buffer
Impact of  le arrivals and departures on bufferImpact of  le arrivals and departures on buffer
Impact of le arrivals and departures on bufferingenioustech
 
Exploiting dynamic resource allocation for
Exploiting dynamic resource allocation forExploiting dynamic resource allocation for
Exploiting dynamic resource allocation foringenioustech
 
Efficient computation of range aggregates
Efficient computation of range aggregatesEfficient computation of range aggregates
Efficient computation of range aggregatesingenioustech
 
Dynamic measurement aware
Dynamic measurement awareDynamic measurement aware
Dynamic measurement awareingenioustech
 
Design and evaluation of a proxy cache for
Design and evaluation of a proxy cache forDesign and evaluation of a proxy cache for
Design and evaluation of a proxy cache foringenioustech
 
Throughput optimization in
Throughput optimization inThroughput optimization in
Throughput optimization iningenioustech
 
Phish market protocol
Phish market protocolPhish market protocol
Phish market protocolingenioustech
 
Peering equilibrium multi path routing
Peering equilibrium multi path routingPeering equilibrium multi path routing
Peering equilibrium multi path routingingenioustech
 
Online social network
Online social networkOnline social network
Online social networkingenioustech
 
On the quality of service of crash recovery
On the quality of service of crash recoveryOn the quality of service of crash recovery
On the quality of service of crash recoveryingenioustech
 
It auditing to assure a secure cloud computing
It auditing to assure a secure cloud computingIt auditing to assure a secure cloud computing
It auditing to assure a secure cloud computingingenioustech
 
Bayesian classifiers programmed in sql
Bayesian classifiers programmed in sqlBayesian classifiers programmed in sql
Bayesian classifiers programmed in sqlingenioustech
 

More from ingenioustech (20)

Supporting efficient and scalable multicasting
Supporting efficient and scalable multicastingSupporting efficient and scalable multicasting
Supporting efficient and scalable multicasting
 
Monitoring service systems from
Monitoring service systems fromMonitoring service systems from
Monitoring service systems from
 
Locally consistent concept factorization for
Locally consistent concept factorization forLocally consistent concept factorization for
Locally consistent concept factorization for
 
Impact of le arrivals and departures on buffer
Impact of  le arrivals and departures on bufferImpact of  le arrivals and departures on buffer
Impact of le arrivals and departures on buffer
 
Exploiting dynamic resource allocation for
Exploiting dynamic resource allocation forExploiting dynamic resource allocation for
Exploiting dynamic resource allocation for
 
Efficient computation of range aggregates
Efficient computation of range aggregatesEfficient computation of range aggregates
Efficient computation of range aggregates
 
Dynamic measurement aware
Dynamic measurement awareDynamic measurement aware
Dynamic measurement aware
 
Design and evaluation of a proxy cache for
Design and evaluation of a proxy cache forDesign and evaluation of a proxy cache for
Design and evaluation of a proxy cache for
 
Throughput optimization in
Throughput optimization inThroughput optimization in
Throughput optimization in
 
Tcp
TcpTcp
Tcp
 
Privacy preserving
Privacy preservingPrivacy preserving
Privacy preserving
 
Phish market protocol
Phish market protocolPhish market protocol
Phish market protocol
 
Peering equilibrium multi path routing
Peering equilibrium multi path routingPeering equilibrium multi path routing
Peering equilibrium multi path routing
 
Peace
PeacePeace
Peace
 
Online social network
Online social networkOnline social network
Online social network
 
On the quality of service of crash recovery
On the quality of service of crash recoveryOn the quality of service of crash recovery
On the quality of service of crash recovery
 
Layered approach
Layered approachLayered approach
Layered approach
 
It auditing to assure a secure cloud computing
It auditing to assure a secure cloud computingIt auditing to assure a secure cloud computing
It auditing to assure a secure cloud computing
 
Intrution detection
Intrution detectionIntrution detection
Intrution detection
 
Bayesian classifiers programmed in sql
Bayesian classifiers programmed in sqlBayesian classifiers programmed in sql
Bayesian classifiers programmed in sql
 

Recently uploaded

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 

Recently uploaded (20)

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 

Measurement and diagnosis of address

  • 1. LI LAYOUT 4/27/11 2:24 PM Page 2 Measurement and Diagnosis of Address Misconfigured P2P Traffic Zhichun Li, NEC Laboratories America, Inc. Anup Goyal, Yahoo! Search Yan Chen and Aleksandar Kuzmanovic, Northwestern University Abstract Through measurement study, we discover an interesting phenomenon, P2P address misconfiguration, in which a large number of peers send P2P file downloading requests to a “random” target on the Internet. Through measuring three large datasets spanning four years and across five different /8 networks, we find address-misconfigured P2P traffic on average contributes 38.9 percent of Internet background radiation, increasing by more than 100 percent every year. To detect and diagnose such unwanted traffic, we design the P2PScope, a measurement tool. After analyzing about 2 Tbytes of data and tracking millions of peers, we found that in all the P2P systems, address misconfiguration is caused by resource mapping contamination: the sources returned for a given file ID through P2P index- ing are not valid. Different P2P systems have different reasons for such contamina- tion. For eMule, we find that the root cause is mainly a network byte-order problem in the eMule Source Exchange protocol. For BitTorrent misconfiguration, one reason is that anti-P2P companies actively inject bogus peers into the P2P sys- tem. Another reason is that the KTorrent implementation has a byte-order problem. P eer-to-peer (P2P) traffic has grown to be one of the dominant sources of Internet traffic. Given the unprecedented amount of P2P traffic flowing through the Internet, misconfiguration, due to software bugs or attackers with malicious motives, is a serious problem of both the Internet and P2P systems. Usually it is believed that so-called Internet background Gb/s intercontinental link (e.g., between Los Angeles and Tokyo) costs $1.4 million per year to lease, ISPs might want to remove such traffic for a more “green” Internet. Our first contribution, discovering and measuring address- misconfigured P2P traffic, further motivates us to build tools for detection and diagnosis of such unwanted traffic. Here detection refers to attributing the misconfiguration to a partic- radiation [1] — unsolicited traffic sent to all IP addresses ular variant or version of the P2P software that contains bugs, including unused ones — is mainly scanning traffic or denial- or a particular peer group (e.g., peers from anti-P2P compa- of-service (DoS) backscatter traffic. As part of the contribu- nies who are hired by the movie and music industries to pro- tions of this article, however, after we analyze about 2 Tbytes tect their copyrights). Detection is of crucial importance, of data from three institutions spanning four years on five dif- because without it, nobody will take responsibility for fixing ferent /8 networks, we discover that address-misconfigured the problem, given the large number of P2P software variants. P2P traffic is one of the major sources (an average of 38.9 The root causes can be either internal (bugs or software mis- percent in terms of the number of connections) of Internet behavior) or external (e.g., injection attacks). Diagnosis means background radiation. Address-misconfigured P2P traffic is inferences of the root causes, say, by locating the software caused by a large number of peers sending P2P file download- code that contains the bug or identifying the anti-P2P peers ing requests to a target that has never been part of the P2P that trigger the misconfiguration. network. In addition, we observe that from 2004 to 2007, We design two general principles: address-misconfigured P2P traffic increased more than 100 • P2P software testing by tracking its information flow in a percent each year (as shown in Fig. 1). To the best of our controlled environment knowledge, we are the first to discover and study address-mis- • P2P traffic measurement including both passive monitoring configured P2P traffic with large-scale datasets. and active backtracking to identify misconfiguration events For end users, such unwanted traffic affects their P2P sys- in the wild tem performance and can involve innocent users in distributed Applying the principles above, we design and implement DoS (DDoS) attacks unconsciously [2]. P2PScope. Our analysis shows that all address misconfigura- From the Internet service providers’ (ISPs’) perspective, tions are caused by resource mapping contamination, that is, such misconfigured traffic wastes Internet resources. As a first the peers returned through indexing are invalid, thus causing order estimation, extrapolating the address-misconfigured P2P a large amount of incorrect file downloading requests. Differ- traffic observed in our datasets to the whole Internet, we find ent systems have different reasons for contamination. We it consumes modest bandwidth, 7.9 Gb/s globally. Moreover, have performed root cause analysis for the two largest P2P this traffic is mostly intercontinental. Therefore, given a 10 systems, eMule and BitTorrent. 2 0890-8044/11/$25.00 © 2011 IEEE IEEE Network • March/April 2011
  • 2. LI LAYOUT 4/27/11 2:24 PM Page 3 Number of connections (millions) 14 12 scheme, we use the GQ trace only for prevalence analysis. 10 7 Address Misconfiguration Event Identification 6 To identify address-misconfigured P2P traffic, we first filter out the scanning traffic caused by botnets or worms based on 4 whether they have malicious payloads that match known 2 worm/botnet signatures or they scan a large number of IP addresses (≥ 10). We have also checked the removed traffic 0 2004 2005 2006 2007 against P2P protocol signatures, and have not found any matches. For remaining traffic, we detect P2P connections Figure 1. The total number of connections that match the P2P based on whether their payloads match P2P protocol signa- payload signatures from 2004 to 2007 (LBL). tures. After that, we aggregate the address-misconfigured P2P traffic to events. We find all the peers target a few hotspots in For eMule, one major root cause is a network byte-order Honeynets. Ninety-seven percent of peers only contact one problem in the source exchange protocol. We confirm that Honeynet IP per day (all contact less than four per day). aMule (an eMule variant) has the byte-order bug. For BitTor- Therefore, we group the P2P connections into address-mis- rent, address misconfiguration is mostly disseminated by the configuration events based on their target destination. We Peer Exchange (PEX) protocol implemented by uTorrent- mainly analyze the events with more than 100 unique sources compatible clients. We find two reasons: seen in a six hour interval. For smaller events, it is hard to • Some anti-P2P companies actively inject bogus peers profile their patterns. The event time boundaries are the time through the PEX protocol using modified Azureus. intervals that the number of unique sources reduces to less • KTorrent has a byte ordering problem. We have confirmed than one-third of the peak. the bug with the developers. The above method only conservatively estimates the lower bound of the number of P2P connections and number of events seen in the Honeynets. We address this by adding a Passive Measurement Analysis “Likely” category in Table 2 for the event in which we cannot When monitoring the Internet background radiation, we dis- identify which protocol is being used, but its traffic pattern is cover an interesting traffic pattern that has not been described similar to that of an address-misconfiguration event. These in the literature before. We call it P2P address misconfigura- “likely” events follow the same pattern—a large number of tion. In this section, we study its basic behavior through pas- sources contact a few hotspots. We believe that they are actu- sive measurement. In the next section, we further design the ally address-misconfiguration events, because there are no P2PScope system to diagnose its root causes. similar patterns known for Internet background radiation. The likely cases account for only 9.6 percent of the total events, Data Collections showing that we are able to identify most cases accurately. In our study, we mainly use three datasets collected at differ- ent Honeynet/Honeyfarm sensors. Honeynets are unused P2P System Diversity address blocks on which we deploy honeypot responders in Table 2 shows the percentage of P2P connections that match order to elicit information about incoming probes. Those hon- P2P signatures and the number of address-misconfiguration eypots are protocol simulators, which respond to incoming events found. Each row of the table is for one P2P protocol. probes with faked responses as if the responses are from real The “other P2P” category is for connections that matched machines. Honeyfarm goes one step further, channeling the P2P protocols other than the six protocols listed in Table 2. incoming traffic to virtual machines as if real machines are We find that address-misconfiguration events mainly come running at the addresses (Table 1). from six different P2P systems. BitTorrent and eMule are the most popular P2P systems, and thus also generate the most The LBL Honeynet Dataset — The LBL Honeynet sensor is at traffic and events. We also find a number of address miscon- the Lawrence Berkeley National Laboratory with five continu- figurations from other P2P software, such as Gnutella, Soriba- ous /24 IP blocks in one /16 network. We have data from 2004 da, Xunlei, and VAgaa. through 2007. The honeynet simulates the common protocols These P2P systems include centralized, decentralized dis- and acts as an echo server (sending back the same requests to tributed hash table (DHT)-based, and decentralized unstruc- the clients) for other port numbers. In 2008, traffic collection tured P2P systems. This diversity suggests that the prevalence was stopped because the Motion Picture Association of Amer- of address misconfiguration is not confined to any single type ica (MPAA) repeatedly complained that the Honeynet IPs of P2P system. were hosting their latest movies (actually false positives). The NU Honeynet Dataset — The NU Honeynet sensor at Northwestern University has 10 discontinuous /24 IP blocks LBL NU GQ within three different /8 IP prefixes. It used the same configu- ration as the LBL Honeynet until 23 August, 2007. After that, Sensor size 5 /24 10 /24 4 /16 we developed the P2P-enabled Honeynet for the P2PScope system, which simulates P2P protocols on all port numbers. Trace size 901 Gbytes 916 Gbytes 49 Gbytes The GQ Honeyfarm Dataset — We have 26 days of traffic Starting time 2004/03/07 2006/09/01 2006/01/03 from the GQ honeyfarm at the International Computer Sci- ence Institute, which was originally designed to capture worms End time 2008/01/21 2007/12/31 2006/01/28 and botnets. The GQ honeyfarm has four /16 networks in one /8 network. Since the GQ honeyfarm runs a different response Table 1. Data description. IEEE Network • March/April 2011 3
  • 3. LI LAYOUT 4/27/11 2:25 PM Page 4 event (≥ uniq P2P connections srcs) 100 by payloads comments sion or peer group): LBL NU LBL NU • Measurement, including both passive monitor- ing and active backtracking eMule 143 416 5.96% 76.52% Popular, especially in • Tracking the information flow for the suspi- Europe cious P2P software in a controlled environment BitTorrent 74 211 75.1% 19.79% Popular The first approach helps quickly identify which Gnutella 4 3 2.48% 3.65% Popular, unstructured software version, communication mechanism, or Soribada 6 0 15.8% .0001% Popular in Korea peer groups is/are potentially the sources of the Xunlei 12 0 0.34% 0% Popular in China misconfiguration. This approach also helps iden- VAgaa 1 1 0.14% 0.013% Popular in China Other P2P 0 0 0.17% 0.018% tify the possible attacks or misconfiguration injected, such as those from anti-P2P companies. The second approach can further validate mis- Likely 73 20 N/A N/A configuration caused by software bugs. Table 2. Address-misconfigured P2P traffic distribution and event distribu- After detection, we want to further diagnose tion (LBL and NU Honeynet datasets). the root causes of the misconfiguration. This diagnosis step is harder than detection, and usu- ally requires manual inspection. Information flow Characteristics of Address Misconfiguration Events tracking is still very useful in this stage. Another useful approach is to form the hypotheses regarding the root causes We also discover that such events are quite heterogeneous. and then to conduct experiments for validation. The scale of events (in terms of duration and the number of Per the principles above, P2PScope has two major subsys- unique sources involved) varies substantially. Some events are tems, detection and diagnosis, as shown in Fig. 2. as short as one to three hours; some are as long as one month. The average number of peers in eMule events is 1404 for LBL Detection Subsystem and 1028 for NU. BitTorrent events are on a smaller scale: The detection system has three modules: the average number of peers in BitTorrent events is 894 for • Passive monitoring module LBL and 291 for NU. • Backtracking module • Information flow tracking module Prevalence in Time and Space The passive monitoring module detects the address-miscon- We use the four-year LBL data to analyze the temporal trend figured P2P traffic from Honeynets. Further, the backtracking of address-misconfiguration. Since the scale of the events dif- module collects more information about the misconfigured fers substantially, instead of using the number of events, we peers and the files related to address misconfiguration. When use the total number of connections that match the P2P signa- these two measurement modules detect some suspicious P2P tures in a year as the metric for studying temporal trend. This software version through protocol analysis, the software will metric gives us the lower bound of the number of P2P con- be passed to the information flow tracking module. In the nections. Figure 1 shows the increasing trend of address-mis- information flow tracking module, we install that version of configured P2P traffic. Additionally, we analyze the annual P2P software and check its peer propagation. Finally, all col- trend of the total volume of Internet background radiation in lected information will be output to root cause diagnosis sub- the LBL sensor. The total traffic is stable while the address- system. misconfigured P2P traffic increases quickly. We also observe that address misconfiguration is prevalent Passive Monitoring Module — We find that Honeynets are across the IP space. As we mentioned earlier, the LBL and useful for monitoring the address-misconfigured P2P traffic NU Honeynets are in four different /8. In all the four /8 pre- on unused IP blocks. The Honeynets respond to the incoming fixes, we find misconfiguration traffic. We also observe similar connections and generate “faked” responses. We use Honeyd- behavior in the GQ Honeyfarm traces. Therefore, we believe based [3] lightweight Honeynets. Honeyd uses port numbers the behavior is not confined to a single subspace of the IP to identify protocols, but P2P traffic is on random port num- space. bers. To overcome this limitation, we apply a payload signature based approach. We first identify the P2P protocol based on Global Address-Misconfigured P2P Traffic Estimation payload signatures and then call appropriate P2P responders We try to estimate the percentage of address-misconfigured for further processing. Currently, we have developed BitTor- P2P traffic in Internet background radiation traffic. It is very rent and eMule protocol responders. challenging, given the limited data we have. We use the data (June–December 2007) from the LBL and NU Honeynets for Backtracking Module — To understand how peers get miscon- this analysis. We treat the percentage estimated in each /8 figured, the backtracking module tracks the peers in real time. prefix of the Honeynet sensors as an independent sample. We In P2P systems, a peer needs to download from a list of peers then use the average percentage from all the samples as a that have the given resource. Three methods can facilitate the more representative result. As a first order estimation, we find resource to peer mapping: central index servers (tracker in the average percentage of address-misconfigured P2P traffic BitTorrent terminology), distributed hash indexing, and peer in the Internet background radiation is 38.9 percent. exchange (query more peers from known peers). For eMule and BitTorrent, we implement all three approaches. In the System Design backtracking module, for a given resource we periodically query the index servers or DHT for the peer lists. Then, for General Design Principles any peer, we use peer exchange protocols to query more In general, two reasons can potentially cause P2P misconfigu- peers. The whole process stops when the total number of rations: software bugs, and external attack or misconfiguration peers remains stable. injection. Two approaches are useful for detecting misconfigu- ration (attributing the problem to a particular software ver- Information Flow Tracking Module — Whenever a certain P2P 4 IEEE Network • March/April 2011
  • 4. LI LAYOUT 4/27/11 2:25 PM Page 5 Honeynet traffic Detection Data Plane Traffic Radiation subsystem Passive monitoring module Usually, there are a number of different protocol messages involved in a P2P file sharing network. Address misconfigura- Root cause tion in all the six P2P systems that we explored has a common Information flow diagnosis tracking module subsystem feature: the traffic is caused by file downloading requests, i.e., Backtracking module data plane traffic which is directly related to data transfer. Different P2P systems use different ways for exchanging con- Index servers DHT trol plane information (e.g., index server, Peer Exchange Pro- Peer exchange tocol, and DHT), but all have the same problem of giving bogus IPs to peers. These bogus IPs propagate quickly in P2P systems and cause large amounts of address-misconfigured Figure 2. Architecture of the P2PScope system. P2P traffic. eMule Misconfiguration Diagnosis Which Peers Spread Misconfiguration? — Hypothetically, the client version is suspicious, we install it in our controlled envi- misconfigured peers that target the Honeynets might include ronment. In the controlled environment, by interpreting the anti-P2P peers. However, among 1,771,296 misconfigured protocols (index server, DHT, and peer exchange protocol) peers we observed, 99.90 percent of peers are normal peers used for peer propagation, we monitor the peers in and out (non-anti-P2P peers). through protocol analysis. If the software gives out a peer that never comes in, we know the software will cause address mis- What Is the Root Cause? — When analyzing the network-level configuration. behavior of the eMule misconfiguration, we notice that the IP addresses formed by reversing the byte order of the targets Diagnosis Subsystem were mostly alive. With the real-time P2PScope system, we Tracking down the root causes of address misconfiguration is have done two experiments to verify the byte-order problem. a challenging problem. We have developed the following tools Furthermore, we have confirmed the bug in source code. for understanding the root causes: In the first experiment, we tested 13 targeted destinations • Track the information flow within the suspicious P2P soft- in real time, and observed that 61 percent of reverse IPs were ware. indeed running eMule with the same port number of the tar- • Trace the Honeynet IP addresses propagated in the P2P gets. In the second experiment, we checked whether the systems. reverse IPs of unroutable peers observed in the peer list of a • Check the routability of the peers returned. given file ran eMule. We aggregated the result of 100 files • Check whether the peers are on the anti-P2P IP lists and found in real time. 10.3 percent of backtracked peer IPs are analyze their behavior. unroutable. Among them, we observe 69.6 percent whose • Check the reverse byte order of peer IPs, because it is one reverse IPs run eMule. This, again, is more strong evidence in of the most frequent mistakes in networked system imple- favor of the byte order hypothesis. mentations. We further located the bug in the source code of aMule, a To understand why the bogus peers can be produced by the popular eMule alternative. eMule has a complex client ID to suspicious P2P software, we trace how the bogus peers are IP mapping system called HybridID. In this mapping system, produced. Another useful toolkit is the routability checking. to represent IP in index servers or DHTs different byte orders In [4], Cooke et al. report 66.8 percent of all possible IPv4 are used. The source (peer) exchange protocol needs to con- addresses are unroutable. If we assume that the bogus peers sider both cases. However, in aMule with versions before are produced randomly, we would find a reasonably large per- 2.1.0, this problem was ignored, causing the byte-order bug. cent of unroutable bogus peers. If the percentage is small, it implies that bogus peers are not produced randomly. The per- How Is the Misconfiguration Disseminated? — eMule offers centage of unroutable peers can serve as a low bound of the three ways to obtain peers for a file: index servers, eMule bogus peers. Furthermore, since we suspect the anti-P2P com- source exchange protocol, and eMule Kademlia DHT. We panies, which try to take down the P2P networks, might be would like to know which one disseminates the misconfigura- related to address misconfiguration, we also check whether tion. Since we have not observed any misconfigured DHT the peers belong to the anti-P2P IP list. traffic in the approximately 2-Tbyte trace we have, we believe With the above tools, we mainly try to answer the following the DHT does not have the byte-order problem. To attribute questions: the problem to index servers or the source exchange protocol, • Which peers spread misconfiguration? we queried 100 files and got a total of 37,079 unique peers. • What is the root cause? We find that 10.7 percent of the peers from the source • How is misconfiguration disseminated? exchange have Honeynet IPs in their neighbor list, but the • What is the percentage of bogus peers in misconfigured P2P index servers never return Honeynet IPs. We have also done a networks? routability check, and found that 12.8 percent of the peers returned by the source exchange protocol are unroutable; Detection and Diagnosis Results none of the peers returned by the index servers is unroutable. We have deployed P2PScope in the NU Honeynet since Also, 15.7 percent of peers from source exchange have reverse August 2007. IPs that run eMule; none of the peers returned by index We first study the general characteristics that are related to servers has such behavior. Therefore, it is not an index server the root causes of all P2P systems. Then we focus on two case but the source exchange protocol that is used for disseminat- studies for eMule and BitTorrent, the most popular P2P soft- ing the misconfiguration. ware generating most of the address-misconfigured P2P traf- fic. What Is the Percentage of Bogus Peers in the Misconfigured P2P Networks? — We try to estimate the percentage of peers IEEE Network • March/April 2011 5
  • 5. LI LAYOUT 4/27/11 2:25 PM Page 6 Normal peer events Anti-P2P peer events Number of events 136-NU, 36-LBL 75-NU, 39-LBL a small number of networks, which belong to server hosting companies. The metric source cor- 90% uTorrent relation measures the degree to which the sources Client software — NU 100% Azureus 10% others of different events are overlapped. Ninety-seven percent of pairs of anti-P2P peer events share 57% Bit Spirit, more than 25 percent of sources. However, the Client software — LBL 31% BitComet 100% Azureus normal peer events are little coordinated. 12% others What Is the Root Cause? — We have not cap- Message length Variable lengths Similar lengths tured any real-time anti-P2P peer events. Thus, we cannot diagnose their root causes. In this sec- Files queried Diverse Latest music/movies tion and later, we mainly focus on normal peer events. Normal peer events can still be caused by Avg. no. of connections/IP 25 400 anti-P2P peers, as discussed next. Source correlation Little coordinated Highly coordinated Root Cause I: Anti-P2P Companies Deliberately Inject Bogus Peers in P2P Systems — Our Arrival and departure Gradually All together P2PScope system tracks the misconfigured peers in real time. We detect some interesting anti-P2P Average event duration 106.1 hours 4.5 hours hosts coming from the same server farm, 72.172.90/24, owned by an anti-P2P company IP distribution Diverse Small number of /24 called Artist-direct. Through further analysis on the server farm, we discover that each of these IPs Table 3. Comparison between normal peer events and anti-P2P peer events. runs hundreds of BitTorrent clients simultaneous- ly. All of the clients run a version of Azureus 2500. Moreover, in the normal case, the peers (from the source exchange protocol) that have the byte-order only respond to a peer exchange message when they download problem. This is challenging because there are certain cases, (or own) the file so that they can return the other peers from as we show below, in which we cannot determine whether which they download. However, these anti-P2P clients respond they have a byte-order problem. Instead, we estimate the to any arbitrary file; that is, they declare they have any file the lower and upper bounds. remote peer wants and exchange the bogus peers with it. Actu- If a peer is not running eMule but its reverse IP is, we ally, they also respond to any random file ID even when no file believe it definitely has a byte-order problem. Such peers are is associated with that value. Artist-direct might have modified 12.7 percent of the total peers (37,079 peers for 100 files). the Azureus client to implement this. If a peer is unroutable but its reverse IP is routable, we We have checked the neighbor lists returned by these anti- believe it is likely to have a byte-order problem. Because for P2P IPs. Most returned peers are bogus peers. Among the those peers we cannot verify whether their reverse IPs run peers, 56.7 percent are unroutable, 1.8 percent are from the eMule directly, we only count them in the upper bound, which same serverfarm, and we could not connect to any of the is 25 percent of the total peers. remaining 41.5 percent of the peers. We suspect anti-P2P companies want to slow down the downloading process by BitTorrent Misconfiguration Diagnosis supplying bogus peers randomly. Which Peers Spread Misconfiguration? — When analyzing the The anti-P2P related normal peers are the normal peers that events discussed earlier, we found two groups of BitTorrent query the files in which anti-P2P peers are interested. Most of events. In one group, the majority of peers (> 90 percent) are these peers (> 90 percent) are within two IP hops of anti- anti-P2P peers. We call them anti-P2P events. In the other P2P peers. The anti-P2P related normal peers are 19.7 per- group, the majority of peers are normal peers. We call them cent of the total misconfigured peers and are responsible for normal events. There are no other events between these. 20.7 percent of the probes sent to the Honeynet. Therefore, Anti-P2P events surprise us, since they show that the peers the anti-P2P companies are responsible for 1/5 of the traffic from the anti-P2P companies somehow get themselves solely observed. Since it is impossible to identify all the anti-P2P misconfigured. peers, this estimate is quite conservative. As shown in Table 3, we compare the properties of these two types of events and discover that their characteristics are Root Cause II: The Byte-Order Problem of KTorrent Clients — indeed different. Peers from anti-P2P peer events all use We further actively track peers related to the files requested Azureus. We conjecture that anti-P2P companies modify by the misconfigured peers. We find that a large portion of Azureus and add their functionalities because Azureus is a unroutable peers are supplied by the KTorrent peers, which popular open source BitTorrent client with a nice plug-in are suspicious. We also study uTorrent and Azureus, since architecture. On the other hand, peers from normal peer they are the most popular clients. We set up a controlled envi- events run on a diverse set of clients. In addition, the peers in ronment to test these three clients. We evaluate the top seven anti-P2P peer events are highly coordinated. They start con- torrent files seen in Honeynets. tacting the Honeynet IPs at almost the same time. They also We classify the peers seen by clients into two types: depart at the same time. By contrast, the peers in normal peer • Incoming peers: peers that come to the client events arrive and depart randomly and gradually. The mes- • Outgoing peers: peers that are sent out by the client sages sent in anti-P2P peer events are almost the same; they Outgoing — incoming peers are the peers that go out from the have a similar message length and even a similar bitmap that client but never come in. These are the bogus peers created annotates the content availability distribution. To some extent, by the client itself. Among all outgoing KTorrent peers, 56 the group of anti-P2P peers is like a clone army with identical percent of them are bogus. Thus, it shows that KTorrent is soldiers. Moreover, the peers in anti-P2P peer events are from one of the sources of misconfigurations. µTorrent and Azureus 6 IEEE Network • March/April 2011
  • 6. LI LAYOUT 4/27/11 2:25 PM Page 7 void UTPex::encode(…) // UTPEX.CPP { ... // Use KDE API to get an IP in Network Byte Order. bilities and misconfiguration in Border Gateway Proto- // WriteUint32 swap the byte order again!!! WriteUint32(buf,size,addr.ipAddress().IPv4Addr()); col (BGP) [6]. However, we find little literature on the …} study of P2P misconfiguration. void WriteUint32(Uint8* buf,Uint32 off,Uint32 val) //FUNCTIONS.CPP { P2P Measurement // swap the byte order Most existing measurement studies of P2P networks are buf[off + 0] = (Uint8) ((val & 0xFF000000) >> 24); based on the measurement of normal P2P traffic [7]. On buf[off + 1] = (Uint8) ((val & 0x00FF0000) >> 16); the other hand, we measure the abnormal P2P traffic buf[off + 2] = (Uint8) ((val & 0x0000FF00) >> 8); observed in Honeynets. This is also observed by Yeg- buf[off + 3] = (Uint8) (val & 0x000000FF); neswaran et al. in [1]. Their work focuses on the compar- } ison of network-level characteristics of P2P address misconfiguration with botnets and worms, and places an Figure 3. Code snippet of KTorrent byte order bug. emphasis on accurately classifying botnet sweeps and worm outbreaks. We mainly study the prevalence and trend of address-misconfigured P2P traffic, and its root do not have any outgoing — incoming peers. causes. Liang et al. show that P2P systems like Fasttrack and After that, we study the source code of KTorrent and pin- Overnet are vulnerable to index poisoning attacks [8]. In con- point the exact bug. The problem is also a byte-order prob- trast, we investigate the root causes for both intentional and lem. In Fig. 3, we show the code snippet related to this bug. unintentional index misconfiguration. The uTorrent PEX protocol requires the IP addresses in peer exchange messages to be in network byte order. When KTor- Distributed System Debugging rent sends out the peer exchange messages, in its encode func- Distributed systems are extremely hard to debug. People have tion it reverses the byte order twice, so finally IP addresses proposed various ways to debug distributed systems, including become host byte order again. addr.ipAddress().IPv4Addr() is replay-based predicate checking [9], online monitoring [10], a KDE library method that returns IP addresses in network log forensic analysis [11], network trace based inference [12], byte order. However, in the WriteUint32 function implemented and model checking [13]. Although some of the approaches by KTorrent, it reverses the byte order again, and thus causes might be useful for the root cause diagnosis of P2P address the problem. To our knowledge, we have not found anyone misconfiguration, the following properties make leveraging else reporting this bug. any of the above existing techniques hard. First, the P2P sys- tems we diagnose are heterogeneous. Multiple different How Is the Misconfiguration Disseminated? —Similar to the implementations exist for a given P2P protocol such as Bit- eMule study, we find that BitTorrent DHT and Index servers Torrent. Therefore, it is hard to apply model checking tech- (trackers) are unlikely to spread or to generate misconfigura- niques. Second, the peers are out of our control. We cannot tions. Unlike eMule, BitTorrent has several incompatible peer modify the peers to add the logging functionality required by exchange protocols proposed by popular BitTorrent clients, replay-based predicate checking, online monitoring, and log such as µTorrent, Azureus, and BitComet. The µTorrent PEX forensic analysis. It is also very difficult to log network traces protocol is the most popular, and we find it is the source for from remote peers. Third, the P2P systems we monitor are misconfiguration dissemination. very large, consisting of millions of peers. All the existing To understand which µTorrent PEX compatible clients techniques cannot handle this scale yet. propagate the misconfigurations, we study the behavior of dif- ferent clients. We mainly study, when a client obtains peers from uTorrent PEX protocol, how it checks the availability of Conclusions the peers before it further propagates the peers. We classify In this article, we first discover a new traffic pattern, P2P the behavior into three categories: address misconfiguration. Its traffic is about 38.9 percent of • No checking Internet background radiation and is growing quickly. Such • Requiring peers to respond to SYN-ACK (only active port traffic is harmful for both end users and ISPs. As the second regardless of whether it is running BitTorrent) contribution, we design P2PScope to understand the root • Requiring peers to respond to BT-HANDSHAKE messages causes. We find that data plane traffic radiation is the com- KTorrent propagates anything that comes to it without even mon characteristic of the six P2P systems. For eMule, the checking for an open port. µTorrent propagates anything that problem is mainly caused by a byte-order problem. For Bit- has an open port. Azureus only propagates peers that are Torrent, we find two reasons: active BitTorrent clients. Therefore, KTorrent will fully prop- • Anti-P2P companies deliberately injecting invalid peers agate the bogus peers. µTorrent will propagate the bogus • KTorrent byte-order bug peers if they happen to have the port open. Azureus usually will not propagate any bogus peers. To reduce bogus peers, Acknowledgments we believe all the clients should have strict checking like Our thanks to Vern Paxson and the Lawrence Berkeley Azureus. In this way, the address-misconfigured P2P traffic National Laboratory for support of the LBL Honeynet opera- can be greatly reduced. tions. This work was supported in part by NSF Awards 0627715, DoD Young Investigator Award FA9550-07-1-0074, Related Work and DoE Career award DE-FG02-05ER25692/A001. Opin- ions, findings, and conclusions or recommendations are those Misconfiguration Studies of the authors, and do not necessarily reflect the views of the Misconfiguration is widely spread across different network funding sources. systems on the Internet. Labovitz et al. studied wide-area backbone failures and concluded that misconfiguration could be responsible for 12 percent of the incidents [5]. There has also been much recent work that considers the various insta- IEEE Network • March/April 2011 7
  • 7. LI LAYOUT 4/27/11 2:25 PM Page 8 References [1] V. Yegneswaran et al., “Using Honeynets for Internet Situational Awareness,” Proc. ACM Hotnets IV, 2005. [2] Ruben Torres et al., “Inferring Undesirable Behavior from P2P Traffic Analy- sis,” Proc. ACM SIGMETRICS, 2009. [3] N. Provos, “A Virtual Honeypot Framework,” Proc. USENIX Security, 2004. [4] E. Cooke et al., “The Dark Oracle: Perspective-Aware Unused and Unreach- able Address Discovery,” Proc. USENIX NSDI, 2006. [5] C. Labovitz et al., “Experimental Study of Internet Stability and Backbone Failures,” Proc. FTCS: Int’l. Symp. Fault-Tolerant Computing, 1999. [6] R. Mahajan, D. Wetherall, and T. Anderson, “Understanding BGP Misconfig- uration,” Proc. ACM SIGCOMM, 2002. [7] K. Gummadi et al., “Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload,” Proc. ACM SOSP, 2003. [8] J. Liang et al., “The Index Poisoning Attack in P2P File-Sharing Systems,” Proc. IEEE INFOCOM, 2006. [9] D. Geels et al., “Friday: Global Comprehension for Distributed Replay,” Proc. NSDI, 2007. [10] X. Liu et al., “D3s: Debugging Deployed Distributed Systems,” Proc. NSDI, 2008. [11] M. K. Aguilera et al., “Performance Debugging for Distributed Systems of Black Boxes,” Proc. ACM SOSP, 2003. [12] P. Bahl et al., “Towards Highly Reliable Enterprise Network Services Via Inference of Multi-Level Dependencies,” Proc. ACM SIGCOMM, 2007. [13] C. Killian et al., “Finding Liveness Bugs in System Code,” Proc. NSDI, 2007. Biographies ZHICHUN LI (zhichun@nec-labs.com) is a research staff member with NEC Labora- tories America, Inc. He received his Ph.D. in computer science from Northwest- ern University in December 2009. His research focuses on network security, system security, network measurement and monitoring, and distributed system diagnosis. A NUP G OYAL is an engineer with Yahoo Search since October 2008. He received his M.S. degree in 2008 from the Computer Science and Engineering Department at Northwestern University and his B.Tech degree from the Indian Institute of Technology, Kharagur. YAN CHEN is an associate professor in the Department of Electrical Engineering and Computer Science at Northwestern University, Evanston, Illinois. He got his Ph.D. in computer science from the University of California at Berkeley in 2003. His research interests include network measurement, monitoring, and security, and P2P systems. He won the DOE Early CAREER award in 2005, and Microsoft Trustworthy Computing Awards in 2004 and 2005. ALEKSANDAR KUZMANOVIC is an associate professor in the Department of Electri- cal Engineering and Computer Science at Northwestern University. He received his B.S. and M.S. degrees from the University of Belgrade, Serbia, in 1996 and 1999, respectively. He received the Ph.D. degree from Rice University in 2004. His research interests are in the area of computer networking with emphasis on design, measurements, analysis, denial-of-service resiliency, and prototype imple- mentation of protocols and algorithms for the Internet. He received the National Science Foundation CAREER Award in 2008. 8 IEEE Network • March/April 2011