Design and Evaluation of a Proxy Cache for Peer-to-Peer Traffic

Mohamed Hefeeda, Senior Member, IEEE, Cheng-Hsin Hsu, Student Member, IEEE, and Kianoosh Mokhtarian, Student Member, IEEE

Abstract—Peer-to-peer (P2P) systems generate a major fraction of the current Internet traffic, and they significantly increase the load on ISP networks and the cost of running and connecting customer networks (e.g., universities and companies) to the Internet. To mitigate these negative impacts, many previous works in the literature have proposed caching of P2P traffic, but very few (if any) have considered designing a caching system to actually do it. This paper demonstrates that caching P2P traffic is more complex than caching other Internet traffic, and it needs several new algorithms and storage systems. Then, the paper presents the design and evaluation of a complete, running, proxy cache for P2P traffic, called pCache. pCache transparently intercepts and serves traffic from different P2P systems. A new storage system is proposed and implemented in pCache. This storage system is optimized for storing P2P traffic, and it is shown to outperform other storage systems. In addition, a new algorithm to infer the information required to store and serve P2P traffic by the cache is proposed. Furthermore, extensive experiments to evaluate all aspects of pCache using actual implementation and real P2P traffic are presented.

Index Terms—Caching, Peer-to-Peer Systems, File Sharing, Storage Systems, Performance Evaluation

• M. Hefeeda is with the School of Computing Science, Simon Fraser University, 250-13450 102nd Ave, Surrey, BC V3T 0A3, Canada. Email:
• C. Hsu is with Deutsche Telekom R&D Lab USA, 5050 El Camino Real Suite 221, Los Altos, CA 94022.
• K. Mokhtarian is with Mobidia Inc., 230-10451 Shellbridge Way, Richmond, BC V6X 2W8, Canada.
This work is partially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

1 INTRODUCTION

File-sharing using peer-to-peer (P2P) systems is currently the killer application for the Internet. A huge amount of traffic is generated daily by P2P systems [1]–[3]. This huge amount of traffic costs university campuses thousands of dollars every year. Internet service providers (ISPs) also suffer from P2P traffic [4], because it increases the load on their routers and links. Some ISPs shape or even block P2P traffic to reduce the cost. This may not be possible for some ISPs, because they fear losing customers to their competitors. To mitigate the negative effects of P2P traffic, several approaches have been proposed in the literature, such as designing locality-aware neighbor selection algorithms [5] and caching of P2P traffic [6]–[8]. We believe that caching is a promising approach to mitigate some of the negative consequences of P2P traffic, because objects in P2P systems are mostly immutable [1] and the traffic is highly repetitive [9]. In addition, caching does not require changing P2P protocols and can be deployed transparently from clients. Therefore, ISPs can readily deploy caching systems to reduce their costs. Furthermore, caching can co-exist with other approaches, e.g., enhancing P2P traffic locality, to address the problems created by the enormous volume of P2P traffic.

In this paper, we present the design, implementation, and evaluation of a proxy cache for P2P traffic, which we call pCache. pCache is designed to transparently intercept and serve traffic from different P2P systems, while not affecting traffic from other Internet applications. pCache explicitly considers the characteristics and requirements of P2P traffic. As shown by numerous previous studies, e.g., [1], [2], [8], [10], [11], network traffic generated by P2P applications has different characteristics than traffic generated by other Internet applications. For example, object size, object popularity, and the connection pattern in P2P systems are quite different from their counterparts in web systems. Most of these characteristics impact the performance of caching systems, and therefore, they should be considered in their design. In addition, as will be demonstrated in this paper, designing proxy caches for P2P traffic is more complex than designing caches for web or multimedia traffic, because of the multitude and diverse nature of existing P2P systems. Furthermore, P2P protocols are not standardized and their designers did not provision for the potential caching of P2P traffic. Therefore, even deciding whether a connection carries P2P traffic and, if so, extracting the relevant information for the cache to function (e.g., the requested byte range) are non-trivial tasks.

The main contributions of this paper are as follows.

• We propose a new storage system optimized for P2P proxy caches. The proposed system efficiently serves requests for object segments of arbitrary lengths, and it has a dynamic segment merging scheme to minimize the number of disk I/O operations. Using traces collected from a popular P2P system, we show that the proposed storage system is much more efficient in handling P2P traffic than storage systems designed for web proxy caches. More specifically, our experimental results indicate that the proposed storage system is about five times more efficient than the storage system used in a famous web proxy implementation.

• We propose a new algorithm to infer the information required to store and serve P2P traffic by the cache. This algorithm is needed because some P2P systems (e.g., BitTorrent) maintain information about the exchanged objects in metafiles that are held by peers, and peers request data relative to information in these metafiles, which are not known to the cache. Our inference algorithm is efficient and provides a quantifiable confidence on the returned results. Our experiments with real BitTorrent clients running behind our pCache confirm our theoretical analysis and show that the proposed algorithm returns correct estimates in more than 99.7% of the cases. Also, the ideas of the inference algorithm are general and could be useful in inferring other characteristics of P2P traffic.

• We demonstrate the importance and difficulty of providing full transparency in P2P proxy caches, and we present our solution for implementing it in the Linux kernel. We also propose efficient splicing of non-P2P connections in order to reduce the processing and memory overhead on the cache.

• We conduct an extensive experimental study to evaluate the proposed pCache system and its individual components using actual implementation in two real P2P networks: BitTorrent and Gnutella, which are currently among the most popular P2P systems. (A recent white paper indicates the five most popular P2P protocols are: BitTorrent, eDonkey, Gnutella, DirectConnect, and Ares [12].) BitTorrent and Gnutella are chosen because they are different in their design and operation: the former is swarm-based and uses built-in incentive schemes, while the latter uses a two-tier overlay network. Our results show that our pCache is scalable and it benefits both the clients and the ISP in which it is deployed, without hurting the performance of the P2P networks. Specifically, clients behind the cache achieve much higher download speeds than other clients running in the same conditions without the cache. In addition, a significant portion of the traffic is served from the cache, which reduces the load on the expensive WAN links for the ISP. Our results also show that the cache does not reduce the connectivity of clients behind it, nor does it reduce their upload speeds. This is important for the whole P2P network, because reduced connectivity could lead to decreased availability of peers and the content stored on them, while reduced upload speeds could degrade the P2P network scalability.

In addition, this paper summarizes our experiences and lessons learned during designing and implementing a running system for caching P2P traffic, which was demonstrated at [13]. These lessons could be of interest to other researchers and to companies developing products for managing P2P traffic. We make the source code of pCache (more than 11,000 lines of C++ code) and our P2P traffic traces available to the research community [14]. Because it is open-source, pCache could stimulate more research on developing methods for effective handling of P2P traffic in order to reduce its negative impacts on ISPs and the Internet.

The rest of this paper is organized as follows. In Sec. 2, we summarize the related work. Sec. 3 presents an overview of the proposed pCache system. This is followed by separate sections describing different components of pCache. Sec. 4 presents the new inference algorithm. Sec. 5 presents the proposed storage management system. Sec. 6 explains why full transparency is required in P2P proxy caches and describes our method to achieve it in pCache. It also explains how non-P2P connections are efficiently tunneled through pCache. We experimentally evaluate the performance of pCache in Sec. 7. Finally, in Sec. 8, we conclude the paper and outline possible extensions for pCache.

2 RELATED WORK

P2P Traffic Caching: Models and Systems. The benefits of caching P2P traffic have been shown in [9] and [4]. Object replacement algorithms especially designed for proxy caches of P2P traffic have also been proposed in [6] and in our previous works [8], [11]. The above works do not present the design and implementation of an actual proxy cache for P2P traffic, nor do they address storage management issues in such caches. The authors of [7] propose using already-deployed web caches to serve P2P traffic. This, however, requires modifying the P2P protocols to wrap their messages in HTTP format and to discover the location of the nearest web cache. Given the distributed and autonomous nature of the communities developing P2P client software, incorporating these modifications into actual clients may not be practical. In addition, several measurement studies have shown that the characteristics of P2P traffic are quite different from those of web traffic [1], [2], [8], [10], [11]. These different characteristics indicate that web proxy caches may yield poor performance if they were to be used for P2P traffic. The experimental results presented in this paper confirm this intuition.

In order to store and serve P2P traffic, proxy caches need to identify connections belonging to P2P systems and extract needed information from them. Several previous works address the problem of identifying P2P traffic, including [15]–[17]. The authors of [15] identify P2P traffic based on application signatures appearing in the TCP stream. The authors of [17] analyze the behavior of different P2P protocols, and identify patterns specific to each protocol. The authors of [16] identify P2P traffic using only transport layer information. A comparison among different methods is conducted in [18]. Unlike our work, all previous identification approaches only detect the presence of P2P traffic: they just decide whether a packet or a stream of packets belongs to one of the known P2P systems. This is useful in applications such as traffic shaping and blocking, capacity planning, and service differentiation. P2P traffic caching, however, does need to go beyond just identifying P2P traffic; it requires not only the exchanged object ID, but also the requested byte range of that object. In some P2P protocols, this information can easily be obtained by reading a few bytes from the payload, while in others it is not straightforward.
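To make the signature-based approach concrete, the check can be as simple as matching the first bytes of a connection against known handshakes: a BitTorrent handshake begins with the byte 0x13 followed by the string "BitTorrent protocol", and a Gnutella connection opens with "GNUTELLA CONNECT". The sketch below is our own illustration of this idea, not code from pCache; the function and enum names are ours.

```cpp
#include <cassert>
#include <string>

enum class Proto { Unknown, BitTorrent, Gnutella };

// Match the first bytes of a connection against known handshake
// signatures. kBt is built by concatenation so that "\x13B" is not
// parsed as a single hex escape.
Proto identify(const std::string& head) {
  static const std::string kBt = std::string("\x13") + "BitTorrent protocol";
  static const std::string kGnutella = "GNUTELLA CONNECT";
  if (head.compare(0, kBt.size(), kBt) == 0) return Proto::BitTorrent;
  if (head.compare(0, kGnutella.size(), kGnutella) == 0) return Proto::Gnutella;
  return Proto::Unknown;
}
```

A real identifier must also buffer enough of the stream before deciding and fall back to regular forwarding for non-P2P connections.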
Several P2P caching products have been introduced to the market. Oversi's OverCache [19] implements P2P caching, but takes a quite different approach compared to pCache. An OverCache server participates in P2P networks and only serves peers within the ISP. This approach may negatively affect fairness in P2P networks, because peers in ISPs with OverCache deployed would never contribute as they can always receive data from the cache servers without uploading to others. This in turn degrades the performance of P2P networks. PeerApp's UltraBand [20] supports multiple P2P protocols, such as BitTorrent, Gnutella, and FastTrack. These commercial products demonstrate the importance and timeliness of the problem addressed in this paper. The details of these products, however, are not publicly available, thus the research community cannot use them to evaluate new P2P caching algorithms. In contrast, the pCache system is open-source and could be used to develop algorithms for effective handling of P2P traffic in order to reduce loads on ISPs and the Internet. That is, we develop the pCache software as a proof of concept, rather than yet another commercial P2P cache. Therefore, we do not compare our pCache against these commercial products.

Storage Management for Proxy Caches. Several storage management systems have been proposed in the literature for web proxy caches [21]–[24] and for multimedia proxy caches [25]–[27]. These systems offer services that may not be useful for P2P proxy caches, e.g., minimizing startup delay and clustering of multiple objects coming from the same web server, and they lack other services, e.g., serving partial requests with arbitrary byte ranges, which are essential in P2P proxy caches. We describe some examples to show why such systems are not suitable for P2P caches.

Most storage systems of web proxy caches are designed for small objects. For example, the Damelo system [28] supports objects that have sizes less than a memory page, which is rather small for P2P systems. The UCFS system [23] maintains data structures to combine several tiny web objects together in order to fill a single disk block and to increase disk utilization. This adds overhead and is not needed in P2P systems, because a segment of an object is typically much larger than a disk block. Clustering of web objects downloaded from the same web server is also common in web proxy caches [22], [24]. This clustering exploits the temporal correlation among these objects in order to store them near to each other on the disk. This clustering is not useful in P2P systems, because even a single object is typically downloaded from multiple senders.

Proxy caches for multimedia streaming systems [25]–[27], on the other hand, could store large objects and serve byte ranges. Multimedia proxy caches can be roughly categorized into four classes [25]: sliding-interval, prefix, segment-based, and rate-split caching. Sliding-interval caching employs a sliding window for each cached object to take advantage of the sequential access pattern that is common in multimedia systems. Sliding-interval caching is not applicable to P2P traffic, because P2P applications do not request segments in a sequential manner. Prefix caching stores the initial portion of multimedia objects to minimize client start-up delays. P2P applications seek shorter total download times rather than start-up delays. Segment-based multimedia caching divides a multimedia object into segments using segmentation strategies such as uniform, exponential, and frame-based. P2P proxy caches do not have the freedom to choose a segmentation strategy; it is imposed by P2P software clients. Rate-split caching employs scalable video coding that encodes a video into several substreams, and selectively caches some of these substreams. This requires scalable video coding structures, which renders rate-split caching useless for P2P applications.

Fig. 1. The design of pCache. (Components shown: the Transparent Proxy and P2P Traffic Identifier at the gateway router; the Connection Manager; the P2P Traffic Processor with its Parser, Composer, and Analyzer; and the Storage System Manager with its block allocation method, in-memory structures, and replacement policy.)

3 OVERVIEW OF PCACHE

The proposed pCache is to be used by autonomous systems (ASes) or ISPs that are interested in reducing the burden of P2P traffic. Caches in different ASes work independently from each other; we do not consider cooperative caching in this paper. pCache would be deployed at or near the gateway router of an AS. The main components of pCache are illustrated in Fig. 1. At a high level, a client participating in a P2P network issues a request to download an object. This request is transparently intercepted by pCache. If the requested object or parts of it are stored in the cache, they are served to the requesting client. This saves bandwidth on the external (expensive) links to the Internet. If no part of the requested object is found in the cache, the request is forwarded to the P2P network. When the response comes back, pCache may store a copy of the object for future requests from other clients in its AS. Clients inside the AS as well as external clients are not aware of pCache, i.e., pCache is fully transparent in both directions.

As shown in Fig. 1, the Transparent Proxy and P2P Traffic Identifier components reside on the gateway router. They transparently inspect traffic going through the router and forward only P2P connections to pCache. Traffic that does not belong to any P2P system is processed by the router in the regular way and is not affected by the presence of pCache. Once a connection is identified as belonging to a P2P system, it is passed to the Connection Manager, which coordinates different components of pCache to store and serve
requests from this connection. pCache has a custom-designed Storage System optimized for P2P traffic. In addition, pCache needs to communicate with peers from different P2P systems. For each supported P2P system, the P2P Traffic Processor provides three modules to enable this communication: Parser, Composer, and Analyzer. The Parser performs functions such as identifying control and payload messages, and extracting messages that could be of interest to the cache such as object request messages. The Composer constructs properly-formatted messages to be sent to peers. The Analyzer is a placeholder for any auxiliary functions that may need to be performed on P2P traffic from some systems. For example, in BitTorrent the Analyzer infers information (piece length) needed by pCache that is not included in messages exchanged between peers.

The design of the proposed P2P proxy cache is the result of several iterations and refinements based on extensive experimentation. Given the diverse and complex nature of P2P systems, proposing a simple, well-structured design that is extensible to support various P2P systems is indeed a nontrivial systems research problem. Our running prototype currently serves BitTorrent and Gnutella traffic at the same time. To support a new P2P system, two things need to be done: (i) installing the corresponding application identifier in the P2P Traffic Identifier, and (ii) loading the appropriate Parser, Composer, and optionally Analyzer modules of the P2P Traffic Processor. Both can be done at run-time without recompiling or impacting other parts of the system.

Finally, we should mention that the proxy cache design in Fig. 1 does not require users of P2P systems to perform any special configurations of their client software, nor does it need the developers of P2P systems to cooperate and modify any parts of their protocols: it caches and serves current P2P traffic as is. Our design, however, can further be simplified to support cases in which P2P users and/or developers may cooperate with ISPs for mutual benefits—a trend that has recently seen some interest with projects such as P4P [29], [30]. For example, if users were to configure their P2P clients to use proxy caches in their ISPs listening on a specific port, the P2P Traffic Identifier would be much simpler.

4 P2P TRAFFIC IDENTIFIER AND PROCESSOR

This section describes the P2P Traffic Identifier and Processor components of pCache, and presents a new algorithm to infer information needed by the cache to store and serve P2P traffic.

4.1 Overview

The P2P Traffic Identifier determines whether a connection belongs to any P2P system known to the cache. This is done by comparing a number of bytes from the connection stream against known P2P application signatures. The details are similar to the work in [15], and therefore omitted. We have implemented identifiers for BitTorrent and Gnutella.

To store and serve P2P traffic, the cache needs to perform several functions beyond identifying the traffic. These functions are provided by the P2P Traffic Processor, which has three components: Parser, Composer, and Analyzer. By inspecting the byte stream of the connection, the Parser determines the boundaries of messages exchanged between peers, and it extracts the request and response messages that are of interest to the cache. The Parser returns the ID of the object being downloaded in the session, as well as the requested byte range (start and end bytes). The byte range is relative to the whole object. The Composer prepares protocol-specific messages, and may combine data stored in the cache with data obtained from the network into one message to be sent to a peer. While the Parser and Composer are mostly implementation details, the Analyzer is not. The Analyzer contains an algorithm to infer information that is required by the cache to store and serve P2P traffic, but is not included in the traffic itself. This is not unusual (as demonstrated below for the widely-deployed BitTorrent), since P2P protocols are not standardized and their designers did not conceive/provision for caching P2P traffic. We propose an algorithm to infer this information with a quantifiable confidence.

4.2 Inference Algorithm for Information Required by the Cache

The P2P proxy cache requires the requested byte range to be specified such that it can correctly store and serve segments. Some P2P protocols, most notably BitTorrent, do not provide this information in the traffic itself. Rather, they indirectly specify the byte range, relative to information held only by end peers and not known to the cache. Thus, the cache needs to employ an algorithm to infer the required information, which we propose in this subsection. We describe our algorithm in the context of the BitTorrent protocol, which is currently the most-widely used P2P protocol [12]. Nonetheless, the ideas and analysis of our algorithm are fairly general and could be applied to other P2P protocols, after collecting the appropriate statistics and making minor adjustments.

Objects exchanged in BitTorrent are divided into equal-size pieces. A single piece is downloaded by issuing multiple requests for byte ranges, i.e., segments, within the piece. Thus, a piece is composed of multiple segments. In request messages, the requested byte range is specified by an offset within the piece and the number of bytes in the range. Peers know the length of the pieces of the object being downloaded, because it is included in the metafile (torrent file) held by them. The cache needs the piece length for three reasons. First, the piece length is needed to perform segment merging, which can reduce the overhead on the cache. For example, assume the cache has received byte range [0, 16] KB of piece 1 and range [0, 8] KB of piece 2; without knowing the precise piece length, the cache cannot merge these two byte ranges into a continuous byte range. Second, the piece length is required to support cross-torrent caching of the same content, as different torrent files can have different piece lengths. Third, the piece length is needed for cross-system caching. For example, a file cached for BitTorrent networks may be used to serve Gnutella users requesting the same file, while the Gnutella protocol has no concept of piece length. One way for the cache to obtain this information is to capture metafiles and match them with objects. This, however, is not always possible, because peers frequently receive metafiles by various means/applications including emails and web downloads.
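To illustrate why the cache needs the piece length: once it is known, a BitTorrent-style request, which names a piece index, an offset within that piece, and a size, maps directly to an absolute byte range in the object. A minimal sketch of that conversion follows; the struct and function names are ours, not from the pCache source.

```cpp
#include <cassert>
#include <cstdint>

struct ByteRange {
  uint64_t start, end;  // [start, end), relative to the whole object
};

// Map a request (piece index, offset within the piece, number of bytes)
// to an absolute byte range, given the inferred piece length.
ByteRange absoluteRange(uint64_t piece, uint64_t offset, uint64_t size,
                        uint64_t pieceLen) {
  uint64_t start = piece * pieceLen + offset;
  return {start, start + size};
}
```

With a wrong piece length, every range for piece indexes above zero lands at the wrong absolute position, which is why the inference below is needed before segments are stored.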
Our algorithm infers the piece length of an object when the object is first seen by the cache. The algorithm monitors a few segment requests, and then it makes an informed guess about the piece length. To design the piece length inference algorithm, we collected statistics on the distribution of the piece length in actual BitTorrent objects. We wrote a script to gather and analyze a large number of torrent files. We collected more than 43,000 unique torrent files from popular torrent websites on the Internet. Our analysis revealed that more than 99.8% of the pieces have sizes that are powers of 2. The histogram of the piece length distribution is given in Fig. 2, which we will use in the algorithm. Using these results, we define the set of all possible piece lengths as S = {2^k_min, 2^(k_min+1), ..., 2^k_max}, where 2^k_min and 2^k_max are the minimum and maximum possible piece lengths. We denote by L the actual piece length that we are trying to infer. A request for a segment comes in the form {offset, size} within this piece, which tells us that the piece length is at least as large as offset + size. There are multiple requests within the piece. We denote each request by x_i = offset_i + size_i, where 0 < x_i ≤ L. The inference algorithm observes a set of n samples x_1, x_2, ..., x_n and returns a reliable guess for L.

Fig. 2. Piece length distribution in BitTorrent. (Histogram of the fraction of total torrents for piece lengths from 16 KB to 16 MB.)

After observing the n-th sample, the algorithm computes k such that 2^k is the minimum power of two that is greater than or equal to every x_i (i = 1, 2, ..., n). For example, if n = 5 and the x_i values are 9, 6, 13, 4, 7, then the estimated piece length would be 2^k = 16. Our goal is to determine the minimum number of samples n such that the probability that the estimated piece length is correct, i.e., Pr(2^k = L), exceeds a given threshold, say 99%. We compute the probability that the estimated piece length is correct as follows. We define E as the event that the n samples are in the range [0, 2^k] and at least one of them does not fall in [0, 2^(k−1)], where 2^k is the estimated value returned by the algorithm. Assuming that each sample is equally likely to be anywhere within the range [0, L], we have:

Pr(2^k = L | E) = Pr(E | 2^k = L) Pr(2^k = L) / Pr(E)
               = [1 − Pr(all n samples in [0, 2^(k−1)] | 2^k = L)] Pr(2^k = L) / Pr(E)
               = [1 − (1/2)^n] Pr(2^k = L) / Pr(E).        (1)

In the above equation, Pr(2^k = L) can be directly known from the empirically-derived histogram in Fig. 2. Furthermore, Pr(E) is given by:

Pr(E) = Σ_{l ∈ S} Pr(l = L) Pr(E | l = L)
      = Pr(2^k = L)(1 − (1/2)^n) + Pr(2^(k+1) = L)((1/2)^n − (1/4)^n) + ···
        + Pr(2^k_max = L)(((1/2)^(k_max − k))^n − ((1/2)^(k_max − k + 1))^n).        (2)

Our algorithm solves Eqs. (2) and (1) using the histogram in Fig. 2 to find the required number of samples n in order to achieve a given probability of correctness.

The assumption that samples are equally likely to be in [0, L] is realistic, because the cache observes requests from many clients at the same time for an object, and the clients are not synchronized. In a few cases, however, there could be one client, and that client may issue requests sequentially. This depends on the actual implementation of the BitTorrent client software. To address this case, we use the following heuristic. We maintain the maximum value seen in the samples taken so far. If a new sample increases this maximum value, we reset the counters of the observed samples. In Sec. 7.5, we empirically validate the above inference algorithm and we show that the simple heuristic improves its accuracy.

Handling Incorrect Inferences. In the evaluation section, we experimentally show that the accuracy of our inference algorithm is about 99.7% when we use six segments in the inference, and it is almost 100% if we use 10 segments. The very rare incorrect inferences are observed and handled by the cache as follows. An incorrect inference is detected by observing a request exceeding the inferred piece boundary, because we use the minimum possible estimate for the piece length. This may happen at the beginning of a download session for an object that was never seen by the cache before. The cache would have likely stored very few segments of this object, and most of them are still in the buffer, not flushed to the disk yet. Thus, all the cache needs to do is to re-adjust a few pointers in the memory with the correct piece length. In the worst case, a few disk blocks will also need to be moved to other disk locations. An even simpler solution is possible: discard from the cache these few segments, which has a negligible cost. Finally, we note that the cache starts storing segments after the inference algorithm returns an estimate, but it does not immediately serve these segments to other clients until a sufficient number of segments (e.g., 100) have been downloaded to make the probability of incorrect inference practically 0. Therefore, the consequence of a rare incorrect inference is a slight overhead in the storage system for a temporary period (till the correct piece length is computed), and no wrong data is served to the clients at any time.
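The estimation step and the reset heuristic described above can be sketched as follows. This is our own illustrative implementation, not code from pCache: the class and member names are ours, and the number of required samples is taken as a parameter, as it would be computed offline from Eqs. (1) and (2).

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the piece-length inference: the estimate is the smallest
// power of two >= the largest offset+size sample, reported only after a
// given number of samples have arrived since the maximum last changed.
class PieceLengthInferrer {
 public:
  explicit PieceLengthInferrer(int samples_needed) : needed_(samples_needed) {}

  // Feed one {offset, size} request; returns the inferred piece length,
  // or 0 while more samples are still needed.
  uint64_t observe(uint64_t offset, uint64_t size) {
    uint64_t x = offset + size;
    if (x > max_) {  // reset heuristic: a larger sample restarts the count,
      max_ = x;      // guarding against a single sequential client
      count_ = 0;
    }
    if (++count_ < needed_) return 0;
    uint64_t p = 1;
    while (p < max_) p <<= 1;  // smallest power of two >= max sample
    return p;
  }

 private:
  int needed_;
  int count_ = 0;
  uint64_t max_ = 0;
};
```

Note that with the reset heuristic active, the estimate is only reported once the maximum has been stable for the required number of samples, which is exactly the behavior the heuristic is meant to enforce.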
5 STORAGE MANAGEMENT

In this section, we elaborate on the need for a new storage system, in addition to what we mentioned in Sec. 2. Then, we present the design of the proposed storage system.

5.1 The Need for a New Storage System

In P2P systems, a receiving peer requests an object from multiple sending peers in the form of segments, where a segment is a range of bytes. Object segmentation is protocol dependent and even implementation dependent. Moreover, segment sizes could vary based on the number and type of senders in the download session, as in the case of Gnutella. Therefore, successive requests for the same object can be composed of segments with different sizes. For example, a request comes to the cache for the byte range [128–256] KB as one segment, which is then stored locally in the cache for future requests. However, a later request may come for the byte range [0–512] KB of the same object, again as one segment. The cache should be able to identify and serve the cached portion of the second request. Furthermore, previous work [11] showed that to improve the cache performance, objects should be incrementally admitted in the cache, because objects in P2P systems are fairly large, their popularity follows a flattened-head model [1], [11], and they may not even be downloaded in their entirety since users may abort the download sessions [1]. This means that the cache will usually store random fragments of objects, not complete contiguous objects, and the cache should be able to serve these partial objects.

While web proxy caches share some characteristics with P2P proxy caches, their storage systems can yield poor performance for P2P proxy caches, as we show in Sec. 7.3. The most important reason is that most web proxy caches consider objects with unique IDs, such as URLs, in their entirety. P2P applications almost never request entire objects; instead, segments of objects are exchanged among peers. A simple treatment of reusing web proxy caches for P2P traffic is to define a hash function on both object ID and segment range, and consider segments as independent entities. This simple approach, however, has two major flaws. First, the hash function destroys the correlation among segments belonging to the same objects. Second, this approach cannot support partial hits, because different segment ranges are hashed to different, unrelated, values.

The proposed storage system supports efficient lookups for partial hits. It also improves the scalability of the cache by reducing the number of I/O operations. This is done by preserving the correlation among segments of the same objects, and dynamically merging segments. To the best of our knowledge, there are no other storage systems proposed in the literature for P2P proxy caches, even though the majority of the Internet traffic comes from P2P systems [1]–[3].

5.2 The Proposed Storage System

A simplified view of the proposed storage system is shown in Fig. 3. We implement the proposed system in the user space, so that it is not tightly coupled with the kernels of operating systems, and can be easily ported to different kernel versions, operating system distributions, and even to various operating systems. As indicated by [24], user-space storage systems yield very close performance to the kernel-space ones. A user-space storage management system can be built on top of a large file created by the file system of the underlying operating system, or it can be built directly on a raw disk partition. We implement and analyze two versions of the proposed storage management system: on top of the ext2 Linux file system (denoted by pCache/Fs) and on a raw disk partition (denoted by pCache/Raw).

Fig. 3. The proposed storage management system. (Shown: the two-level segment lookup table, with object entries pointing to sets of disjoint, sorted segment entries carrying Offset, Len, RefCnt, Buffer, and Block fields, along with the memory page buffers and disk blocks.)

As shown in Fig. 3, the proposed storage system maintains two structures in the memory: metadata and page buffers. The metadata is a two-level lookup table designed to enable efficient segment lookups. The first level is a hash table keyed on object IDs; collisions are resolved using common chaining techniques. Every entry points to the second level of the table, which is a set of cached segments belonging to the same object. Every segment entry consists of a few fields: Offset indicates the absolute segment location within the object, Len represents the number of bytes in this segment, RefCnt keeps track of how many connections are currently using this segment, Buffer points to the allocated page buffers, and Block points to the assigned disk blocks. Each disk is divided into fixed-size disk blocks, which are the smallest units of disk operations. Therefore, block size is a system parameter that may affect caching performance: larger block sizes are more vulnerable to segmentations, while smaller block sizes may lead to high space overhead. RefCnt is used to prevent evicting a buffer page if there are connections currently using it.

Notice that segments do not arrive to the cache sequentially, and not all segments of an object will be stored in the cache. Thus, a naive contiguous allocation of all segments will waste memory, and will not efficiently find partial hits. We implement the set of cached segments as a balanced (red-black) binary tree, which is sorted based on the Offset field. Using this structure, partial hits can be found in at most O(log S) steps, where S is the number of segments in the object. This is done by searching on the Offset field. Segment insertions and deletions are also done in a logarithmic number of steps. Since this structure supports partial hits, the cached data is never obtained from the P2P network again, and only mutually disjoint segments are stored in the cache.

The second part of the in-memory structures is the page buffers. Page buffers are used to reduce disk I/O operations as
well as to perform segment merging. As shown in Fig. 3, we propose to use multiple sizes of page buffers, because requests come to the cache from different P2P systems in variable sizes. We also pre-allocate these pages in memory for efficiency. Dynamic memory allocation may not be suitable for proxy caches, since it imposes processing overheads. We maintain unoccupied pages of the same size in the same free-page list. If peers request segments that are in the buffers, they are served from memory and no disk I/O operations are issued. If the requested segments are on the disk, they need to be swapped into some free memory buffers. When all free buffers are used up, the least popular data in some of the buffers is swapped out to the disk if it has been modified since it was brought into memory, and it is overwritten otherwise.

6 Transparent Connection Management

This section starts by clarifying the importance and complexity of providing full transparency in P2P proxy caches. Then, it presents our proposed method for splicing non-P2P connections in order to reduce the processing and memory overhead on the cache. We implement the Connection Manager in Linux 2.6. The details of the implementation of connection redirection and the operation of the Connection Manager [14] are not presented due to space limitations.

6.1 Importance of Full Transparency

Based on our experience of developing a running caching system, we argue that full transparency is important in P2P proxy caches. This is because non-transparent proxies may not take full advantage of the deployed caches, since they require users to manually configure their applications. This is even worse in the P2P traffic caching case due to the existence of multiple P2P systems, where each has many different client implementations. Transparent proxies, on the other hand, actively intercept connections between internal and external hosts, and they do not require error-prone manual configuration of the software clients.

Transparency in many web caches, e.g., Squid [31], is achieved as follows. The cache intercepts the TCP connection between a local client and a remote web server. This interception is done by connection redirection, which forwards connection setup packets traversing the gateway router to a proxy process. The proxy process accepts this connection and serves requests on it. This may require the proxy to create a connection with the remote server. The web cache uses its own IP address when communicating with the web server. Thus, the server can detect the existence of the cache and needs to communicate with it. We call this type of cache partially transparent, because the remote server is aware of the cache. In contrast, we refer to proxies that do not reveal their existence as fully transparent proxies. When a fully transparent proxy communicates with the internal host, it uses the IP address and port number of the external host, and similarly, when it communicates with the external host, it uses the information of the internal host.

While partial transparency is sufficient for most web proxy caches, it is not enough and will not work for P2P proxy caches. This is because external peers may not respond to requests coming from the IP address of the cache, since the cache is not part of the P2P network and does not participate in the P2P protocol. Implementing many P2P protocols in the cache to make it participate in P2P networks is a tedious task. More importantly, participation in P2P networks imposes significant overhead on the cache itself, because it would have to process protocol messages and potentially serve too many objects to external peers. For example, if the cache were to participate in the tit-for-tat BitTorrent network, it would have to upload data to other external peers proportional to the data downloaded by the cache on behalf of all internal BitTorrent peers. Furthermore, some P2P systems require registration and authentication steps that must be done by users, and the cache cannot do these steps.

Supporting full transparency in P2P proxy caches is not trivial, because it requires the cache to process and modify packets destined to remote peers, i.e., its network stack must accept packets with non-local IP addresses and port numbers.

6.2 Efficient Splicing of non-P2P Connections

Since P2P systems use dynamic ports, the proxy process may initially intercept some connections that do not belong to P2P systems. This can only be discovered after inspecting a few packets using the P2P Traffic Identification module. Each intercepted connection is split into a pair of connections, and all packets have to go through the proxy process. This imposes overhead on the proxy cache and may increase the end-to-end delay of the connections. To reduce this overhead, we propose to splice each pair of non-P2P connections using TCP splicing techniques [32], which have been used in layer-7 switching. For spliced connections, the sockets in the proxy process are closed, and packets are relayed in the kernel stack instead of being passed up to the proxy process (in the application layer). Since packets do not traverse the application layer, TCP splicing reduces the overhead of maintaining a large number of open sockets as well as forwarding threads. Several implementation details had to be addressed. For example, since different connections start from different initial sequence numbers, the sequence numbers of packets over the spliced TCP connections need to be properly changed before being forwarded. This is done by keeping track of the sequence number difference between the two spliced TCP connections, and updating sequence numbers before forwarding packets.

7 Experimental Evaluation of pCache

In this section, we conduct extensive experiments to evaluate all aspects of the proposed pCache system with real P2P traffic. We start our evaluation, in Sec. 7.1, by validating that our implementation of pCache is fully transparent, and that it serves actual non-corrupted data to clients as well as saves bandwidth for ISPs. In Sec. 7.2, we show that pCache improves the performance of P2P clients without reducing the connectivity in P2P networks. In Sec. 7.3, we rigorously analyze the performance of the proposed storage management
system and show that it outperforms others. In Sec. 7.4, we demonstrate that pCache is scalable and could easily serve most customer ASes in the Internet using a commodity PC. In Sec. 7.5, we validate the analysis of the inference algorithm and show that its estimates are correct in more than 99.7% of the cases, with low overhead. Finally, in Sec. 7.6, we show that the proposed connection splicing scheme improves the performance of the cache.

The setup of the testbed used in the experiments consists of two separate IP subnets. Traffic from client machines in subnet 1 goes through a Linux router on which we install pCache. Client machines in subnet 2 are directly attached to the campus network. All internal links in the testbed have 100 Mb/s bandwidth. All machines are configured with static IP addresses, and appropriate route entries are added in the campus gateway router in order to forward traffic to subnet 1. Our university normally filters traffic from some P2P applications and shapes traffic from others. Machines in our subnets were allowed to bypass these filters and shapers. pCache runs on a machine with an Intel Core 2 Duo 1.86 GHz processor, 1 GB RAM, and two hard drives: one for the operating system and another for the storage system of the cache. The operating system is Linux 2.6. In our experiments, we concurrently run 10 P2P software clients in each subnet. We could not deploy more clients because of the excessive volume of traffic they generate, which is problematic in a university setting (during our experiments, we were notified by the network administrator that our BitTorrent clients exchanged more than 300 GB in one day!).

7.1 Validation of pCache and ISP Benefits

pCache is a fairly complex software system with many components. The experiments in this section are designed to show that the whole system actually works. This verification shows that: (i) P2P connections are identified and transparently split, (ii) non-P2P connections are tunneled through the cache and are not affected by its existence, (iii) segment lookup, buffering, storing, and merging all work properly and do not corrupt data, (iv) the piece-length inference algorithm yields correct estimates, and (v) data is successfully obtained from external peers, assembled with locally cached data, and then served to internal peers in the appropriate message format. All of these are shown for real BitTorrent traffic with many objects that have different popularities. The experiments also compare the performance of P2P clients with and without the proxy cache.

We modified an open-source BitTorrent client called CTorrent. CTorrent is a lightweight (command-line) C++ program; we compiled it on both Linux and Windows. We did not configure CTorrent to contact or be aware of the cache in any way. We deployed 20 instances of the modified CTorrent client on the four client machines in the testbed. Ten clients (in subnet 1) were behind the proxy cache, and ten others were directly connected to the campus network. All clients were controlled by a script to coordinate the download of many objects. The objects and the number of downloads per object were carefully chosen to reflect the actual relative popularity in BitTorrent, as follows. We developed a crawler to contact popular torrent search engines such as TorrentSpy, MiniNova, and IsoHunt, and we collected numerous torrent files. Each torrent file contains information about one object; an object can be a single file or multiple files grouped together. We also contacted trackers to collect the total number of downloads that each object received, which indicates the popularity of the object. We randomly took 500 sample objects from all the torrent files collected by our crawler; these objects were downloaded 111,620 times in total. The size distribution of the chosen 500 objects ranges from several hundred kilobytes to a few hundred megabytes. To conduct experiments within a reasonable amount of time, we scheduled about 2,700 download sessions. The number of download sessions assigned to an object is proportional to its popularity. Fig. 5 shows the distribution of the scheduled download sessions per object. This figure indicates that our empirical popularity data follows a Mandelbrot-Zipf distribution, which is a generalized form of Zipf-like distributions with an extra parameter to capture the flattened head observed near the most popular objects in our popularity data. The popularity distribution collected by our crawler is similar to those observed in previous measurement studies [1], [8].

Fig. 5. Download sessions per object (2,729 scheduled downloads in total, plotted against object rank on log-log scales).

After distributing the 2,700 scheduled downloads among the 500 objects, we randomly shuffled them such that downloads for the same object do not come back to back; rather, they are interleaved with downloads for other objects. We then equally divided the 2,700 downloads into ten lists. Each of these ten lists, along with the corresponding torrent files, was given to a pair of CTorrent clients to execute: one in subnet 1 (behind the cache) and another in subnet 2. To conduct fair comparisons between the performance of clients with and without the cache, we made each pair of clients start a scheduled download session at the same time and with the same information. Specifically, only one of the two clients contacted the BitTorrent tracker to obtain the addresses of seeders and leechers of the object to be downloaded; this information was shared with the other client. Also, a new download session was not started unless the other client had either finished or aborted its current session. We let the 20 clients run for two days, collecting detailed statistics from each client every 20 seconds.

Several sanity checks were performed and passed. First, all downloaded objects passed the BitTorrent checksum test.
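This checksum test can be illustrated with a short sketch (the helper name and the synthetic data below are ours; what is standard is that a BitTorrent metainfo file records the 20-byte SHA-1 digest of every piece, and a client accepts a piece only if its digest matches):

```python
import hashlib

def verify_pieces(data: bytes, piece_length: int, piece_hashes: list) -> bool:
    # Accept the object only if every piece matches the SHA-1 digest
    # recorded for it in the .torrent metainfo file.
    for i, expected in enumerate(piece_hashes):
        piece = data[i * piece_length:(i + 1) * piece_length]
        if hashlib.sha1(piece).digest() != expected:
            return False
    return True

# Synthetic three-piece object with 16 KB pieces.
PIECE = 16 * 1024
data = bytes(range(256)) * (3 * PIECE // 256)
hashes = [hashlib.sha1(data[i * PIECE:(i + 1) * PIECE]).digest() for i in range(3)]
assert verify_pieces(data, PIECE, hashes)                      # intact data passes
assert not verify_pieces(b"\xff" + data[1:], PIECE, hashes)    # corruption is caught
```

Because the digests come from the original publisher's torrent file, a passing check confirms that data assembled from the cache and from external peers is byte-identical to the original object.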
Fig. 4. The impact of the cache on the performance of clients: CDFs of (a) download speed, (b) upload speed, and (c) maximum number of connected peers, with and without pCache.

Second, the total number of download sessions that completed was almost the same for the clients with and without the cache. Third, clients behind the cache were regular desktops participating in the campus network, and other non-P2P network applications were running on them. These applications included web browsers, user authentication modules using a centralized database (LDAP), and file system backup applications. All applications worked fine through the cache. Finally, a significant portion of the total traffic was served from the cache: up to 90%. This means that the cache identified, stored, and served P2P traffic. This all happened transparently, without changing the P2P protocol or configuring the client software. Also, since this is BitTorrent traffic, the piece-length inference algorithm behaved as expected, and our cache did not interfere with the incentive scheme of BitTorrent. We note that the unusually high 90% byte hit rate seen in our experiments is due to the limited number of clients that we have; the clients did not actually generate enough traffic to fill the whole storage capacity of the cache. In a full-scale deployment of the cache, the byte hit rate is expected to be smaller, in the range of 30 to 60% [11], which would still yield significant savings in bandwidth given the huge volume of P2P traffic.

7.2 Performance of P2P Clients

Next, we analyze the performance of the clients and the impact on the P2P network. We are interested in the download speed, upload speed, and number of peers to which each of our clients is connected. These statistics were collected every 20 seconds from all 20 clients. The download (upload) speed was measured as the number of bytes downloaded (uploaded) during the past period divided by the length of that period. Then, the average download (upload) speed for each completed session was computed. For the number of connected peers, we took the maximum number of peers that each of our clients was able to connect to during the sessions. We then used the maximum among all clients (in the same subnet) to compare the connectivity of clients with and without the cache. The results are summarized in Fig. 4. Fig. 4(a) shows that clients behind the cache achieved much higher download speeds than other clients. This is expected, as a portion of the traffic comes from the cache. Thus, the cache benefits the P2P clients, in addition to benefiting the ISP deploying the cache by saving WAN traffic. Furthermore, the increased download speed did not require clients behind the cache to significantly increase their upload speeds, as shown in Fig. 4(b). This is also beneficial for both the clients and the ISP deploying the cache, and it does not hurt the global BitTorrent network, as clients behind the cache are still uploading to other external peers. The difference between the upload and download speeds can be attributed mostly to the contributions of the cache. Fig. 4(c) demonstrates that the presence of the cache did not reduce the connectivity of clients behind the cache; they have roughly the same number of connected peers. This is important for the local clients as well as for the whole network, because reduced connectivity could lead to decreased availability of peers and of the content stored on them. Finally, we notice that the cache may add some latency at the beginning of the download session. This latency was on the order of milliseconds in our experiments, which is negligible in P2P systems, where sessions last for minutes if not hours.

7.3 Performance of the Storage System

Experimental Setup. We compare the performance of the proposed storage management system against the storage system used in the widely-deployed Squid proxy cache [31], after modifying it to serve the partial hits needed for P2P traffic. This is done to answer the question: "What happens if we were to use the already-deployed web caches for P2P traffic?" To the best of our knowledge, there are no other storage systems designed for P2P proxy caches, and as described in Sec. 2, the storage systems proposed in [21]–[24] for web proxy caches and in [25]–[27] for multimedia caches are not readily applicable to P2P proxy caches. Therefore, we could not conduct a meaningful comparison with them.

We implemented the Squid system, which organizes files into a two-level directory structure: 16 directories in the first level and 256 subdirectories in every first-level directory. This is done to reduce the number of files in each directory, in order to accelerate the lookup process and to reduce the number of i-nodes touched during the lookup. We implemented two versions of our proposed storage system, which are denoted by pCache/Fs and pCache/Raw in the plots. pCache/Fs is implemented on top of the ext2 Linux file system by opening
a large file that is never closed. To write a segment, we move the file pointer to the appropriate offset and then write that segment to the file. The offset is determined by our disk block allocation algorithm. Reading a segment is done in a similar way. The actual writing/reading to the disk is performed by the ext2 file system, which might perform disk buffering and pre-fetching. pCache/Raw is implemented on a raw partition, and it has complete control over reading/writing blocks from/to the disk. pCache/Raw uses direct disk operations.

In addition to the Squid storage system, we also implemented a storage system that uses a multi-directory structure, where segments of the same object are grouped together under one directory. We denote this storage system as Multi-dir. This is similar to previous proposals on clustering correlated web objects together for better disk performance, such as [21]. We extended Multi-dir for P2P traffic to maintain the correlation among segments of the same object. Finally, we also implemented the Cyclic Object Storage System (COSS), which has recently been proposed to improve the performance of the Squid proxy cache [33, Sec. 8.4]. COSS sequentially stores web objects in a large file, and it wraps around when it reaches the end of the file. COSS overwrites the oldest objects once the disk space is used up.

We isolate the storage component from the rest of the pCache system. All other components (including the traffic identification, transparent proxy, and connection splicing modules) are disabled to avoid any interactions with them. We also avoid interactions with the underlying operating system by installing two hard drives: one for the operating system and the other dedicated to the storage system of pCache. No processes other than pCache have access to the second hard drive. The capacity of the second hard drive is 230 GB, and it is formatted into 4 KB blocks. The replacement policy used in the experiments is segment-based LRU, where the least-recently-used segment is evicted from the cache.

We subject a specific storage system (e.g., pCache/Fs) to a long trace of 1.5 million segment requests; the trace collection and processing are described below. During the execution of the trace, we periodically collect statistics on the low-level operations performed on the disk. These statistics include the total numbers of read operations, write operations, and head movements (seek length in sectors). We also measure the trace completion time, which is the total time it takes the storage system to process all requests in the trace. These low-level disk statistics are collected using the blktrace tool, which is an optional module in Linux kernels. Then, the whole process is repeated for a different storage system, but with the same trace of segment requests. During these experiments, we fix the total size of memory buffers at 0.5 GB. In addition, because they are built on top of the ext2 file system, Squid and pCache/Fs have access to the disk cache internally maintained by the operating system. Due to the intensive I/O nature of these experiments, Linux may substantially increase the size of this disk cache by stealing memory from other parts of the pCache system, which can degrade the performance of the whole system. To solve this problem, we modified the Linux kernel to limit the size of the disk cache to a maximum of 0.5 GB.

P2P Traffic Traces. We needed to stress the storage system with a large number of requests. We also needed the stream of requests to be reproducible, such that comparisons among different storage systems are fair. To satisfy these two requirements, we collected traces from an operational P2P network instead of just generating synthetic traces. Notice that running the cache with real P2P clients (as in the previous section) would not give us enough traffic to stress the storage system, nor would it create identical situations across repeated experiments with different storage systems, because of the high dynamics in P2P systems. To collect this trace, we modified an open-source Gnutella client to run in super-peer mode and to simultaneously connect to up to 500 other super-peers in the network (the default number of connections is up to only 16). Gnutella was chosen because it has a two-tier structure, where ordinary peers connect to super-peers, and queries/replies are forwarded among super-peers for several hops. This enabled us to passively monitor the network without injecting traffic. Our monitoring super-peer ran continuously for several months, and because of its high connectivity it was able to record a large portion of the query and reply messages exchanged in the Gnutella network. It observed query/reply messages in thousands of ASes across the globe, accounting for more than 6,000 terabytes of P2P traffic. We processed the collected data to separate object requests coming from individual ASes, using the IP addresses of the receiving peers and an IP-to-AS mapping tool. We chose one large AS (AS 9406) with a significant amount of P2P traffic. The created trace contains: the timestamp of the request, the ID of the requested object, the size of the object, and the IP address of the receiver. Some of these traces were used in our previous work [11], and they are available online [14].

The trace collected from Gnutella provides realistic object sizes, relative popularities, and temporal correlation among requests in the same AS. However, it has a limitation: information about how objects are segmented, and when exactly each segment is requested, is not known. We could not obtain this information because it is held by the communicating peers and transferred directly between them without going through super-peers. To mitigate this limitation, we divided objects into segments with typical sizes, which we can know either from the protocol specifications or from analyzing a small sample of files. In addition, we generated the request times for segments as follows. The timestamp in the trace marks the start of downloading an object. A completion time for this object is randomly generated to represent peers with different network connections and the dynamic conditions of the P2P network; the completion time can range from minutes to hours. Then, the download time of each segment of the object is randomly scheduled between the start and end times of downloading that object. This random scheduling is not unrealistic, because it is actually what is done in common P2P systems such as BitTorrent and Gnutella. Notice that with this random scheduling, requests for segments of different objects will be interleaved, which is also realistic. We sort all requests for segments based on their scheduled times and take the first 1.6 million requests to evaluate the storage systems. With this number of requests, some of our experiments took two days to finish.
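The expansion of per-object trace records into time-sorted segment requests can be sketched as follows (a simplified sketch under our assumptions: the function and record names are illustrative, and the uniform draw of request times stands in for the random scheduling described above):

```python
import random

def expand_to_segment_requests(object_records, segment_size, seed=42):
    """Expand (object_id, size_bytes, start_ts, end_ts) records into a
    time-sorted list of (timestamp, object_id, offset, length) requests."""
    rng = random.Random(seed)  # fixed seed keeps the trace reproducible across runs
    requests = []
    for obj_id, size, start_ts, end_ts in object_records:
        for offset in range(0, size, segment_size):
            length = min(segment_size, size - offset)
            ts = rng.uniform(start_ts, end_ts)  # random time within the session
            requests.append((ts, obj_id, offset, length))
    requests.sort()  # interleaves segments of concurrently downloaded objects
    return requests

# Two hypothetical sessions: a 3 MB object and a 1 MB object, 1 MB segments.
records = [("objA", 3 * 2**20, 0.0, 3600.0), ("objB", 2**20, 100.0, 500.0)]
trace = expand_to_segment_requests(records, segment_size=2**20)
assert len(trace) == 4                                        # 3 + 1 segments
assert all(a[0] <= b[0] for a, b in zip(trace, trace[1:]))    # sorted by time
```

Fixing the random seed is what makes the generated request stream reproducible, so the same trace can be replayed against every storage system under comparison.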
Fig. 6. Comparing the performance of the proposed storage management system on top of a raw disk partition (pCache/Raw) and on top of a large file (pCache/Fs) versus the Squid and other storage systems: (a) read operations, (b) head movements (seek length in sectors), and (c) completion time, as functions of the number of requests.

Main Results. Some of our results are presented in Fig. 6 for a segment size of 0.5 MB. Results for other segment sizes are similar. We notice that the disk I/O operations caused by the COSS storage system are not reported in Figs. 6(a) and 6(b). This is because COSS has a built-in replacement policy that leads to fewer cache hits, and thus fewer read operations and many more write operations, than the other storage systems. Since COSS results in a different disk access pattern than the other storage systems, we cannot conduct a meaningful low-level comparison between them in Figs. 6(a) and 6(b); we discuss the replacement policy of COSS in more detail below. Fig. 6 demonstrates that the proposed storage system is much more efficient in handling P2P traffic than the other storage systems, including Squid, Multi-dir, and COSS. The efficiency is apparent in all aspects of the disk I/O operations: the proposed system issues smaller numbers of read and write operations, and it requires a much smaller number of disk head movements. Because of the efficiency in each element, the total time required to complete the whole trace (Fig. 6(c)) under the proposed system is less than 5 hours, while it is 25 hours under the Squid storage system. That is, the average service time per request using pCache/Fs or pCache/Raw is almost one fifth of that using Squid. This experiment also shows that COSS and Multi-dir improve on the performance of the Squid system, but they still perform worse than our pCache storage system. Notice also that, unlike the case for Squid, Multi-dir, and COSS, the average time per request of our proposed storage system does not rapidly increase with the number of requests. Therefore, the proposed storage system can support more concurrent requests, and it is more scalable than the other storage systems, including the widely-deployed Squid storage system.

Because it is optimized for web traffic, Squid anticipates a large number of small web objects to be stored in the cache, which justifies the creation of many subdirectories to reduce the search time. Objects in P2P systems, on the other hand, have various sizes and can be much larger; thus, the cache stores a smaller number of P2P objects. This means that maintaining many subdirectories may actually add more overhead to the storage system. In addition, as Fig. 6(b) shows, Squid requires a larger number of head movements compared to our storage system. This is because Squid uses a hash function to determine the subdirectory of each segment, which destroys the locality among segments of the same object. This leads to many head jumps between subdirectories to serve segments of the same object. In contrast, our storage system performs segment merging in order to store segments of the same object near each other on the disk, which reduces the number of head movements.

Fig. 6(b) also reveals that Multi-dir reduces the number of head movements compared to the Squid storage system. This is because the Multi-dir storage system exploits the correlation among segments of the same P2P object by storing these segments in one directory. This segment placement enables the underlying file system to cluster correlated segments closer together, as most file systems cluster files in the same subdirectory. Nevertheless, the overhead of creating many files/directories and maintaining their i-nodes is non-trivial. This can be observed in Fig. 6(a), where the average number of read operations of Multi-dir is slightly smaller than that of pCache/Raw while the number of cached objects is small (in the warm-up period), but rapidly increases once more objects are cached. We note that Multi-dir results in fewer read operations than pCache/Raw in the warm-up period because of the Linux disk buffer and prefetch mechanism; this advantage of Multi-dir quickly diminishes as the number of i-nodes increases.

We observe that COSS performs better than Squid but worse than Multi-dir in Fig. 6(c). The main cause of its poor performance is that COSS uses the large file to mimic an LRU queue for object replacement, which requires it to write a segment to the disk whenever there is a cache hit. This not only increases the number of write operations but also reduces the effective disk utilization, because a segment may be stored on the disk several times, although only one of these copies (the most recent one) is accessible. We plot the effective disk utilization of all considered storage systems in Fig. 7, where all storage systems except COSS employ segment-based LRU with high/low watermarks at 90% and 80%, respectively. In this figure, we observe that the disk utilization for COSS is less than 50% of that for the other storage systems. Since COSS tightly couples the object replacement policy with the storage system, it is not desirable for P2P proxy caches, because
their performance may benefit from replacement policies that capitalize on the unique characteristics of P2P traffic, such as the ones observed in [6], [8].

Fig. 7. Disk utilization for various storage systems. (Disk utilization in % versus number of requests x 10^5, for Squid, Multi-dir, pCache/Fs, pCache/Raw, and COSS.)

Additional Results and Comments. We analyzed the performance of pCache/Fs and pCache/Raw, and compared them against each other. As mentioned before, pre-fetching by the Linux file system may negatively impact the performance of pCache/Fs. We verified this by disabling this pre-fetching using the hdparm utility. By comparing pCache/Fs versus pCache/Raw using variable segment sizes, we found that pCache/Fs outperforms pCache/Raw when the segment sizes are small (16KB or smaller). pCache/Raw, however, is more efficient for larger segment sizes. This is because when segment sizes are small, the buffering performed by the Linux file system helps pCache/Fs by grouping several small read/write operations together. pCache/Raw bypasses this buffering and uses direct disk operations. The benefit of buffering diminishes as the segment size increases. Therefore, we recommend using pCache/Fs if pCache will mostly serve P2P traffic with small segments, as in BitTorrent. If the traffic is dominated by larger segments, as in the case of Gnutella, pCache/Raw is recommended.

7.4 Scalability of pCache

To analyze the scalability of pCache, ideally we should deploy thousands of P2P clients and configure them to incrementally request objects from different P2P networks. While this would test the whole system, it was not possible to conduct in our university setting, because it would have consumed too much bandwidth. Another less ideal option is to create many clients and connect them in local P2P networks. However, emulating the highly dynamic behavior of realistic P2P networks is not straightforward, and would make our results questionable no matter what we do. Instead, we focus on the scalability of the slowest part, the bottleneck, of pCache, which is the storage system according to previous works in the literature such as [34]. All other components of pCache perform simple operations on data stored in the main memory, which is orders of magnitude faster than the disk. We acknowledge that this is only a partial scalability test, but we believe it is fairly representative.

We use the large trace of 1.5 million requests used in the previous section. We run the cache for about seven hours, and we stress the storage system by continuously submitting requests. We measure the average throughput (in Mbps) of the storage system every eight minutes, and we plot the average results in Fig. 8. We ignore the first one hour (warm-up period), because the cache was empty and few data swapping operations between memory and disk occurred. The figure shows that in the steady state an average throughput of more than 300 Mbps can easily be provided by our cache running on a commodity PC. This kind of throughput is probably more than enough for the majority of customer ASes, such as universities and small ISPs, because the total capacities of their Internet access links are typically smaller than the maximum throughput that can be achieved by pCache. For large ISPs with Gbps links, a high-end server with a high-speed disk array could be used. Notice that our pCache code is not highly optimized for performance. Notice also that the 300 Mbps is the throughput for P2P traffic only, not the whole Internet traffic, and it represents a worst-case performance because the disk is continuously stressed.

7.5 Evaluation of the Inference Algorithm

We empirically evaluate the algorithm proposed in Sec. 4 for inferring the piece length in BitTorrent traffic. We deployed ten CTorrent clients on the machines behind the cache. Each client was given a few thousand torrent files. For each torrent file, the client contacted a BitTorrent tracker to get potential peers. The client re-connected to the tracker if the number of peers dropped below 10 to obtain more peers. After knowing the peers, the client started issuing requests and receiving traffic. The cache logged all requests during all download sessions. Many of the sessions did not have any traffic exchanged, because there were not enough active peers trading pieces of those objects anymore, which is normal in BitTorrent. These sessions were dropped after a timeout period. We ran the experiment for several days. The cache collected information from more than 2,100 download sessions. We applied the inference algorithm on the logs, and we compared the estimated piece lengths against the actual ones (known from the torrent files).

The results of this experiment are presented in Fig. 9. The x-axis shows the number of samples taken to infer the piece length, and the y-axis shows the corresponding accuracy achieved. The accuracy is computed as the number of correct inferences over the total number of estimations, which is 2,100. We plot the theoretical results computed from Eqs. (1) and (2) and the actual results achieved by our algorithm with and without the improvement. The figure shows that the performance of our basic algorithm using actual traffic is very close to the theoretical results, which validates our analysis in Sec. 4. In addition, the simple heuristic improved the inference algorithm significantly. The improved algorithm infers the piece length correctly in about 99.7% of the cases, which is done by using only 6 samples on average. From further analysis, we found that the samples used by the algorithm correspond to less than 3% of each object on average, which shows how fast the inference is done in terms of the object's size. Keeping this portion small is
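To make the idea of sample-based piece-length inference concrete, here is a loose sketch of one way such an estimate can be formed from a handful of observed request offsets: since pieces start at multiples of the (unknown) piece length, the GCD of a few piece-start byte offsets converges quickly to the true value as samples accumulate. This is only an illustration under that assumption, not the paper's algorithm (which is specified in Sec. 4 along with its accuracy analysis); the function name is hypothetical.

```python
from functools import reduce
from math import gcd

def infer_piece_length(piece_start_offsets, samples=6):
    """Estimate an unknown piece length from observed byte offsets.

    Assumes each offset marks the start of a piece, i.e., is an exact
    multiple of the true piece length L. The GCD of k such multiples
    equals L unless all multipliers share a common factor, which
    becomes increasingly unlikely as more samples are taken.
    """
    picked = [o for o in piece_start_offsets if o > 0][:samples]
    if not picked:
        return None  # nothing observed yet
    return reduce(gcd, picked)

# Example: offsets from a download with a 256 KiB piece length.
L = 256 * 1024
print(infer_piece_length([3 * L, 4 * L, 7 * L]))  # gcd(3, 4, 7) = 1, so prints L = 262144
```

With only two samples whose multipliers share a factor (e.g., offsets 2L and 4L), the estimate overshoots to a multiple of L; taking a few more samples, as in the experiment's average of 6, makes that event rare.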