1196 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 18, NO. 4, AUGUST 2010

delay performance, but also has the potential to do so in the face of realistic network impairments, such as long propagation delays and random bandwidth fluctuations. The delay bounds derived in our analysis can serve as delay performance benchmarks for various proposed/deployed P2P streaming systems. Insights brought forth by the study of the snowball streaming algorithm can be used to guide the design of new P2P streaming systems with shorter startup delays and playback lags.

The paper is organized as follows. In Section II, we provide a short overview of the existing P2P streaming solutions. Bounds on the delay of single-chunk dissemination are established in Section III for both homogeneous and heterogeneous P2P network environments, and a snowball chunk dissemination algorithm is introduced to achieve the delay bound for single-chunk dissemination. In Section IV, we show that the snowball chunk dissemination algorithm can be extended to a snowball streaming algorithm that achieves the delay bounds in continuous video streaming. The performance of the snowball chunk dissemination algorithm in realistic network environments is studied in Section V. A centralized dynamic snowball streaming algorithm is presented in Section VI. Through simulations, we demonstrate that the dynamic snowball streaming algorithm can approach the minimum delay bounds in highly variable network environments with a small peer upload bandwidth overhead. The paper is concluded with future work in Section VII.

II. BACKGROUND AND RELATED WORK

Existing P2P streaming solutions can be classified into the following categories.

A. Single-Tree Streaming

In a single-tree-based approach, peers form a tree topology at the application layer, with the video source server as the root. Each peer receives the stream from its parent peer and forwards it to its children peers. The fan-out degree of a peer is limited by its uploading bandwidth. An early example is Overcast [ ]. One major drawback of the single-tree approach is that the leaf nodes do not contribute their uploading bandwidth. Since leaf nodes account for a large portion of the peers in the system, this largely degrades the peer bandwidth utilization efficiency.

B. Balanced Multi-Tree Streaming

To solve the leaf-node problem, multi-tree-based approaches have been proposed [ ], [ ]. In balanced multi-tree streaming, the server divides the stream into $m$ substreams. Instead of one streaming tree, $m$ subtrees are formed, one for each substream. In a fully balanced multi-tree streaming, the node degree of each subtree is $m$. Each peer joins all $m$ subtrees to retrieve the $m$ substreams. A single peer is positioned on an internal node in only one tree and only uploads one substream to its $m$ children peers in that tree. In each of the remaining $m-1$ subtrees, the peer is positioned on a leaf node and downloads a substream from its parent peer. Fig. 1(a) shows an example of two-tree streaming for seven peers. For balanced multi-tree streaming, a chunk is disseminated in a hierarchical way. As illustrated in Fig. 1(b), for an $m$-degree tree of $N$ peers, peer 0 sends a chunk to its $m$ children peers at level 1, each of which is then responsible for disseminating the chunk in its own subtree with $(N-1)/m$ peers (including themselves). In terms of time, after $m$ transmissions by peer 0, the task of disseminating a chunk to $N$ peers becomes $m$ subtasks of disseminating the chunk to $(N-1)/m$ peers.

Fig. 1. Balanced multi-tree-based streaming. (a) Seven-node example. (b) Hierarchical view.

C. Mesh-Based Streaming

The management of streaming trees is challenging in the face of frequent peer churn. Mesh-based streaming systems are more robust against peer dynamics. Many recent P2P streaming systems adopt the mesh-based streaming approach [ ], [ ], [ ], [ ], [ ]. In a mesh-based system, there is no static streaming topology. Peers establish and terminate peering relationships dynamically. A peer may download/upload video from/to multiple peers simultaneously. However, in practice, the delay performance of mesh-based streaming is still not satisfactory. One important motivation of the study presented in this paper is to provide guidelines for the design of peering strategies and chunk scheduling schemes that let mesh-based streaming systems achieve better delay performance.

D. Related Work on Delay Performance

Despite the popularity of P2P streaming systems, few studies have addressed their delay performance analytically. One related work was presented in [ ]. The authors of [ ] studied the tradeoff between the server bandwidth cost, the maximum number of peers that can be supported, and the minimum number of streaming hops experienced by a peer. We study the optimal streaming strategy when the server plays only a minimal role in video uploading. The delay bounds obtained through our analysis are much tighter than those predicted in [ ] and can be achieved by the proposed snowball streaming algorithm. A recent paper [ ] studied the minimum tree depth of multi-tree-based streaming as a function of server and peer bandwidth and peer degree. Its authors assumed that video can be divided infinitely into substreams, like a fluid. Consequently, the chunk transmission delay was not considered. The authors of [ ] proposed a heuristic algorithm to build a low-delay overlay mesh for P2P live streaming. The delay between two peers at the overlay level is the end-to-end propagation delay along the underlay path between them. Again, the chunk transmission delay was not taken into consideration. Different from those works, we study the delay bounds for chunk-based P2P streaming where the chunk transmission delays are not negligible compared to the propagation delays. We develop continuous P2P streaming algorithms that schedule the transmissions of chunks to approach the minimum delay bounds. After
LIU: DELAY BOUNDS OF CHUNK-BASED PEER-TO-PEER VIDEO STREAMING 1197

the conference version of this paper [ ], a recent work [ ] also studied the delay bound of chunk-based P2P streaming by modeling the diffusion process using difference equations. We systematically study P2P delay bounds considering peer heterogeneity, random propagation delay, and upload bandwidth variations. The tightness of the delay bounds in dynamic network environments is also established by a centralized streaming algorithm.

III. BOUND ON SINGLE-CHUNK DISSEMINATION

In a P2P live video streaming session, a sequence of video chunks is continuously generated by the server and disseminated to all peers in the session. The streaming delay is determined by how fast all chunks can be delivered to peers. In this section, instead of developing the streaming delay bound directly, we assume that there is only one chunk to be disseminated in a P2P video system and develop the delay bound for single-chunk dissemination. Obviously, the single-chunk delay bound is a lower bound for the streaming delay. We will generalize the analysis of single-chunk dissemination to continuous streaming in Section IV.

Given a P2P system with a server and $N$ peers, one can ask: if the server generates a chunk of content at time 0, how does one disseminate that chunk to all peers in the shortest possible time? The answer depends on the size of the chunk, the available bandwidth, and the propagation delays among all nodes in the system, including the server and all peers. Without loss of generality, we normalize the chunk size to one and choose the video streaming rate as the bandwidth unit. Consequently, the time unit after normalization equals the chunk transmission time on a unit-bandwidth link, which in turn equals the average playback time of the video contained in a chunk. For now, let us assume that the propagation delay between any two nodes is dominated by the chunk transmission delay and thus can be ignored. We will take propagation delays into account in Sections V-A and VI, where the chunk transmission delay becomes small.

A. Homogeneous Case

We start with a homogeneous case where the server and all peers have upload bandwidth of 1. Each peer uploads and downloads at the same rate, and the whole P2P streaming system is self-scalable. Throughout this paper, we assume all peers have enough download bandwidth to receive the whole video stream. Therefore, the download side is never a bottleneck in our analysis. We further assume that the server uploads only one copy of the chunk to one peer and does not participate in the chunk dissemination afterward.

1) Single-Tree Chunk Dissemination: Given the unit bandwidth on all peers, a peer can only have one child. The only possible single-tree-based streaming solution is a chain: the server uploads the chunk to peer 0, then peer 0 uploads it to peer 1, and so on until peer $N-2$ uploads it to peer $N-1$. The chunk propagates along the chain from the server to all peers in time $N$. The average delay is $(N+1)/2$.

2) Multi-Tree Chunk Dissemination: If the multi-tree approach with degree $m$ is employed, a chunk propagates from the server to all peers along a subtree with node degree $m$. If there are $N$ peers, the number of levels of each subtree is $L = \lceil \log_m (N(m-1)+1) \rceil$. The only peer at level 0 downloads the chunk directly from the server, and a peer at level $k$ then uploads a video chunk to $m$ children peers at level $k+1$. Let $N_k$ be the number of peers at level $k$. Then, $N_k = m^k$ for $0 \le k \le L-2$, and $N_{L-1} = N - \sum_{k=0}^{L-2} m^k$. Since each peer/server only has uploading bandwidth of 1, if the uploading is done in parallel, all children peers of one peer will receive the chunk $m$ time slots after their common parent receives it. The peer at the top level always receives the chunk from the server after one time slot. For parallel uploading, the peers at the very bottom level will receive the chunk in $1 + m(L-1)$ time slots. The average delay among all peers is

$$\bar{D}_{\mathrm{par}} = \frac{1}{N}\sum_{k=0}^{L-1} N_k\,(1 + km). \qquad (1)$$

When $N$ is large, the average delay and the worst-case delay are both of the form

$$m \log_m N = \frac{m}{\log_2 m}\,\log_2 N. \qquad (2)$$

To achieve the shortest delay, one can choose the tree degree

$$m^{*} = \arg\min_{m \ge 2} \frac{m}{\log_2 m} = 3,$$

i.e., the server divides the stream into three substreams and feeds each substream into one subtree with node degree 3. The minimum delay, in both the average and the worst-case sense, is $(3/\log_2 3)\log_2 N \approx 1.89\log_2 N$.

If the uploading is done sequentially, the first child peer will receive the chunk from its parent within one time slot, and the last child of a peer will receive the chunk after $m$ time slots. The longest delay at level $k$ is still $1 + km$. Therefore, the worst-case delay is still $(m/\log_2 m)\log_2 N$, and a degree of 3 achieves the minimum worst-case delay of $1.89\log_2 N$. The average delay at level $k$ is $(m+1)/2$ time slots more than the average delay at level $k-1$, and the peer at the top level always receives the chunk from the server after one time slot. We can calculate the average delay among all peers as

$$\bar{D}_{\mathrm{seq}} = \frac{1}{N}\sum_{k=0}^{L-1} N_k\left(1 + k\,\frac{m+1}{2}\right).$$

Again, when $N$ is large, the average delay is $\frac{m+1}{2\log_2 m}\log_2 N$. When the tree degree is 4, the average delay is minimized to $1.25\log_2 N$, which is less than 2/3 of the average delay of parallel uploading.

3) Snowball Chunk Dissemination: For single-chunk dissemination, peers only need to disseminate one chunk instead of a continuous stream of chunks. After downloading the chunk, a peer can keep uploading that chunk to other peers until all peers receive it. This greatly reduces the chunk dissemination time. The accumulation of the aggregate uploading bandwidth for the chunk mimics the formation of a snowball, so we refer to it as the snowball chunk dissemination approach. Fig. 2(a) illustrates the
progress of snowball chunk dissemination for eight peers. An arc from node $i$ to node $j$ with label $k$ represents peer $i$ (or the server) uploading the chunk to peer $j$ in time slot $k$. The server uploads the chunk to peer 0 in time slot 0. In time slot 1, peer 0 uploads the received chunk to peer 1. In time slot 2, peers 0 and 1 upload the chunk to peers 2 and 3, respectively. Peers 0, 1, 2, and 3 upload the chunk to peers 4, 5, 6, and 7 in time slot 3. All peers receive the chunk after four time slots.

For a general case, the snowball approach disseminates a chunk recursively. As illustrated in Fig. 2(b), after peer 0 sends a chunk to peer 1, the task of disseminating a chunk to $N$ peers becomes two subtasks of disseminating the chunk to $N/2$ peers. Peer 0 continues to lead one subtask, and peer 1 becomes the leader of the other subtask. Even though the task-splitting degree is only 2, compared to degree $m$ in Fig. 1(b), the split happens after only one chunk transmission instead of $m$ transmissions as in Fig. 1(b). We will show that this is indeed the fastest branching process.

Fig. 2. Snowball chunk dissemination. (a) Eight nodes. (b) Recursive view.

Let $n(t)$ denote the number of peers that have the chunk at the beginning of time slot $t$. In time slot 0, the server uploads the chunk to one peer; therefore, $n(1) = 1$. Afterward, every peer with the chunk uploads it to another peer in each time slot, so $n(t+1) = \min\{2n(t), N\}$. Therefore, it takes $1 + \lceil \log_2 N \rceil$ time slots for all $N$ peers to receive the chunk. One peer receives the chunk after one time slot, $2^{k-1}$ peers receive the chunk after $k+1$ time slots ($1 \le k < \lceil \log_2 N \rceil$), and $N - 2^{\lceil \log_2 N\rceil - 1}$ peers receive the chunk after $1 + \lceil \log_2 N\rceil$ time slots. The average delay is

$$\bar{D}_{\mathrm{sb}} = \frac{1}{N}\left[1 + \sum_{k=1}^{\lceil \log_2 N\rceil - 1} 2^{k-1}(k+1) + \left(N - 2^{\lceil \log_2 N\rceil - 1}\right)\left(1 + \lceil \log_2 N\rceil\right)\right].$$

If $N = 2^k$, the average delay is $k + 1/N$.

Theorem 1: In a homogeneous P2P streaming system, the snowball chunk dissemination approach simultaneously achieves the minimum average peer delay and the minimum worst-case peer delay.

Proof: See the proof of Theorem 1 in [ ].

TABLE I
MINIMUM DELAY ACHIEVED BY DIFFERENT STREAMING STRATEGIES FOR HOMOGENEOUS CASE

Table I compares the delay performance of snowball chunk dissemination with the single-tree approach and the optimal multi-tree approach. For a system of 1024 peers, if the transmission delay of a chunk is 0.2 s, it takes only 2 s for the snowball approach to complete chunk dissemination to all peers, while the minimum delay achieved by the multi-tree approach is 3.78 s. Since the single-tree approach degrades to a chain, the peers' average delay is around 100 s.

As detailed in [ ], if the server bandwidth is increased from 1 to $c$, one can divide the peers into $c$ clusters and let the server upload the chunk to one peer in each cluster within one time slot. The delay bound is then reduced by $\log_2 c$, to $1 + \lceil \log_2 (N/c)\rceil$. However, if all peers' upload bandwidth is increased to $c$, with sequential chunk uploading between peers, each chunk can be uploaded within $1/c$ of a time slot. Consequently, the $\lceil \log_2 N\rceil$ slots of peer-to-peer doubling shrink to $\lceil \log_2 N\rceil / c$.

In the snowball approach, peers who receive the chunk in time slot $k$ upload it $\lceil \log_2 N\rceil - k$ times, and the peers who receive the chunk in the last time slot (about half of the peers) do not get a chance to upload it to other peers. Their uploading bandwidth can be used to upload other chunks in continuous video streaming, when multiple chunks are in transit simultaneously. We will further show in Section IV that snowball chunk dissemination can be extended to snowball continuous streaming, which disseminates a stream of chunks with a worst-case delay per chunk that is still $1 + \lceil \log_2 N\rceil$. The snowball streaming in Section IV is designed so that the uploading bandwidth of all peers is fully utilized to achieve the minimum delay bound for each chunk.

B. Heterogeneous Cases

In a real network environment, different peers have different types of network access with different upload bandwidth. The chunk dissemination delay is determined by how quickly the peers' bandwidth can be utilized to upload the chunk. We define the system-wide usable uploading bandwidth $U(t)$ for the chunk as the aggregate uploading bandwidth that can be utilized to upload the chunk at time $t$. In the homogeneous case, every peer has the same uploading bandwidth, so $U(t)$ is proportional to the number of peers with the chunk, $n(t)$, and the order in which peers receive the chunk has no impact on how $U(t)$ grows over time. However, in a heterogeneous environment, the order in which peers receive the chunk determines the growth speed of $U(t)$ and consequently the chunk dissemination delay. For a quick growth of $U(t)$, the intuition is to upload the chunk to peers with large uploading capacities first.

In this section, we study the impact of uploading bandwidth heterogeneity among peers on the chunk dissemination delay through several typical cases. It will become clear that peer uploading bandwidth heterogeneity enables the snowball
approach to achieve a shorter chunk dissemination delay than in the homogeneous case.

1) Case 1: Super Peers and Free-Riders: Suppose there are $N/c$ super peers that can upload at rate $c > 1$. All the remaining $N - N/c$ peers are free-riders and do not participate in the uploading. The chunk can be disseminated by the snowball approach to all super peers within $1 + \lceil \log_2 (N/c)\rceil / c$ time slots. Then, all super peers can upload the chunk to the remaining free-riders in an additional $(N - N/c)/N = (c-1)/c$ time slots. The total delay is $1 + \lceil \log_2(N/c)\rceil/c + (c-1)/c$. In this case, the average uploading bandwidth of the peers is 1. If all peers had the average uploading bandwidth 1, the shortest delay would be $1 + \lceil \log_2 N\rceil$, which is around $c$ times that of the heterogeneous case. This shows that the heterogeneity of peer uploading bandwidth helps reduce the chunk dissemination delay.

2) Case 2: Multilevel Bandwidth Hierarchy: In the previous case, peers form a two-level hierarchy according to their uploading contribution. A fraction of super peers with uploading bandwidth $c$ stay at the top level and feed video chunks to the free-riders at the bottom level. In real network environments, peers can be clustered based on the types of their network access. In this case, we extend the two-level hierarchy to accommodate multiple levels and show that even a very small percentage of super peers can bootstrap the chunk dissemination.

Suppose there are $N_1$ super peers with bandwidth $c_1$, $N_2$ medium peers with bandwidth $c_2$, and $N_3$ slow peers with bandwidth $c_3$. To quickly disseminate the chunk to all peers, the following chunk scheduling algorithm can be employed:
1) Use the snowball algorithm to upload the chunk to the $N_1$ super peers within time $T_1$.
2) Each of those $N_1$ super peers acts as a server with bandwidth $c_1$ and uploads to $N_2/N_1$ medium peers. As studied in Section III-A, this uploading can finish within time $T_2$. Now, the $N_2$ medium peers have the chunk.
3) Each of those $N_2$ medium peers acts as a server with bandwidth $c_2$ and uploads to $N_3/N_2$ slow peers within time $T_3$. Now, the $N_3$ slow peers have the chunk.

The total delay is $T_1 + T_2 + T_3$. Without those super and medium peers, the fastest chunk dissemination to $N = N_1 + N_2 + N_3$ slow peers takes time $1 + \lceil \log_2 N\rceil / c_3$. This suggests that the existence of super peers (even a very small percentage of them) can dramatically reduce the chunk dissemination delay. For example, it takes at least 15 time slots to disseminate a chunk to $2^{15}$ peers with bandwidth 1. Meanwhile, if $N_1 = 32$, $c_1 = 10$, $N_2 = 1024$, and $c_2 = 5$, in other words, if 32 (only 0.1%) of the peers have bandwidth 10 and 1024 (only 3%) of them have bandwidth 5, the time to disseminate a chunk to all peers is less than 5.2 time slots. The example can easily be extended to more than three levels. Another insight from this example is that peers should be organized into tiers according to their uploading bandwidth: peers within each tier should help each other obtain the chunk in the shortest possible time and then pass it down to the neighboring lower tier. This way, the delay of the whole system can be reduced.

3) General Heterogeneous Case: For general heterogeneous cases, one can index peers in decreasing order of their uploading capacities. Suppose the sorted uploading capacities of the peers are $b_0 \ge b_1 \ge \cdots \ge b_{N-1}$. To derive a lower bound on the shortest chunk dissemination time, let us allow infinitely fine chunk striping, namely, multiple peers can upload different bits of a chunk to the same peer simultaneously. If the first $k$ peers have the chunk at time $t_k$, the uploading to peer $k$ can finish by $t_{k+1} = t_k + 1/\sum_{i=0}^{k-1} b_i$; therefore, with $t_1 = 1$ for the server's initial transmission, the lower delay bound is

$$t_N = 1 + \sum_{k=1}^{N-1} \frac{1}{\sum_{i=0}^{k-1} b_i}.$$

However, this is a loose bound. For example, for the homogeneous case, the bound is $1 + \sum_{k=1}^{N-1} 1/k \approx 1 + \ln N$, whereas we know the shortest delay without chunk striping is $1 + \lceil \log_2 N\rceil$. In [ ], we developed several variations of the snowball chunk dissemination algorithm that achieve short delays in the general heterogeneous case. Due to the space limit, we refer interested readers to [ ] for more details.

IV. SNOWBALL STREAMING

In single-chunk dissemination, any peer can be utilized to upload the chunk after it has downloaded the chunk. In continuous streaming, one new chunk is generated by the server in each time slot. When the server capacity is less than $N$, one chunk cannot be disseminated to all peers within one time slot. Therefore, there will be more than one chunk in transit at any given time. If $D$ is the minimum dissemination delay for a single chunk, there will be at least $D$ chunks in transit at any given time. If the chunk scheduling is not set up appropriately, some chunks cannot be disseminated to all peers within $D$ time slots.

A. Homogeneous Environment

In this section, we show that for the homogeneous case, there exists a chunk streaming schedule such that all chunks can be disseminated to all peers within the minimum single-chunk delay. In the snowball chunk dissemination approach, the server uploads the chunk to the first peer in time slot 0. Before the beginning of time slot $\lceil \log_2 N\rceil + 1$, all $N$ peers will have received the chunk. Let $u_k$ be the number of peers with the chunk at the beginning of time slot $k$ that will upload that chunk in time slot $k$. We have

$$u_k = 2^{k-1}, \quad 1 \le k < \lceil \log_2 N\rceil; \qquad u_{\lceil \log_2 N\rceil} = N - 2^{\lceil \log_2 N\rceil - 1}.$$

We call $\{u_k\}$ the snowball chunk dissemination profile. To achieve the minimum chunk delay, all chunks have to be disseminated according to the optimal profile $\{u_k\}$.
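The profile $\{u_k\}$ and the average-delay expression of Section III-A can be checked numerically. The Python sketch below assumes the homogeneous setting (unit upload bandwidth for the server and every peer, chunk injected in slot 0). The function names `snowball_profile` and `snowball_delays` are illustrative and do not come from the paper.

```python
def snowball_profile(n_peers):
    """Return [u_1, ..., u_K]: u_k is the number of chunk holders that upload
    in time slot k (homogeneous case, unit upload bandwidth everywhere)."""
    holders = 1                      # peer 0 got the chunk from the server in slot 0
    profile = []
    while holders < n_peers:
        # Each holder can serve at most one new peer per slot, and no more
        # uploads are needed than there are peers still missing the chunk.
        uploads = min(holders, n_peers - holders)
        profile.append(uploads)
        holders += uploads
    return profile


def snowball_delays(n_peers):
    """Delay of every peer in time slots, counted from the start of slot 0."""
    delays = [1]                     # peer 0 finishes at the end of slot 0
    for k, uploads in enumerate(snowball_profile(n_peers), start=1):
        delays += [k + 1] * uploads  # peers served in slot k finish at the end of slot k
    return delays
```

For example, `snowball_delays(8)` gives `[1, 2, 3, 3, 4, 4, 4, 4]`, which matches Fig. 2(a): the worst case is 4 slots and the average is $3 + 1/8$. For $N = 1000$, the last entry of the profile is $1000 - 2^9 = 488$, as the formula for $u_{\lceil \log_2 N\rceil}$ predicts.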
Theorem 2: For a homogeneous P2P streaming system, there exists a continuous streaming schedule such that all chunks in the stream will be disseminated to all peers with the shortest delay $1 + \lceil \log_2 N\rceil$ achieved by the snowball algorithm in single-chunk dissemination.

Proof: Let $K = \lceil \log_2 N\rceil$. Without loss of generality, the server uploads chunk $i$ to some peer in time slot $i$. Let $u_i(t)$ be the number of peers that have chunk $i$ and will upload chunk $i$ to other peers in time slot $t$. For any feasible schedule, we should have $\sum_i u_i(t) \le N$, i.e., in any time slot the aggregate uploading bandwidth for all chunks is at most $N$, and each peer can upload to at most one peer within any time slot. A streaming schedule achieves the optimal delay for each chunk if and only if each chunk is uploaded according to the snowball chunk dissemination profile after the server uploads it to some peer, i.e., $u_i(t) = u_{t-i}$ if $1 \le t-i \le K$, and $u_i(t) = 0$ otherwise. It can be verified that such a schedule satisfies the aggregate constraint, since $\sum_i u_i(t) \le \sum_{k=1}^{K} u_k = N - 1$. To complete the proof, for each time slot we need to construct an uploading schedule for all active chunks that also respects the per-peer constraint.

Let $P$ be the set of all peers. Denote by $S_i(t)$ the set of peers with chunk $i$ at the beginning of time slot $t$ that will upload the chunk to other peers without chunk $i$ in that time slot. To follow the optimal dissemination profile, it is sufficient to have $|S_i(t)| = u_{t-i}$ for $t-K \le i \le t-1$, with the sets $S_i(t)$ pairwise disjoint (since each peer can only upload one chunk in one time slot). We call this the sufficient condition for minimum-delay streaming. We complete the proof by constructing a chunk uploading schedule for each time slot by induction.

Initial Condition: The server uploads chunk 0 to peer 0 in time slot 0. Therefore, at the beginning of time slot 1, $S_0(1) = \{0\}$ and $|S_0(1)| = u_1 = 1$. It can easily be verified that the sufficient condition is satisfied at the beginning of time slot 1.

Induction: If the condition is satisfied at the beginning of time slot $t$, we can construct a schedule in time slot $t$ such that it is still satisfied at the beginning of time slot $t+1$. At the beginning of time slot $t$, according to the profile, $t-K$ is the ID of the oldest chunk that needs to be uploaded in time slot $t$. Then, $|S_i(t)| = u_{t-i}$ for $t-K \le i \le t-1$, and these sets are pairwise disjoint. Define the set $F_t = P \setminus \bigcup_i S_i(t)$, i.e., the set of peers that do not need to upload any chunk in time slot $t$; note that $|F_t| \ge N - \sum_{k=1}^{K} u_k = 1$. The following scheduling steps guarantee that the condition is still satisfied at the beginning of time slot $t+1$.

I. If $i = t-K$, chunk $i$ will be uploaded for the last time in slot $t$. Since the chunk has been uploaded $2^{K-1}$ times by the server and peers in the previous time slots, only $N - 2^{K-1} = u_K$ peers do not have it. Let all peers in $S_i(t)$ upload chunk $i$ to those peers, which finishes the upload of chunk $i$. The peers in $S_i(t)$ can then be used to upload other chunks in time slot $t+1$. We set $G = F_t \cup S_i(t)$. Then, $|G| \ge 1 + u_K$.

II. If $i = t-K+1$, chunk $i$ will be uploaded for the second-to-last time in slot $t$. According to the sufficient condition, the $u_{K-1} = 2^{K-2}$ peers in $S_i(t)$, which are all the peers holding chunk $i$, upload chunk $i$ to other peers that do not have it; in particular, no peer in $G$ holds chunk $i$. In addition, the schedule should guarantee that $u_K$ peers will be available in time slot $t+1$ to upload chunk $i$. If $u_K \le 2^{K-2}$, let each peer in $S_i(t)$ upload chunk $i$ to any peer without chunk $i$, then pick $u_K$ peers out of $S_i(t)$ to form the set of peers that upload chunk $i$ in the next time slot, i.e., $S_i(t+1)$. The other peers in $S_i(t)$ can be used to upload other chunks in time slot $t+1$. We set $G \leftarrow G \cup \left(S_i(t) \setminus S_i(t+1)\right)$, and we have $|G| \ge 1 + 2^{K-2}$. If $u_K > 2^{K-2}$, then from step I, $|G| \ge 1 + u_K > u_K - 2^{K-2}$, so we can take a subset $B$ of $u_K - 2^{K-2}$ peers out of $G$ and let that many peers in $S_i(t)$ upload chunk $i$ to the peers in $B$. The remaining peers in $S_i(t)$ then upload chunk $i$ to arbitrary peers without chunk $i$. Now, the peers in $S_i(t) \cup B$ are ready to upload chunk $i$ in time slot $t+1$; therefore, we set $S_i(t+1) = S_i(t) \cup B$ and $G \leftarrow G \setminus B$. We again have $|G| \ge 1 + 2^{K-2}$.

III. Let $t-K+2 \le i \le t-1$. Any such chunk $i$ needs to be uploaded to $u_{t-i} = 2^{t-i-1}$ peers by the peers in $S_i(t)$. We have

$$|G| \ge 1 + 2^{K-2} = 1 + \sum_{i=t-K+2}^{t-1} u_{t-i}. \qquad (3)$$

Then, for each such $i$, take a subset $R_i$ of $u_{t-i}$ peers out of $G$, let all peers in $S_i(t)$ upload chunk $i$ to the peers in $R_i$, and set $S_i(t+1) = S_i(t) \cup R_i$ and $G \leftarrow G \setminus R_i$. At the end, due to (3), we will have $|G| \ge 1$.

IV. The server uploads chunk $t$ to some peer $p \in G$, and we set $S_t(t+1) = \{p\}$.

Following the previous scheduling steps, the sufficient condition will be satisfied at the beginning of time slot $t+1$.

Conclusion: There exists a schedule such that all chunks are disseminated with the snowball chunk dissemination profile $\{u_k\}$ and achieve the optimal delay $1 + \lceil \log_2 N\rceil$.

Fig. 3 illustrates the snowball streaming schedule above in a system with eight peers. We use a sequence of eight subfigures to show the snowball streaming schedule among all peers within eight consecutive time slots. Blocks represent chunks, and circles represent peers. For time slot $t$, a white chunk beside a peer is a chunk that the peer has and will upload to another peer within that time slot. An arc from peer $i$ to peer $j$ indicates that peer $i$ uploads its chunk to peer $j$. A black chunk beside a peer indicates that the server will inject that chunk into the peer in time slot $t$. Chunk 0 is uploaded to all peers by the end of time slot 3, and chunk 1 is uploaded to all peers by the end of time slot 4.
Fig. 3. Evolution of chunk scheduling of snowball streaming among eight peers. All chunks are delivered to all peers three time slots after they are injected by the server. (a) Time 0; (b) Time 1; (c) Time 2; (d) Time 3; (e) Time 4; (f) Time 5; (g) Time 6; (h) Time 7.

The example shows that all chunks can be disseminated to all peers within the minimum chunk dissemination delay bound.

B. Heterogeneous Environment

For the heterogeneous case, the delay bound for single-chunk dissemination cannot always be achieved in continuous streaming. For example, suppose the server's upload capacity is 1 and the upload capacities of seven peers are 2, 1, 1, 1, 1, 1, and 0. Following the snowball chunk dissemination approach of Section III, a single chunk can be disseminated to the seven peers in three time slots. However, no streaming algorithm can achieve this for every chunk. If peer 0 is still uploading chunk 0 in time slot 2, chunk 1 cannot be uploaded according to the greedy chunk profile. In this case, the first peer, with bandwidth 2, becomes the scheduling bottleneck for adjacent chunks. For the two special heterogeneous cases considered in Section III-B, we are able to prove the existence of snowball streaming schedules that achieve the minimum chunk dissemination delay for all chunks.

Theorem 3: For a P2P streaming system with super peers and free-riders as in Case 1 of Section III-B, there exists a continuous streaming schedule such that all chunks in the stream will be disseminated to all peers within the minimum single-chunk dissemination delay of that case.

Proof: See the proof of Theorem 3 in [ ].

Corollary 4: If the peers in a streaming system form an $L$-level hierarchy with $N_l$ peers on level $l$ with uploading capacity $c_l$, there exists a continuous streaming schedule such that chunks can be streamed to all peers with a delay equal to the multilevel single-chunk dissemination delay of Section III-B.

Proof: See the proof of Corollary 4 in [ ].

Examples of snowball streaming schedules in heterogeneous environments can be found in the technical report [ ].

V. IMPACT OF NETWORK IMPAIRMENTS ON SINGLE-CHUNK DISSEMINATION

In real networks, the performance of P2P video streaming is subject to various network impairments. In this section, we evaluate the performance of snowball chunk dissemination in network settings with long propagation delays and random bandwidth variations.

A. Impact of Propagation Delays

From the analysis in the previous sections, using smaller chunks in P2P video streaming leads to a smaller chunk transmission delay and consequently a smaller overall dissemination latency. On the other hand, using smaller chunks increases the signaling overhead and the scheduling complexity among peers. Meanwhile, as the chunk transmission delay gets smaller, the propagation delay between peers plays a more important role. We still use the transmission time of a chunk as the time unit. Now, suppose the propagation delay is $d$ time slots, with $d$ a nonnegative integer. The time between when a sender begins to upload the chunk and when the receiving peer gets the whole chunk is $1 + d$ time slots. For the multi-tree approach, if parallel uploading is employed, the chunk transmission delay from a peer to all its children increases from $m$ to $m + d$, and the delay is $(m + d)\log_m N$. If sequential uploading is employed, the worst-case delay is still $(m + d)\log_m N$, and the average delay is $\left(\frac{m+1}{2} + d\right)\log_m N$.

Again, denote by $n(t)$ the number of peers with the chunk at the beginning of time slot $t$. All the chunks received right before the beginning of time slot $t$ were sent out at the beginning of time slot $t - 1 - d$. Therefore, we have

$$n(t) = n(t-1) + n(t-1-d), \qquad (4)$$
where the resulting sequence is a Fibonacci series with a time lag (a lag of one gives the standard Fibonacci series). We can solve (4) by taking its Z-transform. The growth rate of the solution is governed by $z^*$, the largest root of the denominator of the transform. The finish time is therefore approximately $\log_{z^*} N$ for $N$ peers, which is $1/\log_2 z^*$ times the snowball delay without propagation delay.

TABLE II
MINIMUM DELAY ACHIEVED BY DIFFERENT STREAMING STRATEGIES WITH PROPAGATION DELAYS

We compare the delay performance of the multi-tree-based strategies and the snowball strategy in Table II. Delay is measured in units of the average delay of the snowball approach when there is no propagation delay. For the parallel multi-tree strategy, we fix the node degree at the value that minimizes the average and worst-case delay when there is no propagation delay. For the sequential multi-tree strategy, the node degree is optimized for the average delay at each propagation delay, and the associated worst-case delay is also calculated. As the propagation delay increases, the delay performance of all three strategies degrades, and the delay of the fixed-degree parallel multi-tree increases fastest. The optimal node degree for the sequential multi-tree also increases with the propagation delay, because propagation delays provide additional opportunities for pipelining chunk transmissions from a peer to its children. The sequential multi-tree strategy exploits this pipelining gain: with a larger node degree, a peer spends more time uploading the same chunk to all its children. This brings it closer to the uploading philosophy of snowball streaming: a peer should keep uploading the same chunk until all peers have it. As a result, the sequential multi-tree has better delay performance than the fixed-degree parallel multi-tree. However, its worst-case delay is much worse than that of the snowball approach, because leaf peers at the bottom level have large delay variations. Leaf peers do not contribute to uploading the chunk even if they receive it early. In contrast, in the snowball approach, a peer always contributes to uploading as long as the chunk is still missing on some peers.

Analysis and simulation for single-chunk dissemination under random propagation delays can be found in our technical report .

B. Impact of Bandwidth Variations

In previous sections, we assumed that peers have constant uploading bandwidth and that a chunk transmission completes in constant time. In sequential transmission, a chunk can be transmitted from one peer to another in one time slot. In parallel transmission with degree $k$, a peer can transmit a chunk simultaneously to $k$ children in $k$ time slots. Due to network traffic variations, the available bandwidth on a connection between two peers varies over time. Consequently, the transmission time of a chunk is not constant. In this section, we investigate the robustness of different streaming strategies against randomness in chunk transmissions. For clarity of presentation, we assume all transmission delays are independent and identically distributed. We introduce a random variable $X$ for the sequential transmission time, with $E[X]=1$ and $\mathrm{Var}[X]=\sigma^2$; the $k$-parallel transmission time is a random variable $Y$ with $E[Y]=k$.

For the chain-based approach, we have shown in  that the worst-case delay, incurred by the last peer in the chain after $N$ sequential transmissions, has mean $N$ and variance $N\sigma^2$. The mean and variance of the average delay among all peers likewise grow linearly with $N$. This suggests that, in a chain topology, the impact of the randomness of individual chunk transmissions on the average and worst-case chunk delay of all peers is proportional to the number of peers.

For the parallel multi-tree approach, all peers at the bottom level receive the chunk after one independent parallel chunk transmission per tree level. For the sequential multi-tree approach, there is one peer at the bottom level that receives the chunk only after $k$ independent sequential chunk transmissions at every tree level. Since a balanced tree of degree $k$ has about $\log_k N$ levels, the mean and variance of the worst-case delay for multi-tree-based approaches are proportional to $\log N$.

As detailed in , we also calculated the mean and variance of the average delay for multi-tree-based approaches using recursions. For both parallel and sequential multi-trees, the expected average delay is the same as in the deterministic case. The variance of the average delay was derived for both parallel and sequential uploading; in both cases, the impact of the variability of individual transmissions on the average delay is independent of the number of peers. Moreover, the average delay variance does not diminish as $N$ grows, because the variability in the first few transmission steps affects almost all peers.

In the proposed snowball approach, a peer keeps uploading a chunk until all peers have it. Within one time period, a peer with more bandwidth uploads to more peers than a peer with less bandwidth. Over time, the workload of a peer naturally adapts to its bandwidth: it uploads more if it has more bandwidth and less if its bandwidth drops. In the recursive view of Fig. 2(b), because of this workload self-adaptiveness, the number of peers in each subtree is no longer fixed. What remains true is that the uploading in both subgroups finishes at around the same time.
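The scaling contrast drawn in this subsection can be tabulated with a short Python sketch, under stated assumptions. For the chain, the last peer waits for $N$ i.i.d. transmissions. For the sequential multi-tree, the sketch follows the one bottom-level peer that waits $k$ transmissions at every level; this is a single path, so its mean is only a lower bound on the mean worst-case delay, and the number of levels assumes a hypothetical balanced tree whose level $l$ holds $k^l$ peers. For snowball, it uses the exponential model analyzed next in the text, where the gap between the $(i-1)$th and $i$th receptions is exponential with rate $i$. All function names are illustrative.

```python
import math
from typing import Tuple


def tree_levels(n_peers: int, k: int) -> int:
    """Levels a balanced k-ary tree needs to hold n_peers below the source.

    Assumption: level l holds k**l peers (level 1 = the source's k children).
    """
    levels, covered = 0, 0
    while covered < n_peers:
        levels += 1
        covered += k ** levels
    return levels


def chain_worst_case(n_peers: int, sigma2: float) -> Tuple[float, float]:
    """Last chain peer waits for n_peers i.i.d. X with E[X]=1, Var[X]=sigma2."""
    return float(n_peers), n_peers * sigma2


def seq_tree_slowest_path(n_peers: int, k: int, sigma2: float) -> Tuple[float, float]:
    """Peer that is the last child at every level: k transmissions per level.

    One path only, so the mean is a lower bound on the mean worst-case delay.
    """
    m = k * tree_levels(n_peers, k)
    return float(m), m * sigma2


def snowball_exponential(n_peers: int) -> Tuple[float, float]:
    """Worst-case delay sum of T_i ~ Exp(rate i), i = 1..N, as in the text."""
    mean = math.fsum(1.0 / i for i in range(1, n_peers + 1))
    var = math.fsum(1.0 / (i * i) for i in range(1, n_peers + 1))
    return mean, var


if __name__ == "__main__":
    sigma2 = 1.0  # Exp(1) transmissions have unit variance
    for n in (100, 1_000, 10_000):
        chain = chain_worst_case(n, sigma2)
        tree = seq_tree_slowest_path(n, 5, sigma2)
        snow = snowball_exponential(n)
        ratio = snow[0] / math.log2(n)  # compare with deterministic log2(N)
        print(f"N={n:6d} chain={chain} seq-tree(k=5)={tree} "
              f"snowball=({snow[0]:.2f}, {snow[1]:.3f}) ratio={ratio:.3f}")
```

With $\sigma^2 = 1$, the chain row grows linearly in $N$, the tree row grows with the number of levels, and the snowball variance stays below $\pi^2/6$ while its mean divided by $\log_2 N$ drifts slowly toward $\ln 2$ (slowly, because of the Euler constant offset in the harmonic sum).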
LIU: DELAY BOUNDS OF CHUNK-BASED PEER-TO-PEER VIDEO STREAMING 1203

To further illustrate, let us assume the chunk transmission time between two peers follows an exponential distribution with mean 1. Denote by $T_i$ the time interval between the instants when the $(i-1)$th and the $i$th peers receive the chunk. $T_1$ is the transmission time from peer 0 to peer 1; it is an exponential random variable with rate 1. For $i > 1$, due to the memoryless property of the exponential distribution, $T_i$ follows an exponential distribution with rate $i$. Therefore, the worst-case delay is $D = \sum_{i=1}^{N} T_i$, and we have

$$E[D] = \sum_{i=1}^{N} \frac{1}{i} \approx \ln N, \qquad \mathrm{Var}[D] = \sum_{i=1}^{N} \frac{1}{i^2} < \frac{\pi^2}{6}.$$

For large $N$, the expected chunk dissemination finish time is thus only about $\ln 2 \approx 69\%$ of the deterministic case, whose finish time grows as $\log_2 N$. Because the delay variance is bounded by a constant, for large $N$ the snowball algorithm has better delay performance in the exponential case than in the deterministic case. Similarly, we have shown in  that the average delay of the snowball algorithm is also better in the exponential case than in the deterministic case.

For chunk transmission times following a general distribution with mean 1, the interval between two chunk upload finish times is no longer exponential. However, the superposition of a large number of point processes converges to a Poisson process . For large $i$, $T_i$ approximately follows an exponential distribution with rate $i$. We can therefore apply the exponential analysis above to study the behavior of a large system with generally distributed chunk transmission times. We conjecture that, for a reasonably large $N$, the snowball approach achieves better delay performance than in the deterministic case. Analysis and simulations of snowball chunk dissemination when the chunk transmission time follows a general distribution are presented in .

VI. CONTINUOUS STREAMING IN DYNAMIC ENVIRONMENT

In the previous section, we studied the delay performance of the snowball single-chunk dissemination scheme under long propagation delays and network bandwidth variations. However, for continuous streaming, due to the randomness in network bandwidth and propagation delays, we can no longer predetermine fixed chunk streaming schedules among peers as in the static network case studied in Section IV. Instead, chunk uploading schedules have to be calculated dynamically to adapt to network bandwidth and delay variations. We now extend the static snowball streaming algorithm to the Dynamic Snowball (DSB) streaming algorithm. We will show through simulations that, with a small peer upload bandwidth overhead, the proposed DSB streaming algorithm can approach the minimum delay bounds in a dynamic network environment. The main purpose of this section is to demonstrate the potential of snowball-type streaming algorithms to achieve the minimum delay bound. The DSB algorithm is developed as a centralized streaming algorithm; we defer the distributed implementation of DSB to future work.

A. Dynamic Snowball (DSB) Streaming Algorithm

The DSB streaming algorithm follows the philosophy of the static snowball streaming algorithm. DSB aims at pushing out older chunks as quickly as possible to reduce chunk dissemination delays as well as the number of active chunks in transit in the system. At the same time, DSB should also ensure that newer chunks get enough access to peer upload bandwidth so that the usable upload bandwidth for them grows quickly. In a static network environment, as studied in Section IV, these two seemingly conflicting objectives can be achieved simultaneously by a carefully calculated chunk upload schedule among peers. The challenge for DSB streaming in a dynamic network environment is that chunk transmission completion times are not predictable. Therefore, there is no optimal static streaming schedule that achieves the minimum delay bound for all chunks in a video stream. Instead, our DSB algorithm is a simple heuristic that mimics the static snowball streaming algorithm and dynamically resolves the conflicts between active chunks in continuous streaming.

The DSB streaming algorithm works in rounds. At each round, let $\mathcal{A}$ be the set of active chunks that have been generated by the video source server but have not yet been uploaded to all peers. For any chunk $c \in \mathcal{A}$, let $h_c$ be the number of peers with chunk $c$ and $m_c$ the number of peers without it. Define the demand factor for chunk $c$ as $d_c = m_c / h_c$, which is the expected workload for each peer with chunk $c$ to upload it to peers without it. Then, for any peer $p$, let $\mathcal{B}_p$ be the set of chunks in its buffer. The total expected workload for peer $p$ is $w_p = \sum_{c \in \mathcal{B}_p \cap \mathcal{A}} d_c$. The DSB algorithm calculates the chunk uploading schedule among peers round by round, as outlined in Algorithm 1.

B. Performance Study of DSB

We implemented the centralized DSB streaming algorithm and conducted simulations of a P2P video streaming system with 4000 peers. In each simulation, a stream of 1000 consecutive chunks is disseminated to all peers. We introduce random variations in peer upload bandwidth and in propagation delays between peers. More specifically, for each chunk transmission between peers, the transmission time follows a truncated exponential distribution, and the propagation delay between two peers follows a truncated normal distribution. We record how long it takes for each chunk to be received by each peer. We then calculate the average and worst-case streaming delay for each chunk and compare them to the single-chunk dissemination delays obtained using a single-chunk dissemination simulator² in a system with the same bandwidth and propagation delay settings.

When there is no bandwidth variation and propagation delays are negligible, the transmission time of a chunk is set to eight simulation time-steps. The DSB streaming algorithm then achieves the minimum single-chunk delay bound presented in Section III.

²Details of the simulator and results are described in .
Fig. 4. Delay performance of the dynamic snowball streaming algorithm degrades when there are variations in propagation delays and peer upload bandwidth. The minimum delay bounds can be approached by slightly increasing peer upload bandwidth. (a) Random propagation delay; (b) random upload bandwidth; (c) random delay and random bandwidth.

TABLE III
DELAY PERFORMANCE OF DSB UNDER RANDOM PROPAGATION DELAYS AND UPLOAD BANDWIDTH

Each of the 1000 chunks in the stream is delivered to all 4000 peers after exactly 97 time-steps, and the average delay experienced by peers is 88.81 time-steps. This demonstrates that dynamic snowball streaming is delay-optimal in a static homogeneous environment.

Next, we conduct simulations to evaluate the performance of DSB in a dynamic network environment. For comparison, we also run the same simulations for balanced multi-tree streaming with sequential upload and a node degree of 5.³

We first introduce random propagation delays according to a truncated normal distribution with mean equal to eight time-steps (the chunk transmission time) and standard deviation equal to four time-steps. The lower and upper bounds for the random propagation delay are one and 16 time-steps, respectively. In Fig. 4(a), we compare the delay performance of DSB as the average peer upload bandwidth varies from 1 to 1.125 to 1.25 times the streaming rate. As a reference point, we use the single-chunk delays obtained from 100 simulation runs of single-chunk dissemination among 4000 peers with the same random propagation delay setting, using the simulator described in . Due to the propagation delay, the average and worst-case single-chunk delays are 127.3 and 151.6, respectively, which are larger than 88.81 and 97 for the zero propagation delay case. Fig. 4(a) plots the average streaming delay for each chunk in DSB. The system resource index in the figure is defined as the ratio between the average peer upload bandwidth and the streaming rate. The single-chunk delays can be approached by the DSB algorithm if peers have upload bandwidth slightly higher than the streaming rate. The statistics of the average and worst-case delay performance of DSB and Multi-Tree are compared in Table III. The delay performance of Multi-Tree is worse than that of DSB; however, its delay variance is much smaller than that of DSB. This suggests that Multi-Tree can adapt well to propagation delay variations.

Now, we repeat the previous simulation with zero propagation delay and random peer upload bandwidth. Each chunk transmission time now follows a truncated exponential distribution with mean equal to eight time-steps and lower and upper limits of 1 and 24 time-steps, respectively. Again, we use the single-chunk dissemination simulation as the reference point. As predicted by the analysis in Section V, with random chunk transmission times, the average and worst-case single-chunk delays (66.8 and 91.3, respectively) are smaller than those for the zero propagation delay case (88.81 and 97). The streaming delay performance of DSB is plotted in Fig. 4(b). When the average peer upload bandwidth equals the streaming rate, conflicts between chunks make the streaming delay much worse than the corresponding single-chunk delay. Increasing peer upload bandwidth by 12.5% reduces the delay by 25%. If we further increase the average peer upload bandwidth to 1.25 times the streaming rate [corresponding to the curve labeled with resource index 1.25 in Fig. 4(b)], the delay performance gets closer to the single-chunk delay bound.

³The degree is optimized for a transmission-to-propagation delay ratio of 1 according to Table II.
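The random settings used in these simulations can be drawn as in the sketch below. The text does not say how the distributions were truncated, so this sketch assumes rejection sampling and treats "mean equal to eight time-steps" as the mean of the untruncated distribution. Under that assumption, the truncated transmission time has a realized mean of about 7.6 rather than 8, while the truncated propagation delay stays close to 8. The function and constant names are hypothetical.

```python
import random
import statistics


def truncated_normal(rng: random.Random, mean: float, sd: float,
                     lo: float, hi: float) -> float:
    """N(mean, sd^2) restricted to [lo, hi]; assumption: rejection sampling."""
    while True:
        x = rng.gauss(mean, sd)
        if lo <= x <= hi:
            return x


def truncated_exponential(rng: random.Random, mean: float,
                          lo: float, hi: float) -> float:
    """Exponential with the given untruncated mean, restricted to [lo, hi].

    Assumption: rejection sampling; the realized mean is below `mean`.
    """
    while True:
        x = rng.expovariate(1.0 / mean)
        if lo <= x <= hi:
            return x


# Settings reported in the text, in simulation time-steps.
PROP_DELAY = dict(mean=8.0, sd=4.0, lo=1.0, hi=16.0)
TX_TIME = dict(mean=8.0, lo=1.0, hi=24.0)

if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed for reproducibility
    prop = [truncated_normal(rng, **PROP_DELAY) for _ in range(10_000)]
    tx = [truncated_exponential(rng, **TX_TIME) for _ in range(10_000)]
    print(f"propagation delay: mean {statistics.fmean(prop):.2f}, "
          f"range [{min(prop):.2f}, {max(prop):.2f}]")
    print(f"transmission time: mean {statistics.fmean(tx):.2f}, "
          f"range [{min(tx):.2f}, {max(tx):.2f}]")
```

If the simulator instead fixed the mean of the truncated distribution at eight, the exponential's rate parameter would need to be tuned so that the truncated mean equals eight; the rejection loop itself would stay the same.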
As seen in Table III, the delay performance of Multi-Tree is much worse than that of DSB. This is because the delivery paths of chunks are predetermined in Multi-Tree. If one chunk transmission is delayed on one link, all subsequent chunks have to be queued. This can happen on any link on the path and results in a “chain effect.” In contrast, DSB can adaptively find peers with available bandwidth to quickly disseminate chunks, so its delay performance is much better than that of Multi-Tree.

Next, we introduce both random propagation delays and random peer upload bandwidth by combining the random delay and bandwidth variations used in the previous two sets of simulations. In Fig. 4(c), we compare the delay performance of DSB as the average peer upload bandwidth varies from 1 to 1.125 to 1.25. Again, the minimum single-chunk delay bounds can be approached by the DSB algorithm if peers have upload bandwidth slightly higher than the streaming rate. As Table III shows, due to the bandwidth variations, the delay performance of Multi-Tree is still much worse than that of DSB.

TABLE IV
DELAY PERFORMANCE IMPROVEMENT OF DSB AT DIFFERENT CHUNK SIZES

To study the impact of chunk size on the delay performance improvement of DSB, we fix the average propagation delay at eight time slots and vary the chunk size so that the chunk transmission time ranges from two to 16 time slots. As indicated in Table IV, as the chunk size decreases, the delay performance of both DSB and Multi-Tree degrades, and the performance gap between them also decreases. When there is random bandwidth variation, DSB still outperforms Multi-Tree by around 25% even with a small chunk size of 2.

Through simulations, we demonstrated that with a little extra peer uploading bandwidth, our dynamic snowball streaming algorithm can approach the minimum delay bounds in the face of random variations in peer uploading bandwidth and in propagation delays on peering connections.

VII. CONCLUSION AND FUTURE WORK

In this paper, we analytically study the delay performance of P2P live video streaming systems. We derive various delay bounds that can serve as delay performance benchmarks for proposed/deployed P2P streaming systems. Through our analysis, we quantify the impact of the bandwidth distribution among peers on their delay performance. Insights brought forth by our study can be used to guide the design of new P2P streaming systems with shorter startup delays and playback lags. Static snowball streaming algorithms are proposed to achieve the minimum delay bounds in static homogeneous and heterogeneous P2P video systems. A dynamic snowball streaming algorithm is also developed to approach the minimum delay bounds with a small peer upload bandwidth overhead. Through analysis and simulation, we show that snowball-type streaming algorithms are robust to network impairments, such as long propagation delays and random bandwidth variations.

The next step is to develop a distributed implementation of the proposed snowball streaming algorithms in mesh-based P2P video systems. The proposed DSB algorithm can be implemented in a clustered P2P streaming framework, such as HCPS . Within a cluster, peers can employ DSB chunk scheduling to achieve the minimum delay. For general mesh-based systems, using insights obtained in this study, we will design a distributed chunk scheduling algorithm that mimics the spirit of the snowball schedule. It will exploit peer bandwidth heterogeneity for shorter delay. It will also schedule uploads of multiple active chunks to achieve a delay performance close to the ideal contention-free delay bound. The snowball algorithm minimizes the delay by pushing the oldest chunk first. As chunk delays decrease, the number of active chunks in the system decreases, which increases the chance of a content bottleneck. Rarest-first chunk scheduling has proven efficient in eliminating content bottlenecks in P2P file sharing . We will develop chunk scheduling algorithms that efficiently combine oldest-first and rarest-first scheduling rules to achieve a good balance between delay performance and peer bandwidth utilization. We will test their performance in a real network environment and compare it to the theoretical bounds predicted by our analysis here. Another direction for future work is to extend the delay performance analysis to consider other factors, such as peer churn, geographic locality of peers, and correlations among individual chunk transmissions. More broadly, we are interested in extending our design and analysis of snowball-type algorithms to other forms of P2P systems with stringent delay requirements, such as Content Delivery Networks and P2P gaming systems .

REFERENCES

A. Bharambe, J. R. Douceur, J. R. Lorch, T. Moscibroda, J. Pang, S. Seshan, and X. Zhuang, “Donnybrook: Enabling large-scale, high-speed, peer-to-peer games,” in Proc. ACM SIGCOMM, 2008, pp. 389–400.
G. Bianchi, N. B. Melazzi, L. Bracciale, F. L. Piccolo, and S. Salsano, “Fundamental delay bounds in peer-to-peer chunk-based real-time streaming systems,” Tech. Rep., Feb. 2009 [Online]. Available: http://arxiv.org/PS_cache/arxiv/pdf/0902/0902.1394v1.pdf
J. Cao and K. Ramanan, “A Poisson limit for buffer overflow probabilities,” in Proc. IEEE INFOCOM, 2002, vol. 2, pp. 994–1003.
M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-bandwidth multicast in cooperative environments,” in Proc. ACM SOSP, 2003, pp. 298–313.
M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon, “I tube, you tube, everybody tubes: Analyzing the world’s largest user generated content video system,” in Proc. IMC, 2007, pp. 1–14.
Y. Chu, S. Rao, S. Seshan, and H. Zhang, “Enabling conferencing applications on the internet using an overlay multicast architecture,” in Proc. ACM SIGCOMM, 2001, pp. 55–67.
Y.-H. G. Chu, S. Rao, and H. Zhang, “A case for end system multicast,” in Proc. ACM SIGMETRICS, 2000, pp. 1–12.
X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross, “A measurement study of a large-scale P2P IPTV system,” IEEE Trans. Multimedia, vol. 9, no. 8, pp. 1672–1687, Dec. 2007.
X. Hei, Y. Liu, and K. Ross, “Inferring network-wide quality in P2P live streaming systems,” IEEE J. Sel. Areas Commun., vol. 25, no. 9, pp. 1640–1654, Dec. 2007, Special Issue on Advances in P2P Streaming.
J. Jannotti, D. K. Gifford, K. L. Johnson, M. F. Kaashoek, and J. W. O’Toole, Jr., “Overcast: Reliable multicasting with an overlay network,” in Proc. OSDI, 2000, pp. 197–212.
D. Kostic, A. Rodriguez, J. Albrecht, and A. Vahdat, “Bullet: High bandwidth data dissemination using an overlay mesh,” in Proc. ACM SOSP, 2003, pp. 282–297.
R. Kumar, Y. Liu, and K. Ross, “Stochastic fluid theory for P2P streaming systems,” in Proc. IEEE INFOCOM, 2007, pp. 919–927.
C. Liang, Y. Guo, and Y. Liu, “Hierarchically clustered P2P streaming system,” in Proc. IEEE GLOBECOM, 2007, pp. 236–241.
S. Liu, R. Zhang-Shen, W. Jiang, J. Rexford, and M. Chiang, “Performance bounds for peer-assisted live streaming,” in Proc. ACM SIGMETRICS, 2008, pp. 313–324.
Y. Liu, “On the minimum delay peer-to-peer video streaming: How real-time can it be?,” in Proc. ACM Multimedia, 2007 [Online]. Available: http://eeweb.poly.edu/faculty/yongliu/docs/mm07.pdf
Y. Liu, “Delay bounds of peer-to-peer video streaming,” Polytechnic Inst. NYU, Tech. Rep., Jun. 2009 [Online]. Available: http://eeweb.poly.edu/faculty/yongliu/docs/mm2009.pdf
N. Magharei and R. Rejaie, “Prime: Peer-to-peer receiver-driven mesh-based streaming,” in Proc. IEEE INFOCOM, 2007, pp. 1415–1423.
V. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, and A. Mohr, “Chainsaw: Eliminating trees from overlay multicast,” in Proc. 4th Int. Workshop Peer-to-Peer Syst., 2005, pp. 127–140.
“PPLive homepage,” PPLIVE [Online]. Available: http://www.pplive.com
“PPStream homepage,” PPSTREAM [Online]. Available: http://www.ppstream.com
D. Ren, Y. Li, and S. Chan, “On reducing mesh delay for peer-to-peer live streaming,” in Proc. IEEE INFOCOM, 2008, pp. 1058–1066.
T. Small, B. Liang, and B. Li, “Scaling laws and tradeoffs in peer-to-peer live multimedia streaming,” in Proc. 14th Annu. ACM Int. Conf. Multimedia, 2006, pp. 539–548.
“SopCast homepage,” SOPCAST [Online]. Available: http://www.sopcast.org
“BitTorrent Web site,” [Online]. Available: http://www.bittorrent.com/
V. Venkataraman, K. Yoshida, and J. P. Francis, “Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast,” in Proc. 14th IEEE ICNP, 2006, pp. 2–11.
M. Zhang, J.-G. Luo, L. Zhao, and S.-Q. Yang, “A peer-to-peer network for live media streaming using a push-pull approach,” in Proc. ACM Multimedia, 2005, pp. 287–290.
M. Zhang, L. Zhao, J. Tang, and L. Y. S. Yang, “A peer-to-peer network for live media streaming—Using a push–pull approach,” in Proc. ACM Multimedia, 2005, pp. 287–290.
X. Zhang, J. Liu, B. Li, and T.-S. P. Yum, “CoolStreaming/DONet: A data-driven overlay network for live media streaming,” in Proc. IEEE INFOCOM, 2005, vol. 3, pp. 2102–2111.

Yong Liu (M’02) received the Bachelor’s and Master’s degrees in automatic control from the University of Science and Technology of China, Hefei, China, in 1994 and July 1997, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Massachusetts, Amherst, in May 2002. He has been an Assistant Professor with the Electrical and Computer Engineering Department, Polytechnic Institute of NYU, Brooklyn, NY, since March 2005. His general research interests lie in modeling, design, and analysis of communication networks. His current research directions include robust network routing, peer-to-peer IPTV systems, overlay networks, and network measurement. Dr. Liu is a Member of the Association for Computing Machinery (ACM). He is the winner of the IEEE INFOCOM Best Paper Award in 2009 and the IEEE Communications Society Best Paper Award in Multimedia Communications in 2008.