Scalable Video Streaming to Heterogeneous Receivers




This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE CCNC 2006 proceedings.
1-4244-0086-4/05/$20.00 ©2005 IEEE.

Scalable Video Streaming to Heterogeneous Receivers

Osama Lotfallah and Sethuraman Panchanathan
Dept. of Computer Science and Engineering
Arizona State University
Tempe, AZ 85281, USA
{osalatif, panch}

Abstract: Video streaming over networks, such as wireless and Internet, is a challenging problem, due to network heterogeneity and variability of the visual content. Various scalable video coding techniques have been proposed to facilitate adaptable transmission of video streams. In this paper, we compare the bandwidth utilization of such scalable schemes for video streaming to receivers of diverse capacities. The results reveal the superiority of scalable schemes that adopt bit plane coding, such as Fine Granularity Scalability (FGS) and Progressive-FGS (PFGS). In addition, we evaluate the reconstructed qualities of FGS and PFGS streams under heterogeneous receiver capacities and a variable transmission channel. The results suggest that, for low bit rate receivers/channels, FGS streams can be used instead of PFGS streams, thus reducing the computational overhead at the video receiver.

Keywords: Scalable coding, PFGS, FGS, Heterogeneous Video adaptation

I. INTRODUCTION

The time varying nature of existing wireless channels and packet switching networks makes adaptable video streaming techniques mandatory. Video adaptation approaches can be classified into single-rate adaptation, simulcast, and layered adaptation [5,7,12]. Single-rate adaptation is a direct extension of the traditional feedback-based unicast adaptation. For current heterogeneous Internet/wireless receivers, a single-rate adaptation protocol provides low satisfaction among viewers. Simulcast protocols are a trade-off between user satisfaction and bandwidth efficiency [1]. A simulcast protocol groups the receivers into a number of clusters/sessions, according to their bandwidth capacities, and sends an independent video stream to each cluster. Each receiver subscribes to the stream that best matches its capacity. Bandwidth redundancy (due to the simultaneous transmission of independent streams for each session) is a drawback of simulcast. Advances in video technology now allow the decomposition of video streams into a number of non-overlapping layers, using a method called scalable coding [2,4,5,10]. The most important features of the video content are coded in the base layer. Additional enhancement layers progressively refine the video reconstruction quality. Ideally, the base layer is coded with a bit rate that is low enough to be reliably transmitted to all receivers. Each receiver then subscribes to some number of enhancement layers, according to its bandwidth capacity [7,9]. Network support for such a scalable coding approach can be provided by prioritizing the transmission of various layers, using different forward error correction (FEC) codes, automatic repeat request (ARQ), or the Diffserv protocol.

Recently, a joint effort between ISO and ITU-T to develop advanced scalable coding schemes has proposed standardized levels for spatial resolution, temporal resolution, and signal-to-noise ratios (SNR) to promote compatible scalability when used by video servers for point-to-point and multicast streaming [10,11]. This effort is mainly focused on improving the compression efficiency of scalable codecs.

In this paper, we evaluate the efficiency of various scalability schemes, in terms of their effects on the transmission channel and the reconstructed qualities. In particular, we investigate the role of increasing the number of enhancement layers on the performance of the video receivers and the transmission. Diversity of receiver capacities is taken into account in our analysis. Both the FGS and the PFGS coding schemes support the transmission of pre-encoded video streams. Although the PFGS scheme uses the available bandwidth more efficiently, the bit rate of PFGS streams is optimized for the bit rate that represents the encoding rate. To the best of our knowledge, the important problem of adapting the enhancement layer encoding rate for streaming PFGS video has to date only been explored in [13], where a transcoding approach is developed that can lower the enhancement layer encoding rate, to make the transmission of the video stream more efficient for lower bit rates. In this paper, we present simulation results of reconstructed qualities of various PFGS streams (with different encoding rates) and compare them to the reconstructed qualities of corresponding FGS streams, for receivers of different bandwidths as well as for lossy video transmission.

The paper is organized as follows. Section 2 presents the basic scalable video coding schemes. Section 3 explains the utility curves of various scalability schemes. Section 4 presents a performance evaluation of bit plane coding schemes for lossy transmission. Finally, the discussion is presented in Section 5, followed by references.

II. OVERVIEW OF THE SCALABLE CODING SCHEMES

This section provides an overview of five scalable video coding schemes (see Fig. 1), which can be categorized into spatial, temporal, and SNR scalability with a variable number of
enhancement layers. These schemes include most of the scalability modes recently proposed in [10,11].

Spatial scalability is typically achieved by decomposing the video stream into two layers. The base layer codes a lower resolution version of the original video by using a reduced frame size. Each enhancement layer frame is a motion compensated version of (1) an up-sampled base layer frame, and/or (2) the previously reconstructed enhancement layer frame, as shown in Fig. 1(a). For I-frames, the enhancement layer frames are motion compensated only from the corresponding up-sampled base layer frame.

Spatio/temporal scalability is achieved by (1) coding each even-numbered frame as a motion compensated version of the corresponding lower-resolution base layer frame and the previous even-numbered frame, and (2) coding each odd-numbered frame as a motion compensated version of the corresponding base layer frame and/or the previous even-numbered enhancement layer frame, as shown in Fig. 1(b).

Temporal scalability can be achieved by adding a sufficient number of B-frames to conventional non-scalable codecs. These B-frames constitute the enhancement layer, which can be split further into a variable number of enhancement layers. This is possible because video codecs such as MPEG-4 prohibit using B-frames as references for motion compensation. For example, in order to have a base layer frame rate that is 1/3 of the original rate, two B-frames are coded between consecutive I- and P-frames, as shown in Fig. 1(c).

SNR scalability can be accomplished with FGS or PFGS schemes. For the FGS scheme, the base layer is coded using a non-scalable codec, such as H.26L or MPEG-4. The enhancement layer (which represents the residual discrete cosine transform (DCT) coefficients) is formed using a bit plane coding technique, where the most significant bit planes are coded first [5]. The motion compensation of an enhancement layer frame references the underlying base layer frame. Using bit plane coding, the enhancement layer of the FGS stream can be regulated to any bit rate, and can also be split into a number of variable enhancement layer rates. Therefore, FGS streams can be used for on-line rate adaptation of pre-encoded video sequences.

The PFGS scheme improves the bandwidth efficiency of the FGS codec by including motion compensation from reconstructed enhancement layer frames [4]. Fig. 1(d) illustrates a PFGS codec, where the base layer is coded using an H.26L encoder. There are two predictions used at the enhancement layer. One is a low quality prediction (reconstructed from the base layer), while the other is a high quality prediction (from the reconstructed enhancement layer). Macroblocks can be coded with reference to the high or the low quality prediction. In Fig. 1(d), two switches s1 and s2 are used to control the reference for motion compensation and reconstruction. The residue after prediction is transformed into DCT coefficients, followed by bit plane coding, similar to the FGS scheme. Only the first α(t) bits are used to reconstruct the enhancement reference for the next frame; we call α(t) the encoding rate of the enhancement layer. If part of the enhancement layer stream is lost, due to channel bandwidth fluctuations, the decoder reconstructs a degraded frame, compared to the frame used by the encoder for motion compensation. This inevitably results in a drifting error at the decoder, which is investigated in Section 4.

Figure 1. Various scalability schemes: (a) spatial, (b) spatio/temporal, (c) temporal, (d) encoder structure of H.26L-PFGS.

III. PERFORMANCE EVALUATION USING UTILITY CURVES

Video streaming to heterogeneous receivers requires protocols that maximize overall user satisfaction. Users are billed according to bandwidth usage, so it is intuitive to express user satisfaction using utility metrics that characterize the bandwidth utilization. In the following simulations, we evaluate the bandwidth efficiency of various scalable coding schemes with respect to the rate variability of the receivers. In addition, the effect of the number of enhancement layers is analyzed. For simplicity, receiver capacities are assumed to be uniformly distributed between R_min and R_max Kbps. Four video sequences are used in this study. For the news and foreman video sequences, R_min is 256 Kbps, while for the coastguard and stefan video sequences, R_min is 512 Kbps. The video sequences are composed of 300 frames in CIF (352×288) format at 30 fps. The enhancement layer rates used in these simulations are evenly distributed between R_min and R_max. The transmission is prioritized by allowing the receivers to subscribe to the first i enhancement layers that best match the receiver capacity. We use the utility metric presented in [3] to evaluate the bandwidth efficiency of the various scalable coding schemes. We denote the average utility as U, the expected bandwidth of a receiver as c_j, and the aggregate bit rate of the base layer and the first i enhancement layers as r_i (r_i ≤ r_{i+1}). The following equations represent such a utility metric:

$$U = \frac{1}{M} \sum_{j=1}^{M} \frac{\Gamma(c_j)}{c_j}, \qquad \Gamma(c_j) = \max\{\, r_i : r_i < c_j,\ i = 0, 1, \ldots, N \,\} \tag{1}$$

where M refers to the number of receivers, and N refers to the number of enhancement layers. In order to guarantee statistical confidence in our simulations, M is assigned a large value. Since this utility metric is calculated using bit rate ratios, the FGS and PFGS schemes show the same performance characteristics, due to the ability of both schemes to fine tune the enhancement layer at any bit rate. Spatial scalability streams show a very low utility metric due to their limited number of layers (only two layers), and therefore are excluded from the following analysis. The utility metric can be used to model the video traffic of the scalability scheme, and to study the impact of resource allocation protocols for next generation wireless channels. We should note that more advanced utility metrics (which take into account reconstructed visual qualities) can be found in [8].

Fig. 2 shows the relationship between the rate variability of receivers (which is represented by R_max) and the average utility metric. Simulations were run on four video sequences, but due to space limitation, we show only the results of the news and foreman video sequences, which are typical. For the news video sequence, N equals 3, while for the foreman video sequence, N equals 2. The utility curves of both FGS and PFGS are monotonically decreasing with R_max. This is because the variability of receiver capacities grows while the number of enhancement layers stays fixed. For temporal scalability, the utility of the news video sequence also follows a monotonically decreasing function. However, because the foreman video sequence contains moderate video activity, its utility reaches a maximum at R_max = 2560 Kbps. The base layer rate of temporal scalability, as well as of spatio/temporal scalability, exceeded R_min, which results in a service denial for certain receivers. In order to provide more insight into the performance of temporal scalability, the ratios of denied receivers to the total number of video receivers for the 4 video sequences are shown in Table I. The service denials depend on the visual activity, as well as on the distribution of receiver capacities (represented by R_max).

Fig. 3 shows the impact of the number of enhancement layers (N) on the average utility. Most of the video scalability schemes that were simulated show a monotonically increasing utility as a function of N, which is consistent with the theorem presented in [7]. The utility curves of Fig. 2 and Fig. 3 demonstrate the superiority of bit plane coding schemes (i.e. FGS and PFGS). Guided by this insight, their reconstructed qualities over time varying channels (such as wireless) and various receiver capacities were then further investigated.

TABLE I. RATIOS OF SERVICE DENIAL DUE TO TEMPORAL SCALABILITY

R_max (Kbps)   News     Foreman   Coastguard   Stefan
1024           0.1168   0.5014    1.0          1.0
1536           0.0706   0.3009    0.5213       0.8516
2048           0.0521   0.2165    0.3469       0.5679
3072           0.0331   0.1362    0.2076       0.3357
4096           0.024    0.1024    0.1505       0.2456

Figure 2. The average utility (U) as a function of the last enhancement layer rate (R_max). Horizontal axis: R_max, 1024 to 4096 Kbps. Vertical axis: utility, 0 to 0.9. Curves: temp, spa/temp, and fgs/pfgs, each for News and Foreman.

Figure 3. The average utility (U) as a function of the number of enhancement layers (N). Horizontal axis: number of enhancement layers, 1 to 5. Vertical axis: utility, 0 to 0.9. Curves: temp, spa/temp, and fgs/pfgs, each for News and Foreman.
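The utility metric of Eq. (1) can be illustrated with a short Python sketch. It is not the simulation code used in the paper: the function names are hypothetical, the base layer rate is assumed to equal R_min, the N enhancement layers are assumed to be evenly spaced up to R_max, a receiver that cannot fit even the base layer is assumed to contribute zero utility (a service denial), and M = 100000 with a fixed seed is an arbitrary choice.

```python
import random


def layer_rates(r_min, r_max, n_layers):
    """Aggregate rates r_0..r_N in Kbps.

    Assumption: the base layer is coded at r_min and the N enhancement
    layers are evenly spaced between r_min and r_max.
    """
    if n_layers == 0:
        return [float(r_min)]
    step = (r_max - r_min) / n_layers
    return [r_min + i * step for i in range(n_layers + 1)]


def gamma(capacity, rates):
    """Largest aggregate rate strictly below the capacity, as in Eq. (1).

    Assumption: returns 0.0 when no layer fits (service denial).
    """
    feasible = [r for r in rates if r < capacity]
    return max(feasible) if feasible else 0.0


def average_utility(capacities, rates):
    """Average utility U over all receivers, as in Eq. (1)."""
    return sum(gamma(c, rates) / c for c in capacities) / len(capacities)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    r_min, r_max, m = 256, 2048, 100_000
    # Receiver capacities uniformly distributed between R_min and R_max.
    capacities = [rng.uniform(r_min, r_max) for _ in range(m)]
    for n in range(1, 6):
        u = average_utility(capacities, layer_rates(r_min, r_max, n))
        print(f"N={n}: U={u:.4f}")
```

Because the metric only compares bit rates, the same functions apply to any layered scheme once its aggregate rates r_i are known.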
IV. PERFORMANCE COMPARISON OF BIT PLANE CODING SCHEMES

Scalable schemes that split enhancement layers using bit plane coding provide the video server with an on-line ability to regulate transmission rates of pre-encoded sequences. Although PFGS schemes exploit the network bandwidth more efficiently, a drifting error occurs if the enhancement layer rate r_i is less than the encoding rate α, which is referred to as α(t) in Fig. 1(d).

We have conducted a number of simulations to (1) analyze the effect of the lossy nature of the wireless channel on the reconstructed qualities of PFGS streams when streamed to receivers of various capacities, and (2) compare them to the reconstructed quality of FGS streams. Rate regulation was achieved by dropping the same number of bits from each frame, starting with the least significant bits. Due to space limitation, simulation results for only the news and foreman video sequences are shown in Fig. 4. To improve the error resilience of the compressed streams, the coded video sequences contain an I-frame every 25 frames, and the rest of the video frames are P-frames. The base layer rates of FGS streams and PFGS streams were fixed at a low bit rate. Hence, the results show bit rates of the enhancement layer. These results represent the rate-distortion (R-D) curves of FGS and PFGS streams with different encoding rates α. The reconstructed qualities are measured using the average peak-SNR (PSNR) with respect to the original video frames.

We denote the average reconstructed quality of PFGS streams due to receiving an enhancement layer of bit rate r as Q(r,α), where the video stream is coded at encoding rate α. In the case of α_2 > α_1 > r, we observe the following relationship:

$$Q(r, \alpha_2) < Q(r, \alpha_1) \tag{2}$$

We also observe that the quality reduction of PFGS streams due to regulating the bit rate from α to α-∆ is more significant than that of any other bit rate regulation, which can be expressed as:

$$Q(\alpha, \alpha) - Q(\alpha - \Delta, \alpha) \ge Q(\alpha - m\Delta, \alpha) - Q(\alpha - (m+1)\Delta, \alpha), \qquad \forall m = 0, 1, \ldots, (\alpha/\Delta) - 1 \tag{3}$$

The visual activity and the scene complexity play a role in the reconstructed quality of PFGS streams. For example, Q(r,α) curves of low activity scenes (such as the news video sequence) exhibit more concavity, compared to higher activity scenes (such as the foreman video sequence). Hence, the quality reduction due to regulating the bit rate from α to α-∆ is more significant for low motion/complexity scenes, which can be expressed as:

$$Q_{LM}(\alpha, \alpha) - Q_{LM}(\alpha - \Delta, \alpha) \ge Q_{HM}(\alpha, \alpha) - Q_{HM}(\alpha - \Delta, \alpha) \tag{4}$$

where LM (HM) refers to low motion (high motion) scenes.

The R-D curves of FGS video streams are also shown in Fig. 4. For low enhancement layer rates (< 512 Kbps for news and < 1024 Kbps for foreman) the reconstructed qualities of FGS streams are very comparable to those of PFGS streams. This happens due to the large residual error of PFGS streams that is accumulated at this low enhancement layer rate. Therefore, it is better for the video server to switch the transmission to the FGS streams at this low enhancement layer rate, because this avoids the computational overhead (at the receivers) of decoding the motion information of PFGS streams. This reduction in computational complexity reduces the power consumption of wireless handheld devices.

Figure 4. R-D curves of the enhancement layer of FGS and PFGS streams with various encoding rates α. Horizontal axis: bit rate, 0 to 2048 Kbps. Vertical axis: PSNR (dB). (a) News: curves for α = 0.5, 1, 1.5, and 2 Mbps, and FGS. (b) Foreman: curves for α = 1, 2, 3, and 4 Mbps, and FGS.

V. DISCUSSION

In this paper, we present early stage work that investigates various scalable schemes that can be employed for adaptable video streaming. User satisfaction is estimated using utility metrics, as well as reconstructed visual qualities. Spatial scalability schemes are not suitable for many wireless handheld devices, which support a small fixed-size display screen. In addition, temporal scalability schemes require a significant bandwidth in order to guarantee the minimum base quality. Compared to other scalability schemes, FGS and PFGS schemes achieve better utilization of the receiver bandwidths. PFGS streams are optimized for an encoding rate that is used for motion estimation of enhancement layer frames at the encoder. The reconstructed qualities of PFGS streams can be severely affected for receivers/channels of low bit rates, in which case the FGS streams can be used with minimal overhead. The design of video servers that can support a switching method between FGS and PFGS streams for various channel bit rates and various visual contents is a potential topic for future research.

ACKNOWLEDGMENT

We are grateful to Dr. Feng Wu from Microsoft China for providing the H.26L-PFGS codec.

REFERENCES

[1] S. Cheung, M. Ammar, and X. Li, “On the Use of Destination Set Grouping to Improve Fairness in Multicast Video Distribution,” Proc. IEEE INFOCOM 96, pp. 553-560.
[2] Wu Feng, Li Shipeng, and Ya-Qin Zhang, “A framework for efficient progressive fine granularity scalable video coding,” IEEE Trans. on Circuits and Systems for Video Technology (CSVT), vol. 11, no. 3, pp. 332-344, Mar. 2001.
[3] S. Gorinsky and H. Vin, “The utility of feedback in layered multicast congestion control,” Proc. ACM NOSSDAV 2001, pp. 93-102.
[4] Y. He, F. Wu, S. Li, Y. Zhong, and S. Yang, “H.26L-based fine granularity scalable video coding,” Proc. IEEE ISCAS 2002, vol. 4, pp. IV-548 - IV-551.
[5] W. Li, “Overview of the Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Trans. CSVT, vol. 11, no. 3, pp. 301-317, Mar. 2001.
[6] X. Li, M. Ammar, and S. Paul, “Video Multicast over the Internet,” IEEE Network Magazine, vol. 13, no. 2, pp. 46-60, Apr. 1999.
[7] J. Liu, Bo Li, and Ya-Qin Zhang, “An end-to-end adaptation protocol for layered video multicast using optimal rate allocation,” IEEE Trans. in Multimedia, vol. 6, no. 1, pp. 87-102, 2004.
[8] C.E. Luna, L.P. Kondi, and A.K. Katsaggelos, “Maximizing user utility in video streaming applications,” IEEE Trans. on CSVT, vol. 13, no. 2, pp. 141-148, Feb. 2003.
[9] S. McCanne, V. Jacobson, and M. Vetterli, “Receiver-Driven Layered Multicast,” Proc. ACM SIGCOMM Conf., ACM Press, 1996, pp. 117-130.
[10] J.-R. Ohm, “Advances in scalable video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 42-56, Jan. 2005.
[11] J. Reichel, H. Schwarz, and M. Wien, “Working Draft 1.0 of 14496-10:200x/Amd.1 Scalable Video Coding,” ISO/IEC JTC1/SC29/WG11 doc. N6901, Jan. 2005.
[12] L. Vicisano, L. Rizzo, and J. Crowcroft, “TCP-Like Congestion Control for Layered Multicast Data Transfer,” Proc. of IEEE INFOCOM 98, pp. 996-1003.
[13] J. Xu, Feng Wu, and S. Li, “Transcoding for progressive fine granularity scalable video coding,” Proc. IEEE ISCAS 2004, vol. 3, pp. III-765-8.
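As an illustration of the rate regulation described in Section IV (dropping the same number of bits from each frame, starting with the least significant bits) and of the PSNR used there as the quality measure, a minimal Python sketch follows. It is not the H.26L-PFGS codec: each frame's enhancement layer is modeled as a plain sequence of bits ordered from the most significant bit plane to the least significant one, 1 Kbps is assumed to mean 1000 bit/s, pixel values are assumed to be 8-bit, and all function names are hypothetical.

```python
import math


def drop_bits(frames, n_drop):
    """Drop the same number of trailing (least significant) bits per frame."""
    return [f[:max(0, len(f) - n_drop)] for f in frames]


def bits_to_drop(frames, target_kbps, fps=30.0):
    """Smallest per-frame drop count that fits the target enhancement rate.

    Assumption: 1 Kbps = 1000 bit/s, so the total budget for the sequence
    is target_kbps * 1000 * (number of frames / fps) bits.
    """
    budget = target_kbps * 1000.0 * len(frames) / fps
    lo, hi = 0, max((len(f) for f in frames), default=0)
    # Binary search: the number of kept bits never increases as the
    # per-frame drop count grows.
    while lo < hi:
        mid = (lo + hi) // 2
        kept = sum(max(0, len(f) - mid) for f in frames)
        if kept <= budget:
            hi = mid
        else:
            lo = mid + 1
    return lo


def psnr(original, reconstructed, peak=255):
    """Peak SNR in dB between two equally sized sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(original_frames, reconstructed_frames):
    """Average of the per-frame PSNR values over a sequence."""
    values = [psnr(o, r) for o, r in zip(original_frames, reconstructed_frames)]
    return sum(values) / len(values)
```

A caller would first pick the drop count with `bits_to_drop`, apply it with `drop_bits`, decode the truncated streams with whatever codec is in use, and then report `average_psnr` against the original frames.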