Enabling VCR Functionalities in Streaming Media
          (Intelligent Interactive Streaming I2Stream)
Alexis M. Tourapis, Member, IEEE, Feng Wu, Member, IEEE, and Shipeng Li, Member, IEEE

A. M. Tourapis was with Microsoft Research Asia, Beijing 100080 China. He is now with Thomson - Corporate Research, 2 Independence Way, Princeton, NJ 08540 USA (phone: 609-987-7329; fax: 609-987-7299; e-mail: alexismt@ieee.org).
F. Wu is with Microsoft Research Asia, Beijing 100080 China (e-mail: fengwu@microsoft.com).
S. Li is with Microsoft Research Asia, Beijing 100080 China (e-mail: spli@microsoft.com).

Abstract: In this paper we present new architectures and methods that can be used to enable Video Cassette Recorder (VCR) functionalities in digital streaming media. Our methods cover both scalable and non-scalable media, and could considerably enhance the user's experience while at the same time reducing the decoding complexity and the network bandwidth required for these applications.

Index Terms: Digital video, video coding, digital video cassette recording (VCR), streaming video, interactive video, I2stream, scalable video

I. INTRODUCTION

Multimedia streaming applications have in recent years become a very important part of our lives, mainly due to the emergence of new highly efficient video coding standards (MPEG-4 [1] and H.264 [2]), architectures (Microsoft's Windows Media™ [3] and RealNetworks' Helix™ platform [4]-[5]), and digital set-top box devices such as ReplayTV [6] and TiVo [7]. These technologies are currently set to displace older analog-based devices and systems such as analog TV and Video Cassette Recorders (VCR). Enabling interactive and VCR-like functionalities, such as fast-forward (FW), rewind (RW), and segment repetition, within these digital systems nevertheless remains a difficult task given the nature of coded video. For analog VCR devices, by contrast, the only constraint concerned random location/segment seeking, mainly due to the use of tape storage. Unlike analog video, where each picture is completely independent and can be viewed easily at any time, digital video (DV) is constrained by a large number of interdependencies between different sections, i.e. pictures or Groups of Pictures (GOP). More specifically, a GOP may consist of Intraframe (I), Interframe (P), and Bi-directional (B) pictures. Of these, I pictures are encoded completely independently of other picture types, while P and B pictures use motion compensated methods and are predicted from previously encoded pictures to achieve much higher coding efficiency.

The decoding dependency of P and B pictures immediately implies that, in order to decode and display a specific segment or picture within a video stream, several other pictures might first have to be decoded. This requirement can be very expensive for the decoder and, in the case of streaming applications, also for the network bandwidth, since more data need to be decoded or transmitted at a given instant than usual. If the capabilities of the decoding device (e.g. PocketPC) or the network (e.g. Internet) are limited, then such functionalities become, if not impossible, very difficult to achieve within the existing architectures. To partly solve this problem, most existing systems consider only I pictures when a VCR functionality is requested. Basically, in order to achieve a specific FW or RW ratio, only the I pictures that most closely satisfy the given ratio relative to the current position are transmitted and decoded. Evidently, the functionality of this method is highly limited by the GOP structure. Furthermore, since I pictures usually require a considerably higher bitrate than other picture types, it is very likely that this solution is not adequate in terms of network bandwidth. Recent standards also tend to prefer encoding I pictures only when necessary (e.g. on scene changes or after relatively long intervals), since higher encoding performance can be achieved by P and B pictures.

In [8] Chen et al. proposed, in order to enhance the RW and Random Access functions for a video client, converting already received pictures into I pictures and storing them on a local hard disk. This would enable the client to access and decode any picture within the video at any time, thus reducing its computational cost. Unfortunately this kind of buffering also assumes that considerable storage space exists on the decoder, and it might not be desirable for a client to be able to store the video after it was received (e.g. for copyright reasons). Other methods, instead of using I pictures, considered skipping entire segments/GOPs [9] in order to achieve certain VCR functionalities, but their quality and performance were rather limited. In [10] an alternative method was proposed according to which an additional version of the same video is encoded for FW retrieval at a higher frame rate. During normal playback, the original video is transmitted and decoded, while if FW mode is requested, data from the FW video are first GOP synchronized (using key I pictures) and then transmitted. In [11] an extension was proposed where multiple FW and RW videos were used, which enabled support for multiple FW and RW ratios. This method also suggested sub-sampling the FW and RW videos, thus reducing storage and transmission cost. Nevertheless, these methods were all restricted by the GOP synchronization process, while none of them appeared to consider network
bandwidth variations.

New technologies have recently been proposed that can partly solve some of the previously discussed issues. More specifically, in [12]-[14] the concepts of Switching P (SP) and Switching I (SI) pictures were introduced, which allowed the use of a more flexible GOP structure within a stream and thus higher coding efficiency compared to fixed GOP structures. VCR functionalities could be achieved through the transmission of SI pictures, which allowed easy access at any point of the stream without significantly affecting coding performance. In [15]-[16] a different method was proposed that employed two streams of opposite direction with specific GOP structures. This enabled not only RW functionality, but also considerably reduced the decoder complexity and network bandwidth through efficient switching between the two streams. Due to its nature, we will call this scheme the Bwd/Fwd scheme throughout this paper.

In this paper a new architecture is presented that can further reduce the associated costs in decoder complexity and network bandwidth, and essentially enhance VCR functionalities within DV applications. Similar to the Bwd/Fwd scheme, this is achieved by the introduction of an additional secondary stream, not necessarily of opposite direction, named the Intelligent Interactive Stream (I2Stream). In our architecture, the original stream does not need to comply with any specific GOP structure, and thus can be coded with the best coding performance in mind, exploiting all possible interdependencies between video segments. The I2Stream, on the other hand, is coded by exploiting considerably fewer interdependencies, thus achieving the required functionalities at lower decoding and bandwidth cost. Furthermore, in our architecture both the original and the I2Stream can be used for normal playback, which can be quite useful for other applications, such as error recovery.

In Section II we first present in more detail the concept of SI pictures and how the Bwd/Fwd scheme could improve VCR functionalities. In Section III the I2Stream architecture is presented, while an extension of the I2Stream within scalable streams is discussed in Section IV.

II. SWITCHING PICTURES AND BI-DIRECTIONAL STREAM COMBINING

A. Switching Intra Pictures

Switching Intra (SI) pictures were introduced in [13] for enabling random access, splicing and error resiliency/recovery within a sequence. Such pictures are very similar in structure to I pictures, i.e., they apply spatial transform and quantization with [2] or without [1] spatial prediction of a block from its neighboring pixels, whereas the blocks are reconstructed as in SP pictures [12],[14], i.e., by applying the transform and quantization steps to the predicted block from the intra prediction.

In order to support VCR functionalities using such a method, a secondary switching stream comprising only SI pictures, as can be seen in Figure 1, is required. If for every picture within the original stream there exists a corresponding SI picture that is identical in reconstruction, this would allow us to access any picture at any time by transmitting just the SI picture information, without having to transmit any further information about other pictures within the sequence. Obviously fewer SI pictures could be used, i.e. at given intervals, while other pictures (e.g. P2, P4, P6, and P8 in Figure 1) could be accessed by transmitting and decoding, apart from their own encoded information, their corresponding SI picture.

Figure 1: Switching through SI pictures. (Original bitstream, pictures 1-10: I1 P2 SP3 P4 SP5 P6 SP7 P8 I9 SP10. Switching bitstream: SI3, SI5, SI7, SI10.)

Unfortunately, although SI pictures considerably reduce decoding complexity, their size is similar to, if not slightly larger than, that of I pictures, so they would require considerable network bandwidth for transmission that may not be available. An additional problem of such a method is that, to enable SI reconstruction, the original bitstream needs to be constrained (i.e. the switching points should be coded with the less efficient SP pictures instead of P pictures), which would imply a relatively small loss in coding efficiency for the original bitstream. SI pictures themselves would also require significant file storage, which might not be desirable.

Figure 2: Average bitrate for transmitting a 3Mbps sequence (GOP=14) over the network using a single forward stream versus the Bwd/Fwd scheme with respect to different speed-up factors for Fast Forward mode. (Plot of average bitrate in Mbps against speed-up factor; series: Single, Bwd/Fwd.)

B. Bidirectional (Bwd/Fwd) Stream Combining

An alternative method using an additional backward coded stream was introduced in [15]-[16]. In this scheme the I pictures of the two streams were placed alternately at equal distances, thus enabling easy random access at key intervals (Figure 3a). Furthermore, by introducing a least cost method based on the distance of each requested picture from the two corresponding I pictures of each bitstream, this scheme considerably reduced bandwidth wastage versus a single bitstream approach (Figure 2). The method also enabled transparent RW function support of equal complexity. A drift compensated approach was also introduced (Figure 3b), which avoided any undesirable errors from the bitstream switching process by also transmitting an additional residual picture that allowed perfect switching between the two streams, essentially similar to SP and SI pictures. Even though
such a method is quite attractive, it can be observed from Figure 2 that further improvement is necessary in terms of both bandwidth and decoding complexity. In particular, transmitting a 3Mbps bitstream with a speed-up factor of more than 4 would require approximately 3 times the bandwidth to be available. One other, relatively minor, drawback of such a method is that the entire sequence needs to be already available before the backward stream can be created. A fixed GOP size is usually also necessary for both backward and forward streams, thus also affecting coding efficiency.

Figure 3: Bwd/Fwd Stream without (a) and with (b) Drift Compensation. (Pictures 1-10. Forward bitstream: I P P P P P P P I P. Backward bitstream: P P P P I P P P P P. In (b), an FB and a BF drift compensation stream carry one residual picture, RFB or RBF, per position.)

III. INTELLIGENT INTERACTIVE STREAMING (I2STREAM)

In this paper we introduce a new scheme that, similar to the Bwd/Fwd scheme, also requires an additional stream, named the I2Stream. The I2Stream, due to its format, can solve most of the problems discussed previously, and reduce the decoding complexity and associated network bandwidth of VCR applications even further.

Unlike the Bwd/Fwd and SI picture schemes, the I2Stream relies on a new GOP structure in which not all P pictures can be used as references for other future or past pictures, so that some of them could even be discarded immediately after decoding (Figure 4a). Even though this concept of disposable P pictures (or Pd pictures) implies some coding efficiency loss for the I2Stream (around 30-40% if 3 disposable pictures are used) compared to the original stream, which remains unaffected, it allows increased flexibility in terms of VCR capabilities. In particular, if there is a request for a specific VCR function, the server or decoder can switch, if needed, to the I2Stream, and it does not need to transmit or decode any of the disposable pictures that are not needed for display. This can considerably decrease the network bandwidth and decoder complexity requirements. Furthermore, similar to the Bwd/Fwd scheme, data from the original stream could also be used for accessing specific pictures to improve performance, which could be decided based on a given cost measure (e.g. number of decoded pictures and/or bitrate). This process obviously requires that the positions of each picture within the streams are accessible, e.g. with the use of a metadata file, which would also allow us to calculate whether the switching is necessary. Drift compensation could also be used to remove any error propagation due to switching. Drift compensation could be performed from the I2Stream to the original but also vice versa (Figure 4b). This could also be partly avoided by using a fixed GOP structure and allowing switching only at key intervals, though with some loss in coding efficiency of the original. In general, however, the two streams can have relatively distinct GOP structures, and could be optimized separately from each other. It is of course also possible to use the same alternate I picture and bidirectional coding structure of the Bwd/Fwd scheme to improve random access and the RW function (Figure 4b). B pictures (Figure 5) could also be used within the I2Stream, and could either completely replace Pd pictures or be combined with them within the stream. In this case we may allow or disallow B pictures to refer to Pd pictures, depending on whether we would prefer higher coding efficiency within the I2Stream or improved VCR functionality performance. In general, although B pictures can enhance coding efficiency, they also tend to increase the dependencies between pictures, and thus probably also limit the required VCR capabilities.

Figure 4: The I2Stream scheme with (a) single and (b) dual compensation switching. I2Stream can also use backward coding (b), to enhance RW. (Pictures 1-10. Original bitstream: I P P P P P P P I P. Drift compensation stream: one residual picture R per position. I2Stream bitstream in (a): I Pd Pd Pd P Pd Pd Pd I Pd.)

By taking into consideration that, if a VCR functionality such
as FW is used, a user expects to browse through a relatively large number of pictures without caring as much about their actual quality as when viewing the original bitstream, higher efficiency can be achieved by coding all or certain pictures (e.g. I pictures) within the I2Stream at lower quality and/or resolution compared to the original stream. This would allow us to considerably improve the compression of the I2Stream, thus reducing the network bandwidth required. Obviously this would also imply that the drift compensation data needed when switching back to the original stream are likely to be considerably larger, but since these are transmitted only when the VCR function is terminated, they would not affect the performance of our scheme significantly.

Figure 5: I2Stream with B pictures. (Pictures 1-10. Original bitstream: I P P P P P P P P P. Drift compensation stream: one residual picture R per position. I2Stream bitstream: P B Pr B I B Pr B P B.)

An estimate of the average number of pictures that need to be transmitted and decoded for accessing one picture, compared with a single stream and with the Bwd/Fwd streams, can be seen in Figure 6, where we observe that for lower speed-up ratios (up to 6 times) considerably fewer pictures (10-40%) are necessary.

Figure 6: Average number of decoded pictures to display one picture for a given speed-up factor. (Plot of average number of decoded pictures against speed-up factor; series: Single, BW/FW, Proposed.)

As we have previously discussed, drift compensation can be used for removing any error propagation due to the switching process. Nevertheless, this process can also have an impact on the quality of the original stream due to restrictions imposed in the encoding process (e.g. using SP pictures instead of P). To avoid this, we may restrict switching to the original to fixed key positions only. The I2Stream can itself be used for normal playback purposes, and switching is performed only when the switching position is reached. It is also possible, under certain conditions, to allow switching to the original at a given point without caring about error propagation and without having to transmit any drift compensation data. This could be quite beneficial and would not impact visual quality significantly, especially if it is known that the error would be compensated soon enough (e.g. by an I picture). This information could easily be included in the metadata file, and the decision can be made on the server or on the decoding device. Furthermore, drift compensation does not always need to be perfect, and some small drift could be tolerated for some applications. Thus, for generalization purposes, the switching pictures need not always comply with the restrictions imposed by the switching process.

An additional benefit of our I2Stream architecture is that we may use the I2Stream for other applications that may require temporal or spatial scalability. Obviously the I2Stream can allow temporal scalability due to the disposable Pd and B pictures, while spatial scalability can be achieved if all I2Stream pictures are coded at a lower resolution. The I2Stream could then be used for playback of a relatively coarse version of the video stream at a considerably lower bitrate than the original. Apart from enabling scalable streaming applications without the need for more complicated designs such as Fine Granularity Scalability (FGS) coding [17]-[20], this design could potentially also be useful for several other applications, including Picture in Picture (PnP), video browsing, and video searching. For example, if PnP is requested by a user, two completely different video sequence streams could be transmitted at the same time, one for normal playback using the original stream and a second one for PnP playback. This second stream can be transmitted at a lower resolution and lower frame rate using the I2Stream. The user can at any time select which stream should be used for normal and which for PnP playback, since only a relatively simple switching process between the original and the corresponding I2Stream of each sequence is required.

Figure 7: Usage of the I2Stream in scalable streams. (Pictures 1-7. Layers, top to bottom: streaming enhancement layer, streaming base layer, SF pictures at each position, I2Stream enhancement layer, I2Stream switching layer, I2Stream base layer: I Pr Pr P Pr Pr I.)

IV. I2STREAM IN SCALABLE STREAMS

Even though we have already claimed that the I2Stream can provide some scalability, this capability can nevertheless be considered relatively limited. On the other hand, all of the above concepts could also be applied to scalable video architectures [17]-[20]. In this case the I2Stream can also be a scalable stream that consists of a base layer (BL), a switching layer (SL), and an enhancement layer (EL) (Figure 7).
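
As a rough illustration of how the three I2Stream layers might be combined with the original scalable stream, the following Python sketch maps a playback mode to the layers a server would transmit. It is only a sketch under stated assumptions: the `Mode` names, the layer labels, and the `drift_tolerated` and `enhance` flags are hypothetical and do not come from the paper, and partial transmission of the switching layer is not modeled.

```python
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()       # normal playback from the original stream
    VCR = auto()          # FW, RW or random access served from the I2Stream
    SWITCH_BACK = auto()  # leaving VCR mode and returning to the original


def layers_to_send(mode, drift_tolerated=False, enhance=True):
    """Return the (hypothetical) layer labels to transmit for a playback mode."""
    original = ["original.BL"] + (["original.EL"] if enhance else [])
    if mode is Mode.NORMAL:
        return original
    if mode is Mode.VCR:
        # Assumption: the I2Stream EL is optional during VCR operation.
        return ["i2stream.BL"] + (["i2stream.EL"] if enhance else [])
    # SWITCH_BACK: the SL carries the drift compensation data; it is skipped
    # only when some drift on the original stream is acceptable.
    switching = [] if drift_tolerated else ["i2stream.SL"]
    return switching + original


if __name__ == "__main__":
    for mode in Mode:
        print(mode.name, layers_to_send(mode))
```

Keeping the decision in one small function makes the trade-off explicit: the switching layer is the only extra data sent when returning to the original stream, and it can be dropped when the application accepts some drift.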
Structures using backward coding (Figure 8a) and B pictures (Figure 8b) could also be used. The BL essentially provides the same functionality as was discussed in previous sections. If a VCR function is chosen then, if necessary, the I2Stream BL is transmitted instead. If it is necessary to switch back to the original stream, then the SL can be transmitted to avoid error propagation on the original. The SL can be coded either as a single error image, or can itself use FGS methods. For example, in some cases where some drift is allowed, the switching could take place by sending only part of the switching layer, or even by omitting it completely.

For obvious reasons, and to satisfy the scalability requirements during playback, we observe from Figure 7 that switching is always performed at the two base layers. Nevertheless, the I2Stream EL, which is also FGS coded, can also be used to predict the original's enhancement layer, thus not limiting performance. Furthermore, this layer could be used as a mechanism for providing enhanced quality for any given picture, especially if Progressive FGS (PFGS) [19][20] is used on the original. In PFGS coding the ELs are not completely independent, which implies that considerable drift would occur if they were used independently to enhance the quality of a single picture for display. Finally, as in the non-scalable case, the I2Stream can also be used for error resiliency/recovery purposes.

Figure 8: I2Streams with (a) backward coded GOP and (b) B pictures. (Pictures 1-7. Layers as in Figure 7: streaming enhancement layer, streaming base layer, SF pictures at each position, I2Stream enhancement layer, I2Stream switching layer, I2Stream base layer. I2Stream base layer in (a): I Pr Pr P Pr Pr I; in (b): I B B P B B I.)

V. CONCLUSION

In this paper, we have introduced a new architecture, called I2Stream, which can enable VCR and interactive capabilities, such as Fast Forward, Rewind, Random Access, and Selective Picture Quality Enhancement, within multimedia content, while reducing the required network bandwidth and decoder complexity. Our method can be applied to both scalable and non-scalable video content and architectures, and it can also find use in error resiliency and recovery applications.

REFERENCES

[1] ISO/IEC Standard 14496-2:2001. Information technology – Coding of audio-visual objects – Part 2: Visual
[2] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, "Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC) – Joint Committee Draft", document JVT-D015d5, Jul. 2002.
[3] Microsoft Windows Media™, Microsoft Corporation Inc., http://www.microsoft.com/windows/windowsmedia/
[4] RealNetworks Helix™ platform, RealNetworks Inc., http://www.realnetworks.com/solutions/leadership/helix.html
[5] The Helix Community, https://www.helixcommunity.org/
[6] SonicBlue Inc. ReplayTV, http://www.replay.com/
[7] TiVo Inc, http://www.tivo.com/
[8] M.S. Chen and D.D. Kandlur, "Downloading and stream conversion: supporting interactive playout of videos in a client station," in Proc. of 2nd Int. Conf. on Multimedia Computing and Systems, pp. 73-80, May 1995.
[9] T.G. Kwon and S. Lee, "PRR: prime round-robin placement for implementing VCR operations," in Proc. of 1995 IEEE Int. Conf. on Systems, Man and Cybernetics - 'Intelligent Systems for the 21st Century', vol. 5, pp. 3920-3925, 22-25 Oct. 1995.
[10] S. Berson, S. Ghandeharizadeh, R.R. Muntz, and X. Ju, "Staggered Striping in Multimedia Information Systems," SIGMOD Conference 1994, pp. 79-90.
[11] D.B. Andersen, "A proposed method for creating VCR functions using MPEG streams," in Proc. of the Twelfth International Conference on Data Engineering, pp. 380-382, 26 Feb.-1 Mar. 1996.
[12] Ragip Kurceren and Marta Karczewicz, "Improved SP-frame Encoding", document VCEG-M73, ITU-T Video Coding Experts Group Meeting, Austin, TX, 02-04 April 2001.
[13] Ragip Kurceren and Marta Karczewicz, "New Macroblock Modes for SP-frames", document VCEG-O47, ITU-T Video Coding Experts Group Meeting, Pattaya, Dec. 2001.
[14] Xiaoyan Sun, Feng Wu, Shipeng Li, and Ragip Kurceren, "The improved JVT-B097 SP coding scheme", document JVT-C114, JVT Meeting, Fairfax, May 2002.
[15] Chia-Wen Lin, Jian Zhou, Jeongnam Youn, and Ming-Ting Sun, "MPEG video streaming with VCR functionality," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 415-425, Mar. 2001.
[16] Chia-Wen Lin, Jeongnam Youn, Jian Zhou, Ming-Ting Sun and Iraj Sodagar, "MPEG video streaming with VCR functionality," in Proc. IEEE Int. Symp. Multimedia Software Eng., pp. 146-153, Dec. 2000, Taipei, Taiwan.
[17] W. Li, "Fine granularity scalability in MPEG-4 for streaming video", in 2000 Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS 2000, vol. 1, pp. 299-302, Switzerland, May 2000.
[18] W. Li, "Overview of Fine Granularity Scalability in MPEG-4 Video Standard", IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, Mar. 2001.
[19] F. Wu, S. Li and Y.-Q. Zhang, "A framework for efficient progressive fine granularity scalable video coding", IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 332-344, Mar. 2001.
[20] Xiaoyan Sun, Feng Wu, Shipeng Li, Wen Gao, "The framework for seamless switching of scalable bitstreams", ISO/IEC JTC1/SC29/WG11 – Doc. No. W8214, Jeju Island, March 2002.