3DTV CONTENT CAPTURE, ENCODING AND TRANSMISSION

BUILDING THE TRANSPORT INFRASTRUCTURE FOR COMMERCIAL SERVICES

Daniel Minoli

A JOHN WILEY & SONS, INC., PUBLICATION
Copyright 2010 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Minoli, Daniel, 1952–
  3DTV content capture, encoding and transmission : building the transport infrastructure for commercial services / Daniel Minoli.
    p. cm.
  ISBN 978-0-470-64973-2 (cloth)
  1. Stereoscopic television. I. Title.
  TK6643.M56 2010
  621.388–dc22
  2010008432

Printed in Singapore
10 9 8 7 6 5 4 3 2 1
For Anna, Emma, Emile, Gabby, Gino, and Angela
CONTENTS

Preface
About the Author

1 Introduction
  1.1 Overview
  1.2 Background
    1.2.1 Adoption of 3DTV in the Marketplace
    1.2.2 Opportunities and Challenges for 3DTV
  1.3 Course of Investigation
  References
  Appendix A1: Some Recent Industry Events Related to 3DTV

2 3DV and 3DTV Principles
  2.1 Human Visual System
    2.1.1 Depth/Binocular Cues
    2.1.2 Accommodation
    2.1.3 Parallax
  2.2 3DV/3DTV Stereoscopic Principles
  2.3 Autostereographic Approaches
  References

3 3DTV/3DV Encoding Approaches
  3.1 3D Mastering Methods
    3.1.1 Frame Mastering for Conventional Stereo Video (CSV)
    3.1.2 Compression for Conventional Stereo Video (CSV)
  3.2 More Advanced Methods
    3.2.1 Video Plus Depth (V + D)
    3.2.2 Multi-View Video Plus Depth (MV + D)
    3.2.3 Layered Depth Video (LDV)
  3.3 Short-term Approach for Signal Representation and Compression
  3.4 Displays
  References
  Appendix A3: Color Encoding
  Appendix B3: Additional Details on Video Encoding Standards
    B3.1 Multiple-View Video Coding (MVC)
    B3.2 Scalable Video Coding (SVC)
    B3.3 Conclusion

4 3DTV/3DV Transmission Approaches and Satellite Delivery
  4.1 Overview of Basic Transport Approaches
  4.2 DVB
  4.3 DVB-H
  References
  Appendix A4: Brief Overview of MPEG Multiplexing and DVB Support
    A4.1 Packetized Elementary Stream (PES) Packets and Transport Stream (TS) Unit(s)
    A4.2 DVB (Digital Video Broadcasting)-Based Transport in Packet Networks
    A4.3 MPEG-4 and/or Other Data Support

5 3DTV/3DV IPTV Transmission Approaches
  5.1 IPTV Concepts
    5.1.1 Multicast Operation
    5.1.2 Backbone
    5.1.3 Access
  5.2 IPv6 Concepts
  References
  Appendix A5: IPv6 Basics
    A5.1 IPv6 Overview
    A5.2 Advocacy for IPv6 Deployment—Example

6 3DTV Standardization and Related Activities
  6.1 Moving Picture Experts Group (MPEG)
    6.1.1 Overview
    6.1.2 Completed Work
    6.1.3 New Initiatives
  6.2 MPEG Industry Forum (MPEGIF)
  6.3 Society of Motion Picture and Television Engineers (SMPTE) 3D Home Entertainment Task Force
  6.4 Rapporteur Group on 3DTV of ITU-R Study Group 6
  6.5 TM-3D-SM Group of Digital Video Broadcast (DVB)
  6.6 Consumer Electronics Association (CEA)
  6.7 HDMI Licensing, LLC
  6.8 Blu-ray Disc Association (BDA)
  6.9 Other Advocacy Entities
    6.9.1 3D@Home Consortium
    6.9.2 3D Consortium (3DC)
    6.9.3 European Information Society Technologies (IST) Project "Advanced Three-Dimensional Television System Technologies" (ATTEST)
    6.9.4 3D4YOU
    6.9.5 3DPHONE
  References

Glossary
Index
PREFACE

3 Dimensions TV (3DTV) became commercially available in the United States in 2010, and service in other countries was expected to follow soon thereafter. 3DTV is a subset of a larger discipline known as 3D Video (3DV). There are now many routine vendor announcements related to 3DTV/3DV, and there are also conferences wholly dedicated to the topic.

To highlight the commercial interest in this topic, note that ESPN announced in January 2010 that it planned to launch what would be the world's first 3D sports network with the 2010 World Cup soccer tournament in June 2010, followed by an estimated 85 live sports events during its first year of operation. DirecTV was planning to become the first company to offer satellite-based 3D, as announced at the 2010 International Consumer Electronics Show. Numerous manufacturers showed 3D displays at recent consumer electronics trade shows. Several standards bodies and industry consortia are now working to support commercialization of the service. An increasing inventory of content is now also becoming available in 3D.

This text offers an overview of the content capture, encoding, and transmission technologies that have emerged of late in support of 3DTV/3DV. It focuses on building the transport infrastructure for commercial services. The book is aimed at interested planners, researchers, and engineers who wish to get an overview of the topic. Stakeholders involved with the rollout of the infrastructure include video engineers, equipment manufacturers, standardization committees, broadcasters, satellite operators, Internet Service Providers, terrestrial telecommunications carriers, storage companies, content-development entities, design engineers, planners, college professors and students, and venture capitalists.

While there is a lot of academic interest in various aspects of the overall system, service providers and consumers ultimately tend to take a system-level view. While service providers do, to an extent, take a constructionist bottom-up view to deploy the technological building blocks (such as encoders, encapsulators, IRDs, and set-top boxes), 3DTV stakeholders need to consider the overall architectural, system-level view of what it will take to deploy an infrastructure that is able to reliably and cost-effectively deliver a commercial-grade bundle of multiple 3DTV content channels to paying customers with high expectations.

This text, therefore, takes such a system-level view. Fundamental visual concepts supporting stereographic perception of 3DTV are reviewed. 3DTV technology and digital video principles are discussed. Elements of an end-to-end 3DTV system are covered. Compression and transmission technologies are assessed for satellite and terrestrial (or hybrid) IPTV-based architectures. Standardization activities, critical to any sort of broad deployment, are identified.

The focus of this text is how to actually deploy the technology. There is a significant quantity of published material in the form of papers, reports, and technical specifications. This published material forms the basis for this synthesis, but the information is presented here in a self-contained, organized, tutorial fashion.
ABOUT THE AUTHOR

Mr. Minoli has done extensive work in video engineering, design, and implementation over the years. The results presented in this book are based on work done while at Bellcore/Telcordia, Stevens Institute of Technology, AT&T, and other engineering firms, starting in the early 1990s and continuing to the present. Some of his video work has been documented in the books he has authored, such as 3D Television (3DTV) Technology, Systems, and Deployment: Rolling Out the Infrastructure for Next-Generation Entertainment (Taylor & Francis, 2010); IP Multicast with Applications to IPTV and Mobile DVB-H (Wiley/IEEE Press, 2008); Video Dialtone Technology: Digital Video over ADSL, HFC, FTTC, and ATM (McGraw-Hill, 1995); Distributed Multimedia Through Broadband Communication Services (co-authored) (Artech House, 1994); Digital Video (4 chapters) in The Telecommunications Handbook, K. Terplan & P. Morreale, Editors (IEEE Press, 2000); and Distance Learning: Technology and Applications (Artech House, 1996).

Mr. Minoli has many years of technical hands-on and managerial experience in planning, designing, deploying, and operating IP/IPv6, telecom, wireless, and video networks, as well as data center systems and subsystems, for global best-in-class carriers and financial companies. He has worked at financial firms such as AIG, Prudential Securities, and Capital One Financial, and at service provider firms such as Network Analysis Corporation, Bell Telephone Laboratories, ITT, Bell Communications Research (now Telcordia), AT&T, Leading Edge Networks Inc., and SES Engineering, where he is Director of Terrestrial Systems Engineering (SES is the largest satellite services company in the world). At SES, in addition to other duties, Mr. Minoli has been responsible for the development and deployment of IPTV systems, terrestrial and mobile IP-based networking services, and other global networks. He also played a founding role in the launching of two companies through the high-tech incubator Leading Edge Networks Inc., which he ran in the early 2000s: Global Wireless Services, a provider of secure broadband hotspot mobile Internet and hotspot VoIP services; and InfoPort Communications Group, an optical and Gigabit Ethernet metropolitan carrier supporting data center/SAN/channel extension and cloud computing network access services. For several years he has been Session, Tutorial, and now overall Technical Program Chair for the IEEE ENTNET (Enterprise Networking) conference; ENTNET focuses on enterprise networking requirements for large financial firms and other corporate institutions.

Mr. Minoli has also written columns for ComputerWorld, NetworkWorld, and Network Computing (1985–2006). He has taught at New York University (Information Technology Institute), Rutgers University, and Stevens Institute of Technology (1984–2006). Also, he was a Technology Analyst At-Large for Gartner/DataPro (1985–2001); based on extensive hands-on work at financial firms and carriers, he tracked technologies and wrote CTO/CIO-level technical scans in the area of telephony and data systems, including topics on security, disaster recovery, network management, LANs, WANs (ATM and MPLS), wireless (LAN and public hotspot), VoIP, network design/economics, carrier networks (such as metro Ethernet and CWDM/DWDM), and e-commerce. Over the years he has advised venture capital firms on investments of $150M in a dozen high-tech companies. He has acted as Expert Witness in a (won) $11B lawsuit regarding a VoIP-based wireless air-to-ground communication system, and has been involved as a technical expert in a number of patent infringement lawsuits (including two lawsuits on digital imaging).
CHAPTER 1

Introduction

1.1 OVERVIEW

Recently, there has been a lot of interest on the part of technology suppliers, broadcasters, and content providers in bringing 3 Dimension Video (3DV) to the consumer. The year 2010 has been called the first year of 3D Television (3DTV) by some industry players. 3DTV is the delivery of 3DV on a TV screen, typically in the consumer's home. The initial step in this commercialization endeavor was to make 3D content available on Blu-ray Discs (BDs), for example with the release of Titanic, Terminator, and Avatar. However, well beyond that stand-alone home arrangement, there has been a concerted effort to develop end-to-end systems that bring 3DTV services to the consumer, supported by regular commercial programming that is delivered and made available on a routine, scheduled basis. Broadcasters such as, but not limited to, ESPN, DIRECTV, Discovery Communications, BSkyB, and British Channel 4 were planning to start 3D programming in 2010. LG, Samsung, Panasonic, Sony, JVC, Vizio, Sharp, and Mitsubishi, among others, were actively marketing high-quality TV display products at press time, with some, such as Samsung and Mitsubishi, already shipping 3D-ready flat-panel TVs as far back as 2008. Front Projection 3D systems for medium-sized audiences (5–25 people), for example for the "prosumer," have been available for longer; of course, movie theater systems have been around for years. The goal of the 3DTV industry is to replicate, to the degree possible, the experience achievable in a 3D movie theater, but in the home setting.

A commercial 3DTV system comprises the following functional elements: capture of 3D content, specifically moving scenes; encoding (representation) of content; content compression; content transport over satellite, cable, Internet Protocol Television (IPTV), or over-the-air channels¹; and content display. Figure 1.1 depicts a logical, functional view of an end-to-end 3DTV system.

[Figure 1.1 Basic 3DTV system—logical view. Pipeline: 3D scene; capture; representation; compression/coding; transmission; signal conversion; display; scene replica.]

Figure 1.2 graphically depicts a system architecture that may see early commercial introduction—this system is known as stereoscopic Conventional Stereo Video (CSV) or Stereoscopic 3D (S3D). Figures 1.3 and 1.4 show examples of 3D camera arrangements, while Fig. 1.5 illustrates a typical 3D display (this one using active glasses, also called eyewear). Finally, Fig. 1.6 depicts what we call a pictorialization of 3D TV screens, as may be included in vendor brochures.

[Figure 1.2 Basic 3DTV system—conventional stereo video. Steps: 1 - dual camera capture system; 2 - mastering of two (HD) frames; 3 - 3D processor combines two frames into single (HDTV) frames; 4 - digital encoder; 5 - transmission (satellite, IPTV, cable, over the air, Internet); 6 - decoding; 7 - displaying/viewing on a 3DTV display with 3DTV glasses.]

[Figure 1.3 Illustrative 2-camera rig for 3D capture. Source: www.inition.co.uk.]

[Figure 1.4 Illustrative single 3D camcorder with dual lenses. Source: Panasonic CES 2010 Press Kit.]

[Figure 1.5 Illustrative 3D home display. Source: Panasonic CES 2010 Press Kit.]

[Figure 1.6 Pictorialization of 3D home display. Source: LG CES 2010 Press Kit.]

This text offers an overview of the content capture, encoding, and transmission subelements, specifically the technologies, standards, and infrastructure required to support commercial real-time 3DTV/3DV services. It reviews the required standards and technologies that have emerged of late—or are just emerging—in support of such new services, with a focus on encoding and the build-out of the transport infrastructure. Stakeholders involved with the rollout of this infrastructure include consumer and system equipment manufacturers, broadcasters, satellite operators, terrestrial telecommunications carriers, Internet Service Providers (ISPs), storage companies, content-development entities, and standardization committees.

There is growing interest on the part of stakeholders to introduce 3DTV services, basically as a way to generate new revenues. There was major emphasis on 3DTV from manufacturers at various consumer shows taking place in the recent past. One in four consumers surveyed by the Consumer Electronics Association (CEA) in a press time study indicated that they plan to buy a 3D TV set within the next three years. The research firm DisplaySearch has forecast that the 3D display market will grow to $22 billion by 2018 (this represents an annual compound growth rate of about 50%²). When it comes to entertainment, especially the compelling type of entertainment that 3D has the opportunity of being, there may well be a reasonably high take rate, especially if the price point is right for the equipment and for the service.

Classical questions that are (and/or should be) asked by stakeholders include the following:

• Which competing 3D encoding and transmission technologies should an operator adopt?
• What technological advancements are expected in 3D, say by 2012 or 2015?
• Where do the greatest market opportunities exist in the 3D market?

These and similar questions are addressed in this text.

¹ Internet-based downloading and/or streaming is also a possibility for some applications or a subset of users.
² The company originally forecast a $1B industry for 2010, but recently lowered that forecast by about 50%.
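Step 3 of the system in Fig. 1.2—a 3D processor combining the left- and right-camera frames into a single (HDTV) frame—is, in the frame-compatible approach treated later in this book, often realized as side-by-side packing: each view is horizontally subsampled 2:1 and the two half-width views share one ordinary HD raster. The following is a minimal sketch of that idea (the function name is ours, purely illustrative; real mastering uses proper decimation filters rather than simple column dropping):

```python
def pack_side_by_side(left, right):
    """Pack two equally sized views (lists of pixel rows) into one
    frame-compatible frame: each view is horizontally subsampled 2:1
    (every other column kept), then the left half-view occupies the
    left half of the raster and the right half-view the right half."""
    assert len(left) == len(right), "views must have the same height"
    packed = []
    for row_l, row_r in zip(left, right):
        packed.append(row_l[::2] + row_r[::2])  # half of each view per row
    return packed

# Illustrative tiny frames: 2 rows x 8 columns per view
left  = [['L'] * 8 for _ in range(2)]
right = [['R'] * 8 for _ in range(2)]
frame = pack_side_by_side(left, right)
print(len(frame), len(frame[0]))  # 2 8 -> same raster size as one input view
```

The appeal of this packing, as the book discusses later, is that the combined frame travels through an unmodified 2D HD encoding and transmission chain; the cost is half the horizontal resolution per eye.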
1.2 BACKGROUND

This section provides an encapsulated assessment of the 3DTV industry landscape to give the reader a sense of what some of the issues are. It provides a press time snapshot of industry drivers that support the assertion just made: that there is a lot of activity in this arena at this time.

1.2.1 Adoption of 3DTV in the Marketplace

It should be noted that 3D film and 3DTV trials have a long history, as shown in Fig. 1.7 (based partially on Ref. 2).

[Figure 1.7 History of 3D in film and television. Timeline (not on a linear scale): stereoscopic 3D pictures, 1838 (Wheatstone), popular by 1844 in the US and Europe; 2D photography, 1839; 2D movies, 1867; 3D stereoscopic cinema, early 1900s; 2D TV, 1920 (Belin and Baird); stereoscopic 3D TV, 1920 (Baird); stereoscopic 3D cinema popular by the 1950s; first experimental stereoscopic 3DTV broadcast, 1953; first commercial 3DTV broadcast, 1980s; first experimental broadcast in Europe, 1982; digital broadcast of (typically stereoscopic) 3DTV in Japan, 1998, and Korea, 2002; 3D cinema becomes popular, with routine stereoscopic broadcast using anaglyph methods, especially for sports events; 3DTV over IP networks: video streaming experiments and demonstrations, assessment of streaming protocols (RTP over UDP, DCCP), research into multiview video streaming; development of a plethora of displays (ongoing); development of standards (ongoing); 2010 called "The Year of 3DTV" by some vendors.]

However, with the deployment of digital television (DTV) and High Definition Television (HDTV), the technology has finally progressed enough that regular commercial services can now be introduced.

We start by noting that there are two general commercial-grade display approaches for 3DTV: (i) stereoscopic TV, which requires special glasses to watch 3D movies, and (ii) autostereoscopic TV, which displays 3D images in such a manner that the user can enjoy the viewing experience without special accessories.³

Short-term commercial 3DTV deployment, and the focus of this book, is on stereoscopic 3D imaging and movie technology. The stereoscopic approach follows the cinematic model, is simpler to implement, can be deployed more quickly (including the use of relatively simpler displays), can produce the best results in the short term, and may be cheaper in the immediate future. However, the limitations are the requisite use of accessories (glasses), somewhat limited positions of view, and physiological and/or optical limitations, including possible eye strain. In summary, (i) glasses may be cumbersome and expensive (especially for a large family) and (ii) without the glasses, the 3D content is unusable.

Autostereoscopic 3DTV eliminates the use of any special accessories: it implies that the perception of 3D is in some manner automatic and does not require devices—either filter-based glasses or shutter-based glasses. Autostereoscopic displays use additional optical elements aligned on the surface of the screen to ensure that the observer sees different images with each eye. From a home screen hardware perspective, the autostereoscopic approach is more challenging, including the need to develop relatively more complex displays; also, more complex acquisition/coding algorithms may be needed to make optimal use of the technology. It follows that this approach is more complex to implement, will take longer to deploy, and may be more expensive in the immediate future.
However, this approach can produce the best results in the long term, including accessories-free viewing; multi-view operation allowing both movement and a different perspective at different viewing positions; and better physiological and/or optical response to 3D.

³ Autostereoscopic technology may also (in particular) be appropriate for mobile 3D phones, and there are several initiatives to explore these applications and this 3D phone-display technology.

Table 1.1 depicts a larger set of possible 3DTV (display) systems than what we identified above. The expectation is that 3DTV based on stereoscopy will experience earlier deployment compared with other technological alternatives. Hence, this text focuses principally on stereoscopy. Holography and integral imaging are relatively newer technologies in the 3DTV context compared to stereoscopy; holographic and/or integral imaging 3DTV may be feasible late in the decade. There are a number of techniques to allow each eye to view the separate pictures, as summarized in Table 1.2 (based partially on Ref. 3). All of these techniques work in some manner, but all have some shortcomings.

TABLE 1.1 Various 3D Display Approaches and Technologies

Stereoscopic 3D (S3D): A system where two photographs (or video streams), taken from slightly different angles, appear three-dimensional when viewed together; this technology is likely to see the earliest implementation, using specially designed equipment displays that support polarization.

Autostereoscopic: 3D displays that do not require glasses to see the stereoscopic image (using lenticular or parallax barrier technology). Whether stereoscopic or autostereoscopic, a 3D display (screen) needs to generate parallax that, in turn, creates a stereoscopic sense. Will find use in cell phone 3D displays in the near future.

Multi-viewpoint 3D system: A system that provides a sensation of depth and motion parallax based on the position and motion of the viewer; at the display side, new images are synthesized based on the actual position of the viewer.

Integral imaging (holoscopic imaging): A technique that provides autostereoscopic images with full parallax by using an array of microlenses to generate a collection of 2D elemental images; in the reconstruction/display subsystem, the set of elemental images is displayed in front of a far-end microlens array.

Holography: A technique for generating an image (hologram) that conveys a sense of depth but is not a stereogram in the usual sense of providing fixed binocular parallax information; holograms appear to float in space and they change perspective as one walks left or right; no special viewers or glasses are necessary (note, however, that holograms are monochromatic).

Volumetric systems: Systems that use geometrical principles of holography in conjunction with other volumetric display methods. Volumetric displays form the image by projection within a volume of space without the use of a laser light reference, but have limited resolution. They are primarily targeted, at least at press time, at the Industrial, Scientific, and Medical (ISM) community.

To highlight the commercial interest in 3DTV at press time, note that ESPN announced in January 2010 that it planned to launch what would be the world's first 3D sports network with the 2010 World Cup soccer tournament in June 2010, followed by an estimated 85 live sports events during its first year of operation. DIRECTV announced that it will start 3D programming in 2010. DIRECTV's new HD 3D channels will deliver movies, sports, and entertainment content from some of the world's most renowned 3D producers. DIRECTV is currently working with AEG/AEG Digital Media, CBS, Fox Sports/FSN, Golden Boy Promotions, HDNet, MTV, NBC Universal, and Turner Broadcasting System, Inc., to develop additional 3D programming that will debut in 2010–2011. At launch, the new DIRECTV HD 3D programming platform will offer a 24/7 3D pay-per-view channel focused on movies, documentaries, and other programming;
TABLE 1.2 Current Techniques to Allow Each Eye to View Distinct Pictures

With appliances (glasses):

Orthogonal polarization: Uses orthogonal (different) polarization planes, with matching viewer glasses, for the left and right eye pictures. Light from each picture is filtered such that only one plane for the light wave is available. This is easy to arrange in a cinema, but more difficult to arrange in a television display. Test television systems have been developed on the basis of this method, either using two projection devices projecting onto the same surface, or two displays orthogonally placed so that a combined image can be seen using a semisilvered mirror. In either case, these devices are "nonstandard" television receivers. Of the systems with glasses, this is considered the "best."

Colorimetric (anaglyph) arrangements: One approach is to use different colorimetric arrangements for each of the two pictures, coupled with glasses that filter appropriately. A second is a relatively new notch-filter color separation (anaglyph) technique that can be used in projection systems (advanced by Dolby)—described later in the chapter.

Time multiplexing of the display: Sometimes also called "interlaced stereo"; content is shown with consecutive left and right signals and shuttered glasses. This technology is applicable to 3DTV. This technique is still used in movie theaters today, such as IMAX, and is sometimes used in conjunction with polarization plane separation. In a Cathode Ray Tube (CRT) environment, a major shortcoming of interlaced stereo was image flicker, since each eye would see only 25 or 30 images per second, rather than 50 or 60. To overcome this, the display rate could be doubled to 100 or 120 Hz to allow flicker-free reception.

"Virtual reality" headset: Technique using immersion headgear/glasses, often used for video games.

Without appliances:

Lenticular: This technique arranges for each eye's view to be directed toward separate picture elements by lenses. This is done by fronting the screen with a ribbed (lenticular) surface.

Barrier: This technique arranges for the screen to be fronted with barrier slots that perform a similar function. In this system, two views (left and right), or more than two (multi-camera 3D), can be used. However, since each of the picture elements (stripes or points) has to be laid next to the others, the number of views impacts the resolution available. There is a trade-off between resolution and ease of viewing. Arrangements can be made with this type of system to track head or eye movements, and thus change the barrier position, giving the viewer more freedom of head movement.
a 24/7 3D DIRECTV on Demand channel; and a free 3D sampler demo channel featuring event programming such as sports, music, and other content. Comcast has announced that its VOD (Video-On-Demand) service is offering a number of movies in anaglyph 3D (as well as HD) form. Comcast customers can pick up 3D anaglyph glasses at Comcast payment centers and malls "while supplies last." (Anaglyph is a basic and inexpensive method of 3D transmission that relies on inexpensive colored glasses, but its drawback is relatively low quality.) Verizon's FiOS was expected to support 3DTV programming by late 2010. Sky TV in the United Kingdom was planning to start broadcasting programs in 3D in the fall of 2010 on a dedicated channel available to anyone who has the Sky HD package; there are currently 1.6 million customers who have a Sky HD set-top box. Sky TV has not announced what programs will be broadcast in 3D, but it is expected to broadcast live the main Sunday afternoon soccer game from the Premiership in 3D from the 2011 season, along with some arts documentaries and performances of ballet. Sky TV has already invested in installing special twin-lens 3D cameras at stadiums. (Appendix A1 includes a listing of events during the year prior to the publication of this text, to further document the activity in this arena.)

3DTV television displays could be purchased in the United States and United Kingdom as of the spring of 2010 for $1000–5000 initially, depending on technology and approach. Liquid Crystal Display (LCD) systems with active glasses generally tend to cost less. LG released its 3D model, a 47-in. LCD screen, expected to cost about $3000; with this system, viewers will need to wear polarized dark glasses to experience broadcasts in 3D. Samsung and Sony also announced they were bringing their own versions to market by the summer of 2010, along with 3D Blu-ray players, allowing consumers to enjoy 3D movies such as Avatar and Up in their own homes.
Samsung's and Sony's models use LED (Light-Emitting Diode) screens, which are considered to give a crisper picture and are, therefore, expected to retail for about $5000 or possibly more. While LG is adopting the use of inexpensive polarizing dark glasses, Sony and Samsung are using active shutter technology. This requires users to buy expensive dark glasses, which usually cost more than $50 and are heavier than the $2–3 plastic polarizing ones. Active shutter glasses alternately darken over one eye, and then the other, in synchronization with the refresh rate of the screen, using shutters built into the glasses (synchronized via infrared or Bluetooth connections). Panasonic Corporation has developed a full HD 3D home theater system consisting of a plasma full HD 3D TV, a 3D Blu-ray player, and active shutter 3D glasses. The 3D display was originally available in 50-in., 54-in., 58-in., and 65-in. class sizes. High-end systems are also being introduced; for example, Panasonic announced a 152-in. 4K × 2K (4096 × 2160 pixels)-definition full HD 3D plasma display. The display features a new Plasma Display Panel (PDP) that uses self-illuminating technology. Self-illuminating plasma panels offer excellent response to moving images with full motion picture resolution, making them suitable for rapid 3D image display (its illumination time is about one-fourth that of conventional full HD panels). Each display approach has advantages and disadvantages, as shown in Table 1.3.
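The active shutter scheme just described is an instance of the time multiplexing listed in Table 1.2: left- and right-eye frames are interleaved into a single stream at double the base rate, while the glasses blank each eye in step with the display. A toy sketch of the interleaving logic (illustrative only; the names are ours):

```python
def interleave_stereo(left_frames, right_frames):
    """Interleave left/right frame sequences for a time-multiplexed
    display: L0, R0, L1, R1, ... The panel runs at twice the base
    rate (e.g., 120 Hz for 60 images per second per eye), and the
    shutter glasses open the matching eye for each display slot."""
    stream = []
    for l, r in zip(left_frames, right_frames):
        stream.append(('L', l))  # left-eye slot: right shutter closed
        stream.append(('R', r))  # right-eye slot: left shutter closed
    return stream

stream = interleave_stereo(['L0', 'L1'], ['R0', 'R1'])
print(stream)  # [('L', 'L0'), ('R', 'R0'), ('L', 'L1'), ('R', 'R1')]
```

This also makes the 120/240 Hz requirement in Table 1.3 concrete: to keep 50 or 60 flicker-free images per second per eye, the panel must refresh at twice that rate across the interleaved stream.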
TABLE 1.3 Summary of Possible, Commercially Available TV Screen/System Choices for 3D

1. Projection-based FPTV (polarized display) with passive glasses.
   Advantages: big-screen 3D effect similar to the cinematic experience; excellent-to-good light intensity; choice of projectors/cost; inexpensive, lightweight passive 3D glasses.
   Disadvantages: needs a silver screen to retain polarization of light; the two projectors must be aligned such that they are stacked on top of each other; not totally décor-friendly.

2. Projection-based FPTV (unpolarized display) with active glasses.
   Advantages: option of using newer single DLP projectors that support a 120-Hz refresh rate (active-based system); no polarization-preserving screen needed.
   Disadvantages: more expensive glasses; need battery-powered LCD shutter glasses.

3. Projection-based RPTV (polarized display) with passive glasses.
   Advantages: integrated unit—easier to add to room décor; to present stereoscopic content, two images are projected superimposed onto the same screen through different polarizing filters (either linear or circular polarizing filters can be used); the viewer wears low-cost eyeglasses that contain a corresponding pair of different polarizing filters.
   Disadvantages: some light intensity loss at the display level; not of the "flat-panel-TV type" and cannot be hung on walls.

4. LCD 3DTV (polarized display) with passive glasses.
   Advantages: simple-to-use system, not requiring projection setup; to present stereoscopic content, two images are superimposed onto the same screen through interlacing techniques; the viewer wears low-cost eyeglasses that contain a pair of different polarizing filters.
   Disadvantages: some possible loss of resolution; some light intensity loss at the display level; relatively expensive ($3000–5000 in 2010).

5. 3D plasma/LCD TV (unpolarized display) with active glasses.
   Advantages: simple-to-use system not requiring projection setup; flat-screen TV type, elegant décor; delivers two images to the same screen pixels, but alternates them such that two different images alternate on the screen.
   Disadvantages: active shutter glasses can be expensive, particularly for a larger viewing group; requires TV sets able to accept and display images at 120/240 Hz; glasses need power; some light intensity loss at the viewer (glasses level); some loss of resolution; size limited to 60–80 in. at this time, although larger systems are being brought to market. LCDs are relatively cheaper than the alternatives: a 42-in. set based on LCD and shutter glasses was selling for about US$1000 and a 50-in. set for more than US$2000 (compare that with a 42-in. HD LCD TV, which costs about US$600–700); LED and/or plasma systems can be costly.

6. Autostereoscopic screen (lenticular or barrier).
   Advantages: no glasses needed.
   Disadvantages: very few suppliers in 2010; further out in time in terms of development and deployment; some key manufacturers have exited the business (for now); content production is more complex; displays have a "sweet spot" that requires viewers to be within this viewing zone.

Note: FPTV, Front Projection Television; DLP, Digital Light Processing; RPTV, Rear Projection Television.
Figure 1.8 3D Blu-ray disc logo.

It is to be expected that 3DTV for home use is likely to first see penetration via stored media delivery. For content source, proponents make the case that BD "is the ideal platform" for the initial penetration of 3D technology in the mainstream market because of the high quality of pictures and sound it offers film producers. Many products are being introduced by manufacturers: for example, at the 2010 Consumer Electronics Show (CES) International Trade Show, vendors introduced eight home theater product bundles (one with 3D capability), 14 new players (four with 3D capability), three portable players, and a number of software titles. In 2010 the Blu-ray Disc Association (BDA) launched a new 3D Blu-ray logo to help consumers quickly discern 3D-capable Blu-ray players from 2D-only versions (Fig. 1.8).

The BDA makes note of the strong adoption rate of the Blu-ray format. In 2009, the number of Blu-ray households increased by more than 75% over 2008 totals. After four years in the market, total Blu-ray playback devices (including both set-top players and PlayStation 3 consoles) numbered 17.6 million units, and 16.2 million US homes had one or more Blu-ray playback devices. By comparison, DVD playback devices (set-tops and PlayStation 2 consoles) reached 14.1 million units after four years, with 13.7 million US households having one or more playback devices. The strong performance of the BD format is due to a number of factors, including the rapid rate at which prices declined due to competitive pressures and the economy; the rapid adoption pace of HDTV sets, which has generated a US DTV household penetration rate exceeding 50%; and a superior picture and sound experience compared to standard definition and even other HD sources. Another factor in the successful adoption pace has been the willingness of movie studios to discount popular BD titles. Blu-ray software unit sales in 2009 reached 48 million, compared with 22.5 million in 2008, up by 113.4%. A number of movie classics were available at press time through leading retailers at sale prices as low as $10.

The BDA also announced (at the end of 2009) the finalization and release of the Blu-ray 3D specification. These BD specifications for 3D allow for full HD 1080p resolution to each eye. The specifications are display agnostic, meaning they apply equally to plasma, LCD, projector, and other display formats regardless of the 3D systems those devices use to present 3D to viewers. The specifications also allow the PlayStation 3 gaming console to play back 3D content. The specifications, which represent the work of the leading Hollywood studios and consumer electronics and computer manufacturers, will enable the home entertainment industry to bring the stereoscopic 3D experience into consumers' living rooms on BD, but will require consumers to acquire new players, HDTVs, and shutter glasses. The specifications allow studios (but do not require them) to package 3D Blu-ray titles with 2D versions of the same content on the same disc. The specifications also support playback of 2D discs in forthcoming 3D players and can enable 2D playback of Blu-ray 3D discs on the installed base of BD players. The Blu-ray 3D specification encodes 3D video using the Multiview Video Coding (MVC) codec, an extension to the ITU-T H.264 Advanced Video Coding (AVC) codec currently supported by all BD players. According to the BDA, MPEG-4 (Moving Picture Experts Group 4) MVC compresses both left- and right-eye views with a typical 50% overhead compared to equivalent 2D content, and can provide full 1080p resolution backward compatibility with current 2D BD players.

The broadcast commercial delivery of 3DTV on a large scale—whether over satellite/Direct-To-Home (DTH), over the air, over cable systems, or via IPTV—may take some number of years because of the relatively large-scale infrastructure that has to be put in place by the service providers and the limited availability of 3D-ready TV sets in the home (implying a small subscriber, and so small revenue, base). A handful of providers were active at press time, as described earlier, but general deployment by multiple providers serving a geographic market will come at a future time. Delivery of downloadable 3DTV files over the Internet may occur at any point in the immediate future, but the provision of a broadcast-quality service over the Internet is not likely for the foreseeable future.
At the transport level, 3DTV will require more bandwidth than regular programming, perhaps even twice the bandwidth in some implementations (e.g., simulcasting—the transmission of two fully independent channels(4)); some newer schemes such as "video + depth" may require only 25% more bandwidth compared to 2D, but these schemes are not the leading candidate technologies for actual deployment in the next 2–3 years. Other, interleaving approaches use the same bandwidth as a channel now in use, but at a compromise in resolution. Therefore, in principle, if HDTV programming is broadcast at high quality, say, 12–15 Mbps using MPEG-4 encoding, 3DTV using the simplest method of two independent streams will require 24–30 Mbps.(5) This data rate does not fit a standard over-the-air digital TV (DTV) channel of 19.2 Mbps, and will also be a challenge for non-Fiber-To-The-Home (non-FTTH) broadband Internet connections. However, one expects to see the emergence of bandwidth reduction techniques, as alluded to above. On the other hand, DTH satellite providers, terrestrial fiberoptic providers, and some cable TV firms should have adequate bandwidth to support the service. For example, the use of Digital Video Broadcast Satellite Second Generation (DVB-S2) allows a transponder to carry 75 Mbps of content with modulation using an 8-point constellation, and twice that much with a 16-point constellation. The trade-off would be, however (if we use the raw HD bandwidth just described as a point of reference), that a DVB-S2 transponder that would otherwise carry 25 channels of standard-definition video or 6–8 channels of HD video would now carry only 2–3 3DTV channels. To be pragmatic about this issue, most 3DTV providers are not contemplating delivering full resolution as just described and/or the transmission of two fully independent channels (simulcasting), but some compromise; for example, lowering the per-eye data rate such that a 3DTV program fits into a commercial-grade HDTV channel (say 8–10 Mbps), using time interleaving or spatial compression—again, this is doable but comes with a degradation of ultimate resolution quality.

(4) In the 3DTV context, the term "simulcasting" has been used with two meanings: one use is as implied above—the coding and transmission of two channels (which is unlikely to occur in reality); the second use is in the traditional sense of transmitting, say, an HDTV signal and also a 3DTV signal by some other means or on some other channel/system.
(5) Some HDTV content may be delivered at lower rates by some operators, say 8 Mbps; this rate, however, may not be adequate for sporting HDTV channels, and may be marginal for 3DTV at 1080p/60 Hz per eye.

There are a number of alternative transport architectures for 3DTV signals, depending also on the underlying media. As noted, the service can be supported by traditional broadcast structures including the DVB architecture, wireless 3G/4G transmission such as DVB-H approaches, Internet Protocol (IP) in support of an IPTV-based service (in which case it also makes sense to consider IPv6), and the IP architecture for Internet-based delivery (both non-real-time and streaming). The specific approach used by each of these transport methods will also depend on the video-capture approach.
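The bandwidth arithmetic above can be made concrete with a short sketch. The 19.2 Mbps DTV channel and the 75 Mbps DVB-S2 transponder figure (8-point constellation) are taken from the text; the per-stream rates of 3 Mbps for SD and 12 Mbps for HD are illustrative assumptions chosen to reproduce the channel counts quoted, and the helper names are ours, not from any standard:

```python
# Back-of-the-envelope bitrate sketch for the 3DTV transport options
# discussed above. The 19.2 Mbps DTV channel and 75 Mbps DVB-S2
# transponder figures are from the text; per-stream rates are
# illustrative assumptions.

DTV_CHANNEL_MBPS = 19.2      # standard over-the-air digital TV channel
DVBS2_8PSK_MBPS = 75.0       # DVB-S2 transponder, 8-point constellation

def simulcast_rate(per_eye_mbps):
    """Simulcasting: two fully independent streams, i.e., 2x the per-eye rate."""
    return 2.0 * per_eye_mbps

def video_plus_depth_rate(base_2d_mbps, overhead=0.25):
    """'Video + depth' is cited above as needing ~25% more bandwidth than 2D."""
    return base_2d_mbps * (1.0 + overhead)

def channels_per_transponder(per_channel_mbps, transponder_mbps=DVBS2_8PSK_MBPS):
    """How many whole channels of a given rate fit in one transponder."""
    return int(transponder_mbps // per_channel_mbps)

hd_mbps = 12.0                              # assumed high-quality MPEG-4 HD stream
rate_3d = simulcast_rate(hd_mbps)           # 24 Mbps for simulcast 3DTV
print(rate_3d <= DTV_CHANNEL_MBPS)          # False: does not fit one DTV channel
print(channels_per_transponder(3.0))        # 25 SD channels per transponder
print(channels_per_transponder(hd_mbps))    # 6 HD channels per transponder
print(channels_per_transponder(rate_3d))    # 3 simulcast 3DTV channels
```

With these assumed per-stream rates the sketch lands inside the ranges quoted in the text: 25 SD or 6–8 HD channels per transponder collapse to only a few simulcast 3DTV channels, while video + depth at 15 Mbps would still fit a single DTV channel.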
One should note that in the United States one has a well-developed cable infrastructure in all Tier 1 and Tier 2 metropolitan and suburban areas; in Europe/Asia this is less so, with more DTH delivery (in the United States, DTH tends to serve more exurban and rural areas). A 3DTV rollout must take these differences into account and/or accommodate both. In reference to possible cable TV delivery, CableLabs announced at press time that it had started to provide testing capabilities for 3DTV implementation scenarios over cable; these testing capabilities cover a full range of technologies, including various frame-compatible, spatial-multiplexing solutions for transmission.

Standards are critical to achieving interworking and are of great value to both consumers and service providers. The MPEG of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) has been working on coding formats for 3D video (and has already completed some of them). The Society of Motion Picture and Television Engineers (SMPTE) 3D Home Entertainment Task Force has been working on mastering standards. The Rapporteur Group on 3DTV of the International Telecommunication Union, Radiocommunication Sector (ITU-R) Study Group 6 and the TM-3D-SM group of DVB were working on transport standards.
1.2.2 Opportunities and Challenges for 3DTV

The previous section highlighted that many of the components needed to support an end-to-end commercial broadcast service are available or are becoming available. Hence, proponents see a significant market opportunity at this time. CEA estimates that more than 25% of sets sold in 2013 will be 3D-enabled. A handful of representative quotes from proponents of the 3D technology are as follows:

No one can escape the buzz and excitement around 3D. We're witnessing the start of dramatic change in how we view TV—the dawn of a new dimension. And through Sky's clear commitment to 3D broadcasting, 3D in the home is set to become a reality . . .

. . . The next killer application for the home entertainment industry—3DTV . . . [It] will drive new revenue opportunities for content creators and distributors by enabling 3D feature films and other programming to be played on their home television and computer displays—regardless of delivery channels . . .

. . . The most buzzed-about topics at CES: 3-D stereoscopic content creation . . . Several pivotal announcements [in] 2010, including 3D-TV releases from the major consumer electronics manufacturers and the launch of several dedicated 3D broadcast channels, are driving the rapid increase in demand for 3-D content . . .

3D technology is now positioned "to become a major force in future in-home entertainment." . . .

3DTV is one of the "hottest" subjects today in broadcasting. The combination of the audience's "wow" factor and the potential to launch completely new services makes it an attractive subject for both consumer and professional. There have already been broadcasts of a conventional display-compatible system, and the first HDTV channel compatible broadcasts are scheduled to start in Europe in the Spring of 2010 . . .

. . . In Europe, the EC is currently funding a large series of projects for 3DTV, including multiview, mobile 3D and 3D search . . .
Naturally, while there are proponents of the 3DTV technology, at the same time there are industry observers who take a more conservative view. These observers note that there are uncertainties about the availability of content, the technological readiness, and acceptance in the living room, especially given the requirement to use polarized or shutter glasses. A rational approach to market penetration is certainly in order; also, the powerful tool of statistically valid market research can be used to truly measure user interest and willingness to pay. Some representative quotes for a more conservative view of the 3D technology are given below:

. . . In a wide range of demos, companies . . . claim . . . in January 2010 that stereoscopic 3D is ready for the home. In fact, engineers face plenty of work hammering out the standards and silicon for 3DTV products, most of which will ship for the holiday 2010 season . . .

It has proven somewhat difficult to create a 3D system that does not cause "eye fatigue" after a certain time. Most current-generation higher resolution systems also need special eyeglasses, which can be inconvenient. Apart from eye fatigue, systems developed so far can also have limitations such as constrained viewing positions. Multiple-viewpoint television systems are intended to alleviate this. Stereoscopic systems also allow only limited "production grammar" . . . One should not underestimate the difficulty, or the imagination and creativity required, to create a near "ideal" 3DTV system that the public could enjoy in a relaxed way, and for a long period of time . . .

. . . The production process for 3D television requires a fundamental rethinking of the underlying technology. Scenes have to be recorded with multiple imaging devices that may be augmented with additional sensor technology to capture the three-dimensional nature of real scenes. In addition, the data format used in 3D television is a lot more complex. Rather than normal video streams, time-varying computational models of the recorded scenes are required that comprise descriptions of the scenes' shape, motion, and multiview appearance. The reconstruction of these models from the multiview sensor data is one of the major challenges that we face today. Finally, the captured scene descriptions have to be shown to the viewer in three dimensions, which requires completely new display technology . . .

. . . The conventional stereoscopic concept entails two views: it relies on the basic concept of an end-to-end stereoscopic video chain, that is, on the capturing, transmission, and display of two separate video streams, one for the left and one for the right eye. [Advocates for the autostereoscopic approach argue that] this conventional approach is not sufficient for future 3DTV services. The objective of 3DTV is to bring 3D imaging to users at home. Thus, like conventional stereo production and 3D Cinema, 3DTV is based on the idea of providing a viewer with two individual perspective views—one for the left eye and one for the right eye. The difference in approaches, however, lies in the environment in which the 3D content is presented. While it seems to be acceptable for a user to wear special glasses in the darkened theatrical auditorium of a 3D Digital Cinema, [many, perhaps] most people would refuse to wear such devices at home in the communicative atmosphere of their living rooms. Basically, autostereoscopic 3D displays are better suited for these kinds of applications . . .

The two greatest industry-wide concerns [are]: (1) that poor-quality stereoscopic TV will "poison the water" for everyone—stereoscopic content that is poorly realized in grammar or technology will create a reputation of eyestrain which cannot be shaken off; this has happened before in the 30s, the 50s, and the 80s in the cinema; and (2) that fragmentation of technical standards will split and confuse the market, and prevent stereoscopic television from ever being successful . . .

. . . people may quickly tire of the novelty. I think it will be a gimmick. I suspect there will be a lot of people who say it's sort of neat, but it's not really comfortable . . .

The challenge for the stakeholder is to determine where the "true" situation is, whether it is at one extreme, at the other extreme, or somewhere in the middle. An
abbreviated list of issues to be resolved in order to facilitate broad deployment of 3DTV services, beyond pure technology and encoding issues, includes the following:

• Production grammar (3D production for television is still in its infancy)
• Compatibility of systems (also possibly different issues for pay TV and free-to-air operators)
• Assessment of quality/suitability:
  – Methodologies for the quality assessment of 3DTV systems;
  – Parameters that need to be measured that are specific to 3DTV;
  – Sensation of reality;
  – Ease of viewing.
• Understanding what the user requirements are.

In general, a technology introduction process spans three phases:

• Phase 1: The technology becomes available in basic form to support a given service;
• Phase 2: A full panoply of standards emerges to support widespread deployment of the technology;
• Phase 3: The technology becomes inexpensive enough to foster large-scale adoption by a large set of end users.

With reference to 3DTV, we find ourselves at some early point in Phase 1. However, there are several retarding factors that could hold back short-term deployment of the technology on a broad scale, including deployment and service cost (overall status of the economy), standards, content, and quality.

The previous assertion can be further elaborated as follows: ITU-R WP 6C classifies 3DTV systems into two groups. The "first generation" systems are essentially those based on "plano-stereoscopic" display of single or multiple discrete lateral left- and right-eye pictures. Recommendations for such systems should be possible in the near future. The "second generation" systems are those based on object-wave recording (holography) or approximations of object-wave recording. Recommendations for such systems may be possible in the years ahead. We refine these observations by defining the following generations of 3DTV technology:

• Generation 0: Anaglyph TV transmission;
• Generation 1: 3DTV that supports plano-stereoscopic displays that are stereoscopic (that is, require active or passive glasses);
• Generation 2: 3DTV that supports plano-stereoscopic displays that are autostereoscopic (do not require glasses);
• Generation 2.5: 3DTV that supports plano-stereoscopic displays that are autostereoscopic (do not require glasses) and support multiple (N = 9) views;
• Generation 3: 3DTV that supports integral imaging, transmission, and displays;
• Generation 4: 3DTV that supports volumetric displays, transmission, and displays;
• Generation 5: 3DTV that supports object-wave transmission.

Figure 1.9 Three epochs of 3DTV commercial deployment. [The figure maps the generations onto three epochs: epoch 0 ("yesterday"): Generation 0, anaglyph TV; epoch 1 ("this decade"): Generation 1, plano-stereoscopic stereoscopic displays (2010–2013), Generation 2, plano-stereoscopic autostereoscopic displays (2013–2015), and Generation 2.5, autostereoscopic displays with multiple (N = 9) views (2016); epoch 2 ("speculative, >2020?"): Generation 3, integral imaging, transmission, and displays, Generation 4, volumetric displays, transmission, and displays, and Generation 5, object-wave transmission and re-creation. The blocks are intended to represent a deployed commercial service, not just prototypes and/or trials.]

See Figs. 1.9 and 1.10 (partially based on Ref. 2). Whether and when we get beyond Generation 2.5 in this decade remains to be seen. This text, and the current commercial industry efforts, concentrate on Generation 1 services. At press time, we find ourselves in Phase 1 of Generation 1. The existing commercial video infrastructure can handle 3D video in a basic, developmental form; however, providing HD 3D with motion graphics is not achievable without making enhancements to such infrastructure. Existing infrastructures, including satellite and/or terrestrial distribution networks, for example, can handle what some have termed "half-HD resolution" per eye, or frame formats of 1080i, 1080p24, and 1080i60. Existing encoders and set-top boxes can be used as long as signaling issues are addressed and 3D content is of a consistent form.
The drawback of half-HD resolution is that images can be blurry, especially for sporting events and high-action movies. New set-top chip sets are required for higher-resolution 3DTV.

1.3 COURSE OF INVESTIGATION

While there is a lot of academic interest in various aspects of the overall system, service providers and consumers ultimately tend to take a system-level view. While service providers do, to an extent, take a constructionist bottom-up view to
Figure 1.10 A 30-year timeline for 3DTV services (1995–2025). [The figure annotates each generation: Generation 0, anaglyph TV—the earliest and simplest form of stereoscopy, known for about 170 years, delivering two simultaneous images to the two eyes via color-based filtering; Generation 1, stereoscopy using polarization-based or shutter-based filtering (other forms, such as Pulfrich-effect stereoscopy, exist); Generation 2, autostereoscopic viewing with no special eyewear, using lenticular or barrier technologies, subject to a "sweet-spot" phenomenon; Generation 2.5, multiview autostereoscopy with many simultaneous horizontally spaced views (usually 7–9, possibly up to about 50) and some horizontal parallax; Generation 3, integral imaging, known since 1905, using microlens arrays during capture and projection to form a 2D array of many elemental images with both vertical and horizontal parallax—in the limit a light-field renderer that replicates the 3D physical light distribution (a true 3D technique, "incoherent holography"); Generation 4, volumetric 3D displays that sweep a 3D volume either mechanically or electronically, using voxels, self-luminous pixels, or moving projection screens; and Generation 5, holography—basic principle 1948, first holograms 1960, experimental holographic motion pictures 1989—a true 3D technique based on duplication of the light field, recorded on photographic film or high-resolution CCD arrays and reconstructed by proper illumination of the recording, with digital holographic techniques still at the basic research phase.]
Figure 1.11 A system view of a fully developed 3DTV distribution ecosystem. [The figure shows, for (a) an IPTV service provider operating a managed IP network and (b) DTH delivery, parallel end-to-end chains—camera capture, encoding and content production by hundreds of content providers, aggregation of 200–300 content channels, and decoding/display by millions of viewers—for 2D (SD/HD), conventional stereo video (CSV), video plus depth (V + D, including DVB transmission), and multiview video plus depth (MV + D, decoded with DIBR); in the IPTV case each viewer selects channels via IGMP, while in the DTH case all channels are broadcast and each viewer selects locally.]
deploy the technological building blocks (such as encoders, encapsulators, IRDs [Integrated Receiver/Decoders], set-top boxes, and so on), 3DTV stakeholders need to consider the overall architectural system-level view of what it will take to deploy an infrastructure that is able to reliably and cost-effectively deliver a commercial-grade-quality bundle of multiple 3DTV content channels to paying customers with high expectations. This text, therefore, takes such a system-level view, namely, how to actually deploy the technology. Figure 1.11 depicts the 3DTV distribution ecosystem that this text addresses.

Fundamental visual concepts supporting stereographic perception of 3DTV are reviewed in Chapter 2. 3DTV technology and digital video compression principles are discussed in Chapter 3. Elements of an end-to-end 3DTV system are covered from a satellite delivery perspective in Chapter 4. Transmission technologies are assessed for terrestrial and IPTV-based architectures in Chapter 5. Finally, Chapter 6 covers standardization activities that are critical to any sort of broad deployment.

This author recently published a companion text, 3D Television (3DTV) Technology, Systems, and Deployment—Rolling out the Infrastructure for Next-Generation Entertainment (Taylor & Francis, 2010), which addresses the broader issues related to the technologies listed in Table 1.1, with an emphasis on post-CSV systems. At this time, as noted earlier, the CSV approach is the most likely to see early deployment in commercial 3DTV systems. Table 1.4 identifies the approaches that have been advanced by researchers for the treatment of the video images after their immediate capture by a stereo (or multiview) set of cameras [21–23]. The most common approaches are CSV, video plus depth (V + D), multiview video plus depth (MV + D), and layered depth video (LDV).
We provide a brief discussion of these other systems in the chapters that follow, but we do not focus on them; the reader is referred to our companion text for a more detailed discussion of these systems.

TABLE 1.4 Common Video Treatment Approaches(a)

Conventional stereo video (CSV): CSV is the most well known and, in a way, the simplest type of 3D video representation. Only color pixel video data are involved, captured by at least two cameras. The resulting video signals may undergo some processing steps like normalization, color correction, and rectification, but in contrast to other 3D video formats, no scene geometry information is involved. The video signals are meant in principle to be directly displayed using a 3D display system, though some video processing might also be involved before display.

Video plus depth (V + D): The video plus depth (V + D) representation consists of a video signal and a per-pixel depth map. (This is also called 2D plus depth by some and color plus depth by others.) Per-pixel depth data are usually generated from calibrated stereo or multiview video by depth estimation and can be regarded as a monochromatic, luminance-only video signal. The depth range is restricted to a range between two extremes, z_near and z_far, indicating the minimum and maximum distance, respectively, of the corresponding 3D point from the camera. Typically, the depth range is quantized with 8 bits, associating the closest point with the value 255 and the most distant point with the value 0. With that, the depth map is specified as a grayscale image that can be fed into the luminance channel of a video signal and then be processed by any state-of-the-art video codec. For displaying V + D at the decoder, a stereo pair can be rendered from the video and depth information by 3D warping with camera geometry information.

Multiview video plus depth (MV + D): Advanced 3D video applications are wide-ranging multiview autostereoscopic displays and free-viewpoint video, where the user can choose his or her own viewpoint. They require a 3D video format that allows rendering a continuum of output views, or a very large number of different output views, at the decoder. Multiview video by itself does not support a continuum, and coding is increasingly inefficient for a large number of views. V + D supports only a very limited continuum around the available original view, since view synthesis artifacts increase dramatically with the distance of the virtual viewpoint. Therefore, an MV + D representation is required for advanced 3D video applications. MV + D involves a number of complex and error-prone processing steps. Depth has to be estimated for the N views at the sender. N color videos with N depth videos have to be encoded and transmitted. At the receiver, the data have to be decoded and the virtual views have to be rendered. The multiview video coding (MVC) standard developed by MPEG supports this format and is capable of exploiting the correlation between the multiple views that are required to represent 3D video.

Layered depth video (LDV): Layered depth video is a derivative of, and alternative to, MV + D. It uses one color video with an associated depth map and a background layer with an associated depth map. The background layer includes image content that is covered by foreground objects in the main layer. LDV might be more efficient than MV + D because less data have to be transmitted. On the other hand, additional error-prone vision tasks are included that operate on partially unreliable depth data, which may increase artifacts.

(a) Based on concepts from the 3DPHONE document "All 3D Imaging Phone, 7th Framework Programme".
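The 8-bit depth quantization that Table 1.4 describes for the V + D format can be sketched as follows. This is a minimal illustration under stated assumptions: we use the linear near/far mapping implied by the table's wording (closest point mapped to 255, most distant to 0), whereas deployed MPEG tools often quantize inverse depth instead; the disparity helper assumes an idealized parallel-camera geometry, and all names and numeric values are ours:

```python
# Minimal sketch of V+D depth handling (illustrative, not a codec):
# quantize metric depth into the 8-bit grayscale convention described
# in Table 1.4 (closest point -> 255, most distant -> 0) and invert it.

Z_NEAR = 1.0   # assumed minimum scene distance, in meters
Z_FAR = 10.0   # assumed maximum scene distance, in meters

def quantize_depth(z, z_near=Z_NEAR, z_far=Z_FAR):
    """Map depth z in [z_near, z_far] to an 8-bit value (255 = closest)."""
    z = min(max(z, z_near), z_far)            # clamp to the valid range
    return round(255 * (z_far - z) / (z_far - z_near))

def dequantize_depth(d, z_near=Z_NEAR, z_far=Z_FAR):
    """Invert the 8-bit mapping back to a metric depth estimate."""
    return z_far - (d / 255) * (z_far - z_near)

# For rendering a virtual view by 3D warping, each pixel shifts by a
# disparity that grows as depth shrinks; for parallel cameras this is
# disparity = f * b / z (f = focal length in pixels, b = baseline in m).
def disparity_px(z, focal_px=1000.0, baseline_m=0.06):
    return focal_px * baseline_m / z

print(quantize_depth(Z_NEAR))   # 255: the closest point
print(quantize_depth(Z_FAR))    # 0: the most distant point
```

Because the quantized map is an ordinary grayscale (luminance) channel, any 2D codec can carry it; the decoder dequantizes each pixel and shifts it by its disparity to synthesize the second view.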
24 INTRODUCTION

REFERENCES

1. Steinberg S. 3DTV: Is the World Really Ready to Upgrade? Digital Trends, Online Magazine. Jan 7, 2010.
2. Onural L. The 3DTV Toolbox—The Results of the 3DTV NoE. 3DTV NoE Coordinator, Bilkent University, Workshop on 3DTV Broadcasting, Geneva. Apr 30, 2009.
3. Dosch C, Wood D. Can we create the "holodeck"? The challenge of 3D television. ITU News Magazine, Issue No. 09. Nov 2008.
4. Wallop H. CES 2010: 3D TVs on sale in UK by April. Telegraph. Jan 7, 2010. telegraph.co.uk.
5. Tarr G. BDA Welcomes 3D into Blu-ray Fold. TWICE. Jan 8, 2010.
6. Shilov A. Blu-Ray Disc Association Finalizes Stereoscopic 3D Specification: Blu-Ray 3D Spec Finalized: New Players Incoming. xbitlabs Online Magazine. Dec 18, 2009. http://www.xbitlabs.com.
7. 3D TV Round-Up. ITVT Online Magazine. Jan 6, 2010.
8. Aylsworth W. New SMPTE 3D Home Content Master Requirements Set Stage for New Market Growth. Las Vegas (NV): National Association of Broadcasters; 2009.
9. Otellini P. Intel Corporation President and CEO, Keynote Speech, Consumer Electronics Show, Las Vegas (NV). Jan 7, 2010.
10. 3-D Video Changes the Consumer Content Experience, CEA/ETC@USC Survey Finds. Joint Consumer Study of the Consumer Electronics Association and the Entertainment and Technology Center at the University of Southern California. Feb 20, 2009.
11. Digital Video Broadcasting Project (DVB). Online website material regarding the launch of the DVB 3D TV Kick-Off Workshop. Jan 2010.
12. Dosch C. Toward Worldwide Standards for First and Second Generation 3D TV. Workshop on Three-Dimensional Television Broadcasting, organized jointly by ITU-R, EBU, and SMPTE, Geneva. Apr 30, 2009.
13. Merritt R. Incomplete 3DTV Products in CES Spotlight: HDMI Upgrade One of Latest Pieces in Stereo 3D Puzzle. EE Times. Dec 23, 2009.
14. ITU-R Newsflash. ITU Journey to Worldwide "3D Television" System Begins, Geneva. Jun 3, 2008.
15. Rosenhahn B, editor. D26.3 Technical Report #3 on 3D: Time-Varying Scene Capture Technologies. Project Number: 511568, Project Acronym: 3DTV, Initiative Title: Integrated Three-Dimensional Television—Capture, Transmission and Display. TC1 WP7 Technical Report 3. March 2008.
16. Kauff P, Müller M, et al. ICT-215075 3D4YOU, Deliverable D2.1.2: Requirements on Post-production and Formats Conversion. August 2008.
17. Wood D. Adding value to 3D TV standards. Chair, ITU-R WP 6C. Apr 29, 2009. Comments found at International Telecommunication Union website, www.itu.int.
18. Steenhuysen J. For Some, 3D Movies a Pain in the Head. Reuters. Jan 11, 2010.
19. ITU-R Activities in 3D: WP6C Rapporteurs for 3D TV. Apr 30, 2009.
20. TVB, Television Broadcast. A 3DTV Update from the MPEG Industry Forum. Online Magazine. Jan 20, 2010. www.televisionbroadcast.com.
21. IST—6th Framework Programme, 3DTV NoE, 2004. Project Coordinator: Prof. Levent Onural, EEE Department, Bilkent University, TR-06800 Ankara, Turkey.
22. 3DPHONE. Project no. FP7-213349, Project title: All 3D Imaging Phone, 7th Framework Programme, Specific Programme "Cooperation", FP7-ICT-2007.1.5—Networked Media, D5.2: Report on first study results for 3D video solutions. Dec 31, 2008.
23. 3DPHONE. Project no. FP7-213349, Project title: All 3D Imaging Phone, 7th Framework Programme, Specific Programme "Cooperation", FP7-ICT-2007.1.5—Networked Media, D5.1: Requirements and specifications for 3D video. Aug 19, 2008.
24. TVB Magazine. TVB's 3DTV Timeline. Online Magazine. Jan 5, 2010. www.televisionbroadcast.com.

FURTHER READING

Ozaktas HM, Onural L, editors. Three-Dimensional Television: Capture, Transmission, Display. New York: Springer Verlag; 2008. XVIII, 630 p., 316 illus. ISBN: 978-3-540-72531-2.
Javidi B, Okano F, editors. Three-Dimensional Television, Video and Display Technology. New York: Springer Verlag; 2002.
Schreer O, Kauff P, Sikora T, editors. 3D Videocommunication: Algorithms, Concepts and Real-Time Systems in Human-Centered Communication. New York: John Wiley & Sons; 2005. ISBN-13: 9780470022719.
Minoli D. 3D Television (3DTV) Technology, Systems, and Deployment—Rolling Out the Infrastructure for Next-Generation Entertainment. Taylor & Francis; 2010.
APPENDIX A1: SOME RECENT INDUSTRY EVENTS RELATED TO 3DTV

This appendix includes a listing of events during the year prior to the publication of this text, so as to further document the activity in this arena. It is based in its entirety on Ref. 24.

Despite the economic difficulties of 2009, the year marked a turning point in the adoption of 3D as a viable entertainment format. TVB presents a timeline of 3D video developments over the last year, from content to workflow initiatives to display technologies:

December 4, 2008: The San Diego Chargers and the Oakland Raiders appeared in a 3D simulcast displayed at theaters in Boston, Hollywood, and New York.
January 8, 2009: A 3D version of the Gators–Sooners match-up was simulcast in Las Vegas at the Consumer Electronics Show.
February 14, 2009: The NBA's All-Star Game was simulcast in 3D.
February 24, 2009: Toshiba announces the development of OLED Wallpaper TV, with a 3D version utilizing circularly polarized light in the works.
March 2, 2009: Avid Technology announced it was developing native support for the Sony XDCAM format, as well as adding 3D capabilities to its various editing software packages, Composer and Symphony.
March 9, 2009: BSkyB continued plowing toward 3DTV, with a goal to offer it by the end of the year.
April 6, 2009: BSkyB successfully transmitted live 3DTV across its HD infrastructure in the United Kingdom.
April 20, 2009: At the NAB show in Las Vegas, Panasonic announced work on a full 3D HD production system, encompassing everything from capture to Blu-ray distribution. The Panasonic gear list comprised authoring, a twin-lens P2 camera recorder and drives, 3D Blu-ray discs and players, and a 3D plasma display. Panasonic displayed its HD 3D Plasma Home Theater at the NAB convention.
July 30, 2009: BSkyB now plans to launch its 3D channel in 2010.
August 24, 2009: Panasonic joined James Cameron in a flack blitz for "Avatar," with a multipoint media and sales campaign and a nationwide tour with customized 18-wheelers outfitted with 103-inch Panasonic Viera plasma HDTVs and Blu-ray disc players.
September 2, 2009: Sony announced that it planned to introduce a consumer-ready 3D TV set in 2010, as well as build 3D capability into many of its consumer electronics, encompassing music, movies, and video games.
September 10, 2009: Mobile TV production specialist NEP has rolled out its first 3D truck.
September 11, 2009: BBC executives say some of the 2012 Olympic Games could be carried in 3D.
September 12, 2009: ESPN transmits an invitation-only 3D version of the University of SoCal versus Ohio State game to theaters in Los Angeles; Columbus, Ohio; Hartford, Conn.; and Hurst, Texas.
September 14, 2009: IBC features several professional 3D technologies, including Nagravision's 3D program guide and Viaccess 3D conditional access. The awards ceremony featured a 16-minute clip of James Cameron's "Avatar." Skeptics mentioned the headache factor, as well as the difficulty of doing graphics for 3D.
September 24, 2009: In-Stat finds that about 25 percent of those who are at least somewhat interested in having the ability to view 3D content at home were nevertheless unwilling to spend more money on a 3D TV.
October 1, 2009: Sony Broadcast bows a new single-lens 3D technology comprising a new optical system that captures left and right images simultaneously, with existing high frame-rate recording technology to realize 240 fps 3D filming.
October 8, 2009: 3M says it has developed 3D for mobile devices. The autostereoscopic 3D film targets cell phones, small video game consoles, and other portable digital devices, and requires no glasses.
October 21, 2009: SMPTE's fall conference focuses on 3D, with Dolby Labs, Fox Network, DTS, and RealD lending input.
October 26, 2009: Televisa broadcast the first soccer match in 3D over the weekend to parts of Mexico.
November 11, 2009: Rich Greenfield, analyst with Pali Capital, deems 3D a gimmick, at least as far as the movie industry was concerned. The movie "Scrooge" in 3D fueled his skepticism.
November 23, 2009: Sony chief Sir Howard Stringer is counting on 3D to be the company's next $10 billion business.
December 3, 2009: The International Federation of Football said it would broadcast 2010 World Cup soccer matches in 3D.
FIFA said it has signed a media rights agreement with Sony, one of its official partners, to deliver 3D versions of up to 25 matches in the 2010 FIFA World Cup South Africa.
December 3, 2009: Neither Michael Jackson's videos nor the next Spiderman movie would be among Sony's upcoming 3D releases, the company's top executive said.
December 14, 2009: Two events mark the advance of 3D. First was the debut of a live 3D broadcast on the massive display screen at the Dallas Cowboys stadium in Arlington, Texas. The second was the release of "Avatar," James Cameron's interplanetary 3D epic.
December 18, 2009: "Avatar" changes Greenfield's doubts about 3D. "We are assuming 'Avatar' will reach opening weekend attendance levels of about 12 million, with 57.5 percent of attendance occurring on 3D screens yielding total opening weekend box office of over $100 million," he said.
January 4, 2010: ESPN announces the intended launch of ESPN 3D in time for the June 11 FIFA World Cup Soccer games.
January 5, 2010: Sony, Discovery, and IMAX make it official, announcing the intended launch of a 3D network in 2011. 3D technologies dominate previews of the annual Consumer Electronics Show in Las Vegas. DIRECTV announced the 2010 launch of an HD 3D channel at the show.
CHAPTER 2

3DV and 3DTV Principles

The physiological apparatus of the Human Visual System (HVS), which is responsible for the human sense of depth, has been understood for a long time, but it is not trivial. In this chapter we explore some basic key concepts of visual science that play a role in 3DTV. The concepts of stereoscopic vision, parallax, convergence, and accommodation are covered, among others.

2.1 HUMAN VISUAL SYSTEM

Stereopsis is the binocular sense of depth. Binocular as an adjective means "with two eyes, related to two eyes." Depth is perceived by the HVS by way of cues. Binocular cues are depth cues that depend on perception with two eyes. Monocular cues are depth cues that can be perceived with a single eye alone, such as relative size, linear perspective, or motion parallax. Stereoscopic fusion is the ability of the human brain to fuse the two different perspective views into a single, 3D image. Accommodation is the focusing of the eyes. Convergence is the horizontal rotation of the eyes (or cameras) that makes their optical axes intersect in a single point in 3D space. Interocular distance is the distance between an observer's eyes—about 64 mm for adults.

Disparity is the distance between corresponding points on the left- and right-eye images. Retinal disparity is the disparity perceived at the retina of the human eyes. Horopter is the 3D curve defined as the set of points in space whose images form at corresponding points in the two retinas (i.e., the imaged points have zero disparity). Panum's fusional area is a small region around the horopter where retinal disparities can be fused by the HVS into a single, 3D image. Point of convergence is a point in 3D space where the optical axes of the eyes (or convergent cameras) intersect. The plane of convergence is the depth plane where the optical rays of the sensor centers intersect in the case of a parallel camera setup.
Crossed disparity represents retinal disparities indicating that the corresponding optical rays intersect in front of the horopter or the convergence plane. Uncrossed disparity represents retinal (or camera) disparities where the optical rays intersect behind the horopter or the convergence plane. Interocular distance (also called interpupillary distance) is the distance between an observer's eyes, about 64 mm for adults (although there is a distribution1 of distances ±12 mm).

See Fig. 2.1 for illustrations of the concepts of disparity and Fig. 2.2 for the concept of fusion.

Figure 2.1 Mechanism of stereoscopic fusion and retinal disparity.

Stereoscopy is the method used for creating a pair of planar stereo images. Plano-stereoscopic is the exact term for describing 3D displays that achieve a binocular depth effect by providing the viewer with images of slightly different perspective at one common planar screen. Depth range is the extent of depth that is perceived when a plano-stereoscopic image is reproduced by means of a stereoscopic viewing device.

1This distance may need to be taken into account by content producers. Also, a percentage of people (around 6%) have depth impairments in their vision.
Figure 2.2 Fusion of left-eye and right-eye images: the views of the original object are fused to form a single perceived 3D object.

Corresponding points are the points in the left and right images that are pictures of the same point in 3D space. Parallax is the distance between corresponding points in the left- and right-eye images of a plano-stereoscopic image. Parallax angle is the angle under which the optical rays of the two eyes intersect at a particular point in the 3D space. Hence, (binocular) parallax is the apparent change in the position of an object when viewed from different points (e.g., from two eyes or from two different positions); in slightly different words, an apparent displacement or difference in the apparent position of an object viewed along two different lines of sight. Negative parallax stereoscopic presentation occurs where the optical rays intersect in front of the screen in the viewer's space (this refers to crossed disparity). Positive parallax stereoscopic presentation occurs where the optical rays intersect behind the screen in the screen space (this refers to uncrossed disparity). Screen space is the region behind the display screen surface. Objects will be perceived in this region if they have positive parallax.
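The relation between screen parallax and where a fused point is perceived (screen space versus viewer space) follows from similar triangles between the two eyes and the display. The sketch below is our own illustration, not from the text: the function name and the formula Z = Zv·te/(te − P) are assumptions based on this standard plano-stereoscopic viewing geometry, with P the screen parallax, te the interocular distance, and Zv the viewing distance.

```python
def perceived_depth(parallax_m, eye_sep_m=0.064, view_dist_m=2.0):
    """Perceived distance of a fused point from the viewer, by similar
    triangles: Z = Zv * te / (te - P).  P > 0 (uncrossed) -> behind the
    screen; P == 0 -> on the screen; P < 0 (crossed) -> in front of it."""
    if parallax_m >= eye_sep_m:
        return float("inf")  # P equal to te reproduces the point at infinity
    return view_dist_m * eye_sep_m / (eye_sep_m - parallax_m)

# Zero parallax: the point sits exactly on the 2 m distant screen.
print(perceived_depth(0.0))             # 2.0
# Positive parallax: screen space (behind the screen).
print(round(perceived_depth(0.01), 2))  # 2.37
# Negative parallax: viewer space (in front of the screen).
print(round(perceived_depth(-0.01), 2)) # 1.73
```

Note how the model reproduces the limiting case stated later in the chapter: as the parallax approaches the interocular distance, the point recedes to infinity, which is why positive parallax must stay below te.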
Figure 2.3 Parallax: (a) positive parallax, (b) zero parallax, and (c) negative parallax.

Viewer space is the region between the viewer and the display screen surface. Objects will be perceived in this region if they have negative parallax (Fig. 2.3).

Accommodation/convergence conflict is the deviation from the learned and habitual correlation between accommodation and convergence when viewing plano-stereoscopic images. Binocular rivalry represents perception conflicts that appear in the case of colorimetric, geometric, photometric, or other asymmetries between the two (recreated) stereo images. Crosstalk is the imperfect separation of the left- and right-eye images when viewing plano-stereoscopic 3D content. Crosstalk is a physical entity, whereas ghosting is a psychophysical entity (Fig. 2.4).

Figure 2.4 Viewing a 3D image on a screen and the related accommodation–convergence conflict.
2.1.1 Depth/Binocular Cues

The terms we just defined are now further applied. The HVS is2 able to perceive depth due to the brain's ability to interpret several types of depth cues that can be separated into two broad categories: sources of information that require only one eye (e.g., relative size, linear perspective, or motion parallax) are called monocular cues, whereas those that depend on both eyes are called binocular cues. Everyday scenes usually contain more than one type of depth cue, and the importance of each cue is based on learning and experience. In addition, the influence of the different cues on human depth perception also depends on the relative distances between the observer and the objects in the scene. Binocular cues are mainly dominant for viewing distances below 10 m and, hence, they are particularly important for 3DTV. They are based on the fact that the human eyes are horizontally separated. Each eye provides the brain with a unique perspective of the observed scene. This horizontal separation—on average approximately 64 mm for an adult—is known as the interocular distance te. It leads to spatial distances between the relative projections of observed 3D points in the scene onto the left and the right retina, also known as retinal disparities. These disparities provide the HVS with information about the relative distance of objects and about the spatial structure of our 3D environment. It is the retinal disparity that allows the human brain to fuse the two different perspective views from the left and the right eye into a single 3D image.

Figure 2.1 illustrates how this process of stereoscopic fusion works in detail. Basically, when looking at the 3D world, the eyes rotate until their optical axes converge (intersect at a single point) on the "object of interest." It follows that the point of convergence is projected to the corresponding image points on the respective retinas; that is, it does not produce any retinal disparity.
The same holds true for all points on the horopter, which is defined by the fixation point and the nodal points of both eyes. All other points, however, will produce retinal disparities whose magnitude becomes larger the further away the 3D points are from the horopter.

Disparities that are caused by points in front of the horopter are said to be crossed, while disparities that result from points behind the horopter are called uncrossed. As long as the—crossed or uncrossed—disparities do not exceed a certain magnitude, the two separate viewpoints can be merged by the human brain into a single, 3D percept. The small region around the horopter within which disparities are fused is known in the literature as Panum's fusional area. Points outside this area are not fused and double images will be seen, a phenomenon that is called diplopia.

2Portions of the discussion that follows for the rest of Section 2.1 are based on a public report of the 3D4YOU project under the ICT (Information and Communication Technologies) Work Programme 2007–2008.
2.1.2 Accommodation

That the double images just described usually do not disturb visual perception is the result of another habitual behavior that is tightly coupled with the described convergence process. In concert with the rotation of the optical axes, the eyes also focus (accommodate, by changing the shape of the eyes' lenses) on the object of interest. This is important for two different reasons. First of all, focusing on the point of convergence allows the observer to see the object of interest clear and sharp. Secondly, the perception of disturbing double images, which in principle result from all scene parts outside Panum's fusional area, is efficiently suppressed due to an increasing optical blur.

Although particular realizations differ widely in the specific techniques used, most stereoscopic displays and projections are based on the same basic principle of providing the viewer with two different perspective images for the left and the right eye. Usually, these slightly different views are presented at the same planar screen. These displays are therefore called plano-stereoscopic devices. In this case, the perception of binocular depth cues results from the spatial distances between corresponding points in both planar views, that is, from the so-called parallax P, which in turn induces the retinal disparities in the viewer's eyes. Thus, the perceived 3D impression depends, among others, on parameters such as the viewing distance and on both the amount and type of parallax.

2.1.3 Parallax

As shown in Fig. 2.3, three different cases have to be taken into account here:

1. Positive Parallax: Corresponding image points are said to have positive or uncrossed parallax P when the point in the right-eye view lies more to the right than the corresponding point in the left-eye view. Thus, the related viewing rays converge in a 3D point behind the screen, so that the reproduced 3D scene is perceived in the so-called screen space.
Furthermore, if the parallax P exactly equals the viewer's interocular distance te, the 3D point is reproduced at infinity. This also means that the allowed maximum of the positive parallax is limited to te.

2. Zero Parallax: With zero parallax, corresponding image points lie at the same position in the left- and the right-eye views. The resulting 3D point is therefore observed directly at the screen, a situation that is often referred to as the Zero Parallax Setting (ZPS).

3. Negative Parallax: Conjugate image points with negative or crossed parallax P are located such that the point in the right-eye view lies more to the left than the corresponding point in the left-eye view. The viewing rays therefore converge in a 3D point in front of the screen in the so-called viewer space.

The parallax angle is unlimited when looking at a real-world 3D scene. In this case, the eyes simultaneously converge and accommodate on the object
3DV/3DTV STEREOSCOPIC PRINCIPLES 35of interest. As explained, these jointly performed activities allow the viewer tostereoscopically fuse the object of interest and, at the same time, to suppressdiplopia (double image) effects for scene parts that are outside the Panum’sfusional area around the focused object. However, the situation is different instereoreproduction. When looking at a stereoscopic 3D display, the eyes alwaysaccommodate on the screen surface, but they converge according to parallax(Fig. 2.4). This deviation from the learned and habitual correlation betweenaccommodation and convergence is known as accommodation–convergenceconﬂict. It represents one of the major reasons for eyestrain, confusion, and lossof stereopsis in 3D stereoreproduction [5–7]. It is therefore important to makesure that the maximal parallax angle αmax is kept within acceptable limits or,in other words, to guarantee that the 3D world is reproduced rather close to thescreen surface of the 3D display. The related generation of planar stereoscopic views requires capture with asynchronized stereocamera. Because such 2-camera systems are intended to medi-ate the natural binocular depth cue, it is not surprising that their design shows astriking similarity with the HVS. For example, the interaxial distance tc betweenthe focal points of left- and the right-eye camera lens is usually chosen in relationto the interocular distance te . Furthermore, similar to the convergence capabilityof the HVS, it must be able to adapt a stereocamera to a desired convergencecondition or ZPS; that is, to choose the part of the 3D scene that is going tobe reproduced exactly on the display screen. As shown in Fig. 2.5, this can beachieved by two different camera conﬁgurations [8, 9]. 1. “Toed-In” Setup: With the toed-in approach, depicted in Fig. 2.4(a), a point of convergence is chosen by a joint inward rotation of the left- and the right-eye cameras. 2. 
“Parallel” Setup: With the parallel method, shown in Fig. 2.4(b), a plane of convergence is established by a small shift h of the sensor targets. At ﬁrst view, the toed-in approach intuitively seems to be the more suitablesolution because it directly ﬁts the convergence behavior of the HVS. However,it has been shown in the past that the parallel approach is nonetheless preferable,because it provides a higher stereoscopic image quality [8, 9].2.2 3DV/3DTV STEREOSCOPIC PRINCIPLESWe start this section with a few additional deﬁnitions. Stereo means “havingdepth, or being three-dimensional” and it describes an environment where twoinputs combine to create one uniﬁed perception of three-dimensional space.Stereoscopic vision is the process where two eye views combine in the brain tocreate the visual perception of one 3D image; it is a by-product of good binocularvision. Stereoscopy can be deﬁned as any technique that creates the illusion of
depth, or three-dimensionality, in an image. Stereoscopic (literally: "solid looking") is the term used to describe a visual experience having visible depth as well as height and width. The term may refer to any experience or device that is associated with binocular depth perception. Stereoscopic 3D refers to two photographs taken from slightly different angles that appear three-dimensional when viewed together. Autostereoscopic describes 3D displays that do not require glasses to see the stereoscopic image. Stereogram is a general term for any arrangement of left-eye and right-eye views that produces a three-dimensional result, which may consist of (i) a side-by-side or over-and-under pair of images; (ii) superimposed images projected onto a screen; (iii) a color-coded composite (anaglyph); (iv) lenticular images; or (v) alternately projected left-eye and right-eye images that fuse by means of the persistence of vision. Stereoplexing (stereoscopic multiplexing) is a mechanism to incorporate information for the left and right perspective views into a single information channel without expansion of the bandwidth.

Figure 2.5 Basic stereoscopic camera configurations: (a) "toed-in" approach, and (b) "parallel" setup.

On the basis of the principles discussed above, a number of techniques for re-creating depth for the viewer of photographic or video content have been developed. A considerable amount of research has taken place during the past 30 or more years on 3D graphics and imaging; most of the research has focused on photographic techniques, computer graphics, 3D movies, and holography. (The field of imaging, including 3D imaging, relates more to the static or quasi-static capture/representation—encoding, compression/transmission/display/storage—of content, for example, photographs, medical images, CAD/CAM drawings, and so on, especially for high-resolution applications; this topic is not covered here.)
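Stereoplexing as just defined can be illustrated with the common side-by-side arrangement: each view gives up half its horizontal resolution so that the pair fits in one ordinary frame, with no expansion of the channel bandwidth. This is only a sketch of the idea; the function names and the every-other-column subsampling choice are our own assumptions, and frames are modeled as 8-bit NumPy arrays.

```python
import numpy as np

def pack_side_by_side(left, right):
    """Stereoplex two views into one frame of the original width by
    keeping every other column of each view (half horizontal resolution)."""
    assert left.shape == right.shape
    return np.concatenate([left[:, ::2], right[:, ::2]], axis=1)

def unpack_side_by_side(frame):
    """Split a side-by-side frame back into half-width left/right views."""
    half = frame.shape[1] // 2
    return frame[:, :half], frame[:, half:]

left = np.zeros((4, 8), dtype=np.uint8)       # stand-in left-eye view
right = np.full((4, 8), 255, dtype=np.uint8)  # stand-in right-eye view
packed = pack_side_by_side(left, right)
print(packed.shape)  # (4, 8): same size, and bandwidth, as one 2D frame
```

The display (or set-top box) reverses the packing and scales each half-width view back up, which is the resolution compromise referred to later in the chapter.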
Figure 2.6 Stereoscopic capture of a scene to achieve 3D when the scene is seen with an appropriate display system. In this figure the separation between the two images is exaggerated for pedagogical reasons (in actual stereo photos the differences are very minute).

Fundamentally, the technique known as "stereoscopy" has been advanced, where two pictures or scenes are shot, one for each eye, and each eye is presented with its proper picture or scene, in one fashion or another (Fig. 2.6). Stereoscopic 3D video is based on the binocular nature of human perception; to generate quality 3D content, the creator needs to control the depth and parallax of the scene, among other parameters. Depth perception is the ability to see in 3D, allowing the viewer to judge the relative distances of objects; depth range is a term that applies to stereoscopic images created with cameras. As noted above, parallax is the apparent change in the position of an object when viewed from different points; namely, the visual differences in a scene when
viewed from different points. A 3D display (screen) needs to generate some sort of parallax, which, in turn, creates a stereoscopic sense (Fig. 2.7). Nearby objects have a larger parallax than more distant objects when observed from different positions; because of this feature, parallax can be used to determine distances. Because the eyes of a person are in different positions on the head, they present different views simultaneously. This is the basis of stereopsis, the process by which the brain exploits the parallax due to the different views from the eyes to gain depth perception and estimate distances to objects. 3D depth perception can be supported by 3D display systems that allow the viewer to receive a specific and different view for each eye; such a stereo pair of views must correspond to the human eye positions, thus enabling the brain to compute the 3D depth perception. The main means of stereoscopic display has moved over the years from anaglyph to polarization and shutter glasses.

Figure 2.7 Generation of horizontal parallax for stereoscopic displays.

Some basic terms and concepts related to camera management for stereoscopic filming are as follows: interaxial distance is the distance between the left- and right-eye lenses in a stereoscopic camera. Camera convergence is the term used to denote the process of adjusting the ZPS in a stereoscopic camera. ZPS defines the point(s) in 3D space that have zero parallax in the plano-stereoscopic image created, for example, with a stereoscopic camera. These points will be stereoscopically reproduced on the surface of the display screen.

Two simultaneous conventional 2D video streams are produced by a pair of cameras mimicking the two human eyes that see the environment from two slightly different angles. Simple planar 3D films are made by recording separate
images for the left eye and the right eye from two cameras that are spaced a certain distance apart. The spacing chosen affects the disparity between the left-eye and the right-eye pictures, and thereby the viewer's sense of depth. While this technique achieves depth perception, it often results in eye fatigue after watching such programming for a certain amount of time: within minutes after the onset of viewing, stereoscopy frequently causes eye fatigue and, in some, feelings similar to those experienced during motion sickness. Nevertheless, the technique is widely used for (stereoscopic) photography and moviemaking, and it has been tested many times for television.

At the display level, one of these streams is shown to the left eye, and the other one to the right eye. Common means of separating the right-eye and left-eye views include glasses with colored transparencies, polarization filters, and shutter glasses. Polarization of light is the arrangement of beams of light into separate planes or vectors by means of polarizing filters; when two vectors are crossed at right angles, vision or light rays are obscured. In the filter-based approach, complementary filters are placed jointly over two overlapping projectors (when projectors are used—refer back to Table 1.3) and over the two corresponding eyes (i.e., anaglyph, linear or circular polarization, or the narrow-pass filtering of Infitec). Although the technology is relatively simple, the necessity of wearing glasses while viewing has often been considered a major obstacle to the wide acceptance of 3DTV. Also, there are some limitations to the approach, such as the need to retain a head orientation that works properly with the polarized light (e.g., do not bend the head 45 degrees side to side), and the need to be within a certain viewing angle.
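The colored-transparency (anaglyph) separation mentioned above can be sketched in a few lines: compose a single frame whose red channel carries the left-eye view while green and blue carry the right-eye view, so complementary red/cyan filters route each view to the matching eye. The function name and the classic red/cyan channel assignment are our own illustrative choices, with frames as 8-bit RGB NumPy arrays.

```python
import numpy as np

def simple_anaglyph(left_rgb, right_rgb):
    """Compose a red/cyan anaglyph: the red channel carries the left-eye
    view; green and blue carry the right-eye view.  Red/cyan glasses
    (complementary color filters) then separate the two views again."""
    out = right_rgb.copy()
    out[..., 0] = left_rgb[..., 0]  # red channel taken from the left-eye image
    return out

left = np.random.randint(0, 256, (2, 2, 3), dtype=np.uint8)
right = np.random.randint(0, 256, (2, 2, 3), dtype=np.uint8)
ana = simple_anaglyph(left, right)
# The red filter over the left eye passes only the left-eye information.
```

The trade-off is visible in the code itself: each eye receives only part of the color spectrum, which is why anaglyph gave way to polarization and shutter glasses as noted above.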
There are a number of other mechanisms to deliver binocular stereo, including barrier filters over LCDs (vertical bars act as a fence, channeling data in specific directions for the eyes).

It should be noted, as we wrap up this brief overview of the HVS, that individuals vary along a continuum in their ability to process stereoscopic depth information. Studies have shown that a relatively large percentage of the population experiences stereo deficiencies in depth discrimination/perception if the display duration is very short, and that a certain percentage of the adult population (about 6%) has persistent deficiencies. Figure 2.8 depicts the results of a study that quantifies these observations. These results indicate that certain fast-cut methods in scenes may not work for all in 3D. Object motion can also create visual problems in stereoscopic 3DTV. Figure 2.9 depicts visual discomfort that has been observed in studies. At the practical level, in the context of cinematography, while new digital 3D technology has made the experience more comfortable for many, for some people with eye problems a prolonged 3D session may result in an aching head, according to ophthalmologists. Some people have very minor eye problems (e.g., a minor muscle imbalance), which the brain deals with naturally under normal circumstances; but in a 3D movie, these people are confronted with an entirely new sensory experience that translates into greater mental effort, making it easier to get a headache. Some people who do not have normal depth perception cannot see in 3D at all. People with eye muscle problems, in which the eyes are not pointed at the same object, have trouble processing 3D images.
Figure 2.8 Stereo deficiencies in some populations: percentage of viewers (n = 100) exhibiting left/right depth-discrimination deficiencies (any stereo deficiency, "stereo blind," and crossed/uncrossed stereo-anomalous) as a function of display duration (0–1000 ms).

Figure 2.9 Visual discomfort caused by motion in a scene: rated visual comfort decreases as object velocity (cm/s) increases from slow to fast.
Headaches and nausea are cited as the main reasons 3D technology never took off. However, newer digital technology addresses many of the problems that typically caused 3D moviegoers discomfort. Some of the problems were related to the fact that the projectors were not properly aligned; systems that use a single digital projector help overcome some of the old problems. However, deeper-rooted issues with stereoscopic display may continue to affect a number of viewers (these problems will be solved by future autostereoscopic systems).

The two video views required for 3DTV can be compressed using standard video compression techniques. MPEG-2 encoding is widely used in digital TV applications today, and H.264/MPEG-4 AVC is expected to be the leading video technology standard for digital video in the near future. Extensions have recently been developed to H.264/MPEG-4 AVC and other related standards to support 3DTV; other standardization work is underway. The compression gains and quality of 3DTV will vary depending on the video coding standard used. While inter-view prediction will likely improve the compression efficiency compared to simulcasting (transmitting the two views end-to-end, and so requiring a doubling of the channel bandwidth), new approaches, such as, but not limited to, asymmetric view coding, video-plus-depth, and layered video, are necessary to reduce bandwidth requirements for 3DTV. Temporal multiplexing and spatial compression are being used in the short term, but with a compromise in resolution, as discussed in Chapter 3.

There are a number of ways to create 3D content, including: (i) Computer-Generated Imagery (CGI); (ii) stereocameras; and (iii) 2D to 3D conversions. CGI techniques are currently the most technically advanced, with well-developed methodologies (and tools) to create movies, games, and other graphical applications—the majority of cinematic 3D content is comprised of animated movies created with CGI.
Camera-based 3D is more challenging. A two-camera approach is typical here at this time; another approach is to use a 2D camera in conjunction with a depth-mapping system. With the two-camera approach, the two cameras are assembled with the same spatial separation as the eyes, to mimic how a scene is perceived. The technical issues relate to focus/focal length, specifically keeping in mind that these have to be matched precisely to avoid differences in vertical and horizontal alignment and/or rotational differences (lens calibration and motion control must be added to the camera lenses). 2D-to-3D conversion techniques include the following:

• object segmentation and horizontal shifting;
• depth mapping (bandwidth-efficient multiple images and viewpoints);
• creation of depth maps using information from 2D source images;
• making use of human visual perception for 2D-to-3D conversion;
• creation of a surrogate depth map (e.g., from the gray-level intensities of a color component).

Conversion of 2D material is the least desirable approach, but it is perhaps the one that could generate the largest amount of content in the short term. Some note that it is "easy to create 3D content, but it is hard to create good 3D content."
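The last two bullets lend themselves to a compact illustration. The sketch below (plain Python; the function names, the scaling choice, and the view-synthesis step are illustrative assumptions, not from any standard) reuses one color component's gray level as a surrogate depth map and then applies depth-proportional horizontal shifting:

```python
def surrogate_depth_map(rgb_frame, channel=0):
    """Crude 2D-to-3D heuristic: reuse one color component's
    gray-level intensity (0-255) as a stand-in for per-pixel depth."""
    return [[pixel[channel] for pixel in row] for row in rgb_frame]


def synthesize_view(gray_frame, depth_map, max_shift=4):
    """Synthesize a second view by shifting each pixel horizontally in
    proportion to its surrogate depth ("nearer" pixels shift more).
    Occlusion handling is ignored; uncovered positions simply keep
    the original pixel values."""
    out = []
    for row, d_row in zip(gray_frame, depth_map):
        width = len(row)
        new_row = list(row)                        # start from the original row
        for x in range(width):
            shift = (d_row[x] * max_shift) // 255  # 0..max_shift pixels
            new_row[min(width - 1, x + shift)] = row[x]
        out.append(new_row)
    return out
```

This is exactly the kind of conversion the text warns about: it produces a stereo pair cheaply, but with no real knowledge of object depth or of the scene content hidden behind foreground objects.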
A practical problem relates to "insertion." At least early on, 2D content will be inserted into a 3D channel, much the way standard-definition commercials still show up in HD content. A set-top box could be programmed to automatically detect an incoming format and handle various frame packing arrangements to support 2D/3D switching for advertisements.

In summary, and as we transition the discussion to autostereoscopic approaches (and in preparation for that discussion), we list below the highlights of the various approaches, as provided in the cited reference (refer back to Table 1.1 for definitions of terms).

Stereoscopy is the simplest and oldest technique:
• does not create physical duplicates of 3D light;
• quality of the resultant 3D effect is inferior;
• lacks parallax;
• focus and convergence mismatch;
• misalignment is seen;
• a "motion sickness" type of feeling (eye fatigue) is produced;
• is the main reason for the commercial failure of 3D techniques.

Multi-view video provides some horizontal parallax:
• still limited to a small angle (∼20–45 degrees);
• jumping effect observed;
• viewing discomfort similar to stereoscopy;
• requires a high-resolution display device;
• leakage of neighboring images occurs.

Integral imaging adds vertical parallax:
• gets closer to an ideal light-field renderer as the number of lenses (elemental images) increases: true 3D;
• alignment is a problem;
• requires very high resolution devices;
• leakage of neighboring images occurs.

Holography is superior in terms of replicating physical light distribution:
• recording holograms is difficult;
• very high resolution recordings are needed;
• display techniques are quite different;
• network transmission is anticipated to be extremely taxing.

2.3 AUTOSTEREOGRAPHIC APPROACHES

Autostereo implies that the perception of 3D is in some manner automatic, and does not require devices such as glasses, either filtered or shuttered.
Autostereoscopic displays use additional optical elements aligned on the surface of the
screen, to ensure that the observer sees different images with each eye. 3D autostereoscopic displays (where no headgear is needed) are still in the research phase at this time. We describe here only displays based on lenticular or parallax barrier binocular mechanisms (and not, for example, holographic approaches).

Lenticular lenses are curved optics that allow both eyes to see a different image of the same object at exactly the same time. Lenticules are tiny plastic lenses pasted in an array on a transparent sheet that is then applied onto the display surface of the LCD screen (Fig. 2.10). A typical multi-view 3D display device shows nine views simultaneously and allows a limited free-viewing angle (some prototype products support a larger number of views). When looking at the cylindrical image on the TV, the left and right eyes see two different 2D images that the brain combines to form one 3D image.

Figure 2.10 Lenticular approach.

The lenslet or lenticular
elements are arranged to make parts of an underlying composite image visible only from certain viewing directions. Typically, a lenticular display multiplexes separate images in cycling columns beneath its elements, making the elements take on the color of selected pixels beneath them when viewed from different directions. LCDs or projection sources can provide the pixels for such a display. A drawback of the technology is that it requires a very specific "optimal sitting spot" for getting the 3D effect; shifting a small distance to either side will make the TV's images seem distorted.

A parallax barrier is a device used on the surface of a non-glasses-based 3DTV system, with slits that allow the viewer to see only certain vertical columns of pixels at any one time. The parallax barrier is the more consumer-friendly technology of the two and the only one that allows for regular 2D viewing. The parallax barrier is a fine grating of liquid crystal placed in front of the screen, with slits in it that correspond to certain columns of pixels of the TFT (Thin-Film Transistor) screen (Fig. 2.11).

Figure 2.11 Parallax barrier approach (3D display mode: parallax barrier on; light cannot pass through).

These positions are carved so as to transmit
alternating images to each eye of the viewer, who is again sitting in an optimal "sweet spot." When a slight voltage is applied to the parallax barrier, its slits direct light from each image slightly differently to the left and right eyes, again creating an illusion of depth and thus a 3D image in the brain. The parallax barrier can be switched on and off, allowing the screen to be used for 2D or 3D viewing. However, the need still exists to sit in the precise "sweet spots," limiting the usage of this technology.

Autostereoscopic technology will likely not be part of early 3DTV deployments. For example, Philips reportedly folded an effort to define an autostereoscopic technology that does not require glasses because it had a narrow viewing range and a relatively high loss of resolution and brightness.

REFERENCES

1. Kauff P, Müller M, et al. ICT-215075 3D4YOU, Deliverable D2.1.2: Requirements on Post-production and Formats Conversion. Aug 2008. [This reference is not copyrighted.]
2. Cutting JE. How the eye measures reality and virtual reality. Behav Res Methods Instrum Comput 1997; 29(1):27–36.
3. Lipton L. Foundations of the Stereoscopic Cinema: A Study in Depth. New York (NY): Van Nostrand Reinhold; 1982.
4. IJsselsteijn WA, Seuntiëns PJH, Meesters LMJ. State-of-the-Art in Human Factors and Quality Issues of Stereoscopic Broadcast Television. ATTEST Technical Report D1, IST-2001-34396. Aug 2002.
5. IJsselsteijn WA, de Ridder H, Freeman J, Avons SE, Bouwhuis D. Effects of stereoscopic presentation, image motion, and screen size on subjective and objective corroborative measures of presence. Presence 2001; 10(3).
6. Lipton L. Stereographics Developers' Handbook. 1997.
7. Pastoor S. Handbuch der Telekommunikation, chapter 3D-Displays: Methoden und Stand der Technik. Köln: Deutscher Wirtschaftsdienst; 2002.
8. Woods A, Docherty T, Koch R. Image distortions in stereoscopic video systems.
Proceedings of SPIE Stereoscopic Displays and Applications IV; Feb 1993; San Jose (CA). pp. 36–48.
9. Yamanoue H, Okui M, Okano F. Geometrical analysis of puppet-theater and cardboard effects in stereoscopic HDTV images. IEEE Trans Circuits Syst Video Technol 2006; 16(6):744–752.
10. The 3D@Home Consortium. http://www.3dathome.org/. 2010.
11. Onural L, Ozaktas HM. Three-dimensional television: from science-fiction to reality. In: Ozaktas HM, Onural L, editors. Volume XVIII, Three-Dimensional Television: Capture, Transmission, Display. New York: Springer; 2008. 630 pp. ISBN: 978-3-540-72531-2.
12. Dosch C, Wood D. Can We Create the "Holodeck"? The Challenge of 3D Television. ITU News Magazine, Issue No. 09. Nov 2008.
13. Baker H, Li Z, Papadas C. Calibrating camera and projector arrays for immersive 3D display. Hewlett-Packard Laboratories Papers; 2009; Palo Alto (CA).
14. Tam WJ. Human Visual Perception Relevant to 3D-TV. Ottawa: Communications Research Centre Canada; 2009.
15. Steenhuysen J. For Some, 3D Movies a Pain in the Head. Reuters. Jan 11, 2010.
16. Christodoulou L, Mayron LM, Kalva H, Marques O, Furht B. 3D TV using MPEG-2 and H.264 view coding and autostereoscopic displays. Proceedings of the 14th Annual ACM International Conference on Multimedia; 2006; Santa Barbara (CA). ISBN: 1-59593-447-2.
17. Chinnock C. 3D Coming Home in 2010. 3D@Home White Paper, 3D@Home Consortium. http://www.3Dathome.org.
18. TVB, Television Broadcast. A 3DTV Update from the MPEG Industry Forum. Online Magazine. Jan 20, 2010.
19. Onural L. The 3DTV Toolbox: The Results of the 3DTV NoE. 3DTV NoE Coordinator, Bilkent University, Workshop on 3DTV Broadcasting, Geneva. Apr 30, 2009.
20. Patkar M. How 3DTV Works: Part II, Without Glasses. Online Magazine. Oct 26, 2009. http://Thinkdigit.com.
21. Merritt R. Incomplete 3DTV Products in CES Spotlight; HDMI Upgrade One of Latest Pieces in Stereo 3D Puzzle. EE Times. Dec 23, 2009.
CHAPTER 3

3DTV/3DV Encoding Approaches

This chapter looks at some of the subsystem elements that comprise an overall 3DTV system. Subsystems include elements for the capture, representation/definition, compression, distribution, and display of the signals. Figure 3.1 depicts a logical end-to-end view of a 3DTV signal management system; Fig. 3.2 provides additional details. Figure 3.3 provides a more physical perspective. 3D approaches are an extension of traditional video capture and distribution approaches. We focus here on the representation/definition of the signals and on compression.1 We provide only a brief discussion of the capture and display technology. The reader may refer to [1–4] for more details on capture and display methods and technologies. Distribution is covered in Chapters 4 and 5.

We already mentioned in Chapter 1 that the availability of content will be critical to the successful introduction of the 3DTV service and that 3D content is more demanding in terms of production. Real-time capture of 3D content almost invariably requires a pair of cameras to be placed side by side in what is called a 3D rig to yield a left-eye/right-eye view of a scene. The lenses on the left and right cameras in a 3D rig must match each other precisely. The precision of the alignment of the two cameras is critical; misaligned 3D video is cumbersome to watch and will be stressful to the eyes. Two parameters of interest for 3D camera acquisition are camera separation and toe-in; we already covered these issues in the previous chapter. This operation is similar to how the human eyes work: as one focuses on an object in close proximity the eyes toe in; as one focuses on remote objects the eyes are parallel. Interaxial distance (also known as interaxial separation) is the distance between the camera lenses' axes; this can also be defined as the distance between the two taking positions for a stereo photograph.

1 Some people refer to compression, in this context, as coding.
We use a slightly different nomenclature. Compression is the application of a bit-reduction technique; for example, an MPEG-based Discrete Cosine Transform (DCT) and the MPEG multiplex structure. By coding we mean the encapsulation needed prior to transmission into a network. This may include IP encapsulation, DVB encapsulation, Forward Error Correction (FEC), encryption, and other bit-level management algorithms (e.g., scrambling).

3DTV Content Capture, Encoding and Transmission: Building the Transport Infrastructure for Commercial Services, by Daniel Minoli. Copyright 2010 John Wiley & Sons, Inc.
Figure 3.1 Logical view of an end-to-end 3DTV system (based in part on the 3DTV Project, 3D Media Cluster). The functional stages, from the 3D scene to the scene replica, are:
• Capture: capture the 3D scene to provide input to the 3DTV system
• Scene representation: abstract representation of the captured 3D scene information in digital form
• Compression: data reduction algorithms
• Coding: specify the exchange format of the data
• Transmission: transmit the coded data
• Signal conversion: conversion of 3DTV data to forms suitable for 3DTV displays
• Display: equipment to decode and display the 3DTV signal

The baseline distance between the visual axes (separation) of the eyes is around 2.5 in. (65 mm), although there is a distribution of values, as shown in Fig. 3.4, that may have to be taken into account by content producers. 3D cameras use the same separation as a baseline, but the separation can be smaller or larger to accentuate the 3D effect of the displayed material. The separation will need to be varied for different focal-length lenses and by the distance from the cameras to the subject. A number of measures can (or better yet, must) be taken to reduce eye fatigue
Figure 3.2 End-to-end details: capture (single-camera techniques, multi-camera techniques, holographic capture devices, pattern projection techniques, time-of-flight techniques); representation (pseudo-3D, dense depth, texture mapping, surface-based, point-based, object-based, light field); compression (stereoscopic video coding per ITU-T Rec. H.262/ISO/IEC 13818-2 MPEG-2 Video, Multiview Profile; multi-view video coding (MVC), where H.264/AVC can be used for each view independently or via the MVC extension of H.264/AVC, Amendment 4); coding (DVB-S, DVB-S2, DVB-C, DVB-T, DVB-H, IP/IPTV); transmission (satellite backbone plus local distribution, satellite DTH, terrestrial over the air, cable TV, IP streaming/Internet, IP/IPTV private network, 3G/4G, media such as Blu-ray Disc, Internet downloads); signal conversion (technology-dependent for polarized screens, time-interleaved screens, LCD, LED, plasma, lenticular displays, barrier displays, and/or future displays such as integral imaging screens and holography); and display (stereoscopic displays based on eyewear, autostereoscopic displays (lenticular, barrier), integral imaging displays, volumetric displays).
Figure 3.3 End-to-end 3DTV system (left-eye/right-eye views from a 3D camera are 3D-encoded and compressed into a 3D home master; 3D video format encoding and video compression feed distribution channels such as Blu-ray Disc, cable TV, satellite TV, terrestrial TV channels, IPTV, and the Internet; in the home, media players and set-top boxes perform video decompression and 3D format decoding for the 3DTV).

Figure 3.4 Distribution of interpupillary distance in adults (percentage of viewers versus interpupillary distance, 45–75 mm).
in 3D during content development/creation. Some are considering the creation of 3D material by converting a 2D movie to a stereoscopic product with left-/right-eye tracks; in some instances, non-real-time conversion of 2D to 3D may lead to (marginally) satisfactory results. It remains a fact, however, that it is not straightforward to create a stereo pair from 2D content (issues relate to object depth and reconstruction of parts of the image that are obscured in the first eye). Nonetheless, conversion from 2D may play a role in the short term.

3.1 3D MASTERING METHODS

For the purpose of this discussion, we define a mastering method as the mechanism used for representing a 3D scene in the video stream that will be compressed, stored, and/or transmitted. Mastering standards are typically used in this process.

As alluded to earlier, a 3D mastering standard called "3D Master" is being defined by SMPTE. The high-resolution 3D master file is one that is used to generate other files appropriate for various channels; for example, theater release, media (DVD, Blu-ray Disc) release, and broadcast (e.g., satellite, terrestrial broadcast, cable TV, IPTV, and/or Internet distribution). The 3D Master is comprised of two uncompressed files (left- and right-eye files), each of which has the same file size as a 2D video stream. Formatting and encoding procedures have been developed to be used in conjunction with already-established techniques to deliver 3D programming to the home over a number of distribution channels.

In addition to normal video encoding, 3D mastering/transmission requires additional encoding/compression, particularly when attempting to use legacy delivery channels. Additional encoding schemes for CSV include the following: (i) spatial compression and (ii) temporal multiplexing.

3.1.1 Frame Mastering for Conventional Stereo Video (CSV)

CSV is the most well-developed and the simplest 3D video representation.
This approach deals only with the (color) pixels of the video frames captured by the two cameras. The video signals are intended to be directly displayed using a 3D display system. Figure 3.5 shows an example of a stereo image pair: the same scene is visible from slightly different viewpoints. The 3D display system ensures that a viewer sees only the left view with the left eye and the right view with the right eye to create a 3D depth impression. Compared to the other 3D video formats, the algorithms associated with CSV are the least complex.

A straightforward way to utilize existing video codecs (and infrastructure) for stereo video transmission is to apply one of the interleaving approaches illustrated in Fig. 3.6. A practical challenge is that there is no de facto industry standard available (so that any downstream decoder knows what kind of interleaving was used by the encoder). However, there is an industry movement toward using an over/under approach (also called top/bottom spatial compression).
Figure 3.5 A stereo image pair. (Note: The difference in the left-eye/right-eye views is greatly exaggerated in this and the pictures that follow for pedagogical purposes.)

Figure 3.6 Stereo interleaving formats: (a) time-multiplexed frames; (b) spatially multiplexed side-by-side; and (c) spatially multiplexed over/under.
3.1.1.1 Spatial Compression. When an operator seeks to deliver 3D content over a standard video distribution infrastructure, spatial compression is a common solution. Spatial compression allows the operator to deliver a stereo 3D signal (now called frame-compatible) over a 2D HD video signal, making use of the same amount of channel bandwidth. Clearly, this entails a loss of resolution (for both the left and the right eye). The approach is to pack two images into a single frame of video; the receiving device (e.g., set-top box) will, in turn, display the content in such a manner that a 3D effect is perceived (these images cannot be viewed on a standard 2D TV monitor). There are a number of ways of combining two frames; the two most common are the side-by-side combination and the over/under combination. As can be seen in Fig. 3.6, the two images are reformatted at the compression/mastering point to fit into a standard frame. The combined frame is then compressed by standard methods and delivered to a 3D-compatible TV, where it is reformatted/rendered for 3D viewing.

The question is how to take two frames, a left frame and a right frame, and reformat them to fit side-by-side or over/under in a single standard HD frame. Sampling is involved but, as noted, with some loss of resolution (50% to be exact). One approach is to take alternating columns of pixels from each image and then pack the remaining columns in the side-by-side format. Another approach is to take alternating rows of pixels from each image and then pack the remaining rows in the over/under format (Fig. 3.7).

Studies have shown that the eye is less sensitive to loss of resolution along a diagonal direction in an image than in the horizontal or vertical direction. This allows the development of encoders that optimize subjective quality by sampling each image in a diagonal direction.
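The column/row decimation just described can be sketched in a few lines (plain Python on nested lists of pixels; the function names are illustrative, and a real encoder would low-pass filter before decimating to limit aliasing):

```python
def pack_side_by_side(left, right):
    """Frame-compatible side-by-side packing: keep alternating
    *columns* of each view (half horizontal resolution per eye)
    and place the halves next to each other in one frame."""
    half_left = [row[0::2] for row in left]    # even-indexed columns
    half_right = [row[0::2] for row in right]
    return [l_row + r_row for l_row, r_row in zip(half_left, half_right)]


def pack_over_under(left, right):
    """Frame-compatible over/under (top/bottom) packing: keep
    alternating *rows* of each view (half vertical resolution per
    eye) and stack left over right."""
    return left[0::2] + right[0::2]
```

In both cases the packed frame has the same pixel dimensions as either input frame, which is precisely why it can traverse an unmodified 2D HD distribution chain.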
Other encoding schemes are also being developed to attempt to retain as much of the perceived/real resolution as possible. One approach that has been studied for 3D is quincunx filtering. A quincunx is a geometric pattern comprised of five coplanar points, four of them forming a square (or rectangle) and a fifth point at its center, like a checkerboard. Quincunx filter banks are 2D two-channel nonseparable filter banks that have been shown to be an effective tool for image coding applications. In such applications, it is desirable for the filter banks to have perfect reconstruction, linear phase, high coding gain, good frequency selectivity, and certain vanishing-moment properties [7–12]. Almost all hardware devices for digital image acquisition and output use square pixel grids. For this reason, and for ease of computation, all current image compression algorithms (with the exception of mosaic image compression for single-sensor cameras) operate on square pixel grids. It turns out that the optimal sampling scheme in the two-dimensional image space is claimed to be the hexagonal lattice; unfortunately, a hexagonal lattice is not straightforward in terms of hardware and software implementations. A compromise, therefore, is to use the quincunx lattice; this is a sublattice of the square lattice, as illustrated in Fig. 3.7. The quincunx lattice has a diamond tessellation that is closer to the optimal hexagonal tessellation than the square lattice, and it can be easily generated by down-sampling conventional digital images without any hardware change. Because of this, the quincunx lattice is widely adopted by single-sensor digital cameras to sample the green channel; also, quincunx partition of an image
was recently studied as a means of multiple-description coding. When using quincunx filtering, the higher-quality sampled images are encoded and packaged in a standard video frame (either with the side-by-side or the over/under arrangement). The encoded and reformatted images are compressed and distributed to the home using traditional means (cable, satellite, terrestrial broadcast, and so on).

Figure 3.7 Selection of pixels in (a) side-by-side, (b) over/under, and (c) quincunx approaches. (Note: Either the black or the white dots can comprise the lattice.)

3.1.1.2 Temporal Multiplexing. Temporal (time) multiplexing doubles the frame rate to 120 Hz to allow the sequential, repetitive presentation of the left-eye and right-eye images in the normal 60-Hz time frame. This approach retains full resolution for each eye but requires a doubling of the bandwidth and storage capacity. In some cases, spatial compression is combined with time multiplexing; however, this is more typical of an in-home format and not a transmit/broadcast format. For example, Mitsubishi's 3D DLP TV uses quincunx-sampled (spatially compressed) images that are clocked at 120 Hz as input.
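The quincunx lattice itself is just a checkerboard selection of the square pixel grid, which can be sketched as follows (plain Python, illustrative only; `None` marks dropped positions so the diamond pattern is visible, whereas a real encoder would repack the retained samples into a side-by-side or over/under frame):

```python
def quincunx_sample(frame):
    """Keep pixels on a quincunx (checkerboard) lattice: retain a
    pixel when its row and column indices have the same parity,
    which is the sublattice of the square grid described in the
    text. Dropped positions are returned as None."""
    return [
        [pix if (x + y) % 2 == 0 else None for x, pix in enumerate(row)]
        for y, row in enumerate(frame)
    ]
```

Choosing `(x + y) % 2 == 1` instead selects the complementary lattice, matching the note in the Fig. 3.7 caption that either the black or the white dots can comprise the lattice.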
3.1.2 Compression for Conventional Stereo Video (CSV)

Typically, the compression algorithms act to separately encode and decode the multiple video signals, as shown in Fig. 3.8a. This is also called simulcast. The drawback is that the amount of data is increased compared to 2D video; however, reduction of resolution can be used, as needed, to mitigate this requirement. Table 3.1 summarizes the available methods.

It turns out that the MPEG-2 standard includes an MPEG-2 Multi-View Profile (MVP) that allows efficiency to be increased by combining temporal/inter-view prediction, as illustrated in Fig. 3.8b. H.264/AVC was enhanced a few years ago with a stereo Supplemental Enhancement Information (SEI) message that can also be used to implement prediction as illustrated in Fig. 3.8b. Although not designed for stereo-view video coding, the H.264 coding tools can be arranged to take advantage of the correlations between the pair of views of a stereo-view video, and provide very reliable and efficient compression performance as well as stereo/mono-view scalability.

For more than two views, the approach can be extended to Multi-view Video Coding (MVC), as illustrated in Fig. 3.9; MVC uses inter-view prediction by referring to the pictures obtained from the neighboring views. MVC has been standardized in the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC MPEG. MVC enables efficient encoding of sequences captured simultaneously from multiple cameras using a single video stream. MVC is currently the most efficient way to encode stereo and multi-view video; for two views, the performance achieved by the H.264/AVC stereo SEI message and MVC is similar. MVC is also expected to become a new MPEG video coding standard for the realization of future video applications such as 3D Video (3DV) and Free Viewpoint Video (FVV). The MVC group in the JVT has chosen the
The MVC group in the JVT has chosen the Time Left view I B B P B B P B Encoder Right Encoder view I B B P B B P B (a) Left view I B B P B B P B Right view Encoder P B B B B B B B (b)Figure 3.8 Stereo video coding with combined temporal/inter-view prediction. (a)Traditional MPEG-2/MPEG-4 applied to 3DTV; (b) MPEG-2 multi-view proﬁle andH.264/AVC SEI message.
TABLE 3.1 Compression Methods

Simulcast coding: The separate encoding (and transmission) of the two video scenes in the Conventional Stereo Video (CSV) format. Any coding scheme, such as MPEG-4, can be used. The bitrate will typically be in the range of double that of 2DTV. Video plus depth (V + D) is more bandwidth-efficient: studies show that the depth map can typically be compressed to 10–20% of the color information.

Stereoscopic video coding: ITU-T Rec. H.262/ISO/IEC 13818-2 MPEG-2 Video (Multi-View Profile). Transport of this data is defined in a separate MPEG Systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data."

Multi-view video coding: H.264/AVC can be used for each view independently. ISO/MPEG and ITU/VCEG have recently jointly published the MVC extension of H.264/AVC (Amendment 4).

H.264/AVC stereo Supplemental Enhancement Information (SEI): H.264/AVC was enhanced with an SEI message that can also be used to implement a prediction capability that reduces the overall bandwidth requirement. Some correlations between the pair of views of a stereo-view video can be exploited.

H.264/AVC Scalable Video Coding: Annex G supports a scalable video coding scheme that enables the encoding of a video stream containing one (or several) subset bitstream(s) of a lower spatial or temporal resolution (that is, a lower-quality video signal), each separately or in combination, compared to the bitstream it is derived from (e.g., the subset bitstream is typically derived by dropping packets from the larger bitstream). Each subset bitstream can itself be decoded with a complexity and reconstruction quality comparable to that achieved using existing coders (e.g., H.264/MPEG-4 AVC) with the same quantity of data as in the subset bitstream.
Using the SEI message defined in the H.264 Fidelity Range Extensions (FRExt), a decoder can easily synchronize the views, and a streaming server or a decoder can easily detect the scalability of a coded stereo video bitstream.
TABLE 3.1 (Continued)

ISO/IEC FDIS 23002-3:2007(E): Video plus depth (V + D) has been standardized by MPEG as an extension for 3D, filed under ISO/IEC FDIS 23002-3:2007(E), "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" (also known as MPEG-C Part 3). Transport of this data is defined in a separate MPEG Systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data."

MVC (ISO/IEC 14496-10:2008 Amendment 1 and ITU-T Recommendation H.264): This standard, which supports MV + D (and also V + D) encoded representation inside the MPEG-2 transport stream, has been developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC allows the construction of bitstreams that represent multiple views. MVC supports efficient encoding of video sequences captured simultaneously from multiple cameras using a single video stream. MVC can be used for encoding stereoscopic (two-view) and multi-view 3DTV, and for free-viewpoint TV.

MPEG, new efforts: See Appendix B.3.

ITU-R (ITU-R BT.1198, ITU-R BT.1438): New initiatives under way, to conclude by 2012. Existing (but limited) standards:
• Rec. ITU-R BT.1198 (1995), Stereoscopic television based on R- and L-eye two-channel signals
• Rec. ITU-R BT.1438 (2000), Subjective assessment of stereoscopic television pictures

H.264/AVC-based MVC method as the MVC reference model, since this method showed better coding efficiency than H.264/AVC simulcast coding and the other methods that were submitted in response to the call for proposals made by MPEG [15, 17–20].

Some new approaches are also emerging and have been proposed to improve efficiency, especially for bandwidth-limited environments. A new approach uses binocular suppression theory, which employs disparate image quality in the left- and right-eye views.
Viewer tests have shown that (within reason), if one of the images of a stereo pair is degraded, the perceived overall quality of the stereo video will be dominated by the higher-quality image [16, 21, 22]. This concept is illustrated in Fig. 3.10. Applying this concept, one could code the right-eye image with less than the full resolution of the left eye; for example, down-sampling it to half or quarter resolution (Fig. 3.11). Some call this asymmetrical
Figure 3.9 Multi-view video coding with combined temporal/inter-view prediction ("T" picture sets are predicted from temporal pictures on the time axis; "V" picture sets from inter-view pictures on the view axis; "V/T" picture sets from pictures on both axes).

Figure 3.10 Use of binocular suppression theory for more efficient coding.

quality. Studies have shown that asymmetrical coding with cross-switching at scene cuts (namely, alternating the eye that gets the blurrier image) is a viable method for bandwidth savings. In principle, this should provide comparable overall subjective stereo video quality while reducing the bitrate: if one were to adopt this approach, the 3D video functionality could be added at an overhead of, say, 25%–30% over the 2D video by coding the right view at quarter resolution. See Appendix B.3 for some additional details.
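The mixed-resolution idea can be sketched as follows (plain Python; naive decimation without anti-alias filtering, and illustrative names rather than any standardized API):

```python
def downsample(frame, factor):
    """Naive decimation by `factor` in both dimensions (no filtering),
    standing in for the encoder-side down-sampling step."""
    return [row[0::factor] for row in frame[0::factor]]


def asymmetric_pair(left, right, factor=2):
    """Mixed-resolution (asymmetrical-quality) stereo: the left view
    keeps full resolution while the right view is decimated
    (factor=2 gives half, factor=4 quarter resolution in each
    dimension). Per binocular suppression theory, perceived quality
    tracks the sharper view."""
    return left, downsample(right, factor)
```

A production encoder would additionally alternate which eye receives the reduced-resolution view at scene cuts, as the cross-switching studies cited in the text suggest.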
Figure 3.11 Mixed-resolution stereo video coding (full, half, and quarter resolution).

3.2 MORE ADVANCED METHODS

Other methods have been discussed in the industry, known generally as 2D in conjunction with metadata (2D + M). The basic concept here is to transmit 2D images and to capture the stereoscopic data from the "other eye" image in the form of an additional package, the metadata; the metadata is transmitted as part of the video stream (Fig. 3.12). This approach is consistent with MPEG multiplexing; therefore, to a degree, it is compatible with embedded systems. The requirement to transmit the metadata increases the bandwidth needed in the channel: the added bandwidth ranges from 60% to 80%, depending on quality goals and the techniques used. As implied, a set-top box employed in a traditional 2D environment would be able to use the 2D content, ignoring the metadata, and properly display the 2D image; in a 3D environment, the set-top box would be able to render the 3D signal.

Some variations of this scheme have already appeared. One approach is to capture a delta file that represents the difference between the left and right images.
[Figure 3.12 shows the 2D + M chain: left- and right-eye capture, 3D encoding and compression, distribution over Blu-ray Disc, cable TV, satellite TV, terrestrial TV, IPTV, or the Internet, and decoding in media players and set-top boxes, where the metadata accompanies the left-eye (2D) stream.]

Figure 3.12 2D in conjunction with metadata.

A delta file is usually smaller than the raw file because of intrinsic redundancies. The delta file is then transmitted as metadata. Companies such as Panasonic and TDVision use this approach, which can also be applied to stored media. For example, Panasonic has advanced (and the Blu-ray Disc Association is studying) the use of metadata to achieve a full-resolution 3D Blu-ray Disc standard; a resolution of 1920 × 1080p at 24 fps per eye is achievable. This standard would make Blu-ray Disc a high-quality 3D content (storage) system. The goal was to agree on the standard by early 2010 and have 3D Blu-ray Disc players emerge by the end-of-year shopping season of 2010. Another approach entails transmitting the 2D image in conjunction with a depth map of each scene.

3.2.1 Video Plus Depth (V + D)

As noted above, many 3DTV proposals rely on the basic concept of "stereoscopic" video, that is, the capture, transmission, and display of two separate video streams (one for the left eye and one for the right eye). More recently, specific proposals have been made for a flexible joint transmission of monoscopic color video and associated per-pixel depth information [24, 25]. The V + D representation is the next notch up in complexity.

From this data representation, one or more "virtual" views of the 3D scene can then be generated in real time at the receiver side by means of Depth-Image-Based Rendering (DIBR) techniques. A system such as this provides
important features, including backward compatibility with today's 2D digital TV, scalability in terms of receiver complexity, and easy adaptability to a wide range of different 2D and 3D displays. DIBR is the process of synthesizing "virtual" views of a scene from still or moving color images and associated per-pixel depth information. Conceptually, this novel view generation can be understood as the following two-step process: first, the original image points are re-projected into the 3D world, utilizing the respective depth data; thereafter, these 3D space points are projected into the image plane of a "virtual" camera that is located at the required viewing position. The concatenation of re-projection (2D to 3D) and subsequent projection (3D to 2D) is usually called 3D image warping in the Computer Graphics (CG) literature and will be derived mathematically in the following paragraph. The signal processing and data transmission chain of this kind of 3DTV concept is illustrated in Fig. 3.13; it consists of four different functional building blocks: (i) 3D content creation, (ii) 3D video coding, (iii) transmission, and (iv) "virtual" view generation and 3D display.

As can be seen in Fig. 3.14, a video signal and a per-pixel depth map are captured and eventually transmitted to the viewer. The per-pixel depth data can be considered a monochromatic luminance signal with a restricted range spanning the interval [Znear, Zfar], representing, respectively, the minimum and maximum distance of the corresponding 3D point from the camera. The depth range is quantized with 8 bits, with the closest point having the value 255 and the most distant point having the value 0. Effectively, the depth map is specified as a grayscale image; these values can be supplied to the luminance channel of a video signal, and the chrominance can be set to a constant value.
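A sketch of this 8-bit depth quantization (assuming a simple linear mapping between Znear and Zfar; some systems quantize inverse depth instead, and the constants here are arbitrary example values):

```python
import numpy as np

Z_NEAR, Z_FAR = 1.0, 10.0  # example scene depth range, in metres

def quantize_depth(z: np.ndarray) -> np.ndarray:
    """Map metric depth in [Z_NEAR, Z_FAR] to 8-bit grayscale:
    the closest point becomes 255, the most distant 0.
    (Linear mapping assumed; inverse-depth mappings are also used.)"""
    v = 255.0 * (Z_FAR - z) / (Z_FAR - Z_NEAR)
    return np.clip(np.round(v), 0, 255).astype(np.uint8)

def dequantize_depth(v: np.ndarray) -> np.ndarray:
    """Receiver-side inverse: recover approximate metric depth."""
    return Z_FAR - (v.astype(np.float64) / 255.0) * (Z_FAR - Z_NEAR)

z = np.array([1.0, 5.5, 10.0])   # near, middle, far
v = quantize_depth(z)            # grayscale depth-map samples
```

The resulting array `v` is exactly the kind of grayscale image that can be carried in the luminance channel as described above.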
In summary, this representation uses a regular video stream enriched with so-called depth maps providing a Z-value for each pixel. Note that V + D enjoys backward compatibility because a 2D receiver will display only the V portion of the V + D signal.

[Figure 3.13 shows the DIBR signal chain: 3D content generation (3D recording, 2D-to-3D content conversion, or 3D out of MPEG-4 coded content) produces a standard MPEG-2-coded 2D color video plus coded depth information (metadata); these are carried over a DVB broadcast network to standard 2D decoders and to single- and multiple-user 3DTV decoders, followed by "virtual" view generation and 3D display.]

Figure 3.13 Depth-image-based rendering (DIBR) system.

Studies by
the European ATTEST (Advanced Three Dimensional Television System Technologies) project indicate that depth data can be compressed very efficiently and still be of good quality; namely, it needs only around 20% of the bitrate that would otherwise be needed to encode the color video (these qualitative results were confirmed by means of subjective testing). This approach can be placed in the category of Depth-Enhanced Stereo (DES).

[Figure 3.14 depicts the video plus depth (V + D) representation: a color view paired with its grayscale depth map, quantized so that znear maps to 255 and zfar to 0.]

Figure 3.14 Video plus depth (V + D) representation for 3D video.

Figure 3.15 Regeneration of stereo video from V + D signals.

A stereo pair can be rendered from the V + D information by 3D warping at the decoder. A general warping algorithm takes a layer and deforms it in many ways: for example, it twists the layer along an axis, bends the layer around itself, or adds arbitrary dimension with a displacement map. The generation of the stereo pair from a V + D signal at the decoder is illustrated in Fig. 3.15. This reconstruction
affords extended functionality compared to CSV because the stereo image can be adjusted and customized after transmission. Note that, in principle, more than two views can be generated at the decoder, thus enabling support of multi-view displays (and, within reason, head-motion parallax viewing).

V + D enjoys backward compatibility, compression efficiency, extended functionality, and the ability to use existing coding algorithms. It is only necessary to specify high-level syntax that allows a decoder to interpret two incoming video streams correctly as color and depth. The specifications "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" and "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data" enable V + D-based 3D video to be deployed in a standardized fashion by broadcasters interested in adopting this method.

It should be noted, however, that the advantages of V + D over CSV entail increased complexity for both sender and receiver. At the receiver side, view synthesis has to be performed after decoding to generate the second view of the stereo pair. At the sender (capture) side, the depth data have to be generated before encoding can take place. This is usually done by depth/disparity estimation from a captured stereo pair; these algorithms are complex and still error-prone.
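The receiver-side view synthesis can be caricatured as a horizontal pixel shift driven by the depth map (a deliberately crude sketch of our own: real DIBR derives disparity from camera geometry, respects depth ordering, and fills disocclusion holes):

```python
import numpy as np

def warp_view(color: np.ndarray, depth: np.ndarray, gain: float) -> np.ndarray:
    """Toy 1D DIBR: shift each pixel horizontally by a disparity
    proportional to its 8-bit depth value (near pixels move more).
    Zeros in the output mark unfilled disocclusion holes."""
    h, w = color.shape
    out = np.zeros_like(color)
    disparity = np.round(gain * depth / 255.0).astype(int)
    for y in range(h):
        for x in range(w):
            xx = x + disparity[y, x]
            if 0 <= xx < w:
                out[y, xx] = color[y, x]
    return out

color = np.array([[10, 20, 30, 40]], dtype=np.uint8)
depth = np.array([[0, 0, 0, 255]], dtype=np.uint8)  # rightmost pixel is near
synth = warp_view(color, depth, gain=1.0)           # [[10, 20, 30, 0]]
```

The zero left behind at the right edge of `synth` is exactly the kind of hole that hole-filling, or the occlusion layers of LDV/DES discussed below, must repair.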
Thus, in the near future, V + D might be more suitable for applications with playback functionality, where depth estimation can be performed offline on powerful machines, for example in a production studio or home 3D editing suite, enabling viewing of downloaded 3D video clips and 3DTV broadcasting.

3.2.2 Multi-View Video Plus Depth (MV + D)

There are some advanced 3D video applications that are not properly supported by any existing standards and for which work by the ITU-R or ISO/MPEG is needed. Two such applications are given below:

• wide-range multi-view autostereoscopic displays (say, nine or more views);
• FVV (an environment where the user can choose his/her own viewpoint).

These 3D video applications require a 3D video format that allows rendering a continuum and/or a large number of output views at the decoder. There really are no available alternatives: MVC, discussed above, does not support a continuum and becomes inefficient for a large number of views; and, as noted, V + D can in principle generate more than two views at the decoder, but in practice it supports only a limited continuum around the original view (artifacts increase significantly with the distance of the virtual viewpoint). In response, MPEG started an activity to develop a new 3D video standard that would support these requirements (Chapter 6).

The MV + D concept is illustrated in Fig. 3.16. MV + D involves a number of complex processing steps where (i) depth has to be estimated for the N views at the capture point, and then (ii) N color and N depth video streams have to
be encoded and transmitted. At the receiver, the data have to be decoded and the virtual views have to be rendered (reconstructed).

[Figure 3.16 sketches the MV + D chain: camera views feed depth estimation and a 3D scene representation, followed by 3D coding, transmission, 3D decoding, and scene rendering at the decoder.]

Figure 3.16 Multi-view video plus depth (MV + D) concept.

As implied just above, MV + D can be used to support multi-view autostereoscopic displays in a relatively efficient manner. Consider a display that supports nine views (V1–V9) simultaneously (e.g., a lenticular display manufactured by Philips; Fig. 3.17).

[Figure 3.17 shows decoded MV + D data (views V1, V5, V9 with depth maps D1, D5, D9) feeding DIBR modules that synthesize the full set V1–V9 for a lenticular multi-view 3D display; at each viewer position (Pos1–Pos3) only one left/right pair is visible.]

Figure 3.17 Multi-view autostereoscopic displays based on MV + D.

From a specific position, a viewer can see
MORE ADVANCED METHODS 65only a stereo pair of views, depending on the viewer’s position. Transmitting ninedisplay views directly (e.g., by using MVC) would be taxing from a bandwidthperspective; in this illustrative example only three original views (views V1,V5, and V9) along with corresponding depth maps D1, D5, and D9 are in thedecoded stream—the remaining views can be synthesized from these decodeddata by using DIBR techniques.3.2.3 Layered Depth Video (LDV)LVD is a derivative and also an alternative to MV + D. LDV is believed tobe more efﬁcient than MV + D because less information has to be transmitted;however, additional error-prone vision processing tasks are required that operateon partially unreliable depth data. These efﬁciency assessments remain to be fullyvalidated as of press time. LVD uses (i) one-color video with associated depth map and (ii) a backgroundlayer with associated depth map; the background layer includes image contentthat is covered by foreground objects in the main layer. This is illustrated inFigs 3.18 and 3.19. The occlusion information is constructed by warping two or Depth Occlusion Occlusion Foreground layer Capture estimation generation layer 1 2 3 4 Upstream/capture Transmission Downstream/rendering Center view Generation new view Viewing from left Viewing from right side of image side of image Figure 3.18 Layered depth video (LDV) concept.
more neighboring V + D views from the MV + D representation onto a defined center view. The LDV stream or substreams can then be encoded by a suitable LDV coding profile.

[Figure 3.19 gives an LDV example: the captured foreground layer, its estimated depth map (znear = 255, zfar = 0), and the generated occlusion layer.]

Figure 3.19 Layered depth video (LDV) example.

Note that LDV can be generated from MV + D by warping the main layer image onto other contributing input images (e.g., an additional left and right view). By subtraction, it is then determined which parts of the other contributing input images are covered in the main layer image; these are then assigned as residual images and transmitted, while the rest is omitted.

Figure 3.18 is based on a recent presentation at the 3D Media Workshop, Heinrich Hertz Institut (HHI), Berlin, October 15–16, 2009 [27, 28]. LDV provides a single view with depth and occlusion information. The goal is to achieve automatic acquisition of 3DTV content, especially to obtain depth and occlusion information from video and to extrapolate a new view without error.

Table 3.2, composed from technical details in Ref., provides a summary of the issues associated with the various representation methods.
TABLE 3.2 Summary of Formats

Short-term

Stereoscopic 3D formats
• Suboptions
  ◦ Simulcast (2 views transmitted, double the bandwidth)
  ◦ Spatially interleaved, side-by-side
  ◦ Spatially interleaved, above/under
  ◦ Time interleaved (2 views transmitted, double the bandwidth)
• Standard format for 3D cinema (a plus)
• Standard format for glasses-based consumer displays (a plus)
• No support for non-glasses-based multi-view displays (a minus)
• Allows adjustment of zero parallax (a plus)
• No scaling of depth (a minus)
  ◦ No adjustment to display size
  ◦ No personal preferences, kids mode
• No occlusion information
  ◦ No motion parallax

Longer-term

Video plus depth: one video stream with associated depth map
• Successfully demonstrated by ATTEST project (2002–2004), MPEG-C Part 3
• Not the standard format for 3D cinema (a minus)
• Depth-image-based rendering
  ◦ Support for stereoscopic glasses-based consumer displays
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
    – Adjustment to display size
    – Personal preferences, kids mode
  ◦ Views must be extrapolated (a minus)
• Allows adjustment of zero parallax (a plus)
• No occlusion information (a minus)
  ◦ Reduced quality of depth-image-based rendering

(continued overleaf)
TABLE 3.2 (Continued)

Layered depth video (LDV): video plus depth enhanced with an additional occlusion layer with depth information (video with per-pixel depth map and occlusion layer with depth map)
• Not the standard format for 3D cinema (a minus)
• Depth-image-based rendering
  ◦ Support for stereoscopic glasses-based consumer displays
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
  ◦ Views must be extrapolated (a minus)
• Allows adjustment of zero parallax (a plus)
• Provides occlusion information (a plus)
  ◦ Better quality of depth-image-based rendering

Depth-enhanced stereo (DES): 2 video streams with depth map and an additional occlusion layer with depth information (2 videos with per-pixel depth map and occlusion layer with depth map)
• Not the standard format for 3D cinema (a minus)
• Easily usable for stereoscopic glasses-based consumer displays (a plus)
• Depth-image-based rendering
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
  ◦ Views are interpolated or extrapolated
• Allows adjustment of zero parallax (a plus)
• Provides excellent occlusion information (a big plus)

Multiple video plus depth (MVD): 2 or more video streams with depth (interpolation of intermediate virtual views from multiple video plus depth)
• Not the standard format for 3D cinema (a minus)
• Easily usable for stereoscopic glasses-based consumer displays
• Depth-image-based rendering
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
  ◦ Views are interpolated (a plus)
• Allows adjustment of zero parallax (a plus)
• Provides good occlusion handling due to redundant information (a plus)
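The spatially interleaved side-by-side suboption in Table 3.2 can be sketched as follows (a minimal numpy illustration; broadcast systems use proper anti-alias and interpolation filters rather than the column dropping and pixel repetition shown here):

```python
import numpy as np

def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Frame-compatible side-by-side packing: each view loses half
    its horizontal resolution (here by keeping even columns), so
    the pair fits an ordinary 2D frame of the original size."""
    return np.hstack((left[:, 0::2], right[:, 0::2]))

def unpack_side_by_side(frame: np.ndarray):
    """Receiver splits the frame and stretches each half back to
    full width (nearest-neighbour here)."""
    half = frame.shape[1] // 2
    l, r = frame[:, :half], frame[:, half:]
    return np.repeat(l, 2, axis=1), np.repeat(r, 2, axis=1)

left = np.tile(np.arange(8, dtype=np.uint8), (2, 1))   # 2x8 test views
right = left + 100
packed = pack_side_by_side(left, right)                # still 2x8
l2, r2 = unpack_side_by_side(packed)
```

This is why the format works over an unchanged TV channel (the packed frame is the same size as a 2D frame) and also why resolution is compromised.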
3.3 SHORT-TERM APPROACH FOR SIGNAL REPRESENTATION AND COMPRESSION

In summary, stereoscopic 3D will be used in the short term. Broadcasters appear to be rallying around top/bottom spatial compression; however, trials are still ongoing. Other approaches involve some form of compression, including checkerboard (quincunx filters), side-by-side, or interleaved rows or columns. Spatial compression can operate on the same channel capacity as an existing TV channel, but with a compromise in resolution. Stereoscopic 3D is the de facto standard for 3D cinema; note that this approach is directly usable for glasses-based displays, but it does not allow for scaling of depth. It is also not usable for non-glasses-based displays. (Preferably, a 3D representation format must be generic for all display types—stereoscopic displays and multi-view displays; the long-term approaches listed above will support that goal.)

For compression, one of the following four may find use in the short term: (i) ITU-T Rec. H.262/ISO/IEC 13818-2 MPEG-2 Video (MVP); (ii) H.264/AVC with SEI; (iii) H.264/AVC used for each view independently; or (iv) the MVC extension of H.264/AVC (Amendment 4).

3.4 DISPLAYS

We include this topic here just to provide the end-to-end view implied in Fig. 3.1, though we have indirectly covered it earlier along the way. 3D displays include the following:

• glasses-based displays, anaglyph;
• glasses-based displays with active shutter glasses;
• glasses-based displays with passive glasses;
• non-glasses-based displays, lenticular;
• non-glasses-based displays, barrier;
• non-glasses-based displays, two views, tracked;
• non-glasses-based displays, nine views, video + depth input (internal conversion to multi-view);
• non-glasses-based displays, 2x video + depth.

Tables 1.2 and 1.3 provided a synopsis of the technology, also with a perspective on what was commercially available at press time.

REFERENCES

1. Minoli D.
3D Television (3DTV) Technology, Systems, and Deployment—Rolling Out the Infrastructure for Next-Generation Entertainment. Taylor & Francis; 2010.
2. Ozaktas HM, Onural L, editors. Three-dimensional television: capture, transmission, display. New York: Springer; 2008. XVIII, 630 p., 316 illus. ISBN: 978-3-540-72531-2.
3. Bahram J, Fumio O, editors. Three-dimensional television, video and display technology. New York: Springer; 2002.
4. Schreer O, Kauff P, Sikora T, editors. 3D videocommunication: algorithms, concepts and real-time systems in human-centered communication. New York: John Wiley & Sons; 2005. ISBN-13: 9780470022719.
5. Johnston C. Will new year of 3D drive lens technology? TV Technology Online Magazine. Dec 15, 2009.
6. Chinnock C. 3D coming home in 2010. 3D@Home White Paper, 3D@Home Consortium. www.3Dathome.org. 2010.
7. Tay DBH, Kingsbury NG. Flexible design of multidimensional perfect reconstruction FIR 2-band filters using transformations of variables. IEEE Trans Image Process 1993; 2(4): 466–480.
8. Sweldens W. The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl Comput Harmonic Anal 1996; 3: 186–200.
9. Gouze A, Antonini M, Barlaud M. Quincunx lifting scheme for lossy image compression. Proceedings of IEEE International Conference on Image Processing, vol. 1; Sep 2000; Vancouver, BC, Canada. pp. 665–668.
10. Kovacevic J, Sweldens W. Wavelet families of increasing order in arbitrary dimensions. IEEE Trans Image Process 2000; 9(3): 480–496.
11. Chen Y, Adams MD, Lu WS. Design of optimal quincunx filter banks for image coding. EURASIP J Adv Signal Process 2007; 2007: Article ID 83858.
12. Liu Y, Nguyen TT, Oraintara S. Embedded image coding using quincunx directional filter bank. ISCAS 2006, IEEE. p. 4943.
13. Zhang X, Wu X, Wu F. Image coding on quincunx lattice with adaptive lifting and interpolation. Data Compression Conference (DCC '07). Piscataway, NJ: IEEE Computer Society; 2007.
14. Sun S, Lei S. Stereo-view video coding using H.264 tools. Proc SPIE Int Soc Opt Eng 2005; 5685: 177–184.
15. Hur J-H, Cho S, Lee Y-L. Illumination change compensation method for H.264/AVC-based multi-view video coding. IEEE Trans Circuits Syst Video Technol 2007; 17(11).
16. 3DPHONE. Project no.
FP7-213349, Project title: ALL 3D IMAGING PHONE, 7th Framework Programme, Specific Programme "Cooperation", FP7-ICT-2007.1.5—Networked Media, D5.1: Requirements and specifications for 3D video. Aug 19, 2008.
17. Smolic A, McCutchen D. 3DAV exploration of video-based rendering technology in MPEG. IEEE Trans Circuits Syst Video Technol 2004; 14(3): 348–356.
18. Sullivan G, Wiegand T, Luthra A. Draft of Version 4 of H.264/AVC (ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 Part 10) Advanced Video Coding). ISO/IEC JTC1/SC29/WG11 and ITU-T Q6/SG16, Doc. JVT-N050d1. 2005.
19. Mueller K, Merkle P, Smolic A, et al. Multi-view coding using AVC. ISO/IEC JTC1/SC29/WG11, Bangkok, Thailand, Doc. M12945. 2006.
20. ISO. Subjective test results for the CfP on multi-view video coding. ISO/IEC JTC1/SC29/WG11, Bangkok, Thailand, Doc. N7779. 2006.
21. Stelmach L, Tam WJ. Stereoscopic image coding: effect of disparate image-quality in left- and right-eye views. Signal Process Image Commun 1998; 14: 111–117.
22. Stelmach L, Tam WJ, Meegan D, et al. Stereo image quality: effects of mixed spatio-temporal resolution. IEEE Trans Circuits Syst Video Technol 2000; 10(2): 188–193.
23. Tam WJ. Human visual perception relevant to 3D-TV. Ottawa: Communications Research Centre Canada; Apr 30, 2009.
24. Fehn C, Kauff P, et al. An evolutionary and optimized approach on 3DTV. Proceedings of International Broadcast Conference '02; 2002; Amsterdam, The Netherlands. pp. 357–365.
25. Fehn C. A 3DTV approach using depth-image-based rendering (DIBR). Proceedings of Visualization, Imaging, and Image Processing '03; 2003; Benalmadena, Spain. pp. 482–487.
26. Fehn C. Depth-image-based rendering (DIBR), compression, and transmission for a flexible approach on 3DTV [PhD thesis]. Germany: Technical University Berlin; 2006.
27. Koch R. Future 3DTV acquisition. 3D Media Workshop, HHI Berlin; Oct 15–16, 2009.
28. Frick A, Kellner F, et al. Generation of 3DTV LDV content with time-of-flight cameras. Proceedings of 3DTV-CON 2009; May 4–6, 2009; Potsdam.
29. Tanger R. 3D4YOU, Seventh Framework Theme ICT-2007.1.5 Networked Media. Position paper submitted to ITU-R, May 6, 2009. Fraunhofer HHI, Berlin, Germany.
30. Merritt R. Incomplete 3DTV products in CES spotlight; HDMI upgrade one of latest pieces in stereo 3D puzzle. EE Times. Dec 23, 2009.
31. Starks M. Spacespex anaglyph—the only way to bring 3DTV to the masses. Online article. 2009.
32. Choi Y-W, Thao NT. Implicit coding for very low bit rate image compression. 1998 IEEE International Conference on Image Processing (ICIP 98), Proceedings, vol. 2; Oct 4–7, 1998; Chicago, IL, USA. pp. 560–564.
33. International Organization for Standardization. ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, "Vision on 3D Video, Video and Requirements," ISO/IEC JTC1/SC29/WG11 N10357. Lausanne, Switzerland. Feb 2009.
34. Chen Y, Wang Y-K, Ugur K, Hannuksela MM, Lainema J, Gabbouj M. The emerging MVC standard for 3D video services.
EURASIP Journal on Advances in Signal Processing 2009; 2009: Article ID 786015. DOI 10.1155/2009/786015.
35. Smolic A, editor. Introduction to multi-view video coding. International Organization for Standardization, ISO/IEC JTC 1/SC 29/WG 11, Coding of Moving Pictures and Audio; Antalya, Turkey; Jan 2008.
36. Yang W, Wu F, Lu Y, et al. Scalable multi-view video coding using wavelet. IEEE Int Symp Circuits Syst 2005; 6(23–26): 6078–6081. DOI 10.1109/ISCAS.2005.1466026.
37. Min D, Kim D, Yun SU, et al. 2D/3D freeview video generation for 3DTV system. Signal Process Image Commun 2009; 24(1–2): 31–48. DOI 10.1016/j.image.2008.10.009.
38. Ozbek N, Tekalp A. Scalable multi-view video coding for interactive 3DTV. 2006 IEEE International Conference on Multimedia and Expo, Proceedings; July 9–12, 2006; Toronto, Canada. pp. 213–216. ISBN: 1-4244-0366-7.
39. Tech G, Smolic A, Brust H, et al. Optimization and comparison of coding algorithms for mobile 3DTV. White Paper, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Image Processing Department, Einsteinufer, Berlin; 2009.
40. Hewage CTER, Worrall S. Robust 3D video communications. IEEE Comsoc MMTC E-Letter 2009; 94(3).
APPENDIX A3: COLOR ENCODING

Color encoding (anaglyph) has been the de facto method used for 3D over the years. In fact, there are many hundreds of patents on anaglyphs, in a dozen languages, going back 150 years. The left-eye and right-eye images are color encoded to derive a single merged (overlapped) frame; at the receiving end, the two images are restored (separated) using colored glasses. This approach makes use of a number of encoding processing techniques to optimize the signal in order to secure better color contrast, image depth, and overall performance (Fig. A3.1). Red/blue, red/cyan, green/magenta, or blue/yellow color coding can be used, the first two being the most common. Orange/blue anaglyph techniques are claimed by some to provide good quality, but there is a continuum of combinations. Advantages of this approach include the fact that it is frame-compatible with existing systems, can be delivered over any 2D system, provides full resolution, and uses inexpensive "glasses." However, it produces the lowest-quality 3D image compared with the other systems discussed above.

[Figure A3.1 shows the anaglyph chain: left- and right-eye capture, 3D encoding and compression into a single frame, distribution over Blu-ray Disc, cable TV, satellite TV, terrestrial TV, IPTV, or the Internet, and decoding/display in media players and set-top boxes.]

Figure A3.1 Anaglyph method.
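A plain red/cyan merge, the simplest form of the anaglyph encoding described above, can be sketched as follows (numpy assumed; commercial encoders apply the additional color-processing techniques mentioned in the text on top of this):

```python
import numpy as np

def make_anaglyph(left_rgb: np.ndarray, right_rgb: np.ndarray) -> np.ndarray:
    """Plain red/cyan anaglyph: take the red channel from the left
    view and the green and blue channels from the right view,
    merging the stereo pair into one frame viewable on any 2D
    system; red/cyan glasses separate the views again."""
    out = right_rgb.copy()          # green + blue from the right eye
    out[..., 0] = left_rgb[..., 0]  # red from the left eye
    return out

left = np.zeros((1, 2, 3), dtype=np.uint8)
left[..., 0] = 200                  # a reddish left-eye test frame
right = np.zeros((1, 2, 3), dtype=np.uint8)
right[..., 1] = 150                 # a greenish right-eye test frame
merged = make_anaglyph(left, right)
```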
APPENDIX B3: ADDITIONAL DETAILS ON VIDEO ENCODING STANDARDS

This appendix provides some additional details on video encoding, especially in terms of future systems.

Efficient video encoding is required for 3DTV/3DV and for FVT/FVV. 3DTV/3DV supports a 3D depth impression of the observed scenery, while FVT/FVV additionally allows for an interactive selection of viewpoint and direction within a certain operating range. Hence, a common feature of 3DV and FVV systems is the use of multiple views of the same scene that are transmitted to the user. Multi-view 3D video can be encoded implicitly in the V + D representation or, as is more often the case, explicitly.

In implicit coding, one seeks to use (implicit) shape coding in combination with MPEG-2/MPEG-4. Implicit shape coding means that the shape can be easily extracted at the decoder, without explicit shape information present in the bitstream. These types of image compression schemes do not rely on the usual additive decomposition of an input image into a set of predefined spanning functions. Instead, they encode only implicit properties of the image and reconstruct an estimate of the scene at the decoding end. This has particular advantages when one seeks very low bitrate, perceptually oriented image compression. The literature on this topic is relatively scanty. Chroma key might be useful in this context: chroma key, or green screen, allows one to place a subject anywhere in a scene or environment using the chroma key as the background. One can then import the image into digital editing software, extract the chroma key, and replace it with another image or video. Chroma key shape coding for implicit shape coding (for medium-quality shape extraction) has been proposed and also demonstrated in the recent past.
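A toy green-screen key of the kind alluded to above (the dominance-margin rule is our own illustrative choice, not a standardized one):

```python
import numpy as np

def chroma_key_mask(rgb: np.ndarray, margin: int = 40) -> np.ndarray:
    """Implicit shape extraction via a green-screen key: a pixel is
    background when its green channel dominates both red and blue by
    a margin. No explicit shape bits are transmitted; the decoder
    recovers the object silhouette from the picture itself.
    (Production keyers are far more elaborate than this rule.)"""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (g - np.maximum(r, b)) > margin   # True = keyed background

# One background (pure green) pixel and one foreground pixel.
frame = np.array([[[0, 255, 0], [120, 100, 90]]], dtype=np.uint8)
mask = chroma_key_mask(frame)
```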
On the other hand, there are a number of strategies for explicit coding of multi-view video: (i) simulcast coding, (ii) scalable simulcast coding, (iii) multi-view coding, and (iv) Scalable Multi-View Coding (SMVC).

Simulcast coding is the separate encoding (and transmission) of the two video scenes in the CSV format; clearly, the bitrate will typically be in the range of double that of 2DTV. V + D is more bandwidth efficient not only in the abstract, but also in practice. At the practical level, in a V + D environment the quality of the compressed depth map is not a significant factor in the final quality of the rendered stereoscopic 3D video. This follows from the fact that the depth map is not directly viewed but is employed to warp the 2D color image to two stereoscopic views. Studies show that the depth map can typically be compressed to 10%–20% of the color information.

V + D (also called 2D plus depth, or 2D + depth, or color plus depth) has been standardized in MPEG as an extension for 3D, filed under ISO/IEC FDIS 23002-3:2007(E). In 2007, MPEG specified a container format, "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" (also known as MPEG-C Part 3), that can be utilized for V + D data. 2D + depth, as specified by ISO/IEC 23002-3, supports the inclusion of depth for generation
of an increased number of views. While it has the advantage of being backward compatible with legacy devices and agnostic of coding formats, it is capable of rendering only a limited depth range, since it does not directly handle occlusions. Transport of this data is defined in a separate MPEG systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data."

There is also major interest in MV + D. Applicable coding schemes of interest here include the following:

• Multi-view Video Coding (MVC);
• Scalable Video Coding (SVC);
• Scalable Multi-View Video Coding (SMVC).

From a test/test-bed implementation perspective, for the first two options each view can be independently coded using the public-domain H.264 and SVC codecs, respectively. Test implementations of MVC, as well as preliminary implementations of an SMVC codec, have been documented recently in the literature.

B3.1 Multi-View Video Coding (MVC)

It has been recognized that MVC is a key technology for a wide variety of future applications, including FVV/FTV, 3DTV, immersive teleconferencing and surveillance, and other applications. An MPEG standard, "Multi-view Video Coding (MVC)," supporting the MV + D (and also V + D) encoded representation inside the MPEG-2 transport stream, has been developed by the JVT of ISO/IEC MPEG and ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC allows the construction of bitstreams that represent multiple views; it supports efficient encoding of video sequences captured simultaneously from multiple cameras using a single video stream. MVC can be used for encoding stereoscopic (two-view) and multi-view 3DTV, and for FVV/FVT.

MVC (ISO/IEC 14496-10:2008 Amendment 1 and ITU-T Recommendation H.264) is an extension of the AVC standard that provides efficient coding of multi-view video. The encoder receives N temporally synchronized video streams and generates one bitstream.
The decoder receives the bitstream, decodes it, and outputs the N video signals. Multi-view video contains a large amount of inter-view statistical dependency, since all cameras capture the same scene from different viewpoints. Therefore, combined temporal and inter-view prediction is the key to efficient MVC; pictures from neighboring cameras can also be used for efficient prediction. MVC supports the direct coding of multiple views and exploits inter-camera redundancy to reduce the bitrate. Although MVC is more efficient than simulcast, the rate of MVC-encoded video is proportional to the number of views.

The MVC group in the JVT has chosen the H.264/MPEG-4 AVC-based multi-view video method as its MVC reference model, since this method supports better coding efficiency than H.264/AVC simulcast coding. H.264/MPEG-4 AVC was developed jointly by ITU-T and ISO through the JVT
in the early 2000s (the ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard, ISO/IEC 14496-10—MPEG-4 Part 10, are jointly maintained to retain identical technical content). H.264 is used with Blu-ray Disc and with videos from the iTunes Store. The standardization of H.264/AVC was completed in 2003, but additional extensions have taken place since then; for example, SVC, as specified in Annex G of H.264/AVC, was added in 2007.

Owing to the increased data volume of multi-view video, highly efficient compression is needed. In addition to the redundancy exploited in 2D video compression, the common idea in MVC is to further exploit the redundancy between adjacent views. This is because multi-view video is captured by multiple cameras at different positions, and significant correlations exist between neighboring views. As hinted elsewhere, there is interest in being able to synthesize novel views from virtual cameras in multi-view camera configurations; however, the occlusion problem can significantly affect the quality of virtual view rendering.

Also, for FVV, the depth map quality is important because it is used to render virtual views that are further apart than in the stereoscopic case: when the views are further apart, distortion in the depth map has a greater effect on the final rendered quality—this implies that the data rate of the depth map has to be higher than in the CSV case.

Note: Most existing MVC techniques are based on traditional hybrid DCT-based video coding schemes. These neither fully exploit the redundancy among different views nor provide an easy path to scalability. In addition, all the existing MVC schemes mentioned above use DCT-based coding. A fundamental problem for DCT-based block coding is that it is not convenient for achieving scalability, which has become an increasingly important feature for video coding and communications.
As a research topic, wavelet-based image and video coding has been shown to be a good way to achieve both good coding performance and full scalability, including spatial, temporal, and Signal-to-Noise Ratio (SNR) scalability. In the past, MVC has been included in several video coding standards such as MPEG-2 MVP and MPEG-4 MAC (Multiple Auxiliary Component). More recently, an H.264-based MVC scheme has been developed that utilizes the multiple-reference structure in H.264. Although this method does exploit the correlations between adjacent views through inter-view prediction, it has some constraints for practical applications compared to a method that uses, say, wavelets.

As just noted, MPEG has developed a suite of international standards to support 3D services and devices. In 2009, MPEG initiated a new phase of standardization to be completed by 2011. MPEG's vision is a new 3DV format that goes beyond the capabilities of existing standards to enable both advanced stereoscopic display processing and improved support for autostereoscopic N-view displays, while enabling interoperable 3D services. 3DV aims to improve the rendering capability of the 2D + depth format while reducing bitrate requirements relative to simulcast and MVC. Figure B3.1 illustrates ISO MPEG's target for the 3DV format, illustrating limited camera inputs and constrained-rate transmission
according to a distribution environment. The 3DV data format aims to be capable of rendering a large number of output views for autostereoscopic N-view displays and to support advanced stereoscopic processing. Owing to limitations in the production environment, the 3DV data format is assumed to be based on limited camera inputs; stereo content is most likely, but more views might also be available. In order to support a wide range of autostereoscopic displays, it should be possible for a large number of views to be generated from this data format. Additionally, the rate required for transmitting the 3DV format should be fixed by the distribution constraints; that is, there should not be an increase in the rate simply because the display requires a higher number of views to cover a larger viewing angle. In this way, the transmission rate and the number of output views are decoupled. Advanced stereoscopic processing that requires view generation at the display would also be supported by this format.

Figure B3.1 Target of 3D video format for ongoing MPEG standardization initiatives. (The figure shows limited camera inputs feeding a data format delivered at a constrained rate, based on the distribution, serving both stereoscopic displays, with variable stereo baseline and adjustable depth perception, and autostereoscopic N-view displays, with a wide viewing angle and a large number of output views.)

Compared to the existing coding formats, the 3DV format has several advantages in terms of bit rate and 3D rendering capabilities; this is also illustrated in Fig. B3.2.

• 2D + depth, as specified by ISO/IEC 23002-3, is only capable of rendering a limited depth range since it does not directly handle occlusions. The 3DV format is expected to enhance the 3D rendering capabilities beyond this format.

• MVC is more efficient than simulcast, but the rate of MVC encoded video is proportional to the number of views.
The 3DV format is expected to significantly reduce the bitrate needed to generate the required views at the receiver.
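The rate behavior contrasted in these bullets can be made concrete with back-of-the-envelope arithmetic. The per-view rate, the 25% inter-view prediction gain, and the depth overhead below are illustrative assumptions, not measured figures:

```python
def simulcast_rate(n_views, per_view_mbps=8.0):
    """Simulcast: every view coded independently; rate grows linearly."""
    return n_views * per_view_mbps

def mvc_rate(n_views, per_view_mbps=8.0, interview_gain=0.25):
    """MVC: assume each non-base view saves ~25% via inter-view prediction
    (the gain factor is an illustrative assumption). Rate still grows with
    the number of coded views."""
    return per_view_mbps + (n_views - 1) * per_view_mbps * (1 - interview_gain)

def three_dv_rate(per_view_mbps=8.0, depth_fraction=0.2):
    """3DV target: transmit, say, two views plus depth; the receiver
    synthesizes any further output views, so the transmitted rate is
    decoupled from the display's view count."""
    return 2 * per_view_mbps * (1 + depth_fraction)

for n in (2, 9):
    print(n, simulcast_rate(n), mvc_rate(n), three_dv_rate())
```

For a 9-view autostereoscopic display the illustrative numbers give 72 Mbps for simulcast and 56 Mbps for MVC, but the 3DV rate stays at the two-view-plus-depth figure regardless of how many views the display renders.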
Figure B3.2 Illustration of 3D rendering capability versus bit rate for different formats. (The chart plots bitrate against 3D rendering capability for 2D, 2D + Depth, 3DV, MVC, and simulcast; an annotation notes that 3DV should be compatible with existing mono and stereo devices and with existing or planned infrastructure.)

B3.2 Scalable Video Coding (SVC)

The concept of SVC is to encode a video stream that contains one or several subset bitstreams of lower spatial or temporal resolution (that is, a lower-quality video signal), each usable separately or in combination. A subset bitstream is typically derived by dropping packets from the larger bitstream, and it can itself be decoded with a complexity and reconstruction quality comparable to that achieved by coding the same quantity of data with an existing coder (e.g., H.264/MPEG-4 AVC). A standard for SVC was developed by the ISO MPEG group and was completed in 2008. The SVC project was undertaken under the auspices of the JVT of ISO/IEC MPEG and ITU-T VCEG. In January 2005, MPEG and VCEG agreed to develop a standard for SVC as an amendment of the H.264/MPEG-4 AVC standard. It is now an extension, Annex G, of the H.264/MPEG-4 AVC video compression standard.

A subset bitstream may encompass a lower temporal or spatial resolution (or possibly a lower-quality video signal, say from a camera of lower quality) compared to the bitstream it is derived from.

• Temporal (Frame Rate) Scalability: the motion compensation dependencies are structured so that complete pictures (specifically, the packets associated with those pictures) can be dropped from the bitstream. (Temporal scalability is already available in H.264/MPEG-4 AVC, but SVC provides supplemental information to improve its usage.)

• Spatial (Picture Size) Scalability: video is coded at multiple spatial resolutions.
The data and decoded samples of lower resolutions can be used to
predict data or samples of higher resolutions in order to reduce the bitrate needed to code the higher resolutions.

• Quality Scalability: video is coded at a single spatial resolution but at different qualities. In this case the data and samples of lower qualities can be utilized to predict data or samples of higher qualities; this is done in order to reduce the bitrate required to code the higher qualities.

Products supporting the standard (e.g., for video conferencing) started to appear in 2008.

B3.2.1 Scalable Multi-View Video Coding (SMVC). Although there are many published approaches to SVC and to MVC, no work has yet been reported on scalable multi-view video coding (SMVC). SMVC can be used for the transport of multi-view video over IP for interactive 3DTV by dynamically and adaptively combining temporal, spatial, and SNR scalability according to network conditions.

B3.3 Conclusion

Table B3.1, based on the referenced study, indicates how the "better-known" compression algorithms can be applied, and what some of the trade-offs in quality are (the study was done in the context of mobile delivery of 3DTV, but the concepts are similar in general). In this study, four methods for transmission and compression/coding of stereo video content were analyzed. Subjective ratings show that the mixed resolution approach and the video plus depth approach do not impair video quality at high bitrates; at low bitrates, simulcast transmission is outperformed by the other methods. Objective quality metrics, using the blurred or rendered view from uncompressed data as reference, can be used for optimization of single methods (they cannot be used for comparison across methods, since they carry a positive or negative bias). Further research on individual methods will include combinations such as inter-view prediction for mixed resolution coding and depth representation at reduced resolution.
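The scalable-substream idea of Section B3.2, and the network-adaptive combination envisioned for SMVC, can be sketched together in a few lines. The packet fields and the dyadic layer structure below are illustrative assumptions:

```python
# Each packet carries the temporal layer of its picture. Lower layers never
# reference higher ones, so dropping every layer above a cutoff yields a
# valid lower-frame-rate sub-bitstream (the SVC idea); choosing the cutoff
# from the available network rate is the adaptive part (the SMVC idea).
def extract_substream(packets, max_tid):
    return [p for p in packets if p["tid"] <= max_tid]

def adapt_to_rate(packets, layer_rates_mbps, available_mbps):
    """Keep the highest temporal layer whose cumulative rate still fits."""
    cumulative, best = 0.0, 0
    for tid, rate in enumerate(layer_rates_mbps):
        cumulative += rate
        if cumulative <= available_mbps:
            best = tid
    return extract_substream(packets, best)

# Dyadic hierarchy: tid 0 = 7.5 fps base, tid 1 -> 15 fps, tid 2 -> 30 fps.
stream = [{"frame": i, "tid": 0 if i % 4 == 0 else (1 if i % 2 == 0 else 2)}
          for i in range(8)]
kept = adapt_to_rate(stream, layer_rates_mbps=[4.0, 2.0, 2.0], available_mbps=7.0)
print(sorted(p["frame"] for p in kept))  # [0, 2, 4, 6]
```

With 7 Mbps available, the base and first enhancement layers (6 Mbps) fit but the top layer does not, so the extractor delivers the half-frame-rate substream.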
In conclusion, the V + D format is considered by researchers to be a good candidate for representing stereoscopic video, suitable for most of the 3D displays currently available; MV + D (and the MVC standard) can be used for holographic displays and for FVV, where the user, as noted, can interactively select his or her viewpoint, and where the view is then synthesized from the closest spatially located captured views. However, for the initial deployment one will likely see (in order of likelihood):

• spatial compression in conjunction with H.264/MPEG-4 AVC;
• the H.264/AVC stereo SEI message;
• MVC, which is an H.264/MPEG-4 AVC extension.
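The first option, spatial compression, typically packs the stereo pair side by side in a single frame so that an unmodified encoder and channel can carry it. A minimal numpy sketch, using plain decimation without the anti-alias filtering a real system would apply:

```python
import numpy as np

def pack_side_by_side(left, right):
    """Pack a stereo pair into one frame of the original size by keeping
    every other column of each view (a real packer would low-pass filter
    before decimating to avoid aliasing)."""
    half_l = left[:, ::2]
    half_r = right[:, ::2]
    return np.hstack([half_l, half_r])

left = np.zeros((1080, 1920), dtype=np.uint8)
right = np.ones((1080, 1920), dtype=np.uint8)
frame = pack_side_by_side(left, right)
print(frame.shape)  # (1080, 1920)
```

The packed frame occupies exactly one HD frame, which is why this approach uses the same channel bandwidth as a conventional 2D signal at the cost of half the horizontal resolution per eye.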
TABLE B3.1 Application of Compression Algorithms

H.264/AVC simulcast: The left and right views are transmitted independently, each coded using H.264/MPEG-4 AVC. Hence, this method does not need any preprocessing before coding or postprocessing after decoding, and the complexity on the sender and receiver sides is low. Redundancy between the channels is not reduced, however, so coding efficiency is not optimized. Note: Nonhierarchical B pictures can be used with a Group of Pictures (GOP) structure of IBBP (hierarchical B pictures significantly increase coding efficiency; however, hierarchical B pictures require increased complexity in the decoder and the encoder, which limits application in mobile devices).

H.264/AVC stereo SEI message: H.264/MPEG-4 AVC enables inter-view prediction through the stereo SEI syntax. Practically speaking, it is based on interlacing the left and the right view prior to coding and exploiting interlaced coding mechanisms. It has been shown that the principle and efficiency of this approach are very similar to MVC, which is an H.264/MPEG-4 AVC extension to code two or more related video signals. Note: Nonhierarchical B pictures can be used with a GOP structure of IBBP (hierarchical B pictures significantly increase coding efficiency; however, hierarchical B pictures require increased complexity in the decoder and the encoder; this limits application in mobile devices).

Mixed resolution coding: Binocular suppression theory states that perceived image quality is dominated by the view with the higher spatial resolution. The mixed resolution approach utilizes this attribute of human perception by decimating one view before transmission and up-scaling it at the receiver side. This enables a trade-off between spatial subsampling and amplitude quantization.
For example, the right view can be reduced by a factor of about two in the horizontal and vertical directions; one can also alternate such reduction between the two eyes when there is a scene cut.

Video plus depth: MPEG-C Part 3 defines a video plus depth representation of stereo video content. Depth is generated at the sender side, for instance by estimation from the original left and right views. One view is transmitted together with the depth signal. At the receiver, the other view is synthesized by depth-image-based rendering. Compared to video, a depth signal can, in most cases, be coded at a fraction of the bitrate with sufficient quality for view synthesis. Errors in depth estimation and problems with disocclusions introduce artifacts into the rendered view.
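The depth-image-based rendering mentioned in the video-plus-depth row can be sketched as a toy warp. This assumes a linear depth-to-disparity mapping, a purely horizontal camera offset, and a z-buffer to resolve overlaps; there is deliberately no hole filling, so the disocclusions the row above warns about show up as empty pixels:

```python
import numpy as np

def render_virtual_view(color, depth, baseline_px=8.0):
    """Toy DIBR: shift each pixel horizontally by a disparity proportional
    to its depth value (uint8 map, 255 = nearest). A z-buffer lets nearer
    pixels win where shifts collide; unwritten pixels stay 0 -- these are
    the disocclusion holes a real renderer must inpaint."""
    h, w = color.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), -1, dtype=int)
    for y in range(h):
        for x in range(w):
            d = int(round(baseline_px * depth[y, x] / 255.0))
            nx = x + d
            if 0 <= nx < w and int(depth[y, x]) > zbuf[y, nx]:
                zbuf[y, nx] = int(depth[y, x])
                out[y, nx] = color[y, x]
    return out

color = np.arange(16, dtype=np.uint8).reshape(1, 16)
depth = np.zeros((1, 16), dtype=np.uint8)
depth[0, 5] = 255             # one near pixel shifts by the full baseline
virt = render_virtual_view(color, depth)
print(int(virt[0, 13]), int(virt[0, 5]))  # 5 0: pixel 5 lands at 13, leaving a hole
```

The single near pixel moves eight columns to the right and leaves an unfilled hole at its original position, a one-line demonstration of the disocclusion artifact.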
CHAPTER 4

3DTV/3DV Transmission Approaches and Satellite Delivery

This chapter addresses some key concepts related to the transmission of 3DTV video in the near term. This chapter is not intended to be a research monograph on open technical issues, but rather to discuss in some generality some of the approaches being considered for distributing content to end users. If 3DTV as a commercial service is to become a complete reality in the next 2–5 years, it will certainly use some or all of the technologies discussed in this chapter; these systems can all be deployed at this time and will be needed at the practical level to make this nascent service a reality.

We start with a generic discussion of transmission approaches and then look at DVB-based satellite approaches.

4.1 OVERVIEW OF BASIC TRANSPORT APPROACHES

It is to be expected that 3DTV for home use will likely first see penetration via stored-media delivery (e.g., Blu-ray Disc). The broadcast commercial delivery of 3DTV (whether over satellite/DTH, over the air, over cable, or via IPTV) may take a few years because of the relatively large-scale infrastructure that has to be put in place by the service providers and the limited availability of 3D-ready TV sets in the home (implying a small subscriber, and so small revenue, base). Delivery of downloadable 3DTV files over the Internet may occur at any point in the immediate future, but the provision of a broadcast-quality service over the Internet is not likely in the foreseeable future.

There are a number of alternative transport architectures for 3DTV signals, also depending on the underlying media.
The service can be supported by traditional broadcast structures including the DVB architecture, wireless 3G/4G transmission such as DVB-H approaches, the Internet Protocol (IP) in support of an IPTV-based service (in which case it also makes sense to consider IPv6), and the IP architecture for Internet-based delivery (both non-real-time and streaming).
The specific approach used by each of these transport methods will also depend on the video-capture approach, as depicted in Table 4.1. Initially, conventional stereo video (with temporal multiplexing or spatial compression) will be used by all commercial 3DTV service providers; later in the decade other methods may be used. Also note in this context that the United States has a well-developed cable infrastructure in all Tier 1 and Tier 2 metropolitan and suburban areas; in Europe and Asia this is less so, with more DTH delivery (in the United States DTH tends to serve more exurban and rural areas). A 3DTV rollout must take these differences into account and/or accommodate both.

Note that the V + D data representation can be utilized to build 3DTV transport evolutionarily on the existing DVB infrastructure. The in-home 3D images are reconstructed at the receiver side by using DIBR. MPEG has established a standardization activity that focuses on 3DTV using the V + D representation.

There are generally two potential approaches for the transport of 3DTV signals: (i) connection-oriented (time/frequency division multiplexing) over the existing DVB infrastructure over traditional channels (e.g., satellite, cable, over-the-air broadcast, DVB-H/cellular), and (ii) connectionless/packet using IP (e.g., a "private/dedicated" IPTV network, Internet streaming, or Internet on-demand servers/P2P, i.e., peer-to-peer). These references, among others, describe various methods for traditional video over packet/ATM (Asynchronous Transfer Mode)/IPTV/satellite/Internet [1–5]; many of these approaches and techniques can be extended and adapted for use in 3DTV.

Figures 4.1–4.7 depict graphically system-level views of the possible delivery mechanisms. We use the term "complexity" in these figures to remind the reader that it will not be trivial to deploy these networks on a broad national basis.
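Whichever connection-oriented channel is used, DVB delivery ultimately carries the coded video in 188-byte MPEG-2 transport stream packets. A much-simplified packetizer sketch follows; real muxers use adaptation fields for stuffing and timing (PCR) rather than the 0xFF payload padding used here:

```python
def packetize_ts(payload: bytes, pid: int):
    """Split an elementary-stream payload into 188-byte TS packets.
    Simplification: a short final chunk is padded with 0xFF instead of
    using an adaptation field, as a conformant mux would."""
    packets = []
    cc = 0  # 4-bit continuity counter, incremented per packet on this PID
    for off in range(0, len(payload), 184):
        chunk = payload[off:off + 184]
        pusi = 0x40 if off == 0 else 0x00  # payload_unit_start_indicator
        header = bytes([
            0x47,                          # sync byte
            pusi | ((pid >> 8) & 0x1F),    # PUSI + 5 high bits of the PID
            pid & 0xFF,                    # low 8 bits of the PID
            0x10 | cc,                     # payload-only + continuity counter
        ])
        packets.append(header + chunk.ljust(184, b"\xff"))
        cc = (cc + 1) & 0x0F
    return packets

pkts = packetize_ts(b"\x00" * 400, pid=0x100)
print(len(pkts), len(pkts[0]))  # 3 188
```

A 400-byte payload spans three fixed-size packets; the constant 188-byte framing is what lets satellite, cable, and terrestrial DVB modulators treat all services uniformly.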
A challenge in the deployment of multi-view video services, including 3D and free-viewpoint TV, is the relatively large bandwidth requirement associated with the transport of multiple video streams. Two-stream signals (CSV, V + D, and LDV) are doable: the delivery of a single stream of 3D video in the range of 20 Mbps is not outside the technical realm of most providers these days, but delivering a large number of channels in an unswitched mode (requiring, say, 2 Gbps access to a domicile) will require FTTH capabilities. It is not possible to deliver that content over an existing copper plant of the xDSL (Digital Subscriber Line) kind unless a provider could deploy ADSL2+ (Asymmetric Digital Subscriber Line; but why bother upgrading a plant to a new copper technology such as this when the provider could instead deploy fiber? However, ADSL2+ may be used in Multiple Dwelling Units as a riser for an FTTH plant). A way to deal with this is to provide user-selected multicast capabilities, where a user can select an appropriate content channel using IGMP (Internet Group Management Protocol). Even then, a household may have multiple TVs (say three or four) switched on simultaneously (and maybe even an active Digital Video Recorder or DVR), thus requiring bandwidth in the 15–60 Mbps range. MV + D, where one wants to carry three or even more intrinsic (raw) views, becomes much more challenging and problematic for practical commercial applications. We cover ADSL2+ issues in Chapter 5.
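The IGMP-based channel selection just described can be sketched at the socket level. The multicast group addresses are hypothetical per-channel assignments by the IPTV provider; the operating system emits the actual IGMP membership messages when the socket options are set:

```python
import socket
import struct

def join_channel(group_ip: str, port: int) -> socket.socket:
    """Join a channel's multicast group; the OS sends the IGMP report.
    group_ip is a hypothetical per-channel multicast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group_ip),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def change_channel(sock, old_group, new_group):
    """Leave one group and join another: the IPTV channel 'zap'."""
    any_if = socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    struct.pack("4s4s", socket.inet_aton(old_group), any_if))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    struct.pack("4s4s", socket.inet_aton(new_group), any_if))

# Building the 8-byte membership structure needs no network access:
mreq = struct.pack("4s4s", socket.inet_aton("239.0.0.1"),
                   socket.inet_aton("0.0.0.0"))
print(len(mreq))  # 8
```

The point of the design is that only the groups a household has joined are forwarded down the access line, so the line carries per-set streams (15–60 Mbps) rather than the full unswitched channel lineup.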
TABLE 4.1 Video Capture and Transmission Possibilities

                            Terrestrial  DTH with  3G/4G +  IPTV (IPv4  Internet Real-  Internet      Cable TV
                            DVB          DVB       DVB-H    or IPv6)    Time Streaming  Non-real Time
Conventional stereo         Fine         Fine      Limited  Fine        Fine            Fine           Fine
  video (CSV)
Video plus depth (V + D)    Good         Good      Fine     Good        Good            Good           Good
Multi-view video plus       Good         Good      Fine     Good        Fine            Good           Good
  depth (MV + D)
Layered depth video (LDV)   Best         Best      Fine     Best        Fine            Good           Best

Fine, doable; Good, better approach; Best, best approach.
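The bandwidth arithmetic behind the preceding discussion is easy to make explicit; the per-stream rates below are illustrative, taken from the ranges quoted in the text:

```python
def household_demand(active_tvs, per_stream_mbps, dvr_streams=1):
    """Aggregate access-line demand when several sets, plus a DVR
    recording, each pull an independent stream (switched delivery)."""
    return (active_tvs + dvr_streams) * per_stream_mbps

# Three active sets plus one DVR recording, each on a ~15 Mbps 3DTV stream:
print(household_demand(3, 15.0))  # 60.0 -- the upper figure cited in the text

# Unswitched delivery of 100 channels at 20 Mbps each:
print(100 * 20)                   # 2000 Mbps, i.e. the ~2 Gbps access case
```

The contrast between 60 Mbps (switched, IGMP-selected) and roughly 2 Gbps (unswitched) is exactly why multicast selection or FTTH enters the discussion.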
Figure 4.1 (Simplicity of) initial enjoyment of 3DTV in the home. (The figure shows 2D, CSV, V + meta, V + D, and MV + D camera and encoding/content-production chains, each offering hundreds of channels, feeding 2D and 3D displays over HDMI 1.4.)

Figure 4.2 Complexity of a commercial-grade 3DTV delivery environment using IPTV. (The figure shows m content providers feeding aggregators with 200–300 content channels over a service provider-managed IP network using DVB transmission; millions of viewers IGMP-select per-channel streams on 2D, CSV, V + D, and MV + D decoders, the last with DIBR.)

Off-the-air broadcast could be accomplished, with some compromise, by using the entire HDTV bandwidth for a single 3DTV channel; here, multiple TVs in a household could be tuned to different programs.
However, a traditional cable TV plant would find it a challenge to deliver a (large) package of 3DTV channels, but it could deliver a subset of its total selection in 3DTV (say 10 or 20 channels) by sacrificing bandwidth on the cable that could otherwise carry distinct channels. The same is true for DTH applications.

Figure 4.3 Complexity of a commercial-grade 3DTV delivery environment using the cable TV infrastructure.

Figure 4.4 Complexity of a commercial-grade 3DTV delivery environment using a satellite DTH infrastructure.

Figure 4.5 Complexity of a commercial-grade 3DTV delivery environment using over-the-air infrastructure.

For IP, a service provider-engineered network could be used. Here, the provider can control the latency, jitter, effective source–sink bandwidth, packet loss, and other service parameters. However, if the approach is to use the Internet, performance issues will be a major consideration, at least for real-time services. A number of multi-view encoding and streaming strategies using RTP (Real-Time Transport Protocol)/UDP (User Datagram Protocol)/IP or RTP/DCCP (Datagram Congestion Control Protocol)/IP exist for this approach. Video streaming architectures can be classified as (i) server to single client unicast, (ii) server multicasting to several clients, (iii) P2P unicast distribution, where each peer forwards packets to another peer, and (iv) P2P multicasting, where each peer forwards packets to several other peers. Multicasting protocols can be supported at the network layer or the application layer. Figure 4.8 provides a view of the framework and system for 3DTV streaming transport over IP.

Figure 4.6 Complexity of a commercial-grade 3DTV delivery environment using the Internet. (The figure shows server-driven rate scaling (P2P), client-driven selective, and client-driven multicast delivery from servers across the Internet.)

Figure 4.7 Complexity of a commercial-grade 3DTV delivery environment using a DVB-H (or proprietary) infrastructure.

Figure 4.8 Block diagram of the framework and system for 3DTV streaming transport over IP. (The figure shows MVC/JMVM, scalable MVC, and MDC encoders at content providers delivering over the Internet infrastructure, via server-driven rate scaling (P2P), client-driven selective, and client-driven multicast modes, to wireline and wireless IP clients, i.e., the viewers.)
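The streaming strategies above ride on RTP. The 12-byte fixed RTP header (RFC 3550) that precedes every media packet can be built in a few lines; payload type 96 is a dynamic value such as would be negotiated for video in a session description:

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Build the 12-byte fixed RTP header (RFC 3550)."""
    v_p_x_cc = 0x80                    # version 2, no padding/extension/CSRC
    m_pt = (marker << 7) | payload_type
    return struct.pack("!BBHII", v_p_x_cc, m_pt, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

hdr = rtp_header(seq=1, timestamp=90000, ssrc=0x1234)
print(len(hdr), hdr[0] >> 6)  # 12 2  (12 bytes, RTP version 2)
```

The sequence number and timestamp in this header are what let a receiver detect loss and reorder packets, which is the hook for the error-resilience and concealment techniques discussed below.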
Yet, there is a lot of current academic research and interest in connectionless delivery of 3DTV content over shared packet networks. 3D video content needs to be protected when transmitted over unreliable communication channels. The effects of transmission errors on the perceived quality of 3D video can be expected to be at least as significant as for equivalent 2D video applications, because the errors will influence several perceptual attributes associated with 3D viewing (e.g., naturalness, presence, depth perception, eye strain, and viewing experience).

It has long been known that IP-based transport can accommodate a wide range of applications. Transport and delivery of video in various forms goes back to the early days of the Internet. However, (i) the delivery of quality (jitter- and loss-free) content, particularly HD or even 3D; (ii) the delivery of content in a secure, money-making, subscription-based manner; and (iii) the delivery of streaming real-time services for thousands of channels (worldwide) and millions of simultaneous customers remain a long shot at this juncture. Obviously, at the academic level, transmission of video over the Internet (whether 2D or 3D) is currently an active research and development area where significant results have already been achieved. Some video-on-demand services that make use of the Internet, both for news and entertainment applications, have emerged, but desiderata (i), (ii), and (iii) have not been met. Naturally, it is critical to distinguish between the use of the IP protocol (IPv4 or IPv6) and the use of the Internet (which is based on IP) as a delivery mechanism (a delivery channel). IPTV services delivered over a restricted IP infrastructure appear to be more tenable in the short term, both in terms of Quality of Service (QoS) and Quality of Experience (QoE). Advocates now advance the concept of 3D IPTV.
The transport of 3DTV signals over IP packet networks appears to be a natural extension of video-over-IP applications, but the IPTV model (rather than the Internet model) seems more appropriate at this time. The consuming public, we argue based on experience, will not be willing to purchase new (fairly expensive) TV displays for 3D if the quality of the service is not there. The technology has to "disappear into the background" and not be smack in the foreground if the QoE is to be reasonable.

To make a comparison with Voice over Internet Protocol (VoIP), it should be noted that while voice over the Internet is certainly doable end to end, specialized commercial VoIP providers tend to use the Internet mostly for access (except for international calling to secondary geographic locations). Most top-line (traditional) carriers use IP over their own internally designed, internally engineered, and internally provisioned networks, and also for core transport [8–21].

Some of the research issues associated with IP delivery in general, and IP/Internet streaming in particular, include, but are not limited to, the following:

1. Determination of the best video encoding configuration for each streaming strategy: multi-view video encoding methods provide some compression efficiency gain at the expense of creating dependencies between views that hinder random access to views.
2. Determination of the best rate adaptation method: adaptation refers to adaptation of the rate of each view, as well as inter-view rate allocation depending on the available network rate and the video content, and adaptation of the number and quality of the views transmitted depending on the available network rate, the user's display technology, and the desired viewpoint.

3. Packet-loss-resilient video encoding and streaming strategies, as well as better error concealment methods at the receiver: some ongoing industry research includes the following.

• Some research related to Robust Source Coding is of interest. In a connectionless network, packets can get lost; the network is lossy. A number of standard source coding approaches are available to provide robust source coding for 2D video to deal with this issue, and many of these can be used for 3D V + D applications (features such as slice coding, redundant pictures, Flexible Macroblock Ordering or FMO, intra refresh, and Multiple Description Coding or MDC are useful in this context). Loss-aware rate-distortion optimization is often used for 2D video to optimize the application of robust source coding techniques. However, the models used have not been validated for use with 3D video in general and FVV in particular.

• Some research related to Cross-Layer Error Robustness is also of interest for the transport of V + D signals over a connectionless network. In recent years, attention has focused on cross-layer optimization of 2D video quality. This has resulted in algorithms that optimize channel coding and prioritize the video data. Similar work is needed to uncover and assess appropriate methods for transporting 3D video across networks.

• Other research work pertains to the Error Concealment that might be needed when transporting V + D signals over a connectionless network. Most 2D error concealment algorithms can be used for 3D video.
However, there is additional information in 3D video that can be used to enhance the concealed quality: for example, motion vectors can be shared between the color and depth video, so if color information is lost, the depth motion vectors can be used to carry out concealment. There are further opportunities with MVC, where adjacent views can be used to conceal a view that is lost.

4. Best peer-to-peer multicasting design methods are required, including topology discovery, topology maintenance, forwarding techniques, exploitation of path diversity, methods for enticing peers to send data and to stay connected, and the use of dedicated nodes as relays.

Some have argued that stereo streaming allows for flexibility in congestion control methods, such as video rate adaptation to the available network rate, methods for packet-loss handling, and postprocessing for error concealment, but it is unclear how a commercial service with paying customers (possibly paying a
premium for the 3DTV service) would indeed be able to accept any degradation in quality.

Developing laboratory test bed servers that unicast content to multiple clients with stereoscopic displays should be easily achievable, as should be the case for other comparable arrangements. Translating those test beds to scalable, reliable (99.999% availability), cost-effective commercial-service-supporting infrastructures is altogether another matter.

In summary, real-time delivery of 3DTV content can make use of satellite, cable, broadcast, IP, IPTV, Internet, and wireless technologies. Any unique requirements of 3DTV need to be taken into account. The requirements are very similar to those needed for delivery of entertainment-quality video (e.g., with reference to latency, jitter, and packet loss), but with the observation that a number (if not most) of the encoding techniques require more bandwidth. The incremental bandwidth is as follows: (i) from 20% to 100% more for stereoscopic viewing compared with 2D viewing;1 (ii) from 50% to 200% for multi-view systems compared with 2D viewing; and (iii) a lot more bandwidth for holoscopic/holographic designs (presently not even being considered for near-term commercial 3DTV service). We mentioned implicit coding earlier: that would indeed provide more efficiency, but as noted, most video systems in use today (or anticipated to be available in the near future) use explicit coding. Synthetic video generation based on CGI techniques needs less bandwidth than actual video. There can also be content with Mixed Reality (MR)/Augmented Reality (AR) that mixes graphics with real images, such as content that uses depth information together with image data for 3D scene generation. These systems may also require less bandwidth than actual full video. Conventional Stereoscopic Video (CSV) may be used as a reference point. Holoscopic/holographic systems require the most.
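The incremental-bandwidth ranges just quoted can be turned into rough channel-rate estimates. A minimal sketch, assuming a hypothetical 8-Mbps 2D baseline channel (the function name and baseline value are illustrative, not from the text):

```python
def required_bandwidth_mbps(base_2d_mbps, overhead_fraction):
    """Estimated channel rate for a 3D format, expressed as the 2D rate
    plus the incremental-overhead fraction quoted in the text."""
    return base_2d_mbps * (1.0 + overhead_fraction)

# Hypothetical 8-Mbps MPEG-4 HD channel as the 2D baseline.
base = 8.0
print(f"stereoscopic: {required_bandwidth_mbps(base, 0.20):.1f}"
      f"-{required_bandwidth_mbps(base, 1.00):.1f} Mbps")   # +20% to +100%
print(f"multi-view:   {required_bandwidth_mbps(base, 0.50):.1f}"
      f"-{required_bandwidth_mbps(base, 2.00):.1f} Mbps")   # +50% to +200%
```

Even at the low end of the stereoscopic range, the operator must find roughly a fifth more capacity per channel; at the multi-view high end, the channel triples.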
It should also be noted that while graphic techniques and/or implicit coding may not require a very large transmission bandwidth, the tolerance to information loss (packet loss) is typically very low.

We conclude this section by noting, again, that while connectionless packet networks offer many research opportunities as related to supporting 3DTV, we believe that a commercial 3DTV service will more likely occur in a connection-oriented (e.g., DTH, cable TV) environment and/or a controlled-environment IPTV setting.

4.2 DVB

DVB is a consortium of over 300 companies in the fields of broadcasting and manufacturing that work cooperatively to establish common international standards for digital broadcasting. DVB-generated standards have become the leading international standards, commonly referred to as "DVB," and the accepted choice for technologies that enable efficient, cost-effective, high-quality, and

1 As noted, spatial compression uses the same channel bandwidth as a traditional TV signal, but by compromising resolution.
interoperable digital broadcasting. The DVB standards for digital television have been adopted in the United Kingdom, across mainland Europe, in the Middle East, in South America, and in Australasia. DVB standards are used for DTH satellite transmission [22] (and also for terrestrial and cable transmission).

The DVB standards are published by a Joint Technical Committee (JTC) of the European Telecommunications Standards Institute (ETSI), the European Committee for Electrotechnical Standardization (Comité Européen de Normalisation Electrotechnique, CENELEC), and the European Broadcasting Union (EBU). DVB produces specifications that are subsequently standardized in one of the European statutory standardization bodies. They cover the following DTV-related areas:

• conditional access,
• content protection copy management,
• interactivity,
• interfacing,
• IP,
• measurement,
• middleware,
• multiplexing,
• source coding,
• subtitling,
• transmission.

Standards have emerged in the past 10 years for defining the physical layer and data link layer of a distribution system, as follows:

• satellite video distribution (DVB-S and DVB-S2),
• cable video distribution (DVB-C),
• terrestrial television video distribution (DVB-T),
• terrestrial television for handheld mobile devices (DVB-H).

Distribution systems differ mainly in the modulation schemes used (because of specific technical constraints):

• DVB-S (SHF) employs QPSK (Quadrature Phase-Shift Keying).
• DVB-S2 employs QPSK, 8PSK (Phase-Shift Keying), 16APSK (Asymmetric Phase-Shift Keying), or 32APSK; 8PSK is the most common at this time (it supports 30 megasymbols per satellite transponder and provides a usable rate in the 75-Mbps range, or about 25 SD-equivalent MPEG-4 video channels).
• DVB-C (VHF/UHF) employs QAM (Quadrature Amplitude Modulation): 64-QAM or 256-QAM.
• DVB-T (VHF/UHF) employs 16-QAM or 64-QAM (or QPSK) along with COFDM (Coded Orthogonal Frequency Division Multiplexing).
• DVB-H: refer to the next section.

Because these systems have been widely deployed, especially in Europe, they may well play a role in near-term 3DTV services. IPTV also makes use of a number of these standards, particularly when making use of satellite links (an architecture that has emerged is to use satellite links to provide signals to various geographically distributed headends, which then distribute these signals terrestrially to a small region using the telco IP network—these headends act as rendezvous points in the IP Multicast infrastructure). Hence, under the reasonable assumption that IPTV will play a role in 3DTV, these specifications will also be considered for 3DTV in that context.

As implied above, transmission is a key area of activity for DVB. See Table 4.2 for some of the key transmission specifications.

In particular, EN 300 421 V1.1.2 (08/97) describes the modulation and channel coding system for satellite digital multiprogram television (TV)/HDTV services to be used for primary and secondary distribution in Fixed Satellite Service (FSS) and Broadcast Satellite Service (BSS) bands. This specification is also known as DVB-S. The system is intended to provide DTH services for consumer IRDs, as well as cable television headend stations with a likelihood of remodulation. The system is defined as the functional block of equipment performing the adaptation of the baseband TV signals, from the output of the MPEG-2 transport multiplexer (ISO/IEC DIS 13818-1) to the satellite channel characteristics. The following processes are applied to the data stream:

• transport multiplex adaptation and randomization for energy dispersal;
• outer coding (i.e., Reed–Solomon);
• convolutional interleaving;
• inner coding (i.e., punctured convolutional code);
• baseband shaping for modulation;
• modulation.

DVB-S/DVB-S2 as well as the other transmission systems could be used to deliver 3DTV. As seen in Fig.
4.9, MPEG information is packed into PESs (Packetized Elementary Streams), which are then mapped to TSs that are then handled by the DVB adaptation. The system is directly compatible with MPEG-2 coded TV signals. The modem transmission frame is synchronous with the MPEG-2 multiplex transport packets. Appropriate adaptation to the signal formats (e.g., MVC ISO/IEC 14496-10:2008 Amendment 1 and ITU-T Recommendation H.264, the extension of AVC) will have to be made, but this kind of adaptation has recently been defined in the context of IPTV to carry MPEG-4 streams over an MPEG-2 infrastructure (Fig. 4.10).

Some additional arrangements for the use of satellite transmission are depicted in Chapter 5. Also, see Appendix A4 for a brief overview of MPEG multiplexing and DVB support.
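The coding steps listed above fix how much of a DVB-S carrier's raw rate is left for the MPEG-2 TS: the RS(204,188) outer code keeps 188/204 of the bits, and the punctured convolutional inner code keeps its code rate's worth. A rough back-of-envelope sketch of that arithmetic (the function name and the 27.5-Msym/s example carrier are illustrative, not from the text):

```python
from fractions import Fraction

def dvbs_useful_bitrate(symbol_rate_sps, bits_per_symbol, inner_code_rate):
    """Approximate useful (MPEG-2 TS) bitrate of a DVB-S carrier:
    symbol rate x modulation efficiency x punctured convolutional
    (inner) code rate x RS(204,188) outer-code efficiency."""
    return symbol_rate_sps * bits_per_symbol * inner_code_rate * Fraction(188, 204)

# A classic DVB-S operating point: 27.5 Msym/s, QPSK (2 bits/symbol),
# rate-3/4 inner code -> roughly 38 Mbps of TS payload.
rate = dvbs_useful_bitrate(27.5e6, 2, Fraction(3, 4))
print(f"{rate / 1e6:.2f} Mbps")
```

Note that this arithmetic does not carry over directly to DVB-S2, which replaces the Reed–Solomon/convolutional pair with BCH and LDPC coding and different framing.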
TABLE 4.2 Key DVB Transmission Specifications

EN 300 421 V1.1.2 (08/97), S: Framing structure, channel coding, and modulation for 11/12 GHz satellite services
TR 101 198 V1.1.1 (09/97): Implementation of Binary Phase Shift Keying (BPSK) modulation in DVB satellite transmission systems
EN 302 307 V1.2.1 (08/09), S2: Second-generation framing structure, channel coding, and modulation systems for broadcasting, interactive services, news gathering, and other broadband satellite applications
TR 102 376 V1.1.1 (02/05): User guidelines for the second-generation system for broadcasting, interactive services, news gathering, and other broadband satellite applications
TS 102 441 V1.1.1 (10/05): DVB-S2 adaptive coding and modulation for broadband hybrid satellite dial-up applications
EN 300 429 V1.2.1 (04/98), C: Framing structure, channel coding, and modulation for cable systems
DVB BlueBook A138 (04/09), C2: Frame structure, channel coding, and modulation for a second-generation digital transmission system for cable systems (DVB-C2)
EN 300 473 V1.1.2 (08/97), CS: DVB Satellite Master Antenna Television (SMATV) distribution systems
TS 101 964 V1.1.1 (08/01): Control channel for SMATV/MATV (Master Antenna Television) distribution systems; baseline specification
TR 102 252 V1.1.1 (10/03): Guidelines for implementation and use of the control channel for SMATV/MATV distribution systems
EN 300 744 V1.6.1 (01/09), T: Framing structure, channel coding, and modulation for digital terrestrial television
TR 101 190 V1.3.1 (07/08): Implementation guidelines for DVB terrestrial services; transmission aspects
TS 101 191 V1.4.1 (06/04): Megaframe for Single Frequency Network (SFN) synchronization
EN 302 755 V1.1.1 (09/09): Frame structure, channel coding, and modulation for a second-generation digital terrestrial television broadcasting system (DVB-T2)
DVB BlueBook A122 (12/09), T2: Frame structure, channel coding, and modulation for a second-generation digital terrestrial television broadcasting system (DVB-T2) (dEN 302 755 V1.2.1)
TABLE 4.2 (Continued)

DVB BlueBook A133 (12/09): Implementation guidelines for a second-generation digital terrestrial television broadcasting system (DVB-T2) (draft TR 102 831 V1.1.1)
TS 102 773 V1.1.1 (09/09): Modulator Interface (T2-MI) for a second-generation digital terrestrial television broadcasting system (DVB-T2)
EN 302 304 V1.1.1 (11/04), H: Transmission system for handheld terminals
TR 102 377 V1.4.1 (06/09): Implementation guidelines for DVB handheld services
TR 102 401 V1.1.1 (05/05): DVB-H validation task force report
TS 102 585 V1.1.2 (04/08): System specifications for Satellite Services to Handheld Devices (SH) below 3 GHz
EN 302 583 V1.1.1 (03/08), SH: Framing structure, channel coding, and modulation for SH below 3 GHz
DVB BlueBook A111 (12/09): Framing structure, channel coding, and modulation for SH below 3 GHz (dEN 302 583 v.1.2.1)
TS 102 584 V1.1.1 (12/08): Guidelines for implementation for SH below 3 GHz
DVB BlueBook A131 (11/08): MPE-IFEC (draft TS 102 772 V1.1.1)
EN 300 748 V1.1.2 (08/97), MDS: Multipoint Video Distribution Systems (MVDS) at 10 GHz and above
EN 300 749 V1.1.2 (08/97): Framing structure, channel coding, and modulation for MMDS (Multichannel Multipoint Distribution Service) systems below 10 GHz
EN 301 701 V1.1.1 (08/00): OFDM (Orthogonal Frequency Division Multiplexing) modulation for microwave digital terrestrial television
EN 301 210 V1.1.1 (02/99), DSNG: Framing structure, channel coding, and modulation for Digital Satellite News Gathering (DSNG) and other contribution applications by satellite
TR 101 221 V1.1.1 (03/99): User guidelines for DSNG and other contribution applications by satellite
EN 301 222 V1.1.1 (07/99): Coordination channels associated with DSNG

For Digital Rights Management (DRM), the DVB Project-developed Digital Video Broadcast Conditional Access (DVB-CA) specification defines a Digital Video Broadcast Common Scrambling Algorithm (DVB-CSA) and a Digital Video Broadcast Common Interface (DVB-CI) for accessing scrambled content:

• DVB system providers develop their proprietary conditional access systems within these specifications;
• DVB transports include metadata called service information (DVB-SI, i.e., Digital Video Broadcast Service Information) that links the various Elementary Streams (ESs) into coherent programs and provides human-readable descriptions for electronic program guides.
[Figure 4.9: Functional block diagram of DVB-S.]

[Figure 4.10: Mapping of MPEG-2/MPEG-4 to DVB/DVB-S2 systems.]

4.3 DVB-H

There is interest in the industry in delivering 3DTV services to mobile phones. It is perceived that simple lenticular screens can work well in this context and that the bandwidth (even though always at a premium in mobile applications)
would not be too onerous overall; even assuming a model with two independent streams being delivered, it would double the bandwidth to 2 × 384 kbps or 2 × 512 kbps, and the use of spatial compression (which should not be such a "big" compromise here) would be handled at the traditional data rate, 384 kbps or 512 kbps.

DVB-H, as noted in Table 4.2, is a DVB specification that deals with approaches and technologies to deliver commercial-grade, medium-quality, real-time linear and on-demand video content to handheld, battery-powered devices such as mobile telephones and PDAs (Personal Digital Assistants). IP Multicast is typically employed to support DVB-H.

DVB-H2 addresses the requirements for reliable, high-speed, high-data-rate reception for a number of mobile applications including real-time video to handheld devices. DVB-H systems typically make use of IP Multicast. DVB-H is generating significant interest in the broadcast and telecommunications worlds, and DVB-H services are expected to start at this time. The DVB-H standards have been standardized through ETSI.

ETSI EN 302 304, "Digital Video Broadcasting (DVB); Transmission System for Handheld Terminals (DVB-H)," is an extension of the DVB-T standard. Additional features have been added to support handheld and mobile reception. Lower power consumption for mobile terminals and secured reception in mobility environments are key features of the standard. It is meant for IP-based wireless services. DVB-H can share the DVB-T MUX with MPEG-2/MPEG-4 services, so it can be part of the IPTV infrastructure described in the previous chapter, except that lower bitrates are used for transmission (typically in the 384-kbps range). DVB-H was published as an ETSI standard in 2004 as an umbrella standard defining how to combine the existing (now updated) ETSI standards to form the DVB-H system (Fig. 4.11).
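The two mobile delivery options discussed at the start of this section (simulcast of two independent views versus spatial compression at the traditional rate) amount to simple arithmetic. A sketch with hypothetical names:

```python
def mobile_3d_bitrate_kbps(base_kbps, mode):
    """Bitrate needed for mobile 3D under the two options in the text:
    'simulcast' sends two independent view streams (double the rate),
    while 'spatial' packs both views into one frame at the original rate."""
    if mode == "simulcast":
        return 2 * base_kbps
    if mode == "spatial":
        return base_kbps
    raise ValueError(f"unknown mode: {mode!r}")

for base in (384, 512):  # typical mobile video rates cited in the text
    print(base, mobile_3d_bitrate_kbps(base, "simulcast"),
          mobile_3d_bitrate_kbps(base, "spatial"))
```

On a small lenticular screen, halving each view's horizontal resolution under the spatial option is a smaller perceptual penalty than on a large display, which is why the text calls it not such a "big" compromise here.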
DVB-H is based on DVB-T, a standard for digital transmission of terrestrial over-the-air TV signals. When DVB-T was first published in 1997, it was not designed to target mobile receivers. However, DVB-T mobile services have been launched in a number of countries. Indeed, with the advent of diversity antenna receivers, services that target fixed reception can now largely be received on the move as well. DVB-T is deployed in more than 50 countries. Yet, a new standard was sought, namely, DVB-H.

Despite the success of mobile DVB-T reception, the major concern with any handheld device is that of battery life. The current and projected power consumption of DVB-T front-ends is too high to support handheld receivers that are expected to last from one to several days on a single charge. The other major requirements for DVB-H were an ability to receive 15 Mbps in an 8-MHz channel and in a wide-area Single Frequency Network (SFN) at high speed. These requirements were drawn up after much debate and with an eye on emerging convergence devices providing video services and other broadcast data services to 2.5G and 3G handheld devices. Furthermore, all this should be possible while

2 This material is based on Ref. .
maintaining maximum compatibility with existing DVB-T networks and systems. Figure 4.12 depicts a block-level view of a DVB-H network.

[Figure 4.11: DVB-H framework. IPE = IP Encapsulator.]

[Figure 4.12: Block-level view of a DVB-H network.]

In order to meet these requirements, the newly developed DVB-H specification includes the capabilities discussed next.
• Time-Slicing: Rather than continuous data transmission as in DVB-T, DVB-H employs a mechanism where bursts of data are received at a time—a so-called IP datacast carousel. This means that the receiver is inactive for much of the time, and can thus, by means of clever control signaling, be "switched off." The result is a power saving of about 90%, and more in some cases.
• "4-K Mode": With the addition of a 4-K mode with 3409 active carriers, DVB-H benefits from the compromise between the high-speed, small-area SFN capability of 2-K DVB-T and the lower speed but larger-area SFN of 8-K DVB-T. In addition, with the aid of enhanced in-depth interleavers in the 2-K and 4-K modes, DVB-H has even better immunity to ignition interference.
• Multiprotocol Encapsulation–Forward Error Correction (MPE-FEC): The addition of an optional, multiplexer-level FEC scheme means that DVB-H transmissions can be even more robust. This is advantageous when considering the hostile environments and poor (but fashionable) antenna designs typical of handheld receivers.

Like DVB-T, DVB-H can be used in 6-, 7-, and 8-MHz channel environments. However, a 5-MHz option is also specified for use in non-broadcast environments. A key initial requirement, and a significant feature of DVB-H, is that it can coexist with DVB-T in the same multiplex. Thus, an operator can choose to have two DVB-T services and one DVB-H service in the same overall DVB-T multiplex.

Broadcasting is an efficient way of reaching many users with a single (configurable) service. DVB-H combines broadcasting with a set of measures to ensure that the target receivers can operate from a battery and on the move, and is thus an ideal companion to 3G telecommunications, offering symmetrical and asymmetrical bidirectional multimedia services.

DVB-H trials have been conducted in recent years in Germany, Finland, and the United States.
Such trials help frequency planning and improve understanding of the complex issue of interoperability with telecommunications networks and services. However, to date, at least in the United States, there has been limited interest (and success) in the use of DVB-H to deliver video to handheld devices. Providers have tended to use proprietary protocols.

Proponents have suggested the use of DVB-H for delivery of 3DTV to mobile devices. Some make the claim that wireless 3DTV may be introduced at an early point because of the tendency of wireless operators to feature new applications earlier than traditional carriers. While this may be true in some parts of the world—perhaps mostly driven by the regulatory environment favoring wireless in some countries, by the inertia of the wireline operators, and by the relative ease with which "towers are put up"—we remain of the opinion that the spectrum limitations and the limited QoE of a cellular 3D interaction do not make cellular 3D such a financially compelling business case for the wireless operators as to induce them to introduce the service "overnight."
REFERENCES

1. Minoli D. IP multicast with applications to IPTV and mobile DVB-H. New York: Wiley/IEEE Press; 2008.
2. Minoli D. Video dialtone technology: digital video over ADSL, HFC, FTTC, and ATM. New York: McGraw-Hill; 1995.
3. Minoli D. Distributed multimedia through broadband communication services (co-authored). Norwood, MA: Artech House; 1994.
4. Minoli D. Digital video. In: Terplan K, Morreale P, editors. The telecommunications handbook, Chapter 4. New York: IEEE Press; 2000.
5. Minoli D. Distance learning: technology and applications. Norwood, MA: Artech House; 1996.
6. Tekalp M, editor. D32.2, Technical Report #2 on 3D Telecommunication Issues, Project Number: 511568, Project Acronym: 3DTV, Title: Integrated Three-Dimensional Television—Capture, Transmission and Display. Feb 20, 2007.
7. Hewage CTER, Worrall S. Robust 3D video communications. IEEE Comsoc MMTC E-Letter 2009; 4(3).
8. Minoli D. Delivering voice over IP networks, 1st edn (co-authored). New York: Wiley; 1998.
9. Minoli D. Delivering voice over IP and the Internet, 2nd edn (co-authored). New York: Wiley; 2002.
10. Minoli D. Voice over MPLS. New York: McGraw-Hill; 2002.
11. Minoli D. Voice over IPv6—architecting the next-generation VoIP. New York: Elsevier; 2006.
12. Minoli D. Delivering voice over frame relay and ATM (co-authored). New York: Wiley; 1998.
13. Minoli D. Optimal packet length for packet voice communication. IEEE Trans Commun 1979; COMM-27:607–611.
14. Minoli D. Packetized speech network, Part 3: Delay behavior and performance characteristics. Aust Electron Eng 1979:59–68.
15. Minoli D. Packetized speech networks, Part 2: Queuing model. Aust Electron Eng 1979:68–76.
16. Minoli D. Packetized speech networks, Part 1: Overview. Aust Electron Eng 1979:38–52.
17. Minoli D. Satellite on-board processing of packetized voice. ICC 1979 Conference Record. pp. 58.4.1–58.4.5; New York, NY, USA.
18. Minoli D. Issues in packet voice communication. Proc IEE 1979; 126(8):729–740.
19. Minoli D. Analytical models for initialization of single hop packet radio networks. IEEE Trans Commun 1979; COMM-27:1959–1967, Special Issue on Digital Radio (with I. Gitman and D. Walters).
20. Minoli D. Some design parameters for PCM-based packet voice communication. International Electrical/Electronics Conference Record; 1979; Toronto, ON, Canada.
21. Minoli D. Digital voice communication over digital radio links. SIGCOMM Comput Commun Rev 1979; 9(4):6–22.
22. Minoli D. Satellite systems engineering in an IPv6 environment. New York: Taylor and Francis; 2009.
23. The DVB Project Office. MacAvock P, Executive Director. DVB-H White Paper. EBU, Geneva, Switzerland. http://www.dvb.org. 2010.
24. ISO/IEC IS 13818-1. Information technology—Generic coding of moving pictures and associated audio information—Part 1: Systems. International Organization for Standardization (ISO); 2000.
25. ISO/IEC DIS 13818-2. Information technology—Generic coding of moving pictures and associated audio information: Video. International Organization for Standardization (ISO); 1995.
26. ISO/IEC 13818-3:1995. Information technology—Generic coding of moving pictures and associated audio information—Part 3: Audio. International Organization for Standardization (ISO); 1995.
27. Fairhurst G, Montpetit M-J. Address resolution for IP datagrams over MPEG-2 networks. Internet Draft draft-ietf-ipdvb-ar-00.txt, IETF ipdvb. Jun 2005.
28. Clausen HD, Collini-Nocker B, et al. Simple encapsulation for transmission of IP datagrams over MPEG-2/DVB networks. Internet Engineering Task Force draft-unisal-ipdvb-enc-00.txt. May 2003.
29. Montpetit M-J, Fairhurst G, et al. RFC 4259, A framework for transmission of IP datagrams over MPEG-2 networks. Nov 2005.
30. Faria G, Henriksson JA, Stare E, et al. DVB-H: digital broadcast services to handheld devices. Proc IEEE 2006; 94(1):194.
APPENDIX A4: BRIEF OVERVIEW OF MPEG MULTIPLEXING AND DVB SUPPORT

A4.1 Packetized Elementary Stream (PES) Packets and Transport Stream (TS) Unit(s)

International Standard ISO/IEC 13818-1 was prepared3 by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, "Coding of audio, picture, multimedia and hypermedia information," in collaboration with ITU-T. The identical text is published as ITU-T Rec. H.222.0. ISO/IEC 13818 consists of the following parts, under the general title "Information technology—Generic coding of moving pictures and associated audio information":4

• Part 1: Systems
• Part 2: Video
• Part 3: Audio
• Part 4: Conformance testing
• Part 5: Software simulation
• Part 6: Extensions for DSM-CC
• Part 7: Advanced Audio Coding (AAC)
• Part 9: Extension for real-time interface for systems decoders
• Part 10: Conformance extensions for Digital Storage Media Command and Control (DSM-CC)

The MPEG-2 and/or -4 standard defines three layers: systems, video, and audio [24–26]. The systems layer supports synchronization and interleaving of multiple compressed streams, buffer initialization and management, and time identification. For video and audio, the information is organized into access units, each representing a fundamental unit of encoding; for example, in video, an access unit will usually be a complete encoded video frame. The audio and the video layers define the syntax and semantics of the corresponding Elementary Streams (ESs). An ES is the output of an MPEG encoder and typically contains compressed digital video, compressed digital audio, digital data, and digital control data. The information corresponds to an access unit (a fundamental unit of encoding), such as a video frame. The compression is achieved using the DCT. Each ES is in turn an input to an MPEG-2 processor that accumulates the data into a stream of PES packets. A PES typically contains an integral number of ESs.
Figure A4.1 shows both the multiplex structure and the Protocol Data Unit (PDU) format. A PES packet may be a fixed- or variable-sized block, with up to 65,536 octets per block, and includes a 6-byte protocol header.

3 This second edition, published in 2000, cancels and replaces the first edition (ISO/IEC 13818-1:1996), which has been technically revised.
4 Part 8 has been withdrawn; it addressed 10-bit video.
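The 6-byte PES header just described (a 24-bit start-code prefix of 0x000001, an 8-bit stream ID, and a 16-bit packet length) can be sketched as follows. The helper names are hypothetical, and the optional extended PES header fields are omitted:

```python
import struct

def build_pes_header(stream_id, length):
    """6-byte PES header: 24-bit start-code prefix 0x000001, 8-bit stream
    ID, then a 16-bit length giving the number of bytes that follow the
    length field (zero is permitted for unbounded video streams)."""
    return b"\x00\x00\x01" + struct.pack(">BH", stream_id, length)

def parse_pes_header(data):
    """Inverse of build_pes_header; returns (stream_id, length)."""
    if data[:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start-code prefix")
    return struct.unpack(">BH", data[3:6])

hdr = build_pes_header(0xE0, 184)   # 0xE0: a video elementary stream ID
print(parse_pes_header(hdr))        # (224, 184)
```

The length field is what lets a receiver skip PES packets of uninteresting streams without decoding them, which is the "high-level accept or reject" behavior described later in this appendix.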
[Figure A4.1: Combining of Packetized Elementary Streams (PES) into a TS.]

[Figure A4.2: PES and TS multiplexing.]

As seen in the figure, and more directly in Fig. A4.2, PESs are then mapped to Transport Stream (TS) unit(s). Each MPEG-2 TS packet carries 184 octets of payload data prefixed by a 4-octet (32-bit) header (the resulting 188-byte packet size was originally chosen for compatibility with Asynchronous Transfer Mode (ATM) systems). These packets are the basic unit of data in a TS. They consist
of a sync byte (0x47), followed by flags and a 13-bit Packet Identifier (PID5). This is followed by other (some optional) transport fields; the rest of the packet consists of the payload. Figure A4.3 connects the PES and TS concepts together.

[Figure A4.3: A sequence of PESs leads to a sequence of uniform TS packets. The 4-byte TS header carries the sync byte, transport error indicator, payload unit start indicator, transport priority, 13-bit PID, scrambling control, adaptation field control, and continuity counter; an optional adaptation field may follow, used among other things for stuffing and for the PCR system time clock.]

The PID is a 13-bit field that is used to uniquely identify the stream to which the packet belongs (e.g., PES packets corresponding to an ES) generated by the multiplexer. Each MPEG-2 TS channel is uniquely identified by the PID value carried in the header of fixed-length MPEG-2 TS packets. The PID allows the receiver to identify the stream to which each received packet belongs; effectively, it allows the receiver to accept or reject PES packets at a high level without burdening the receiver with extensive processing. Often one sends only one PES (or a part of a single PES) in a TS packet (in some cases, however, a given PES packet may span several TS packets, so that the majority of TS packets contain continuation data in their payloads). Each PID contains specific video, audio, or data information.
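A minimal decoder for the 4-byte TS header described above (sync byte 0x47, three flag bits, 13-bit PID, then scrambling/adaptation/continuity fields) might look like this; the field names follow the header layout, but the function itself is an illustrative sketch:

```python
def parse_ts_header(packet):
    """Decode the 4-byte header of a 188-byte MPEG-2 TS packet."""
    if len(packet) < 4 or packet[0] != 0x47:
        raise ValueError("not aligned on a TS sync byte (0x47)")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),        # set by the demodulator on FEC failure
        "payload_unit_start": bool(b1 & 0x40),     # a PES (or PSI) packet starts here
        "transport_priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,            # 13-bit stream identifier
        "scrambling_control": (b3 >> 6) & 0x03,
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,           # per-PID 4-bit packet counter
    }

# Synthetic packet: PID 121, payload-only, payload unit start, counter 5.
pkt = bytes([0x47, 0x40, 0x79, 0x15]) + bytes(184)
print(parse_ts_header(pkt)["pid"])  # 121
```

Because the PID sits in fixed bit positions of a fixed-size header, this accept-or-reject decision is cheap enough to run on every packet, which is exactly the property the text attributes to the PID mechanism.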
Programs are groups of one or more PID streams that are related to each other. For example, a TS used in IPTV could contain five programs, to represent five video channels. Assume that each channel consists of one video stream, one or two audio streams, and metadata. A receiver wishing to tune to a particular "channel" has to decode the payload of the PIDs associated with its program. It can discard the contents of all other PIDs. The number of TS logical channels is limited to 8192, some of which are reserved; unreserved TS

5 Some also call this the Program ID.
logical channels may be used to carry audio, video, IP datagrams, or other data. Examples of systems using MPEG-2 include the DVB and Advanced Television Systems Committee (ATSC) standards for digital television.

Note 1: Ultimately an IPTV stream consists of packets of fixed size. MPEG (specifically MPEG-4) packets are aggregated into an IP packet, and then the IP packet is transmitted using IP Multicast methods. MPEG TSs are typically encapsulated in UDP and then in IP. In turn, and (only) for interworking with existing MPEG-2 systems already deployed (e.g., satellite systems and associated ground equipment supporting DTH), this IP packet needs further encapsulation, as discussed later. Note that traditional MPEG-2 approaches make use of the PID to identify content, whereas in IPTV applications, the IP Multicast address is used to identify the content; also, the latest IPTV systems make use of MPEG-4-coded PESs.

Note 2: The MPEG-2 standard defines two ways for multiplexing different elementary stream types: (i) Program Stream (PS) and (ii) Transport Stream (TS).

• An MPEG-2 PS is principally intended for storage and retrieval from storage media. It supports grouping of video, audio, and data ESs that have a common time base. Each PS consists of only one content (TV) program. The PS is used in error-free environments; for example, DVDs use the MPEG-2 PS. A PS is a group of tightly coupled PES packets referenced to the same time base.
• An MPEG-2 TS combines multiple PESs (that may or may not have a common time base) into a single stream and multiplexes these PESs into one stream, along with information for synchronizing between them. At the same time, the TS segments the PESs into smaller fixed-size TS packets. An entire video frame may be mapped into one PES packet. PES headers distinguish PES packets of various streams and also contain time stamp information.
PESs are generated by the packetization process; the payload consists of the data bytes taken sequentially from the original ES. A TS may correspond to a single TV program; this type of TS is normally called a Single Program Transport Stream (SPTS). In most cases, one or more SPTS streams are combined to form a Multiple Program Transport Stream (MPTS). This larger aggregate also contains the control information (Program Specific Information, or PSI).

A4.2 DVB (Digital Video Broadcasting)-Based Transport in Packet Networks

As we discussed in the body of this chapter, DVB-S is set up to carry MPEG-2 TS streams encapsulated with 16 bytes of Reed–Solomon FEC to create a packet that is 204 bytes long (Fig. A4.4: the DVB packet comprises a 188-byte MPEG-2 packet, that is, a 4-byte MPEG-2 header plus a 184-byte MPEG-2 payload, followed by 16 bytes of Reed–Solomon error correction; the Packet Identifier (PID) carried in the header ranges from 0x0000 to 0x1fff, with PIDs 0x0000 to 0x0020 and 0x1fff reserved). DVB-S embodies the concept of "virtual channels" in a manner analogous to ATM; virtual channels are identified by PIDs (one can think of the DVB packets as being similar to an ATM cell, but with different length and format). DVB packets are transmitted over an appropriate network. The receiver looks for specific PIDs that it has been configured to acquire (directly in the headend receiver for terrestrial redistribution purposes, in the viewer's set-top box for a DTH application, or in the set-top box via an IGMP join in an IPTV environment). Specifically, to display a channel of IPTV digital television, the DVB-based application configures the driver in the receiver to pass up to it the packets with a set of specific PIDs, for example, PID 121 containing video and PID 131 containing audio (these packets are then sent to the MPEG decoder, which is either hardware- or software-based). So, in conclusion, a receiver or demultiplexer extracts ESs from the TS in part by looking for packets identified by the same PID.

A4.3 MPEG-4 and/or Other Data Support

For satellite transmission, and to remain consistent with already existing MPEG-2 technology, MPEG-4 TSs (or other data) are further encapsulated in Multiprotocol Encapsulation (MPE) and then segmented again and placed into TS streams via a device called an IP Encapsulator (IPE; Fig. A4.5).
MPE is used to transmit datagrams that exceed the length of the DVB "cell," just as Asynchronous Transfer Mode Adaptation Layer 5 (AAL5) is used for a similar function in an ATM context. (Existing receivers, specifically IRDs, are based on hardware that works by de-enveloping MPEG-2 TSs; hence, the MPEG-4-encoded PESs are mapped to TSs at the source.)

[Figure A4.5 Pictorial view of encapsulation: MPE is applied at the uplink and removed at the receiver.]

MPE allows one to encapsulate IP packets into MPEG-2 TSs ("packets," or "cells"; Fig. A4.6). IPEs handle statistical multiplexing and facilitate coexistence. An IPE receives IP packets from an Ethernet connection, encapsulates the packets using MPE, and then maps these streams into an MPEG-2 TS. Once the device has encapsulated the data, the IPE forwards the data packets to a satellite link. Generic data (IP) for transmission over the MPEG-2 transport multiplex (or IP packets containing MPEG-4 video) is passed to an encapsulator that typically receives PDUs (Ethernet frames, IP datagrams, or other network layer packets); the encapsulator formats each PDU into a series of TS packets (usually after adding an encapsulation header) that are sent over a TS logical channel. The MPE packet has the format shown in Fig. A4.7. Figure A4.8 shows the encapsulation process.

Note: IPEs are usually not employed if the output of the layer 2 switch is connected to a router for transmission over a terrestrial network; in this case, the headend is responsible for proper downstream enveloping and distribution of the traffic to the ultimate consumer. In other, pure IP-based video environments, where DVB-S or DVB-S2 is not used (e.g., a greenfield IP network that is designed to handle just video), the TSs are included in IP packets that are then transmitted as needed (Fig. A4.9). Specifically, with the current generation of equipment, the encoder will typically generate IP packets; these have a source IP address and a unicast or multicast destination IP address. The advantage of having video in IP format is that it can be carried over a regular (pure) Local Area Network (LAN) or carrier Wide Area Network (WAN).
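The PID-based filtering described above can be sketched in a few lines. This is an illustrative sketch only (real receivers do this in hardware or in the driver, and the function names are mine); it relies only on the standard TS packet layout: 188 bytes, a 0x47 sync byte, and a 13-bit PID spanning the second and third header bytes.

```python
SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188

def parse_pid(ts_packet: bytes) -> int:
    """Extract the 13-bit PID from the 4-byte header of a 188-byte TS packet."""
    if len(ts_packet) != TS_PACKET_SIZE or ts_packet[0] != SYNC_BYTE:
        raise ValueError("not a valid MPEG-2 TS packet")
    # The PID spans the low 5 bits of byte 1 and all 8 bits of byte 2.
    return ((ts_packet[1] & 0x1F) << 8) | ts_packet[2]

def filter_program(ts_stream: bytes, wanted_pids: set) -> list:
    """Keep only the packets a receiver tuned to a given program cares about."""
    packets = [ts_stream[i:i + TS_PACKET_SIZE]
               for i in range(0, len(ts_stream), TS_PACKET_SIZE)]
    return [p for p in packets if parse_pid(p) in wanted_pids]
```

A receiver configured for the example in the text would call `filter_program(stream, {121, 131})` and hand the surviving packets to the MPEG decoder, discarding all other PIDs.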
[Figure A4.6 IPE protocol stack. Encoder output (MPEG-4 video, audio, or data) is carried as seven 188-byte TS packets of the same PID per IP packet (20-byte IP header, 8-byte UDP header, 14-byte Ethernet framing, with an IP Multicast destination address). A layer 2 switch passes this traffic to the IPE, which wraps it with a 12-byte MPE header and 4-byte MPE trailer, segments it into 188-byte TS packets (4-byte header plus 184-byte payload, with the adaptation field used for stuffing in the last packet), and adds 16 bytes of protection to form 204-byte DVB packets for the DVB-S/DVB-S2 satellite network; a terrestrial network path bypasses the IPE.]
[Figure A4.7 MPE packet: a 13-byte MPE header, a 28-byte UDP/IP header (40 bytes for TCP/IP), a 139-byte IP payload, and a 4-byte CRC; together these form the content of the "virtual channel" (aka PID).]

[Figure A4.8 Encapsulation process. An IP packet consisting of a 28-byte IP/UDP header, a 375-byte payload, and a 4-byte CRC is carried in three TS packets: the first holds the 4-byte MPEG header, a 1-byte pointer, a 12-byte MPE header, the 28-byte IP/UDP header, and 143 bytes of payload; the second holds a 4-byte MPEG header and 184 bytes of payload; the third holds a 4-byte MPEG header, the last 48 bytes of payload, the 4-byte CRC, and 132 bytes of 0xFF filler.]

Consider Fig. A4.9 again. It clearly depicts video (and audio) information being organized into PESs that are then segmented into TSs. Examining the protocol stack of Fig. A4.9, one should note that in a traditional MPEG-2 environment of DTV (either over-the-air transmission or cable TV transmission), the TSs are handled directly by an MPEG-2-ready infrastructure formally known as an MPEG-2 Transport Multiplex (see the left-hand-side stack). As explained, an MPEG-2 Transport Multiplex offers a number of parallel channels that are known as TS logical channels. Each TS logical channel is uniquely identified by the PID value that is carried in the header of each MPEG-2 TS packet. TS logical channels are independently numbered on each MPEG-2 TS multiplex (MUX). As just noted, the service provided by an MPEG-2 transport multiplex offers a number of parallel channels that correspond to logical links (forming the MPEG TS) [24, 28]. The MPEG-2 TS has been widely accepted not only for providing digital TV services but also as a subnetwork technology for building IP networks, say in cable TV–based Internet access.
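The byte accounting of the encapsulation process in Fig. A4.8 is easy to reproduce. A small sketch, using the sizes from the figure (a 1-byte pointer preceding the section in the first packet, a 12-byte MPE header, a 4-byte CRC trailer, and 184-byte TS payloads):

```python
import math

TS_PAYLOAD = 184          # 188-byte TS packet minus the 4-byte header
POINTER = 1               # pointer field in the first packet
MPE_HEADER = 12           # MPE section header
CRC = 4                   # CRC-32 trailer of the MPE section

def ts_packets_needed(ip_packet_len: int) -> tuple:
    """Return (number of TS packets, stuffing bytes in the last one)
    for an IP packet of the given length (header plus payload)."""
    section_len = MPE_HEADER + ip_packet_len + CRC   # the whole MPE section
    carried = POINTER + section_len                  # plus the pointer byte
    n = math.ceil(carried / TS_PAYLOAD)
    stuffing = n * TS_PAYLOAD - carried
    return n, stuffing

# The example of the figure: 28-byte IP/UDP header + 375-byte payload.
n, stuffing = ts_packets_needed(28 + 375)
```

For the figure's 403-byte IP packet this yields three TS packets with 132 bytes of filler in the last one, matching Fig. A4.8.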
[Figure A4.9 Simplified protocol hierarchy. Left: the traditional stack, with video, audio, or data in 184-byte units under an MPEG-2 transport (4-byte headers), carried directly by an MPEG-2-ready network together with system information. Right: the stack typical of traditional video-over-IP, with seven TS packets of the same PID per IP packet (20-byte IP header, 8-byte UDP header, 14-byte Ethernet framing) carried over a LAN/WAN.]

There may be an interest in also carrying actual IP datagrams over this MPEG-2 transport multiplex infrastructure (this may be generic IP data or IP packets emerging from MPEG-4 encoders that contain seven MPEG-4 frames). To handle this requirement, packet data for transmission over an MPEG-2 transport multiplex is passed to an IPE. This receives PDUs, such as Ethernet frames or IP packets, and formats each into a Subnetwork Data Unit (SNDU) by adding an encapsulation header and trailer. The SNDUs are subsequently fragmented into a series of TS packets. To receive IP packets over an MPEG-2 TS Multiplex, a
receiver needs to identify the specific TS multiplex (physical link) and also the TS logical channel (the PID value of a logical link). It is common for a number of MPEG-2 TS logical channels to carry SNDUs; therefore, a receiver must filter (accept) IP packets sent with a number of PID values, and must independently reassemble each SNDU. Some applications require transmission of MPEG-4 streams over a preexisting MPEG-2 infrastructure, for example, in a cable TV application. This is also done via the IPE; here the IP packets generated by the MPEG-4 encoder are considered (treated) as if they were data, as just described above in this paragraph (Fig. A4.10).

The encapsulator receives PDUs (e.g., IP packets or Ethernet frames) and formats these into SNDUs. An encapsulation (or convergence) protocol transports each SNDU over the MPEG-2 TS service and provides the appropriate mechanisms to deliver the encapsulated PDU to the receiver IP interface. In forming an SNDU, the encapsulation protocol typically adds header fields that carry protocol control information, such as the length of the SNDU, receiver address, multiplexing information, payload type, and sequence numbers. The SNDU payload is typically followed by a trailer that carries an integrity check (e.g., Cyclic Redundancy Check, CRC). When required, an SNDU may be fragmented across a number of TS packets (Figs A4.11 and A4.12).

[Figure A4.10 Encapsulator function. An MPEG-4 encoder delivers MPEG-4 digital video (usually placed in IP packets) over Ethernet to the IPE; the IPE receives the IP traffic on its Ethernet interface, segments the IP datagrams into smaller packets, places them inside MPEG-2 transport stream packets, and feeds its ASI output, together with other sources (e.g., MPEG-2 digital video, IP over MPEG-2 TS), into the multiplexer driving the MPEG-2-ready transport network. Alternatively, the encoder's IP output can be handed directly to an IP network.]

+--------+-------------------------+-----------------+
| Header |           PDU           | Integrity check |
+--------+-------------------------+-----------------+
<--------------------- SNDU ------------------------->

Figure A4.11 Encapsulation of a subnetwork IPv4 or IPv6 PDU to form an MPEG-2 payload unit.
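SNDU formation as in Fig. A4.11 can be sketched as follows. The 4-byte header layout (2-byte total length plus 2-byte payload type) is illustrative rather than the exact on-air format, and zlib's CRC-32 stands in for the integrity check; the header/PDU/trailer structure is what the figure shows.

```python
import struct
import zlib

def form_sndu(pdu: bytes, payload_type: int = 0x0800) -> bytes:
    """Wrap a PDU (e.g., an IPv4 packet) with an encapsulation header
    and a CRC-32 integrity-check trailer, as in Fig. A4.11."""
    # Illustrative header: 2-byte total SNDU length + 2-byte payload type.
    header = struct.pack("!HH", 4 + len(pdu) + 4, payload_type)
    crc = struct.pack("!I", zlib.crc32(header + pdu) & 0xFFFFFFFF)
    return header + pdu + crc

def check_sndu(sndu: bytes) -> bytes:
    """Verify the trailer and return the PDU, or raise on corruption."""
    body, trailer = sndu[:-4], sndu[-4:]
    if struct.pack("!I", zlib.crc32(body) & 0xFFFFFFFF) != trailer:
        raise ValueError("SNDU integrity check failed")
    length, _ptype = struct.unpack("!HH", body[:4])
    return body[4:4 + (length - 8)]
```

The receiver side reassembles the SNDU from its TS fragments first, then runs `check_sndu` before handing the PDU to the IP interface.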
+-----------------------------------------+
|Encap header|Subnetwork data unit (SNDU) |
+-----------------------------------------+
    /       /         /        /     /   /
+------+----------+  +------+----------+   +------+----------+
|MPEG-2| MPEG-2   |..|MPEG-2| MPEG-2   |...|MPEG-2| MPEG-2   |
|header| payload  |  |header| payload  |   |header| payload  |
+------+----------+  +------+----------+   +------+----------+

Figure A4.12 Encapsulation of a PDU (e.g., IP packet) into a series of MPEG-2 TS packets. Each TS packet carries a header with a common packet ID (PID) value denoting the MPEG-2 TS logical channel.

In summary, the standard DVB way of carrying IP datagrams in an MPEG-2 TS is to use MPE; with MPE, each IP datagram is encapsulated into one MPE section. A stream of MPE sections is then put into an ES, that is, a stream of MPEG-2 TS packets with a particular PID. Each MPE section has a 12-byte header, a 4-byte CRC (CRC-32) tail, and a payload length that is identical to the length of the IP datagram carried by the MPE section.
CHAPTER 5

3DTV/3DV IPTV Transmission Approaches

IPTV services enable advanced content viewing and navigation by consumers; the technology is rapidly emerging and becoming commercially available. IPTV is being championed by the telecom industry in particular, given the significant IP-based infrastructure these carriers already own. IPTV may be an ideal technology to support 3DTV because of the efficient network pruning supported by IP Multicast. Developers are encouraged to explore the use of IPv6 to support evolving 3DTV needs: 3DTV is a forward-looking service and hence it should make use of a forward-looking IP transport technology, specifically IPv6.

IP Multicast is also employed for control. While IP Multicast has been around for a number of years, it is now finding fertile commercial applications in the IPTV and DVB-H arenas. Applications such as datacasting (e.g., stock market or other financial data) tend to make use of large multihop networks; pruning is often employed, and nodal store-and-forward approaches are completely acceptable. Applications such as video are very sensitive to end-to-end delay, jitter, and (uncorrectable) packet loss; QoS considerations are critical. These networks tend to have fewer hops, and pruning may be somewhat trivially implemented by making use of a simplified network topology.

IPTV services enable traditional carriers to deliver SD (Standard Definition) and HD video to their customers in support of their Triple/Quadruple Play strategies. With the significant erosion in revenues from traditional voice services on wireline-originated calls (both in terms of depressed pricing and a shift to VoIP over broadband Internet services delivered over cable TV infrastructure), and with the transition of many customers from wireline to wireless services, the traditional telephone carriers find themselves in need of generating new revenues by seeking to deliver video services to their customers. Traditional phone carriers find themselves challenged in the voice arena (by VoIP and other providers); their Internet services are also challenged in the broadband Internet access arena (by cable TV companies); and their video services are nascent and challenged by a lack of deployed technology.
5.1 IPTV CONCEPTS

As described in Ref. , IPTV deals with approaches, technologies, and protocols to deliver commercial-grade SD and HD entertainment-quality real-time linear and on-demand video content over IP-based networks, while meeting all prerequisite QoS, QoE, Conditional Access (CA) (security), blackout management (for sporting events), Emergency Alert System (EAS), closed captions, parental controls, Nielsen rating collection, secondary audio channel, picture-in-picture, and guide data requirements of the content providers and/or regulatory entities. Typically, IPTV makes use of MPEG-4 encoding to deliver 200–300 SD channels and 20–30 HD channels; viewers need to be able to switch channels within 2 s or less; also, the need exists to support multiple set-top boxes/multiprogramming (say 2–4 streams) within a single domicile. IPTV is not to be confused with the simple delivery of video over an IP network (including video streaming), which has been possible for over two decades; IPTV supports all business, billing, provisioning, and content protection requirements that are associated with commercial video distribution. The IP-based service needs to be comparable to that received over cable TV or Direct Broadcast Satellite. In addition to TV sets, the content may also be delivered to a personal computer. MPEG-4, which operates at 2.5 Mbps for SD video and 8–11 Mbps for HD video, is critical to telco-based video delivery over a copper-based plant because of the bandwidth limitations of that plant, particularly when multiple simultaneous streams need to be delivered to a domicile; MPEG-2 would typically require a higher bitrate for the same perceived video quality. IP Multicast is typically employed to support IPTV.
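To see why the cited MPEG-4 rates matter on a copper plant, consider a quick budget for a domicile taking several simultaneous streams. The line-rate figure is an assumption for illustration only (actual DSL sync rates vary with loop length); the per-stream rates come from the text.

```python
SD_MBPS = 2.5          # MPEG-4 SD rate cited in the text
HD_MBPS = 11.0         # upper end of the MPEG-4 HD range
LINE_RATE_MBPS = 25.0  # assumed DSL-class downstream rate (illustrative)

def domicile_demand(n_sd: int, n_hd: int) -> float:
    """Aggregate downstream video demand for one household, in Mbps."""
    return n_sd * SD_MBPS + n_hd * HD_MBPS

# Two HD and two SD streams at once:
demand = domicile_demand(n_sd=2, n_hd=2)   # 27.0 Mbps
fits = demand <= LINE_RATE_MBPS            # False: exceeds the assumed line
```

At MPEG-2 rates (roughly double) even a more modest stream mix would fail this budget, which is the point the text makes about MPEG-4 being critical for copper delivery.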
There has been significant deployment of commercial-grade IPTV services around the world in the recent past, as seen in Table 5.1.

TABLE 5.1 Partial List of IPTV Providers in the United States and Europe

US IPTV providers
• AT&T: U-Verse TV offers 300+ channels. Features include DVR, VOD, and HD
• Verizon: FiOS TV offers 200+ channels. Features include VOD, HD, and multi-room DVR (control and watch DVR programs from multiple rooms)

European IPTV providers
• Deutsche Telekom: T-Home service includes 60+ channels; HD, VOD, and TV archives are available
• Belgacom: Belgacom TV includes 70+ channels; VOD
• France Telecom: Orange TV offers 200+ channels; HD and VOD
• Telecom Italia: Alice Home TV offers 50+ channels; VOD
• British Telecom: BT Vision service offers 40+ standard channels; DVR
• Telefonica: Imagenio service offers 70+ channels
• Swisscom: Bluewin TV offers 100+ channels; DVR and VOD
One can anticipate several phases in the deployment of IPTV, as follows:

• Phase 1: IPTV introduced by the telcos for commercial delivery of entertainment-grade video over their IP/MPLS (Multiprotocol Label Switching) networks (2007–2012).
• Phase 2: IPTV introduced by the cable TV companies for commercial delivery of entertainment-grade video over their cable infrastructure (speculative, 2012+).
• Phase 3: IPTV to morph to Internet TV for commercial delivery of any video content, but of entertainment-grade quality, over the Internet/broadband Internet access connections (2012+).

5.1.1 Multicast Operation

As noted above, the backbone may consist of (i) a pure IP network or (ii) a mixed satellite transmission link to a metropolitan headend that, in turn, uses a metropolitan (or regional) telco IP network. Applications such as video are very sensitive to end-to-end delay, jitter, and (uncorrectable) packet loss; QoS considerations are critical. These networks tend to have fewer hops, and pruning may be somewhat trivially implemented by making use of a simplified network topology.

At the logical level, there are three types of communication between systems in a(n IP) network:

• Unicast: Here, one system communicates directly to another system.
• Broadcast: Here, one system communicates to all systems.
• Multicast: Here, one system communicates to a select group of other systems.

In traditional IP networks, a packet is typically sent by a source to a single destination (unicast); alternatively, the packet can be sent to all devices on the network (broadcast). There are business and multimedia (entertainment) applications that require a multicast transmission mechanism to enable bandwidth-efficient communication between groups of devices, where information is transmitted to a single multicast address and received by any device that wishes to obtain such information.
In traditional IP networks, it is not possible to generate a single transmission of data when this data is destined for a (large) group of remote devices. There are classes of applications that require distribution of information to a defined (but possibly dynamic) set of users. IP Multicast, an extension to IP, is required to properly address these communications needs. As the term implies, IP Multicast has been developed to support efficient communication between a source and multiple remote destinations.

Multicast applications include, among others, datacasting (for example, distribution of real-time financial data), entertainment digital television over an IP network (commercial-grade IPTV), Internet radio, multipoint video conferencing, distance learning, streaming media applications, and corporate communications. Other applications include distributed interactive simulation, cloud/grid computing, and distributed video gaming (where most receivers are also senders). IP Multicast protocols and underlying technologies enable efficient distribution of data, voice, and video streams to a large population of users, ranging from hundreds to thousands to millions of users. IP Multicast technology enjoys intrinsic scalability, which is critical for these types of applications.

As an example in the IPTV arena, with the current trend toward the delivery of HDTV signals, each requiring bandwidth in the 12 Mbps range, and the consumers' desire for a large number of channels (200–300 being typical), there has to be an efficient mechanism of delivering a signal of 1–2 Gbps aggregate to a large number of remote users. If a source had to deliver 1 Gbps of signal to, say, 1 million receivers by transmitting all of this bandwidth across the core network, it would require a petabit-per-second network fabric; this is currently not possible. On the other hand, if the source could send the 1 Gbps of traffic to (say) 50 remote distribution points (for example, headends), each of which then makes use of a local distribution network to reach 20,000 subscribers, the core network only needs to support 50 Gbps, which is possible with proper design. For such reasons, IP Multicast is seen as a bandwidth-conserving technology that optimizes traffic management by simultaneously delivering a stream of information to a large population of recipients, including corporate enterprise users and residential customers. IPTV uses IP-based basic transport (where IP packets contain MPEG-4 TSs) and IP Multicast for service control and content acquisition (group membership). See Fig. 5.1 for a pictorial example.

One important design principle of IP Multicast is to allow receiver-initiated attachment (joins) to information streams, thus supporting a distributed informatics model.
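The fan-out arithmetic in the example above is worth making explicit; a toy calculation, nothing more:

```python
STREAM_GBPS = 1.0        # aggregate channel lineup, per the text
RECEIVERS = 1_000_000
HEADENDS = 50

# Unicast: the core must carry one copy of the stream per receiver.
unicast_core_gbps = STREAM_GBPS * RECEIVERS       # 1,000,000 Gbps = 1 Pbps

# Multicast: the core carries one copy per headend; each headend's
# local distribution network fans the stream out to its subscribers.
multicast_core_gbps = STREAM_GBPS * HEADENDS      # 50 Gbps
subscribers_per_headend = RECEIVERS // HEADENDS   # 20,000
```

The four-orders-of-magnitude gap between the two core figures is the "bandwidth advantage of IP Multicast" pictured in Fig. 5.1.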
A second important principle is the ability to support optimal pruning, such that the distribution of the content is streamlined by pushing replication as close to the receiver as possible. These principles enable bandwidth-efficient use of the underlying network infrastructure.

The issue of security in multicast environments is addressed via Conditional Access Systems (CAS) that provide per-program encryption (typically, but not always, symmetric encryption; also known as inner encryption) or aggregate IP-level encryption (again typically, but not always, symmetric encryption; also known as outer encryption).

Carriers have been upgrading their network infrastructure in the past few years to enhance their capability to provide QoS-managed services, such as IPTV. Specifically, legacy remote access platforms, implemented largely to support basic DSL service roll-outs (for example, supporting ATM aggregation and DSL termination), are being replaced by new broadband network gateway access technologies optimized around IP, Ethernet, and VDSL2 (Very High Bitrate Digital Subscriber Line 2). (Currently, a typical digital TV package may consist of 200–250 SD signals each operating at 3 Mbps, and 30 HD signals each operating at 12 Mbps. This equates to about 1 Gbps; as more HDTV signals are added, the bandwidth will reach the range of 2 Gbps.) These services and capabilities are delivered with multiservice routers on the network edge.

[Figure 5.1 Bandwidth advantage of IP Multicast: with traditional unicast IP, the source S sends one copy of the stream per receiver R; with multicast IP, S sends a single copy that the network replicates toward the receivers.]

Viewer-initiated program selection is achieved using IGMP, specifically with the Join Group Request message. (IGMP v2 messages include Create Group Request, Create Group Reply, Join Group Request, Join Group Reply, Leave Group (LG) Request, LG Reply, Confirm Group Request, and Confirm Group Reply.) Multicast communication is based on the construct of a group of receivers (hosts) that have an interest in receiving a particular stream of information, be it voice, video, or data. There are no physical or geographical constraints or boundaries to belonging to a group, as long as the hosts have (broadband) network connectivity. The connectivity of the receivers can be heterogeneous in nature, in terms of bandwidth and connecting infrastructure (for example, receivers connected over the Internet), or homogeneous (for example, IPTV or DVB-H users). Hosts that are desirous of receiving data intended for a particular group join the group using a group management protocol: hosts/receivers must become explicit members of the group to receive the data stream, but such membership may be ephemeral and/or dynamic. Groups of IP hosts that have joined the group and wish to receive traffic sent to this specific group are identified by multicast addresses.

Multicast routing protocols belong to one of two categories: Dense-Mode (DM) protocols and Sparse-Mode (SM) protocols.

• DM protocols are designed on the assumption that the majority of routers in the network will need to distribute multicast traffic for each multicast group. DM protocols build distribution trees by initially flooding the entire network and then pruning out the (presumably small number of) paths without active receivers. DM protocols are used in LAN environments, where bandwidth considerations are less important, but can also be used in WANs in special cases (for example, where the backbone is a one-hop broadcast medium such as a satellite beam with wide geographic illumination, as in some IPTV applications).
• SM protocols are designed on the assumption that only a few routers in the network will need to distribute multicast traffic for each multicast group. SM protocols start out with an empty distribution tree and add drop-off branches only upon explicit requests from receivers to join the distribution. SM protocols are generally used in WAN environments, where bandwidth considerations are important.
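On the host side, the explicit group membership described above is typically requested through the sockets API; setting `IP_ADD_MEMBERSHIP` causes the kernel to emit the corresponding IGMP membership report on the local subnet. A minimal receiver sketch (the group address and port are illustrative, not standardized values):

```python
import socket
import struct

def open_iptv_receiver(group: str, port: int) -> socket.socket:
    """Bind a UDP socket and join a multicast group; the join triggers
    an IGMP membership report toward the last-hop router."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: the group address plus the local interface (INADDR_ANY).
    mreq = struct.pack("=4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

# Usage (illustrative group/port; one datagram typically carries 7 TS packets):
# sock = open_iptv_receiver("239.1.1.1", 5004)
# data, addr = sock.recvfrom(7 * 188)
```

Closing the socket (or setting `IP_DROP_MEMBERSHIP`) corresponds to the leave behavior discussed below, letting the router prune the subnet from the distribution tree.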
For IP Multicast there are several multicast routing protocols that can be employed to acquire real-time topological and membership information for active groups. Routing protocols that may be utilized include Protocol-Independent Multicast (PIM), the Distance Vector Multicast Routing Protocol (DVMRP), Multicast Open Shortest Path First (MOSPF), and Core-Based Trees (CBT). Multicast routing protocols build distribution trees by examining a routing (forwarding) table that contains unicast reachability information. PIM and CBT use the unicast forwarding table of the router. Other protocols use their specific unicast reachability routing tables; for example, DVMRP uses its distance vector routing protocol to determine how to create source-based distribution trees, while MOSPF utilizes its link state table to create source-based distribution trees. MOSPF, DVMRP, and PIM-DM are dense-mode routing protocols, while CBT and PIM-SM are sparse-mode routing protocols. PIM is currently the most widely used protocol.

As noted, IGMP (versions 1, 2, and 3) is the protocol used by Internet Protocol Version 4 (IPv4) hosts to communicate multicast group membership states to multicast routers. IGMP is used to dynamically register individual hosts/receivers on a particular local subnet (for example, a LAN) to a multicast group. IGMP version 1 defined the basic mechanism. It supports a Membership Query (MQ) message and a Membership Report (MR) message. Most implementations at press time employed IGMP version 2; it adds LG messages. Version 3 adds source awareness, allowing the inclusion or exclusion of sources. IGMP allows group membership lists to be dynamically maintained. The host (user) sends an IGMP "report," or join, to the router to be included in the group. Periodically, the router sends a "query" to learn which hosts (users) are still part of a group. If a host wishes to continue its group membership, it responds to the query with a "report." If the host does not send a "report," the router prunes the group list to delete this host; this eliminates unnecessary network transmissions. With IGMP v2, a host may send an LG message to alert the router that it is no longer participating in a multicast group; this allows the router to prune the group list to delete this host before the next query is scheduled, thereby minimizing the time period during which unneeded transmissions are forwarded to the network.

The IGMP messages for IGMP version 2 are shown in Fig. 5.2. The message comprises an eight-octet structure.

[Figure 5.2 IGMP v2 message format: an 8-bit Type field; an 8-bit Max Response Time field, used in membership query messages, specifying the maximum allowed time a host can wait before sending a corresponding report; a 16-bit checksum; and a 32-bit Class D group address, used in a report packet. Type values: 0x11 specifies a membership query packet, sent by a multicast router; 0x12 an IGMP v1 membership report packet, sent by a multicast host to signal participation in a specific multicast host group; 0x16 an IGMP v2 membership report packet; and 0x17 a leave group packet, sent by a multicast host.]
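The eight-octet IGMP v2 message can be assembled directly from the fields of Fig. 5.2. A sketch (the group address is illustrative; the resulting datagram would be carried with IP protocol number 2, as noted in the text):

```python
import struct

def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def igmpv2_message(msg_type: int, max_resp_time: int, group: str) -> bytes:
    """Build the 8-octet IGMP v2 message of Fig. 5.2:
    type, max response time, checksum, Class D group address."""
    group_bytes = bytes(int(octet) for octet in group.split("."))
    draft = struct.pack("!BBH4s", msg_type, max_resp_time, 0, group_bytes)
    return struct.pack("!BBH4s", msg_type, max_resp_time,
                       inet_checksum(draft), group_bytes)

# A membership report ("join") for an illustrative group:
report = igmpv2_message(0x16, 0, "239.1.1.1")
```

A standard property of the one's-complement checksum is that recomputing it over the finished message yields zero, which is how a router validates a received report.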
During transmission, IGMP messages are encapsulated in IP datagrams: the Protocol Type field of the IP header is set to 2 to indicate that an IGMP packet is being carried (IGMP is one of many protocols that can be specified in this field). An IGMP v2 PDU thus consists of a 20-byte IP header and 8 bytes of IGMP.

Some of the areas that require consideration and technical support to develop and deploy IPTV systems include the following, among many others:

• content aggregation;
• content encoding (e.g., AVC/H.264/MPEG-4 Part 10, MPEG-2, SD, HD, Serial Digital Interface (SDI), Asynchronous Serial Interface (ASI), Layer 1 switching/routing);
• audio management;
• digital rights management/CA: encryption (DVB-CSA, AES (Advanced Encryption Standard)); key management schemes (basically, CAS); transport rights;
• encapsulation (MPEG-2 transport stream distribution);
• backbone distribution, satellite or terrestrial (DVB-S2, QPSK, 8-PSK, FEC, and turbo coding for satellite; SONET (Synchronous Optical Network)/SDH/OTN (Synchronous Digital Hierarchy/Optical Transport Network) for terrestrial);
• metro-level distribution;
• last-mile distribution (LAN/WAN/optics, GbE (Gigabit Ethernet), DSL/FTTH);
• multicast protocol mechanisms (IP Multicast);
• QoS, backbone distribution;
• QoS, metro-level distribution;
• QoS, last-mile distribution;
• QoS, channel surfing;
• Set-Top Box (STB)/middleware;
• QoE;
• Electronic Program Guide (EPG);
• blackouts;
• service provisioning/billing, service management;
• advanced video services (e.g., PDR and VOD);
• management and confidence monitoring;
• triple play/quadruple play.

5.1.2 Backbone

Figures 5.3–5.6 depict typical architectures for linear IPTV.

Figure 5.3 shows a content aggregator preparing the content at a single source S for terrestrial distribution to multiple remote telcos. This example depicts the telcos acquiring the service from an aggregator/distributor, rather than performing that fairly complex function on their own, since it can be fairly expensive to architect, develop, set up, and maintain. The operator must sign an agreement with each content provider; hundreds of agreements are therefore required to cover the available channels and VOD content. Aggregators provide the content to the operator, typically over satellite delivery, and do a lot of the signal normalization and CA work.
However, an additional per-channel agreement with one or more content aggregators is also needed.

Note: This figure shows DSL delivery, likely ADSL2, but FTTH can also be used.

Note: This figure does not show the middleware server, which may be either distributed at the telco headend or centralized at the content aggregator.
[Figure 5.3 Typical terrestrial-based single-source IPTV system. 2D and/or 3D content sources (Content 1 through Content n) feed encryptors driven by conditional access systems with control word generators; the traffic passes through firewalls onto the terrestrial network (operating in dense or sparse–dense multicast mode) toward the remote telcos' DSLAMs, with the CA system, control word generator, encryptor, and firewall duplicated for resilience.]

Note: This figure does not show the content acquisition; the uniform transcoding (e.g., using MPEG-4) is only hinted at by the device at the far left.

Note: This figure does not show the specifics of how the ECMs (Entitlement Control Messages) and EMMs (Entitlement Management Messages) that support the CA function are distributed resiliently. This is typically done in-band for the ECMs and out-of-band (e.g., using a Virtual Private Network, or VPN, over the Internet) for the EMMs.

Note: This figure does not show the Video On Demand (VOD) overlay that is deployed over the same infrastructure to deliver this and other advanced services.
Figure 5.4 Typical satellite-based single-source IPTV system.

Note: This figure does not show a blackout management system, which is needed to support substitution of programming for local sports events.

Note: This figure does not show how the Tribune programming data is injected into the IPTV system, which is needed for scheduling/programming support.

Figure 5.4 is an architecture that is basically similar to that of Fig. 5.3, but the distribution to the remote telcos is done via a satellite broadcast technology. Satellite delivery is typical of how cable TV operators today receive their signals from various media content producers (e.g., ABC/Disney, CNN, UPN, Discovery, and A&E). In the case of the cable TV/Multiple Systems Operators (MSOs), the operator would typically have (multiple) satellite antenna(s) accessing multiple transponders on a satellite or on multiple satellites, and then combine these signals for distribution. See Fig. 5.5 for a pictorial example. In contrast, in the architecture of Fig. 5.6, the operator will need only one receiver antenna, because the signal
aggregation (CA, middleware administration, etc.) is done at the central point of content aggregation.

Figure 5.5 Disadvantages of distributed-source IPTV: requires dish farms at each telco and for all ancillary subsystems.

Zooming in a bit, the technology elements (subsystems) involved in linear IPTV include the following:

• content aggregation,
• uniform transcoding,
• CA management,
• encapsulation,
• long-haul distribution,
• local distribution,
• middleware,
• STBs,
• catcher (for VOD services).
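Several of these subsystems hand the linear streams to the IP multicast distribution layer discussed in this chapter. From the receiver side, tuning a channel amounts to joining that channel's multicast group; the join emits the IGMP membership report that snooping switches use to limit flooding. A minimal receiver sketch (the group and port values are illustrative; 239.0.0.0/8 is the administratively scoped IPv4 multicast range commonly used for IPTV):

```python
import socket
import struct

GROUP = "239.1.1.1"   # illustrative administratively scoped multicast group
PORT = 5500           # illustrative UDP port carrying the MPEG-2 TS stream

def join_iptv_channel(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join the channel's multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The join triggers an IGMP membership report; IGMP-snooping switches
    # use these reports to forward the stream only to ports with members.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

try:
    sock = join_iptv_channel(GROUP, PORT)
    # A real STB would now loop, reading 7 TS packets (1316 bytes) per datagram:
    # data, addr = sock.recvfrom(1500)
    sock.close()
except OSError as exc:
    # Hosts without a multicast-capable route will refuse the join.
    print(f"multicast join not possible here: {exc}")
```

Channel surfing then becomes an IGMP leave on the old group followed by a join on the new one, which is why join/leave latency figures directly into the QoS-for-channel-surfing item above.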
Figure 5.6 Advantages of single-source IPTV: obviates need for dish farms at each telco.

Each of these technologies/subcomponents has its own specific design requirements, architectures, and considerations. Furthermore, these systems have to interoperate in an end-to-end complete solution that has a high QoE for the user, is easy to manage, and is reliable.

In turn, each of these subsystems can be viewed as a vendor-provided platform. Different vendors have different product solutions to support these subsystems; generally, no one vendor has a true end-to-end solution. Hence, each of the following can be seen as a subsystem platform in its own right:

• content aggregation,
• uniform transcoding,
• CA management,
• encapsulation,
• long-haul distribution,
• local distribution,
• middleware,
• STBs,
• catcher (for VOD services).

5.1.3 Access

The local distribution network is typically a high-quality routed network with very tightly controlled latency, jitter, and packet loss. It is typically comprised of a metropolitan core tier and a consumer distribution tier (Fig. 5.7).

Figure 5.7 Distribution networks.

In the metropolitan core tier, IPTV is generally transmitted using the telco's private "carrier-grade" IP network. The network engine can be pure IP-based, MPLS-based (layer "2.5"), metro Ethernet-based (layer 2), or optical SONET/OTN-based (layer 1), or a combination thereof. A (private) wireless network such as WiMAX can also be used. The backbone network supports IP multicast, very typically PIM-DM or PIM Sparse–Dense.

It is important to keep the telco-level IP network (the metropolitan core tier) streamlined, with as few routed hops as possible, plenty of bandwidth on the links, and high-power nodal routers, in order to meet the QoS
requirements of IPTV. Otherwise pixelation, tiling, waterfall effects, and even blank screens will be an issue. It is important to properly size all Layer 2 and Layer 3 devices in the network. It is also important to keep multicast traffic from "splashing back" and flooding unrelated ports; IGMP snooping and other techniques may be appropriate.

The consumer distribution tier, the final leg, is generally (but not always) DSL-based at this time (e.g., VDSL or ADSL2+); other technologies such as PON (Passive Optical Network) may also be used (Table 5.2).

TABLE 5.2 Consumer Distribution Tier

"Classical": Digital Subscriber Line (DSL) delivers digital data over a copper connection, typically using the existing local loop. There are multiple DSL variants, with ADSL2 and ADSL2+ being the most prevalent. DSL is distance-sensitive and has limited bandwidth. As a result, DSL often cannot be used alone; fiber must be deployed to connect to a Digital Subscriber Line Access Multiplexer (DSLAM) located in an outside-plant cabinet.

Under deployment:
• Fiber to the Neighborhood (FTTN, also referred to as Fiber To The Node): Fiber is extended to each neighborhood where IPTV service is to be supported. A small-size DSLAM in each neighborhood supports a few dozen subscribers.
• Fiber-to-the-Curb (FTTC): Fiber is extended to within (typically) less than 1/10th of a mile from the subscriber site. Each fiber typically supports one to three subscribers.
• Fiber-to-the-Premises/Home/Subscriber/Business (FTTP, FTTH, FTTS, FTTB): Fiber reaches the subscriber site.

New/future:
• Passive Optical Network (PON) technology can be used to deliver service using end-to-end fiber. A single fiber emanates from the Central Office, and a passive splitter in the outside plant splits the signal to support multiple subscribers. Broadband PON (BPON) supports up to 32 subscribers per port, while Gigabit PON (GPON) supports up to 128 subscribers per port.
• Fixed wireless WiMAX: Note that WiMAX supports only 17 Mbps of shared bandwidth over a 2.5-mile radius (and less at greater distances), and is, therefore, rather limited.

Cable operators: Hybrid Fiber Coax (HFC) is the traditional technology used by cable operators. Fiber is used for the first section, from the headend to the subscriber's neighborhood. The link is then converted to coax for the remainder of the connection, terminating at the subscriber premises.

A bandwidth in the 20–50 Mbps range is generally desirable for delivery of IPTV services. For
example, the simultaneous viewing of an HD channel along with two SD channels would require about 17 Mbps; Internet access would require additional bandwidth. Therefore, the 20 Mbps is seen as a lower bound on the bandwidth. In the United States, Verizon is implementing Fiber-to-the-Premises (FTTP) technologies, delivering fiber to the subscriber's domicile; this supports high bandwidth, but it requires significant investments. AT&T is implementing Fiber-to-the-Curb (FTTC) in some markets, using existing copper for only the last 1/10th of a mile, and Fiber-to-the-Node (FTTN) in other markets, terminating the fiber run within a few thousand feet of the subscriber. These approaches lower the up-front cost but limit the total bandwidth.

As noted, IPTV as delivered by the telephone carriers may use PON technology, as an FTTH implementation technology, or perhaps VDSL2. However, if loop investments are made by these carriers, it is likely that they will be in favor of FTTH. VDSL2 may find a use in Multidwelling Units (MDUs), as we note below. The VDSL2 standard, ITU G.993.2, is an enhancement to G.993.1 (VDSL). It uses about 30 MHz of spectrum (versus 12 MHz in VDSL) and thus allows more data to be sent at higher speeds and over longer distances. VDSL2 utilizes up to 30 MHz of bandwidth to provide speeds of 100 Mbps both downstream and upstream within 1000 ft. Data rates in excess of 25 Mbps are available for distances up to 4000 ft (Fig. 5.8); Fig. 5.9 depicts, for illustrative purposes, test results for Zhone's VDSL2 products [2].

Figure 5.8 VDSL2 aggregate channel capacity as a function of the link length.

VDSL2 technology can handle, say, three simultaneous HDTV streams (for example, according to the firm GigaOm Research, the average US home has 3.1 televisions). Of course, there is the issue
that many homes in the United States are too far from the Central Office. The VDSL2 standard defines a set of profiles (Table 5.3) that can be used in different VDSL deployment architectures; ITU G.993.2 extends the North American frequency range from 12 to 30 MHz.

Figure 5.9 Actual VDSL2 downstream/upstream line rates versus distance, by profile (8a–8d, 12a–12b, 17a, and 30a; DS = downstream, US = upstream).

For example, carriers such as Verizon Communications may use VDSL2 for risers in MDUs to bring FTTH service into these buildings. The carrier has been using relatively inexpensive Optical Network Terminals or ONTs (also called Optical Network Units (ONUs)2) for Single Family Units (SFUs), initially using Broadband Passive Optical Network or BPON, but now also seeing Gigabit Passive Optical Network or GPON deployment. Using this arrangement, it is not excessively expensive to bring the fiber service to the living unit of an SFU.

2 ONU is IEEE terminology and ONT is ITU-T terminology.
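The per-household arithmetic quoted earlier (an HD channel plus two SD channels requiring about 17 Mbps) can be reproduced with a simple bandwidth budget. The per-stream rates below are illustrative assumptions (roughly 9 Mbps for an H.264-era HD stream and 4 Mbps for SD), not figures from the text:

```python
# Illustrative household bandwidth budget for IPTV access planning.
# Per-stream rates are assumptions (H.264-era ballpark), not normative values.
STREAM_RATE_MBPS = {"hd": 9.0, "sd": 4.0}

def household_budget(hd_streams: int, sd_streams: int,
                     internet_mbps: float = 0.0) -> float:
    """Sum the downstream bandwidth needed for simultaneous viewing plus data."""
    video = (hd_streams * STREAM_RATE_MBPS["hd"]
             + sd_streams * STREAM_RATE_MBPS["sd"])
    return video + internet_mbps

# One HD channel plus two SD channels, as in the text:
print(household_budget(1, 2))        # 17.0 Mbps
# Add a 5 Mbps Internet allowance:
print(household_budget(1, 2, 5.0))   # 22.0 Mbps
```

Under these assumptions a 20 Mbps loop is indeed only a lower bound once Internet access is included, which is consistent with the 20–50 Mbps target range given above.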
TABLE 5.3 VDSL2 Profiles (regional relevance: North America, Europe, Asia)

Profile group:               8a, 8b, 8c, 8d   12a, 12b   17a    30a
Bandwidth (MHz):             8.5              12         17.7   30
Subcarrier spacing (kHz):    4.312            4.312      4.312  8.625
Maximum downstream
throughput (Mbps):           50               68         100    200

However, tenants of MDUs are more expensive to service because of the cost of pulling the fiber up the risers. Here is where DSL technologies still have some play: on the link between the basement and the apartment unit. For BPON, it is all VDSL1; for GPON, it is all VDSL2. The carrier will site the equipment in the MDUs so that the farthest tenant is around 500 ft away; this achieves speeds of around 35 Mbps downstream and 10 Mbps upstream on VDSL1 and BPON. On GPON/VDSL2, the carrier expects to achieve 75 Mbps downstream (Fig. 5.10).

PON is the leading FTTH technology3 (Fig. 5.11). This approach differs from most of the telecommunications networks in place today by featuring "passive" operation. Active networks such as DSL, VDSL, and cable have active components in the network backbone equipment, in the central office, in the neighborhood network infrastructure, and in the customer premises equipment. PONs employ only passive light transmission components in the neighborhood infrastructure; active components are located only in the central office and the customer premises equipment. The elimination of active components means that the access network consists of one bidirectional light source and a number of passive splitters that divide the data stream into the individual links to each customer. At the central office, the termination point is the PON's Optical Line Terminal (OLT) equipment. Between the OLT and the customer's ONT/ONUs one finds the PON; the PON is comprised of fiber links and passive splitters and couplers.

APON, BPON, GPON, EPON, and GE-PON. These represent various flavors of PON technology.
Asynchronous Transfer Mode Passive Optical Network (APON) and BPON are the same specification, commonly referred to as BPON. BPON is the oldest PON standard, defined in the mid-1990s, and while there is an installed base of BPON, most of the new market deployment focus is now on Ethernet Passive Optical Network (EPON)/Gigabit Ethernet Passive Optical Network (GE-PON). GE-PON and EPON are different names for the same specification, which is defined by the IEEE 802.3ah Ethernet in the First Mile (EFM) standard ratified in 2004. This is the current standardized high-volume solution among PON technologies. GPON was being standardized as the ITU-T G.984 recommendation and is attracting interest in North America and elsewhere, but with no final standard. GPON devices have just been announced, and there is no volume deployment as yet.

3 This section is based on material from Ref. [3].
Figure 5.10 MDU use of VDSL2. Zhone's VDSL2 products shown for illustrative purposes.

Differences between BPON, GPON, and GE-PON. One important distinction between the standards is operational speed. BPON is relatively low speed, with 155 Mbps upstream/622 Mbps downstream operation. GE-PON/EPON supports 1.0 Gbps symmetrical operation. GPON supports 2.5/1.25 Gbps asymmetrical operation. Another key distinction is the protocol support for transport of data packets between access network equipment. BPON is based on ATM, GE-PON uses native Ethernet, and GPON supports ATM, Ethernet, and Wavelength Division Multiplexing (WDM) using a superset multiprotocol layer. BPON suffers from the very aggressive optical timing of ATM and the high complexity of the ATM transport layer. ATM-based FTTH solutions face a number of problems posed by (i) the provisioning process (which requires ATM-based central office equipment); (ii) the complexity (in timing requirements and protocol complexity); and (iii) the cost of components. This cost is exacerbated by the relatively small market for traditional ATM equipment used in the backbone telecommunications network. GPON is still evolving; the final specification of GPON is still being discussed by the ITU-T and Full Service Access Network
Figure 5.11 PON: the Optical Line Terminal (OLT) at the central office connects, through passive optical splitters, to Optical Network Terminals/Units (ONT/ONU) at homes and offices; a residential gateway (voice over IP, data) and a CATV overlay service are also shown.
(FSAN) bodies. By definition, it requires the complexity of supporting multiple protocols through translation to the native Generic Encapsulation Method (GEM) transport layer, which, through emulation, provides support for ATM, Ethernet, and WDM protocols. This added complexity and the lack of standard low-cost 2.5/1.25 Gbps optical components have delayed industry development of low-cost, high-volume GPON devices. GE-PON or EFM has been ratified as the IEEE 802.3ah EFM standard and is already widely deployed in Asia. It uses Ethernet as its native protocol and simplifies timing and lowers costs by using symmetrical 1 Gbps data streams over standard 1 Gbps Ethernet optical components. Similar to other Ethernet equipment found in the extended network, Ethernet-based FTTH equipment is much lower in cost relative to ATM-based equipment, and the streamlined protocol support for an extended Ethernet protocol simplifies development. Table 5.4 compares the technologies.

TABLE 5.4 PON Comparison

Attribute                    BPON (APON)        GE-PON (EPON)       GPON
Speed: upstream/downstream   155/622 Mbps       1.0/1.0 Gbps        1.25/2.5 Gbps
Native protocol              ATM                Ethernet            GEM
Complexity                   High               Low                 High
Cost                         High               Low                 Undetermined
Standards body               ITU-T              IEEE                ITU-T
Standard complete            Yes, 1995          Yes, 2004           No
Volume deployment            Yes, in 100,000s   Yes, in 1,000,000s  No
Primary deployment area      North America      Asia                Not applicable

5.2 IPv6 CONCEPTS

While it is likely that initially 3DTV will be delivered by traditional transport mechanisms, including DVB over DTH systems, recently some research efforts have been focused on delivery (streaming) of 3DTV using IP. IP can be used in IPTV systems or over a shared IP infrastructure, whether a private network (here shared with other applications) or the Internet (here shared with a multitude of other users and applications). (Some studies have also been undertaken of late on the capabilities of DVB-H to broadcast stereo-video streams.)
However, it seems that the focus so far has been on IPv4; the industry is encouraged to assess the capabilities of IPv6. While this topic is partially tangential to a core 3DTV discussion, the abundant literature on proposals for packet-based delivery of future 3DTV (including but not limited to Refs [4–13]) makes the issue relevant. IPv6, when used with header compression, is expected to be a very useful technology to support IPTV in general and 3DTV in particular in the future. For a general discussion of IPTV and DVB-H, the reader may refer to Ref. [1], among other references.
IPv6, defined in the mid-1990s in IETF Request for Comments (RFC) 2460, "Internet Protocol, Version 6 (IPv6) Specification," and a host of other more recent RFCs, is an "improved, streamlined, successor version" of IPv4.4 Because of market pull from the Office of Management and Budget's mandate that 24 major federal agencies in the US Government (USG) be IPv6-ready by June 30, 2008, and because of market pull from European and Asian institutions, IPv6 is expected to see gradual deployment from this point forward and in the coming decade. With IPv6 already gaining momentum globally, with major interest and activity in Europe and Asia and also some traction in the United States, the expectation is that in the next few years a (slow) transition to this new protocol will occur worldwide. An IP-based infrastructure has now become the ubiquitous underlying architecture for commercial, institutional, and USG/Other (non-US) Government (OG) communications and services functions. IPv6 is expected to be the next step in the industry's evolution over the past 50 years from analog, to digital, to packet, to broadband. As an example of IPv6 deployment underway, Europe has set the objective to widely implement IPv6 by 2010; the goal is that at least 25% of users should be able to connect to the IPv6 Internet and to access their most important content and service providers without noticing a major difference when compared to IPv4.

IPv6 offers the potential of achieving increased scalability, reachability, end-to-end interworking, QoS, and commercial-grade robustness for data communication, mobile connectivity, and VoIP/triple-play networks. The current version of IP, IPv4, has been in use successfully for almost 30 years and poses some challenges in supporting emerging demands for address space cardinality, high-density mobility, multimedia, and strong security.
This is particularly true in developing domestic and defense department applications utilizing peer-to-peer networking. IPv6 is an improved version of IP that is designed to coexist with IPv4 while providing better internetworking capabilities than IPv4 [14–17].

When the current version, IPv4, was conceived in the mid-1970s and defined soon thereafter (1981), it provided just over 4 billion addresses; that is not enough to provide each person on the planet with one address, without even considering the myriad of other devices and device modules needing addressability (such as, but not limited to, over 3 billion cellphones). Additionally, 74% of the IPv4 address space has been assigned to North American organizations. The goal of developers is to be able to assign IP addresses to a new class of Internet-capable devices: mobile phones, car navigation systems, home appliances, industrial equipment, and other devices (such as sensors and Body Area Network medical devices). All of these devices can then be linked together, constantly communicating, even in wireless mode. Projections show that the current generation of the Internet will "run out of space" in the near future (2010/2011) if IPv6 is not adopted around the world. IPv6 is an essential technology for ambient intelligence and will be a key driver for a multitude of new, innovative mobile/wireless applications and services.

4 IPv6 was originally defined in RFC 1883, RFC 1884, and RFC 1885, December 1995. RFC 2460 obsoletes RFC 1883.
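The address-space comparison just made is easy to check: IPv4's 32-bit addresses yield 2^32 (just over 4 billion), while IPv6's 128-bit addresses yield 2^128. A quick sketch (the world-population figure is an assumption for illustration, roughly the circa-2010 value):

```python
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"IPv4 addresses: {ipv4_total:,}")    # 4,294,967,296
print(f"IPv6 addresses: {ipv6_total:.3e}")  # on the order of 3.4e+38

# Assume ~6.8 billion people (circa 2010) purely for illustration:
population = 6_800_000_000
print(ipv4_total / population)               # under 1 address per person
print(f"{ipv6_total / population:.3e}")      # ~5e+28 addresses per person
```

The point of the arithmetic is that IPv4 cannot even cover one address per person, whereas IPv6 leaves ample room for sensors, appliances, and vehicle-mounted devices in addition to conventional hosts.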
IPv6 was initially developed in the early 1990s because of the anticipated need for more end-system addresses based on anticipated Internet growth, encompassing mobile phone deployment, smart home appliances, and billions of new users in developing countries (e.g., in China and India). New technologies and applications such as VoIP, "always-on access" (e.g., DSL and cable), Ethernet-to-the-home, converged networks, and evolving ubiquitous computing applications will continue to drive this need even more in the next few years.

IPv6 features, in comparison with IPv4, include the following:

• Expanded Addressing Capabilities: IPv6 increases the IP address size from 32 bits to 128 bits to support more levels in the addressing hierarchy, a much greater number of addressable nodes, and simpler autoconfiguration of addresses. The scalability of multicast routing is improved by adding a "scope" field to multicast addresses. A new type of address called an "anycast address" is also defined, to be used to send a packet to any one of a group of nodes.

• Header Format Simplification: Some IPv4 header fields have been dropped or made optional, to reduce the common-case processing cost of packet handling and to limit the bandwidth cost of the IPv6 header.

• Authentication and Privacy Capabilities: In IPv6, security is built in as part of the protocol suite: extensions to support authentication, data integrity (encryption), and (optional) data confidentiality are specified for IPv6. The security features of IPv6 are described in the Security Architecture for the Internet Protocol, RFC 2401 [21], along with RFC 2402 [22] and RFC 2406 [23]; Internet Protocol Security (IPsec), defined in these RFCs, is required (mandatory). IPsec is a set of protocols and related mechanisms that supports confidentiality and integrity.
(IPsec was originally developed as part of the IPv6 specification, but owing to the need for security in the IPv4 environment, it has also been adapted for IPv4.)

• Flow Labeling Capability: A new feature is added to enable the labeling of packets belonging to particular traffic "flows" for which the sender requests special handling, such as non-default quality of service or "real-time" service. Services such as VoIP and IP-based entertainment video delivery (IPTV) are becoming broadly deployed, and flow labeling, especially in the network core, can be very beneficial.

• Improved Support for Extensions and Options: Changes in the way IP header options are encoded allow for more efficient forwarding, less stringent limits on the length of options, and greater flexibility for introducing new options in the future.

End systems (such as PCs and servers), network elements (customer-owned and/or carrier-owned), and (perhaps) applications need to be IPv6-aware to communicate in the IPv6 environment. IPv6 has been enabled on many computing platforms. At this juncture, many operating systems come with IPv6 enabled by default;
IPv6-ready Operating Systems (OS) include, but are not limited to, Mac OS X, OpenBSD, NetBSD, FreeBSD, Linux, Windows Vista, Windows XP (Service Pack 2), Windows 2003 Server, and Windows 2008 Server. Java began supporting IPv6 with J2SE 1.4 (in 2002) on Solaris and Linux; support for IPv6 on Windows was added with J2SE 1.5. Other languages, such as C and C++, also support IPv6. At this time, the number of applications with native IPv6 support is significant, given that most important networking applications provide native IPv6 support. Hardware vendors including Apple Computer, Cisco Systems, HP, Hitachi, IBM, and Microsoft support IPv6. One should note that IPv6 was designed with security in mind, but at the current time its implementation and deployment are (much) less mature than is the case for IPv4. When IPv4 was developed in the early 1980s, security was not a consideration; since then, a number of mechanisms have been added to address security considerations in IP. When IPv6 was developed in the early-to-mid 1990s, security was a consideration; hence, a number of mechanisms have been built into the protocol from the get-go to furnish security capabilities to IP.5

A presentation delivered during an open session at the July 2007 ICANN Public Meeting in San Juan, Puerto Rico made note of the accelerated depletion rate of IPv4 addresses and the growing difficulties the Regional Internet Registries (RIRs) are experiencing in allocating contiguous address blocks of sufficient size to service providers. Furthermore, the fragmentation in the IPv4 address space is taxing and stressing the global routing fabric, and the near-term expectation is that the RIRs will impose more restrictive IPv4 allocation policies and promote a rapid adoption of IPv6 addresses. The IPv4 address space is expected to run out by 2012.6

Appendix A5 provides some detailed information on IPv6.

REFERENCES

1. Minoli D. IP multicast with applications to IPTV and mobile DVB-H. New York: Wiley/IEEE Press; 2008.
2.
Zhone Technologies. Zhone VDSL2 technology. Oakland (CA): Zhone Technologies, Inc.; Nov 2009.
3. PMC-Sierra. FTTH: Fiber to the Home overview. Whitepaper. Santa Clara (CA): PMC-Sierra; 2009.
4. Gurses E, Akar GB, Akar N. Optimal packet scheduling and rate control for video streaming. SPIE Visual Communications and Image Processing (VCIP); Jan 2007.

5 Some purists will argue (perhaps as an exercise in semantics) that, since IPsec is available also to IPv4, IPv6 and IPv4 have the same level of security. We take the approach in this text that, since the use of IPsec is mandated as required in IPv6 while it is optional in IPv4, at the practical, actual level "IPv6 is more secure."

6 There has been talk about reclaiming unused IPv4 space, but it would be a huge undertaking. Reclaiming some portion of the IPv4 space would not help with the goal of providing an addressable IP address to appliances, cell phones, sensors (such as Smart Dust), surveillance cameras, Body Area Network devices, Unmanned Aerial Vehicles, and so on.
5. Kurutepe E, Civanlar MR, Tekalp AM. Client-driven selective streaming of multi-view video for interactive 3DTV. Submitted to IEEE Trans. CSVT; Dec 2006.
6. Kurutepe E, Civanlar MR, Tekalp AM. Interactive transport of multi-view videos for 3DTV applications. J Zhejiang Univ - Sci A: Appl Phys Eng 2006; 7(5): 830–836.
7. Petrovic G, de With PHN. Near-future streaming framework for 3D TV applications. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME); Jul 2006; Toronto, Canada. pp. 1881–1884.
8. Tekalp AM, Kurutepe E, Civanlar MR. 3DTV over IP: end-to-end streaming of multi-view video. IEEE Signal Process Mag 2007; 24(6): 77–87.
9. Tekalp AM. 3D media delivery over IP. IEEE Multimed Comm Tech Committ E-Lett 2009; 4(3).
10. Thomos N, Argyropoulos S, Boulgouris NV, Strintzis MG. Robust transmission of H.264/AVC streams using adaptive group slicing and unequal error protection. EURASIP J Appl Signal Process 2006.
11. Argyropoulos S, Tan AS, Thomos N, Arikan E, Strintzis MG. Robust transmission of multi-view video streams using flexible macroblock ordering and systematic LT codes. Proceedings of the 3DTV Conference "3DTV-CON"; May 2007; Kos Island, Greece.
12. Hu S-Y. A case for 3D streaming on peer-to-peer networks. Proceedings of ACM Web3D; Apr 2006; Columbia (MD). pp. 57–64.
13. Sung W-L, Hu S-Y, Jiang J-R. Selection strategies for peer-to-peer 3D streaming. Proceedings of ACM NOSSDAV; 2008; Braunschweig, Germany. pp. 15–20.
14. Amoss J, Minoli D. Handbook of IPv4 to IPv6 transition methodologies for institutional & corporate networks. Boca Raton (FL): Taylor and Francis; 2008.
15. Minoli D, Kouns J. Security in an IPv6 environment. Boca Raton (FL): Taylor and Francis; 2009.
16. Minoli D. Satellite systems engineering in an IPv6 environment. Boca Raton (FL): Taylor and Francis; 2009.
17. Minoli D. Voice over IPv6—architecting the next-generation VoIP. New York: Elsevier; 2006.
18.
Directorate-General Information Society. IPv6: enabling the information society. European Commission Information Society, Europe Information Society Portal; Feb 18, 2008.
19. IPv6 Portal. http://www.ipv6tf.org/meet/faqs.php.
20. Postel J. Internet Protocol. STD 5, RFC 791; Sep 1981.
21. Kent S, Atkinson R. Security architecture for the Internet Protocol. RFC 2401; Nov 1998.
22. Kent S, Atkinson R. IP Authentication Header. RFC 2402; Nov 1998.
23. Kent S, Atkinson R. IP Encapsulating Security Payload (ESP). RFC 2406; Nov 1998.
24. ICANN Security and Stability Advisory Committee (SSAC). Survey of IPv6 support in commercial firewalls. Marina del Rey (CA); Oct 2007.
25. Lioy A. Security features of IPv6. In: Gai S, editor. Internetworking IPv6 with Cisco routers. McGraw-Hill; 1998. Chapter 8. Available at www.ip6.com/us/book/Chap8.pdf.
26. An IPv6 security guide for U.S. Government agencies—executive summary. The IPv6 World Report Series, Volume 4. Sunnyvale (CA): Juniper Networks; Feb 2008.
27. Kaeo M, Green D, Bound J, Pouffary Y. IPv6 security technology paper. North American IPv6 Task Force (NAv6TF) Technology Report; Jul 22, 2006.
28. Yaiz RA, Öztürk O. Mobility in IPv6. The IPv6 Portal; 2000. www.ipv6tf.org.
29. Srisuresh P, Egevang K. Traditional IP Network Address Translator (Traditional NAT). RFC 3022; Jan 2001.
30. Microsoft Corporation. MSDN Library, Internet Protocol. 2004. http://msdn.microsoft.com.
31. Hermann-Seton P. Security features in IPv6. SANS Institute, Information Security Reading Room; 2002.
32. Ertekin E, Christou C. IPv6 header compression. North American IPv6 Summit. Booz Allen Hamilton; Jun 2004.
33. Donzé F. IPv6 autoconfiguration. The Internet Protocol Journal 2004; 7(2). Available at http://www.cisco.com. San Jose, CA.
34. Desmeules R. Cisco self-study: implementing Cisco IPv6 networks (IPv6). Cisco Press; 2003.
35. 6NET. D2.2.4: Final IPv4 to IPv6 transition cookbook for organizational/ISP (NREN) and backbone networks. Version 1.0, Project Number IST-2001-32603, CEC Deliverable Number 32603/UOS/DS/2.2.4/A1; Feb 4, 2005.
36. Gilligan R, Nordmark E. Transition mechanisms for IPv6 hosts and routers. RFC 2893; Aug 2000.
37. Shin M-K, Hong Y-G, Hagino J, Savola P, et al. Application aspects of IPv6 transition. RFC 4038; Mar 2005.
38. Warfield MH. Security implications of IPv6. 16th Annual FIRST Conference on Computer Security Incident Handling; Jun 13–18, 2004; Budapest, Hungary. X-Force, Internet Security Systems, Inc. (ISS).
39. Commission of the European Communities. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions—Advancing the Internet: Action Plan for the deployment of Internet Protocol version 6 (IPv6) in Europe. Brussels; May 27, 2008.
APPENDIX A5: IPv6 BASICS

This appendix does not discuss 3DTV technology per se. However, the position is taken that 3DTV designers considering IP-based distribution networks should give serious consideration to utilizing IPv6, if not immediately then as part of a transition plan.

A5.1 IPv6 Overview

While the basic function of the IP is to move information across networks, IPv6 has more capabilities built into its foundation than IPv4. A key capability is the significant increase in address space. For example, all devices could have a public IP address so that they can be uniquely tracked.7 Today, inventory management of dispersed assets in a very large dispersed organization such as the United States Department of Defense (DoD) cannot be achieved with IP mechanisms; during the inventory cycle someone has to manually verify the location of each desktop computer. With IPv6 one can use the network to verify that such equipment is there; even non-IT equipment in the field can be tracked, by having an IP address permanently assigned to it. IPv6 also has extensive automatic configuration (autoconfiguration) mechanisms that reduce the IT burden, making configuration essentially plug-and-play (autoconfiguration implies that a Dynamic Host Configuration Protocol (DHCP) server is not needed and/or does not have to be configured). Since IPv4 manual configuration is already a challenge in itself, one can understand that manually manipulating IPv6 addresses that are four times longer would be much more problematic.
Corporations and government agencies will be able to achieve a number of improvements with IPv6, such as, but not limited to, the following:

• expanded addressing capabilities;
• serverless autoconfiguration (what some call "plug-n-play") and reconfiguration;
• streamlined header format and flow identification;
• end-to-end security, with built-in, strong IP-layer encryption and authentication (embedded security support with mandatory IPsec implementation);
• easier and more standard VPN creation than in IPv4, because of the Authentication Header (AH) and Encapsulating Security Payload (ESP) Extension Headers; the performance penalty is lower for VPNs implemented in IPv6 compared to those built in IPv4;
• enhanced support for multicast and QoS (more refined support for flow control and QoS for the near-real-time delivery of data);
• more efficient and robust mobility mechanisms (enhanced support for Mobile IP and mobile computing devices);

7 Note that this has some potential negative security implications, as attackers could own a machine and then know exactly how to reach that same machine again. Therefore, reliable security mechanisms need to be understood and put in place in IPv6 environments.
• extensibility: improved support for feature options/extensions;
• support for multiple IPv6 addresses on the same network interface; this creates the opportunity for users to establish overlay or Communities of Interest (COI) networks on top of other physical IPv6 networks, where departments, groups, or other users and resources can belong to one or more COIs, each with its own specific security policy;
• easier network mergers: merging two IPv4 networks with overlapping addresses (say, if two organizations merge) is complex; it will be much easier to merge networks with IPv6;
• adaptability to an end-to-end security model: IPv6 network architectures can easily adapt to a model where the end hosts have the responsibility of providing the security services necessary to protect any data traffic between them; this results in greater flexibility for creating policy-based trust domains based on varying parameters, including node address and application.

IPv6 basic capabilities include the following:

• addressing,
• anycast,
• flow labels,
• ICMPv6,
• Neighbor Discovery (ND).

Table A5.1 shows the core protocols that comprise IPv6.

TABLE A5.1 Key IPv6 Protocols

Internet Protocol Version 6 (IPv6): RFC 2460 — IPv6 is a connectionless datagram protocol used for routing packets between hosts.

Internet Control Message Protocol for IPv6 (ICMPv6): RFC 2463 — A mechanism that enables hosts and routers that use IPv6 communication to report errors and send status messages.

Multicast Listener Discovery (MLD): RFC 2710, RFC 3590, RFC 3810 — A mechanism that enables one to manage subnet multicast membership for IPv6. MLD uses a series of three ICMPv6 messages. MLD replaces the Internet Group Management Protocol (IGMP) v3 that is employed for IPv4.

Neighbor Discovery (ND): RFC 2461 — A mechanism that is used to manage node-to-node communication on a link. ND uses a series of five ICMPv6 messages.
ND replaces the Address Resolution Protocol (ARP), ICMPv4 Router Discovery, and the ICMPv4 Redirect message. ND is implemented using the Neighbor Discovery Protocol (NDP).

IP was designed in the 1970s for the purpose of connecting computers that were in separate geographic locations. Computers in a campus were connected
by means of local networks, but these local networks were separated into essentially stand-alone islands. "Internet," as a name to designate the protocol and, more recently, the worldwide information network, simply means "internetwork"; that is, a connection between multiple networks. In the beginning, the protocol had only military use in mind, but computers from universities and enterprises were quickly added. The Internet as a worldwide information network is the result of the practical application of the IP protocol; that is, the result of the interconnection of a large set of information networks. Starting in the early 1990s, developers realized that the communication needs of the twenty-first century required a protocol with some new features and capabilities, while at the same time retaining the useful features of the existing protocol.

While link-level communication does not generally require a node identifier (address), since the device is intrinsically identified with the link-level address, communication over a group of links (a network) does require unique node identifiers (addresses). The IP address is an identifier that is applied to each device connected to an IP network. In this setup, the different elements taking part in the network (servers, routers, desktop computers, etc.) communicate among each other using their IP address as an entity identifier. In version 4 of the IP protocol, addresses consist of four octets. For ease of human conversation, IP protocol addresses are represented as four decimal numbers separated by periods, for example 220.127.116.11, where the decimal numbers are a shorthand for (and correspond to) the binary code described by the byte in question (an 8-bit number takes a value in the 0–255 range). Since the IPv4 address has 32 bits, there are nominally 2^32 different IP addresses (approximately 4 billion nodes, if all combinations are used).
The Domain Name System (DNS) also helped the human conversation in the context of IPv4; DNS is going to be even more critical in IPv6 and will have substantial impact on security administrators that use IP addresses to define security policies (e.g., firewalls).

IPv4 has proven, by means of its long life, to be a flexible and powerful networking mechanism. However, IPv4 is starting to exhibit limitations, not only with respect to the need for an increase of the IP address space, driven, for example, by new populations of users in countries such as China and India, and by new technologies with "always connected" devices (DSL, cable, networked Personal Digital Assistants (PDAs), 2.5G/3G mobile telephones, etc.), but also in reference to a potential global rollout of VoIP. IPv6 creates a new IP address format, so that the number of IP addresses will not be exhausted for several decades or longer, even though an entirely new crop of devices is expected to connect to the Internet.

IPv6 also adds improvements in areas such as routing and network autoconfiguration. Specifically, new devices that connect to the Internet will be "plug-and-play" devices. With IPv6 one is not required to configure dynamic unpublished local IP addresses, the gateway address, the subnetwork mask, or any other parameters. The equipment, when plugged into the network, automatically obtains all requisite configuration data. The advantages of IPv6 can be summarized as follows:
• Scalability: IPv6 has 128-bit addresses versus 32-bit IPv4 addresses. With IPv4 the theoretical number of available IP addresses is 2^32 ≈ 10^10. IPv6 offers a 2^128 space. Hence, the number of available unique node addresses is 2^128 ≈ 10^39.
• Security: IPv6 includes security features in its specifications, such as payload encryption and authentication of the source of the communication.
• Real-Time Applications: To provide better support for real-time traffic (e.g., VoIP), IPv6 includes "labeled flows" in its specifications. By means of this mechanism, routers can recognize the end-to-end flow to which transmitted packets belong. This is similar to the service offered by MPLS, but it is intrinsic to the IP mechanism rather than an add-on. Also, it preceded this MPLS feature by a number of years.
• "Plug-And-Play": IPv6 includes a "plug-and-play" mechanism that facilitates the connection of equipment to the network. The requisite configuration is automatic.
• Mobility: IPv6 includes more efficient and enhanced mobility mechanisms, which are important for mobile networks.8
• Optimized Protocol: IPv6 embodies IPv4 best practices but removes unused or obsolete IPv4 characteristics. This results in a better-optimized Internet protocol.
• Addressing and Routing: IPv6 improves the addressing and routing hierarchy.
• Extensibility: IPv6 has been designed to be extensible and offers support for new options and extensions.

With IPv4, the 32-bit address can be represented as AdrClass|netID|hostID. The network portion can contain either a network ID or a network ID and a subnet. Every network and every host or device has a unique address, by definition. Basic NATing is a method by which IP addresses (specifically IPv4 addresses) are transparently mapped from one group to another.
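The scalability comparison above can be checked with Python's standard ipaddress module; the following is an illustrative sketch (the sample addresses are documentation addresses chosen here, not taken from the text):

```python
import ipaddress

# Theoretical address-space sizes quoted above
ipv4_space = 2 ** 32     # 4,294,967,296
ipv6_space = 2 ** 128    # ~3.4 x 10^38
print(f"IPv4: {ipv4_space:,} addresses")
print(f"IPv6: {ipv6_space:,} addresses")

# Both address families parse through the same stdlib interface
v4 = ipaddress.ip_address("192.0.2.1")     # IPv4 documentation address
v6 = ipaddress.ip_address("2001:db8::1")   # IPv6 documentation prefix
print(v4.version, v6.version)              # 4 6
```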
Specifically, private "unregistered" addresses are mapped to a small set (as small as 1) of public registered addresses; this impacts the general addressability, accessibility, and "individuality" of the device. Network Address Port Translation (NAPT), also referred to as Port Address Translation (PAT), is a method by which many network addresses and their TCP/UDP ports are translated into a single network address and its TCP/UDP ports. Together, these two methods, referred to as traditional Network

8 Some of the benefits of IPv6 in the context of mobility include: (i) larger addresses, which allow new techniques to be used for the Mobile Node (MN) to obtain a care-of address; here MNs can always get a collocated care-of address, a fact that removes the need for a Foreign Agent (FA); (ii) a new routing header, which allows for proper use of source routing (this was not possible with IPv4); (iii) AH, which allows for the authentication of the binding messages; (iv) the destination options header, which allows for the use of options without significant performance degradation; performance degradation may have occurred in IPv4 because every router along the path had to examine the options even when they were only destined for the receiver of the packet.
Address Translation (NAT), provide a mechanism to connect a realm with private addresses to an external realm with globally unique registered addresses. NAT is a short-term solution for the anticipated Internet growth requirements for this decade, and a better solution is needed for address exhaustion. There is a clear recognition that NAT techniques make the Internet, the applications, and even the devices more complex (especially when conducting business-to-business transactions), and this means a cost overhead. Overlapping encryption domains have been a substantial issue for organizations to deal with when creating gateway-to-gateway VPNs. The expectation is that IPv6 can make IP devices less expensive, more powerful, and even consume less power; the power issue is not only important for environmental reasons, but also improves operability (e.g., longer battery life in portable devices, such as mobile phones).

IPv4 addresses can be from an officially assigned public range or from an internal intranet private (but not globally unique) block. Internal intranet addresses may be in the ranges 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, as suggested in RFC 1918. In the case of an internal intranet private address, a NAT function is employed to map the internal addresses to an external public address when the private-to-public network boundary is crossed. This, however, imposes a number of limitations, particularly since the number of registered public addresses available to a company is almost invariably much smaller (as small as 1) than the number of internal devices requiring an address.

As noted, IPv4 theoretically allows up to 2^32 addresses, based on a four-octet address space. Public, globally unique addresses are assigned by the Internet Assigned Numbers Authority (IANA). IP addresses are addresses of network nodes at layer 3; each device on a network (whether the Internet or an intranet) must have a unique address.
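The RFC 1918 private blocks listed above can be tested programmatically; this sketch uses Python's standard ipaddress module (the sample addresses are illustrative choices, not from the text):

```python
import ipaddress

# The three private blocks suggested by RFC 1918
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr: str) -> bool:
    """True if addr falls in one of the RFC 1918 private blocks,
    i.e. it would need NAT at the private-to-public boundary."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

print(is_rfc1918("192.168.1.20"))   # True: private, must be NATed
print(is_rfc1918("203.0.113.7"))    # False: public (documentation) address
```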
In IPv4, it is a 32-bit (4-byte) binary address used to identify the device. It is represented by the nomenclature a.b.c.d, with each of a, b, c, and d taking a value from 1 to 255 (0 has a special meaning). Examples are 18.104.22.168, 22.214.171.124, and 126.96.36.199.

The problem is that during the 1980s many public, registered addresses were allocated to firms and organizations without any consistent control. As a result, some organizations have more addresses than they actually need, giving rise to the present dearth of available "registerable" Layer 3 addresses. Furthermore, not all IP addresses can be used due to the fragmentation described above.

One approach to the issue would be a renumbering and a reallocation of the IPv4 addressing space. However, this is not as simple as it appears, since it requires significant worldwide coordination efforts, and it would not solve the medium-term need for a much larger address space for evolving end-user/consumer applications. Moreover, it would still be limited for the human population and the quantity of devices that will be connected to the Internet in the medium-term future. At this juncture, and as a temporary and pragmatic approach to alleviate the dearth of addresses, NAT mechanisms are employed by organizations and even home users. This mechanism consists of using only a small set of public IPv4 addresses for an entire network to access the Internet. The myriad internal devices are assigned IP addresses from a specifically
designated range of Class A or Class C addresses that are locally unique but are duplicatively used and reused within various organizations. In some cases (e.g., residential Internet access via DSL or cable), the legal IP address is only provided to a user on a time-lease basis, rather than permanently.

A number of protocols cannot travel through a NAT device, and hence the use of NAT implies that many applications (e.g., VoIP) cannot be used effectively in all instances.9 As a consequence, these applications can only be used in intranets. Examples include the following:

• Multimedia applications such as videoconferencing, VoIP, or VOD/IPTV do not work smoothly through NAT devices. Multimedia applications make use of RTP and the Real-Time Control Protocol (RTCP). These in turn use UDP with dynamic allocation of ports, and NAT does not directly support this environment.
• IPsec is used extensively for data authentication, integrity, and confidentiality. However, when NAT is used, IPsec operation is impacted, since NAT changes the address in the IP header.
• Multicast, although possible in theory, requires complex configuration in a NAT environment and hence, in practice, is not utilized as often as could be the case.

The need for obligatory use of NAT disappears with IPv6 (but it can still be used if so desired).

The format of IPv6 addressing is described in RFC 2373. As noted, an IPv6 address consists of 128 bits, rather than 32 bits as with IPv4 addresses. The number of bits correlates to the address space, as follows:

IPv6: 128 bits, allowing for 2^128 or 340,282,366,920,938,463,463,374,607,431,768,211,456 (3.4 × 10^38) possible addresses
IPv4: 32 bits, allowing for 2^32 or 4,294,967,296 possible addresses

9 The reader should be aware that we are not referring here to deploying corporate VoIP for an organization of 10, 1000, or 10,000 employees and then being able to pass VoIP protocols over the firewall.
That is a fairly trivial exercise. We are referring here to the overreaching goal of enabling any-person-on-the-planet-to-any-other-person-on-the-planet VoIP-based communication by affording a consistent, stable, and publishable addressing scheme. The US Bell System and the telecommunications world solved that problem over half a century ago, by giving the world a telephony addressing scheme that allows every person in the world to have a unique, persistent, usable telephone number (Country Code + City code (if applicable) + Local number), from Antarctica (+672) to Zimbabwe (+263), from Easter Island (+56) to Tristan da Cunha (+290), and every land and island in between.
The relatively large size of the IPv6 address is designed to be subdivided into hierarchical routing domains that reflect the topology of the modern-day Internet. The use of 128 bits provides multiple levels of hierarchy and flexibility in designing hierarchical addressing and routing. The IPv4-based Internet currently lacks this flexibility.

The IPv6 address is represented as 8 groups of 16 bits each, separated by the ":" character. Each 16-bit group is represented by 4 hexadecimal digits; that is, each digit has a value between 0 and F (0, 1, 2, . . . A, B, C, D, E, F, with A = 10 decimal, B = 11 decimal, and so on to F = 15 decimal). What follows is an example of a hypothetical IPv6 address:

3223:0BA0:01E0:D001:0000:0000:D0F0:0010

If one or more four-digit groups is 0000, the zeros may be omitted and replaced with two colons (::). For example,

3223:0BA0::

is the abbreviated form of the following address:

3223:0BA0:0000:0000:0000:0000:0000:0000

Similarly, a single 0 may be written, removing 0's on the left side of a group, and four 0's in the middle of the address. For example, the address

3223:BA0:0:0:0:0:0:1234

is the abbreviated form of the following address:

3223:0BA0:0000:0000:0000:0000:0000:1234

There is also a method to designate groups of IP addresses or subnetworks that is based on specifying the number of bits that designate the subnetwork, beginning from left to right, using the remaining bits to designate single devices inside the network. For example, the notation

3223:0BA0:01A0::/48

indicates that the part of the IP address used to represent the subnetwork has 48 bits. Since each hexadecimal digit has 4 bits, this points out that the part used to represent the subnetwork is formed by 12 digits, that is, "3223:0BA0:01A0." The remaining digits of the IP address would be used to represent nodes inside the network.
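These abbreviation and prefix rules are implemented by Python's standard ipaddress module, which can serve as a quick sanity check; a sketch using the hypothetical addresses from the text above:

```python
import ipaddress

# Zero-compression: the full form collapses to the "::" abbreviation
full = ipaddress.ip_address("3223:0BA0:0000:0000:0000:0000:0000:0000")
print(full.compressed)   # 3223:ba0::
print(full.exploded)     # 3223:0ba0:0000:0000:0000:0000:0000:0000

# Prefix notation: a /48 subnetwork leaves 128 - 48 = 80 bits for nodes
net = ipaddress.ip_network("3223:0BA0:01A0::/48")
print(net.prefixlen)                  # 48
print(net.num_addresses == 2 ** 80)   # True
```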
There are a number of special IPv6 addresses, as follows:

• Autoreturn or Loopback Virtual Address: This address is specified in IPv4 as the 127.0.0.1 address. In IPv6, this address is represented as ::1.
• Unspecified Address (::): This address is not allocated to any node, since it is used to indicate the absence of an address.
• IPv6 over IPv4 Dynamic/Automatic Tunnel Addresses: These addresses are designated as IPv4-compatible IPv6 addresses and allow the sending of IPv6 traffic over IPv4 networks in a transparent manner. For example, they are represented as ::188.8.131.52.
• IPv4 over IPv6 Addresses Automatic Representation: These addresses allow IPv4-only nodes to still work in IPv6 networks. They are designated as IPv4-mapped IPv6 addresses and are represented with the prefix ::FFFF: (e.g., ::FFFF:184.108.40.206).

Like IPv4, IPv6 is a connectionless, unreliable datagram protocol used primarily for addressing and routing packets between hosts. Connectionless means that a session is not established before exchanging data. Unreliable means that delivery is not guaranteed. IPv6 always makes a best-effort attempt to deliver a packet. An IPv6 packet might be lost, delivered out of sequence, duplicated, or delayed. IPv6 per se does not attempt to recover from these types of errors. The acknowledgment of packets delivered and the recovery of lost packets is done by a higher-layer protocol, such as TCP. From a packet forwarding perspective, IPv6 operates just like IPv4.

An IPv6 packet, also known as an IPv6 datagram, consists of an IPv6 header and an IPv6 payload, as shown in Fig. A5.1. The IPv6 header consists of two parts, the IPv6 base header and optional extension headers (Fig. A5.2). Functionally, the optional extension headers and the upper-layer protocols, for example

Figure A5.1 IPv6 packet. (Base header fields: Version, Traffic class, Flow label, Payload length, Next header, Hop limit, Source address, Destination address.)
Figure A5.2 IPv6 extension headers. (The 40-octet base header carries the Version, Traffic class, Flow label, Payload length, Next header, and Hop limit fields plus the 128-bit source and destination addresses; extension header information of variable length precedes the payload.) IPv6 extension headers are optional headers that may follow the basic IPv6 header. An IPv6 PDU may include zero, one, or multiple extension headers. When multiple extension headers are used, they form a chained list of headers identified by the "next header" field of the previous header.

TCP, are considered part of the IPv6 payload. Table A5.2 shows the fields in the IPv6 base header. IPv4 headers and IPv6 headers are not directly interoperable: hosts and/or routers must use an implementation of both IPv4 and IPv6 in order to recognize and process both header formats (Fig. A5.3). This gives rise to a number of complexities in the migration process between the IPv4 and the IPv6 environments. The IP header in IPv6 has been streamlined and defined to be of a fixed length (40 bytes). In IPv6, header fields from the IPv4 header have been removed, renamed, or moved to the new optional IPv6 Extension Headers. The header length field is no longer needed, since the IPv6 header is now a fixed-length entity. The IPv4 Type of Service field is equivalent to the IPv6 Traffic Class field. The Total Length field has been replaced with the Payload Length field. Since IPv6 only allows fragmentation to be performed by the IPv6 source and destination nodes, and not by individual routers, the IPv4 segment control fields (Identification, Flags, and Fragment Offset) have been moved to similar fields within the Fragment Extension Header. The functionality provided by the Time to Live (TTL10) field has been replaced with the Hop Limit field. The Protocol field has been replaced with the Next Header Type field. The Header Checksum field was removed; that has the main advantage of not having each relay spend time processing the checksum.
The Options field is no longer part of

10 TTL has been used in many attacks and Intrusion Detection System (IDS) tricks in IPv4.
TABLE A5.2 IPv6 Base Header

Version (4 bits): Identifies the version of the protocol. For IPv6, the version is 6.

Traffic class (8 bits): Intended for originating nodes and forwarding routers to identify and distinguish between different classes or priorities of IPv6 packets.

Flow label (20 bits): (Sometimes referred to as Flow ID.) Defines how traffic is handled and identified. A flow is a sequence of packets sent either to a unicast or a multicast destination. This field identifies packets that require special handling by the IPv6 node. If a host or router does not support flow label field functions, the field is handled as follows: if the packet is being sent, the field is set to zero; if the packet is being received, the field is ignored.

Payload length (16 bits): Identifies the length, in octets, of the payload. This field is a 16-bit unsigned integer. The payload includes the optional extension headers, as well as the upper-layer protocols, for example, TCP.

Next header (8 bits): Identifies the header immediately following the IPv6 header. Examples: 00 = Hop-by-hop options; 01 = ICMPv4; 04 = IP in IP (encapsulation); 06 = TCP; 17 = UDP; 43 = Routing; 44 = Fragment; 50 = Encapsulating security payload; 51 = Authentication; 58 = ICMPv6.

Hop limit (8 bits): Identifies the number of network segments, also known as links or subnets, on which the packet is allowed to travel before being discarded by a router. The Hop Limit is set by the sending host and is used to prevent packets from endlessly circulating on an IPv6 internetwork. When forwarding an IPv6 packet, IPv6 routers must decrease the Hop Limit by 1, and must discard the IPv6 packet when the Hop Limit is 0.

Source address (128 bits): Identifies the IPv6 address of the original source of the IPv6 packet.

Destination address (128 bits): Identifies the IPv6 address of the intermediate or final destination of the IPv6 packet.
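The fixed 40-octet layout in Table A5.2 can be packed with Python's standard struct and socket modules. This is an illustrative sketch only (the field values and addresses are arbitrary choices), not production packet-crafting code:

```python
import socket
import struct

# Field values for a hypothetical packet (Next header 6 = TCP)
version, traffic_class, flow_label = 6, 0, 0
payload_length, next_header, hop_limit = 20, 6, 64
src = socket.inet_pton(socket.AF_INET6, "2001:db8::1")
dst = socket.inet_pton(socket.AF_INET6, "2001:db8::2")

# First 32 bits: 4-bit version | 8-bit traffic class | 20-bit flow label
first_word = (version << 28) | (traffic_class << 20) | flow_label

# !IHBB16s16s = big-endian 32-bit word, 16-bit payload length, two
# 8-bit fields, then the two 16-byte addresses: 40 octets in total
header = struct.pack("!IHBB16s16s", first_word, payload_length,
                     next_header, hop_limit, src, dst)
assert len(header) == 40  # the base header is always fixed-length
```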
Figure A5.3 Comparison of IPv4 and IPv6 headers.

the header as it was in IPv4. Options are specified in the optional IPv6 Extension Headers. The removal of the Options field from the header enables more efficient routing; only the information that is needed by a router needs to be processed.

One area requiring consideration, however, is the length of the IPv6 PDU: the 40-octet header can be a problem for real-time IP applications such as VoIP and IPTV. Header compression becomes critical.11 Also, there will be some bandwidth inefficiency in general, which could be an issue in limited-bandwidth environments or applications (e.g., sensor networks).

11 Two compression protocols emerged from the IETF in recent years: (i) Internet Protocol Header Compression (IPHC), a scheme designed for low Bit Error Rate (BER) links (compression profiles are defined in RFC 2507 and RFC 2508); it provides compression of TCP/IP, UDP/IP, RTP/UDP/IP, and ESP/IP headers; "enhanced" compression of RTP/UDP/IP (ECRTP) headers is defined in RFC 3545. (ii) Robust Header Compression (ROHC), a scheme designed for wireless links which provides greater compression compared to IPHC at the cost of greater implementation complexity (compression profiles are defined in RFC 3095 and RFC 3096); this is more suitable for high-BER, long Round Trip Time (RTT) links and supports compression of ESP/IP, UDP/IP, and RTP/UDP/IP headers.
"Autoconfiguration" is a new characteristic of the IPv6 protocol that facilitates network management and system setup tasks by users. This characteristic is often called "plug-and-play" or "connect-and-work." Autoconfiguration facilitates initialization of user devices: after connecting a device to an IPv6 network, one or several IPv6 globally unique addresses are automatically allocated. DHCP allows systems to obtain an IPv4 address and other required information (e.g., default router or DNS server). A similar protocol, DHCPv6, has been published for IPv6. DHCP and DHCPv6 are known as stateful protocols because they maintain tables on (specialized) servers. However, IPv6 also has a new stateless autoconfiguration protocol that has no equivalent in IPv4. The stateless autoconfiguration protocol does not require a server component because there is no state to maintain (a DHCP server may typically run in a router or firewall). Every IPv6 system (other than routers) is able to build its own unicast global address. Stateless Address Autoconfiguration (SLAAC) provides an alternative between a purely manual configuration and stateful autoconfiguration.

"Stateless" autoconfiguration is also described as "serverless." The acronym SLAAC is also used for serverless address autoconfiguration. SLAAC is defined in RFC 2462. With SLAAC, the presence of configuration servers to supply profile information is not required. The host generates its own address using a combination of the information that it possesses (in its interface or network card) and the information that is periodically supplied by the routers. Routers determine the prefix that identifies the networks associated with the link under discussion. The "interface identifier" identifies an interface within a subnetwork and is often, and by default, generated from the Media Access Control (MAC) address of the network card.
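The MAC-based interface identifier just mentioned is conventionally derived with the modified EUI-64 procedure (RFC 2464 for Ethernet). The following sketch illustrates the derivation; the MAC address used is a hypothetical example chosen here:

```python
def interface_identifier(mac: str) -> str:
    """Derive a modified EUI-64 interface identifier from a 48-bit MAC,
    the default SLAAC behavior for Ethernet interfaces (RFC 2464)."""
    octets = [int(b, 16) for b in mac.split(":")]
    # Insert 0xFFFE between the two 24-bit halves of the MAC address
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # Invert the universal/local bit of the first octet
    eui64[0] ^= 0x02
    # Render as four 16-bit hexadecimal groups
    groups = [f"{(eui64[i] << 8) | eui64[i + 1]:x}" for i in range(0, 8, 2)]
    return ":".join(groups)

# A link-local address is the fe80::/64 prefix plus the identifier
mac = "00:1B:44:11:3A:B7"                     # hypothetical MAC address
print("fe80::" + interface_identifier(mac))   # fe80::21b:44ff:fe11:3ab7
```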
The IPv6 address is built by combining the 64 bits of the interface identifier with the prefixes that routers determine as belonging to the subnetwork. If there is no router, the interface identifier is self-sufficient to allow the PC to generate a "link-local" address. The "link-local" address is sufficient to allow communication between several nodes connected to the same link (the same local network).

IPv6 addresses are "leased" to an interface for a fixed established time (including an infinite time). When this "lifetime" expires, the link between the interface and the address is invalidated and the address can be reallocated to other interfaces. For the suitable management of address expiration times, an address goes through two states (stages) while it is affiliated to an interface:

1. At first, an address is in a "preferred" state, so its use in any communication is not restricted.
2. After that, an address becomes "deprecated," indicating that its affiliation with the current interface will (soon) be invalidated.

When it is in a "deprecated" state, the use of the address is discouraged, although it is not forbidden. However, when possible, any new communication (for example, the opening of a new TCP connection) must use a "preferred" address. A "deprecated" address should only be used by applications that have
already used it before and in cases where it is difficult to change this address to another address without causing a service interruption.

To ensure that allocated addresses (granted either by manual mechanisms or by autoconfiguration) are unique on a specific link, the duplicate address detection algorithm is used. The address to which the duplicate address detection algorithm is being applied is designated (until the end of this algorithmic session) as a "tentative address." In this case, it does not matter that such an address has been allocated to an interface; received packets are discarded.

Next, we describe how an IPv6 address is formed. The lowest 64 bits of the address identify a specific interface, and these bits are designated as the "interface identifier." The highest 64 bits of the address identify the "path" or the "prefix" of the network or router in one of the links to which such interface is connected. The IPv6 address is formed by combining the prefix with the interface identifier.

Is it possible for a host or device to have IPv6 and IPv4 addresses simultaneously? Most of the systems that currently support IPv6 allow the simultaneous use of both protocols. In this way, it is possible to support communication with IPv4-only networks as well as IPv6-only networks, and the use of the applications developed for both protocols.

It is possible to transmit IPv6 traffic over IPv4 networks via tunneling methods. This approach consists of "wrapping" the IPv6 traffic as IPv4 payload data: IPv6 traffic is sent "encapsulated" into IPv4 traffic, and at the receiving end this traffic is parsed as IPv6 traffic. Transition mechanisms are methods used for the coexistence of IPv4 and/or IPv6 devices and networks. For example, an "IPv6-in-IPv4 tunnel" is a transition mechanism that allows IPv6 devices to communicate through an IPv4 network.
The mechanism consists of creating the IPv6 packets in the normal way and encapsulating them in an IPv4 packet. The reverse process is undertaken at the destination machine, which de-encapsulates the IPv6 packet.

There is a significant difference between the procedures to allocate IPv4 addresses, which focus on the parsimonious use of addresses (since addresses are a scarce resource and should be managed with caution), and the procedures to allocate IPv6 addresses, which focus on flexibility. ISPs deploying IPv6 systems follow the RIRs' policies on how to assign IPv6 addressing space among their clients. RIRs are recommending that ISPs and operators allocate a /48 subnetwork to each IPv6 client; this allows clients to manage their own subnetworks without using NAT. (The implication is that the obligatory need for NAT disappears in IPv6.)

In order to allow maximum scalability, the IPv6 protocol uses an approach based on a basic header with minimum information. This differentiates it from IPv4, where different options are included in addition to the basic header. IPv6 uses a header "concatenation" mechanism to support supplementary capabilities. The advantages of this approach include the following:

• The size of the basic header is always the same, and is well known. The basic header has been simplified compared with IPv4, since only 8 fields are used instead of 12. The basic IPv6 header has a fixed size; hence, its processing
by nodes and routers is more straightforward. Also, the header's structure aligns to 64 bits, so that new and future processors (64 bits minimum) can process it in a more efficient way.

• Routers placed between a source point and a destination point (that is, along the route that a specific packet has to pass through) do not need to process or understand any "following headers." In other words, in general, interior (core) points of the network (routers) only have to process the basic header, while in IPv4 all headers must be processed. This flow mechanism is similar to the operation in MPLS, yet precedes it by several years.

• There is no limit to the number of options that the headers can support (the IPv6 basic header is 40 octets in length, while the IPv4 header varies from 20 to 60 octets, depending on the options used).

In IPv6, interior/core routers do not perform packet fragmentation; fragmentation is performed end-to-end. That is, source and destination nodes perform, by means of the IPv6 stack, the fragmentation of a packet and the reassembly, respectively. The fragmentation process consists of dividing the source packet into smaller packets or fragments.

The IPv6 specification defines a number of extension headers (Table A5.3):

• Routing Header: Similar to the source routing options in IPv4, this header is used to mandate a specific routing.
• Authentication Header: AH is a security header that provides authentication and integrity.
• Encapsulating Security Payload (ESP) Header: ESP is a security header that provides authentication and encryption.
• Fragmentation Header: This is similar to the fragmentation options in IPv4.
• Destination Options Header: A header that contains a set of options to be processed only by the final destination node. Mobile IPv6 is an example of an environment that uses such a header.
• Hop-by-Hop Options Header: A set of options needed by routers to perform certain management or debugging functions.
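The fixed-size basic header is what makes router processing straightforward: eight fields at known offsets in 40 octets. A minimal parsing sketch, using only the standard library (the field layout is per the IPv6 specification; the sample bytes are made up):

```python
# Sketch: unpacking the fixed 40-octet IPv6 basic header.
import struct

def parse_ipv6_header(packet: bytes) -> dict:
    # First 8 octets: version (4 bits), traffic class (8), flow label (20),
    # payload length (16), next header (8), hop limit (8).
    vtf, payload_len, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    return {
        "version":       vtf >> 28,
        "traffic_class": (vtf >> 20) & 0xFF,
        "flow_label":    vtf & 0xFFFFF,
        "payload_len":   payload_len,
        "next_header":   next_header,    # e.g., 6 = TCP, 44 = fragment header
        "hop_limit":     hop_limit,
        "src":           packet[8:24],   # 128-bit source address
        "dst":           packet[24:40],  # 128-bit destination address
    }

# A minimal synthetic header: version 6, 20-octet TCP payload, hop limit 64.
hdr = struct.pack("!IHBB", 6 << 28, 20, 6, 64) + bytes(16) + bytes(16)
print(parse_ipv6_header(hdr)["next_header"])   # 6
```

The "next header" field is how the extension-header concatenation works: each header names the type of the one that follows, so a core router reads only these 40 octets and forwards.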
As noted, IPsec provides network-level security where the application data is encapsulated within the IPv6 packet. IPsec utilizes the AH and/or ESP header to provide security (the AH and ESP headers may be used separately or in combination). IPsec with ESP offers integrity and data origin authentication, confidentiality, and optional (at the discretion of the receiver) antireplay features (using confidentiality without integrity is discouraged by the RFCs); ESP furthermore provides limited traffic flow confidentiality. Both the AH and ESP headers may be employed as follows (Fig. A5.4):
TABLE A5.3 IPv6 Extension Headers

Hop-by-Hop Options header (protocol 0): Used for jumbogram packets and the Router Alert. An example of applying the hop-by-hop options header is the Resource Reservation Protocol (RSVP). This header is read and processed by every node and router along the delivery path.

Destination Options header (protocol 60): Carries optional information that is specifically targeted to a packet's destination address. The Mobile IPv6 protocol specification makes use of the destination options header to exchange registration messages between mobile nodes and the home agent. (Mobile IP is a protocol allowing mobile nodes to keep permanent IP addresses even if they change their point of attachment.)

Routing header (protocol 43): Can be used by an IPv6 source node to force a packet to pass through specific routers on the way to its destination. A list of intermediary routers may be specified within the routing header when the routing type field is set to 0.

Fragment header (protocol 44): In IPv6, the Path Maximum Transmission Unit Discovery (PMTUD) mechanism is recommended for all IPv6 nodes. When an IPv6 node does not support PMTUD and must send a packet larger than the greatest MTU (Maximum Transmission Unit) along the delivery path, the fragment header is used. When this happens, the node fragments the packet and sends each fragment using fragment headers; the destination node then reassembles the original packet by concatenating all the fragments.

Authentication header (AH) (protocol 51): Used in IPsec to provide authentication, data integrity, and replay protection. It also ensures protection of some fields of the basic IPv6 header. This header is identical in both IPv4 and IPv6.

Encapsulating Security Payload (ESP) header (protocol 50): Also used in IPsec, to provide authentication, data integrity, replay protection, and confidentiality of the IPv6 packet. Like the authentication header, this header is identical in both IPv4 and IPv6.

• Tunnel Mode: The protocol is applied to the entire IP packet. This method is needed to ensure security over the entire packet, where a new IPv6 header and an AH or ESP header are wrapped around the original IP packet.
• Transport Mode: The protocol is applied just to the transport layer (i.e., TCP, UDP, ICMP), in the form of an IPv6 header and AH or ESP header, followed by the transport protocol data (header, data).
Figure A5.4 IPsec modes and types. (The figure shows the packet layouts for transport-mode AH, transport-mode ESP, tunnel-mode AH, and tunnel-mode ESP. AH authenticates the packet except for mutable fields such as DSCP, ECN, Flow Label, and Hop Limit; ESP encrypts the payload and ESP trailer and integrity-protects everything from the ESP header through the trailer, followed by the ESP ICV.)

Migration to IPv6 environments is expected to be fairly complex. Initially, internetworking between the two environments will be critical. Existing IPv4 endpoints and/or nodes will need to run dual-stack nodes or convert to IPv6 systems. Fortunately, the new protocol supports an IPv4-compatible IPv6 address, that is, an IPv6 address employing an embedded IPv4 address. Tunneling, which we have already described in passing, will play a major role in the beginning.
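The embedded-IPv4 idea can be seen directly with Python's standard ipaddress module. The addresses below are illustrative documentation-range values; the IPv4-compatible, IPv4-mapped, and 6to4 forms are the standard embedding schemes:

```python
# Sketch: IPv6 addresses that embed an IPv4 address (192.0.2.1 is a
# documentation/example IPv4 address).
import ipaddress

# IPv4-compatible form (::a.b.c.d), used by automatic tunneling:
compat = ipaddress.IPv6Address("::192.0.2.1")
print(compat)                  # ::c000:201

# IPv4-mapped form (::ffff:a.b.c.d), used by dual-stack socket APIs:
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1")
print(mapped.ipv4_mapped)      # 192.0.2.1

# 6to4 (2002::/16) embeds the IPv4 address in bits 16-47 of the prefix:
six_to_four = ipaddress.IPv6Address("2002:c000:201::1")
print(six_to_four.sixtofour)   # 192.0.2.1
```

In each case the low (or prefix-embedded) 32 bits carry the IPv4 address, which is what lets tunnel endpoints be derived automatically from the IPv6 destination.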
There are a number of requirements that are typically applicable to an organization wishing to introduce an IPv6 service:

• the existing IPv4 service should not be adversely disrupted (e.g., as it might be by the router load of encapsulating IPv6 in IPv4 for tunnels);
• the IPv6 service should perform as well as the IPv4 service (e.g., at the IPv4 line rate, and with similar network characteristics);
• the service must be manageable and be able to be monitored (thus tools should be available for IPv6 as they are for IPv4);
• the security of the network should not be compromised, due to the additional protocol itself or a weakness of any transition mechanism used;
• an IPv6 address allocation plan must be drawn up.
Well-known interworking mechanisms include the following12:

• Dual IP Layer (or Dual Stack): A technique for providing complete support for both IPs—IPv4 and IPv6—in hosts and routers.
• Configured Tunneling of IPv6 over IPv4: Point-to-point tunnels made by encapsulating IPv6 packets within IPv4 headers to carry them over IPv4 routing infrastructures.
• Automatic Tunneling of IPv6 over IPv4: A mechanism for using IPv4-compatible addresses to automatically tunnel IPv6 packets over IPv4 networks.

Tunneling techniques include the following12:

• IPv6-over-IPv4 Tunneling: The technique of encapsulating IPv6 packets within IPv4 so that they can be carried across IPv4 routing infrastructures.
• Configured Tunneling: IPv6-over-IPv4 tunneling where the IPv4 tunnel endpoint address is determined by configuration information on the encapsulating node. The tunnels can be either unidirectional or bidirectional. Bidirectional configured tunnels behave as virtual point-to-point links.
• Automatic Tunneling: IPv6-over-IPv4 tunneling where the IPv4 tunnel endpoint address is determined from the IPv4 address embedded in the IPv4-compatible destination address of the IPv6 packet being tunneled.
• IPv4 Multicast Tunneling: IPv6-over-IPv4 tunneling where the IPv4 tunnel endpoint address is determined using Neighbor Discovery (ND). Unlike configured tunneling, this does not require any address configuration, and unlike automatic tunneling it does not require the use of IPv4-compatible addresses. However, the mechanism assumes that the IPv4 infrastructure supports IPv4 multicast.

Applications (and the lower-layer protocol stack) need to be properly equipped. There are four cases.

Case 1: IPv4-only applications in a dual-stack node. The IPv6 protocol is introduced in a node, but applications are not yet ported to support IPv6.
The protocol stack is as follows:

   +-------------------------+
   |          appv4          |    (appv4 - IPv4-only applications)
   +-------------------------+
   |    TCP / UDP / others   |    (transport protocols - TCP, UDP, etc.)
   +-------------------------+
   |    IPv4    |    IPv6    |    (IP protocols supported/enabled in the OS)
   +-------------------------+

12 This section is based on Ref. . The reference is Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
Case 2: IPv4-only applications and IPv6-only applications in a dual-stack node. Applications are ported for IPv6 only. Therefore there are two similar applications, one for each protocol version (e.g., ping and ping6). The protocol stack is as follows:

   +-------------------------+
   |   appv4    |   appv6    |    (appv4 - IPv4-only applications)
   +-------------------------+    (appv6 - IPv6-only applications)
   |    TCP / UDP / others   |    (transport protocols - TCP, UDP, etc.)
   +-------------------------+
   |    IPv4    |    IPv6    |    (IP protocols supported/enabled in the OS)
   +-------------------------+

Case 3: Applications supporting both IPv4 and IPv6 in a dual-stack node. Applications are ported for both IPv4 and IPv6 support. Therefore, the existing IPv4 applications can be removed. The protocol stack is as follows:

   +-------------------------+
   |        appv4/v6         |    (appv4/v6 - applications supporting both IPv4 and IPv6)
   +-------------------------+
   |    TCP / UDP / others   |    (transport protocols - TCP, UDP, etc.)
   +-------------------------+
   |    IPv4    |    IPv6    |    (IP protocols supported/enabled in the OS)
   +-------------------------+

Case 4: Applications supporting both IPv4 and IPv6 in an IPv4-only node. Applications are ported for both IPv4 and IPv6 support, but the same applications may also have to work when IPv6 is not being used (e.g., disabled from the OS). The protocol stack is as follows:

   +-------------------------+
   |        appv4/v6         |    (appv4/v6 - applications supporting both IPv4 and IPv6)
   +-------------------------+
   |    TCP / UDP / others   |    (transport protocols - TCP, UDP, etc.)
   +-------------------------+
   |          IPv4           |    (IP protocol supported/enabled in the OS)
   +-------------------------+

The first two cases are not interesting in the longer term; only a few applications are inherently IPv4- or IPv6-specific, and applications should work with both protocols without having to care about which one is being used.

Figure A5.5 depicts some basic scenarios of carrier-based IPv6 support.
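Case 3 is what modern socket APIs encourage: the application resolves names with getaddrinfo(), which returns IPv6 and/or IPv4 candidates, so one program serves both stacks without ever hard-coding an address family. A minimal sketch (the host and port in the usage comment are placeholders):

```python
# Sketch of a protocol-agnostic client (the "appv4/v6" of Case 3).
import socket

def connect_any(host: str, port: int) -> socket.socket:
    """Try each address returned by getaddrinfo() until one connects."""
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            s = socket.socket(family, socktype, proto)
            s.connect(sockaddr)
            return s   # works on IPv6-only, IPv4-only, and dual-stack nodes
        except OSError as err:
            last_err = err
    raise last_err or OSError("no usable address for %s" % host)

# connect_any("www.example.com", 80)   # prefers whatever the resolver offers
```

Because AF_UNSPEC defers the family choice to the resolver, the same binary covers Case 3 and Case 4: on an IPv4-only node, getaddrinfo() simply never returns IPv6 candidates.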
Cases (a) and (b) represent traditional environments where the carrier link either supports a clear channel that is used to connect, say, two IPv4 routers, or is IP-aware. (In each case, the "cloud" on the left could also be the IPv4 Internet or the IPv6 Internet.)
Figure A5.5 Support of IPv6 in carrier networks. (The figure shows eight cases, (a) through (h), combining IPv4 and IPv6 endpoints with carrier networks that are, respectively, PHY-level, IPv4-based, IPv4–IPv6, IPv6-based, or IPv6–IPv4.)

In Case (c), the carrier link is used to connect, as a transparent link, two IPv6 routers; the carrier link is not (does not need to be) aware that it is transferring IPv6 PDUs. In Case (d), the carrier system is IPv4-aware, so the use of that environment to support IPv6 requires IPv6 to operate in a tunneled mode over the non-IPv6 cloud, which is a capability of IPv6.

In Case (e), the carrier infrastructure needs to provide a gateway function between the IPv4 and the IPv6 worlds (this could entail repacking the IP PDUs from the v4 format to the v6 format). Case (f) is the ideal long-term scenario where the "world has converted to IPv6" and "so did the carrier network."

In Case (g), the carrier IP-aware network provides a conversion function to support both IPv4 (as a baseline) and IPv6 (as a "new technology") handoffs. Possibly a dual-stack mechanism is utilized. In Case (h), the carrier IPv6-aware network provides a support function for IPv6 (as a baseline) and also a conversion function to support legacy IPv4 islands.

Even network/security administrators who operate in a pure IPv4 environment need to be aware of IPv6-related security issues. In a standard IPv4 environment where IPv6 is not explicitly supported, any form of IPv6-based tunneling traffic must be considered abnormal, malicious traffic. For example, unconstrained 6to4-based traffic should be blocked (6to4 is a transitional mechanism intended
for individual independent nodes to connect IPv6 over the greater Internet). Most commercial-grade IPv4 firewalls block IP protocol 41, the 6to4 tunnel protocol, unless it has been explicitly enabled.

In 2008, the Cooperative Association for Internet Data Analysis (CAIDA) and the American Registry for Internet Numbers (ARIN) surveyed over 200 respondents from USG agencies, commercial organizations (including ISPs and end users), educational institutions, associations, and other profit and nonprofit entities to determine the state of affairs in the United States with reference to IPv6 plans. Between 50% and 75% of the organizations surveyed indicated that they planned to deploy IPv6 by 2010 or sooner. According to some observers, IPv6 is still an emerging technology, maturing and growing as practical experience is gained; others take a more aggressive view, as seen in the next section.

A5.2 Advocacy for IPv6 Deployment—Example

We include below some excerpts from the European Economic and Social Committee and the Committee of the Regions to emphasize the issues related to IPv6. Clearly, issues about IPv6 impact not only Europe but the entire world.

The European Economic and Social Committee and the Committee of the Regions have issued an "Action Plan for the deployment of IPv6 in Europe." It is the objective of this Action Plan to support the widespread introduction of the next version of the IP (IPv6) for the following reasons:

• Timely implementation of IPv6 is required as the pool of IP addresses provided by the current protocol version 4 is being depleted.
• IPv6, with its huge address space, provides a platform for innovation in IP-based services and applications.

A5.2.1 Preparing for the Growth in Internet Usage and for Future Innovation. One common element of the Internet architecture is the IP, which in essence gives any device or good connecting to the Internet a number, an address, so that it can communicate with other devices and/or goods.
This address should generally be unique, to ensure global connectivity. The current version, IPv4, already provides for more than 4 billion such addresses. Even this, however, will not be enough to keep pace with the continuing growth of the Internet. Being aware of this long-term problem, the Internet community developed an upgraded protocol, IPv6, which has been gradually deployed since the late 1990s.

In a previous Communication on IPv6, the European Commission made the case for the early adoption of this protocol in Europe. This Communication has been successful in establishing IPv6 Task Forces, enabling IPv6 on research networks, supporting standards, and setting up training actions. Following the Communication, more than 30 European R&D projects related to IPv6 were financed. Europe now has a large pool of experts with experience in IPv6 deployment. Yet, despite the progress made, adoption of the new protocol has remained slow, while the issue of future IP address scarcity is becoming more urgent.
A5.2.2 Increasing Scarcity of IPv4 Addresses: A Difficulty for Users, an Obstacle to Innovation. Initially all Internet addresses are effectively held by the IANA; large blocks of addresses are then allocated to the five RIRs, which in turn allocate them in smaller blocks to those who need them, including ISPs. The allocation, from IANA to RIR to ISP, is carried out on the basis of demonstrated need: there is no preallocation.

The address space of IPv4 has been used up to a considerable extent. At the end of January 2008 about 16% was left in the IANA pool, that is, approximately 700 million IPv4 addresses. There are widely quoted and regularly updated estimates that forecast the exhaustion of the unallocated IANA pool somewhere between 2010 and 2011. New end users will still be able to get addresses from their ISP for some time after these dates, but with increasing difficulty.

Even when IPv4 addresses can no longer be allocated by IANA or the RIRs, the Internet will not stop working: the addresses already assigned can, and most probably will, be used for a significant time to come. Yet the growth, and also the capacity for innovation, in IP-based networks would be hindered without an appropriate solution. How to deal with this transition is currently the subject of discussion in the Internet community in general, and within and amongst the RIR communities in particular. All RIRs have recently issued public statements and have urged the adoption of IPv6.

A5.2.3 IPv4 Is Only a Short-Term Solution Leading to More Complexity. Concerns about the future scarcity of IP addresses are not a recent phenomenon. In the early days of the Internet, before the establishment of the RIRs and before the take-off of the World Wide Web, addresses were assigned rather generously. There was a danger of running out of addresses very quickly. Therefore, changes in allocation policy and in technology were introduced that allowed allocation to be more aligned to actual need.
One key IPv4 technology has been NAT. NATs connect a private (home or corporate) network that uses private addresses to the public Internet, where public IP addresses are required. Private addresses come from a particular part of the address space reserved for that purpose. The NAT device acts as a form of gateway between the private network and the public Internet by translating the private addresses into public addresses. This method therefore reduces consumption of IPv4 addresses. However, the usage of NATs has two main drawbacks, namely:

• It hinders direct device-to-device communication: intermediate systems are required to allow devices or goods with private addresses to communicate across the public Internet.
• It adds a layer of complexity in that there are effectively two distinct classes of computers: those with a public address and those with a private address. This often increases costs for the design and maintenance of networks, as well as for the development of applications.
Some other measures could extend the availability of IPv4 addresses. A market to trade IPv4 addresses might emerge that would offer incentives to organizations to sell addresses they are not using. However, IP addresses are not strictly property. They need to be globally acceptable to be globally routable, which a seller cannot always guarantee. In addition, they could become a highly priced resource. So far, RIRs have been skeptical about the emergence of such a secondary market. Another option consists of trying to actively reclaim those already-allocated address blocks that are underutilized. However, there is no apparent mechanism for enforcing the return of such addresses, and the possible cost of doing so has to be balanced against the additional lifetime this would bring to the IANA pool. Though such measures may provide some interim respite, sooner or later the demand for IP addresses will be too large to be satisfied by the global IPv4 space. Efforts to stay with IPv4 too long risk increasing unnecessary complexity and fragmentation of the global Internet. A timely introduction of IPv6 is thus the better strategy.

A5.2.4 IPv6: The Best Way Forward. IPv6 provides a straightforward and long-term solution to the address space problem. The number of addresses defined by the IPv6 protocol is huge. IPv6 allows every citizen, every network operator (including those moving to all-IP "Next Generation Networks"), and every organization in the world to have as many IP addresses as they need to connect every conceivable device or good directly to the global Internet. IPv6 was also designed to facilitate features that were felt to be missing in IPv4. Those features included quality of service, autoconfiguration, security, and mobility. In the meantime, however, most of those features have been engineered in and around the original IPv4 protocol.
It is the large address space that makes IPv6 attractive for future applications, as this will simplify their design when compared to IPv4. The benefits of IPv6 are, therefore, most obviously apparent whenever a large number of devices or goods need to be easily networked, and made potentially visible and directly reachable over the Internet. A study funded by the Commission demonstrated this potential for a number of market sectors such as home networks, building management, mobile communication, the defense and security sector, and the car industry.

Prompt and efficient adoption of IPv6 offers Europe potential for innovation and leadership in advancing the Internet. Other regions, in particular the Asian region, have already taken a strong interest in IPv6. For instance, the Japanese consumer electronics industry increasingly develops IP-enabled products exclusively for IPv6. The European industry should therefore be ready to meet future demand for IPv6-based services, applications, and devices, and so secure a competitive advantage in world markets.

To conclude, the key advantage of IPv6 over IPv4 is the huge, more easily managed address space. This solves the future problem of address availability now and for a long time to come. It provides a basis for innovation—developing and deploying services and applications that may be too complicated or too costly
in an IPv4 environment. It also empowers users, allowing them to have their own network connected to the Internet.

A5.2.5 What Needs to Be Done? IPv6 is not directly interoperable with IPv4. IPv6 and IPv4 devices can only communicate with each other using application-specific gateways. These do not provide a general, future-proof solution for transparent interoperability. However, IPv6 can be enabled in parallel with IPv4 on the same device and on the same physical network. There will be a transition phase (expected to last for 10, 20, or even more years) when IPv4 and IPv6 will coexist on the same machines (technically often referred to as "dual stack") and be transmitted over the same network links. In addition, other standards and technologies (technically referred to as "tunneling") allow IPv6 packets to be transmitted using IPv4 addressing and routing mechanisms, and ultimately vice versa. This provides the technical basis for the step-by-step introduction of IPv6. Because of the universal character of the IP, deployment of IPv6 requires the attention of many actors worldwide. The relevant stakeholders in this process are as follows:

• Internet organizations (such as ICANN, RIRs, and IETF) that need to manage common IPv6 resources and services (allocate IPv6 addresses, operate DNS servers, etc.), and continue to develop needed standards and specifications: As of May 2008, the regional distribution of allocated IPv6 addresses is concentrated in Europe (Réseaux IP Européens, or RIPE: 49%), with Asia and North America growing fast (Asia–Pacific Network Information Centre, APNIC: 24%; ARIN: 20%). Less than half of those addresses are currently being announced on the public Internet (i.e., visible in the default-free routing table). In the DNS, the root and top-level name servers are increasingly becoming IPv6 enabled. For instance, the gradual introduction of IPv6 connectivity to .eu name servers started in 2008.
• ISPs that need over time to offer IPv6 connectivity and IPv6-based services to customers: There is evidence that less than half of the ISPs offer some kind of IPv6 interconnectivity. Only a few ISPs have a standard offer for IPv6 customer access service (mainly for business users) and provide IPv6 addresses. The percentage of "Autonomous Systems" (typically ISPs and large end users) that operate IPv6 is estimated at 2.5%. Accordingly, IPv6 traffic seems to be relatively low. Typically the IPv6/v4 ratio is less than 0.1% at Internet Exchange Points (of which about one in five supports IPv6). However, this omits direct ISP-to-ISP traffic and IPv6 that is "tunneled" and so appears at first glance to be still IPv4. Recent measurements suggest that this kind of tunneled IPv6 traffic is growing.

• Infrastructure vendors (such as network equipment, operating systems, and network application software) that need to integrate IPv6 capability into their products: Many equipment and software vendors have upgraded their products to include IPv6. However, there are still issues with certain functions and performance, and with vendor support equivalent to IPv4's. The installed
equipment base of consumers, such as the small routers and home modems used to access the Internet, still by and large does not yet support IPv6.

• Content and service providers (such as websites, instant messaging, e-mail, file sharing, and voice over IP) that need to be reachable by enabling IPv6 on their servers: Worldwide there are only very few IPv6 websites. Almost none of the global top sites offer an IPv6 version. The de facto nonexistence of IPv6-reachable content and services on the Internet is a major obstacle to the take-up of the new protocol.

• Business and consumer application vendors (such as business software, smart cards, peer-to-peer software, transport systems, and sensor networks) that need to ensure that their solutions are IPv6 compatible, and increasingly need to develop products and offer services that take advantage of IPv6 features: Today, there are few, if any, current applications that are exclusively built on IPv6. One expectation has been that the proliferation of IP as the dominant network protocol would drive IPv6 into new areas such as logistics and traffic management, mobile communication, and environment monitoring; that has not taken place to any significant degree yet.

• End users (consumers, companies, academia, and public administrations) that need to purchase IPv6-capable products and services and to enable IPv6 on their own networks or home Internet access: Many home end users, without being aware of it, operate IPv6-capable equipment and yet, as a result of missing applications, do not necessarily make use of it. Companies and public administrations are cautious about making changes to a functioning network without a clear need. Therefore not much user deployment in private networks is visible. Among the early adopters have been universities and research institutions. All EU national research and education networks also operate on IPv6.
The European GÉANT network is IPv6 enabled, whereby approximately 1% of its traffic is native IPv6.

How much and which efforts are required to adopt IPv6 differ amongst actors and depend on each individual case. Therefore, it is practically impossible to reliably estimate the aggregated costs of introducing IPv6 globally. Experience and learning from projects have shown that costs can be kept under control when deployment is gradual and planned ahead. It is recommended that IPv6 be introduced step-by-step, possibly in connection with hardware and software upgrades, organizational changes, and training measures (at first glance unrelated to IPv6). This requires a general awareness within the organization in order not to miss those synergies. The costs will be significantly higher when IPv6 is introduced as a separate project and under time constraints.

Introduction of IPv6 will take place alongside the existing IPv4 networks. Standards and technology allow for a steady, incremental adoption of IPv6 by the various stakeholders, which will help to keep costs under control. Users can use IPv6 applications and generate IPv6 traffic without waiting for their ISP to offer IPv6 connectivity. ISPs can increase their IPv6 capability and offer it in line with perceived demand.
CHAPTER 6

3DTV Standardization and Related Activities

This chapter provides a survey of some key standardization activities to support the deployment of 3DTV. Standards need to cover many, if not all, of the elements depicted in Fig. 4.1, including capture, mastering, distribution, and the consumer device interface. Standards for 3D transport issues are particularly important because content providers and studios seek to create one master file that can carry stereo 3D content (and 2D content by default) across all the various distribution channels, including cable TV, satellite, over-the-air, packaged media, and the Internet.

Standardization efforts have to be understood in the context of where stakeholders and proponents see the technology going. We already defined what we believe to be five generations of 3DTV commercialization in Chapter 1, which the reader will certainly recall. These generations fit in well with the following menu of research activity being sponsored by various European and global research initiatives, as described in Ref. :

Short-term 3DV R&D (immediate commercialization, 2010–2013)
• Digital stereoscopic projection
  – better/perfect alignment to minimize "eye-fatigue."
• End-to-end digital production line for stereoscopic 3D cinema
  – digital stereo cameras;
  – digital baseline correction for realistic perspective;
  – digital postprocessing.

Medium-term 3DV R&D (commercialization during the next few years, 2013–2016)
• End-to-end multi-view 3DV with autostereoscopic displays
  – cameras and automated camera calibration;
  – compression/coding for efficient delivery;
  – standardization;
  – view interpolation for free-view video;
  – better autostereoscopic displays, based on current and near-future technology (lenticular, barrier-based);
  – natural immersive environments.

Long-term 3DV R&D (10+ years, 2016–2020+)
• realistic/ultrarealistic displays;
• "natural" interaction with 3D displays;
• holographic 3D displays, including "integral imaging" variants;
• natural immersive environments;
• total decoupling of "capture" and "display";
• novel capture, representation, and display techniques.

One of the goals of the current standardization effort is to decouple the capture function from the display function. This is a very typical requirement for service providers, going back to voice and Internet services: there will be a large pool of end users, each opting to choose a distinct Customer Premises Equipment (CPE) device (e.g., phone, PC, fax machine, cell phone, router, 3DTV display); therefore, the service provider needs to utilize a network-intrinsic protocol (encoding, framing, addressing, etc.) that can then be utilized by the end device to create its own internal representation, as needed. The same applies to 3DTV.

As noted in Chapter 1, there is a lot of interest in this topic from industry and standards bodies. The MPEG of ISO/IEC is working on a coding format for 3DV. Standards are the key to cost-effective deployment of a technology; cautionary examples of video-related format battles include the Betamax–VHS (Video Home System) and the HD DVD–Blu-ray controversies.1 As we mentioned in Chapter 1, SMPTE is working on some of the key standards needed to deliver 3D to the home. As far back as 2003, a 3D Consortium with 70 partner organizations had been founded in Japan and, more recently, four new activities have been started: the 3D@Home Consortium, the SMPTE 3D Home Entertainment Task Force, the Rapporteur Group on 3DTV of ITU-R Study Group 6, and the TM-3D-SM group of DVB.
It will probably be around 2012 before an interoperable standard is available in consumer systems to handle all the delivery mechanisms for 3DTV.

At a broad level and in the context of 3DTV, the following major initiatives had been undertaken at press time:

• MPEG: standardizing multi-view and 3DV coding;
• DVB: standardizing digital video transmission to TVs and mobile devices;

1 HD DVD (High-Definition/Density DVD) was a high-density optical disc format for storing data and high-definition video, advanced principally by Toshiba. In 2008, after a protracted format war with rival Blu-ray, the format was abandoned.
• SMPTE: standardizing 3D delivery to the home;
• ITU-T: standardizing the user experience of multimedia content;
• VQEG (Video Quality Experts Group): standardizing objective video quality assessment.

We review some of the ongoing standardization/advocacy work in this chapter. Only a subset of the universe of entities working on 3DTV is covered here.

There is a pragmatic possibility that, in the short term, equipment providers may have to support a number of formats for stereo 3D content. The ideal approach for stereoscopic 3DTV is to provide sequential left and right frames at twice the chosen viewing rate. However, because broadcasters and some devices may lack transport/interface bandwidth for that approach, a number of alternatives may also be used (at least in the short term). Broadcasters appear to be focusing on top/bottom interleaving; however, trials are still ongoing to examine other approaches that involve some form of compression, including checkerboard, side-by-side, or interleaved rows or columns.2

6.1 MOVING PICTURE EXPERTS GROUP (MPEG)

6.1.1 Overview3

MPEG is a working group of ISO/IEC in charge of the development of standards for the coded representation of digital audio and video and related data. Established in 1988, the group produces standards that help the industry offer end users an ever more enjoyable digital media experience. In its 21 years of activity, MPEG has developed a substantive portfolio of technologies that have created an industry worth several hundred billion USD. MPEG is currently interested in 3DV in general and 3DTV in particular.
Any broad success of 3DTV/3DV will likely depend on the development and industrial acceptance of MPEG standards; MPEG is the premier organization worldwide for video encoding, and the list of standards it has produced in recent years is as follows:

MPEG-1: the standard on which such products as Video CD and MP3 are based
MPEG-2: the standard on which such products as digital television set-top boxes and DVDs are based
MPEG-4: the standard for multimedia for the fixed and mobile web
MPEG-7: the standard for description and search of audio and visual content
MPEG-21: the multimedia framework

2 In this case, the TV set will have to recognize all the various formats and transcode and convert them to the native rate of the TV. This is obviously suboptimal, but is similar to what actually transpired with the frame rates that HDTV cameras and TVs initially had to support.
3 This entire section is based on ISO/MPEG materials.
MPEG-A: the standard providing application-specific formats by integrating multiple MPEG technologies
MPEG-B: a collection of systems-specific standards
MPEG-C: a collection of video-specific standards
MPEG-D: a collection of audio-specific standards
MPEG-E: a standard (M3W) providing support to download and execute multimedia applications
MPEG-M: a standard (MXM) for packaging and reusability of MPEG technologies
MPEG-U: a standard for rich-media user interfaces
MPEG-V: a standard for interchange with virtual worlds

Table 6.1 provides a more detailed listing of the activities of MPEG groups in the area of video.

6.1.2 Completed Work

As we have seen in other parts of this text, there are currently a number of different 3DV formats (either already available and/or under investigation), typically related to specific types of displays (e.g., classical two-view stereo video, multi-view video with more than two views, V + D, MV + D, and layered depth video). Efficient compression is crucial for 3DV applications, and a plethora of compression and coding algorithms are either already available and/or under investigation for the different 3DV formats (some of these are standardized, e.g., by MPEG; others are proprietary). A generic, flexible, and efficient 3DV format that can serve a range of different 3DV systems (including mobile phones) is currently being investigated by MPEG.

As we noted earlier in this text, MPEG standards already support 3DV based on V + D. In 2007, MPEG specified a container format, "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" (also known as MPEG-C Part 3), that can be utilized for V + D data. Transport of these data is defined in a separate MPEG systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data" [3, 4].

In 2008, ISO approved a new 3DV project under ISO/IEC JTC1/SC29/WG11 (ISO/IEC JTC1/SC29/WG11, MPEG2008/N9784).
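The V + D (video plus depth) representation carried by MPEG-C Part 3 can be pictured as a per-frame pairing of a color image with a same-sized depth map. The sketch below is purely illustrative: the array shapes, the 8-bit depth plane, and the function name are assumptions for exposition, not the normative format.

```python
import numpy as np

def make_v_plus_d_frame(height=1080, width=1920):
    """Build one illustrative V + D frame: color texture plus 8-bit depth.

    Depth here follows the common convention that larger values mean
    nearer objects; the normative MPEG-C Part 3 semantics are signaled
    separately and are not reproduced by this sketch.
    """
    video = np.zeros((height, width, 3), dtype=np.uint8)   # texture plane
    depth = np.zeros((height, width), dtype=np.uint8)      # per-pixel depth plane
    return {"video": video, "depth": depth}

frame = make_v_plus_d_frame()
# The auxiliary depth plane has the same spatial dimensions as the video,
# so a legacy 2D decoder can simply ignore it and display the texture alone.
assert frame["video"].shape[:2] == frame["depth"].shape
```

The design point this illustrates is the one the chapter emphasizes: because the depth plane rides alongside an ordinary video frame, 2D backward compatibility comes essentially for free.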
The JVT of ITU-T and MPEG has devoted its recent efforts to extending the widely deployed H.264/AVC standard with MVC to support MV + D (and also V + D). MVC allows the construction of bitstreams that represent multiple views. The MPEG standard that emerged, MVC, provides good robustness and compression performance for delivering 3DV by taking into account the inter-view dependencies of the different visual channels. In addition, its backwards compatibility with H.264/AVC codecs makes it widely interoperable in environments having both 2D- and 3D-capable devices. MVC supports an MV + D (and also V + D) encoded representation inside the MPEG-2 transport stream. The MVC standard was developed by the JVT of ISO/IEC MPEG
TABLE 6.1 Activities of MPEG Groups in the Area of Video. (For each item below, MPEG publishes a Summary, a 1-Pager, a White Paper, and a Presentation, except where noted otherwise.)

1. Media coding — standards to represent natural and synthetic media such as audio, video, graphics, etc. in a bit-efficient way
   1.1 2D video coding (coded representation of time-dependent 2D arrays of pixels): MPEG-1 video; MPEG-2 video; MPEG-4 visual (rectangular); shape coding (nonrectangular); advanced video coding; scalable video coding; MVC; high-performance video coding
   1.2 Decoder representation: reconfigurable video coding; coding tool repository
   1.3 3D video coding (coded representation of time-dependent 3D arrays of pixels): auxiliary video data representation; 3D video coding (only two of the four document types available)
   1.4 Audio coding (coded representation of audio — speech and music — information): MPEG-1 audio; MPEG-2 audio; advanced audio coding; parametric audio coding; spectral band replication; lossless coding; scalable lossless coding; 1-bit lossless coding; MPEG Surround; spatial audio object coding; unified speech and audio coding
   1.5 2D graphic coding (coded representation of 2D synthetic information): texture coding; 2D mesh coding
   1.6 3D graphic coding (coded representation of 3D synthetic information): face and body animation; 3D mesh coding; AFX
   1.7 Synthetic audio coding (coded representation of synthetic audio information): structured audio
   1.8 Text coding (coded representation of text information): streaming text format
   1.9 Font coding (coded representation of font information): font compression and streaming; open font format
   1.10 Music coding (coded representation of musical information): symbolic music representation
   1.11 Media context and control (coded representation of information designed to stimulate senses other than vision or audition, e.g., olfaction, mechanoreception, equilibrioception, or thermoception): control information; sensory information; virtual object characteristics
   1.12 Media value chains (coded representation of information regarding the full media value chain): media value chains

2. Composition coding — standards to describe how different media objects are composed in a scene
   2.1 Composition coding (coded representation of the composition of media objects in a scene): Binary Format for Scenes (BIFS); audio BIFS; BIFS for digital radio; lightweight scene representation; presentation and modification of structured information

3. Description coding — standards to describe media content that can be stored and transmitted for use by a machine
   3.1 Description technologies (descriptors, description schemes, description definition language, and efficient representation technologies): description definition language; MPEG-7 schemas
   3.2 Video description (description of video and image information): low-level descriptions; high-level descriptions; overview; visual description tools; image and video signature
   3.3 Audio description (description of audio information): low-level descriptions; high-level descriptions
   3.4 Multimedia description (description of information types used in multimedia applications): multimedia description schemes

4. Systems support — standards to enable the use of digital media by an application
   4.1 Multiplexing and synchronization (technologies to serialize multiple media sources and keep them synchronized): MPEG-1; MPEG-2; MPEG-4
   4.2 Signaling (protocols to interact with a delivery system): DSM-CC (Digital Storage Media Command and Control) user-to-user; DSM-CC user-to-network; DMIF

5. IPMP — standards to enable the management and protection of intellectual property related to digital media objects
   5.1 General (general information on MPEG IPMP technologies): MPEG technologies for DRM
   5.2 Identification technologies (technologies to uniquely identify media objects): MPEG-2 copyright identifier; object content information; digital item identification
   5.3 Rights expression technologies (syntax and semantics of rights expression languages and dictionary of terms of rights data): rights expression language; rights data dictionary
   5.4 Persistent association technologies (technologies to bind information to resources in a persistent fashion): evaluation tools for persistent association
   5.5 Access technologies (protocols to access IPMP tools when they are required by an IPMP system): MPEG-2 IPMP; MPEG-4 IPMP; MPEG-21 IPMP; XML representation of IPMP-X messages

6. Digital item — standards to represent structured digital objects, including identification, metadata, and governance information
   6.1 Digital item technologies (technologies designed specifically for digital items, such as digital item declaration, digital item processing, and event reporting): digital item declaration; digital item processing; C++ bindings; session mobility; event reporting; schema files; digital item presentation
   6.2 Resources in digital items (handling of resources in digital items, such as when adapting or identifying fragments): digital item adaptation; fragment identification for MPEG resources

7. Transport and file format — standards to enable the transport of digital media by means of files or a transport protocol
   7.1 Transport of media streams (technology to transport digital media information on a transport protocol): program stream; transport stream; M4Mux
   7.2 Media file formats (technology to package digital media information in a file): ISO base media file format; MPEG-4 file format; AVC file format; SVC file format; MVC file format; digital item file format
   7.3 Transport of digital items (technology to transport digital items): digital item streaming

8. User interaction — technologies for user interaction
   8.1 User interaction: widgets; advanced user interaction

9. Multimedia architecture — reference models and technology to enable the use of digital media in a device or by an application
   9.1 Terminal architecture (reference architectures for MPEG standards): MPEG-1; MPEG-2; MPEG-4; graphics compression model; MPEG-7; M3W architecture; MXM architecture and technologies
   9.2 Application Programming Interfaces (APIs) to enable enhanced use of rich media: MPEG-J; MPEG-J GFX; M3W multimedia API; MXM API
   9.3 Terminals: M3W component model; M3W resource and quality management; M3W component download; M3W fault management; M3W system integrity management; advanced IPTV terminal

10. Application formats — standards to support specific applications by means of component MPEG technologies
    10.1 Application formats (specification of formats for media players): music player; photo player; musical slide show; media streaming; professional archival; open access; portable video; digital multimedia broadcasting; video surveillance; stereoscopic video; interactive music

11. Generic media technologies — generic standard technologies for digital media to be used across MPEG standards
    11.1 XML technologies (XML-related technologies such as binarization): binary MPEG format for XML
    11.2 Signal processing technologies (Digital Signal Processing (DSP) technologies such as 8 × 8 DCT and IDCT, and coding tool specification): generic inverse DCT specification; fixed-point implementation of DCT/IDCT
    11.3 Bitstream technologies: bitstream syntax description language

12. Protocols — protocols to communicate between devices: MXM protocols

13. Reference implementations — implementation of MPEG standards using programming languages or hardware description languages
    13.1 Reference software (implementation of MPEG standards using a programming language): MPEG-1; MPEG-2; MPEG-4; MPEG-7; MPEG-21; MPEG-A; MPEG-B; MPEG-C; MPEG-D; MPEG-E; MPEG-M; MPEG-U; MPEG-V
    13.2 Reference hardware description (implementation of MPEG standards using a hardware description language): MPEG-4

14. Conformance — specification of procedures and data to test the conformance of encoders, bitstreams, or decoders to an MPEG standard
    14.1 MPEG-1 conformance: systems; video; audio
    14.2 MPEG-2 conformance: systems; video; audio; DSM-CC
    14.3 MPEG-4 conformance: systems; visual; audio; AVC (no documents listed for AVC)
    14.4 MPEG-7 conformance: systems; visual; audio
    14.5 MPEG-21 conformance: digital item declaration; rights expression language; digital item adaptation; digital item processing
    14.6 MPEG-A conformance: music player; photo player
    14.7–14.13 MPEG-B, MPEG-C, MPEG-D, MPEG-E, MPEG-M, MPEG-U, and MPEG-V conformance (no documents listed)

15. Maintenance — activities designed to maintain the body of MPEG standards through the development of corrigenda and new editions: MPEG-1 maintenance; MPEG-2 maintenance
and ITU-T Video Coding Experts Group (VCEG; ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC was originally an addition to the H.264/MPEG-4 AVC video compression standard that enables efficient encoding of sequences captured simultaneously from multiple cameras using a single video stream.

At press time, MVC was the most efficient method for stereo and multi-view video coding; for two views, the performance achieved by the H.264/AVC Stereo SEI message and by MVC is similar. MVC is also expected to become a new MPEG video coding standard for the realization of future video applications such as 3DTV and FTV. The MVC group in the JVT chose the H.264/AVC-based MVC method as the MVC reference model, since this method showed better coding efficiency than H.264/AVC simulcast coding and the other methods that were submitted in response to the call for proposals made by MPEG [3, 5].

6.1.3 New Initiatives

ISO MPEG has already developed a suite of international standards to support 3D services and devices, and in 2009 initiated a new phase of standardization to be completed by 2011.

• One objective is to enable stereo devices to cope with varying display types and sizes, and different viewing preferences. This includes the ability to vary the baseline distance for stereo video to adjust the depth perception, which could help to avoid fatigue and other viewing discomforts.
• MPEG also envisions that high-quality autostereoscopic displays will enter the consumer market in the next few years. Since it is difficult to directly provide all the necessary views due to production and transmission constraints, a new format is needed to enable the generation of many high-quality views from a limited amount of input data such as stereo and depth.
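The first objective above, varying the stereo baseline to adjust depth perception, reduces to first order to scaling every screen disparity by the ratio of the target baseline to the capture baseline. The sketch below illustrates only that linear first-order model; the function name, units, and the model itself are simplifying assumptions, not anything the MPEG work item specifies.

```python
def scale_disparity(disparity_px: float, capture_baseline_mm: float,
                    display_baseline_mm: float) -> float:
    """First-order model: screen disparity scales linearly with the
    ratio of the desired (display) baseline to the capture baseline.
    Real depth remapping is nonlinear and depth-map driven; this is
    only the intuition behind 'vary the baseline to vary the depth'."""
    return disparity_px * (display_baseline_mm / capture_baseline_mm)

# Example: content shot with a 65 mm interaxial distance, replayed with
# an effective 40 mm baseline to reduce the depth effect (and fatigue).
d = scale_disparity(10.0, 65.0, 40.0)   # about 6.15 px instead of 10 px
```

Shrinking disparities in this way pulls the whole scene toward the screen plane, which is exactly the "reduce depth perception to reduce discomfort" lever the bullet describes.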
ISO's vision is now a new 3DV format that goes beyond the capabilities of existing standards to enable both advanced stereoscopic display processing and improved support for autostereoscopic N-view displays, while enabling interoperable 3D services. The new 3DV standard aims to improve the rendering capability of the 2D + Depth format while reducing bitrate requirements relative to existing standards, as noted earlier in this chapter.

3DV supports new types of audiovisual systems that allow users to view videos of the real 3D space from different user viewpoints. In an advanced application of 3DV, denoted as FTV, a user can set the viewpoint to an almost arbitrary location and direction, which can be static, change abruptly, or vary continuously, within the limits given by the available camera setup. Similarly, the audio listening point is changed accordingly. The first phase of 3DV development is expected to support advanced 3D displays, where M dense views must be generated from a sparse set of K transmitted views (typically K ≤ 3) with associated depth data. The allowable range of view synthesis will be relatively narrow (20° view angle from leftmost to rightmost view).
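Generating M dense views from K transmitted views plus depth is done by DIBR, which at its core shifts each pixel horizontally by a disparity derived from its depth. The sketch below is a deliberately simplified illustration of that warp: the 8-bit depth convention, the linear depth mapping, and the parameter names are assumptions, and it omits the occlusion handling and hole filling that real view-synthesis systems require.

```python
import numpy as np

def render_view(view, depth, focal, baseline, z_near, z_far):
    """Warp one texture view to a virtual camera displaced by `baseline`.

    depth: 8-bit depth map (255 = nearest, an assumed convention).
    Disparity d = f * b / Z, with Z linearly interpolated between
    z_near and z_far. No occlusion resolution or hole filling.
    """
    h, w = view.shape[:2]
    z = z_far + (depth.astype(np.float64) / 255.0) * (z_near - z_far)
    disparity = np.round(focal * baseline / z).astype(int)
    out = np.zeros_like(view)
    xs = np.arange(w)
    for y in range(h):
        tx = xs + disparity[y]          # target x for each source pixel
        ok = (tx >= 0) & (tx < w)       # drop pixels shifted off-screen
        out[y, tx[ok]] = view[y, ok]    # real DIBR would depth-order writes
    return out
```

Running this once per virtual camera position is, conceptually, how nine display views can be produced from three transmitted views with depth; unfilled pixels in `out` are the "holes" that production systems inpaint.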
Figure 6.1 Example of an FTV system and data format (3D content production from stereo, depth, and multi-camera capture; multi-view coding of N × video + depth with metadata; transmission; and DIBR-based rendering to 2D, M-view 3D, and head-tracked stereo displays).

The MPEG initiative notes that 3DV is a standard that targets serving a variety of 3D displays. It is the first phase of FTV, a new framework that includes a coded representation for multi-view video and depth information to support the generation of high-quality intermediate views at the receiver. This enables free-viewpoint functionality and view generation for automultiscopic displays. Figure 6.1 shows an example of an FTV system that transmits multi-view video with depth information. The content may be produced in a number of ways; for example, with a multicamera setup, depth cameras, or 2D/3D conversion processes. At the receiver, DIBR could be performed to project the signal to various types of displays.

The first focus (phase) of ISO/MPEG standardization for FTV is 3DV, that is, video for 3D displays. Such displays present N views (e.g., N = 9) simultaneously to the user (Fig. 6.2). For efficiency reasons, only a lower number K of views (K = 1, 2, 3) shall be transmitted. For those K views, additional depth data shall be provided. At the receiver side, the N views to be displayed are generated from the K transmitted views with depth by DIBR. This is illustrated in Fig. 6.2.

This application scenario imposes specific constraints, such as narrow-angle acquisition (<20°). Also, there should be no need (for cost reasons) for geometric rectification at the receiver side; if any rectification is needed at all, it should be performed on the input views already at the encoder side.
Figure 6.2 Example of generating nine output views (N = 9) out of three input views with depth (K = 3): views V1, V5, and V9 with depth maps D1, D5, and D9 are decoded, and DIBR interpolates the intermediate views V2–V4 and V6–V8 for the multi-view 3D display.

Some multi-view displays are, for example, based on LCD screens with a sheet of transparent lenses in front. This sheet sends different views in different directions, so that each of a viewer's eyes sees a different view; this gives the viewer a stereoscopic viewing experience. The stereoscopic capabilities of these multi-view displays are limited by the resolution of the LCD screen (currently 1920 × 1080). For example, for a nine-view system where the cone of nine views is 10° (Cone Angle, CA), objects are limited to ±10% (Object Range, OR) of the screen width to appear in front of or behind the screen. Both OR and CA will improve with time (determined by economics) as the number of pixels of the LCD screen goes up.

In addition, other types of stereo displays are now appearing in the market in large numbers. The ability to generate output views at arbitrary positions at the receiver is attractive even in the case of N = 2 (i.e., a simple stereo display). If, for example, the material has been produced for a large cinema theater, direct usage of that stereo signal (two fixed views) on relatively small home-sized 3D displays will yield a very different stereoscopic viewing experience (e.g., a strongly reduced depth effect). With a 3DV signal, as illustrated in Fig. 6.3, a new stereo pair can be generated that is optimized for the given 3D display.

With a different initiative, ISO previously looked at auxiliary video data representations. The purpose of ISO/IEC 23002-3 Auxiliary Video Data Representations is to support all those applications where additional data needs to be
efficiently attached to the individual pixels of a regular video.

Figure 6.3 Example of a lenticular autostereoscopic display requiring nine views (N = 9).

ISO/IEC 23002-3 describes how this can be achieved in a generic way by making use of existing (and even future) video codecs available within MPEG. A good example of an application that requires additional information associated with the individual pixels of a regular (2D) video stream is stereoscopic video presented on an autostereoscopic single- or multiple-user display. At the MPEG meeting in Nice, France (October 2005), the arrival of such displays on the market was stressed, and several of them were even shown and demonstrated. Because different display realizations vary largely in (i) the number of views that are represented and (ii) the maximum parallax that can be supported, an input format is required that is flexible enough to drive all possible variants. This can be achieved by supplying a depth or parallax value with each pixel of a regular video stream, and by generating the required stereoscopic views at the receiver side. The standardization of a common depth and parallax format within ISO/IEC 23002-3 Auxiliary Video Data Representations will thus enable interoperability between content providers, broadcasters, and display manufacturers. ISO/IEC 23002-3 is flexible enough to easily add other types of auxiliary video data in the future. One example could be the annotation of regular video coming from a regular camera with temperature maps coming from an infrared camera.

The Auxiliary Video Data format defined in ISO/IEC 23002-3 consists of an array of N-bit values that are associated with the individual pixels of a regular video stream. These data can be compressed like conventional luminance signals using already existing (and even future) MPEG video codecs. The format allows for optional subsampling of the auxiliary data in both the spatial and temporal domains.
This can be beneficial depending on the particular application and its requirements, allowing for very low bitrates for the auxiliary data. The specification is very flexible in the sense that it defines a new 8-bit code word, aux_video_type, that specifies the type of the associated data; for example, currently a value of 0x10 signals a depth map and a value of 0x11 signals a parallax map. New values for additional data representations can easily be added to fulfill future demands.

The transport of auxiliary video data within an MPEG-2 transport or program stream is defined in an amendment to the MPEG-2 systems standard. It specifies
new stream_id_extension and stream_type values that are used to signal an auxiliary video data stream. An additional auxiliary_video_data_descriptor is utilized in order to convey in more detail how the data should be interpreted by the application that uses them. Metadata associated with the auxiliary data is carried at the system level, allowing the use of unmodified video codecs (no need to modify silicon).

In conclusion, ISO/IEC 23002-3 Auxiliary Video Data Representations provides a reasonably efficient approach for attaching additional information, such as depth values and parallax values, to the individual pixels of a regular video stream, and for signaling how these associated data should be interpreted by the application that uses them.

6.2 MPEG INDUSTRY FORUM (MPEGIF)

The Moving Picture Experts Group Industry Forum (MPEGIF) is an advocacy group for standards-based DTV technologies. The group is an independent and platform-neutral not-for-profit organization representing more than 20 international companies and organizations, with the goal of facilitating and furthering the widespread adoption and deployment of MPEG and related standards in next-generation digital media services. MPEGIF is among the consortia focused on standardizing technology and methods for delivering 3DV/3DTV.

MPEGIF announced in December 2009 the formation of the 3DTV Working Group and the launch of the "3D over MPEG" campaign. The new working group and campaign continue MPEGIF's work in furthering the widespread adoption and deployment of MPEG-related standards, including MPEG-4 AVC/H.264. The chair of the newly formed 3DTV Working Group stated that "3DTV is of keen interest to everyone in the video creation and delivery industries. The challenge we all face is that of sorting through the myriad technical options. Our common goal is to create a 3DTV ecosystem that delivers great new experiences to consumers.
The 3DTV Working Group and the '3D over MPEG' campaign are designed to provide focus and clear information to decision makers. 3DTV can be distributed today using MPEG-related standards. Existing broadband and broadcast services and infrastructures are 3D-ready, and ongoing work by standards bodies provides a compelling path for the future evolution of 3DTV . . . 3D video is showing distinct commercial promise in theatrical releases and could thus transition to the advanced living room, following High-Definition and Surround Sound. As a result there is a growing array of competing technologies and work from various standards bodies. It has therefore become a major theme of the next MPEG Industry Forum Master Class being held at CES 2010 in Las Vegas in January 2010." About 30 industry participants joined the 3D Working Group at the launch.
The 3DTV Working Group aims to provide a forum for the free exchange of information related to this emerging technology, an industry voice advocating the adoption of standards, and a means of consolidating the overall direction of the 3DTV industry. Its focus and constituency will be derived from video service providers, consumer electronics manufacturers, content owners, equipment manufacturers, system integrators, and software providers, as well as industry advocacy groups, industry analysts, financial institutions, and academic institutes.

MPEG-4 MVC is being given consideration. As we have seen, MPEG-4 MVC can be used, among other more sophisticated applications, to handle simple transmission of independent left-eye/right-eye views, which is considered to be the viable early commercial approach, at least in the United States. An arrangement called by some the "frame-packing arrangement and SEI message" enables the encoder to signal to the decoder how to extract two distinct views from a single decoded frame; this could be in the form of side-by-side or over/under images.

6.3 SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS (SMPTE) 3D HOME ENTERTAINMENT TASK FORCE

There is a need for a single mastering standard for viewing stereo 3D content on TVs, PCs, and mobile phones, where the content could originate from optical disks, broadcast networks, or the Internet. To that end, SMPTE formed a 3D Home Entertainment Task Force in 2008 to address the issue, and a standards effort was launched in 2009 via an SMPTE 3D Standards Working Group to define a content format for stereo 3D. The SMPTE 3D Standards Working Group had about 200 participants at press time; the Home Master standard was expected to become available in mid-2010. The group favors a mastering standard for the Home Master specification based on 1920 × 1080 pixel resolution at 60 fps/eye. The specification is expected to support an option for falling back to a 2D image. The standard is also expected to support hybrid products, such as BDs (Blu-ray Discs) that can support either 2D or stereo 3D displays.
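To gauge why compression matters so much downstream of the Home Master, one can compute the uncompressed data rate implied by 1920 × 1080 at 60 fps per eye. The 10-bit 4:2:2 sampling (20 bits/pixel) assumed below is a plausible mastering choice used only for illustration; it is not something the source text attributes to the SMPTE specification.

```python
def raw_stereo_rate_gbps(width=1920, height=1080, fps=60,
                         bits_per_pixel=20, eyes=2):
    """Uncompressed data rate in Gbit/s for a stereo pair.

    bits_per_pixel=20 corresponds to 10-bit 4:2:2 sampling -- an
    assumption for illustration, not a mandated master format.
    """
    return width * height * fps * bits_per_pixel * eyes / 1e9

rate = raw_stereo_rate_gbps()   # about 4.98 Gbit/s for the stereo pair
```

A roughly 5 Gbit/s raw stereo signal is orders of magnitude above broadcast channel capacities, which is why the downstream distribution standards discussed next center on choosing a compression scheme.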
SMPTE's 3D Home Master defines high-level image formatting requirements that impact 3DTV designs, but the larger bulk of the 3DTV standards for hardware are expected to come from other organizations, such as CEA. Studios or game publishers would deliver the master as source material for uses ranging from DVD and BD players to terrestrial and satellite broadcasts and Internet downloadable or streaming files.

As we have seen throughout this text, 3DTV systems must support multiple delivery channels, multiple coding techniques, and multiple display technologies. Digital cinema, for example, is addressed with a relatively simple left–right sequence approach; residential TV displays involve a greater variety of technologies necessitating more complex encoding. Content transmission and delivery is also supported by a variety of physical media such as BDs as well as broadcasting, satellite, and cable delivery. The SMPTE 3D Group has been considering what kind of compression should be supported. One of the key goals of the standardization process is defining and/or identifying schemes that minimize the total bandwidth required to support the service; the MVC extension to MPEG-4/H.264 discussed earlier is being considered by the group. Preliminary studies
have shown, however, that relatively little bandwidth may be saved when compared to simulcast, because high-quality images require 75–100% overhead and images of medium quality require 65–98% overhead. In addition to defining the representation and encoding standards (which clearly drive the amount of channel bandwidth for the additional image stream), 3DTV service entails other requirements; for example, there is the issue of graphics overlay, captions and subtitles, and metadata. 3D programming guides have to be rethought, according to industry observers; the goal is to avoid floating the guide in front of the action and instead to push the guide behind the screen and let the action play over it, because practical research shows that people find it jarring when the programming guide is brought to the forefront of 3DV images. The SMPTE Group is also looking at format wrappers, such as the Material eXchange Format (MXF; a container format for professional digital video and audio media defined by a set of SMPTE standards), whether an electrical interface should be specified, and whether depth representation is needed for an early version of the 3DTV service, among other factors. As we have noted earlier in the text, 3DTV has the added consideration of physiological effects because disjoint stereoscopic images can adversely impact the viewer.

6.4 RAPPORTEUR GROUP ON 3DTV OF ITU-R STUDY GROUP 6

Arranging4 a television system so that viewers can see 3D pictures is both simple and complex. ITU-R has agreed on a new study topic on 3D television, and in 2010 it expects to be building up knowledge of the options. Proponents had made the proposal to the ITU-R in 2008 that the time was ripe for worldwide agreements on 3DTV, and the ITU-R Study Group 6 has agreed on a "new Study Question" on 3D television that will be submitted for approval by the ITU-R Membership.
Though there are different views about whether current technology can provide a system which is entirely free of eyestrain, for those who wish to start such services there could be advantages in having a worldwide common solution, or at least interoperable solutions, and the ITU-R Study Group 6 specialists have been gathering information which might lead to such a result.

Therefore, the Question from ITU-R calls for contributions on systems that include, but also go beyond, stereoscopy, and include technology that may record what physicists call the "object wave." Clearly, this is a more futuristic version of 3DTV. Holograms record in a limited way the "object wave." Will there be a way of broadcasting to record an "object wave"? This remains to be seen. No approaches are excluded at this stage. The "Question" is essentially a call for proposals for 3DTV. Journals and individuals are asked to "spread the word" about this, and to invite contributions. Such contributions are normally channeled via national administrations, or via the other Members of the ITU—the so-called

4 This section is based on ITU materials.
Sector Members. Which proposals will be made and which may be the subject of agreement remains to be seen, but the ITU-R sector has launched, in its own words, "an exciting new issue, which may have a profound impact on television in the years ahead."

The Question is included below to give the reader perspective on the ITU-R work.

QUESTION ITU-R 128/6 Digital three-dimensional (3D) TV broadcasting (2008)

The ITU Radiocommunication Assembly, considering

a) that existing TV broadcasting systems do not provide complete perception of reproduced pictures as natural three-dimensional scenes;
b) that viewers' experience of presence in reproduced pictures may be enhanced by 3D TV, which is anticipated to be an important future application of digital TV broadcasting;
c) that the cinema industry is moving quickly towards production and display in 3D;
d) that research into various applications of new technologies (for example, holographic imaging) that could be used in 3D TV broadcasting is taking place in many countries;
e) that progress in new methods of digital TV signal compression and processing is opening the door to the practical realization of multifunctional 3D TV broadcasting systems;
f) that the development of uniform world standards for 3D TV systems, covering various aspects of digital TV broadcasting, would encourage adoption across the digital divide and prevent a multiplicity of standards;
g) the harmonization of broadcast and non-broadcast applications of 3D TV is desirable,

decides that the following Questions should be studied

1. What are the user requirements for digital 3D TV broadcasting systems?
2. What are the requirements for image viewing and sound listening conditions for 3D TV?
3. What 3D TV broadcasting systems currently exist or are being developed for the purposes of TV program production, post-production, television recording, archiving, distribution and transmission for realization of 3D TV broadcasting?
4. What new methods of image capture and recording would be suitable for the effective representation of three-dimensional scenes?
5. What are the possible solutions (and their limitations) for the broadcasting of 3D TV digital signals via the existing terrestrial 6, 7 and 8 MHz bandwidth channels or broadcast satellite services, for fixed and mobile reception?
6. What methods for providing 3D TV broadcasts would be compatible with existing television systems?
7. What are the digital signal compression and modulation methods that may be recommended for 3D TV broadcasting?
8. What are the requirements for the 3D TV studio digital interfaces?
9. What are appropriate picture and sound quality levels for various broadcast applications of 3D TV?
10. What methodologies of subjective and objective assessment of picture and sound quality may be used in 3D TV broadcasting?

also decides

1. that results of the above-mentioned studies should be analyzed for the purpose of the preparation of new Reports and Recommendation(s);
2. that the above-mentioned studies should be completed by 2012.

It should be noted that the ITU-R has already published some standards and reports on 3DTV in the past, including the following:

• Rec. ITU-R BT.1198 (1995) Stereoscopic television based on R- and L-eye two-channel signals
• Rec. ITU-R BT.1438 (2000) Subjective assessment of stereoscopic television pictures
• Report ITU-R BT.312-5 (1990) Constitution of stereoscopic television
• Report ITU-R BT.2017 (1998) Stereoscopic television MPEG-2 multi-view profile
• Report ITU-R BT.2088 (2006) Stereoscopic television

ITU-R BT.1198, Stereoscopic television based on R- and L-eye two-channel signals, suggests some general principles to be followed in development of stereoscopic television systems to maximize their compatibility with existing monoscopic systems. It contains

• requirements for compatibility with monoscopic signal;
• requirement for a discrete two-channel digital video coding scheme;
• requirement for a discrete channel plus difference channel digital video coding scheme.

Obviously these are "old" standards, but they point to the fact that transmission of 3DTV signals is not a completely new concept.

6.5 TM-3D-SM GROUP OF DIGITAL VIDEO BROADCAST (DVB)

The DVB Project is an industry-led consortium of over 250 broadcasters, manufacturers, network operators, software developers, regulatory bodies, and others in over 35 countries committed to designing open technical standards for the global delivery of DTV and data services. The DVB Project is responsible for the definition of today's 2D DTV broadcast infrastructure in Europe and requires the use of the MPEG-2 Systems Layer specification for the distribution of audiovisual data via cable (DVB-C, i.e., Digital Video Broadcast-Cable), satellite (DVB-S, i.e., Digital Video Broadcast-Satellite), or terrestrial (DVB-T, i.e., Digital Video Broadcast-Terrestrial) transmitters. Owing to its almost universal acceptance and worldwide use, it is of major importance for any future 3DTV system to build its distribution services on this transport technology (services using DVB standards are available on every continent, with more than 500 million DVB receivers deployed).

During5 2009, DVB closely studied the various aspects of (potential) 3DTV solutions. A Technical Module Study Mission report was finalized, leading to the formal creation of the TM-3DTV group. A 3DTV Commercial Module has also now been created to go back to the first step of the DVB process: what kind of 3DTV solution does the market want and need, and how can DVB play an active part in the creation of that solution? To start answering some of these questions, the CM-3DTV group was planning to host a DVB 3DTV Kick-off Workshop in early 2010.
There have already been broadcasts of a conventional display-compatible system, and the first HDTV channel-compatible broadcasts are scheduled to start in Europe in spring 2010. As the DVB process is business- and market-driven, the CM-3DTV group hosted the DVB 3DTV Kick-off Workshop in Geneva in early 2010, followed immediately by the first CM-3DTV meeting.

5 This material is based on DVB Project sources.
6.6 CONSUMER ELECTRONICS ASSOCIATION (CEA)

The CEA is the preeminent trade association promoting growth in the $172 billion US consumer electronics industry. More than 2000 companies are members of the CEA, which offers legislative advocacy, market research, technical training and education, industry promotion, and the fostering of business and strategic relationships.

At recent CEA Industry Forums (2009), the focus has been on consumer electronics retail trends (e.g., changes in channel dynamics), 3DTV technology, green technology, and social media. CEA takes the (tentative) position that 3DTV technology is demonstrating clear success at movie theaters and will gradually evolve into other facets of consumers' viewing habits. But the guidance is that the industry needs to have reasonable expectations for 3DTV. 3DTV is gaining momentum, as covered in this text, but may not completely reach critical mass for several years. CEA recently observed that the top trends and technologies likely to feature prominently at upcoming international CES events are as follows: interactive TV topped the list as a trend to watch, with a variety of partnerships, widgets, menus, and new ways to manage content across screens likely to generate "buzz" at upcoming CES trade shows; 3DTV will also be a big trend, with the question of whether 3D glasses or an alternative solution will emerge as the most viable option. E-books and netbooks were also highlighted as top 2010-and-beyond CES trends.

CEA is developing standards for an uncompressed digital interface between (say) the STB (called the source) and the 3D display (called the sink); these standards will need to include signaling details, 3D format support, and other interoperability requirements between sources and sinks. In 2008 CEA started standards work aimed at enabling home systems to play stereoscopic 3DTV.
The group's first step was to upgrade the interconnect standard used in the High-Definition Multimedia Interface (HDMI) to enable the cable/interface to carry stereo 3D data. Specifically, this entailed an upgrade of the CEA 861 standard (A DTV Profile for Uncompressed High-Speed Digital Interfaces, March 2008) that defines an uncompressed video interconnect for HDMI. The standard defines video timing requirements, discovery structures, and a data transfer structure (InfoPacket) that is used for building uncompressed, baseband, digital interfaces on DTVs or DTV monitors. A single physical interface is not specified, but any interface implemented must use Video Electronics Standards Association Enhanced Extended Display Identification Data (VESA E-EDID) for format discovery. CEA-861-E establishes protocols, requirements, and recommendations for the utilization of uncompressed digital interfaces by consumer electronics devices such as DTVs, digital cable, satellite or terrestrial STBs, and related peripheral devices including, but not limited to, DVD players/recorders and other related sources or sinks. CEA-861-E is applicable to a variety of standard DTV-related high-speed digital physical interfaces such as the Digital Visual Interface (DVI) 1.0, Open Low Voltage Differential Signaling Display Interface (LDI), and HDMI specifications. Protocols, requirements, and recommendations that
are defined include video formats and waveforms; colorimetry and quantization; transport of compressed and uncompressed, as well as Linear Pulse Code Modulation (LPCM), audio; carriage of auxiliary data; and implementations of the VESA E-EDID, which is used by sinks to declare display capabilities and characteristics.

At press time, CEA was also working on creating standards for 3DTV active and passive eyeglasses, metadata, on-screen displays, and user controls. A CEA group set up in 2009 was working on a standard for infrared signals used to control active shutter glasses; the group developed a requirements document and published a broad call for proposals in early 2010. The CEA also has a task group studying how to place captions in 3D space; the group was expected to issue a call for proposals in early 2010.

6.7 HDMI LICENSING, LLC

HDMI Licensing, LLC, of Sunnyvale, California, promulgates the HDMI specifications. In 2009, it published the new HDMI 1.4 specification that was discussed in Appendix A5. HDMI cabling is typically used between the STB or BD player and the TV display. This upgrade has been viewed as one of the key developments to enable 3DTV. Of all the new HDMI 1.4 features, 3D is reportedly getting the most interest from the broadcasters.

The HDMI 1.4 work grew out of interactions between the HDMI Licensing group and a related working group in the CEA that owns CEA 861. There are improvements expected with new silicon interface chips as these support higher transfer rates on the interface, but the short-term goal is also to have existing equipment be as functional as possible, because without HDMI support one cannot readily deploy 3DTV. The HDMI Licensing group is also relaxing its specifications so that many existing STBs and TVs do not have to handle a variety of previously mandatory formats, often beyond their processing capabilities or needs.
Instead, they can handle stereo 3D broadcasts in the top/bottom format with a firmware upgrade.

6.8 BLU-RAY DISC ASSOCIATION (BDA)

The BDA announced the finalization and release of the Blu-ray 3D specification at the end of 2009. The specification for 3D-enhanced Blu-ray video is titled "Blu-ray 3D." The specification, embodying the work of the leading Hollywood studios and consumer electronics and computer manufacturers, will enable the home entertainment industry to bring the stereoscopic 3D experience into consumers' living rooms on BDs, but will require consumers to acquire new players, HDTVs, and shutter glasses. The specification allows every Blu-ray 3D player and movie to deliver full HD 1080p resolution (1920 × 1080, progressive scan) to each eye, thereby maintaining the industry's leading image quality, which further distances Blu-ray from high-definition options provided by Internet-based
services. The release of a final specification based on H.264 should allow professional video editing tools such as Avid, Final Cut Studio, and Premiere to author 3DV in a routine fashion. Note: although announced at the end of 2009, the specification will actually be finalized in 2010.

The Blu-ray 3D specification is display-agnostic, meaning that Blu-ray 3D products will deliver the 3D image to any compatible 3D display, regardless of whether that display uses LCD, OLED, plasma, or other technology, and regardless of what 3D technology the display uses to deliver the image to the viewer's eyes. The compulsory aspect for stereoscopic 3D is that those screens should support a 120 Hz or higher refresh rate. The specification supports playback of 2D discs in forthcoming 3D players and can enable 2D playback of Blu-ray 3D discs on the large installed base of BD players currently in homes around the world. The Blu-ray 3D specification will encode 3DV using the MVC codec, an extension to the ITU-T H.264 AVC codec currently supported by all BD players. MPEG-4 MVC compresses both left- and right-eye views with a typical 50% overhead compared to equivalent 2D content, according to BDA, and can provide full 1080p-resolution backward compatibility with current 2D BD players. The specification also incorporates enhanced graphic features for 3D. These features provide a new experience for users, enabling navigation using 3D graphic menus and displaying 3D subtitles positioned in 3DV.

By press time, observers were expecting to see demos of 3DTV sets using content from stereo-3D enabled Blu-ray players utilizing prototype implementations of the Blu-ray 3D specification.
However, most of the players and many of the TVs will not be available until sometime later, when new chips for the specifications are available.

6.9 OTHER ADVOCACY ENTITIES

This section provides a short survey of industry advocacy and activities in support of 3DTV.

6.9.1 3D@Home Consortium

Recently (in 2008) the 3D@Home Consortium was formed with the mission to speed the commercialization of 3D into homes worldwide and provide the best possible viewing experience by facilitating the development of standards, roadmaps, and education for the entire 3D industry—from content, hardware, and software providers to consumers.

6.9.2 3D Consortium (3DC)

The 3D Consortium (3DC) aims at developing 3D stereoscopic display devices and increasing their take-up, promoting expansion of 3D contents, improving
distribution, and contributing to the expansion and development of the 3D market. It was established in Japan in 2003 by five founding companies and 65 other companies including hardware manufacturers, software vendors, contents vendors, contents providers, systems integrators, image producers, broadcasting agencies, and academic organizations.

6.9.3 European Information Society Technologies (IST) Project "Advanced Three-Dimensional Television System Technologies" (ATTEST)

This6 is a project where industries, research centers, and universities have joined forces to design a backwards-compatible, flexible, and modular broadcast 3DTV system. The ambitious aim of the European Information Society Technologies (IST) project ATTEST is to design a novel, backwards-compatible, and flexible broadcast 3DTV system. In contrast to former proposals that often relied on the basic concept of "stereoscopic" video, that is, the capturing, transmission, and display of two separate video streams (one for the left eye and one for the right eye), this activity focuses on a data-in-conjunction-with-metadata approach. At the very heart of the described new concept is the generation and distribution of a novel data representation format that consists of monoscopic color video and associated per-pixel depth information. From these data, one or more "virtual" views of a real-world scene can be synthesized in real-time at the receiver side (i.e., a 3DTV STB) by means of DIBR techniques. The modular architecture of the proposed system provides important features, such as backwards-compatibility to today's 2D DTV, scalability in terms of receiver complexity, and adaptability to a wide range of different 2D and 3D displays.

6.9.3.1 3D Content Creation.
For the generation of future 3D content, novel three-dimensional material is created by simultaneously capturing video and associated per-pixel depth information with an active range camera such as the so-called ZCam™ developed by 3DV Systems. Such devices usually integrate a high-speed pulsed infrared light source into a conventional broadcast TV camera, and they relate the time of flight of the emitted and reflected light waves to direct measurements of the depth of the scene. However, it seems clear that the need for sufficient high-quality, three-dimensional content can only partially be satisfied with new recordings. It will therefore be necessary (especially in the introductory phase of the new broadcast technology) to also convert already existing 2D video material into 3D using so-called "structure from motion" algorithms. In principle, such (offline or online) methods process one or more monoscopic color video sequences to (i) establish a dense set of image point correspondences from which information about the recording camera, as well as the 3D structure of the scene, can be derived or (ii) infer approximate depth information from the

6 This section is based on ATTEST project materials.
relative movements of automatically tracked image segments. Whatever 3D content generation approach is used in the end, the outcome in all cases consists of regular 2D color video in European DTV format (720 × 576 luminance pels, 25 Hz, interlaced) and an accompanying depth-image sequence with the same spatiotemporal resolution. Each of these depth-images stores depth information as 8-bit gray values, with the gray level 0 specifying the furthest value and the gray level 255 defining the closest value. To translate this data representation format to real, metric depth values (required for the "virtual" view generation), and to be flexible with respect to 3D scenes with different depth characteristics, the gray values are normalized to two main depth clipping planes.

6.9.3.2 3DV Coding. To provide the future 3DTV viewers with three-dimensional content, the monoscopic color video and the associated per-pixel depth information have to be compressed and transmitted over the conventional 2D DTV broadcast infrastructure. To ensure the required backwards-compatibility with existing 2D-TV STBs, the basic 2D color video has to be encoded using the standard MPEG-2, MPEG-4 Visual, or AVC tools currently required by the DVB Project in Europe.

6.9.3.3 Transmission. The DVB Project, a consortium of industries and academia responsible for the definition of today's 2D DTV broadcast infrastructure in Europe, requires the use of the MPEG-2 systems layer specifications for the distribution of audiovisual data via cable (DVB-C), satellite (DVB-S), or terrestrial (DVB-T) transmitters.

6.9.3.4 "Virtual" View Generation and 3D Display. At the receiver side of the proposed ATTEST system, the transmitted data is decoded in a 3DTV STB to retrieve the decompressed color video- and depth-image sequences (as well as the additional metadata).
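Before view synthesis, the decoded 8-bit gray values described above must be mapped back to metric depth between the two clipping planes. A minimal sketch, assuming a simple linear quantization between the planes (the function name and sample values are illustrative; an actual system might use a different, e.g. inverse-depth, mapping):

```python
import numpy as np

def gray_to_metric_depth(gray: np.ndarray, z_near: float, z_far: float) -> np.ndarray:
    """Map 8-bit depth-image gray values back to metric depth.

    Convention as described in the text: gray level 255 is the closest
    point (z_near); gray level 0 is the furthest (z_far). A linear
    mapping between the two clipping planes is assumed here.
    """
    g = gray.astype(np.float64) / 255.0
    return z_far + g * (z_near - z_far)

# A toy 1x3 depth image: furthest pixel, mid-range pixel, closest pixel
depth_img = np.array([[0, 128, 255]], dtype=np.uint8)
z = gray_to_metric_depth(depth_img, z_near=1.0, z_far=5.0)
# z[0, 0] == 5.0 (furthest plane), z[0, 2] == 1.0 (closest plane)
```

Per-pixel metric depth of this kind is the input a DIBR algorithm needs to warp the monoscopic color video into "virtual" left- and right-eye views.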
From this data representation format, a DIBR algorithm generates "virtual" left- and right-eye views for the three-dimensional reproduction of a real-world scene on a stereoscopic or autostereoscopic, single- or multiple-user 3DTV display. The backwards-compatible design of the system ensures that viewers who do not want to invest in a full 3DTV set are still able to watch the two-dimensional color video without any degradations in quality using their existing digital 2DTV STBs and displays.

6.9.4 3D4YOU

3D4YOU7 is funded under the ICT Work Programme 2007–2008, a thematic priority for research and development under the specific program "Cooperation" of the Seventh Framework Programme (2007–2013). The objectives of the project are

7 This section is based on 3D4YOU project materials.
1. to deliver an end-to-end system for 3D high-quality media;
2. to develop practical multi-view and depth capture techniques;
3. to convert captured 3D content into a 3D broadcasting format;
4. to demonstrate the viability of the format in production and over broadcast chains;
5. to show reception of 3D content on 3D displays via the delivery chains;
6. to assess the project results in terms of human factors via perception tests;
7. to produce guidelines for 3D capturing to aid in the generation of 3D media production rules;
8. to propose exploitation plans for different 3D applications.

The 3D4YOU project aims at developing the key elements of a practical 3D television system, particularly the definition of a 3D delivery format and guidelines for a 3D content creation process.

The 3D4YOU project will develop 3D capture techniques, convert captured content for broadcasting, and develop 3D coding for delivery via broadcast that is suitable to transmit and make public. 3D broadcasting is seen as the next major step in home entertainment. The cinema and computer games industries have already shown that there is considerable public demand for 3D content, but the special glasses that are needed limit its appeal. 3D4YOU will address the consumer market that coexists with digital cinema and computer games. The 3D4YOU project aims to pave the way for the introduction of a 3DTV system. The project will build on previous European research on 3D, such as the FP5 project ATTEST, that has enabled European organizations to become leaders in this field.

3D4YOU endeavors to establish practical 3DTV. The key success factor is 3D content. The project seeks to define a 3D delivery format and a content creation process. Establishing practical 3DTV will then be demonstrated by embedding this content creation process into a 3DTV production and delivery chain, including capture, image processing, delivery, and then display in the home.
The project will adapt and improve on these elements of the chain so that every part integrates into a coherent interoperable delivery system. A key objective of the project is to provide a 3D content format that is independent of display technology and backward compatible with 2D broadcasting. 3D images will be commonplace in mass communication in the near future. Also, several major consumer electronics companies have made demonstrations of 3DTV displays that could be in the market within two years. The public's potential interest in 3DTV can be seen in the success of 3D movies in recent years. 3D imaging is already present in many graphics applications (architecture, mechanical design, games, cartoons, and special effects for TV and movie production).

In recent years, multi-view display technologies have appeared that improve the immersive experience of 3D imaging, leading to the vision that 3DTV or similar services might become a reality in the near future. In the United States, the number of 3D-enabled digital cinemas is rapidly growing. By 2010, about
4300 theaters are expected to be equipped with 3D digital projectors, with the number increasing every month. Also in Europe, the number of 3D theaters is growing. Several digital 3D films will surface in the months and years to come, and several prominent filmmakers have committed to making their next productions in stereo 3D. The movie industry creates a platform for 3D movies, but there is no established solution to bring these movies to the domestic market. Therefore, the next challenge is to bring these 3D productions to the living room. 2D to 3D conversion and a flexible 3D format are an important strategic area. It has been recognized that multi-view video is a key technology that serves a wide variety of applications, including free viewpoint and 3DV applications for the home entertainment and surveillance business fields. Multi-view video coding and transmission systems are most likely to form the basis for next-generation TV broadcasting applications and facilities. Multi-view video will greatly improve the efficiency of current video coding solutions performing simulcast of independent views. This project builds on the wealth of experience of the major players in European 3DTV and intends to bring the date of the start of 3D broadcasting a step closer by combining their expertise to define a 3D delivery format and a content creation process.

The key technical problems that currently hamper the introduction of 3DTV to the mass market are as follows:

1. It is difficult to capture 3DV directly using current camera technology. At least two cameras need to operate simultaneously with an adjustable but known geometry. The offset of stereo cameras needs to be adjustable to capture depth, both close by and far away.
2. Stereo video (acquired with 2 cameras) is currently not sufficient input for glasses-free, multi-view autostereoscopic displays.
The required processing, such as disparity estimation, is noise-sensitive, resulting in low 3D picture quality.
3. 3D postproduction methods and 3DV standards are largely absent or immature.

The 3D4YOU project will tackle these three problems. For instance, a creative combination of two or three high-resolution video cameras with one or two low-resolution depth range sensors may make it possible to create 3DV of good quality without the need for an excessive investment in equipment. This is in contrast to installing, say, 100 cameras for acquisition, where the expense may hamper the introduction of such a system.

Developing tools for conversion of 3D formats will stimulate content creation companies to produce 3DV content at acceptable cost. The cost at which 3DV should be produced for commercial operation is not yet known. However, currently, 3DV production requires almost per-frame user interaction in the video, which is certainly unacceptable. This immediately indicates the issue that needs to be solved: currently, fully automated generation of high-quality 3DV is difficult; in
the future it needs to be fully, or at least semi-, automatic, with an acceptable minimum of manual supervision during postproduction. 3D4YOU will research how to convert 3D content into a 3D broadcasting format and prove the viability of the format in production and over broadcast chains.

Once 3DV production becomes commercially attractive because acquisition techniques and standards mature, this will impact the activities of content producers, broadcasters, and telecom companies. As a result, these companies may adopt new techniques for video production just because the output needs to be in 3D. Also, new companies could be founded that focus on acquiring 3DV and preparing it for postproduction. Here, there is room for differentiation since, for instance, the acquisition of a sports event will require large baselines between cameras and real-time transmission, whereas the shooting of narrative stories will require both small and large baselines and allows some manual postproduction for achieving optimal quality. These activities will require new equipment (or a creative combination of existing equipment) and new expertise.

3D4YOU will develop practical multi-view and depth capture techniques. Currently, the stereo video format is the de facto 3D standard that is used by the cinemas. Stereo acquisition may, for this reason, become widespread as an acquisition technique. Cinemas operate with glasses-based systems and can therefore use a theater-specific stereo format. This is not the case for the glasses-free autostereoscopic 3DTV that 3D4YOU foresees for the home. To allow glasses-free viewing with multiple people at home, a wide baseline is needed to cover the total range of viewing angles. The current stereo video that is intended for the cinema will need considerable postproduction to be suitable for viewing on a multi-view autostereoscopic display.
Producing visual content will therefore become more complex and may provide new opportunities for companies currently active in (3D) movie postproduction. According to the Networked and Electronic Media (NEM) Strategic Research Agenda, multi-view coding will form the basis for next-generation TV broadcast applications. Multi-view video has the advantage that it can serve different purposes. On the one hand, the multi-view input can be used for 3DTV. On the other hand, it can be shown on a normal TV where the viewer can select his or her preferred viewpoint of the action. Of course, a combination is possible where the viewer selects his or her preferred viewpoint on a 3DTV. However, multi-view acquisition with 30 views, for example, will require 30 cameras operating simultaneously, which initially requires a large investment. 3D4YOU therefore sees a gradual transition from stereo capture to systems with many views. 3D4YOU will investigate a mixture of 3DV acquisition techniques to produce an extended center view plus depth format (possibly with one or two extra views) that is, in principle, easier to produce, edit, and distribute. The success of such a simpler format relies on the ease (read: cost!) at which it can be produced. One can conclude that the introduction of 3DTV to the mass market is hampered by (i) the lack of high-quality 3DV content; (ii) the lack of suitable 3D formats; and (iii) the lack of
appropriate format conversion techniques. The variety of new distribution media further complicates this.
Hence, one can identify the following major challenges that the project is expected to overcome:

1. Video Acquisition for 3D Content: Here, the practicalities of multi-view and depth capture techniques are of primary importance; the challenge is to find trade-offs, such as the number of views to be recorded, and to determine how to optimally integrate depth capture with multi-view capture. A further challenge is to define which shooting styles are most appropriate.

2. Conversion of Captured Multi-View Video to a 3D Broadcasting Format: The captured format needs new postproduction tools (such as enhancement and regularization of depth maps, or editing, mixing, fading, and compositing of V + D representations from different sources) and a conversion step generating a suitable transmission format, compatible with the postproduction formats used, before the 3D content can be broadcast and displayed.

3. Coding Schemes for Compression and Transmission: A last challenge is to provide suitable coding schemes for compression and transmission that are based on the 3D broadcasting format under study and to demonstrate their feasibility in field trials under real distribution conditions.

By addressing these three challenges from an end-to-end systems point of view, the 3D4YOU project aims to pave the way to the definition of a 3DTV system suitable for a series of applications. Different requirements could be set depending on the application, but the basic underlying technologies (capture, format, and encoding) should maintain as much commonality as possible so as to favor the emergence of an industry based on those technologies.

6.9.5 3DPHONE

The 3DPHONE project aims to develop technologies and core applications enabling a new level of user experience by developing an end-to-end all-3D imaging mobile phone.
Its aim is to have all fundamental functions of the phone, namely media display, user interface (UI), and personal information management (PIM) applications, realized in 3D. The project will develop techniques for an all-3D phone experience: mobile stereoscopic video, 3D UIs, 3D capture/content creation, compression, rendering, and 3D display. It will also research and develop algorithms for 3D audiovisual applications, including personal communication, 3D visualization, and content management. (This section is based on Refs  and .)
The 3DPHONE project started on February 11, 2008. The duration of the project is 3 years, and there are six participants from Turkey, Germany, Hungary, Spain, and Finland. The partners are Bilkent University, Fraunhofer, Holografika, TAT,
Telefonica, and the University of Helsinki. 3DPhone is funded by the European Community's ICT programme under the Seventh Framework Programme.
The goal is to enable users to
• capture memories in 3D and communicate with others in 3D virtual spaces;
• interact with their device and applications in 3D;
• manage their personal media content in 3D.
The expected outcome is simpler use and a more personalized look and feel. The project will bring state-of-the-art advances in mobile 3D technologies through the following activities:
• A mobile hardware and software platform will be implemented with both 3D image capture and 3D display capability, featuring both 3D displays and multiple cameras. The project will evaluate different 3D display and capture solutions and will implement the most suitable solution for hardware–software integration.
• UIs and applications that capitalize on the autostereoscopic 3D illusion in the mobile handheld environment will be developed. The project will design and implement 3D and zoomable UI metaphors suitable for autostereoscopic displays.
• End-to-end 3DV algorithms and 3D data representation formats, targeted at 3D recording, 3D playback, and real-time 3DV communication, will be investigated and implemented.
• Ergonomics and experience testing to measure any possible negative symptoms, such as eye strain created by stereoscopic content, will be performed. The project will research ergonomic conditions specific to mobile handheld usage: in particular, the small screen, one hand holding the device, the absence of a complete keyboard, and limited input modalities.
In summary, the general requirements on 3DV algorithms on mobile phones are as follows:
• low power consumption,
• low complexity of algorithms,
• limited memory/storage for both RAM and mass storage,
• low memory bandwidth,
• low video resolution,
• limited data transmission rates and limited bitrates for the 3DV signal.
These strong restrictions, derived from terminal capabilities and from transmission bandwidth limitations, usually result in relatively simple video processing algorithms running on mobile phone devices. Typically, video coding standards
take care of this with specific profiles and levels that use only a restricted and simple set of video coding algorithms and low-resolution video. The H.264/AVC Baseline Profile, for instance, uses only a simple subset of the rich video coding algorithms that the standard provides in general. For 3DV, the equivalent of such a low-complexity baseline profile for mobile phone devices still needs to be defined and developed. The obvious requirements of video processing and coding apply for 3DV on mobile phones as well, such as
• high coding efficiency (taking bitrate and quality into account);
• requirements specific to 3DV that apply to 3DV algorithms on mobile phones, including
  – flexibility with regard to different 3D display types,
  – flexibility for individual adjustment of the 3D impression.

REFERENCES

1. Onural L. The 3DTV Toolbox: The Results of the 3DTV NoE. 3DTV NoE Coordinator, Bilkent University, Workshop on 3DTV Broadcasting, Geneva. Apr 30, 2009.
2. Merritt R. Incomplete 3DTV Products in CES Spotlight; HDMI Upgrade One of Latest Pieces in Stereo 3D Puzzle. EE Times. Dec 23, 2009.
3. 3DPHONE. Project no. FP7-213349, All 3D Imaging Phone, 7th Framework Programme, Specific Programme Cooperation, FP7-ICT-2007.1.5 (Networked Media), D5.1: Requirements and Specifications for 3D Video. Oct 19, 2008.
4. ISO/IEC JTC 1/SC 29/WG 11. Text of ISO/IEC FDIS 23002-3 Representation of Auxiliary Video and Supplemental Information. WG11 Document N8768, Marrakech, Morocco. Jan 2007.
5. Hur J-H, Cho S, Lee Y-L. Illumination change compensation method for H.264/AVC-based multi-view video coding. IEEE Trans Circ Syst Video Tech 2007; 17(11).
6. International Organization for Standardization. ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, Vision on 3D Video, Video and Requirements, ISO/IEC JTC1/SC29/WG11 N10357, Lausanne, Switzerland. Feb 2009.
7. International Organization for Standardization. Introduction to 3D Video.
ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11 MPEG2008/N9784, Approved Project, Archamps, France. May 2008.
8. Tanimoto M. Overview of free viewpoint television. Signal Process Image Comm 2006; 21(6): 454–461.
9. Bourge A, Fehn C. White Paper on ISO/IEC 23002-3 Auxiliary Video Data Representations. ISO/IEC JTC1/SC29/WG11/N8039, Montreux, Switzerland. Apr 2006.
10. Wissler N. MPEGIF Launches "3D over MPEG" Campaign. Your-Story.org Online Magazine. Dec 21, 2009.
11. TVB, Television Broadcast. A 3DTV Update from the MPEG Industry Forum. Online Magazine. Jan 20, 2010. www.televisionbroadcast.com.
12. Merritt R. SMPTE to Kick Off 3DTV Effort in June; Task Force Calls for Single Home Mastering Spec. EE Times. Apr 13, 2009.
13. TVB, Television Broadcast. Building a 3DTV Programming Guide. Online Magazine. Jan 20, 2010.
14. TVB, Television Broadcast. 3DTV Standards Face Multiple Obstacles. Online Magazine. Jan 19, 2010.
15. ITU-R Newsflash. ITU Journey to Worldwide 3D Television System Begins, Geneva. Jun 3, 2008.
16. Fehn C. Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3DTV. In: Woods AJ, Merritt JO, Benton SA, Bolas MT, editors. Stereoscopic Displays and Virtual Reality Systems XI. Proceedings of the SPIE. Volume 5291, Denver, CO; 2004. pp. 93–104.
17. Consumer Electronics Association (CEA). Press Release: CEA's Industry Forum Delivers Economic Analysis, Retail Strategy, Green Trends and Industry Advice. Earth Times Online Magazine. Oct 22, 2009.
18. HDMI Licensing, LLC. 1060 E. Arques Avenue, Suite 100, Sunnyvale, CA 94085, USA.
19. Shilov A. Blu-ray Disc Association Finalizes Stereoscopic 3D Specification: Blu-ray 3D Spec Finalized; New Players Incoming. Xbit Labs Online Magazine. Dec 18, 2009. http://www.xbitlabs.com.
20. The 3D@Home Consortium. http://www.3dathome.org/.
21. 3D4YOU Project. 3D Media Cluster, umbrella structure embracing related EC-funded 3DTV projects. http://www.3d4you.eu/index.php.
22. 3DPHONE. Project no. FP7-213349, All 3D Imaging Phone, 7th Framework Programme, Specific Programme Cooperation, FP7-ICT-2007.1.5 (Networked Media), D5.2: Report on First Study Results for 3D Video Solutions. Dec 31, 2008.
23. 3DPHONE Project. 3D Media Cluster, umbrella structure embracing related EC-funded 3DTV projects. http://the3dphone.eu/.
GLOSSARY

1080p  A high-definition video format with a resolution of 1920 × 1080 pixels. The "p" stands for progressive scan, which means that each video frame is transmitted as a whole in a single sweep. The main advantage of 1080p TVs is that they can display all high-definition video formats without the down-conversion that sacrifices some picture detail. 1080p TVs display video at 60 fps, so this format is often referred to as 1080p60. The video on most high-definition discs is encoded at a film's native rate of 24 fps, or 1080p24. For compatibility with most current 1080p TVs, high-definition players internally convert the 1080p24 video to 1080p60. HDTVs now include the ability to accept a 1080p24 signal directly. These TVs do not actually display video at 24 fps, because that would cause visible flicker and motion stutter; the TV converts the video to 60 fps or whatever its native display rate is. The ideal situation would be to display 1080p24 at a multiple of 24 fps, such as 72, 96, or 120 fps, to avoid the motion judder caused by the 3-2 pulldown that is required when converting 24-fps material to 60 fps.

120-Hz refresh rate  The digital display technologies (LCD, plasma, DLP, LCoS, etc.) that have replaced picture tubes are progressive scan by nature, displaying 60 video fps (often referred to as "60 Hz"). HDTVs with 120-Hz refresh rates employ sophisticated video processing to double the standard rate to 120 fps by inserting either additional video frames or black frames. Because each video frame appears for only half the normal amount of time, on-screen motion looks smoother and more fluid, with less smearing. This is especially noticeable when viewing fast-action sports and video games. The feature is available on an increasing number of flat-panel LCD TVs.

240-Hz refresh rate  Reduces LCD motion blur even more than a 120-Hz refresh rate. 240-Hz processing creates and inserts three new video frames for every original frame.
Most "240-Hz" TVs operate this way, but some models use "pseudo-240-Hz" technology that combines a 120-Hz refresh rate with high-speed backlight scanning. An example of the pseudo-240-Hz approach, which can also be very effective, is Toshiba's ClearScan 240 technology.
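The cadence arithmetic behind the 1080p and refresh-rate entries above can be sketched in a few lines. This is an illustrative example, not from the book: it counts how many display refreshes each film frame occupies, showing the uneven 3-2 cadence at 60 Hz and the even cadence at an integer multiple of 24 Hz.

```python
def pulldown_pattern(film_fps, display_fps):
    """Number of display refreshes occupied by each of one second's film frames."""
    counts, acc = [], 0.0
    step = display_fps / film_fps  # refreshes per film frame (2.5 for 24 -> 60)
    for _ in range(film_fps):
        counts.append(int(acc + step) - int(acc))
        acc += step
    return counts

print(pulldown_pattern(24, 60)[:6])   # [2, 3, 2, 3, 2, 3] -> uneven hold times (judder)
print(pulldown_pattern(24, 120)[:6])  # [5, 5, 5, 5, 5, 5] -> every frame held equally
```

At 60 Hz, frames alternate between 2 and 3 refreshes, so motion advances unevenly; at 120 Hz (or 72 or 96) every frame is held for the same duration, which is why those rates avoid pulldown judder.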
2D  Two-dimensional. An image or object with only two dimensions, such as width and height, but no depth.

2D+delta  A single image along with data that represents the difference between that image view and a second-eye image view, plus other additional metadata. The delta data could be spatial/temporal stereo disparity, or temporal predictive or bidirectional motion compensation.

3D  Having or appearing to have width, height, and depth (three-dimensional). Accepts and/or produces uncompressed video signals that convey 3D.

3D adjustment setting  Changes the apparent depth of objects on a 3D view screen.

3D distribution (or transport) formats for 3D content  Formats for 3D content when the content is transmitted to the end user over the air, over cable, over satellite, over the Internet, or on packaged media. These formats typically need to be compressed on the service provider side and decompressed at the network termination in the home.

3D format  An uncompressed video signal type used to convey 3D over an interface.

3D in-home formats  Formats used when connecting in-home devices to the 3D display system. In-home formats may be compressed or uncompressed. The decompression and decoding/transcoding can be done in several places in the home and can include additional demodulation of RF-modulated signals as well. Video decoding and 3D decoding may be done at different locations in the signal chain, which could require two different in-home formats.

3D native display formats  Formats that are required to create the 3D image on a particular TV. These formats may reside only in the TV, or can be decoded/transcoded outside of the TV. Normally, once a signal is decoded into the 3D native display format, no additional 3D signal processing is required to display the signal, although there is likely to be additional 2D processing.
The 3D native display format is different from the native 3D display format or resolution, which refers to the 3D pixel arrangement.

3D rendering  The process of producing an image based on three-dimensional data stored within a computer.

3D video (3DV)  Any capture, transmission, or display of three-dimensional video content in any venue, including playback of downloaded and stored content, display of broadcast content (i.e., 3DTV, via DVB, DVB-H), display of streamed content (i.e., via mobile phone line, WLAN), or recording.

3D viewing  The act of viewing a 3D image with both eyes in order to experience stereoscopic vision and binocular depth perception.

3D-ready  Contains a 3D decoder/transcoder and may accept and/or produce uncompressed video signals that convey 3D.

Accommodation  The focusing of the eyes; the refocusing of the eyes as their vision shifts from one distance plane to another.
Accommodation–convergence relationship  The learned relationship, established through early experience, between the focusing of the eyes and the verging of the eyes when looking at a particular object point in the visual world. Also called the convergence–accommodation relationship.

Accommodation–convergence conflict  The deviation from the learned and habitual correlation between accommodation and convergence when viewing plano-stereoscopic images.

Accommodation–convergence link  The physiological link that causes the eyes to change focus as they change convergence; a link that has to be overcome in stereo viewing, since the focus remains unchanged on the plane of the constituent flat images.

Active glasses  Powered shutter glasses that function by alternately allowing each eye to see the left-eye/right-eye images in an eye-sequential 3D system. Most commonly based on liquid crystal devices.

ALiS (Alternate Lighting of Surfaces)  A type of high-definition plasma TV panel designed for optimum performance when displaying 1080i material. On a typical progressive-scan plasma TV, all pixels can be illuminated at any given instant. With an ALiS plasma panel, alternate rows of pixels are illuminated, so that only half the panel's pixels can be illuminated at any moment (somewhat similar to interlaced scanning on a CRT-type TV). ALiS-based plasmas make up a small part of the overall market.

Anaglyph  A type of stereogram (either printed, projected, or viewed on a TV or computer screen) in which the two images are superimposed but separated, so that each eye sees only the desired image, by the use of colored filters and viewing spectacles (commonly red and cyan, or red and green). To the naked eye, the image looks overlapping, doubled, and blurry.
Traditionally, the image for the left eye is printed in red ink and the right-eye image is printed in green ink.

Anamorphic video  Refers to widescreen video images that have been "squeezed" to fit a narrower video frame when stored on DVD. These images must be expanded (unsqueezed) by the display device. Most of today's TVs employ a screen with a 16:9 aspect ratio, so that anamorphic and other widescreen material can be viewed in its proper proportions. When anamorphic video is displayed on an old-fashioned TV with a 4:3 screen, images appear unnaturally tall and narrow.

Angular disparity  See parallax angle.

Angular resolution  The angular resolution determines the smallest angle between independently emitted light beams from a single screen point. It can be calculated by dividing the emission range by the number of independently addressable light beams emitted from a screen point. The angular resolution determines the smallest feature (voxel) the display can reconstruct at a given distance from the screen.
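The division in the angular resolution entry, and the voxel-size dependence on distance that it mentions, can be sketched numerically. The chord formula for the feature size is my own small-angle geometry, an assumption rather than a formula given in the text:

```python
import math

def angular_resolution_deg(emission_range_deg, num_beams):
    """Smallest angle between independently addressable beams from one screen point."""
    return emission_range_deg / num_beams

def feature_size(distance, resolution_deg):
    """Approximate smallest reconstructable feature (voxel) at a given distance,
    assuming a small-angle chord: 2 * d * tan(theta / 2)."""
    return 2 * distance * math.tan(math.radians(resolution_deg) / 2)

res = angular_resolution_deg(45.0, 90)   # 45-degree emission range, 90 beams
print(res)                               # 0.5 degrees between beams
print(round(feature_size(1.0, res), 4))  # ~0.0087 m: roughly 9 mm voxel at 1 m
```

Note how the voxel grows linearly with viewing distance; this is the same trade-off the field of depth entry below describes from the other direction.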
Aspect ratio  The ratio of width to height for an image or screen. The North American NTSC television standard uses the squarish 4:3 (1.33:1) ratio. HDTVs use the wider 16:9 ratio (1.78:1) to better display widescreen material like high-definition broadcasts and DVDs.

Autostereoscopic  3D displays that do not require glasses to see the stereoscopic image. Multi-view autostereoscopic displays are based on parallax barriers or lenticules (sometimes called parallax panoramagram displays).

Backlight scanning  An anti-blur technology used in some LCD TVs. Typical LCDs use a fluorescent backlight that shines constantly, which can contribute to motion blur. LCD models with backlight scanning have a special type of fluorescent backlight that pulses at very high speed, which has the effect of reducing motion blur. Some recent TVs use backlight scanning along with a 120-Hz refresh rate for even greater blur reduction.

Binocular  Of or involving both eyes at once.

Binocular cues  Depth cues that depend on perception with two eyes.

Binocular depth perception  A result of successful stereo vision: the ability to visually perceive three-dimensional space; the ability to visually judge relative distances between objects; a visual skill that aids accurate movement in three-dimensional space.

Binocular disparity  The difference between the view from the left and right eyes.

Binocular rivalry  Perception conflicts that appear in the case of colorimetric, geometric, photometric, or other asymmetries between the two stereo images.

Binocular stereopsis  Term used for the depth sense (also described as stereopsis).

Binocular vision  Vision as a result of both eyes working as a team; when both eyes work together smoothly, accurately, equally, and simultaneously.

Chromatic stereoscopy  An impression of depth that results from viewing a spectrum of colored images through a light-bending device such as a prism, a pinhole, or an embossed "holographic" filter, caused by variations in the
amount of bending according to the wavelength of the light from differing colors (chromatic dispersion). If such a device is placed in front of each eye, but arranged to shift planar images or displays of differing colors laterally in opposite directions, a 3D effect will be seen. The effect may also be achieved by the lenses of the viewer's eyes themselves when viewing a planar image with strong and differing colors. Typically, with unaided vision, red portions of the image appear closer to the viewer than the blue portions. Sometimes also called chromostereopsis.

Circular polarization  A form of polarization of light in which the tip of the electromagnetic vector of the light ray traces a corkscrew in space.

Column-interleaved format  A 3D image format where left- and right-view image data are encoded on alternate columns of the display.
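The column-interleaved layout just defined is easy to demonstrate on toy image data. This is an illustrative sketch (rows as lists of pixel labels), not a production formatter:

```python
def column_interleave(left, right):
    """Build a frame whose even columns come from the left view and odd
    columns from the right view. Both views must have the same size."""
    return [
        [lrow[x] if x % 2 == 0 else rrow[x] for x in range(len(lrow))]
        for lrow, rrow in zip(left, right)
    ]

left_view  = [["L0", "L1", "L2", "L3"]]
right_view = [["R0", "R1", "R2", "R3"]]
print(column_interleave(left_view, right_view))  # [['L0', 'R1', 'L2', 'R3']]
```

Each view contributes only half of the columns, so each eye sees half the horizontal resolution; this is why column interleaving is typically paired with lenticular or parallax-barrier optics that route alternate columns to alternate eyes.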
Compressed video signal  A stream of compacted data representing an uncompressed video signal. A compressed video signal is an encoded version of an uncompressed video signal and must be decoded back to an uncompressed video signal in order to be edited or displayed. Compressed video formats vary according to the encoding methods used. A compressed video signal format may be converted to another using a "transcoder."

Computer-generated holograms (CGHs)  The counterpart of computer graphics (CGI) in holography. The technology has a long history and is sometimes referred to as the final 3D technology, because CGHs not only produce a sensation of depth but also generate light from the objects themselves. However, currently available CGHs cannot yet produce fine, true 3D images accompanied by a strong sensation of depth. Such fine CGHs commonly require two conditions: first, the CGHs must have a large viewing zone to acquire the autostereoscopic property, i.e., motion parallax; and, second, the dimensions of the CGHs must be large enough to reconstruct a 3D object that can be observed by two naked eyes. Both of these conditions lead to an extremely large number of pixels for a CGH, because the large viewing zone requires high spatial resolution and the large dimensions require a large number of pixels for high resolution. In addition, scenes with occlusions should be reconstructed to give a CGH a strong sensation of depth, because the ability to handle occlusions is one of the most important mechanisms in the perception of 3D scenes. The reason fine 3D images are difficult to produce with CGH technology is that there is no practical technique to compute the object fields for such high-definition CGHs that reconstruct 3D scenes with occlusions.

Contrast ratio  Measures the difference between the brightest whites and the darkest blacks that a TV can display.
The higher the contrast ratio, the better a TV will be at showing subtle color details, and the better it will look in rooms with more ambient light. Contrast ratio is one of the most important specifications for all TV types. There are actually two different ways of measuring a TV's contrast ratio. Static contrast ratio measures the difference between the brightest and darkest images a TV can produce simultaneously (sometimes called on-screen contrast ratio). The ratio of the brightest and darkest images a TV can produce over time is called dynamic contrast ratio. Both specifications are meaningful, but the dynamic specification is often four or five times higher than the static specification.

Conventional stereo video (CSV)  Conventional stereo video is the most well-known and, in a way, simplest type of 3D video representation. Only color pixel video data are involved, captured by at least two cameras. The resulting video signals may undergo some processing steps, such as normalization, color correction, rectification, etc., but in contrast to other 3D video formats, no scene geometry information is involved. The video signals are meant in principle to be directly displayed using a 3D display system, though some video processing might also be involved before display.
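The static/dynamic distinction in the contrast ratio entry above reduces to simple arithmetic over luminance samples. A hypothetical sketch (the luminance readings are invented; real measurements use calibrated test patterns):

```python
def static_contrast(frame):
    """Brightest vs darkest luminance within a single displayed frame."""
    return max(frame) / min(frame)

def dynamic_contrast(frames):
    """Brightest luminance ever produced vs darkest ever produced, over time."""
    return max(max(f) for f in frames) / min(min(f) for f in frames)

bright_scene = [500.0, 450.0, 1.0]  # cd/m^2 samples, hypothetical
dark_scene   = [50.0, 0.5, 2.0]
print(static_contrast(bright_scene))                 # 500.0  (static, one frame)
print(dynamic_contrast([bright_scene, dark_scene]))  # 1000.0 (dynamic, over time)
```

The dynamic figure comes out higher because the darkest black (with the backlight dimmed for a dark scene) and the brightest white never appear in the same frame, which matches the entry's observation that quoted dynamic ratios run several times above static ones.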
Convergence  The ability of both eyes to turn inward together. This enables both eyes to look at exactly the same point in space. This skill is essential to being able to pay adequate attention at near distances, for example when reading. Not only is convergence essential to maintaining attention and single vision, it is vital to be able to maintain convergence comfortably for long periods of time. For good binocular skills, it is also necessary to be able to look farther away; this is called divergence. The sustained ability to make rapid convergence and divergence movements is a vital skill for learning. The term has also been used to describe the movement of left and right image fields or the rotation (toe-in) of camera heads: the horizontal rotation of eyes or cameras that makes their optical axes intersect at a single point in 3D space. The term is also used to denote the process of adjusting the zero parallax setting (ZPS) in a stereoscopic camera.

Corresponding points  The points in the left and right images that are pictures of the same point in 3D space; the image points of the left and right fields referring to the same point on the object. The distance between the corresponding points on the projection screen is defined as parallax. Also known as conjugate or homologous points.

Crossed disparity  Retinal disparities indicating that corresponding optical rays intersect in front of the horopter or the convergence plane.

Crosstalk  Imperfect separation of the left- and right-eye images when viewing plano-stereoscopic 3D content; incomplete isolation of the left and right image channels so that one leaks (leakage) or bleeds into the other. It looks like a double exposure. Crosstalk is a physical entity and can be objectively measured, whereas ghosting is a subjective term. See ghosting.

Depth budget  The combined values of the positive and negative parallax.
Often given as a percentage of screen width.

Depth cues  Cues by which the HVS (human visual system) is able to perceive depth.

Depth perception  The ability to see in 3D, or in depth, allowing us to judge the relative distances of objects. Often referred to as stereo vision or stereopsis.

Depth range  The extent of depth that is perceived when a plano-stereoscopic image is reproduced by means of a stereoscopic viewing device. A term that applies to stereoscopic images created with cameras. The limits are defined as the range of distances in camera space from the background point producing the maximum acceptable positive parallax to the foreground point producing the maximum acceptable negative parallax.

Diplopia  Perception of double images caused by imperfect stereoscopic fusion; a condition in which the left and right homologs in a stereogram remain separate instead of being fused into a single image.

Direct view  A display where the viewer looks directly at the display, not at a projected or virtual image produced by the display. CRTs, LCDs, plasma panels, and OLEDs can all be used in direct-view 3D displays.
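The depth budget and parallax definitions can be tied together numerically. The percentage-of-screen-width conversion follows the depth budget entry directly; the perceived-depth function is the standard similar-triangles viewing model, which is my addition rather than a formula the glossary spells out:

```python
def parallax_budget_px(budget_pct, screen_width_px):
    """Convert a depth budget given as a percentage of screen width to pixels."""
    return screen_width_px * budget_pct / 100.0

def perceived_depth(view_dist, eye_sep, parallax):
    """Perceived distance of a point given its on-screen parallax, all in the
    same units (standard similar-triangles model; positive parallax = behind
    the screen plane, zero parallax = on the screen)."""
    return view_dist * eye_sep / (eye_sep - parallax)

print(parallax_budget_px(3.0, 1920))            # 57.6 pixels of total parallax
print(perceived_depth(2.0, 0.065, 0.0))         # 2.0 -> zero parallax sits on the screen
print(perceived_depth(2.0, 0.065, 0.02) > 2.0)  # True: positive parallax -> behind screen
```

The model also shows why parallax must stay below the eye separation: as the parallax approaches it, the perceived depth runs off to infinity, which is one practical reason content is authored to a conservative depth budget.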
Disparate images  A pair of images that fail as a stereogram (e.g., due to distortion, poor trimming, masking, or mismatched camera lenses).

Disparity  The distance between corresponding points on the left- and right-eye images. The distance between conjugate points on overlaid retinas is sometimes called retinal disparity. The corresponding term for the display screen is parallax.

Disparity difference  The parallax between two images representing the same scene but acquired from two different viewing angles. The disparity between homologous points is used to compute the elevation.

Display  An electronic device that presents information in visual form, that is, produces an electronic image, such as LCDs and plasma displays.

Display surface  The physical surface of the display that exhibits information.

Distortion  In general usage, any change in the shape of an image that causes it to differ in appearance from the ideal or perfect form. In stereo, usually applied to an exaggeration or reduction of the front-to-back dimension.

Divergence  The ability of the eyes to turn outward together to enable them both to look farther away; the opposite of convergence. It is essential for efficient learning and general visual performance to have good divergence and convergence skills.

DLNA (Digital Living Network Alliance)  A collaboration among more than 200 companies, including Sony, Panasonic, Samsung, Microsoft, Cisco, Denon, and Yamaha. Their goal is to create products that connect to each other across one's home network, regardless of manufacturer, so one can easily enjoy digital and online content in any room. While all DLNA-compliant devices are essentially guaranteed to work together, they may not be able to share all types of media.
For example, a DLNA-certified TV may be able to display digital photos from a DLNA-certified media server, but not videos.

DLP (Digital Light Processing)  A projection TV technology developed by Texas Instruments, based on their Digital Micromirror Device (DMD) microchip. Each DMD chip has an array of tiny swiveling mirrors that create the image. Depending on the TV's resolution, the number of mirrors can range from several hundred thousand to over two million. DLP technology is used in both front- and rear-projection displays. There are two basic types of DLP projector. "Single-chip" models, which include virtually all rear-projection DLP TVs, use a single DMD chip, with color provided by a spinning color wheel or colored LEDs. "3-chip" projectors dedicate a chip to each primary color: red, green, and blue. While 3-chip models are considerably more expensive, they completely eliminate the rainbow effect, which is an issue for a small minority of viewers.

Dolby's 3D system  Used for some Avatar screenings; makes use of an exclusive filtering wheel installed inside the projector in front of a 6.5-kW bulb. Effectively it operates as a notch filter (a notch filter is a band-rejection filter
that produces a sharp notch in the frequency response curve of a system; that is, a filter that rejects/attenuates one frequency band and passes both a lower and a higher frequency band). The wheel is divided into two parts, each one filtering the projector light into different wavelengths for red, green, and blue. The wheel spins rapidly, about three times per frame, so it does not produce a seizure-inducing effect. The glasses that the viewer wears contain passive lenses that allow only certain light to pass through, separating the red, green, and blue wavelengths for each eye.

DVI (Digital Visual Interface)  A multipin, computer-style connection intended to carry high-resolution video signals from video source components (such as older HD-capable satellite and cable boxes, and up-converting DVD players) to HD-capable TVs with a compatible connector. Most (but not all) DVI connections use HDCP (High-bandwidth Digital Content Protection) encryption to prevent piracy. In consumer electronics products, DVI connectors have been almost completely replaced by HDMI connectors, which carry both video and audio. One can use an adapter to connect a DVI-equipped component to an HDMI-equipped TV, or vice versa, but a DVI connection can never carry audio.

Emissive  A self-luminous display where there is no separate light source (lamp). Plasma panels, LEDs, OLEDs, and CRTs are examples.

Eye sequential 3D  The images in a stereo pair are presented alternately to the left and right eyes, fast enough to be merged into a single 3D image. At no instant in time are both images present. The images may be separated at the eyes by active or passive glasses.

Eye-dedicated displays  A 3D display system in which two separate displays produce the left- and right-eye images, and the geometry of the system is arranged so each eye can see only one display.

Eyewear  Anything worn on the head and eyes to produce a 3D image.
This includes both passive and active glasses and head-mounted displays. Consumer-grade 2D and 3D HMDs are often specifically called eyewear; passive and active glasses are often just called glasses.
Far point The feature in a stereo image that appears to be farthest from the viewer.
Field of depth The field of depth determines the largest depth a display can visualize with a defined minimum resolution. For displays with a fixed emission range and angular resolution, the size of the smallest displayed feature depends on the distance from the screen: the smallest feature (voxel) the display can reconstruct is a function of the distance from the screen and the angular resolution. If one sets an upper limit on the feature size, the angular resolution determines the distance from the screen within which the displayed features are smaller than that limit. This range is the field of depth, which effectively determines the largest displayable depth within which features remain below a given size limit.
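As a rough numerical sketch of the field-of-depth relationship above (the formula and the function name are illustrative assumptions, not taken from this glossary): a feature at distance d from the screen, rendered with angular resolution θ, has an apparent size of roughly d·tan θ, so an upper limit on feature size bounds the usable depth range.

```python
import math

def field_of_depth(max_feature_mm: float, angular_res_deg: float) -> float:
    """Illustrative sketch only: distance from the screen (in mm) within
    which a display with the given angular resolution can reconstruct
    features no larger than max_feature_mm (feature size ~ d * tan(theta))."""
    return max_feature_mm / math.tan(math.radians(angular_res_deg))

# e.g., a 1 mm feature-size limit at 0.1 degree angular resolution
print(round(field_of_depth(1.0, 0.1)))  # usable depth range in mm
```

With these assumed numbers, the usable depth range is on the order of half a meter, illustrating how a finer angular resolution extends the field of depth.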
Field of view Usually measured in degrees, this is the angle over which a lens can accept light. For instance, the human eye's horizontal field of view is about 175°.
Field sequential The rapid alternation of left and right views in the video format, on the display, or at the eye.
Fields per second The number of subimages presented each second. The official unit is hertz (Hz). The subimage can be defined by the interlace pattern, the color, or the left/right images in a stereo pair.
Film A sheet of material that is thin compared to its lateral dimensions. Films are used to modify the light passing through or reflecting off of them; they can modify the brightness, color, polarization, or direction of light. Film encoded with images can be used in projection systems as an image source.
Flat-Panel Display (FPD) The two most common FPDs used in 3D systems are LCDs and plasma panels. OLED FPDs are also becoming commercially available.
Flat-panel TV Any ultrathin, relatively lightweight TV—especially those that can be wall-mounted. Current flat-panel TVs use plasma or LCD screen technology; OLED is expected to follow.
Floating image A display where the image appears to be floating in midair, separated from any physical display screen.
Format The method used to combine images for printing, storage, or transmission.
Frame In moving picture media, whether film or video, a frame is a complete, individual picture.
Frame-compatible 3D format Left/right frames organized to fit in a single legacy frame such as 480 × 720, 720 × 1280, or 1080 × 1920 pixels. The pair of images can be pixel-decimated using spatial compression, color-encoded like anaglyph, time-sequenced like page flipping, and so on.
Frame rate The rate at which frames are displayed. The frame rate for movies on film is 24 fps. Standard NTSC video has a frame rate of 30 fps (actually 60 fields per second). The frame rate of a progressive-scan video format is twice that of an interlaced-scan format.
For example, interlaced formats like 480i and 1080i deliver 30 complete fps; progressive formats like 480p, 720p, and 1080p provide 60 fps.
Frames per second The number of complete images delivered to the eye each second.
Free viewpoint video (FVV) A video arrangement where the user can choose his or her own viewpoint; it requires a 3D video format that allows rendering a continuum of output views, or a very large number of different output views, at the decoder.
Frustum The rectangular wedge one gets if a line is drawn from the eye to each corner of the projection plane (for example, the screen).
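The frame-compatible packing described under Frame-compatible 3D format can be sketched in a few lines. This minimal example (the function name and the 2x column-decimation scheme are illustrative assumptions) packs left and right views side by side, each at half horizontal resolution, so the pair fits one legacy frame:

```python
def pack_side_by_side(left, right):
    """Pack two equal-sized frames (lists of pixel rows) into one
    frame-compatible frame: each view keeps every other column
    (2x horizontal decimation), left half followed by right half."""
    assert len(left) == len(right)
    return [row_l[::2] + row_r[::2] for row_l, row_r in zip(left, right)]

# Two tiny 2x4 "frames" of pixel values
left = [[1, 2, 3, 4], [5, 6, 7, 8]]
right = [[9, 10, 11, 12], [13, 14, 15, 16]]
print(pack_side_by_side(left, right))
# each output row is the decimated left half then the decimated right half
```

The packed frame has the same dimensions as either input, which is what lets it travel through an unmodified 2D distribution chain.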
Frustum effect Front-to-back keystone distortion in the space-image, so that a cube parallel to the lens-base is portrayed as the frustum of a regular four-sided truncated pyramid with the smaller face toward the observer. In reverse frustum distortion, the larger face is forward.
Fusion The merging (by the action of the brain) of the two separate views of a stereo pair into a single three-dimensional (or Cyclopean) image.
Fusional reserves A series of measures that probe how much stress the convergence and divergence mechanisms are able to cope with when placed under load. This is linked to the ability to maintain good, clear, comfortable single vision while keeping control of the focusing mechanism. Analysis of the results of this test is complicated. If results are low, difficulty in concentrating for long periods can be expected, and prolonged periods of close work often result in headaches. Children in particular, but also adults, often show a tendency to avoid prolonged close work when the fusional reserves are low.
Ghosting The perception of crosstalk: a condition that occurs when the right eye sees a portion of the left image, or vice versa, causing a faint double image to appear on the screen.
Giantism (also known as gigantism) Jargon term for the impression of enlarged size of objects in a stereo image due to the use of a stereo base separation less than normal for the focal length of the taking lens(es). See also hypostereo.
Graphics Processing Unit (GPU) A high-performance 3D processor that integrates the entire 3D pipeline (transformation, lighting, setup, and rendering).
A GPU offloads all 3D calculations from the CPU, freeing the CPU for other functions such as physics and artificial intelligence.
HDMI (High-Definition Multimedia Interface) Similar to DVI (but using much smaller connectors), the multipin HDMI interface transfers uncompressed digital video with HDCP copy protection and multichannel audio. Using an adapter, HDMI is backward-compatible with most current DVI connections, although any DVI-HDMI connection will pass only video, not audio.
Headset A display device worn on the user's head, typically using LCD technology. These devices can be used in conjunction with a tracking device to create an immersive virtual reality.
Height error (vertical error) A fault present in a stereogram when the two film chips or prints are not aligned vertically in mounting, so that homologous points are at different heights.
HMD Head-Mounted Display.
Holography "Whole drawing." A technique for producing an image (hologram) that conveys a sense of depth, but is not a stereogram in the usual sense of providing fixed binocular parallax information. Some holograms appear to float in space in front of the frame, and they change perspective as one walks left or
right. Holograms are monochromatic, and no special viewers or glasses are necessary, although proper lighting is important. To make a hologram, lengthy exposures are required, with illumination by laser beams that must be carefully set up to travel a path with precisely positioned mirrors, beam splitters, lenses, and special film.
Holoscopic imaging See integral imaging.
Homologs (homologous points) Identical features in the left and right images of a stereo pair. The spacing between any two homologous points in a view is referred to as the separation of the two images (which varies according to the apparent distance of the points), and this can be used in determining the correct positioning of the images when mounting a stereo pair.
Horizontal image translation (HIT) The horizontal shifting of the two image fields to change the value of the parallax of corresponding points. The term convergence has been confusingly used to denote this concept.
Horopter The 3D curve defined as the set of points in space whose images form at corresponding points in the two retinas (i.e., the imaged points have zero disparity).
HUD (Head-Up Display) A display device that provides an image floating in midair in front of the user.
HVS Human Visual System; the system by which humans perceive visual cues.
Hyperstereo Use of a longer-than-normal stereo base in order to achieve the effect of enhanced stereo depth and reduced scale of a scene; it produces an effect known as Lilliputism because of the miniaturization of the subject matter that appears as a result. Often used in order to reveal depth discrimination in architectural and geological features. The converse of hypostereo.
Hypostereo Using a baseline that is less than the distance between the left and right eyes when taking the pictures. This exaggerates the size of the objects, making them look larger than life. It produces an effect known as giantism. The converse of hyperstereo.
A good use for this would be 3D photographs of small objects: one could make a train set look life-sized.
Image splitter A device mounted on the front of a single lens that, through the use of mirrors or prisms, divides the image captured on film into two halves, which are the two images of a stereoscopic pair. Sometimes called a frame-splitter, and often imprecisely called a beamsplitter.
Image-based rendering The process of calculating virtual views on the basis of real images and assigned per-pixel depth or disparity maps.
Immersive A term used to describe a system that is designed to envelop the participant in a virtual world or experience.
Integral imaging (integral photography) A technique that provides autostereoscopic images with full parallax. This 3D photography technique was described in 1908 by M. G. Lippmann. In the capture subsystem, an array of microlenses generates a collection of 2D elemental images onto a matrix image sensor (such as a CCD); in the reconstruction/display
subsystem, the set of elemental images is displayed in front of a far-end microlens array, providing the viewer with a reconstructed 3D image.
Integral videography The extension of integral photography, a 3D imaging technique, to motion pictures.
Interaxial distance (interaxial separation) The distance between the axes of the left- and right-eye lenses in a stereoscopic camera.
Interlaced A type of video stream made up of odd and even lines (or sometimes columns). Normal TV signals (such as PAL and NTSC) are interlaced signals, made up of two odd- and even-line images called fields. These odd and even fields can be used to store stereoscopic left and right images, a technique used on 3D DVDs, although this halves the vertical resolution of the video.
Interlens separation The distance between the optical centers of the two lenses of a stereo camera or stereoscope, or (in wide-base stereography) between two photographic or viewing positions. Similar to stereo base.
Interocular distance The separation between the optical centers of a twin-lens stereo viewer (which may be adjustable); not necessarily the same as the interpupillary distance of the eyes. Also, the distance between an observer's eyes, about 64 mm for adults.
Interocular adjustment A provision in some stereo viewers that allows for adjustment of the distance between the lenses of the viewer to correspond with the image's infinity separation and, in some cases, the distance between a viewer's eyes.
Interpupillary distance (IPD) (interpupillary separation, interocular separation) The distance between the centers of the pupils of the eyes when vision is at infinity.
IPDs can range from 55 to 75 mm in adults, but the average is usually taken to be around 65 mm, the distance used for most resolving calculations and viewer designs.
Inversion The visual effect achieved when the planes of depth in a stereograph are seen in reverse order; for example, when the left-hand image is seen by the right eye, and vice versa. Often referred to as pseudostereo.
IR transmitter A device that sends synchronization signals to wireless shutter glasses.
Keystoning Term used to describe the result arising when the film plane in a camera or projector is not parallel to the view or screen. The perspective distortion that follows produces an outline of, or border to, the picture that is trapezoidal in shape, resembling the keystone of a masonry arch. In stereo, the term is applied to the taking or projecting of two images where the cameras or projectors are "toed-in" so that the principal objects coincide when viewed. The proportions of the scene will then have slight
differences that produce some mismatching of the outlines or borders of the two images. Gross departures from orthostereoscopic practice (e.g., using telephoto lenses) can produce keystoning in depth, more properly called a frustum effect.
Layered Depth Video (LDV) A derivative of, and alternative to, MV + D. It uses one color video with an associated depth map, plus a background layer with its own associated depth map. The background layer includes image content that is covered by foreground objects in the main layer. LDV may be more efficient than MV + D because less data have to be transmitted. On the other hand, additional error-prone vision tasks are included that operate on partially unreliable depth data, which may increase artifacts.
LCD (Liquid Crystal Display) LCD technology is one of the methods used to create flat-panel TVs. Light is not created by the liquid crystals; a "backlight" behind the panel shines light through the display. The display consists of two polarizing transparent panels with a liquid crystal solution sandwiched in between. An electric current passed through the liquid causes the crystals to align so that light cannot pass through them. Each crystal acts like a shutter, either allowing light to pass through or blocking the light. The pattern of transparent and dark crystals forms the image.
LCoS (Liquid Crystal on Silicon) A projection TV technology based on LCD. With LCoS, light is reflected from a mirror behind the LCD panel rather than passing through the panel. The control circuitry that switches the pixels on and off is embedded further down in the chip so it does not block the light; this improves brightness and contrast. This multilayered microdisplay design can be used in rear-projection TVs and projectors.
TV makers use different names for their LCoS technologies: Sony uses SXRD, while JVC uses D-ILA or HD-ILA.
Lenticular A method of producing a depth effect without the use of viewing equipment, using an overlay of semicylindrical (or part-cylindrical) lens-type material that exactly matches alternating left and right images on a specially produced print, thereby enabling each eye to see only one image from any viewing position, as in an autostereogram.
Lenticular screen A projection screen that has embossed vertical lines for its finish rather than the "emery board" finish that is most common. These screens tend to cost more. The silvered version is critical to 3D projection, as any white screen will not preserve the polarization of the image reflected off it.
Lilliputism Jargon term for the miniature-model appearance resulting from using a wider-than-normal stereo base in hyperstereography.
Linear polarization A form of polarized light in which the light ray remains confined to a plane.
Lumen The unit of measure for the light output of a projector. Different manufacturers may rate their projectors' light output differently. "Peak lumens" is measured by illuminating an area of about 10% of the screen size in the center of the display. This measurement ignores the reduction in brightness at
the sides and corners of the screen. The more conservative "ANSI lumens" (American National Standards Institute) specification is made by dividing the screen into 9 blocks, taking a reading in the center of each, and averaging the readings. This number is usually 20–25% lower than the peak lumen measurement.
Luminance The brightness or black-and-white component of a color video signal; it determines the level of picture detail.
Mastering Mastering (also known as content preparation) is the process of creating the (master) file package containing the movie's images, audio, subtitles, and metadata. Mastering standards are typically used in this process. Generally, mastering is performed on behalf of the distributor/broadcaster at a mastering facility. Encryption may be added at this stage. The file package is thus ready for duplication (e.g., DVDs) or for transmission via satellite or fiber links.
Misalignment In stereo usage, a condition where one homolog or view is higher or lower than the other. Where the misalignment is rotational in both views, there is tilt; in one view only, twist. Viewing a misaligned stereogram can result in diplopia or produce eyestrain.
Monocular areas Parts of the scene in a stereo image that appear in one view and not in the other. These can be natural (if behind the stereo window) or unnatural, as in the case of floating edges (if in front of the stereo window).
Monocular cues Depth cues that can be appreciated with a single eye alone, such as relative size, linear perspective, or motion parallax.
Mount In stereo usage, a special holder or card used to secure, locate, and protect the two images of a stereo pair.
The term includes any framing device or mask that may be incorporated.
Mounting The process of fixing the left and right views to a mask or mount (single or double) so that they are in correct register, both vertically (to avoid misalignment) and horizontally (so that the stereo view is held in correct relationship to the stereo window).
MPEG Standards developed by the Moving Picture Experts Group; also a type of audio/video file found on the Internet. There are three major MPEG standards: MPEG-1, MPEG-2, and MPEG-4. There are also extensions in support of 3D.
Multiplex The process of taking a right and a left image and combining them, with a multiplexing software tool or with a multiplexer, to make one stereo 3D image.
Multiplexing The technique for placing the two images required for a stereoscopic display within an existing bandwidth. In a telecom context: the simultaneous carriage of streams from multiple sources or programs; a number of technologies can be employed, such as Time Division Multiplexing.
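A minimal sketch of the multiplexing idea above (the function name and the simple time-division scheme are illustrative assumptions, not a standard's method): left and right frame sequences are interleaved into one page-flipped stream carried over a single channel.

```python
def time_multiplex(left_frames, right_frames):
    """Time-division multiplex two equal-length frame sequences into
    one page-flipped stream: L0, R0, L1, R1, ...  A display driving
    shutter glasses later demultiplexes by alternating eyes."""
    assert len(left_frames) == len(right_frames)
    stream = []
    for l, r in zip(left_frames, right_frames):
        stream.extend([l, r])
    return stream

print(time_multiplex(["L0", "L1"], ["R0", "R1"]))
# ['L0', 'R0', 'L1', 'R1']
```

Note that this doubles the frame rate on the channel rather than halving spatial resolution, which is the trade-off that distinguishes time-sequential from frame-compatible multiplexing.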
Multi-View Video plus Depth (MV + D) (some also call this Multiple Video plus Depth) Advanced 3D video applications include wide-range multi-view autostereoscopic displays and free viewpoint video, where the user can choose his or her own viewpoint; these require a 3D video format that allows rendering a continuum of output views, or a very large number of different output views, at the decoder. Multi-view video does not support a continuum, and its coding is increasingly inefficient for a large number of views. V + D supports only a very limited continuum around the available original view, since view synthesis artifacts increase dramatically with the distance of the virtual viewpoint. Therefore, an MV + D representation is defined for advanced 3D video applications. MV + D involves a number of complex and error-prone processing steps: depth has to be estimated for the N views at the sender; N color videos with N depth videos have to be encoded and transmitted; at the receiver, the data have to be decoded and the virtual views rendered. The MPEG-developed Multi-View Video Coding (MVC) standard supports this format and is capable of exploiting the correlation between the multiple views that are required to represent 3D video.
Near point The feature in a stereo image that appears to be nearest to the viewer.
Near point of accommodation The closest distance from the eyes at which reading material can be read. This distance varies with age. It is often measured in each eye separately and in both eyes together, and the results are compared to one another.
Near point stress The term used when close work is causing the individual unacceptable stress. This is often seen when the relationship between accommodation and convergence is maintained only by excessive effort.
The response to this is either a tendency to avoid close work (known as evasion) or, alternatively, to use progressively more and more effort.
Negative parallax Stereoscopic presentation where the optical rays intersect in front of the screen, in the viewers' space (refers to crossed disparity).
Normalized Cross Correlation (NCC) A correlation measure for detecting the most likely estimate of point correspondences in stereo vision.
NTSC A type of interlaced video stream used primarily in North America. It is made up of 525 horizontal lines playing at 30 fps (or 60 fields per second).
Occlusion The situation of blocking, as when one object blocks the light from another.
OLED (Organic Light Emitting Diode) OLED is an up-and-coming display technology that can be used to create flat-panel TVs. An OLED panel employs a series of organic thin films placed between two transparent electrodes. An electric current causes these films to produce a bright light. A thin-film transistor layer contains the circuitry to turn each individual pixel on and off to form an image. The organic process is called electrophosphorescence, which means the display is self-illuminating, requiring no backlight. OLED panels are thinner and lighter than current plasma or LCD HDTVs, and have lower
power consumption. Only small screens are available at this time, but larger screens should be available by 2010.
Orthoscopic image A stereoscopic image viewed with its planes of depth in proper sequence, as opposed to an inverse (or pseudo) stereoscopic image.
Orthostereoscopic image An image that appears to be correctly spaced as in the original view.
Orthostereoscopic viewing When the focal length of the viewer's lenses is equal to the focal length of the taking lenses of the camera with which the slides were made. This is said to allow one to see the objects as being exactly the same size, and with the same distances between them, in the viewer as in reality.
Over/under format A format that uses a mirror system to separate the left and right images, which are placed one above the other. Special mirrored viewers are made for the over/under format. For modern TV (3DTV) displays, this separation is done automatically and electronically.
Over-and-under A form of stereo recording (on cine film) or viewing (of prints) in which the left and right images are positioned one above the other rather than side-by-side, and viewed with the aid of prisms or mirrors that deflect the light path to each eye accordingly. For modern TV (3DTV) displays, this is done automatically and electronically.
PAL A type of interlaced video stream used in Europe and other locations around the world. It is made up of 625 horizontal lines playing at 25 fps (or 50 fields per second).
Panum's fusional area The small region around the horopter where retinal disparities can be fused by the HVS into a single, three-dimensional image.
Parallax Apparent change in the position of an object when viewed from different points; the distance between conjugate points. Generally, the differences in a scene when viewed from different points (as, photographically, between the viewfinder and the taking lens of a camera).
In stereo, often used to describe the small relative displacements between homologs, more correctly termed deviation. Also, the distance between corresponding points in the left- and right-eye images of a plano-stereoscopic image.
Parallax angle The angle under which the optical rays of the two eyes intersect at a particular point in 3D space.
Parallax budget The range of parallax values, from maximum negative to maximum positive, that is within an acceptable range for comfortable viewing.
Parallax stereogram A form of autostereogram; the term currently describes a technique in which alternate thin vertical strips of the left- and right-hand views are printed in a composite form and then overlaid with a grating (originally) or (nowadays) a lenticular sheet of cylindrical lenses that presents each view to the correct eye for stereoscopic viewing.
Parallel viewing method Viewing a stereo image where the left view of a stereo image is placed on the left and the right view is placed on the
right. This is the way most stereocards are made, as opposed to cross-eyed viewing.
Passive polarized 3D glasses 3D glasses made with polarizing filters, used in conjunction with a viewing screen that preserves polarized light.
Passive stereo A technique whereby 3D stereoscopic imagery is achieved by polarizing the left and right images differently at the source and viewing them through low-cost polarizing glasses.
Planar image A planar image is one contained in a two-dimensional space, but not necessarily one that appears flat. It may have all the depth cues except stereopsis.
Plane of convergence The depth plane where the optical rays of the sensor centers intersect in a parallel camera setup.
Plano-stereoscopic A more exact term for describing 3D displays that achieve a binocular depth effect by providing the viewer with images of slightly different perspective on one common planar screen.
Plasma Plasma technology is one of the methods used to create flat-panel TVs. The display consists of two transparent glass panels with a thin layer of pixels sandwiched in between. Each pixel is composed of three gas-filled cells or subpixels (one each for the red, green, and blue primary colors). A grid of tiny electrodes applies an electric current to the individual cells, causing the gas to ionize. This ionized gas (plasma) emits high-frequency UV rays that stimulate the cells' phosphors, causing them to glow and create the TV image.
Plasma Display Panel (PDP) TVs Flat-panel TVs that use plasma technology.
Point of convergence The 3D point where the optical axes of the eyes, or of convergent cameras, intersect.
Polarization of light The division of beams of light into separate planes or vectors by means of polarizing filters.
Positive parallax Stereoscopic presentation where the optical rays intersect behind the screen, in the screen space (refers to uncrossed disparity).
Projector A video display device that projects a large image onto a physically separate screen.
The projector is typically placed on a table or is ceiling-mounted. Projectors, sometimes referred to as front-projection systems, can display images up to 10 ft. across, or larger. Old-fashioned large, expensive CRT-based projectors have generally been replaced by compact, lightweight, lower-cost digital projectors using DLP, LCD, or LCoS technology.
Pseudoscopic The presentation of three-dimensional images in inverse order, so that the farthest object is seen as closest and vice versa; more correctly referred to as inversion. Achieved (either accidentally or deliberately, for effect) when the left and right images are transposed for viewing.
Pulfrich effect Term now used to describe an illusory stereoscopic effect that is produced when two-dimensional images moving laterally on a single plane
(as on a film or television screen) are viewed at slightly different time intervals by each eye, the perceived delay between the eyes being achieved by means of reduced vision in one of them; for example, through the use of a neutral-density filter. The apparent positional displacement that results is interpreted by the brain as a change in the distance of the fused image. A scene is produced giving a depth effect, the depth being proportionate to the rate of movement of the object, not to the object distance. The phenomenon was first adequately described in 1922 by Carl Pulfrich.
Pulfrich stereo Stereo video taken by rolling a camera sideways at a right angle to an object. When played back, the viewer wears glasses with one eye unobstructed and the other looking through a darker lens. The brain is fooled into processing frames of the video in sequence, and the result is a moving stereo image in color.
Puppet-theater effect A miniaturization effect in plano-stereoscopic images that makes people look like animated puppets.
RealD cinema Currently the most widely used 3D movie system in theaters; it uses circular polarization—produced by a filter in front of the projector—to beam the film onto a silver screen. The filter converts linearly polarized light into circularly polarized light. When the vertical and horizontal parts of the picture are projected onto the silver screen, the filter slows down the vertical component. This effectively makes the light appear to rotate, and it allows the viewer to move his or her head more naturally without losing perception of the 3D image. Circular polarization also eliminates the need for two projectors shooting out images in separate colors. The silver screen, in this case, helps preserve the polarization of the image.
Real-time 3D graphics Real-time graphics are produced on-the-fly by a 3D graphics card.
Real-time rendering is essential if the user needs to interact with the images, as in virtual reality, as opposed to watching a movie sequence.
Rear-projection TV Rear projection occurs when images are projected from behind a screen. Typically referred to as "big-screen" TVs, these large-cabinet TVs generally have built-in screens measuring at least 40 in. Unlike the bulky CRT-based rear-projection TVs of years ago, today's "tabletop" rear-projection TVs are relatively slender and light. These TVs use digital microdisplay technologies like DLP, LCD, and LCoS. The advantage of this configuration is that a viewer cannot cast shadows by getting between the projector and screen; this is particularly important when a user is interacting with images on the screen. Certain types of rigid and flexible rear-projection screens can be used for stereoscopic projection.
Retinal disparity Disparity perceived at the retina of the human eyes.
Retinal rivalry The simultaneous transmission of incompatible images from each eye.
Rig Dual camera heads in a properly engineered mounting, used to shoot stereo movies.
Row interleaved A format for 3D video or images in which each row or line of video alternates between the left eye and the right eye (from top to bottom).
Screen space The region appearing to be within a screen or behind the surface of the screen. Images with positive parallax will appear to be in screen space. The boundary between screen and theater space is the plane of the screen, which has zero parallax.
Selection device The hardware used to present the appropriate image to the appropriate eye and to block the unwanted image. For 3D movies, the selection device is usually eyewear used in conjunction with a device at the projector, such as a polarizing device.
Sensor shift Horizontal off-center shift of stereo cameras to allow ZPS in a parallel camera setup.
Separation (interaxial) The distance between two taking positions in a stereo photograph. Sometimes used to denote the distance between two homologs.
Septum The partition used in a stereo camera to separate the two image paths; any partition or design element that effectively separates the lines of sight of the eyes such that only the respective left and right images are seen by each one.
Sequential stereograph A stereo pair of images made with one camera that is moved by an appropriate separation between the making of the LH and RH exposures.
Shutter glasses A device worn on the head, with two lenses generally covered in a liquid crystal material and controlled by a computer. When viewing a 3D image using these glasses, the computer displays the left image first, while instructing the glasses to open the left eye's "shutter" (making the liquid crystal transparent) and to close the right eye's "shutter" (making the liquid crystal opaque). Then, after a short interval—1/30 or 1/60 of a second—the right image is displayed, and the glasses are instructed to reverse the shutters. This keeps up for as long as the viewer views the image.
Since the time interval is so short, the brain cannot tell the difference in time and views them simultaneously. This approach does not require a screen that preserves polarized light.
Silvered screen A type of screen surface used for passive stereoscopic front projection. These screens maintain the polarization of the light introduced by polarizing filters in front of the two projector lenses.
Simulcast coding The separate encoding (and transmission) of the two video scenes in the Conventional Stereo Video (CSV) format.
Spinography This is done by walking around an object and taking pictures every 10–20°, or by putting the camera on a tripod and an object on a turntable and rotating it 10–20° between shots. It can also be done with 3D modeling software on a computer. It does not create the same sense of depth as stereographics. To view spinography on a computer, one usually needs a small program for the browser called a plug-in.
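The row-interleaved format defined earlier in this glossary can be sketched in a few lines. In this example (the function name is an illustrative assumption), one interleaved frame is built by taking even rows from the left view and odd rows from the right view:

```python
def row_interleave(left, right):
    """Build a row-interleaved 3D frame: even rows (0, 2, ...) come
    from the left view, odd rows (1, 3, ...) from the right view."""
    assert len(left) == len(right)
    return [left[i] if i % 2 == 0 else right[i] for i in range(len(left))]

# Tiny 4-row views, rows labeled by eye and row number
left = ["L0", "L1", "L2", "L3"]
right = ["R0", "R1", "R2", "R3"]
print(row_interleave(left, right))
# ['L0', 'R1', 'L2', 'R3']
```

As the glossary's Interlaced entry notes for the analogous field-based technique, each eye ends up with half the vertical resolution of the original views.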
Squeeze Diminution of depth in a stereogram relative to the other two dimensions, usually resulting from a viewing distance closer than the optimum (especially in projection). The opposite effect to stretch.

Stereo Having depth, or three dimensions; used as a prefix to describe, or as a contraction to refer to, various stereographic or stereoscopic artifacts or phenomena. Stereo comes from the Greek stereos, meaning hard, firm, or solid; as a combining form it means solid or three-dimensional. Two inputs combine to create one unified perception of three-dimensional space.

Stereo acuity The ability to distinguish different planes of depth, measured by the smallest angular difference of parallax that can be resolved binocularly.

Stereo infinity The farthest distance at which spatial depth effects are normally discernible, usually about 200 m.

Stereo vision (stereoscopic vision, stereopsis) Vision in which the separate images from the two eyes are successfully combined in the brain into the perception of a single three-dimensional image; a by-product of good binocular vision.

Stereo window The cone defined by the camera's focal points and the borders of the screen within the convergence plane. The viewing frame or border of a stereo pair, defining a spatial plane beyond which the three-dimensional image can be seen (or, for a special effect, through which it "comes through"). Also a design feature in some stereo cameras whereby the axes of the lenses are offset slightly inward from the axes of the film apertures, so as to create a self-determining window in the resulting images, usually set at an apparent distance of about 2 m from the viewer.
If objects appear closer to the viewer than this plane, the effect is called breaking the window.

Stereogram A general term for any arrangement of LH and RH views that produces a three-dimensional result, which may consist of (i) a side-by-side or over-and-under pair of images; (ii) superimposed images projected onto a screen; (iii) a color-coded composite (anaglyph); (iv) lenticular images; (v) a vectograph; or (vi) in film or video, alternately projected LH and RH images that fuse by means of the persistence of vision.

Stereograph The original term, coined by Wheatstone, for a three-dimensional image produced by drawing; now denoting any image viewed from a stereogram. In more general, but erroneous, usage as the equivalent of stereogram.

Stereographs (stereograms, stereopairs) Two images made from different points of view, placed side by side. When viewed with a special viewer, the effect is similar to seeing the objects in reality.

Stereo-photogrammetry A measurement technique based on the concept of stereo viewing, which derives from the fact that human beings naturally view their environment in three dimensions. Each eye sees a single scene from
slightly different positions; the brain then "calculates" the difference and "reports" the third dimension.

Stereoplexing (stereoscopic multiplexing) A means of incorporating information for the left and right perspective views into a single information channel without expanding the bandwidth.

Stereopsis The binocular depth sense; literally, "solid seeing." The blending of stereopairs by the brain: the physiological and mental process of converting the individual LH and RH images seen by the eyes into the sensation and awareness of depth in a single three-dimensional percept (or Cyclopean image).

Stereoscopic Having visible depth as well as height and width. May refer to any experience or device associated with binocular depth perception.

Stereoscopic 3D Two photographs taken from slightly different angles that appear three-dimensional when viewed together.

Stereoscopic fusion The ability of the human brain to fuse two different perspective views into a single three-dimensional image.

Stereoscopy The art and science of creating images with the depth sense stereopsis; the reproduction of the effects of binocular vision by photographic or other graphic means. Also called stereography.

Stretch The elongation of depth in a stereogram relative to the other two dimensions, usually caused by viewing from more than the optimum distance, especially in projection. The opposite effect to squeeze.

t In stereoscopy, t denotes the distance between the eyes, called the interpupillary or interocular distance. tc denotes the distance between the lens axes of stereoscopic camera heads and is called the interaxial distance.

Time multiplexing of the display (Sometimes also called "interlaced stereo.") Content shown as consecutive left and right signals viewed through shuttered glasses. This technology is applicable to 3DTV. The technique is still used in movie theaters today, such as IMAX, sometimes in conjunction with polarization plane separation.
In a Cathode Ray Tube (CRT) environment, a major shortcoming of interlaced stereo was image flicker, since each eye would see only 25 or 30 images per second rather than 50 or 60. To overcome this, the display rate could be doubled to 100 or 120 Hz to allow flicker-free reception.

Toeing-in The technique of causing the optical axes of twin planar cameras to converge at a distance equivalent to that of a desired stereo window, so that the borders of the images are coincident at that distance (apart from any keystoning that results).

Tracking A 3D tracking system is used in virtual reality so that the computer can track the participant's head and hands.
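The time-multiplexing arithmetic above can be restated in a short sketch. This is illustrative only; the function names are hypothetical, and the frame values stand in for actual images.

```python
# Sketch: time multiplexing (frame-sequential stereo). The display
# alternates left/right frames in time, so each eye sees only half
# the display rate; the numbers below restate the glossary's
# flicker arithmetic (60 Hz -> 30 images per eye, 120 Hz -> 60).

def frame_sequence(left_frames, right_frames):
    """Interleave views in time: L0, R0, L1, R1, ..."""
    seq = []
    for l, r in zip(left_frames, right_frames):
        seq.extend([("L", l), ("R", r)])
    return seq

def per_eye_rate(display_hz):
    """Each eye sees every other frame of the display."""
    return display_hz / 2

seq = frame_sequence([0, 1], [0, 1])
print([eye for eye, _ in seq])  # ['L', 'R', 'L', 'R']
print(per_eye_rate(60))         # 30.0 -> visible flicker on a CRT
print(per_eye_rate(120))        # 60.0 -> flicker-free per eye
```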
Transcoding In this context, the process of converting one 3D video format into another, for example, field-sequential 3D video into column-interleaved image data.

Transposition The changing over of the inverted images produced by a stereo camera to the upright and left/right presentation necessary for normal viewing. May be achieved optically by means of a transposing camera or viewer, mechanically by means of a special printing frame, or manually during the mounting of images.

Twin camera stereo photography Stereo photography using two monoscopic cameras, usually with shutters and other components connected internally or externally by mechanical or electronic means. Its advantages include the use of common formats (e.g., full frame, medium format) and the ability to achieve a variable stereo base. Drawbacks include the difficulty of matching cameras and film, and of obtaining normal stereo bases. Camera bars can be used to help achieve more consistent results.

Uncrossed disparity Retinal or camera disparities in which the optical rays intersect behind the horopter or the convergence plane.

Vectograph A form of polarization-coded stereogram (originally devised by the Polaroid company) in which the images are mounted on the front and rear surfaces of a transparent base and are viewed by polarized light or through polarizing filters. The polarized equivalent of an anaglyph stereogram.

Video plus Depth (V + D) A representation consisting of a video signal and a per-pixel depth map. (Also called 2D-plus-depth by some and color-plus-depth by others.) Per-pixel depth data are usually generated from calibrated stereo or multi-view video by depth estimation and can be regarded as a monochromatic, luminance-only video signal. The depth range is restricted to lie between two extremes, z_near and z_far, indicating the minimum and maximum distance of the corresponding 3D point from the camera, respectively.
Typically, the depth range is quantized with 8 bits, associating the closest point with the value 255 and the most distant point with the value 0. The depth map is thus specified as a grayscale image, which can be fed into the luminance channel of a video signal and processed by any state-of-the-art video codec. For displaying V + D at the decoder, a stereo pair can be rendered from the video and depth information by 3D warping using camera geometry information.

Vieth–Müller circle See horopter.

Viewer space The region between the viewer and the display screen surface. Objects are perceived in this region if they have negative parallax.

Viewing angle A measure of a video display's maximum usable viewing range from the center of the screen, with 180° being the theoretical maximum. Most often the horizontal (side-to-side) viewing angle is listed, but sometimes both horizontal and vertical viewing angles are provided. For most home theater setups, the horizontal viewing angle is the more critical.
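The 8-bit V + D depth quantization described above can be sketched as follows. A linear mapping between z_near and z_far is assumed here purely for illustration; real systems may quantize in inverse depth instead, and the function names are hypothetical.

```python
# Sketch of V + D depth-map quantization: depth in [z_near, z_far]
# maps to 8-bit grayscale, with 255 at the closest point (z_near)
# and 0 at the most distant point (z_far). Linear mapping assumed.

def depth_to_gray(z, z_near, z_far):
    z = min(max(z, z_near), z_far)          # clamp to the valid range
    return round(255 * (z_far - z) / (z_far - z_near))

def gray_to_depth(v, z_near, z_far):
    """Inverse mapping used by the decoder before 3D warping."""
    return z_far - (v / 255) * (z_far - z_near)

print(depth_to_gray(1.0, 1.0, 5.0))   # 255 (closest point)
print(depth_to_gray(5.0, 1.0, 5.0))   # 0   (farthest point)
print(gray_to_depth(128, 1.0, 5.0))   # roughly the midpoint depth
```

The quantize/dequantize round trip loses at most half a quantization step, which is the price of packing depth into a standard luminance channel.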
Virtual reality A system of computer-generated 3D images (still or moving) viewed by means of a headset, linked to the computer, that incorporates left-eye and right-eye electronic displays. The controlling software often enables the viewer to move interactively within the environment or "see" 360° around a scene by turning the head, and to "grasp" virtual objects in the scene by means of an electronically linked glove. Although such systems allow one to see all sides of an object by rotating it, one is still seeing only two dimensions at a time.

Vision The act of perceiving and interpreting visual information with the eyes, mind, and body.

Widescreen When used to describe a TV, widescreen generally refers to an aspect ratio of 16:9, which is the optimum ratio for viewing anamorphic DVDs and HDTV broadcasts.

Window The stereo window corresponds to the screen surround unless floating windows are used.

Z-buffer The area of graphics memory used to store the Z (depth) information about rendered objects. The Z-buffer value of a pixel is used to determine whether it is behind or in front of another pixel; Z calculations prevent background objects from overwriting foreground objects in the frame buffer.
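The Z-buffer test described in the glossary entry above can be sketched in a few lines. Buffer layout and function names here are illustrative assumptions, with the common "smaller z means nearer" convention.

```python
# Minimal Z-buffer sketch: keep, per pixel, the depth of the nearest
# surface drawn so far, and reject fragments that lie behind it, so
# background objects cannot overwrite foreground objects.

def make_buffers(w, h, far=float("inf")):
    zbuf = [[far] * w for _ in range(h)]    # depth per pixel
    fbuf = [[None] * w for _ in range(h)]   # color per pixel
    return zbuf, fbuf

def plot(zbuf, fbuf, x, y, z, color):
    """Write the fragment only if it is closer than what is stored."""
    if z < zbuf[y][x]:                      # smaller z = nearer here
        zbuf[y][x] = z
        fbuf[y][x] = color

zbuf, fbuf = make_buffers(2, 2)
plot(zbuf, fbuf, 0, 0, 5.0, "background")
plot(zbuf, fbuf, 0, 0, 2.0, "foreground")  # nearer: overwrites
plot(zbuf, fbuf, 0, 0, 9.0, "behind")      # farther: rejected
print(fbuf[0][0])  # foreground
```

Because the comparison is per pixel, objects can correctly occlude one another regardless of the order in which they are drawn.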