
A Journey Towards Fully Immersive Media Access


Universal media access as proposed almost two decades ago is now reality. We can generate, distribute, share, and consume any media content, anywhere, anytime, and with/on any device. A technical breakthrough was the adaptive streaming over HTTP resulting in the standardization of MPEG-DASH, which is now successfully deployed in a plethora of environments. The next big thing in adaptive media streaming is virtual reality applications and, specifically, omnidirectional (360°) media streaming, which is currently built on top of the existing adaptive streaming ecosystems. This tutorial provides a detailed overview of adaptive streaming of both traditional and omnidirectional media. The tutorial focuses on the basic principles and paradigms for adaptive streaming as well as on already deployed content generation, distribution, and consumption workflows. Additionally, the tutorial provides insights into standards and emerging technologies in the adaptive streaming space. Finally, the tutorial includes the latest approaches for immersive media streaming enabling 6DoF DASH through Point Cloud Compression (PCC) and concludes with open research issues and industry efforts in this domain. More information available at: https://multimediacommunication.blogspot.com/2019/07/acmmm19-tutorial-journey-towards-fully.html


  1. 1. ACM Multimedia 2019 Tutorial – Nice, France, October 21, 2019 A Journey Towards Fully Immersive Media Access Christian Timmerer and Ali C. Begen http://www.slideshare.net/christian.timmerer
  2. 2. Upon Attending This Tutorial, You Will Know About • Principles of HTTP adaptive streaming for the Web/HTML5 • Principles of omnidirectional (360°) media delivery • Content generation/distribution/consumption workflows for traditional and omnidirectional media • Standards and emerging technologies in the adaptive streaming space • Current and future research on traditional and omnidirectional media delivery, specifically enabling 6DoF adaptive streaming through point cloud compression This tutorial is public; however, some material might be copyrighted. Use proper citation when using content from this tutorial. (Thanks to T. Stockhammer, J. Simmons, K. Hughes, C. Concolato, S. Pham, W. Law, N. Weil, P. Chou, R. Schatz, J. van der Hooft and many others for helping with the material) ACM Multimedia Tutorial — October 2019 2
  3. 3. Presenters Today Ali C. Begen • Electrical engineering degree from Bilkent University (2001) • Ph.D. degree from Georgia Tech (2006) – Video delivery and multimedia communications • Research, development and standards at Cisco (2007-2015) – IPTV, content delivery, software clients – Transport and distribution over IP networks – Enterprise video • Consulting at Networked Media since 2016 • Assistant professor at OzU and IEEE Distinguished Lecturer since 2016 • Further information: http://ali.begen.net Christian Timmerer • Associate professor at the Institute of Information Technology (ITEC), Multimedia Communication Group (MMC), Alpen-Adria-Universität Klagenfurt, Austria • Co-founder and CIO of Bitmovin • Research interests – Immersive multimedia communication – Streaming and adaptation – Quality of experience (QoE) • Blog: http://blog.timmerer.com; @timse7 • New research project ATHENA: https://athena.itec.aau.at/
  4. 4. • Two grand challenges – Improving open-source HEVC encoding – Low-latency live streaming • Focus areas in 2020 – Machine learning and statistical modeling for video streaming – Volumetric media: from capture to consumption – Fake media and tools for preventing illegal broadcasts • A workshop (posters/demos) dedicated to middle and high-school students • Two confirmed keynotes from Google and MIT • Expecting reduced registration fees thanks to strong support Important Dates Submit by Research Track Jan. 10 (firm) Demo Track Feb. 29 Open Source/Dataset Feb. 29 Workshops Mar. 27 Conference June 8-11 NEW Visit http://acmmmsys.org today! NEW
  5. 5. Topics to Cover • Part 1 – fundamentals of adaptive streaming – HTML5 video and media extensions – Survey of well-established streaming solutions – Multi-bitrate encoding, encapsulation, and encryption workflows – MPEG-DASH and MPEG-CMAF – Common issues in scaling and improving quality • Part 2 – omnidirectional (360-degree) media towards fully immersive media access: – Acquisition, projection, coding and packaging of 360-degree video – Delivery, decoding and rendering methods – Overview of the MPEG-OMAF and MPEG-I standards – Ongoing industry efforts and research trends, specifically towards 6DoF adaptive streaming ACM Multimedia Tutorial — October 2019 5
  6. 6. First Things First: IPTV vs. IP (Over-the-Top) Video • IPTV: Managed delivery; emphasis on quality; mostly linear TV; always a paid service • IP Video: Best-effort delivery; quality not guaranteed; mostly on demand; paid or ad-based service
  7. 7. Part I: Adaptive Streaming HTTP Adaptive Streaming and Workflows
  8. 8. Internet (IP aka OTT) Video Essentials • Reach: Reach all connected devices • Scale: Enable live and on-demand delivery to the mass market • Quality of Experience: Provide a TV-like, consistent, rich viewer experience • Business: Enable revenue generation through paid content, subscriptions, ads, etc. • Regulatory: Satisfy regulations such as captioning, ratings and parental control
  9. 9. Creating Revenue – Attracting Eye Balls • High-end content – Hollywood movies, TV shows – Sports • Excellent quality – HD/3D/UHD audiovisual presentation w/o artifacts such as pixelization and rebuffering – Fast startup, fast zapping and low glass-to-glass delay • Usability – Navigation, content discovery, battery consumption, trick modes, social network integration • Service flexibility – Linear TV – Time-shifted and on-demand services • Reach – Any device, any time ACM Multimedia Tutorial — October 2019 9
  10. 10. Progressive Download: One Request → One Response (HTTP Request, HTTP Response) • Playback starts only after there are several seconds of data in the playback buffer • Download continues as fast as possible • Fetched content is wasted if the viewer clicks away • Can seek only within the fetched content
  11. 11. What is Streaming? Streaming Sounds Cooler! Streaming is the transmission of continuous content from a server to a client and its simultaneous consumption by the client • Client consumption rate is also limited by real-time constraints, not just by bandwidth availability; that is, the client cannot fetch content that is not available yet • Server transmission rate (loosely or tightly) matches the client consumption rate; that is, no buffer overrun or underrun is acceptable
  12. 12. Streaming is More Viewer Friendly • Playback starts when there are just a few seconds of data • Download rate matches the encoding bitrate, and downloading pauses if the player pauses → Less waste • Can seek to anywhere in the entire content
  13. 13. Video Delivery over HTTP • Progressive Download: Enables playback while still downloading; server sends the file as fast as possible • Pseudo Streaming: Enables seeking via media indexing; server paces transmission based on encoding rate • Chunked Streaming: Content is divided into short-duration chunks; enables live streaming and ad insertion • Adaptive Streaming: Multiple versions of the content are created; enables adaptation to network and device conditions
  14. 14. ACM Multimedia Tutorial — October 2019 14 In a nutshell… Over-The-Top – Adaptive Media Streaming Adaptation logic is within the client, not normatively specified by the standard, subject to research and development
  15. 15. Adaptive Streaming over HTTP Decoding and Presentation Streaming Client Media Buffer Content Ingest (Live or Pre-captured) Multi-rate Encoder Packager Origin (HTTP) Server … … … … ServerStorage HTTP GET Request Response ACM Multimedia Tutorial — October 2019 15
  16. 16. Adapt Video to the Web rather than Changing the Web HTTP Adaptive Streaming (HAS) • Why HTTP – Features well-understood naming/addressing and authentication/authorization infrastructure – Provides easy traversal for all kinds of middleboxes (e.g., NATs, firewalls) – Enables cloud access, leverages the existing (cheap) HTTP caching infrastructure in the CDNs • Imitation of streaming via short downloads – Downloads small chunks to minimize waste – Enables monitoring consumption and tracking streaming clients • Adaptation to varying conditions on the network and different device capabilities • Improved Quality of Experience – Enables shorter stream start time upon zapping or seeking – Reduces skips and freezes ACM Multimedia Tutorial — October 2019 16 The cure may be worse than the disease if you are not careful enough
  17. 17. Stalls, Slow Start-Up, Plug-In and DRM Issues Common Annoyances in Streaming • Unsupported/wrong – protocol – plug-in – codec – format – DRM • Slow start-up • Poor quality, quality variation • Frequent freezes/glitches • Lack of seeking ACM Multimedia Tutorial — October 2019 17
  18. 18. Dead, Surviving, Maturing and Newborn Technologies • Move Adaptive Stream (Long gone, but some components are in Slingbox) – http://www.movenetworks.com • Microsoft Smooth Streaming (Legacy) – http://www.iis.net/expand/SmoothStreaming • Adobe Flash (Almost dead) – http://www.adobe.com/products/flashplayer.html • Adobe HTTP Dynamic Streaming (Legacy) – http://www.adobe.com/products/httpdynamicstreaming • Apple HTTP Live Streaming (The elephant in the room) – https://tools.ietf.org/html/rfc8216 – https://datatracker.ietf.org/doc/draft-pantos-hls-rfc8216bis • MPEG DASH and CMAF (The standards) – http://mpeg.chiariglione.org/standards/mpeg-dash – http://mpeg.chiariglione.org/standards/mpeg-a/common-media-application-format ACM Multimedia Tutorial — October 2019 18
  19. 19. List of Accessible Segments and Their Timings An Example DASH Template-Based Manifest MPD Period id = 1 start = 0 s Period id = 3 start = 300 s Period id = 4 start = 850 s Period id = 2 start = 100 s Adaptation Set 0 subtitle turkish Adaptation Set 2 audio english Adaptation Set 1 BaseURL=http://abr.rocks.com/ Representation 2 Rate = 1 Mbps Representation 4 Rate = 3 Mbps Representation 1 Rate = 500 Kbps Representation 3 Rate = 2 Mbps Resolution = 720p Segment Info Duration = 10 s Template: 3/$Number$.mp4 Segment Access Initialization Segment http://abr.rocks.com/3/0.mp4 Media Segment 1 start = 0 s http://abr.rocks.com/3/1.mp4 Media Segment 2 start = 10 s http://abr.rocks.com/3/2.mp4 Adaptation Set 3 audio italian Adaptation Set 1 video Period id = 2 start = 100 s Representation 3 Rate = 2 Mbps Selection of components/tracks Well-defined media format Selection of representations Splicing of arbitrary content like ads Chunks with addresses and timing ACM Multimedia Tutorial — October 2019 19
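The template mechanics on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BaseURL, template and 10 s duration are taken from the slide, while `START_NUMBER = 1`, the function names `segment_url` and `segment_start_s`, and the decision to ignore other template identifiers (such as `$Time$` or `$RepresentationID$`) are simplifications made for this example.

```python
from urllib.parse import urljoin

BASE_URL = "http://abr.rocks.com/"   # BaseURL from the slide
TEMPLATE = "3/$Number$.mp4"          # media template of Representation 3
SEGMENT_DURATION_S = 10              # segment duration from the slide
START_NUMBER = 1                     # assumption: media segments are numbered from 1


def segment_url(number: int) -> str:
    """Expand the $Number$ identifier and resolve the result against the BaseURL."""
    return urljoin(BASE_URL, TEMPLATE.replace("$Number$", str(number)))


def segment_start_s(number: int) -> int:
    """Start time of a media segment relative to the start of its Period."""
    return (number - START_NUMBER) * SEGMENT_DURATION_S


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, segment_start_s(n), segment_url(n))
```

With these assumptions, segment 1 starts at 0 s and segment 2 at 10 s, matching the addresses listed on the slide. The client never needs an explicit list of segments, only the template and the timing rule.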
  20. 20. An Example HLS Playlist-Based Manifest #EXTM3U #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=232370,CODECS="mp4a.40.2, avc1.4d4015" gear1/prog_index.m3u8 #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=649879,CODECS="mp4a.40.2, avc1.4d401e" gear2/prog_index.m3u8 #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=41457,CODECS="mp4a.40.2" gear0/prog_index.m3u8 master.m3u8 Source: https://developer.apple.com/streaming/examples/ and https://www.gpac-licensing.com/2014/12/01/apple-hls-comparing-versions/ #EXTM3U #EXT-X-TARGETDURATION:10 #EXT-X-VERSION:3 #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-PLAYLIST-TYPE:VOD #EXTINF:9.97667, fileSequence0.ts #EXTINF:9.97667, fileSequence1.ts #EXTINF:9.97667, fileSequence2.ts . . . #EXT-X-ENDLIST gear1/prog_index.m3u8 ACM Multimedia Tutorial — October 2019 20
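As a rough illustration of how a client might read the master playlist on this slide, the Python sketch below extracts each variant's bandwidth, codecs and URI. The line breaks in `MASTER` are an assumption (the transcript flattened them), `parse_master` is a hypothetical helper name, and only the `EXT-X-STREAM-INF` tag is handled; a real client would follow the full playlist grammar.

```python
import re

MASTER = """#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=232370,CODECS="mp4a.40.2, avc1.4d4015"
gear1/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=649879,CODECS="mp4a.40.2, avc1.4d401e"
gear2/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=41457,CODECS="mp4a.40.2"
gear0/prog_index.m3u8
"""

# An attribute value is either a quoted string (which may contain commas) or a bare token.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master(text: str) -> list:
    """Return the variant streams sorted by ascending BANDWIDTH."""
    variants = []
    pending = None  # attributes of the last EXT-X-STREAM-INF tag, waiting for its URI line
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attr_text = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTR_RE.findall(attr_text)}
        elif line and not line.startswith("#") and pending is not None:
            variants.append({
                "uri": line,
                "bandwidth": int(pending["BANDWIDTH"]),
                "codecs": pending.get("CODECS", ""),
            })
            pending = None
    return sorted(variants, key=lambda v: v["bandwidth"])


if __name__ == "__main__":
    for variant in parse_master(MASTER):
        print(variant)
```

Sorting by bandwidth gives the client an ordered ladder to adapt over; the lowest entry here is the audio-only variant.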
  21. 21. Example Representations Encoding Bitrate Resolution Rep. #1 3.45 Mbps 1280 x 720 Rep. #2 2.2 Mbps 960 x 540 Rep. #3 1.4 Mbps 960 x 540 Rep. #4 900 Kbps 512 x 288 Rep. #5 600 Kbps 512 x 288 Rep. #6 400 Kbps 340 x 192 Rep. #7 200 Kbps 340 x 192 Source: Vertigo MIX10, Alex Zambelli’s Streaming Media Blog, Akamai, Comcast Vancouver 2010 Sochi 2014 Encoding Bitrate Resolution Rep. #1 3.45 Mbps 1280 x 720 Rep. #2 1.95 Mbps 848 x 480 Rep. #3 1.25 Mbps 640 x 360 Rep. #4 900 Kbps 512 x 288 Rep. #5 600 Kbps 400 x 224 Rep. #6 400 Kbps 312 x 176 Encoding Bitrate Resolution Rep. #1 18 Mbps 4K (60p) Rep. #2 12.2 Mbps 2560x1440 (60p) Rep. #3 4.7 Mbps 2K (60p) Rep. #4 3.5 Mbps 1280x720 (60p) Rep. #5 2 Mbps 1280 x 720 Rep. #6 1.2 Mbps 768 x 432 Rep. #7 750 Kbps 640 x 360 Rep. #8 500 Kbps 512 x 288 Rep. #9 300 Kbps 320 x 180 Rep. #10 200 Kbps 320 x 180 PyeongChang 2018 ACM Multimedia Tutorial — October 2019 21
  22. 22. Smart and Selfish Clients: HAS Working Principle - Client fetches and parses the manifest - Client uses the OS-provided HTTP stack (HTTP/1.1 and HTTP/2 run over TCP; HTTP/3 runs over QUIC) - Client uses the required decryption tools for the protected content Client monitors and measures - Size of the playout buffer (both in bytes and seconds) - Chunk download times and throughput - Local resources (CPU, memory, window size, etc.) - Dropped frames Client performs adaptation (Request/Response between HTTP Server and Client) Client measures and reports metrics for analytics (One can also multicast media segments)
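The measure-then-adapt loop described on this slide can be sketched as follows. This is a deliberately simple throughput-based rule, not the logic of any particular player: the bitrate ladder reuses one of the six-entry ladders from the "Example Representations" slide, while the safety factor, the smoothing weight and all function names are assumptions chosen for illustration.

```python
LADDER_KBPS = [400, 600, 900, 1250, 1950, 3450]  # one ladder from the "Example Representations" slide
SAFETY = 0.8  # assumption: spend only 80% of the estimated throughput


def measure_throughput_kbps(chunk_bytes: int, download_time_s: float) -> float:
    """Throughput observed for one chunk download, in kilobits per second."""
    return chunk_bytes * 8 / 1000 / download_time_s


def smooth(previous_kbps: float, sample_kbps: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average to dampen per-chunk noise."""
    return alpha * sample_kbps + (1 - alpha) * previous_kbps


def pick_bitrate(estimate_kbps: float) -> int:
    """Highest ladder entry that fits the discounted estimate, else the lowest entry."""
    budget = SAFETY * estimate_kbps
    candidates = [rate for rate in LADDER_KBPS if rate <= budget]
    return max(candidates) if candidates else LADDER_KBPS[0]


if __name__ == "__main__":
    estimate = measure_throughput_kbps(1_250_000, 4.0)  # 1.25 MB fetched in 4 s
    print(estimate, pick_bitrate(estimate))
```

A production client would also factor in the buffer level, dropped frames and device limits listed above; this sketch covers only the throughput signal.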
  23. 23. Tradeoffs in Adaptive Streaming User experience Overall quality Quality stability Stalls Zapping/seeking time Live latency ACM Multimedia Tutorial — October 2019 23
  24. 24. End-to-End Workflow for OTT Production Preparation and Staging Distribution Consumption News Gathering Sport Events Premium Content Studio Multi-bitrate Encoding Encapsulation Protection Origin Servers VoD Content & Manifests Live Content & Manifests CDN ACM Multimedia Tutorial — October 2019 24
  25. 25. Example Platform/Infrastructure https://bitmovin.com/ Bitmovin End-to-End Workflow
  26. 26. Part I: Streaming HTML5 Video and Media Extensions
  27. 27. • HTML5 is a set of technologies that allows more powerful Web sites and applications – Better semantics – Better connectivity – Offline and storage – Multimedia – 2D/3D graphics and effects – Performance and integration – Device access – Styling • Most interesting new elements – Semantic elements: <header>, <footer>, <article> and <section> – Graphic elements: <svg> and <canvas> – Multimedia elements: <audio> and <video> A Common Platform across Devices HTML5 Source: Mozilla MDN, w3schools.com ACM Multimedia Tutorial — October 2019 27
  28. 28. Hosted and Progressive Web Apps • These apps can – leverage native APIs and offline capability – provide cross-device compatibility (iOS, Windows, Android, etc.) – be packaged and published to app stores • For streaming: – To a user, it looks like a regular native app they have downloaded – The app is actually a container and loads dynamically from the web server – Server-side changes propagate automatically to installed apps Source: Google PWA, Microsoft HWA
  29. 29. Types of Browser-Based Playback Source: CTA WAVE Type 1: Minimum control architectural model for HTML5 support of adaptive streaming where manifest and heuristics are managed by the user agent Type 2: Adaptation control architectural model for HTML5 support of adaptive streaming providing script manageable features Type 3: Full media control architectural model for HTML5 support of adaptive streaming allowing script to explicitly send the media segments (MSE + EME) ACM Multimedia Tutorial — October 2019 29
  30. 30. W3C Media Source Extensions Source: https://www.w3.org/TR/media-source/ ACM Multimedia Tutorial — October 2019 30
  31. 31. MSE Support in Web Browsers (as of October’19) Source: http://caniuse.com/#search=mse
  32. 32. Content Protection – Famous Quotes Digital files cannot be made uncopiable, any more than water can be made not wet Bruce Schneier (cryptographer), 2001 We have Ph.D.'s here that know the stuff cold, and we don't believe it's possible to protect digital content Steve Jobs, 2003 If we’re still talking about DRM in five years, please take me out and shoot me eMusic CEO David Pakman, 2007 ACM Multimedia Tutorial — October 2019 32
  33. 33. W3C Encrypted Media Extensions Source: https://www.w3.org/TR/encrypted-media/ ACM Multimedia Tutorial — October 2019 33
  34. 34. EME Support in Web Browsers (as of October’19) Source: http://caniuse.com/#search=eme ACM Multimedia Tutorial — October 2019 34
  35. 35. Web Video Ecosystem Encoding Encryption Rights Expression Web Video App Framework Decoding Decryption Rights Management The fundamental Digital Rights Management problem derives from a lack of interoperability which prevents mobility of experience The solution is to combine interoperable commercial Web video content with a cross- platform Web video app framework Source: John Simmons ACM Multimedia Tutorial — October 2019 35
  36. 36. The Web, EME and Enhanced Content Protection (ECP) • Contrary to a common misconception, with EME, DRM functionality is not in the HTML/JS app • There is no DRM in HTML5 with EME, and ECP requires that this be the case • The HTML/JS app selects the DRM and controls key exchange between the DRM client and server • The browser extends the HTML5 media element to allow JavaScript-handled key acquisition • A CDM exposes a key system to JavaScript; it is transparent whether the CDM is in the browser Source: John Simmons
  37. 37. Part I: Streaming Overview of Streaming Standards (DASH, CMAF)
  39. 39. • Fragmented architectures – Advertising, DRM, metadata, blackouts, etc. • Investing in more hardware and software – Increased CapEx and OpEx • Lack of consistent analytics • Preparing and delivering each asset in several incompatible formats – Higher storage and transport costs • Confusion due to the lack of skills to troubleshoot problems • Lack of common experience across devices for the same service – Tricks, captions, subtitles, ads, etc. What does Fragmentation Mean? Higher Costs Less Scalability Smaller Reach Frustration Skepticism Slow Adoption ACM Multimedia Tutorial — October 2019 39
  40. 40. DASH intends to be to the Internet world … what MPEG2-TS has been to the broadcast world ACM Multimedia Tutorial — October 2019 40
  41. 41. DASH intended to be to the Internet world … what MPEG2-TS has been to the broadcast world
  42. 42. Dynamic Adaptive Streaming over HTTP (DASH) • DASH is not – system, protocol, presentation, codec, interactivity, DRM, client specification • DASH is an enabler – It provides formats to enable efficient and high-quality delivery of streaming services – System definition is left to other organizations • Design choices – Reuse the existing technologies (containers, codecs, DRM, etc.) – Enable deployment on top of CDNs – Move intelligence from network to client, enable client differentiation – Provide simple interoperability points (profiles) ACM Multimedia Tutorial — October 2019 42
  43. 43. Scope of DASH: what is specified? • Media Presentation on HTTP Server: Media Presentation Description and Segments, with Segments located by HTTP-URLs • DASH-enabled Client: DASH Control Engine, MPD Parser, HTTP Client (HTTP/1.1), Media Engine • On-time HTTP requests to segments
  45. 45. List of Accessible Segments and Their Timings The MPEG-DASH Data Model MPD Period id = 1 start = 0 s Period id = 3 start = 300 s Period id = 4 start = 850 s Period id = 2 start = 100 s Adaptation Set 0 subtitle turkish Adaptation Set 2 audio english Adaptation Set 1 BaseURL=http://abr.rocks.com/ Representation 2 Rate = 1 Mbps Representation 4 Rate = 3 Mbps Representation 1 Rate = 500 Kbps Representation 3 Rate = 2 Mbps Resolution = 720p Segment Info Duration = 10 s Template: 3/$Number$.mp4 Segment Access Initialization Segment http://abr.rocks.com/3/0.mp4 Media Segment 1 start = 0 s http://abr.rocks.com/3/1.mp4 Media Segment 2 start = 10 s http://abr.rocks.com/3/2.mp4 Adaptation Set 3 audio italian Adaptation Set 1 video Period id = 2 start = 100 s Representation 3 Rate = 2 Mbps Selection of components/tracks Well-defined media format Selection of representations Splicing of arbitrary content like ads Chunks with addresses and timing ACM Multimedia Tutorial — October 2019 45
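To illustrate how the Period layer of this data model is used during seeking, the Python sketch below maps a presentation time to a Period and a segment number. The Period start times and the 10 s segment duration come from the slide; the assumptions that every Period uses the same segment duration and restarts numbering at 1, and the helper name `locate`, are made only for this example.

```python
from bisect import bisect_right

PERIODS = [(0, 1), (100, 2), (300, 3), (850, 4)]  # (start in seconds, Period id) from the slide
SEGMENT_DURATION_S = 10  # assumption: the same duration applies in every Period


def locate(presentation_time_s: float) -> tuple:
    """Return (Period id, 1-based segment number within that Period)."""
    if presentation_time_s < 0:
        raise ValueError("presentation time must be non-negative")
    starts = [start for start, _ in PERIODS]
    index = bisect_right(starts, presentation_time_s) - 1  # last Period starting at or before t
    start, period_id = PERIODS[index]
    number = int((presentation_time_s - start) // SEGMENT_DURATION_S) + 1
    return period_id, number


if __name__ == "__main__":
    for t in (0, 105, 125, 850):
        print(t, locate(t))
```

Because each Period is self-contained, a splice point such as an ad break only changes which Period the lookup lands in; the per-Period segment arithmetic stays the same.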
  46. 46. MPEG-DASH Status (10/2019) AMD2 • SRD • URL param insertion • Role extensions AMD3 • AuthN/AuthZ • NTP anchor • External MPD link • Period continuity • Generalized HTTP header extensions & queries 23009-5 Server & Network Assisted DASH 23009-6 Full Duplex DASH Additional Tools under development • Random access to segments • Patch method for MPD updates • Reducing redundancy in multi-DRM linear MPDs AMD1 • Server-client NTP sync • Extended profiles ✓ AMD4 • Flexible segment & Broadcast TV profile • MPD chaining • MPD fallback • Preselections • Data URLs in MPD • Labels • Switching x adaptation sets 2nd Edition 23009-1:2014 MPEG-DASH 1st Edition 23009-1:2012 • Events • Asset Identifier ✓ ✓ ✓ ✓ ✓ 23009-4 Segment Encryption & Authentication 23009-8 Session based DASH operations ✓ 23009-2 Conformance and Reference Software 3rd Edition 23009-1:2019 3rd Edition AMD5 (AMD1 to 3rd edition) Device information, quality equivalence descriptor, timed text roles, announcing popular content, flexible IOP signaling, early available periods, signaling missing/alternative segments ✓ 🚧 4th Edition 23009-1:2020 ✓ AMD1 (to 4th edition) • CMAF support • event processing model 🚧
  47. 47. Using DASH-Assisting Network Elements (DANE) Part 5 – Server and Network Assisted DASH (SAND) Origin (HTTP) Server Encoder/ Transcoder Packager CDN RG Clients Media Parameters for enhancing reception (PER) Metrics and status messages Parameters for enhancing delivery (PED) Analytics Server ACM Multimedia Tutorial — October 2019 47
  48. 48. • Describes various types of “push” behaviors in HTTP/2 and WebSockets • Primarily targeted for low-latency live streaming • Similar activities – DASH over LTE broadcast (3GPP SA4) – DVB ABR multicast – ATSC 3.0 hybrid delivery Part 6 – DASH over Full Duplex HTTP-Based Protocols ACM Multimedia Tutorial — October 2019 48
  49. 49. Common Media Application Format (CMAF) • Media delivery has three main components: – Media format – Manifest – Delivery • CMAF defines the media format only (fragments, headers, segments, chunks, tracks) – No manifest format or delivery method is specified • CMAF uses ISO-BMFF and common encryption (CENC) – CENC means the media fragments can be decrypted/decoded by devices using different DRMs – CMAF does not mandate CTR or CBC mode • This makes CMAF useful only for unencrypted content for now, but the industry is moving towards CBCS • Any delivery method may be used for delivering CMAF content – HTTP – RTP multicast/unicast – LTE broadcast
  50. 50. CMAF (ISO/IEC 23000-19) • CMAF defines media profiles for – Video – Audio – Subtitle • CMAF defines presentation profiles by selecting a media profile from each category • Current status – 1st edition was published in Jan. 2018 – Supported in iOS 10+ (with HLS playlists) – Amd. 1: SHVC media profile and additional audio media profiles – Amd. 2: xHE-AAC and other media profiles – 2nd edition: Final text received or FDIS registered for formal approval (2019-10-09)
  51. 51. Each Frame can be a CMAF Chunk CMAF ISO-BMFF Media Objects RAP … RAP … RAP … RAP … Fragment Fragment Fragment Fragment Segment Segment Track File … … … Encoding Packaging Encryption CMAF Header Seamless switching can only happen at fragment boundaries Manifests may provide URLs to • Track files • CMAF header + segments • CMAF header + chunks ACM Multimedia Tutorial — October 2019 51
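Since CMAF builds on ISO-BMFF, each media object on this slide is a sequence of boxes, and every box starts with a 32-bit big-endian size followed by a four-character type. The Python sketch below walks the top-level boxes of a byte string. The helper names `iter_boxes` and `make_box` are hypothetical, and the sample bytes are dummy payloads built only to exercise the parser, not real media.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level ISO-BMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError("corrupt box size")
        yield box_type.decode("ascii"), offset, size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a minimal box with a dummy payload, for illustration only."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


if __name__ == "__main__":
    chunk = make_box("moof", b"\x00" * 16) + make_box("mdat", b"\x00" * 100)
    print(list(iter_boxes(chunk)))
```

In this toy input, the `moof` box stands in for the movie fragment metadata and the `mdat` box for the media samples, which is the pairing a packager would emit repeatedly when producing chunked output.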
  52. 52. ismc Smooth mpd DASH m3u8 HLS Source Encoder f4m HDS m3u8 m3u8 mpd mpd ismc ismc f4m f4m Credit: Akamai ACM Multimedia Tutorial — October 2019 52
  53. 53. ismc Smooth mpd DASH m3u8 HLS Source Encoder f4m HDS m3u8 m3u8 mpd mpd ismc ismc f4m f4m Credit: Akamai More packaging $ More storage $ Less efficient caching ACM Multimedia Tutorial — October 2019 53
  54. 54. Multi-Platform OTT Workflow Efficient Distribution in the CDN CMAF Source Encoder m3u8 mpd Credit: Akamai m3u8 mpd m3u8 mpd m3u8 mpd m3u8 mpd m3u8 mpd ACM Multimedia Tutorial — October 2019 54
  55. 55. Encode and Package Once, Deliver Efficiently Origin (HTTP) Server Encoder/Transcoder Linear Ingest On-demand Content Pick your favorite codec Pick your chunk and segment sizes CIF (SCTE) to client manifest translation with EBP markers Packager
  56. 56. Non-HTTP Transport Options RTP Multicast - Define RTP payload format for CMAF chunks/fragments/segments - Use RFC 4588 and 6285 for loss recovery and rapid acquisition Broadcast - TS 26.346 and TS 26.347: Delivering DASH content via MBMS - A/331:2017: Carrying DASH content over broadcast and/or broadband networks Peer-to-Peer - WebRTC: Clients fetch DASH/CMAF pieces from each other in addition to CDN servers - This helps reduce the load on the CDN as well as the stream start times ACM Multimedia Tutorial — October 2019 56
  57. 57. Part I: Streaming Common Issues in Scaling and Improving Quality
  58. 58. Streaming over HTTP – The Promise • Leverage tried-and-true Web infrastructure for scaling – Video is just ordinary Web content! • Leverage tried-and-true TCP – Congestion avoidance – Reliability – No special QoS for video THEREFORE, IT SHOULD JUST WORK
  59. 59. Does it just work? • Mostly yes, when streaming clients compete with other types of traffic • Not really, when streaming clients compete with each other Streaming clients interact with each other, forming an “accidental” distributed control-feedback system • Multiple screens within a household • ISP access/aggregation links • Small cells in stadiums and malls
  60. 60. Demystifying a Streaming Client: A Single Microsoft Smooth Streaming Client under a Controlled Environment [Figure: bitrate (Mbps) vs. time (s), showing available bandwidth, requests, chunk throughput and average throughput] • Buffer-filling state: back-to-back requests • Steady state: periodic requests Reading: • “An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP,” ACM MMSys 2011 • “An Evaluation of Dynamic Adaptive Streaming over HTTP in Vehicular Environments,” ACM MoVid 2012
  61. 61. Selfishness Hurts Everyone: 10 (Commercial) Streaming Clients Sharing a 10 Mbps Link [Figure: requested bitrate (Kbps) vs. time (s) for Client1, Client2 and Client3]
  62. 62. Viewer Experience Statistics Selfishness Hurts Everyone Source: Conviva Viewer Experience Report, 2015 ACM Multimedia Tutorial — October 2019 62
  63. 63. Inner and Outer Control Loops HTTP Server Manifest Media HTTP Origin Module TCP Sender Streaming Client Manifest Resource Monitors Streaming Application TCP Receiver Data / ACK Request / Response There could be multiple TCPs destined to the same or different servers
  64. 64. Streaming with Multiple TCP Connections • Using multiple concurrent TCPs – Can help mitigate head-of-line blocking – Allows fetching multiple (sub)segments in parallel – Allows quickly abandoning a non-working connection without having to slow-start a new one System performance deteriorates very quickly if many clients adopt this approach without limiting the aggregated bandwidth consumption
  65. 65. Two Competing Clients Understanding the Root Cause • Depending on the timing of the ON periods: – Unfairness, underutilization and/or instability may occur – Clients may grossly overestimate their fair share of the available bandwidth Clients cannot figure out how much bandwidth to use until they use too much (Just like TCP) Reading: • “Oscillation Compensating Dynamic Adaptive Streaming over HTTP,” IEEE ICME 2015 • “A Proxy Effect Analysis and Fair Adaptation Algorithm for Multiple Competing Dynamic Adaptive Streaming over HTTP Clients”, IEEE VCIP 2012 • “What happens when HTTP adaptive streaming players compete for bandwidth?,” ACM NOSSDAV 2012 ACM Multimedia Tutorial — October 2019 65
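The overestimation effect on this slide can be worked through with a small calculation. The Python sketch below is an idealized model, not a result from the cited papers: it assumes the link is split equally while both clients download and is fully available to a single client otherwise. The function name and the 10 Mbps capacity are illustrative.

```python
def observed_throughput_mbps(capacity_mbps: float, overlap_fraction: float) -> float:
    """Throughput one of two clients measures during its own ON period.

    overlap_fraction is the share of this client's ON time during which the
    other client is also downloading. Idealized assumption: equal sharing
    while both are active, full capacity for one client otherwise.
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be in [0, 1]")
    alone = capacity_mbps * (1.0 - overlap_fraction)
    shared = (capacity_mbps / 2.0) * overlap_fraction
    return alone + shared


if __name__ == "__main__":
    capacity = 10.0            # illustrative link capacity in Mbps
    fair_share = capacity / 2  # what each client should settle on
    for overlap in (0.0, 0.5, 1.0):
        print(overlap, observed_throughput_mbps(capacity, overlap), fair_share)
```

When the ON periods never overlap, each client measures the full link capacity, twice its fair share, and may switch up to a bitrate that the link cannot sustain once the downloads start to collide.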
  66. 66. How to Solve the Issues? • Fix the clients and/or the transport – Use a better adaptation algorithm like PANDA or BOLA (mostly buffer-based) – Use machine learning or deep learning like Pensieve – Improve the HTTP/TCP stack, try out the alternatives – Adopt ideas from game/consensus theory (GTA) • Get support from the network – QoS in the core/edge – SDN • Enable a control plane – Assist the clients and network elements through metrics and analytics
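To give a feel for what "buffer-based" means, here is a minimal Python sketch of a generic linear buffer-to-bitrate map. It is not PANDA, BOLA or Pensieve; it is a much simpler rule written for illustration. The ladder reuses one of the six-entry ladders from the "Example Representations" slide, and the reservoir and cushion thresholds are assumed values.

```python
LADDER_KBPS = [400, 600, 900, 1250, 1950, 3450]  # one ladder from the "Example Representations" slide
RESERVOIR_S = 5.0   # assumption: at or below this buffer level, pick the lowest bitrate
CUSHION_S = 20.0    # assumption: at or above this buffer level, pick the highest bitrate


def buffer_based_bitrate(buffer_s: float) -> int:
    """Map the current buffer level (seconds) to a ladder entry."""
    if buffer_s <= RESERVOIR_S:
        return LADDER_KBPS[0]
    if buffer_s >= CUSHION_S:
        return LADDER_KBPS[-1]
    # Interpolate linearly between the lowest and highest bitrate.
    fraction = (buffer_s - RESERVOIR_S) / (CUSHION_S - RESERVOIR_S)
    target = LADDER_KBPS[0] + fraction * (LADDER_KBPS[-1] - LADDER_KBPS[0])
    # Choose the highest ladder entry that does not exceed the interpolated target.
    return max(rate for rate in LADDER_KBPS if rate <= target)


if __name__ == "__main__":
    for level in (3.0, 12.5, 25.0):
        print(level, buffer_based_bitrate(level))
```

Because the decision depends on the buffer rather than on per-chunk throughput samples, a rule of this kind is less sensitive to the ON-OFF measurement artifacts that arise when clients compete.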
  67. 67. Bitrate Adaptation Schemes • Client-based Adaptation: bandwidth-based, buffer-based, mixed adaptation, proprietary solutions, MDP-based • Server-based Adaptation • Network-assisted Adaptation • Hybrid Adaptation: SDN-based, server and network-assisted Reading: “A survey on bitrate adaptation schemes for streaming media over HTTP,” IEEE Communications Surveys & Tutorials 2019 (open access)
  68. 68. High-Level Comparison between Different Schemes • The client-based adaptation schemes – Show a good performance in single and few-client scenarios – Largely fail in multi-client scenarios • The server-based adaptation schemes – Require custom servers – More effective in eliminating the bitrate oscillation problems – Less scalable due to increased complexity on the servers • The network-assisted adaptation and hybrid schemes – Show a good performance in both small and large populations – Require modifications on the clients, server and/or network devices – Pose practicality issues for deployment ACM Multimedia Tutorial — October 2019 68
  69. 69. How to Evaluate HAS Systems? A Note on Reproducibility • A performance comparison between different algorithms is often difficult due to – Lack of source code – Lack of a unified QoE framework and universally accepted metrics • Schemes have their own parameters and assumptions, and work well only under certain circumstances
  70. 70. How to Evaluate HAS Systems? Objective and Subjective Evaluation Platform [Figure: workflow with steps ① to ⑤. AdViSE inputs: adaptive media content (DASH, HLS, CMAF), players/algorithms, network parameters. AdViSE outputs: QoS/QoE metrics and a log of segment requests. Impaired media sequences are generated from templates (startup delay, stalling, …). WESP inputs: impaired media sequences and QoE evaluation parameters (questionnaire, methodology, crowdsourcing platform, …). Results: subjective results + other data, reports, analysis.] AdViSE: Anatoliy Zabrovskiy, Evgeny Kuzmin, Evgeny Petrov, Christian Timmerer, and Christopher Mueller. 2017. AdViSE: Adaptive Video Streaming Evaluation Framework for the Automated Testing of Media Players. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17). ACM, New York, NY, USA, 217-220. DOI: https://doi.org/10.1145/3083187.3083221 WESP: Benjamin Rainer, Markus Waltl, Christian Timmerer, A Web based Subjective Evaluation Platform, In Proceedings of the 5th International Workshop on Quality of Multimedia Experience (QoMEX'13) (Christian Timmerer, Patrick Le Callet, Martin Varela, Stefan Winkler, Tiago H Falk, eds.), IEEE, Los Alamitos, CA, USA, pp. 24-25, 2013. https://doi.org/10.1109/QoMEX.2013.6603196
  71. 71. AdViSE: Adaptive Video Streaming Evaluation • Scalable, end-to-end HAS evaluation through emulation with a plethora of – content configurations – players/algorithms (including for player competition) – network parameters/traces • Real content and network settings with real dynamic, adaptive streaming including rendering! • Collects various metrics from players: via API or directly from the algorithms/HTML5 • Derives metrics and utilizes QoE models proposed in the literature • Uses the segment request log to generate the impaired media sequence as perceived by end users for subjective quality testing
  72. 72. Example Evaluation Results • Test sequence encoded in 15 different representations (Amazon Prime configuration: 400x224@100Kbps – 1920x1080@15Mbps) with 4s segment length • Bandwidth trajectory based on prior work proposed in the literature; network delay 70 ms C. Timmerer, A. Zabrovskiy, and Ali C. Begen, Automated Objective and Subjective Evaluation of HTTP Adaptive Streaming Systems, Proceedings of the 1st IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, USA, April 2018.
  73. 73. Bandwidth Index vs. QoE • Bandwidth index – Average bitrate – Efficiency – Stability • QoE model – Startup delay – Stalls C. Timmerer, A. Zabrovskiy, E. Kuzmin, E. Petrov, Quality of experience of commercially deployed adaptive media players, In 2017 21st Conference of Open Innovations Association (FRUCT) (Sergey Balandin, ed.), pp. 330-335, 2017. https://doi.org/10.23919/FRUCT.2017.8250200
  74. 74. GTA: A Game Theory Based ABR Scheme • Designed based on a cooperative game in the form of static formation-based coalitions • Formulates the ABR decision problem as a bargaining process and consensus mechanism • Outputs the optimal decision by reaching the Pareto optimal (PO) Nash bargaining solution (NBS) [Figure labels: Game Theory, Cooperative, Strategic, Coalition Formation, Bargaining Solution, Consensus Theory, GTA, GTA player, Coalition, Bargaining process] Reading: “Want to play DASH? a game theoretic approach for adaptive streaming over HTTP,” ACM MMSys 2018
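The Nash bargaining solution mentioned above picks, among feasible outcomes, the one that maximizes the product of each player's utility gain over the disagreement point. The following is a toy brute-force sketch of that idea for a few clients sharing one bottleneck. It is not the GTA algorithm from the paper: the logarithmic utility, the zero disagreement point, the ladder and the function names are all assumptions for illustration.

```python
import math
from itertools import product


def utility(bitrate_mbps):
    # Assumed concave utility; GTA uses its own QoE model instead.
    return math.log(1.0 + bitrate_mbps)


def nash_bargaining(ladder_mbps, n_players, capacity_mbps, disagreement=0.0):
    """Return the bitrate tuple maximizing the Nash product under a capacity limit."""
    best_combo, best_product = None, 0.0
    for combo in product(ladder_mbps, repeat=n_players):
        if sum(combo) > capacity_mbps:
            continue  # infeasible: exceeds the shared bottleneck
        gains = [utility(rate) - disagreement for rate in combo]
        if min(gains) <= 0:
            continue  # every player must gain over the disagreement point
        nash_product = math.prod(gains)
        if nash_product > best_product:
            best_combo, best_product = combo, nash_product
    return best_combo


LADDER = [1.0, 2.5, 5.0, 8.0]  # hypothetical ladder in Mbps
print(nash_bargaining(LADDER, n_players=2, capacity_mbps=10.0))
```

With a 10 Mbps bottleneck the sketch prefers an even split over giving one client the top bitrate, which illustrates why a bargaining formulation tends to favor fair outcomes.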
  75. 75. GTA Components • PANDA & CS2P Estimators – Throughput estimation • SAND Enabler – Implements various SAND-enabled communication interfaces • SSIMplus MAP Model – Maps content type, display resolution and service plan type into one common perceptual quality space in which a set of clusters are constructed • QoE Metrics – QoE model and various metrics • GT Agent – Develops the GTA decision rules for selecting the best bitrate Reading: “Want to play DASH? a game theoretic approach for adaptive streaming over HTTP,” ACM MMSys 2018
  76. 76. One Slide on Software Defined Networking (SDN) Control and data planes are decoupled, network intelligence and state are logically centralized, and the underlying network infrastructure is abstracted from the apps
  77. 77. SDN-Based Bitrate Adaptation Reading: “SDNDASH: improving QoE of HTTP adaptive streaming using software defined networking,” ACM MM 2016
  78. 78. Contributors to the Latency • Video encoding pipeline • Ingest and packaging operations • Network propagation • Server I/O, CDN buffering • Media segment duration • Player behavior – Buffering – Playhead positioning – Resilience [Figure: delay of TV signal in seconds, with the goal marked]
  79. 79. Stream Start Time ≠ Latency: Low Latency is Always a Trade-Off against Playback Robustness • Live encoder producing 2-second segments • iOS (3 segments): latency 7 s, 6 seconds of buffer, ~0 seconds of startup* • Last fully available segment: latency 3 s, 2 seconds of buffer, ~0 seconds of startup* • Lowest latency: latency 2 s, 2 seconds of buffer, 1 second of startup* * Segment fetching time is assumed to be negligible in this example
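The three start options on this slide can be reproduced with simple arithmetic. The sketch below assumes that the viewer joins 7 s into the live stream, i.e. 1 s into the production of the fourth 2-second segment; this join time is inferred from the slide's numbers rather than stated on it, and the function and field names are hypothetical.

```python
def start_options(segment_s=2.0, now_s=7.0, ios_segments=3):
    """Latency, buffer and startup delay for three join strategies.

    Assumes negligible segment fetching time, as on the slide.
    """
    complete = int(now_s // segment_s)  # number of fully available segments

    def entry(first_index, startup_s, buffered_segments):
        playback_begins = now_s + startup_s      # wall-clock time of first frame
        media_time = first_index * segment_s     # capture time of first frame
        return {
            "latency_s": playback_begins - media_time,
            "buffer_s": buffered_segments * segment_s,
            "startup_s": startup_s,
        }

    # Wait until the segment currently being produced is complete.
    wait_s = (complete + 1) * segment_s - now_s
    return {
        "ios_3_segments": entry(max(0, complete - ios_segments), 0.0, ios_segments),
        "last_full_segment": entry(complete - 1, 0.0, 1),
        "lowest_latency": entry(complete, wait_s, 1),
    }


for name, values in start_options().items():
    print(name, values)
```

Starting further behind the live edge buys buffer (robustness) at the cost of latency, while waiting for the newest segment trades startup time for lower latency.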
  80. 80. Bandwidth Measurement is Tricky: Live Twitch Data* (Nov. 2018) [Figure: bitrate (Mbps, 0 to 5) vs. time (s, 0 to 600); series: tc bandwidth, measured bandwidth, selected bitrate] No upshifting despite the available bandwidth * Encoded at {0.18, 0.73, 1.83, 2.5, 3.1, 8.8} Mbps with three resolutions of {540p, 720p, 1080p}, and packaged with CMAF
  81. 81. ABR for Chunked Transfer Encoding (ACTE) • Bandwidth Measurement: sliding window based moving average method • Bandwidth Prediction: online linear adaptive filter based RLS algorithm • ABR Controller: throughput-based bitrate selection logic A. Bentaleb, C. Timmerer, A. C. Begen, and R. Zimmermann, "Bandwidth prediction in low-latency chunked streaming,” NOSSDAV ’19, DOI: https://doi.org/10.1145/3304112.3325611
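The three ACTE blocks named above can be sketched as a small pipeline: a sliding-window moving average for measurement, a recursive least squares (RLS) linear adaptive filter for prediction, and a throughput-based selector. This is an illustrative sketch only, not the authors' dash.js implementation; the class names, window size, number of filter taps, forgetting factor, initialization and safety margin are all assumptions.

```python
from collections import deque


class SlidingWindowAverage:
    """Moving average over the last `window` chunk throughput samples."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def add(self, throughput_mbps):
        self.samples.append(throughput_mbps)
        return sum(self.samples) / len(self.samples)


class RLSPredictor:
    """Linear one-step-ahead predictor whose taps are updated with RLS."""

    def __init__(self, taps=3, lam=0.99, delta=0.01):
        self.taps, self.lam = taps, lam
        self.w = [0.0] * taps
        # Inverse correlation matrix, initialized to (1/delta) * I.
        self.P = [[(1.0 / delta if r == c else 0.0) for c in range(taps)]
                  for r in range(taps)]
        self.history = deque(maxlen=taps)

    def update(self, measured):
        n = self.taps
        if len(self.history) == n:
            x = list(self.history)
            Px = [sum(self.P[r][c] * x[c] for c in range(n)) for r in range(n)]
            denom = self.lam + sum(x[r] * Px[r] for r in range(n))
            gain = [v / denom for v in Px]
            error = measured - sum(wi * xi for wi, xi in zip(self.w, x))
            self.w = [wi + gi * error for wi, gi in zip(self.w, gain)]
            xP = [sum(x[r] * self.P[r][c] for r in range(n)) for c in range(n)]
            self.P = [[(self.P[r][c] - gain[r] * xP[c]) / self.lam
                       for c in range(n)] for r in range(n)]
        self.history.append(measured)

    def predict(self):
        if len(self.history) < self.taps:  # warm-up: fall back to the mean
            return sum(self.history) / len(self.history) if self.history else 0.0
        return sum(wi * xi for wi, xi in zip(self.w, self.history))


def select_bitrate(predicted_mbps, ladder_mbps, safety=0.9):
    """Highest bitrate not exceeding a safety fraction of the prediction."""
    feasible = [r for r in sorted(ladder_mbps) if r <= safety * predicted_mbps]
    return feasible[-1] if feasible else min(ladder_mbps)


LADDER = [0.18, 0.73, 1.83, 2.5, 3.1, 8.8]  # Mbps, ladder from the Twitch example
meter, predictor = SlidingWindowAverage(), RLSPredictor()
for chunk_throughput in [4.0] * 30:          # synthetic, constant 4 Mbps link
    predictor.update(meter.add(chunk_throughput))
predicted = predictor.predict()
choice = select_bitrate(predicted, LADDER)
print(round(predicted, 2), choice)
```

On this synthetic constant trace the predictor settles close to 4 Mbps, so the selector picks the 3.1 Mbps representation; real chunked-transfer traces would be much burstier.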
  82. 82. New/Modified Blocks in dash.js (in Red) [Figure: block diagram. ABR Controller (throughput-based, buffer-based, hybrid) producing the ABR decision; Bandwidth Measurement (sliding window); Bandwidth Prediction (filter taps update, with signals C(i), W(i), ε(i), Ĉ(i+1) and the error formed from C(i) and Ĉ(i)); Buffer; Display; Logger; segment requests and responses (in chunks)]
  83. 83. Overall Results
      Scheme    | Avg. Buffer Occupancy | Avg. # of Switches | Avg. # of Stalls & Duration (s) | Avg. Startup Delay (s)
      ACTE      | 2.1 to 3.0 (2.5)      | 17                 | 3 & 0.76                        | 0.71
      THsl      | 3.6 to 5.0 (4.3)      | 0                  | 2 & 0.86                        | 1.46
      THew      | 1.9 to 3.9 (2.9)      | 18                 | 21 & 66                         | 1.06
      THsw      | 1.9 to 3.5 (2.8)      | 24                 | 27 & 33                         | 1.03
      THwss     | 2.0 to 3.1 (2.5)      | 23                 | 16 & 9                          | 0.88
      BOLAsw    | 1.6 to 3.0 (2.3)      | 20                 | 58 & 119                        | 1.66
      Dynamicsw | 1.6 to 3.0 (2.4)      | 30                 | 53 & 68                         | 0.92
  84. 84. ACTE Outperforms the Existing ABR Schemes • Consecutive numbers represent the results – Summary of the average results: percentage improvements of ACTE over the other schemes
  85. 85. Houston, We Have So Many Problems! • Content formatting – Each asset is copied multiple times (different audio/video codecs; different (regional) frame rates; different HDR formats; different container formats, encryption modes) – Huge cost for encoding/packaging/storage – Inefficiencies in CDN caching/distribution • Across platforms – Lack of consistent app behavior – Varying video features, APIs and semantics • Fake media • Playback – Partial profile support – Switching glitches – Audio discontinuities – Ad splicing problems – Long-term playback instability – Request protocol deficiencies – Memory problems, CPU weaknesses – Scaling (display) issues – Variable HDR support – Unknown capabilities, … • Security – Piracy and restreaming – Account sharing, use of VPNs/proxies, … Luckily, we’re not getting bored!
  86. 86. Part II: towards fully immersive media access Adaptive Delivery of Omnidirectional/360° Video
  87. 87. What is Virtual Reality (VR)? MPEG’s Definition: VR is a rendered environment (visual and acoustic, predominantly real-world) providing an immersive experience to a user who can interact with it in a seemingly real or physical way using special electronic equipment
  88. 88. Virtual Reality (VR) puts us in a Virtual World Source: Phil Chou
  89. 89. Augmented Reality (AR) puts Virtual Objects in our World Source: Phil Chou
  90. 90. The VR Challenge: Delivering High-Quality VR at Economic Scale is Extremely Challenging • 30K × 24K (resolution) × 36 (bit depth) × 60 (frame rate) × 2 (stereoscopic) / 600 (compression gain) ≈ 5.2 Gbps
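The bitrate estimate on this slide can be checked directly. The sketch below assumes that "K" means 1,000 pixels and that the bit depth of 36 is bits per pixel across all color components; the function name is hypothetical.

```python
def raw_vr_bitrate_bps(width_px, height_px, bits_per_pixel, fps, views):
    """Uncompressed bitrate in bits per second."""
    return width_px * height_px * bits_per_pixel * fps * views


raw_bps = raw_vr_bitrate_bps(30_000, 24_000, 36, 60, 2)  # stereoscopic: 2 views
compression_gain = 600
compressed_gbps = raw_bps / compression_gain / 1e9

print(f"raw: {raw_bps / 1e12:.2f} Tbps, compressed: {compressed_gbps:.2f} Gbps")
```

Even with a 600:1 compression gain, the result stays in the multi-gigabit range, which is the point of the slide.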
  91. 91. Ultimate Level of Immersion • Visuals: So vibrant that they are eventually indistinguishable from the real world • Sounds: So accurate that they are true to life • Interactions: So intuitive that they become second nature Source: Thomas Stockhammer
  92. 92. Immersive VR Has Extreme Requirements • Visual quality – Extreme pixel quantity and quality: Screen is very close to the eyes – Spherical view: Look anywhere with a full 360° spherical view – Stereoscopic display: Humans see in 3D • Sound quality – High resolution audio: Up to human hearing capabilities – 3D audio: Realistic 3D, positional, surround audio that is accurate to the real world • Intuitive interactions – Precise motion tracking: Accurate on-device motion tracking – Minimal latency: Minimized system latency to remove perceptible lag – Natural user interfaces: Seamlessly interact with VR using natural movements, free from wires Source: Thomas Stockhammer
  93. 93. Need for Higher Resolutions on Mobile Phones? Source: Thomas Stockhammer
  94. 94. Omnidirectional/360° Video • Capturing devices • Stitching, projection formats • Encoding, encryption, encapsulation • Storage, content distribution, delivery • Processing, decoding, rendering • Consumer devices
  95. 95. Adaptive Streaming of VR Content • Required resolution of a panoramic video for achieving 4K resolution for a viewport with a 120° field of view (FoV)
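A rough back-of-the-envelope version of this requirement can be sketched as follows. It assumes an equirectangular panorama with a 2:1 aspect ratio, a viewport that is 3840 pixels wide, and the same horizontal pixel density (pixels per degree) in the panorama as in the viewport. These are simplifying assumptions of this sketch, not figures taken from the slide.

```python
def panorama_resolution(viewport_width_px=3840, hfov_deg=120.0):
    """Panorama size needed so the viewport keeps its horizontal pixel density."""
    width = round(viewport_width_px * 360.0 / hfov_deg)
    height = width // 2  # equirectangular: 360° x 180°, hence 2:1
    return width, height


print(panorama_resolution())
```

Under these assumptions the full panorama needs roughly three times the viewport width, which is why viewport-agnostic delivery is so bandwidth hungry.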
  96. 96. Functional Architecture: Adaptive Delivery of Omnidirectional Video • From ecosystem (steps ① to ⑤): Capture; Fuse, Stitch & Edit; Encode, Encrypt, Encapsulate; Store & Deliver; Decode, Decrypt, Decapsulate; Project & Render; Consume • To building blocks: Content Creation, Server, Network and Client, covering Acquisition, Capture, Editing, Processing, Encoding, Encryption, Encapsulation, Storage, Delivery, Distribution, Decryption, Decapsulation, Decoding, Rendering, Consumption and Interaction Reading: A Framework for Adaptive Delivery of Omnidirectional Video, Human Vision and Electronic Imaging 2018, https://doi.org/10.2352/ISSN.2470-1173.2018.14.HVEI-524
  98. 98. Adaptive Streaming Options (1) • Traditional, viewport-agnostic streaming – ERP: handle like regular video – Simple, easy, deployed today – Bandwidth waste, quality issues • Viewport-adaptive streaming – Multiple versions for predefined viewports – Various projection techniques (pyramid) – Bandwidth waste reduced, increased storage and CDN costs, limited flexibility X. Corbillon, et al., "Viewport-adaptive navigable 360-degree video delivery," 2017 IEEE International Conference on Communications (ICC), Paris, 2017, https://doi.org/10.1109/ICC.2017.7996611
  99. 99. Adaptive Streaming Options (2) • Tile-based streaming – Use tiling technique of modern video codecs – High complexity, full flexibility – Multiple challenges M. Graf, et al. 2017. Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP: Design, Implementation, and Evaluation. Proc. ACM MMSys'17. https://doi.org/10.1145/3083187.3084016 C. Concolato, et al., "Adaptive Streaming of HEVC Tiled Videos using MPEG-DASH," IEEE TCSVT, 2017. https://doi.org/10.1109/TCSVT.2017.2688491
  100. 100. System Overview: Tile-Based Adaptive Streaming • Encoding & Packaging: the video is split into 24 tiles (Tile 1 to Tile 24), each available in different quality representations, as MPEG-HEVC/H.265 tiles in ISOBMFF • Delivery: adaptive streaming using MPEG-DASH SRD / OMAF • Adaptive Player: tile-based streaming of VR/360° content with MPEG-DASH SRD / OMAF on head mounted displays, browsers and smart (mobile) devices; (stereo) 2D, (stereo) 3D
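One way a tile-based client could decide which of the 24 tiles to request in high quality is sketched below. It assumes an equirectangular layout with a 6×4 grid numbered row by row from the top-left, and treats the viewport as a simple yaw/pitch rectangle, ignoring projection distortion near the poles. The function name, the grid shape and the field of view values are assumptions for illustration, not part of MPEG-DASH SRD or OMAF.

```python
def tiles_in_viewport(yaw_deg, pitch_deg, hfov_deg=120.0, vfov_deg=90.0,
                      cols=6, rows=4):
    """Return indices (row * cols + col) of tiles overlapping the viewport.

    yaw in [-180, 180], pitch in [-90, 90]; index 0 is the top-left tile.
    """
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    top = min(90.0, pitch_deg + vfov_deg / 2)
    bottom = max(-90.0, pitch_deg - vfov_deg / 2)
    selected = set()
    for row in range(rows):
        row_top = 90.0 - row * tile_h
        row_bottom = row_top - tile_h
        if row_bottom >= top or row_top <= bottom:
            continue  # no vertical overlap
        for col in range(cols):
            centre = -180.0 + (col + 0.5) * tile_w
            # Shortest signed yaw difference, handling the ±180° wrap-around.
            diff = (centre - yaw_deg + 180.0) % 360.0 - 180.0
            if abs(diff) < hfov_deg / 2 + tile_w / 2:
                selected.add(row * cols + col)
    return selected


def tile_qualities(viewport_tiles, n_tiles=24, high="high", low="low"):
    """High quality inside the viewport, low quality elsewhere."""
    return [high if i in viewport_tiles else low for i in range(n_tiles)]


front = tiles_in_viewport(yaw_deg=0.0, pitch_deg=0.0)
print(sorted(front), tile_qualities(front).count("high"))
```

Requesting the remaining tiles at low quality rather than skipping them keeps something on screen if the user turns the head before the next segment arrives.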
  101. 101. Encoding Options • AVC dominates the market • HEVC, VP9, AV1 support tiles – Divides a picture into independent, rectangular regions – Tradeoff: bitrate, quality, flexibility • Multiple tiling options available – Uniform vs. non-uniform tiling – Same vs. mixed resolutions • New quality metrics, mostly based on PSNR but subjective quality assessments/metrics increasing https://bitmovin.com/bitmovin-2019-video-developer-report-av1-codec-ai-machine-learning-low-latency/ I. D.D. Curcio, et al. 2017. Bandwidth Reduction of Omnidirectional Viewport-Dependent Video Streaming via Subjective Quality Assessment. Proc. AltMM'17. https://doi.org/10.1145/3132361.3132364 A. Zare, et al. 2016. HEVC-compliant Tile-based Streaming of Panoramic Video for Virtual Reality Applications. Proc. ACM MM'16. http://dx.doi.org/10.1145/2964284.2967292
  102. 102. Quality Metrics: S-PSNR M. Yu, H. Lakshman, and B. Girod. 2015. A Framework to Evaluate Omnidirectional Video Coding Schemes. In 2015 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 31–36. DOI: http://dx.doi.org/10.1109/ISMAR.2015.12
  103. 103. Quality Metrics: V-PSNR M. Yu, H. Lakshman, and B. Girod. 2015. A Framework to Evaluate Omnidirectional Video Coding Schemes. In 2015 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 31–36. DOI: http://dx.doi.org/10.1109/ISMAR.2015.12
  104. 104. Dataset • Segment length / intra period – 1s (tiled content) vs. 1, 2, 4s (monolithic content) • Tiling pattern (columns × rows): 1×1 (i.e., monolithic, no tiling), 3×2, 5×3, 6×4, and 8×5 • Resolution: 1920×960, 3840×1920 and 7680×3840 • Map projection: equirectangular format • Quantization parameter: QP={22,27,32,37,42} • Head motion recordings for V-PSNR evaluation
  105. 105. Bitrate Overhead due to Tiling [Figure: Y-PSNR (dB) vs. bitrate (kbps) for 1x1, 3x2, 5x3, 6x4 and 8x5 tiles; tile overhead for resolution 1920x960, sequence AssassinsCreed]
  106. 106. Bitrate Overhead due to Tiling [Figure: Y-PSNR (dB) vs. bitrate (kbps) for 1x1, 3x2, 5x3, 6x4 and 8x5 tiles; tile overhead for resolution 1920x960, sequence ExploreTheWorld]
  107. 107. Bandwidth Requirements
  108. 108. Adaptive Streaming Issues for (Tiled) 360° Video • Increasing number of segment/tile requests – HTTP/2 server push, query parameters, proprietary protocols – Additional functionality at server, which breaks fundamental HAS requirements • Low latency streaming (see also next slide) – Reducing segment size impacts coding efficiency (1s vs. 4s) – CMAF chunks + other enhancements to enable sub-second latency – Remember: live internet video will grow 15-fold from 2016 to 2021 • Viewport prediction – Allows prefetching (caching) but cannot predict too far into the future (1s) – Impact on segment size, but the situation will improve as more data becomes available; machine learning/AI will help • QoE (see also following slide) – Still in its infancy, but the situation is much better than one year ago – Requires datasets, subjective studies, quality models, metrics M. Xu, et al., "A subjective visual quality assessment method of panoramic videos," Proc. ICME’17. https://doi.org/10.1109/ICME.2017.8019351 R. Schatz, et al., "Towards subjective quality of experience assessment for omnidirectional video streaming," Proc. QoMEX’17, https://doi.org/10.1109/QoMEX.2017.7965657 Y. Rai, et al. 2017. A Dataset of Head and Eye Movements for 360 Degree Images. Proc. ACM MMSys’17. https://doi.org/10.1145/3083187.3083218 S. Petrangeli, et al. 2017. An HTTP/2-Based Adaptive Streaming Framework for 360° Virtual Reality Videos. Proc. ACM MM'17. https://doi.org/10.1145/3123266.3123453 N. Bouzakaria, et al., "Overhead and performance of low latency live streaming using MPEG-DASH," Proc. IISA’14. https://doi.org/10.1109/IISA.2014.6878732 C.-L. Fan, et al. 2017. Fixation Prediction for 360° Video Streaming in Head-Mounted Virtual Reality. Proc. NOSSDAV'17. https://doi.org/10.1145/3083165.3083180
  109. 109. Minimizing Motion-to-Photon Latency is Important: Lag Prevents Immersion and Causes Discomfort [Figure: low latency vs. noticeable latency]
  110. 110. QoE of Omnidirectional Video: Motivation • QoE of Omnidirectional Video (OV, alt.: 360° Movie) streaming consumed via HMD – Immersive panorama video (or image) surrounding the user – User can turn head to change gaze direction • OV streaming = content class with dramatically increasing popularity – Cisco VNI: augmented and virtual reality traffic will grow nearly 12-fold from 22 petabytes per month in 2017, to 254 petabytes per month in 2022 [Figure labels: Immersive Media, VR, Responsive VR, OV, MR, AR] R. Schatz, et al., "Towards subjective quality of experience assessment for omnidirectional video streaming," Proc. QoMEX’17, https://doi.org/10.1109/QoMEX.2017.7965657 R. Schatz, et al., "Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective QoE Evaluation,” Proc. QoMEX’19, https://doi.org/10.1109/QoMEX.2019.8743230
  111. 111. Research Questions • RQ1: How does full vs. partial delivery impact ODV streaming QoE? Is there a clear user preference? • RQ2: What is the QoE impact of different ODV tile quality encoding levels (cf. full delivery advanced)? • RQ3: Do camera movement (static vs. moving) or head turning speed exert an influence on quality perception?
  112. 112. Subjective Study Results • Subjective ratings: MOS [0-100] • Delivery: Partial Delivery → visibly low QoE • Tile Encoding Quality: QP46 = significant QoE drop, but only very little difference between QP22 & 32 [Figure: MOS per condition for QP 46, 32 and 22] R. Schatz, et al., "Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective QoE Evaluation,” Proc. QoMEX’19, https://doi.org/10.1109/QoMEX.2019.8743230
  113. 113. Subjective Ratings: Acceptability • Acceptability Ratings: confirm MOS results • In particular: Partial Delivery = really unacceptable [Figure: acceptability per condition for QP 46, 32 and 22]
  114. 114. Objective Metrics: wPSNR, SSIM and VMAF for FoV PVS • Results for segments 3-4-5 (i.e., head turn part only) • Low sensitivity of SSIM w.r.t. full vs. partial delivery • Little impact of content / cam movement speed • Consistent impact of head movement speed (but also not pronounced) More objective analysis results → see the paper!
  115. 115. Findings & Conclusions • RQ1 (Full vs partial delivery): – Avoid partial delivery (or find better visual workarounds), since unacceptable for end users • RQ2 (Impact of tile encoding quality): – Sufficient to use QP 32 for tile encoding for practical deployments as saturation kicks in here • RQ3 (Impact of camera movement and head turn speed): – Camera movement masking low encoding quality (known) – Head turn speed: only weak impact, interaction with content/cam movement (+ for static, - for dynamic scene) • Objective Evaluation: – Largely in line with subjective results, with some exceptions depending on metric and impairment
  116. 116. Part II: towards fully immersive media access The Developing MPEG-OMAF and MPEG-I Standards
  117. 117. Standardization Overview C. Timmerer, "Immersive Media Delivery: Overview of Ongoing Standardization Activities," in IEEE Communications Standards Magazine, vol. 1, no. 4, pp. 71-74, Dec. 2017. https://doi.org/10.1109/MCOMSTD.2017.1700038 Data Representations and Formats JPEG MPEG IEEE System Standards and APIs VR-IF 3GPP DVB Khronos W3C Quality of Experience QUALINET ITU-T VQEG Guidelines DASH-IF CTA SVA SMPTE ETSI IETF/IRTF https://multimediacommunication.blogspot.com/2017/04/vr360-streaming-standardization-related.html
  118. 118. MPEG-I (ISO/IEC 23090): Coded Representation of Immersive Media • Part 1: Architectures • Part 2: Omnidirectional media format (OMAF) • Part 3: Versatile Video Coding (VVC) • Part 4: Immersive audio • Part 5: Video-based Point Cloud Compression (V-PCC) • Part 6: Immersive media metrics • Part 7: Immersive media metadata • Part 8: Network-based media processing (NBMP) • Part 9: Geometry-based Point Cloud Compression (G-PCC) • Part 10: Carriage of point cloud data • Part 11: Network-based media processing implementation guidelines • Part 12: Immersive Video
  119. 119. Omnidirectional Media Format (OMAF): Scope • 360° video, images, audio and associated timed text – 3DoF only • Specifies – A coordinate system • Consists of a unit sphere and the x (back-to-front), the y (lateral, side-to-side), and the z (vertical, up) axes – Projection and rectangular region-wise packing methods • Used for conversion of a spherical video/image into a two-dimensional rectangular video/image – The sphere signal is the result of stitching of video signals captured by multiple cameras – A special case: fisheye video – Storage of omnidirectional media and the associated metadata using ISOBMFF – Encapsulation, signaling and streaming of omnidirectional media in DASH and MMT – Media profiles and presentation profiles • Provide interoperability and conformance points for media codecs as well as media coding and encapsulation configurations • Provides some informative viewport-dependent 360° video processing approaches
  120. 120. OMAF Architecture [Figure: OMAF content flow. Acquisition; image stitching, rotation, projection, and region-wise packing; audio, video and image encoding; file/segment encapsulation; delivery or file playback; OMAF player with file/segment decapsulation, audio, video and image decoding, image rendering, audio rendering, loudspeakers/headphones, display, and head/eye tracking providing orientation/viewport metadata] Legend: A: Real-world scene; B: Multiple-sensors-captured video or audio; D/D’: Projected/packed video; E/E’: Coded video or audio bitstream; F/F’: ISOBMFF file/segment https://mpeg.chiariglione.org/standards/mpeg-i/omnidirectional-media-format R. Skupin, et al., "Standardization Status of 360 degree Video Coding and Delivery,” Proc. IEEE VCIP’17
  121. 121. DASH/OMAF System Diagram • Authoring: multi-camera capture, stitch, project, HEVC encode, OMAF composing, DASH encapsulation • CDN: DASH server • VR player: DASH client, HEVC decode, renderer, head mounted device (HMD) • Signals: head and eye tracking telemetry, request for tiles within FoV window, projection mapping metadata
  122. 122. MPEG’s Areas of Activity • MPEG-1,2,4,H,I: Compression of video, audio and 3DG • MPEG-7: Description of video, audio and multimedia for content search • MPEG-21: Technologies for content e-commerce • MPEG-A: Multimedia application formats (combinations of content formats) • MPEG-B,C,D, DASH: Systems, video, audio and transport • MPEG-E,M: Multimedia platform technologies • MPEG-U,V: Device and application interfaces https://mpeg.chiariglione.org/meetings/126
  123. 123. Major MPEG Standards [Figure: timeline from 1990 to 2020] • Standards shown: MP3, MPEG-2 Video & Systems, AVC, HEVC, MP4 FF, MMT, DASH, AAC, OFF, MPEG-4 Video, CMAF, 3D Audio • Enabled services shown: Internet Audio, Digital Television, Music Distribution, Media Storage and Streaming, UHD & Immersive Services, New Forms of Digital Television, Unified Media Streaming, HD Distribution, Mobile Video, Custom Fonts on Web & DigiPub
  124. 124. [Figure: MPEG roadmap from Jan 2018 to Jan 2024, with tracks Media Coding, Systems and Tools, and Beyond Media] • Items shown: 6 DoF Video Extensions, Geometry Point Cloud Compression, OMAF v.2, Internet of Media Things, Descriptors for Video Analysis (CDVA), 6 DoF Audio, Video Point Cloud Compression, Immersive Media Scene Description Interface, Versatile Video Coding, Web Resource Tracks, Dense Representation of Light Field Video, 3DoF+ Video, Neural Network Compression for Multimedia, Essential Video Coding, Low Complexity Enhancement Video Coding, PCC Systems Support, Network-Based Media Processing, Genome Annotation Compression, Genome Compression, Point Cloud Compression v.2, CMAF v.2, Multi-Image Application Format, Color Support in Open Font Format, Partial File Format • Open questions: Next Generation Video Coding? Video Holography? Next Generation DASH?
  125. 125. Part II: towards fully immersive media access Ongoing Efforts/Trends, specifically towards 6DoF Adaptive Streaming Reading: “Dynamic Adaptive Point Cloud Streaming,” ACM Packet Video 2018
  126. 126. Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  127. 127. Capturing: How does it work? • Multiple cameras capture an object • Each point is described by its x, y and z coordinates, e.g., P(1, 2, 1) Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  128. 128. Static is boring! Let’s focus on dynamic objects [Figure: four dynamic point cloud objects at 4.1 Gb/s, 3.8 Gb/s, 5.7 Gb/s and 5.6 Gb/s] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  129. 129. Streaming the whole scene would require 19.2 Gb/s Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
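The 19.2 Gb/s figure is simply the sum of the four per-object raw bitrates shown on the previous slide. A quick check, with the comparison link speed being an assumption added for scale:

```python
# Raw bitrates of the four dynamic point cloud objects, in Gb/s (from the slides).
object_rates_gbps = [4.1, 3.8, 5.7, 5.6]
scene_rate_gbps = sum(object_rates_gbps)

# Hypothetical 1 Gb/s access link, used only to give a sense of scale.
link_gbps = 1.0
print(f"scene: {scene_rate_gbps:.1f} Gb/s, "
      f"about {scene_rate_gbps / link_gbps:.0f}x a {link_gbps:.0f} Gb/s link")
```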
  130. 130. We can solve this by using MPEG’s reference encoder … and generated different scenes of two minutes each! [Figure: example bitrates of 4.5 Mb/s, 40.4 Mb/s and 5.7 Gb/s] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  131. 131. We extended MPEG’s MPD to include point cloud objects Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  132. 132. How should we prioritize the point cloud objects? Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  133. 133. How should we prioritize the point cloud objects? Distance to the user? [Figure: four objects at 6 m, 5 m, 8 m and 3 m, labeled with priorities 1 to 4] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  134. 134. How should we prioritize the point cloud objects? Size of the object? [Figure: four objects labeled with priorities 1 to 4] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  135. 135. How should we prioritize the point cloud objects? Visual area within the viewport? [Figure: four objects labeled with priorities 1 to 4] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  136. 136. Once prioritized, the available bandwidth can be allocated [Figure: four objects labeled with priorities 1 to 4] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
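The prioritize-then-allocate idea from the last few slides can be sketched as a greedy loop: rank the objects (here by distance to the user), give every object the lowest representation, then upgrade objects in priority order while the bandwidth budget allows. This is one possible heuristic written for illustration, not necessarily the one evaluated in the paper; the object names, the two-step ladder and the 100 Mb/s budget are hypothetical.

```python
def allocate_bandwidth(distances_m, ladder_mbps, budget_mbps):
    """Greedy allocation: closest objects are upgraded first.

    distances_m: dict mapping object name to distance from the user.
    Returns (priority order, dict of allocated bitrates in Mb/s).
    """
    ladder = sorted(ladder_mbps)
    order = sorted(distances_m, key=distances_m.get)  # closest first
    allocation = {name: ladder[0] for name in order}  # baseline quality for all
    remaining = budget_mbps - ladder[0] * len(order)
    for name in order:
        for rate in reversed(ladder):                 # try the best quality first
            extra = rate - allocation[name]
            if extra <= remaining:
                allocation[name] = rate
                remaining -= extra
                break
    return order, allocation


# Distances taken from the slide; names, ladder and budget are assumptions.
objects = {"A": 6.0, "B": 5.0, "C": 8.0, "D": 3.0}
order, allocation = allocate_bandwidth(objects, [4.5, 40.4], budget_mbps=100.0)
print(order, allocation)
```

Swapping the sort key for object size or visible area within the viewport yields the other prioritization options raised on the preceding slides.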
  137. 137. While buffering, the user might change focus and/or position 2. What is the impact of the buffer size on the visual quality? [Figure: four objects labeled with priorities 1 to 4] Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019 Further details → see presentation on Thursday!
  138. 138. Lessons learned • How should we prioritize point cloud objects? – There is no one-size-fits-all rate adaptation heuristic • What is the impact of the buffer size on the visual quality? – A trade-off between interactivity and buffer resilience exists – Accurate prediction of the user’s movement is required Reading: “Towards 6DoF HTTP Adaptive Streaming Through Point Cloud Compression,” ACM MM 2019
  139. 139. Final Remarks
  140. 140. What is Quality (of Experience)? Many Definitions Do Exist The blind men and the elephant, Poem by John Godfrey Saxe
  141. 141. QoE Definition • Quality of Experience (QoE) is the degree of delight or annoyance of the user (persona) of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state (context). • Experience: An experience is an individual’s stream of perception and interpretation of one or multiple events. • QoE feature: A perceivable, recognized and namable characteristic of the individual’s experience of a service which contributes to its quality. In the context of communication services, QoE can be influenced by factors such as service, content, network, device, application, and context of use. Reading: Qualinet White Paper on Definitions of Quality of Experience, 2013, http://www.qualinet.eu/
  142. 142. QoE: Not an Easy Target … [Figure labels: QoS, QoP, QoE] QoS/P/E: Quality of Service/Perception/Experience
  143. 143. Do We Need Analytics? Yes, at the Source, Encoder, Packager, Origin, Cache, GW, Player, etc. • Service Provider: “Your video or CDN provider must be slow” • Video/CDN Provider: “Your home network must suck” • Consumer: “The device or the app is slow” • Device/App Vendor: “It must be the OS” • OS Vendor: “Your Internet connection must be bad” • Use a common language (e.g., CTA-2066) across players • Beware infobesity: “Overabundance of information implies a scarcity of user attention”
  144. 144. Focusing on Four Major Areas • Content Preparation – Choosing the bitrate/resolution pairs to make up/downshifts least visible – Picking the best segment durations – Modeling streaming dynamics for different genres • Distribution and Delivery – What information could the network provide to streaming clients – Achieving controlled unfairness in the network – New transport (including reliable UDP) and congestion control options • QoE Modeling and Client Design – Modeling the impact of faster zapping and trick modes on the QoE – Understanding the impact of QoE on viewer engagement – Optimization across clients based on the stream utility that depends on spatial pooling model, content types/features, rendering devices, audience profiles and sizes • Analytics, Fault Isolation and Diagnostics – Understanding the interaction of adaptive streaming with caching in CDNs – Extracting actions based on real-time analytics – Fixing issues faster and remotely
  145. 145. Adaptive Streaming over HTTP and Emerging Networked Multimedia Services (https://athena.itec.aau.at/) Motivation • Multimedia/video data on the Internet is increasing: >80% of all IP traffic by 2021, mostly HTTP adaptive streaming • Internet bandwidth is increasing: a user’s bandwidth grows by 50% per year; 1 Gbps in 2021 • Multimedia systems challenges and tradeoffs (cf. IEEE MIPR 2018)
  146. 146. Summer School 2020: July 13-17, 2020, Klagenfurt am Wörthersee, Austria
  147. 147. Summer School 2020: July 13-17, 2020, Klagenfurt am Wörthersee, Austria • Scope – The aim of this summer school is to teach basic and advanced concepts related to adaptive streaming over HTTP and emerging networked multimedia services, targeting the following topic areas: (i) multimedia content provisioning, (ii) content delivery, and (iii) content consumption in the media delivery chain, as well as (iv) end-to-end aspects. The focus is on, but not limited to, HTTP Adaptive Streaming (HAS), with particular attention to gaming- and learning-based approaches related to the topic areas identified above and possibly beyond. • Target audience – Any practitioner or Ph.D. student interested in adaptive streaming over HTTP and emerging networked multimedia services, including games (streaming) development and learning-based approaches (machine learning, deep learning). You will find our program interesting if you are working on multimedia content provisioning (i.e., video coding and packaging for HAS), content delivery (i.e., content distribution networks, caching, video networking, SDN, ICN, 5G), content consumption (i.e., dynamic adaptive streaming), and/or end-to-end aspects thereof, including Quality of Experience (QoE) issues. – We also cordially invite master students, post-docs, and researchers (upon availability) Further details coming soon: https://athena.itec.aau.at
  148. 148. MMSys: Multimedia Systems, June 8-11, 2020, https://2020.acmmmsys.org
  150. 150. MMSys’19 by Numbers • 120 attendees; three gold and six silver supporters • Papers – Research track: 52 submissions, 21 accepted (double-blind, minimum four reviews) – Also 10 demos, nine datasets presented • Special sessions – Machine learning and statistical modeling for video applications – Real-time video at the edge – Advanced transport protocols for video – Volumetric media: from capture to consumption • Two keynotes, two workshop keynotes, two overview talks • New features – First ever: A female TPC co-chair – First ever: A grant from SIGMM for childcare and women’s lunch
  151. 151. MMSys’20 in Planning: More New Features • First ever: A female general co-chair (in addition to a female TPC co-chair) • First ever: Diversity and inclusion chairs (three females) • First ever: A workshop (posters and demos) dedicated to middle and high-school students • First ever: Two grand challenges – Improving open-source HEVC encoding – Low-latency live streaming • Focus areas in 2020 – Machine learning and statistical modeling for video streaming – Volumetric media: from capture to consumption – Fake media and tools for preventing illegal broadcasts • DASH-IF Excellence in DASH Award • Two confirmed keynotes from Google and MIT • Expecting reduced registration fees thanks to strong support
  152. 152. Supporters for 2020
  153. 153. Important Dates (submissions due / notification by / camera-ready due) • Research Track: Jan. 10, 2020 (firm) / Mar. 16 / Apr. 17 • Demo Track: Feb. 29, 2020 / Mar. 30 / Apr. 17 • Open Source and Dataset Track: Feb. 29, 2020 / Mar. 30 / Apr. 17 • Workshops: Mar. 27, 2020 / Apr. 22 / May 1
  154. 154. Backup Slides
  155. 155. Source Code for Client Implementations • DASH Industry Forum – http://dashif.org/software/ • JW Player – https://github.com/jwplayer/jwplayer • Other Open Source Implementations/Frameworks – http://dash.itec.aau.at/ – https://github.com/bitmovin – http://gpac.wp.mines-telecom.fr/ – https://github.com/google/shaka-player – https://github.com/video-dev/hls.js/ – http://streaming.university/ • TNO’s SAND Demo: https://github.com/tnomedialab/sand
  156. 156. DASH Datasets • DASH (http://dash.itec.aau.at/) – http://www-itec.uni-klu.ac.at/dash/?page_id=207 • Distributed DASH – http://www-itec.uni-klu.ac.at/dash/?page_id=958 • Multi-Codec DASH – http://www-itec.uni-klu.ac.at/dash/?page_id=1619 • DASH SVC – http://concert.itec.aau.at/SVCDataset/ • UHD HEVC DASH – http://download.tsi.telecom-paristech.fr/gpac/dataset/dash/uhd/ • iVID-Datasets for AVC and HEVC – https://www.ucc.ie/en/misl/research/datasets/ivid_dataset/ • AVC and HEVC UHD 4K DASH – https://www.ucc.ie/en/misl/research/datasets/ivid_uhd_dataset/ • Open Dataset from ITU-T P.1203 Standardization – https://github.com/itu-p1203/open-dataset
  157. 157. Other Datasets • SWAPUGC – https://github.com/emmanouil/SWAPUGC • A 4G LTE Dataset with Channel and Context Metrics – https://www.ucc.ie/en/misl/research/datasets/ivid_4g_lte_dataset/ • A Multi-Carrier Mobile Geo-Communication Dataset – https://dl.acm.org/citation.cfm?id=3193572 • ODIs Saliency Maps – https://drive.google.com/file/d/1hbPDS2FqzZRqpAbhRurL7L-rT0bjIZeB/view • Exploring User Behaviors in VR – https://wuchlei-thu.github.io/ • 360° Videos Head Movements – http://dash.ipv6.enstb.fr/headMovements/ • 360° Video Viewing in Head-Mounted VR – https://dl.acm.org/citation.cfm?id=3192927
