Advanced Video Distribution



  1. Advanced Video Distribution
Fast Forward Your Development
  2. Day I - Agenda
• Current delivery technologies (RTP/RTSP)
• Components in depth
▫ File formats
▫ Codecs
• Video streaming evolution
▫ Requirements
▫ Solutions
• CDN
• Basic Adobe Flash
  3. Day II - Agenda
• Microsoft “Smooth Streaming”
• Adobe “HTTP Streaming”
• Apple “HTTP Streaming”
• Internet gorillas’ positioning
• Adobe OSMF
  4. Current Delivery Methods
  5. Progressive Download
• Simple
• Utilizes existing protocols & servers (HTTP)
• Media file is prepared with metadata up front
• Playback starts after the metadata is received
• Cache-ability: supported
• Seek-ability: very limited support
• Poor user experience (seek, multi-rate)
• Wastes bandwidth when the video is not watched fully
• Low cost
  6. Pseudo Streaming
• Media is sent as a regular file, like progressive download
• Server must understand how the media is structured
• Playback starts after the metadata is received
• Existing protocols, but:
▫ Non-standard server
▫ Non-standard client component
• Cache-ability: limited!
• Seek-ability: supported
• User experience: better than progressive download; supports seek
• Wastes bandwidth when the video is not watched fully
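The seek trick behind pseudo streaming can be sketched in a few lines of Python. This is an illustration, not from the slides: the helper names and the constant-bitrate time-to-byte mapping are assumptions; real pseudo-streaming servers consult the file's keyframe index instead of assuming a constant bitrate.

```python
def range_header(start_byte, end_byte=None):
    """Build an HTTP Range header value (RFC 7233 byte ranges)
    for resuming or seeking into a progressively downloaded file."""
    end = "" if end_byte is None else str(end_byte)
    return f"bytes={start_byte}-{end}"

def seek_offset(seconds, avg_kbps):
    """Rough byte offset for a time-based seek, assuming a constant
    bitrate (kilobits per second -> bytes)."""
    return int(seconds * avg_kbps * 1000 // 8)
```

For example, seeking 10 seconds into an 800 kbps file maps to roughly the 1 MB mark, and the client would request `Range: bytes=1000000-`.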
  7. Streaming
• Seek-ability: supported
• Server side: proprietary technology (FMS)
• Cache-ability: requires special streaming servers
• User experience: very good
• Cost: high
  8. HTTP Streaming Intro
• HTTP streaming offers the advantages of:
▫ Progressive download in terms of cost: standard server, scalability, standard client components (OSMF)
▫ Streaming in terms of user experience: the seek-ability of streaming
  9. RTSP Streaming
  10. RTSP Protocol
• Real Time Streaming Protocol
• Used for controlling streaming data over the web.
• Designed to efficiently broadcast audio/video-on-demand to large groups.
• Uses directives to control the stream:
▫ OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, RECORD, TEARDOWN.
  11. SDP Protocol
• Describes the metadata of the stream.
• Mainly used in SIP, RTSP and other multicast sessions.
• Sample SDP description (annotated):
▫ v=0 (protocol version)
▫ o=jdoe 2890844526 2890842807 IN IP4 (origin / session ID)
▫ s=SDP Seminar (session name)
▫ i=A Seminar on the session description protocol (session info.)
▫ u= (description URI)
▫ c=IN IP4 (connection info.)
▫ t=2873397496 2873404696 (active session time)
▫ a=recvonly (session attribute line)
▫ m=audio 49170 RTP/AVP 0 (media name and transport address)
▫ m=video 51372 RTP/AVP 99 (media name and transport address)
▫ a=rtpmap:99 h263-1998/90000 (media attribute line)
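A minimal parser makes the SDP structure above concrete: every line is `<type>=<value>`, and each `m=` line opens a new media section that collects the attribute lines that follow it. This is a sketch; the function name and the returned dict layout are my own.

```python
def parse_sdp(text):
    """Split SDP '<type>=<value>' lines into session-level fields
    and per-media ('m=') sections, in order of appearance."""
    session, media, current = {}, [], None
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        if key == "m":                      # a new media section starts
            current = {"m": value}
            media.append(current)
        elif current is not None:           # attribute of the last m= line
            current.setdefault(key, []).append(value)
        else:                               # session-level field
            session.setdefault(key, []).append(value)
    return session, media
```

Feeding it the sample description above yields one session dict plus one section each for the audio and video `m=` lines, with `a=rtpmap:99 h263-1998/90000` attached to the video section.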
  12. Client-Server Flow
• Browser -> Web Server: HTTP GET
• Web Server -> Browser: stream URI
• Media Player -> Media Server: OPTIONS
• Media Player -> Media Server: DESCRIBE
• Media Server -> Media Player: SDP information
• Media Player -> Media Server: SETUP
• Media Player -> Media Server: PLAY
• Media Server -> Media Player: RTP media stream
• Media Player -> Media Server: PAUSE
• Media Player -> Media Server: TEARDOWN
  13. RTSP Protocol Parameters
• version
▫ The version of RTSP (RTSP/1.0)
• URL: [rtsp|rtspu]://host:port/path
▫ rtsp: reliable protocol (TCP); rtspu: unreliable protocol (UDP)
▫ host: legal domain name or IP address
▫ port: port used to control the stream*
▫ path: path to the stream on the server
*The actual stream will be delivered on another port.
  14. RTSP Protocol Parameters (Contd.)
• Session ID
▫ Generated by the server
▫ Stays constant for the entire session
• SMPTE - relative timestamp
▫ A relative time from the beginning of the stream.
▫ Nested types: smpte-range, smpte-type, smpte-time.
▫ smpte-25=(starttime)-(endtime)
• UTC - absolute time
▫ Absolute time using GMT.
▫ Nested types: utc-range, utc-time, utc-date.
▫ utc-time = (utcdate)T(utctime).(fraction)Z
• NPT - Normal Play Time
▫ Absolute position from the beginning of the presentation.
▫ npt=123.45-125
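The NPT form above is easy to parse mechanically. A sketch that handles only the simple decimal-seconds `npt=<start>-<end>` form shown on the slide (not the `now` keyword or the hh:mm:ss variant that RTSP also allows):

```python
def parse_npt_range(value):
    """Parse an NPT Range value such as 'npt=123.45-125' into
    (start, end) seconds; an open-ended range returns end=None."""
    assert value.startswith("npt=")
    start, _, end = value[4:].partition("-")
    return float(start), (float(end) if end else None)
```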
  15. RTSP Session Details
• Initiation
• Handling
• Termination
  16. RTSP - OPTIONS Request
OPTIONS requests information about the communication options available for the Request-URI. Fields:
• CSeq: the request ID; the server’s response carries the same ID.
• Media URL: the URL of the video.
• Client Player: the user agent of the client.
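An OPTIONS request like the one described above is just a few CRLF-terminated text lines. A minimal serializer, following the RFC 2326 framing (the helper name is my own):

```python
def rtsp_request(method, url, cseq, headers=None):
    """Serialize an RTSP request: request line, CSeq, optional
    headers, CRLF line endings, blank-line terminator."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"
```

`rtsp_request("OPTIONS", "rtsp://example.com/media.mp4", 1, {"User-Agent": "demo"})` produces exactly the kind of message shown on the slide; DESCRIBE, SETUP, PLAY etc. reuse the same framing with an incremented CSeq.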
  17. RTSP - OPTIONS Response
• All RTSP response codes are divided into 5 ranges (RFC 2326, 7.1.1): 1xx Informational, 2xx Success, 3xx Redirection, 4xx Client Error, 5xx Server Error.
• CSeq has the same value as the request’s CSeq field.
• The server’s response returns the available methods it supports. It may also contain any arbitrary data the server wants to expose.
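The five status-code ranges map mechanically to their classes, which a client can use to decide how to react (dict-based helper is my own sketch):

```python
def status_class(code):
    """Map an RTSP status code to its RFC 2326 7.1.1 class."""
    return {1: "Informational", 2: "Success", 3: "Redirection",
            4: "Client Error", 5: "Server Error"}.get(code // 100, "Unknown")
```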
  18. RTSP - DESCRIBE Request
DESCRIBE is used to retrieve the description of the media URL and the session. The description response MUST contain all media and streaming data needed to initialize the session. Fields:
• Accept: informs the server which description formats the client supports. The Session Description Protocol (SDP) is the most widely used.
Notice that the CSeq field is increased by one.
  19. RTSP - DESCRIBE Response
Fields: the media URL the response refers to, the description method used, and the length of the SDP message, followed by the SDP body itself. The response always returns the details of the media; the SDP details were covered above.
  20. RTSP - GET_PARAMETER Request
GET_PARAMETER is used to retrieve information about the stream. The request can be initiated by the client or by the server. The request/response message body is left to the server/client implementation. The parameters can be: packets received, jitter, bps or any other relevant information about the stream.
  21. RTSP - SETUP Request
SETUP specifies the transport details used to stream the media. The Transport header carries the transport protocol, unicast/multicast, the RTP/RTSP client media port and the track ID.
  22. SETUP Response - Transport Header
• Transport header fields: transport protocol, unicast/multicast option, unicast destination IP, source IP (last gateway / server), the client port to receive media data and the server port to receive media data.
• The SETUP response contains the session ID.
• For each track (audio/video) a separate SETUP request is made.
• After the response is received, a PLAY request can be made to start receiving the media stream.
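The Transport header is a semicolon-separated list of a protocol spec followed by parameters, some of which are bare flags (`unicast`) and some key=value pairs (`client_port=8000-8001`). A minimal parser (sketch; the dict layout is my own):

```python
def parse_transport(value):
    """Parse an RTSP Transport header value, e.g.
    'RTP/AVP;unicast;client_port=8000-8001'."""
    parts = value.split(";")
    spec = {"protocol": parts[0]}
    for p in parts[1:]:
        key, _, val = p.partition("=")
        spec[key] = val if val else True   # bare flags become True
    return spec
```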
  23. RTSP - PLAY Request
• Carries a Range field (Normal Play Time).
• PLAY tells the server to start sending data over the transport defined in the SETUP process.
• PLAY requests may be queued: a PLAY request arriving while a previous PLAY request is still active is delayed until the first has completed.
  24. RTSP - PAUSE Request
• Addressed to the stream URL.
• PAUSE tells the server to pause the streaming. When the user wants to resume, the client sends a PLAY request to the same URL.
• The request may contain time information specifying when the pause takes effect.
  25. RTSP - TEARDOWN
• TEARDOWN stops the stream delivery for the specified URL and informs the server that the client is disconnecting.
• The response includes only the response code.
  26. RTSP - More Request Types
• RECORD
▫ Initiates a recording operation given time information and a stream URL.
• REDIRECT
▫ A server-to-client request informing the client that it must switch to a different server. The request contains the new server URL.
• SET_PARAMETER
▫ Requests a change to a value of the presentation stream. The response code contains the answer.
• ANNOUNCE
▫ Can be initiated by either client or server. Informs the recipient that the SDP table of the object has changed.
  27. Components
• Containers
• CODECs
  28. File Format
A container file holds the movie meta-data (‘moov’, with one ‘trak’ per video/audio track) and the media data (‘mdat’, a sequence of samples/frames).
  29. Agenda
• Intro to file formats
• Second-generation formats
▫ RIFF: AVI, WAV
• Third-generation containers
▫ MPEG4 FF
▫ MKV
  30. File Format Segmentation
• 1st generation: raw / proprietary
• 2nd generation: media-muxer based
• 3rd generation: object / XML based
  31. 2nd Generation File Formats
  32. 2nd Generation File Features
• Multiple media tracks in the same file
• Identification of the codec
▫ Usually by FourCC
• Interleaving
  33. 2nd Generation File Formats
• RIFF: WAV, AVI
• ASF: WMA, WMV
• MPEG2: MP2PS (VOB), MP2TS
• FLV
  34. AVI File Format
  35. AVI Overview
• AVI files use the AVI RIFF format (like WAV)
• Introduced by Microsoft in 1992
• The file is divided into:
▫ Streams: audio, video, subtitles
▫ Blocks (“chunks”)
  36. Blocks / Chunks
• A chunk is the logical unit of a RIFF file.
• Chunks are identified by four letters (FourCC).
• An AVI RIFF file has two mandatory sub-chunks and one optional sub-chunk:
▫ hdrl: file header (mandatory)
▫ movi: media data (mandatory)
▫ idx1: index (optional)
• Layout (this order is fixed):
RIFF ('AVI '
  LIST ('hdrl'
    'avih' (<Main AVI Header>)
    LIST ('strl' ...)
    ...
  )
  LIST ('movi' ...)
  ['idx1' <AVI Index>]
)
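The chunk structure above is simple enough to walk with a few lines of code: every chunk is a FourCC followed by a little-endian 32-bit payload size. A minimal sketch (top-level chunks only; it does not recurse into LIST chunks):

```python
import struct

def iter_riff_chunks(data, offset=12):
    """Walk the top-level chunks of a RIFF file. The first 12 bytes
    are 'RIFF' + total size + form type (e.g. 'AVI '); after that,
    each chunk is FourCC + 32-bit LE size + payload, word-aligned."""
    while offset + 8 <= len(data):
        fourcc, size = struct.unpack_from("<4sI", data, offset)
        yield fourcc.decode("ascii"), data[offset + 8 : offset + 8 + size]
        offset += 8 + size + (size & 1)   # chunks are padded to even sizes
```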
  37. AVI Main Header
• RIFF 'AVI ': identifies the file as a RIFF file.
• LIST 'hdrl': identifies a chunk containing sub-chunks that define the format of the data.
• 'avih': identifies a chunk containing general information about the file. Includes:
▫ dwMicrosecPerFrame: time between frames
▫ dwMaxBytesPerSec: number of bytes per second the player should handle
▫ dwReserved1: reserved
▫ dwFlags: contains any flags for the file
  38. Example - Headers
(Hex dump of an AVI file: the RIFF header with chunk ID, chunk size and format; the 'avih' main header fields - time between frames, data rate, flags, initial frame, number of streams, total no. of frames, frame width 320, frame height, reserved fields; a stream header; and a JUNK padding chunk with its identifier and size.)
  39. Example - Data Chunks
(Hex dump showing an audio data chunk of stream 01 and a video data chunk of stream 00.)
  40. AVI Summary
• Advantages
▫ Includes both audio and video
▫ Index-able
• Disadvantages
▫ Not suited for progressive download
▫ Very rigid format
▫ Insufficient support for seeking, metadata and multi-reference frames
  41. 3rd Generation File Formats
  42. Why “Fix It”?
• 2nd generation formats are missing:
• Metadata
▫ Separate from media
▫ Info on angle, language, synchronization
▫ Versioning
• Better streaming support
▫ Reduced CPU per stream
▫ Better seeking support
• Better parsing
▫ XML
▫ Atom based
  43. Main Attributes
• A file format is not just a video/audio multiplexer
• Separation between:
▫ Media: audio, video, images, subtitles
▫ Metadata: indexing, frame length, tags
  44. 3rd Generation File Formats
• XML based: Matroska (MKV)
• Object based: MOV, MPEG4 FF (Fragmented MPEG4 FF, 3GPP FF)
  45. MPEG4 File Format
  46. MP4 File Format
• File structuring concepts:
▫ Separate the media data from the descriptive (meta) data.
▫ Support the use of multiple files.
▫ Support hint tracks: real-time streaming over any protocol.
  47. Separate Metadata and Media
• Key meta-information is compact
▫ The type of media present
▫ Time-scales
▫ Timing
▫ Synchronization points etc.
• Enables
▫ Random access
▫ Inspection, composition, editing etc.
▫ Simplified update
  48. Multiple File Support
• Use URLs to ‘point to’ media
▫ Distinct from URLs in MPEG-4 Systems
• URLs use a file-access service
▫ e.g. file://, http://, ftp:// etc.
• Permits assembly of a composition without requiring a data copy
• Referenced files contain only media
▫ The meta-data is all in the ‘main’ file
  49. Logical File Structure
• A presentation (‘movie’) contains…
• Tracks, which contain…
• Samples
  50. Physical Structure - File
• A succession of objects (atoms, boxes)
• Exactly one meta-data object
• Zero or more media data object(s)
• Free space etc.
  51. Example Layout
The movie meta-data box (‘moov’) holds a ‘trak’ for the video track and a ‘trak’ for the audio track; the media data box (‘mdat’) holds the samples (frames) themselves.
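The "succession of objects" structure makes a top-level walk of an MP4 file almost trivial: every box starts with a 32-bit big-endian size and a FourCC type. A minimal sketch (it ignores the 64-bit `size == 1` and to-end-of-file `size == 0` special cases, and does not recurse into `moov`):

```python
import struct

def iter_mp4_boxes(data, offset=0):
    """Walk top-level boxes ('atoms') of an ISO base media / MP4 file:
    each is a 32-bit BE size (including the 8-byte header) + FourCC."""
    while offset + 8 <= len(data):
        size, boxtype = struct.unpack_from(">I4s", data, offset)
        if size < 8:                  # 64-bit / open-ended sizes: not handled here
            break
        yield boxtype.decode("ascii"), data[offset + 8 : offset + size]
        offset += size
```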
  52. Meta-data Tables
• Sample timing
• Sample size and position
• Synchronization (random access) points, priority etc.
• Temporal and physical order are de-coupled
▫ May be aligned for optimization
▫ Permits composition, editing, re-use etc. without a re-write
• Tables are compacted
  53. Multi-protocol Streaming Support
• Two kinds of track:
• Media (elementary stream) tracks
▫ A sample is an access unit
• Protocol ‘hint’ tracks
▫ A sample tells the server how to build a protocol transmission unit (packet, protocol data unit etc.)
  54. Track Types
• Visual ‘description’ formats
▫ MPEG4
▫ JPEG2000
• Audio ‘description’ formats
▫ MPEG4 compressed tracks
▫ ‘Raw’ (DV) audio
• Other MPEG-4 tracks
• Hint tracks (streaming)
  55. Track Structure
• Sample pointers (time, position)
• Sample description(s)
• Track references
▫ Dependencies, hint-media links
• Edit lists
▫ Re-use, time-shifting, ‘silent’ intervals etc.
  56. Hint Tracks
• May include media (ES) data by reference
• Only the ‘extra’ protocol headers etc. are added to hint tracks - compact
▫ Make SL, RTP headers as needed
• May multiplex data from several tracks
• Packetization/fragmentation/multiplexing through hint structures
• Timing is derived from the media timing
  57. Hint Track Structure
The ‘moov’ box holds the video ‘trak’ and a hint ‘trak’; the ‘mdat’ box holds the media samples (frames) plus hint samples, each consisting of a header and pointers into the media samples.
  58. Extensibility
• Other media types
▫ Non-SC29 sample descriptions (e.g. other video)
▫ Non-SC29 track types (e.g. laboratory instrument trace)
• Copyright notice (file or track level) etc.
• General object extensions (GUIDs)
  59. Advantages
• Compatibility
▫ Files can be played by other companies’ players: RealPlayer with the Envivio plug-in, Windows Media Player etc.
▫ Files can be streamed by other companies’ streaming servers: Darwin Streaming Server, QuickTime Streaming Server.
  60. Single File - Multiple Data Types
• No need for an export process: one file type is used to store video, audio, events, continuous telemetry data from sensors and JPEG images in one file.
• Tracks: audio, metadata, video, JPEG images, continuous sensor data, events.
  61. Single File Playback
• All video tracks of a site can be stored in one file. To view many cameras in a synchronized manner, the MPEG-4 file format can hold all the views of multiple cameras in one file.
• Tracks: audio, metadata, video cam 1, video cam 2, …, video cam N.
  62. Skimming
• Skimming: shortening a long movie to its interesting points, much like creating a “promo”. For example, skimming a two-hour surveillance movie down to the 2 minutes where there is movement and people are entering the building.
• MPEG-4 FF enables the creation of skims within the file through the use of edit lists (part of the standard), without overhead.
  63. MKV File Format
An XML-based file format
  64. MKV - File Format
• Container file format for video, audio tracks, pictures and subtitles, all in one file.
• Announced in Dec. 2002 by Steve Lhomme.
• Based on a binary XML format called EBML (Extensible Binary Meta Language).
• A completely open standard format (free for personal use).
• The source is licensed under the GNU L-GPL.
  65. MKV - Specifications
• Can contain chapter entries for video streams.
• Allows fast in-file seeking.
• Metadata tags are fully supported.
• Multiple streams contained in a single file.
• Modular: can be expanded to a company’s special needs.
• Can be streamed over HTTP, FTP, etc.
  66. MKV Support - Software & Hardware
• Players:
▫ ALLPlayer, BS.Player, DivX Player, GStreamer-based players, VLC media player, xine, Zoom Player, MPlayer, Media Player Classic, ShowTime and many more…
• Media centers:
▫ Boxee, DivX Connected, MediaPortal, PS3 Media Server, Moovida, XBMC etc.
• Blu-ray players:
▫ Samsung, LG and Oppo.
• Mobile players:
▫ Archos 5 Android device, Cowon A3 and O2.
  67. MKV - EBML in Detail
• A binary format for representing data in an XML-like format.
• Uses specific XML-style tags to define stream properties and data.
• MKV conforms to the rules of EBML by defining a set of tags:
▫ Segment, Info, Seek, Block, Slices etc.
• Uses 3 lacing mechanisms for shortening small data blocks (usually frames):
▫ Xiph, EBML or fixed-size lacing.
  68. MKV - Simple Representation
• Header: version info, EBML type (matroska in our case).
• Meta Seek Information: optional; allows fast seeking to other level-1 elements in the file.
• Segment Information: file information - title, unique file ID, part number, next file ID.
• Track: basic information about the track - resolution, sample rate, codec info.
• Chapters: predefined seek points in the media.
• Clusters: video and audio frames for each track.
• Cueing Data: stores cue points for each track; allows fast in-track seeking.
• Attachment: any other file related to this one (subtitles, album covers, etc.).
• Tagging: tags relating to the file and to each track (similar to MP3 ID3 tags).
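The building block beneath all of these elements is the EBML variable-length integer: the number of leading zero bits in the first byte tells you how many extra bytes follow, and the marker bit is masked off. A minimal decoder sketch:

```python
def read_vint(data, offset=0):
    """Decode an EBML variable-length integer (used by Matroska for
    element sizes): leading zero bits in the first byte give the
    total length; returns (value, bytes_consumed)."""
    first = data[offset]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        length += 1
        mask >>= 1
    value = first & (mask - 1)          # strip the length-marker bit
    for b in data[offset + 1 : offset + length]:
        value = (value << 8) | b
    return value, length
```

For example `0x81` decodes to 1 in one byte, while `0x40 0x02` encodes the same class of value in the two-byte form.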
  69. MKV - Streaming
• Matroska supports two types of streaming.
• File access
▫ Used for reading a file locally or from a remote web server.
▫ Prone to reading and seeking errors.
▫ Causes buffering issues on slow servers.
• Live streaming
▫ Usually over HTTP or another TCP-based protocol.
▫ Special streaming structure: no Meta Seek, Cues, Chapters or Attachments are allowed.
  70. File Format Summary - Trends
• Metadata is important
▫ Simple metadata or XML
▫ Separated from the media
• Forward compatibility
▫ Don’t crash on an unrecognized data entry
• Progressive-download oriented
• Multi-bitrate oriented
• Fragmentation -> lower granularity
▫ Self-contained file fragments
• CDN-ability
  71. Video Codecs
  72. Why Advance? MPEG2 Works…
• Coding efficiency
• Packetization
• Robustness
• Scalable profiles
• The Internet requires interaction
▫ Scalable & on demand
▫ Fast-forward / fast-rewind / random access
▫ Stream switching
• Multi
▫ Bitrate
▫ Resolution / screen
  73. Coding Efficiency Motivation
  74. Sorenson Spark Video Codec
• H263 variant
• Low footprint (code size), ~100K
• Good performance for 2002
• Quality, Spark vs. optimal MPEG (H263+)
▫ 20-30% less efficient
• Spark quality, RT vs. offline
▫ RT has considerably lower quality due to processing-power and RT (delay) constraints
  75. Sorenson Spark - 2
• Does not support:
▫ Arithmetic coding
▫ Advanced prediction
▫ B-frames
• Features
▫ De-blocking filter mode
▫ UMV: Unrestricted Motion Vector mode
▫ Arbitrary frame dimensions
▫ Supported by FFMPEG
▫ D-frames
  76. D-Frames
• D (Disposable) frames
▫ One-way prediction
▫ Provide a flexible bit rate: I-D-P-D-P-D-P
▫ D-frames are used only when feeding a Flash communication server
  77. On2 TrueMotion VP6
• Features
▫ Compressed I-frames (intra-compression makes use of spatial predictors)
▫ Unidirectionally predicted frames (P-frames)
▫ Multiple reference P-frames
▫ 8x8 iDCT-class transform (4x4 in VP7)
▫ Improved quantization strategy (preserves image details)
▫ Advanced entropy coding
  78. VP6 Features
• Entropy coding
▫ Various techniques are used based on complexity and frame size, including VLC and context-modeled binary coding (like H264 CABAC)
• Bit-rate control
▫ To reach the requested data rate, VP6 adjusts quantization levels, encoded frame dimensions, entropy coding and frame dropping
  79. VP6 Motion Prediction
• Motion vectors
▫ One vector per macroblock (16x16), or
▫ 4 vectors, one per 8x8 block
• Quarter-pel motion compensation support
• Unrestricted motion compensation support
• Two reference frames:
▫ The previous frame, or
▫ A previously bookmarked frame
  80. VP6 vs H264
• VP6 is much simpler than H.264
▫ Requires less CPU resources for decoding & encoding
▫ Code size is considerably smaller.
• Simpler means less efficient? NO! Techniques used:
▫ A mix of adaptive sub-pixel motion estimation
▫ Better prediction of low-order frequency coefficients
▫ Improved quantization strategy
▫ De-blocking and de-ringing filters
▫ Enhanced context-based entropy coding
  81. 720p High Profile H.264 (x264) vs VP7 (“Alexander” trailer)
• PSNR graphs are used for comparative analysis of compression quality: the vertical axis represents quality (PSNR, higher is better) and the horizontal axis represents datarate in kilobits per second. Each line represents the encode quality of one codec on a given clip at multiple datarates; the highest line represents the codec with the best quality.
• Tips for reading this kind of graph: pick any point on the top line (in this case VP7) and draw a line straight across until you intersect the lower line (in this case x264), i.e. keep the quality/PSNR constant. Then draw lines straight down from both points to the datarate axis; the crossing points tell you the datarate at each point.
• What this means: on this clip, VP7 at 2750 kbps has the same quality/PSNR as x264 high profile at 3620 kbps, i.e. you’d need ~30% higher datarate to get from x264 the quality you got from VP7. In this case VP7 is clearly better than x264.
  82. VP6 vs. H264
• There is a difference between a codec technology and a codec implementation.
  83. On2 VP7
• Not open source
• Non-standard royalties model
• Better video quality than H264
• Used by:
▫ EVD, the Chinese standard for HD-DVD
▫ Skype beta (v2.0)
▫ Flash Player
  84. Windows Media
• Windows Media is a format used by Microsoft for encoding and distributing audio and video.
• Windows Media has two media types:
▫ Windows Media Audio (WMA)
▫ Windows Media Video (WMV)
• Windows Media Video
▫ A modified version of MPEG 4
▫ The codec initially started at version 7, for Windows Media Player 7, and then evolved through versions 8-10
  85. Windows Media 9 - the VC-1 Format
• Microsoft submitted the version 9 codec to the Society of Motion Picture and Television Engineers (SMPTE) for approval as an international standard; SMPTE reviewed the submission under the draft name “VC-1”.
• This codec is also used to distribute high-definition video on standard DVDs in a format Microsoft has branded as WMV HD. WMV HD content can be played back on computers or compatible DVD players.
• The trial version of the standard was published by SMPTE in September 2005.
• WMV9 was approved by SMPTE in April 2006.
  86. H.264
  87. H.264 Terminology
• The following terms are used interchangeably:
▫ H.26L
▫ “JVT CODEC”
▫ The “AVC”, or Advanced Video CODEC
• Proper terminology going forward:
▫ MPEG-4 Part 10 (official MPEG term): ISO/IEC 14496-10 AVC
▫ H.264 (official ITU term)
  88. H264 Standard Ideas
• Fixed “block” size -> variable
▫ Slice
▫ Block
• Block order/scanning -> different orders
▫ Zig-zag, Flexible Macroblock Ordering
• Additional spatial prediction -> intra prediction
• Inter prediction from 1 frame only -> multiple frames
▫ P and B pictures
▫ Multiple reference frames
  89. H264 Standard Ideas (Contd.)
▫ Pixel interpolation
▫ Motion vectors
• In-loop de-blocking filter
• Improved entropy coding
  90. New Features of H.264 - Summarized
• SP, SI: additional picture types
• NAL (Network Abstraction Layer): coding and transport layer separation
• CABAC: an additional entropy coding mode
• 1/4- & 1/8-pixel motion vector precision
• In-loop de-blocking filter
• B-frame prediction weighting
• 4x4 integer transform
• Multi-mode intra prediction
• FMO: Flexible Macroblock Ordering
  91. Block Diagram
  92. Profiles and Levels
• Profiles: Baseline, Main and Extended
▫ Baseline: progressive; videoconferencing & wireless
▫ Main: esp. broadcast
▫ Extended: mobile network
• Wireless <> mobile
  94. Baseline Profile
• The Baseline profile is the minimum implementation
▫ No CABAC, 1/8-pel MC, B-frames or SP-slices
• 15 levels
▫ Resolution, capability, bit rate, buffer, number of reference frames
▫ Built to match popular international production and emission formats
▫ From QCIF to D-Cinema
• Progressive (not interlaced)
• I and P slice types
  95. Baseline Profile (Contd.)
• 1/4-sample inter prediction
• De-blocking filter, redundant slices
• VLC-based entropy coding (no CABAC)
• 4:2:0 chroma format
• Flexible Macroblock Ordering (FMO)
• Arbitrary Slice Order (ASO)
▫ The decoder processes slices in an arbitrary order, as they arrive.
▫ The decoder does not have to wait for all slices to be properly arranged before it starts processing them.
▫ Reduces the processing delay at the decoder.
  96. Baseline Profile (Contd.)
• FMO: Flexible Macroblock Ordering
▫ With FMO, macroblocks are coded according to a macroblock allocation map that groups, within a given slice, macroblocks from spatially different locations in the frame.
▫ Enhances error resilience.
• Redundant slices:
▫ Allow the transmission of duplicate slices.
  97. H.264 Profiles & Levels - Main
• All Baseline features, plus:
▫ Interlace
▫ B slice types (bi-directional reference)
▫ CABAC
▫ Weighted prediction
• Except these Baseline features:
▫ Arbitrary Slice Order (ASO)
▫ Flexible Macroblock Ordering (FMO)
▫ Redundant slices
  98. Main Profile
• CABAC
• Good performance (bit-rate reduction) by:
▫ Selecting models by context
▫ Adapting estimates to local statistics
• The arithmetic coding adds computational complexity: 10%-20% of the total decoder execution time at medium bitrates
• Average bit-rate saving over CAVLC: 10-15%
  99. Extended Profile
• All Baseline features, plus:
▫ Interlace
▫ B slice types
▫ Weighted prediction
  100. Variable Block Size - Slices
• A picture is split into one or several slices.
• Slices are a sequence of macroblocks.
• A macroblock contains 16x16 luminance samples and two 8x8 chrominance samples.
• Macroblocks within a slice depend on each other.
• Macroblocks can be further partitioned.
• (Figure: a frame divided into Slice 0, Slice 1 and Slice 2.)
  101. Basic Macroblock Coding Structure
(Encoder block diagram.) The input video signal is split into 16x16-pixel macroblocks. Under coder control, each macroblock goes through transform, scaling and quantization to produce transform coefficients; an embedded decoder (inverse scaling and inverse transform plus the de-blocking filter) reconstructs the signal used for intra-frame prediction and motion compensation. Motion estimation produces the motion data, an intra/inter switch selects the prediction, and entropy coding emits the output bitstream.
  102. Motion Compensation
(Same encoder diagram, highlighting the motion-compensation stage.) Macroblock partition types: 16x16, 16x8, 8x16 and 8x8; each 8x8 partition can be further split into 8x8, 8x4, 4x8 or 4x4 sub-partitions - various block sizes and shapes.
  103. Variable Block Size
• Block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 are available:
▫ Mode 1: one 16x16 block
▫ Mode 2: two 16x8 blocks
▫ Mode 3: two 8x16 blocks
▫ Mode 4: four 8x8 blocks
▫ Mode 5: eight 8x4 blocks
▫ Mode 6: eight 4x8 blocks
▫ Mode 7: sixteen 4x4 blocks
• Using seven different block sizes can translate into bit-rate savings of more than 15% compared to using only a 16x16 block size.
(Yossi Cohen, DSP-IP)
  104. How to Select the Partition Size?
• Choose the partition size that minimizes the coded residual and motion vectors.
  105. Inter Prediction Modes - Motion Vectors
• MVs for neighboring partitions are often highly correlated.
• So we encode MVDs instead of MVs:
▫ MVD = MV - MVp (the difference between the actual motion vector and the predicted one)
• 1/4-pixel-accurate motion compensation
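The predictor exploitation can be sketched concretely. H.264's full prediction rules depend on the partition shapes, but the common base case takes the component-wise median of the left (A), top (B) and top-right (C) neighbours' vectors; the function names below are my own:

```python
def median_mv_predictor(mv_a, mv_b, mv_c):
    """Base-case H.264 MV prediction: component-wise median of the
    left (A), top (B) and top-right (C) neighbour motion vectors."""
    def median(x, y, z):
        return sorted((x, y, z))[1]
    return (median(mv_a[0], mv_b[0], mv_c[0]),
            median(mv_a[1], mv_b[1], mv_c[1]))

def mvd(mv, mvp):
    """The encoder transmits the difference MVD = MV - MVp."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])
```

Because neighbouring vectors are highly correlated, the MVD components cluster around zero, which is exactly what the entropy coder compresses well.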
  106. Multiple Reference Frames
(Same encoder diagram, with multiple reference frames feeding motion compensation and motion estimation.)
  107. Multiple Reference Frames (Contd.)
  108. Intra Prediction Modes
• 4x4 luminance prediction modes:
▫ 0 (vertical), 1 (horizontal), 2 (DC), 3 (diagonal down-left), 4 (diagonal down-right), 5 (vertical-right), 6 (horizontal-down), 7 (vertical-left), 8 (horizontal-up)
• Mode 2 (DC) predicts all pixels from (A+B+C+D+I+J+K+L+4)/8, or (A+B+C+D+2)/4, or (I+J+K+L+2)/4, depending on which neighbors are available.
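The DC mode formula above translates directly into code: average the available neighbouring pixels (the top row A..D and the left column I..L) with rounding, falling back to mid-gray when no neighbours exist. A sketch (the fallback value 128 is the conventional mid-level for 8-bit samples; function name is my own):

```python
def dc_predict_4x4(top=None, left=None):
    """DC intra prediction for a 4x4 luma block: rounded average of
    the available neighbours (top row A..D, left column I..L)."""
    if top and left:
        return (sum(top) + sum(left) + 4) >> 3   # (A+..+D+I+..+L+4)/8
    if top:
        return (sum(top) + 2) >> 2               # (A+B+C+D+2)/4
    if left:
        return (sum(left) + 2) >> 2              # (I+J+K+L+2)/4
    return 128                                    # no neighbours available
```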
  109. Intra Prediction Modes (Contd.)
• 4x4 luminance prediction modes (illustration).
  110. Intra Prediction Modes (Contd.)
• Intra 16x16 luminance and 8x8 chrominance prediction modes.
  111. Inter Prediction Modes - Chrominance Pixel Interpolation
• Quarter-sample chrominance pixels are interpolated by taking weighted averages, by distance from the new pixel, of the four surrounding original pixels A, B, C and D (dx, dy are the fractional offsets; s is the full-sample spacing):
▫ V = [(s-dx)(s-dy)A + dx(s-dy)B + (s-dx)dyC + dxdyD + s²/2] / s²
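The formula above is plain bilinear interpolation with integer rounding, and it executes directly. A sketch (the default spacing s=8 is an assumption chosen for eighth-sample positions; function name is my own):

```python
def interp_chroma(a, b, c, d, dx, dy, s=8):
    """Bilinear chroma interpolation: distance-weighted average of
    the four surrounding samples A,B,C,D, rounded by adding s*s/2."""
    v = ((s - dx) * (s - dy) * a + dx * (s - dy) * b
         + (s - dx) * dy * c + dx * dy * d + s * s // 2)
    return v // (s * s)
```

When all four neighbours are equal the weights cancel and the interpolated value is that same sample, which is a quick sanity check on the weighting.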
  112. 112. In-loop De-blocking Filter • Highly compressed decoded inter picture • Significantly reduces prediction residuals Without filter with H.264/AVC De-blocking Fast Forward Your Development
  113. 113. Entropy coding Fast Forward Your Development
114. 114. Entropy coding • Entropy coding methods: • CABAC – discussed earlier • UVLC ▫ H.264 offers a single Universal VLC (UVLC) table for all symbols • CAVLC ▫ CAVLC (Context-Adaptive Variable Length Coding) ▫ Probability distribution is static ▫ Code words must have an integer number of bits (low coding efficiency for highly peaked pdfs) Fast Forward Your Development
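The UVLC is an Exp-Golomb code; a minimal decoder for one unsigned code word illustrates why a single table can serve all symbols (the code word is `n` zeros, a `1`, then `n` info bits):

```python
def decode_ue(bits):
    """Decode one unsigned Exp-Golomb code word (H.264's universal VLC)
    from a string of '0'/'1' characters; returns (value, bits_consumed)."""
    zeros = 0
    while bits[zeros] == '0':            # count leading zeros
        zeros += 1
    info = bits[zeros + 1: zeros + 1 + zeros]
    value = (1 << zeros) - 1 + (int(info, 2) if info else 0)
    return value, 2 * zeros + 1

decode_ue('1')       # -> (0, 1)
decode_ue('00110')   # -> (5, 5)
```

Small values get short codes, but every code word is a whole number of bits, which is exactly the efficiency limitation noted above for highly peaked distributions.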
  115. 115. CABAC - Binarization Fast Forward Your Development
116. 116. CABAC: Technical Overview • Binarization – maps non-binary symbols to a binary sequence • Context modeling – chooses a model conditioned on past observations • Probability estimation & coding engine (adaptive binary arithmetic coder) – uses the provided model for the actual encoding and updates the model, feeding the updated probability estimation back into context modeling Fast Forward Your Development
  117. 117. H.264 NAL & RTP Fast Forward Your Development
118. 118. H264 Layer Structure [layer diagram: the Video Coding Layer (macroblock, data partitioning, slice/partition) sits above the Network Abstraction Layer, which maps control data and coded video onto transports such as H.320, H.324, H.323/IP and MPEG-2] Fast Forward Your Development
119. 119. H264 & NAL • Motivation ▫ Many delivery methods are based on packet-based networks ▫ It is better to do the packetization inside the encoder, where all coding information is available, than in separate modules • Architecture: NAL units as the transport entity ▫ NAL units may be mapped into a bit stream ▫ NAL units are self-contained and independently decodable ▫ The decoding process assumes NAL units arrive in decoding order Fast Forward Your Development
120. 120. Network Abstraction Layer (NAL) • The H.264 encoder is composed of two layers: • VCL – Video Coding Layer – the unit which translates the video information into a stream of bits • NAL – Network Abstraction Layer – maps and packetizes the VCL bitstream into units prior to transmission or storage • Each NAL unit contains: ▫ Payload – RBSP (Raw Byte Sequence Payload), a set of data corresponding to coded video data or header information ▫ NAL unit header Fast Forward Your Development
121. 121. NAL • The coded video sequence is represented by a sequence of NAL units that can be transmitted over a packet-based network or a bitstream transmission link, or stored in a file • There are two NAL unit types ▫ VCL units – NAL units which carry encoded video data ▫ Non-VCL units – parameter sets Fast Forward Your Development
122. 122. NAL Unit Header NAL unit header | NAL unit payload. The NAL unit header is 1 byte consisting of: • forbidden_bit (1 bit): may be used to signal that a NAL unit is corrupt • nal_ref_idc (2 bits): signals relative importance, and whether the picture is stored in the reference picture buffer • nal_unit_type (5 bits): signals the NAL unit type, e.g.: ▫ Coded slice (regular VCL data) ▫ Coded data partition (DPA, DPB, DPC) ▫ Instantaneous decoder refresh (IDR) ▫ Supplemental enhancement information (SEI) ▫ Sequence and picture parameter sets (SPS, PPS) ▫ Picture delimiter (PD) and filler data (FD) Fast Forward Your Development
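Since the header is a single byte, parsing it is three shift-and-mask operations. A small sketch (type names abbreviated):

```python
NAL_TYPES = {1: "coded slice (non-IDR)", 5: "IDR slice", 6: "SEI",
             7: "SPS", 8: "PPS", 9: "access unit delimiter"}

def parse_nal_header(byte):
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_bit": (byte >> 7) & 0x1,   # must be 0 in a valid unit
        "nal_ref_idc":   (byte >> 5) & 0x3,   # relative importance / reference
        "nal_unit_type": byte & 0x1F,         # 5-bit unit type
    }

parse_nal_header(0x67)  # an SPS: forbidden_bit 0, nal_ref_idc 3, nal_unit_type 7
```

`0x67` is the header byte that typically starts an H.264 sequence parameter set in practice.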
123. 123. RBSP (NAL Payload) Types • Parameter set – global parameters for a sequence: resolution, video format, macroblock allocation map • Supplemental enhancement information • Picture delimiter – boundary between video pictures • Coded slice – header and data for a slice; this unit contains the actual coded video data • Data partition A, B or C – data-partitioned slice layer data (A – header data for all MBs in the slice, B – intra-coded data, C – inter-coded data) • End of sequence • End of stream • Filler data Fast Forward Your Development
124. 124. RTP payload format for H.264 • Based on IETF RFC 3984, February 2005 • Describes how to carry H.264 NAL units inside RTP with proper packetization; employs the native NAL (Network Abstraction Layer) interface, based on NAL units (NALUs) • NALU – byte string of variable length that contains syntax elements of a certain class • NALU header – defines the information within the NAL unit (corrupted, type, etc.) • There are two basic methods for RTP packetization of NAL units: ▫ Non-fragmented NAL units ▫ Fragmented NAL units Fast Forward Your Development
  125. 125. RTP Payload for H.264 NAL • The most common method is to configure the encoder to output one NAL unit for each RTP packet. Each NAL unit is ~1.4KB • Fragment a large NAL unit (Frame) into many RTP Packets. The difference is in the RTP Header information Fast Forward Your Development
126. 126. RTP and H.264 • RTP packetization of NAL units allows both aggregation of many NAL units into one RTP packet and fragmentation of one NAL unit into many RTP packets Fast Forward Your Development
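The fragmentation case can be sketched along the lines of RFC 3984's FU-A units: the original NAL header's F/NRI bits are copied into an FU indicator whose type is 28, and a one-byte FU header carries start/end flags plus the original type. This is a simplified illustration (RTP headers and the aggregation case are omitted):

```python
FU_A = 28  # FU-A fragmentation-unit type from RFC 3984

def fragment_nal(nal, mtu):
    """Split one NAL unit into FU-A payloads of at most `mtu` bytes each."""
    indicator = (nal[0] & 0xE0) | FU_A       # keep F + NRI bits, type = 28
    orig_type = nal[0] & 0x1F
    body, packets = nal[1:], []
    chunk = mtu - 2                          # room left after the 2 FU bytes
    for i in range(0, len(body), chunk):
        start = 0x80 if i == 0 else 0                     # S bit
        end = 0x40 if i + chunk >= len(body) else 0       # E bit
        fu_header = start | end | orig_type
        packets.append(bytes([indicator, fu_header]) + body[i:i + chunk])
    return packets

# an IDR-slice NAL (header 0x65) of ~3 KB split for a 1400-byte payload limit
pkts = fragment_nal(bytes([0x65]) + b"\x00" * 3000, 1400)   # 3 fragments
```

Only the first fragment sets the S bit and only the last sets the E bit, which lets the receiver reassemble the original NAL unit.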
  127. 127. Comparison Fast Forward Your Development
  128. 128. Summary • New key features are: ▫ Enhanced motion compensation ▫ Small blocks for transform coding ▫ Improved de-blocking filter ▫ Enhanced entropy coding • Substantial bit-rate savings (up to 50%) relative to other standards for the same quality • The complexity of the encoder triples that of the prior ones • The complexity of the decoder doubles that of the prior ones Fast Forward Your Development
  129. 129. Google VP8 Fast Forward Your Development
130. 130. Before we start • VP8's goal is NOT to deliver the best video quality at any given bitrate • VP8 was designed as a mobile video decoder and should be examined in this context: ▫ VP8 vs H.264 baseline profile Fast Forward Your Development
131. 131. Google VP8 • Last month, at Google I/O (its developer conference), Google released VP8 as open source • VP8 is a lightweight video codec developed by On2 • VP8 provides quality that is the same as or higher than H.264 baseline profile • VP8's memory requirements are lower than those of H.264 baseline profile • After optimization, VP8 might have better MIPS performance than H.264 baseline profile Fast Forward Your Development
132. 132. Genealogy • VP8 is part of a well-known codec family • VP3 was released to open source to become XIPH's Theora • VP6 is used in Flash video • VP7 is used in Skype • Motivation: a “No Royalties” CODEC [family-tree diagram: Sorenson Spark; VP3 → Theora; VP6; VP7; VP8] Fast Forward Your Development
133. 133. ADOPTION – WHO USES IT? Software, Hardware, Platforms & Publishers Fast Forward Your Development
134. 134. Software Adoption • Android, Anystream, Collabora • CoreCodec, Firefox, Adobe Flash • Google Chrome, iLinc • Inlet, Opera, ooVoo • Skype, Sorenson Media • Telestream, Wildform Fast Forward Your Development
135. 135. Hardware Adoption • AMD, ARM, Broadcom • Digital Rapids, Freescale • Harmonic, Logitech, ViewCast • Imagination Technologies, Marvell • NVIDIA, Qualcomm, Texas Instruments • VeriSilicon, MIPS Fast Forward Your Development
  136. 136. Platforms and Publishers • Brightcove • • HD Cloud • Kaltura • Ooyala • YouTube • Zencoder Fast Forward Your Development
  137. 137. VP8 MAIN FEATURES (According to On2/Google) Fast Forward Your Development
  138. 138. Adaptive Loop Filter • Improved Loop filter provides better quality & performance in comparison to H.264 • ALF is a feature of H.265 Source: On2 Fast Forward Your Development
139. 139. Golden Frames • Golden frames enable better decoding of the background, which is used for prediction in later frames • Could be used as a resync point: ▫ a golden frame can reference an I frame • Could be hidden (not for display) Fast Forward Your Development Source: On2
  140. 140. Decoding efficiency • CABAC is an H.264 feature which improves coding efficiency but consumes many CPU cycles • VP8 has better entropy coding than H.264, this leads to relatively lower CPU consumption under the same conditions • Decoding efficiency is important for smooth operation and long battery life in netbooks and mobile devices Fast Forward Your Development Source: On2
141. 141. Resolution up-scaling & down-scaling • Supported by the decoder • The encoder can decide dynamically (RT applications) to lower the resolution in case of low bit rate and let the decoder scale • Removes the decision from the application • No need for an I frame Fast Forward Your Development
  142. 142. VP8 BASICS Definitions Bitstream structure Frame structure Fast Forward Your Development
143. 143. Definitions • Frame – same as H.264 • Segment – parallel to a slice in H.264. MBs in the same segment use the same settings, such as: ▫ Entropy encoder/decoder context ▫ De-blocking filter settings • Partition – a block of byte-aligned compressed video bits Fast Forward Your Development
144. 144. Definitions • Block – 8x8 matrix of pixels • Macro-block – processing unit; contains 16x16 Y pixels and two 8x8 matrices of U and V: ▫ 4 8x8 Y blocks ▫ 1 8x8 U block ▫ 1 8x8 V block • Sub-block – 4x4 matrix of pixels. All DCT / WHT operations are done on sub-blocks Fast Forward Your Development
145. 145. Frame Types • I frame • P frame • No B frames due to patents / delay, but there is a “future Alt-Ref” frame. What is the difference? • Prediction ▫ Previous frame ▫ Golden frame ▫ Alt-Ref frame Fast Forward Your Development
  146. 146. Frame Structure • Include three sections: • Frame Header • Partition I • Partition II Frame Header Partition I Partition II partitions Fast Forward Your Development
147. 147. Frame Header • Byte-aligned uncompressed information • Frame type – a 1-bit frame type ▫ 0 for key frames, 1 for inter-frames • Version – a 3-bit version number ▫ 0–3 are defined as four different profiles with different decoding complexity; other values are reserved for future use • show_frame – a 1-bit flag ▫ 0 – current frame is not for display ▫ 1 – current frame is for display • Length – a 19-bit field containing the size of the first data partition in bytes Fast Forward Your Development
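These four fields live in a 3-byte little-endian tag at the start of every frame; a hedged parsing sketch:

```python
def parse_vp8_frame_tag(data):
    """Decode the 3-byte uncompressed VP8 frame tag: 1-bit frame type,
    3-bit version, 1-bit show_frame, 19-bit first-partition size."""
    tag = data[0] | (data[1] << 8) | (data[2] << 16)   # little-endian
    return {
        "key_frame":  (tag & 1) == 0,       # 0 = key frame, 1 = inter-frame
        "version":    (tag >> 1) & 0x7,
        "show_frame": (tag >> 4) & 0x1,
        "part1_size": (tag >> 5) & 0x7FFFF,
    }

# a shown key frame, version 0, whose first partition is 100 bytes
parse_vp8_frame_tag(bytes([0x90, 0x0C, 0x00]))
```

The 19-bit size field is what lets a decoder locate Partition II without first decoding Partition I.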
  148. 148. Partition I • Header information for the entire frame • Per-macroblock information specifying how each macroblock is predicted. • This information is presented in raster-scan order Fast Forward Your Development
149. 149. Partition II • Texture information – DCT/WHT quantized coefficients • Optionally, each macroblock row can be mapped to a separate partition • Partition II may be divided into several partitions for parallel processing [diagram: Frame Header | Partition I | Partition IIA, Partition IIB, … Partition IIn (texture data)] Fast Forward Your Development
150. 150. Decoder • Holds 4 frames: ▫ Current reconstructed frame ▫ Previous frame ▫ Previous “golden frame” ▫ Previous Alt-Ref frame • Frame dimensions can change in every frame Fast Forward Your Development
151. 151. VP8 block diagram [block diagram, analogous to the H.264 one: input video split into macroblocks; transform, scaling and quantization produce the transform coefficients; entropy coding; embedded decoder with inverse scaling/transform and dynamic de-blocking; intra-frame prediction, motion compensation and motion estimation; coder control emits control data, intra/inter decisions and motion data] Fast Forward Your Development
  152. 152. VP8 BLOCK CODING Fast Forward Your Development
153. 153. VP8 Macroblock coding • Each 16x16 macroblock is divided into 8x8 blocks and then into 4x4 sub-blocks; sub-block AC/DC coefficients are processed with a 4x4 DCT, and the collected Y DC values with a 4x4 WHT • Each macroblock is divided into 25 sub-blocks: ▫ 16 Y sub-blocks ▫ 4 U sub-blocks ▫ 4 V sub-blocks ▫ 1 Y2 sub-block of Y DC values (WHT) – aka hierarchical transform Fast Forward Your Development
154. 154. VP8 Transforms • Very inefficient raw implementation of the transform – uses 16-bit multipliers • Uses exact values of pixels ▫ +Memory ▫ +Accuracy and no drift
static const int cospi8sqrt2minus1 = 20091; // (sqrt(2)*cos(pi/8) - 1) * 2^16
static const int sinpi8sqrt2 = 35468;       // sqrt(2)*sin(pi/8) * 2^16
temp1 = (ip[4] * sinpi8sqrt2 + rounding) >> 16;
Fast Forward Your Development
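The two constants can be recomputed from their trigonometric definitions; they are the fractional factors scaled into 16-bit fixed point (multiplied by 2^16 and rounded):

```python
import math

# VP8's fixed-point transform constants, derived from sqrt(2)*cos(pi/8)
# and sqrt(2)*sin(pi/8), scaled by 2^16 for 16-bit fixed-point math.
cospi8sqrt2minus1 = round((math.sqrt(2) * math.cos(math.pi / 8) - 1) * 65536)
sinpi8sqrt2 = round(math.sqrt(2) * math.sin(math.pi / 8) * 65536)

print(cospi8sqrt2minus1, sinpi8sqrt2)   # 20091 35468
```

Storing `sqrt(2)*cos(pi/8) - 1` rather than the full factor keeps the constant below 1.0, so the multiply result fits the fixed-point range and the integer part is added back separately.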
155. 155. iDCT • As in H.264 or VP6, the 2D transform is done by two separable 1D transforms (LLM method); there are two orders of implementation: 1. Transpose, vertical transform, transpose, vertical transform. 2. Vertical transform, transpose, vertical transform, transpose. • Due to SIMD instructions it is better to use order 1 and eliminate the first transpose, as H.264 does • VP8 uses order 2 (why?!), wasting 1–2% CPU Fast Forward Your Development
156. 156. Quantization • There are 6 quantizer types, each with its own levels • The quantizer depends on (the combination of): ▫ Plane: Y, U, V ▫ Coefficient type: AC, DC • The quantizer level is indicated by a 7-bit index into one of the 6 quantization tables Fast Forward Your Development
  157. 157. VP8 PREDICTION Inter-prediction Intra prediction Fast Forward Your Development
158. 158. Macroblock Intra Prediction • Intra-prediction exploits the spatial coherence between macro-blocks without referring to other frames • Modes ▫ Same as H.264 in i16x16 and i4x4 ▫ Missing modes like i8x8, which exist in H.264 Fast Forward Your Development
159. 159. Intra prediction – blocks used [diagram: for the current block M, reconstructed blocks above and to the left are available for prediction; blocks to the right and below are not available, and the remainder are not relevant] Fast Forward Your Development
160. 160. Luma i4x4 Intra Prediction • 4x4 blocks are predicted by ▫ the original four 16x16 prediction modes ▫ plus six “diagonal” prediction modes: Diagonal Down/Left, Diagonal Down/Right, Vertical-Right, Horizontal-Down, Vertical-Left, Horizontal-Up Fast Forward Your Development
161. 161. 8x8 Chroma prediction modes • U, V and Y predictions are done separately; one channel's prediction does not affect the other channels Fast Forward Your Development
162. 162. Inter-frame prediction • Definition – inter-prediction exploits the temporal coherence between frames to save bitrate • Luma sub-block prediction ▫ Method – each 4x4 Y sub-block is related to a 4x4 sub-block of the prediction frame ▫ Precision – motion vector precision is quarter-pel ▫ Interpolated pixels are calculated by applying a kernel filter across three pixels horizontally and vertically ▫ Based on deltas from previous MVs: “NEAREST” and “NEAR” modes Fast Forward Your Development
163. 163. Inter-frame prediction • SPLITMV mode ▫ Supports up to 16 MVs inside a single MB ▫ MV reuse (delta) is done not only at MB level but also at sub-block level • Example: partition of a block into 3 MV areas Fast Forward Your Development
  164. 164. Inter-frame prediction - Chroma • Chroma prediction - motion vector for each 8X8 chroma block is calculated separately by one of four prediction methods listed below: 1. Vertical - Copying the row from above throughout the prediction buffer. 2. Horizontal - Copying the column from left throughout the prediction buffer. 3. DC - Copying the average value of the row and column throughout the prediction buffer. 4. Extrapolation from the row and column using the (fixed) second difference (horizontal and vertical) from the upper left corner. Fast Forward Your Development
165. 165. Inter-frame Prediction – Chroma • Chroma precision – the calculated chroma motion vectors have 1/8-pixel resolution • Each chroma MV is derived by averaging the vectors of the four Y sub-blocks that occupy the same area of the frame Fast Forward Your Development
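The averaging can be sketched as follows. This is a simplified illustration only; VP8's exact rounding of negative components may differ:

```python
def chroma_mv(luma_mvs):
    """Average the four 1/4-pel luma sub-block MVs covering an 8x8 chroma
    block. Keeping the result at twice the precision yields a 1/8-pel
    chroma MV: (sum / 4) quarter-pels == (sum / 2) eighth-pels."""
    def half_round(s):
        return (s + 1) // 2 if s >= 0 else -((-s + 1) // 2)
    sx = sum(mv[0] for mv in luma_mvs)
    sy = sum(mv[1] for mv in luma_mvs)
    return (half_round(sx), half_round(sy))

# four quarter-pel luma MVs -> one eighth-pel chroma MV
chroma_mv([(4, 0), (4, 0), (6, 2), (6, 2)])   # -> (10, 2)
```

In the example, the luma MVs average to (5, 1) in quarter-pel units, which is (10, 2) at the chroma MV's 1/8-pel resolution.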
166. 166. Entropy Coding • Entropy coding is based on a binary tree, like CABAC, but unlike H.264 it is not context-adaptive and does not adapt on every operation • Tables do not change all the time • Unlike H.264, which makes all symbol→bit decisions as text, VP8 has a tree which represents the transform of each symbol → faster implementation (not faster processing time) Fast Forward Your Development
167. 167. Entropy Coding • According to an x264 developer, “probability values are constant over the course of the frame” • According to Google's documentation, there can be up to 4 entropy-coding contexts in a frame • We can view this as “switched-context BAC” and not as CABAC • Obviously, switched-context BAC is much cheaper than CABAC at the CPU level • However, setting the context for each fragment is hard to implement in hardware Fast Forward Your Development
  168. 168. PARALLEL PROCESSING Segment Partition Fast Forward Your Development
169. 169. Segment Processing • Segmentation enables creation of MB groups within one logical unit • MBs are associated with a segment by the MB segment ID • All MBs in a segment have the same adaptive adjustments, which include: ▫ Same quantization level ▫ Loop filter strength (0–2) Fast Forward Your Development
170. 170. Frame Processing Architecture • Frame header and Partition I are processed first, to initialize the probabilistic decoder and the prediction scheme for each MB – a serial operation • Each sub-partition may be processed in parallel with the other partitions: the probabilistic model of one sub-partition does not interact with another sub-partition [diagram: Frame Header | Partition I | lengths of Partitions IIA–IIn−1 | Partition IIA | Partition IIB | … | Partition IIn (sub-partitions)] Fast Forward Your Development
171. 171. VP8 VS H.264 DIFFERENCES Fast Forward Your Development
172. 172. H.264 Loop Filter • H.264 loop filter strength depends on boundary strength and MB type Fast Forward Your Development
173. 173. VP8 Loop Filter • Loop filter is adjusted by a 6-bit variable at two levels: ▫ Global frame loop-filter level ▫ Per-MB level • VP8 supports two filter types • The per-MB level is set as a “delta” from the frame filter level • Per-mode and per-reference loop-filter deltas are enabled/disabled in the bitstream via mode_ref_lf_delta_update Fast Forward Your Development
174. 174. VP8 Loop filter complexity • Before final optimization, 70% of CPU time is the deblocking filter!!! • According to one analysis, a 4×4 transform requires a total of 8 length-16 and 8 length-8 loop-filter calls per macroblock, while Theora's 8x8 filter requires half that number of calls Fast Forward Your Development
175. 175. AltRef Noise Reduction • One of the usages of the Alt-Ref frame is noise reduction • For noisy sources, using a filtered image as reference improves compression (as does pre-processing) • Improvement: 0.25 dB for a noisy clip • Example with noise reduction set to 5 Fast Forward Your Development
176. 176. AltRef Frame – Future Frame • Just don't call it a B-frame and it will give you 1 dB Fast Forward Your Development
177. 177. H.264 vs VP8 Transform differences • No 8x8 transform • H.264 simplifies the DCT and implements it as a series of: 1. Add 2. Subtract 3. Right shift by 1 – sacrificing accuracy (~1%) for speed/CPU • VP8 uses large 16-bit multipliers for accuracy (20091, 35468); this is redundant, unlike VC-1, which uses small multipliers Fast Forward Your Development
178. 178. H.264 vs VP8 Transform differences • Unlike H.264, which de-correlates DC values (Hadamard transform) ONLY in intra i16x16 MBs, VP8 uses this method also on some p16x16 MBs Fast Forward Your Development
179. 179. VP8 / H.264 summary • “Golden frames” – exist at some level in H.264 (slice group map type 2) • Slice granularity is better in H.264 ▫ supports individual MBs instead of MB rows • Interlacing – not supported • Alt-Ref frame – a reference frame which is not displayed; can aggregate all sorts of useful MBs • Filter – VP8 has an adaptive, complex and slow filter Fast Forward Your Development
180. 180. Install • Visual Studio 2010 version – I couldn't make it work • Download the Visual Studio 2005/2008 code version • Download and install YASM • Follow the instructions on alStudio2005 to integrate with Visual Studio 2005/2008 (copy files and set paths) Fast Forward Your Development
181. 181. Basic Decoder work • Compile the IVFDEC project • Set parameters ▫ --codec=vp8 -o a.i420 qcif1.ivf • Compile the IVFEnc project ▫ --codec=vp8 --i420 -v a.i420 qcif1.ivf • Check that the decoder unders Fast Forward Your Development
  182. 182. COMPARISON (FINALLY) Fast Forward Your Development
  183. 183. Talking heads, Low motion • Low motion videos like talking heads are easy to compress, so you'll see no real difference Fast Forward Your Development
  184. 184. Low motion In another low motion video with a terrible background for encoding (finely detailed wallpaper), the VP8 video retains much more detail than H.264. Interesting result. Fast Forward Your Development
  185. 185. Medium motion VP8 holds up fairly well Fast Forward Your Development
  186. 186. High motion • In high motion videos, H.264 seems superior. In this sample, blocks are visible in the pita where the H.264 video is smooth. The pin-striped shirt in the right background is also sharper in the H.264 video, as is the striped shirt on the left. Fast Forward Your Development
  187. 187. Very High motion In this very high motion skateboard video, H.264 also looks clearer, particularly in the highlighted areas in the fence, where the VP8 video has artifacts. Fast Forward Your Development
  188. 188. Final In the final comparison, I'd give a slight edge to VP8, which was clearer and showed fewer artifacts. Fast Forward Your Development
  189. 189. Quality Comparison Fast Forward Your Development
190. 190. Test yourself 1. Why is VP8 less effective in high motion? 2. Is it patent-free? 3. Will you use it? Fast Forward Your Development
  191. 191. Adobe Flash Video Fast Forward Your Development
192. 192. Flash File Format • Web page (file type: .htm, .html, .asp) – includes the Flash movie; served from a web server • Flash movie / SWF (file type: .swf) – includes graphics, text, video controls and client logic; plays in Flash Player; served from a web server • Flash video / FLV (file type: .flv) – includes the video; plays in a .swf video player (synonyms: application, video player); served from a web server or Flash Communication Server Fast Forward Your Development
193. 193. FLV (Flash Video) File Format • Headers at the beginning of the file. Why? • FLV and MP4 file formats • Video support: ▫ Spark ▫ TrueMotion VP6 ▫ H.264 and VP8 (future support) • Audio: ▫ Nellymoser ASAO codec ▫ MP3 codec ▫ ADPCM (lightly compressed) • Alpha channel Fast Forward Your Development
  194. 194. SWF File • Vector Graphic Format • Container file • Includes FLV files • Action Scripts • Players Fast Forward Your Development
195. 195. SWF and Personalized Ads • Encoding a special video version for each user: ▫ is expensive ▫ degrades the video • SWF enables: ▫ Using one video version stored in the FLV ▫ Changing the text or image stored in the outer SWF per user [diagram: FLV file (video & audio) wrapped by an SWF adding images, scripts and personalized special effects] Fast Forward Your Development
  196. 196. HTTP Live Streaming by Apple Fast Forward Your Development
  197. 197. Agenda • System Overview • Components • Session Fast Forward Your Development
  198. 198. Apple’s Note Note: Many existing streaming services require specialized servers to distribute content to end users. It requires specialized skills to set up and maintain these servers, and in a large-scale deployment these servers can be costly. Apple has designed a system that avoids this by using standard HTTP to deliver the streams. Fast Forward Your Development
  199. 199. System Overview Fast Forward Your Development
  200. 200. Components Review • Server ▫ Encoder ▫ Segmenter • Distributer ▫ Basic HTTP Server • Client Fast Forward Your Development
201. 201. Server • Receives a digital / analog input stream • Encodes / transcodes video and audio ▫ H.264 video ▫ AAC audio (HE-AAC or AAC-LC) • Encodes / transcodes audio-only content: ▫ MPEG-2 elementary streams, HE-AAC or AAC-LC files, or MP3 files • Encapsulates in MPEG-2 ▫ Transport stream ▫ Program stream Fast Forward Your Development
202. 202. Segmenter • All segments should have the same duration • Each segment is placed in a separate file • Creates an index file with references to the segment files • For protection, the segmenter may encrypt each media segment and create a key file Fast Forward Your Development
  203. 203. Distribution • Distribution system is a regular HTTP Server • Could be Apache or small embedded Server Fast Forward Your Development
204. 204. Files • Segments – stored as *.ts files • Index files – stored as *.m3u8 • Index file format example (each #EXTINF tag is normally followed by the segment's URI):
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
#EXTINF:10,
#EXTINF:10,
#EXT-X-ENDLIST
Fast Forward Your Development
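A minimal parser for the fields shown in the example makes the index-file structure concrete (real playlists also carry a URI line after each #EXTINF, omitted here as in the slide):

```python
def parse_m3u8(text):
    """Extract the target duration and per-segment durations
    from an HTTP Live Streaming index file."""
    target, durations = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            durations.append(float(line.split(":", 1)[1].rstrip(",")))
    return target, durations

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
#EXTINF:10,
#EXTINF:10,
#EXT-X-ENDLIST"""
parse_m3u8(playlist)   # -> (10, [10.0, 10.0, 10.0])
```

For a live session the client would re-fetch and re-parse this file continuously; for VoD the presence of #EXT-X-ENDLIST marks the playlist as complete.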
205. 205. Session types • Live stream broadcast ▫ Index file is continuously updated ▫ Includes a moving window of segments around the “live” part of the session ▫ Client should continuously refresh the index file • VoD session ▫ Index file is static ▫ Includes ALL the segments of the file ▫ Enables the “seek” operation Fast Forward Your Development
206. 206. Multi-bitrate, multi-device support • Multi-bitrate is enabled via multiple index files • The per-bitrate index files are referenced from a global index file • The client can select a stream according to: ▫ Device properties ▫ Available bit rate • This method is less efficient than Silverlight's Fast Forward Your Development
  207. 207. Test yourself 1. What are the two Live streaming file types? 2. What is the role of the Segmenter? 3. On which delivery protocol is the live streaming based? Fast Forward Your Development
  208. 208. Silverlight Smooth Streaming Fast Forward Your Development
  209. 209. SILVERLIGHT INTRODUCTION Fast Forward Your Development
  210. 210. Smooth Streaming • Microsoft’s implementation of HTTP-based adaptive streaming • A hybrid media delivery method that acts like streaming but is in fact a series of short progressive downloads • Leverages existing HTTP caches • Client can seamlessly switch video quality and bit rate based on perceived network bandwidth and CPU resources Fast Forward Your Development
211. 211. Streaming or Progressive Download? • Traditional streaming ▫ Benefits: responsive user experience; bandwidth use; user tracking ▫ Challenges: no cache-ability; separate, smaller streaming networks • Progressive download ▫ Benefits: works from a web server; world-wide scale w/HTTP ▫ Challenges: limited user experience; user tracking; bandwidth use (20% watched) Fast Forward Your Development
  212. 212. Smooth Streaming Design • Smooth Streaming File Format based on MP4 (ISO Base Media File Format) • Video is encoded and stored on disk as one contiguous MP4 file ▫ Separate file for each bit rate • Each video Group of Pictures (GOP) is stored in a Movie Fragment box ▫ This allows easy fragmentation at key frames • Contiguous file is virtually split up into chunks when responding to a client request Fast Forward Your Development
  213. 213. Content Provider Benefits • Cheaper to deploy ▫ Can utilize any generic HTTP caches/proxies ▫ Doesn’t require specialized servers at every node • Better scalability and reach ▫ Reduces “last mile” issues because it can dynamically adapt to inferior network conditions • Audience can adapt to the content, rather than requiring the content providers to guess which bit rates are most likely to be accessible to their audience Fast Forward Your Development
  214. 214. End User Benefits • Fast start-up and seek times ▫ Start-up/seeking can be initiated on the lowest bit rate before moving up to a higher bit rate • No buffering, no disconnects, no playback stutter ▫ As long as the user meets the minimum bit rate requirement • Seamless bit rate switching based on network conditions and CPU capabilities. • A generally consistent, smooth playback experience Fast Forward Your Development
215. 215. Evolution • Previous versions of MS streaming divide the file into many chunks (0001.vid, 0002.vid, etc.) • Problematic for caching, CDNs, CMS, etc. • Today all fragments of a file are contained in a single bitstream container. Typically 1 fragment = 1 video GOP Fast Forward Your Development
  216. 216. SILVERLIGHT FILES Containers & Configuration files Fast Forward Your Development
  217. 217. Format options • ASF/WMV – native Microsoft Format • MPEG4 File-Format • AVI • OGG Fast Forward Your Development
  218. 218. MP4 over ASF file format • MP4 is a lightweight container format with less overhead than ASF • MP4 is easier to parse in managed (.NET) code • MP4 is based on a widely used standard, making 3rd party adoption and support easier • MP4 has native H.264 video support • MP4 was designed to natively support payload fragmentation within the file Fast Forward Your Development
  219. 219. MP4 File format • MP4 has two format types ▫ Disk Format - for file storage ▫ Wire format - for transport • Wire format enables easy CDN support and integration Fast Forward Your Development
  220. 220. Smooth Streaming File Format Fast Forward Your Development
  221. 221. Smooth Streaming Wire Format Fast Forward Your Development
222. 222. File extensions • Media files ▫ *.ismv – audio & video ▫ *.isma – audio only • Manifest files ▫ *.ism – server manifest. Describes to the server the relation between tracks, bitrates & files on disk. Based on the SMIL 2.0 XML format specification ▫ *.ismc – client manifest. Describes to the client the available streams, codecs used, bitrates encoded, video resolutions, markers, captions. It is the first file delivered to the client (“SDP”-like) Fast Forward Your Development
223. 223. Directory Structure [screenshot: the media files in the different bitrates alongside the manifest files] Fast Forward Your Development
  224. 224. Manifest files • VC-1, WMA, H.264 and AAC codecs • Text streams • Multi-language audio tracks • Alternate video & audio tracks (i.e. multiple camera angles, director’s commentary, etc.) • Multiple hardware profiles (i.e. same bitrates targeted at different playback devices) • Script commands, markers/chapters, captions • Client manifest Gzip compression • URL obfuscation • Live encoding and streaming Fast Forward Your Development
225. 225. ISM file sample
<?xml version="1.0" encoding="utf-16" ?>
<!-- Created with Expression Encoder version 2.1.1206.0 -->
<smil xmlns="">
  <head>
    <meta name="clientManifestRelativePath" content="NBA.ismc" />
  </head>
  <body>
    <switch>
      <video src="NBA_3000000.ismv" systemBitrate="3000000">
        <param name="trackID" value="2" valuetype="data" />
      </video>
      <video src="NBA_2400000.ismv" systemBitrate="2400000">
        <param name="trackID" value="2" valuetype="data" />
      </video>
      <video src="NBA_1800000.ismv" systemBitrate="1800000">
        <param name="trackID" value="2" valuetype="data" />
      </video>
Fast Forward Your Development
226. 226. ISM file sample
      <video src="NBA_1300000.ismv" systemBitrate="1300000">
        <param name="trackID" value="2" valuetype="data" />
      </video>
      <video src="NBA_800000.ismv" systemBitrate="800000">
        <param name="trackID" value="2" valuetype="data" />
      </video>
      <video src="NBA_500000.ismv" systemBitrate="500000">
        <param name="trackID" value="2" valuetype="data" />
      </video>
      <audio src="NBA_3000000.ismv" systemBitrate="64000">
        <param name="trackID" value="1" valuetype="data" />
      </audio>
    </switch>
  </body>
</smil>
Fast Forward Your Development
227. 227. *.ISMC sample
<?xml version="1.0" encoding="utf-16" ?>
<!-- Created with Expression Encoder version 2.1.1206.0 -->
<SmoothStreamingMedia MajorVersion="1" MinorVersion="0" Duration="4084405506">
  <StreamIndex Type="video" Subtype="WVC1" Chunks="208" Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Bitrate="3000000" FourCC="WVC1" Width="1280" Height="720" CodecPrivateData="250000010FD3FE27F1678A27F859E80C9082DB8D44A9C00000010E5A67F840" />
    <QualityLevel Bitrate="2400000" FourCC="WVC1" Width="1056" Height="592" CodecPrivateData="250000010FD3FE20F1278A20F849E80C9082493DEDDCC00000010E5A67F840" />
    <QualityLevel Bitrate="1800000" FourCC="WVC1" Width="848" Height="480" CodecPrivateData="250000010FCBF81A70EF8A1A783BE80C908236EE5265400000010E5A67F840" />
    <QualityLevel Bitrate="1300000" FourCC="WVC1" Width="640" Height="352" CodecPrivateData="250000010FCBE813F0AF8A13F82BE80C9081A7ABF704400000010E5A67F840" />
Fast Forward Your Development
228. 228. ISMC File - 2
<SmoothStreamingMedia MajorVersion="1" MinorVersion="0" Duration="5965419999">
  <StreamIndex Type="video" Subtype="WVC1" Chunks="299" Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Bitrate="2750000" FourCC="WVC1" Width="1280" Height="720" CodecPrivateData="250000010FD3BE27F1678A27F859E804508253EBE8E6C00000010E5AE7F840" />
    …
    <c n="0" d="20000000" />
    <c n="1" d="20000000" />
    …
    <c n="298" d="5000001" />
  </StreamIndex>
  <StreamIndex Type="audio" Subtype="WmaPro" Chunks="299" Url="QualityLevels({bitrate})/Fragments(audio={start time})">
    <QualityLevel Bitrate="64000" WaveFormatEx="6201020044AC0000451F0000CF05100012001000030000000000000000000000E00042C0" />
    <c n="0" d="20433560" />
    …
    <c n="297" d="20433560" />
    <c n="298" d="4393197" />
  </StreamIndex>
</SmoothStreamingMedia>
Fast Forward Your Development
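The Url attribute in the client manifest is a template the client expands for each fragment request. A sketch of that expansion; the base URL here is a hypothetical example value:

```python
def fragment_url(base, template, bitrate, start_time):
    """Expand a Smooth Streaming client-manifest URL template
    into a concrete fragment request URL."""
    path = (template.replace("{bitrate}", str(bitrate))
                    .replace("{start time}", str(start_time)))
    return base.rstrip("/") + "/" + path

tmpl = "QualityLevels({bitrate})/Fragments(video={start time})"
fragment_url("http://server/NBA.ism", tmpl, 1300000, 20000000)
# -> 'http://server/NBA.ism/QualityLevels(1300000)/Fragments(video=20000000)'
```

The start time comes from accumulating the chunk durations (the `d` values of the `<c>` entries above), so the client can compute every fragment URL without further server round trips.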
  229. 229. SILVERLIGHT SESSION Initiation and Flow Fast Forward Your Development
230. 230. Smooth Streaming Protocol • The Smooth Streaming Protocol uses HTTP [RFC2616] as its underlying transport • The server role in the protocol is stateless ▫ Enabling (potentially) different instances of the server to handle client requests ▫ Requests can utilize any generic HTTP caches/proxies → lowering CDN costs Fast Forward Your Development
  231. 231. Messages • Smooth Streaming Protocol uses 4 different messages: ▫ Manifest Request ▫ Manifest Response ▫ Fragment Request ▫ Fragment Response • All messages follow the HTTP/1.1 specification Fast Forward Your Development