Adventures in cutting every last millisecond from glass-to-glass latency
Kieran Kunhya – kierank@obe.tv
@openbroadcastsy, @kierank_
Who am I, who are we?
• I work on FFmpeg, x264 and others…
• Much of what relates to professional video in OSS probably has my fingerprints on it
• At $job, Open Broadcast Systems builds software for
broadcasters mainly around video point to point
encoding/decoding for news/sport etc...
• Not to be confused with:
What I will talk about
• Minimising every last millisecond of latency from broadcast
production processes (before distribution)
• Encoding and decoding are often the dominant source of latency – will focus on this
• Doing this from a software engineering standpoint
• Not much (if any) published about this at all
• Hardware-centric industry – “secret sauce” thinking
What I will not talk about
• Doing live production with high-bandwidth (10-100GbE) networking
• Network stack in between (FEC vs SRT vs RIST)
• Not the right audience
• Demuxed 2017 video
Live broadcast production processes (1)
• Processes in black boxes, e.g. routing, graphics,
switching, mixing, recording, monitoring, playout,
subtitling, standards conversion etc…
• Infrastructure as complex if not more complex than
delivery
Live broadcast production processes (2)
• Heavily hardware (FPGA/DSP) centric.
• Fixed function, black-box products
• Low-latency processes in studio
• “Video lines” of latency – order of 10-100 µs.
• Uncompressed video - high data rates, many Gbps.
• Legacy usage of satellite, fibre, SDI, ASI
• Includes premium live web video!
Video contribution
• Getting content from a remote place to one or more central
places, often studio or aggregation centre
• Minimise latency
• Often fast-paced interviews/debates
• Often uneconomical to pay for uncompressed
• Remote production, director not onsite, back at base
The live production environment
• Largely SDI (coax) based
• Unidirectional, Gbps video
• Latency on the order of video lines (~40 µs)
• Many I/O boards to do this
• Abstracted away low latency into ~frames (40 ms) (1000x increase!)
• SDKs hide capabilities of electronics
• Internal buffering?
• Hardware doing the data processing (offload)
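The "lines versus frames" gap can be checked with a little arithmetic. The sketch below assumes the 1080i25 format used later in the deck and the standard 1125 total lines per frame for 1080-line HD; the function names are illustrative only.

```python
from fractions import Fraction

TOTAL_LINES_PER_FRAME = 1125  # 1080-line HD raster: active lines plus blanking
FRAME_RATE = 25               # 1080i25: 25 frames (50 fields) per second


def frame_time_us(frame_rate=FRAME_RATE):
    """Duration of one frame in microseconds."""
    return Fraction(1_000_000, frame_rate)


def line_time_us(frame_rate=FRAME_RATE, total_lines=TOTAL_LINES_PER_FRAME):
    """Duration of one video line in microseconds."""
    return frame_time_us(frame_rate) / total_lines


if __name__ == "__main__":
    line = line_time_us()
    frame = frame_time_us()
    print(f"line:  {float(line):.2f} us")
    print(f"frame: {float(frame) / 1000:.0f} ms")
    print(f"ratio: {float(frame / line):.0f}x")
```

This gives roughly 36 µs per line against 40 ms per frame, which is where the slide's rounded "~40 µs" and "1000x" figures come from.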
SDI from a software engineer's point of view
• I want the software to do as much as is
reasonably possible
• A driver, not an SDK+driver hybrid as *all* manufacturers have
• “Offload” is irrelevant in 2019
• Start processing the data as soon as a
field arrives, not whole frame.
• Later on processing chunk by chunk
• I/O in the purest sense
• Write data and have it put on the wire *now*
What you often get in reality
• Video and audio on separate file descriptors
• Can never open them simultaneously so can never have exact
lipsync
• Long delays in and out of card (~2-3 frames)
….
• Not all audio tracks available
• Audio out of sync
• Video downconverted to 8-bit
• Not all blanking data available, less common parameters not
changeable
Built our own SDI card
SDI from a software engineer's point of view
• Massive time and expense for the most important 4 lines of code
• DMA (direct memory access) buffers of 8192 bytes (approx. 1 HD line)
• Get an interrupt every 32 buffers
• Can capture, process chunks of video and push out in the ~100s of lines!
• Tight timescales, need to be aware of thread priority, CPU powersaving
etc
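To see why the timescales are tight, here is a rough estimate of how often the interrupt fires. It assumes 1080i25 timing and takes the slide's "one buffer is approximately one HD line" at face value; real buffer-to-line alignment on the card may differ, and `chunk_time_ms` is a made-up helper name.

```python
# Line duration at 1080i25: 25 frames/s, 1125 total lines per frame.
LINE_TIME_US = 1_000_000 / (25 * 1125)  # about 35.6 us

BUFFERS_PER_INTERRUPT = 32  # figure from the slide


def chunk_time_ms(lines, line_time_us=LINE_TIME_US):
    """Wall-clock time covered by `lines` video lines, in milliseconds.

    Assumption: one DMA buffer holds roughly one line, so `lines` can also
    be read as a number of DMA buffers.
    """
    return lines * line_time_us / 1000


if __name__ == "__main__":
    print(f"interrupt every {chunk_time_ms(BUFFERS_PER_INTERRUPT):.2f} ms")
    print(f"128-line chunk covers {chunk_time_ms(128):.2f} ms")
```

Under these assumptions the software has a little over a millisecond between interrupts, which is why thread priority and CPU power-saving states start to matter.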
SDI from a software engineer's point of view
• CRC not software-centric (10-bit data, 25-bit polynomial)
• We offload this otherwise big waste of CPU
• Very tedious to build frame correctly, lots of legacy
• Difficult to verify, tools all hardware-based
• 1080p50/60 – 3G-SDI Level B, very software unfriendly
• (and lots of other implementation details)
Pixel formats
Only YUV 4:2:2 domain (as example)!
• Planar 10b – main working format
• Planar 8b - preview quality
• UYVY 10b (16-bit aligned) – SDI datastream
• Apple v210 – some hardware
• Contiguous 10-bit – SDI wire format
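As an illustration of what one of these mappings involves, the sketch below unpacks a single v210 group into separate Y, Cb and Cr lists, which is the per-group work behind a v210 to planar 10-bit conversion. v210 packs three 10-bit samples into each 32-bit little-endian word, six pixels per 16 bytes. This is plain reference-style Python for clarity, not the handwritten SIMD the talk describes, and the function names are illustrative.

```python
import struct


def _split(word):
    """Return the three 10-bit fields of a 32-bit v210 word, low bits first."""
    return word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF


def unpack_v210_group(block):
    """Unpack one 16-byte v210 group into (y[6], cb[3], cr[3])."""
    if len(block) != 16:
        raise ValueError("a v210 group is exactly 16 bytes")
    w0, w1, w2, w3 = struct.unpack("<4I", block)
    cb0, y0, cr0 = _split(w0)
    y1, cb1, y2 = _split(w1)
    cr1, y3, cb2 = _split(w2)
    y4, cr2, y5 = _split(w3)
    return [y0, y1, y2, y3, y4, y5], [cb0, cb1, cb2], [cr0, cr1, cr2]


def pack_v210_group(y, cb, cr):
    """Inverse of unpack_v210_group, used here for a round-trip check."""
    words = (
        cb[0] | (y[0] << 10) | (cr[0] << 20),
        y[1] | (cb[1] << 10) | (y[2] << 20),
        cr[1] | (y[3] << 10) | (cb[2] << 20),
        y[4] | (cr[2] << 10) | (y[5] << 20),
    )
    return struct.pack("<4I", *words)


if __name__ == "__main__":
    y, cb, cr = [64, 100, 200, 300, 400, 940], [512, 600, 700], [512, 100, 960]
    assert unpack_v210_group(pack_v210_group(y, cb, cr)) == (y, cb, cr)
```

The irregular field order across the four words is one reason each mapping tends to need its own special-cased code.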
Pixel formats
Handwritten (no intrinsics!) SIMD for every mapping
(and others).
• 5-15x speed improvements compared to C
• Do it once, make it fast once and for all (until
new CPU…)
• Generic conversion library a difficult problem
• Intermediate pixel format(s) always a compromise
• Add special cases until you’ve done them all!
Basic encode / decode pipeline
• Encoder
• Capture: 1-2 frames
• Encode (x264 lowestlatency, no audio compression): 1-frame
• Mux and other processing (~5ms)
• Decoder
• Wait for frame to arrive: 1-frame
• Decode the frame: 1-frame
• Frame synchronisation: 1-frame (drop and duplicate video,
resample audio)
• Push to wire: 1-2 frames
• Basic implementation: 7-frames, 280ms at 1080i25
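The budget above can be added up mechanically. A minimal sketch, assuming 40 ms per frame at 1080i25 and using the per-stage figures from this slide; the data layout and names are illustrative.

```python
FRAME_MS = 40.0  # one frame at 1080i25

# (stage, min_frames, max_frames, extra_ms), figures taken from the slide
BASIC_PIPELINE = [
    ("capture", 1, 2, 0.0),
    ("encode", 1, 1, 0.0),
    ("mux and other processing", 0, 0, 5.0),
    ("wait for frame to arrive", 1, 1, 0.0),
    ("decode", 1, 1, 0.0),
    ("frame synchronisation", 1, 1, 0.0),
    ("push to wire", 1, 2, 0.0),
]


def budget_ms(stages, frame_ms=FRAME_MS):
    """Return (best_case_ms, worst_case_ms) for a list of pipeline stages."""
    low = sum(lo * frame_ms + extra for _, lo, _, extra in stages)
    high = sum(hi * frame_ms + extra for _, _, hi, extra in stages)
    return low, high


if __name__ == "__main__":
    low, high = budget_ms(BASIC_PIPELINE)
    print(f"basic pipeline: {low:.0f}-{high:.0f} ms")
```

The range works out to 245-325 ms; the slide's headline of 7 frames (280 ms) sits in the middle of it.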
Better encode / decode pipeline
• Encoder
• Capture: 1-frame
• Encode (x264 lowestlatency, no audio compression): 1-frame
• Mux and other processing (~5ms)
• Decoder
• Wait for frame to arrive: 1-frame
• Decode the frame: 1-frame
• Frame synchronisation: 1-frame (drop and duplicate video,
resample audio)
• Push to wire: 1 frame (10ms)
• Better implementation: 5.x-frames, 210ms at 1080i25
Better encode / decode pipeline
• Encoder
• Capture: 1-frame
• Encode (x264 lowestlatency, no audio compression): 1-frame
• Mux and other processing (~5ms)
• Decoder
• Wait for frame to arrive: 1-frame
• Decode the frame: 1-frame
• Frame synchronisation: 1-frame (drop and duplicate video,
resample audio)
• Push to wire: 1 frame
• Better implementation: 6-frames, 240ms at 1080i25
Decode frame as it arrives on the wire
• Fix FFmpeg chunk decode
• Slices arrive at destination
• Complete frame is built
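A toy timing model shows why decoding slices as they arrive helps. It assumes slices arrive evenly spaced over one frame period and each costs the same to decode; it does not use or represent FFmpeg's API, and the numbers are illustrative.

```python
def whole_frame_finish_ms(frame_ms, decode_ms):
    """Wait for the full frame to arrive, then decode it."""
    return frame_ms + decode_ms


def sliced_finish_ms(frame_ms, decode_ms, n_slices):
    """Decode each slice as soon as it has arrived.

    Assumptions: slices arrive evenly over `frame_ms`, and decoding one
    slice takes `decode_ms / n_slices`.
    """
    per_slice = decode_ms / n_slices
    t = 0.0
    for i in range(1, n_slices + 1):
        arrival = i * frame_ms / n_slices
        # Start when both the slice is here and the decoder is free.
        t = max(t, arrival) + per_slice
    return t


if __name__ == "__main__":
    print(whole_frame_finish_ms(40.0, 40.0))  # wait, then decode
    print(sliced_finish_ms(40.0, 40.0, 8))    # overlap arrival and decode
```

With a 40 ms frame, a 40 ms decode and 8 slices, the model finishes at 45 ms instead of 80 ms: the "wait for frame" and "decode" stages overlap and only the last slice's decode time remains after arrival.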
Better encode / decode pipeline
• Encoder
• Capture: 1-frame
• Encode (x264 lowestlatency, no audio compression): 1-frame
• Mux and other processing (~5ms)
• Decoder
• Wait for frame to arrive: 1-frame
• Decode the frame as it arrives: 1-frame
• Frame synchronisation: 1-frame (drop and duplicate video,
resample audio)
• Push to wire: 10ms
• Better implementation: 4.x-frames, 170ms at 1080i25
Better encode / decode pipeline
• Encoder
• Capture: 1-field
• Encode (x264 lowestlatency, no audio compression): 1-field
• Mux and other processing (~5ms)
• Decoder
• Decode the frame as it arrives: 1-frame
• Frame synchronisation: 1-frame (drop and duplicate video,
resample audio)
• Push to wire: 10ms
• Better implementation: 3.x-frames, 130ms at 1080i25
Clocks
• Drift clock to match remote clock
• Clocks do not match (temperature etc), drift can be fast
• Control the onboard oscillator on the SDI Card to match remote clock
• Saves having to drop/duplicate video and resample audio to match
• Same number of frames pushed per hour, per day etc
• At low latencies, clock drift bites you quicker
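A quick estimate makes the last point concrete. The sketch below assumes a constant frequency offset between the two free-running clocks; the 50 ppm figure is a made-up illustration, not a measurement from the talk.

```python
def seconds_until_slip(buffer_ms, drift_ppm):
    """Time for a constant drift to eat (or double) a buffer of `buffer_ms`."""
    if drift_ppm <= 0:
        raise ValueError("drift_ppm must be positive")
    return (buffer_ms / 1000) / (drift_ppm * 1e-6)


def frames_slipped_per_day(drift_ppm, fps=25):
    """Frames that must be dropped or duplicated per day without clock control."""
    return drift_ppm * 1e-6 * 86_400 * fps


if __name__ == "__main__":
    for buffer_ms in (40, 10):
        t = seconds_until_slip(buffer_ms, drift_ppm=50)
        print(f"{buffer_ms} ms of slack lasts about {t:.0f} s at 50 ppm")
    print(f"{frames_slipped_per_day(50):.0f} frames/day at 50 ppm")
```

Under this assumed offset, a frame of slack lasts about 13 minutes while 10 ms lasts just over 3 minutes, so the smaller the buffer, the sooner a correction is needed. Steering the card's oscillator removes the need for corrections.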
Better encode / decode pipeline
• Encoder
• Capture: 1-field
• Encode (x264 lowestlatency, no audio compression): 1-field
• Mux and other processing (~5ms)
• Decoder
• Decode the frame as it arrives: 1-frame
• Push to wire: 10ms
• Better implementation: 2.x-frames, 90ms at 1080i25
Better encode / decode pipeline
• Encoder
• Capture: 1-field
• Encode (x264 lowestlatency, no audio compression): 1-field
• Mux and other processing (~5ms)
• Decoder
• Decode the frame as it arrives: 1-frame
• Push to wire: 10ms
• Decode the frame to the wire as it arrives
• Better implementation: 1.x-frames, ~50ms at 1080i25
Chunk based encode and decode
• Throughout all of these improvements, bitrate roughly the same, no
loss in picture quality owing to H.264 bitexact decode.
• Diminishing returns now but some very high end applications
demand even lower latency
• Not a good idea for H.264, ratecontrol would prefer full frame
• Codecs like JPEG2000, VC-2, JPEG-XS operate on slices
• Limited use of slice based encoding in software
• Capture, Encode, Decode and Render before the frame has even
finished arriving on the wire at source (~20ms latency)
• Concert video wall, VR etc
Chunk based encode and decode
[Diagram: Source and Destination]
• 10-20ms end-to-end
• Huge bitrate penalty (~100s Mbps)
• High quality network also required
Thanks
• Thanks to team working on this
• James Darnley
• Rafael Carre
• Sam Willcocks
The END

London Video Tech - Adventures in cutting every last millisecond from glass-to-glass latency
