Multivendor cloud production with VSF
TR-11 - there and back again
Kieran Kunhya – kierank@obe.tv
Company Overview
• Specialists in software-based
encoders and decoders for
Sport, News and Channel
contribution (B2B)
• Based in Central London
• Build everything in house
• Hardware, firmware, software
• Not to be confused with:
Agenda
• What are the technical challenges with multivendor cloud
production?
• How is VSF TR-11 (formerly known as Ground-Cloud-Cloud-Ground)
solving these technical challenges?
• How can you help?
• Will talk about the principles instead of the implementation details
• Complicated topic (how can we simplify?)
Life in the Cloud
• The pandemic demonstrated the ability of cloud to
scale up compute-heavy, network-heavy services:
• Zoom, Cloud-hosted email, Social Media,
Amazon, Netflix etc…
• But television broadcast production is still
mainly on-premise – nearly all mid/high end
production is some variant of in-person.
• Cloud economics (scale-up/scale-down) seems a
great alternative to paying for resources that stay
idle most of the time – what is stopping us?
Cloud is eating the world
• Broadcast is next whether you like it or not
• Highly regulated industries, Healthcare, Finance,
National Security already moving
• Sysadmins, database admins etc. all thought
they were immune
Ground-Cloud-Cloud-Ground (GCCG)
• I want to do mid/high end television in the
cloud!
• The GCCG working group of the VSF is trying to
solve these problems
• Now published as TR-11 draft + GitHub API
Moving Cloud production to the next level
• “But I’ve been doing live cloud production” – Yes and No
• Single-vendor monolithic applications such as channel-in-a-box, playout
servers and cloud switchers use the cloud as a home, but not necessarily as
a scalable architecture
• Proprietary Transports stifle innovation (IE6, Flash, Silverlight)
• To get widespread adoption we must have:
• Multi-vendor interoperation via standard APIs
• Appropriate-to-task picture quality levels
• Standards for Ground-Cloud-Cloud-Ground
• Agreed mechanism(s) for building workflows
Cloud production – What makes it difficult?
• Integration with the ground – both ways
• Must work into existing workflows
• SDI, ST 2110, satellite, cable, DTT
• Legacy workflows have well-defined linear timing
models (e.g. SDI, ST 2110-21, MPEG-TS VBV)
• Without a proper timing model, you end up with
variable (undefined) latency
• One reason web streams are 20-30 seconds
behind broadcast – They don’t have a timing model!
• What are my neighbours cheering about?
• Inter-cutting ground and cloud requires timing
Let’s do 2110 in the cloud
• Some people claiming to have 2110 in
public cloud
• But it’s not possible right now in any
public cloud:
• No (full) PTP in the cloud – all clouds
handle time their own way
• Cloud networks are shared and have
packet loss
• Other implementation challenges
• Is this even a good idea?
The end of linear, lockstep processing
• No, it’s not a good idea
• We don’t actually want linear, lockstep processing in cloud any more
• We DO want to allow cloud instances to process data non-linearly,
sometimes faster or slower than real-time but on average real-time –
known worst case
• How to handle “synthetic” sources (e.g. clips, graphics) played out from
cloud?
• Cloud-native vs lift-and-shift
The end of linear, lockstep processing
• What does this mean in simple terms?
• Before: Processes operate with a strict
lockstep and fixed interval
• After: Processes have variable delays but
worst case is known
• Strict lockstep recoverable (e.g. by video
encoder) for integration with ground
• Technical note: Analogous to MPEG VBV
[Figure: two timelines of video frames from a process, plotted against time]
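The before/after contrast can be sketched in a few lines of Python. This is an illustrative sketch only, not the mechanism specified in TR-11: the frame interval, the 30 ms worst-case figure and the function names are all assumptions. A process emits frames with a variable delay that never exceeds a known worst case; a downstream step (e.g. a video encoder) releases each frame at its nominal time plus that worst case, which restores a strict fixed interval.

```python
import random

FRAME_INTERVAL_MS = 20.0   # 50 fps; assumed for illustration
WORST_CASE_MS = 30.0       # advertised worst-case delay; assumed value


def variable_arrivals(n_frames, seed=0):
    """Arrival time of each frame: nominal time plus a bounded, variable delay."""
    rng = random.Random(seed)
    return [i * FRAME_INTERVAL_MS + rng.uniform(0.0, WORST_CASE_MS)
            for i in range(n_frames)]


def linearise(arrivals):
    """Release frame i at nominal time + worst case, i.e. on a fixed grid."""
    releases = []
    for i, arrival in enumerate(arrivals):
        release = i * FRAME_INTERVAL_MS + WORST_CASE_MS
        # Because the delay is bounded, the frame has always arrived by now.
        assert arrival <= release
        releases.append(release)
    return releases


arrivals = variable_arrivals(10)
releases = linearise(arrivals)
# Spacing between consecutive releases: constant, unlike the arrivals.
gaps = [b - a for a, b in zip(releases, releases[1:])]
```

The cost of recovering lockstep is a fixed latency equal to the worst case, which is why knowing that number matters.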
Cloud-native transport
• To get the benefits of cloud, we also must trust the cloud
• i.e. Depend on cloud provider’s internal bulk-transport protocols
• Requirement is Throughput, with Reliability, in “bounded” time
• My data arrives correctly, in a constrained amount of time
• The Big Data community has similar needs for large data transfers
• Application may not have visibility of the internals of protocol (“black box”)
• Amazon Scalable Reliable Datagram (SRD) is one such example
• Used in Amazon CDI (Cloud Digital Interface)
Amazon CDI
• How does the Amazon CDI protocol compare?
• Handles many of the challenges discussed
• An agreed way to exchange data between Amazon cloud instances.
Defined pixel data structures, metadata (e.g. HDR) etc.
• Amazon guarantees throughput, reliability and bounds latency
• A big step forward for the industry
• All well and good if you are in Amazon – what if you are not?
• How about a common API, with cloud vendor implementation under it?
• Amazon proposed CDI API as basis for GCCG
Summary so far
• Software/cloud applications don’t process media in a linear
lockstep fashion
• They operate with variable delays – fine if you know the worst case
• Have to depend on cloud-specific transport (not necessarily IP)
• As long as cloud provider can guarantee that everything arrives on
time
• Cloud native and not “lift-and-shift”
• (Dinner party take-away)
VSF GCCG working group
• The GCCG working group is addressing this set of problems
• The last difficult technical problem in broadcast production (personal
view):
• How can I do a complex multicamera production in the cloud, with
comparable latency to on-premises and get it to the viewer?
• (or partial elements in the cloud)
• Numerous technical challenges
• https://vsf.tv/Ground-Cloud-Cloud-Ground.shtml
TR-11 “time floating” model
• Vocabulary (about each process step in the cloud)
• Linear vs non-Linear – why? “Real-time is relative”
• How early or late a “Media Element” (e.g. video frame) can arrive
• Allow variability in the handoffs, but with an ability to predict the outcome
• Some processes must reconcile the variable inputs into a consistent output
• Must bound the input buffering (latency) yet accommodate the variability
• Majority of delay is processing delay, some delay from transport
• Applications (Workflow Steps) advertise their worst-case delay
• Dependent on resolution/framerate, cloud instance type, algorithms etc
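One way to picture advertised worst-case delays is as numbers that add up along a chain. The sketch below is a hedged illustration, not the TR-11 data model: the `WorkflowStep` class, the millisecond unit, the per-hop transport figure and the example step names are all hypothetical. It relies only on the fact that a sum of upper bounds is itself an upper bound, and that a step combining several inputs has to allow for the slowest path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowStep:
    name: str
    worst_case_ms: float  # advertised worst-case processing delay (assumed unit)


def chain_worst_case_ms(steps, transport_ms_per_hop=0.0):
    """Upper bound on latency through a linear chain of Workflow Steps."""
    hops = max(len(steps) - 1, 0)
    return sum(s.worst_case_ms for s in steps) + hops * transport_ms_per_hop


def input_alignment_ms(upstream_bounds_ms):
    """A step combining several inputs must allow for the slowest path."""
    return max(upstream_bounds_ms)


# Hypothetical paths feeding a mixer.
camera_path = [WorkflowStep("decoder", 40.0), WorkflowStep("colour", 10.0)]
graphics_path = [WorkflowStep("graphics", 25.0)]

camera_bound = chain_worst_case_ms(camera_path, transport_ms_per_hop=2.0)
graphics_bound = chain_worst_case_ms(graphics_path, transport_ms_per_hop=2.0)
mixer_alignment = input_alignment_ms([camera_bound, graphics_bound])
```

Here the mixer has to buffer its graphics input until the camera path's bound has elapsed; the buffering is bounded because every upstream number is known.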
Why does this timing model matter?
• Allows the Workflow Step (e.g. a video encoder) at the end of the chain to
linearise for delivery to ground
• A current problem:
• “Why is the transport stream from my cloud production system flagging
warnings?”
• They don’t understand variable delay timing models
• Often hiding timing model issues by increasing latency
• But proper method is to know worst-case (minimises latency)
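The trade-off in the last two bullets can be illustrated with a small experiment. This is a sketch under stated assumptions: delays are drawn uniformly up to an assumed worst case, and the numbers and function name are arbitrary. It counts how many frames miss their output slot for a given amount of buffering in the linearising step.

```python
import random

FRAME_INTERVAL_MS = 20.0   # assumed frame interval
WORST_CASE_MS = 30.0       # assumed known bound on upstream delay


def count_late_frames(buffer_ms, n_frames=1000, seed=1):
    """Count frames that miss their output slot for a given buffering latency."""
    rng = random.Random(seed)
    late = 0
    for i in range(n_frames):
        arrival = i * FRAME_INTERVAL_MS + rng.uniform(0.0, WORST_CASE_MS)
        deadline = i * FRAME_INTERVAL_MS + buffer_ms
        if arrival > deadline:
            late += 1
    return late


guessed_low = count_late_frames(buffer_ms=15.0)           # too little buffering
known_bound = count_late_frames(buffer_ms=WORST_CASE_MS)  # exactly the worst case
padded = count_late_frames(buffer_ms=100.0)               # safe, but 70 ms wasted
```

Buffering less than the worst case leaves the linearising step with nothing to send in some slots; padding far beyond it hides the problem at the price of latency. Buffering exactly the known worst case is the smallest value with no late frames.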
Building a Virtual Facility
• Use existing standards from Ground-Cloud and Cloud-Ground
(TR-08/09 or H.264/5 in TS)
• For inter-instance (intra-cloud) coordinated handoff (a “virtual facility”)
• Identify senders and receivers (use NMOS IS-04 extended for the purpose)
• Initiate and manage connections (NMOS IS-05 extended)
• What is the content description lingo? (JSON collection based on 2110-20
vocabulary)
• What are the transport params for interchange? (provider-specific, registered
in AMWA register)
• What is the timing description specification? (This is defined in TR-11)
• Data packing options matter for energy efficiency (Peter B speaking
tomorrow). 2110 pgroups not software friendly but exist already.
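To show why pgroups are awkward in software, here is a minimal sketch of packing and unpacking one 4:2:2 10-bit pgroup. It assumes the ST 2110-20 layout of five octets covering two pixels, with samples in Cb, Y0, Cr, Y1 order and most significant bit first; the function names are hypothetical and this is not taken from TR-11 or the GCCG API.

```python
def pack_pgroup_422_10bit(cb, y0, cr, y1):
    """Pack four 10-bit samples (two pixels) into one 5-byte pgroup."""
    for sample in (cb, y0, cr, y1):
        if not 0 <= sample <= 0x3FF:
            raise ValueError("samples must be 10-bit")
    word = (cb << 30) | (y0 << 20) | (cr << 10) | y1  # 40 bits, MSB first
    return word.to_bytes(5, "big")


def unpack_pgroup_422_10bit(pgroup):
    """Recover (cb, y0, cr, y1) from a 5-byte pgroup."""
    if len(pgroup) != 5:
        raise ValueError("a 4:2:2 10-bit pgroup is 5 bytes")
    b0, b1, b2, b3, b4 = pgroup
    # Every sample straddles a byte boundary, so each needs shifts and masks.
    cb = (b0 << 2) | (b1 >> 6)
    y0 = ((b1 & 0x3F) << 4) | (b2 >> 4)
    cr = ((b2 & 0x0F) << 6) | (b3 >> 2)
    y1 = ((b3 & 0x03) << 8) | b4
    return cb, y0, cr, y1


samples = (512, 64, 512, 940)
packed = pack_pgroup_422_10bit(*samples)
roundtrip = unpack_pgroup_422_10bit(packed)
```

No sample sits on a byte or 16-bit boundary, so a CPU has to shift and mask every value before it can process it. A software-oriented packing (for example, one sample per 16-bit word) trades bandwidth for simpler access.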
What Next?
• TR-11 draft published:
https://www.vsf.tv/download/technical_recommendations/VSF_TR-11_2024-02-21-draft.pdf
• API on GitHub:
https://github.com/vsf-tv/gccg-api/
• Read and open GitHub Issues/Discussions
• Ask your vendors to do the same
• Can we simplify?


Editor's Notes

  • #3 Sports delivery using cloud such as Premier League, NFL etc. Also work with many competitors to linear broadcasting such as DAZN, Amazon Prime etc
  • #5 A lot of infrastructure for “peak” events like elections or sports
  • #7 These are my personal views, not the VSF working group’s views.
  • #8 Proprietary transport such as NDI is bad: it’s simple in the short term, like Internet Explorer 6, Silverlight, Flash etc. Still go to broadcasters that need those, running WinXP in a VM. Doesn’t always need to be 10-bit 4:2:2
  • #9 See a goal again, or see a racing car overtake twice