Derek Buitenhuis discusses the challenges of maintaining backwards compatibility for progressive MP4 video playback when storage has transitioned to fragmented MP4. He describes building a clever service called "Artax" to proxy fragmented MP4 files and make them appear as progressive downloads to support legacy playback methods, while avoiding expensive transcoding and storage costs. The service carefully parses MP4 metadata to understand packet interleaving and efficiently handle range requests by calculating starting positions within source files and output chunks. Maintaining this level of precision for range requests was a significant engineering challenge.
A Progressive Approach to the Past: Ensuring Backwards Compatibility Through Cleverness and Pain
1. A Progressive Approach to the Past:
Ensuring Cheap Backwards Compatibility Through Cleverness and Pain
derekb@vimeo.com / derek@videolan.org
@daemon404
Derek Buitenhuis
13 April 2021
The British Internet
2. Who’s this guy?
• Principal Video Engineer @ Vimeo
• Open source developer (FFmpeg, FFMS2, rav1e, obuparse, etc.)
• VideoLAN non-profit board member
• Professional Twitter Sh*tposter
4. Who am I, really?
• Currently, I’m this guy:
5. Sins of Multimedia Past Last Forever
• It is 2021. We encode to and serve fragmented MP4 for VoD.
• Audio and video are separate files.
• Segments are just range requests.
• Easier logic, easier caching.
• Some of us encode to progressive MP4, and segment at the edge.
• Can be expensive, can require running and maintaining services.
• Some people use MPEG-TS as their mezzanine. These people are monsters.
• Problems:
• Some Very Bad Programs can only consume progressive MP4.
• Your company made a bad decision over 10 years ago to give direct progressive MP4 URLs
to the highest paying customers.
• 10+ years of hardcoded URLs and API use. You also support VOD downloads.
6. Support Options
• Don’t store videos as FMP4; store as progressive.
• Almost all traffic will have to be segmented at edge. This is expensive and dumb.
• Entirely remove progressive MP4 support.
• Least engineering work, most product work.
• Anger your highest paying users. Anger product. Anger marketing. Anger viewers on terrible devices.
• Store progressive MP4s as well, or just one rendition, such as 720p.
• Not much work.
• A lot of expensive storage for a rarely used rendition of every single video.
• People will still be angry because you took away their 240p or 4K, etc.
• Write a Very Clever Service to proxy FMP4s and make them appear progressive.
• Most engineering work.
• Service will be low volume, and thus fairly cheap.
7. So You’ve Chosen Pain
• Obviously we chose the difficult engineering one.
• Things it needed:
• Transparently expose a set of FMP4 (one video, one audio) as a progressive MP4.
• Must support exact range requests, for playback in browser and Akamai cacheability.
• Every request must be performant.
• Can’t read all the source moof boxes every time (more on this later).
• There are so many MP4 muxers and demuxers, but they’re all generic and not suitable.
• Source MP4s have all the info we need, such as mdat box offsets, timestamps, and sample sizes,
so the real solution is closer to de-/re-serialization.
• All input is known good. Bad input should be hard-rejected.
• So I wrote one.
10. MP4 Anatomy (Deeper)
[ftyp: File Type Box]
[moov: Movie Box]
[mvhd: Movie Header Box]
[trak: Track Box]
[tkhd: Track Header Box]
[edts: Edit Box]
[elst: Edit List Box]
[mdia: Media Box]
[mdhd: Media Header Box]
[hdlr: Handler Reference Box]
[minf: Media Information Box]
[vmhd: Video Media Header Box]
[dinf: Data Information Box]
[dref: Data Reference Box]
[url : Data Entry Url Box]
[stbl: Sample Table Box]
[stsd: Sample Description Box]
[avc1: Visual Description]
[avcC: AVC Configuration Box]
[colr: Colour Information Box]
[stts: Decoding Time to Sample Box]
[stsc: Sample To Chunk Box]
[stsz: Sample Size Box]
[stco: Chunk Offset Box]
[sgpd: Sample Group Description Box]
[sbgp: Sample to Group Box]
[mvex: Movie Extends Box]
[mehd: Movie Extends Header Box]
[trex: Track Extends Box]
[sidx: Segment Index Box]
[moof: Movie Fragment Box]
[mfhd: Movie Fragment Header Box]
[traf: Track Fragment Box]
[tfhd: Track Fragment Header Box]
[tfdt: Track Fragment Base Media Decode Time Box]
[trun: Track Fragment Run Box]
[sgpd: Sample Group Description Box]
[sbgp: Sample to Group Box]
[mdat: Media Data Box]
→ (the fragmented source layout above is remuxed into the progressive layout below)
[ftyp: File Type Box]
[moov: Movie Box]
[mvhd: Movie Header Box]
[trak: Track Box]
[tkhd: Track Header Box]
[edts: Edit Box]
[elst: Edit List Box]
[mdia: Media Box]
[mdhd: Media Header Box]
[hdlr: Handler Reference Box]
[minf: Media Information Box]
[vmhd: Video Media Header Box]
[dinf: Data Information Box]
[dref: Data Reference Box]
[url : Data Entry Url Box]
[stbl: Sample Table Box]
[stsd: Sample Description Box]
[avc1: Visual Description]
[avcC: AVC Configuration Box]
[colr: Colour Information Box]
[stts: Decoding Time to Sample Box]
[ctts: Composition Time to Sample Box]
[stss: Sync Sample Box]
[stsc: Sample To Chunk Box]
[stsz: Sample Size Box]
[co64: Chunk Offset Box]
[sgpd: Sample Group Description Box]
[sbgp: Sample to Group Box]
[mdat: Media Data Box]
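To make the box trees above concrete: every MP4 box is a 32-bit big-endian size plus a four-character type, and container boxes simply nest child boxes in their payload. A minimal walker (a Python sketch, not the talk's actual code) looks like this:

```python
import struct

def walk_boxes(buf, offset=0, end=None, depth=0):
    """Recursively print the box tree of an MP4 byte buffer."""
    # Boxes whose payload is itself a sequence of child boxes.
    containers = {b"moov", b"trak", b"edts", b"mdia", b"minf", b"dinf",
                  b"stbl", b"mvex", b"moof", b"traf"}
    if end is None:
        end = len(buf)
    while offset + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1:   # 64-bit "largesize" follows the fourcc
            size = struct.unpack_from(">Q", buf, offset + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the file
            size = end - offset
        print("  " * depth + kind.decode("ascii"))
        if kind in containers:
            walk_boxes(buf, offset + header, offset + size, depth + 1)
        offset += size
```

The `size == 1` and `size == 0` cases follow ISO/IEC 14496-12; co64 in the progressive tree exists precisely because 32-bit stco chunk offsets can overflow for large remuxed files.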
11. moov Box Strategy
• Parse the input moov and sidx boxes.
• Use the moof offsets from the sidx boxes to parse all the moofs in parallel on a thread pool.
• Construct all the non-mdat output boxes from this upfront, before remuxing.
• This allows us to know the moov size, full file size, PTS/DTS, sync points,
and all mdat offsets upfront. This is extremely important for Content-Length and range
request support.
• Since we have all the exact parsed info from the source boxes, every size and offset
is calculable with a bit of book-keeping.
• Cache this information so that any future requests are fast.
• Now about range request support…
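The sidx walk in the strategy above can be sketched as follows, assuming the ISO/IEC 14496-12 sidx layout (version/flags, reference_ID, timescale, earliest_presentation_time, first_offset, then reference entries); the function name and return shape are illustrative, not the service's real API. The offsets it yields are what you would fan out to a thread pool, one moof parse per worker:

```python
import struct

def sidx_fragment_offsets(sidx_payload, sidx_end_offset):
    """Return (absolute_offset, size) for each moof+mdat fragment
    referenced by a sidx box.

    sidx_payload: the box payload (after the 8-byte size/fourcc header).
    sidx_end_offset: absolute file offset of the first byte after the
    sidx box; sidx reference offsets are relative to this anchor.
    """
    version = sidx_payload[0]
    pos = 4 + 8  # skip version/flags, reference_ID, timescale
    if version == 0:
        _ept, first_offset = struct.unpack_from(">II", sidx_payload, pos)
        pos += 8
    else:
        _ept, first_offset = struct.unpack_from(">QQ", sidx_payload, pos)
        pos += 16
    pos += 2  # reserved
    (ref_count,) = struct.unpack_from(">H", sidx_payload, pos)
    pos += 2
    offsets, cur = [], sidx_end_offset + first_offset
    for _ in range(ref_count):
        word, _duration, _sap = struct.unpack_from(">III", sidx_payload, pos)
        pos += 12
        ref_size = word & 0x7FFFFFFF  # top bit is reference_type
        offsets.append((cur, ref_size))
        cur += ref_size
    return offsets
```

With the list in hand, `concurrent.futures.ThreadPoolExecutor` (or its equivalent in whatever language the service is written in) can fetch and parse each moof independently.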
12. mdat Box Strategy
• Packet sizes and positions in the source files are all known.
• We need to properly interleave audio and video chunks.
• Chose 500ms interleaving.
• This interleaving is state – it must be consistent regardless of which range was requested.
• For example, you need to know, for any given range, how many packets into the chunk
you are when writing, and how they’re interleaved, 100% exactly.
• More on this in a second.
• We want to use persistent HTTP connections for reading all the mdats from source files.
• This means taking a minor hit in bandwidth by skipping over moofs, in order to keep it persistent.
• A prefetch is useful here.
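The interleaving bookkeeping described above might look like this sketch: given per-packet durations and sizes from the parsed moofs, it deterministically emits alternating ~500 ms video/audio chunks, so the exact same schedule can be recomputed for any range request. All names and the return shape here are hypothetical; the real service keeps richer per-packet state:

```python
def build_chunk_schedule(video, audio, interleave=0.5):
    """Deterministically interleave two packet lists into alternating
    video/audio chunks of roughly `interleave` seconds each.

    video/audio: lists of (duration_seconds, size_bytes) per packet.
    Returns ("v"|"a", first_packet_index, packet_count, chunk_bytes)
    tuples describing the output mdat layout.
    """
    schedule, vi, ai = [], 0, 0
    vt = at = 0.0  # media time emitted so far, per track

    def take(track, i, t, tag):
        start, total, size = i, 0.0, 0
        while i < len(track) and total < interleave:
            dur, sz = track[i]
            total += dur
            size += sz
            i += 1
        if i > start:
            schedule.append((tag, start, i - start, size))
        return i, t + total

    while vi < len(video) or ai < len(audio):
        # Emit whichever track is behind; video first on ties.
        if ai >= len(audio) or (vi < len(video) and vt <= at):
            vi, vt = take(video, vi, vt, "v")
        else:
            ai, at = take(audio, ai, at, "a")
    return schedule
```

Because the function is pure over the parsed metadata, "the exact position and state of the packet interleaving" for any byte offset falls out of replaying (or caching) this schedule.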
13. Range Request Strategy
• ftyp and moov boxes are calculated and cached already (byte buffer) – ranges for this are easy.
• Need to be careful when handling ranges which straddle the cached moov and mdat box boundaries.
• mdat is much trickier:
• We need to calculate which source mdats (there are many per stream, remember) to start reading from.
• We need to know which packets within these mdats to start outputting, and when to stop.
• We need to know how many bytes of the first and last written packets to ignore to satisfy the range.
• We need to know the exact position and state of the packet interleaving where this range starts.
• With a little pain, we can calculate this on each request, since we will know exactly what the
chunk pattern is, e.g. 12 video packets / 24 audio packets / repeat.
• If this sounds like a ton of tricky book-keeping, you are correct.
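The start-of-range calculation above can be sketched as a search over the deterministic chunk layout: subtract the cached header size, binary-search the cumulative chunk sizes, then walk packets within the chunk to find how many bytes of the first written packet to discard. This is a hypothetical simplification — the real service must also map these coordinates back to offsets in the source files:

```python
import bisect

def locate_range_start(range_start, header_size, chunks):
    """Map the first byte of an HTTP range onto the output mdat layout.

    header_size: total bytes of ftyp + moov (+ mdat header) preceding
    packet data in the generated progressive file.
    chunks: list of lists of packet sizes, in output interleaving order
    (one inner list per interleaved chunk).
    Returns (chunk_index, packet_index, bytes_to_skip_in_first_packet),
    or None if the range starts inside the cached header bytes.
    Out-of-range offsets are assumed to be rejected upstream.
    """
    pos = range_start - header_size
    if pos < 0:
        return None  # served straight from the cached ftyp/moov buffer
    # Cumulative end offset of each chunk, for a binary search.
    ends, total = [], 0
    for c in chunks:
        total += sum(c)
        ends.append(total)
    ci = bisect.bisect_right(ends, pos)
    into_chunk = pos - (ends[ci - 1] if ci else 0)
    for pi, size in enumerate(chunks[ci]):
        if into_chunk < size:
            return (ci, pi, into_chunk)
        into_chunk -= size
    raise AssertionError("unreachable for in-range offsets")
```

The symmetric end-of-range calculation (how many trailing bytes of the last packet to drop) follows the same pattern, which is exactly the "ton of tricky book-keeping" the slide warns about.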