Could cloud speech-to-text services create accessibility captions for a VOD library?
It's cheap to run and easy to build. Here is what we considered when planning the automation.
3. VOD content supply
Catch up VOD
✓ Captions supplied reliably, required for broadcast
Live Recorded VOD (L2V)
x Live captions are sent as a file much later
Back Catalogue VOD (BCV)
x Content released before broadcast, so captions are not required yet
x Digital-only content deals often don’t supply captions
4. BCV Back Catalogue
● Back Catalogue VOD has a large audience
● BCV has a year-long shelf life, Live typically a week
● Time to review, clips supplied a week ahead
● Shows are global, with differing formats and genres
5. Why is Live special?
● News & daily talk shows
● Over 12 hours daily supplied
● Words are enunciated clearly
● Minimum background sound
● Easy to add to existing automated workflow
● All Australian content, consistent accents
Live was determined to be our preferred
content for the automated caption trial
6. Why did we try?
We determined traditional
captioning partners were
out of our (small digital)
budget and had slower
turnaround than planned.
8. Benefits
POC solution was:
● Cheap, $0.03/min
● Completely automated
● Fast, ~2x real time
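To put the per-minute rate in context, here is a rough cost sketch. It assumes the $0.03/min rate applies to every minute of audio and treats the "over 12 hours daily" of Live content as exactly 12 hours, so the result is a lower bound. The function name and constants are illustrative, and the currency is whatever the quoted rate is in.

```python
# Rough cost estimate for automated captioning.
# Assumptions: flat rate of 0.03 per minute (from the slide) and
# 12 hours of Live content per day (the slide says "over 12 hours").
RATE_PER_MINUTE = 0.03
HOURS_PER_DAY = 12


def transcription_cost(hours, rate_per_minute=RATE_PER_MINUTE):
    """Return the cost of transcribing `hours` of audio, rounded to cents."""
    return round(hours * 60 * rate_per_minute, 2)


daily_cost = transcription_cost(HOURS_PER_DAY)
yearly_cost = transcription_cost(HOURS_PER_DAY * 365)

print(f"Daily:  ${daily_cost:,.2f}")
print(f"Yearly: ${yearly_cost:,.2f}")
```

Under these assumptions, 12 hours a day comes to about $21.60 per day.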
9. What about the others?
Similar costs, but more complexity caused by
moving large files from our existing host
Google was the most accurate at transcription in testing
15. The team praised
the technology
But we had to consider
the likely user reaction
“AI generated captions are amazing!”
“Are they illiterate? This is embarrassing.”
16. Each transcribed word is scored
“Windsor” - Tow (0.4940) wins (0.9490) a (0.9801) bridge (0.6395)
“Quay” - Circular (1.0000) key (0.3497)
“How the fight” - About (0.9996) hell (0.8830) the (0.9879) fuck (0.8509)
“Turnbull” - Alleged (0.9986) malcolm (0.9986) terrible (0.4550) also (1.0000)
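One way to use per-word scores like these is to flag low-confidence words for human review. The sketch below parses the "word (score)" format shown above and lists the words that fall under a cut-off. The 0.70 threshold and the function names are hypothetical; the slides do not state how the scores were actually used.

```python
import re

# Hypothetical cut-off; the slides do not specify one.
REVIEW_THRESHOLD = 0.70

# Matches "word (0.1234)" pairs as they appear in the scored examples.
SCORED_WORD = re.compile(r"(\S+) \((\d\.\d+)\)")


def parse_scored(line):
    """Turn 'Tow (0.4940) wins (0.9490)' into [('Tow', 0.494), ('wins', 0.949)]."""
    return [(word, float(score)) for word, score in SCORED_WORD.findall(line)]


def words_to_review(line, threshold=REVIEW_THRESHOLD):
    """Return the words whose confidence is below the threshold."""
    return [word for word, score in parse_scored(line) if score < threshold]


windsor = "Tow (0.4940) wins (0.9490) a (0.9801) bridge (0.6395)"
quay = "Circular (1.0000) key (0.3497)"

print(words_to_review(windsor))  # ['Tow', 'bridge']
print(words_to_review(quay))     # ['key']
```

A threshold alone would not catch every error: in the third example every word scores above 0.85, yet the transcript is still wrong.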