Trialling AI to automate captions
Development - Jacksen Kline
Video Architect - Jeremy Brown
Some content is still supplied uncaptioned
VOD content supply
Catch-up VOD
✓ Captions supplied reliably, required for broadcast
Live Recorded VOD (L2V)
x Live captions are sent as a file much later
Back Catalogue VOD (BCV)
x Content released before broadcast, so captions aren't required yet
x Digital-only content deals often don’t supply captions
BCV: Back Catalogue
● Back Catalogue VOD has a large audience
● BCV has a year-long shelf life; Live is typically a week
● Time to review: clips are supplied a week ahead
● Shows are global, with differing formats and genres
Why is Live special?
● News & daily talk shows
● Over 12 hours supplied daily
● Words are enunciated clearly
● Minimal background sound
● Easy to add to our existing automated workflow
● All Australian content, consistent accents
Live was determined to be our preferred content for the automated caption trial
Why did we try?
We determined that traditional captioning partners were out of our (small digital) budget and had a slower turnaround than planned.
Caption process
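One step in an automated caption process is turning the recogniser's timed words into a caption file. The sketch below is illustrative only: it assumes WebVTT as the output format (the trial's actual format is not stated) and assumes each recognised word arrives with start and end times in seconds. `Word`, `words_to_webvtt` and `max_words` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognised word with start/end times in seconds (assumed input shape)."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def words_to_webvtt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into cues of at most max_words and emit WebVTT text.

    max_words is a hypothetical tuning parameter, not a value from the trial.
    """
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from its first word's start to its last word's end.
        lines.append(f"{format_timestamp(chunk[0].start)} --> {format_timestamp(chunk[-1].end)}")
        lines.append(" ".join(w.text for w in chunk))
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    demo = [Word("Circular", 0.0, 0.6), Word("Quay", 0.6, 1.1), Word("station", 1.1, 1.7)]
    print(words_to_webvtt(demo, max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a production captioner would more likely split on pauses, line length and reading speed.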
Benefits
POC solution was:
● Cheap, $0.03/min
● Completely automated
● Fast, ~2x real time
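To put those figures against the Live workload (over 12 hours supplied daily), here is a rough back-of-envelope estimate. It assumes "~2x real time" means an hour of audio takes about 30 minutes to process, and that clips are processed one after another; because the slides say "over 12 hours", the result is a lower bound. `daily_estimate` is a hypothetical helper.

```python
from decimal import Decimal

# Figures from the slides: $0.03 per minute, ~2x real time,
# and over 12 hours of Live content supplied daily (so this is a lower bound).
RATE_PER_MIN = Decimal("0.03")  # Decimal avoids float rounding on money
SPEED_FACTOR = 2                # assumption: 1 min of audio takes ~0.5 min to process
DAILY_HOURS = 12


def daily_estimate(hours: int = DAILY_HOURS) -> tuple[Decimal, float]:
    """Return (cost in dollars, sequential processing hours) for a day's content."""
    minutes = hours * 60
    cost = RATE_PER_MIN * minutes
    processing_hours = hours / SPEED_FACTOR
    return cost, processing_hours


if __name__ == "__main__":
    cost, processing = daily_estimate()
    print(f"~${cost} per day, ~{processing:g} hours of processing")
```

Under these assumptions a day of Live content costs roughly $21.60 to caption. Running clips in parallel would shorten the wall-clock time without changing the cost.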
What about the other providers?
Similar costs, but moving large files from our existing host added complexity
Google was the most accurate at transcribing in our testing
Results were accurate, but failed in bad ways
“Windsor” (location names)
“Quay” (homophones)
“how the fight” (the Australian accent?)
“Turnbull”
The team praised the technology
But we had to consider the likely user reaction:
“AI-generated captions are amazing!”
“Are they illiterate? This is embarrassing.”
Each transcribed word is given a confidence score
“Windsor” - Tow (0.4940) wins (0.9490) a (0.9801) bridge (0.6395)
“Quay” - Circular (1.0000) key (0.3497)
“How the fight” - About (0.9996) hell (0.8830) the (0.9879) fuck (0.8509)
“Turnbull” - Alleged (0.9986) malcolm (0.9986) terrible (0.4550) also (1.0000)
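Using the scores listed above, a quick comparison shows why the weakest word is a more useful signal than the phrase average. In every example the lowest-scoring word is the mistake, but "How the fight" averages about 0.93 despite the profanity. A gate on the mean could therefore let it through, and even a gate on the minimum needs a threshold above about 0.85 to catch it. `summarise` is a hypothetical helper, not part of any speech API.

```python
from statistics import fmean

# Word-level confidence scores copied from the slide.
SCORED = {
    "Windsor": [("Tow", 0.4940), ("wins", 0.9490), ("a", 0.9801), ("bridge", 0.6395)],
    "Quay": [("Circular", 1.0000), ("key", 0.3497)],
    "How the fight": [("About", 0.9996), ("hell", 0.8830), ("the", 0.9879), ("fuck", 0.8509)],
    "Turnbull": [("Alleged", 0.9986), ("malcolm", 0.9986), ("terrible", 0.4550), ("also", 1.0000)],
}


def summarise(words):
    """Return (mean confidence, weakest word, its score) for one transcribed phrase."""
    mean = fmean(score for _, score in words)
    weakest, lowest = min(words, key=lambda w: w[1])
    return mean, weakest, lowest


if __name__ == "__main__":
    for label, words in SCORED.items():
        mean, weakest, lowest = summarise(words)
        print(f"{label}: mean {mean:.3f}, weakest '{weakest}' ({lowest:.4f})")
```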
POC caption output
● Publish directly if highly confident
● Manual review if less confident
We are considering manually reviewing low-scoring captions
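The publish-or-review split can be expressed as a small routing rule. This is a minimal sketch, not the POC's implementation: it gates on the weakest word in a clip, and the 0.90 cut-off is a hypothetical value because the trial does not state one. `Route` and `route_clip` are illustrative names.

```python
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"
    REVIEW = "manual review"


# Hypothetical threshold; the trial did not publish a cut-off value.
PUBLISH_THRESHOLD = 0.90


def route_clip(word_scores: list[float], threshold: float = PUBLISH_THRESHOLD) -> Route:
    """Publish directly only if every word meets the threshold; otherwise send for review.

    Gating on the weakest word (rather than the average) means a single
    low-confidence word is enough to hold the clip back.
    """
    if not word_scores:
        return Route.REVIEW  # nothing recognised: let a person check it
    return Route.PUBLISH if min(word_scores) >= threshold else Route.REVIEW


if __name__ == "__main__":
    print(route_clip([0.9996, 0.8830, 0.9879, 0.8509]))  # "How the fight" scores
    print(route_clip([0.99, 0.97, 0.95]))                # hypothetical clean clip
```

Raising the threshold sends more clips to reviewers and lets fewer errors through. Where to set it is a trade-off against review cost and your accuracy obligations.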
How an efficient review tool could look
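A review tool built on these scores could point the reviewer straight at the weak words instead of asking them to re-watch whole clips. This minimal sketch assumes per-word `(text, score)` pairs and uses a hypothetical 0.90 threshold. `mark_for_review` is an illustrative name.

```python
def mark_for_review(words: list[tuple[str, float]], threshold: float = 0.90) -> str:
    """Render a transcript line with low-confidence words flagged for a reviewer.

    Words scoring below the (hypothetical) threshold are shown as [word score]
    so the reviewer's eye goes straight to them.
    """
    rendered = []
    for text, score in words:
        if score < threshold:
            rendered.append(f"[{text} {score:.2f}]")
        else:
            rendered.append(text)
    return " ".join(rendered)


if __name__ == "__main__":
    windsor = [("Tow", 0.4940), ("wins", 0.9490), ("a", 0.9801), ("bridge", 0.6395)]
    print(mark_for_review(windsor))
```

For the "Windsor" example this flags "Tow" and "bridge" and leaves the confidently recognised words untouched.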
Final thoughts
● Certainly viable, but check your legal obligations for accuracy before considering a fully automated workflow
● Speech recognition is improving quickly and could become less reliant on manual intervention
Thanks
Development - Jacksen Kline
Video Architect - Jeremy Brown

Trialling AI speech to automate VOD captions