In this webinar, Owen Edwards, Senior Accessibility Consultant at SSB BART Group and contributor to the Video.js open-source web video player, and Lily Bond, Director of Marketing for 3Play Media, will deconstruct captioning and audio description down to its nuts and bolts. This webinar will explore the legal requirements, benefits, best practices, how-to's and more of captioning and audio description to ensure you can confidently proclaim yourself as an accessibility guru.
The Nuts & Bolts of Captioning & Describing Online Video
1. THE NUTS & BOLTS OF CAPTIONING & DESCRIBING ONLINE VIDEO
• Type questions in the window during the presentation
• This webinar is being recorded & will be available for replay
2. Presented by:
Owen Edwards, Senior Accessibility Consultant, SSB BART Group
www.ssbbartgroup.c
Lily Bond, Director of Marketing, 3Play Media
lily@3playmedia.co
Live tweet: #a11y @3playmedia
3. What will we cover?
▸ WHAT are captions & audio description?
▸ WHY should you caption & describe?
▸ HOW do you create captions & descriptions?
▸ WHO captions & describes your files?
▸ WHERE do you publish captions & descriptions?
▸ Q&A
6. CAPTIONS VS. SUBTITLES VS. TRANSCRIPTS
Captions assume the viewer can’t hear the audio.
Subtitles translate the audio into another language.
Transcripts contain text of the audio that isn’t time coded.
8. “Narration added to the soundtrack to describe important visual details that cannot be understood from the main soundtrack alone.” – W3C
▸ Also known as video description, narrative description, & description
▸ Similar to “Director’s commentary” on DVDs
▸ Increasingly available on TV (via “SAP”), and on some online services
“I am here” - RNIB
12. 21st CENTURY COMMUNICATIONS & VIDEO ACCESSIBILITY ACT
CAPTIONS: Online video that previously appeared on TV with captions
AD: Prime-time viewing & children’s programming (Goal: 100% AD by 2020)
13. Web Content Accessibility Guidelines (WCAG) 2.0
International guidelines with “success criteria” – Levels A, AA, AAA
Level A: Captions & Transcript OR Audio Description
Level AA: Captions & Audio Description
Level AAA: Captions & Transcript & Audio Description & Sign Language & …
A key part of the Section 508 Refresh
14. “Excluding businesses that sell services through the Internet from the ADA would run afoul of the purposes of the ADA.”
- Judge Ponsor, NAD vs. Netflix
“The United States respectfully submits this Statement of Interest to correct Harvard’s misapplication of the primary jurisdiction doctrine and its misunderstanding of the ADA and
- DOJ, NAD vs. Harvard &
15. BENEFITS OF CAPTIONS
Accessibility
> 48 million Americans are Deaf or hard of hearing
Comprehension
> 80% of people who use captions aren’t D/HoH
Flexibility
> View videos in sound-sensitive environments
Video Search
> 97% of students said interactive transcripts enhanced experience
SEO
> Adding captions to YouTube led to a 7.3% increase in views
Translation
> Create multi-lingual subtitles to reach a global audience
Reusable
> 50% repurposed transcripts as study guides
Legal Requirements
> 3 major US accessibility laws require captioning
16. RESEARCH: HOW & WHY DO STUDENTS USE CLOSED CAPTIONS?
www.3playmedia.com/student-research-study/
98.6% of students find captions helpful
75% of students use captions as a learning aid
FOCUS is the #1 reason students use captions for learning
17. BENEFITS OF AUDIO DESCRIPTION
Accessibility
> Estimated 21 million Americans (10%) with vision loss.
Autism
> Helps to better understand emotional and social cues
Flexibility
> View videos in eyes-free environments
Language Development
> Listening is a key step in learning language.
Auditory Learners
> 20-30% of students retain information best through sound.
Legal Requirements
> May be required by law
24. HOW DO YOU CREATE AUDIO DESCRIPTION?
1. Include description at the production stage
2. Write a description script & align w/ the timeline. Then record the voice artist & mix the description w/ original audio
3. Create an edited version of a video w/ additional time for audio descriptions
26. WHAT IS “GOOD ENOUGH” FOR DESCRIPTION?
No specific standards from WCAG, FCC, CVAA
Guidelines exist: especially DCMP’s Description Key
Description companies have internal best practices/standards
CVAA requirements may lead to lawsuits which ultimately define “good enough”
30. WHAT DO CAPTION FORMATS LOOK LIKE?
SCC:
01:00:00:00 942c
01:00:03:01 9420 9454 5468 e973 2076 e964
e5ef 20f7 e9ec ec80 94f2 97a2 7368 eff7
2079 ef75 2068 eff7 20f4 ef20 6d61 6be5
942c 8080 8080 942f
01:00:04:05 9420 9440 97a2 d9ef 7554 7562
e520 76e9 64e5 ef73 2061 e3e3 e573 73e9
62ec e520 e96e 94e0 97a2 ea75 73f4 20ef
6ee5 20e3 ece9 e36b 2c20 7573 e96e 6720
b3d0 ec61 7980 942c 8080 8080 942f
SRT:
1
00:00:00,000 --> 00:00:04,000

2
00:00:04,000 --> 00:00:05,500
This video will
show you how to make

3
00:00:05,500 --> 00:00:08,860
YouTube videos accessible in
just one click, using 3Play
31. What players support description:
• Few players support a secondary audio track for description
• Create two copies: original & w/ AD
• Some players support a “text track description,” which is read out by a screen reader (issues
32. Other implications of video accessibility:
Need to use an accessible video player/platform:
• Keyboard-only access to controls
• Low-vision support
• Screen reader support
• Voice control support (e.g. Dragon)
• One or more methods for AD playback
Able Player
33. Existing accessible video players/platforms:
Accessible players exist (but not widely adopted):
• Able Player
• OzPlayer
Increasingly accessible players:*
• YouTube
• Kaltura
• JW Player
• Brightcove
• Akemi
• PayPal’s “accessible video player”
• Nomensa’s video player
* Not an exhaustive list, nor specific endorsement
OzPlayer
34. “Access to information and communication technologies is increasingly becoming the gateway civil rights issue for individuals with disabilities.”
- Department of Justice
Time-synchronized text that can be read while watching a video
Usually the CC icon
Originated as FCC mandate for broadcast in the 1980s
Assume the viewer can't hear; convey sound effects, speaker ID, & other non-speech elements
relevance: [keys jangling]
Terminology:
Captions assume the viewer can’t hear the audio; time synchronized & include sound effects
Subtitles assume the viewer can’t understand the audio; time synchronized & translate the audio
Transcripts include a plain text version of the audio; not time synchronized; sufficient for audio-only
Many reasons outside of accessibility
1973
Section 504: anti-discrimination law that requires equal access for individuals with disabilities. Applies to federal & federally funded programs.
Section 508: introduced in 1998 to require federal communications and information technology to be accessible. Applies to federal programs, but often applied to federally funded programs through state & organization laws.
Closed captioning requirements are written directly into Section 508, and are often applied to Section 504
The Section 508 refresh, released in January, references the WCAG 2.0 guidelines.
1990
5 sections
Title II: public entities; Title III: public accommodations (extends to private sector)
"Places of public accommodation" – what constitutes this?
Tested against online businesses
Refresh coming?
NAD vs Netflix
Netflix sued by National Association of the Deaf in 2012 for failing to provide closed captions for most of its "Watch Instantly" movies and television shows streamed on the Internet.
First time that Title III of the ADA (place of public accommodation) had been applied to Internet-only businesses (before, it had only been applied to physical structures like wheelchair ramps)
Court ruled in favor of the National Association of the Deaf.
Netflix settlement: Netflix agreed to caption 100% of its streaming content.
Profound precedent
FedEx was sued for not providing closed captions on training videos
Hulu settled with the National Association of the Deaf
Amazon settled with the National Association of the Deaf
National Association of the Deaf vs. Harvard and MIT
Harvard and MIT were sued by the National Association of the Deaf for providing inaccessible video content that was either not captioned or was inaccurately/unintelligibly captioned
The first time that accuracy has been considered in legal ramifications for closed captioning (YT auto captions)
NAD argued that educational online videos should be considered a public accommodation.
In June of 2015, the Department of Justice submitted a statement of interest supporting the Plaintiffs' position that Harvard and MIT's free online courses and lectures discriminate against deaf and hard of hearing individuals by failing to provide equal access in the form of captions
Still waiting on a decision
Outcome will have huge implications for higher education.
2010
Previously aired on television
Clips – straight lift & montages as of January
AD: Phases in between 2010 and 2020
Currently: Top 60 TV markets required to describe 50 hours per quarter. Next phase-in is July 1, 2018:
Multi-channel video distributors must provide 87.5 hours per quarter
Not U.S. specific, and some countries are ahead in implementing (e.g. Australia, Canada)
Technology agnostic, although initially targeting web content. Has been applied to smartphone apps.
Level A is the most basic compliance; the DOJ has interpreted ADA as implying Level AA compliance (potentially with some exceptions), and that’s the level that most organizations, companies and educational institutions are standardizing around. Essentially, Level AAA is a list of many potential improvements over Level AA, but it is not realistic to aim for complete Level AAA compliance.
The Section 508 refresh brings 508 into sync with Level AA.
Note that this is somewhat the wrong way to look at it – while WCAG *requires* “Audio Description”, it specifically states that “if all of the information in the video track is already provided in the audio track, no audio description is necessary”. In other words, some videos are “already self-described” - *additional* audio description is *only* required for those which are *not*.
Because the laws were written before the proliferation of online video, we must turn to the courts to see what they believe about how the ADA and Section 504 should apply to the online sector:
Read quotes
1) Accessibility: 48 million Americans
growing due to: medical advancements
20% of Americans
2) Better Comprehension: 80% of users
Accent, difficult content, background noise, ESL
Flexibility to view in noise-sensitive environments (office, library, gym)
Autoplay: more and more, companies are playing captions by default with no sound – Social video like Yahoo, Facebook, Twitter
w/o captions, these are inaccessible to everyone
3) Video Search: MIT survey - 97% of users found they enhanced experience
Used to being able to search and go
4) SEO – Google can't watch a video
DDN study > increased views by 7.3%
5) Reusability
UW = 50% of students repurposing transcripts
Create infographics, white papers, case studies, other docs, course materials
6) Translation > translate English transcript to make video accessible on a global scale
7) Required by Law
Research study with Oregon State University: how & why do students use captions?
Prove how much captions help all students learn
98.6% found captions helpful
75% of students who use captions – not just those who are deaf/hard of hearing - used captions as a learning aid
#1 reason for using captions is to help students FOCUS on the video content
1) Accessibility: 2015 National Health Interview Survey found that 23.7 million Americans (10%) have trouble seeing.
2) Autism: individuals on the autistic spectrum find that audio description helps better understand emotional and social cues only demonstrated through actions or facial expressions. Answers questions like “what does that smile mean?”
3) Flexibility: view videos in eyes-free environments (cooking, in the car, etc.)
4) Language development: listening is a key step in learning language and associating it with appropriate actions and behaviors.
5) Auditory learners: research into how the brain processes information reveals that there are two channels: auditory and visual. 20-30% of students say they retain information best through sound.
6) Inattentional blindness is the phenomenon where you fail to recognize visual information in plain sight. We have all missed a key visual element in a video or image until it was pointed out to us. Audio description can help point those key visuals out to all viewers.
7) May be required by law.
DIY
Standards
Accuracy
ASR errors
DIY method: YouTube. Recommend using it for timings.
Transcribe your video – takes 5-6x real time; include non-speech elements
Use YouTube to transcribe & set timings
Try not to attempt timings manually! There’s a lot of room for sync errors.
Or, download YouTube’s auto captions or edit them in the interface
If you need a different format, use a caption format converter.
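As a rough illustration of what a caption format converter does, the sketch below turns SRT text into WebVTT, the format used by HTML5 text tracks. The function name `srt_to_webvtt` is hypothetical, and the sketch assumes well-formed SRT input; a production converter handles many more formats and edge cases.

```python
import re

# Matches an SRT timing line such as "00:00:04,000 --> 00:00:05,500".
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT (hypothetical helper, not a vendor tool)."""
    out = ["WEBVTT", ""]  # WebVTT files start with this header and a blank line
    for line in srt_text.strip().splitlines():
        match = TIMING.match(line.strip())
        if match:
            start, start_ms, end, end_ms = match.groups()
            # WebVTT uses a period, not a comma, before the milliseconds.
            out.append(f"{start}.{start_ms} --> {end}.{end_ms}")
        else:
            # Frame numbers, caption text, and blank separators carry over as-is.
            out.append(line.rstrip())
    return "\n".join(out) + "\n"
```

The frame numbers are kept because WebVTT permits an optional identifier line before each timing line.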
Important to follow best practices for caption quality.
Spelling should be at least 99% accurate; include grammar for readability
Speaker identification should be consistent
Include relevant sound effects (keys jangling)
Use punctuation to improve readability
Verbatim: for broadcast, include every “um,” etc., because it is scripted.
Clean read: for lectures, live presentations, etc., clean read is usually preferable.
CHARACTERS: 1-3 lines; 32 characters per line
FONT: non-serif
SYNC: should be perfect
MIN DUR: 1 second
PLACEMENT: shouldn’t obscure
SILENCE: should note – can drop off the screen.
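Some of the standards above are mechanical enough to check automatically. The sketch below tests one caption frame against the line count, line length, and minimum duration figures from this list. The function name `check_caption_frame` and its parameters are hypothetical, and it does not attempt to judge sync, placement, or accuracy.

```python
def check_caption_frame(text: str, start: float, end: float,
                        max_lines: int = 3, max_chars: int = 32,
                        min_duration: float = 1.0) -> list[str]:
    """Return a list of problems found in one caption frame (empty if none).

    Thresholds default to the figures quoted in the notes above:
    1-3 lines, 32 characters per line, 1 second minimum duration.
    Times are in seconds.
    """
    problems = []
    lines = text.split("\n")
    if not 1 <= len(lines) <= max_lines:
        problems.append(f"expected 1-{max_lines} lines, found {len(lines)}")
    for line in lines:
        if len(line) > max_chars:
            problems.append(f"line exceeds {max_chars} characters: {line!r}")
    if end - start < min_duration:
        problems.append(f"duration {end - start:.2f}s is under {min_duration}s")
    return problems
```

For example, `check_caption_frame("This video will\nshow you how to make", 4.0, 5.5)` reports no problems, while a single 43-character line shown for half a second would be flagged twice.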
We expect continuous improvements
Currently: 80% with our current tech
Even 95% isn’t sufficient for accurately conveying complex material.
Typical sentence of 8 words: 95% accuracy rate means there will be an error every 2.5 sentences.
We use our large database of human-corrected, near-perfect transcripts to continually improve our recognition accuracy
Word-to-word accuracy issues – multiply the chance of each word in a sentence being correct and you’ll see how accuracy issues compound.
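The arithmetic behind these figures can be sketched as follows, under the simplifying assumption that errors on each word are independent (real ASR errors tend to cluster, so treat this as an illustration only). Both function names are hypothetical.

```python
def sentence_accuracy(word_accuracy: float, words_per_sentence: int = 8) -> float:
    """Chance that every word in a sentence is correct, assuming independent errors."""
    return word_accuracy ** words_per_sentence


def sentences_per_error(word_accuracy: float, words_per_sentence: int = 8) -> float:
    """Average number of sentences between word errors."""
    words_per_error = 1 / (1 - word_accuracy)  # e.g. 95% accuracy: 1 error per 20 words
    return words_per_error / words_per_sentence


for rate in (0.80, 0.95, 0.99):
    print(f"{rate:.0%} word accuracy: "
          f"{sentence_accuracy(rate):.0%} of 8-word sentences fully correct, "
          f"one error about every {sentences_per_error(rate):.1f} sentences")
```

At 95% word accuracy, only about 66% of 8-word sentences come out fully correct, which matches the "error every 2.5 sentences" figure above. At 80%, the share drops to roughly 17%.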
Examples of where ASR fails. A lot of the errors make sense acoustically, but not linguistically. A human wouldn’t make those errors.
Issues to look out for:
Punctuation
Hesitation words aren’t removed
Speaker changes not captured
No Speaker ID
Acoustic errors:
- Should be “New England Aquarium” vs “new wing of the Koran”
- “forester” vs “four story”
Note the increasing cost of the three different options
See also https://dcmp.org/ai/179/ for a more complete list of Description Service Vendors
Note this is not a complete list nor an endorsement
Note: Text-based description is supported by some players (e.g. Able Player), but is not widely supported/implemented, and has issues – it is not a viable production solution at this point
This demo shows a video player with a user-selectable description track
Audio description by JJ Hunt (www.jjhunt.com)
(Note that this currently links out to a demo site – it may make more sense to pre-record the demo, and embed the recording here)
Description seems inherently subjective, at least from the editorial perspective (although some argue otherwise)
Existing description companies do a very good job – the question is, could there come an equivalent of YouTube auto-captions, and if so what quality is good enough?
Without standards development, someone may need to bring a lawsuit for non-compliance as NAD did with Harvard/MIT
Possibility for cross-over from the Caption standards? Certainly, we can identify some areas which would not meet quality standards (e.g. talking over dialog, like putting captions over text)
Note: Even WCAG acknowledges that there are situations where description is not required, so it continues to be subjective:
... if all of the information in the video track is already provided in the audio track, no audio description is necessary.
[Audio description is provided] ... except when the media is a media alternative for text and is clearly labeled as such.
Our editors are all US-based and come from all different backgrounds. They go through a rigorous certification process.
Captions are supported on most devices where you can publish video. One limitation: some social video.
Originated as a mandate for broadcast TV
Now published across all devices and platforms
Most players and platforms have caption compatibility; some more advanced
User control options on many platforms
Side Car: like YouTube, where you upload caption file and associate to video
Encode: encode caption file onto the video > Use for kiosks or offline video
Open Captions: burned in and can't be turned off
Integrations make this trivial
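For players built on HTML5 video, the side car approach typically means pointing a `<track>` element at a separate caption file. The sketch below generates that markup. The function name and file names are hypothetical, and it assumes the captions are in WebVTT, which is what browsers expect for `<track>`.

```python
from html import escape


def video_with_sidecar_captions(video_src: str, captions_src: str,
                                lang: str = "en", label: str = "English") -> str:
    """Build HTML5 markup that associates a side car caption file with a video.

    Hypothetical helper for illustration; hosted platforms such as YouTube
    make this association through their own upload interface instead.
    """
    return (
        f'<video controls src="{escape(video_src, quote=True)}">\n'
        f'  <track kind="captions" src="{escape(captions_src, quote=True)}" '
        f'srclang="{escape(lang, quote=True)}" label="{escape(label, quote=True)}" default>\n'
        "</video>"
    )


print(video_with_sidecar_captions("lecture.mp4", "lecture.vtt"))
```

Because the caption file stays separate, viewers can toggle captions on and off, unlike open captions that are burned into the picture.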
Caption format depends on platform/player.
Two common formats here
Same exact file; first 3 caption frames. SCC uses hex codes; SRT is much more readable.
Consider a caption format converter for something like SCC.
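To show why SRT counts as the readable format, here is a minimal sketch that parses SRT text into caption frames with start and end times in seconds. The `Cue` class and `parse_srt` function are hypothetical names, and the sketch assumes frames are separated by blank lines as in a well-formed SRT file.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp like '00:00:05,500' to seconds."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text into a list of caption frames (hypothetical helper)."""
    cues = []
    # Each frame is: a number, a timing line, then one or more lines of text.
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a caption frame
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), _to_seconds(start), _to_seconds(end),
                        "\n".join(lines[2:])))
    return cues
```

An equivalent reader for SCC would need to decode the hex pairs into caption control codes and characters, which is why a converter is the practical route for that format.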
Note: YouTube player was Flash, and not very accessible. They added HTML5, which was more accessible, but opt-in. As of Jan 2015, HTML5 is default (see http://youtube-eng.blogspot.com/2015/01/youtube-now-defaults-to-html5_27.html)