Video content is increasingly becoming part of the enterprise web environment. The promise of HTML5's video element was that it would solve many of the issues around serving video to the web. But has it succeeded? And what of Accessibility?
This seminar will cover the state of video delivery on the web today, the issues, the promises, and, importantly, how to ensure that it all meets accessibility requirements.
A Very Brief History: The evolution of video content delivered via the web remains in its infancy today, and a standardized solution for ubiquitous and simple delivery remains elusive. With the current transition towards HTML5-based solutions (which break from previous delivery mechanisms) there remains division in media encoding, with some browsers only supporting one encoding (H.264/MP4) and others only supporting a different encoding (VP8/WebM). The situation is further complicated by the fact that some operating systems (iOS) do not support the traditional Flash-based media players that have emerged over the past 5+ years, requiring additional production effort and scripted bridging solutions to achieve full coverage across the multiple delivery channels available to the end consumer today.
Video buffering (stalling) negatively impacts the user experience, as research confirms. As we add more video to our service and product offerings, this will increasingly become a delivery issue.

Under normal circumstances, “web content” travels across the internet using the http:// protocol (you will often see this at the start of a web address). HTTP “chops up” content and sends small packets of data across the web, and the browser collects those packets and re-assembles them on the client screen (well, not exactly, but close enough for this discussion). This is a very efficient and effective way of transmitting static content; however, it is not so great when you want to stream content (like videos). What happens is that you create a situation where you get buffering – the video either gets choppy or “stalls” and you get the spinner on screen while the browser waits for the rest of the packets to arrive – I’m sure you’ve seen this before.
There are a number of different ways of addressing this problem, the most efficient being to use a different type of protocol. The ‘standard’ for streaming media is the Real Time Streaming Protocol (RTSP) – *BUT* to do RTSP you require a differently configured web server, and instead of declaring the address of your asset as http://mysite.com/video.mp4 you would instead write rtsp://mysite.com/video.mp4. Other, newer formats/protocols such as HTTP Live Streaming, Smooth Streaming and HTTP Dynamic Streaming also seek to address the buffering problem by developing an HTTP delivery mechanism. Currently there are similar but competing solutions from Apple, Microsoft and Adobe. (There is another distinct advantage to these HTTP-based streaming approaches when it comes to actual mobile delivery, where buffering is further complicated by bandwidth availability – 4G vs. 3G vs. Edge.)
Another important consideration is file size and compression, which have a direct impact on how our videos are delivered. With web content being consumed by an increasingly diverse collection of screen and connectivity combinations (from lean-back, large-screen delivery to hand-held mobile screens), there is a need to provide adaptive streams tailored to those platforms. For example, large-screen displays (be they desktop or even home television screens) require a higher-definition video stream, resulting in larger files and increased demand for bandwidth. Conversely, serving streaming media to handheld devices over more restricted wireless networks requires smaller file sizes.
Systems are emerging today that provide different video compressions on demand – so-called adaptive bit-rate delivery – to address this issue, although gaps in codec support mean that none of these solutions works across all browsers. Due to the way these systems work, there is some additional post-production overhead, although tools to automate this are emerging. Despite some limitations today, however, a robust, scalable video delivery solution will likely require some form of this type of service.
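As a concrete illustration, adaptive bit-rate systems such as Apple's HTTP Live Streaming describe the available renditions in a "master" playlist; the player then requests short chunks from whichever stream best fits current conditions. A minimal sketch of such a playlist follows (the paths, bit-rates and resolutions are hypothetical):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
high/index.m3u8
```

Each `#EXT-X-STREAM-INF` line advertises one encoding of the same content; producing and managing these multiple renditions is the post-production overhead mentioned above.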
All web videos must be encoded into a format that works across all browsers and operating systems. Due to complicated legal and “political” reasons, not all browsers and operating systems today support the same encodings, and at this writing positions are fairly well entrenched in all corners, resulting in a need for multiple encodings to support all browsers and platforms. At issue is the need for a license to cover the patent on the H.264 codec, which is at odds with Free and Open Source software such as Linux and the Firefox browser, which for philosophical reasons cannot support that codec in their software stack(s). Firefox however currently provides limited support when hardware decoding is present on the host system (such as most handheld devices today – see: https://brendaneich.com/2012/03/video-mobile-and-the-open-web/).
However, for full coverage we should expect to deliver encoded videos in both the H.264/MPEG-4 and VP8/WebM formats at minimum today. This will likely add to the production burden for all videos.
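In markup terms, supporting both encodings means listing multiple `<source>` elements inside the video element; the browser tries each in order and plays the first format it supports. A minimal sketch (file names and codec strings are illustrative):

```html
<video controls width="640" height="360">
  <!-- The browser uses the first source whose type it can play -->
  <source src="video.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <source src="video.webm" type='video/webm; codecs="vp8, vorbis"'>
  <!-- Fallback content for browsers without HTML5 video support -->
  <p>Your browser does not support HTML5 video.</p>
</video>
```

The fallback paragraph is where a scripted bridge to a Flash-based player has traditionally been placed.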
Videos and other multimedia content are increasingly becoming an integral part of modern web content as an effective means of engaging with our users. However, due to its multi-modal nature, the use of video has a number of accessibility issues we must address in a holistic fashion. These issues include:

* Deaf and hard-of-hearing users – these users cannot hear the audio track of your media presentation. We must ensure that captions, and perhaps even transcripts, are available for their use.

* Blind and low-vision users – these users cannot see the presentation, although perhaps they can hear it. It is critical that any on-screen text, or important visual imagery (charts, graphs, etc.), also be communicated to them. This can be achieved via audio description, or by ensuring supporting (text-based) documents [transcripts] are associated with the media asset and easy to access.

* Mobility-impaired users / keyboard-only users – media players must be constructed in such a way that they can be interacted with via keyboard, avoiding keyboard ‘traps’, etc. A common, approved media player often addresses these issues at a global level.

Other issues can include addressing the needs of users with atypical color perception, deaf-blind users, or users with cognitive and neurological disabilities. Each of these user groups will have potential strategies to ensure their access to multimedia content that we should be aware of. The W3C has created a Media Accessibility User Requirements document that outlines in more detail the challenges that these groups face, as well as potential strategies that should be used to address their needs.
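One piece of the keyboard-accessibility requirement can be sketched as a simple mapping from key presses to player actions. The bindings below are illustrative assumptions, not drawn from any standard or particular player library:

```javascript
// Illustrative key bindings for a custom media player.
// The mapping itself is an assumption, shown only to make the
// "operable via keyboard" requirement concrete.
function keyToAction(key) {
  switch (key) {
    case " ":
    case "k":          return "toggle-play";
    case "m":          return "toggle-mute";
    case "c":          return "toggle-captions";
    case "ArrowLeft":  return "seek-back";
    case "ArrowRight": return "seek-forward";
    default:           return null; // unhandled keys bubble up normally
  }
}

// In a real player this would be wired to a keydown listener on the
// player container, taking care never to trap focus inside the widget.
console.log(keyToAction("m")); // → "toggle-mute"
```

Returning `null` for unhandled keys (rather than swallowing every event) is what keeps the player from becoming a keyboard ‘trap’.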
The intent of Requirement 1.2.1 is to make information conveyed by prerecorded audio-only and prerecorded video-only content available to all users. Alternatives for time-based media that are text based make information accessible because text can be rendered through any sensory modality (for example, visual, auditory or tactile) to match the needs of the user.

The intent of Requirement 1.2.2 is to enable people who are deaf or hard of hearing to watch synchronized media presentations. Captions provide the part of the content available via the audio track. Captions not only include dialogue, but identify who is speaking and include non-speech information conveyed through sound, including meaningful sound effects. It is acknowledged that at the present time there may be difficulty in creating captions for time-sensitive material, and this may result in the author being faced with the choice of delaying the information until captions are available, or publishing time-sensitive content that is inaccessible to the deaf, at least for the interval until captions are available. Over time, the tools for captioning, as well as building captioning into the delivery process, can shorten or eliminate such delays.
The intent of Requirement 1.2.3 is to provide people who are blind or visually impaired access to the visual information in a synchronized media presentation. This can be achieved using one of the following two approaches:

- One approach is to provide audio description of the video content. The audio description augments the audio portion of the presentation with the information needed when the video portion is not available. During existing pauses in dialogue, audio description provides information about actions, characters, scene changes, and on-screen text that are important and are not described or spoken in the main sound track.

- The second approach involves providing all of the information in the synchronized media (both visual and auditory) in text form. An alternative for time-based media provides a running description of all that is going on in the synchronized media content, and reads something like a screenplay or book. Unlike audio description, the description of the video portion is not constrained to just the pauses in the existing dialogue. Full descriptions are provided of all visual information, including visual context, actions and expressions of actors, and any other visual material. In addition, non-speech sounds (laughter, off-screen voices, etc.) are described, and transcripts of all dialogue are included.

The intent of Requirement 1.2.5 is to provide people who are blind or visually impaired access to the visual information in a synchronized media presentation. The audio description augments the audio portion of the presentation with the information needed when the video portion is not available. During existing pauses in dialogue, audio description provides information about actions, characters, scene changes, and on-screen text that are important and are not described or spoken in the main sound track.
There are a number of different ‘time-stamping’ formats used to deliver synchronized captions on the web today. The W3C has already produced an XML-based format (TTML – Timed Text Markup Language), a subset of which – DFXP – is currently supported by most Flash-based players. However, other time-stamp formats are also in play at this time: .SRT and its successor WebVTT are emerging as the de-facto standard for native HTML5 delivery; SMPTE-TT (a superset of TTML developed by the Society of Motion Picture and Television Engineers) has support in IE 10; and .SCC, a binary format, is required for iPhone captioning support today.
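Conversion between the text-based formats is largely mechanical. For example, SRT and WebVTT differ mainly in the file header and the millisecond separator, so a minimal SRT-to-WebVTT conversion can be sketched in a few lines (assuming well-formed SRT input; real converters also handle styling, positioning and multi-line edge cases):

```javascript
// Minimal sketch of an SRT-to-WebVTT converter.
// Assumes well-formed SRT input; not a production-grade tool.
function srtToVtt(srt) {
  const body = srt
    .replace(/\r\n/g, "\n")
    // SRT uses a comma as the millisecond separator; WebVTT uses a period.
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, "$1.$2");
  // WebVTT files must begin with a "WEBVTT" header line and a blank line.
  return "WEBVTT\n\n" + body.trim() + "\n";
}

const srt = `1
00:00:01,000 --> 00:00:04,000
Welcome to the seminar.`;

console.log(srtToVtt(srt));
```

The same mechanical quality is why numerous third-party services can take one caption file and emit all of the competing formats.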
Along with the obvious need for captioned videos, our accessibility requirements also call for the provisioning of transcripts. There are a number of experimental examples of “interactive” transcripts that provide enhanced functionality when delivered in sync with the video. Examples include “follow-along” highlighting, hyperlinked transcripts (both static and timed), and the ability to highlight sections for truncated embedding (useful with longer-form videos). The potential for marketing uses of this type of interactive transcript should be considered a ‘bonus’ when investigating any solution moving forward.
Gov. of Canada = WCAG 2.0 AA minus complex maps (1.1.1), live video captions (1.2.4) and audio descriptions (unless related to health or safety, 1.2.5)
http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=23601&section=text#appB

Ontario Gov. – AODA = full-fledged WCAG 2.0 AA, except for 1.2.4 and 1.2.5
http://www.e-laws.gov.on.ca/html/source/regs/english/2011/elaws_src_regs_r11191_e.htm#BK15

Quebec Gov. – SGQRI 008-03 = limited to 1.2.1, 1.2.2 and 1.2.3 when it comes to audio/video
http://www.tresor.gouv.qc.ca/fileadmin/PDF/ressources_informationnelles/AccessibiliteWeb/access_multimedia_ve.pdf (p. 13)
While many aspects of the creation of accessible videos can be automated and systemized, the conversion of the spoken word (etc.) to a text-based format remains a manual process, especially when specialized terms or other legal requirements demand accuracy. Whether done in real time (CART services) or during the post-production process, it is a specialized skill-set that can either be brought in-house or out-sourced to third-party firms that specialize in this service. Currently, while pricing is variable, there appears to be a leveling off at approximately $60 to $100 per hour of video content when out-sourced (pricing often dependent on volume of content and turn-around time). Once the textual equivalent is produced, there are numerous services and systems that can apply the time-stamping to the content for final delivery.
For a variety of reasons (including lower CPU capacity on phones, bandwidth and network restrictions, and the complexity of keeping separate text tracks and media tracks in sync), support for “Closed Captions” on mobile devices is practically non-existent today. This means that for the mobile platform we will need to offer the end user a choice between the non-captioned video and an Open Captioned video.
Accessible Video in the Enterprise
A (Very) Brief History
1999 – 2005: Competing, incompatible delivery platforms
2005: Launch of YouTube & Flash-based player brings some commonality to the delivery platform
A (Very) Brief History
2012/2014: W3C’s Standardization of HTML5
Apple drops Flash support / advances in Major
Consideration #1
Users Start Giving Up on Streaming Video If It Takes Two Seconds to Load (http://gizmodo.com/5959553/users-start-)
University of Massachusetts, Amherst & Akamai Technologies
Consideration #1
HTTP (hyper text transfer protocol):
• chops web pages into packets for fast, asynchronous delivery
RTSP (Real Time Streaming Protocol):
• delivers continuous stream of multimedia data
• requires specialized streaming media server
Adaptive Bit-Rate – HTTP Live Streaming, Smooth Streaming and HTTP Dynamic Streaming
• HTTP Live Streaming is backed by Apple, Smooth Streaming is backed by Microsoft and HTTP Dynamic Streaming is backed by Adobe
• emergent solutions that are not yet standardized – not all platforms are supported
• “fakes out” streaming by delivering “chunks” of content via HTTP, self-adjusting delivery packets
• requires additional production overhead and asset management
Consideration #1
This delivery method is beginning to have a massive impact on every aspect of Internet video delivery because it allows the stream to adapt the video experience to the quality of the network and the device's CPU.
Consideration #1
Essentially, the video stream can increase or decrease the bit rate and resolution of the video (its quality) in real time so that it’s always streaming the best possible quality the available network connection can support. The better the network connection, the better the video image. The fact that the stream handles all of this complexity means the mobile video viewer doesn’t have to do anything; everything is left to the stream and the player.
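The selection step the slide describes can be sketched as: measure the available bandwidth, then choose the highest-bit-rate rendition that still fits. The rendition names and bit-rates below are hypothetical, and real players also factor in buffer health and switch-down hysteresis:

```javascript
// Illustrative rendition selection for adaptive bit-rate delivery.
// Picks the highest-quality stream whose bitrate fits the measured bandwidth.
function pickRendition(renditions, measuredKbps) {
  // Sort ascending by bitrate, then keep the best rendition that still fits.
  const sorted = [...renditions].sort((a, b) => a.kbps - b.kbps);
  let choice = sorted[0]; // fall back to the lowest-bitrate stream
  for (const r of sorted) {
    if (r.kbps <= measuredKbps) choice = r;
  }
  return choice;
}

const renditions = [
  { name: "240p", kbps: 400 },
  { name: "480p", kbps: 1200 },
  { name: "720p", kbps: 2500 },
];

console.log(pickRendition(renditions, 1500).name); // → "480p"
```

Because this decision is re-run as chunks are fetched, the stream can step up or down in quality mid-playback without the viewer doing anything.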
Consideration #2: Encoding

H.264
• considered to be the front-runner / industry standard
• licensed codec via MPEG LA – royalty status remains vague

VP8/WebM
• “free” codec developed by Google
• royalty free for use by content producers

Ogg Theora
• Open Source codec
• considered ‘dated’, and support is diminishing in favor of WebM
The Bottom Line?
To provide full support today to all users and platforms, we will need to consider encoding videos at least twice, in two formats: the H.264 & WebM codecs.
Consideration #3: Security
There are at least two types of security concerns with video delivery on the web:

Script Injections: Since many video controls and captions use some form of scripting, caution must be taken to ensure that they do not introduce security holes that can be exploited.
Consideration #3: Security
http vs. https: Since the video and all related assets (captions, transcripts, video descriptions) are traditionally served to the web browser as discrete files, when we look to embed a video on a secure page, those supplemental files will also need to be served securely, to avoid user-agent warnings about mixed (insecure) content.
The W3C has produced a detailed list of all the user requirements that need to be met for full media accessibility (the Media Accessibility User Requirements document).

At a minimum, users require accessible media player controls (start, stop, pause, mute, etc.), as well as time-synched captions, descriptive audio, and full transcripts of all content delivered.
WCAG 1.2.1 Provide Alternatives for Prerecorded Video: Either an alternative for time-based media or an audio track is provided that presents equivalent information for prerecorded video-only content. (A)

WCAG 1.2.2 Captions: Captions are provided for all prerecorded audio content in synchronized media. (A)

WCAG 1.2.3 Audio Description or Media Alternative: An alternative for time-based media or audio description of the prerecorded video content is provided for synchronized media. (A)

WCAG 1.2.5 Audio Description: Audio description is provided for all prerecorded video content in synchronized media. (AA)
Closed Captions / Open Captions

Closed captions can be turned “on or off” by the end user. Open captions remain on-screen for all viewers.

Captions capture on-screen dialog and basic sound effects (<<clapping>>, <<music>>, etc.).
TTML (Timed Text Markup Language) – XML based (includes DFXP, a profile currently supported by most Flash-based players)

WebVTT (Web Video Text Tracks) – emergent standard, text based, favored by the browsers for native HTML5 delivery

Other formats exist – conversion from one format to the other is a mechanical process.
Transcripts

Loosely defined in the web space. Generally more complete than captions – they include additional on-screen information (descriptions of charts or other visual assets, for example).

Traditionally offered as a complementary piece to the media asset (unlike captions, which are delivered in a synchronous fashion with the media). Usually provided as HTML or downloadable text formats such as accessible PDFs.
Audio Description

Supplemental audio track, provided on demand, which describes on-screen actions to the user. Specified as a WCAG requirement (1.2.5), but delivery technologies remain rudimentary, with little practical support in the browsers today.
Re: WCAG 1.2.5 Audio Description:

At this time, delivering on this AA requirement is severely frustrated by the lack of robust native support in browsers and mobile devices. Many entities are choosing to NOT require this Success Criterion, including the Governments of Canada, Ontario and Quebec.

The Access Board in the US will likely seek to maintain the current requirement in provision 1194.24(b) that ICT hardware support audio description, which might improve the current situation. Fingers crossed.
Requirements: The most labor-intensive aspect of ensuring accessible media is the generation of the text that represents the audio (and in some cases descriptions of on-screen activity), to be subsequently integrated into the final on-screen delivery to the end client.
Videos created from an approved script will already have text to work with; however, when no script is available, the process of ensuring accurate text transcription remains a manual one.

While advances in speech-to-text have come a long way, and continue to evolve in terms of accuracy, at this time the only dependable way of ensuring accuracy is through the involvement of a human transcriber.
Requirements:
• Support for “Closed Captions” on mobile devices today is practically non-existent.
• This means that for the mobile platform today, we will need to be able to offer the end user a choice of the non-captioned video, or an Open Captioned video, prior to the launch of the video itself.
• The same technical limitations currently impact the provisioning of descriptive audio as well.
• Streaming solutions like Adaptive Bit-Rate delivery are emerging as absolute requirements to address different screen resolutions and bandwidth constraints
• There are existing proprietary solutions in the market-place that address some, but not all, needs
• W3C’s Media Source Extensions specification is at Last Call, with minimal browser support today
• The “codec wars” remain at a stalemate, necessitating multiple encodings to support HTML5’s <video> element
• H.264 and WebM codecs are the recommended encodings
• Caution should be exercised with regard to security considerations. Beware of script injection holes
• Videos served from a secure environment will need to ensure that all supporting assets are also served securely
• At a minimum, users require accessible media player controls (start, stop, pause, mute, etc.), as well as time-synched captions, descriptive audio, and full transcripts of all content delivered
• There is currently no native support in the browsers to satisfy WCAG 1.2.5 (AA)
• The creation of text-based alternatives remains for the most part a manual process today
• Delivering captioned videos on mobile currently requires Open Caption alternatives
Kind of disappointing, right? While problems still exist, there is forward movement at a steady pace. Remember, patience is a virtue.
Exciting developments to watch: The A11yMetadata Project seeks to make accessible content easier to discover by including new accessibility metadata properties on web resources.
The A11yMetadata Project
Specifying content features of the resource, such as accessible media and alternatives:
<meta itemprop="mediaFeature" content="alternativeText">
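Following the same itemprop pattern, a video resource might advertise several accessibility features at once. The additional property values below are illustrative assumptions based on the project's proposals, not a published vocabulary:

```html
<div itemscope itemtype="http://schema.org/VideoObject">
  <!-- Hypothetical feature values following the project's itemprop pattern -->
  <meta itemprop="mediaFeature" content="captions">
  <meta itemprop="mediaFeature" content="transcript">
  <meta itemprop="mediaFeature" content="audioDescription">
</div>
```

Metadata like this would let search engines and catalogs filter for video that already meets a user's access needs.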
Exciting developments to watch: The Descriptive Video Exchange (DVX) project focuses on techniques for describing DVD media. CSD will expand DVX to include sources such as YouTube, iTunes U, and other streamed video found on a wide variety of web sites.
Exciting developments to watch: This new project aims to deliver media enhancements in ways that are both visual and non-visual, all of which are accessible and delivered via HTML5 and the Popcorn.js library.
Deliver Accessible Supplemental
Accessible Video in the Enterprise