Accessible Video in the Enterprise



Increasingly, video content is becoming part of the enterprise web environment. HTML5's video element promised to solve many of the issues around serving video on the web. But has it succeeded? And what of accessibility?

This seminar will cover the state of video delivery on the web today, the issues, the promises, and, importantly, how to ensure that it all meets accessibility requirements.

Published in: Technology

  • A Very Brief History: The delivery of video content via the web is still in its infancy, and a standardized solution for ubiquitous and simple delivery remains elusive. With the current transition towards HTML5-based solutions (which break from previous delivery mechanisms), there remains division in media encoding, with some browsers supporting only one encoding (H.264/MP4) and others supporting only a different encoding (VP8/WebM). The picture is further complicated by the fact that some operating systems (iOS) do not support the traditional Flash-based media players that have emerged over the past 5+ years, requiring additional production effort and scripted bridging solutions to achieve full coverage across the multiple delivery channels available to the end consumer today.
  • Video buffering (stalling) negatively impacts the user experience, as research confirms. As we add more video to our service and product offerings, this will increasingly become a delivery issue. Under normal circumstances, web content travels across the internet using the HTTP protocol (you will often see this as the http:// at the start of a web address). HTTP "chops up" content into small packets of data, and the browser collects those packets and re-assembles them on the client screen (well, not exactly, but close enough for this discussion). This is a very efficient and effective way of transmitting static content, but it is not so great when you want to stream content such as video. The result is buffering: the video either gets choppy or "stalls," and you get the spinner on screen as the browser waits for the rest of the packets to arrive. I'm sure you've seen this before.
  • There are a number of different ways of addressing this problem, the most efficient being to use a different type of protocol. The 'standard' for streaming media is the Real Time Streaming Protocol (RTSP), *but* RTSP requires a differently configured web server, and asset addresses are declared with rtsp:// instead of http://. Other, newer formats/protocols such as HTTP Live Streaming, Smooth Streaming and HTTP Dynamic Streaming also seek to address the buffering problem by developing an HTTP-based delivery mechanism; currently there are similar but competing solutions from Apple, Microsoft and Adobe. (These HTTP-based approaches offer another distinct advantage for mobile delivery, where buffering is further complicated by bandwidth availability: 4G vs. 3G vs. EDGE.)
  • Another important consideration is file-size and compression, which will have a direct impact on our videos. With web content being consumed by an increasingly diverse collection of screens and connectivity combinations (from lean-back, large-screen delivery to hand-held mobile screens), there is a need to provide adaptive streams tailored to those platforms. For example, large screen displays (be they desktop or even home television screens) require a higher-definition video stream, resulting in larger files and increased demand for bandwidth. Conversely, serving streaming media to handheld devices over more restricted wireless networks requires smaller file sizes.
  • Systems are emerging today that provide different video compressions on demand (so-called adaptive bit-rate delivery) to address this issue, although gaps in codec support mean that none of these solutions is supported across all browsers. Due to the way these systems work, some additional post-production effort is required, although tools to automate this are emerging. Despite some limitations today, however, a robust, scalable video delivery solution will likely require some form of this type of service.
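To make adaptive bit-rate delivery concrete, an HTTP Live Streaming master playlist simply lists the available renditions with their bandwidth requirements, and the player switches between them as network conditions change. A minimal sketch (the rendition paths and bitrates are hypothetical):

```text
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
```

Each variant URI points at its own media playlist of short HTTP-delivered "chunks," which is how these formats "fake out" streaming over plain HTTP.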
  • This specification extends HTMLMediaElement to allow JavaScript to generate media streams for playback. Allowing JavaScript to generate streams facilitates a variety of use cases like adaptive streaming and time shifting live streams.
  • All web videos must be encoded into a format that works across all browsers and operating systems. Due to complicated legal and “political” reasons, not all browsers and operating systems today support the same encodings, and at this writing positions are fairly well entrenched in all corners, resulting in a need for multiple encodings to support all browsers and platforms. At issue is the need for a license to cover the patent on the H.264 codec, which is at odds with Free and Open Source software such as Linux and the Firefox browser, which for philosophical reasons cannot support that codec in their software stack(s). Firefox however currently provides limited support when hardware decoding is present on the host system (such as most handheld devices today – see:
  • However, for full coverage we should expect to deliver encoded videos in at least both the H.264/MPEG-4 and VP8/WebM formats today. This will likely add a burden to the production process for all videos.
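In practice, serving both encodings from one page is handled by listing multiple source elements inside HTML5's video element; the browser plays the first format it can decode. A minimal sketch (file names are illustrative only):

```html
<video controls width="640" height="360">
  <!-- H.264/MPEG-4 encoding -->
  <source src="seminar.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <!-- VP8/WebM encoding -->
  <source src="seminar.webm" type='video/webm; codecs="vp8, vorbis"'>
  <!-- Fallback content for browsers without HTML5 video support -->
  <p><a href="seminar.mp4">Download the video</a></p>
</video>
```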
  • Videos and other multimedia content are increasingly becoming an integral part of modern web content as an effective means of engaging with our users. However, due to its multi-modal nature, video carries a number of accessibility issues that we must address in a holistic fashion. These issues include:
    * Deaf and hard-of-hearing users: these users cannot hear the audio track of your media presentation. We must ensure that captions, and perhaps even transcripts, are available for their use.
    * Blind and low-vision users: these users cannot see the presentation, although perhaps they can hear it. It is critical that any on-screen text or important visual imagery (charts, graphs, etc.) also be communicated to them. This can be achieved via audio description, or by ensuring that supporting text-based documents (transcripts) are associated with the media asset and easy to access.
    * Mobility-impaired and keyboard-only users: media players must be constructed so that they can be operated via keyboard, avoiding keyboard 'traps', etc. A common, approved media player often addresses these issues at a global level.
    Other issues include addressing the needs of users with atypical color perception, deaf-blind users, or users with cognitive and neurological disabilities; each of these user groups has potential strategies to ensure their access to multimedia content that we should be aware of. The W3C has created a Media Accessibility User Requirements document that outlines in more detail the challenges these groups face, as well as potential strategies to address their needs.
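One small, testable piece of keyboard operability is mapping key presses to player actions before wiring them to DOM event handlers. A minimal sketch in JavaScript, assuming common media-player shortcut conventions (the action names are hypothetical, not from any particular player):

```javascript
// Map a pressed key to an abstract player action.
// Returning null for unhandled keys lets them bubble normally,
// which helps avoid keyboard "traps".
function keyToAction(key) {
  switch (key) {
    case ' ':
    case 'k':          return 'toggle-play';
    case 'm':          return 'toggle-mute';
    case 'ArrowLeft':  return 'seek-back';
    case 'ArrowRight': return 'seek-forward';
    case 'c':          return 'toggle-captions';
    default:           return null;
  }
}

console.log(keyToAction('m'));      // "toggle-mute"
console.log(keyToAction('Escape')); // null
```

A real player would attach this via a keydown listener on a focusable control, but keeping the mapping pure makes it easy to unit-test.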
  • The intent of Requirement 1.2.1 is to make information conveyed by prerecorded audio-only and prerecorded video-only content available to all users. Text-based alternatives for time-based media make information accessible because text can be rendered through any sensory modality (for example, visual, auditory or tactile) to match the needs of the user.
    The intent of Requirement 1.2.2 is to enable people who are deaf or hard of hearing to watch synchronized media presentations. Captions provide the part of the content available via the audio track; they not only include dialogue, but identify who is speaking and convey non-speech information, including meaningful sound effects. It is acknowledged that at present there may be difficulty in creating captions for time-sensitive material, and the author may face the choice of delaying the information until captions are available, or publishing time-sensitive content that is inaccessible to the deaf, at least until captions are available. Over time, tools for captioning, and building captioning into the delivery process, can shorten or eliminate such delays.
  • The intent of Requirement 1.2.3 is to provide people who are blind or visually impaired access to the visual information in a synchronized media presentation. This can be achieved using one of two approaches:
    - One approach is to provide audio description of the video content. The audio description augments the audio portion of the presentation with the information needed when the video portion is not available. During existing pauses in dialogue, audio description provides information about actions, characters, scene changes, and on-screen text that are important and are not described or spoken in the main sound track.
    - The second approach involves providing all of the information in the synchronized media (both visual and auditory) in text form. An alternative for time-based media provides a running description of all that is going on in the synchronized media content, reading something like a screenplay or book. Unlike audio description, the description of the video portion is not constrained to the pauses in existing dialogue: full descriptions are provided of all visual information, including visual context, actions and expressions of actors, and any other visual material. In addition, non-speech sounds (laughter, off-screen voices, etc.) are described, and transcripts of all dialogue are included.
    The intent of Requirement 1.2.5 is likewise to provide people who are blind or visually impaired access to the visual information in a synchronized media presentation, via audio description as described above.
  • There are a number of different 'time-stamping' formats used to deliver synchronized captions on the web today. The W3C has already produced an XML-based format (TTML, the Timed Text Markup Language), a subset of which (DFXP) is currently supported by most Flash-based players. However, other time-stamp formats are also in play at this time: .SRT and its successor WebVTT are emerging as the de-facto standard for native HTML5 delivery, along with SMPTE-TT (a superset of TTML developed by the Society of Motion Picture and Television Engineers that has support in IE 10), as well as .SCC, a binary format that is required for iPhone captioning support today.
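To make the time-stamping concrete, a WebVTT file is plain text: a WEBVTT header followed by cues, each with a start time, an end time, and the caption text. A minimal sketch (the dialogue is invented for illustration):

```text
WEBVTT

00:00:01.000 --> 00:00:04.000
<v Presenter>Welcome, everyone, to today's seminar.

00:00:04.500 --> 00:00:06.500
[audience applause]
```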
  • Along with the obvious need for captioned videos, our accessibility requirements also call for the provisioning of transcripts. There are a number of experimental examples of "interactive" transcripts that provide enhanced functionality when delivered in sync with the video. Examples include follow-along highlighting, hyperlinked transcripts (both static and timed), and the ability to highlight sections for truncated embedding (useful with longer-form videos). The potential marketing uses of this type of interactive transcript should be considered a 'bonus' when investigating any solution moving forward.
  • Gov. of Canada = WCAG 2.0 AA minus complex maps (1.1.1), live video captions (1.2.4) and audio descriptions (1.2.5, unless related to health or safety)
    Ontario Gov. (AODA) = full WCAG 2.0 AA, except for 1.2.4 and 1.2.5
    Quebec Gov. (SGQRI 008-03) = limited to 1.2.1, 1.2.2 and 1.2.3 when it comes to audio/video (p. 13)
  • While many aspects of the creation of accessible videos can be automated and systemized, the conversion of the spoken word to a text-based format remains a manual process, especially when specialized terms or other legal requirements demand accuracy. Whether done in real time (CART services) or during post-production, it is a specialized skill-set that can either be brought in-house or out-sourced to third-party firms that specialize in this service. While pricing is variable, it currently appears to be leveling off at approximately $60 - $100 per hour of video content when out-sourced (often dependent on volume of content and turn-around time). Once the textual equivalent is produced, there are numerous services and systems that can apply the time-stamping to the content for final delivery.
  • For a variety of reasons (including lower CPU capacity on phones, bandwidth and network restrictions, and the complexity of keeping separate text tracks and media tracks in sync), support for "Closed Captions" on mobile devices is practically non-existent today. This means that for the mobile platform, we will need to offer the end user a choice between the non-captioned video and an Open Captioned video.

    1. 1. Accessible Video in the Enterprise
    2. 2. A (Very) Brief History 1999 – 2005: Competing, incompatible delivery platforms 2005: Launch of YouTube & Flash-based player brings some commonality to delivery platform
    3. 3. A (Very) Brief History 2012/2014: W3C’s Standardization of HTML5 Apple drops Flash support / advances in Major Browsers
    4. 4. Consideration #1 - Streaming Users Start Giving Up on Streaming Video If It Takes Two Seconds to Load: ( giving-up-on-streaming-video-if-it-takes-two-seconds-to-load) Research Data: krishnan.pdf University of Massachusetts, Amherst & Akamai Technologies
    5. 5. Consideration #1 - Streaming Protocols: HTTP (hyper text transfer protocol): • chops web pages into packets for fast, asynchronous delivery RTSP (Real Time Streaming Protocol): • delivers continuous stream of multimedia data • requires specialized streaming media server Adaptive Bit-Rate - HTTP Live Streaming, Smooth Streaming and HTTP Dynamic Streaming: • HTTP Live Streaming is backed by Apple, Smooth Streaming is backed by Microsoft and HTTP Dynamic Streaming is backed by Adobe • emergent solutions that are not yet standardized – not all platforms are supported • “fakes out” streaming by delivering “chunks” of content delivered via HTTP that self-adjusts delivery packets • requires additional production overhead and asset management
    6. 6. Consideration #1 - Streaming Adaptive bit-rate: This delivery method is beginning to have a massive impact on every aspect of Internet video delivery because it allows the stream to actually adapt the video experience to the quality of the network and the device's CPU.
    7. 7. Consideration #1 - Streaming Essentially, the video stream can increase or decrease the bit rate and resolution of the video (its quality) in real time so that it’s always streaming the best possible quality the available network connection can support. The better the network connection, the better the video image quality. The fact that the stream handles all of this complexity means the mobile video viewer doesn’t have to do anything; everything is left to the stream and the player.
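The selection logic described above can be sketched as a simple function: given a ladder of renditions and a measured bandwidth, pick the highest bitrate that fits. A minimal illustration in JavaScript (the ladder values are hypothetical; real players also weigh buffer health and CPU capacity):

```javascript
// Pick the highest-bitrate rendition that fits within the measured
// bandwidth, falling back to the lowest rendition when none fit.
// Bitrates are in bits per second.
function pickRendition(renditions, measuredBps) {
  // Sort ascending by bitrate so we can walk up to the best fit.
  const sorted = [...renditions].sort((a, b) => a.bitrate - b.bitrate);
  let choice = sorted[0]; // worst case: lowest quality
  for (const r of sorted) {
    if (r.bitrate <= measuredBps) {
      choice = r;
    }
  }
  return choice;
}

const ladder = [
  { name: '1080p', bitrate: 5_000_000 },
  { name: '720p',  bitrate: 2_500_000 },
  { name: '360p',  bitrate:   800_000 },
];

console.log(pickRendition(ladder, 3_000_000).name); // "720p"
console.log(pickRendition(ladder, 100_000).name);   // "360p"
```

An adaptive player re-runs this decision continuously as it measures throughput, which is what lets the stream "self-adjust" chunk by chunk.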
    8. 8. Consideration #1 - Streaming
    9. 9. Consideration #1 - Streaming
    10. 10. Consideration #1 - Streaming
    11. 11. Consideration #2: Encoding Considerations H.264: considered to be the front-runner / industry standard Licensed codec via MPEG LA – Royalty status remains vague WebM: “free” codec developed by Google Royalty free for use by content producers Ogg Theora: Open Source codec Considered ‘dated’ and support diminishing in favor of WebM
    12. 12. Consideration #2: Encoding Considerations The Bottom Line? To provide full support today to all users and user- platforms we will need to consider encoding videos at least twice, in 2 formats. Recommendation? H.264 & WebM codecs
    13. 13. Consideration #3: Security There are at least 2 types of security concerns with video delivery on the web: Script Injections: Since many video controls and captions use some form of scripting, caution must be taken to ensure that they do not introduce security holes that can be exploited.
    14. 14. Consideration #3: Security http vs. https: Since the video and all related assets (captions, transcripts, video descriptions) are traditionally served to the web browser as discrete files, when we look to embed a video on a secure page, those supplemental files will also need to be served securely to avoid User Security Warnings.
    15. 15. Consideration #4: Accessibility The W3C have produced a detailed list of all requirements various user- groups would need for full and complete access to multi-media.
    16. 16. Consideration #4: Accessibility At a minimum, users require accessible media player controls (start, stop, pause, mute, etc), as well as time-synched captions, descriptive audio, and full transcripts of all content delivered.
    17. 17. Consideration #4: Accessibility WCAG 1.2.1 Provide alternatives for Prerecorded Video: Either an alternative for time-based media or an audio track is provided that presents equivalent information for prerecorded video-only content. (A) WCAG 1.2.2 Captions: Captions are provided for all prerecorded audio content in synchronized media. (A)
    18. 18. Consideration #4: Accessibility WCAG 1.2.3 Audio Description or Media Alternative: An alternative for time-based media or audio description of the prerecorded video content is provided for synchronized media. (A) WCAG 1.2.5 Audio Description: Audio description is provided for all prerecorded video content in synchronized media. (AA)
    19. 19. Definitions: Closed Captions / Open Captions: Closed captions can be turned “on or off” by the end user Open Captions remain on-screen for all users Captions capture onscreen dialog and basic sound effects (<<clapping>>, <<music>>, <<laughter>>, etc.) Resource: captioningkey/
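In HTML5, closed captions are wired up with the track element, which points at a caption file and lets the native player UI toggle captions on and off. A minimal sketch (file names are illustrative only):

```html
<video controls>
  <source src="seminar.mp4" type="video/mp4">
  <source src="seminar.webm" type="video/webm">
  <!-- "default" turns the captions on initially; the user can toggle them off -->
  <track kind="captions" src="seminar.en.vtt" srclang="en"
         label="English" default>
</video>
```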
    20. 20. Definitions: Caption Formats: TTML (Timed Text Markup Language) – XML based (includes DFXP, a standard for Flash players) WebVTT (Web Video Timed Text) – emergent standard, text based, favored by browser vendors Other formats exist – conversion from one format to the other is a mechanical process
    21. 21. Definitions: Transcripts: Loosely defined in the web space Generally are more complete than captions – includes additional on-screen information (descriptions of charts or other visual assets for example) Traditionally offered as a complementary piece to the media asset (unlike captions which are delivered in a synchronous fashion with the media) Usually provided as HTML or downloadable text formats such as accessible PDF
    22. 22. Definitions: Descriptive Audio: Supplemental audio track, provided on demand, which describes on-screen actions to the non-sighted. Specified as a WCAG requirement (1.2.5), delivery technologies remain rudimentary with little practical support in the wild.
    23. 23. Consideration #4: Accessibility Re: WCAG 1.2.5 Audio Description: At this time, delivering on this AA requirement is severely frustrated by the lack of robust native support in browsers and mobile devices. Many entities are choosing NOT to require this Success Criterion, including the Governments of Canada, Ontario and Quebec. The Access Board in the US will likely seek to maintain the current requirement in provision 1194.24(b) that ICT hardware support audio description, which might improve the current situation. Fingers crossed.
    24. 24. Accessibility Production Requirements: The most labor-intensive aspect of ensuring accessible media is the generation of the text that represents the audio (and in some cases descriptions of on-screen activity), to be subsequently integrated into the final on-screen delivery to the end client. Videos created from an approved script will already have text to work with, however when no script is available the process of ensuring accurate text transcription remains a manual process.
    25. 25. Accessibility Production Requirements: While advances in speech to text have come a long way, and continue to evolve in terms or accuracy, at this time the only dependable way of ensuring accuracy is through the involvement of human input.
    26. 26. Accessibility Production Requirements: • Support for “Closed Captions” on mobile devices today is practically non-existent. • This means that for the mobile platform today, we will need to be able to offer the end user a choice of the non-captioned video, or an Open Captioned video prior to the launch of the video itself. • The same technical limitations currently impact the provisioning of descriptive audio as well.
    27. 27. Recap: • Streaming solutions like Adaptive Bit-Rate delivery are emerging as absolute requirements to address different screen resolutions, bandwidth considerations, etc. • There are existing proprietary solutions in the marketplace that address some, but not all, needs • W3C’s Media Source Extensions specification is at Last Call, with minimal browser support today
    28. 28. Recap: • The “codec wars” remain at a stalemate, necessitating multiple encodings to support HTML5’s <video> element • H.264 and WebM codecs are the recommended choices today • Caution should be exercised with regard to security considerations. Beware of script injection holes • Videos served from a secure environment will need to ensure that all supporting assets are also served securely
    29. 29. Recap: • At a minimum, users require accessible media player controls (start, stop, pause, mute, etc), as well as time-synched captions, descriptive audio, and full transcripts of all content delivered. • There is currently no native support in the browsers to satisfy WCAG 1.2.5 (AA) • The creation of text based alternatives remains for the most part a manual process today • Delivering Captioned videos on mobile currently requires Open Caption alternatives
    30. 30. Recap: Kind of disappointing, right? While problems still exist, there is forward movement at a decent pace. Remember, patience is a virtue.
    31. 31. Exciting developments to watch: The A11yMetadata Project seeks to extend schema.org by including new properties to address the accessibility and discoverability of resources on the Web.
    32. 32. The A11yMetadata Project Specifying content features of the resource, such as accessible media and alternatives: <meta itemprop="mediaFeature" content="alternativeText"> PROPOSED VALUES: alternativeText, audioDescription, braille, captions, ChemML, describedMath, displayTransformability, haptic, highContrast, largePrint, latex, longDescription, MathML, musicBraille, musicLargePrint, nemethBraille, signLanguage, structuralNavigation, tactileGraphic, tactileObject, transcript
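In context, the proposed properties would sit inside a schema.org item describing the media resource. A hypothetical sketch, assuming the itemprop names from the proposal above (which may change as the specification evolves; the file name is illustrative):

```html
<div itemscope itemtype="http://schema.org/VideoObject">
  <meta itemprop="name" content="Accessible Video in the Enterprise">
  <!-- Declared accessibility features, per the A11yMetadata proposal -->
  <meta itemprop="mediaFeature" content="captions">
  <meta itemprop="mediaFeature" content="transcript">
  <video controls src="seminar.mp4"></video>
</div>
```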
    33. 33. Exciting developments to watch: The Descriptive Video Exchange project focuses on crowd-sourced techniques for describing DVD media. CSD will expand DVX to include Internet-based media such as YouTube, iTunes U, and other streamed video found on a wide variety of web sites.
    34. 34. Exciting developments to watch: Using HTML5 and JavaScript to Deliver Accessible Supplemental Materials: this new project aims to demonstrate the inclusion of enhancements in ways that are both visual and non-visual, all of which are screen-reader accessible and delivered using HTML5, JavaScript and the Popcorn.js HTML5 media framework.
    35. 35. Thank you. Questions? Contact me. Accessible Video in the Enterprise, September 2013