This document discusses approaches for integrating accessible multimedia documents into digital libraries. It describes two approaches - one based on extending the DAISY format and one based on transforming MultiReader documents into a client/server distribution model. Both approaches aim to provide synchronized multimedia content and personalization for users with different abilities. The document also discusses collaborative production of accessible documents and the roles of authors and librarians in ensuring high quality resources.
Integration of Accessible Documents into Digital Libraries of Tomorrow
Alexander Haffner, Gerhard Weber
Technische Universität Dresden
01062 Dresden, Germany
alexander.haffner@inf.tu-dresden.de, gerhard.weber@inf.tu-dresden.de
Digital libraries mainly process digital text resources and aim to ensure long-term
preservation. Future systems will additionally have to focus on accessible multimedia
dissemination and move beyond traditional distribution channels. We have developed two
approaches towards a client/server distribution model for an accessible digital library,
one based on an extension of DAISY and one based on transforming MultiReader documents.
Both approaches rely on the ingest of comprehensive, semantically rich content. We
consider how the roles of authors change and which responsibilities they take on to
actually increase resource quality. Furthermore, we discuss the resulting benefits for
end users from multimedia and the related modalities of use.
1. Introduction
Recently, the meaning of the term “digital library” has expanded considerably:
development has moved from simple online catalogues to archival information systems.
An archival information system is an archive, consisting of an organization of people
and systems, that has accepted the responsibility to preserve information and make it
available for a designated community [OAIS02].
Each reader community is affected by the availability of digital resources. According
to the European Disability Forum (EDF), approximately 50 million people in the European
Union have a disability [EDF08]. For example, people who depend on a wheelchair can
avoid mobility barriers by visiting digital libraries instead of conventional libraries.
In contrast, print-disabled people can only access content that is explicitly available
as a digital asset. For the majority of people with a disability, digital resources
offer a variety of advantages for comfortable document use. Instead of delivery by mail,
digital libraries support near-instant access. Reengineering the ingest, archival and
dissemination processes is the challenge for a modern library-for-all.
This paper discusses access strategies for rich multimedia content. Multimedia solutions
may include synchronized equivalent media streams that already match user needs or can
be adapted to them. Production and ingest responsibilities of authors and librarians may
be based on a distributed system that supports archiving and enables the distribution of
high-quality resources.
2. Access by consumers
Consumers perform an online document search if they benefit from the bibliographic
metadata. Often they intend to read, or at least browse, a document that suits their
interest immediately. Digital library search functionality is mostly offered through a
web-based user interface in which forms assist users in a targeted search. Techniques
and guidelines for web accessibility may therefore already cover the accessibility
aspects of searching. In contrast, access to the actual resources is not necessarily
covered by such guidelines.
[PET05] determined the requirements of print-disabled users (blind, partially sighted,
dyslexic, and hearing-impaired users) in document handling. The results show that each
group has problems using standard print but also has specific requirements for the use
of digital documents. For example, blind users are not able to use graphics, whereas
graphics support understanding for all other readers; partially sighted readers need
scalable text and adjustable contrast; audio without captioning is useless for deaf
users; and line spacing that is too small lets dyslexic readers get lost in the text.
In particular, video and audio demand different accessibility approaches to serve each
target group.
The use of multimedia documents by mainstream readers as well as by print-disabled
readers requires content that matches the constrained modalities of use and offers
personalization of the document presentation.
2.1 Digital Talking Books in DAISY format
A Digital Talking Book (DTB) is a multimedia document developed also for dis-
abled users. DTBs use Digital Accessible Information System (DAISY) as a
smart format to integrate and synchronize text, audio and images. ANSI/NISO
Z39.86-2005 standard [DTB05] defines the format and content of the electronic
file set that comprises a Digital Talking Book.
The XML based specification provides producers with the ability to structure a
book in great detail. Compared to HTML mark-up, XML increases mark-up op-
tions and makes more detailed structure and some nesting possible. The corre-
sponding structure allows readers both global navigation (through pages, chap-
ters, headings etc.), and local navigation within a document at a very fine granu-
larity (on a paragraph, sentence or word level as well as within a table).
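The two navigation levels can be illustrated with a minimal Python sketch. It assumes a
simplified, namespace-free DTBook-like fragment (real DTBook files declare an XML
namespace and a richer structure); the sample content and the function names
global_navigation and local_navigation are hypothetical and not part of the standard.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free DTBook-like sample (hypothetical content).
SAMPLE = """
<dtbook>
  <book>
    <bodymatter>
      <level1 id="ch1">
        <h1>Introduction</h1>
        <p><sent id="s1">First sentence.</sent> <sent id="s2">Second sentence.</sent></p>
        <level2 id="ch1.1">
          <h2>Background</h2>
          <p><sent id="s3">Third sentence.</sent></p>
        </level2>
      </level1>
      <level1 id="ch2">
        <h1>Methods</h1>
        <p><sent id="s4">Fourth sentence.</sent></p>
      </level1>
    </bodymatter>
  </book>
</dtbook>
"""

# Heading element names mapped to their nesting depth.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def global_navigation(root):
    """Return (depth, heading text) pairs in document order (a table of contents)."""
    return [(HEADINGS[el.tag], el.text) for el in root.iter() if el.tag in HEADINGS]


def local_navigation(root, level_id):
    """Return the sentence ids found in paragraphs directly inside one level."""
    for level in root.iter():
        if level.get("id") == level_id:
            return [s.get("id") for p in level.findall("p") for s in p.findall("sent")]
    return []


root = ET.fromstring(SAMPLE)
toc = global_navigation(root)            # heading-level (global) navigation
sentences = local_navigation(root, "ch1")  # sentence-level (local) navigation
```

A player could use the first list to jump between chapters and the second to step
through sentences within the current chapter.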
DAISY players offer readers personalization of the text display as well as adjustment
of the audio playback speed within a certain range. Additionally, a player can
continuously highlight text phrases. Typically, DTBs contain synthetic or human-narrated
voice.
Figure 1 shows an example of synchronized media playback. The illustration demonstrates
the dependencies between permitted DTB items. At most three media objects (one of which
is time-dependent) may be presented in parallel at any moment of reading.
[Figure 1: timeline (axes: media m over time t) with the tracks dtb:audio (<audio>
clips), dtb:text (<dtbook>) and dtb:image (<image> items)]
Figure 1: example for synchronized multimedia in DAISY
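The constraint of at most three parallel media objects can be checked with a small
Python sketch. The timeline values, file names and the helpers active_at and
max_parallel are hypothetical illustrations and are not taken from the DAISY
specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaObject:
    kind: str     # "audio", "text" or "image"
    src: str
    begin: float  # seconds
    end: float    # seconds; the object is active in [begin, end)


# Hypothetical timeline: one text track, consecutive audio clips, two images.
TIMELINE = [
    MediaObject("text", "book.xml", 0.0, 30.0),
    MediaObject("audio", "a1.mp3", 0.0, 10.0),
    MediaObject("audio", "a2.mp3", 10.0, 20.0),
    MediaObject("audio", "a3.mp3", 20.0, 30.0),
    MediaObject("image", "fig1.png", 5.0, 15.0),
    MediaObject("image", "fig2.png", 18.0, 30.0),
]


def active_at(timeline, t):
    """Return the media objects presented at time t."""
    return [m for m in timeline if m.begin <= t < m.end]


def max_parallel(timeline):
    """Return the largest number of simultaneously active objects.

    The count can only increase at a begin time, so checking those
    instants is sufficient.
    """
    return max(len(active_at(timeline, m.begin)) for m in timeline)


peak = max_parallel(TIMELINE)
kinds_at_16s = [m.kind for m in active_at(TIMELINE, 16.0)]
```

In this sample the peak is three objects (text, one audio clip and one image), which
matches the limit described for Figure 1.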
The main deficit of DAISY is its lack of video integration, which particularly affects
people with hearing loss. For deaf people, sign language is the native language.
Consequently, specialized publishers produce navigable sign language videos to make
documents more readable, utilizing a variety of technical approaches and typically
without XML-based markup.
As early as 2005, the MultiReader project recognized the demand for corresponding media
alternatives and developed its own format-for-all, utilizing the same industry-standard
file formats as DAISY.
Recently, the DAISY Consortium likewise observed that several media streams are missing,
particularly those matching the needs of hearing-impaired users.
Alongside the design of SMIL 3.0, a SMIL 3.0 DAISY profile was developed. The profile
meets a variety of additional requirements. Consequently, the DAISY Consortium announced
a revision of ANSI/NISO Z39.86. The revised standard will address both authoring (master
creation) and distribution requirements [DAYNL08].
2.2 MultiReader Project
The MultiReader document model is based on enriched media documents and the separation
of content from presentation. A MultiReader document consists of a main file and several
source files, which are rich media documents containing both XHTML and a kind of SMIL
mark-up to identify single media objects. XHTML+TIME combines the well-known Hypertext
Mark-up Language (HTML) with the timing and synchronization mechanisms of SMIL. Because
these are XML languages, documents can be processed further with readily available XML
parsers and XSLT engines. Furthermore, XHTML+TIME documents can be displayed by
industry-standard web browsers. MultiReader documents are read through a reading program
with novel user interface elements.
Media objects are identified using the XHTML mechanism for linking a description of
content with a description of presentation (Cascading Style Sheets, CSS). In XHTML each
style is identified through a name, which also serves as a microformat. In addition,
MultiReader specifies classes of media objects to describe content that will be
transformed according to user needs. Hierarchical nesting of MultiReader classes is
possible through nested mark-up (<span> tags). Figure 2 shows such a nesting for a video
played together with background sound, followed by music. The narration of the video is
enriched by audio description for blind users. Captions and subtitles may be offered to
deaf users instead.
[Figure 2: timeline (axes: media m over time t) showing the classes cvideo (<video>),
ctsynth (<audio>), cvbgsound, cvmusic, cvaudiodesc and cvnarration (<audio>), cthigh
(<set>), and cvcaption and cvsubtitle (text)]
Figure 2: example for synchronized multimedia in MultiReader documents [SPI08]
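How nested class mark-up allows content to be transformed according to user needs can be
sketched in Python. The class names are those of Figure 2, but the XHTML fragment, the
file names and the helper functions are hypothetical assumptions; the actual MultiReader
transformation is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment using MultiReader class names from Figure 2.
SAMPLE = """
<div>
  <span class="cvideo">
    <span class="cvnarration">narration.mp3</span>
    <span class="cvaudiodesc">description.mp3</span>
    <span class="cvcaption">Caption text</span>
    <span class="cvsubtitle">Subtitle text</span>
  </span>
</div>
"""


def prune(element, hidden_classes):
    """Remove, in place, every nested <span> whose class is in hidden_classes."""
    for child in list(element):  # copy the child list so removal is safe
        if child.tag == "span" and child.get("class") in hidden_classes:
            element.remove(child)
        else:
            prune(child, hidden_classes)


def remaining_classes(element):
    """List the class names of all <span> elements still in the tree."""
    return [el.get("class") for el in element.iter("span")]


# Deaf users: drop the audio alternatives, keep captions and subtitles.
deaf_view = ET.fromstring(SAMPLE)
prune(deaf_view, {"cvnarration", "cvaudiodesc"})

# Blind users: drop the text alternatives, keep narration and audio description.
blind_view = ET.fromstring(SAMPLE)
prune(blind_view, {"cvcaption", "cvsubtitle"})
```

Because the classes are ordinary attribute values, the same selection could equally be
expressed in XSLT or CSS; the Python version only makes the selection logic explicit.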
In essence, formats for accessible multimedia documents do exist. But a digital library
of tomorrow should not only support search and download mechanisms; it should also
support platform-independent, web-based playback of multimedia resources. Neither
MultiReader nor DAISY was developed with this intention.
3. Web based playback solutions
This section introduces two enhancements that improve access to multimedia
documents-for-all and ensure independence from particular player technologies. In both
approaches, support for a multimodal usage concept plays an important role in serving
users with disabilities.
3.1 Timesheet based MultiReader solution
In contrast to the traditional MultiReader concept, [SPI08] replaces the use of
HTML+TIME technologies with a different, browser-independent approach to multimedia
synchronisation. The redeveloped system is based more strictly on SMIL, but implements
SMIL through SMIL Timesheets [STS08] and allows validation as an XHTML file. Timesheets
provide absolute, relative and event-based timing control of multimedia items in a web
page. The advantage of Timesheets lies in their external and modular specification,
similar to CSS. All timing specifications can be held in one file and are attached to
embedded page elements at runtime. Timesheets are based on JavaScript, a technology
which screenreaders can handle successfully if it is used carefully. A Timesheet engine
supports client-side events and the corresponding media handling. It is dynamically
integrated into a distributed MultiReader solution by relying on AJAX.
The distributed MultiReader system consists of an application server that maintains a
user profile. The implementation is based on Cocoon. According to the user profile, the
application server composes the required media objects into a MultiReader document.
These documents are located in a separate archival storage system which supports
streaming. The Cocoon application server transforms media containers by XSLT and
arranges all necessary components in an accessible web page. This means that
personalization is completed on the server. As a result, users read web pages that meet
their needs more adequately. Visual media are adjusted by individual CSS. The multimedia
items are embedded only as link objects. Streaming of multimedia data is handled by an
accessible Flash player, which is controlled by the timesheet with respect to temporal
requirements and by mouse- or keyboard-operated buttons with respect to user requests
for extra reading time.
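The idea of adjusting visual media through individual CSS on the server can be sketched
as follows. This is a minimal Python illustration under stated assumptions: the profile
keys (font_scale, line_height, foreground, background) and the function build_css are
hypothetical, and the actual system performs its transformations with Cocoon and XSLT.

```python
def build_css(profile):
    """Render a minimal user style sheet from a profile dictionary.

    Missing profile keys fall back to neutral defaults.
    """
    rules = {
        "font-size": f"{profile.get('font_scale', 1.0) * 100:.0f}%",
        "line-height": str(profile.get("line_height", 1.2)),
        "color": profile.get("foreground", "#000000"),
        "background-color": profile.get("background", "#ffffff"),
    }
    body = "\n".join(f"  {prop}: {value};" for prop, value in rules.items())
    return "body {\n" + body + "\n}"


# Hypothetical profile of a partially sighted reader: larger text, wider
# line spacing and a high-contrast colour scheme.
low_vision_profile = {
    "font_scale": 1.5,
    "line_height": 2.0,
    "foreground": "#ffff00",
    "background": "#000000",
}
css = build_css(low_vision_profile)
```

Keeping the profile separate from the style sheet means that the same document can be
served to readers with different needs without changing its content.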
Furthermore, besides standard browser navigation, documents offer navigation support
through a table of contents and indices for exploring the hierarchical document
structure. A variety of assistance features offer quick help. The enhanced MultiReader
system has been tested successfully for WCAG 1.0 level AA conformance as well as for
usability in a pilot test by a blind user.
3.2 Enhanced DAISY web player
The enhanced DAISY format of [EBE08] contains, besides the traditional dtbook textual
content file and the recorded audio files, a variety of specific audio tracks, an
additional video stream, a sign language stream, and different subtitles and captions
for videos as primary media. Combinations of single media objects are aimed at the needs
of different user groups.
Each user has a client-side user profile which determines an initial arrangement of the
media needed for synchronization. To this end, the client requests the original
DAISY-DOM from an archival storage system and rebuilds it into an adapted, personalized
document. Afterwards, every user has the opportunity to personalize the document
further, without assistance, in the player.
The client-side player application is completely embedded in a Flex-based [FLEX08] Flash
environment in order to play media streams properly on heterogeneous systems. The actual
text of the dtbook file is displayed in an HTML area. This approach allows the reuse of
standard web accessibility features, so the text can also be read by a screenreader if
equivalent audio files are missing. Furthermore, the display can be adapted by CSS. The
major advantage of Flash is its comfortable and accessible playback of audio and video
files, similar to the timesheet-based MultiReader.
3.3 Comparison
Unlike static web pages, both approaches support optional placement and plasticity for
all included media objects. Plasticity of the user interface primarily ensures resizing,
but aspects of overlaying and free positioning (e.g. of subtitles on the display) are
also addressed through a client/server system.
On the system side, every media object can be selected and streamed individually.
Consequently, a client selects only the necessary media from the streaming server. Users
experience a better degree of control, also with respect to additional time for reading.
The streaming approach of both [SPI08] and [EBE08] enables partial playback of large
files with almost no delay; in particular, streamed Flash requires only little buffering
before playback can start. In contrast, approaches that download video or audio files
would cause undesired pauses.
The main difference arises from the support for screenreaders. The MultiReader-based
approach is based on HTML and JavaScript. Many screenreaders support HTML sufficiently
well and reliably follow techniques for accessible web pages. In contrast, Flex supports
screenreaders only within the Windows operating system, by addressing MSAA. Table 1
summarizes this comparison.
Table 1: Comparison of the two approaches
Criterion | Client/Server MultiReader | Enhanced DAISY
Web-based distribution | personalized by server | personalized by client
W3C standards | XHTML incl. JavaScript | requires standard update
Readers | blind, partially sighted, dyslexic, deaf and hearing-impaired users | blind, partially sighted, dyslexic, deaf and hearing-impaired users
Streamed time-dependent media | Flash | Flash
Screenreader support | independent | independent
Pipeline | Cocoon | DAISY Pipeline applicable
Operating systems | independent | MS Windows family
4. Collaborative accessible multimedia production
Producing accessible multimedia documents in either DAISY or MultiReader format demands
highly skilled experts in accessible document processing, but it can be assisted by
automatic processing technologies.
4.1 Textual source generation
The typical book author mainly produces text, including several images, for print or
electronic publication. However, a generated source document should always be the
starting point for a fully accessible multimedia publication, no matter whether textual
content, audio or video is used as primary media.
Our considerations concentrate on textual content as primary media.
Structured information is the first big step towards high-quality accessible
information. A document whose internal structure can be defined and whose elements can
be isolated and classified, without losing sight of the overall structure of the
document, is a document that can be navigated [DPA08]. Formats that provide the
potential to create such structured content should ideally be based on XML.
So how can an author without any particular knowledge in the field of accessibility
reach these objectives?
Many office applications, such as Microsoft Office 2007 or OpenOffice, internally use
XML-based formats. The author's responsibility is merely to mark up content
appropriately by applying adequate style templates. For example, if an author uses
heading styles instead of big, bold fonts, the authoring tool can infer the internal
document structure. Of course, styling is not the only task: additional tasks such as
providing alternative image descriptions allow graphical content to be verbalised in the
words of the actual author. Accessible textual content production in specific authoring
environments has already been discussed in a variety of publications, so authors can
find help for their particular environment in mutually agreed guidelines.
Our interest lies more in the resulting ‘born-digital’ XML document and the
opportunities it offers for reuse and document processing in a library-for-all.
4.2 Automatic vs. manual accessible document preparation
The preparation of generated source documents should not be part of the author's work in
a specific authoring tool. To achieve consistently high-quality results, authors, or
publishers as their representatives, have to ingest the XML-based source documents for
common processing in an adapted environment. This environment is our archival
information system. Different import filters allow source documents to be transformed
consistently into specific resources for adequate archiving in a simple and economical
manner. For instance, a qualified OpenDocument-to-PDF filter ensures that all document
structures and tagging are carried over, which most solutions on the free market cannot
offer.
What about filters for the generation of accessible multimedia documents? Here, too, the
first step is a transformation of the produced source documents into corresponding
mark-up-based textual content files and related exports of multimedia items. Of course,
in the context considered here, multimedia primarily refers to graphical content.
In the past, DTBs contained only human-narrated voice and no related textual content.
Today, textual content can be converted automatically into synthetic voice by
text-to-speech solutions. Synthetic speech is accepted by users in some domains, for
example for the timely production of a TV guide or for items that are not in high demand
among users with disabilities.
The DAISY Pipeline offers a variety of filters and additional validation components to
safeguard high-quality transformation. Consequently, generated source documents are
transcribed into DAISY master documents, which may need to be transformed into a
specific delivery format (e.g. electronic or printed Braille, e-text, DAISY text-only
DTB) [DAPI08].
For audio production, a narrator component performs a text-to-speech transformation and
adds the corresponding mark-up to a DTB for the synchronisation of textual content and
audio. Furthermore, the DAISY Pipeline converts DTBs between different DAISY standards.
This approach can support long-term preservation in archival information systems.
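The kind of synchronisation mark-up that such a narrator step produces can be sketched
in Python. The example pairs each sentence with an audio clip range in SMIL-style
<par> elements; the sentence durations stand in for the output of a text-to-speech
engine, and the file names and the functions clock and build_seq are hypothetical. It is
an illustration of the principle, not the DAISY Pipeline's actual implementation.

```python
import xml.etree.ElementTree as ET


def clock(seconds):
    """Format seconds as a clock value of the form h:mm:ss.mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_seq(sentences, text_file, audio_file):
    """Build a <seq> of <par> elements, one per sentence.

    sentences: list of (sentence id, duration in seconds); the durations
    are assumed to come from the speech synthesis step.
    """
    seq = ET.Element("seq")
    start = 0.0
    for sent_id, duration in sentences:
        par = ET.SubElement(seq, "par")
        # The text reference points at the sentence in the content file.
        ET.SubElement(par, "text", src=f"{text_file}#{sent_id}")
        # The audio reference selects the matching range of the recording.
        ET.SubElement(par, "audio", src=audio_file,
                      clipBegin=clock(start), clipEnd=clock(start + duration))
        start += duration
    return seq


seq = build_seq([("s1", 2.5), ("s2", 1.25)], "book.xml", "narration.mp3")
smil_fragment = ET.tostring(seq, encoding="unicode")
```

Each <par> groups the media that play in parallel, while the enclosing <seq> plays the
groups one after another, which is what allows a player to highlight the sentence that
is currently being spoken.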
Uniting text-based sources with human-narrated voice is much more difficult. Studio
recording relies on highly skilled narrators, and the subsequent synchronisation is
mostly done by hand. As a seminal development, a speech-to-text transformation could
bypass the troublesome synchronisation work and be a step towards almost automatic
production. Currently, no application provides such functionality for accessible
multimedia production.
The most difficult and probably most expensive task in accessible multimedia production
is the generation of sign language videos. Deaf people do not yet accept artificial sign
language performed by avatars. The recording effort is similar to that for human
narrators of text, but synchronisation is much more expensive because recognition and
transformation tools are missing. The only applicable approach with sustainable costs is
the use of a lexicon to describe written words in sign language.
We also want to mention the issue of subtitling videos or audio as primary media. In the
library of tomorrow, video and audio productions will be substantial assets. The actual
spoken text can likewise be extracted by speech recognition. But separating single audio
tracks without the creator's support is almost impossible. Furthermore, blind and
hearing-impaired users need additional descriptions that definitely have to be added by
highly skilled accessibility staff.
4.3 Ingest strategies
Obviously, the production of fully accessible multimedia still involves a lot of manual
and time-intensive work. Additionally, we have illustrated that a large number of
contributors is needed to produce a resource or its parts.
Therefore, distributed specialists have to be provided with a single access point for
common authoring. The MultiReader project introduced the MultiWriter as a web-based
authoring tool for progressive and collaborative document generation [SZC04].
Unfortunately, this approach does not start at document production by ordinary authors
who pay little attention to accessibility.
A more advanced approach is an iterative ingest of items produced in a distributed
manner, in combination with filter techniques. Consequently, automatic resource
production can take place on generated source documents as well as in additional
processing steps. Accessibility experts then only extend the resource by additionally
produced media items and ensure their synchronisation.
Similar to MultiWriter, an archival information system should achieve distributed and
iterative ingest processes by providing a web-based interface that supports
collaborative work by contributors.
Another innovative approach to increasing document quality is contribution by the actual
readers. Web 2.0 experience demonstrates the success caused by the role change of end
users into authors. Probably the most famous example is Wikipedia. It is conceivable
that disabled users know best what fits the requirements of their peers. If those users
are able to supply further information, they should be given the chance to do so.
4.4 Metadata enrichment
Digital libraries already cover a variety of metadata in the fields of descriptive,
structural, administrative, and long-term preservation metadata. Currently, metadata
enrichment is provided almost exclusively by librarians. Only little metadata is
attached by authors or publishers during the ingest phase.
In the context of accessible multimedia production and dissemination, digital libraries
also face new challenges in distributed metadata enrichment, on the level of single
items as well as on the level of item dependencies.
But what particular metadata arises when archiving accessible multimedia resources?
First of all, it is necessary to group resources into two possible categories: primary
resources and equivalent alternative resources. The primary resource is the initial or
default resource. An equivalent alternative resource provides equivalent semantic and
behavioural functionality [ACCMD04]. An equivalent alternative resource can cover the
whole primary resource or only parts of it.
Primary resources have to declare global access modalities (sight, sound, and touch,
with an additional special content property of 'text' to denote the need for text
literacy [ACCMD04]) and a local modality of use for included sub-items. Additionally, a
primary resource needs metadata about adaptability regarding display transformability
and control flexibility. Furthermore, this metadata must cover information about the
existence of equivalent alternative resources.
Equivalent alternative resources can supplement (e.g. captions for a video) or
substitute (e.g. a DTB substitutes a PDF) a primary resource. The corresponding metadata
declares the nature of the resource equivalence. It refers to the actual primary media
and specifies the kind of alternative equivalent and its modality of use. For example,
video equivalents are captions (visual) and audio descriptions (auditory).
Consequently, an archival information system is able to match equivalent alternatives to
the needs or preferences of a user. Needs and preferences can be set by users before
searching. Based on these, the system pre-assembles accessible items into an accessible
multimedia document. If users do not specify preferred modalities, they get a list of
all available media sources and choose the best-fitting one.
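The matching of equivalent alternatives to user preferences can be sketched in Python.
The record fields, identifiers and the function match are hypothetical simplifications
chosen for illustration; they do not reproduce the IMS AccessForAll schema [ACCMD04].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    identifier: str
    primary: str    # identifier of the primary resource it refers to
    kind: str       # e.g. "caption", "audio description"
    modality: str   # modality of use: "visual", "auditory" or "tactile"
    relation: str   # "supplement" or "substitute"


# Hypothetical catalogue entries following the examples in the text.
CATALOGUE = [
    Alternative("video-1-captions", "video-1", "caption", "visual", "supplement"),
    Alternative("video-1-audiodesc", "video-1", "audio description", "auditory", "supplement"),
    Alternative("dtb-7", "pdf-7", "digital talking book", "auditory", "substitute"),
]


def match(alternatives, primary_id, preferred=frozenset()):
    """Return the alternatives of one primary resource that fit the user.

    With no preferred modalities, all alternatives of the primary
    resource are returned so that the user can choose.
    """
    related = [a for a in alternatives if a.primary == primary_id]
    if not preferred:
        return related
    return [a for a in related if a.modality in preferred]


for_deaf_user = match(CATALOGUE, "video-1", {"visual"})
unfiltered = match(CATALOGUE, "video-1")
```

The empty default mirrors the behaviour described above: users who state no preference
receive the full list of available media sources.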
The interesting issue in distributed metadata enrichment is: Who specifies which
metadata at what time?
1. Authors, or publishers as their representatives, have to declare the main
descriptive metadata for generated textual source documents and, additionally,
e.g. a verbalization of included graphical content as the simplest alternative.
2. Afterwards, a librarian does the usual job of adding conventional metadata to
ingested resources. With respect to accessibility, librarians have to specify
missing access modalities on the resource and sub-item level and pay special
attention to structural metadata for primary resources.
3. For accessible resource publishing, the contribution of accessibility experts
as resource producers to metadata enrichment is indispensable. They have to
declare the alternative access modalities and the relation to a primary
resource. Furthermore, it is important to specify information about resource
retrieval. For example, a tactile graphic is not available online.
The subsequent steps require collaborative work by the librarians and the accessibility
experts to meet the needs of long-term archiving and of the most suitable resource
distribution. In particular, enhanced structural and administrative metadata optimize
the corresponding workflows.
5. Conclusions
Digital libraries will be able to deliver personalized books if authors, narrators and
transcribers work together with librarians. Such a workflow requires a distributed and
asynchronous approach while preserving the author's intention through quality assurance
methods and tools. Readers will benefit from the improved readability of these books,
but authors may find it difficult to write ‘for’ their readers.
Both mainstream users and print-disabled people will enjoy more comfortable document
handling. We are not proposing a new Kindle, but focus on the plasticity of the reading
experience in order to ensure accessibility. We have described reading programs for
similar types of rich multimedia documents. Key to their accessibility is the use of
industry formats such as XHTML and Flash. Both preserve the user's identity by storing a
reader profile locally with the reading program.
The main difference arises from the support of existing quality assurance tools such as
manual and automatic tools for checking web accessibility. Future work will have to show
how a digital library benefits from such tools in order to address more readers.
6. References
[ACCMD04] IMS AccessForAll Meta-data
(http://www.imsglobal.org/accessibility/accmdv1p0/imsaccmd_oviewv1p0.html). July 2004.
[DPA08] Document Processing for Accessibility. CEN Workshop Agreement, CWA 15778,
February 2008.
[DAPI08] DAISY Pipeline (http://www.daisy.org/projects/pipeline/). Retrieved Sept 8,
2008.
[DAYNL08] DAISY Planet Newsletter August 2008
(http://www.daisy.org/news/newsletters/planet-2008-08.shtml). Retrieved Sept 8, 2008.
[DTB05] Specifications for the Digital Talking Book. ANSI/NISO Z39.86-2005, April 2005.
[FLEX08] http://www.adobe.com/products/flex/. Retrieved Sept 8, 2008.
[EDF08] http://www.edf-feph.org/Page_Generale.asp?DocID=12534. Retrieved Sept 8, 2008.
[EBE08] Eberius, W.: Multimodale Erweiterung und Distribution von Digital Talking Books.
Diploma Thesis, Dept. Computer Science, TU Dresden, 2008.
[OAIS02] Consultative Committee for Space Data Systems: Reference Model for an Open
Archival Information System (OAIS). CCSDS 650.0-B-1 Blue Book, January 2002.
[PET05] Petrie, H.; Weber, G.; Fisher, W.: Personalisation, interaction and navigation
in rich multimedia documents for print-disabled users. IBM Systems Journal, 44 (3),
2005, 629-636.
[SPI08] Spindler, M.: Verteilte barrierearme multimediale Dokumente. Diploma Thesis,
Dept. Computer Science, TU Dresden, 2008.
[STS08] SMIL Timesheets 1.0, W3C Working Draft, 10 January 2008
(http://www.w3.org/TR/timesheets/).
[SZC04] Szczepaniak, A.: Authoring System for a XML-based Multimedia eBook. Master
Thesis, Multimedia Campus Kiel, 2004.
[WTT08] Timed Text (TT) Authoring Format 1.0 – Distribution Format Exchange Profile
(DFXP) (http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/). November 2006.