9 ODT2DAISY: Producing Digital Talking Books with Open-Source Software


odt2daisy is an open-source add-on for OpenOffice.org that converts word-processing documents to digital talking books in the DAISY format (ANSI/NISO Z39.86). Digital talking books make print material accessible to blind or otherwise print-disabled persons. DAISY lets users navigate by headings or page numbers and follow a text version that is synchronised with the audio version. odt2daisy produces both Full DAISY 3 (text synchronised with audio) and DAISY 3 XML (text without audio). For compatibility with older DAISY software, it also supports DAISY 2.02. It also supports mathematical content (Mathematical Markup Language). odt2daisy works on Microsoft Windows, Mac OS X, Linux and Solaris. To produce audio, odt2daisy relies on three components: the DAISY Pipeline Lite, open-source software developed by the DAISY Consortium; the LAME MP3 encoding technology; and the operating system's text-to-speech (TTS) engine(s). The supported languages depend on the TTS engines available on the user's system. On Unix-based systems odt2daisy relies on the open-source eSpeak TTS engine, which supports 27 languages. odt2daisy makes it possible to produce DAISY books with only open-source software: for example, Ubuntu Linux, OpenOffice.org, odt2daisy and eSpeak constitute a completely open-source software stack. The next step is the development of an accessibility evaluation and repair add-on for OpenOffice.org, so that documents produced with OpenOffice.org are more accessible and serve as a better basis for exporting to other formats such as DAISY, PDF and HTML. Vincent Spiewak started working on odt2daisy at the Université Pierre et Marie Curie (Paris, France) and continued the work at the Katholieke Universiteit Leuven (Leuven, Belgium) in the framework of ÆGIS, a research and development project co-financed by the European Commission's 7th Framework Programme.

Published in: Technology

  1. odt2daisy: digital talking books with open-source software. Christophe Strobbe, Katholieke Universiteit Leuven, Belgium
  2. Motivation & Problem Area
     • Digital talking books
     • For persons with “print disabilities”
     • DAISY (ANSI/NISO Z39.86)
     • Production: typically
       • by specialised production centres
       • for blind & visually impaired users
       • i.e. not by users (in 2007)
  3. Objectives
     • Enable end-users to produce DAISY
     • In most European languages
     • In a free and open-source office suite
     • Support:
       • DAISY 3 (with or without audio)
       • DAISY 2.02 (for older players)
       • Multilingual content
       • Mathematical Markup Language
  4. Methodology
     • Build OpenOffice.org extension
       • odt2dtbook by Vincent Spiewak available in 2008
       • Functionality available as extension and as reusable JAR (Java Archive)
       • Add:
         • DAISY 3 audio, DAISY 2.02
         • Comprehensive set of test documents (regression testing)
         • Support for multilingual content on Windows
  5. odt2daisy Components (1)
     • Java Open Document Library (JODL)
       • For ODT / XML preprocessing
     • odt2daisy library
       • Converts ODT to DAISY XML (XSLT)
       • Validates output
       • Reusable Java library
       • Command line interface
  6. odt2daisy Components (2)
     • odt2daisy extension
       • Wrapper for other components:
       • Uses OpenOffice.org UNO API
       • Uses odt2daisy library
       • Uses DAISY Pipeline Lite (speech synthesis)
       • Includes templates
     • Templates with custom styles for DAISY production
  7. Results (1)
     • odt2daisy released November 2009
       • Tutorials in various formats (text, DAISY, video)
       • Developer documentation
       • Test files for regression testing
       • TTS in 27 languages where eSpeak is available (Linux, Windows)
  8. Results (2)
     • Support for ODT features
       • Heading, List, Table, Images, Captions, Notes, Foot/Rear notes, Math, TOC, Section, Frame, Bookmark, Metadata, ...
       • Page numbering (1,i,I,a,A; advanced)
       • Front / body / rear matter
       • “Complex text layout” and East-Asian languages not supported
  9. Conclusion and Outlook
     • Some ODT features are hard to parse (e.g. multilingual text; “Asian” languages)
     • Licensing: MP3 vs Ogg Vorbis for TTS
     • TTS quality: TTS as internet service / in cloud computing?
     • Accessibility checking before export
  10. Start Using It!
     • http://odt2daisy.sf.net/
     • Developer site:
       • http://sourceforge.net/projects/odt2daisy/
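
Slide 5 says the odt2daisy library converts ODT to DAISY XML with XSLT. The following is a minimal sketch of that general idea, not odt2daisy's actual stylesheet or API. It assumes a tiny in-memory ODT `content.xml` fragment and maps each `text:h` heading to an `hN` element and each `text:p` paragraph to a `p` element. The class name `OdtToDtbookSketch`, the `convert` method, the sample document and the flat `book` wrapper are all hypothetical; real DAISY XML nests headings in level elements and must be validated, which this sketch skips.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only; not part of odt2daisy.
final class OdtToDtbookSketch {

    // OpenDocument namespace for text content (headings, paragraphs).
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Simplified XSLT 1.0 stylesheet: headings become h1, h2, ... and
    // paragraphs become p, all inside one flat, made-up <book> wrapper.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:text='" + TEXT_NS + "'"
        + " exclude-result-prefixes='text'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<book><xsl:apply-templates select='//text:h | //text:p'/></book>"
        + "</xsl:template>"
        + "<xsl:template match='text:h'>"
        + "<xsl:element name='h{@text:outline-level}'>"
        + "<xsl:value-of select='.'/></xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='text:p'>"
        + "<p><xsl:value-of select='.'/></p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Assumed sample input: a stripped-down content.xml from an ODT file.
    static final String SAMPLE =
        "<office:document-content"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:text='" + TEXT_NS + "'>"
        + "<office:body><office:text>"
        + "<text:h text:outline-level='1'>Introduction</text:h>"
        + "<text:p>Hello DAISY.</text:p>"
        + "<text:h text:outline-level='2'>Details</text:h>"
        + "</office:text></office:body></office:document-content>";

    // Applies the stylesheet to the given content.xml string.
    static String convert(String contentXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(contentXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT conversion failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(SAMPLE));
    }
}
```

Running `main` prints the transformed fragment for the sample document. In a real ODT file, `content.xml` would first have to be read from the ZIP container, for example with `java.util.zip.ZipFile`.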