From XML to eBooks Part 2: The Details


LavaCon 2012 presentation about creating eBooks from DocBook XML. This presentation details the XML Press process for creating eBooks. A companion presentation (From XML to eBooks Part 1: Overview) provides an introduction.

Published in: Technology


  1. From XML to eBooks Part II: The Devil is in the Details
     Richard Hamilton, XML Press, hamilton@xmlpress.net
  2. Slight Recap
     For most tech comm situations:
     ● Two formats matter: ePub & Kindle
     ● XML processes (esp. DocBook or DITA) will make things much easier
     ● Content strategy is the hardest part
     ● Authoring is next hardest
     ● Production is tough, but doable
     ● Distribution is easiest
  3. Overview
     ● Authoring
     ● Storing and managing content
     ● Producing output
     Content strategy is critical, but not for this presentation
  4. Authoring
     Authoring formats at XML Press:
     ● DocBook XML: 5 books
     ● DITA XML: 2 books (so far)
     ● Word: 4 books
     ● Wiki (Confluence): 1 book
     ● Wiki (pbworks): 3 books
     ● Author-it: 1 book
     ● InDesign: 1 book
     All but 3 (1 each in Word, InDesign, & Author-it) were ultimately produced from XML
  5. Authoring in a Wiki
     ● Based on PBWorks
     ● Authors, editor, reviewers, indexers, work in wiki
     ● Parallel access throughout most of the process
     ● Content exported for proofs as needed
     ● Content moved to SVN for final production
     Requires a clean, clear breaking point where content moves from wiki to SVN
  6. HTML to XHTML
     Tidy
     ● Convert
     ● Cleanup
  7. Pre-process XHTML
     XSL stylesheet
     ● Remove empty elements
     ● Normalize
     ● Handle headings
  8. Convert to DocBook
     Herold
     ● Infer hierarchy
     ● Convert
     ● Define structure
  9. Process Supplemental Markup
     Perl script
     ● Index entries
     ● Footnotes
     ● Endnotes
     ● Sidebars
     ● Epigraphs
     ● Block quotes
     ● Convert all to DocBook
  10. Supplemental Markup
      Indexing:
        {in primary; secondary; tertiary}
        {id term 1; term 2}
        {is id; primary; secondary; tertiary}
        {ie id}
        {is term; see term}
      Footnotes, sidebars, etc.:
        {if footnote text}
        {ib sidebar text}
        {ip epigraph text;;attribution;;source}
        {it endnote text}
        {iq quotation;;attribution}
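A minimal sketch of how brace markup like this could be turned into DocBook. The slides say the real conversion is a Perl script; Python is swapped in here, and the DocBook elements chosen for each code (`indexterm`, `footnote`, `blockquote`) are assumptions rather than the script's actual output. Only `{in ...}`, `{if ...}`, and `{iq ...}` are covered; the other codes pass through unchanged.

```python
import re
from xml.sax.saxutils import escape

# Assumed mapping of index levels to DocBook elements (not from the slides).
INDEX_LEVELS = ("primary", "secondary", "tertiary")


def index_term(body):
    """{in primary; secondary; tertiary} -> <indexterm>...</indexterm>"""
    parts = [p.strip() for p in body.split(";") if p.strip()]
    inner = "".join(
        f"<{tag}>{escape(text)}</{tag}>" for tag, text in zip(INDEX_LEVELS, parts)
    )
    return f"<indexterm>{inner}</indexterm>"


def footnote(body):
    """{if footnote text} -> <footnote><para>...</para></footnote>"""
    return f"<footnote><para>{escape(body.strip())}</para></footnote>"


def block_quote(body):
    """{iq quotation;;attribution} -> <blockquote>...</blockquote>"""
    fields = [f.strip() for f in body.split(";;")]
    quote = fields[0]
    attribution = fields[1] if len(fields) > 1 else ""
    attr_xml = f"<attribution>{escape(attribution)}</attribution>" if attribution else ""
    return f"<blockquote>{attr_xml}<para>{escape(quote)}</para></blockquote>"


HANDLERS = {"in": index_term, "if": footnote, "iq": block_quote}

# A two-letter code starting with "i", whitespace, then the body up to "}".
MARKUP = re.compile(r"\{(i[a-z])\s+([^{}]*)\}")


def convert(text):
    def replace(match):
        handler = HANDLERS.get(match.group(1))
        # Codes this sketch does not handle are left exactly as written.
        return handler(match.group(2)) if handler else match.group(0)

    return MARKUP.sub(replace, text)
```

Keeping one small handler per code makes it straightforward to add the remaining codes once their target markup is decided.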
  11. Cleanup
      XSL stylesheet
      ● Handle links
      ● Validate structure
  12. What about Confluence?
      "Confluence, Tech Comm, Chocolate" used K15t Software's DocBook export plugin, which also handles much of what the supplemental markup handles.
  13. Storing and managing content
      Content has one home, but...
      ● That home can change at certain well-defined points
      ● For XML, SVN is the home
      ● For wiki, the wiki is the home until production, then SVN is the home
      ● Home changes once, irrevocably
      ● All production comes from SVN
  14. ePub Structure
      Top Level Directory
      ● mimetype (file): contains application/epub+zip. Identifies this as an ePub file.
      ● META-INF (folder): contains container.xml (file), which points to the package file in the OEBPS folder.
      ● OEBPS (folder): (next page)
      An ePub file is simply a zip file of this structure, with mimetype as the first file in the zip. Uses the .epub suffix.
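The container layout described on this slide can be checked mechanically. The sketch below uses only the Python standard library; the function name `check_epub` and the wording of the messages are made up for illustration, and the check is far less thorough than a real validator such as epubcheck.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def check_epub(data):
    """Return a list of layout problems for an ePub given as bytes (empty if none)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        infos = zf.infolist()

        # mimetype must be the first entry, uncompressed, with a fixed value.
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry in the zip")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype does not contain application/epub+zip")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")

        # container.xml must exist and point at a package file that is present.
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
        else:
            root = ET.fromstring(zf.read("META-INF/container.xml"))
            rootfile = root.find(f".//{{{CONTAINER_NS}}}rootfile")
            if rootfile is None or rootfile.get("full-path") not in names:
                problems.append("container.xml does not point to a package file in the zip")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see everything that is wrong with a file in a single pass.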
  15. Ebook production - DocBook
      OEBPS Directory Contents
      OEBPS (folder)
      ● OPF file: package.opf
      ● Navigation file: toc.ncx
      ● CSS file: xyz.css
      ● HTML TOC: ch01-toc.xhtml
      ● Media: figure.jpg, screen.png, ...
      ● HTML content: ch01.xhtml, ch01s02.xhtml, ch01s03.xhtml, … chXX.xhtml
      Notes:
      ● This folder is like any website
      ● Names are arbitrary
      ● Sub-folders ok
  16. NCX View in Kindle
      Button for NCX view in emulator
  17. Ebook production - DocBook
      OEBPS Directory Contents (same slide as 15)
  18. OPF (Open Packaging Format)
      <package ...>
        <!-- Metadata -->
        <metadata ...>
          … Dublin Core Metadata elements …
        </metadata>
        <!-- What's in the ePub? -->
        <manifest>
          <item id="ncx" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
          <item id="toc" media-type="application/xhtml+xml" href="ch01-toc.xhtml"/>
          <item id="ch01" media-type="application/xhtml+xml" href="ch01.xhtml"/>
          …
        </manifest>
        <!-- What order is it in? -->
        <spine toc="ncx">
          <itemref idref="cover"/>
          <itemref idref="toc"/>
          …
        </spine>
        <!-- Where do you start? (Change starting place) -->
        <guide>
          <reference type="text" title="Startup page" href="ch01.xhtml"/>
        </guide>
      </package>
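To make the manifest/spine relationship concrete, here is a small sketch that resolves the spine's reading order to file names. It assumes the package file declares the standard OPF namespace (`http://www.idpf.org/2007/opf`), which the slide elides with `...`; the function name `reading_order` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard OPF namespace; assumed here because the slide elides the attributes.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml):
    """Return (idref, href) pairs in spine order.

    href is None when an itemref has no matching manifest item, which
    usually means the package file is inconsistent.
    """
    root = ET.fromstring(opf_xml)
    # The manifest answers "what's in the ePub?": id -> file name.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine answers "what order is it in?" by referring to manifest ids.
    return [
        (ref.get("idref"), hrefs.get(ref.get("idref")))
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

A `None` in the result is a cheap way to spot a spine entry that the manifest does not declare before handing the file to a validator.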
  19. Other tweaks to XHTML
      ● Remove empty paragraphs (vestige of wiki past)
      ● Remove <p> around first para after an <li> (for original Kindle)
      ● Work around a few epubcheck anomalies
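The first two tweaks can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration, not the XML Press stylesheet: it reads "first para after an `<li>`" as a `<p>` that is the first thing inside a list item, and it assumes the input is well-formed XHTML in the usual XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)  # serialize without an ns0: prefix
P, LI = f"{{{XHTML}}}p", f"{{{XHTML}}}li"


def is_empty(elem):
    """True for an element with no children and no non-whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def append_text(parent, index, text):
    """Attach text immediately before child position `index` of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def tweak(root):
    for parent in list(root.iter()):
        # 1. Drop empty <p> elements, keeping any text that followed them.
        for child in list(parent):
            if child.tag == P and is_empty(child):
                index = list(parent).index(child)
                parent.remove(child)
                append_text(parent, index, child.tail)
        # 2. Unwrap a <p> that is the first thing inside an <li>.
        if (parent.tag == LI and len(parent) and parent[0].tag == P
                and not (parent.text or "").strip()):
            para = parent[0]
            parent.remove(para)
            parent.text = para.text
            for offset, kid in enumerate(list(para)):
                parent.insert(offset, kid)
            append_text(parent, len(para), para.tail)
    return root
```

Most of the care goes into `tail` text: ElementTree stores the text that follows an element on that element, so removing or unwrapping a node without moving its tail would silently drop content.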
  20. ePub/Kindle from DocBook
      ● Based on open-source DocBook stylesheets
      ● ePub3 transform by Bob Stayton
      ● CSS added
      ● A few minor tweaks for personal preference
      ● Kindle (.mobi) produced using kindlegen
      ● Amazon tests .mobi and converts to smaller file
  21. Generating ePub from DocBook
      DocBook XSL
      ● ePub3 transform
      ● Based on HTML5 transform
      ● Generates all ePub3 files
  22. Generating ePub from DocBook
      File cleanup
      ● Adjust .opf file
      ● Clean up XHTML
  23. Generating ePub from DocBook
      File preparation
      ● Copy images
      ● Copy in CSS file
      ● Run zip to create .epub file
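The final zip step has one catch: the `mimetype` entry has to be written first and left uncompressed. The slides run the `zip` tool; the sketch below does the same job with Python's standard `zipfile` module instead. The function name `build_epub` and the directory layout are assumptions, and the source directory is expected to already hold META-INF, OEBPS, images, and CSS.

```python
import zipfile
from pathlib import Path


def build_epub(src_dir, out_path):
    """Zip the prepared ePub directory `src_dir` into `out_path`.

    `out_path` should live outside `src_dir`, otherwise the archive
    would try to include itself.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # mimetype goes in first and is stored, not deflated.
        zf.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        # Everything else follows, with paths relative to the ePub root.
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname)
```

Writing `mimetype` from a literal rather than copying a file from disk avoids a stray trailing newline, which some readers and validators reject.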
  24. ePub/Kindle from DITA
      ● Based on DITA Open Toolkit and DITA for Publishers toolkit extensions (developed by Eliot Kimber)
      ● Does not require content to use DITA for Publishers specialization
      ● Generates ePub2-compliant files
      ● Kindle (.mobi) produced using kindlegen
      ● Amazon tests .mobi and converts to smaller file
  25. Thanks for listening
      Richard Hamilton, XML Press, hamilton@xmlpress.net
