Engineering Next-Generation Publishing Workflows


Slides from "Engineering Next-Generation Publishing Workflows", IDPF Digital Book 2013

Learn how technical publisher O'Reilly Media has solved the challenges of implementing a single-source workflow by taking advantage of modern open source software development tools to create a new authoring platform for print and ebook creation. Topics covered will include optimal authoring document formats, version control, automated eBook generation, and developing digital-first content with HTML5 technology.

Published in: Technology
  • Note that there is no markup for the #1 in chapter heading
  • Engineering Next-Generation Publishing Workflows

    1. 1. Engineering Next-Generation Publishing Workflows. IDPF Digital Book 2013, May 30, 2013. Sanders Kleinfeld, O’Reilly Media, Inc.
    2. 2. How do you write a book?
    3. 3. How do you write a “book”?
    4. 4. How do you write an (e)book?
    5. 5. How do you “write” an (e)book?
    6. 6. Anatomy of an ebook: EPUB
        What you see: [image not included]
        What’s inside:
            <?xml version="1.0" encoding="UTF-8" standalone="no"?>
            <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "">
            <html xmlns="">
            <head>
              <title>Chapter 1. A Python Q&amp;A Session</title>
              <link rel="stylesheet" href="core.css" type="text/css" />
              <meta name="generator" content="DocBook XSL Stylesheets V1.74.0" />
            </head>
            <body>
            <div class="chapter" title="Chapter 1. A Python Q&amp;A Session">
              <div class="titlepage"><div><div>
                <h1 class="title"><a id="a_python_q_ampersand_a_session"></a>Chapter 1. A Python Q&amp;A Session</h1>
              </div></div></div>
              <p>If you’ve bought this book, you may already know what Python is and why it’s an important tool to learn. If you don’t, you probably won’t be sold on Python until you’ve learned the language by reading the rest of this book and have done a project or two. But before we jump into details, the first few pages of this book will briefly introduce some of the main reasons behind Python’s popularity. To begin sculpting a definition of Python, this chapter takes the form of a question-and-answer session, which poses some of the most common questions asked by beginners.</p>
    7. 7. An Inconvenient Truth: Ebooks are made of code. If you are an ebook publisher, you are in the software-development business.
    8. 8. How do you “write” an (e)book?
    9. 9. How do you develop an (e)book?
    10. 10. Five Key Principles of a Modern (e)Book Workflow
        #1. Semantic Markup Matters
        #2. Single Source, Multiple Outputs
        #3. Automate Your Headaches Away
        #4. Versioning is the New Spell-Check
        #5. Always think “Digital First”
    11. 11. #1 Semantic Markup Matters
    12. 12. First Chapter of My Memoirs (Microsoft Word)
    13. 13. Underlying Representation of Content (Word XML)
            <w:body>
              <w:p w:rsidR="0073527D" w:rsidRDefault="007F1550" w:rsidP="007F1550">
                <w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="96"/><w:szCs w:val="96"/></w:rPr></w:pPr>
                <w:r w:rsidRPr="007F1550"><w:rPr><w:sz w:val="96"/><w:szCs w:val="96"/></w:rPr>
                  <w:t>1</w:t>
                </w:r>
              </w:p>
              <w:p w:rsidR="007F1550" w:rsidRDefault="007F1550" w:rsidP="007F1550">
                <w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr></w:pPr>
                <w:r w:rsidRPr="007F1550"><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr>
                  <w:t>Autobiography of Me</w:t>
                </w:r>
              </w:p>
              <w:p w:rsidR="007F1550" w:rsidRPr="007F1550" w:rsidRDefault="007F1550" w:rsidP="007F1550">
                <w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr></w:pPr>
              </w:p>
              <w:p w:rsidR="007F1550" w:rsidRPr="00032659" w:rsidRDefault="007F1550" w:rsidP="007F1550">
                <w:pPr><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr></w:pPr>
                <w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
                  <w:t xml:space="preserve">I was born in 1980, I love chocolate ice cream, and I am a </w:t>
                </w:r>
                <w:r w:rsidRPr="00032659"><w:rPr><w:i/><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
                  <w:t>wicked awesome</w:t>
                </w:r>
                <w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
                  <w:t xml:space="preserve"> writer, </w:t></w:r>
                <w:proofErr w:type="spellStart"/>
                <w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
                  <w:t>yo</w:t>
                </w:r>
                <w:proofErr w:type="spellEnd"/>
                …
    14. 14. Three Problems with this XML
        •  Markup is not semantic!
        •  It conflates content and presentation
        •  Um, yuck
    15. 15. Semantic Markup in a Nutshell
        Semantic markup describes the function of your content, not its formatting.
        SEMANTIC MARKUP SAYS: “This is a section heading”
        NOT: “This text is in Garamond, 36 pt, bold, center-aligned”
    16. 16. Semantic Markup Option #1: DocBook
        •  DocBook is a semantic XML markup vocabulary introduced in 1991
        •  It was primarily designed for representing technical documentation, but is well-suited for representing any prose content
        •  DocBook DTDs are available here:
    17. 17. DocBook Representation of Book Content
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "">
            <chapter>
              <title>Autobiography of Me</title>
              <para>I was born in 1980, I love chocolate ice cream, and I am a <emphasis>wicked awesome</emphasis> writer, yo!</para>
            </chapter>
    18. 18. Text Editors with GUI DocBook Support: XMLmind XML Editor( XML Editor(
    19. 19. Semantic Markup Option #2: AsciiDoc
        •  AsciiDoc is a lightweight, wiki-like markup language for prose content
        •  It was created by Stuart Rackham in 2002.
        •  The AsciiDoc toolchain is written in Python, and relies heavily on text processing with regular expressions.
    20. 20. AsciiDoc Representation of Book Content
            == Autobiography of Me

            I was born in 1980, I love chocolate ice cream, and I am a _wicked awesome_ writer, yo!
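
To make the "regular expressions" point on slide 19 concrete, here is a minimal Python sketch that turns the AsciiDoc sample from slide 20 into the DocBook shown on slide 17. It is a toy under stated assumptions, not the actual asciidoc.py implementation: the function name `asciidoc_to_docbook` is hypothetical, and it only understands one `==` title line, blank-line-separated paragraphs, and `_emphasis_`.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical helper for illustration; the real AsciiDoc toolchain is far richer.
EMPHASIS = re.compile(r"(?<!\w)_(.+?)_(?!\w)")
TITLE = re.compile(r"==\s+(.*)")


def asciidoc_to_docbook(source: str) -> str:
    """Convert a tiny AsciiDoc subset (title, paragraphs, _emphasis_) to DocBook."""
    # Paragraphs are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", source.strip()) if b.strip()]
    parts = ["<chapter>"]
    for block in blocks:
        # Collapse hard line breaks and escape XML special characters first.
        text = escape(" ".join(block.split()))
        title = TITLE.fullmatch(text)
        if title:
            parts.append(f"<title>{title.group(1)}</title>")
        else:
            text = EMPHASIS.sub(r"<emphasis>\1</emphasis>", text)
            parts.append(f"<para>{text}</para>")
    parts.append("</chapter>")
    return "\n".join(parts)


SAMPLE = """== Autobiography of Me

I was born in 1980, I love chocolate ice cream, and I am a _wicked awesome_ writer, yo!
"""

docbook = asciidoc_to_docbook(SAMPLE)
print(docbook)
```

The sketch escapes text before inserting tags so that an ampersand in the manuscript cannot produce malformed XML.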
    21. 21. Text Editor with AsciiDoc Support: O’Reilly Atlas
    22. 22. Semantic Markup Option #3: HTML
    23. 23. “Say what? HTML?”
    24. 24. Ebooks are composed of HTML… So, why not write them in HTML?
    25. 25. HTML5 = New Structural Semantics
        •  <article>
        •  <aside>
        •  <header>
        •  <figure>
        •  <footer>
        •  <nav>
        •  <section>
    26. 26. But eBooks require a richer content model!!!
        •  More robust semantics for book-specific elements, e.g., chapter, appendix, glossary
        •  Explicit, enforceable rules for structure, e.g., no <h1>s lower in the hierarchy than <h2>s
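
As an illustration of what an "explicit, enforceable rule" from slide 26 could look like in practice, the following Python sketch flags any heading that is nested inside a `<section>` whose own heading has a higher number (for example an `<h1>` inside an `<h2>` section). This is one possible reading of the rule, written with the standard-library HTML parser; the class and function names are hypothetical and this is not the HTMLBook validator.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingOrderChecker(HTMLParser):
    """Collect headings that outrank the heading of an enclosing <section>."""

    def __init__(self):
        super().__init__()
        # One entry per open <section>: its first heading level, or None so far.
        self.section_levels = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.section_levels.append(None)
        elif tag in HEADINGS:
            level = HEADINGS[tag]
            # Heading levels of the sections that enclose the current one.
            enclosing = [lv for lv in self.section_levels[:-1] if lv is not None]
            if enclosing and level < max(enclosing):
                line, _ = self.getpos()
                self.errors.append(
                    f"line {line}: <{tag}> nested below an <h{max(enclosing)}>"
                )
            if self.section_levels and self.section_levels[-1] is None:
                self.section_levels[-1] = level

    def handle_endtag(self, tag):
        if tag == "section" and self.section_levels:
            self.section_levels.pop()


def check_heading_order(html: str) -> list:
    """Return a list of human-readable rule violations (empty if none)."""
    checker = HeadingOrderChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


good = "<section><h1>Chapter</h1><section><h2>Topic</h2></section></section>"
bad = "<section><h2>Topic</h2><section><h1>Oops</h1></section></section>"
print(check_heading_order(good))
print(check_heading_order(bad))
```

A check like this can run automatically on every build, which is what makes the rule enforceable rather than merely documented.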
    27. 27. Introducing the HTMLBook Project:
    28. 28. “That’s nice, but what’s in it for me if I develop my (e)book in DocBook or AsciiDoc or HTML?”
    29. 29. #2 Single Source, Multiple Outputs
    30. 30. Welcome to Conversion City: Enjoy Your Stay! Conversion! Conversion! Conversion!
    31. 31. The Single-Source Model: XML or HTML
    32. 32. Advantages of the Single-Source Model:
        •  All authoring/edits are made to just one set of files. No need to maintain multiple sets of files.
        •  Outputs are produced by transforms, not conversions.
        •  Transforms are automated, fast, infinitely repeatable, and do not require cleanup afterward.
        •  The model is extensible. Add new output formats by adding a new transform. Workflow doesn’t need to be reinvented.
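
The advantages on slide 32 can be sketched in a few lines of Python: one source document, a table of transforms, and a build step that applies whichever transforms are requested. The transform names (`to_html`, `to_text`), the `TRANSFORMS` table, and the `build` function are assumptions made for illustration; a production workflow would use XSLT stylesheets rather than hand-written functions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOURCE = (
    "<chapter><title>Autobiography of Me</title>"
    "<para>I was born in 1980, I love chocolate ice cream, and I am a "
    "<emphasis>wicked awesome</emphasis> writer, yo!</para></chapter>"
)


def inline(node):
    """Render a node's text, mapping <emphasis> children to <em>."""
    out = escape(node.text or "")
    for child in node:
        inner = escape("".join(child.itertext()))
        out += f"<em>{inner}</em>" if child.tag == "emphasis" else inner
        out += escape(child.tail or "")
    return out


def to_html(chapter):
    parts = [f"<h1>{escape(chapter.findtext('title', ''))}</h1>"]
    parts += [f"<p>{inline(p)}</p>" for p in chapter.findall("para")]
    return "\n".join(parts)


def to_text(chapter):
    title = chapter.findtext("title", "").upper()
    paras = ["".join(p.itertext()) for p in chapter.findall("para")]
    return "\n\n".join([title] + paras)


# Supporting a new output format means adding one entry here.
TRANSFORMS = {"html": to_html, "txt": to_text}


def build(source_xml, formats):
    """Apply each requested transform to the single source document."""
    chapter = ET.fromstring(source_xml)
    return {fmt: TRANSFORMS[fmt](chapter) for fmt in formats}


outputs = build(SOURCE, ["html", "txt"])
for fmt, rendered in outputs.items():
    print(f"--- {fmt} ---\n{rendered}")
```

Because every output is regenerated from `SOURCE`, an edit is made once and no output needs manual cleanup afterward.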
    33. 33. ASC/DB Single-Source Workflow:
        Legend: Source Content / Intermediate Output / Final Output For Sale
        •  AsciiDoc (optional; can start with DocBook) → asciidoc.py → DocBook XML
        •  DocBook XML → DocBook XSL EPUB Stylesheets + Custom CSS → EPUB
        •  DocBook XML → DocBook XSL HTML5 Stylesheets → HTML5
        •  HTML5 → Antenna House + Print CSS3 → Print PDF
        •  HTML5 → Antenna House + Web CSS3 → Web PDF
        •  DocBook XML → DocBook XSL EPUB Stylesheets → EPUB → Custom XSL for EPUB postprocessing + KF8/Mobi7 CSS → Mobi-ready EPUB → Kindlegen → Mobi (KF8)
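
One way to read the ASC/DB workflow on slide 33 is as a directed graph whose edges are tools. The Python sketch below encodes that reading and finds the chain of tools needed to reach a given output. The graph is a reconstruction from the flattened slide text, so the exact edges are an assumption, and the names `WORKFLOW` and `plan` are hypothetical.

```python
from collections import deque

# Each entry maps an input format to (tool, output format) pairs.
# Reconstructed from slide 33; treat the exact wiring as an assumption.
WORKFLOW = {
    "AsciiDoc": [("asciidoc.py", "DocBook XML")],
    "DocBook XML": [
        ("DocBook XSL EPUB stylesheets + custom CSS", "EPUB"),
        ("DocBook XSL HTML5 stylesheets", "HTML5"),
        ("DocBook XSL EPUB stylesheets", "EPUB (intermediate)"),
    ],
    "HTML5": [
        ("Antenna House + print CSS3", "Print PDF"),
        ("Antenna House + web CSS3", "Web PDF"),
    ],
    "EPUB (intermediate)": [
        ("custom XSL for EPUB postprocessing + KF8/Mobi7 CSS", "Mobi-ready EPUB"),
    ],
    "Mobi-ready EPUB": [("Kindlegen", "Mobi (KF8)")],
}


def plan(source, target):
    """Breadth-first search for the tool chain from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, steps = queue.popleft()
        if node == target:
            return steps
        for tool, output in WORKFLOW.get(node, []):
            if output not in seen:
                seen.add(output)
                queue.append((output, steps + [tool]))
    return None  # no route from source to target


for target in ("EPUB", "Print PDF", "Mobi (KF8)"):
    print(target, "<-", plan("AsciiDoc", target))
```

Adding a new output format then amounts to adding one edge to the table, which mirrors the extensibility claim on slide 32.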
    34. 34. HTML5 Single-Source Workflow:
        Legend: Source Content / Intermediate Output / Final Output For Sale
        •  HTML5 → Packaging XSL + CSS → EPUB
        •  HTML5 → Antenna House + Print CSS3 → Print PDF
        •  HTML5 → Antenna House + Web CSS3 → Web PDF
        •  HTML5 → Packaging XSL + CSS → EPUB → Custom XSL for EPUB postprocessing + KF8/Mobi7 CSS → Mobi-ready EPUB → Kindlegen → Mobi (KF8)
    35. 35. O’Reilly Atlas Ebook Build UI
        #1. Pick ebook formats to build
        #2. Pick content files to build
        #3. Click “Build”
    36. 36. #3 Automate Your Headaches Away
    37. 37. 1776: Manuscript edits cannot be automated. 2012: Manuscript edits can be automated! Some rights reserved by ASurroca.
    38. 38. Tools for Scripting Word Documents
        •  Macros
        •  Visual Basic for Applications (VBA)
        •  PowerShell
    39. 39. Tools for Scripting Plaintext (AsciiDoc/XML) Documents
        •  Ruby
        •  Python
        •  Perl
        •  Java
        •  XPath/XSLT/XQuery
        •  JavaScript
        •  Regex
        •  Emacs/vi
        •  sed
        •  And many more…
    40. 40. Fix My Manuscript with One Line of Code!
        Request #1: “In the important scientific article below, please change all superscripts to subscripts, except in informal equation elements”
        DocBook XML Manuscript:
            <chapter id="chap1">
              <title>Makin’ Water and Energy</title>
              <para>Makin’ water is really easy. The formula is H<superscript>2</superscript>O, so you just take some H<superscript>2</superscript>, and add some O.</para>
              <para>Also, here’s how you make energy (per Einstein):</para>
              <informalequation>
                <mathphrase>E = mc<superscript>2</superscript></mathphrase>
              </informalequation>
            </chapter>
        PDF Output: [image not included]
    41. 41. Fix My Manuscript with One Line of Code!
        Solution #1: XPath to the rescue!
            $ xmlstarlet ed -r "//superscript[not(ancestor::informalequation)]" -v "subscript" book.xml
        How the command reads:
        •  xmlstarlet: XML command
        •  ed: Make an edit
        •  -r: r = rename
        •  //superscript: Selects superscripts…
        •  [not(: …that are not…
        •  ancestor::: …inside…
        •  informalequation)]: …informalequations.
        •  -v: v = replacement value
        •  "subscript": Replace with subscripts.
        •  book.xml: Do all this on book.xml
        Revised DocBook Manuscript:
            <chapter id="chap1">
              <title>Makin’ Water and Energy</title>
              <para>Makin’ water is really easy. The formula is H<subscript>2</subscript>O, so you just take some H<subscript>2</subscript>, and add some O.</para>
              <para>Also, here’s how you make energy (per Einstein):</para>
              <informalequation>
                <mathphrase>E = mc<superscript>2</superscript></mathphrase>
              </informalequation>
            </chapter>
        PDF Output: [image not included]
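
For readers without xmlstarlet, the same edit from slide 41 can be sketched with Python's standard library. ElementTree's limited XPath support has no `ancestor::` axis, so this version walks the tree and tracks whether it is inside the excluded element. The function name `rename_outside` is hypothetical, and the sample is a shortened copy of the slide's manuscript.

```python
import xml.etree.ElementTree as ET


def rename_outside(root, old, new, excluded_ancestor):
    """Rename <old> elements to <new> unless they sit inside <excluded_ancestor>.

    Returns the number of elements renamed.
    """
    renamed = 0

    def walk(node, inside_excluded):
        nonlocal renamed
        for child in node:
            child_inside = inside_excluded or child.tag == excluded_ancestor
            if child.tag == old and not inside_excluded:
                child.tag = new
                renamed += 1
            walk(child, child_inside)

    walk(root, root.tag == excluded_ancestor)
    return renamed


MANUSCRIPT = """<chapter id="chap1">
<title>Makin' Water and Energy</title>
<para>The formula is H<superscript>2</superscript>O, so you just take some
H<superscript>2</superscript>, and add some O.</para>
<informalequation>
<mathphrase>E = mc<superscript>2</superscript></mathphrase>
</informalequation>
</chapter>"""

root = ET.fromstring(MANUSCRIPT)
count = rename_outside(root, "superscript", "subscript", "informalequation")
print(f"renamed {count} element(s)")
print(ET.tostring(root, encoding="unicode"))
```

The one-line xmlstarlet command is still the more compact tool; the Python version is useful mainly when the edit needs to be embedded in a larger build script.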
    42. 42. Fix My Manuscript with One Line of Code!
        Request #2: “House style for dates is YYYY-MM-DD. Can you please fix in manuscript below?”
        AsciiDoc Manuscript:
            == Kindergarten Lemonade Sales

            .Lemonade sales by Kindergarten Lemonade, LLC
            [options="header"]
            |================
            |Date|Lemonade Sold|
            |3/15/12|6 glasses|
            |4/22/10|10 glasses|
            |5/31/12|2 glasses|
            |7/14/11|4 glasses|
            |8/19/12|1 glass|
            |9/24/12|432 glasses|
            |================
        PDF Output: [image not included]
    43. 43. Fix My Manuscript with One Line of Code!
        Solution #2: Regex FTW!
            $ perl -p -e 's#^(.*)([1-9])/([0-9]{2})/([0-9]{2})(.*)$#${1}20$4-0$2-$3$5#g' book.asc
        How the command reads:
        •  perl: Perl script
        •  -p: Print each line…
        •  -e: Run the following regex
        •  Capture the following pattern: chars before date, digits in month, digits in day, digits in year, chars after date
        •  Specify replacement pattern: chars before date, year, month, day, chars after date
        •  book.asc: Perform on this file
        AsciiDoc Manuscript:
            == Kindergarten Lemonade Sales

            .Lemonade sales by Kindergarten Lemonade, LLC
            [options="header"]
            |================
            |Date|Lemonade Sold|
            |2012-03-15|6 glasses|
            |2010-04-22|10 glasses|
            |2012-05-31|2 glasses|
            |2011-07-14|4 glasses|
            |2012-08-19|1 glass|
            |2012-09-24|432 glasses|
            |================
        PDF Output: [image not included]
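
The date fix on slide 43 can also be sketched in Python, which makes it easy to handle one- or two-digit months and days in the same pass. The century prefix is an assumption: the sample data uses two-digit years, and this sketch treats them all as 20YY. The names `DATE`, `to_iso`, and `normalize_dates` are hypothetical.

```python
import re

# M/D/YY or MM/DD/YY, not embedded in a longer run of digits.
DATE = re.compile(r"(?<!\d)(\d{1,2})/(\d{1,2})/(\d{2})(?!\d)")


def to_iso(match, century="20"):
    """Rewrite one matched M/D/YY date as YYYY-MM-DD (assumes the given century)."""
    month, day, year = match.groups()
    return f"{century}{year}-{int(month):02d}-{int(day):02d}"


def normalize_dates(text):
    """Apply house style YYYY-MM-DD to every M/D/YY date in the text."""
    return DATE.sub(to_iso, text)


TABLE = """|Date|Lemonade Sold|
|3/15/12|6 glasses|
|4/22/10|10 glasses|
|9/24/12|432 glasses|"""

fixed = normalize_dates(TABLE)
print(fixed)
```

Unlike a line-anchored pattern, this version rewrites every date on a line, so it also works on running prose with several dates per paragraph.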
    44. 44. #4 Versioning is the New Spell-Check
    45. 45. Two Questions About Your (e)Book’s Editorial Lifecycle
        1. Will more than one person be working on the manuscript files?
        2. Will there be more than one draft of the manuscript?
    46. 46. If you answered yes to either question, you need a version-control system.
    47. 47. Key Feature #1 of Version Control: Revision Snapshots
    48. 48. Key Feature #2 of Version Control: Diffing
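
To show what diffing (slide 48) means for a plaintext manuscript, here is a small Python sketch that compares two drafts with the standard-library `difflib` module. A version-control system such as Git does this for every saved revision; the draft text, file names, and the `diff_drafts` helper here are made up for illustration.

```python
import difflib


def diff_drafts(old, new, old_name="draft1.asc", new_name="draft2.asc"):
    """Return a unified diff between two manuscript drafts as one string."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(lines)


DRAFT_1 = "== Autobiography of Me\n\nI was born in 1980.\n"
DRAFT_2 = "== Autobiography of Me\n\nI was born in 1980, and I love chocolate ice cream.\n"

report = diff_drafts(DRAFT_1, DRAFT_2)
print(report)
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, so an editor can review exactly what changed between drafts without rereading the whole chapter.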
    49. 49. What if we versioned manuscripts like software developers version code?
    50. 50. Revision snapshots in GitHub. Pro Git:
    51. 51. Diffing in GitHub (English to Portuguese translation)
    52. 52. #5 Always Think “Digital First”
    53. 53. There is a difference between a digitized text and a digital text
    54. 54. Digitized Text = Digital Last: “Let’s make a print book and then get it converted to an ebook.” Digital Text = Digital First: “Let’s make an ebook.”
    55. 55. What Does Digital First Look Like?
    56. 56. Welcome to Atlas [Beta] examples!
    57. 57. Welcome to Atlas [Beta] Commenting!
    58. 58. Welcome to Atlas [Beta] Multimedia!
    59. 59. Contact Me! Email: sanders@oreilly.com Twitter: @sandersk