Sustainable XML for Publishing Applications: DITA Makes It Possible


Presented by Eliot Kimber at Documentation and Training East 2008,
October 29-November 1, 2008 in Burlington, MA.

XML applications for publishers have largely failed to realize the
full potential inherent in the technology. While larger publishers
could make the investment necessary to realize a significant return on
XML technology, smaller enterprises could not, for a number of
reasons, but fundamentally because the startup costs and ongoing
costs of ownership were too high. The DITA standard changes the
equation, bringing several unique features that, together, lower both
the startup cost and the ongoing costs, making XML far more affordable
for publishers than it has ever been. At the same time, advances in
supporting technologies important to publishers, such as improved
support for XML in Adobe Creative Suite and Microsoft Office, powerful
new XML search and retrieval systems such as MarkLogic, and a new
generation of lower-cost XML editors, also serve to make XML for
publishing applications more attractive than it has ever been.

Published in: Technology

Sustainable XML for Publishing Applications: DITA Makes It Possible

  1. Sustainable XML for Publishing Applications: DITA Makes It Possible
     Eliot Kimber, Really Strategies, Inc.
     DocTrain East 2008
  2. Preliminaries
  3. Who Is This Talk For?
     • Publishers who want to implement XML-based solutions
     • Publishers who have XML-based solutions that need to be enhanced, refined, or upgraded
     • Creators of XML-aware tools applicable to publishing use cases
     • Service providers who support the development and use of XML-based solutions for Publishers:
         • Integrators
         • Data conversion houses
         • Consultants
     DITA for Publishers DocTrain East 2008
  4. About Me
     • Senior Concept Prover at Really Strategies Inc.
     • 20+ years of experience with generalized markup (GML, SGML, XML, etc.)
     • Career focus on large-scale hyperdocument creation and management
     • Focus for the last 8+ years on Publishing use cases around XML-based publishing workflows
     • Active member of the DITA Technical Committee
     • Founding member of the XML Working Group
     • Long-time member of the XSL-FO Working Group
     • Co-editor of the ISO/IEC HyTime standard
  5. Audience Survey
     • Who is here?
         • Publishing for profit?
         • Publishing as a cost?
         • Technical Documentors?
         • Service providers?
         • People just interested in DITA?
     • DITA knowledge:
         • No idea what DITA is?
         • Know about DITA a little?
         • Familiar with DITA concepts and details?
         • Using DITA now or implementing DITA-based solution?
  6. Brief Overview of DITA
  7. What Is DITA?
     • OASIS Open Standard: Darwin Information Typing Architecture
     • An XML architecture standard for representing human-consumed information
     • Some distinguishing aspects of DITA as an XML architecture:
         • Formal mechanism for controlled definition of new vocabularies (“specialization”)
         • Optimized for information modularity (“topics”, “maps”) and blind interchange
         • Standardized document type implementation design patterns
         • Growing off-the-shelf processing infrastructure
         • Sophisticated hyperlinking features (“relationship tables”)
     • Currently at version 1.1; version 1.2 is in the final stages of review and approval
  8. Key DITA Concepts Briefly Explained
     • Topics
         • Topic content is paragraphs and stuff
         • Standalone units of information
         • Topics may directly contain other topics
     • Maps
         • Hierarchical sets of links to topics
         • Establish organizational hierarchies for sets of topics
         • May have many maps over the same topics
         • Can impose metadata onto topics
         • Can impose topic-to-topic hyperlinks (relationship tables)
     • Specialization
         • New element types are “subclasses” of existing types
  9. Oooh, A Picture
     [Diagram: Map One and Map Two each link to topics drawn from a shared set, Topic A through Topic F]
  10. Output Results (Map to PDF)
      • Map One: I. Topic C; 1.1 Topic B; 1.1.1 Topic A; 1.1.2 Topic D; 1.2 Topic F
      • Map Two: I. Topic F; 1.1 Topic B; 1.2 Heading; 1.1.1 Topic A; 1.1.2 Topic E
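The idea of many maps over the same topics can be made concrete with a small sketch. It assumes the nesting suggested by the two outlines above (in particular, that Topic A and Topic E sit under the heading in Map Two), uses hypothetical file names such as `topic-c.dita`, and numbers every level with plain decimals rather than the slide's scheme. It illustrates how one pool of topics can yield different hierarchies; it is not how the DITA Open Toolkit itself processes maps.

```python
import xml.etree.ElementTree as ET

# Two maps that organize an overlapping set of topics differently.
# File names and navigation titles are illustrative assumptions.
MAP_ONE = """
<map title="Map One">
  <topicref href="topic-c.dita" navtitle="Topic C">
    <topicref href="topic-b.dita" navtitle="Topic B">
      <topicref href="topic-a.dita" navtitle="Topic A"/>
      <topicref href="topic-d.dita" navtitle="Topic D"/>
    </topicref>
    <topicref href="topic-f.dita" navtitle="Topic F"/>
  </topicref>
</map>
"""

MAP_TWO = """
<map title="Map Two">
  <topicref href="topic-f.dita" navtitle="Topic F">
    <topicref href="topic-b.dita" navtitle="Topic B"/>
    <topichead navtitle="Heading">
      <topicref href="topic-a.dita" navtitle="Topic A"/>
      <topicref href="topic-e.dita" navtitle="Topic E"/>
    </topichead>
  </topicref>
</map>
"""

# Elements that contribute an entry to the outline.
OUTLINE_TAGS = {"topicref", "topichead"}


def outline(map_xml):
    """Return the map's hierarchy as a list of numbered outline lines."""
    lines = []

    def walk(parent, prefix):
        entries = [child for child in parent if child.tag in OUTLINE_TAGS]
        for index, entry in enumerate(entries, start=1):
            number = f"{prefix}.{index}" if prefix else str(index)
            lines.append(f"{number} {entry.get('navtitle')}")
            walk(entry, number)

    walk(ET.fromstring(map_xml), "")
    return lines


if __name__ == "__main__":
    for source in (MAP_ONE, MAP_TWO):
        print("\n".join(outline(source)), end="\n\n")
```

Both maps reference Topic A, Topic B, and Topic F, yet each places them at a different position in its own hierarchy; the topic files themselves are never modified.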
  11. Specialization
      • DITA standard defines a set of base element types:
          • Topic, map, section, paragraph, figure, table, phrase, data
          • All other elements based on these base types
      • Establishes a formal class hierarchy for all element types in any DITA document
      • Every element type maps back to some standard-defined base type
      • Declaration mechanism is formal and simple:
          • Uses element attributes (“class=”)
          • Can be processed by almost any XML tool, including CSS selectors
          • Even works for DTD-less documents
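Because specialization is declared through the `class` attribute, generic processing needs little more than token matching. The sketch below assumes a hypothetical publishing module named `pub` with `chapter` and `epigraph` elements (neither is defined by the DITA standard) and uses only Python's standard library. It parses without any DTD, so the `class` values are written out in the instance rather than supplied as defaults.

```python
import xml.etree.ElementTree as ET

# A DTD-less document using hypothetical specialized elements.
# Each class value lists the element's ancestry from base type to
# most specialized type as space-separated module/element tokens.
DOC = """
<chapter class="- topic/topic pub/chapter ">
  <title class="- topic/title ">Chapter One</title>
  <body class="- topic/body ">
    <epigraph class="- topic/p pub/epigraph ">An opening quotation.</epigraph>
    <p class="- topic/p ">An ordinary paragraph.</p>
  </body>
</chapter>
"""


def is_a(element, base_type):
    """True if the element is, or is specialized from, base_type."""
    return base_type in element.get("class", "").split()


def find_all(root, base_type):
    """All elements in document order that derive from base_type."""
    return [el for el in root.iter() if is_a(el, base_type)]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    # A processor that only knows about base paragraphs still
    # finds the specialized epigraph.
    for el in find_all(root, "topic/p"):
        print(el.tag, "->", el.text)
```

A processor written against `topic/p` handles `epigraph` without knowing the element exists, which is what makes blind interchange workable. The same match can be expressed in CSS with the attribute selector `*[class~="topic/p"]`.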
  12. DITA Compared to Other XML Options
  13. Compare DITA With…
      • DocBook
          • Book-focused (not inherently modular)
          • Can use XInclude to manage information in modular fashion
          • No facility comparable to DITA maps
          • Mature standard
          • Very large tag set reflecting union of wide set of requirements
          • No formal vocabulary extension mechanism
          • Blind interchange not really possible
          • Deep off-the-shelf infrastructure
  14. Compare DITA With…
      • NLM
          • Optimized for journals, not books
          • No formal vocabulary extension mechanism
          • Little off-the-shelf infrastructure
  15. Compare DITA With…
      • PRISM/PAM
          • Essentially XHTML with sophisticated metadata
          • Optimized for serialization, not authoring and archiving
          • Little off-the-shelf processing infrastructure
  16. Compare DITA With…
      • Custom XML application
          • Expensive to develop and maintain
          • Can be optimized for local requirements
          • Processing infrastructure must be built from scratch:
              • Content management
              • Authoring tool configuration and customization
              • Publishing pipelines
              • Interchange transforms
          • No blind interchange possible
  17. About That… XML Application Development Costs
      • Information requirements analysis is always required
      • Using a standard XML application still requires that you determine how to apply it to your requirements
      • All useful standard XML applications…
          • …Provide more stuff than you need
          • …Fail to provide some things specific to your requirements
      • Amount of analysis required reflects your business problem, not standard chosen
      • Thus: cost of analysis is essentially invariant regardless of implementation choice
      • Main variable is cost of system implementation:
          • Implementation of XML document types (DTDs)
          • Implementation of management and processing
  18. XML System Cost Analysis
      • Three distinct cost domains:
          • Initial system development
          • Cost of use (training, skills required, cost of tools)
          • Maintenance and refinement over long time scale
      • Ideal implementation base minimizes all costs:
          • Low cost of acquisition and implementation
          • Low cost of use: skills and knowledge common in user population, tools are appropriately priced
          • Low cost of refinement, extension, interchange, management
      • Cost evaluated in terms of value:
          • Short term: ability to meet immediate requirements with lowest initial cost
          • Long term: ability to support new requirements with lowest cost of maintenance and extension
  19. DITA Largely Meets the Ideal
      • Lowest possible cost of initial solution development
          • Implementing custom doctypes very low cost
          • Many off-the-shelf tools “just work” with little or no customization or configuration
          • Large and growing body of use-case-specific DITA modules
      • Large and growing body of DITA knowledge
          • Standard is well written
          • Many service providers with solid DITA knowledge
          • Growing body of published DITA how-to information
      • Controlled extension (“specialization”) means:
          • Knowledge about one DITA application transfers to all other DITA applications
          • Extensibility and interchange are optimized
          • Implementations can optimize their own modularity and flexibility
  20. My Assertion: DITA Is Almost Always Best Fit
      • DITA can be easily and practically applied to almost all documentation use cases (not just tech docs)
      • DITA’s unique features minimize initial cost of ownership and implementation
      • DITA’s unique features optimize interchange of content
      • DITA’s unique features maximize flexibility and stability of supporting tools
      • Therefore:
          • DITA provides maximum value compared with other alternatives
          • Main cost is acceptance of a few constraints that enable DITA’s value
  21. DITA Myths Busted
  22. DITA Myth One: DITA Is Only For Tech Docs
      • DITA is a layered, flexible standard
      • Originally driven by technical documentation requirements…
      • …but core features are completely generic
      • DITA has been used for:
          • Government reports
          • Financial standards
          • Test preparation books
          • Travel guides
      • No inherent restrictions on the kind of publications DITA will work well for
  23. Forest for the Trees: It’s Still Just XML
      • DITA has lots of cool features, some quite sophisticated
      • This sophistication can be scary
      • But…
          • …It’s still just XML
          • You don’t have to use any particular feature of DITA
          • Users don’t necessarily need to know it’s DITA
          • If it being DITA-based doesn’t help at the moment, don’t talk about it
          • To the non-DITA-aware it looks like any other custom XML application
  24. DITA Myth Two: DITA Requires Topic-Based Writing
      • DITA standard is optimized for modularity
      • But it does not require that content be stored or written as modules
      • Use of DITA maps is entirely optional
      • Topics can physically contain other topics
      • An entire book could be marked up as a single XML document consisting of one root topic and many child topics
      • Such a topic would be indistinguishable from any other similar XML document (e.g., an NLM article, a DocBook document)
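As an illustration of the single-document option, here is a minimal sketch of a book held in one file as a root topic with nested child topics and no map. The `id` values, titles, and content are invented for the example, and only the generic `topic` element is used.

```python
import xml.etree.ElementTree as ET

# One XML document: a root topic with child topics nested inside it.
# All ids, titles, and text are illustrative assumptions.
BOOK = """
<topic id="book">
  <title>A Short Novel</title>
  <body><p>Front matter.</p></body>
  <topic id="ch1">
    <title>Chapter 1</title>
    <body><p>It began.</p></body>
    <topic id="ch1-s1">
      <title>A Scene</title>
      <body><p>Something happened.</p></body>
    </topic>
  </topic>
  <topic id="ch2">
    <title>Chapter 2</title>
    <body><p>It ended.</p></body>
  </topic>
</topic>
"""


def topic_tree(topic, depth=0):
    """Return (depth, id, title) for a topic and all nested topics."""
    rows = [(depth, topic.get("id"), topic.findtext("title"))]
    for child in topic.findall("topic"):
        rows.extend(topic_tree(child, depth + 1))
    return rows


if __name__ == "__main__":
    for depth, topic_id, title in topic_tree(ET.fromstring(BOOK)):
        print("  " * depth + f"{title} [{topic_id}]")
```

The hierarchy here comes from physical containment alone. The same chapters could later be split into separate files and reassembled through a map without changing their internal markup.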
  25. DITA Myth Three: DITA Is Hard
      • DITA has lots of features, some quite sophisticated
      • Making full use of all these features requires understanding those features, of course
      • But at its simplest, DITA is just like any other XML document type for publications: sections, paragraphs, lists, figures, tables, and inlines
      • Thus, a DITA application need only be as sophisticated as you need it to be to satisfy your specific requirements
      • Complexity and “difficulty” of DITA is concentrated in the data processing requirements, not in authoring
      • Ability to easily define custom vocabularies means you can optimize markup names and structures to reflect local culture and practice
  26. In Short: Why DITA? Why Not DITA?
      • DITA can be applied where any other applicable XML standard can be applied
          • At lower absolute cost
          • With greater flexibility
          • With greater potential value
      • Cost of using DITA at worst no greater than using XML generally
      • So why not use it?
  27. Yeah, But…
      • I’ve said a lot of stuff
      • What do you need from me to be convinced?
  28. DITA As Applied to Publishing
  29. Publishing-Specific Challenges
      • Existing vendor solutions and community knowledge focused on tech doc requirements
      • Many vendors don’t really get DITA
      • Many tools don’t yet fully support specialization
      • Many older tools limited by architecture and implementation choices made years ago
      • Many service providers still building understanding of DITA
      • Publishing requirement for high-quality composition always a challenge for any XML-based solution
      • Publishers have different business drivers from tech doc
  30. Note to Vendors and Service Providers
      • Potential market for DITA as a publishing solution is orders of magnitude larger than potential market for DITA as a tech doc solution
      • Many more publishers and units of publication than tech doc producers
      • Tech doc is a cost center
      • Publishing is a profit center
      • In many ways, the value of DITA to publishing is more compelling than it is for tech doc
      • Just saying…
  31. Publishing: Open Toolkit Alone Won't Cut It
      • Pages are and will always be important
      • Need a path from DITA XML to publishing tools
          • InDesign
          • Quark
          • Etc.
      • No technical barrier to a generic DITA-to-InDesign process
      • Products like Typefi could add significant value
      • Several advantages for Publishers:
          • Uses existing layout design skills, tools, and methods
          • Can be 100% automatic or include human tweaking
          • Can leverage Toolkit for preprocessing
  32. Where Could Publishers Go From Here?
      • DITA as specialized in the DITA spec not always appropriate for Publishers
          • Too constrained in some areas
          • Needs more ways to capture format intent
          • May not match existing publishing practice or conventions well
      • Might be useful to define a separate publishing-specific specialization family rooted at DITA topic rather than at concept/task/reference
      • Can a novel be a single topic or small set of topics? [Yes]
      • Does that help? Does it hurt?
  33. Business Process Improvement Implications
      • Base cost of using XML is essentially unchanged
          • Same challenges for legacy conversion
          • Applying XML at end of process
          • Using XML as input to revision process
      • Cost of developing initial markup design can be significantly lower
      • Can have more generic, reusable processing components
      • DITA encourages and enables small modules
          • Makes recombination at small granularity possible and manageable
          • Adapts well to delivery to portable devices
  34. Business Improvement Implications (Cont.)
      • Incremental cost of DITA-based systems should go down over time as infrastructure accretes
      • Enables local optimization of markup without impeding interchange within an enterprise or across enterprises
      • Provides controlled, formal framework for defining common components used across parts of enterprise or communities of interest
      • Enables use of more sophisticated features as needed
  35. Potential Information Economy Improvements
      • Reduce tight coupling between suppliers and consumers (aggregators, republishers, etc.)
      • No need to agree on rigid, overly-general standards to enable interchange
      • Supplier need not have full publishing infrastructure in order to supply high-quality content
      • Information consumers can apply a generic DITA processing infrastructure to content from many suppliers
      • Increases value of information in DITA form
      • Reduces impedance of interchange
  36. Wrap Up
  37. In Conclusion
      • DITA has lots of unique goodness of direct and compelling value to publishers
      • DITA can be used in simple ways to good effect with low cost of entry
      • DITA’s low cost and strong features represent a compelling value for almost all XML-based publishing use cases
      • Full DITA infrastructure still being developed:
          • Off-the-shelf DITA-to-InDesign processes
          • Standard publishing-specific DITA modules
          • Community knowledge of how best to apply DITA to publishing use cases
  38. Questions?
  39. Thank You
      Eliot Kimber, Really Strategies