XML Workflows & You


"XML Workflows & You" is a presentation given to the Association of Canadian Publishers in December 2009. It was designed to educate non-technical book publishing personnel on the intricacies of implementing XML in their production workflows. The presentation does not fully endorse the notion!

Published in: Technology

XML Workflows & You

  1. 1. XML Workflows and You
     Thad McIlroy, The Future of Publishing, San Francisco & Vancouver
     Presented to ACP CPDS Digital Publishing Workshop, Thursday, December 10, 2009
  2. 2. Outline
     • My background
     • My XML Thesis
     • The Vision!
     • Coping with a digital world
     • Thinking (a lot) about XML
     • Complexity
     • eBooks
     • Implementing XML workflows
  3. 3. My Background
     • 8 years in bookselling & publishing in Canada; 4 in the U.S. (15 in SF)
     • 20+ years studying the intersection of technology and print publishing, working with publishers, printers & vendors
     • 5 years with Seybold Seminars
     • 14 books and 200+ articles
  4. 4. More Recent Background
     • 10 years studying the impact of the Internet on graphic communications
     • Major focus now:
        • The future of publishing
        • Workflow
        • eBooks and other media
        • Publishing automation (XSL-FO)
     • Writing for PrintAction, Gilbane.com, TheFutureOfPublishing.com
  5. 5. XML Background
     • Worked to implement two multi-million XML workflows at large educational publishers
     • Authored “XSL-FO: Ready for Prime Time” (published by Gilbane)
     • Designed four automated print publishing systems
  6. 6. TheFutureofPublishing.com
  7. 7. My XML Thesis
     These are all real problems!
     • Print production
     • Web publishing
     • Repurposing to multiple media
     • The semantic world (metadata)
     • Recombining (or subsetting) for new products
     • Archiving
     • Accessibility
  8. 8. XML Provides Real Solutions
     • …But it is a big, ugly, unwieldy bear
     • Its conceptual metaphors bear little or no resemblance to those of book publishing
     • It’s based on 1960s thinking about techdoc (GML, SGML, XML)
     • Yet its ubiquity makes it hard to shake
     • …as does its mindshare
     • It’s an open standard, but that don’t make it free
  9. 9. The Vision: Smart Documents
     • Authors with templates
     • Text and vector graphics in XML
     • Editors working electronically
     • Proofing done digitally
     • Knows where it’s been and where it’s going: print & bind, Web & PDA, distribution
     • File contains all preflight info & revision history
     • Contains multiple language versions
  10. 10. Some ACP Survey Results
     • Most title production remains inhouse
     • Few expect this to change soon
     • Few are “very aware” of XML workflows; many questions remain
     • Not certain where the ROI will come from
     • Keen on semantic tagging but without a strong concept of the value
  11. 11. ACP Survey Result Questions
     • 50% of the software used inhouse is neither QE nor ID. What is it?
     • Why the major interest in semantic tagging?
  12. 12. Dealing with a Digital World
  13. 13. Workflow Can Be Controlled
     • It is a debugged system: there are no technological reasons for mistakes
     • PDF is a big part of the answer
     • Predictable (potentially) and independent
     • Major problem remains in the interface between publishers and printers
  14. 14. Color Can Be Controlled
     • CMSs work
     • Soft, remote color proofing works
     • Press manufacturers support color control
     • Closed-loop color control is here now
  15. 15. 20 Years Later
  16. 16. Managing the Mice
  17. 17. $10k for an “Exclusive” Look
  18. 18. What is to Be Done?
  19. 19. Workflow Must Be Charted
  20. 20. The Tenets of Automation
     • Full digitization: nothing on paper
     • Full commitment: from management to sales to all operating staff
     • All the software: the right applications (from creative through DAM/CMS and workflow enablers)
     • Standards: full support for the standards that enable automation
  21. 21. Stylesheets
     • Do you fully embrace stylesheets throughout your workflow, from word processor to page composition?
     • If changes are made in the print-ready PDF file, are those changes systematically being restored to earlier iterations of the work?
  22. 22. Essential Points
     • If you have not got your current workflows fully digital and fully debugged, forget XML entirely
     • There is only one workflow
     • Stop seeing the printer as separate from your workflow
     • Stop seeking needless competitive bids. Partner.
  23. 23. 6 projects that could change publishing for the better
     Michael Tamblyn, CEO, BookNet Canada
     BookNet Canada TechForum 09
  24. 24. an XML publishing workflow that doesn’t suck
  25. 25. OR a publishing workflow that offers all of the benefits of XML, yet doesn’t suck
  26. 26. The Importance of XML
     • eXtensible Markup Language
     • XML enables content management
     • Combining the power of style sheets with the power of databases
     • Style sheets with meaning
  27. 27. XML is the Answer
     A New Breed of Data Standard, a Single Standard Able to Represent:
        1. All manner of content
        2. The structure of content
        3. The “meaning” of content (through smart tag names and metadata)
        4. Production/workflow requirements
        5. Rights data
        6. Repurposing requirements (cross-media)
     2005
  28. 28. XML
     • “Composition is the ‘low-hanging fruit’”
     • XML stands for Extremely Mixed-up Language
     • Suited to reference, non-fiction, education, multipurposing
     • “XML is like violence. If you’re not getting the result you want you have to use more.”
  29. 29. Format vs. Structure
     • Format describes how content is intended to look when it is displayed or printed
     • Structure describes the purpose or meaning of content
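
The format/structure distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `<chapter>`, `<title>` and `<para>` tag names and both rendering functions are hypothetical and do not come from the presentation. The point is that one structurally tagged source can be given two different formats without touching the content.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured source: the tag names say what each piece IS,
# not how it should look.
SOURCE = """
<chapter>
  <title>Coping with a Digital World</title>
  <para>Workflow can be controlled.</para>
  <para>Color can be controlled.</para>
</chapter>
"""


def to_html(chapter):
    """Web format: the title becomes a heading, paragraphs become <p>."""
    parts = ["<h1>%s</h1>" % chapter.findtext("title")]
    parts += ["<p>%s</p>" % p.text for p in chapter.findall("para")]
    return "\n".join(parts)


def to_plain_text(chapter):
    """Print-style format: upper-case title, indented paragraphs."""
    lines = [chapter.findtext("title").upper(), ""]
    lines += ["    " + p.text for p in chapter.findall("para")]
    return "\n".join(lines)


chapter = ET.fromstring(SOURCE)
html = to_html(chapter)
text = to_plain_text(chapter)
print(html)
print(text)
```

Both outputs are derived from the same tagged source; only the formatting rules differ.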
  30. 30. XML & Publishers’ Workflows
     • Most smaller publishers are still in exploratory mode
     • This is NOT simple to implement, train and support
     • It can be hugely expensive
     • It’s easy to make expensive mistakes
     • Demands a large offshore component
  31. 31. The Information Avalanche
     • Doubling the knowledge base:
        • 1750 – 1900: 150 years to double
        • 1900 – 1950: 50 years to double
        • 1950 – 1960: 10 years to double
        • 1960 – 1992: 5 years to double
     • By 2020, information is expected to double about every 73 days!
     • Paper can’t provide data in a cost-effective and timely fashion
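
To put the doubling periods above on a common scale, they can be converted into a growth factor per year. The sketch below assumes steady exponential growth within each period, which is a simplification and not a claim made on the slide; the function name is hypothetical.

```python
def yearly_growth_factor(doubling_period_days):
    """Factor by which the quantity grows in one 365-day year,
    assuming it doubles once every `doubling_period_days`."""
    doublings_per_year = 365 / doubling_period_days
    return 2 ** doublings_per_year


# Doubling periods quoted on the slide, expressed in days.
periods = {
    "150 years": 150 * 365,
    "50 years": 50 * 365,
    "10 years": 10 * 365,
    "5 years": 5 * 365,
    "73 days": 73,
}

factors = {label: yearly_growth_factor(days) for label, days in periods.items()}
for label, factor in factors.items():
    print("doubling every %s -> x%.4f per year" % (label, factor))
```

Since 365 / 73 = 5, doubling every 73 days means five doublings a year, i.e. a 2^5 = 32-fold increase annually, against roughly 15% a year when the doubling period is five years.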
  32. 32. Growth in Electronic Documents
     • 1995: 12 trillion electronic and paper documents
     • 90% of all documents were printed (in 1998)
     • 2005: 20 trillion documents
     • 2005: About 50% will be printed
     • Ratio of offset to digital print: 40:60
     • Offset @ 40% of today’s volume
     Source: Gary Starkweather, Microsoft Research (and inventor of the laser printer)
  33. 33. The Next Section IMPLEMENTING XML
  34. 34. Structured Tagging by Authors?
     Typéfi sample approach
  35. 35. XML Tagging
     Semantic tagging requires human judgment
       <!-- the resource links in the ProcessGroup define the input resources that must be available for the ProcessGroup to be submitted and the output resources that are produced by the ProcessGroup -->
       <ResourceLinkPool>
         <!-- print input media -->
         <MediaLink Usage="Input" rRef="L2"/>
       <ResourceLinkPool>
         <GatheringParamsLink Usage="Input" rRef="L4"/>
         <!-- gathered output components -->
         <ComponentLink Usage="Output" rRef="L7"/>
       </ResourceLinkPool>
       <ID="J2" Status="Waiting" Type="DigitalPrinting">
       <ResourceLinkPool>
         <GatheringParamsLink Usage="Input" rRef="L4"/>
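
Once links like those on the slide are well-formed, software can read them mechanically, as the sketch below suggests. The three link elements and their `Usage` and `rRef` attributes are copied from the slide; the `<ProcessGroup>` wrapper element and the function name are assumptions added only so that the fragment parses. Deciding what to tag in the first place is the part that still needs human judgment.

```python
import xml.etree.ElementTree as ET

# Link elements copied from the slide. The <ProcessGroup> wrapper is a
# hypothetical stand-in so that the fragment is well-formed XML.
FRAGMENT = """
<ProcessGroup>
  <ResourceLinkPool>
    <MediaLink Usage="Input" rRef="L2"/>
    <GatheringParamsLink Usage="Input" rRef="L4"/>
    <ComponentLink Usage="Output" rRef="L7"/>
  </ResourceLinkPool>
</ProcessGroup>
"""


def links_by_usage(xml_text):
    """Group every resource link by its Usage attribute."""
    root = ET.fromstring(xml_text)
    grouped = {"Input": [], "Output": []}
    for pool in root.iter("ResourceLinkPool"):
        for link in pool:
            usage = link.get("Usage")
            grouped.setdefault(usage, []).append((link.tag, link.get("rRef")))
    return grouped


links = links_by_usage(FRAGMENT)
print("inputs: ", links["Input"])
print("outputs:", links["Output"])
```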
  36. 36. Semantic Tagging
  37. 37. Semantic Tagging
     • The concept has been dominated by the notion of a “semantic Web”
     • The benefits are easy to imagine
     • The implementation can be imagined also
     • But the infrastructure is not in place to deliver the benefits
     • Publishers run fiefdoms
  38. 38. RCO: The Semantic Challenge
     • Reusable Content Objects: What is your level of semantic granularity?
        • The word
        • Sentence
        • Paragraph
        • Story
        • Other content objects
        • Graphics
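
The granularity question has a practical side: the finer the level, the more objects there are to tag and manage. The sketch below counts content objects at four of the levels listed above. The sample text, the function name and the naive sentence splitter (a regular expression that breaks after `.`, `!` or `?`) are all assumptions for illustration only.

```python
import re

# Hypothetical sample story: two paragraphs separated by a blank line.
STORY = (
    "XML provides real solutions. It is also a big, unwieldy bear.\n\n"
    "Publishers must choose a level of granularity."
)


def content_objects(story):
    """Split a story into candidate reusable objects at each level."""
    paragraphs = [p.strip() for p in story.split("\n\n") if p.strip()]
    # Naive splitter: break after sentence-ending punctuation.
    sentences = [
        s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s
    ]
    words = [w for s in sentences for w in s.split()]
    return {
        "story": [story],
        "paragraph": paragraphs,
        "sentence": sentences,
        "word": words,
    }


objects = content_objects(STORY)
counts = {level: len(items) for level, items in objects.items()}
for level, count in counts.items():
    print("%-9s %d object(s)" % (level, count))
```

Even for this three-sentence sample, the number of objects to manage grows from 1 at story level to 18 at word level.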
  39. 39. If you show this to most editors... they’re going to start drinking at their desks (MT)
  40. 40. Digital Asset Management
     XML’s role in metadata and taxonomies
  41. 41. Content Management
  42. 42. Templated Designs
     How much of XML-tagged content can be composed automatically?
     Typéfi sample approach
  43. 43. Metadata Enters the Process
     • Data that describes other data
  44. 44. The Bean Analogy
     FROM: A Manager’s Introduction to Adobe extensible Metadata Platform
  45. 45. Bean Metadata
     ELEMENT NUMBER | CATEGORY OF INFORMATION | VALUE OF CATEGORY IN THIS INSTANCE (what appears on the label) | DATA TYPE
     1 | The maker | Trader Joe’s | String
     2 | The contents | Black Beans | String
     3 | A notion of distinctive food value | A low fat food | String
     4 | A second notation of distinctive food value | An excellent source of dietary fiber | String
     5 | Directions for finding nutritional information | See side panel for nutritional information | String
     6 | A notation of weight, in English and metric units | Net Wt. 15 oz (415g) | Formatted numbers
     7 | A marketing narrative | Trader Joe’s Black Beans have a rich, hearty taste and soft texture. They are wonderful in soups and stews, with rice, and in salads with colorful vegetables and Southwestern or Caribbean flavors. Black beans have gained in popularity due to their high dietary fiber and protein content. They are a cholesterol-free and low fat food. | Long string
  46. 46. More Bean Metadata
     ELEMENT NUMBER | CATEGORY OF INFORMATION | VALUE OF CATEGORY IN THIS INSTANCE (what appears on the label) | DATA TYPE
     8 | A declaration of wholesomeness | No preservatives, no artificial colors, no artificial flavors | String
     9 | A list of ingredients | black beans, water, salt, calcium chloride | List separated by commas
     10 | The ID of distributor and seller | Dist. & Sold Exclusively by Trader Joe’s, So. Pasadena, CA 91031 | String
     11 | A tracking code, in Roman | 0009 6362 | Integer
     12 | Same tracking code in bar-code-readable format | (bar code) | Bit map
     13 | The nutritional facts, in standard order and format | Nutritional Facts: Serving Size 1/2 cup (130g); Servings per container about 3; Amount per serving: Calories 130, Fat Cal 5; % Daily Value: Total Fat 0.5g 0%, Saturated Fat 0g 0%, Cholesterol 0mg 0%, Sodium 260mg 11%, Total Carbohydrates 22g 7%, Dietary Fiber 5g 22%, Sugars 0g, Protein 10g 20%; Vitamin A 0%, Vitamin C 0%, Calcium 4%, Iron 10%; Percent Daily Values are based on a 2,000 calorie diet | Structured table
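
The bean label shows why metadata carries a data type as well as a value. The sketch below stores four rows from the table as XML and reads them back as typed values. The `<label>` and `<field>` element names, the `name` and `type` attributes and the helper functions are hypothetical, chosen only for illustration; they are not taken from XMP or any other real schema.

```python
import xml.etree.ElementTree as ET

# A few rows from the bean-label table: (category, value, data type).
# Field names and type keywords are illustrative assumptions.
LABEL = [
    ("maker", "Trader Joe's", "string"),
    ("contents", "Black Beans", "string"),
    ("ingredients", "black beans, water, salt, calcium chloride", "list"),
    ("tracking_code", "0009 6362", "integer"),
]


def to_python(value, data_type):
    """Convert a label value to a Python object according to its type."""
    if data_type == "list":
        return [item.strip() for item in value.split(",")]
    if data_type == "integer":
        return int(value.replace(" ", ""))
    return value


def label_to_xml(rows):
    """Store each row as a <field> element carrying its name and type."""
    root = ET.Element("label")
    for category, value, data_type in rows:
        field = ET.SubElement(root, "field", name=category, type=data_type)
        field.text = value
    return root


def xml_to_record(root):
    """Read the fields back, using the type attribute to interpret them."""
    return {
        f.get("name"): to_python(f.text, f.get("type"))
        for f in root.findall("field")
    }


record = xml_to_record(label_to_xml(LABEL))
print(record)
```

Without the type information, the ingredients would be just one more string; with it, software can treat them as a list of four separate items.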
  47. 47. The Cross-Media Challenge
     Print, Web, Mobile
  48. 48. The Cross-Media Challenge
  49. 49. The Cross-Media Challenge
  50. 50. W3C XML Schema Definition Language (XSD) 1.1, 12-03-09
     “…Part 1: Structures” specifies the XML Schema Definition Language, offering facilities for describing the structure and constraining the contents of XML documents, including those which exploit the XML Namespace facility. The schema language, which is itself represented in an XML vocabulary and uses namespaces, substantially reconstructs and extends the capabilities found in XML document type definitions (DTDs). The second publication, “Datatypes,” defines facilities for defining datatypes to be used in XML Schemas as well as other XML specifications. Comments welcome through 31-12.
  51. 51. DocBook (docbook.org)
     • What is DocBook?
     • “DocBook is a schema (available in several languages including RELAX NG, SGML and XML DTDs, and W3C XML Schema) maintained by the DocBook Committee of OASIS. It is particularly well suited to books and papers about computer hardware and software...”
     • 700 pages of “dense” documentation
  52. 52. DITA 1.1, August-07
     • The Darwin Information Typing Architecture (DITA) is an XML-based architecture for authoring, producing, and delivering information. Its main use is for technical publications
     • The documentation is 593 pages
     • Maintained by OASIS-open.org
     • And it won’t work for all your titles
  53. 53. Do Not Be Tricked!
  54. 54. Rule of Least Power
     • A W3C TAG Finding states: “When designing computer systems, one is often faced with a choice between using a more or less powerful language for publishing information, for expressing constraints, or for solving some problem. [ . . . ] The ‘Rule of Least Power’ suggests choosing the least powerful language suitable for a given purpose.”
  55. 55. We’re Thinking Small
  56. 56. We’re Thinking Small
  57. 57. The Tipping Point: How Little Things Can Make a Big Difference
     “...a book that presents a new way of understanding why change so often happens as quickly and as unexpectedly as it does...Ideas and behavior and messages and products sometimes behave just like outbreaks of infectious disease. They are social epidemics.”
     Malcolm Gladwell
  58. 58. Crossing the Chasm
     • Innovators (Techies)
     • Early Adopters (Visionaries)
     • Early Majority (Pragmatists)
     • Late Majority (Conservatives)
     • Laggards (Skeptics)
     Source: www.chasmgroup.com
  59. 59. The Human Factor
     New Internal Roles, Skills & Positions
     • The production skill set changes substantially
     • Much of the existing knowledge base changes or obsoletes
     • The move from design & composition & production management to content & product architecting and engineering
     • There is an enormous training challenge ahead
     • And a need for certification
  60. 60. Some Steps
     • Do your homework
     • Listen to the next preso and decide if you think that can work for you
     • Remember, there are alternatives
     • eBook “construction” is only going to get cheaper and easier
     • Stay tuned…with Adobe, Quark and the key trade associations
  61. 61. Thank you
     thad@theFutureofPublishing.com