
Content Archaeology (Keynote for DocTrain West March 2009)



This presentation introduces the various approaches used to convert unstructured legacy content into something more useful: a structured form such as the one provided by DITA.

Published in: Technology

  1. Content Archaeology: Raiders of the Lost Art. Joe Gollner, VP Enterprise Publishing Solutions, Stilo International. Copyright © Stilo International 2009
  2. A 1994 Presentation that Addressed a Similar Theme
  3. Nerd Alert
  4. The Long Road to XML (1987...)
  5. Building Advanced Content Conversion, Management & Publishing Solutions for over 20 years
  6. Tales from the Content Conversion Crypt: Memories of Extreme Content Makeover; Four Common Approaches; Illustrative Examples of Content Conversion Experiences; Practical Content Conversion; Key Lessons & Themes
  7. The Essence of Content Conversion: Got this! Want that!
  8. Extreme Content Makeover: It can happen to you! Your content could become “spectacular”.
  9. Blood, Sweat and Tears Model of Conversion: Manual effort deployed with great industry yields results… over time. It can also be cruel: conversion teams have been “sequestered” before... I know...
  10. Snake Oil and Conversion Magic: Some products claim to provide complete conversion solutions “out-of-the-box”. One project licensed a “Universal Converter” and got…
  11. Random Generator Conversion Environment: The Information Technology (IT) team constructs a custom conversion solution using tools with which they are familiar. This sometimes works, but in more complex scenarios it can lead to problems when the programs don’t produce the “expected” results.
  12. Over the Wall Content Conversion: Outsourced conversion services can be effective if managed carefully. Often they are used as a way to “pass the ball” when the job seems too difficult. Conversion services have historically been a challenging business. The problems don’t usually go away.
  13. The Four Pillars of Content Conversion. The four conversion strategies: Manual Effort; Conversion Products; Custom Conversion Environments; Out-sourced Content Conversion. There is merit in each of these strategies: elements of each may figure in any effective conversion strategy, and each may actually work in certain circumstances. The key point: each conversion scenario is unique, and complexity is determined by the “distance” between source & target.
  14. Sources: The Harsh Reality of Legacy Content. The legacy content spectrum: Opaque (not directly processable, e.g., paper / scanned images); Annoying (aggressively proprietary, with little or no predictability in usage); Polluted (normally processable but frequently filled with deviations & additions, e.g., HTML); Tolerable (a documented format that exposes format & structure in a processable form). Fortunately, popular formats are becoming more and more “tolerable”.
  15. Additional Potential Obstacles. Things to watch out for: content that exists in multiple formats (different renditions may be the best source for different parts of the content, which necessitates parallel conversions of the sources followed by a merge); sophisticated supporting content such as formulas, vector graphics, multimedia resources and application code.
  16. An Inconvenient Truth – About Content: Some imagine that content is always cute, well-formed & easily handled... The truth is usually a little rougher...
  17. Demanding Targets: The conversion outputs are becoming more challenging. Published products are growing more sophisticated. Underlying content needs to be modular, reusable & intelligent. (Diagram labels: Schema, Protocols, Content Instance, XML Validation, Content Verification, Transformation Processing, Outputs.)
  18. The Key Questions: Where are you? A true assessment of the state of your content sources. Where are you going? A validated understanding of the output that you must produce & the uses to which it will be put.
  19. Practical Content Conversion: Best practice for content conversion adopts a flexible posture. It leverages the best tools & techniques, adapts to circumstances, continuously looks for automation opportunities, and deploys automation under the guidance of the people who understand the content. It leverages automation to analyse sources, perform transformations, validate results and analyse results.
  20. Conversion Process Roadmap (diagram): conversion rules are applied in three stages: (1) Example Set, (2) Sample Set (10%), (3) Complete Set (100%). Diagram labels: Source Analysis; Target Schema; Source to Target Mapping; Subject Matter Experts; Conversion Guidance; Legacy Content; Conversion Rules; Modify Conversion Rules; Existing Conversion Process; Execute Conversion; Manual Editing; Result Analysis; Identified Issues; Interaction; Application Tests; Validation & Verification.
  21. Case Study: Converting Drug Information. Migrating drug information into a precise digital form presented a critical challenge. Source: Miles33, Quark & vendor drug monographs. Target: logical data structures needed to drive diagnostics. (Figure: a matrix rating Drug 1 through Drug 4 as Recommended, Optional or Not Recommended under Scenarios A, B, C and D.)
  22. Case Study: Content Aggregation Services. Sources: paper, PDF, HTML, SGML, XML, databases…
  23. To Burst or Not to Burst: Content modularity is not an end in itself; a business rationale must drive bursting & refactoring efforts. (Diagram labels: Conversion Outputs, Compare Outputs.)
  24. Case Study: Realizing Savings with Refactoring. Outcome of refactoring: $100 million saved annually.
  25. Case Study: High Precision Content Conversion
  26. But There’s More: Establishing Content Metadata. Internal sources (identify, extract, insert): segments of content designated as valuable metadata; attributes available in the source format; keywords & abstract; annotations. External sources: system data (file information); associated keywords & descriptions; ratings & commentary; process context; additional information drawn from other sources (e.g., a part database). (Diagram labels: Ontology, Taxonomy, Topic, Link Network.)
  27. And Don’t Forget about the Links: Links are increasingly important and essential for portals (enabling navigation). Adding links involves source / target identification, link specification, link generation, link validation, link extraction, link reporting and link activation. The level of precision required is high, as is the potential for error.
  28. Worth a Thousand Words & Special Handling: Graphics frequently introduce unique challenges. They often occur in large numbers, and the mismatch between sources and targets can be major. They are associated with a separate processing pipeline & quality control steps, frequently introduce needs for specialized software tools, and occasionally demand manual intervention. Something practical can usually be done.
  29. Observations on Content Conversion: Numerous approaches exist. Each has a time & a place, and applicability depends on context: Where are you? Where are you going? Practical content conversion takes a flexible approach, selecting from available tools & techniques to find the best solution. Main risk: dogmatically sticking to one tool & technique when change is demanded.
  30. Why is Content Conversion Important? Past investments in content were expensive to make, can be very valuable today, can embody vital business knowledge, and can be costly to reproduce. Rescuing legacy content can be done efficiently & effectively, can save precious resources today, and can prevent valuable knowledge from slipping into oblivion.
  31. You can be a Content Conversion Hero, provided that you know where you are and where you are going. Otherwise you might turn out to be a little less impressive.
  32. Some References: Stilo Website; Stilo Migrate Online & On Demand Conversion Service & Whitepapers
  33. It All Comes Down to Understanding your Content: Content may look easy to handle. Sometimes content can turn nasty.
  34. The Answer Takes a Familiar Form: But do not underestimate the power of the right tools in the hands of the right people at the right time.
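Slide 20 describes running conversion rules against progressively larger sets (an example set, then a sample of about 10% of the content, then the complete set), with result analysis feeding identified issues back into modified rules. The Python sketch below illustrates that staged loop under loud assumptions: legacy content is modelled as hypothetical (style name, text) pairs, the conversion rules are a plain mapping from legacy style names to target element names, and any style without a rule counts as an identified issue. The names `convert`, `staged_run` and `StageResult` are invented for illustration and do not come from the presentation.

```python
import random
from dataclasses import dataclass, field
from xml.sax.saxutils import escape

# Hypothetical legacy content model: (paragraph style name, text) pairs.
LegacyPara = tuple[str, str]


@dataclass
class StageResult:
    name: str
    converted: list[str] = field(default_factory=list)
    issues: set[str] = field(default_factory=set)  # styles with no rule


def convert(paras: list[LegacyPara], rules: dict[str, str], name: str) -> StageResult:
    """Apply style-to-element rules and record any unmapped styles."""
    result = StageResult(name)
    for style, text in paras:
        element = rules.get(style)
        if element is None:
            result.issues.add(style)
        else:
            # Escape &, < and > so the generated markup stays well-formed.
            result.converted.append(f"<{element}>{escape(text)}</{element}>")
    return result


def staged_run(corpus, rules, example_size=5, sample_ratio=0.10, seed=0):
    """Run the same rules on an example set, a ~10% sample, then everything.

    Stops at the first stage that reports issues, on the assumption that the
    rules would be reviewed with subject matter experts and modified first.
    """
    rng = random.Random(seed)
    sample_size = min(len(corpus), max(1, int(len(corpus) * sample_ratio)))
    stages = [
        ("example", corpus[:example_size]),
        ("sample", rng.sample(corpus, sample_size)),
        ("complete", corpus),
    ]
    results = []
    for name, paras in stages:
        result = convert(paras, rules, name)
        results.append(result)
        if result.issues:
            break
    return results
```

With rules such as `{"Heading1": "title", "Body": "p"}`, a corpus containing one stray `OddStyle` paragraph outside its first few entries passes the example stage and is stopped at the sample or complete stage with `{"OddStyle"}` reported. Halting at the first stage with issues keeps each full run cheap until the rules have been revised.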
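Bursting, as discussed on slide 23, means splitting a large converted document into smaller modular units. As a hedged sketch, assuming a hypothetical input in which each top-level `<section>` carrying an `id` attribute should become one standalone DITA-style `<topic>`, Python's standard `xml.etree.ElementTree` can perform the split. The element names and the one-section-per-topic rule are illustrative assumptions; the slide's caution still applies, so whether to burst at all should be decided on business grounds first.

```python
import xml.etree.ElementTree as ET


def burst(doc_xml: str) -> dict[str, str]:
    """Split each top-level <section id="..."> into a standalone <topic>.

    Returns a mapping of topic id -> serialized topic XML. The element
    names used here are illustrative assumptions, not a fixed schema.
    """
    root = ET.fromstring(doc_xml)
    topics: dict[str, str] = {}
    for section in root.findall("section"):
        topic_id = section.get("id")
        if topic_id is None:
            continue  # sections without ids are left for manual review
        topic = ET.Element("topic", id=topic_id)
        title = section.find("title")
        ET.SubElement(topic, "title").text = title.text if title is not None else ""
        body = ET.SubElement(topic, "body")
        for child in section:
            if child.tag != "title":
                body.append(child)  # place the remaining content in the topic body
        topics[topic_id] = ET.tostring(topic, encoding="unicode")
    return topics
```

Sections without an `id` are skipped rather than given generated identifiers, so that a person decides how those fragments should be named and reused.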
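Slide 26 separates metadata drawn from internal sources (such as keywords and an abstract already present in the source format) from metadata drawn from external sources (such as system data about the file). A minimal Python sketch of the identify and extract steps, assuming the legacy source is HTML whose keywords and abstract are carried in `<meta name="keywords">` and `<meta name="description">` tags, uses the standard library's `html.parser.HTMLParser` and then merges a caller-supplied dictionary of external data. The insert step into the target format is left out, and the names `MetaHarvester` and `harvest_metadata` and the record field names are hypothetical.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a legacy HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name, content = a.get("name"), a.get("content")
            if name and content:
                self.meta[name.lower()] = content


def harvest_metadata(html_text: str, system_data: dict) -> dict:
    """Extract internal metadata, then merge external data (e.g. file info)."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    record = {"keywords": keywords, "abstract": parser.meta.get("description", "")}
    record.update(system_data)  # external sources override nothing internal by name here
    return record
```

Keeping internal and external metadata in one record makes it easy to review what was recovered before anything is written into the converted content.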
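Slide 27 lists link extraction, link validation and link reporting among the steps of adding links. As a hedged sketch, assuming topics are held as XML strings keyed by topic id and that each `<xref href="...">` simply names the id of another topic (a deliberate simplification, since real DITA hrefs usually point at files and element ids), Python's `xml.etree.ElementTree` is enough to find links whose targets do not exist. The function name `find_broken_links` is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_broken_links(topics: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source topic id, href) for every <xref> whose target is missing.

    Assumes a simplified, hypothetical convention in which each href is the
    id of another topic in the same collection.
    """
    known_ids = set(topics)
    broken = []
    for source_id, xml_text in topics.items():
        root = ET.fromstring(xml_text)
        for xref in root.iter("xref"):      # link extraction
            href = xref.get("href", "")
            if href not in known_ids:       # link validation
                broken.append((source_id, href))
    return broken  # link reporting: the caller decides how to present these
```

Because link precision matters so much, a report like this is best run after every conversion pass rather than only at the end.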