Content Archaeology: Raiders of the Lost Art
Joe Gollner

Content Archaeology (Keynote for DocTrain West March 2009)

This presentation introduces the various approaches used to convert unstructured legacy content into something more useful - namely into a structured form such as that provided by DITA.

Published in: Technology

  1. Content Archaeology: Raiders of the Lost Art. Joe Gollner, VP Enterprise Publishing Solutions, Stilo International, jgollner@stilo.com. Copyright © Stilo International 2009
  2. A 1994 Presentation that Addressed a Similar Theme
  3. Nerd Alert
  4. The Long Road to XML (1987...)
  5. Building Advanced Content Conversion, Management & Publishing Solutions for over 20 years
  6. Tales from the Content Conversion Crypt. Memories of Extreme Content Makeover; Four Common Approaches; Illustrative Examples of Content Conversion Experiences; Practical Content Conversion; Key Lessons & Themes
  7. The Essence of Content Conversion. Got this! Want that!
  8. Extreme Content Makeover. It can happen to you! Your content could become “spectacular”
  9. Blood, Sweat and Tears Model of Conversion. Manual effort deployed with great industry yields results … over time. It can also be cruel... conversion teams have been “sequestered” before... I know...
  10. Snake Oil and Conversion Magic. Some products claim to provide complete conversion solutions “out-of-the-box”. One project licensed a “Universal Converter” and got…
  11. Random Generator Conversion Environment. The Information Technology (IT) team constructs a custom conversion solution using tools with which they are familiar. This sometimes works, but in more complex scenarios it can lead to problems when the programs don’t produce the “expected” results.
  12. Over the Wall Content Conversion. Outsourced conversion services can be effective if managed carefully. Often they are used as a way to “pass the ball” when the job seems too difficult. Conversion services have historically been a challenging business. The problems don’t usually go away.
  13. The Four Pillars of Content Conversion. The Four Conversion Strategies: Manual Effort; Conversion Products; Custom Conversion Environments; Out-sourced Content Conversion. There is Merit in Each of these Strategies: elements of each may figure in any effective conversion strategy; each may actually work in certain circumstances. The Key Point: each conversion scenario is unique; complexity is determined by “distance” between source & target.
  14. Sources: The Harsh Reality of Legacy Content. The Legacy Content Spectrum runs from Opaque (not directly processable, e.g., paper / scanned images) through Annoying (aggressively proprietary, little or no predictability in usage) and Polluted (normally processable but frequently filled with deviations & additions, as with HTML) to Tolerable (documented format that exposes format & structure in a processable form). Fortunately, popular formats are becoming more and more “tolerable”.
  15. Additional Potential Obstacles. Things to watch out for: content that exists in multiple formats (different renditions may be the best source for part of the content, which necessitates parallel conversions of sources & a merge); sophisticated supporting content (formulas, vector graphics, multimedia resources, application code).
  16. An Inconvenient Truth – About Content. Some imagine that content is always cute, well-formed & easily handled... The truth is usually a little rougher...
  17. Demanding Targets. The conversion outputs are becoming more challenging. Published products are growing more sophisticated. Underlying content needs to be modular, reusable & intelligent. [Diagram labels: Schema; Protocols; Content Instance; XML Validation; Content Verification; Transformation Processing; Outputs]
  18. The Key Questions. Where are you? A true assessment of the state of your content sources. Where are you going? A validated understanding of the output that you must produce & the uses to which it will be put.
  19. Practical Content Conversion. Best Practice for Content Conversion: flexible posture; leverages the best tools & techniques; adapts to circumstances; continuously looks for automation opportunities; deploys automation under the guidance of the people who understand the content; leverages automation to analyse sources, perform transformations, validate results and analyse results.
  20. Conversion Process Roadmap. [Diagram labels: Source Analysis; XML Target Schema; Source to Target Mapping; Interaction Guidance; Subject Matter Experts; Legacy Source Content; Existing Conversion Rules; Modify Conversion Rules; Modified Conversion Rules; Execute Conversion Process; Manual Editing; Result Analysis; Identified Issues; Application Tests; Validation & Verification. Three numbered passes: 1 Example Set; 2 Sample Set 10%; 3 Complete Set 100%]
  21. Case Study: Converting Drug Information. Migrating drug information into a precise digital form presented a critical challenge. Source: Miles33, Quark & vendor drug monographs. Target: Logical data structures needed to drive diagnostics. [Diagram: Scenarios A to D set against Drugs 1 to 4, marked Recommended, Optional or Not Recommended]
  22. Case Study: Content Aggregation Services. Sources: Paper, PDF, HTML, SGML, XML, Databases, …
  23. To Burst or Not to Burst. Content modularity is not an end in itself. A business rationale must drive bursting & refactoring efforts. [Diagram labels: Conversion Outputs; Compare Outputs]
  24. Case Study: Realizing Savings with Refactoring. Outcome of refactoring: $100 million saved annually
  25. Case Study: High Precision Content Conversion
  26. But There’s More: Establishing Content Metadata. Internal Sources: segments of content designated as valuable metadata; attributes available in source format; keywords & abstract; annotations. External Sources: system data (file information); associated keywords & descriptions; ratings & commentary; process context; additional information drawn from other sources (e.g., part database). [Diagram: metadata is identified, extracted and inserted; labels include Ontology, Taxonomy, Topic and Link Network]
  27. And Don’t Forget about the Links. Increasingly important. Essential for portals (enabling navigation). Adding links involves: source / target identification; link specification; link generation; link validation; link extraction; link reporting; link activation. Level of precision is high, as is the potential for error.
  28. Worth a Thousand Words & Special Handling. Graphics frequently introduce unique challenges: they often occur in large numbers; the mismatch between sources and targets can be major; they are associated with a separate processing pipeline & quality control steps; they frequently introduce needs for specialized software tools; they occasionally demand manual intervention. Something practical can usually be done.
  29. Observations on Content Conversion. Numerous approaches exist: each has a time & a place; applicability depends on context (Where are you? Where are you going?). Practical Content Conversion: a flexible approach to conversion that selects from available tools & techniques to find the best solution. Main Risk: dogmatically sticking to one tool & technique when change is demanded.
  30. Why is Content Conversion Important. Past Investments in Content: were expensive to make; can be very valuable today; can embody vital business knowledge; can be costly to reproduce. Rescuing Legacy Content: can be done efficiently & effectively; can save precious resources today; can prevent valuable knowledge from slipping into oblivion.
  31. You can be a Content Conversion Hero. Provided that you know: where you are; where you are going. Otherwise you might turn out to be a little less impressive.
  32. Some References. Stilo Website (www.stilo.com); Stilo Migrate Online & On Demand Conversion Service (www.stilo.com/migrate & migrate.stilo.com); Whitepapers (www.gollner.ca)
  33. It All Comes Down to Understanding your Content. Content may look easy to handle. Sometimes content can turn nasty.
  34. The Answer Takes a Familiar Form. But do not under-estimate the power of the right tools in the hands of the right people at the right time.
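
The automation loop described on slides 19 and 20 (analyse sources, perform transformations, validate results, and widen the run from an example set to a 10% sample to the complete set) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the tooling used in the talk: the plain-text source convention (a line starting with "# " is a title, any other non-empty line is a paragraph), the target element names, the validation rules and every function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def analyse(sources):
    """Analyse sources: count the kinds of lines found across all documents."""
    counts = Counter()
    for text in sources:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            counts["title" if line.startswith("# ") else "para"] += 1
    return counts


def convert(text):
    """Transform one legacy document into a small XML topic (assumed target)."""
    topic = ET.Element("topic")
    body = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            ET.SubElement(topic, "title").text = line[2:]
        else:
            if body is None:
                body = ET.SubElement(topic, "body")
            ET.SubElement(body, "p").text = line
    return topic


def validate(topic):
    """Validate one result against a toy rule set; return a list of issues."""
    issues = []
    titles = topic.findall("title")
    if len(titles) != 1:
        issues.append(f"expected exactly one title, found {len(titles)}")
    if topic.find("body") is None:
        issues.append("no body content")
    return issues


def run_stage(sources, fraction):
    """Convert the first `fraction` of the sources; map source index to issues."""
    count = max(1, int(len(sources) * fraction))
    report = {}
    for index, text in enumerate(sources[:count]):
        issues = validate(convert(text))
        if issues:
            report[index] = issues
    return count, report


if __name__ == "__main__":
    legacy = ["# Pump maintenance\nCheck the seals."] * 9 + ["Untitled note"]
    print(dict(analyse(legacy)))
    # Widen the run in passes; rules would be revised between passes.
    for fraction in (0.1, 1.0):
        converted, report = run_stage(legacy, fraction)
        print(f"{converted} converted, issues: {report}")
```

In this sketch the issue report plays the role of the result analysis step: the people who understand the content would review it and adjust the conversion rules before the next, larger pass.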

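
Slide 27 lists link extraction, link validation and link reporting among the steps of adding links. A minimal sketch of those three steps is given below, assuming converted XML in which targets carry an `id` attribute and internal links are `<xref href="#id"/>` elements. The sample markup and the function names are assumptions made for illustration, not a description of any particular product.

```python
import xml.etree.ElementTree as ET


def extract_links(root):
    """Link extraction: collect the href of every <xref> element."""
    return [xref.get("href", "") for xref in root.iter("xref")]


def collect_targets(root):
    """Target identification: the id of every element that has one."""
    return {el.get("id") for el in root.iter() if el.get("id")}


def validate_links(root):
    """Link validation: return internal hrefs that point at no known id."""
    targets = collect_targets(root)
    broken = []
    for href in extract_links(root):
        if href.startswith("#") and href[1:] not in targets:
            broken.append(href)
    return broken


SAMPLE = """<doc>
  <topic id="intro"><p>See <xref href="#setup"/>.</p></topic>
  <topic id="setup"><p>See <xref href="#teardown"/>.</p></topic>
</doc>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # Link reporting: list every broken internal link.
    for href in validate_links(root):
        print("broken link:", href)  # prints: broken link: #teardown
```

Only links within one document are checked here; links across documents or into external systems would need a shared index of targets, which is where the potential for error noted on the slide grows.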