Conquering the mountain of legacy data



So you are finally planning that big move over to structured content. Great! You understand the benefits of reuse and what it means for your business.

But what do you do about the mountains of legacy content? Pack it all up and bring it with you? Leave it behind hoping you will never need to look at it again? Work in two worlds of structured (for the new content) and unstructured (for the old data) authoring?

In these slides from the free one-hour session, we talk about the options you have for handling all of that old data. Like the Sherpa guides of Nepal, we can help you. We share what you will need on the journey, what to plan for, and some hard-won lessons learned along the way.

Published in: Technology
  • Established in 1981, DCL is a women-owned company located in Fresh Meadows, NY. We are the pioneer in defining and developing the emerging data conversion industry and a founding member of the SGML User Group. As you will hear Anthony speak of shortly, it is critical that your data is able to support all processes to improve overall efficiency and increase profits. The world has become an extremely data-centric environment, with each destination requiring a specific data format in order to interact accordingly. This is where our expertise comes into play: making certain your data is converted to meet the demands that are placed on it.

    1. Conquering the Mountain of Legacy Data
       Don Bridges, DCL
       August 10, 2011
    2. Poll Question #1
       How have you handled content migrations in the past?
       • Haven’t done one
       • Copy and paste
       • Internal tools
       • Outsourced
       • SAAS
    3. Results: Poll Question #1
    4. Poll Question #2
       What is most important to you in converting content?
       • Out-of-pocket costs
       • Schedule
       • Quality of conversion
       • Impact on writing staff
       • Data security
    5. Results: Poll Question #2
    6. Conquering the Mountain of Legacy Data
       Don Bridges, DCL
       August 10, 2011
    7. Don Bridges
       Don Bridges is the Manager of Commercial “Technical Documentation” projects at DCL. He manages DCL activities across diverse markets such as aerospace, automotive, life sciences, manufacturing, semiconductor, software, telecom, and utilities.
       Don has written several articles on the ‘business’ aspects of implementing XML and is a frequent conference speaker.
       Don has previously worked for Enigma (software) and was a Flight Test engineer for Pratt & Whitney. He has an Engineering degree from LSU and is based in Albuquerque, NM.
    8. DCL
       • Established in 1981
       • Expertise in complex conversion projects including eBooks, TechDocs, Defense, and Libraries
       • Substantial experience in managing multiple vendors for large-scale projects, with automated tracking and reporting of data throughout
       • A sophisticated quality control workflow with both automated and human quality control steps to guarantee accuracy
       • Extensive experience with all key DTDs, including DITA, EPUB, MOBI, S1000D, ATA, NLM, DocBook and TEI
       • Wrote the data conversion chapters in The XML Handbook and The Columbia Guide to Digital Publishing
    9. Outline
       • Why XML?
       • What do you need?
       • How to handle legacy content?
       • Pros/cons of different options
       • Preparing for conversion
       • Lessons learned
    10. Why XML: Genesis
        SGML
        • Too complex
        • Low commercial acceptance, little industry support
        HTML
        • Focused on presentation, limited tag set
        • Too simplistic to handle complex searching, linking, document management
    11. XML
        • Less publishing-centric than SGML
        • Less complex than SGML (debatable)
        • More robust than HTML
        • Smooth transition from SGML
        • Wide acceptance from business community
        • 13,900,000 vs. 1,560,000 Google hits!
    12. XML is a good fit if you need to…
        • Shorten time to production/reduce costs
        • Seamlessly produce print and electronic products
        • Manage content in a manner that facilitates searching, versioning, re-purposing
        • Deliver up-to-date content immediately
        • Support Print on Demand/Dynamic Assembly
    13. And if you need more reasons…
        XML is an Open Standard
        • Nobody “owns” the source code
        • Switch vendors ~ easily
        • Interchange simplified
        You can show a positive ROI
        • ROI: Return On Investment
        • An ROI Calculator is available by e-mail (compliments of PTC)
    14. What about DITA?
        DITA simplifies content reuse
        • Moving your content into “Lego blocks” that allow you to build anything you want
        • “Single Source Authoring”
    15. What do you need?
        • Conversion
        • Authoring
        • Delivery
        • Content Management
    16. What else do you need?
        • Sherpa
        • Training
        • A way to get your legacy content into XML
          • A process that you can live with
          • Quality that you need
          • Reasonable schedule
          • Costs that won’t bust your budget
    17. How to Handle Legacy Content
        Draw a ‘line in the sand’
        • Support two systems (Legacy and XML)
        Convert your ‘living’ content
        • What to convert
        • How to convert
    18. A Few Thoughts
        • Do you need your legacy content?
        • Few writers have the clairvoyance to author content thinking it will EVER be converted in the future.
        • It’s more fun and less trouble to author anew, but not necessarily faster or cheaper.
    19. What to Convert
        [Chart: costs versus time for options 1, 2, and 3]
        Option 1: Convert nothing
        • No conversion costs
        • Delayed ROI
        Option 2: Convert everything
        • High conversion costs
        • Reduced ROI
        Option 3: Convert ‘frequently used’ documents
        • Some conversion costs
        • Maximized ROI
    20. Convert with Intelligence
        • Convert documents that are most used
        • Convert documents that are customer favorites
        • Convert documents with longest product life
        • When in doubt, start in the present and go back
    21. Tribes
        Sometimes people don’t believe that automation can provide any assistance with adding structure to content. We can call these people the “skeptics”.
        On the other side of the spectrum, there are people who believe that there just has to be a packaged solution that will magically convert all of their content with the push of a button. We can call these people the “dreamers”.
        A sub-tribe of the dreamers is made up of those who think off-shore conversion with minimal data analysis and QA can bring the desired results. These are the “cheap labor dreamers”.
        Thanks to Joe Gollner and Paul Vanderveen for the definitions.
    22. Conversion Options
        • Re-write
        • Copy & Paste
        • In-house
        • SAAS (Software As A Service)
        • Out-sourced
    23. Re-write
        • Everything is just the way you want it
        • Can re-write for:
          • Minimalism
          • Simplified English
          • Topic-based
        • Typical cost is $60 per page
    24. Copy & Paste
        • Feasible and may even make sense for small and simple documentation sets.
        • Humans are rarely 100% correct or consistent in performing tagging.
    25. In-house
        • Feasible and may even make sense for small and simple documentation sets.
        • In an STC presentation, CSG Systems reported that they converted 11,677 pages of tech docs from Word to FrameMaker in six months. The total time spent on this project was 4,106 hours.
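To put the CSG Systems figures in perspective, the arithmetic can be worked out directly. The short sketch below uses only the two numbers quoted on the slide (11,677 pages and 4,106 hours); the function name is just for illustration.

```python
# Figures quoted on the slide (CSG Systems, Word to FrameMaker).
PAGES = 11_677
HOURS = 4_106


def throughput(pages, hours):
    """Return (pages per hour, minutes per page) for a conversion effort."""
    pages_per_hour = pages / hours
    minutes_per_page = hours * 60 / pages
    return pages_per_hour, minutes_per_page


pages_per_hour, minutes_per_page = throughput(PAGES, HOURS)
print(f"{pages_per_hour:.2f} pages/hour, {minutes_per_page:.1f} minutes/page")
# prints "2.84 pages/hour, 21.1 minutes/page"
```

Multiplying the hours by your own loaded labor rate gives a rough in-house cost to compare against other options.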
    26. SAAS
        Works well if your content is:
        • Consistent
        • Well-styled
        • Maps to the XML content model
        Unfortunately, most content does not.
        You can do manual clean-up before or after conversion, but you will do manual clean-up.
    27. Out-sourced conversion
        • Better for larger and/or complex content
        • Turn-key operation for one-time event
        • Leverage experience of vendor
        • You will know up-front:
          • Price
          • Schedule
          • Quality
        • No surprises
    28. DITA from a conversion perspective
        [Diagram: Books A, B, C, and D built from shared Topics 1 to 5]
        Conversion can be topic-sourced or book-sourced
    29. DITA topic-sourced conversion
        • Analyze ‘similar’ topics that exist in many ‘books’ and re-write
        • Edit as a ‘stand alone’ topic that holds meaning on its own
        • Manual process
        • Maximizes reuse
    30. DITA book-sourced conversion
        • Designation of topics by type
          • Task / Concept / Reference
        • Usually based on a heading level
        • Automated process
        • Typically 80%+ of the output topics don’t need re-authoring, and those that do don’t need it right away (as legacy presentation of a topic is still maintained)
        • Limited native reuse ‘out of the box’
          • Re-writing for reuse happens as a follow-on step
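As a rough illustration of the book-sourced idea (split at a heading level, then designate each topic as Task, Concept, or Reference), here is a minimal Python sketch. It assumes the legacy book has already been reduced to `(heading_level, text)` pairs, with level 0 meaning body text. The `guess_type` heuristic and its word lists are hypothetical and are not how any particular conversion tool works.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical word lists for the title heuristic; tune them to your own content.
TASK_VERBS = {"install", "remove", "replace", "configure"}
REFERENCE_WORDS = {"specifications", "parameters", "glossary"}


@dataclass
class Topic:
    title: str
    topic_type: str
    body: List[str] = field(default_factory=list)


def guess_type(title):
    """Guess task/concept/reference from the first word of the title (crude)."""
    words = title.lower().split()
    first = words[0] if words else ""
    if first.endswith("ing") or first in TASK_VERBS:
        return "task"
    if first in REFERENCE_WORDS or "reference" in title.lower():
        return "reference"
    return "concept"


def split_into_topics(paragraphs, split_level=2):
    """Start a new topic at every heading at or above split_level."""
    topics = []
    for level, text in paragraphs:
        if 0 < level <= split_level:
            topics.append(Topic(title=text, topic_type=guess_type(text)))
        elif topics:
            # Body text and deeper headings stay inside the current topic.
            topics[-1].body.append(text)
        # Text before the first heading is ignored in this sketch.
    return topics


book = [
    (1, "Pump Overview"),
    (0, "The pump moves coolant through the system."),
    (2, "Installing the Outer Guard"),
    (0, "Spread the outer guard and place it over the inner guard."),
    (3, "Notes"),
    (0, "Do not tighten the cap screws yet."),
    (2, "Specifications"),
    (0, "Shaft exposure: less than 1/4 inch."),
]

topics = split_into_topics(book)
for topic in topics:
    print(topic.topic_type, "-", topic.title)
```

A heuristic like this will misclassify some titles, which is consistent with the slide's point that a share of the output topics still needs re-authoring later.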
    31. Harmonizing Content
        • Capitalization & punctuation
        • Spelling
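One way to picture harmonizing is to compare passages after ignoring capitalization and punctuation, and to tolerate small spelling differences. The sketch below is only an illustration using Python's standard `difflib`; the 0.9 threshold and the sample passages are assumptions, and this is not a description of any vendor's tool.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio between 0 and 1 on the normalized passages."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_near_duplicates(passages, threshold=0.9):
    """Put each passage in the first group whose first member it resembles."""
    groups = []
    for passage in passages:
        for group in groups:
            if similarity(group[0], passage) >= threshold:
                group.append(passage)
                break
        else:
            groups.append([passage])
    return groups


passages = [
    "Tighten the cap screws.",
    "tighten the cap screws",   # differs in capitalization and punctuation
    "Tighten the cap scrws.",   # differs in spelling
    "Close the cover.",
]

groups = group_near_duplicates(passages)
for group in groups:
    print(group)
```

Passages that land in the same group are candidates for a single reusable topic once a writer confirms they mean the same thing.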
    32. Content Faux Pas
        • Tables created without table editor
        • References to pages or location
        • Graphic overlays on tables
        • Extra spaces for composition
        • Style overrides
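Some of these faux pas can be flagged before conversion with a simple text scan. The following sketch assumes the legacy content is available as plain-text lines; the three regular expressions are illustrative guesses, not a complete rule set, and they will not catch graphic overlays or style overrides.

```python
import re

# Hypothetical checks: label plus the pattern that suggests the problem.
CHECKS = [
    ("page reference", re.compile(r"\bpage\s+\d+", re.IGNORECASE)),
    ("location reference", re.compile(r"\b(?:above|below)\b", re.IGNORECASE)),
    ("manual spacing", re.compile(r"\S {3,}\S")),
]


def find_faux_pas(lines):
    """Return (line number, label) for every check that matches a line."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                findings.append((number, label))
    return findings


sample = [
    "See page 12 for torque values.",
    "Part      Qty      Torque",
    "Tighten the cap screws.",
    "Refer to the table below.",
]

findings = find_faux_pas(sample)
for number, label in findings:
    print(f"line {number}: {label}")
```

Runs of spaces inside a line often mean a table was built by hand with the space bar, which is why the last pattern is labelled as manual spacing.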
    33. Content Model Violations
        Having more than two levels of steps (DITA only allows two levels of steps: <step> and <substep>). Example:
        2. Install the Outer Guard as follows:
           a. Spread the outer shin guard and place it over the inner shin guard.
           b. Install the outer guard cap screws based on your particular pump:
              For engines with a motor saddle support bracket:
              i. Ensure the outer guard is straddling the support arm, and install but do not tighten the two remaining cap screws.
              For engines without a motor saddle support bracket:
              ii. Insert the spacer washer between the holes located closest to the motor in the outer guard, and install but do not tighten the two remaining cap screws.
           c. Position the outer guard so it is centered around the shaft, and so there is less than a 1/4" of shaft exposed. Tighten the cap screws.
        3. Close the cover.
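A violation like this one can be detected mechanically before conversion. The sketch below assumes the legacy procedure has been exported to a simple intermediate XML with nested `<ol>`/`<li>` lists (a hypothetical format, with the item text shortened from the slide) and reports every item nested deeper than the two step levels the slide describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate export of the procedure shown on the slide.
LEGACY = """
<ol>
  <li>Install the Outer Guard as follows:
    <ol>
      <li>Spread the outer shin guard and place it over the inner shin guard.</li>
      <li>Install the outer guard cap screws based on your particular pump:
        <ol>
          <li>Ensure the outer guard is straddling the support arm.</li>
          <li>Insert the spacer washer between the holes.</li>
        </ol>
      </li>
    </ol>
  </li>
  <li>Close the cover.</li>
</ol>
"""

MAX_STEP_LEVELS = 2  # step and substep, per the slide


def find_deep_items(node, level=0):
    """Return (level, text) for list items nested deeper than MAX_STEP_LEVELS."""
    found = []
    if node.tag == "ol":
        level += 1  # entering a list adds one nesting level
    if node.tag == "li" and level > MAX_STEP_LEVELS:
        found.append((level, " ".join((node.text or "").split())))
    for child in node:
        found.extend(find_deep_items(child, level))
    return found


root = ET.fromstring(LEGACY.strip())
deep = find_deep_items(root)
for level, text in deep:
    print(f"level {level}: {text}")
```

Items reported here need a human decision, for example splitting the procedure into separate tasks or flattening the third level into the substep text.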
    34. Content Model Violations
        Procedure authored as a table
    35. Content Model Violations
        Notes done as tables:
        • Problematic for software to decipher if this is a real table or a note.
        • DITA content model violations
    36. Lessons Learned
        • Be able to justify ROI to management
        • Make sure your process is scalable; do a pilot … or two … or three
        • You don’t know what you don’t know; hire a Sherpa and talk to others in industry; learn from their experience
    37. Be Prepared!
        • Implement XML authoring environment
        • Use robust, comprehensive DTDs
        • Train content experts to think about content differently
        • Create XML-centric workflows
        • Invest in tools that are XML-enabled and designed to enhance, not restrict, the process
    38. Be Smart!
        • Do your homework: ROI is tough to identify, especially initially
          • Process improvements are nice, but there must be increased revenue to get approval
        • Define a well-designed, phased approach
        • Know what you know, and get help for what you don’t
    39. Resources
        • DCL News
        • Webinar “Are you ready for Conversion”
        • Webinar “Lessons: 12 DITA Implements”
        • DCL Library
    40. Single-Sourcing Solutions
        Need a Sherpa?
        • Over a decade of experience in dynamic publishing projects and content delivery.
        • Deep community relationships to ensure success.
        • Unique customer focus with concierge-level service:
          • Assessments
          • Mentoring
          • Training
          • Implementation
          • Advisory Line
        Our community projects
        • Webinars
        • Single-Sourcing Blog
        • Podcast
        • SlideShare
        • DocS
        • Online Code Repository
        • Mailing Lists: dita-users, framers, xsl, xml-dev, adepters, many others
    41. Poll Question #3
        What is the timing of your next conversion project?
        • In-process now
        • Before the end of the year
        • Next year or two
        • Longer than a year or two
        • None expected
    42. Results: Poll Question #3
    43. Poll Question #4
        If you are considering a conversion project to XML, what DTD have you identified?
        • DITA
        • S1000D
        • NLM
        • DocBook
        • Something else
    44. Results: Poll Question #4
    45. Questions... & Answers
        Data Conversion Laboratory
        Telephone: 718-357-8700
        Single-Sourcing Solutions
        Telephone: 408-660-3219
        Don Bridges
        Data Conversion Laboratory
        Commercial Tech Docs Manager
        Telephone: 718-357-8700