From Planning to Publishing: How Business Objects Migrated Documentation to DITA One Step at a Time



Presented by Dave Holmes at Documentation and Training West, May 6-9, 2008, in Vancouver, BC.
In 2006, Business Objects faced a major challenge: how to migrate over 50,000 pages of unstructured, non-topic-based documentation that it had accumulated through rapid growth and acquisitions. The answer was to use DITA to standardize content creation, management, translation, and publishing processes company-wide. In this session, you will learn how the company went from planning to publishing using an iterative approach, and how you can use this method to see the results of a content migration sooner in your project cycle.

Published in: Business, Technology

  2. 2. AGENDA <ul><li>About us </li></ul><ul><li>Reasons for change </li></ul><ul><li>Migrating to DITA </li></ul><ul><li>Other Changes Required </li></ul><ul><li>How did we do? </li></ul><ul><li>Lessons Learned </li></ul>
  3. 3. About Business Objects <ul><li>Business Objects, an SAP company, is the world leader in business intelligence (BI) software </li></ul><ul><li>Headquarters in San Jose, CA and Paris, France </li></ul><ul><li>SAP is the world's leading provider of business software </li></ul><ul><li>Headquarters in Walldorf, Germany </li></ul><ul><li>SAP acquired Business Objects in 2007 </li></ul>
  4. 4. About Business Objects Documentation <ul><li>Some quick numbers </li></ul><ul><ul><li>76 authors + 16 production staff </li></ul></ul><ul><ul><li>Nine sites </li></ul></ul><ul><li>All content is written in English and localized into up to 10 other languages. Many documents are sim-shipped (released simultaneously in all languages) </li></ul><ul><li>Documentation teams have undergone rapid growth due to acquisitions in the past 3 years </li></ul><ul><li>Began a move to XML-based authoring in 2005 </li></ul><ul><li>First complete release in mid-2007 </li></ul>
  5. 5. Reasons for Change
  6. 6. Motivation for Change (1/2) <ul><li>Fast growth due to acquisitions of other companies brought inconsistencies </li></ul><ul><ul><li>Supported 6+ file formats </li></ul></ul><ul><ul><li>Different team structures and cultures </li></ul></ul><ul><ul><li>Different styles and guidelines </li></ul></ul><ul><li>Authors suffered from inefficient processes </li></ul><ul><ul><li>Manual processes made up a large portion of an author's work </li></ul></ul><ul><ul><li>Inconsistent tools and processes increased overhead </li></ul></ul><ul><ul><li>Writers spent time recreating existing content and manually copying/pasting instead of single sourcing </li></ul></ul>
  7. 7. Motivation for Change (2/2) <ul><li>Translation process was expensive </li></ul><ul><ul><li>Had to manage multiple character encodings </li></ul></ul><ul><ul><li>Large number of queries about English source content </li></ul></ul><ul><ul><li>Simultaneous shipment of software in 8 languages meant complex schedules and tight deadlines </li></ul></ul><ul><li>Publishing process was overly complicated </li></ul><ul><ul><li>Build times were as high as 2 days per deliverable per language </li></ul></ul><ul><ul><li>Multiple tools meant supporting multiple publishing processes </li></ul></ul>
  8. 8. Why did we choose XML? <ul><li>Support for end-to-end Unicode encoding </li></ul><ul><li>Improved reporting on content and automated workflows </li></ul><ul><li>Separation of content and format: centralized, standardized output </li></ul><ul><li>But we needed a DTD… </li></ul>
  9. 9. Why did we choose DITA? <ul><li>DITA is a robust, industry-standard DTD </li></ul><ul><li>Topic based: </li></ul><ul><ul><li>Provides better experience to our users </li></ul></ul><ul><ul><li>Makes reuse easier </li></ul></ul><ul><ul><li>Allows easier division of workload </li></ul></ul><ul><ul><li>Allows for rolling translation </li></ul></ul><ul><li>Extensible architecture allows us to grow our information types with minimal effort </li></ul><ul><li>Allows us to impose constraints on topic and element structure, which encourages: </li></ul><ul><ul><li>Minimalism: less extraneous information in standalone topics </li></ul></ul><ul><ul><li>Structural and stylistic consistency </li></ul></ul>
  10. 10. Migration Goals <ul><li>Reduce production times </li></ul><ul><li>Support a single file format </li></ul><ul><li>Support a single publishing process </li></ul><ul><li>Minimize writing effort </li></ul><ul><li>Reuse content between deliverables and Business Units </li></ul><ul><li>End-to-end Unicode character encoding </li></ul><ul><li>Reduce the amount of required interaction between the localization, documentation and publishing teams </li></ul>
  11. 11. Challenges and Restrictions <ul><li>Existing knowledge of DITA was very low </li></ul><ul><li>Delivery of our largest doc set, in 9 languages, by 2007 </li></ul><ul><li>In order to make the migration financially feasible, we had to migrate all of our content by the end of 2007 </li></ul>
  12. 12. Migrating to DITA <ul><li>How did we get from there, to here? </li></ul>
  13. 13. Components of a Successful Migration <ul><li>Content and Reuse analysis </li></ul><ul><li>DITA specialization </li></ul><ul><li>New Authoring tools </li></ul><ul><li>Content Management System </li></ul><ul><li>Publishing process </li></ul><ul><li>Automated migration process </li></ul>
  14. 14. Content and Reuse analysis <ul><li>Conducted initial analysis, including: </li></ul><ul><ul><li>Content analysis </li></ul></ul><ul><ul><li>Reuse analysis </li></ul></ul><ul><ul><li>Tools analysis </li></ul></ul><ul><li>Designed a roadmap that included all teams </li></ul><ul><li>Decided to migrate to DITA 1.0 with no specialization </li></ul><ul><li>Reuse strategy would be implemented later in the project </li></ul><ul><li>Created rough plans for content alignment and rework </li></ul>
  15. 15. Authoring Tool <ul><li>Authoring tools may have to change </li></ul><ul><li>Any XML Friendly Editor should be fine </li></ul><ul><li>We selected XMetaL for DITA authoring </li></ul><ul><ul><li>Highly visual XML Editor </li></ul></ul><ul><ul><li>Direct integration with our Content Management System </li></ul></ul><ul><ul><li>Several extension points </li></ul></ul>
  16. 16. Content Management System <ul><li>A good Content Management Tool: </li></ul><ul><ul><li>Is a common place to store files </li></ul></ul><ul><ul><li>Allows multiple people to work on the same files </li></ul></ul><ul><ul><li>Includes tools to find, group, sort and categorize information </li></ul></ul><ul><ul><li>Includes tools to publish information to other sources </li></ul></ul><ul><li>A good Globalization Management Tool: </li></ul><ul><ul><li>Supports the maintenance and deployment of multilingual versions of the same content </li></ul></ul><ul><ul><li>Allows custom formatting for each language </li></ul></ul><ul><ul><li>Provides additional tools for translation to the people that need them </li></ul></ul><ul><li>We selected Idiom’s WorldServer </li></ul>
  17. 17. Publishing System <ul><li>Nearly all publishing is handled through WorldServer </li></ul><ul><ul><li>Exceptions for some release material or API material. </li></ul></ul><ul><li>We single source to the following formats: </li></ul><ul><ul><li>PDF x 3 styles (for review, product whitepaper, product guide) </li></ul></ul><ul><ul><li>CHM x 3 styles (for review, help, .NET2005) </li></ul></ul><ul><ul><li>HxS </li></ul></ul><ul><ul><li>Eclipse Help </li></ul></ul><ul><ul><li>HTML </li></ul></ul><ul><ul><li>Flat XML </li></ul></ul><ul><li>All languages published using the same workflows </li></ul><ul><li>Initial customization of publishing process done in house </li></ul><ul><li>Hired an XSL Developer to work on publishing full time </li></ul>
  18. 18. Automated Migration to DITA
  19. 19. Creating an Automated Process <ul><li>Most migration tasks can be automated </li></ul><ul><li>Some tools freely available </li></ul><ul><li>Any automation will require customization </li></ul><ul><li>Any customization will require technical expertise </li></ul><ul><li>No automation is perfect </li></ul>
  20. 20. Process Overview
  21. 21. Content Analysis <ul><li>Authors conducted an analysis per deliverable </li></ul><ul><ul><li>Identified content that would be obsolete for next major release </li></ul></ul><ul><ul><li>Analyzed content for appropriate structure and for potential reuse </li></ul></ul><ul><ul><li>Flagged difficult passages </li></ul></ul>
  22. 22. Pre-Processing <ul><li>Make the input as simple and uniform as possible </li></ul><ul><ul><li>Ensure adherence to current templates </li></ul></ul><ul><ul><li>Ensure adherence to corporate style guide </li></ul></ul><ul><ul><li>Remove ‘complicated’ constructions </li></ul></ul><ul><ul><li>Remove, or minimize, variables in text and call-outs in images </li></ul></ul><ul><li>Move towards a topic-oriented style </li></ul><ul><ul><li>Remove ‘book-isms’ where possible </li></ul></ul><ul><ul><li>Remove phrases such as ‘In this chapter’ or ‘on page…’ </li></ul></ul><ul><li>Structure content as much as possible </li></ul><ul><ul><li>Consistent styles for blocks of text or inline elements improved the results of automation </li></ul></ul>
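The ‘book-ism’ cleanup on this slide lends itself to a simple automated check. The following is a minimal sketch in Python, not the tooling Business Objects used: the pattern list, the `find_bookisms` name, and the sample lines are all assumptions made for illustration. It only flags candidate phrases so that an author can decide how to rewrite them.

```python
import re

# Hypothetical patterns: the slide names only 'In this chapter' and 'on page…'.
BOOKISM_PATTERNS = [
    re.compile(r"\bin this chapter\b", re.IGNORECASE),
    re.compile(r"\bon page\s+\d+", re.IGNORECASE),
]


def find_bookisms(lines):
    """Return (line_number, matched_text) pairs for every book-ism found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for pattern in BOOKISM_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((number, match.group(0)))
    return hits


# Illustrative input only; not taken from the original documentation.
sample = [
    "In this chapter you will learn how to install the server.",
    "Click Next to continue.",
    "For details, see Configuring ports on page 42.",
]

if __name__ == "__main__":
    for number, text in find_bookisms(sample):
        print(f"line {number}: {text}")
```

Reporting the hits rather than deleting them keeps the rewrite decision with the author, since a phrase such as ‘on page 42’ usually needs to become a cross-reference rather than simply disappear.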
  23. 23. Scripted Migration <ul><li>Migration to DITA was handled by our Publishing team </li></ul><ul><li>Scripts validated input files against the expected style sheets and templates </li></ul><ul><li>FrameMaker Content </li></ul><ul><ul><li>Frame files ‘published’ to DITA using a WebWorks template </li></ul></ul><ul><li>Non-FrameMaker Content </li></ul><ul><ul><li>Files were published to a simplified HTML template </li></ul></ul><ul><ul><li>Content was converted to DITA using XSLT </li></ul></ul><ul><li>Perl and XSLT were used to fine-tune the output based on input from the author </li></ul>
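The slide says the simplified HTML was converted to DITA with XSLT. The sketch below illustrates the same kind of element-to-element mapping in Python instead, using only the standard library; it is not the team's transform. The mapping (`h1` to `title`, `p` to `p`, `ul`/`li` to `ul`/`li`), the function name `html_to_dita_topic`, and the sample input are assumptions, and the input must already be well-formed markup.

```python
import xml.etree.ElementTree as ET


def html_to_dita_topic(html_text: str, topic_id: str) -> str:
    """Map a small, well-formed HTML page onto a generic DITA <topic>.

    Only h1, p and ul/li are handled (an assumed mapping); any other
    element is silently dropped, and inline markup inside a paragraph
    is not preserved.
    """
    source = ET.fromstring(html_text)
    body_in = source.find("body")

    topic = ET.Element("topic", {"id": topic_id})
    title = ET.SubElement(topic, "title")
    body_out = ET.SubElement(topic, "body")

    for child in body_in:
        if child.tag == "h1":
            title.text = (child.text or "").strip()
        elif child.tag == "p":
            ET.SubElement(body_out, "p").text = (child.text or "").strip()
        elif child.tag == "ul":
            ul = ET.SubElement(body_out, "ul")
            for item in child.findall("li"):
                ET.SubElement(ul, "li").text = (item.text or "").strip()

    return ET.tostring(topic, encoding="unicode")


# Illustrative input only.
SAMPLE_HTML = (
    "<html><body>"
    "<h1>Installing the server</h1>"
    "<p>Run the installer.</p>"
    "<ul><li>Accept the license.</li><li>Choose a folder.</li></ul>"
    "</body></html>"
)

if __name__ == "__main__":
    print(html_to_dita_topic(SAMPLE_HTML, "install_server"))
```

The simpler and more uniform the HTML template, the smaller this mapping table needs to be, which is the point of publishing to a simplified template before converting.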
  24. 24. Three stages of DITA <ul><li>Considered “Well Formed” if: </li></ul><ul><ul><li>All tags that open are closed </li></ul></ul><ul><ul><li>All tags are properly nested (closed in the reverse of the order in which they were opened) </li></ul></ul><ul><ul><li>All attributes are quoted </li></ul></ul><ul><li>Considered “Valid” if: </li></ul><ul><ul><li>It is well formed </li></ul></ul><ul><ul><li>It conforms to the rules of the DITA DTD </li></ul></ul><ul><li>Considered “Well Written” if: </li></ul><ul><ul><li>It is well formed, and valid </li></ul></ul><ul><ul><li>Content conforms to our Style Guide </li></ul></ul><ul><ul><li>All tags are used correctly </li></ul></ul><ul><ul><li>Adheres to the correct Information Architecture (topic based, correct topic types are used when appropriate) </li></ul></ul>
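The first of the three stages can be checked mechanically with any XML parser. The following Python sketch, using only the standard library, tests well-formedness; the function name and the sample strings are illustrative assumptions. Checking the second stage (validity against the DITA DTD) needs a validating parser, which is not shown here, and the third stage still requires human review.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML.

    A successful parse means every tag is closed, tags are properly
    nested, and attribute values are quoted. It says nothing about
    whether the document is valid DITA.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples only.
good = '<topic id="t1"><title>Install</title><body><p>Run setup.</p></body></topic>'
bad_nesting = '<topic id="t1"><title>Install<body></title></body></topic>'
bad_attribute = '<topic id=t1><title>Install</title></topic>'

if __name__ == "__main__":
    for label, text in [("good", good), ("bad nesting", bad_nesting), ("bad attribute", bad_attribute)]:
        print(label, is_well_formed(text))
```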
  25. 25. Post-Processing <ul><li>Content from migration was: </li></ul><ul><ul><li>Guaranteed to be valid </li></ul></ul><ul><ul><li>80% structurally correct </li></ul></ul><ul><ul><li>20-80% topic based </li></ul></ul><ul><li>Authors examined the resulting files and improved content as necessary </li></ul><ul><li>No more than 10% of the content required rewriting </li></ul><ul><li>Most rewriting occurred because the input files were not topic based </li></ul>
  26. 26. Final Steps <ul><li>Content moved to new CMS </li></ul><ul><li>New published output compared against input files </li></ul><ul><ul><li>Authors published content in familiar file formats, and compared the output against the original files </li></ul></ul><ul><ul><li>Authors published content to unfamiliar formats, and examined the output for oddities </li></ul></ul><ul><li>Localization teams scoped the new files for loc impact </li></ul><ul><ul><li>Translation Memories were adjusted programmatically where possible to reduce the impact of the changes </li></ul></ul><ul><ul><li>Input files changed programmatically to filter out some content from translation </li></ul></ul>
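The slide does not say how content was filtered out of translation. One plausible approach, sketched here in Python under stated assumptions, is to set DITA's `translate="no"` attribute on elements that should not reach translators. The set of element names in `NO_TRANSLATE_TAGS`, the function name, and the sample topic are hypothetical choices for illustration, not the team's actual rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of elements to hide from translators.
NO_TRANSLATE_TAGS = {"codeblock", "codeph", "filepath"}


def mark_untranslatable(topic_xml: str) -> str:
    """Set translate="no" on every element whose tag is in NO_TRANSLATE_TAGS."""
    root = ET.fromstring(topic_xml)
    for element in root.iter():
        if element.tag in NO_TRANSLATE_TAGS:
            element.set("translate", "no")
    return ET.tostring(root, encoding="unicode")


# Illustrative input only.
SAMPLE_TOPIC = (
    '<topic id="t1"><title>Starting the service</title><body>'
    "<p>Run <codeph>svc start</codeph> from the install folder.</p>"
    "</body></topic>"
)

if __name__ == "__main__":
    print(mark_untranslatable(SAMPLE_TOPIC))
```

A translation tool that honors the attribute can then skip the marked elements, which reduces word counts without changing the English source text.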
  27. 27. Other changes required
  28. 28. Process Refinement <ul><li>Continual improvement of migration process </li></ul><ul><ul><li>Write scripts to migrate content to DITA </li></ul></ul><ul><ul><li>Write scripts to fine tune results </li></ul></ul><ul><ul><li>Test scripts on a sample set </li></ul></ul><ul><ul><li>Work with authors to identify pain points </li></ul></ul><ul><ul><li>Repeat… </li></ul></ul><ul><li>Began enforcing stricter limitations on input files </li></ul>
  29. 29. Changes for Authors <ul><li>New Authoring Tool </li></ul><ul><li>New Content Management System </li></ul><ul><ul><li>Direct integration with our authoring tool made managing files easier </li></ul></ul><ul><ul><li>New Content Management System easier to use, but less robust, than previous system </li></ul></ul><ul><li>Software strings extracted from source code for use in error message guides </li></ul><ul><li>New Style Guide </li></ul><ul><li>Created new roles to handle concerns or confusion about the new format </li></ul>
  30. 30. Changes for Localization <ul><li>TM adjusted programmatically to reduce the impact of the new file format </li></ul><ul><li>Filters put in place to restrict the type of content that is exposed for translation </li></ul><ul><li>Workflows introduced to automate translation process </li></ul><ul><li>Interactions with vendors changed </li></ul><ul><li>New translation tools </li></ul><ul><li>New systems for translating graphics and screenshots (graphics now translated as text) </li></ul>
  31. 31. Changes for Publishing <ul><li>All content uses a single file format </li></ul><ul><li>Redesigned our publishing layer (several times) to be more extensible </li></ul><ul><li>Had to develop custom transforms for formats that were previously produced with proprietary software </li></ul><ul><li>Introduced tools for automated QA testing </li></ul><ul><li>Created processes to automate publishing of content, and incorporate output into the product build </li></ul>
  32. 32. How did we do?
  33. 33. Migration Goals: Revisited <ul><li>Reduce production times </li></ul><ul><li>Support a single file format </li></ul><ul><li>Support a single publishing process </li></ul><ul><li>Minimize writing effort </li></ul><ul><li>Reuse content between deliverables and Business Units </li></ul><ul><li>End-to-end Unicode character encoding </li></ul><ul><li>Reduce the amount of required interaction between the localization, documentation and publishing teams </li></ul>
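  34. 34. Documentation in 2008 <table><tr><th>Criteria</th><th>2005</th><th>2008</th></tr><tr><td>Teams</td><td>6</td><td>14</td></tr><tr><td>Tools/Formats supported</td><td>Word, FrameMaker 6/7, RoboHelp, (ForeHelp), JavaDoc, .NET XML</td><td>XMetaL and DITA</td></tr><tr><td>Content Management</td><td>Perforce</td><td>WorldServer</td></tr><tr><td>Translation</td><td>Trados</td><td>WorldServer</td></tr><tr><td>Publishing</td><td>Combination of people and WebWorks</td><td>WorldServer</td></tr><tr><td>Managing Published Content</td><td>Fully manual</td><td>50% automation (and more on the way!)</td></tr></table>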
  35. 35. Unexpected Benefits <ul><li>Less source content </li></ul><ul><li>Increased adherence to standards and style guidelines </li></ul><ul><li>Collaboration across the sites </li></ul><ul><li>Improved flexibility with published output </li></ul><ul><li>The technology has given us more flexibility </li></ul><ul><ul><li>Pulling content directly from source code </li></ul></ul><ul><ul><li>Direct integration with the build system </li></ul></ul>
  36. 36. Room for improvement <ul><li>Lost some doc-related features </li></ul><ul><li>Process automation needs review </li></ul><ul><ul><li>Some workflows not effective </li></ul></ul><ul><ul><li>Some workflows take too long, or are too tedious </li></ul></ul><ul><li>Discovered commonalities in content that can be better represented through topic specialization </li></ul><ul><li>Information Architecture still fairly rudimentary </li></ul>
  37. 37. Lessons Learned <ul><li>Some additional wisdom we picked up along the way </li></ul>
  38. 38. Education <ul><li>General education should be provided early </li></ul><ul><ul><li>Theoretical DITA </li></ul></ul><ul><ul><li>Topic Oriented Writing </li></ul></ul><ul><ul><li>Structured Writing Principles </li></ul></ul><ul><li>Specific education should be provided as needed </li></ul><ul><ul><li>DITA tag reference </li></ul></ul><ul><ul><li>Specific tools training </li></ul></ul><ul><li>Classroom training can help improve confidence </li></ul><ul><li>Some material should always be available for onboarding </li></ul><ul><ul><li>Skill with DITA is not yet common – some degree of training will need to be provided for any new hires </li></ul></ul>
  39. 39. DITA is not ‘Just XML’ <ul><li>DITA implies a content architecture and necessitates Information Typing </li></ul><ul><li>The DITA DTD is not simple </li></ul><ul><li>The Open Toolkit Transformations are not trivial </li></ul>
  40. 40. Planning <ul><li>Plan extra time for: </li></ul><ul><ul><li>Migration workload for writers </li></ul></ul><ul><ul><li>Rewriting of content </li></ul></ul><ul><ul><li>Bug resolution before first release </li></ul></ul><ul><li>Analyze the cost and the business case </li></ul><ul><ul><li>Is it a worthwhile investment? </li></ul></ul><ul><li>Get 100% commitment </li></ul><ul><ul><li>Upper management commits to the cost </li></ul></ul><ul><ul><li>Writers commit to the change and to the migration schedule </li></ul></ul>
  41. 41. Communication <ul><li>Separate tools and content architecture decisions </li></ul><ul><ul><li>Create a dedicated tools team </li></ul></ul><ul><ul><li>Leverage the tools as much as possible </li></ul></ul><ul><li>Create a single point of contact for style changes </li></ul><ul><li>Determine tagging rules and ‘special cases’ as early as possible </li></ul><ul><ul><li>With no guidance, authors are forced to make their own decisions </li></ul></ul><ul><li>Not everything needs to be done at once, but clear milestones need to be set for when things will be done </li></ul>
  42. 42. General <ul><li>The migration requires some initial investment from all parties </li></ul><ul><li>The most difficult move for us was the move to Topic Oriented Authoring </li></ul><ul><li>The ‘cleaner’ your input, the better your output will be </li></ul><ul><li>Dedicate resources for customizing publishing output </li></ul>
  43. 43. Questions? <ul><li>Feel free to email me at </li></ul>