Slideshow transcript
Slide 1: Extreme Content Makeover: Migrating Content to DITA Joe Gollner Vice President e-Publishing Solutions Stilo International jgollner@stilo.com Copyright © Stilo International 2008
Slide 2: The Essence of Content Conversion Got this! Want that!
Slide 3: Legacy Content Edition
Slide 4: Topics The Growing Demand for High Quality Content Challenges with Converting Content Solution Patterns for Converting Content Conversion Refactoring Metadata Linking Validation Conclusions Key Lessons Learned
Slide 5: An Inconvenient Truth – About Content
Slide 6: Case Study: Drug Look-up Tool Migrating drug information into a precise digital form represented a key challenge. Sources: Miles33, Quark & vendor monographs
Slide 7: Enterprise Content Frameworks (Diagram labels include Enterprise Controls, Programs, Domains, Inputs, Outputs, Mechanisms, Users, Tools, Personnel, Infrastructure, Resources and Budget.) Service Oriented Content Architectures lead to high demands being placed on content resources and the affordability of the overall process.
Slide 8: Observations on Content Expectations Within this larger context, what is expected of content? 1. Content will be available as valid XML 2. Content will be modularized 3. Content will be discretely addressable 4. Content will be uniquely identifiable using metadata 5. Content will be linked to related content 6. Content will be process-able with almost perfect confidence How much legacy content is ready to play this role? (How much XML content is even ready for this?)
Slide 9: The Harsh Reality of Legacy Content Legacy Content: All content resources that require modification in order to be useful The Legacy Content Spectrum Opaque: Not directly processable (e.g., paper) Annoying: Aggressively proprietary Little or no predictability in usage Polluted: Normally processable but frequently filled with deviations & additions (HTML) Tolerable: Documented format that exposes format & structure in a processable form
Slide 10: Content Processing Roadmap ACQUIRE ENRICH DELIVER CONTEXT Import Metadata Select Content Processing Convert Collect Compile CONTENT Import Select Publish Manage Content Processing Refactor Relate Resolve CONNECTIONS Import Links Select
Slide 11: Convert Content ACQUIRE ENRICH DELIVER CONTEXT Import Metadata Select Content Processing Convert Collect Compile CONTENT Import Select Publish Manage Content Processing Refactor Relate Resolve CONNECTIONS Import Links Select
Slide 12: Converting Content Conversion: changing the format of legacy content to make it increasingly suitable for efficient management, revision, reuse and publishing.
Slide 13: Conversion Fundamentals Conversion is unavoidable and always under-estimated Conversion is fundamentally a matter of interpretation Parsing the legacy format & layout Inferring a meaning from this information Correlating the format & layout to a target structure Addressing problems introduced by format peculiarities Leveraging the content itself to guide format interpretation Enhancing interpretive rules by matching content patterns Automating conversion typically relies on two stages: Format Interpreter that can make sense of source formatting Rules-based Correlation Processor that maps content into structures
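The two stages named on Slide 13 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's tooling: the legacy input is modeled as (style, text) pairs, and the style names, the `RULES` table, the `interim` element and the functions `interpret_format` and `correlate` are all hypothetical. The target structure is simplified and is not valid DITA.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy input: (paragraph style, text) pairs read from a source format.
LEGACY = [
    ("Head1", "Replacing the filter"),
    ("Body", "Shut off the pump before starting."),
    ("ListNum", "Open the housing."),
    ("ListNum", "Remove the old filter."),
]

# Hypothetical mapping rules: source style -> target element name.
# Kept as plain data so that rules can be edited without touching the code.
RULES = {"Head1": "title", "Body": "p", "ListNum": "step"}


def interpret_format(paragraphs):
    """Stage 1, format interpreter: emit interim XML that keeps the source styles."""
    interim = ET.Element("interim")
    for style, text in paragraphs:
        para = ET.SubElement(interim, "para", style=style)
        para.text = text
    return interim


def correlate(interim, rules):
    """Stage 2, rules-based correlation: map interim paragraphs to target elements."""
    topic = ET.Element("topic")
    unmapped = []
    for para in interim:
        style = para.get("style")
        target = rules.get(style)
        if target is None:
            # Report styles with no rule instead of guessing a mapping.
            unmapped.append(style)
            continue
        ET.SubElement(topic, target).text = para.text
    return topic, unmapped


topic, unmapped = correlate(interpret_format(LEGACY), RULES)
print(ET.tostring(topic, encoding="unicode"))
print("styles without a rule:", unmapped)
```

Keeping the rules as data separate from the interpreter is one way to let the mapping be refined as content patterns are discovered, without re-parsing the source format.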
Slide 14: Conversion Process Template (Process diagram. Recoverable labels: Source Analysis, Target XML Schema, Subject Matter Experts, Interaction, Source to Target Mapping Guidance, Legacy Content, Existing Conversion Rules, Modify Conversion Rules, Modified Conversion Rules, Execute Conversion Process, Manual Editing, Result Analysis, Identified Issues, 1 Example Set, 2 Sample Set 10%, 3 Complete Set 100%, Validation & Verification, Application Tests)
Slide 15: Show Me!
Slide 16: Conversion Process Initiation Content Analysis Document all features of source content and format Establish Control Collections Collections can be used to group files with similar features Rules can be tailored to address these features Collections provide useful management units for tracking & reporting Clearly Define the Target End State Should be well-suited to validation & verification activities Conversion should be separate from refactoring which can follow it Ensure that application testing is performed for verification Structural validation is not sufficient The converted content must support its intended uses
Slide 17: Conversion Process Planning Prepare a Conversion Specification Document analysis results & content mapping rules Incorporate naming conventions to be applied Instances, media resources, identifiers, cross-references… Establish a representative Example Set early in process A limited set of files that exhibit main features of source content Matched with converted content that illustrates intended result Used to iteratively refine rules & troubleshoot problems Forms part of the Conversion Specification Prepare a Conversion Plan Document intervention procedures to be followed Define manual editing guidelines Explore outsourcing opportunities to enhance process or reduce costs Prepare schedule & cost estimates
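The Example Set described on Slide 17 pairs source files with the converted content that illustrates the intended result, which makes it usable as a regression check while rules are refined. A minimal Python sketch follows; the `convert` function is a hypothetical stand-in for a real conversion process, and the style names and sample data are invented for illustration.

```python
def check_example_set(convert, example_set):
    """Return the names of examples whose converted output differs from the expected result."""
    failures = []
    for name, (source, expected) in example_set.items():
        if convert(source) != expected:
            failures.append(name)
    return failures


def convert(source):
    """Hypothetical stand-in for a conversion process: maps 'Style: text' to an element."""
    style, text = source.split(":", 1)
    tag = {"Head1": "title", "Body": "p"}.get(style, "unknown")
    return f"<{tag}>{text.strip()}</{tag}>"


# Hypothetical Example Set: name -> (legacy source, intended converted result).
EXAMPLE_SET = {
    "heading": ("Head1: Replacing the filter", "<title>Replacing the filter</title>"),
    "body": ("Body: Shut off the pump.", "<p>Shut off the pump.</p>"),
    "warning": ("Warn: Hot surface.", "<note>Hot surface.</note>"),
}

failures = check_example_set(convert, EXAMPLE_SET)
print(failures)
```

Here the "warning" example fails because no rule covers the `Warn` style yet, which is the kind of gap the Example Set is meant to expose before the complete set is processed.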
Slide 18: Conversion Process Refinement Implement initial Conversion Process Maximize automation Develop validation & verification scenarios that leverage automation Ensure conversion rules can be modified by non-programmers The goal is to interact with Subject Matter Experts efficiently Based on Conversion Specification & Example Set Test Conversion Process Follow the process from beginning to end Including application tests & output review Look for opportunities to enhance automation Perform trial interventions & manual editing to improve procedures Revise Conversion Specification, Example Set & automation
Slide 19: Conversion Process Execution & Adaptation Process refinement should continue throughout conversion Improve automation as the first response to identified issues Minimize manual editing and ensure it is made as routine as possible Suitable for outsourcing under knowledgeable guidance Application Testing is important (verification) Where all target applications are not available Develop tests that will minimize risks Reduce risk of rework Manual clean-up after format interpretation is less at risk Manual editing as part of content mapping is at greater risk Separate format interpretation from content mapping An interim XML format should be used as an interface Interim format should retain all details available in source content
Slide 20: Refactor Content ACQUIRE ENRICH DELIVER CONTEXT Import Metadata Select Content Processing Convert Collect Compile CONTENT Import Select Publish Manage Content Processing Refactor Relate Resolve CONNECTIONS Import Links Select
Slide 21: Refactoring Content Refactoring: restructuring content, without loss of meaning, to improve its suitability for management, maintenance and specifically reuse. Refactoring entails two activities: bursting & normalization
Slide 22: Aspects of Refactoring Refactoring breaks down into two tasks Bursting Normalization Content Bursting Decomposing content into components optimized for reuse Content Normalization Systematic removal of redundancies to improve maintainability Challenges Maintaining a complete equivalence with the original Adapting the linking mechanisms so they remain valid and functional Usually entails introduction of an indirect referencing scheme
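Content bursting as described on Slide 22 can be sketched as splitting a document at its section boundaries and recording the pieces in a map of references. The Python below is a minimal sketch under stated assumptions: the `manual` and `section` element names, the sample content and the `burst` function are hypothetical, and the `map` with `topicref` entries is only loosely modeled on a DITA map.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted source with two sections.
SOURCE = """<manual>
  <section id="intro"><title>Introduction</title><p>About the pump.</p></section>
  <section id="maint"><title>Maintenance</title><p>Replace the filter yearly.</p></section>
</manual>"""


def burst(xml_text):
    """Split each <section> into its own component and build a map that references them."""
    root = ET.fromstring(xml_text)
    components = {}
    doc_map = ET.Element("map")
    for section in root.findall("section"):
        name = section.get("id") + ".xml"
        # Serialize the section on its own; strip the whitespace that followed it.
        components[name] = ET.tostring(section, encoding="unicode").strip()
        # The map holds a reference to the component, not the content itself.
        ET.SubElement(doc_map, "topicref", href=name)
    return components, doc_map


components, doc_map = burst(SOURCE)
print(sorted(components))
print(ET.tostring(doc_map, encoding="unicode"))
```

Because the map preserves the original order of the sections, reassembling the components in map order gives one way to check equivalence with the original.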
Slide 23: Refactoring Strategies Strategy needed to ensure adequate returns on investment Approach must balance cost, risk, effort and time in a practical way Conversion Outputs Compare Outputs
Slide 24: Refactoring: Planning Granularity Level Finding the Right Level of Granularity What are the most “natural” joints where content can be burst How is content most meaningfully Managed Authored Used Ideally there is a level of granularity that is consistent across the views What to Avoid Over-ambition in defining granularity level At some point of decomposition, content becomes Meaningless Very difficult to manage Very expensive to achieve across large sets of content Challenging to work with for authors
Slide 25: Normalization Normalization is an optimization appropriate for content that: Has a long lifespan Exhibits a significant rate of change Will be translated into other languages Normalization occurs at two levels At the level of managed granularity (component) Commonly performed tasks in technical documentation Example: Procedures for accessing a control interface At a sub-component level Boilerplate text (e.g., copyright notice or disclaimer) Advisories (e.g., safety warnings) Automation can support the process under guidance Identify redundancies & implement replacement decisions Facilitate verifications that there has been no content loss or output impacts
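The "identify redundancies" step on Slide 25 can be approximated by fingerprinting fragments and grouping the ones that match. The Python sketch below assumes fragments are already available as plain text keyed by identifier; the identifiers, sample text and function names are hypothetical, and matching after whitespace and case folding is a simplification that a real process would likely refine.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash text after collapsing whitespace and case so trivial variants match."""
    canonical = " ".join(text.split()).lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_redundancies(fragments):
    """Group fragment ids by fingerprint and return only groups with more than one member."""
    groups = defaultdict(list)
    for frag_id, text in fragments.items():
        groups[fingerprint(text)].append(frag_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sub-component fragments (advisories and notes).
FRAGMENTS = {
    "t1-warn": "Disconnect power before servicing.",
    "t2-warn": "Disconnect  power before servicing.",
    "t3-note": "Keep the manual with the unit.",
}

duplicates = find_redundancies(FRAGMENTS)
print(duplicates)
```

The output is a list of candidate groups only; in keeping with "under guidance", the decision to replace the members of a group with one shared component stays with a person.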
Slide 26: Realizing Savings through Refactoring
Slide 27: Collect Metadata ACQUIRE ENRICH DELIVER CONTEXT Import Metadata Select Content Processing Convert Collect Compile CONTENT Import Select Publish Manage Content Processing Refactor Relate Resolve CONNECTIONS Import Links Select
Slide 28: Collecting Metadata Metadata: a set of data that provides information about other data. Collecting Metadata: extracting, validating, integrating, supplementing, synchronizing and storing metadata from, and about, the content.
Slide 29: Sources of Metadata Internal metadata: Segments of content designated as valuable metadata; Attributes available in source format; Keywords & Abstract; Annotations (Identify, Extract, Insert) External metadata: System Data (file information); Associated keywords & descriptions; Ratings & commentary; Process context; Additional information drawn from other sources (e.g., part database) (Diagram labels: Ontology, Taxonomy, Topic, Link Network)
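Combining the internal and external sources listed on Slide 29 can be sketched as extracting what the content carries and then supplementing it from outside. In this minimal Python sketch the topic markup, the field names and the `collect_metadata` function are hypothetical; real DITA topics nest keywords more deeply than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted topic carrying internal metadata.
TOPIC = (
    '<topic id="pump-filter"><title>Replacing the filter</title>'
    "<prolog><keyword>filter</keyword><keyword>maintenance</keyword></prolog>"
    "<p>Shut off the pump before starting.</p></topic>"
)

# Hypothetical external metadata, e.g. system data about the source file.
EXTERNAL = {"source_file": "pump_manual.qxd", "title": "Untitled"}


def collect_metadata(xml_text, external):
    """Extract internal metadata, then supplement it with external values."""
    root = ET.fromstring(xml_text)
    record = {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "keywords": [kw.text for kw in root.iter("keyword")],
    }
    for key, value in external.items():
        # Keys already extracted from the content are not overwritten.
        record.setdefault(key, value)
    return record


record = collect_metadata(TOPIC, EXTERNAL)
print(record)
```

Letting internal values take precedence is an assumption made for this sketch; a project could equally decide that an authoritative external source, such as a part database, should win.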
Slide 30: Establish Relationships ACQUIRE ENRICH DELIVER CONTEXT Import Metadata Select Content Processing Convert Collect Compile CONTENT Import Select Publish Manage Content Processing Refactor Relate Resolve CONNECTIONS Import Links Select
Slide 31: Establishing Relationships Explicit Links (Actual) Identifier Source Target Type A1 A2 Implicit Links (Potential) Identifier Source Target Type B1 B2 Reuse Links (Physical) Identifier Resource Request Condition R1 R2 Links: the connections or relationships between things that represent a significant portion of the meaning and value of content
Slide 32: All About Links Increasingly important Essential for portals (enabling navigation) Adding links Source / target identification Link specification Link generation Link validation Link extraction Link reporting Link activation Level of precision is high as is the potential for error
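The link validation step on Slide 32 can be sketched against the Identifier / Source / Target / Type columns shown on Slide 31. This Python example is a minimal sketch: the content identifiers, the link records and the `validate_links` function are hypothetical.

```python
def validate_links(known_ids, links):
    """Return the links whose source or target is not a known content identifier."""
    return [
        link
        for link in links
        if link["source"] not in known_ids or link["target"] not in known_ids
    ]


# Hypothetical identifiers of converted components.
KNOWN_IDS = {"pump-overview", "pump-filter", "pump-safety"}

# Hypothetical explicit links using the columns from Slide 31.
LINKS = [
    {"id": "A1", "source": "pump-overview", "target": "pump-filter", "type": "related"},
    {"id": "A2", "source": "pump-filter", "target": "pump-wiring", "type": "related"},
]

broken = validate_links(KNOWN_IDS, LINKS)
for link in broken:
    print(f"{link['id']}: {link['source']} -> {link['target']} cannot be resolved")
```

Reporting the link identifier alongside both ends supports the link reporting step, since each problem can be traced back to one row.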
Slide 33: Content Validation Validation Essential capability Enables consistent processing Streamlines processes Confirms conversion end-point Validation must be Accurate Manageable Informative Actionable Pro-active Continuously improving (Diagram labels: Convert, Transform, Publish, Refactor, Collect, Compile, Relate, Resolve)
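Validation that is "informative" and "actionable" in the sense of Slide 33 can be sketched as a check that reports where a problem is and what to do about it. The Python below is a minimal sketch, not schema validation: it only tests well-formedness and the presence of required child elements, and the `validate` function, report fields and required-element list are hypothetical.

```python
import xml.etree.ElementTree as ET


def validate(name, xml_text, required=("title",)):
    """Return a list of issue records; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return [{
            "file": name,
            "problem": "not well-formed",
            "where": f"line {line}, column {column}",
            "action": "repair the markup at the reported position",
        }]
    issues = []
    for tag in required:
        if root.find(tag) is None:
            issues.append({
                "file": name,
                "problem": f"missing <{tag}>",
                "where": f"<{root.tag}>",
                "action": f"add a <{tag}> child or adjust the conversion rules",
            })
    return issues


print(validate("a.xml", "<topic><title>Filter</title></topic>"))
print(validate("b.xml", "<topic><p>No title here.</p></topic>"))
print(validate("c.xml", "<topic><title>Unclosed</topic>"))
```

Returning structured records rather than free text makes it easier to aggregate results per control collection and to track whether the rate of issues is falling as rules improve.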
Slide 34: Conclusions Content conversion is an unavoidable undertaking Performance Support Portals demand high-precision content Content conversion is a challenging undertaking Particularly given the precision being demanded of the results Content conversion is a manageable undertaking Guided automation Substantially reduces costs Dramatically improves quality But there is no magic…
Slide 35: Your Dreams Can Come True



