Engineering Web Content (Web Content 2009)


A shortened version of the Content Engineering Workshop, specifically tailored for Web Content 2009 (Chicago IL).

Published in: Technology


  1. 1. Engineering Web Content: Bringing Discipline & Automation to the Business of Managing Content Joe Gollner VP Enterprise Publishing Solutions
  2. 2. Engineering Web Content: Conversion, Refactoring, Profiling Joe Gollner Part 1 VP Enterprise Publishing Solutions
  3. 3. Engineering Web Content: Part 1 Topics Introducing Content Engineering Introducing the Content Processing Roadmap Conversion (content modernization) Refactoring (content optimization) Profiling (metadata collection) Aim: Introduce the tools and techniques that constitute a practical working framework for discussing, designing, developing and deploying content management and processing systems
  4. 4. The Truth about Content We are faced with: Massively expanding content volumes Diversifying venues for content delivery Proliferating format varieties Rising expectations of users Escalating specialization of content Evolving interconnectedness of content Multiplying problems related to content security Continuing lifecycle challenges (obsolescence remains a risk) Increasing complexity of content (the reintegration of data & documents) Growing recognition of the central importance of content
  5. 5. An Essential Response: Content Engineering Working Definition The application of rigorous engineering discipline to the design, development and deployment of content management and processing systems Distinguishing Features Systematic approach Progressive use of technology Awareness of Lifecycle considerations Total cost of ownership Solution scalability
  6. 6. Engineering and Content Organizing work Laying out work spaces Sequencing of process steps Optimizing tasks Refining tools Improving materials Transferring results between stages Sharing resources Performing maintenance Troubleshooting problems Differential Analyzer – Vannevar Bush (1930s)
  7. 7. Content Engineering Content Engineering Governing discipline Goal-directed Content Management Protect Value Content Processing Enhance Value People Create Value Planning Designing Authoring Editing
  8. 8. Content Management Components Content Management Control Organize resources, access and lifecycle Change Facilitate the evolution of content and the associated services Deploy Enable the services the content makes possible Control Change Deploy
  9. 9. Content Processing Components Content Processing Convert Transform Publish Key Focus in Content Engineering
  10. 10. Content Processing Components Content Processing Convert Transform Publish Transformation Breaks down into Refactor Relate Collect Resolve Compile Emphasis on leveraging efficient automation
  11. 11. The Content Processing Roadmap [Diagram: three phases (ACQUIRE, ENRICH, DELIVER) across three layers. CONTEXT: Import, Metadata, Select. CONTENT: Import, Manage, Select, Publish. CONNECTIONS: Import, Links, Select. Content Processing steps: Convert, Collect, Compile, Refactor, Relate, Resolve]
  12. 12. Convert Content [Diagram: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER across CONTEXT, CONTENT and CONNECTIONS)]
  13. 13. Converting Content Conversion: changing the format of legacy content to make it increasingly suitable for efficient management, revision, reuse and publishing.
  14. 14. The Harsh Reality of Legacy Content The Legacy Content Spectrum Opaque Not directly processable (e.g., paper / scanned images) Annoying Aggressively proprietary Little or no predictability in usage Polluted Normally processable but frequently filled with deviations & additions (HTML) Tolerable Documented format that exposes format & structure in a processable form Fortunately, popular formats are becoming more and more “tolerable”
  15. 15. Conversion Fundamentals Conversion is unavoidable and always under-estimated Conversion is fundamentally a matter of interpretation Parsing the legacy format & layout Inferring a meaning from this information Correlating the format & layout to a target structure Addressing problems introduced by format peculiarities Leveraging the content itself to guide format interpretation Enhancing interpretive rules by matching content patterns Automating conversion typically relies on two stages: Format Interpreter that can make sense of source formatting Rules-based Correlation Processor that maps content into structures
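
The two automation stages named on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenter's tooling: the legacy format (one `STYLE|text` pair per line), the style codes, the rule table and the function names `interpret_format` and `correlate` are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Stage 1: a hypothetical format interpreter. The "legacy format" here is an
# invented one in which every line reads "<style-code>|<text>".
def interpret_format(lines):
    tokens = []
    for line in lines:
        style, _, text = line.partition("|")
        tokens.append((style.strip(), text.strip()))
    return tokens

# Stage 2: a rules-based correlation processor. Each rule pairs a style code
# with an optional content pattern and names the target element, so the
# content itself can help guide how the formatting is interpreted.
RULES = [
    ("H1", None, "title"),
    ("P", re.compile(r"^(Warning|Caution):"), "warning"),
    ("P", None, "para"),
]

def correlate(tokens, rules=RULES):
    root = ET.Element("section")
    unmatched = []  # left for the people who understand the content
    for style, text in tokens:
        for rule_style, pattern, target in rules:
            if style == rule_style and (pattern is None or pattern.search(text)):
                ET.SubElement(root, target).text = text
                break
        else:
            unmatched.append((style, text))
    return root, unmatched
```

Anything the rules cannot place is returned rather than silently dropped, which fits the slide's point that conversion is a matter of interpretation and that the rules improve iteratively.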
  16. 16. Conversion Process [Diagram: an iterative conversion workflow. Labels include Source Analysis, Target Analysis, Source to Target Mapping, Subject Matter Experts, XML Schema, Template, Guidance, Interaction, Legacy Source Content, Modify Existing Conversion Rules, Modified Conversion Rules, Manual Editing, Execute Conversion Process, Result Analysis, Identified Issues, Application Tests, Validation & Verification. The process runs on (1) an Example Set, (2) a Sample Set (10%) and (3) the Complete Set (100%)]
  17. 17. The Key Questions Where are you? A true assessment of the state of your content sources Where are you going? A validated understanding of the output that you must produce & the uses to which it will be put
  18. 18. Practical Content Conversion Best Practice for Content Conversion Flexible posture Leverages the best tools & techniques Adapts to circumstances Continuously looks for automation opportunities Deploys automation under the guidance of the people who understand the content Leverages automation to: Analyse sources Perform transformations Validate results Analyse results
  19. 19. Scenario: Converting Drug Information [Chart: Drugs 1 to 4 rated across Scenarios A to D as Recommended, Optional or Not Recommended] Migrating drug information into a precise digital form presented a critical challenge Source: Miles33, Quark & vendor drug monographs Target: Logical data structures needed to drive diagnostics
  20. 20. Refactor Content [Diagram: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER across CONTEXT, CONTENT and CONNECTIONS)]
  21. 21. Refactoring Content Refactoring: restructuring content, without loss of meaning, to improve its suitability for management, maintenance and specifically reuse.
  22. 22. Aspects of Refactoring Refactoring breaks down into two tasks Bursting Normalization Content Bursting Decomposing content into components optimized for reuse Content Normalization Systematic removal of redundancies to improve maintainability Challenges Ensuring content components remain meaningful & manageable Maintaining a complete equivalence with the original Adapting the linking mechanisms so they remain valid and functional Usually entails introduction of an indirect referencing scheme
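
As a rough illustration of bursting, normalization and an indirect referencing scheme, the sketch below splits documents into paragraph-level components, stores each distinct component once, and keeps a per-document map of references. The paragraph granularity, the hash-derived identifiers and the function names are assumptions made for the example only.

```python
import hashlib

def burst_and_normalize(documents):
    """documents: dict mapping a document id to its list of paragraphs."""
    components = {}  # component id -> text, each distinct text stored once
    maps = {}        # document id -> ordered component ids (indirect references)
    for doc_id, paragraphs in documents.items():
        refs = []
        for text in paragraphs:
            # Identical text yields the same id, which removes the redundancy.
            comp_id = hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]
            components.setdefault(comp_id, text)
            refs.append(comp_id)
        maps[doc_id] = refs
    return components, maps

def reassemble(doc_id, components, maps):
    # Rebuilding a document from its map is the check that the refactored
    # form remains equivalent to the original.
    return [components[comp_id] for comp_id in maps[doc_id]]
```

Comparing `reassemble(...)` with the source paragraphs is one simple way to confirm the "complete equivalence with the original" that the slide lists as a challenge.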
  23. 23. To Burst or Not to Burst [Diagram: comparing conversion outputs] Content Modularity is not an end in itself A business rationale must drive bursting & refactoring efforts
  24. 24. Scenario: Realizing Savings with Refactoring Outcome of refactoring: $100 million saved annually
  25. 25. Collect Metadata [Diagram: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER across CONTEXT, CONTENT and CONNECTIONS)]
  26. 26. Collecting Metadata Metadata: a set of data that provides information about other data. Collecting Metadata: extracting, validating, integrating, supplementing, synchronizing and storing metadata from, and about, the content.
  27. 27. The Function of Metadata Metadata is used to make the context of content explicit Used to facilitate Control Security Limitation of rights Orderly storage & retrieval Discovery Searching Navigating Exchange Surprisingly important point The boundary between metadata and content is never completely clear [Image: Yale University Library]
  28. 28. Sources of Metadata Metadata can be supplied from an external source System data Captured when content is created / modified Subject information Declaring details about the subject matter Keywords, short descriptions,… Externally managed data about subject Author contributions Annotations, justifications, abstracts,… Process context (critically important) Relating content to business process events Metadata can be extracted from the content Specific aspects of the content are selected as valuable metadata Often one of the more precise aspects of subject-specific markup
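
The two sources described on this slide, externally supplied metadata and metadata extracted from the content, might be combined along these lines. The element names (`title`, `keywords/keyword`), the system fields and the function name `collect_metadata` are hypothetical; a real schema would dictate its own.

```python
import xml.etree.ElementTree as ET

def collect_metadata(xml_text, system_data):
    """Merge externally supplied metadata with metadata found in the content."""
    root = ET.fromstring(xml_text)
    # Extracted metadata: specific aspects of the markup chosen as valuable.
    extracted = {
        "title": root.findtext("title"),
        "keywords": [k.text for k in root.findall("keywords/keyword")],
    }
    # Supplied metadata: e.g. system data captured at creation or modification.
    record = dict(system_data)
    # Only non-empty extracted values are added to the record.
    record.update({key: value for key, value in extracted.items() if value})
    return record
```

In this sketch an extracted value overrides a supplied value with the same key; which source should win is a design decision the slide leaves open.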
  29. 29. Ontologies, Taxonomies & Metadata The Meaning of Metadata Metadata categories and values relate content to aspects of an Ontology The Ontology provides the context for metadata Ontologies Describe a domain of knowledge Can be used as the basis of: Taxonomies (classification schemes) Link networks Context driven navigational aids [Diagram: metadata relating Topics to an Ontology, a Taxonomy and a Link Network]
  30. 30. Scenario: Content Aggregation Services Sources: Paper PDF HTML SGML XML Databases …
  31. 31. Engineering Web Content: Linking, Publishing, Validating Joe Gollner Part 2 VP Enterprise Publishing Solutions
  32. 32. Engineering Web Content: Part 2 Topics Introducing the Content Processing Roadmap (Continued) Linking (content connection) Publishing (content delivery) Validating (content confirmation) Aim: Introduce the tools and techniques that constitute a practical working framework for discussing, designing, developing and deploying content management and processing systems
  33. 33. Establish Relationships [Diagram: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER across CONTEXT, CONTENT and CONNECTIONS)]
  34. 34. Establishing Relationships [Tables: Explicit Links (Actual) with columns Identifier, Source, Target, Type and rows A1, A2; Implicit Links (Potential) with columns Identifier, Source, Target, Type and rows B1, B2; Reuse Links (Physical) with columns Identifier, Resource, Request, Condition and rows R1, R2] Links: the connections or relationships between things that represent a significant portion of the meaning and value of content
  35. 35. Relationship Considerations Effective linking is central to content usability & value Ability to provide content tailored to a specific user context depends on being able to facilitate immediate access to additional information Linking is highly contextual Not all relationships are relevant at the same time How relationships are presented is format and media specific Often leads to additional rendition requirements for content objects Multiple renditions of graphics (thumbnail, low-res, high-res) Links have become acknowledged as First-Class Objects Subject to specific management and processing measures Ideally expressed & managed separately from the content (overlays) Associated with metadata & constituting important content metadata
  36. 36. Link Management Link Analysis: Outbound Links: Intact or broken Transclusions: Where used Inbound Links: Track-back / Where cited External Links: Network participation Increasingly important metadata Increasingly complex metadata Significant processing Leverages external storage of links & link metadata Bidirectional link generation becoming critical [Diagram: Link Analysis drawing on Outbound Link, Transclusion, Inbound Link and External Link, stored in a Link Base]
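
A link base that stores links outside the content makes the analyses listed here straightforward. The sketch below is one possible shape, assuming the Identifier / Source / Target / Type columns from the explicit-links table on slide 34; the class and method names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    identifier: str
    source: str
    target: str
    type: str

class LinkBase:
    """Links held separately from the content they connect."""

    def __init__(self, links, known_ids):
        self.links = list(links)
        self.known_ids = set(known_ids)  # ids of content that actually exists

    def outbound(self, content_id):
        return [link for link in self.links if link.source == content_id]

    def inbound(self, content_id):
        # "Where cited": answered without opening any content object.
        return [link for link in self.links if link.target == content_id]

    def broken(self):
        return [link for link in self.links if link.target not in self.known_ids]
```

Because the links live in their own store, inbound ("where cited") queries cost no more than outbound ones, which is the practical benefit of treating links as first-class objects.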
  37. 37. Scenario: Forest Information Mall FIM Interface Search Functionality Content Context: Finding content using a variety of familiar mechanisms and leading to applicable process areas Process Context: Navigation through processes (areas) surfaces sets of relevant documents [Diagram labels: Publish Process, Web Sites, XML Databases, Metadata Tools, Web Services, Contents, Contacts] Lightweight deployment of XML & transformations to enable “process help”
  38. 38. Deliver Content [Diagram: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER across CONTEXT, CONTENT and CONNECTIONS)]
  39. 39. Delivering Content Compile Publish Resolve Resolve: assemble content and instantiate applicable relationships Compile: convert resolved content into a form suitable for rendition Publish: render the content in the forms required by the context
  40. 40. The Goal: High Fidelity Automation Delivery Processing: Resolve, Compile, Publish Assembling the inputs (Map & View): Content requested, Supporting assets, Applicable stylesheets & rules Resolve into a processable whole Compile formattable content representations Publish final formatted renditions [Diagram: an Output Plan, Content and Assets are resolved, compiled and rendered using Rules, Transformations and Templates into Output Variants: Print Publishing (PDF) and Web Publishing (Portal / Portable), delivered as Print Output Products (PDF) and Web Output Products (XHTML)]
  41. 41. The Publishing Pipeline Resolution leverages CMS / Database services (selecting) Compilation produces “simplest possible serialization” All stages generate activity logs that feed a “quality report”
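
The three stages and their activity logs could be sketched as below. This is an assumption-laden toy: the content store is a plain dictionary standing in for CMS or database services, the "simplest possible serialization" is a list of tag/text blocks, and the log is a list of `(stage, message)` tuples that a quality report could later summarize.

```python
from html import escape

def resolve(content_map, store, log):
    """Select the requested components from a store (a stand-in for a CMS)."""
    resolved = []
    for component_id in content_map:
        if component_id in store:
            resolved.append(store[component_id])
        else:
            log.append(("resolve", "missing component: " + component_id))
    log.append(("resolve", f"{len(resolved)} of {len(content_map)} components resolved"))
    return resolved

def compile_content(resolved, log):
    """Reduce resolved content to a simple, formattable serialization."""
    compiled = [{"tag": "p", "text": text} for text in resolved]
    log.append(("compile", f"{len(compiled)} blocks compiled"))
    return compiled

def publish(compiled, log):
    """Render the compiled blocks, here as an HTML fragment."""
    output = "\n".join(
        f"<{block['tag']}>{escape(block['text'])}</{block['tag']}>" for block in compiled
    )
    log.append(("publish", f"{len(output)} characters rendered"))
    return output
```

Every stage appends to the same log, so a missing component surfaces in the report instead of disappearing between stages.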
  42. 42. Scenario: Performance Support Portals Performance Support Portals depend upon content resources that are intelligent and modular and that exhibit extremely high levels of quality
  43. 43. Implications for Content What then is expected of content? 1. Content must be available as valid XML 2. Content must be modularized 3. Content must be meaningful in multiple contexts 4. Content must be discretely addressable 5. Content must be uniquely identifiable using metadata 6. Content must be linked to related content 7. Content must encourage modification & addition 8. Content must be processable with almost perfect confidence This also has implications for the publishing process...
  44. 44. Content Processing & Validation Validation Essential capability Enables consistent processing Streamlines processes Validation must be Accurate Manageable Informative Actionable Pro-active Continuously improving
  45. 45. Validate & Transform: Simple Content Validation DTD structural rules Instance conformance Content Transformation Traditionally focused on arranging content for formatting Supporting primarily structural manipulation Validated Outputs Inputs to rendition processes HTML outputs XML outputs
  46. 46. Validate & Transform: Complex Content Validation & Verification Schema structural rules Rules governing content values Instance conformance Content Transformation Continuous process of improvement Parse, validate, align, verify…repeat Manipulation of many content types Validated Outputs Inputs to rendition processes HTML outputs XML outputs Data outputs for applications [Diagram: Schema, Rules and a Content Instance feed Structure Validation, Content Verification and Transformation Processing, producing Outputs]
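
The distinction between structure validation and content verification can be shown with a small stand-in. The Python standard library does not validate against a DTD or schema, so the `REQUIRED_CHILDREN` table below only imitates structural rules, and `VALUE_RULES` imitates rules governing content values; the element names and patterns are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical structural rules: which child elements a parent must contain.
REQUIRED_CHILDREN = {"drug": ["name", "dose"]}
# Hypothetical content rules: what the text of an element must look like.
VALUE_RULES = {"dose": re.compile(r"^\d+(\.\d+)? (mg|ml)$")}

def validate_and_verify(xml_text):
    """Return a list of issues; an empty list means the instance passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"parse: {err}"]
    issues = []
    for element in root.iter():
        # Structure validation: is every required child present?
        for child in REQUIRED_CHILDREN.get(element.tag, []):
            if element.find(child) is None:
                issues.append(f"structure: <{element.tag}> is missing <{child}>")
        # Content verification: does the value satisfy its rule?
        rule = VALUE_RULES.get(element.tag)
        if rule is not None and not rule.match(element.text or ""):
            issues.append(f"value: <{element.tag}> has unexpected value {element.text!r}")
    return issues
```

Returning issues as readable messages rather than a bare pass/fail follows the earlier slide's requirement that validation be informative and actionable.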
  47. 47. Solution Architectures Assembles components to provide integrated services Technology selection & integration Standards selection & integration Multiple solution instances will exist [Diagram: Content Engineering spanning Content Management, Content Processing and Solution Architectures, with the processing steps Convert, Transform, Publish, Refactor, Collect, Compile, Relate, Resolve, Validate]
  48. 48. Content Solution Architecture Framework Controls Enterprise Programs Domains Active Web Specialized Document Sources Publishing Services Models Integrate External Print Ontology Sources Discovery Services Rules Legacy Application Data Sources Content Architecture Data Services Inputs Outputs Users Tools Mechanisms Authors Content Management Resources Subject Matter Experts Content Processing Administrators Budget Content Authoring Information Architects Personnel Development Tools Developers Infrastructure Web Services
  49. 49. Content Architecture Establishes governing model of the knowledge domain The knowledge that has informed the content The knowledge being encapsulated in the solutions Supports multiple solution instances [Diagram: Content Engineering with Content Architecture spanning Content Management, Content Processing and Solution Architectures, with the processing steps Convert, Transform, Publish, Refactor, Collect, Compile, Relate, Resolve, Validate]
  50. 50. The Central Role of the Content Architecture [Diagram: the Content Architecture in relation to Content Requirements, Service Requirements, Discovery Taxonomies, Specialized Information Types, Specialized Delivery Processes and Specialized Domains. Labels include Topic, Concept, Task, Reference, Description, Procedure, Data, Annotation, Formatting, Effectivity, Change]
  51. 51. Content Solution Design Principles The nature of content demands an adaptable architecture Technology components should be loosely-coupled Content must always be available in its simplest self-describing form Data stores should be replaceable by stored instances True for content, metadata and links Content processing events can be performed many ways Simple methods must be present, sophisticated methods may be All interfaces established as the exchange of validated content Processing rules are, themselves, managed & processable content Content Processing should be extensively leveraged Content validation, analysis and reporting at every stage Used to manage & optimize solution components to improve efficiency
  52. 52. General Observations Content is inherently complex Current trends have moved content to the center of attention Content Engineering is an essential response Provides the necessary discipline & the conceptual framework Content has not typically received this level of attention in the past Effective Content Processing is central to success Content Management services are enabled by content processes Adaptive content processing is essential for addressing change Effective Content Solutions are designed to cover the complete content lifecycle and all stakeholder perspectives The efficient management and processing of content remains an elusive goal for most organizations
  53. 53. Content Engineering and Business Value The design of Content Solutions should Continuously minimize the costs of acquiring, enriching, managing and delivering content Continuously improve content resources through enrichment Continuously increase the benefits realized through the delivery of content Continuously reduce risks threatening content assets or the services being supported Each of these represents an increase in value
  54. 54. Top Ten Secrets of Content Engineering Success Don’t underestimate your content or your business Don’t underestimate the power of good automation Choose an appropriate tool set and validate your choices Don’t invest in content management technology too early Carefully plan and execute migration activities Take a “customer service” focus in delivering tangible benefits (new products / services) from your investments Be demanding of your suppliers (expect quality) Engage your stakeholders and “take control” of the solution Leverage standards, don’t be enslaved by them Be an active part of the community as a way to learn and as a way to share what you have learned
  55. 55. End of Part 2 – Engineering Web Content Questions & Comments Contact Joe Gollner VP e-Publishing Solutions Stilo International otherwise…the End