2. Introduction to Content Engineering: Topics
What is Content?
Content Engineering & the Content Processing Roadmap
The Business Context of Content Engineering
Aims:
Establish the nature of, and need for, Content Engineering
Define a rubric of terminology for the tools and techniques
that constitute a practical working framework for
discussing, designing, developing and deploying
content management and processing systems
4. Content is how we Communicate
Content is the physical form
of human communication
Content is meaningful
because it entails context
Narrative Structures
Implied Associations
Associative Memory
Acquired Perspectives
Imperfect Expression, Imperfect Interpretation
Content is typically serialized
due to the ways we
express, store and interpret information
5. The Document as the Popular Face of Content
The document has proven to be a
powerful device for communicating and
retaining content
While documents provide effective physical containers for content, they also
lead to multiple modes of exchange and potential obsolescence
6. Content is Everywhere
This has been true since the dawn of
civilization and its importance grows daily
Content populates an ecosystem where people receive, internalize,
modify, create and share that content. Content connects everything.
7. The Truth about Content
We are faced with:
Massively expanding content volumes
Diversifying venues for content delivery
Proliferating format varieties
Rising expectations of users
Escalating specialization of content
Evolving interconnectedness of content
Multiplying problems related to content security
Continuing lifecycle challenges (obsolescence remains a risk)
Increasing complexity of content
(the reintegration of data & documents)
Growing recognition of the central importance of content
8. What Lies Ahead?
What are the biggest challenges you face today
in managing and using content?
What do you suspect will be the
biggest challenge
you will be facing
in the next five years?
What are the opportunities emerging
to leverage content in your business?
9. An Essential Response: Content Engineering
Working Definition
The application of
rigorous engineering discipline
to the design, development
and deployment of
content management and
processing systems
Distinguishing Features
Systematic approach
Progressive use of technology
Awareness of
Lifecycle considerations
Total cost of ownership
Solution scalability
10. Engineering and Content
Organizing work
Laying out work spaces
Sequencing of process steps
Optimizing tasks
Refining tools
Improving materials
Transferring results between stages
Sharing resources
Performing maintenance
Troubleshooting problems
Differential Analyzer – Vannevar Bush (1930s)
11. Content Engineering
Content Engineering
Governing discipline
Goal-directed
Content Management
Protect Value
Content Processing
Enhance Value
People
Create Value
Planning
Designing
Authoring
Editing
12. Content Management Components
Content Management
Control
Organize resources, access
and lifecycle
Change
Facilitate the evolution of
content and the associated
services
Deploy
Enable the services
the content makes
possible
13. Content Management and Content Processing
A Close Relationship
CM cannot exist without
content processing services
Expanding CM services
demands more processing
The sophistication of the
processing functions
increases more rapidly than
management functions
Many CMS solutions are
constrained by weak
content processing capabilities
18. Converting Content
Conversion: changing the format of legacy content to make it increasingly
suitable for efficient management, revision, reuse and publishing.
19. The Harsh Reality of Legacy Content
Legacy Content
All content resources that require modification in order to be useful
The Legacy Content Spectrum
Opaque
Not directly processable (e.g., paper)
Annoying
Aggressively proprietary
Little or no predictability in usage
Polluted
Normally processable but frequently
filled with deviations & additions (HTML)
Tolerable
Documented format that exposes format
& structure in a processable form
20. Conversion Fundamentals
Conversion is unavoidable and always under-estimated
Conversion is fundamentally a matter of interpretation
Parsing the legacy format & layout
Inferring a meaning from this information
Correlating the format & layout to a target structure
Addressing problems introduced by format peculiarities
Leveraging the content itself to guide format interpretation
Enhancing interpretive rules by matching content patterns
Automating conversion typically relies on two stages:
Format Interpreter that can make sense of source formatting
Rules-based Correlation Processor that maps content into structures
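The two stages named above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical plain-text legacy format in which headings are typed in capitals and list items begin with a hyphen. The function names, the style labels and the target element names are all invented for the example.

```python
import re
from xml.etree import ElementTree as ET

def interpret_format(line):
    """Stage 1, Format Interpreter: describe how a legacy line is formatted."""
    text = line.strip()
    if text.startswith("- "):
        return {"style": "hyphen-item", "text": text[2:]}
    if text.isupper():
        return {"style": "all-caps", "text": text}
    return {"style": "plain", "text": text}

# Stage 2 rules: (source style, optional content pattern, target element).
# The content pattern lets the content itself guide format interpretation.
RULES = [
    ("all-caps", re.compile(r"^WARNING\b"), "warning"),
    ("all-caps", None, "title"),
    ("hyphen-item", None, "item"),
    ("plain", None, "para"),
]

def correlate(feature):
    """Stage 2, Rules-based Correlation Processor: first matching rule wins."""
    for style, pattern, target in RULES:
        if feature["style"] != style:
            continue
        if pattern is None or pattern.search(feature["text"]):
            return target
    return "unknown"

def convert(legacy_text):
    """Run both stages over each non-empty line and build a target tree."""
    root = ET.Element("section")
    for line in legacy_text.splitlines():
        if not line.strip():
            continue
        feature = interpret_format(line)
        ET.SubElement(root, correlate(feature)).text = feature["text"]
    return root
```

Keeping the rules in a data table rather than in code means they can be revised after each result analysis without touching the interpreter.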
21. Conversion Process Template
[Figure: conversion process template. Labels recovered from the diagram:]
Inputs: Legacy Content (existing source), Target XML Schema, Subject Matter Experts
Preparation: Source Analysis, Source to Target Mapping, Interaction with Subject Matter Experts, Guidance
Cycle: Conversion Rules, Execute Conversion Process, Result Analysis, Identified Issues, Modify Conversion Rules, Manual Editing, Modified Source
Content sets: 1 Example Set, 2 Sample Set (10%), 3 Complete Set (100%)
Completion: Application Tests, Validation & Verification
23. Refactoring Content
Refactoring: restructuring content, without loss of meaning, to improve its
suitability for management, maintenance and specifically reuse.
24. Aspects of Refactoring
Refactoring breaks down into two tasks
Bursting
Normalization
Content Bursting
Decomposing content into components optimized for reuse
Content Normalization
Systematic removal of redundancies to improve maintainability
Challenges
Ensuring content components remain meaningful & manageable
Maintaining a complete equivalence with the original
Adapting the linking mechanisms so they remain valid and functional
Usually entails introduction of an indirect referencing scheme
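Bursting, normalization and indirect referencing can be illustrated together. The sketch below assumes that components are paragraphs separated by blank lines and that a truncated content hash is an acceptable component identifier. Both are simplifications chosen for the example, and the function names are hypothetical.

```python
import hashlib

def burst(document):
    """Bursting: decompose a document into paragraph-level components."""
    return [block.strip() for block in document.split("\n\n") if block.strip()]

def normalize(documents):
    """Normalization: keep each distinct component once and replace it in
    every document with an indirect reference (a content-derived key)."""
    store = {}  # component id -> text, shared by all documents
    maps = {}   # document name -> ordered list of component ids
    for name, text in documents.items():
        refs = []
        for component in burst(text):
            # A short hash is enough for a demo; a real scheme would need
            # managed identifiers that survive edits to the component.
            key = hashlib.sha1(component.encode("utf-8")).hexdigest()[:8]
            store.setdefault(key, component)
            refs.append(key)
        maps[name] = refs
    return store, maps

def reassemble(store, refs):
    """Resolve references back into a document to check equivalence."""
    return "\n\n".join(store[key] for key in refs)
```

Comparing `reassemble(...)` against the original is one simple way to test that the refactored form is still equivalent to the source.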
25. Refactoring Strategies
Strategy needed to ensure adequate returns on investment
Refactor content that undergoes the highest rates of change first
[Figure labels: Conversion Outputs, Compare Outputs]
27. Collecting Metadata
Metadata: a set of data that provides information about other data.
Collecting Metadata: extracting, validating, integrating, supplementing,
synchronizing and storing metadata from, and about, the content.
28. The Function of Metadata
Metadata is used to make the context of content explicit
Used to facilitate
Control
Security
Limitation of rights
Orderly storage & retrieval
Discovery
Searching
Navigating
Exchange
Surprisingly important point
The boundary between metadata and content is never completely clear
(Image: Yale University Library)
29. The Storage of Metadata
Useful Design Pattern: Detachable Metadata
Key metadata clustered into a document sub-component
Shareable amongst many uses
Incorporated into document
when important to do so &
only then
30. Ontologies, Taxonomies & Metadata
The Meaning of Metadata
Metadata categories and values relate content to aspects of an Ontology
The Ontology provides the context for metadata
Ontologies
Describe a domain of knowledge
Can be used as the basis of:
Taxonomies (classification schemes)
Link networks
Context driven navigational aids
[Figure labels: Ontology, metadata, Topic, Taxonomy, Link Network]
32. Establishing Relationships
Explicit Links (Actual)
Identifier Source Target Type
A1
A2
Implicit Links (Potential)
Identifier Source Target Type
B1
B2
Reuse Links (Physical)
Identifier Resource Request Condition
R1
R2
Links: the connections or relationships between things that
represent a significant portion of the meaning and value of content
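The three link tables above translate naturally into simple records. The field names follow the column headings on the slide. The sample values, the `explicit` flag and the reading of Request and Condition given in the comments are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    identifier: str
    source: str
    target: str
    link_type: str
    explicit: bool = True  # False marks an implicit (potential) link

@dataclass(frozen=True)
class ReuseLink:
    identifier: str
    resource: str   # assumed: the component being reused
    request: str    # assumed: the location that asks for it
    condition: str  # assumed: when the reuse applies

def links_from(links, source, explicit_only=False):
    """Return the links that start at `source`, optionally only explicit ones."""
    return [
        link for link in links
        if link.source == source and (link.explicit or not explicit_only)
    ]

# Hypothetical sample data.
LINKS = [
    Link("A1", "doc1#s2", "doc2#s1", "cross-reference"),
    Link("B1", "doc1#s2", "doc3", "related-topic", explicit=False),
]
REUSE = [ReuseLink("R1", "warning-17", "doc1#s4", "audience=technician")]
```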
33. Link Management
Link Analysis:
Outbound Links: Intact or broken
Transclusions: Where used
Inbound Links: Track-back / Where cited
External Links: Network participation
Increasingly important
Increasingly complex
Significant processing
Leverages external storage of links & link metadata
Link generation becoming critical
[Figure labels: Link Analysis, metadata, Outbound Link, Transclusion, Inbound Link, External Link, Bidirectional, Link Base]
35. Delivering Content
Resolve, Compile, Publish
Resolve: assemble content and instantiate applicable relationships
Compile: convert resolved content into a form suitable for rendition
Publish: render the content in the forms required by the context
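The three delivery steps can be read as a small pipeline. The sketch below is one possible shape, assuming a hypothetical component store, a content map that lists components by reference, and two output formats. None of these names come from a particular product.

```python
from html import escape

# Hypothetical component store and content map.
COMPONENTS = {
    "intro": "Content connects everything.",
    "safety": "Wear gloves & goggles.",
}
CONTENT_MAP = {"title": "Guide", "parts": ["intro", "safety"]}

def resolve(content_map, components):
    """Resolve: assemble the referenced components into one processable whole."""
    return {
        "title": content_map["title"],
        "paragraphs": [components[ref] for ref in content_map["parts"]],
    }

def compile_content(resolved):
    """Compile: convert resolved content into a neutral, formattable form."""
    blocks = [("heading", resolved["title"])]
    blocks += [("para", text) for text in resolved["paragraphs"]]
    return blocks

def publish(blocks, fmt):
    """Publish: render the compiled blocks in the form the context requires."""
    if fmt == "html":
        tags = {"heading": "h1", "para": "p"}
        return "".join(
            f"<{tags[kind]}>{escape(text)}</{tags[kind]}>" for kind, text in blocks
        )
    if fmt == "text":
        return "\n".join(
            text.upper() if kind == "heading" else text for kind, text in blocks
        )
    raise ValueError(f"unsupported format: {fmt}")
```

Because the compiled form is independent of any output format, adding a new rendition only means adding a branch (or a plug-in) at the publish step.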
36. The Goal: High Fidelity Automation
Print Publishing (PDF)
Web Publishing (Portal / Portable)
Deliver
- Resolve
- Compile
- Publish
Assembling the inputs (Map & View)
Content requested
Supporting assets
Applicable stylesheets & rules
Resolve into a processable whole
Compile formattable content representations
Publish final formatted renditions
[Figure: Delivery Processing. Labels: Content, Assets, Output Plan, Resolve, Compile, Render, Publish, Publish Rules, Transformations, Output Variants, Templates, Output Print Products (PDF), Output Web Products (XHTML)]
37. Content Processing & Validation
Validation
Essential capability
Enables consistent
processing
Streamlines
processes
Validation must be
Accurate
Manageable
Informative
Actionable
Pro-active
Continuously improving
38. Validate & Transform: Simple
Content Validation
DTD structural rules
Instance conformance
Content Transformation
Traditionally focused on arranging
content for formatting
Supporting primarily
structural manipulation
Validated Outputs
Inputs to rendition processes
HTML outputs
XML outputs
39. Validate & Transform: Complex
Content Validation & Verification
Schema structural rules
Rules governing content values
Instance conformance
Content Transformation
Continuous process of improvement
Parse, validate, align, verify…repeat
Manipulation of many content types
Validated Outputs
Inputs to rendition processes
HTML outputs
XML outputs
Data outputs for applications
[Figure labels: Schema Rules, Content Instance, Structure Validation, Content Verification, Transformation Processing, Outputs]
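The difference between structure validation and content verification can be shown with a small example. The sketch uses only the Python standard library, so a hand-written table of allowed child elements stands in for a real DTD or schema. The element names, the `minutes` attribute and the rules are hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical structural rules: allowed child elements per parent.
STRUCTURE = {"procedure": ["title", "step"], "title": [], "step": []}

def validate_structure(element):
    """Structure validation: is every element allowed where it appears?"""
    if element.tag not in STRUCTURE:
        return [f"unknown element <{element.tag}>"]
    issues = []
    for child in element:
        if child.tag not in STRUCTURE[element.tag]:
            issues.append(f"<{child.tag}> not allowed in <{element.tag}>")
        else:
            issues.extend(validate_structure(child))
    return issues

def verify_values(root):
    """Content verification: rules about values rather than structure."""
    issues = []
    for step in root.iter("step"):
        minutes = step.get("minutes")
        if minutes is None or not minutes.isdigit():
            issues.append(f"step '{step.text}': minutes must be a whole number")
    return issues

def check(xml_text):
    """Parse, validate and verify; return readable, actionable messages."""
    root = ET.fromstring(xml_text)
    return validate_structure(root) + verify_values(root)
```

Returning messages instead of a bare pass or fail reflects the earlier point that validation must be informative and actionable.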
40. Complexity and the Cost of Quality
Complexity is inherent in
the nature of content
Increasing content
complexity increases the
amount and sophistication
of content processing tasks
Increases in content
processing tasks result in
a significant increase in the
total cost of quality
42. Managing Solution Risk
Integration risk represents
The potential loss of services
The potential loss of assets
Integration risk increases
with the increase in the
number of technologies
used to build a solution
System complexity
Can be managed
Ultimately limits solution
affordability and even viability
Addressed in design selections
43. Technology Selection
Key Considerations
Solution context
Scored against
requirements
Scoring scale
0 – No Fit
6 – Total Fit
Results weighed
against acquisition cost
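A scoring exercise of this kind is easy to mechanize. The sketch below keeps the 0 to 6 scale from the slide. The requirement weights, the candidate names and the choice to rank by fit per unit of cost are assumptions, since the slide only says that results are weighed against acquisition cost.

```python
def fit_score(scores, weights):
    """Weighted fit as a fraction of the maximum (6 on every requirement)."""
    for value in scores.values():
        if not 0 <= value <= 6:
            raise ValueError("scores use the 0 (no fit) to 6 (total fit) scale")
    total = sum(weights[req] * scores[req] for req in weights)
    return total / (6 * sum(weights.values()))

def rank(candidates, weights):
    """Order candidates by fit per unit of acquisition cost, best first."""
    rows = [
        (name, fit_score(entry["scores"], weights), entry["cost"])
        for name, entry in candidates.items()
    ]
    return sorted(rows, key=lambda row: row[1] / row[2], reverse=True)

# Hypothetical requirements, weights and candidates.
WEIGHTS = {"validation": 3, "publishing": 1}
CANDIDATES = {
    "Tool A": {"scores": {"validation": 6, "publishing": 2}, "cost": 100},
    "Tool B": {"scores": {"validation": 3, "publishing": 6}, "cost": 50},
}
```

Here Tool A fits the requirements better, but Tool B ranks first once cost is taken into account, which is the kind of trade-off the scoring is meant to expose.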
44. Technology Lifecycle Considerations
Solution context includes
Urgency
Complexity
Criticality
Constraints
Projected lifecycle
Expected lifespan
Rate of change
Influencing factors
[Figure: Measuring Overall Productivity over Time. Axes: Time and Complexity, each running from Low to High]
45. Solution Component Dependencies
[Figure: solution component dependencies. Labels include Structure, Content, Media, Process, Maps, Schemas, Files, Sources, Rules, Document, Import, Data, Templates, Processing Scripts, Style Sheets, Analysis, Relationships, Quality, Log, Configuration, Reports]
Because all components within a solution evolve, their inter-dependencies
require explicit description and management.
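One way to make inter-dependencies explicit is to record them as data and query them when something changes. The sketch below assumes a hypothetical dependency table with invented component names. It answers a single question: which components are affected if a given one changes?

```python
from collections import deque

# Hypothetical description: component -> components it depends on.
DEPENDS_ON = {
    "stylesheet": ["schema"],
    "conversion-script": ["schema", "rules-file"],
    "quality-report": ["conversion-script"],
    "template": ["stylesheet"],
}

def impacted_by(changed, depends_on):
    """Return every component that directly or indirectly depends on `changed`."""
    # Invert the table so each component lists what depends on it.
    dependents = {}
    for component, needs in depends_on.items():
        for needed in needs:
            dependents.setdefault(needed, set()).add(component)
    # Breadth-first walk over the inverted table.
    seen, queue = set(), deque([changed])
    while queue:
        for dependent in dependents.get(queue.popleft(), ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)
```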
46. Evaluating Standards as Potential Tools
Independence
From parochial interests, proprietary claims, external influences
Formality
Of creation, validation, approval & modification process
Stability
Of standard over time & the backward compatibility of changes
Completeness
Sufficiency for declared scope as well as availability of
useful documentation & reference implementations
Adoption
Extent of support amongst tool vendors, authorities & users
Practicality
The extent to which all, or parts, of the standard can be deployed
47. Evaluating a Specialized Industry Standard
Scenario
Industry
specification
Broad scope
Specialized
stakeholder
community
Continuously
changing
& expanding
Strategy
Implement where
necessary
Address risk areas
48. Evaluating a Cross-Industry Standard
Scenario
Addressing
widespread issues
Broad stakeholder
community
Mature
Further
capabilities
emerging
Strategy
Plan for adoption
Consider for use in
variety of areas
49. Content Solution Architecture Framework
[Figure: Content Solution Architecture Framework. Labels recovered from the diagram:]
Controls: Enterprise Programs, Specialized Domains, Models, Rules
Inputs: Document Sources, Ontology Sources, Data Sources (Active, External, Legacy)
Center: Integrate, Content Architecture
Outputs: Publishing Services, Discovery Services, Data Services (Web, Print, Application)
Mechanisms: Users, Resources, Tools
Users: Authors, Subject Matter Experts, Administrators, Information Architects, Developers
Resources: Budget, Personnel, Infrastructure
Tools: Content Management, Content Processing, Content Authoring, Development Tools, Web Services
50. Content Architecture
Establishes governing model of the knowledge domain
The knowledge that has informed the content
The knowledge being encapsulated in the solutions
Supports multiple solution instances
[Figure labels: Content Engineering, Content Architecture, Content Management, Content Processing, Content Solution Architectures, Convert, Refactor, Collect, Relate, Resolve, Compile, Publish, Validate, Transform]
51. The Central Role of the Content Architecture
[Figure: the Content Architecture at the center of the solution. Labels include Content Requirements, Service Requirements, Discovery Taxonomies, Architecture, Topic, Concept, Task, Reference, Description, Procedure, Data, Specialized Information Types, Specialized Delivery Processes, Specialized Domains, Annotation, Formatting, Effectivity, Change]
52. Content Solution Design Principles
The nature of content demands an adaptable architecture
Technology components should be loosely-coupled
Content must always be available in its simplest self-describing form
Data stores should be replaceable by stored instances
True for content, metadata and links
Content processing events can be performed many ways
Simple methods must be present, sophisticated methods may be
All interfaces established as the exchange of validated content
Processing rules are, themselves, managed & processable content
Content Processing should be extensively leveraged
Content validation, analysis and reporting at every stage
Used to manage & optimize solution components to improve efficiency
53. Content Engineering Maturity Model
Modeled on the Software Engineering Institute's (SEI)
Capability Maturity Model Integration (CMMI)
“managed” used instead of “quantitatively managed” for level 4
“repeated” used instead of “managed” for level 2
“reactive” used instead of “performed” for level 1
Objective
Follow software engineering in emphasizing the importance of
formalization & quantitative methods for continuous improvement
Content Engineering Maturity Model
Level
5 Optimized
4 Managed
3 Defined
2 Repeated
1 Reactive
0 Incomplete
54. CE Maturity Model: Level 0 Incomplete
Incomplete
Often the complete absence of a documented process
A process that is documented but not followed also qualifies
Features
New requirements
addressed using
available tools
Each solution seeks
cost minimization
No persistent
infrastructure
No improvement
between projects
55. CE Maturity Model: Level 1 Reactive
Reactive
A process exists for specific goals
Sufficient for the needs of selected products
Not institutionalized and not integrated with institutional processes
Features
Not designed to handle new or changing requirements
Can result in multiple solutions each created as a reaction
56. CE Maturity Model: Level 2 Repeated
Repeated
A managed process exists and is supported by basic infrastructure
Predictability can be achieved in process performance & products
Reviews are conducted to identify & initiate improvements
Features
A common set of tools has been selected
Procedures exist for steps
Solution components documented
57. CE Maturity Model: Level 3 Defined
Defined
Standardization in processes established on an institutional level
Common tools & techniques used across processes & projects
Features
A single infrastructure used to support multiple processes & projects
Processes defined with reference to enterprise models
Interrelationships are known
58. CE Maturity Model: Level 4 Managed
Managed
Processes are managed using quantitative measurement
Automation is maximized in the execution of process steps
A single integrated & managed environment supports all processes
Features
Infrastructure components managed as content with automation used to adapt behaviour
High levels of quality sustained
59. CE Maturity Model: Level 5 Optimized
Optimized
Continuous orientation towards improvement
Continuous refactoring of solution and content to achieve efficiencies
Continuous identification & implementation of heightened standards
Features
Systematic analysis & correction of variations
Proactive identification of new products & services that can be offered
Industry innovation
60. General Observations
Content is inherently complex
Current trends have moved content to the center of attention
Content Engineering is an essential response
Provides the necessary discipline & the conceptual framework
Content has not typically received this level of attention in the past
Effective Content Processing is central to success
Content Management services are enabled by content processes
Adaptive content processing is essential for addressing change
Effective Content Solutions are designed to cover the complete content
lifecycle and all stakeholder perspectives
The efficient management and processing of content remains an
elusive goal for most organizations
61. Content Engineering and Business Value
The design of Content Solutions should
Continuously minimize the costs of
acquiring, enriching, managing
and delivering content
Continuously improve content
resources through enrichment
Continuously increase the
benefits realized through
the delivery of content
Continuously reduce risks
threatening content assets or
the services being supported
Each of these represents an
increase in value
62. Top Ten Secrets of Content Solution Success
Don’t underestimate your content or your business
Don’t underestimate the power of good automation
Choose an appropriate tool set and validate your choices
Don’t invest in content management technology too early
Carefully plan and execute migration activities
Take a “customer service” focus in delivering tangible
benefits (new products / services) from your investments
Be demanding of your suppliers (expect quality)
Engage your stakeholders and “take control” of the solution
Leverage standards, don’t be enslaved by them
Be an active part of the community as a way to learn and as
a way to share what you have learned
63. The End
Admittedly an awful lot to cover in
a single go. Hopefully some of the
ideas connect with some of your
experiences and perhaps help in
framing aspects of your
next project.
Joe Gollner
VP e-Publishing Solutions
Stilo International
jgollner@stilo.com