A presentation developed and delivered in 1995. It was designed to be part of a larger introduction to SGML. It is interesting today because it foregrounds many (if not all - and perhaps a few extra) of the themes being touched upon in discussions of Intelligent Content. It needed to be shared just in case someone thought that this was all new.
Why SGML (Retro Alert 1995)
1. (1995)
Why SGML?
The Need for SGML
First delivered: 1995
www.gollner.ca
[Figure: structure diagram labeled Course document, Module, title, para, figure, list; and a stack labeled knowledge, information, data]
2. (1995)
What is SGML?
SGML stands for the
Standard
Generalized
Markup
Language
SGML is an international (ISO) standard
ISO 8879:1986 Information Processing - Text and
Office Systems - Standard Generalized Markup
Language (SGML)
3. (1995)
What is SGML?
Informal Definitions
SGML is a system and processing
independent means of representing,
creating, managing and exchanging
information.
SGML is an “intelligent markup language”
that protects the accessibility, usability, life
expectancy and value of information.
4. (1995)
Why SGML?
A Meditation on a Paper Clip
The paper clip is a
low-tech version of
hypertext – facilitating
the physical association
of documents & fragments.
Often used in addition to
electronic files where
such associations cannot be
easily shown or enforced.
5. (1995)
SGML was created
to better manage documents
Publications
Training Manuals
Specifications
Documentation
Reports
Correspondence
Policies
Procedures
Standards
Plans
Directives
Commentaries
Proposals
6. (1995)
Most Information
is held in Documents
                         Database Information   Document Information
Share of information     10%                    90%
IM Budget Allocations    90%                    10%
8. (1995)
Document Information
A Document is a meaningful organization of
Information
A Document is meaningful because it is
communicated between people to achieve
specific goals
A Document combines multiple media types
together in an organized, but not strictly
predictable, form that people can use
9. (1995)
Document Information
Features
Wide and Hierarchical Structure
Variable Definitions
Variable Access
Variable Organizational Boundaries
Multiple Dynamic Processes
[Figure: sample page with Chapter Title, Section Title, and page number 1]
10. (1995)
Document Information
Conclusions
Document Information does not fit within the
conventional Database paradigm
Database Information is organized
according to the needs of the Computer
Document Information is organized
according to the needs of the User
Few of the assumptions within the Database
Paradigm apply to Documents
12. (1995)
Documents and Computers
Computers help us create more paper faster
Computers help us format printed
documents more efficiently and at less cost
Computers have not helped with the
management consequences
13. (1995)
The Document Explosion
The volume of documents is growing
exponentially
The visibility of document-based
transactions is increasing
The rise of the Internet and Enterprise
Integration dramatically alters the potential
user community of a document
Documents are becoming more complex,
larger and more varied in format
14. (1995)
Management Breakdown
Traditional Records Management practices
and technologies cannot cope with the
volume, complexity, or volatility of
computer-generated documents
The typical response has been to extend the
Database paradigm to document information
Given currently-used technology, the best
that can be done is the “Electronic Filing
Cabinet” (old tools made electronic - again)
15. (1995)
What’s Wrong
Computers traditionally store documents as
“objects”
Computers know very little (almost nothing)
about these objects
some management information (author, version, date)
little awareness of document content
less awareness of document structure
Computers can only associate some
information with the objects as the objects
have no inherent “intelligence”
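A minimal sketch of the situation described on this slide, written in Python purely for illustration. The `DocumentObject` class, its field names, and the sample values are hypothetical and do not come from the presentation. The system can answer questions about the management information it holds, but the content is an opaque run of bytes with no structure it can query.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DocumentObject:
    """A document as a filing system sees it (hypothetical model)."""
    author: str      # management information
    version: str     # management information
    modified: date   # management information
    content: bytes   # opaque: no titles, sections, or meaning visible here

def find_by_author(objects, author):
    """Management information can be searched directly."""
    return [o for o in objects if o.author == author]

objects = [
    DocumentObject("Lee", "1.0", date(1995, 3, 1), b"\x00\x01 opaque bytes"),
    DocumentObject("Kim", "2.1", date(1995, 4, 12), b"\x02\x03 opaque bytes"),
]

matches = find_by_author(objects, "Kim")
print([m.version for m in matches])
```

Nothing in this model could answer a question such as "which section covers approvals?", because the structure of the document is not represented anywhere the computer can see it.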
16. (1995)
New Technologies
Applications have evolved to redress some
of these shortcomings
“Electronic Filing Cabinets” associate
management information with document
objects and physically control events
Full-Text Retrieval technologies have been
used to access Document “Content”
Word Processors are used to infer the
structure of documents based on format
(styles and templates)
17. (1995)
Electronic Filing Cabinets
In an “Electronic Filing Cabinet”
environment, management information is
associated with these “objects”
Document objects that leave the sphere of
control are no longer managed
[Figure: document pages (Chapter Title, Section Title) shown in relation to the Sphere of Control]
18. (1995)
Full-Text Retrieval
Create external indices of the textual content
of a document
Various text indexing algorithms are used to
support searches by word, by text string,
proximity, exclusion and so on
Useful but imprecise as document volume
increases
New technologies arising to improve search
precision (lexicon-based, links to metadata)
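The retrieval approach on this slide can be sketched as a small inverted index kept outside the documents. This is an illustrative Python sketch only, not how any particular retrieval product worked. The function names, the sample documents, and the `max_gap` parameter are assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [word positions]}, stored apart from the documents."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search_word(index, word):
    """Documents containing the word."""
    return set(index.get(word.lower(), {}))

def search_excluding(index, word, excluded):
    """Documents containing `word` but not `excluded`."""
    return search_word(index, word) - search_word(index, excluded)

def search_near(index, first, second, max_gap=3):
    """Documents where the two words occur within `max_gap` positions of each other."""
    a = index.get(first.lower(), {})
    b = index.get(second.lower(), {})
    hits = set()
    for doc_id in a.keys() & b.keys():
        if any(abs(p - q) <= max_gap for p in a[doc_id] for q in b[doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical sample documents.
docs = {
    "memo": "The policy manual is under review.",
    "spec": "Review the manual before the policy committee meets next month.",
    "plan": "The training plan has no manual.",
}
index = build_index(docs)

print(sorted(search_word(index, "manual")))
print(sorted(search_excluding(index, "manual", "policy")))
print(sorted(search_near(index, "policy", "manual", max_gap=1)))
```

The index knows only which words occur where. It cannot tell a word in a chapter title from the same word in a footnote, which is one reason precision falls as document volume grows.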
19. (1995)
Word Processors
Evolving to include basic management
information (profiles)
Evolving to include template structures
(document types)
Management and structural information only
accessible through Word Processor
application (directly or via API)
These new Word Processing features are
not generally used
20. (1995)
Proprietary Documents
The basic problem is that traditional
documents are produced and maintained in
a proprietary and non-intelligent format
Electronic Documents are simply paper
documents in a more reproducible form
Electronic Documents are printed for use
People retain and use hardcopy “files”
New Applications still assume a static
environment and single format use
21. (1995)
Proprietary Formats
Word Processing applications offer an
enhanced implementation of the typewriter,
the copy editor and the typesetter
Word Processing applications
Add formatting instructions to text
Execute formatting instructions to produce an output
(operating system and printer interface)
Formatting Instructions are specific to the
application that created them and the
platform on which they were created
22. (1995)
Procedural Markup
Processing Instructions
Chapter Title: 12 pt. bold Helvetica
Section Title: 10 pt. bold Helvetica
Text block: 8 pt. Times on 10 pt. leading
Text block: 8 pt. Times on 10 pt. leading
Page number (1): 7 pt. Helvetica bold
23. (1995)
Proprietary Markup
Typical of Word Processors
[Center][Und On]SGML[Und Off][Hrt]
[Hrt]
[Font: Helvetica 10pt]
[Indent]Introduction[Hrt]
[Hrt]
[Font: Times Roman 8pt]
[Tab]Someday [Italic On]information
[Italic Off] will be free.[Hrt]
Callouts: Position, Style, Font
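To see what such markup does and does not record, here is a small Python sketch that separates the bracketed codes from the text they surround. It assumes the bracket notation shown on this slide, which is a display form of the codes and not the stored file format. The function name `split_codes` is hypothetical.

```python
import re

# A formatting code is anything between square brackets, as displayed on the slide.
CODE = re.compile(r"\[([^\]]+)\]")

def split_codes(marked_up):
    """Return (list of formatting codes, plain text with the codes removed)."""
    codes = CODE.findall(marked_up)
    text = CODE.sub("", marked_up)
    return codes, text

title_codes, title_text = split_codes("[Center][Und On]SGML[Und Off][Hrt]")
head_codes, head_text = split_codes("[Font: Helvetica 10pt][Indent]Introduction[Hrt]")

print(title_codes, repr(title_text))
print(head_codes, repr(head_text))
```

Every code describes position, style, or font. None of them says that "SGML" is a title or that "Introduction" is a heading, so any structure has to be guessed from appearance.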
24. (1995)
Binary Storage Formats
Highly Proprietary and
Optimized for Performance
ÿWPC-$
ûÿ 2 B ÿÿH W HP LaserJet!
Z - #| x
cpi) Courier 12pt (10cpi) Courier 12pt (10cpi) (Bold) CG Times (WN)
(Italic) ÿÿÿÿÿÿÿÿÿÿÿÿÿÿHP LaserJet
III HPLASIII.WRS Û x -Œ
@É ‡Ï ,È ,,4Y-œJX@Ð ÐÓ USCE Óûÿ 2 Ø
ÿÿ1 O ÿÿ… € ÿÿ R ÿÿ Ÿ Courier 12pt (10cpi) Courier 12pt
(10cpi) (Bold) CG Times (WN) (Italic) CG Times (WN) (Bold Italic) Univers (WN) Univers (WN)
QX˜þþþþþþþÿÿÿÿÿÿÿÿþÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿûÿ 2 _
@
ÿÿd J @ ® ÿÿq î
‚ ÿÿÿÿ5ÿÿ…ÿÿûÿÿÿÿÿÿ@ÿÿÿÿÿÿ^;C`cc±›CCCc±CCCCccccccccccCCDZÇc±zz
…zrCY…o¦…zzcoz¦zooCCCcccccYcY7cc77Y7ccccMM7cY…YYMYcYc± ;; !cc
c Rc c c zczczczczc±……YzYzYzYzYC7C7C7C7…c•c•c•c•c•c•c•c•c;Yzc•c•c
coYczczczczc…Y …Y…c zczczccc cccccccccc Y …Yo7 oR
…c …c •c;;zM zRcM;;N; ccCccc ;cc±±cF ccc±F CC ;;;;;; ;;;
; ;; ; CFtC±nn ± ± ÅyyÑ
2 co ±7¥ c Ÿ Å Ñ ¥ ™™™
25. (1995)
Proprietary Documents
Are proprietary to the originating software
Limit or obstruct cross-platform interchange
Are non-intelligent
provide no consistent mechanism to determine
document context, content, or structure
provide no means to enhance automation
Support only one output rendering (print)
Will become obsolete
Information in an obsolete format
is itself obsolete!
26. (1995)
Portability Problems
Paper remains the format for
Document Interchange
[Figure: three printed pages, each showing a Chapter Title, a Section Title, and page number 1]
27. (1995)
Low Document Intelligence
Marginal Automated Support
for Business Processes
Lack of Document Intelligence prevents
computers from providing effective
document management or workflow support
Paper remains the working medium
[Figure: a paper document (Chapter Title, Section Title) with Review and Approval steps]
28. (1995)
Single Output Formats
Create Additional Costs
[Figure: WP Documents with Proprietary Formatting lead to Printed output; CD ROM, WWW, and Database outputs are each labeled Conversion $]
29. (1995)
Obsolescence
Information must survive when
Products become obsolete
Where are they now?
Multimate Mass-11
WPS Plus WPS-8
Display Write CPT
Lotus Manuscript Word-11
Lanier NBI Legend
Wang Xywrite
30. (1995)
Summary
Traditional computing technology and
management practices are failing to cope
with the increasing volume of documents
Non-Intelligent, Proprietary document
formatting restricts document manageability,
portability, utility, quality, affordability,
suitability for multi-format publishing, and
longevity.
Business is therefore conducted in paper!
31. (1995)
Are your information assets
frozen in Proprietary Formats?