Why SGML (Retro Alert 1995)


A presentation developed and delivered in 1995. It was designed to be part of a larger introduction to SGML. It is interesting today because it foregrounds many (if not all - and perhaps a few extra) of the themes being touched upon in discussions of Intelligent Content. It needed to be shared just in case someone thought that this was all new.

  1. 1. (1995) Why SGML? The Need for SGML. First delivered: 1995. [Diagrams: a course document structure tree (module, module title, sub-title, para, figure, list) and a knowledge / information / data hierarchy]
  2. 2. (1995) What is SGML? SGML stands for the Standard Generalized Markup Language. SGML is an international (ISO) standard: ISO 8879:1986, Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML).
  3. 3. (1995) What is SGML? Informal Definitions: SGML is a system- and processing-independent means of representing, creating, managing and exchanging information. SGML is an “intelligent markup language” that protects the accessibility, usability, life expectancy and value of information.
  4. 4. (1995) Why SGML? A Meditation on a Paper Clip The paper clip is a low-tech version of hypertext – facilitating the physical association of documents & fragments. Often used in addition to electronic files where such associations cannot be easily shown or enforced.
  5. 5. (1995) SGML was created to better manage documents: Publications Training Manuals Specifications Documentation Reports Correspondence Policies Procedures Standards Plans Directives Commentaries Proposals
  6. 6. (1995) Most Information is held in Documents. Share of information: Database Information 10%, Document Information 90%. IM Budget Allocations: Database Information 90%, Document Information 10%.
  7. 7. (1995) Structured Database Information: Relational Structure; Strict Definitions; Limited Access; Stable Organizational Boundaries; Formalized Processes; Limited Flexibility.
  8. 8. (1995) Document Information: A Document is a meaningful organization of Information. A Document is meaningful because it is communicated between people to achieve specific goals. A Document combines multiple media types together in an organized, but not strictly predictable, form that people can use.
  9. 9. (1995) Document Information Features: Wide and Hierarchical Structure; Variable Definitions; Variable Access; Variable, Multiple Organizational Boundaries; Dynamic Processes. [Figure: document page with a Chapter Title and Section Title]
  10. 10. (1995) Document Information Conclusions: Document Information does not fit within the conventional Database paradigm. Database Information is organized according to the needs of the Computer. Document Information is organized according to the needs of the User. Few of the assumptions within the Database Paradigm apply to Documents.
  11. 11. (1995) Document Management Technology Today
  12. 12. (1995) Documents and Computers: Computers help us create more paper faster. Computers help us format printed documents more efficiently and at less cost. Computers have not helped with the management consequences.
  13. 13. (1995) The Document Explosion: The volume of documents is growing exponentially. The visibility of document-based transactions is increasing. The rise of the Internet and Enterprise Integration dramatically alters the potential user community of a document. Documents are becoming more complex, larger and more varied in format.
  14. 14. (1995) Management Breakdown: Traditional Records Management practices and technologies cannot cope with the volume, complexity, or volatility of computer-generated documents. The typical response has been to extend the Database paradigm to document information. Given currently-used technology, the best that can be done is the “Electronic Filing Cabinet” (old tools made electronic - again).
  15. 15. (1995) What’s Wrong: Computers traditionally store documents as “objects”. Computers know very little (almost nothing) about these objects: some management information (author, version, date); little awareness of document content; less awareness of document structure. Computers can only associate some information with the objects, as the objects have no inherent “intelligence”.
  16. 16. (1995) New Technologies: Applications have evolved to redress some of these shortcomings. “Electronic Filing Cabinets” associate management information with document objects and physically control events. Full-Text Retrieval technologies have been used to access Document “Content”. Word Processors are used to infer the structure of documents based on format (styles and templates).
  17. 17. (1995) Electronic Filing Cabinets: In an “Electronic Filing Cabinet” environment, management information is associated with these “objects”. Document objects that leave the sphere of control are no longer managed. [Figure: document pages (Chapter Title, Section Title) and a Sphere of Control]
  18. 18. (1995) Full-Text Retrieval: Create external indices of the textual content of a document. Various text indexing algorithms are used to support searches by word, by text string, proximity, exclusion and so on. Useful but imprecise as document volume increases. New technologies arising to improve search precision (lexicon-based, links to metadata).
  19. 19. (1995) Word Processors: Evolving to include basic management information (profiles). Evolving to include template structures (document types). Management and structural information only accessible through Word Processor application (directly or via API). These new Word Processing features are not generally used.
  20. 20. (1995) Proprietary Documents: The basic problem is that traditional documents are produced and maintained in a proprietary and non-intelligent format. Electronic Documents are simply paper documents in a more reproducible form. Electronic Documents are printed for use. People retain and use hardcopy “files”. New Applications still assume a static environment and single format use.
  21. 21. (1995) Proprietary Formats: Word Processing applications offer an enhanced implementation of the typewriter, the copy editor and the typesetter. Word Processing applications: add formatting instructions to text; execute formatting instructions to produce an output (operating system and printer interface). Formatting Instructions are specific to the application that created them and the platform on which they were created.
  22. 22. (1995) Procedural Markup: Processing Instructions. [Figure: page annotated with formatting instructions: Chapter Title, 12 pt. bold Helvetica; Section Title, 10 pt. bold Helvetica; two text blocks, 8 pt. Times on 10 pt. leading; page number 1, 7 pt. Helvetica bold]
  23. 23. (1995) Proprietary Markup: Typical of Word Processors. [Center][Und On]SGML[Und Off][Hrt] [Hrt] [Font: Helvetica 10pt] [Indent]Introduction[Hrt] [Hrt] [Font: Times Roman 8pt] [Tab]Someday [Italic On]information [Italic Off] will be free.[Hrt] (Callouts in the original figure: Position, Style, Font.)
  24. 24. (1995) Binary Storage Formats: Highly Proprietary and Optimized for Performance. [Figure: raw byte dump of a word processor file beginning with a "WPC" signature; the only legible fragments are printer and font names such as HP LaserJet III, HPLASIII.WRS, Courier 12pt (10cpi), CG Times (WN) and Univers (WN)]
  25. 25. (1995) Proprietary Documents: Are proprietary to the originating software. Limit or obstruct cross-platform interchange. Are non-intelligent: provide no consistent mechanism to determine document context, content, or structure; provide no means to enhance automation. Support only one output rendering (print). Will become obsolete: Information in an obsolete format is itself obsolete!
  26. 26. (1995) Portability Problems: Paper remains the format for Document Interchange. [Figure: three printed pages, each with a Chapter Title and Section Title]
  27. 27. (1995) Low Document Intelligence: Marginal Automated Support for Business Processes. Lack of Document Intelligence prevents computers from providing effective document management or workflow support. Paper remains the working medium. [Figure: printed page (Chapter Title, Section Title) labelled Review and Approval]
  28. 28. (1995) Single Output Formats Create Additional Costs. [Diagram: WP with Proprietary Formatting feeds Printed Documents; CD ROM, WWW and Database outputs are each marked Conversion $]
  29. 29. (1995) Obsolescence: Information must survive when Products become obsolete. Where are they now? Multimate, Mass-11, WPS Plus, WPS-8, Display Write, CPT, Lotus Manuscript, Word-11, Lanier, NBI Legend, Wang, Xywrite.
  30. 30. (1995) Summary: Traditional computing technology and management practices are failing to cope with the increasing volume of documents. Non-Intelligent, Proprietary document formatting restricts document manageability, portability, utility, quality, affordability, suitability for multi-format publishing, and longevity. Business is therefore conducted on paper!
  31. 31. (1995) Are your information assets frozen in Proprietary Formats?
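
The Full-Text Retrieval slide (18) describes external indices that support searches by word, proximity and exclusion. The following is only a minimal sketch of that idea in Python, not the method of any particular 1995 product: the function names and the sample documents are hypothetical, invented for illustration, and the tokenizer (lowercased runs of letters and digits) is an assumption.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Build an external index: word -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase runs of letters and digits.
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search_word(index, word):
    """Search by word: ids of documents that contain it."""
    return set(index.get(word.lower(), {}))


def search_excluding(index, word, excluded):
    """Exclusion: documents containing `word` but not `excluded`."""
    return search_word(index, word) - search_word(index, excluded)


def search_near(index, a, b, distance):
    """Proximity: documents where `a` and `b` occur within `distance` words."""
    postings_a = index.get(a.lower(), {})
    postings_b = index.get(b.lower(), {})
    hits = set()
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(i - j) <= distance
               for i in postings_a[doc_id]
               for j in postings_b[doc_id]):
            hits.add(doc_id)
    return hits


# Hypothetical sample documents, invented for this sketch.
docs = {
    "memo": "The policy on document management was approved",
    "plan": "Management approved the plan; the document follows later",
    "spec": "This specification defines the markup for each report",
}
index = build_index(docs)

print(search_word(index, "document"))
print(search_excluding(index, "document", "policy"))
print(search_near(index, "document", "management", 1))
```

Note that the index records only words and their positions. It cannot tell whether "document" appeared in a title, a section heading or body text, which is one way to read the slide's remark that such retrieval is useful but imprecise as volume grows.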