XML in the Wilderness


Presented by Joe Gollner at Documentation and Training West, May 6-9, 2008 in Vancouver, BC

The rapid emergence of Web 2.0 has had a number of impacts. One of the less obvious is that it showcases important facts about the nature of content and offers lessons we should heed when designing content technologies. One of these lessons is the central importance of XML. Web 2.0 also holds lessons for XML itself, and these ultimately point back to the original purpose of XML.

Published in: Technology, News & Politics

  1. 1. XML in the Wilderness
     Joe Gollner, Vice President, Stilo International
  2. 2. Patron Saint of Content Management
     Saint Jerome (347 – 420 AD), Patron Saint of Libraries, Librarians, Archivists and Encyclopaedists
     Images: Saint Jerome, Caravaggio (1605); St Jerome in his Study, Antonello da Messina (1460)
  3. 3. XML in the Wilderness
       • A Little Background
       • A Brief History of Content Technologies
       • What about Content?
       • Pointing towards the Hypertext Horizon
     Image: St Jerome in the Wilderness, Albrecht Dürer (1495)
  4. 6. Markup and the Curious World of Content 1705 Inter-Textuality Reference | Reuse | Republish | Ridicule Jonathan Swift John Dunton
  5. 7. A Brief History of Content Technologies
       • Where did content technologies come from?
       • What lessons can we take from this history?
       • Does it help us see XML differently?
       • Does this shed light on how we might create and share content in the future?
  6. 8. In the Beginning
       • … were table(t)s…
       • … and books…
  7. 9. Memex: Adapting to the Exponential Growth in Knowledge Resources
     (Timeline axis: 1940, 1960, 1980, 2000)
  8. 10. Some “Provocative” Definitions
       • Data
         • Data is the meaningful representation of experience
       • Information
         • Information is the meaningful organization of data communicated in a specific context with the purpose of informing others
       • Knowledge
         • Knowledge is the meaningful organization of information, expressing an evolving understanding of a subject and establishing a basis for judgment and the potential for action.
       • Content
         • What is “contained” and “communicated”
         • Accommodates Data, Information, and Knowledge
  9. 11. The Knowledge Dynamic The persistence of content is what has allowed this dynamic to accelerate at an exponential rate
  10. 12. Knowledge Application with Technology: Leveraging Knowledge through Automation
      The modern organization cannot survive without automation as a means to encapsulate & leverage knowledge.
      (Timeline axis: 1940, 1960, 1980, 2000)
  11. 13. Augmenting Human Intelligence: Leveraging Automation to Assist Personal and Team Productivity
      Douglas Engelbart. Images: Workstation (1966), Workstation (1968)
      (Timeline axis: 1940, 1960, 1980, 2000)
  12. 14. The Internet: Connecting Organizations to form Knowledge Enterprises
      Enterprise: bold, imaginative undertaking enabled by the sharing of knowledge
      (Timeline axis: 1940, 1960, 1980, 2000)
  13. 15. The Vision of Hyper-Text: Envisioning content forms that reflect how people think and collaborate
      Theodor (Ted) Holm Nelson
      (Timeline axis: 1940, 1960, 1980, 2000)
  14. 17. Proprietary Content Formats Limiting the Interchangeability and Usefulness of all data types
  15. 18. CALS – Tackling the Interchange Problem
      Diagram panels: Problem, Interim Solution, Goal (labels: Supplier, Client, STDS)
      (Timeline axis: 1940, 1960, 1980, 2000)
  16. 19. Standard Generalized Markup Language (SGML)
      (Timeline axis: 1940, 1960, 1980, 2000)
  17. 20. SGML
       • SGML
         • Reflected human communication patterns
         • Provided substantial flexibility
         • Automated processing was “difficult”
         • Adopted in documentation-intensive sectors
           • Military, Aerospace and Commercial Publishing
       • The Key Innovation of SGML:
         • naming something (understanding) is different than describing what should be done with it (behaviour)
         • naming something is the important part
           • naming something and defining its behaviour benefits from sophistication
      Charles Goldfarb, The Father of SGML
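The distinction on the SGML slide between naming something and describing what should be done with it can be illustrated with a minimal Python sketch. The element names (`warning`, `step`, `partnumber`) and the two rule sets are hypothetical and do not come from the presentation; the point is only that the same named content can be given different behaviour without touching the document.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: the elements say what each piece of content *is*.
# All element names here are hypothetical, chosen for illustration.
SOURCE = """
<procedure>
  <warning>Disconnect power before servicing.</warning>
  <step>Remove panel <partnumber>A-113</partnumber>.</step>
</procedure>
"""

# Behaviour lives outside the document: one rule set per output channel.
PRINT_RULES = {"warning": "WARNING: {}", "step": "* {}", "partnumber": "[{}]"}
VOICE_RULES = {"warning": "Caution. {}", "step": "Next. {}", "partnumber": "part {}"}


def render(element, rules):
    """Render one element (and its children) with the given rule set."""
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, rules))
        parts.append(child.tail or "")
    text = "".join(parts).strip()
    # Elements without a rule pass their text through unchanged.
    return rules.get(element.tag, "{}").format(text)


def render_document(xml_text, rules):
    """Render each top-level child of the root element as one output line."""
    root = ET.fromstring(xml_text)
    return [render(child, rules) for child in root]


if __name__ == "__main__":
    for line in render_document(SOURCE, PRINT_RULES):
        print(line)
    for line in render_document(SOURCE, VOICE_RULES):
        print(line)
```

Adding a third output channel in this sketch means adding a third rule dictionary; the source document stays as it is.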
  18. 21. The World Wide Web: Where there’s a Will there’s a Way
      (Timeline axis: 1940, 1960, 1980, 2000)
  19. 22. World Wide Web – The Success of Simplicity
       • Original Objective (1989)
         • “to allow information sharing within internationally dispersed teams”
         • HTML: a simple use of a complex standard
       • The Key Innovation of the Web:
         • deciding what to do (intention) is different than determining how it should be done (execution)
         • deciding what to do is the important part
           • communicating an intention and successfully executing it benefits from simplicity
      Sir Tim Berners-Lee, The Father of the Web
  20. 23. Extensible Markup Language (XML)
      (Timeline axis: 1940, 1960, 1980, 2000. Image source: Microsoft)
  21. 24. The Key Innovations of XML
       • The Key Innovations of XML:
         • Fusing the innovations of SGML and the Web
           • naming something (understanding) is different than describing what should be done with it (behaviour)
           • deciding what to do (intention) is different than determining how it should be done (execution)
       • XML exhibits an unresolved tension between
         • Sophistication
           • to meet the needs of application integration
         • Simplicity
           • to meet the needs of people interacting with technology
      Yuri Rubinsky, The Spiritual Father of XML
  22. 25. XML
       • The driving focus for XML has been facilitating a revolution in the way technology applications are designed, developed and deployed
       • This addressed the failure of preceding approaches to adapt to genuinely open systems
       • This focus explains a great deal about the character of XML
  23. 26. Web 2.0 – The Social Web
      The second revolution in web adoption. Emergent consequence of integration.
      (Timeline axis: 1940, 1960, 1980, 2000, 2010)
  24. 27. Web 2.0 – All About Engagement
       • Web 2.0 has been called “The Participatory Web”
       • Key technical elements include:
         • AJAX – Asynchronous JavaScript and XML
         • simple syndication protocols – RSS / ATOM
         • simplified web services – Aggregator APIs
         • Folksonomies – collaborative tagging
         • Processable content – XHTML / CSS / Microformats
         • Addressable, traceable, dynamic, collaborative content – wiki / blog
       • Much closer to the original idea behind the ‘web’
       • The centrality of XML in making this possible is often missed
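To make the point about XML's quiet centrality concrete, the sketch below reads a tiny RSS 2.0 feed with Python's standard library. The feed content and the example.com links are invented for illustration; only the `rss` / `channel` / `item` / `title` / `link` element names follow the RSS 2.0 format.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; titles and links are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return (title, link) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, link in read_items(FEED):
        print(title, link)
```

An aggregator is, at its simplest, this loop run across many feeds: the syndication format is plain XML, which is why any consumer with an XML parser can take part.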
  25. 28. What About Content?
  26. 29. What XML has meant for Content Authors
       • Authoring in XML exhibits two contradictory challenges
         • Too much markup
           • Gets in the way of creating content
           • Forces a reliance on unfamiliar tools
           • Adds a level of technical complexity to what is a creative task
         • Not enough markup
           • Some content demands precision
           • Authors need clear guidance and useful feedback in order to satisfy this demand
           • As more content is delivered to applications, this is more common
  27. 30. What XML has meant for Information Architects
       • Information Modeling
         • Syntax stabilization (restriction)
         • Vocabulary definition constraints
           • Models mirror communication patterns less naturally
           • Sought simplicity & processability
         • New language for declaring rules
           • XML Schema (data constraints)
       • Implementation
         • Specific constraints on markup use
         • Encourages instance verbosity
         • Many complexities reintroduced
         • Application challenges remained
  28. 31. What XML has meant for Publishers Authoring with Structured Markup Multi-Format Automatic Publishing XML
  29. 32. What XML has really meant for Publishers Continuous Collaboration Persistent Multi-Channel Interaction XML
  30. 33. Content Happens
       • What is the nature of content really?
         • Is it just the physical trace of an expression?
         • Is it always new and original?
           • No – Not really
         • Or does content mix what previously existed with something new?
           • Yes – More Likely
       • Maybe content
         • is fundamentally synthetic (an aggregate or composite)
         • accumulates over time and evolves continuously through use
         • is far from static and follows a path that is not predictable
       • Maybe content is more of a process than a product?
  31. 34. Embedded Markup Considered Harmful (1997)
       • Ted Nelson
         • Has been a vocal critic of structured markup
         • Sees it as an impediment & an intrusion
       • Primary Objections to Embedded Markup
         • Complicates editing & change tracking
         • Impedes transpublishing
           • Reuse must be unimpeded
           • Reuse often introduces changes
         • Enforces unnatural & constraining structures on communication
       • What is needed would accommodate:
         • The “anarchic and overlapping relations”
         • “deep version management”
         • the “vast interconnectedness of ideas” ... Hypertext
      Theodor (Ted) Holm Nelson
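One common alternative to embedded markup is to keep the text untouched and record annotations separately as character ranges, often called standoff markup. The sketch below is an illustration of that general idea only, not a reconstruction of Nelson's own design; the `Span` class, the labels, and the sample sentence are all hypothetical. It shows two annotations that overlap, which a strictly nested element tree cannot express directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """An annotation over text[start:end]; end is exclusive."""
    start: int
    end: int
    label: str


# The content itself carries no markup at all.
TEXT = "Content is more of a process than a product."

# Annotations are stored apart from the text and are free to overlap.
SPANS = [
    Span(0, 28, "claim"),      # "Content is more of a process"
    Span(21, 43, "contrast"),  # "process than a product"
]


def extract(text, span):
    """Return the stretch of text a span refers to."""
    return text[span.start:span.end]


def labels_at(offset, spans):
    """Return the labels of every span covering a character offset."""
    return [s.label for s in spans if s.start <= offset < s.end]


if __name__ == "__main__":
    for span in SPANS:
        print(span.label, "->", extract(TEXT, span))
    print(labels_at(24, SPANS))
```

The trade-off in this approach is that offsets must be kept in sync whenever the text changes, which is one reason version management features so prominently in the objections listed above.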
  32. 35. Something on the Hypertext Horizon
      Darwin Information Typing Architecture (DITA)
      Emerging out of the relatively mundane world of software and hardware documentation. An assemblage of “SGML Dirty Tricks”…
      Diagram labels: Sources, Repositories, Topics, Maps, Products, Online Access, Wireless Access, Print Manuals, PDF, Customers, Call Centre Staff
  33. 36. The Tao of DITA: Handling Variability & Change
      Introduces and continues to evolve a framework for handling content and its challenges more gracefully. Application layers are given a chance.
  34. 37. DITA enables an interesting mix of practices
       • Promotes simplified markup for most content
       • Allows specialization to be introduced
         • When more detailed markup guidelines help authors
         • When precise markup is essential for downstream applications
       • Is introducing more sophisticated reuse behaviour
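DITA specialization rests on each element carrying a `class` attribute that lists its ancestry from the most general type to the most specific, so a processor that does not know a specialized element can fall back to the base type. The sketch below illustrates that fallback in Python under some loud assumptions: the `class` values are written out in the sample (real DITA documents normally get them as DTD or schema defaults), and the `HANDLERS` table and handler names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small task topic. The class attributes are spelled out here because
# ElementTree does not apply DTD defaults; that is an assumption of this sketch.
TOPIC = """
<task class="- topic/topic task/task ">
  <title class="- topic/title ">Replace the filter</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step ">
        <cmd class="- topic/ph task/cmd ">Open the cover.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
"""

# Hypothetical processing rules, keyed by module/element tokens.
# Only task/cmd has a task-specific rule; the rest are base topic rules.
HANDLERS = {
    "topic/title": "heading",
    "topic/ol": "numbered-list",
    "topic/li": "list-item",
    "task/cmd": "command",
}


def ancestry(element):
    """Return the module/element tokens from most general to most specific."""
    return [tok for tok in element.get("class", "").split() if "/" in tok]


def resolve(element, handlers):
    """Pick the most specific handler available, falling back to base types."""
    for token in reversed(ancestry(element)):
        if token in handlers:
            return handlers[token]
    return None


if __name__ == "__main__":
    root = ET.fromstring(TOPIC)
    for el in root.iter():
        print(el.tag, "->", resolve(el, HANDLERS))
```

In this sketch `step` has no rule of its own and is processed as a generic list item, while `cmd` gets its more precise rule. That mirrors the mix described on the slide: simple handling for most content, with precision added only where it helps.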
  35. 38. The Emergence of Content Technologies
       • The initial focus of XML has not been on content
       • DITA represents a serious effort to direct attention towards the challenges of content
       • The appearance of Web 2.0 is a sign that the infrastructure is maturing in its content handling
         • Simplified interfaces
         • Dynamic version management
         • Instant global interaction
       • Hypertext is becoming possible
  36. 39. XML Returns from the Wilderness
       • Saint Jerome
         • Headed into isolation in the Syrian Desert
         • Learned Hebrew
         • Was able to create a new Latin translation of the bible (Vulgate)
         • Established the standard reference
       • XML
         • The fruits of success in application integration are being seen (Web 2.0)
         • DITA shows promise
           • Addressing key content challenges
           • Leveraging more of the SGML legacy
           • Creating industry momentum
      Image: St Jerome in his Study, Albrecht Dürer (1492)
  37. 40. Conclusion
      We can start to handle & leverage content in its true hypertext form, for the first time.
      Joe Gollner, VP e-Publishing Solutions, Stilo International