XML 101 presentation by Bill Kasdorf of Apex

Presentation by Bill Kasdorf of Apex, original source is here: http://www.apexcovantage.com/KnowledgeCorner/XML%20101%20(Rev%201.0).pdf


  1. XML 101: It’s Not Just Markup Anymore. Bill Kasdorf, Vice President, Apex Publishing, LLC; General Editor, The Columbia Guide to Digital Publishing
  2. XML: Extensible Markup Language. Extensible: designed to adapt to various
     • Kinds of documents
     • Modes of publication
     • Styles of presentation
     • Patterns of access and use
  3. XML: Extensible Markup Language. Markup: tagging a document to provide
     • Semantic information
     • Structural information
     • Formatting information
     • Supplemental information
  4. XML: Extensible Markup Language. Language: a formal way to express markup. Not a set of tags or a vocabulary, but an agreed-upon way to express a given vocabulary or tag set.
  5. Here’s some text with no markup.
     1
     Chapter Title
     Chapter Author
     Author Identification
     Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.
     Level One Subhead
     Here’s some more text. This author’s a pretty nice girl, but she doesn’t have a lot to say.
     Level Two Subhead
     The end.
  6. A typical MS implies markup . . .
     1
     Chapter Title
     Chapter Author
     Author Identification
     Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.
     Level One Subhead
     Here’s some more text. This author’s a pretty nice girl, but she doesn’t have a lot to say.
     Level Two Subhead
     The end.
  7. “Editorial” markup . . .
     CN  1
     CT  Chapter Title
     CA  Chapter Author
         Author Identification
     NI  Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.
     H1  Level One Subhead
     NI  Here’s some more text. This author’s a pretty nice girl, but she doesn’t have a lot to say.
     H2  Level Two Subhead
     IT  The end.
  8. “Well Formed” (but not very good) XML. The slide sets the editorially coded manuscript (CN, CT, CA, NI, H1, NI, H2, IT) beside the same text with each code turned into a pair of tags:
     <CN>1</CN>
     <CT>Chapter Title</CT>
     <CA>Chapter Author <ITAL>Author Identification</ITAL></CA>
     <NI>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</NI>
     <H1>Level One Subhead</H1>
     <NI>Here’s some more text. This author’s <ITAL>a pretty nice girl</ITAL>, but she doesn’t have a lot to say.</NI>
     <H2>Level Two Subhead</H2>
     <IT>The end.</IT>
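As a minimal sketch of what "well formed" requires: every start tag has a matching end tag (so the Level One Subhead must close with `</H1>`), elements nest without overlapping, and the whole document sits inside a single root element. The slide shows only a fragment, so the `CHAPTER` wrapper below is a hypothetical addition, not part of the original tag set.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- CHAPTER is a hypothetical root element added for illustration;
     a well-formed XML document needs exactly one root. -->
<CHAPTER>
  <CN>1</CN>
  <CT>Chapter Title</CT>
  <!-- ITAL opens and closes inside CA: properly nested -->
  <CA>Chapter Author <ITAL>Author Identification</ITAL></CA>
  <NI>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</NI>
  <H1>Level One Subhead</H1>
  <NI>Here’s some more text. This author’s <ITAL>a pretty nice girl</ITAL>, but she doesn’t have a lot to say.</NI>
  <H2>Level Two Subhead</H2>
  <IT>The end.</IT>
</CHAPTER>
```

Well-formedness says nothing about whether the tags are good ones. That is the "not very good" part: the markup is flat, and `ITAL` records only appearance.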
  9. Structural markup
     <CN>1</CN>
     <CT>Chapter Title</CT>
     <CA>Chapter Author <ITAL>Author Identification</ITAL></CA>
     <NI>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</NI>
     <H1>Level One Subhead</H1>
     <NI>Here’s some more text. This author’s <ITAL>a pretty nice girl</ITAL>, but she doesn’t have a lot to say.</NI>
     <H2>Level Two Subhead</H2>
     <IT>The end.</IT>
  10. Presentational markup
      <CN>1</CN>
      <CT>Chapter Title</CT>
      <CA>Chapter Author <ITAL>Author Identification</ITAL></CA>
      <NI>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</NI>
      <H1>Level One Subhead</H1>
      <NI>Here’s some more text. This author’s <ITAL>a pretty nice girl</ITAL>, but she doesn’t have a lot to say.</NI>
      <H2>Level Two Subhead</H2>
      <IT>The end.</IT>
  11. Semantic markup
      <CN>1</CN>
      <CT>Chapter Title</CT>
      <CA>Chapter Author <ITAL>Author Identification</ITAL></CA>
      <NI>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</NI>
      <H1>Level One Subhead</H1>
      <NI>Here’s some more text. This author’s <ITAL>a pretty nice girl</ITAL>, but she doesn’t have a lot to say.</NI>
      <H2>Level Two Subhead</H2>
      <IT>The end.</IT>
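The transcript does not record which tags each of these three slides highlighted, so the following is only a sketch of the distinction, using hypothetical element names (`chapter`, `author`, `affiliation`, `section`, `heading`, `para`) rather than the tag set on the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- All element names except ITAL are hypothetical, chosen for illustration. -->
<chapter>
  <!-- Semantic: says what the content IS -->
  <author>Chapter Author</author>
  <affiliation>Author Identification</affiliation>

  <!-- Structural: says where a unit starts and ends, and what contains what -->
  <section>
    <heading>Level One Subhead</heading>
    <!-- Presentational: ITAL says only how the text should LOOK -->
    <para>This author’s <ITAL>a pretty nice girl</ITAL>, but she doesn’t have a lot to say.</para>
  </section>
</chapter>
```

The same italic type could mean emphasis, a title, or an affiliation. Presentational tags cannot tell those apart, whereas semantic tags can later be given any formatting.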
  12. XML: Extensible Markup Language
      • Tells what each element is
        - E.g., “chapter title,” not “18' Bulmer caps”
      • Tags where each element starts & ends
        - Unambiguous; must be properly nested
      • Defines attributes of each element
        - E.g., brand name vs. generic drug names
      • Defines relationship between elements
        - E.g., “H1 subhead must precede H2 subhead”
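The four points on this slide can be sketched in one small document with an internal DTD. Everything here is an assumption for illustration: the element names (`chapter`, `sec1`, `sec2`, `drug`), the `type` attribute, and the sample sentence are invented, not taken from any published DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
  <!-- Relationship between elements: a chapter is a title, then
       paragraphs, then level-one sections. -->
  <!ELEMENT chapter (ct, p*, sec1*)>
  <!-- An H2-level section can occur only inside an H1-level section,
       so an H1 subhead always precedes any H2 subhead. -->
  <!ELEMENT sec1 (h1, p*, sec2*)>
  <!ELEMENT sec2 (h2, p*)>
  <!-- What each element is: a chapter title, not a type specification -->
  <!ELEMENT ct (#PCDATA)>
  <!ELEMENT h1 (#PCDATA)>
  <!ELEMENT h2 (#PCDATA)>
  <!ELEMENT p (#PCDATA | drug)*>
  <!ELEMENT drug (#PCDATA)>
  <!-- Attribute of an element: brand name vs. generic drug name -->
  <!ATTLIST drug type (brand | generic) #REQUIRED>
]>
<chapter>
  <ct>Chapter Title</ct>
  <p>Here’s some text at the beginning of this chapter.</p>
  <sec1>
    <h1>Level One Subhead</h1>
    <p>She takes <drug type="generic">ibuprofen</drug>, not <drug type="brand">Advil</drug>.</p>
    <sec2>
      <h2>Level Two Subhead</h2>
      <p>The end.</p>
    </sec2>
  </sec1>
</chapter>
```

A validating parser would reject this document if a `sec2` appeared directly under `chapter`, or if a `drug` element lacked its `type` attribute.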
  13. XML: Not Just for Markup Anymore. Metadata: making content discoverable
      • Sales, marketing, and rights information
      • ONIX metadata for booksellers
      • CrossRef for journals
      • DOI (for both books and journals)
      • Subject classifications, metadata
      Beginning to be integrated into the editorial/production workflow.
  14. XML: Not Just for Markup Anymore. Metadata: making your content work
      • Links are a given (internal and external) to references, tables, figures, etc.
      • Keywords, taxonomies/controlled vocabularies
      • Metadata to track revision history, administrative information, editorial and production notes, etc.
      Semantic markup and rich metadata are essential for optimizing use of your content.
  15. Example of a header for a scholarly book
      <cup_document>
        <cup_header>
          <cupid>segel-_811404</cupid>
          <author>Segel, Harold B</author>
          <author_last_name>Segel</author_last_name>
          <title>The Columbia Guide to the Literatures of Eastern Europe Since 1945</title>
          <subtitle></subtitle>
          <edition></edition>
          <isbn>978-0-231-11404-2</isbn>
          <pub_date>4/15/2003</pub_date>
          <cup_subject_1>European</cup_subject_1>
          <cup_subject_2></cup_subject_2>
          <cup_subject_code_1>233</cup_subject_code_1>
          <cup_subject_code_2></cup_subject_code_2>
          <bisac_subject_1>LITERARY CRITICISM / European / General</bisac_subject_1>
          <bisac_subject_2></bisac_subject_2>
          <bisac_code_1>LIT004130</bisac_code_1>
          <bisac_code_2>LIT004130</bisac_code_2>
          <bisac_codes>LIT004130; LIT004130</bisac_codes>
          <price_dollars>105</price_dollars>
          <page_count>512</page_count>
          <market>World</market>
          <season>Spring 2008</season>
          <press>CUP</press>
          <publisher>Columbia University Press</publisher>
          <figures></figures>
          <rights>All rights in all media (now or hereafter known): CUP</rights>
          <copy_text>For nearly half a century, the Iron Curtain obscured from Western eyes a vital group of national and regional writers. Seen as a whole, the literatures of Eastern Europe during the second half of the twentieth century are extraordinarily rich, and in recent years many Eastern European novelists, poets, and playwrights have attracted wider attention and broader publication in the West.
          And yet no reference work, embracing all the countries of this region, including the former East Germany, has brought synoptic analysis to bear on these literatures—until now.</copy_text>
          <table_of_contents>Preface Chronology of Major Political Events, 1944-2001 Journals, Newspapers, and Other Periodical Literature Note on Orthography, Transliteration, and Titles Introduction: The Literature of Eastern Europe from 1945 to the Present Authors A-Z Selected Bibliography Author Index</table_of_contents>
          <cup_css_link>http://www.columbia.edu/cu/cup/cup_dtd/cup_css_1-1.css</cup_css_link>
        </cup_header>
  16. DTD: Document Type Definition. Document: SGML and XML originated from the need to mark up text in documents, which is not typically structured like information in databases, which have schemas . . .
  17. DTD: Document Type Definition. Type: for particular types of documents, which share characteristics like
      • Similar structure
      • Related semantics
      • Common metadata . . .
  18. DTD: Document Type Definition. Definition: formal, explicit, rigorous . . . So complete and precise even a computer can understand it . . .
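To make "formal, explicit, rigorous" concrete, here is a sketch of what a DTD for a few fields of the scholarly-book header shown earlier might look like. It is not Columbia University Press's actual DTD. The content model and the choice of which fields are required or optional are assumptions for illustration. Only the element names and sample values come from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cup_header [
  <!-- Assumed content model: elements must appear in this order.
       "+" means one or more, "?" means optional (zero or one). -->
  <!ELEMENT cup_header (cupid, author+, author_last_name, title,
                        subtitle?, isbn, pub_date)>
  <!ELEMENT cupid            (#PCDATA)>
  <!ELEMENT author           (#PCDATA)>
  <!ELEMENT author_last_name (#PCDATA)>
  <!ELEMENT title            (#PCDATA)>
  <!ELEMENT subtitle         (#PCDATA)>
  <!ELEMENT isbn             (#PCDATA)>
  <!ELEMENT pub_date         (#PCDATA)>
]>
<cup_header>
  <cupid>segel-_811404</cupid>
  <author>Segel, Harold B</author>
  <author_last_name>Segel</author_last_name>
  <title>The Columbia Guide to the Literatures of Eastern Europe Since 1945</title>
  <isbn>978-0-231-11404-2</isbn>
  <pub_date>4/15/2003</pub_date>
</cup_header>
```

Because the rules are explicit, a validating parser can check them mechanically: a header with no `isbn`, or with `title` before `author`, would be reported as invalid. A DTD cannot check that the ISBN or date is correctly formatted, since `#PCDATA` accepts any text.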
  19. XML: Extensible Markup Language. DTDs are created for:
      • Archiving
      • Interchange
      • Print production
      • Online publishing or e-books
      • Multipurposing (“Slicing & Dicing”)
      One DTD rarely does it all; DTDs need to be customized or adapted to their intended purpose.
  20. XML: Extensible Markup Language. Why start with a “standard” DTD?
      • Saves “reinventing the wheel”
      • Benefit from broad base of experience, evolution
      • Using a known model expedites interchange
      • Vendors are already familiar with it
      • Some tools are optimized for certain standards
      • A standard can be mandated in a given industry
  21. XML: Extensible Markup Language. Why customize a “standard” DTD?
      • Some are too simplistic or generic
      • Many are much more complex than you need
      • Needs and capabilities change over time:
        - Requirements of customers, vendors, partners
        - Capabilities of software, tools, and staff
      Typically, you start with a subset of a standard and then add features you need for your situation.
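One common way to "start with a subset and then add features" is through parameter entities, which let a customization layer extend a content model without editing the base declarations. The sketch below is a hypothetical external DTD file. The file name and all element and entity names (`local.inline`, `inline`, `drug`, `ital`) are invented for illustration and do not come from any particular standard.

```xml
<!-- custom-chapter.dtd: a hypothetical customization layer (all names invented) -->

<!-- 1. Local additions. They are declared first because, when a parameter
        entity is declared more than once, the first declaration is binding. -->
<!ENTITY % local.inline "| drug">
<!ELEMENT drug (#PCDATA)>
<!ATTLIST drug type (brand | generic) #REQUIRED>

<!-- 2. The subset of the "standard" model that is kept. In practice this
        part would live in the standard's own files and be pulled in by
        reference rather than copied here. -->
<!ENTITY % local.inline "">
<!ENTITY % inline "ital %local.inline;">
<!ELEMENT chapter (title, p+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT p (#PCDATA | %inline;)*>
<!ELEMENT ital (#PCDATA)>
```

With the local declaration in place, `p` allows `ital` and `drug`. Remove part 1 and the same base model allows only `ital`. This has to be an external DTD: parameter-entity references inside declarations are not permitted in a document's internal subset.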
  22. XML: Extensible Markup Language. Need to transform to “output” XML
      • XHTML for online publication
      • OEB PS (soon OPS/OPF) for e-books
      • DTBook for accessibility
      • Models required by business partners, e.g., ACLS History E-Book XML
      Your XML must be able to produce these outputs, but they’re rarely sufficient for production/archiving.
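Transformations like these are commonly written in XSLT, which is itself XML. The stylesheet below is a minimal sketch of an XSLT 1.0 transform from the simple chapter tag set used earlier (`CN`, `CT`, `CA`, `NI`, `H1`, `H2`, `IT`, `ITAL`) to XHTML. It assumes the chapter is wrapped in a hypothetical `CHAPTER` root element. The mapping choices and the `class` names are illustrative assumptions, not something the presentation prescribes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- Drop the whitespace-only text between the chapter's child elements -->
  <xsl:strip-space elements="CHAPTER"/>

  <!-- CHAPTER is a hypothetical root element wrapping the sample tags -->
  <xsl:template match="/CHAPTER">
    <html>
      <head><title><xsl:value-of select="CT"/></title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- Chapter number, title, and author line -->
  <xsl:template match="CN"><p class="chapter-number"><xsl:apply-templates/></p></xsl:template>
  <xsl:template match="CT"><h1><xsl:apply-templates/></h1></xsl:template>
  <xsl:template match="CA"><p class="author"><xsl:apply-templates/></p></xsl:template>

  <!-- Subheads sit one level below the chapter title -->
  <xsl:template match="H1"><h2><xsl:apply-templates/></h2></xsl:template>
  <xsl:template match="H2"><h3><xsl:apply-templates/></h3></xsl:template>

  <!-- Both paragraph codes become plain paragraphs -->
  <xsl:template match="NI | IT"><p><xsl:apply-templates/></p></xsl:template>

  <!-- Presentational italics map to XHTML's presentational <i> -->
  <xsl:template match="ITAL"><i><xsl:apply-templates/></i></xsl:template>

</xsl:stylesheet>
```

The transform loses information: `NI` and `IT` both collapse to `p`. That illustrates the slide's point that output formats are rarely sufficient as the production or archival source.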
  23. XML: Extensible Markup Language. DTDs must accommodate:
      • Needs of customers and partners
      • Transformation to other needed models
      • Capabilities of:
        - Staff
        - Vendors
        - Available tools and technologies
      DTDs constantly evolve as these factors change.
  24. DTDs can be strict . . .
  25. ISO 12083: The Mother Superior of DTDs . . .
  26. The ISO 12083 DTD
      • Brilliant, idealistic, based on theory
      • Creation of one individual, Eric van Herwijnen
      • Created before the Web, before XML
      Most big STM journal DTDs are still 12083-based.
  27. . . . or permissive . . .
  28. TEI: The “Let One Thousand Flowers Bloom” DTD . . .
  29. TEI: The Text Encoding Initiative
      • Rich, expansive, accommodating
      • Collaborative creation: TEI Consortium
      • Created for scholarship, not publication
      • Enormously useful resource, but:
        - Full suite is overwhelming, not “a DTD”
        - TEI Lite is too simplistic
      Most humanities scholarship is TEI-based.
  30. Our text as TEI . . . (Callouts: “The header goes on for two pages...” and “...preserving all sorts of useful information.”)
      <?xml version="1.0" encoding="us-ascii"?>
      <!DOCTYPE TEI SYSTEM "tei_all.dtd">
      <!-- TEIp5tei-p5-exemplars-0.7xmldtdtei_all.dtd 2007-05-26 Release -->
      <TEI>
        <teiHeader type="text">
          <fileDesc>
            <titleStmt>
              <title type="main">Chapter Title</title>
              <author>Chapter Author</author>
            </titleStmt>
            <editionStmt>
              <edition>
                <date value="2007">2007</date>
              </edition>
            </editionStmt>
            <publicationStmt>
              <distributor>
                <address>
                  <addrLine>
                    <name key="Pub" type="organisation">Publisher</name>
                  </addrLine>
                  <addrLine>Address</addrLine>
                  <addrLine>
                    <name type="place">Place</name>
                  </addrLine>
                  <addrLine>Email</addrLine>
                </address>
              </distributor>
              <idno type="demo">DEMO_Ch1</idno>
              <availability status="free">
                <p>Public domain</p>
              </availability>
  31. (TEI header, continued)
              <publisher>Publisher</publisher>
              <pubPlace>Place</pubPlace>
              <date value="2007-06-09">2007-06-09</date>
            </publicationStmt>
            <notesStmt>
              <note>Prototype TEI header</note>
            </notesStmt>
            <sourceDesc>
              <p>Chapter in search of a book.</p>
              <p>
                <bibl>Example TEI Chapter 1. Chapter Author.</bibl>
              </p>
            </sourceDesc>
          </fileDesc>
          <encodingDesc>
            <editorialDecl>
              <p>Minimal TEI encoding. Chapter in search of a book.</p>
            </editorialDecl>
            <refsDecl>
              <p>No refs; no IDs assigned.</p>
            </refsDecl>
          </encodingDesc>
          <profileDesc>
            <langUsage>
              <language ident="en" usage="100">English.</language>
            </langUsage>
          </profileDesc>
          <revisionDesc>
            <change>
              <list>
                <item><date value="2007-06-09">June 9, 2007</date> Created initial version.</item>
              </list>
  32. (TEI text, continued. Callouts: “Note separation of semantics & formatting” and “Note nested structure.”)
            </change>
          </revisionDesc>
        </teiHeader>
        <text>
          <body>
            <div type="Chapter" n="1">
              <head>Chapter Title</head>
              <opener>
                <byline><docAuthor>Chapter Author</docAuthor>
                  <name type="affiliation" rend="italics">Author Identification</name></byline>
              </opener>
              <p>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</p>
              <div type="level-1">
                <head>Level One Subhead</head>
                <p>Here’s some more text. This author’s <hi rend="italics">a pretty nice girl</hi>, but she doesn’t have a lot to say. </p>
                <div type="level-2">
                  <head>Level Two Subhead</head>
                  <p>The end.</p>
                </div> <!-- end of level-2 div -->
              </div><!-- end of level-1 div -->
            </div><!-- end of chapter div -->
          </body>
        </text>
      </TEI>
  33. . . . or utilitarian . . .
  34. DocBook: The “Crank It Out” DTD . . .
  35. DocBook
      • Common general-purpose book model
      • Widely used for technical documents, manuals
      • Not as often used for scholarly, trade, reference, or textbooks
      • Vendors and technical writers are often most familiar with DocBook
      DocBook is often used in structured environments.
  36. Our text as DocBook. (Callouts: “Author’s name & affil. generated from metadata,” “Context-sensitive formatting,” and “Preserving a record of previous markup.”)
      <?xml version="1.0" encoding="us-ascii"?>
      <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
      <chapter label="1">
        <chapterinfo>
          <authorgroup>
            <author>
              <firstname>Chapter</firstname>
              <surname>Author</surname>
              <affiliation>
                <shortaffil remap="ITAL">Author Identification</shortaffil>
                <jobtitle></jobtitle><orgname></orgname>
              </affiliation>
            </author>
          </authorgroup>
        </chapterinfo>
        <title>Chapter Title</title>
        <para>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</para>
        <sect1>
          <title>Level One Subhead</title>
          <para>Here’s some more text. This author’s <emphasis remap="ITAL" role="italics">a pretty nice girl</emphasis>, but she doesn’t have a lot to say.</para>
          <sect2>
            <title>Level Two Subhead</title>
            <para>The end.</para>
          </sect2>
        </sect1>
      </chapter>
  37. . . . or strike a useful balance . . .
  38. NLM: The “Works and Plays Well Together” DTD . . .
  39. The NLM Family of DTDs
      • Based on a comprehensive study of STM journals
      • Designed to be customized, adapted
      • Has a permissive “Archival & Interchange” model and stricter “Publishing” and “Authoring” models
      • Very widely adopted and well maintained
      • NLM Book DTD is just a re-tooled journal DTD
      NLM is the “no-brainer” basis for journal DTDs today.
  40. Our text as NLM Publishing XML. (Callout: “This model is for journals!”)
      <?xml version="1.0" encoding="us-ascii"?>
      <!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
      <article>
        <front>
          <journal-meta>
            <journal-id journal-id-type="publisher">Publisher</journal-id>
            <issn>0001-0001</issn>
            <publisher>
              <publisher-name>Publisher</publisher-name>
            </publisher>
          </journal-meta>
          <article-meta>
            <article-id pub-id-type="other">1</article-id>
            <title-group>
              <article-title>Chapter Title</article-title>
            </title-group>
            <contrib-group>
              <contrib contrib-type="author">
                <name><surname>Author</surname>
                  <given-names>Chapter</given-names></name>
                <aff>Author Identification</aff>
              </contrib>
            </contrib-group>
            <pub-date pub-type="pub">
              <day>9</day><month>June</month><year>2007</year>
            </pub-date>
            <history>
              <date date-type="received">
                <day>9</day>
  41. (NLM Publishing XML, continued)
                <month>6</month>
                <year>2007</year>
              </date>
              <date date-type="rev-request">
                <day>9</day>
                <month>6</month>
                <year>2007</year>
              </date>
            </history>
            <abstract></abstract>
          </article-meta>
        </front>
        <body>
          <p>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</p>
          <sec id="bid_002">
            <title>Level One Subhead</title>
            <p>Here’s some more text. This author’s <italic>a pretty nice girl</italic>, but she doesn’t have a lot to say.</p>
            <sec id="bid_003">
              <title>Level Two Subhead</title>
              <p>The end.</p>
            </sec>
          </sec>
        </body>
      </article>
  42. Our text as NLM Book XML. (Callout: “CN, CT, AU, & AFF are ONLY in the metadata.”)
      <?xml version="1.0" encoding="us-ascii"?>
      <!DOCTYPE book-part PUBLIC "-//NLM//DTD Book DTD v2.3 20070202//EN" "book.dtd">
      <book-part id="bid_001" book-part-type="chapter" book-part-number="1">
        <book-part-meta>
          <title-group>
            <title>Chapter Title</title>
          </title-group>
          <contrib-group>
            <contrib contrib-type="author">
              <name><surname>Author</surname>
                <given-names>Chapter</given-names></name>
              <aff>Author Identification</aff>
            </contrib>
          </contrib-group>
          <history>
            <date date-type="created">
              <day>9</day>
              <month>6</month>
              <year>2007</year>
            </date>
            <date date-type="updated">
              <day>9</day>
              <month>6</month>
              <year>2007</year>
            </date>
          </history>
          <abstract></abstract>
        </book-part-meta>
  43. (NLM Book XML, continued. Callouts: “Note that <sec>s are recursive (‘nested’)” and “‘bid’ attributes uniquely identify a chunk of content; not required, but usual & useful.”)
        <body>
          <p>Here’s some text at the beginning of this chapter. Let’s make one more line’s worth.</p>
          <sec id="bid_002">
            <title>Level One Subhead</title>
            <p>Here’s some more text. This author’s <italic>a pretty nice girl</italic>, but she doesn’t have a lot to say.</p>
            <sec id="bid_003">
              <title>Level Two Subhead</title>
              <p>The end.</p>
            </sec>
          </sec>
        </body>
      </book-part>
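The two callouts on this slide (recursive sections, and optional but unique identifiers) correspond to two small DTD constructs. The declarations below are a deliberately simplified sketch, not the actual NLM content models, which allow far more: a `sec` that may contain further `sec` elements, and an `id` attribute of type `ID` declared `#IMPLIED` (optional).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified, hypothetical declarations; the real NLM DTDs are much richer. -->
<!DOCTYPE body [
  <!ELEMENT body   (p*, sec*)>
  <!-- Recursive: a sec may contain more secs, to any depth -->
  <!ELEMENT sec    (title, p*, sec*)>
  <!-- ID type: values must be unique in the document; #IMPLIED: optional -->
  <!ATTLIST sec id ID #IMPLIED>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT p      (#PCDATA | italic)*>
  <!ELEMENT italic (#PCDATA)>
]>
<body>
  <p>Here’s some text at the beginning of this chapter.</p>
  <sec id="bid_002">
    <title>Level One Subhead</title>
    <p>Here’s some more text.</p>
    <sec id="bid_003">
      <title>Level Two Subhead</title>
      <p>The end.</p>
    </sec>
  </sec>
</body>
```

With recursion, a subhead's level is implied by nesting depth rather than by separate `H1` and `H2` tags. A validating parser would flag two sections that share the same `id` value.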
  44. XML: Extensible Markup Language. Is XML making publishing complicated?
  45. XML: Extensible Markup Language. Is XML making publishing complicated? No, publishing’s inherently complicated.
  46. XML: Extensible Markup Language. Is XML making publishing complicated? No, publishing’s inherently complicated. XML is helping to make it easier!
  47. Thanks! Bill Kasdorf, Vice President, Apex Publishing. bkasdorf@apexcovantage.com, +1 734 904 6252
