Intro to XML in libraries

Explanation of XML, how it is processed, and common examples of its application in libraries

Published in: Education

  1. 1. Intro to XML in libraries
     Kyle Banerjee
     banerjek@ohsu.edu
  2. 2. Why do libraries use XML?
     • Easy to share information
     • Strict syntax and human readability make it easy to work with
     • Create any structure you need
     • Many tools for all operating systems
     • Schema support
     • Namespace support
  3. 3. Disadvantages
     • Requires an external application
     • Verbose
     • Inefficient
     • Picky – everything stops when data is not well formed
     • No intrinsic data types
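
The last two disadvantages can be seen in a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample documents and the `try_parse` helper are hypothetical, made up for illustration; they are not from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second document is missing its </title> tag.
well_formed = "<book><title>My Dog</title><copies>3</copies></book>"
not_well_formed = "<book><title>My Dog<copies>3</copies></book>"


def try_parse(xml_text):
    """Return the parsed root element, or None if the text is not well formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser rejects the whole document; no partial tree is returned.
        print(f"Parse failed: {err}")
        return None


root = try_parse(well_formed)
broken = try_parse(not_well_formed)

# XML has no intrinsic data types: element content always arrives as text,
# so converting it (here to int) is the application's job.
copies_text = root.find("copies").text
copies = int(copies_text)
```

A single missing end tag is enough to make the second parse fail outright, which is the "everything stops" behavior the slide describes.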
  4. 4. Encoded Archival Description (EAD)
  5. 5. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  6. 6. NISO Circulation Interchange Protocol (NCIP)
     <!DOCTYPE NCIPMessage PUBLIC "-//NISO//NCIP DTD Version 1.0//EN" "http://www.niso.org/ncip/v1_0/imp1/dtd/ncip_v1_0.dtd">
     <NCIPMessage version="http://www.niso.org/ncip/v1_0/imp1/dtd/ncip_v1_0.dtd">
       <LookupUserResponse>
         <ResponseHeader>
           <FromAgencyId>
             <UniqueAgencyId>
               <Scheme>http://136.181.125.166:6601/IRCIRCD?target=get_scheme_values&amp;scheme=UniqueAgencyId</Scheme>
               <Value>zv229</Value>
             </UniqueAgencyId>
           </FromAgencyId>
           <ToAgencyId>
             <UniqueAgencyId>
               <Scheme>http://136.181.125.166:6601/IRCIRCD?target=get_scheme_values&amp;scheme=UniqueAgencyId</Scheme>
               <Value>melir</Value>
             </UniqueAgencyId>
           </ToAgencyId>
         </ResponseHeader>
         … [rest of entry deleted]
  7. 7. MARCXML
     <record xmlns="http://www.loc.gov/MARC21/slim">
       <leader>00000cas a2200000 4500</leader>
       <controlfield tag="001">1798471</controlfield>
       <controlfield tag="008">750909d19722001sw qx p ob 0 a0eng</controlfield>
       <datafield ind1=" " ind2=" " tag="010">
         <subfield code="a">75640778</subfield>
       </datafield>
       <datafield ind1=" " ind2=" " tag="022">
         <subfield code="a">0105-0397</subfield>
         <subfield code="l">0105-0397</subfield>
         <subfield code="2">1</subfield>
       </datafield>
       … [rest of record deleted]
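
As a rough illustration of reading such a record, the sketch below pulls a control number and an ISSN out of an abridged copy of the fragment above using Python's standard `xml.etree.ElementTree`. The closing `</record>` tag was added so the fragment parses on its own, and the `marc` prefix and `get_subfields` helper are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Abridged fragment from the slide, closed with </record> so it parses alone.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000cas a2200000 4500</leader>
  <controlfield tag="001">1798471</controlfield>
  <datafield ind1=" " ind2=" " tag="010">
    <subfield code="a">75640778</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="022">
    <subfield code="a">0105-0397</subfield>
    <subfield code="l">0105-0397</subfield>
    <subfield code="2">1</subfield>
  </datafield>
</record>"""

# "marc" is an arbitrary local prefix; only the namespace URI has to match.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, NS)]


record = ET.fromstring(MARCXML)
control_number = record.find("marc:controlfield[@tag='001']", NS).text
issns = get_subfields(record, "022", "a")
lccns = get_subfields(record, "010", "a")
```

Because the record declares a default namespace, a plain search for `datafield` would find nothing; the lookup has to be namespace-qualified.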
  8. 8. Dublin Core (DC)
     <qdc:qualifieddc xmlns:qdc="http://epubs.cclrc.ac.uk/xmlns/qdc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://epubs.cclrc.ac.uk/xmlns/qdc/ http://epubs.cclrc.ac.uk/xsd/qdc.xsd">
       <dc:creator>Huntington, C. L.</dc:creator>
       <dc:title>Horseshoe Bend near Wolf Creek, Southern Pacific Railroad, Shasta Route</dc:title>
       <dc:date>1908-00-00</dc:date>
       <dc:date>1900-1909</dc:date>
       <dc:subject>Railroad tracks; Forests; Railroad locomotives</dc:subject>
       <dc:coverage>Josephine County (Ore.)</dc:coverage>
       <dc:type>Image</dc:type>
       <dc:source>Postcards</dc:source>
       <dc:source>Gerald W. Williams Collection</dc:source>
       <dc:title>Umpqua Album</dc:title>
       <dcterms:isPartOf>WilliamsG:Horseshoe Bend</dcterms:isPartOf>
       … [rest of record deleted]
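
Dublin Core elements may repeat (two dates and two titles above), so code that reads a record should expect lists rather than single values. A minimal sketch under that assumption, using an abridged and closed copy of the record; the `collect_dc` helper is a hypothetical name for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged copy of the slide's record, closed so that it parses on its own.
QDC = """<qdc:qualifieddc xmlns:qdc="http://epubs.cclrc.ac.uk/xmlns/qdc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:creator>Huntington, C. L.</dc:creator>
  <dc:title>Horseshoe Bend near Wolf Creek, Southern Pacific Railroad, Shasta Route</dc:title>
  <dc:date>1908-00-00</dc:date>
  <dc:date>1900-1909</dc:date>
  <dc:type>Image</dc:type>
  <dc:title>Umpqua Album</dc:title>
  <dcterms:isPartOf>WilliamsG:Horseshoe Bend</dcterms:isPartOf>
</qdc:qualifieddc>"""

DC_NS = "http://purl.org/dc/elements/1.1/"


def collect_dc(xml_text):
    """Group simple Dublin Core elements by local name, keeping repeats."""
    fields = defaultdict(list)
    for child in ET.fromstring(xml_text):
        # ElementTree expands each tag to "{namespace-uri}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            local_name = child.tag.split("}", 1)[1]
            fields[local_name].append(child.text)
    return dict(fields)


dc_fields = collect_dc(QDC)
```

Filtering on the namespace URI rather than the `dc:` prefix keeps the code working if a record binds the same namespace to a different prefix.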
  9. 9. Search/Retrieve via URL (SRU)
  10. 10. And enough other stuff to blow your mind
      • RDF
      • Darwin Core
      • VRA Core
      • MODS
      • MADS
      • PBCore
      • Webapps and other cool stuff
  11. 11. XML is not a language
      • It’s a grammar that specifies a structure for exchanging information
      • XML cannot do anything by itself
      • When most people talk about XML, they are actually referring to a family of related technologies
      • Don’t confuse XML (a data structure standard) with content standards such as AACR2R/RDA, DACS, LCNAF, LCSH, MeSH, and AAT
  12. 12. Interpreting XML
      • Common methods are Document Object Model (DOM) and Simple API for XML (SAX)
      • DOM is more common and far more powerful. Best for smaller files and documents
      • SAX is much faster and requires much less memory. Best for large files
  13. 13. DOM vs. SAX
      XML Document
        <?xml version="1.0"?>
        <inventory>
          <book><title>My Dog</title></book>
          <book><title>My Cat</title></book>
        </inventory>
      DOM (tree structure)
        inventory
          book
            title
              My Dog
          book
            title
              My Cat
      SAX (linear events)
        Start document
        Start element: inventory
        Start element: book
        Start element: title
        Characters: My Dog
        End element: title
        End element: book
        Start element: book
        Start element: title
        Characters: My Cat
        End element: title
        End element: book
        End element: inventory
        End document
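
The linear event stream can be reproduced with Python's standard `xml.sax` module. This is only a sketch: the `EventLogger` class is a hypothetical name, and whitespace-only text between elements is skipped so the output stays comparable to the slide.

```python
import xml.sax

XML_DOC = b"""<?xml version="1.0"?>
<inventory>
  <book><title>My Dog</title></book>
  <book><title>My Cat</title></book>
</inventory>"""


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX callback as a line of text, in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("Start document")

    def endDocument(self):
        self.events.append("End document")

    def startElement(self, name, attrs):
        self.events.append(f"Start element: {name}")

    def endElement(self, name):
        self.events.append(f"End element: {name}")

    def characters(self, content):
        # Skip whitespace-only text (the indentation between elements).
        if content.strip():
            self.events.append(f"Characters: {content}")


logger = EventLogger()
xml.sax.parseString(XML_DOC, logger)
print("\n".join(logger.events))
```

The parser never builds a tree; the handler only sees one event at a time, in document order.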
  14. 14. DOM basics
      • Platform independent way to represent and interact with XML documents
      • All nodes and relationships are accessible
      • Great for generating and displaying documents (e.g. EAD), interpreting messages (e.g. NCIP, OAI-PMH)
      • Must load entire document into memory – terrible for transferring millions of records
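
A small sketch of what "all nodes and relationships are accessible" means in practice, using Python's standard `xml.dom.minidom` on the inventory example. The added "My Fish" book is invented purely for illustration.

```python
from xml.dom import minidom

XML_DOC = (
    '<?xml version="1.0"?>'
    "<inventory>"
    "<book><title>My Dog</title></book>"
    "<book><title>My Cat</title></book>"
    "</inventory>"
)

# The whole document is parsed into an in-memory tree before anything else.
dom = minidom.parseString(XML_DOC)
inventory = dom.documentElement

# Walk down: collect the text of every <title> element.
title_nodes = inventory.getElementsByTagName("title")
titles = [node.firstChild.data for node in title_nodes]

# Walk up: any node can reach its parent.
parent_name = title_nodes[0].parentNode.tagName

# Modify the tree: append a new (hypothetical) book, then serialize it again.
new_book = dom.createElement("book")
new_title = dom.createElement("title")
new_title.appendChild(dom.createTextNode("My Fish"))
new_book.appendChild(new_title)
inventory.appendChild(new_book)
serialized = inventory.toxml()
```

Navigating and editing in any direction is convenient, but it is only possible because the entire document is held in memory at once.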
  15. 15. SAX (Simple API for XML)
      • Not formally defined by a standards body; it is a de facto standard
      • Relies on events – detects beginnings/ends of elements, attributes, etc.
      • Does not require loading file into memory
      • Great for extracting info from large files but awkward for interpreting documents
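
The trade-off can be sketched with Python's standard `xml.sax`: the handler below extracts titles from a stream without building a tree, but it has to track by hand where it is in the document. The `TitleScanner` class and the generated 1,000-book inventory are hypothetical stand-ins for a real large file.

```python
import io
import xml.sax


class TitleScanner(xml.sax.ContentHandler):
    """Count <title> elements and remember the first one seen."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # manual state: are we inside a <title>?
        self.buffer = []
        self.count = 0
        self.first_title = None

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.count += 1
            if self.first_title is None:
                self.first_title = "".join(self.buffer)


# Generated stand-in for a large file; a real file object works the same way.
books = "".join(f"<book><title>Book {i}</title></book>" for i in range(1000))
stream = io.BytesIO(f"<inventory>{books}</inventory>".encode("utf-8"))

scanner = TitleScanner()
xml.sax.parse(stream, scanner)
```

Memory use stays flat no matter how many books the stream holds, but the `in_title` flag shows why SAX gets awkward once the logic depends on document structure.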
  16. 16. Common Alternatives to XML
      XML Document
        <?xml version="1.0"?>
        <inventory>
          <book><title>My Dog</title></book>
          <book><title>My Cat</title></book>
        </inventory>
      JSON
        {"inventory": {"book": [{"title": "My Dog"}, {"title": "My Cat"}]}}
      Delimited (Inventory)
        Item type    Title
        book         My Dog
        book         My Cat
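
In JSON, repeated elements such as the two books are normally carried as an array, because parsers do not reliably preserve a repeated key. A minimal sketch of the conversion with Python's standard `json` and `xml.etree.ElementTree` modules; the dictionary layout is one possible mapping, not a standard one.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = (
    "<inventory>"
    "<book><title>My Dog</title></book>"
    "<book><title>My Cat</title></book>"
    "</inventory>"
)

root = ET.fromstring(XML_DOC)

# Map the repeated <book> elements onto a JSON array.
as_dict = {
    "inventory": {
        "book": [{"title": book.findtext("title")} for book in root.findall("book")]
    }
}
as_json = json.dumps(as_dict)

# Repeating a key instead loses data: Python's json module keeps only the
# last value for a duplicated name.
duplicated = json.loads('{"book": {"title": "My Dog"}, "book": {"title": "My Cat"}}')
```

Here `duplicated` ends up holding only the second book, which is why the array form is the safer choice.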
  17. 17. Why Delimited or JSON?
      • Delimited
        – Easiest to parse
        – Works great with tabular data
        – Not good for arbitrary and nested structures
      • JSON
        – Much simpler and easier to use
        – Bad for situations where markup languages are appropriate (e.g. documents)
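
To give a sense of how little code delimited data needs, here is a sketch that reads the inventory table as tab-separated text with Python's standard `csv` module. The tab delimiter and the in-memory sample are assumptions made for this example.

```python
import csv
import io

# Hypothetical tab-delimited version of the inventory table.
DELIMITED = "Item type\tTitle\nbook\tMy Dog\nbook\tMy Cat\n"

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(DELIMITED), delimiter="\t"))
titles = [row["Title"] for row in rows]
```

Every row has the same flat set of columns, which is what makes parsing easy and also why nested structures do not fit.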
  18. 18. XML = Data Duct Tape
      • Very useful and is here to stay
      • Best uses are documents, messaging, and data transport
      • Can be used for almost anything but sometimes not a good choice
  19. 19. XML and Life after MARC
      • Use of XML will expand as the role of the traditional catalog wanes
      • Expect growth as libraries need to provide access to a greater variety of resources
      • XML will be critical as linked data becomes more common
  20. 20. What You Should Do Now
      • Be aware of what XML is
      • Know what it is good for
      • Learn specifics on an as-needed basis
  21. 21. Thank You!
      Kyle Banerjee
      banerjek@ohsu.edu
