Beyond Seamless Access: Meta-data In The Age of Content Integration



This is an example of the meta-data research I did before the dot-com bubble burst on the East Coast in 2000. Much of what we envisioned for content integration has shaped today's meta-data movement. Its full potential has not yet been reached, e.g. intelligent data for semantic apps, personalized delivery, interactive and bidirectional linking services, repurposed services, etc. It was the first of its kind to weave content from scholarly publications (particularly in the context of formal and informal communications) together with mission-critical library applications in authority control, meta-data, directory services, ILS, ILL, knowledge bases for site maps, etc.

Published in: Education, Technology

  • When you provide meta-data for the data, you might record: When was the data created? By whom? How long is it valid? You may want to serve up the meta-data document first, so that people who interact with your web site can decide whether the data is relevant before actually downloading it. You might also provide hyperlinking capabilities in your data so that you can express the relationship between this data and other data.
  • "Eve L. Maler" <elm@ARBORTEXT.COM>:

    XLink is needed, first of all, because in an XML document you can use whatever element names you want. In HTML, it's predetermined that A and IMG (and a couple of others) do link-ish things, but software can't predict ahead of time what the linking elements will be called in an XML document: FOOTNOTE? ANNOTATION? RELATIONSHIP? CROSS-REFERENCE? So you have to put a "marker" (in the form of a special attribute) on your element in order for XLink-aware applications to recognize it.

    In addition to this innovation, however, XLink does another important job, which is invent a whole new form of link. In HTML, A is a two-ended, unidirectional link: you click from here to go there. In XML, this would be a "simple link," but you can also have a 318-ended "extended link" where you can start from any one of the ends and go to any of the others. And on top of this, the link doesn't have to be stuck into *any one* of those 318 pieces of content! This means that you can store the linking information outside all the information that you're hooking together, so you can update your link if one of the 318 chunks changes in any way.

    Here's an example of a plain vanilla ("simple") XLink linking element that functions just like an HTML A element (using syntactic details from the 19980303 working draft; this is highly likely to change a bit in the next draft). In fact, this example is actually an XMLified HTML fragment:

    See <a xml:link="simple" href="">our home page</a> for more cool information.

    Now, on to XPointer. If you use HTML's A to point into an HTML document, your choices are:

      • Point to a whole HTML page
      • Point to a "spot" somewhere inside the HTML page, using #

    Often, it's useful to point to some actual content, instead of a "spot." For example, you might want to direct your readers' attention to the third item in a particular list, or a certain stretch of text in the equivalent of a PRE element.

    When the content of interest is in an XML document, then you can rely on XML's natural tree structure to provide signposts that you can use in navigating to the right stuff. XPointer is a little mini-language that you can stick on the end of a URL after the #, in the case where the URL points to an XML document. It uses a keyword-and-arguments syntax like this:

    keyword1(arg1,arg2).keyword2(arg1,arg2)

    For example, if you want to link to the third item in the list with a unique ID of "interesting-list", you can do it this way:

    href=",item)"

    There's a whole boatload of keywords, and some of their arguments can get a bit complicated, but that's the basic idea. The best part about XPointer is that, if you do addressing this way, your link will probably be more robust against some change in the target document. For example, even if the target list is cut and pasted somewhere else, the fact that it has a unique ID will keep your link operational. (Now, if someone rearranges the items, that's another story.)
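The keyword-and-arguments syntax described in the note can be illustrated with a small parser. This is only a sketch of the 1998 working-draft form `keyword1(arg1,arg2).keyword2(arg1,arg2)`, not an implementation of any XPointer specification. The function name `parse_xpointer` and the sample fragment `id(interesting-list).child(3,item)` are assumptions made for illustration; the href in the note is truncated and is not reconstructed here.

```python
import re

# One location term: a keyword followed by a parenthesized argument list.
_TERM = re.compile(r"([A-Za-z][\w-]*)\(([^()]*)\)")


def parse_xpointer(fragment):
    """Split 'kw1(a,b).kw2(c)' into [('kw1', ['a', 'b']), ('kw2', ['c'])]."""
    terms = []
    pos = 0
    while pos < len(fragment):
        match = _TERM.match(fragment, pos)
        if match is None:
            raise ValueError(f"malformed location term at offset {pos}")
        keyword, raw_args = match.groups()
        # An empty argument list such as 'root()' yields no arguments.
        args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
        terms.append((keyword, args))
        pos = match.end()
        if pos < len(fragment):
            # Terms are chained with a single '.', which must be followed by a term.
            if fragment[pos] != "." or pos + 1 == len(fragment):
                raise ValueError(f"expected '.' followed by a term at offset {pos}")
            pos += 1
    return terms


# Hypothetical fragment: the list with ID 'interesting-list', then its third 'item' child.
steps = parse_xpointer("id(interesting-list).child(3,item)")
```

Each tuple is one navigation step, which is why a link addressed this way can survive the target list being moved: the first step finds the list by its ID rather than by position.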

    1. 1. Beyond Seamless Access: Meta-data in the Age of Content Integration. Spring 2000 Program, Information Technology Interest Group of the Association of College & Research Libraries, New England Chapter. Univ. of Connecticut, May 26, 2000. Amanda Xu, Information Architect, EBSCO, 10 Estes Street, Ipswich, MA 01938
    2. 2. OVERVIEW <ul><ul><li>Definitions </li></ul></ul><ul><ul><ul><li>Meta-data, schemas, and XML linking structures </li></ul></ul></ul><ul><ul><li>Why content integration and analysis? </li></ul></ul><ul><ul><ul><li>Assumptions about information search and retrieval </li></ul></ul></ul><ul><ul><li>Meta-data applications for content integration and analysis </li></ul></ul><ul><ul><li>How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? </li></ul></ul><ul><ul><li>Role of librarians, and information mediators in the wave of content integration </li></ul></ul>
    3. 3. Definitions (1) Meta-data, What is it? [1/6] <ul><li>Definitions: </li></ul><ul><ul><li>1) “Data about data” or “information which describes a data set” </li></ul></ul><ul><ul><li>2) Data elements, and attributes that facilitate the search and retrieval of </li></ul></ul><ul><ul><li>a set of associated attributes </li></ul></ul><ul><ul><ul><li>Example 1: </li></ul></ul></ul><ul><ul><ul><li>An address label contains: name, address, city, state, zip </li></ul></ul></ul><ul><ul><ul><li>Address might feature a home or office, address access permissions, last updated, internal references </li></ul></ul></ul><ul><ul><li>3) A set of semantics that describe the data, classify it, categorize it, and provide instructions on how and where to exploit it </li></ul></ul><ul><ul><ul><li>Example 2: </li></ul></ul></ul><ul><ul><ul><li>Standard bibliographic information, summaries, indexing terms, and abstracts </li></ul></ul></ul>
    4. 4. Definitions (1) Meta-data, What is it? [2/6] <ul><li>Example 3: Simple XML Record </li></ul><ul><li><record> </li></ul><ul><ul><li><title>The Tao of Pooh</title> </li></ul></ul><ul><ul><li><author label="personal">Benjamin Hoff</author> </li></ul></ul><ul><ul><li><date label="1st-published">1982</date> </li></ul></ul><ul><ul><li><isbn>01400-67477</isbn> </li></ul></ul><ul><ul><li><publisher>Dutton</publisher> </li></ul></ul><ul><ul><li><subject label="personal">Winnie the Pooh</subject> </li></ul></ul><ul><ul><li><subject>Taoism in literature</subject> </li></ul></ul><ul><ul><li><classification scheme="LCC">PR6025.I65Z68 1983</classification> </li></ul></ul><ul><li></record> </li></ul>
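As a rough illustration of how a program might consume the record in Example 3, the sketch below parses it with Python's standard `xml.etree.ElementTree` module. The record text is the slide's, typed with straight quotes so that it is well-formed XML; the function name `summarize` and the keys of the returned dictionary are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

RECORD = """<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>"""


def summarize(xml_text):
    """Pull a few searchable fields out of a bibliographic record."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        # <subject> repeats, so collect every occurrence.
        "subjects": [s.text for s in root.findall("subject")],
        # Select the classification by its scheme attribute.
        "lcc": root.findtext("classification[@scheme='LCC']"),
    }


summary = summarize(RECORD)
```

The attribute predicate in the last lookup shows why labels such as `scheme` matter: they let a program choose among elements that share a name.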
    5. 5. Definitions (1) Meta-data, What is it? [3/6] <ul><ul><li>4) Supports understanding of a document, its structure, relationships, locations, and usage </li></ul></ul><ul><ul><li>5) Helps you find things or make things disappear </li></ul></ul><ul><li>Where is meta-data? </li></ul><ul><ul><ul><li>1) Internally: </li></ul></ul></ul><ul><ul><ul><li>Embedded with markup, and with content </li></ul></ul></ul><ul><ul><ul><li>Attached as a resource header (HTML META tag), or package </li></ul></ul></ul><ul><ul><ul><li>2) Externally: </li></ul></ul></ul><ul><ul><ul><li>Stored separately from its resource </li></ul></ul></ul><ul><ul><ul><li>Generated on demand, e.g. MS SQL Server or Oracle </li></ul></ul></ul><ul><ul><ul><li>Static, e.g. bibliographic record </li></ul></ul></ul><ul><ul><ul><li>Dynamically linked using XLink/XPointer/XPath and ISO HyTime </li></ul></ul></ul>
    6. 6. Definitions (1) Meta-data, What is it? [4/6] <ul><li>Naming Issues: </li></ul><ul><li>Can your meta-data be interchanged, and shared with others via computer programs or parsers? </li></ul><ul><ul><ul><li>URI = URN + URL + URC (IETF) </li></ul></ul></ul><ul><ul><ul><li>Namespaces (W3C): qualify elements uniquely, and avoid name collisions </li></ul></ul></ul><ul><ul><ul><li>URIs specify the namespaces in use </li></ul></ul></ul><ul><ul><ul><li>XML Namespaces provide a way for a name to be unique, but they do not resolve vocabulary ambiguity </li></ul></ul></ul>
    7. 7. Definitions (1) Meta-data, What is it? [5/6] <ul><li>Example 4: </li></ul><ul><li><date> used on three different occasions: </li></ul><ul><li>From George’s document: <date>9-Sept-1999</date> </li></ul><ul><li>From Martha’s document: <date>The lovely Deni</date> </li></ul><ul><li>From Hadley’s document: <date>Large Plump Medjool</date> </li></ul><ul><li>Use namespaces: </li></ul><ul><ul><li><george:date>9-Sept-1999</george:date> </li></ul></ul><ul><ul><li><martha:date>The lovely Deni</martha:date> </li></ul></ul><ul><ul><li><hadley:date>Large Plump Medjool</hadley:date> </li></ul></ul><ul><li>Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston </li></ul>
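A short sketch of how a namespace-aware parser keeps the three `date` elements apart. The slide names only the prefixes, so the namespace URIs (`urn:example:...`) and the `<notes>` wrapper element below are hypothetical values invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the slide gives prefixes only.
NS = {
    "george": "urn:example:george",
    "martha": "urn:example:martha",
    "hadley": "urn:example:hadley",
}

DOC = """<notes xmlns:george="urn:example:george"
       xmlns:martha="urn:example:martha"
       xmlns:hadley="urn:example:hadley">
  <george:date>9-Sept-1999</george:date>
  <martha:date>The lovely Deni</martha:date>
  <hadley:date>Large Plump Medjool</hadley:date>
</notes>"""

root = ET.fromstring(DOC)

# Only George's vocabulary uses <date> for a calendar date.
calendar_dates = [e.text for e in root.findall("george:date", NS)]

# Ignoring the namespace brings the collision back: three unrelated 'date' elements.
all_dates = [e.text for e in root if e.tag.endswith("}date")]
```

The parser expands each prefix to its URI, so the three elements have different qualified names even though the local name is the same. What each `date` means is still up to the vocabulary's owner.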
    8. 8. Definitions (1) Meta-data, What is it? [6/6] <ul><li>Example 5: Simple Dublin Core Record with DC namespace, and qualifiers </li></ul><ul><li><?xml version="1.0" encoding="UTF-8" standalone="yes"?> </li></ul><ul><li><record xmlns:dc="" </li></ul><ul><li>xmlns:dcq=""> </li></ul><ul><li><dc:title>The Tao of Pooh</dc:title> </li></ul><ul><li><dc:creator>Benjamin Hoff</dc:creator> </li></ul><ul><li><dcq:creatorType>Illustrator</dcq:creatorType> </li></ul><ul><li><dc:date>1982</dc:date> </li></ul><ul><li><dc:isbn>01400-67477</dc:isbn> </li></ul><ul><li><dc:publisher>Dutton</dc:publisher> </li></ul><ul><li><dc:subject>Winnie the Pooh</dc:subject> </li></ul><ul><li><dc:subject>Taoism in literature</dc:subject> </li></ul><ul><li></record> </li></ul>
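A minimal sketch of reading a Dublin Core record by namespace rather than by prefix. The namespace URIs on the slide are blank, so the value used here, `http://purl.org/dc/elements/1.1/` (the Dublin Core element set 1.1 namespace), is an assumption about what the record would declare. The qualifier element is left out because its namespace is not recoverable from the slide. The function name `dc_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace URI; the slide's xmlns:dc value is blank.
DC = "http://purl.org/dc/elements/1.1/"

RECORD = f"""<record xmlns:dc="{DC}">
  <dc:title>The Tao of Pooh</dc:title>
  <dc:creator>Benjamin Hoff</dc:creator>
  <dc:date>1982</dc:date>
  <dc:publisher>Dutton</dc:publisher>
  <dc:subject>Winnie the Pooh</dc:subject>
  <dc:subject>Taoism in literature</dc:subject>
</record>"""


def dc_fields(xml_text):
    """Group Dublin Core element values by local name; elements may repeat."""
    fields = defaultdict(list)
    for child in ET.fromstring(xml_text):
        # ElementTree spells qualified names as '{namespace-uri}local-name'.
        namespace, _, local = child.tag[1:].partition("}")
        if namespace == DC:
            fields[local].append(child.text)
    return dict(fields)


fields = dc_fields(RECORD)
```

Matching on the URI instead of the `dc:` prefix means a record that binds the same namespace to a different prefix is read the same way.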
    9. 9. Definitions (2) Schemas, What is it? [1/3] <ul><li>How do you know which meta-data vocabularies you are interchanging? </li></ul><ul><ul><li>Schemas (DTDs): </li></ul></ul><ul><ul><ul><li>understand document elements and structures </li></ul></ul></ul><ul><ul><ul><li>validation/parsing </li></ul></ul></ul><ul><ul><ul><li>schemas support data types (e.g. integer, time, time period), open content model, inheritance, constraints, and namespaces </li></ul></ul></ul><ul><ul><li>Example: </li></ul></ul><ul><ul><ul><ul><li><xsd:schema xmlns:xsd=""> </li></ul></ul></ul></ul><ul><ul><ul><ul><li><xsd:element name="state" type="xsd:string"/> </li></ul></ul></ul></ul><ul><ul><ul><ul><li><xsd:element name="zip" type="xsd:decimal"/> </li></ul></ul></ul></ul><ul><ul><ul><ul><li><xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/> </li></ul></ul></ul></ul><ul><li>Note: Example from Brian Travis’s tutorial, “XML and Data-Driven Web Architectures”, Seybold Seminars, Boston, Feb. 11, 2000. </li></ul>
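To show what typed declarations buy during interchange, here is a hand-rolled check that mimics the two element declarations in the example (`state` as a string, `zip` as a decimal). It is not an XML Schema validator, and Python's standard library does not include one. The `<address>` wrapper, the `CHECKS` table, and the function name `type_errors` are assumptions made for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Lexical checks standing in for the declared types.
CHECKS = {
    "state": lambda text: True,  # a string type accepts any text
    "zip": lambda text: re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)", text) is not None,
}


def type_errors(xml_text):
    """List the child elements whose text does not fit the declared type."""
    errors = []
    for element in ET.fromstring(xml_text):
        check = CHECKS.get(element.tag)
        if check is not None and not check(element.text or ""):
            errors.append(f"<{element.tag}>: {element.text!r} fails its declared type")
    return errors


good = "<address><state>MA</state><zip>01938</zip></address>"
bad = "<address><state>MA</state><zip>O1938</zip></address>"  # letter O, not zero

good_errors = type_errors(good)
bad_errors = type_errors(bad)
```

A DTD could say only that `zip` contains character data; a typed schema lets the receiving system reject the second record before it reaches a database.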
    10. 10. Definitions (2) Schemas, What is it? [2/3] <ul><li>How many types of XML vocabularies are there? </li></ul><ul><li>Examples: </li></ul><ul><li>1) XML Schema </li></ul><ul><ul><li><xs:schema xmlns:xs="" targetNamespace="" version="M.n">... </li></ul></ul><ul><ul><li></xs:schema> </li></ul></ul><ul><li>2) RDF </li></ul><ul><ul><li><?xml version="1.0"?> </li></ul></ul><ul><ul><li><rdf:RDF xmlns:rdf="" </li></ul></ul><ul><ul><li> xmlns:rdfs="" </li></ul></ul><ul><ul><li> xmlns:dc=""> </li></ul></ul>
    11. 11. Definitions (2) Schemas, What is it? [3/3] <ul><li>3) Schema repositories: industry-specific </li></ul><ul><ul><li>SOAP, BizCodes, XML-RPC, ICE, CDF, WebDAV, XML/ASN.1, XML/EDI, XER, and Z39.50 </li></ul></ul><ul><ul><li>Routing information </li></ul></ul><ul><ul><ul><li><bizTalk> </li></ul></ul></ul><ul><ul><ul><ul><li><Route> </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li><From locationID="" locationType="IP" handle="72" process="POConf" Path=""/> </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li><To locationID="83-627-54204" locationType="DUNS" handle="14" process="PO_Process" Path=""/> </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><li></Route> </li></ul></ul></ul></ul><ul><ul><ul><ul><li><body> </li></ul></ul></ul></ul><ul><ul><ul><ul><li><purchaseOrder xmlns="" PONumber="10-01-2118"></purchaseOrder> </li></ul></ul></ul></ul><ul><ul><ul><ul><li></body> </li></ul></ul></ul></ul><ul><ul><ul><li></bizTalk> </li></ul></ul></ul><ul><ul><ul><li>Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston </li></ul></ul></ul>
    12. 12. Simple Meta-data Interchange Model [Diagram. Nodes: System A, System B, System C; two DBs; two XML/ASN.1 servers; a repository holding the schema and the map from Sys C to Sys A. Message: ILL request in XML/EDIFACT. Interchange levels: protocol, syntax, encoding. Transports: Direct Transfer, SMTP. Conversions: Direct Transfer to SMTP; SMTP to Direct Transfer; XML/EDIFACT to ASN.1/BER; ASN.1/BER to XML/BER; XML/BER to XML/EDIFACT. Template mapping: Sys A to Sys B, then Sys B to Sys C.]
    13. 13. Definitions (3) Linking Structures [1/6] Leveraging XML syntax: link structures that link an XML name tag to an external standard reference item, and that allow context and non-context queries at the element and attribute level. [Diagram labels: My Element; Attlist; my thing; Xlink-Root URI; Xpointer address; Remote schema/(DTD); Root URI address.] <ul><li>Notes: </li></ul><ul><ul><li>XLink specification <> </li></ul></ul><ul><ul><li>XPointer specification <> </li></ul></ul>
    14. 14. Definitions (3) Linking Structures [2/6] [Diagram: the application sends a request to the linkbase and receives linkInfo through the API that retrieves link information from the linkbase.] <ul><li>Leveraging the application: </li></ul><ul><ul><li>The link structures in which linkInfo takes part are returned to the application, </li></ul></ul><ul><ul><li>which can re-assemble them for different purposes on the fly </li></ul></ul>
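One way to picture the application-to-linkbase request is a small in-memory linkbase with a single lookup call. This is a sketch only: the slide does not define an API, so the class names `LinkInfo` and `Linkbase`, the method `link_info`, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinkInfo:
    source: str
    target: str
    role: str


@dataclass
class Linkbase:
    links: list = field(default_factory=list)

    def add(self, source, target, role):
        self.links.append(LinkInfo(source, target, role))

    def link_info(self, resource):
        """Return every link in which the resource takes part, at either end."""
        return [link for link in self.links if resource in (link.source, link.target)]


# Hypothetical identifiers for an article, its citation, and a holdings record.
linkbase = Linkbase()
linkbase.add("article-17", "article-42", "cites")
linkbase.add("article-42", "holding-9", "held-by")

found = linkbase.link_info("article-42")
roles = sorted(link.role for link in found)
```

Because the lookup matches either end of a link, the application learns both what the article links to and what links to it, and can regroup the results by role for different displays.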
    15. 15. Definitions (3) Linking Structures [3/6] <ul><li>Leveraging resources and merging links </li></ul>Link structures in which the links are merged into the original doc to form a composite document. [Diagram: original doc and linkbase; the API merges the links; composite doc.]
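The merge step can be sketched as a function that copies the original document and attaches externally stored links to the elements they address. The element names (`article`, `cite`, `link`), the id-based addressing, and the `urn:example:` targets are assumptions for illustration; the slide does not specify how links are addressed or merged.

```python
import copy
import xml.etree.ElementTree as ET


def merge_links(original, linkbase):
    """Return a composite copy of `original` with linkbase entries attached by id."""
    composite = copy.deepcopy(original)  # the original doc is left untouched
    by_id = {el.get("id"): el for el in composite.iter() if el.get("id")}
    for element_id, href in linkbase:
        target = by_id.get(element_id)
        if target is not None:  # links to unknown ids are skipped
            ET.SubElement(target, "link", {"href": href})
    return composite


original = ET.fromstring('<article><cite id="c1">Schatz 1998</cite></article>')
# The linkbase is kept outside the document: (element id, link target) pairs.
linkbase = [("c1", "urn:example:schatz-1998"), ("c9", "urn:example:missing")]

composite = merge_links(original, linkbase)
```

Keeping the links outside the document means the same original can be merged with different linkbases to produce different composite views.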
    16. 16. Definitions (3) Linking Structures [4/6] <ul><li>Topic Map: </li></ul><ul><li>“To qualify the content and/or data contained in information objects as topics to enable navigational tools such as indexes, cross-references, citation systems, or glossaries. </li></ul><ul><li>To link topics together in such a way as to enable navigation between them </li></ul><ul><li>To filter an information set to create views adapted to specific users or purposes. For example, such filtering can aid in the management of multilingual documents, management of access modes depending on security criteria, delivery of partial views depending on user profiles and/or knowledge domains, etc. </li></ul><ul><li>To structure unstructured information objects, or to facilitate the creation of topic-oriented user interfaces that provide the effect of merging unstructured information bases with structured ones.” </li></ul><ul><li>Note: Quote from the Topic Map web site </li></ul>
    17. 17. Definitions (3) Linking Structures [5/6] Leverage Topic Maps [Diagram with two numbered steps. Labels: query; match query; search/navigate; result set w/ category map; category map; link; cluster; adaptive categories; attach categories; DB; structured docs; unstructured docs; topic map. Filter: profiles, knowledge domains, languages, access rights, delivery views/devices.]
    18. 18. Definitions (3) Linking Structures [6/6] <ul><li>Topic association - Example </li></ul><ul><li><topic id="n001" types="city"> </li></ul><ul><li><topicname> </li></ul><ul><ul><li><basename>New York City</basename> </li></ul></ul><ul><ul><li></topicname> </li></ul></ul><ul><li><mention>adr1 adr2 adr3</mention></topic> </li></ul><ul><li><topic id="c98991" types="monument"> </li></ul><ul><li><topicname> </li></ul><ul><ul><li><basename>Brooklyn Bridge</basename> </li></ul></ul><ul><li></topicname> </li></ul><ul><li><mention>adr34 adr3462 adr9832</mention></topic> </li></ul><ul><li><assoc type="sightseeing" scope="civil-engineering"> </li></ul><ul><li><when-in>n001</when-in> </li></ul><ul><li><visit>c98991</visit></assoc> </li></ul><ul><li><topic id="city" types="topictypes"/> </li></ul><ul><li><topic id="monument" types="topictypes"/> </li></ul><ul><li><topic id="civil-engineering"/> </li></ul><ul><li><topic id="topictypes"/> </li></ul><ul><li>Note: Example from Steve R. Newcomb’s tutorial, “Metadata, Schemas, and Linking Structures”, XML World conference, Ottawa, Sept. 13, 1999, updated 5/30/2000. </li></ul>
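A sketch of navigating the topic association in this example. The topics and the association are the slide's, written with matching open and close tags so that they parse; the `<topicmap>` wrapper element and the function name `sights` are assumptions added for this illustration.

```python
import xml.etree.ElementTree as ET

TOPIC_MAP = """<topicmap>
  <topic id="n001" types="city">
    <topicname><basename>New York City</basename></topicname>
    <mention>adr1 adr2 adr3</mention>
  </topic>
  <topic id="c98991" types="monument">
    <topicname><basename>Brooklyn Bridge</basename></topicname>
    <mention>adr34 adr3462 adr9832</mention>
  </topic>
  <assoc type="sightseeing" scope="civil-engineering">
    <when-in>n001</when-in>
    <visit>c98991</visit>
  </assoc>
</topicmap>"""

root = ET.fromstring(TOPIC_MAP)

# Index topics by id so that association members can be resolved to names.
names = {t.get("id"): t.findtext("topicname/basename") for t in root.findall("topic")}


def sights(city_id):
    """Follow 'sightseeing' associations from a city topic to the topics to visit."""
    return [
        names[assoc.findtext("visit")]
        for assoc in root.findall("assoc[@type='sightseeing']")
        if assoc.findtext("when-in") == city_id
    ]


nyc_sights = sights("n001")
```

The association lives outside both topics, so new relationships can be added without touching the topics or the documents that the `mention` addresses point to.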
    19. 19. <ul><ul><li>Why content integration and analysis? </li></ul></ul><ul><ul><ul><li>Assumptions about information search and retrieval </li></ul></ul></ul>Information retrieval is only the first step in information management. The next step is information analysis and decision support: information analysis cross-correlates information from multiple, diverse data sources on the net for specific problem solving, and decision support detects, analyzes, and raises alerts on topics, trends, and events based on the correlated information. Notes: Schatz, Bruce R. 1998. “Information Analysis in the Net: The Interspace of the Twenty-First Century.” Visualizing Subject Access for 21st Century Information Resources, edited by Pauline Cochrane and Eric E. Johnson. Univ. of Illinois at Urbana-Champaign. Evans, David A. 1999. “Beyond Information Retrieval Workshop, 4th Search Engine Conference, April 9, 1999, Boston, MA.”
    20. 20. Meta-data applications for content integration and analysis (1 of 3) <ul><li>What has it to do with products for the library world? </li></ul><ul><li>Today: </li></ul><ul><ul><li>Full-text linking </li></ul></ul><ul><ul><ul><li>ILL/DocDelivery </li></ul></ul></ul><ul><ul><ul><li>ILS linking for holdings </li></ul></ul></ul><ul><ul><ul><li>Publishers & Authors’ Web sites </li></ul></ul></ul><ul><ul><ul><li>Linking services </li></ul></ul></ul><ul><ul><ul><ul><li>Reference linking services provided by CrossRef, SFX, LANL </li></ul></ul></ul></ul><ul><ul><ul><li>Patent data </li></ul></ul></ul><ul><li>Tomorrow: </li></ul><ul><ul><li>User can link directly to any content published by a specific organization simply by highlighting a phrase, sentence, paragraph, a document appearing in any browser, word-processing package, email program or other application </li></ul></ul>
    21. 21. Meta-data applications for content integration and analysis (2 of 3) <ul><ul><li>Interwoven threads for subjects, journal titles, authors, collections </li></ul></ul><ul><ul><li>No document boundary, but an information space where a deeper understanding of knowledge within and across domains is facilitated for specific problem solving and decision support </li></ul></ul><ul><li>Subjects: UMLS, WordNet, LCSH, lexicons, dictionaries </li></ul><ul><li>Journal titles: Ulrich’s Serials Directory, LC Serials, Gale Directory </li></ul><ul><li>Authors: Who’s Who, Wilson Bibliography, Gale Contemporary Authors, authority files from LC, Community of Science </li></ul>[Diagram labels: link base (twice); article collections; book collections; journal collections; other media.]
    22. 22. Meta-data Applications for Content Integration and Analysis (3 of 3) <ul><li>Future -- decision support and problem solving </li></ul>[Diagram labels: meta-data standardization; authority control; book directory; collection directory; journal directory; author directory; bi-directional linking; collections; library holdings; ILL/document delivery; reference linking; site-map knowledge-base; websites: reviews/annotations, publisher sites, author pages, email, mailing lists, chat rooms, community pages.]
    23. 23. How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (1 of 2) <ul><li>XML is nothing but data interchange. It is the application that makes the data reusable, and thus adds functionality and intelligence to it: </li></ul><ul><li>In the beginning --> Editing </li></ul><ul><li>Generation X --> Look and feel </li></ul><ul><li>Intelligence (SGML/XML) </li></ul><ul><ul><ul><ul><li>--> Semantics: </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Levels of fragmentation </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Schema recognition, </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Namespace handling </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Linking registration and management </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><li>--> Viewing/Personalized delivery </li></ul></ul></ul></ul><ul><ul><ul><ul><li>--> Interactive services, e.g. B2B </li></ul></ul></ul></ul><ul><ul><ul><ul><li>--> Software applications, </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>e.g. re-purposing, concurrent editing </li></ul></ul></ul></ul></ul>
    24. 24. How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (2 of 2) <ul><li>XML enables text mining which has become </li></ul><ul><ul><li>increasingly fine grained, subjective, and personal via </li></ul></ul><ul><ul><ul><li>extracting information </li></ul></ul></ul><ul><ul><ul><li>counting by type (quantifying) </li></ul></ul></ul><ul><ul><ul><li>categorizing/filtering </li></ul></ul></ul><ul><ul><ul><li>discovering trends </li></ul></ul></ul><ul><ul><ul><li>capturing critical details </li></ul></ul></ul><ul><ul><ul><li>assessing trends </li></ul></ul></ul><ul><li>Note: </li></ul><ul><li>Evans, David A. 2000. “Text Mining Workshop.” Fifth Search Engines Conference, Boston, MA. </li></ul>
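As a small illustration of "counting by type" over marked-up text, the sketch below tallies subject terms across a set of records. The two-record sample is invented for this example (it reuses subject terms from the earlier record), and the function name `subject_counts` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample data: two records sharing one subject term.
RECORDS = """<records>
  <record>
    <subject>Taoism in literature</subject>
    <subject>Winnie the Pooh</subject>
  </record>
  <record>
    <subject>Taoism in literature</subject>
  </record>
</records>"""


def subject_counts(xml_text):
    """Count how often each subject term occurs across all records."""
    root = ET.fromstring(xml_text)
    return Counter(s.text for s in root.iter("subject"))


counts = subject_counts(RECORDS)
top_term, top_count = counts.most_common(1)[0]
```

Because the markup already identifies which strings are subjects, extraction and counting need no language processing; the finer the markup, the finer the questions that can be asked this way.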
    25. 25. Role of librarians, and information mediators in the wave of content integration <ul><li>Every aspect of librarianship is needed </li></ul><ul><li>It is a matter of which parts you would like to participate in </li></ul>
    26. 26. Questions?