Metadata is back!
Bernhard Haslhofer - Cornell University
JCDL 2011 - Semantic Web Technologies for Libraries and Readers Workshop
Ottawa, Canada
Thursday, June 16th 2011
schema.org Book Example
<img src="catcher-in-the-rye-book-cover.jpg" />
The Catcher in the Rye - Mass Market Paperback
by <a href="/author/jd_salinger.html">J.D. Salinger</a>
Price: $6.99
In Stock
Product details
• 224 pages
• Publisher: Little, Brown, and Company - May 1, 1991
• Language: English
• ISBN-10: 0316769487
Library Catalogue
• Controlled Vocabulary (c) Vienna University Library
• Metadata (c) Bill Steele/Cornell Chronicle
• Identifier (c) Vienna University Library
OPAC
• Metadata
• Controlled Vocabulary
• Identifier
WWW / Wikipedia / Search Engines
• Identifier?
• Metadata?
• Controlled Vocabulary?
Semantic Web - Early Vision
"Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments."
"The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users."
"For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning."
Semantic Web Technologies
• User Interface & Applications
• Trust
• Proof
• Unifying Logic
• Ontology: OWL | Rules: RIF | Query: SPARQL
• RDF-S
• Data Model: RDF
• XML
• URI | Unicode
• Crypto (alongside the stack)
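At the base of this stack, RDF expresses data as subject-predicate-object triples. As a minimal sketch, assuming Dublin Core terms (`dcterms:title`, `dcterms:creator`) as properties and DBpedia URIs for the book and its author (illustrative choices, not taken from the slides), the following Python serializes two triples in the N-Triples syntax using only the standard library:

```python
# Minimal sketch: serialize RDF triples as N-Triples with the standard library only.
# ASSUMPTION: the Dublin Core properties and the author URI are illustrative choices.
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""  # optional language tag, e.g. "en"


Term = Union[URIRef, Literal]


def term_to_nt(term: Term) -> str:
    """Render one RDF term in N-Triples syntax."""
    if isinstance(term, URIRef):
        return f"<{term.value}>"
    # Escape backslashes first, then quotes and newlines inside the literal.
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{term.lang}' if term.lang else f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[Term, Term, Term]]) -> str:
    """One triple per line, each terminated by ' .'."""
    return "\n".join(f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples)


DCT = "http://purl.org/dc/terms/"
BOOK = URIRef("http://dbpedia.org/resource/The_Catcher_in_the_Rye")

triples = [
    (BOOK, URIRef(DCT + "title"), Literal("The Catcher in the Rye", lang="en")),
    (BOOK, URIRef(DCT + "creator"), URIRef("http://dbpedia.org/resource/J._D._Salinger")),
]

print(to_ntriples(triples))
```

A real application would more likely use an RDF library; the hand-rolled serializer only shows how little structure the triple model needs.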
RDFa & Microformats
• Mechanisms to embed structured metadata in Web pages
• Define and/or reuse (X)HTML attributes to augment the information in Web sites with machine-readable semantics
Linked Data
• There is lots of information on the Web
• ... valuable information that can be (re-)used
• Problem:
  • the information is usually expressed in the form of HTML documents
  • the underlying raw data are locked in closed data silos (mostly DBMSs)
Why Linked Data?
• For the exchange of documents, the Web is successful because it provides
  • uniform encoding (HTML)
  • uniform addressing (URI)
  • uniform transportation (HTTP)
• Why not apply the same mechanisms to the underlying data?
What is Linked Data?
• A pragmatic method for building a Web of Data
• An architectural style based on Semantic Web standards
• Intelligent agents are not the primary focus
Publishing Data
• Distinguish between non-information resources (the thing itself, here the novel) and information resources (documents describing it)
• Sample non-information resource
  • http://dbpedia.org/resource/The_Catcher_in_the_Rye
• Sample information resources
  • http://dbpedia.org/page/The_Catcher_in_the_Rye (HTML)
  • http://dbpedia.org/data/The_Catcher_in_the_Rye (RDF)
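One common way to connect the two kinds of resources is content negotiation plus an HTTP 303 See Other redirect from the non-information resource to a matching information resource. The function below is a minimal sketch of that server-side decision, assuming the DBpedia `/resource/`, `/page/` and `/data/` URI pattern from the slide; it is not DBpedia's actual implementation, and it deliberately ignores Accept q-values.

```python
# Minimal sketch of 303-based content negotiation for Linked Data URIs.
# ASSUMPTIONS: DBpedia-style prefixes; only media types are inspected (q-values are ignored).
from typing import List, Tuple

RESOURCE_PREFIX = "http://dbpedia.org/resource/"  # non-information resources
PAGE_PREFIX = "http://dbpedia.org/page/"          # HTML information resources
DATA_PREFIX = "http://dbpedia.org/data/"          # RDF information resources


def media_types(accept: str) -> List[str]:
    """Split an Accept header into bare media types, dropping parameters like ';q=0.9'."""
    return [part.split(";")[0].strip().lower() for part in accept.split(",") if part.strip()]


def see_other(uri: str, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a request to a non-information resource."""
    if not uri.startswith(RESOURCE_PREFIX):
        return 404, ""
    name = uri[len(RESOURCE_PREFIX):]
    if "application/rdf+xml" in media_types(accept):
        return 303, DATA_PREFIX + name  # RDF description of the thing
    return 303, PAGE_PREFIX + name      # default: HTML page about the thing


print(see_other("http://dbpedia.org/resource/The_Catcher_in_the_Rye", "application/rdf+xml"))
# (303, 'http://dbpedia.org/data/The_Catcher_in_the_Rye')
```

A production server would rank the offered media types by q-value; the membership check keeps the sketch short while still showing why the thing and its descriptions need different URIs.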
Retrieving Linked Data
Microdata (HTML5)
• A very young HTML5 proposal that extends Microformats and addresses their shortcomings
• Items are created within an itemscope
• Every item is assigned an arbitrary number of properties (itemprop)
• Uses global identifiers for typing and naming items (itemtype, itemid)
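To make the itemscope/itemprop model concrete, here is a minimal microdata extractor built on Python's standard `html.parser`. The markup in `SAMPLE` is a hypothetical microdata version of the Book example above, using schema.org's Book type. The value rules (`href` for `a`, `src` for `img`, `content` for `meta`, text content otherwise) follow a simplified reading of the HTML microdata specification; relative URLs are not resolved and `itemref`/`itemid` are ignored.

```python
# Minimal microdata extractor (sketch). Simplifications: relative URLs are not resolved,
# and itemref/itemid are ignored.
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional

VOID = {"area", "br", "hr", "img", "input", "link", "meta", "source"}
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src", "source": "src"}


def _add(item: Dict[str, Any], name: str, value: Any) -> None:
    item["properties"].setdefault(name, []).append(value)


class MicrodataParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.items: List[Dict[str, Any]] = []  # top-level items
        self.open: List[Dict[str, Any]] = []   # stack of open (non-void) elements

    def _owner(self) -> Optional[Dict[str, Any]]:
        # The nearest enclosing element that started an item owns new properties.
        for frame in reversed(self.open):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self._owner()
        prop = a.get("itemprop")
        new_item = None
        capture = None
        if "itemscope" in a:
            new_item = {"type": a.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                _add(owner, prop, new_item)  # nested item as a property value
            else:
                self.items.append(new_item)
        elif prop and owner is not None:
            if tag == "meta":
                _add(owner, prop, a.get("content", ""))
            elif tag in URL_ATTR:
                _add(owner, prop, a.get(URL_ATTR[tag], ""))
            else:
                capture = (owner, prop)  # value is the element's text content
        if tag not in VOID:
            self.open.append({"tag": tag, "item": new_item, "capture": capture, "text": []})

    def handle_data(self, data):
        for frame in self.open:
            if frame["capture"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(f["tag"] == tag for f in self.open):
            return  # stray end tag
        while self.open:
            frame = self.open.pop()
            if frame["capture"] is not None:
                owner, prop = frame["capture"]
                _add(owner, prop, "".join(frame["text"]).strip())
            if frame["tag"] == tag:
                break


def extract_items(html: str) -> List[Dict[str, Any]]:
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">The Catcher in the Rye</span>
  by <a itemprop="author" href="/author/jd_salinger.html">J.D. Salinger</a>
  <span itemprop="numberOfPages">224</span>
</div>
"""

print(extract_items(SAMPLE))
```

Note that the `author` value is the link target rather than the visible name, which is how microdata treats `a` elements; marking the author up as a nested Person item would preserve the name as well.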
Dealing with schema.org
• Ignore it?
• Adopt it?
• Align existing library models with schema.org?
• Schema.org provides an extension mechanism for
  • properties
  • classes
Data Quality / Resource Sync
• The Web is not static
• Resources and their representations might change or disappear over time
• Make sure that applications can
  • synchronize resources and learn about changes
  • go back in time to earlier versions of a resource
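The two requirements above can be illustrated with a small sketch: detect which resources were created, updated, or deleted between two crawls, and keep timestamped snapshots so an application can ask what a collection looked like at an earlier time. This assumes change detection by comparing SHA-256 digests of representations, a generic technique rather than any specific synchronization protocol; the `example.org` URIs are hypothetical.

```python
# Sketch: detect changes between two crawls of a set of Web resources and keep
# snapshots so an application can "go back in time".
# ASSUMPTION: changes are detected by comparing SHA-256 digests of representations;
# this is a generic technique, not a specific synchronization protocol.
import bisect
import hashlib
from typing import Dict, List


def snapshot(resources: Dict[str, bytes]) -> Dict[str, str]:
    """Map each resource URI to a digest of its current representation."""
    return {uri: hashlib.sha256(body).hexdigest() for uri, body in resources.items()}


def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify URIs as created, updated, or deleted between two snapshots."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "updated": sorted(u for u in new.keys() & old.keys() if new[u] != old[u]),
        "deleted": sorted(old.keys() - new.keys()),
    }


class History:
    """Snapshots recorded at increasing timestamps, queryable by point in time."""

    def __init__(self) -> None:
        self.times: List[float] = []
        self.snapshots: List[Dict[str, str]] = []

    def record(self, t: float, snap: Dict[str, str]) -> None:
        if self.times and t <= self.times[-1]:
            raise ValueError("timestamps must increase")
        self.times.append(t)
        self.snapshots.append(snap)

    def as_of(self, t: float) -> Dict[str, str]:
        """Return the latest snapshot taken at or before t (empty if none)."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.snapshots[i] if i >= 0 else {}


v1 = snapshot({"http://example.org/a": b"A1", "http://example.org/b": b"B1"})
v2 = snapshot({"http://example.org/a": b"A2", "http://example.org/c": b"C1"})
print(diff(v1, v2))
# {'created': ['http://example.org/c'], 'updated': ['http://example.org/a'], 'deleted': ['http://example.org/b']}
```

Digests only reveal that something changed; keeping the representations themselves (not just their hashes) in the history would be needed to actually retrieve an earlier version.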
Use Web Data in Apps
• Aggregate Web resources into special collections
• DBpedia provides resource descriptions in 90+ languages!
• Use URIs instead of labels for tagging
• Combine and mash up data
• Analyze data ...
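Tagging with URIs instead of labels means one tag can be displayed in whatever language the user prefers. A minimal sketch, assuming a hypothetical local `LABELS` table that stands in for the multilingual labels an application might fetch from DBpedia, and a hypothetical item identifier `photo-42`:

```python
# Sketch: tag items with URIs rather than free-text labels, and resolve a label per language.
# ASSUMPTION: LABELS is a hypothetical local table standing in for multilingual labels
# (e.g. rdfs:label values) that an application might fetch from DBpedia.
from typing import Dict, List

CATCHER = "http://dbpedia.org/resource/The_Catcher_in_the_Rye"

LABELS: Dict[str, Dict[str, str]] = {
    CATCHER: {
        "en": "The Catcher in the Rye",
        "de": "Der Fänger im Roggen",
        "fr": "L'Attrape-cœurs",
    },
}


def label(uri: str, lang: str, fallback: str = "en") -> str:
    """Label in the requested language, else the fallback language, else the URI itself."""
    labels = LABELS.get(uri, {})
    return labels.get(lang) or labels.get(fallback) or uri


def add_tag(tags: Dict[str, List[str]], item: str, uri: str) -> None:
    """Store the URI, not a label, so one tag works across all languages."""
    item_tags = tags.setdefault(item, [])
    if uri not in item_tags:
        item_tags.append(uri)


def render_tags(tags: Dict[str, List[str]], item: str, lang: str) -> List[str]:
    return [label(uri, lang) for uri in tags.get(item, [])]


tags: Dict[str, List[str]] = {}
add_tag(tags, "photo-42", CATCHER)
print(render_tags(tags, "photo-42", "de"))  # ['Der Fänger im Roggen']
print(render_tags(tags, "photo-42", "ja"))  # ['The Catcher in the Rye']
```

Because the stored tag is the URI, two users tagging in different languages end up with the same tag, and the label lookup can be refreshed later without touching the tag data.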
Metadata is back
• Metadata was introduced in the 19th century to deal with information overload
• Cataloguing rules and workflows evolved over time
• The Web seemed to work pretty well without metadata (information retrieval, natural language processing)
• Now we have strong indicators that structured metadata will play an important role on the Web in the future
• Shouldn't libraries / librarians be part of that?
Metadata Building Blocks
• Schema Definition Language: class, property, relationship
• Metadata Schema: Title, Author, Genre
• Metadata: Title = The Catcher in the Rye; Author = Salinger, J.D.; Genre = Fiction
• (Digital / Non-Digital) Information Object
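The layering of these building blocks can be sketched in a few lines of Python, where dataclasses loosely play the role of the schema definition language, a `MetadataSchema` instance lists the allowed properties, and a `MetadataRecord` holds the values describing one information object. The class names and the `urn:isbn:` identifier (built from the ISBN-10 on the schema.org example slide) are illustrative assumptions.

```python
# Sketch of the three building blocks: a schema definition language (here, Python
# dataclasses), a metadata schema (the allowed properties), and a metadata record
# describing an information object.
# ASSUMPTION: class names and the urn:isbn identifier are illustrative choices.
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class MetadataSchema:
    name: str
    properties: FrozenSet[str]


@dataclass
class MetadataRecord:
    schema: MetadataSchema
    described_object: str  # identifier of the (digital or non-digital) information object
    values: Dict[str, str] = field(default_factory=dict)

    def set_value(self, prop: str, value: str) -> None:
        """Assign a value, rejecting properties the schema does not define."""
        if prop not in self.schema.properties:
            raise KeyError(f"{prop!r} is not defined in schema {self.schema.name!r}")
        self.values[prop] = value


BOOK_SCHEMA = MetadataSchema("Book", frozenset({"Title", "Author", "Genre"}))

record = MetadataRecord(BOOK_SCHEMA, "urn:isbn:0316769487")
record.set_value("Title", "The Catcher in the Rye")
record.set_value("Author", "Salinger, J.D.")
record.set_value("Genre", "Fiction")
print(record.values)
```

Rejecting undefined properties is one design choice among several; an open-world RDF approach would instead accept any property and leave validation to the application.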
Google Rich Snippet Types
• Reviews
• People
• Products
• Businesses and organizations
• Recipes
• Events
Microformats vs. RDFa
• Namespace: flat namespace (Microformats) vs. XML namespaces (RDFa)
• Host languages: HTML4, XHTML 1.1, and HTML5 (Microformats) vs. XHTML 1.1 (RDFa)
• Attributes: reuse latent HTML attributes (Microformats) vs. introduce new metadata attributes (RDFa)
• Vocabulary: defined by one organization/community (Microformats) vs. open to any RDF-based vocabulary (RDFa)
cf. http://evan.prodromou.name/RDFa_vs_microformats
Publishing Data
GET http://dbpedia.org/resource/The_Catcher_in_the_Rye
Accept: application/rdf+xml
303 See Other
Location: http://dbpedia.org/data/The_Catcher_in_the_Rye
GET http://dbpedia.org/data/The_Catcher_in_the_Rye
Accept: application/rdf+xml
200 OK
...
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF ...
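On the client side, the exchange above can be reproduced with Python's standard `urllib.request`: the request carries an `Accept: application/rdf+xml` header, and the default redirect handler follows the 303 See Other with a new GET that keeps the Accept header. This is a minimal sketch; the function names are hypothetical, and the live DBpedia service may answer differently today than in 2011 (for example, redirecting to HTTPS first).

```python
# Sketch: dereference a Linked Data URI while asking for RDF/XML.
# urllib's default redirect handler follows a 303 See Other with a new GET and
# carries the Accept header over, so the final URL is the information resource.
import urllib.request
from typing import Tuple


def build_request(uri: str, accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Prepare a GET request with the desired media type in the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri: str, accept: str = "application/rdf+xml", timeout: float = 10.0) -> Tuple[str, bytes]:
    """Return (final URL after redirects, response body). Requires network access."""
    with urllib.request.urlopen(build_request(uri, accept), timeout=timeout) as resp:
        return resp.geturl(), resp.read()
```

With network access, calling `fetch("http://dbpedia.org/resource/The_Catcher_in_the_Rye")` would be expected to return a final URL for the RDF description rather than the resource URI itself, mirroring the 303 step shown on the slide.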