
The Semantic Web and Libraries in the United States: Experimentation and Achievements


This presentation reflects the paper titled "The Semantic Web and Libraries in the United States: Experimentation and Achievements," published in the proceedings of the 75th IFLA General Conference and Assembly, Satellite Meeting: Emerging Trends in Technology: Libraries between Web 2.0, Semantic Web and Search Technology, held 8/19-20/2009 in Florence, Italy. It was presented by Sharon Yang, Rider University; Yanyi Lee, Wagner College; and Amanda Xu, St. John's University. Here is the URL to the full paper:

  1. THE SEMANTIC WEB AND LIBRARIES IN THE UNITED STATES: EXPERIMENTATION AND ACHIEVEMENTS. 75th IFLA General Conference and Assembly, Satellite Meeting: Emerging Trends in Technology: Libraries between Web 2.0, Semantic Web and Search Technology, 8/19-20/2009, Florence, Italy. Sharon Yang, Associate Professor/Librarian, Rider University; Yanyi Lee, Systems Librarian, Wagner College; Amanda Xu, Assistant Professor/Librarian, St. John’s University
  2. Overview <ul><li>Introduction to Semantic Web </li></ul><ul><li>Library’s Role in Semantic Web </li></ul><ul><li>Semantic Web Development in the U.S. Libraries </li></ul><ul><li>Library Semantic Web Development in the United States and Europe: A Comparison </li></ul><ul><li>Conclusion </li></ul><ul><li>Questions & Answers </li></ul>
  3. Introduction <ul><li>Semantic Web: a vision by Tim Berners-Lee; an extension of the current Web; a Web of data for machine processing; smart applications connecting people, communities, things, and knowledge on the Web </li></ul><ul><li>Semantic Web technologies (loosely a.k.a. the Semantic Stack) </li></ul><ul><ul><li>URIs – naming things and information resources, and providing the means to access information about them over the Web </li></ul></ul><ul><ul><li>Meta-languages, e.g. RDF, RDFS, OWL, SPARQL, etc., and their serializations in RDF/XML, N-Triples, N3, Turtle, RDF/JSON, etc. </li></ul></ul><ul><ul><li>Logic for reasoning, e.g. making inferences based on classes, subclasses, properties, sub-properties, domains, and ranges in RDFS; and on value restrictions, cardinality, transitivity, equivalence, and logical operations (union, intersection, etc.) in OWL </li></ul></ul><ul><ul><li>Proof for validation </li></ul></ul><ul><ul><li>Trust via digital signatures and other knowledge, e.g. recommendations, ratings, and certifications </li></ul></ul>
  4. Introduction (Continued) [Figure: Semantic Web Architecture/Stack 1 ]
  5. Introduction (Continued) <ul><li>Semantic Web extensions for community-based vocabularies </li></ul><ul><ul><li>SKOS (Simple Knowledge Organization System) – expressing knowledge organization systems such as thesauri, taxonomies, and controlled vocabularies in RDF/RDFS, OWL </li></ul></ul><ul><ul><li>FOAF (Friend of a Friend) – describing people, the links between them, the things they create or do, and their relationships in RDF/RDFS, OWL </li></ul></ul><ul><ul><li>SIOC (Semantically-Interlinked Online Communities) – expressing information from online discussion spaces on the Web, e.g. blogs, forums, and mailing lists </li></ul></ul><ul><ul><li>Dublin Core – a set of metadata elements for cross-domain information description </li></ul></ul><ul><ul><li>Semantic Wiki/Semantic MediaWiki </li></ul></ul><ul><ul><li>Encoding structured information into HTML pages using microformats, and mapping it to RDF using GRDDL </li></ul></ul><ul><ul><li>Embedding RDF in HTML pages as attributes using RDFa, and extracting it from them </li></ul></ul>
  6. Introduction (Continued) <ul><li>Semantic Web-aware tools </li></ul><ul><ul><li>Browser level </li></ul></ul><ul><ul><ul><li>Google’s RDFa support 1 , Yahoo Pipes 2 , the Piggy Bank Firefox extension 3 , FOAFfox 4 , Zotero 5 , etc. </li></ul></ul></ul><ul><ul><li>Desktop level </li></ul></ul><ul><ul><ul><li>Twinkle, built on Jena 6 , and ARC on XAMP for SPARQL 7 </li></ul></ul></ul><ul><ul><li>Ontology editors </li></ul></ul><ul><ul><ul><li>Protégé 8 and TopBraid 9 </li></ul></ul></ul><ul><ul><li>RDF stores </li></ul></ul><ul><ul><ul><li>Oracle 11g’s RDF database 10 , Vulcan’s Knoodl 11 </li></ul></ul></ul><ul><ul><li>Semantic searching </li></ul></ul><ul><ul><ul><li>Hakia 12 , Freebase Parallax 13 , Semantic Engines’ SenseBot 14 </li></ul></ul></ul><ul><ul><li>Semantic Web framework enablers </li></ul></ul><ul><ul><ul><li>Drupal 15 , semsol’s ARC and Trice 16 , Twine 17 , Jena 18 , Sesame 19 , Ontotext’s OWLIM 20 </li></ul></ul></ul>
  7. Library’s Role in Semantic Web <ul><li>Phase 1 – Semantifying Thesaurus/Mappings/Services </li></ul><ul><ul><li>Weaving in semantic wikis and Semantic MediaWiki </li></ul></ul><ul><ul><li>Translating LC controlled vocabularies, authority files for named entities, and thesauri from domain-specific societies and institutions into RDF/RDFS, OWL, and SKOS, with URIs assigned according to the ‘Linked Data Design Principles (Berners-Lee, 2007)’ </li></ul></ul><ul><ul><li>Converting semantic-aware data sets in MARCXML, RDFa, and DBpedia into RDF triples that conform to content models defined in FRBR, RDA, and Dublin Core, and registering them with global metadata registries </li></ul></ul><ul><ul><li>Extensive use of URIs for individual data elements in bibliographic records, authority records, etc. </li></ul></ul><ul><ul><li>Explicit correlation and referencing between LCSH terms and LCC and DDC numbers </li></ul></ul><ul><ul><li>Creating built-in links to thesauri within MS Office tools </li></ul></ul><ul><ul><li>Supporting SPARQL endpoints for querying, analyzing, and federating data sets </li></ul></ul><ul><ul><li>Developing business rules and knowledge bases for web services for information extraction, identity resolution, filtering, semantic annotation, and search </li></ul></ul>
  8. Library’s Role in Semantic Web (Continued) <ul><li>Phase 2 – Exposing collections </li></ul><ul><ul><li>Bibliographic data discovery, sharing, and reuse, e.g. Talis Connected Commons 1 </li></ul></ul><ul><ul><li>Authority data discovery, sharing, and reuse, e.g. LC Authorities & Vocabularies 2 , OCLC’s Faceted Application of Subject Terminology (FAST) 3 , Multilingual Access to Subjects (MACS) 4 </li></ul></ul><ul><ul><li>Exploratory search, e.g. OCLC WorldCat Local 5 , Scriblio 6 , Endeca 7 , and Solr-based search tools, e.g. VuFind 8 , Primo 9 , and Blacklight 10 </li></ul></ul><ul><ul><li>Social networking, e.g. FRBR Blog 11 , LibGuides’ Widget 12 , 13 , Scientific Collaboration Framework 14 </li></ul></ul><ul><ul><li>Preservation and archiving, e.g. DSpace 15 , Fedora Commons 16 , Greenstone 17 , arXiv 18 , OAI-PMH 19 , Cheshire III 20 , 21 and OCLC’s ArchiveGrid 22 </li></ul></ul><ul><ul><li>Distributed content management systems, e.g. LC’s MIC 23 , Drupal Core and Site Vocabulary 24 , YouTube 25 , Flickr 26 , Vimeo 27 , and blinkx 28 </li></ul></ul><ul><ul><li>Information dissemination, e.g. email alerts from LinkedIn 29 , and visualization </li></ul></ul><ul><ul><li>Intelligent information analysis and decision support </li></ul></ul>
  9. Semantic Web Development in the U.S. Libraries <ul><li>Most Semantic Web projects in U.S. libraries: </li></ul><ul><ul><li>Are led by national libraries or large organizations such as LC, NLM, NAL, OCLC, and DCMI </li></ul></ul><ul><ul><li>Are still in the process of creating Semantic Web tools and infrastructure, e.g. exposing collections </li></ul></ul><ul><ul><li>Focus on converting MARC records and controlled vocabularies/thesauri into URIs and RDF/XML </li></ul></ul><ul><ul><li>Incorporate Semantic Web technologies slowly but steadily into digital library management systems </li></ul></ul>
  10. Library of Congress Semantic Web Projects <ul><li>Participated in creating the W3C SKOS standard and the SKOS Primer </li></ul><ul><li>LCSH/SKOS project (several small projects) </li></ul><ul><ul><li>LCCN Permalink project (persistent URLs for LC bibliographic records completed; for authority records in planning) </li></ul></ul><ul><ul><li>LCSH in RDF/XML using SKOS (entire subject authority records downloadable at </li></ul></ul><ul><ul><li>Supporting SKOS in RDA, MARC, PREMIS, and METS (experimental stage) </li></ul></ul><ul><ul><li>Maintaining registries for SKOS and related standards and data elements (ongoing) </li></ul></ul><ul><ul><li>Correlating DDC, LCC, and LCSH in Classification Web (ongoing) </li></ul></ul>
  11. Library of Congress Semantic Web Projects (Continued) <ul><li>Other initiatives influenced by Semantic Web </li></ul><ul><ul><li>Developing content and format guidelines for submitting ONIX data to the CIP program, and retooling the ECIP program (completed) </li></ul></ul><ul><ul><li>Enabling OAI-PMH retrieval from VIAF (Virtual International Authority File) (completed) </li></ul></ul><ul><ul><li>Bringing all access points in the LC ILS under authority control (ongoing) </li></ul></ul><ul><ul><li>Enabling RDA in MARCXML, MODS, and MADS (testing stage) </li></ul></ul><ul><ul><li>Linking TOCs, publisher descriptions, contributor-supplied biographies, sample pages, reading guides, reviews, and user-added data to bibliographic records (ongoing) </li></ul></ul><ul><li>Joining JSC, DCMI, CNL, BNL, NLA, and others in making RDA records readily adaptable to the Semantic Web (planned) </li></ul>
  12. Library’s Role in Semantic Web (Continued) <ul><li>Phase 3 – Future Semantic Web </li></ul><ul><ul><li>Web of Linked Data 1 & 2 </li></ul></ul><ul><ul><ul><li>DBpedia 3 </li></ul></ul></ul><ul><ul><ul><li>GeoNames 4 </li></ul></ul></ul><ul><ul><ul><li>Other public data in cloud computing, e.g. Amazon Web Services 5 </li></ul></ul></ul><ul><ul><ul><li>LibraryThing 6 </li></ul></ul></ul><ul><ul><li>Trust layer </li></ul></ul><ul><ul><ul><li>Semantifying Web services for reviews and ratings of library-related application services, e.g. journal rankings and acceptance rates 7 , CiteULike 8 , ISI Web of Science </li></ul></ul></ul>
  13. OCLC Semantic Web Projects <ul><li>FRBR-izing projects </li></ul><ul><ul><li>Developed the FRBR Work-Set algorithm 1 and the xISBN Web service 2 </li></ul></ul><ul><ul><li>FictionFinder (2.8 million records in WorldCat) 3 & 4 </li></ul></ul><ul><ul><li>WorldCat Identities (20 million identities) 5 </li></ul></ul><ul><li>PREMIS Data Dictionary 6 & 7 </li></ul><ul><ul><li>Sponsored the PREMIS working group </li></ul></ul><ul><ul><li>Developed a common data model for preservation metadata, and implementation strategies (Release 2.0) </li></ul></ul>
  14. OCLC Semantic Web Projects (Continued) <ul><li>ECHODEP at UIUC, in partnership with OCLC and others, and funded under LC’s NDIIPP 1 </li></ul><ul><ul><ul><li>“Phase I (2004-2007) - Web archiving tool development, repository evaluation, interoperability tool development for METS, and long term semantic preservation research” </li></ul></ul></ul><ul><ul><ul><li>“Phase II (2008-2009) – expansion of repository architecture; semantic archiving for preservation of meaning and structure; auto metadata extraction and creation, and user evaluation; data format risk assessment based on INFORM methodology” </li></ul></ul></ul>
  15. Other U.S. Semantic Web Projects <ul><li>Semantic Web Features in DSpace </li></ul><ul><ul><li>MIT SIMILE 1 add-ons that enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata, and services through open-source RDF-based tools </li></ul></ul><ul><ul><ul><li>Longwell (faceted browser for visualizing RDF data sets) </li></ul></ul></ul><ul><ul><ul><li>Piggy Bank (Firefox extension for web scraping and mash-ups) </li></ul></ul></ul><ul><ul><ul><li>RDFizer (converting structured data into RDF, e.g. JPEG, MARC, MODS, OAI-PMH, OCW, BibTeX) </li></ul></ul></ul><ul><ul><ul><li>Welkin (graph-based RDF visualizer) & Gadget (data graph viewer for XML) </li></ul></ul></ul><ul><ul><ul><li>Timeline (temporal data visualizer) </li></ul></ul></ul><ul><ul><li>Semantic search of DSpace 2 contents by HPCLab, Univ. of Patras, using the OWL 2.0 API to support a DL reasoner and the DL Query tab of Protégé 4.0 </li></ul></ul><ul><ul><li>SKOS controlled vocabulary lists in DSpace 3 , e.g. 4 </li></ul></ul>
  16. Other U.S. Semantic Web Projects (Continued) <ul><li>Semantic Web Features in Fedora </li></ul><ul><ul><li>Joint project of Fedora Commons, Cornell Univ., and Univ. of Virginia </li></ul></ul><ul><ul><li>Digital Asset Management (DAM) architecture </li></ul></ul><ul><ul><li>Mulgara 1 – an RDF database for the Web front end of the Fedora repository, with SPARQL and OTM (Object-to-Triple Mapper) endpoints, and SWRL support by Revelytix </li></ul></ul><ul><ul><li>NSDL’s NCore Repository Architecture – implements a Fedora-based digital repository, the NCore model, an API, and a set of middleware web applications for a collaborative, fully functional semantic digital library 2 </li></ul></ul><ul><li>Semantic Web Features in NSDL 3 </li></ul><ul><ul><li>Providing a metadata registry for controlled vocabularies deployed in SKOS, with an ARC SPARQL+ endpoint </li></ul></ul><ul><li>DuraSpace 4 </li></ul><ul><ul><li>Open-source initiative supported by DSpace and Fedora to develop synergistic technologies, services, and programs that increase the interoperability of the two platforms, including a Mulgara implementation </li></ul></ul><ul><ul><li>DuraCloud project, funded by LC’s NDIIPP with participation by NYPL and the Biodiversity Heritage Library, aiming to help organizations take advantage of cloud technologies in providing access to digital materials 5 </li></ul></ul>
  17. A Comparison <ul><li>Semantic Web Projects by European Libraries </li></ul><ul><ul><li>More aggressive and bold </li></ul></ul><ul><ul><li>More enthusiastic </li></ul></ul><ul><ul><li>Aimed at delivering digital and semantic library applications </li></ul></ul><ul><ul><li>More visible to the public </li></ul></ul><ul><ul><li>Frequent conferences on digital libraries and the Semantic Web </li></ul></ul><ul><li>Semantic Web Projects by U.S. Libraries </li></ul><ul><ul><li>Creating semantic tools and infrastructure </li></ul></ul><ul><ul><li>Converting controlled vocabularies and laying groundwork </li></ul></ul><ul><ul><li>Less visible to the public </li></ul></ul><ul><ul><li>Domain-specific applications </li></ul></ul><ul><ul><li>Cautious and slow </li></ul></ul>
  18. Conclusion <ul><li>Semantic-aware tools have been increasingly embedded in current Web applications, with different levels of semantic support: from browsers to desktops, from clients to servers, and from servers to servers </li></ul><ul><li>IT vendors, especially big ones such as Google, Yahoo, and Oracle, have begun to see the Web of Data as an opportunity to move from the current HTML-based Web to a Web of Linked Data </li></ul><ul><li>The vast amount of library data, e.g. bibliographic data, authority data, controlled vocabularies, non-MARC metadata, and classification schemes, has been standardized in its representation, controlled for quality and entity resolution, and cross-mapped for interoperability and reuse </li></ul><ul><li>The library-related Web of Data is ready to be exposed to Semantic Web application development. </li></ul>
  19. References <ul><li>Allemang, D., & Hendler, J. (2008). Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Boston, MA: Morgan Kaufmann Publishers/Elsevier. </li></ul><ul><li>Berners-Lee, T. (2000). Semantic Web – XML2000. In W3C Website. Retrieved Aug. 24, 2009, from </li></ul><ul><li>Berners-Lee, T. (2007). Linked Data. In W3C Website. Retrieved Aug. 24, 2009, from </li></ul><ul><li>Dupriez, C. (2009). Integrating SKOS Thesauri and Authority Lists in DSpace. In Retrieved Aug. 24, 2009, from </li></ul><ul><li>FedoraCommons Dashboard. Retrieved Aug. 24, 2009, from </li></ul><ul><li>Harper, C.A., & Tillett, B.B. (2007). Library of Congress Controlled Vocabularies and Their Application to the Semantic Web. Cataloging & Classification Quarterly, 43(3/4), 47-68. </li></ul><ul><li>Koutsomitropoulos, D. (2009). Semantic Search Facility for DSpace: Overview. In Retrieved Aug. 24, 2009, from </li></ul><ul><li>Malmsten, M. (2008). Making a Library Catalogue Part of the Semantic Web. In Proc. Int’l Conf. on Dublin Core and Metadata Applications, Berlin. Retrieved Aug. 24, 2009, from </li></ul><ul><li>Marcum, D.B. (2008). Response to On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control. Retrieved Aug. 24, 2009, from </li></ul>
  20. References (Continued) <ul><li>Morris, C.M. (2009). LC and DuraCloud Launch Pilot Program … In DuraSpace Blog. Retrieved Aug. 24, 2009, from </li></ul><ul><li>SIMILE Overview. Retrieved Aug. 24, 2009, from </li></ul><ul><li>SIMILE Projects. Retrieved Aug. 24, 2009, from </li></ul><ul><li>Summers, E., Isaac, A., Redding, C., & Krech, D. (2008). LCSH, SKOS, and Linked Data. Proceedings of the International Conference on Dublin Core and Metadata Applications. Retrieved Aug. 24, 2009, from </li></ul><ul><li>Yang, S., Lee, Y.Y., & Xu, A. (2009). Semantic Web and Libraries in the United States: Experimentation and Achievements. Proceedings of Emerging Trends in Technology: Libraries between Web 2.0, Semantic Web and Search Technology: IFLA 2009 Milan, Italy, Satellite Meetings in Florence. [CD-ROM]. [Firenze, Italia]: Ente Cassa di Risparmio di Firenze </li></ul>
  21. Questions & Answers
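Slide 3 says the RDFS layer of the stack supports inferences from classes, subclasses, properties, and sub-properties. The Python sketch below shows what that means in practice. It uses plain tuples and assumes no RDF library. It computes a fixed point over four RDFS entailment rules from the W3C RDF Semantics:

- rdfs5 and rdfs11: rdfs:subPropertyOf and rdfs:subClassOf are transitive.
- rdfs7: a statement made with a sub-property also holds for its super-property.
- rdfs9: an instance of a subclass is also an instance of the superclass.

All `ex:` names are hypothetical. In practice one would use an existing reasoner, such as those in Jena or Sesame (slide 6).

```python
# Triples are (subject, predicate, object) tuples of prefixed names;
# the ex: terms are hypothetical stand-ins for full URIs.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"

Triple = tuple[str, str, str]


def rdfs_closure(graph: set[Triple]) -> set[Triple]:
    """Apply RDFS rules rdfs5, rdfs7, rdfs9 and rdfs11 until nothing new is inferred."""
    inferred = set(graph)
    while True:
        new: set[Triple] = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # rdfs5 / rdfs11: subPropertyOf and subClassOf are transitive
                if p in (SUBCLASS, SUBPROP) and p2 == p and s2 == o:
                    new.add((s, p, o2))
                # rdfs9: x rdf:type C, C subClassOf D  =>  x rdf:type D
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
                # rdfs7: x P y, P subPropertyOf Q  =>  x Q y
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new


graph = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:CreativeWork"),
    ("ex:item1", RDF_TYPE, "ex:Novel"),
    ("ex:illustrator", SUBPROP, "ex:contributor"),
    ("ex:item1", "ex:illustrator", "ex:person1"),
}
closure = rdfs_closure(graph)
for triple in sorted(closure - graph):
    print(triple)
```

The naive double loop rescans the whole graph on every pass. It illustrates what the rules derive, not how a production reasoner computes them efficiently.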
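Slides 5 and 10 describe expressing thesauri, and LCSH in particular, in SKOS. The sketch below shows, under simplifying assumptions, how a single heading could be written as a `skos:Concept` with preferred and alternate labels and a broader term. It emits N-Triples rather than the RDF/XML that LC used, only because N-Triples puts one statement on each line. The SKOS and RDF namespace URIs are the standard W3C ones. The `example.org` concept URIs and labels are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str, lang: str) -> str:
    """Quote text as an N-Triples literal with a language tag, escaping specials."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def skos_concept(uri: str, pref_label: str, alt_labels: list[str],
                 broader: list[str], lang: str = "en") -> list[str]:
    """Describe one thesaurus heading as a skos:Concept, one N-Triples line per statement."""
    s = f"<{uri}>"
    lines = [
        f"{s} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{s} <{SKOS}prefLabel> {literal(pref_label, lang)} .",
    ]
    lines += [f"{s} <{SKOS}altLabel> {literal(a, lang)} ." for a in alt_labels]
    lines += [f"{s} <{SKOS}broader> <{b}> ." for b in broader]
    return lines


# Hypothetical example.org URIs; real LCSH concepts carry their own LC-assigned URIs.
for line in skos_concept("http://example.org/subjects/semantic-web", "Semantic Web",
                         ["Web, Semantic"], ["http://example.org/subjects/world-wide-web"]):
    print(line)
```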
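Slide 7 lists converting MARC data into RDF triples, with URIs assigned according to the Linked Data principles. The sketch below illustrates that idea under two assumptions: the record has already been parsed into a Python dict, and a tiny hypothetical crosswalk stands in for the real one. The crosswalk is loosely in the spirit of LC's MARC-to-Dublin Core crosswalk (100 to creator, 245 to title, 650 to subject). The base URI, the sample record, and the punctuation handling are all illustrative. A real conversion would use a full MARC parser and the complete crosswalk.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical, heavily simplified crosswalk (MARC tag -> subfield, DC term).
CROSSWALK = {
    "100": ("a", "creator"),
    "245": ("a", "title"),
    "650": ("a", "subject"),
}

# A MARC record reduced to {tag: [{subfield code: value}, ...]}.
MarcRecord = dict[str, list[dict[str, str]]]


def marc_to_dc(record_id: str, record: MarcRecord,
               base: str = "http://example.org/bib/") -> list[tuple[str, str, str]]:
    """Map crosswalked MARC fields to (record URI, DC term URI, literal) triples."""
    subject = base + record_id  # one HTTP URI per record (hypothetical base)
    triples = []
    for tag, fields in record.items():
        if tag not in CROSSWALK:
            continue  # fields outside the toy crosswalk are ignored
        code, term = CROSSWALK[tag]
        for field in fields:
            # crude removal of trailing ISBD punctuation such as " /" or "."
            value = field.get(code, "").rstrip(" /:;,.")
            if value:
                triples.append((subject, DCTERMS + term, value))
    return triples


record = {
    "100": [{"a": "Doe, Jane,"}],
    "245": [{"a": "An example title /", "c": "Jane Doe."}],
    "650": [{"a": "Semantic Web."}, {"a": "Libraries."}],
    "020": [{"a": "0000000000"}],
}
for t in marc_to_dc("rec001", record):
    print(t)
```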
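Slide 10 mentions the LCCN Permalink project. Permalinks in that service take the form `http://lccn.loc.gov/` followed by a normalized LCCN. The sketch below follows our reading of LC's published LCCN normalization steps:

1. Remove all blanks.
2. Drop a forward slash and everything after it.
3. If a hyphen is present, remove it and left-pad the digits after it with zeros to six places.

The function names are our own. LC's documentation remains the authority on the rules.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN string following LC's published normalization steps."""
    s = raw.replace(" ", "")        # 1. remove all blanks
    s = s.split("/", 1)[0]          # 2. drop a forward slash and everything after it
    if "-" in s:                    # 3. remove the hyphen and zero-pad what follows
        prefix, serial = s.split("-", 1)
        if not (serial.isdigit() and len(serial) <= 6):
            raise ValueError(f"unexpected serial number in LCCN {raw!r}")
        s = prefix + serial.zfill(6)
    return s


def lccn_permalink(raw: str) -> str:
    """Build an LCCN Permalink URL from a raw LCCN (function name is hypothetical)."""
    return "http://lccn.loc.gov/" + normalize_lccn(raw)


print(lccn_permalink("n78-89035"))
```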
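Slide 11 mentions OAI-PMH retrieval from VIAF. The sketch below shows the two halves of an OAI-PMH 2.0 harvest. The first builds a `ListRecords` request. The second reads one response page, including the `resumptionToken` that signals more records are available. `oai_dc` is the default here because OAI-PMH requires every repository to support it. The base URL and sample response are hypothetical, not VIAF's actual endpoint or data, and the code makes no network call.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: str | None = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument, so it replaces metadataPrefix
    when continuing an incomplete list.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> tuple[list[str], str | None]:
    """Return the record identifiers on one response page and the next token, if any."""
    root = ET.fromstring(xml_text)
    ids = [el.text or "" for el in
           root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None  # an empty or missing token marks the end of the list


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
print(list_records_url("http://example.org/oai"))
print(parse_page(sample))
```

A harvester would loop, passing each returned token back through `list_records_url` until `parse_page` returns `None`.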
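Slide 13 credits OCLC with FRBR work-set algorithms. OCLC's actual algorithm is considerably more involved: it draws on authority data and uniform titles and handles many special cases. The following is therefore only a simplified illustration of the core idea. Records whose normalized author and title produce the same key are grouped into one candidate work set. The normalization rules and sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

LEADING_ARTICLES = ("the ", "a ", "an ")  # English only, for illustration


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", " ", no_marks.lower()).split())


def work_key(author: str, title: str) -> str:
    """Build an author/title key, ignoring an initial English article in the title."""
    t = normalize(title)
    for article in LEADING_ARTICLES:
        if t.startswith(article):
            t = t[len(article):]
            break
    return f"{normalize(author)}/{t}"


def group_work_sets(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group record IDs that share an author/title key into candidate work sets."""
    sets: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        sets[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(sets)


records = [
    {"id": "r1", "author": "Homer.", "title": "The Odyssey."},
    {"id": "r2", "author": "Homer", "title": "Odyssey"},
    {"id": "r3", "author": "Homer", "title": "The Iliad"},
]
print(group_work_sets(records))
```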
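Slide 15 lists SIMILE's Longwell as a faceted browser for RDF data. The sketch below is not Longwell's implementation. Under simple assumptions, it only illustrates the mechanics of faceted browsing: triples are grouped per subject, value counts are computed for a facet property, and the selected facet values are applied as filters. The `ex:` subjects and `dc:` property names are illustrative.

```python
from collections import Counter

# A resource flattened from RDF: {property: [values]}.
Resource = dict[str, list[str]]


def group_by_subject(triples: list[tuple[str, str, str]]) -> dict[str, Resource]:
    """Collect (s, p, o) triples into one property/value map per subject."""
    resources: dict[str, Resource] = {}
    for s, p, o in triples:
        resources.setdefault(s, {}).setdefault(p, []).append(o)
    return resources


def facet_counts(resources: list[Resource], facet: str) -> Counter:
    """Count how many resources carry each value of one facet property."""
    counts: Counter = Counter()
    for res in resources:
        counts.update(set(res.get(facet, [])))  # each value counted once per resource
    return counts


def apply_filters(resources: list[Resource], selected: dict[str, str]) -> list[Resource]:
    """Keep only resources that match every selected (facet, value) pair."""
    return [r for r in resources
            if all(v in r.get(f, []) for f, v in selected.items())]


triples = [
    ("ex:b1", "dc:type", "Text"), ("ex:b1", "dc:language", "en"),
    ("ex:b2", "dc:type", "Image"),
    ("ex:b3", "dc:type", "Text"), ("ex:b3", "dc:language", "fr"),
]
items = list(group_by_subject(triples).values())
print(facet_counts(items, "dc:type"))
print(len(apply_filters(items, {"dc:type": "Text", "dc:language": "fr"})))
```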
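Slides 7 and 16 mention SPARQL endpoints, such as the one Mulgara exposes for Fedora. The sketch below does two things. It prepares a query according to the SPARQL Protocol: an HTTP GET with the query text in the `query` parameter, requesting the `application/sparql-results+json` format. It then flattens a JSON result set into simple rows. The endpoint URL and sample results are hypothetical. Nothing is sent over the network, and no particular Mulgara endpoint configuration is assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL Protocol query via HTTP GET, asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results to {variable: value} rows; unbound variables are omitted."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in data["results"]["bindings"]]


req = sparql_request("http://example.org/sparql",
                     "SELECT ?s ?label WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }")
print(req.full_url)

sample = """{"head": {"vars": ["s", "label"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/a"},
    "label": {"type": "literal", "value": "A"}},
   {"s": {"type": "uri", "value": "http://example.org/b"}}]}}"""
print(bindings_to_rows(sample))
```

Passing the prepared `Request` to `urllib.request.urlopen` would actually run the query against a live endpoint.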