Forging New Links: Libraries in the Semantic Web
Usage Rights

© All Rights Reserved

  • I’m going to talk generally about the semantic web – what it is, what it can do – then Gillian will talk specifically about the potential of the semantic web to solve some of the challenges faced by libraries.
  • Why do we need a new web at all? Let’s review some of the things that our current search engines don’t do well.
  • Current web search engines operate on string matching. They have no way to extract meaning from unstructured masses of textual data. http://semanticweb.com/happy-birthday-wikipedia-but-dbpedia-has-reason-to-celebrate-too_b17320
  • When the web is structured more like a database, computers will be able to do a lot more filtering, grouping, and reasoning. The first thing that we need to do is to stop publishing big unstructured blurbs of HTML information and provide better metadata. (@udcmrk is Martin Kalfatovic.)
  • One element in the FOAF ontology is “workplaceHomepage”. Like all RDF data, this element has its own unique URI, which you see in pink at the bottom of the screen.
  • When I enter this URI in a browser, I get information back about the object. http://xmlns.com/foaf/spec/#term_workInfoHomepage
  • Reasoning is one very powerful aspect of the semantic web. The ability to reason allows computers to infer new information from explicit statements. It’s a complex concept, so the easiest way to describe it is to give you a few examples of computer-based reasoning.
  • http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library
  • Firefox plugin, good for viewing RDF and OWL files: http://dig.csail.mit.edu/2007/tab/ Using Tabulator to view the Family Ontology: http://protege.cim3.net/file/pub/ontologies/family.swrl.owl/family.swrl.owl
  • We’ve talked about how RDF allows us to create structured data, and how ontologies provide controlled vocabularies that can be shared. The last step is to link all of that data together in as many ways as possible.
  • The first step towards the semantic web vision as articulated by the W3C is for organizations to publish their data as RDF, using shared vocabularies and ontologies.
  • The second step is to establish links between the data exposed by different organizations. In order for linked data to become a web-scale discovery solution it is really important to link your own RDF data with other people’s RDF data.
  • One of the challenges is finding relevant RDF links from many different sources. You especially want to be connected with major linking hubs.
    Silk: http://www4.wiwiss.fu-berlin.de/bizer/silk/ Discovering links between data items across data sets requires record linkage and duplicate detection techniques (e.g. Jaro-Winkler). Interlinking DBpedia movies with LinkedMDB directors: Silk was fed with the 50000 movies from DBpedia and 2500 directors from LinkedMDB. Silk was configured to set a dbpedia:director link from the movie to its director. Identifying duplicate person descriptions in a data stream: owl:sameAs links for URIs which effectively identify the same entity. http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/JentzschIseleBizer-Silk-Poster-ISWC2010.pdf
    LinQuer: LinQuer is a tool for semantic link discovery over relational data, based on string and semantic matching techniques and their combinations. Discovering links between different entities in data sources is a challenging task and an attractive research area. Existence of links add value to data sources, enhance data access and information discovery, and allow or enhance many increasingly important data mining tasks. When data sources are not linked, they resemble islands of data (or data silos), where each island maintains only part of the data necessary to satisfy a user's information needs. Penetrating these silos to both understand their contents and understand potential semantic connections is a daunting task. What users and data publishers need is automated support for creating referential links between data that reside in different sources and that are semantically related. Finding such links often requires the use of approximate matching (to overcome syntactic representational differences and errors) and semantic matching (to find specific semantic relationships). Furthermore, both types of matching must be tightly integrated to accommodate for the tremendous heterogeneity found in the data that reside in today's information systems. http://dblab.cs.toronto.edu/project/linquer/
    OKKAM: http://www.okkam.org/okkam-more "Entities should not be multiplied beyond necessity" [Ockham's razor, XIV century]. "Entity identifiers should not be multiplied beyond necessity" [OKKAM's razor, XXI century]. OKKAM will contribute to this vision by supporting the convergence towards the use of a single and globally unique identifier for any entity which is named on the Web. Therefore, OKKAM will make available to content creators, editors and developers a global infrastructure and a collection of new tools and plugins which support them to easily find public identifiers for the entities named in their contents/services. The ENS will be a distributed service which permanently stores identifiers for entities and provides a collection of core services (e.g. entity matching, ID mapping and resolution) needed to support their pervasive reuse; provide a general service for entity-level integration of virtually any type of data and service into the global Web of Entities.
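The notes above name Jaro-Winkler as one record-linkage technique. As a minimal sketch of the standard metric (not Silk's own implementation, and with the function names chosen here for illustration), two labels can be scored like this:

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are this close in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


score = jaro_winkler("MARTHA", "MARHTA")  # a common textbook pair
print(round(score, 4))
```

A linking tool would typically compare such a score against a threshold before proposing a link such as owl:sameAs; the right threshold depends on the data and is not specified in the talk.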
  • This is a graphical representation of the linked data cloud. Every circle represents the RDF data set that has been exposed by a specific organization. The lines show how those datasets have been linked together. Some of the circles have a lot of inbound and outbound links, and we refer to those nodes as linking hubs.
  • Some linking hubs represent entities from a particular knowledge domain: information about music, or protein sequencing, for example. Some linking hubs, like Freebase or DBpedia, are more general, and contain RDF representations covering a lot of different subjects. DBpedia is an interesting example, because it harvests the Wikipedia database and converts it into RDF.
  • One of the problems that we have at the moment is that masses of unstructured text already exist on the internet, and we need ways to insert RDF links into that existing data.
  • DBpedia Spotlight is an example of a web service that performs semantic annotation of unstructured text. You can see this on the web by simply pasting a paragraph into the textbox on the Spotlight demo page. http://spotlight.dbpedia.org/demo/index.xhtml By connecting text documents with DBpedia, our system enables a range of interesting use cases. For instance, the ontology can be used as background knowledge to display complementary information on web pages or to enhance information retrieval tasks. Moreover, faceted browsing over documents and customization of web feeds based on semantics become feasible. Finally, by following links from DBpedia into other data sources, the Linked Open Data cloud is pulled closer to the Web of Documents.
  • When you click the annotate button, DBpedia’s processing engine identifies concepts and entities within this text blurb, and suggests links to the RDF descriptions of those objects within DBpedia. [One of the ways DBpedia Spotlight aims at flexibility is by letting users determine what degree of precision makes the most sense for the application to which they would like to apply its semantic annotation. The current version of DBpedia Spotlight was built from a DBpedia3.6+Wikipedia dump from Oct. 2010, and users can configure the confidence value for returning annotations about content entities. Setting it higher may result in fewer annotations but the ones returned are more likely to be correct, while a lower confidence value will try to get you as many annotations as possible but the likelihood of mistakes grows. http://semanticweb.com/the-spotlight%E2%80%99s-on-dbpedia_b17942]
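The precision trade-off described in that note can be illustrated with a small filter. This is only a sketch: the candidate list and its scores are made up, and it does not show how Spotlight itself computes or exposes confidence values.

```python
# Made-up candidate annotations: (text in the document, suggested URI, confidence).
# The scores are invented purely to show the effect of the threshold.
candidates = [
    ("Apple", "http://dbpedia.org/resource/Apple_Inc.", 0.9),
    ("Apple", "http://dbpedia.org/resource/Apple", 0.35),
    ("Siri", "http://dbpedia.org/resource/Siri", 0.6),
]


def annotate(candidates, confidence):
    """Keep only the suggestions whose score reaches the chosen confidence."""
    return [(text, uri) for (text, uri, score) in candidates if score >= confidence]


strict = annotate(candidates, 0.8)  # fewer annotations, more likely to be correct
loose = annotate(candidates, 0.3)   # more annotations, more likely to include mistakes
print(len(strict), len(loose))
```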
  • If we click through to the DBpedia page for Apple Corporation, we can see DBpedia’s highly structured data relating to the company, and all of the RDF links to related entities. If you link the word Apple in your text to this extended information, then semantically aware tools can use all of this data to search and reason. http://dbpedia.org/page/Apple_Inc. “Connecting your text to DBpedia enables this use case of more semantic processing or browsing of your text,” says Mendes. http://semanticweb.com/the-spotlight%E2%80%99s-on-dbpedia_b17942
  • A lot of this stuff may seem intimidating, but you are not expected to know how to write RDF in Notepad, in the same way that you don’t need to know HTML in order to publish a blog post. Lots of semantic authoring tools exist that allow you to produce RDF as you publish new information.
  • Drupal is a content management system that many libraries already use to publish their websites. The latest version of Drupal has RDF publishing tools built right into the core. As you create your website and add new content, RDF data will automatically be added to your pages, even if you aren’t aware that this is happening. http://drupal.org/
  • Uses forms to collect information so that data is structured as it is entered. http://semantic-mediawiki.org/wiki/Semantic_MediaWiki http://sandbox.semantic-mediawiki.org/wiki/Special:ExportRDF http://smwdemo.ontoprise.com/index.php/User:Lisagoddard
  • http://www.zemanta.com/ http://lisagoddard.blogspot.com Supports RDF output, links to Linking Open Data entities, and has a properly defined namespace. Zemanta suggests appropriate in-text links, so if you type a name, for example, it will suggest a Wikipedia page, a blog, or an online portfolio for that person. Optimized for user-generated content: other semantic APIs are built to manage only well-formatted documents and texts. Zemanta is built with the fluid nature of today's Web in mind and will not fail to extract the meaning even in the most dubious of situations. Implicit disambiguation means that it never confuses Apple for apples. We achieve this by comparing numerous meanings of each extracted term and acting based on that evaluation.
  • We’ve been hearing about semantic technologies for a long time now, and a lot of people think that linked data is just a lot of blue sky thinking that has no support on the current web.
  • http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/ The data shows that the usage of RDFa has increased 510% between March, 2009 and October, 2010, from 0.6% of webpages to 3.6% of webpages (or 430 million webpages in our sample of 12 billion). This is largely thanks to the efforts of the folks at Yahoo! (SearchMonkey), Google (Rich Snippets) and Facebook (Open Graph), all of whom recommend the usage of RDFa.
  • Many of the major technology companies have already invested in Semantic Tech. Facebook = OpenGraph Protocol. Twitter = Twitter Annotations. Cisco inked deal with SW company DERI. Apple bought Siri personal assistant app. Google acquired Metaweb & Freebase, supports RDFa in Rich Snippets. Microsoft bought PowerSet in 2008 to integrate with Bing. In 2010 they licensed semantic technologies from Cognition.
  • Facets appear down the left side that allow you to refine your search.
  • Now that you have command over some of the basic semantic web concepts, Gillian is going to talk about linked data specifically in libraries.

Forging New Links: Libraries in the Semantic Web (Presentation Transcript)

  • Forging New Links: Libraries in the Semantic Web. Lisa Goddard & Gillian Byrne, Memorial University Libraries. Computers in Libraries, Washington D.C., March 23rd, 2011
  • The Gist. General Semantic Web (Lisa): How it works. A few tools. Who’s involved? Libraries & Linked Data (Gillian): What it solves. Issues & obstacles. Where we are now.
  • Web Search Problems
  • High Recall, Low Precision
  • Vocabulary Dependent
  • Returns Single Web Pages
  • Access to Deep Web
  • Identity
  • Comparisons: Academic Staff Member (University College London), Faculty Member (McGill). Equivalent to?
  • Complex Queries Find all soccer players, who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who are born in a country with more than 10 million inhabitants.
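A query like this one only becomes answerable when the facts are held as structured records that can be joined. As a rough sketch, with every name and number below invented as a placeholder rather than taken from real data, the same filter can be expressed over three small Python tables:

```python
# Hypothetical records: none of these players, clubs, countries or figures are real.
players = [
    {"name": "Player A", "position": "goalkeeper", "club": "Club X", "born_in": "Country P"},
    {"name": "Player B", "position": "goalkeeper", "club": "Club Y", "born_in": "Country P"},
    {"name": "Player C", "position": "striker", "club": "Club X", "born_in": "Country Q"},
    {"name": "Player D", "position": "goalkeeper", "club": "Club X", "born_in": "Country Q"},
]
stadium_seats = {"Club X": 55000, "Club Y": 20000}
population = {"Country P": 60000000, "Country Q": 5000000}


def goalkeepers_matching(min_seats=40000, min_population=10000000):
    """Join the three tables and apply every condition from the slide's query."""
    return [
        p["name"]
        for p in players
        if p["position"] == "goalkeeper"
        and stadium_seats[p["club"]] > min_seats
        and population[p["born_in"]] > min_population
    ]


result = goalkeepers_matching()
print(result)
```

A string-matching search engine has no equivalent of the two lookups in the middle of that filter, which is the gap the slide is pointing at.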
  • Structured Databases
  • The Semantic Solution 1. Structured data 2. Controlled vocabularies 3. Linking
  • Machine-Actionable Data “Our top two users are computers.” - Martin Kalfatovic, Smithsonian
  • Structured Data: RDF. Data model for writing simple statements about web objects. RDF statements are written as “triples”. Statement: Subject (Shakespeare), Predicate (Wrote), Object (Macbeth).
  • RDF Triples
    Subject       | Predicate | Object
    Shakespeare   | Wrote     | King Lear
    Shakespeare   | Wrote     | Macbeth
    Anne Hathaway | Married   | Shakespeare
    Shakespeare   | Lived in  | Stratford
    Stratford     | Is in     | England
    Macbeth       | Set in    | Scotland
    England       | Part of   | UK
    Scotland      | Part of   | UK
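To make the triple model concrete, here is a minimal sketch that stores the eight statements from this slide as plain Python tuples and answers simple pattern queries. The `match` helper is an illustrative name, not part of any RDF library.

```python
# The eight statements from the slide, as (subject, predicate, object) tuples.
triples = [
    ("Shakespeare", "Wrote", "King Lear"),
    ("Shakespeare", "Wrote", "Macbeth"),
    ("Anne Hathaway", "Married", "Shakespeare"),
    ("Shakespeare", "Lived in", "Stratford"),
    ("Stratford", "Is in", "England"),
    ("Macbeth", "Set in", "Scotland"),
    ("England", "Part of", "UK"),
    ("Scotland", "Part of", "UK"),
]


def match(s=None, p=None, o=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


works = [o for (_, _, o) in match(s="Shakespeare", p="Wrote")]
parts_of_uk = [s for (s, _, _) in match(p="Part of", o="UK")]
print(works, parts_of_uk)
```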
  • RDF Graph: A Semantic Net. Nodes: Anne Hathaway, Shakespeare, Stratford, UK, Macbeth, King Lear, Scotland, England. Edge labels: wrote, isIn, setIn.
  • Unique Identifiers: URIs. Shakespeare: http://www.mun.ca/project#shakespeare Macbeth: http://www.mun.ca/project#macbeth wrote: http://www.mun.ca/project#wrote URIs should resolve.
  • RDF Description: FOAF
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person rdf:ID="me">
        <foaf:name>Lisa Goddard</foaf:name>
        <foaf:title>Ms.</foaf:title>
        <foaf:givenname>Lisa</foaf:givenname>
        <foaf:family_name>Goddard</foaf:family_name>
        <foaf:homepage rdf:resource="http://twitter.com/lisagoddard"/>
        <foaf:workplaceHomepage rdf:resource="http://www.library.mun.ca"/>
        <foaf:schoolHomepage rdf:resource="http://www.queensu.ca"/>
        <foaf:knows>
          <foaf:Person>
            <foaf:name>Gillian Byrne</foaf:name>
            <foaf:mbox rdf:resource="mailto:gbyrne@mun.ca"/>
          </foaf:Person>
        </foaf:knows>
      </foaf:Person>
    </rdf:RDF>
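Because the FOAF description is ordinary XML, even a generic XML parser can pull structured facts out of it. The sketch below uses Python's standard library on a shortened copy of the slide's document; it reads the XML directly rather than interpreting it as an RDF graph, which a dedicated RDF library would normally do.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Shortened copy of the FOAF description from the slide.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Lisa Goddard</foaf:name>
    <foaf:workplaceHomepage rdf:resource="http://www.library.mun.ca"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Gillian Byrne</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

root = ET.fromstring(doc)
# ElementTree addresses namespaced elements as "{namespace-uri}localname".
me = root.find(f"{{{FOAF}}}Person")
name = me.find(f"{{{FOAF}}}name").text
workplace = me.find(f"{{{FOAF}}}workplaceHomepage").get(f"{{{RDF}}}resource")
knows = [
    person.find(f"{{{FOAF}}}name").text
    for person in me.findall(f"{{{FOAF}}}knows/{{{FOAF}}}Person")
]
print(name, workplace, knows)
```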
  • Resolvable URIs
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    <foaf:Person rdf:ID="me">
      <foaf:name>Lisa Goddard</foaf:name>
      <foaf:workplaceHomepage rdf:resource="http://www.library.mun.ca"/>
    </foaf:Person>
    http://xmlns.com/foaf/0.1/#term_workplaceHomepage
  • Resolvable URIs
  • Ontologies: An ontology describes a particular domain of knowledge (e.g. bikes, whiskey). • Establishes controlled vocabulary. • Models relationships between entities & concepts. • Built-in rules and datatypes that support reasoning.
  • Controlled Vocabulary: Terms and definitions are posted online, so they can be shared by many different organizations. http://www.mun.ca/lit.owl #wrote #setIn #play #book #poem #narrated http://www.mun.ca/lit.owl#wrote http://www.mun.ca/lit.owl#book http://www.mun.ca/lit.owl#play
  • Shared Ontologies: Namespaces allow us to combine several vocabularies while maintaining distinct meaning of each element. #person #partOf http://gmu.edu/bio.owl http://mit.edu/geo.rdf http://mun.ca/lit.owl #wrote #setIn #married #play #book #poem #narrated #isIn #country #city #region #birthdate #deathdate
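One way to picture how namespaces keep vocabularies distinct is to expand a short prefixed name into its full URI. In this sketch the three vocabulary addresses come from the slide, while the prefix labels and the pairing of each prefix with an address are assumptions made for illustration.

```python
# Vocabulary addresses from the slide; the prefix labels are assumed.
PREFIXES = {
    "lit": "http://mun.ca/lit.owl#",
    "bio": "http://gmu.edu/bio.owl#",
    "geo": "http://mit.edu/geo.rdf#",
}


def expand(prefixed_name):
    """Turn a name such as 'lit:wrote' into the full URI it stands for."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


wrote = expand("lit:wrote")
# The same local name under two prefixes yields two different identifiers,
# so the vocabularies can be mixed without their terms colliding.
distinct = expand("lit:person") != expand("bio:person")
print(wrote, distinct)
```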
  • Ontologies & Reasoning
  • Reasoning: Inverse. Axiom: lit:wrote owl:inverseOf lit:writtenBy. Explicit Fact: bio:Shakespeare lit:wrote lit:Macbeth. Implicit Fact: lit:Macbeth lit:writtenBy bio:Shakespeare.
  • Reasoning: Symmetrical. Axiom: bio:married rdf:type owl:SymmetricProperty. Explicit Fact: bio:AnneHathaway bio:married bio:Shakespeare. Implicit Fact: bio:Shakespeare bio:married bio:AnneHathaway.
  • Reasoning: Equivalent. lit:WilliamShakespeare owl:sameAs bio:Shakespeare. bio:Shakespeare lit:wrote lit:Macbeth. lit:WilliamShakespeare lit:wrote lit:Macbeth. (Explicit Facts, Axiom, Implicit Facts)
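The three reasoning slides can be reproduced with a very small rule engine. This is a toy sketch that handles only owl:inverseOf, owl:SymmetricProperty and owl:sameAs over string triples; a real OWL reasoner covers far more, and the `infer` function name is an illustrative choice.

```python
def infer(triples):
    """Apply the three rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        inverses = {(s, o) for (s, p, o) in facts if p == "owl:inverseOf"}
        inverses |= {(b, a) for (a, b) in inverses}  # inverseOf works both ways
        symmetric = {s for (s, p, o) in facts
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        same = {(s, o) for (s, p, o) in facts if p == "owl:sameAs"}
        same |= {(b, a) for (a, b) in same}  # sameAs works both ways

        new = set()
        for (s, p, o) in facts:
            for (p1, p2) in inverses:
                if p == p1:
                    new.add((o, p2, s))  # flip the statement, swap the property
            if p in symmetric:
                new.add((o, p, s))  # flip the statement, keep the property
            for (a, b) in same:
                if s == a:
                    new.add((b, p, o))  # substitute the equivalent subject
                if o == a:
                    new.add((s, p, b))  # substitute the equivalent object
        if new <= facts:
            return facts
        facts |= new


explicit = [
    ("lit:wrote", "owl:inverseOf", "lit:writtenBy"),
    ("bio:married", "rdf:type", "owl:SymmetricProperty"),
    ("lit:WilliamShakespeare", "owl:sameAs", "bio:Shakespeare"),
    ("bio:Shakespeare", "lit:wrote", "lit:Macbeth"),
    ("bio:AnneHathaway", "bio:married", "bio:Shakespeare"),
]
closure = infer(explicit)
print(("lit:Macbeth", "lit:writtenBy", "bio:Shakespeare") in closure)
```

The loop stops once a pass adds nothing new, so the result contains the explicit statements plus everything the three rules imply, including the implicit facts shown on the slides.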
  • Finding Ontologies
  • RDF Browser
  • Linking Distributed Data
  • Linking Data
  • Linking Data
  • Linking Entities Directors Movies dbpedia:director foaf:made
  • Linking Hubs
  • Semantic Linking Hubs
  • Natural Language Processing Identify people, places, things, concepts in unstructured text files. Disambiguate terms. Suggest links to existing entities. ≠
  • RDF Publishing Tools
  • Drupal 7 CMS
  • Semantic MediaWiki
  • Semantic Blogging: Zemanta
  • Linked Data Adoption
  • Growth of RDFa Usage of RDFa increased 510% between Mar 2009 and Oct 2010 From: http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/
  • Websites with RDF
  • Semantic Technologies
  • Google Faceted Recipe Search
  • Library Linked Data
  • Disconnected Data
  • Silos in the Library Source: Provincial Archives of Newfoundland & Labrador
  • Weak Links
  • Enhanced Linking http://info.library.mun.ca/uhtbin/cgisirsi/dMp3Asia73/QEII/269560141/18/X100/XAUTHOR/Macpherson,+Cluny,+1879-1966 http://viaf.org/viaf/76097050
  • Lost Content
  • Enhanced Content
  • Missed Opportunities
  • Enhanced personalization. Julie, History 1012, List ID: 12, Newfoundland to confederation
    @prefix aiiso: <http://purl.org/vocab/aiiso/schema#> .
    @prefix resource: <http://purl.org/vocab/resourcelist/schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix bibo: <http://purl.org/ontology/bibo/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
  • What’s our value? • 2.0 thinking: our value is linked to our data • 3.0 thinking: our value is linked to our (re)useable, shareable data?
  • Obstacles
  • Competing Vocabularies ...how many ways to describe a book, journal article or a place? Ian Millard, Hugh Glaser, Manuel Salvadores, Nigel Shadbolt http://eprints.ecs.soton.ac.uk/21681/5/cold2010-slides.pdf
  • Co-referencing 1. http://dbpedia.org/resource/Cluny_MacPherson 2. http://dbpedia.org/resource/Dr._Cluny_MacPherson 3. http://mpii.de/yago/resource/Cluny_MacPherson 4. http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000005e34c1 5. http://viaf.org/viaf/76097050 6. http://umbel.org/umbel/ne/wikipedia/Cluny_MacPherson
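One common way to tame co-reference is to group URIs that are connected by equivalence links such as owl:sameAs. The sketch below uses a union-find structure for that grouping. Four of the URIs come from the slide, but which of them are actually linked to each other is assumed here, and the example.org pair is a made-up unrelated entity.

```python
def cluster(pairs):
    """Group identifiers that are connected, directly or indirectly, by a link."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we walk up
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


# Assumed sameAs links between URIs listed on the slide, plus one
# hypothetical unrelated pair to show that separate entities stay separate.
same_as = [
    ("http://dbpedia.org/resource/Cluny_MacPherson",
     "http://dbpedia.org/resource/Dr._Cluny_MacPherson"),
    ("http://dbpedia.org/resource/Cluny_MacPherson",
     "http://viaf.org/viaf/76097050"),
    ("http://viaf.org/viaf/76097050",
     "http://mpii.de/yago/resource/Cluny_MacPherson"),
    ("http://example.org/a", "http://example.org/b"),
]
clusters = cluster(same_as)
macpherson = next(c for c in clusters if "http://viaf.org/viaf/76097050" in c)
print(len(clusters), len(macpherson))
```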
  • Discovering • Lots (and lots and lots) of linked data out there • How to find it?
  • Querying
  • Trust The largest hurdle to library adoption of Linked Data, though, may not be educational or technological …The sticking point for librarians may be an issue of trust. - Ross Singer, “Linked Data Now!”
  • Preservation: What happens when an ontology or linking hub disappears?
  • Data ownership Library Catalogue Digital Archive Database Repository Ejournal
  • Licensing “You shall not use the data made available through the GC Open Data Portal in any way which, in the opinion of Canada, may bring disrepute to or prejudice the reputation of Canada."
  • VoID :DBpedia a void:Dataset ; dcterms:license <http://www.gnu.org/copyleft/fdl.html> . • schema to describe linked datasets
  • Oh - One more thing… “who’s minding the ranch?”
  • RDA • Works within MARC, but also works as a linked data metadata vocabulary
  • RDF Converters
  • Publishing Tools
  • Where we are now
  • Age of Chaotic Innovation? LIBRIS (Swedish Union Catalog) Library of Congress (LCSH, OSI) German National Library Hungarian National Library British Library Europeana Linked Periodicals Data Virtual International Authority File Dewey Decimal Classification BIBSYS’ authority files Thesaurus for Economics Rameau Swedish Subject Headings German Subject Headings Metadata Authority Description Schema (MADS)
  • The Chaos Tamers • W3C Linked Library Data Incubator Group • IFLA Semantic Web Interest Group • CKAN Linked Library Data Group • LITA/ALCTS Linked Library Data Interest Group
  • Questions? Source: http://www.jenniferbetman.com