Forging New Links: Libraries in the Semantic Web


Published in: Education
  • I’m going to talk generally about the semantic web – what it is, what it can do – then Gillian will talk specifically about the potential of the semantic web to solve some of the challenges faced by libraries.
  • Why do we need a new web at all? Let’s review some of the things that our current search engines don’t do well.
  • Current web search engines operate on string matching. They have no way to extract meaning from unstructured masses of textual data.
  • When the web is structured more like a database, computers will be able to do a lot more filtering, grouping, and reasoning. The first thing that we need to do is to stop publishing big unstructured blurbs of HTML information, and provide better metadata. (@udcmrk is Martin Kalfatovic.)
  • One element in the FOAF ontology is “workplaceHomepage”. Like all RDF data, this element has its own unique URI, which you see in pink at the bottom of the screen.
  • When I enter this URI in a browser, I get information back about the object.
  • Reasoning is one very powerful aspect of the semantic web. The ability to reason allows computers to infer new information from explicit statements. It’s a complex concept, so the easiest way to describe it is to give you a few examples of computer-based reasoning.
  • Firefox plugin: good for viewing RDF and OWL files. Here Tabulator is used to view the Family Ontology.
  • We’ve talked about how RDF allows us to create structured data, and how ontologies provide controlled vocabularies that can be shared. The last step is to link all of that data together in as many ways as possible.
  • The first step towards the semantic web vision as articulated by the W3C is for organizations to publish their data as RDF, using shared vocabularies and ontologies.
  • The second step is to establish links between the data exposed by different organizations. In order for linked data to become a web-scale discovery solution it is really important to link your own RDF data with other people’s RDF data.
  • One of the challenges is finding relevant RDF links from many different sources. You especially want to be connected with major linking hubs. Creating links between data items across data sets requires record linkage and duplicate detection techniques (e.g. Jaro-Winkler).
    Silk: one example is interlinking DBpedia movies with LinkedMDB directors. Silk was fed with the 50,000 movies from DBpedia and 2,500 directors from LinkedMDB, and was configured to set a dbpedia:director link from each movie to its director. Another example is identifying duplicate person descriptions in a data stream, using owl:sameAs links for URIs which effectively identify the same entity.
    LinQuer: a tool for semantic link discovery over relational data, based on string and semantic matching techniques and their combinations. Discovering links between different entities in data sources is a challenging task and an attractive research area. The existence of links adds value to data sources, enhances data access and information discovery, and allows or enhances many increasingly important data mining tasks. When data sources are not linked, they resemble islands of data (or data silos), where each island maintains only part of the data necessary to satisfy a user's information needs. Penetrating these silos to understand both their contents and their potential semantic connections is a daunting task. What users and data publishers need is automated support for creating referential links between data that reside in different sources and that are semantically related. Finding such links often requires the use of approximate matching (to overcome syntactic representational differences and errors) and semantic matching (to find specific semantic relationships). Furthermore, both types of matching must be tightly integrated to accommodate the tremendous heterogeneity found in the data that reside in today's information systems.
    OKKAM: "Entities should not be multiplied beyond necessity" [Ockham's razor, XIV century]. "Entity identifiers should not be multiplied beyond necessity" [OKKAM's razor, XXI century]. OKKAM will contribute to this vision by supporting the convergence towards the use of a single and globally unique identifier for any entity which is named on the Web. To that end, OKKAM will make available to content creators, editors and developers a global infrastructure and a collection of new tools and plugins which help them to easily find public identifiers for the entities named in their contents and services. The ENS will be a distributed service which permanently stores identifiers for entities and provides a collection of core services (e.g. entity matching, ID mapping and resolution) needed to support their pervasive reuse, and will provide a general service for entity-level integration of virtually any type of data and service into the global Web of Entities.
  • This is a graphical representation of the linked data cloud. Every circle represents an RDF data set that has been exposed by a specific organization. The lines show how those data sets have been linked together. Some of the circles have a lot of inbound and outbound links, and we refer to those nodes as linking hubs.
  • Some linking hubs represent entities from a particular knowledge domain: information about music, or protein sequencing, for example. Some linking hubs, like Freebase or DBpedia, are more general, and contain RDF representations covering a lot of different subjects. DBpedia is an interesting example, because it harvests the Wikipedia database and converts it into RDF.
  • One of the problems that we have at the moment is that masses of unstructured text already exist on the internet, and we need ways to insert RDF links into that existing data.
  • DBpedia Spotlight is an example of a web service that performs semantic annotation of unstructured text. You can see this on the web by simply pasting a paragraph into the textbox on the Spotlight demo page. By connecting text documents with DBpedia, the system enables a range of interesting use cases. For instance, the ontology can be used as background knowledge to display complementary information on web pages or to enhance information retrieval tasks. Moreover, faceted browsing over documents and customization of web feeds based on semantics become feasible. Finally, by following links from DBpedia into other data sources, the Linked Open Data cloud is pulled closer to the Web of Documents.
  • When you click the annotate button, DBpedia’s processing engine identifies concepts and entities within this text blurb, and suggests links to the RDF descriptions of those objects within DBpedia. [One of the ways DBpedia Spotlight aims at flexibility is by letting users determine what degree of precision makes the most sense for the application to which they would like to apply its semantic annotation. The current version of DBpedia Spotlight was built from a DBpedia 3.6 + Wikipedia dump from Oct. 2010, and users can configure the confidence value for returning annotations about content entities. Setting it higher may result in fewer annotations, but the ones returned are more likely to be correct; a lower confidence value will return as many annotations as possible, but the likelihood of mistakes grows.]
  • If we click through to the DBpedia page for Apple Corporation, we can see DBpedia’s highly structured data relating to the company, and all of the RDF links to related entities. If you link the word Apple in your text to this extended information, then semantically aware tools can use all of this data to search and reason. “Connecting your text to DBpedia enables this use case of more semantic processing or browsing of your text,” says Mendes.
  • A lot of this stuff may seem intimidating, but you are not expected to know how to write RDF in Notepad, in the same way that you don’t need to know HTML in order to publish a blog post. Lots of semantic authoring tools exist that allow you to produce RDF as you publish new information.
  • Drupal is a content management system that many libraries already use to publish their websites. The latest version of Drupal has RDF publishing tools built right into the core. As you create your website and add new content, RDF data will automatically be added to your pages, even if you aren’t aware that this is happening.
  • Semantic MediaWiki uses forms to collect information so that data is structured as it is entered.
  • Zemanta produces RDF output, links to Linking Open Data entities, and has a properly defined namespace. It suggests appropriate in-text links, so if you type a name, for example, it will suggest a Wikipedia page, a blog, or an online portfolio for that person. Zemanta describes itself as optimized for user-generated content: other semantic APIs are built to manage only well-formatted documents and texts, whereas Zemanta is built with the fluid nature of today's Web in mind and will not fail to extract the meaning even in the most dubious of situations. Implicit disambiguation means that it never confuses Apple for apples; this is achieved by comparing numerous meanings of each extracted term and acting based on that evaluation.
  • We’ve been hearing about semantic technologies for a long time now, and a lot of people think that linked data is just a lot of blue sky thinking that has no support on the current web.
  • Data shows that the usage of RDFa has increased 510% between March, 2009 and October, 2010, from 0.6% of webpages to 3.6% of webpages (or 430 million webpages in our sample of 12 billion). This is largely thanks to the efforts of the folks at Yahoo! (SearchMonkey), Google (Rich Snippets) and Facebook (Open Graph), all of whom recommend the usage of RDFa.
  • Many of the major technology companies have already invested in semantic tech. Facebook: Open Graph Protocol. Twitter: Twitter Annotations. Cisco inked a deal with semantic web company DERI. Apple bought the Siri personal assistant app. Google acquired Metaweb and Freebase, and supports RDFa in Rich Snippets. Microsoft bought Powerset in 2008 to integrate with Bing; in 2010 they licensed semantic technologies from Cognition.
  • Facets appear down the left side that allow you to refine your search.
  • Now that you have command over some of the basic semantic web concepts, Gillian is going to talk about linked data specifically in libraries.
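
The link discovery and confidence-threshold ideas in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's `difflib` similarity ratio as a stand-in for Jaro-Winkler, the records and `example.org` URIs are invented, and the helper names (`normalize`, `similarity`, `suggest_links`) are hypothetical. It is not how Silk, LinQuer, or DBpedia Spotlight are implemented.

```python
from difflib import SequenceMatcher

# Hypothetical (URI, label) records from two data sets; the URIs are invented.
DATASET_A = [
    ("http://example.org/a/shakespeare", "William Shakespeare"),
    ("http://example.org/a/hathaway", "Anne Hathaway"),
]
DATASET_B = [
    ("http://example.org/b/p1", "Shakespeare, William"),
    ("http://example.org/b/p2", "Ann Hathaway"),
    ("http://example.org/b/p3", "Christopher Marlowe"),
]


def normalize(label):
    """Lower-case, drop commas, and sort the words so name order does not matter."""
    words = label.lower().replace(",", " ").split()
    return " ".join(sorted(words))


def similarity(a, b):
    """Approximate string similarity in [0, 1] (difflib ratio, not Jaro-Winkler)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_links(left, right, threshold):
    """Propose an owl:sameAs triple for every pair whose labels are similar enough."""
    links = []
    for uri_a, label_a in left:
        for uri_b, label_b in right:
            if similarity(label_a, label_b) >= threshold:
                links.append((uri_a, "owl:sameAs", uri_b))
    return links


# A strict threshold keeps only the exact (reordered) name match.
strict = suggest_links(DATASET_A, DATASET_B, threshold=0.99)
# A looser threshold also accepts the spelling variant "Ann" / "Anne".
loose = suggest_links(DATASET_A, DATASET_B, threshold=0.9)
```

With these sample labels the strict threshold proposes one link and the looser one proposes two, which mirrors the trade-off described for the confidence setting: a higher value returns fewer but safer suggestions, and a lower value returns more suggestions with a greater risk of mistakes.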

    1. 1. Forging New Links: Libraries in the Semantic Web Lisa Goddard & Gillian Byrne Memorial University Libraries Computers in Libraries, Washington D.C. March 23rd, 2011
    2. 2. The Gist General Semantic Web • How it works. • A few tools. • Who’s involved? Lisa Libraries & Linked Data • What it solves. • Issues & obstacles. • Where we are now. Gillian
    3. 3. Web Search Problems
    4. 4. High Recall, Low Precision
    5. 5. Vocabulary Dependent
    6. 6. Returns Single Web Pages
    7. 7. Access to Deep Web
    8. 8. Identity
    9. 9. Comparisons Academic Staff Member (University College London) Faculty Member (McGill) Equivalent to?
    10. 10. Complex Queries Find all soccer players, who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who are born in a country with more than 10 million inhabitants.
    11. 11. Structured Databases
    12. 12. The Semantic Solution 1. Structured data 2. Controlled vocabularies 3. Linking
    13. 13. Machine-Actionable Data “Our top two users are computers.” - Martin Kalfatovic, Smithsonian
    14. 14. Structured Data: RDF Data model for writing simple statements about web objects. RDF statements are written as “triples”. Statement: Subject (Shakespeare), Predicate (Wrote), Object (Macbeth).
    15. 15. RDF Triples
        Subject         Predicate   Object
        Shakespeare     Wrote       King Lear
        Shakespeare     Wrote       Macbeth
        Anne Hathaway   Married     Shakespeare
        Shakespeare     Lived in    Stratford
        Stratford       Is in       England
        Macbeth         Set in      Scotland
        England         Part of     UK
        Scotland        Part of     UK
    16. 16. RDF Graph: A Semantic Net [Graph figure. Nodes: AnneHathaway, Shakespeare, Stratford, UK, Macbeth, KingLear, Scotland, England. Edge labels: wrote, isIn, setIn.]
    17. 17. Unique Identifiers: URIs [Figure: the triple Shakespeare, wrote, Macbeth.] URIs should resolve.
    18. 18. RDF Description: FOAF
        <rdf:RDF xmlns:rdf="" xmlns:rdfs="" xmlns:foaf="">
          <foaf:Person rdf:ID="me">
            <foaf:name>Lisa Goddard</foaf:name>
            <foaf:title>Ms.</foaf:title>
            <foaf:givenname>Lisa</foaf:givenname>
            <foaf:family_name>Goddard</foaf:family_name>
            <foaf:homepage rdf:resource=""/>
            <foaf:workplaceHomepage rdf:resource=""/>
            <foaf:schoolHomepage rdf:resource=""/>
            <foaf:knows>
              <foaf:Person>
                <foaf:name>Gillian Byrne</foaf:name>
                <foaf:mbox rdf:resource=""/>
              </foaf:Person>
            </foaf:knows>
          </foaf:Person>
        </rdf:RDF>
    19. 19. Resolvable URIs
        xmlns:foaf=""
        <foaf:Person rdf:ID="me">
          <foaf:name>Lisa Goddard</foaf:name>
          <foaf:workplaceHomepage rdf:resource=""/>
        </foaf:Person>
    20. 20. Resolvable URIs
    21. 21. Ontologies An ontology describes a particular domain of knowledge (e.g. bikes, whiskey). • Establishes controlled vocabulary. • Models relationships between entities & concepts. • Built-in rules and datatypes that support reasoning.
    22. 22. Controlled Vocabulary Terms and definitions are posted online, so they can be shared by many different organizations. #wrote #setIn #play #book #poem #narrated
    23. 23. Shared Ontologies Namespaces allow us to combine several vocabularies while maintaining distinct meaning of each element. #person #partOf #wrote #setIn #married #play #book #poem #narrated #isIn #country #city #region #birthdate #deathdate
    24. 24. Ontologies & Reasoning
    25. 25. Reasoning: Inverse Axiom: lit:wrote owl:inverseOf lit:writtenBy. Explicit Fact: bio:Shakespeare lit:wrote lit:Macbeth. Implicit Fact: lit:Macbeth lit:writtenBy bio:Shakespeare.
    26. 26. Reasoning: Symmetrical Axiom: bio:married rdf:type owl:SymmetricProperty. Explicit Fact: bio:AnneHathaway bio:married bio:Shakespeare. Implicit Fact: bio:Shakespeare bio:married bio:AnneHathaway.
    27. 27. Reasoning: Equivalent lit:WilliamShakespeare owl:sameAs bio:Shakespeare. bio:Shakespeare lit:wrote lit:Macbeth. lit:WilliamShakespeare lit:wrote lit:Macbeth. (Slide labels: Explicit Facts, Axiom, Implicit Facts.)
    28. 28. Finding Ontologies
    29. 29. RDF Browser
    30. 30. Linking Distributed Data
    31. 31. Linking Data
    32. 32. Linking Data
    33. 33. Linking Entities Directors Movies dbpedia:director foaf:made
    34. 34. Linking Hubs
    35. 35. Semantic Linking Hubs
    36. 36. Natural Language Processing Identify people, places, things, concepts in unstructured text files. Disambiguate terms. Suggest links to existing entities. ≠
    37. 37. RDF Publishing Tools
    38. 38. Drupal 7 CMS
    39. 39. Semantic MediaWiki
    40. 40. Semantic Blogging: Zemanta
    41. 41. Linked Data Adoption
    42. 42. Growth of RDFa Usage of RDFa increased 510% between Mar 2009 and Oct 2010 From:
    43. 43. Websites with RDF
    44. 44. Semantic Technologies
    45. 45. Google Faceted Recipe Search
    46. 46. Library Linked Data
    47. 47. Disconnected Data
    48. 48. Silos in the Library Source: Provincial Archives of Newfoundland & Labrador
    49. 49. Weak Links
    50. 50. Enhanced Linking acpherson,+Cluny,+1879-1966
    51. 51. Lost Content
    52. 52. Enhanced Content
    53. 53. Missed Opportunities
    54. 54. Enhanced Personalization Julie History 1012 List ID: 12 Newfoundland to confederation @prefix aiiso: <>. @prefix resource: < >. @prefix foaf: <> . @prefix bibo: <> . @prefix dcterms: <> .
    55. 55. What’s our value? • 2.0 thinking: our value is linked to our data • 3.0 thinking: our value is linked to our (re)useable, shareable data?
    56. 56. Obstacles
    57. 57. Competing Vocabularies many ways to describe a book, journal article or a place? Ian Millard, Hugh Glaser, Manuel Salvadores, Nigel Shadbolt
    58. 58. Co-referencing 1. 2. 3. 4. 00005e34c1 5. 6.
    59. 59. Discovering • Lots (and lots and lots) of linked data out there • How to find it?
    60. 60. Querying
    61. 61. Trust The largest hurdle to library adoption of Linked Data, though, may not be educational or technological …The sticking point for librarians may be an issue of trust. - Ross Singer, “Linked Data Now!”
    62. 62. Preservation What happens when an ontology or linking hub disappears?
    63. 63. Data ownership Library Catalogue Digital Archive Database Repository Ejournal
    64. 64. Licensing “You shall not use the data made available through the GC Open Data Portal in any way which, in the opinion of Canada, may bring disrepute to or prejudice the reputation of Canada."
    65. 65. VoID • Schema to describe linked datasets. :DBpedia a void:Dataset ; dcterms:license < tml> .
    66. 66. Oh - One more thing… “who’s minding the ranch?”
    67. 67. RDA • Works within MARC, but also works as a linked data metadata vocabulary
    68. 68. RDF Converters
    69. 69. Publishing Tools
    70. 70. Where we are now
    71. 71. Age of Chaotic Innovation? LIBRIS (Swedish Union Catalog); Library of Congress (LCSH, OSI); German National Library; Hungarian National Library; British Library; Europeana; Linked Periodicals Data; Virtual International Authority File; Dewey Decimal Classification; BIBSYS’ authority files; Thesaurus for Economics; Rameau; Swedish Subject Headings; German Subject Headings; Metadata Authority Description Schema (MADS)
    72. 72. The Chaos Tamers • W3C Linked Library Data Incubator Group • IFLA Semantic Web Interest Group • CKAN Linked Library Data Group • LITA/ALCTS Linked Library Data Interest Group
    73. 73. Questions? Source:
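
The three reasoning patterns on slides 25 to 27 (inverse, symmetric, and equivalent) can be tried out with a small plain-Python sketch. It stores triples as tuples and applies the three rules repeatedly until nothing new appears. The function names `infer` and `match` are hypothetical, and this is a toy illustration of the idea rather than a real OWL reasoner or any particular library's API.

```python
# Statements from slides 25 to 27, written as (subject, predicate, object) tuples.
FACTS = {
    ("lit:wrote", "owl:inverseOf", "lit:writtenBy"),
    ("bio:married", "rdf:type", "owl:SymmetricProperty"),
    ("lit:WilliamShakespeare", "owl:sameAs", "bio:Shakespeare"),
    ("bio:Shakespeare", "lit:wrote", "lit:Macbeth"),
    ("bio:AnneHathaway", "bio:married", "bio:Shakespeare"),
}


def infer(facts):
    """Apply the inverse, symmetric and sameAs rules until no new triple appears."""
    known = set(facts)
    while True:
        inverses = [(s, o) for s, p, o in known if p == "owl:inverseOf"]
        symmetric = {s for s, p, o in known
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        same = [(s, o) for s, p, o in known if p == "owl:sameAs"]
        new = set()
        for s, p, o in known:
            # Inverse: if p1 is the inverse of p2, (s p1 o) implies (o p2 s), and vice versa.
            for p1, p2 in inverses:
                if p == p1:
                    new.add((o, p2, s))
                if p == p2:
                    new.add((o, p1, s))
            # Symmetric: (s p o) implies (o p s).
            if p in symmetric:
                new.add((o, p, s))
            # Equivalent: either name of the same entity may stand in for the other.
            if p != "owl:sameAs":
                for a, b in same:
                    for x, y in ((a, b), (b, a)):
                        if s == x:
                            new.add((y, p, o))
                        if o == x:
                            new.add((s, p, y))
        if new <= known:
            return known
        known |= new


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


closure = infer(FACTS)
# Who wrote Macbeth? Both identifiers for Shakespeare are found after inference.
authors = {s for s, _, _ in match(closure, p="lit:wrote", o="lit:Macbeth")}
```

Before inference only `bio:Shakespeare` is recorded as the author of `lit:Macbeth`; after inference the query also returns `lit:WilliamShakespeare`, and the implicit facts from the three slides are present in `closure`.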