Tutorial kcc-2011


Semantic Web and Linked Data

Published in: Education


  1. 1. Linked Data: Enabler of Semantic Web. 2011.06.30. Sung-Kook Han, Semantic Technology Lab, Won Kwang Univ. skhan@wku.ac.kr
  2. 2. Outline
      • Introduction to Semantic Technology
      • Semantic Technology + Web Technology: Semantic Web, Web 2.0, Linked Data
      • Design and Publication of Linked Data: 9 steps towards Linked Open Data
  3. 3. Why Semantic Technology? The ways of thinking, cognition…
      • George Boole: An Investigation of the Laws of Thought (1854)
      • Claude Shannon: 1937 master's thesis, A Symbolic Analysis of Relay and Switching Circuits
      • John von Neumann, Kurt Gödel, Alan Turing
  4. 4. Why Semantic Technology? Final Goal: Intelligence
  5. 5. Our Computers
  6. 6. Communication: Human vs. Human, Human vs. Alien, Human vs. Computer, Computer vs. Computer
  7. 7. Semantic Technology
      Semantic technology has been a distinct research field for more than 40 years.
      • Formal Logic (since Russell and Frege)
      • Knowledge Representation Systems in AI
      • Semantic Networks and ATN (William Woods, 1975)
      • DARPA and European Commission programs in information integration
      • Development of simple tractable logics
      • Relational Algebras and Schemas in Database Systems
      • Library Science (classifications, thesauri, taxonomies)
      New challenges of Semantic Technology: Semantic Web
      • A massive store of information that computers cannot use
      • A way to get around needing the “big data warehouse”
      • Another place where “a little semantics can go a long way”
      cf. The Relationship Between Web 2.0 and the Semantic Web, Dr. Mark Greaves, Vulcan, Inc.
  8. 8. Ontology Spectrum
      [Figure: ontology spectrum running from weak semantics to strong semantics]
      • Taxonomy (“Is Sub-Classification of”): Relational Model, XML; Syntactic Interoperability
      • Thesaurus (“Has Narrower Meaning Than”): ER, Extended ER, DB Schemas, XML Schema; Structural Interoperability
      • Conceptual Model (“Is Subclass of”): XTM, RDF/S, UML; Semantic Interoperability
      • Logical Theory (“Is Disjoint Subclass of with transitivity property”): DAML+OIL, OWL, Description Logic, First Order Logic, Modal Logic
      Based on Leo Obrst, The Ontology Spectrum & Semantic Models
  9. 9. Semantic Technology
      [Figure: Semantic Technology applies machine-processable semantics (ontology, metadata, controlled vocabulary) to digital information resources (Web resources, services, images, audio/video, documents) in pursuit of interoperability, integration, and intelligence.]
  10. 10. Web Technology
      • Classic Web (Web of Documents): HTML as document format; HTTP URLs as globally unique IDs; hyperlinks to connect everything
      • Social Web: connects human beings; Web as a platform; programmable APIs and proprietary interfaces; mashups based on a fixed set of data sources
      • Web of machine-processable Data: common vocabularies (metadata and ontology); query and reasoning
      • Web of Services: Internet of Services, Internet of Things
  11. 11. Semantic Web
      Standardizations
      • Trio of Semantic Web: Metadata / Ontology (RDF, RDFS, OWL); Query Language (SPARQL); Rule Language (RIF, SWRL)
      • SKOS, RDFa, GRDDL, WSMO, …
      • SOAP / REST
      Tools and Systems
      • Authoring, Reasoning Engines, …
      • 835 items in Sweet Tools
      Best Practices
      • Linked Open Data
      • Semantic MediaWiki
      • NEPOMUK, SIOC, Garlik
      • W3C Semantic Web Use Cases
      Sweet Tools: http://www.mkbergman.com/new-version-sweet-tools-sem-web/
      W3C Semantic Web Case Studies and Use Cases: http://www.w3.org/2001/sw/sweo/public/UseCases/
  12. 12. Semantic Applications. Source: Semantic Wave 2008, Industry Roadmap to Web 3.0, Project10X. http://www.mkbergman.com/new-version-sweet-tools-sem-web/
  13. 13. Web 2.0
      Resharpening the way of viewing the Web
      • Web as the platform
      • Web as the social media
      • Web as the collaboration tool
      • Web as …
      Web 2.0 Manifestation
      • Openness / Sharing
      • Participation / Collaboration
      Web 2.0 Syndrome
      • Library 2.0
      • Government 2.0
      • Enterprise 2.0
      • …
      New Web applications
      • wiki, blog, RSS, …
  14. 14. Web 2.0 Developers
  15. 15. Semantic Web Today
      Major future issues:
      • Vocabularies
      • Scalability
      • Provenance
      • Personal Infospheres
      • Mobile and Real World Networks
  16. 16. Web 2.0 APIs Today
      No single global space: Web APIs slice the Web into walled gardens.
      • Mashups of APIs are proprietary.
      • No links between data.
      [Figure: a mashup built on top of Web API A, Web API B, and Web API C]
      Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)
  17. 17. The Web is Dead? http://www.wired.com/magazine/2010/08/ff_webrip/
  18. 18. Long Live the Web! http://www.scientificamerican.com/article.cfm?id=long-live-the-web
  19. 19. Lessons Learned
      Data is more important than API code.
      • Data is the Intel Inside.
      • Open data is more important than open source.
      Structured data is more valuable than unstructured data.
      • We should seek to structure our data well.
      • Metadata will play a core role in data structure.
      A little semantics goes a long way.
      • Be aware of the usefulness of shallow ontologies, as shown in LOD.
      Linking data and services is essential.
      • Link everything.
      Rich user experiences are the key to adoption.
      • We should consider mobile computing and personalization.
      • Visualize and navigate.
  20. 20. Semantic Web & Linked Data
  21. 21. Web of Documents
      • A global file system of documents (document silos on the Web)
      • Implicit semantics of content and links
      • Designed for human consumption
      • Disconnected data
  22. 22. Architecture: Web of Documents
      • Analogy: a global file system
      • Designed for: human consumption
      • Primary objects: documents
      • Links between: documents (or sub-parts of documents)
      • Degree of structure in objects: fairly low
      • Main usage: search and browsing
      • Semantics of content and links: implicit
      [Figure: Web browsers and search engines reach HTML documents over HTTP URLs; the documents are connected by hyperlinks and sit on top of separate databases DB-A, DB-B, and DB-C]
  23. 23. Machine-Processable Data
      • Web of Documents: information resources as documents, human-processable
      • Web of Data: information resources as data (databases), machine-processable
      Web of Data
      • Open the data silos and get rid of the repository-centric mindset
      • Publish data of public interest on the Web
      • In a way that other applications can access and interpret the data
      • Using common Web technologies
  24. 24. Semantic Web: Web of Data
      The vision of a Semantic Web:
      • building a global Web of machine-readable data
      • Berners-Lee, Hendler & Lassila, 2001; Marshall & Shipman, 2003
      “The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines. Therefore, while the Semantic Web, or Web of Data, is the goal or the end result of this process, Linked Data provides the means to reach that goal.” -- Tim Berners-Lee, et al., http://linkeddata.org/docs/ijswis-special-issue, Jan. 2009
      Linked Data Foundation
      • can lower the barrier to reuse, integration and application of data from multiple, distributed and heterogeneous sources.
      • on this foundation, the more sophisticated proposals associated with the Semantic Web vision, such as intelligent agents, may become a reality.
  25. 25. Linked Data: Web of Data
      Goal: Web-scale Data Integration
      • Alternative to classic data integration systems in order to cope with a growing number of data sources
      • Querying across data sources
      Global distributed database
      • Extend the Web with a single global data space
      • Giant Global Graph (GGG)
      Demonstrate the possibility of the Semantic Web
      • By using RDF to publish structured data
      • By setting links between data
      [Figure: several RDF data sources linked into a single universal information space]
  26. 26. Architecture: Linked Data
      • Analogy: a global database
      • Designed for: machines first, humans later
      • Primary objects: things (or descriptions (data) of things)
      • Links between: things
      • Degree of structure in (descriptions of) things: high
      • Main usage: query, navigation and reasoning
      • Semantics of content and links: explicit
      [Figure: Linked Data browsers, Linked Data mashups, and search engines reach RDF triples over HTTP URIs; the triples are connected by RDF links (data links) and sit on top of databases DB-A, DB-B, and DB-C]
  27. 27. Linked Data Principles
      A set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.
      • Use URIs as names for things, not just for documents or homepages.
      • Use HTTP URIs so that people can look up those names.
      • When someone looks up a URI, provide useful RDF information.
      • Include RDF statements that link to other URIs so that they can discover related things.
      [Figure: HTTP URIs name things; looking up a URI returns information as RDF triples, and RDF links connect one URI to another]
  28. 28. Linked Open Data
      Community effort, begun in early 2007, to
      • publish existing open license datasets as Linked Data on the Web
      • interlink things between different data sources
      • develop clients that consume Linked Data from the Web
  29. 29. LOD Data Sets on the Web: 25 billion RDF triples, which are interlinked by around 395 million RDF links (Sep. 2010). http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.svg
  30. 30. Summary: Web of Linked Data
      • A global, distributed database built on a simple set of standards: RDF, URI, HTTP
      • Explicit semantics of content and links
      • Resources are connected by semantic links, creating a single global data graph that spans data sources and enables the discovery of new data sources.
      • Provides for data co-existence: anyone can publish data to the Web of Linked Data, and data publishers are not constrained in their choice of vocabularies with which to represent data.
      • Designed for computers first, humans later
  31. 31. Data.Gov
  32. 32. Europeana, the European digital library: this European Commission initiative encompasses not only libraries but also museums, archives and other holders of cultural heritage material. http://version1.europeana.eu/web/europeana-project
  33. 33. Linked Library Cloud
      • Libraries have been producing metadata for ages.
      • Libraries (often) produce high-quality metadata.
      • Libraries have developed many metadata standards, such as DC, SKOS, BIBO, and OAI-ORE, in addition to MARC 21, MODS, FRBR, …
      • Integrate library catalogues on a global scale
      http://code4lib.org/conference/2010/singer
  34. 34. Linking Open Drug Data
      Linking the various sources of drug data together to answer interesting scientific and business questions.
      • Survey publicly available data sets about drugs
      • Publish and interlink these data sets on the Web
      • Explore interesting questions that could be answered if the data sets are linked
      8 million RDF triples, which are interlinked by more than 370,000 RDF links (as of August 2009)
  35. 35. BBC Semantic Project
      • Publish program / music data as RDF/XML or RDFa
      • Build semantically linked and annotated web pages about artists and singers whose songs are played on BBC radio stations
      • Semantically interconnected
  36. 36. DBpedia Mobile
      • Show a map with information about nearby locations
      • Linked Data browser
      • GPS + Google Maps + DBpedia + Flickr + Revyu
  37. 37. Attention by Search Engines
      Yahoo!
      • crawls Linked Data in its RDFa serialization as well as Microformats
      • uses Yahoo SearchMonkey to make search results more useful and visually appealing
      • provides access to crawled data through the Yahoo BOSS API
      Google
      • uses the Social Graph API
      • is developing Google Squared and Google Fusion Tables
      • acquired Metaweb
      • manages Freebase, a DBpedia/YAGO competitor
      • Rich Snippets
  38. 38. Linked Open Commerce
  39. 39. Design and Publication of Linked Data
  40. 40. 9 Steps to Publishing Linked Data (the slide shows them as a staircase, bottom step first)
      1. Understand the principles
      2. Set up your infrastructure for Linked Data
      3. Understand your data
      4. Create vocabularies
      5. Choose URIs for things in your data
      6. Triplify data sets
      7. Link to other data sets
      8. Describe your data sets
      9. Publicize your data sets
  41. 41. 1. Understand Linked Data • Principle • Core Stack • Data Modeling
  42. 42. Linked Data: Overview
      Benefits of Linked Data
      • Enables web-scale, distributed data publication with web-based discovery mechanisms.
      Linked Data Web resources are generic real-world data objects or entities:
      • People, places, and other physical things
      • Abstract concepts (e.g., emotion, notion, …)
      • Subject matter (e.g., science, economics, arts, …)
      Linked Data is not just structured data published on the Web.
      Linked Data is based on well-established Web standards.
      Linked Data adds value: less redundancy, greater discoverability, network effects.
  43. 43. Linked Data Principles (TimBL, 2006)
      Use URIs as names for things
      • not just for documents: http://dbpedia.org/resource/ontology
      • you are not your homepage: http://mentalist.com/actor/patrick_jane
      Use HTTP URIs
      • globally unique names, distributed ownership
      • allows people to look up those names
      Provide useful information in RDF
      • when someone looks up a URI
      Include RDF links to other URIs
      • to enable discovery of related information
  44. 44. 5 Star Rating
      • 1 star, on the web, open licensed: available on the web (whatever format), but with an open license
      • 2 stars, machine-readable data: available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
      • 3 stars, non-proprietary format (e.g. CSV instead of Excel)
      • 4 stars, RDF standards: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
      • 5 stars, linked RDF: link your data to other people’s data to provide context
  45. 45. Linked Data Core Stack (http://linkeddata-specs.info/)
      • RFC 2616, Hypertext Transfer Protocol (HTTP/1.1): defines HTTP, a generic and stateless application-level protocol for distributed, collaborative, hypermedia information systems.
      • RFC 3986, Uniform Resource Identifier (URI): Generic Syntax: defines a generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
      • RDF Concepts and Abstract Syntax: defines the RDF graph data model and key concepts.
      • SPARQL Query Language for RDF: defines the syntax and semantics of the SPARQL query language for RDF.
  46. 46. Core Technology
      • Uniform Resource Identifier (URI): names (identifiers) for resources in an open Web environment
      • Resource Description Framework (RDF): a model for representing metadata on the Web, based on a triple structure
      • RDF Schema and OWL: languages for defining vocabularies
      • RDF/XML, N3, Turtle, …: serialization and de-serialization of RDF triples for exchanging RDF data
      • Simple Knowledge Organization System (SKOS): a language for describing controlled vocabularies
      • SPARQL: a query language and protocol for accessing RDF data via the Web
  47. 47. Linked Data Modeling
      • Data Modeling: the RDF data model is used to publish structured data on the Web.
      • Data Linking: RDF links are used to interlink data from different data sources.
      RDF triple: subject, predicate, and object
      • Subject: a URI identifying the described resource
      • Predicate: a URI stating which relation exists between subject and object; predicates come from vocabularies, collections of URIs that can be used to represent information about a certain domain
      • Object: a simple literal value, or the URI of another resource that is related to the subject
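The triple structure described on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the tutorial: the `Triple` type, the `objects` helper, the book URI, and the author URI are assumptions made for the example; the `dbp-prop:title` expansion is the one given on the "Linked Data Model" slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type for this sketch)."""
    subject: str                  # URI identifying the described resource
    predicate: str                # URI of a vocabulary term
    obj: str                      # URI of another resource, or a literal value
    obj_is_literal: bool = False  # True when obj is a plain literal


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


# Hypothetical URIs, chosen only for illustration.
BOOK = "http://www.example.com/isbn/46316"
AUTHOR = "http://www.example.com/author/tolkien"

graph = [
    Triple(BOOK, "http://dbpedia.org/property/title",
           "The Lord of the Rings", obj_is_literal=True),
    Triple(BOOK, "http://dbpedia.org/property/author", AUTHOR),
]

print(objects(graph, BOOK, "http://dbpedia.org/property/title"))
```

The second triple is an RDF link in the sense of the slide: its object is the URI of another resource rather than a literal.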
  48. 48. Linked Data Model
      • Flexible graph-based model: RDF graph
      • URI: global primary key
      • The HTTP protocol brings together identification and retrieval again.
      • Deeper into the Web
      [Figure: RDF graph around the book http://.../isbn/46316. Edge labels: dbp-prop:title, skos:subject, dbp-prop:author, dbp-prop:publisher, dbp-prop:name, foaf:homepage, opencyc:headquarter, dbp-prop:city, fb:creator, fb:street_address. Node labels: The Lord of the Rings, English novels, J.R.R. Tolkien, dbpedia:Allen&Unwin, wkp-en:J.R.R.Tolkien, London, fb:guid…..92df7, Marivie, 83 Alexander St.]
      skos:subject = http://www.w3.org/2004/02/skos/core#subject
      dbp-prop:title = http://dbpedia.org/property/title
  49. 49. 2. Setup Infrastructure • Basic Infrastructure • Systems and Tools
  50. 50. Basic Infrastructure
      [Figure: layered infrastructure for Linked Data]
      • Data / Content: packaging, extraction, conversion, link generation
      • RDF Triple Base: triple store DB, index, SPARQL query engine (search, discovery, navigation)
      • Interface: Framework + APIs
      • Delivery: Web Server (Apache)
      • Application: browser, navigator, search
  51. 51. Infrastructure Construction
      Configuration of the Web server
      • Configure the server for the correct MIME type: application/rdf+xml
      • Code samples for ConNeg and 303 redirects: http://linkeddata.org/tools
      • Use cURL (http://curl.haxx.se/) to test the Apache configuration
      • Configure for hash URIs or slash URIs
      Testing your content negotiation
      • Install the LiveHTTPHeaders and Modify Headers extensions for Firefox
      • Try LiveHTTPHeaders against my URI: http://www.skyhigh.com/id/hong
      • Do the same with URIs from other data sets
      • Modify your headers to ask for application/rdf+xml
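As an alternative to the browser extensions above, the same check can be scripted. The following is a minimal sketch, assuming Python's standard `urllib`; the function name `build_rdf_request` and the example URI are hypothetical. It only builds the request, so it runs without network access.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(
        uri, headers={"Accept": "application/rdf+xml"}
    )


# Hypothetical URI; replace it with a URI from your own data set.
req = build_rdf_request("http://www.example.com/id/alice")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would send the request and follow redirects automatically; the `Content-Type` header of the response then shows which representation the server chose.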
  52. 52. Supporting Technologies
      Linked Data Browsers
      • provide for navigating between data sources and for exploring the dataspace.
      • Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)
      Web of Data Search Engines
      • crawl the data space and provide best-effort query answers over crawled data.
      • Falcons (IWS, China), Sig.ma (DERI, Ireland), Swoogle (UMBC, USA), VisiNav (DERI, Ireland), Watson (Open University, UK), TAP, Sindice
  53. 53. Supporting Technologies
      • Describing data sets (discovery and usage of linked datasets): voiD, Ding
      • Registry (an open registry of data and content packages): CKAN
      • Linking tool (discovering relationships between data items within different Linked Data sources): SILK
      • Mapping tool (mapping databases to RDF triples): Triplify, D2R Server
      • LOD platform: D2R Server, Virtuoso Universal Server, Talis Platform, Pubby, …
  54. 54. 3. Understand Data to be published • Review about Data to be published • Requirement analysis
  55. 55. Review about Data to be published
      What
      • think about the key things to be presented in Linked Data
      • analysis of data properties
      • What vocabularies can be used to describe these?
      Why
      • purposes and goals of the linked data to be published
      What for
      • how to use and apply linked data (use cases)
      How to serve
      • Serving Linked Data as static RDF/XML files
      • Serving Linked Data as RDF embedded in HTML files
      • Serving RDF and HTML with custom server-side scripts
      • Serving Linked Data from relational databases
      • Serving Linked Data from RDF triple stores
      • Serving Linked Data by wrapping existing applications or Web APIs
  56. 56. 4. Create Vocabularies • Vocabulary Creation • Common Namespace • Definition
  57. 57. Guideline for Vocabulary Creation
      • Do not define new vocabularies from scratch, but complement existing vocabularies with additional terms (in your own namespace) to represent your data as required.
      • Provide for both humans and machines. Use rdfs:comment for each term invented. Always provide a label for each term using the rdfs:label property.
      • Make term URIs de-referenceable, following the W3C Best Practice Recipes for Publishing RDF Vocabularies.
      • Make use of other people's terms. Use other people's terms, or provide mappings to them, by means of rdfs:subClassOf or rdfs:subPropertyOf.
      • State all important information explicitly. For example, state all ranges and domains explicitly.
      • Do not create over-constrained, brittle models; leave some flexibility for growth.
      • Unless you know exactly what you are doing, do not use full-featured OWL to define your vocabulary; use RDF Schema instead.
  58. 58. Potential Ontologies / Vocabularies
      • Friend-of-a-Friend (FOAF): vocabulary for describing people
      • Dublin Core (DC): defines general metadata attributes. See also their new domains and ranges draft.
      • Semantically-Interlinked Online Communities (SIOC): vocabulary for representing online communities
      • Description of a Project (DOAP): vocabulary for describing projects
      • Simple Knowledge Organization System (SKOS): vocabulary for representing taxonomies and loosely structured knowledge
      • Music Ontology: terms for describing artists, albums and tracks
      • Review Vocabulary: vocabulary for representing reviews
      • Creative Commons (CC): vocabulary for describing license terms
      • Geo: vocabulary for describing geographical locations
      • GoodRelations: vocabulary for describing products
  59. 59. Common Namespaces
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
      xmlns:dbp="http://dbpedia.org/dbprop/"
      xmlns:geo="http://www.geonames.org/ontology#"
      xmlns:gr="http://purl.org/goodrelations/v1#"
      xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
      xmlns:media="http://search.yahoo.com/searchmonkey/media/"
      xmlns:cb="http://cb.semsol.org/ns#"
      More Common Namespaces:
      http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
      http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/100-most-popular-rdf-namespaces
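A prefixed name such as foaf:name is only shorthand for a full URI built from one of these namespaces. The sketch below shows the expansion; the `NAMESPACES` table copies four bindings from the slide, while the `expand` function itself is a hypothetical helper written for this example.

```python
# Prefix bindings copied from the namespace list on the slide.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI using NAMESPACES."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local


print(expand("foaf:name"))
print(expand("rdfs:label"))
```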
  60. 60. Definition of Vocabulary
      # Definition of the class "Lover"
      <http://sites.movie.org/pub/LoveVocabulary#Lover>
          rdf:type rdfs:Class ;
          rdfs:label "Lover"@en ;
          rdfs:label "Liebender"@de ;
          rdfs:comment "A person who loves somebody."@en ;
          rdfs:comment "Eine Person die Jemanden liebt."@de ;
          rdfs:subClassOf foaf:Person .
      # Definition of the property "loves"
      <http://sites.movie.org/pub/LoveVocabulary#loves>
          rdf:type rdf:Property ;
          rdfs:label "loves"@en ;
          rdfs:label "liebt"@de ;
          rdfs:comment "Relation between a lover and a loved person."@en ;
          rdfs:subPropertyOf foaf:knows ;
          rdfs:domain <http://sites.movie.org/pub/LoveVocabulary#Lover> ;
          rdfs:range foaf:Person .
  61. 61. Tools for Vocabulary Definition
      Ontology editors
      • Protégé: an open-source ontology editor with a dedicated OWL plug-in
      • Neologism: a Web-based tool for creating, managing and publishing simple RDFS vocabularies; open-source and implemented in PHP on top of the Drupal platform
      • TopBraid Composer: a powerful commercial modeling environment for developing Semantic Web ontologies
      • NeOn Toolkit: an open-source ontology engineering environment with an extensive set of plug-ins
  62. 62. 5. Choose URIs • Resource Identification • Types of URIs • De-Referencing • Common URI Patterns
  63. 63. Resource Identification
      Separation of Identity and Representation
      • Identity: the identity (URI) of an object or entity should be unambiguous and globally unique.
      • Representation: on the Web, a URI should provide an unambiguous data access path.
      • Access: reference to abstract (physically inaccessible) objects or entities is only achievable via conduit documents that carry representations of entity descriptions (which at best are facets of an entire description).
      URI Requirements:
      • Keep out of other people's namespaces
      • Use a namespace that you control
      • Abstract away from implementation details (short is better…)
      • Stable and persistent
      • Hash or slash
      • Use common URI patterns
  64. 64. URI
      • URI: Uniform Resource Identifier
      • Does http://www.example.com/people/alice identify a home page (a Web document, an information object) or the person?
      • URIs identify people, products, places, ideas and concepts such as ontology classes, as well as Web documents (URLs).
      • Two approaches: hash URI and slash URI
  65. 65. Hash / Slash URI
      Hash URI
      • URIs can contain a fragment, a special part that is separated from the rest of the URI by a hash symbol (“#”).
      • Examples: http://www.example.com/products/BiBimBab#this, http://www.travel.com/nation/Korea/KyungJu#main
      • Simply publish a description document containing RDF about the things at the base URI.
      Slash URI
      • Examples: http://www.example.com/products/BiBimBab, http://www.travel.com/nation/Korea/KyungJu
      • You must publish your description document at another, distinct URI.
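The difference between the two URI styles comes down to the fragment, which an HTTP client strips before it sends the request. A small sketch using Python's standard `urllib.parse.urldefrag` follows; the helper names `document_uri` and `is_hash_uri` are made up for this example.

```python
from urllib.parse import urldefrag


def document_uri(thing_uri: str) -> str:
    """URI that is actually requested over HTTP: the fragment is removed."""
    return urldefrag(thing_uri).url


def is_hash_uri(uri: str) -> bool:
    """True when the URI carries a fragment part after '#'."""
    return urldefrag(uri).fragment != ""


hash_uri = "http://www.example.com/products/BiBimBab#this"
slash_uri = "http://www.example.com/products/BiBimBab"

print(document_uri(hash_uri))
print(is_hash_uri(hash_uri), is_hash_uri(slash_uri))
```

For a slash URI the requested URI and the thing's URI are identical, which is why a redirect to a distinct document URI is needed in that case.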
  66. 66. Hash URI
      Separating identification and naming from representation
      • http://www.skyhigh.com/person/GilDong#this identifies the entity (GilDong).
      • http://www.skyhigh.com/person/GilDong is the document that describes it. Metadata: content-type: application/xhtml+xml. Data: <html xmlns=“.. <head> <title> Our hero… </html>
  67. 67. Slash URI
      Separating identification and naming from representation
      • http://www.skyhigh.com/person/hero/GilDong/id identifies the entity (GilDong).
      • http://www.skyhigh.com/person/hero/GilDong/page is one representation. Metadata: content-type: application/xhtml+xml. Data: <html xmlns=“.. <head> <title> Our hero… </html>
      • http://www.skyhigh.com/person/hero/GilDong/data is another representation. Metadata: content-type: application/rdf+xml. Data: <html xmlns=“.. <head> <title> Our hero… </html>
  68. 68. Slash vs. Hash
      Slash URI
      • HTTP redirection (30X response) is required in order for resource "identity" to be separated from "representation":
      • http://www.skyhigh.com/person/hero/GilDong/id (URI of an Organization Entity)
      • http://www.skyhigh.com/person/hero/GilDong/page (HTML representation of the Entity description)
      • http://www.skyhigh.com/person/hero/GilDong/data (RDF representation that describes the Entity, which could use a Turtle, N3, RDF/XML, etc. based data serialization)
      Hash URI
      • HTTP redirection isn't required in order for resource "identity" to be separated from "representation":
      • http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (URI of an Organization Entity)
      • http://demo.openlinksw.com/Northwind/Customer/ALFKI (a document: HTML, Turtle, N3, or RDF/XML representation of the Entity description)
  69. 69. Dereferencing Hash URI
      Without content negotiation
      • ID: http://www.example.com/about#alice
      • automatic truncation of fragment
      • RDF: http://www.example.com/about
      With content negotiation
      • ID: http://www.example.com/about#alice
      • automatic truncation of fragment: http://www.example.com/about
      • content negotiation: if application/rdf+xml wins, RDF at http://www.example.com/about.rdf; if text/html wins, HTML at http://www.example.com/about.html
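The "wins" decision in the content negotiation step can be sketched as a comparison of the q-values in the client's Accept header. This is a simplified illustration, not a full implementation of HTTP content negotiation: wildcards such as `*/*` are not matched and simply fall through to the default. The function names and the `VARIANTS` table are assumptions for this example; the URIs are the ones on the slide.

```python
# Available representations, keyed by media type (URIs from the slide).
VARIANTS = {
    "application/rdf+xml": "http://www.example.com/about.rdf",
    "text/html": "http://www.example.com/about.html",
}


def parse_accept(header: str):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Return the URI of the representation that wins the negotiation."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in VARIANTS:
            return VARIANTS[media_type]
    return VARIANTS[default]


print(negotiate("application/rdf+xml"))
print(negotiate("text/html, application/rdf+xml;q=0.5"))
```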
  70. 70. Dereferencing Slash URI
      One generic document
      • ID: http://www.example.com/id/alice
      • 303 redirect to the generic document http://www.example.com/doc/alice
      • content negotiation on the generic document: if application/rdf+xml wins, RDF at http://www.example.com/doc/alice.rdf; if text/html wins, HTML at http://www.example.com/doc/alice.html
      Different documents
      • ID: http://www.example.com/id/alice
      • content negotiation on the ID itself: if application/rdf+xml wins, 303 redirect to RDF at http://www.example.com/doc/alice.rdf; if text/html wins, 303 redirect to HTML at http://www.example.com/doc/alice.html
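The "different documents" variant can be sketched as a pure function that maps a request to a 303 response. It is a deliberately naive illustration: the Accept header is checked with a substring test instead of proper q-value handling, and the function name, the `/id/` and `/doc/` layout handling, and the `Vary` header are assumptions for this example rather than anything prescribed by the slide.

```python
BASE = "http://www.example.com"  # host used in the slide's example


def dereference(path: str, accept_header: str):
    """Return (status, headers) for a request on a slash URI.

    Thing URIs live under /id/; the documents describing them live
    under /doc/ with a .rdf or .html extension.
    """
    if not path.startswith("/id/"):
        return 404, {}
    name = path[len("/id/"):]
    # Naive check: a real server would compare q-values.
    wants_rdf = "application/rdf+xml" in accept_header
    extension = "rdf" if wants_rdf else "html"
    headers = {
        "Location": f"{BASE}/doc/{name}.{extension}",
        "Vary": "Accept",  # the response depends on the Accept header
    }
    return 303, headers


print(dereference("/id/alice", "application/rdf+xml"))
print(dereference("/id/alice", "text/html"))
```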
  71. 71. Content Negotiation
  72. 72. Content Negotiation
  73. 73. Common URI Pattern
      http://dbpedia.org/resource/New_York_City | Thing
      http://dbpedia.org/data/New_York_City | RDF data
      http://dbpedia.org/page/New_York_City | HTML page

      http://revyu.com/people/tom | Thing
      http://revyu.com/people/tom/about/rdf | RDF data
      http://revyu.com/people/tom/about/html | HTML page

      http://www.bbc.co.uk/music/artists/db4624cf#artist | Thing
      http://www.bbc.co.uk/music/artists/db4624cf.rdf | RDF data
      http://www.bbc.co.uk/music/artists/db4624cf.html | HTML page

      http://id.dbpedia.org/Berlin | Thing
      http://data.dbpedia.org/Berlin | RDF data
      http://page.dbpedia.org/Berlin | HTML page

      http://www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X | ISBN
  74. 74. Choosing URI
      http://www.culture.com/LOD/{class}/{member}
      http://www.culture.com/LOD/{class}/{member}.rdf
      http://www.culture.com/LOD/{class}/{member}.html
      Examples:
      • URI of an Organization Entity: http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
      • HTML representation of the Entity description: http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
      • RDF representation that describes the Entity, which could use a Turtle, N3, RDF/XML, etc. based data serialization: http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
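The {class}/{member} template on this slide can be applied mechanically. The sketch below mints the three related URIs for one thing; the function name `mint_uris` and the sample class and member values are hypothetical, and percent-encoding the path segments is a design choice of this example, not something the slide specifies.

```python
from urllib.parse import quote

BASE = "http://www.culture.com/LOD"  # base URI from the slide's template


def mint_uris(class_name: str, member: str) -> dict:
    """Mint the thing, RDF, and HTML URIs for one member of a class."""
    # Percent-encode each segment so spaces and slashes cannot break the path.
    thing = f"{BASE}/{quote(class_name, safe='')}/{quote(member, safe='')}"
    return {"thing": thing, "rdf": thing + ".rdf", "html": thing + ".html"}


uris = mint_uris("museum", "National Museum")
for kind, uri in uris.items():
    print(kind, uri)
```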
  75. 75. 6. Triplify Data Sets • Publication Strategies • Conversion of Database
  76. 76. Linked Data Publication
      [Figure: publication workflows from source data to Linked Data on the Web]
      • Types of data: structured data, text
      • Data preparation: RDF-izers for CSV, XML, Excel; entity extractor (e.g. Calais)
      • Data storage: relational database, data source with API, RDF store, RDF files, CMS with RDFa output
      • Data publication: RDB-to-RDF wrapper (e.g. D2R), custom wrapper, Linked Data interface (e.g. Pubby), Web server (e.g. Apache), CMS (e.g. Drupal)
      • Result: Linked Data on the Web
  77. 77. Publication Strategy
      • From unstructured sources: use NLP, text mining, annotation, …; OpenCalais, Ontos
      • From semi-structured sources: DBpedia, Linked GeoData, SCOVO, …; efficient bi-directional synchronization
      • From structured sources (relational databases): declarative syntax and semantics of data model translation; RDB2RDF, …
  78. 78. Conversion of Database
      Schema
      • Books (ID, Author, Title, Publisher, Year)
      • Authors (ID, Name, Homepage)
      • Publishers (ID, PublisherName, City)
      Books
      ID | Author | Title | Publisher | Year
      ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000
      Authors
      ID | Name | Home page
      id_xyz | Ghosh, Amitav | http://www.amitavghosh.com
      Publishers
      ID | Publisher Name | City
      id_qpr | Harper Collins | London
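To make the conversion concrete, the sketch below turns the three sample rows into N-Triples-style lines. It is an illustration under stated assumptions, not the mapping used by D2R Server or Triplify: the `http://www.example.com/` namespace, the URI layout, the `terms/city` property, and the choice of Dublin Core and FOAF terms are all made up for this example.

```python
BASE = "http://www.example.com/"           # hypothetical namespace
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Sample rows from the slide, one dict per table row.
books = [{"isbn": "0-00-651409-X", "author": "id_xyz",
          "title": "The Glass Palace", "publisher": "id_qpr", "year": "2000"}]
authors = [{"id": "id_xyz", "name": "Ghosh, Amitav",
            "homepage": "http://www.amitavghosh.com"}]
publishers = [{"id": "id_qpr", "name": "Harper Collins", "city": "London"}]


def uri(value):
    """Wrap a URI in angle brackets."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def triplify(books, authors, publishers):
    """Map each row to triples; foreign keys become links between URIs."""
    triples = []
    for b in books:
        s = uri(BASE + "book/" + b["isbn"])
        triples.append((s, uri(DCTERMS + "title"), literal(b["title"])))
        triples.append((s, uri(DCTERMS + "creator"),
                        uri(BASE + "author/" + b["author"])))
        triples.append((s, uri(DCTERMS + "publisher"),
                        uri(BASE + "publisher/" + b["publisher"])))
        triples.append((s, uri(DCTERMS + "date"), literal(b["year"])))
    for a in authors:
        s = uri(BASE + "author/" + a["id"])
        triples.append((s, uri(FOAF + "name"), literal(a["name"])))
        triples.append((s, uri(FOAF + "homepage"), uri(a["homepage"])))
    for p in publishers:
        s = uri(BASE + "publisher/" + p["id"])
        triples.append((s, uri(FOAF + "name"), literal(p["name"])))
        triples.append((s, uri(BASE + "terms/city"), literal(p["city"])))
    return [" ".join(t) + " ." for t in triples]


lines = triplify(books, authors, publishers)
print("\n".join(lines))
```

The point of the sketch is the treatment of keys: primary keys become subject URIs, and foreign keys (the author and publisher IDs in Books) become object URIs rather than literals.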
  79. 79. Conversion of Database
      Tools for mapping RDB to Linked Data
      • D2R Server for customizable mappings from relational databases to ontologies [Bizer, Cyganiak 06]
      • Browser-based tools for defining RDB-to-RDF mappings [Zhou, Xu, Chen, Idehen 08]
      • Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]
      • OpenLink Data Spaces [Idehen, Erling 08]
  80. 80. RDF Features Best Avoided
      • Do not use the full expressivity of the RDF data model; use a subset of the RDF features.
      • No blank nodes: it is impossible to set external RDF links to a blank node.
      • Do not use RDF reification: its semantics are unclear and it is cumbersome to query with the SPARQL query language. Metadata can be attached to the information resource instead.
      • Be careful before using RDF collections or RDF containers: they do not work well together with SPARQL.
  81. 81. 7. Link to other Data sets • Types of Linking • Linking manually • Automatic generation of Link
  82. 82. Link! Reuse!
      Reuse. Do not reinvent the wheel.
      • The URIs are de-referenceable. For instance, using the DBpedia URI http://dbpedia.org/page/Doom to identify the computer game Doom gives you an extensive description of the game, including abstracts in 10 different languages and various classifications.
      • The URIs are already linked to URIs from other data sources. For instance, you can navigate from the DBpedia URI http://dbpedia.org/resource/Innsbruck to data about Innsbruck provided by Geonames and EuroStat.
      • Therefore, by using concept URIs from these datasets, you interlink your data with a rich and fast-growing network of other data sources.
  83. 83. Types of Linking to other Data Sets
      Relationship Links: point at related things in other data sources, for instance, other people, places or genes.
        <http://www.skyhigh.com/people/GilDong>
            rdf:type foaf:Person ;
            foaf:name "Hong, Gil-Dong" ;
            foaf:based_near <http://dbpedia.org/resource/Seoul> ;
            foaf:topic_interest <http://dbpedia.org/resource/Justice> ;
            foaf:knows <http://dbpedia.org/resource/HalBingDang> .
      Identity Links: point at URI aliases used by other data sources to identify the same real-world object or abstract concept.
        <http://www.skyhigh.com/people/GilDong>
            <http://www.w3.org/2002/07/owl#sameAs> <http://www.korea.org/history/hero> .
      Vocabulary Links: point to the definitions of related terms in other vocabularies.
        <http://www.university.org/terms/professor>
            rdf:type rdfs:Class ;
            rdfs:subClassOf <http://dbpedia.org/ontology/Person> ;
            rdfs:subClassOf <http://sw.opencyc.org/concept/Mx4rvbGdrcN5Y29ycA> ;
            owl:equivalentClass <http://rdf.dictionary.com/entry/facultyMember> .
  84. 84. Link to other Data Sets
      URI aliases
      • In an open environment like the Web, different information providers often talk about the same non-information resource. As they do not know about each other, they introduce different URIs for identifying the same real-world object:
          http://dbpedia.org/resource/Berlin
          http://sws.geonames.org/2950159/
      • URI aliases provide an important social function to the Web of Data: they are dereferenced to different descriptions of the same non-information resource and thus allow different views and opinions to be expressed.
      • owl:sameAs links state that two URIs identify the same resource.
      Common properties
      • rdfs:seeAlso, foaf:knows, foaf:based_near, foaf:topic_interest, …
      Two approaches for linking data:
      • Setting RDF links manually
      • Auto-generating RDF links
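A consumer that collects owl:sameAs links from several sources usually wants to know which URIs end up naming the same thing. A minimal sketch, assuming the links are already available as pairs of URI strings: the pairs are merged with a small union-find structure. The function name and the third URI in the usage example are hypothetical.

```python
def alias_groups(same_as_pairs):
    """Group URIs into sets of aliases, given (uri_a, uri_b) owl:sameAs pairs."""
    parent = {}

    def find(uri):
        # Follow parent pointers to the representative, halving paths on the way.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


if __name__ == "__main__":
    pairs = [
        ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159/"),
        # Hypothetical third alias, not taken from the slides.
        ("http://sws.geonames.org/2950159/", "http://example.org/places/Berlin"),
    ]
    for group in alias_groups(pairs):
        print(sorted(group))
```

Treating owl:sameAs as transitive in this way is a design choice: it is convenient, but one wrong link merges two otherwise unrelated groups.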
  85. 85. RDF Links Manually
      Find data sets that are suitable linking targets, then manually search them for the URI references you want to link to.
      If a data source doesn't provide a search interface, you can use Linked Data browsers like Tabulator or Disco to explore the data set and find the right URIs.
      Useful sites:
      • Sindice and Falcons: indexes for identifying candidate URIs for linking
      • CKAN: a registry of open linked data and projects
      • Uriqr, a URI search engine: http://dev.uriqr.com/
      • Freebase: http://www.freebase.com
      MOAT (Meaning Of A Tag) framework
      • For manually interlinking tags with Semantic Web URIs (such as URIs from DBpedia, Geonames, or any knowledge base)
      Remember that data sources might use HTTP 303 redirects to redirect clients from URIs identifying non-information resources to URIs identifying information resources that describe the non-information resources.
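The 303 redirect mentioned above can be sketched as a pure function that maps a request path and Accept header to a status code and response headers. This is an illustrative sketch only: it assumes the DBpedia-style layout in which `/resource/X` names the thing, `/data/X` is its RDF document, and `/page/X` is its HTML page, and it ignores q-values in the Accept header. The function names are hypothetical.

```python
# Media types treated as a request for RDF (an assumption of this sketch).
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def wants_rdf(accept_header):
    """True if any media type listed in the Accept header is an RDF type."""
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    return any(media_type in RDF_TYPES for media_type in offered)


def respond(path, accept_header):
    """Return (status, headers) for a request under the assumed URI layout."""
    prefix = "/resource/"
    if path.startswith(prefix):
        # A non-information resource cannot be returned itself:
        # redirect to a document that describes it.
        name = path[len(prefix):]
        target = "/data/" if wants_rdf(accept_header) else "/page/"
        return 303, {"Location": target + name, "Vary": "Accept"}
    if path.startswith(("/data/", "/page/")):
        # Information resources are served directly.
        return 200, {}
    return 404, {}


if __name__ == "__main__":
    print(respond("/resource/Berlin", "application/rdf+xml"))
    print(respond("/resource/Berlin", "text/html"))
```

When looking for link targets by hand, the URI to link to is the one before the redirect (`/resource/…`), not the document URI the browser ends up on.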
  86. 86. Auto-generating RDF Links
      Various approaches
      • Pattern-based algorithms
      • Similarity-based approaches
      • Complex property-based algorithms
      • Yves Equivalence Miner: interlinking Jamendo and Musicbrainz
      Equivalence mining and matching frameworks
      • Silk: a link discovery framework for the Web of Data. Silk can be run on a single machine or on a Hadoop cluster (for instance on Amazon EC2).
      • LIMES: Link Discovery Framework for Metric Spaces. Time-efficient and lossless approaches for large-scale link discovery based on the characteristics of metric spaces.
      • DSNotify: detecting and fixing broken links in Linked Data sets
      • TopBraid Composer: a wizard for linking ontology instances to corresponding DBpedia concepts
      • SemMF: a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs
      http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
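A similarity-based approach can be illustrated with nothing more than label comparison. The sketch below is not how Silk or LIMES work internally; it only shows the general idea of proposing an owl:sameAs link when the best-matching label in the target data set scores above a threshold. The local `example.org` URIs, the function names, and the threshold value are assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def similarity(a, b):
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_links(source, target, threshold=0.9):
    """Propose sameAs triples between two {uri: label} dictionaries."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        # Keep only the single best candidate, and only if it is good enough.
        if best_uri is not None and best_score >= threshold:
            links.append(f"<{s_uri}> {OWL_SAME_AS} <{best_uri}> .")
    return links


if __name__ == "__main__":
    local = {  # hypothetical local data set
        "http://example.org/city/1": "Berlin",
        "http://example.org/city/2": "Innsbruk",  # misspelled on purpose
    }
    dbpedia = {
        "http://dbpedia.org/resource/Berlin": "Berlin",
        "http://dbpedia.org/resource/Innsbruck": "Innsbruck",
    }
    print("\n".join(generate_links(local, dbpedia)))
```

Labels alone are a weak signal (many places share a name), so generated links are best reviewed or combined with further properties before publication.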
  87. 87. 8. Describe Data Sets • Metadata for Description
  88. 88. Publishing Descriptions of a Data Set
      • Help others discover and index your data
      • Apply a license or waiver to your data set
      Metadata about the published linked data set
      • Authorship, currency (i.e., how recently the data set was updated), provenance, and licensing terms
      Important issues:
      • Provenance: the ability to track the origin of data; a key component in building trustworthy, reliable applications (see the Open Provenance Model)
      • Licenses vs. waivers
      • Norms: a means for data publishers who waive their legal rights (through application of a waiver) to define expectations they have about how the data is used
      Two primary mechanisms
      • Semantic Sitemaps: http://sw.deri.org/2007/07/sitemapextension/
      • voiD: http://semanticweb.org/wiki/VoiD
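A voiD description is itself a small RDF document. As a rough sketch, the helper below assembles one in Turtle covering the metadata listed above (authorship, currency, license, access points). The function name and every value in the usage example are hypothetical; the `void:` and `dcterms:` terms are from the voiD and Dublin Core vocabularies. The sketch assumes the title contains no double quotes.

```python
def void_description(dataset_uri, title, creator_uri, license_uri,
                     modified, sparql_endpoint, example_resource):
    """Return a Turtle document describing a data set with voiD terms."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',                      # human-readable name
        f"    dcterms:creator <{creator_uri}> ;",               # authorship
        f"    dcterms:license <{license_uri}> ;",               # license or waiver
        f'    dcterms:modified "{modified}"^^xsd:date ;',       # currency
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",       # query access
        f"    void:exampleResource <{example_resource}> .",     # entry point
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    print(void_description(
        dataset_uri="http://example.org/dataset/books",
        title="Example book catalogue",
        creator_uri="http://example.org/people/publisher",
        license_uri="http://creativecommons.org/publicdomain/zero/1.0/",
        modified="2011-06-30",
        sparql_endpoint="http://example.org/sparql",
        example_resource="http://example.org/books/ISBN0-00-651409-X",
    ))
```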
  89. 89. Description
      Metadata: metadata about the published data, such as a URI identifying the author, and licensing information.
      Description: the triples of the data set that have the resource's URI as the subject.
      Backlinks: the triples of the data set that have the resource's URI as the object. This is redundant, but it allows browsers and crawlers to traverse links in either direction.
      Related descriptions: any additional information about related resources, e.g., answering a request about a book with the author information as well. Take a moderate approach and do not overload the description.
      Syntax: there are various ways to serialize RDF descriptions. At least provide RDF descriptions as RDF/XML, which was the only official syntax for RDF at the time of writing (2011). Additionally provide Turtle, TriX, and other serializations.
  90. 90. Data Set Description: Example
      # Metadata and Licensing Information
      <http://dbpedia.org/data/Alec_Empire>
          rdfs:label "RDF description of Alec Empire" ;
          rdf:type foaf:Document ;
          dc:publisher <http://dbpedia.org/resource/DBpedia> ;
          dc:date "2007-07-13"^^xsd:date ;
          dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .

      # The description
      <http://dbpedia.org/resource/Alec_Empire>
          foaf:name "Empire, Alec" ;
          rdf:type foaf:Person ;
          rdf:type <http://dbpedia.org/class/yago/musician> ;
          rdfs:comment "Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
          rdfs:comment "Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
          dbpedia:genre <http://dbpedia.org/resource/Techno> ;
          dbpedia:associatedActs <http://dbpedia.org/resource/Atari_Teenage_Riot> ;
          foaf:page <http://en.wikipedia.org/wiki/Alec_Empire> ;
          foaf:page <http://dbpedia.org/page/Alec_Empire> ;
          rdfs:isDefinedBy <http://dbpedia.org/data/Alec_Empire> ;
          owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b> .
  91. 91. Data Set Description: Example
      # Backlinks
      <http://dbpedia.org/resource/60_Second_Wipeout>
          dbpedia:producer <http://dbpedia.org/resource/Alec_Empire> .
      <http://dbpedia.org/resource/Limited_Editions_1990-1994>
          dbpedia:artist <http://dbpedia.org/resource/Alec_Empire> .
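The split between "the description" and "backlinks" in the example above amounts to two filters over the data set's triples: one on the subject position, one on the object position. A minimal sketch, assuming the triples are held in memory as (subject, predicate, object) tuples; the function name is hypothetical and the prefixed property names are kept as plain strings for brevity.

```python
def describe(resource, triples):
    """Split a data set's triples into a resource's description and backlinks."""
    # Description: triples with the resource as subject.
    description = [t for t in triples if t[0] == resource]
    # Backlinks: triples with the resource as object.
    backlinks = [t for t in triples if t[2] == resource]
    return description, backlinks


if __name__ == "__main__":
    alec = "http://dbpedia.org/resource/Alec_Empire"
    dataset = [
        (alec, "foaf:name", "Empire, Alec"),
        (alec, "dbpedia:genre", "http://dbpedia.org/resource/Techno"),
        ("http://dbpedia.org/resource/60_Second_Wipeout", "dbpedia:producer", alec),
        ("http://dbpedia.org/resource/Limited_Editions_1990-1994", "dbpedia:artist", alec),
        ("http://dbpedia.org/resource/Techno", "rdfs:label", "Techno"),
    ]
    description, backlinks = describe(alec, dataset)
    print(len(description), "description triples;", len(backlinks), "backlinks")
```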
  92. 92. 9. Publish Data Sets • Serialization • Linked Data Storage • Test and Debugging
  93. 93. Publishing Linked Data
      Serialization of Data
      Publication Method        Advantages                       Disadvantages
      RDF/XML Document          Oldest, best supported           Confusingly like normal XML
      Turtle (N3) Document      Simplest                         Not technically a standard yet (as of 2011)
      HTML Document with RDFa   Fits inside HTML, but also RDF   Can get very complicated
      JSON                      Normal JSON, but also RDF        Promising, but still being developed
      GRDDL                     Use the XML you have/want        Needs to download+run XSLT
      SPARQL Query Protocol     Query                            Protocol
      RDF files shouldn't be larger than, say, a few hundred kilobytes. Break larger data sets up into several RDF files, and make sure the files are linked to each other through RDF triples.
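The advice to break large RDF files up and link the parts can be sketched as follows. This is an illustration under several assumptions: the data is already serialized as N-Triples lines, a triple count stands in for the size limit in kilobytes, the parts are chained with rdfs:seeAlso, and the `partN.nt` naming scheme and function name are hypothetical.

```python
SEE_ALSO = "<http://www.w3.org/2000/01/rdf-schema#seeAlso>"


def split_documents(ntriples_lines, base_uri, max_triples):
    """Split N-Triples lines into documents of at most max_triples data triples.

    Every document except the last gets one extra triple that points to the
    next document, so a crawler can reach all parts from the first one.
    max_triples must be at least 1.
    """
    chunks = [ntriples_lines[i:i + max_triples]
              for i in range(0, len(ntriples_lines), max_triples)]
    uris = [f"{base_uri}part{n}.nt" for n in range(1, len(chunks) + 1)]
    documents = {}
    for index, (uri, chunk) in enumerate(zip(uris, chunks)):
        lines = list(chunk)
        if index + 1 < len(uris):
            lines.append(f"<{uri}> {SEE_ALSO} <{uris[index + 1]}> .")
        documents[uri] = lines
    return documents


if __name__ == "__main__":
    data = [f'<http://example.org/item/{n}> <http://example.org/vocab#n> "{n}" .'
            for n in range(5)]
    for uri, lines in split_documents(data, "http://example.org/data/", 2).items():
        print(uri, len(lines), "lines")
```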
  94. 94. Examples
      RDF/XML
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:db="http://dbpedia.org/resource/">
          <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
            <db:Governor>
              <rdf:Description rdf:about="http://dbpedia.org/resource/Deval_Patrick" />
            </db:Governor>
            <db:Nickname>Bay State</db:Nickname>
            <db:Capital>
              <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
                <db:Nickname>Beantown</db:Nickname>
              </rdf:Description>
            </db:Capital>
          </rdf:Description>
        </rdf:RDF>
      Turtle
        @prefix db: <http://dbpedia.org/resource/> .
        db:Massachusetts
            db:Governor db:Deval_Patrick ;
            db:Nickname "Bay State" ;
            db:Capital db:Boston .
        db:Boston db:Nickname "Beantown" .
  95. 95. Examples
      RDFa
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
            "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml"
              xmlns:db="http://dbpedia.org/resource/"
              version="XHTML+RDFa 1.0">
          <head>
            <title>About Massachusetts</title>
          </head>
          <body>
            <div about="http://dbpedia.org/resource/Massachusetts">
              The Massachusetts governor is
              <span rel="db:Governor">
                <span about="http://dbpedia.org/resource/Deval_Patrick">Deval Patrick</span>,
              </span>
              the nickname is "<span property="db:Nickname">Bay State</span>",
              and the capital
              <span rel="db:Capital">
                <span about="http://dbpedia.org/resource/Boston">
                  has the nickname "<span property="db:Nickname">Beantown</span>".
                </span>
              </span>
            </div>
          </body>
        </html>
  96. 96. Examples
      RDF-JSON
        {
          "__iri": "db:Massachusetts",
          "db:Nickname": "Bay State",
          "db:Governor": { "__iri": "db:Deval_Patrick" },
          "db:Capital": {
            "__iri": "db:Boston",
            "db:Nickname": "Beantown"
          },
          "__prefixes": { "db:": "http://dbpedia.org/resource/" }
        }
      GRDDL
        <MyDataSet xmlns="http://example.org/my-data-xml-namespace">
          <State>
            <name>Massachusetts</name>
            <governor>Deval_Patrick</governor>
            <nickname>Bay State</nickname>
            <capital>
              <name>Boston</name>
              <nickname>Beantown</nickname>
            </capital>
          </State>
        </MyDataSet>
  97. 97. Linked Data Storage
      RDB to RDF Middleware
      • D2R Server
      Native RDF Storage (manage it yourself)
      • 4Store • AllegroGraph • Bigdata • BigOWLIM • Jena TDB • Neo4j • Sesame • Virtuoso
      Native RDF Storage (managed)
      • Talis Platform
      Pubby
      • Linked Data front-end for SPARQL endpoints
      Paget Framework
  98. 98. Testing and Debugging Linked Data
      Test published data to ensure it adheres to the Linked Data principles and best practices.
      Checking that URIs dereference correctly
      • Vapour Linked Data Validator at http://idi.fundacionctic.org/vapour
      • RDF:Alerts at http://swse.deri.org/RDFAlerts/
      • Sindice Inspector at http://inspector.sindice.com/
      Manual validation and debugging of Linked Data
      • cURL, Firefox browser extensions LiveHTTPHeaders and ModifyHeaders
      Linked Data browsers can also be used for technical debugging and validation
      • Tabulator, Marbles, LOD Browser Switch
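The manual check done with cURL comes down to two things: sending a request with an RDF media type in the Accept header, and inspecting the chain of responses. The sketch below shows both steps without performing any network access. `urllib.request.Request` is part of the Python standard library; the function names, the expected media type, and the rules in `check_hops` are simplifying assumptions, not a replacement for validators such as Vapour.

```python
import urllib.request


def build_request(uri, accept="application/rdf+xml"):
    """Prepare (but do not send) a GET request that asks for RDF."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def check_hops(hops, expected_type="application/rdf+xml"):
    """Check an observed response chain, given as (status, content_type) pairs.

    Returns a list of problems; an empty list means the chain looks fine
    under this sketch's rules.
    """
    if not hops:
        return ["no response recorded"]
    problems = []
    final_status, final_type = hops[-1]
    if final_status != 200:
        problems.append(f"final status is {final_status}, expected 200")
    elif not final_type.startswith(expected_type):
        problems.append(f"final content type is {final_type!r}, expected {expected_type}")
    # A redirect from a non-information resource should be 303 See Other.
    if len(hops) > 1 and hops[0][0] != 303:
        problems.append(f"first response is {hops[0][0]}, expected 303 See Other")
    return problems


if __name__ == "__main__":
    request = build_request("http://dbpedia.org/resource/Berlin")
    print(request.get_header("Accept"))
    print(check_hops([(303, ""), (200, "application/rdf+xml; charset=utf-8")]))
    print(check_hops([(302, ""), (200, "text/html")]))
```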
  99. 99. Summary: Linked Data
      • Semantic Technologies need to go where the data is!
      • Long live Semantic Technology!
      • Early adoption of Semantic Technology is king!
      • Growth in data volumes is very rapid.
      • Link, Integrate, Reuse
      • Linked Data is a truly Web-friendly way of publishing data.
      • Linked Data is the common global data space.
      • Gun for killer apps of semantic technology…
      • Catalyst and enabler to make semantic technology real…
      • Unlimited opportunities ahead…
  100. 100. References
      • Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao. Describing linked datasets. In Proceedings of the WWW2009 Workshop on Linked Data on the Web, 2009.
      • Tim Berners-Lee. Linked Data - Design Issues, 2006. http://www.w3.org/DesignIssues/LinkedData.html
      • Tim Berners-Lee. Giant global graph, 2007. http://dig.csail.mit.edu/breadcrumbs/node/215
      • Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009.
      • Chris Bizer, Richard Cyganiak, and Tom Heath. How to Publish Linked Data on the Web. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
      • W3C Working Draft. Cool URIs for the Semantic Web. http://www.w3.org/TR/2008/WD-cooluris-20080321/
      • http://data.gov.uk/linked-data
      • http://www.w3.org/2001/sw/Specs.html
      • S. Auer, S. Dietzold, J. Lehmann, S. Hellmann, and D. Aumueller. Triplify: lightweight linked data publication from relational databases. In Proceedings of the 17th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009.
      • A Survey of current approaches for mapping of relational databases to RDF. http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt
      • Miles et al. Best Practices Recipes for Publishing RDF Vocabularies. http://www.w3.org/TR/swbp-vocab-pub/
  101. 101. Semantic Technology: Your World, Your Way. skhan@wku.ac.kr