Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data


Published on

Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into
a very promising candidate for addressing one of the biggest challenges
of computer science: the exploitation of the Web as a platform for data
and information integration. To translate this initial success into a
world-scale reality, a number of research challenges need to be
addressed: the performance gap between relational and RDF data
management has to be closed, coherence and quality of data published on
the Web have to be improved, provenance and trust on the Linked Data Web
must be established and generally the entrance barrier for data
publishers and users has to be lowered. This tutorial will discuss
approaches for tackling these challenges. As an example of a successful
Linked Data project we will present DBpedia, which leverages Wikipedia
by extracting structured information and by making this information
freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live as well as
the recently launched DBpedia benchmark.

Published in: Technology, Education
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Initially, the Web consisted of many Websites containing only unstructured/textual content.The Web 2.0 extended this traditional Web with few extremely large Websites specializing on certain specific content types. Examples are:Wikipedia for encyclopedic for Web linksFlickr for picturesYouTube for VideosDigg for news…These websites provide specialized searching, querying, sharing, authoring specifically adopted to the content type of their concern.
  • Popular content types such as pictures, movies, calendars, encyclopedic articles, news recipes etc. are already sufficiently well supported on the Web.However, there is a long tail of special-interest content (profiles of expertise, historic data and events, bio-medical knowledge, intra-corporational knowledge etc.) which has very low or no current support (for filtering, aggregation, searching, querying, collaborative editing) on the Web.
  • Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

    1. 1. DBpedia and the Emerging Web ofLinkedDataSören Auer<br />
    2. 2. <ul><li>2000 Mathematics and Computer Science studies inHagen, Dresden and Екатеринбург
    3. 3. Managing director of adVIS GmbH – SME focusedon Web-Application and Content Management technology
    4. 4. IT consultant for various companies (T-Mobile AG, RDL Corp., Science Computing AG)
    5. 5. 2006 doctorate in Information Systems / Computer Science atUniversität Leipzig
    6. 6. 2006-2008 post-doctoral researcher at the DB Group at University of Pennsylvania (USA)
    7. 7. Head of AKSW research group – DBpedia, OntoWiki, LinkedGeoData, Triplify
    8. 8. Research interests: Information Systems, Database and Web Technologies, Semantic Web and Knowledge Engineering, Adaptive Methodologies, HCI, E-Science, Digital Libraries
    9. 9. Coordinator of the EU FP7 IP Project “LOD2 – Creating Knowledge out of Interlinked Data”
    10. 10. Work as expert for W3C, EU FP6/FP7/CIP, University City Keystone Innovation Zone, Swiss National Science Foundation</li></ul>Dr. SörenAuer<br />
    11. 11. The Vision & Big Picture<br />Linked Data 101<br />The LinkedData Life-cycle<br />Agenda<br />
    12. 12. 1. Reasoningdoes not scale on the Web<br /><ul><li>IR / one dimensional indexingscales (Google)
    13. 13. Next stepconjunctivequerying (OWL-QL?, dynamicscale-out / clustering)
    14. 14. Web scalable DL reasoningis out-of-sight (maybefragment, fuzzyreasoninghassomechances)</li></ul>2. Ifitwouldscaleitwould not beaffordable<br /><ul><li>“What is the only former Yugoslav republic in theEuropean Union?”
    15. 15. 2880 POWER7 cores, 16 Terabytes memory, 4 Terabytes clustered storage (IBM Watson) still can not answer this question</li></ul>WhytheSemantic Web won‘twork (soon)<br />
    16. 16. Problem: Try to search for these things on the current Web:<br /><ul><li>Apartments near German-Russian bilingual childcare in Berlin.
    17. 17. ERP service providers with offices in Vienna and London.
    18. 18. Researchers working on multimedia topics in Eastern Europe.</li></ul>Informationis available on the Web, but opaque to current search.<br />Why do we need the Data Web?<br />Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate such structured information from different sources:<br />Search engine<br />HTML<br />HTML<br />RDF<br />RDF<br />Web server<br />Web server<br />Web server<br />Web server<br /><br />Has everything about childcare in Berlin.<br /><br />Knows all about real estate offers in Germany<br />DB<br />DB<br />
    19. 19. From the Document Web to theSemantic Data Web<br />Semantic Web(Vision 1998, starting ???)<br /><ul><li>Reasoning
    20. 20. Logic, Rules
    21. 21. Trust</li></ul>Data Web (since 2006)<br /><ul><li>URI de-referencability
    22. 22. Web Data integration
    23. 23. RDF serializations</li></ul>Social Web (since 2003)<br /><ul><li>Folksonomies/Tagging
    24. 24. Reputation, sharing
    25. 25. Groups, relationships</li></ul>Web (since 1992)<br /><ul><li>HTTP
    26. 26. HTML/CSS/JavaScript</li></li></ul><li>Web 1.0 Web 2.0 Web 3.0<br />Many Web sitescontaining unstructured,textual content<br />Few large Web sites are specialized onspecific content types<br />Many Web sites containing & semantically syndicating arbitrarily structured content<br />Video<br />Pictures<br />Encyclopedicarticles<br />+<br />+<br />
    27. 27. The Long Tail of Information Domains<br />Pictures<br />The Long Tail by Chris Anderson (Wired, Oct. ´04) adopted to information domains<br />Recipes<br />News<br />Video<br />Calendar<br />Popularity<br />SemWeb supported structured content<br />Requirements-Engineering<br />Talentmanagement<br />Special interest<br />communities<br />Itinerary ofKing George<br />Gene<br />sequences<br />…<br />…<br />…<br />…<br />Currently supportedstructuredcontent types<br />Not or insufficiently supported content types<br />
    28. 28. 1. Uses RDF Data Model<br />Linked Data in a Nutshell<br />3.10.2011<br />starts<br />organizes<br />SBC<br />SBBD2011<br />Florianopolis<br />takesPlaceIn<br />2. Is serialised in triples:<br />SBC organizes SBBD2011<br />SBBD2011 starts “20111003”^^xsd:date<br />SBBD2011 takesPlaceAt Florianopolis<br />3. Uses Content-negotiation<br />
    29. 29. The emerging Web of Data<br />interlink<br />2009<br />2007<br />SILK<br />DXX Engine<br />fuse<br />create <br />2008<br />poolparty<br />SemMF<br />OntoWiki<br />2008<br />Sigma<br />WiQA<br />2008<br />2008<br />ORE<br />repair<br />classify<br />Virtouso<br />2009<br />DL-Learner<br />MonetDB<br />Sindice<br />enrich<br />
    30. 30. Conceptual LevelData Access and Integration<br />Data Warehousing<br /><ul><li>Based on extract, transform load (ETL)
    31. 31. Global-As-View (GAV)</li></ul>Data Web<br /><ul><li>URIs as entity identifiers
    32. 32. HTTP as data accessprotocol
    33. 33. Local-As-View (LAV)</li></ul>Enterprise Information Integration<br />sets of heterogeneous data sources appear as a single, homogeneous data source<br />Research<br />Mediators<br />Ontology-based<br />P2P<br />Web service-based<br />Data Integration<br />Object-relational mappings (ORM)<br /><ul><li>NeXT’s EOF / WebObjects
    34. 34. ADO.NET Entity Framework
    35. 35. Hibernate</li></ul>Procedural APIs<br /><ul><li>ODBC
    36. 36. JDBC</li></ul>Query Languages<br /><ul><li>Datalog, SQL
    37. 37. SPARQL
    38. 38. XPATH/XQuery</li></ul>Linked Data<br /><ul><li>de-referencable URIs
    39. 39. RDF serialization formats</li></ul>Data Access<br />Others<br /><ul><li>XML, hierachical, tree, graph-oriented DBMS</li></ul>RDBMS<br /><ul><li>Organize data in relations, rows, cells
    40. 40. Oracle, DB2, MS-SQL</li></ul>Column-oriented DBMS<br /><ul><li>Collocates column values rather than row values
    41. 41. Vertica, C-Store, MonetDB</li></ul>Triple/Quad Stores<br /><ul><li>RDF data model
    42. 42. Virtuoso, Oracle, Sesame</li></ul>Data Models<br />Entity-attribute-value (EAV)<br /><ul><li>HELP medical record system, TrialDB</li></li></ul><li>The Vision & Big Picture<br />Linked Data 101(based on Michael Hausenblas‘ slides)<br />The LinkedData Life-cycle<br />Agenda<br />
    43. 43. Orientation<br />
    44. 44. Linked Data 101<br />Linked Data provides a standardised API for:<br />Data and metadata discovery<br />Data integration<br />Distributed query<br />
    45. 45. Linked Data principles<br />Use URIs to identify the “things” in your data<br />Use http:// URIs so people (and machines) can look them up on the web<br />When a URI is looked up, return a description ofthe thing (in RDF format)<br />Include links to related things<br /><br />
    46. 46. Linked Data principles<br />They are principles, not implementation advices<br />Not humans or machines but humans and machines! <br />Content negotiation (e.g. HTML and RDF/XML)<br />HTML+ RDFa<br />Metcalfe’s Law<br />
    47. 47. Linked Data example<br />17<br />
    48. 48. HTTP URIs<br />A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]<br />Syntax<br />URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]<br />Example<br />foo://<br />_/ _________________/_________/ __________/ __/<br /> | | | | |<br />scheme authority path query fragment<br />
    49. 49. HTTP URIs<br />URI references<br /> An RDF URI reference is a Unicode string does not contain any control characters (#x00 - #x1F, #x7F-#x9F) and would produce a valid URI character sequence representing an absolute URI when subjected to an UTF-8 encoding along with %-escaping non-US-ASCII octets.<br />Qualified Names (QNames)<br />XML’s way to allow namespaced elements/attributes as of QName = Prefix ‘:‘ LocalPart<br />Compact URIs (CURIEs)<br />Generic, abbreviated syntax for expressing URIs<br />
    50. 50. HTTP<br />The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems.<br /> It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.<br />
    51. 51. HTTP<br />HTTP messages consist of requests from client to server and responses from server to client<br />Set of methods is predefined<br />GET<br />POST<br />PUT<br />DELETE<br />HEAD<br />(OPTIONS)<br />
    52. 52. HTTP<br />Status codes<br />Informational 1xx, provisional response, (100 Continue)<br />Successful 2xx, request successfully received, understood, and accepted (201 Created)<br />Redirection 3xx, further action needs to be taken by user agent to fulfill the request (301 Moved Permanently)<br />Client Error 4xx, client erred (405 Method Not Allowed)<br />Server Error 5xx, server encountered an unexpected condition (501 Not Implemented) <br />
    53. 53. HTTP<br />GET /html/rfc2616 HTTP/1.1<br />Host:<br />User-Agent: Mozilla/5.0<br />Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8<br />HTTP/1.x 200 OK<br />Date: Thu, 05 Mar 2009 08:17:33 GMT<br />Server: Apache/2.2.11<br />Content-Location: rfc2616.html<br />Last-Modified: Tue, 20 Jan 2009 09:16:04 GMT<br />Content-Type: text/html; charset=UTF-8<br />REQUEST<br />RESPONSE<br />
    54. 54. HTTP<br />Content Negotiation: selecting representation for a given response when multiple representations available<br />Three types of CN: server-driven, agent-driven CN, transparent CN<br />Example:<br />curl -I -H "Accept: application/rdf+xml"<br />HTTP/1.1 303 See Other<br />Content-Type: application/rdf+xml<br />Location:<br />
    55. 55. HTTP<br />Caching (see Cache–Control header field) is essential for scalability<br />HTTPbis IETF WG chaired by Mark Nottingham, mainly about: patches, clarifications, deprecate non-used features, documentation of security properties <br />
    56. 56. REST - HTTP<br />Representational State Transfer (REST)<br />resource intended conceptual target of a hypertext reference <br />resource identifier URL, URN<br />representation HTML document, JPEG image <br />representation media type, last-modified time<br />metadata<br />resource source link, alternates, varymetadata<br />control data if-modified-since, cache-control <br /><br />
    57. 57. Web's Standard Retrieval Algorithm<br />parse URI and find HTTP protocol<br />look up DNS name to determine the associated IP address<br />open a TCP stream to port 80 at the IP address determined above<br />format an HTTP GET request for resource and sends that to the server<br />read response from the server<br />from the status code (200) determine that a representation of the resource is available<br />inspect the returned Content-Type<br />pass the entity-body to its HTML rendering engine<br /><br />
    58. 58. RDF<br />A data model - directed, labeled graph<br />Triple: (subject predicate object)<br />subject … URIref or bNode<br />predicate … URIref<br />object … URIref or bNode or literal<br />
    59. 59. RDF Triple<br /><ul><li>Inspired by linguistic categories
    60. 60. Allowed usage:Subject: URI or blank nodePredicate: URI (also called properties)Object : URI or blank nodes or literal</li></ul>isMayorOf<br />Leipzig<br />Burkhard Jung<br />Predicate<br />Object<br />Subject<br />
    61. 61. Example RDF Graph<br />Leipzig<br />hasAreaCode<br />longitude<br />0341<br />12.3833<br />latitude<br />locatedIn<br />hasMayor<br />51.3333<br />isMayorOf<br />Saxony<br />Burkhard Jung<br />born<br />locatedIn<br />1958-03-07<br />isMemberOf<br />Germany<br />Social Democratic Party<br />
    62. 62. Literals<br /><ul><li>Representation of data values
    63. 63. Serialization as strings
    64. 64. Interpretation based on the datatype
    65. 65. Literals without Datatype are treated as strings</li></ul>longitude<br />12.3833<br />Leipzig<br />latitude<br />51.3333<br />isMayorOf<br />hasMayor<br />born<br />1958-03-07<br />Burkhard Jung<br />
    66. 66. RDF Serialization<br />N3: "Notation 3" - extensive formalism<br />N-Triples: part of N3<br />Quelle:<br />Turtle: Extension of N-Triples (shortcuts)<br />
    67. 67. Turtle Syntax<br /><ul><li>URIs in angle brackets
    68. 68. Literals in quotes
    69. 69. Triples separated by dot
    70. 70. Whitespace is ignored</li></ul>33<br />
    71. 71. Turtle Syntax: Shortcuts<br /> ;<br /> "Leipzig"@de ;<br /> "51.333332"^^xsd:float ;<br /> "12.383333"^^xsd:float .<br />Shortcuts for namespace prefixes:<br />@prefixrdf:<> .<br />@prefixrdfs:<> .<br />@prefixdbp:<> .<br />@prefixdbpp:<> .<br />@prefixgeo:<> .<br />dbp:Leipzigdbpp:hasMayordbp:Burkhard_Jung .<br />dbp:Leipzigrdfs:label "Leipzig"@de .<br />dbp:Leipziggeo:lat "51.333332"^^xsd:float .<br />dbp:Leipziggeo:lon "12.383333"^^xsd:float .<br />
    72. 72. Turtle Syntax: Shortcuts<br />Group triples with same subject using “;” instead of “.”:<br />@prefixrdf: <> .<br />@prefixrdfs="> .<br />@prefixdbp="> .<br />@prefixdbpp="> .<br />@prefixgeo="> .<br />dbp:Leipzigdbpp:hasMayordbp:Burkhard_Jung ;<br />rdfs:label "Leipzig"@de ;<br />geo:lat "51.333332"^^xsd:float ;<br />geo:lon "12.383333"^^xsd:float .<br />also Triple with same subject and predicate:<br />@prefixdbp="> .<br />@prefixdbpp="> .<br />dbp:Leipzigdbp:locatedIndbp:Saxony, dbp:Germany;<br />dbpp:hasMayordbp:Burkhard_Jung .<br />
    73. 73. XML-Syntax von RDF<br /><ul><li>Turtle intuitively readable and machine processable
    74. 74. but: better tool support and programming libraries for XML</li></ul><?xmlversion="1.0"?><br /><rdf:RDFxmlns:rdf=""<br />xmlns:rdfs=""<br />xmlns:dbpp=""<br />xmlns:geo=""><br /> <rdf:Descriptionrdf:about=""><br /> <property:hasMayor<br />rdf:resource="" /><br /> <rdfs:labelxml:lang="de">Leipzig</rdfs:label><br /> <geo:latrdf:datatype="float">51.3333</geo:lat><br /> <geo:lonrdf:datatype="float">12.3833</geo:lon><br /> </rdf:Description><br /></rdf:RDF><br />
    75. 75. RDF/JSON<br /><ul><li>JSON = JavaScript Object Notation
    76. 76. Compact formatfordataexchangebetweenapplications
    77. 77. JSON documentsare valid JavaScript
    78. 78. Programminglanguageindependent, sinceparserexistfor all popularprogramminglanguages
    79. 79. Lessoverheadwhenparsingandserialisingthan XML</li></ul>{ "S" : { "P" : [ O ] } }<br /><ul><li>Subject: URI, BNode
    80. 80. Predicate: URI
    81. 81. Object:</li></ul> Type: „URI“, „Literal“ or „bnode“<br /> Value:datavalue<br /> Lang: language tag<br />Datatype: URI ofthedatatype.<br />
    82. 82. JSON Example<br />{<br /> "" : {<br /> "": <br /> [ { "type":"uri", "value":"" } ],<br /> "":<br /> [ { "type":"literal", "value":"Leipzig", "lang":"en" } ] ,<br /> "":<br /> [ { "type":"literal", "value":"51.3333",<br /> "datatype":"" } ]<br /> "":<br />% [ { "type":"literal", "value":"12.3833",<br /> "datatype":"" } ]<br /> }<br />}<br />
    83. 83. RDFa Syntax<br /><ul><li>RDFa = Resource Description Framework – in –attributes
    84. 84. Embedding RDF in XHTML
    85. 85. UTF-8 and UTF-16, since Extension of XML based XHTML
    86. 86. Due toembedding in HTML moreoverheadthanotherserialisations
    87. 87. Lessreadable</li></ul><?xmlversion="1.0" encoding="UTF-8"?><br /><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"<br /> ""><br /><htmlversion="XHTML+RDFa 1.0" xml:lang="en" xmlns=""<br />xmlns:rdf=""<br />xmlns:rdfs=""<br />xmlns:dbpp=""<br />xmlns:geo=""><br /> <head><title>Leipzig</title></head><br /> <bodyabout=""><br /> <h1 property="rdfs:label" xml:lang="de">Leipzig</h1><br /> <p>Leipzig is a city in Germany. Leipzig'smayoris<br /> <a href="Burkhard_Jung" rel="dbpp:hasMayor">Burkhard Jung</a>. Itislocated<br />atlatitude <span property="geo:lat" datatype="xsd:float">51.3333</span><br />andlongitude <span property="geo:lon" datatype="xsd:float">12.3833</span>.</p><br /> </body><br /></html><br />
    88. 88. Vocabularies<br />Schema layer of RDF<br />Defines terms (classes and properties)<br />Typically RDFS or OWL family<br />Common vocabularies:<br />Dublin Core, SKOS<br />FOAF, SIOC, vCard<br />DOAP<br />Core Organization Ontology<br />VoID<br /><br />
    89. 89. 41<br />Vokabulare: Friend-of-a-Friend (FOAF)<br />defines classes and properties for representinginformation about people and theirrelationships<br />Soerenrdf:typefoaf:Person .<br />SoerencurrentProject .<br />Soerenfoaf:homepage .<br />Soerenfoaf:knows .<br />Soeren foaf:sha1 09ac456515dee .<br />
    90. 90. 42<br />Vokabulare: SemanticallyInterlinked Online Communities.<br />Represent content from Blogs, Wikis, Forums, Mailinglists, Chats etc.<br />
    91. 91. 43<br />Vokabulare: Simple Knowledge Organization System (SKOS)<br />support the use of thesauri, classification schemes, subjectheading systems and taxonomies<br />
    92. 92. Instance data<br />Instancesareassociatedwithoneorseveralclasses:<br />Boddingtonsrdf:type Ale .<br />Grafentrunkrdf:type Bock .<br />Hoegaardenrdf:type White .<br />Jeverrdf:type Pilsner . <br />
    93. 93. The Linked Open Data cloud <br />
    94. 94. Linked Open Data cloud<br />2009<br />2007<br />2008<br />2008<br />2010<br />2008<br />2008<br />2009<br />46<br />
    95. 95. Linked Open Data cloud<br />Media<br />User-generated<br />Government<br />Publications<br />Cross-domain<br />Geo<br />Life sciences<br /><br />
    96. 96. LOD cloudstats<br />triples distribution<br />links distribution <br /><br />
    97. 97. TimBL’s 5-star plan for open data<br />★ Make your data available on the Web under an open license<br /> ★★ Make it available as structured data(Excel sheet instead of image scan of a table) <br /> ★★★ Use a non-proprietary format(CSV file instead of an Excel sheet) <br />★★★★ Use Linked Data format(URIs to identify things, RDF to represent data)<br />★★★★★ Link your data to other people’s data to provide context<br />More:<br />
    98. 98. Why going for the 5th star?<br />Central Contractor Registration (CCR) <br />Geonames<br /><br />
    99. 99. Effort distribution<br />Fix Overall Data IntegrationEffort<br />
    100. 100. Datasets<br />A dataset is a set of RDF triples that are published,<br />maintained or aggregated by a single provider<br />
    101. 101. Linksets<br />An RDF link is an RDF triple whose subject and object are described in different datasets<br />A linksetis a collection of such RDF links between two datasets<br />
    102. 102. Describing Datasets - VoID<br />General dataset metadata<br />Access metadata<br />Structural metadata<br />Describing linksets<br />Deployment and discovery of voiD files<br /><br />
    103. 103. General dataset metadata<br />Dataset homepage<br />Publisher<br />Title and description<br />Categorisation<br />Licensing<br />Technical features<br />
    104. 104. General dataset metadata<br />:DBpedia a void:Dataset ;<br />dcterms:title "DBpedia” ;<br />dcterms:description "RDF data extracted from Wikipedia” ;<br />dcterms:contributor :FU_Berlin ;<br />dcterms:contributor :Uni_Leipzig ;<br />dcterms:contributor :Openlink ;<br />dcterms:source <> ;<br />void:feature <> ;<br />dcterms:modified "2008-11-17"^^xsd:date . <br />:Geonames a void:Dataset ; <br />dcterms:subject <> .<br />:GeoSpecies a void:Dataset ; <br />dcterms:license <> .<br />
    105. 105. Access metadata<br />SPARQL endpoints<br />RDF data dumps<br />Root resources<br />URI lookup endpoints<br />OpenSearch description documents<br />
    106. 106. Access metadata<br />:exampleDSvoid:Dataset ;<br />void:sparqlEndpoint <> ;<br />void:dataDump <> ;<br />void:uriLookupEndpoint <> .<br />
    107. 107. Structural metadata<br />Provides high-level information about the schema and internal structure of a dataset and can be helpful when exploring or querying datasets:<br />Example resources<br />Patterns for resource URIs<br />Vocabularies<br />Dataset partitions<br />Statistics<br />
    108. 108. Structural metadata<br />:DBpedia a void:Dataset;<br />void:exampleResource <> . <br />:LiveJournal a void:Dataset;<br />void:vocabulary <> .<br />:DBpedia a void:Dataset;<br />void:classPartition [<br />void:classfoaf:Person;<br />void:entities 312000;<br /> ];<br />void:propertyPartition [ <br />void:propertyfoaf:name;<br />void:triples 312000;<br /> ];<br /> .<br />
    109. 109. Describing linksets<br />
    110. 110. Describing linksets<br />:DBpedia a void:Dataset ;<br />void:subset :DBpedia2Geonames .<br />:Geonames a void:Dataset .<br />:DBpedia2Geonames a void:Linkset ;<br />void:target :DBpedia ;<br />void:target :Geonames ;<br />void:linkPredicateowl:sameAs .<br />
    111. 111. Deployment and discovery<br />Choosing URIs for datasets<br />Publishing a VoID file alongside a dataset<br />Turtle<br />RDFa<br />SPARQL Service Description Vocabulary<br />Discovery (well-known URI), based on of RFC5758], registered with IANA<br />
    112. 112. Consumption - Essentials<br />Linked Data provides for a global data-space with a uniform API (due to RDF as the data model)<br />Access methods<br />Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)<br />SPARQL (‘the SQL of RDF’)<br />Data dumps (RDF/XML, etc.)<br />
    113. 113. Consumption - Technologies<br />Linked Data access mechanisms widely supported<br /> all major platforms and languages (HTTP interface & RDF parsing), such as Java, Python, PHP, C/C++/.NET, etc.<br />Command line tools (curl, rapper, etc.)<br />Online tools <br /> (HTTP/low-level)<br /> (RDF/data-level)<br />Structured query: SPARQL (more later)<br />
    114. 114. Consumption - Technologies<br />Distributed setup  need for central point of access (indexer, aggregator)<br />Sindice, an index of the Web of Data<br /><br />, Web of Data aggregator & browser<br /><br />Relationship discovery<br /><br />
    115. 115. Technologies – FYN<br /><br />67<br />
    116. 116. Technologies –<br /> is a Web of Data platform enabling entity visualisation and consolidation both for humans and machines (API) <br /><br />68<br />
    117. 117. Technologies –<br /> is a service to find co-references on the Web of Data <br /><br />
    118. 118. <ul><li>All Linked Data datasets share a uniform data model, the RDF statement data model
    119. 119. Information is represented in facts expressed as (subject, predicate, object) triples
    120. 120. Components: globally unique IRI/URI entity identifiers & typed data values (literals) as objects</li></ul>Linked Data Benefits: Uniformity<br />
    121. 121. <ul><li>URIs not just used for identifying entities, but also (as URLs) for locating and retrieving resources that describe these entities on the Web</li></ul>Linked Data Benefits: De-referencability<br />
    122. 122. Linked Data Benefits: Coherence<br />Berlin<br />Germany<br />isCapitalOf<br />Knowledgebase 2<br />isMemberOf<br />Knowledgebase 1<br />European Union<br /><ul><li>triples containing URIs from different namespaces as subject and object, establish a link between (the entity identified by the) subject with (the entity identified by the) object (typed RDF links)</li></li></ul><li><ul><li>RDF data model, is based on a single mechanism for representing information (triples) -> very easy to attain a syntactic and simple semantic integration of different Linked Data sets.
    123. 123. higher level semantic integration can be achieved by employing schema and instance matching techniques and expressing found matches again as additional triple facts</li></ul>Linked Data Benefits: Integrateability<br />
    124. 124. <ul><li>Publishing and updating Linked Data is relatively simple thus facilitating a timely availability
    125. 125. once a Linked Data source is updated it is straightforward to access and use the updated data source (time consuming and error prune extraction, transformation and loading not required)</li></ul>Linked Data Benefits: Timeliness<br />
    126. 126. The Vision & Big Picture<br />Linked Data 101<br />The LinkedData Life-cycle<br />Agenda<br />
    127. 127. What works now? What has to be done?<br /><ul><li>Web - a global, distributed platform for data, information and knowledge integration
    128. 128. exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF</li></ul>Achievements<br />Extension of the Web with a data commons (25B facts<br />vibrant, global RTD community<br />Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly) <br />Emerging governmental adoption in sight<br />Establishing Linked Data as a deployment path for the Semantic Web.<br /><ul><li>Challenges</li></ul>Coherence: Relatively few, expensively maintained links<br />Quality: partly low quality data and inconsistencies<br />Performance: Still substantial penalties comparedto relational <br />Data consumption: large-scale processing, schema mapping and data fusion still in its infancy<br />Usability: Establishing direct end-user tools and network effect<br />April 2008<br />July 2007<br />September 2008<br />July 2009<br />
    129. 129. Linked DataLifecycle<br />
    130. 130. Extraction<br />
    131. 131. From unstructured sources<br /><ul><li>NLP, text mining, annotation</li></ul>From semi-structured sources<br /><ul><li>DBpedia, LinkedGeoData, SCOVO/DataCube</li></ul>From structured sources<br /><ul><li>RDB2RDF</li></ul>Extraction<br />
    132. 132. extract structured information from Wikipedia & make this information available on the Web as LOD:<br /><ul><li>ask sophisticated queries against Wikipedia (e.g. universities in brandenburg, mayors of elevated towns, soccer players),
    133. 133. link other data sets on the Web to Wikipedia data
    134. 134. Represents a community consensus</li></ul>Recently launched DBpedia Live transforms Wikipedia into a structured knowledge base<br /> Transforming Wikipedia into an Knowledge Base<br />
    135. 135. Structure in Wikipedia<br />Title<br />Abstract<br />Infoboxes<br />Geo-coordinates<br />Categories<br />Images<br />Links<br />other language versions<br />other Wikipedia pages<br />To the Web<br />Redirects<br />Disambiguations<br />
    136. 136. Infobox templates<br />Wikitext-Syntax<br />{{Infobox Korean settlement<br />| title = Busan Metropolitan City<br />| img = Busan.jpg<br />| imgcaption = A view of the [[Geumjeong]] district in Busan<br />| hangul = 부산 광역시<br />...<br />| area_km2 = 763.46<br />| pop = 3635389<br />| popyear = 2006<br />| mayor = Hur Nam-sik<br />| divs = 15 wards (Gu), 1 county (Gun)<br />| region = [[Yeongnam]]<br />| dialect = [[Gyeongsang]]<br />}}<br /><br />dbp:Busan dbpp:title ″Busan Metropolitan City″<br />dbp:Busan dbpp:hangul ″부산 광역시″@Hang<br />dbp:Busan dbpp:area_km2 ″763.46“^xsd:float<br />dbp:Busan dbpp:pop ″3635389“^xsd:int<br />dbp:Busan dbpp:region dbp:Yeongnam<br />dbp:Busan dbpp:dialect dbp:Gyeongsang<br />...<br />RDF representation<br />
    137. 137. A vast multi-lingual, multi-domain knowledge base<br />DBpedia extraction results in:<br />descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases<br />labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages;4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories<br />altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English edition, 766M from other language editions<br />DBpediaLive ( &Mappings Wiki ( the community into a refinement cycle<br />UpcommingDBpedia inline<br />
    138. 138. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />84<br />DBpedia Architecture<br />Extraction Manager<br />Wikipedia<br />N-Triple<br />Dumps<br />Extraction Job<br />Destinations<br />Article-<br />Queue<br />Extractors<br />Update<br />Stream<br />N-Triple<br />Serializer<br />Image<br />Label<br />Category<br />PageCollections<br />SPARQL-<br />Update<br />Destination<br />Redirect<br />Disambiguation<br />Wikipedia<br />Dumps<br />Database<br />Wikipedia<br />Pagelink<br />Geo<br />Abstract<br />Generic Infobox<br />Triple Store<br />Virtuoso <br />Wikipedia<br />OAI-PMH<br />Live<br />Wikipedia<br />Mapping-based Infobox<br />Parsers<br />Ontology-<br />Mappings<br />DateTime<br />Units<br />Geo<br />String-List<br />Numbers<br />Linked Data<br />SPARQL endpoint<br />The Web<br />SPARQL clients<br />HTML browser<br />DBpedia apps<br />RDF browser<br />
    139. 139. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />85<br />Hierarchies<br />DBpedia Ontology Schema:<br />manually created for DBpedia (infoboxes)<br />275 classes + 1335 properties; 20mio triples<br />YAGO:<br />large hierarchy linking Wikipedia leaf categories to WordNet<br />250,000 classes<br />UMBEL (Upper Mapping and Binding Exchange Layer):<br /> 20000 classes derived from OpenCyc<br />Wikipedia Categories:<br />Not a class hierarchy (e.g. cycles), represented using SKOS<br />415,000+ categories<br />
    140. 140. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />86<br />DBpedia SPARQL Endpoint<br /><br />hosted on a OpenLink Virtuososerver<br />can answer SPARQL queries like<br />Give me all Sitcoms that are set in NYC?<br />All tennis players from Moscow?<br />All films by Quentin Tarentino?<br />All German musicians that were born in Berlin in the 19th century?<br />All soccer players with tricot number 11, playing for a club having a stadium with over 40,000 seats and is born in a country with over 10 million inhabitants?<br />
    141. 141. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />87<br />DBpedia SPARQL Endpoint<br />SELECT ?name ?birth ?description ?person WHERE {<br /> ?person dbp:birthPlace dbp:Berlin .<br /> ?person skos:subject dbp:Cat:German_musicians .<br /> ?person dbp:birth ?birth .<br /> ?person foaf:name ?name .<br /> ?person rdfs:comment ?description .<br /> FILTER (LANG(?description) = 'en') .<br />} ORDER BY ?name<br />
    142. 142. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />88<br />DBpedia Applications<br />DBpedia Mobile: location aware mobile client for DBpedia<br />Uses current location and DBpedia to display map<br />Can navigate into other knowledge bases<br />DBpedia Query Builder: user front end for building queries<br />DBpedia Relationship Finder finds relation between two objects in DBpedia<br />
    143. 143. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />89<br />DBpedia Applications<br />
    144. 144. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />90<br />DBpediaApplications: Relfinder<br /><br />
    145. 145. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />91<br />DBpediaApplications: Zemanta<br />
    146. 146. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />92<br />DBpediaApplications: Faceted-Browser<br />
    147. 147. 2011/05/12<br />CONSEGI - Sören Auer: DBpedia<br />93<br />DBpedia Applications (3rd party)<br />Muddy Boots (BBC): Annotate actors in BBC News with DBpedia identifiers<br />Open Calais (Reuters): named entity recognition; entities are connected via owl:sameAs to DBpedia, Freebase, Geonames<br />Faviki: Social Bookmarking Tool uses DBpedia in backend to group tags etc. and multi-language support<br />Topbraid Composer: ontology editor, which links entities to DBpedia based on their labels<br />
    148. 148. LinkedGeoData<br />Conversion, interlinking and publishing of* data sets as RDF.<br />* ”Wikipedia for geographic data”<br />
    149. 149. Motivation<br />Ease information integration tasks that require spatial knowledge, such as<br /><ul><li>Offerings of bakeries next door
    150. 150. Map of distributed branches of a company
    151. 151. Historical sights along a bicycle track</li></ul>Therefore use RDF/OWL in order overcome structural and semantic heterogeneity.<br /><ul><li>Requires a vocabulary – which we try to establish.</li></ul>LOD cloud contains data sets with spatial features<br /><ul><li>e.g. Geonames, DBpedia, US census, EuroStat
    152. 152. But: they are restricted to popular or large entities like countries, famous places etc.</li></ul>Therefore they lack buildings, roads, mailboxes, etc.<br />
    153. 153. OpenStreetMap - Datamodel<br />Basic entities are:<br />NodesLatitude, Longitude<br />WaysSequence of nodes<br />RelationsAssociations between any number of nodes, ways and relations.<br />Each entity may be described with tags (= key-value pairs)<br />
    154. 154. Example: Leipzig's zoo<br />
    155. 155. Data/Mapping Example<br />
    156. 156. Data/Mapping Example<br />
    157. 157. Mapping Types<br />Three Mapping Types<br />Text<br />(5, name, Leipzig) -> lgd:node5 rdfs:label ”Leipzig”<br />(5, name:de, Leipzig) -> lgd:node5 rdfs:label ”Leipzig”@de<br />Datatypes<br />(6, seats, 4) -> lgd:node6 lgdo:seats ”4”^^xsd:integer<br />Classes/Object Properties<br />(7, place, city) -> lgdn:7 a lgdo:City<br />(7, religion, pastafarian) -> lgdn:7 lgdo:religion lgdo:Pastafarian<br />
    158. 158. Access<br />Rest Interface (based on Postgis DB, full osm dataset loaded, > 1billion triples)<br />Supports limited queries (e.g. circular/rectangular area, filtering by labels)<br />Sparql Endpoints (based on Virtuoso DB, subset of osm dataset, ~222M triples)<br />Static (<br />Live (<br />Downloads (<br />Monthly updates on the above datasets envisioned<br />
    159. 159. LinkedGeoData Live<br />OpenStreetMap provides full dumps and minutely changesets for download<br />Changesets are numbered, e.g. ”001/234/567.osc.gz”<br />We also convert the changesets to sets of added and removed triples (relative to our store) and publish them<br />001/234/567.added.nt.gz<br />001/234/567.removed.nt.gz<br />Advantage: Other users could easily sync their RDF store with LinkedGeoData<br />
    160. 160. DBpedia Mapping – Step By Step<br />Given a DBpedia point, query LGD points within type specific maximum distance<br />Basic idea (performed with Silk):<br />Compute spatial score<br />Compute name similarity (rdfs:label)<br />
    161. 161. DBpedia Mapping – Step By Step<br />Given a DBpedia point, query LGD points within type specific maximum distance<br />Basic idea (performed with Silk):<br />Compute spatial score<br />Compute name similarity (rdfs:label)<br />Combine both scores<br />Depending on final score, either automatically accept/reject links or mark for manual verification.<br />
    162. 162. Statistics (2011-Feb-23)<br />222.539.712 Triples<br />6.666.865 Ways<br />5.882.306 Nodes<br />Among them<br />352.673 PlaceOfWorship<br />60.573 RailwayStation<br />59.468 Recycling<br />50.955 Town<br />30.099 Toilet<br />7.222 City<br />
    163. 163. Conclusion<br />OpenStreetMap<br />immensely successful project for collaboratively creating free spatial data<br />Community uses key value structures, which provide a rich source of information<br />Key strength: broad coverage<br />LGD Contributions<br />Established mapping to Dbpedia<br />Geonames mapping partially done (37 different entity types cities, churches, ...)<br />Facet-based LGD Browser provides an interface for OSM/LGD, which highlights its structural aspects<br />Live sync<br />Goal: Make LGD as useful (succesful) as DBpedia for the geospatial domain<br />
    164. 164. Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)<br />No agreement on a formalsemantics of RDF2RDFmapping<br /><ul><li>LOD readiness, SPARQL-SQL translation</li></ul>W3C RDB2RDF WG<br />Extraction Relational Data<br />
    165. 165. From unstructured sources<br /><ul><li>Deploy existing NLP approaches (OpenCalais, Ontos API)
    166. 166. Develop standardized, LOD enabled interfaces between NLP tools (NLP2RDF)</li></ul>From semi-structured sources<br /><ul><li>Efficient bi-directional synchronization</li></ul>From structured sources<br /><ul><li>Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)</li></ul>Orthogonal challenges<br /><ul><li>Using LOD as background knowledge
    167. 167. Provenance</li></ul>Extraction Challenges<br />
    168. 168. Storage and Querying<br />
    169. 169. Still by a factor 5-50 slower than relational data management (BSBM, DBpedia Benchmark)<br />Performance increases steadily<br />Comprehensive, well-supported open-soure and commercial implementations are available:<br /><ul><li>OpenLink’s Virtuoso (os+commercial)
    170. 170. Big OWLIM (commercial), Swift OWLIM (os)
    171. 171. 4store (os)
    172. 172. Talis (hosted)
    173. 173. Bigdata (distributed)
    174. 174. Allegrograph (commercial)
    175. 175. Mulgara (os)</li></ul>RDF Data Management<br />
    176. 176. <ul><li>UsesDBpediaasdataand a selectionof 25 frequentlyexecutedqueries
    177. 177. Can generatefractionsand multiples ofDBpedia‘ssize
    178. 178. Does not resemble relational data</li></ul>Performance differences, observedwithotherbenchmarksareamplified<br />DBpedia Benchmark<br />GeometricMean<br />
    179. 179. <ul><li>Reduce the performance gap between relational and RDF data management
    180. 180. SPARQL Query extensions
    181. 181. Spatial/semantic/temporal data management
    182. 182. More advanced query result caching
    183. 183. View maintenance / adaptive reorganization based on common access patterns
    184. 184. More realistic benchmarks</li></ul>Storage and Querying Challenges<br />
    185. 185. Authoring<br />
    186. 186. 1. Semantic (Text) Wikis<br /><ul><li>Authoring of semanticallyannotated texts</li></ul>2. Semantic Data Wikis<br /><ul><li>Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)</li></ul>Two Kinds of Semantic Wikis<br />
    187. 187. Versatile domain-independent tool<br />Serves as Linked Data / SPARQL endpoint on the Data Web<br />Open-source project hosted at Google code<br />Not just a Wiki UI, but a whole framework for the development of Semantic Web applications<br />Developed in PHP based on the Zend framework<br />Very active developer and user community<br />More than 500 downloads monthly<br />Large number of use cases<br />OntoWiki – a semantic data wiki<br />
    188. 188. Dynamic views on knowledgebases<br />OntoWiki<br />
    189. 189. OntoWiki<br />RDF triples on resourcedetailspage<br />
    190. 190. OntoWiki<br />Dynamische Vorschläge aus dem Daten Web<br />
    191. 191. CatalogusProfessorumLipsiensis<br />
    192. 192. OntoWiki: Caucasian Spiders<br />
    193. 193. RDFauthor in OntoWiki<br />
    194. 194. Semantic Portal with OntoWiki: Vakantieland<br />
    195. 195. RDFaCE- RDFa Content Editor<br />
    196. 196. RDFaCEArchitecture<br />
    197. 197. Integratingvarious NLP APIs<br />
    198. 198. Linking<br />© CC-BY-NC-ND by ~Dezz~ (residae on flickr)<br />
    199. 199. Automatic<br />Semi-automatic<br /><ul><li>SILK
    200. 200. LIMES</li></ul>Manual<br /><ul><li>Sindice integration into UIs
    201. 201. Semantic Pingback</li></ul>LOD Linking<br />
    202. 202. LIMES 0.3: Basic Idea<br />Usesthecharacteristicsofmetricspaces<br />Especiallyconsequencesoftriangleinequality<br />d(x, y) < d(x, z) + d(z, y) <br />d(x, z) - d(z, y) < d(x, y) < d(x, z) + d(z, y) <br />Basic idea<br />Usepessimisticapproximationsofdistancesinsteadofcomputingthem<br />Onlycomputedistanceswhenneeded<br />
    203. 203. Overview<br />Knowledgesources<br />
    204. 204. Computationof Exemplars<br />Assumption: numberofexemplarsisgiven<br />Goal: Segment targetdataset<br />
    205. 205. Computationof Exemplars<br />
    206. 206. Computationof Exemplars<br />
    207. 207. Computationof Exemplars<br />
    208. 208. Computationof Exemplars<br />
    209. 209. Computationof Exemplars<br />NB: Distancesfromexemplarsto all otherpointsareknown<br />
    210. 210. Filtering<br />y<br />x<br />z<br />1. Measuredistancefromeach x toeachexemplar<br />
    211. 211. Filtering<br />y<br />x<br />z<br />2. Applyd(x, y) - d(y, z) > t d(x, z) > t<br />
    212. 212. SimilarityComputation<br />y<br />x<br />z<br />d(x, y) - d(y, z) <t Compute d(x, z)<br />
    213. 213. Serialization<br />Resultsarereturnedas RDF<br />ForexamplemappingDBpediaandDrugbank<br />@prefixdrugbank: <> .<br />@prefixdbpedia: <> .<br />@prefixowl: <> .<br />dbpedia:Cefaclorowl:sameAs drugbank:DB00833 .<br />dbpedia:Clortermineowl:sameAs drugbank:DB01527 .<br />dbpedia:Prednicarbateowl:sameAs drugbank:DB01130 .<br />dbpedia:Linezolidowl:sameAs drugbank:DB00601 .<br />dbpedia:Valaciclovirowl:sameAs drugbank:DB00577 .<br />….<br />
    214. 214. Experiments<br />Q1: Whatisthebestnumberofexemplars?<br />Q2: Whatistherelationbetweenthesimilaritythresholdqandthe total numberofcomparisons?<br />Q3: Doestheassignmentof S and T matter?<br />Q4:Howdoes LIMES compareto SILK?<br />
    215. 215. Q1 and Q2<br />Experiments on syntheticdata<br />Knowledgebasesofsizes 2000, 3000, 5000, 7500 and 10000<br />Variednumberofexemplars<br />Variedthresholds<br />Experiments wererepeated 5 times<br />Averageresultsarepresented<br />
    216. 216. Q1 and Q2<br />
    217. 217. Q1 and Q2<br />Q1<br />Best numberofexemplarsdepends on q<br />Forq> 0.9, bestnumber lies around |T|1/2<br />Q2<br />As expected, numberofcomparisonsdiminisheswithgrowingq<br />
    218. 218. Q3 (order of S and T)<br />Experiments on syntheticdata<br />Knowledgebasesofsizes 1000, 2000, 3000, …, 10000<br />Numberofexemplars was |T|1/2<br />Experiments wererepeated 5 times<br />Averageresultsarepresented<br />
    219. 219. Q3<br />
    220. 220. Q3<br /><ul><li>Green = S firstismore time-efficient
    221. 221. Overall lessthan 5% difference</li></li></ul><li>Q4 (comparisonwith SILK)<br />3 Experiments on real data<br />Drugs<br />Diseases<br />SimCities<br />Numberofexemplars was |T|1/2<br />Comparisonofruntimewith SILK<br />Experiments wererepeatedthrice<br />Best runtimesarepresented<br />
    222. 222. Q4<br />
    223. 223. Q4<br />Weoutperform SILK 2 by 1.5 ordersofmagnitude<br />The larger thedatasources, thehigherourspeedup (64 forSimCities)<br />
    224. 224. LIMES Future Work<br />Extension ofapproachto semi-metrics<br />Adaption ofexemplarcomputationtoskew in data<br />Automatic computationoftheaccuratenumberofexemplars<br />ParallelizationusingHadoopshowshighspeedupgain<br />Combinationwithactivelearning<br />
    225. 225. update and notification services for LOD <br />Downward compatible with Pingback (blogosphere)<br /><br />Creating a network effect aroundLinking Data: Semantic Pingback<br />
    226. 226. Visualizing Pingbacks in OntoWiki<br />
    227. 227. Only 5% of the information on the Data Web is actually linked<br /><ul><li>Make sense of work in the de-duplication/record linkage literature
    228. 228. Consider the open world nature of Linked Data
    229. 229. Use LOD background knowledge
    230. 230. Zero-configuration linking
    231. 231. Explore active learning approaches, which integrate users in a feedback loop
    232. 232. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (</li></ul>Interlinking Challenges<br />
    233. 233. Enrichment<br />
    234. 234. Linked Data is mainly instance data and !!!<br />ORE (Ontology Repair and Enrichment) tool allows to improve an OWL ontology by fixing inconsistencies & making suggestions for adding further axioms.<br /><ul><li>Ontology Debugging: OWL reasoning to detect inconsistencies and satisfiable classes + detect the most likely sources for the problems. user can create a repair plan, while maintaining full control.
    235. 235. Ontology Enrichment: uses the DL-Learner framework to suggest definitions & super classes for existing classes in the KB. works if instance data is available for harmonising schema and data.</li></ul><br />Enrichment & Repair<br />
    236. 236. Quality<br />Analysis<br />
    237. 237. Quality on the Data Web is varying a lot<br /><ul><li>Hand crafted or expensively curated knowledge base (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia)</li></ul>Research Challenge<br /><ul><li>Establish measures for assessing the authority, provenance, reliability of Data Web resources</li></ul>Linked Data Quality Analysis<br />
    238. 238. Evolution<br />© CC-BY-SA by alasis on flickr)<br />
    239. 239. <ul><li>unified method, for both data evolution and ontology refactoring.
    240. 240. modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution
    241. 241. allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks
    242. 242. Combined with RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem
    243. 243. patterns can be shared and reused on the Data Web.
    244. 244. declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.</li></ul>EvoPat – Pattern based KB Evolution<br />
    245. 245. Evolution Patterns<br />
    246. 246.
    247. 247. Exploration<br />
    248. 248. An ecosystemof LOD visualizations<br />Statistical<br />visualization<br />Faceted-browsing<br />Spatialfaceted-browsing<br />Entity-/faceted-<br />Basedbrowsing<br />Domain specificvisualizations<br />LOD Exploration<br />Widgets<br />…<br />…<br /><ul><li>Dataset analysis (size, vocabularies, propertyhistograms etc.)
    249. 249. Selectionofsuitablevisualizationwidgets</li></ul>Choreographylayer<br />LOD Datasets<br />
    250. 250. TODO: Put ULEI slides<br />Faceted spatial-semantic browsing component<br />
    251. 251. Pure JavaScript, requires onlySPARQL Endpoint for data access, Cross-Origin Resource Sharing (CORS) enabled.<br />operates on local spatial regions, doednot depend on global meta-data about the data<br />Source code:<br /><ul><li></li></ul>Online Demo - LinkedGeoData Browser:<br /><ul><li></li></ul>Next steps<br /><ul><li>Polygone/curvemarkers, domainspecificvisualizationtemplates, integrationofothersources, mobile interface</li></ul>Publication:<br /><ul><li>Claus Stadler, Jens Lehmann, Konrad Höffner, Sören Auer: LinkedGeoData: A Core for a Web of Spatial Open Data. Toappear in Semantic Web Journal - Special Issue on Linked Spatiotemporal Data and Geo-Ontologies.</li></ul>Faceted spatial-semantic browsing - Availability<br />
    252. 252. Genericentity-basedexplorationwithOntoWiki<br /><br />
    253. 253. Domain-specificvisualization: <br />
    254. 254. Visualization ofstatisticdata (datacubevocab.)<br /><br />
    255. 255.
    256. 256. 05.10.2011<br />Sören Auer - The emerging Web of Linked Data<br />170<br />
    257. 257. 05.10.2011<br />Sören Auer - The emerging Web of Linked Data<br />171<br />
    258. 258.
    259. 259. Visual Query Builder<br />
    260. 260. Relationship Finder in CPL<br />
    261. 261. Distributed Social Semantic Networking<br />
    262. 262. DistributedSocialSemanticNetworking<br />Social Networks are walled gardens<br /><ul><li>Take users' data out of their hands,
    263. 263. predefined privacy & data security regulations
    264. 264. infrastructure of a single provider (lock-in)
    265. 265. Facebook (600M+ users) = Web inside the Web
    266. 266. Interoperability is limited to proprietary APIs</li></ul>Social networks should be open and evolving<br /><ul><li>allow users to control what to enter & keep control over their data
    267. 267. users should be able to host the data on infrastructure, which is under their direct control, the same way as they host their own website (TBL)</li></ul>We need a truly Distributed Social Semantic Network (DSSN)<br /><ul><li>Initial approaches appeared with GNU social and more recently Diaspora
    268. 268. a DSSN should be based on semantic resource descriptions and de-referenceabilityso as to ensure versatility, reusability and openness in order to accommodate unforeseen usage scenarios
    269. 269. a number of standards and best-practices for social, Semantic Web applications such as FOAF, WebID and Semantic Pingback emerged.</li></li></ul><li>Resources announce services and feeds, feeds announce services – in particular a push service.<br />Applications initiate ping requests to spin the Linked Data network<br />Applications subscribe to feeds on push services and receive instant notifications on updates. <br />Update services are able to modify resources and feeds (e.g. on request of an application)<br />Personal and global search services index social network resources and are used by applications<br />Access to resources & services can be delegated to applications by a WebID, i.e. application can act in name of WebID owner<br />The majority of all access operations is executed through standard web requests.<br />DSSN Architecture<br />
    270. 270. <ul><li>Open-source, MVC architecture
    271. 271. Plattform independent, based on HTML5, CSS, Javascript
    272. 272. jQuery, jQuery Mobile, jQueryUI
    273. 273. rdfQuery – simple triplestore in Javascript
    274. 274. PhoneGap (Apache Device ready) native appsforiOS, Android, Blackberry OS, WebOS, Symbian, Bada
    275. 275.</li></ul>DSSN Mobile Client<br />
    276. 276.
    277. 277. DSSN Mobile Browsing<br />
    278. 278. DSSN Mobile Editing<br />
    279. 279. LOD Lifecycle<br />supported byDebian based<br />LOD2 Stack<br />(released next week)<br />
    280. 280. First releaseofthe LOD2 Stack: &<br />
    281. 281.
    282. 282. AKSW Team<br />
    283. 283. Thanksforyourattention!<br />Sören Auer<br /> | |<br /><br />