Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

  • 6,373 views
Uploaded on

Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research …

Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into
a very promising candidate for addressing one of the biggest challenges
of computer science: the exploitation of the Web as a platform for data
and information integration. To translate this initial success into a
world-scale reality, a number of research challenges need to be
addressed: the performance gap between relational and RDF data
management has to be closed, coherence and quality of data published on
the Web have to be improved, provenance and trust on the Linked Data Web
must be established and generally the entrance barrier for data
publishers and users has to be lowered. This tutorial will discuss
approaches for tackling these challenges. As an example of a successful
Linked Data project we will present DBpedia, which leverages Wikipedia
by extracting structured information and by making this information
freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live as well as
the recently launched DBpedia benchmark.

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
6,373
On Slideshare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
119
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • Initially, the Web consisted of many Websites containing only unstructured/textual content.The Web 2.0 extended this traditional Web with few extremely large Websites specializing on certain specific content types. Examples are:Wikipedia for encyclopedic articlesDel.icio.us for Web linksFlickr for picturesYouTube for VideosDigg for news…These websites provide specialized searching, querying, sharing, authoring specifically adopted to the content type of their concern.
  • Popular content types such as pictures, movies, calendars, encyclopedic articles, news recipes etc. are already sufficiently well supported on the Web.However, there is a long tail of special-interest content (profiles of expertise, historic data and events, bio-medical knowledge, intra-corporational knowledge etc.) which has very low or no current support (for filtering, aggregation, searching, querying, collaborative editing) on the Web.
  • http://www.flickr.com/photos/residae/2560241604/#/
  • http://www.flickr.com/photos/alasis/3541341601/sizes/l/in/photostream/

Transcript

  • 1. DBpedia and the Emerging Web ofLinkedDataSören Auer
  • 2.
    • 2000 Mathematics and Computer Science studies inHagen, Dresden and Екатеринбург
    • 3. Managing director of adVIS GmbH – SME focusedon Web-Application and Content Management technology
    • 4. IT consultant for various companies (T-Mobile AG, RDL Corp., Science Computing AG)
    • 5. 2006 doctorate in Information Systems / Computer Science atUniversität Leipzig
    • 6. 2006-2008 post-doctoral researcher at the DB Group at University of Pennsylvania (USA)
    • 7. Head of AKSW research group – DBpedia, OntoWiki, LinkedGeoData, Triplify
    • 8. Research interests: Information Systems, Database and Web Technologies, Semantic Web and Knowledge Engineering, Adaptive Methodologies, HCI, E-Science, Digital Libraries
    • 9. Coordinator of the EU FP7 IP Project “LOD2 – Creating Knowledge out of Interlinked Data”
    • 10. Work as expert for W3C, EU FP6/FP7/CIP, University City Keystone Innovation Zone, Swiss National Science Foundation
    Dr. SörenAuer
  • 11. The Vision & Big Picture
    Linked Data 101
    The LinkedData Life-cycle
    Agenda
  • 12. 1. Reasoningdoes not scale on the Web
    • IR / one dimensional indexingscales (Google)
    • 13. Next stepconjunctivequerying (OWL-QL?, dynamicscale-out / clustering)
    • 14. Web scalable DL reasoningis out-of-sight (maybefragment, fuzzyreasoninghassomechances)
    2. Ifitwouldscaleitwould not beaffordable
    • “What is the only former Yugoslav republic in theEuropean Union?”
    • 15. 2880 POWER7 cores, 16 Terabytes memory, 4 Terabytes clustered storage (IBM Watson) still can not answer this question
    WhytheSemantic Web won‘twork (soon)
  • 16. Problem: Try to search for these things on the current Web:
    • Apartments near German-Russian bilingual childcare in Berlin.
    • 17. ERP service providers with offices in Vienna and London.
    • 18. Researchers working on multimedia topics in Eastern Europe.
    Informationis available on the Web, but opaque to current search.
    Why do we need the Data Web?
    Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate such structured information from different sources:
    Search engine
    HTML
    HTML
    RDF
    RDF
    Web server
    Web server
    Web server
    Web server
    berlin.de
    Has everything about childcare in Berlin.
    Immobilienscout.de
    Knows all about real estate offers in Germany
    DB
    DB
  • 19. From the Document Web to theSemantic Data Web
    Semantic Web(Vision 1998, starting ???)
    Data Web (since 2006)
    • URI de-referencability
    • 22. Web Data integration
    • 23. RDF serializations
    Social Web (since 2003)
    • Folksonomies/Tagging
    • 24. Reputation, sharing
    • 25. Groups, relationships
    Web (since 1992)
    • HTTP
    • 26. HTML/CSS/JavaScript
  • Web 1.0 Web 2.0 Web 3.0
    Many Web sitescontaining unstructured,textual content
    Few large Web sites are specialized onspecific content types
    Many Web sites containing & semantically syndicating arbitrarily structured content
    Video
    Pictures
    Encyclopedicarticles
    +
    +
  • 27. The Long Tail of Information Domains
    Pictures
    The Long Tail by Chris Anderson (Wired, Oct. ´04) adopted to information domains
    Recipes
    News
    Video
    Calendar
    Popularity
    SemWeb supported structured content
    Requirements-Engineering
    Talentmanagement
    Special interest
    communities
    Itinerary ofKing George
    Gene
    sequences




    Currently supportedstructuredcontent types
    Not or insufficiently supported content types
  • 28. 1. Uses RDF Data Model
    Linked Data in a Nutshell
    3.10.2011
    starts
    organizes
    SBC
    SBBD2011
    Florianopolis
    takesPlaceIn
    2. Is serialised in triples:
    SBC organizes SBBD2011
    SBBD2011 starts “20111003”^^xsd:date
    SBBD2011 takesPlaceAt Florianopolis
    3. Uses Content-negotiation
  • 29. The emerging Web of Data
    interlink
    2009
    2007
    SILK
    DXX Engine
    fuse
    create
    2008
    poolparty
    SemMF
    OntoWiki
    2008
    Sigma
    WiQA
    2008
    2008
    ORE
    repair
    classify
    Virtouso
    2009
    DL-Learner
    MonetDB
    Sindice
    enrich
  • 30. Conceptual LevelData Access and Integration
    Data Warehousing
    • Based on extract, transform load (ETL)
    • 31. Global-As-View (GAV)
    Data Web
    • URIs as entity identifiers
    • 32. HTTP as data accessprotocol
    • 33. Local-As-View (LAV)
    Enterprise Information Integration
    sets of heterogeneous data sources appear as a single, homogeneous data source
    Research
    Mediators
    Ontology-based
    P2P
    Web service-based
    Data Integration
    Object-relational mappings (ORM)
    • NeXT’s EOF / WebObjects
    • 34. ADO.NET Entity Framework
    • 35. Hibernate
    Procedural APIs
    Query Languages
    Linked Data
    • de-referencable URIs
    • 39. RDF serialization formats
    Data Access
    Others
    • XML, hierachical, tree, graph-oriented DBMS
    RDBMS
    • Organize data in relations, rows, cells
    • 40. Oracle, DB2, MS-SQL
    Column-oriented DBMS
    • Collocates column values rather than row values
    • 41. Vertica, C-Store, MonetDB
    Triple/Quad Stores
    • RDF data model
    • 42. Virtuoso, Oracle, Sesame
    Data Models
    Entity-attribute-value (EAV)
    • HELP medical record system, TrialDB
  • The Vision & Big Picture
    Linked Data 101(based on Michael Hausenblas‘ slides)
    The LinkedData Life-cycle
    Agenda
  • 43. Orientation
  • 44. Linked Data 101
    Linked Data provides a standardised API for:
    Data and metadata discovery
    Data integration
    Distributed query
  • 45. Linked Data principles
    Use URIs to identify the “things” in your data
    Use http:// URIs so people (and machines) can look them up on the web
    When a URI is looked up, return a description ofthe thing (in RDF format)
    Include links to related things
    http://www.w3.org/DesignIssues/LinkedData.html
  • 46. Linked Data principles
    They are principles, not implementation advices
    Not humans or machines but humans and machines!
    Content negotiation (e.g. HTML and RDF/XML)
    HTML+ RDFa
    Metcalfe’s Lawhttp://en.wikipedia.org/wiki/Metcalfe%27s_law
  • 47. Linked Data example
    17
  • 48. HTTP URIs
    A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]
    Syntax
    URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    Example
    foo://example.com:8042/over/there?name=ferret#nose
    _/ _________________/_________/ __________/ __/
    | | | | |
    scheme authority path query fragment
  • 49. HTTP URIs
    URI references
    An RDF URI reference is a Unicode string does not contain any control characters (#x00 - #x1F, #x7F-#x9F) and would produce a valid URI character sequence representing an absolute URI when subjected to an UTF-8 encoding along with %-escaping non-US-ASCII octets.
    Qualified Names (QNames)
    XML’s way to allow namespaced elements/attributes as of QName = Prefix ‘:‘ LocalPart
    Compact URIs (CURIEs)
    Generic, abbreviated syntax for expressing URIs
  • 50. HTTP
    The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems.
    It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
  • 51. HTTP
    HTTP messages consist of requests from client to server and responses from server to client
    Set of methods is predefined
    GET
    POST
    PUT
    DELETE
    HEAD
    (OPTIONS)
  • 52. HTTP
    Status codes
    Informational 1xx, provisional response, (100 Continue)
    Successful 2xx, request successfully received, understood, and accepted (201 Created)
    Redirection 3xx, further action needs to be taken by user agent to fulfill the request (301 Moved Permanently)
    Client Error 4xx, client erred (405 Method Not Allowed)
    Server Error 5xx, server encountered an unexpected condition (501 Not Implemented)
  • 53. HTTP
    GET /html/rfc2616 HTTP/1.1
    Host: tools.ietf.org
    User-Agent: Mozilla/5.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    HTTP/1.x 200 OK
    Date: Thu, 05 Mar 2009 08:17:33 GMT
    Server: Apache/2.2.11
    Content-Location: rfc2616.html
    Last-Modified: Tue, 20 Jan 2009 09:16:04 GMT
    Content-Type: text/html; charset=UTF-8
    REQUEST
    RESPONSE
  • 54. HTTP
    Content Negotiation: selecting representation for a given response when multiple representations available
    Three types of CN: server-driven, agent-driven CN, transparent CN
    Example:
    curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Galway
    HTTP/1.1 303 See Other
    Content-Type: application/rdf+xml
    Location: http://dbpedia.org/data/Galway.rdf
  • 55. HTTP
    Caching (see Cache–Control header field) is essential for scalabilityhttp://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/
    HTTPbis IETF WG chaired by Mark Nottingham, mainly about: patches, clarifications, deprecate non-used features, documentation of security properties
  • 56. REST - HTTP
    Representational State Transfer (REST)
    resource intended conceptual target of a hypertext reference
    resource identifier URL, URN
    representation HTML document, JPEG image
    representation media type, last-modified time
    metadata
    resource source link, alternates, varymetadata
    control data if-modified-since, cache-control
    http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htmhttp://webofdata.wordpress.com/2009/10/09/linked-data-for-restafarians/
  • 57. Web's Standard Retrieval Algorithm
    parse URI and find HTTP protocol
    look up DNS name to determine the associated IP address
    open a TCP stream to port 80 at the IP address determined above
    format an HTTP GET request for resource and sends that to the server
    read response from the server
    from the status code (200) determine that a representation of the resource is available
    inspect the returned Content-Type
    pass the entity-body to its HTML rendering engine
    http://www.w3.org/2001/tag/doc/selfDescribingDocuments
  • 58. RDF
    A data model - directed, labeled graph
    Triple: (subject predicate object)
    subject … URIref or bNode
    predicate … URIref
    object … URIref or bNode or literal
  • 59. RDF Triple
    • Inspired by linguistic categories
    • 60. Allowed usage:Subject: URI or blank nodePredicate: URI (also called properties)Object : URI or blank nodes or literal
    isMayorOf
    Leipzig
    Burkhard Jung
    Predicate
    Object
    Subject
  • 61. Example RDF Graph
    Leipzig
    hasAreaCode
    longitude
    0341
    12.3833
    latitude
    locatedIn
    hasMayor
    51.3333
    isMayorOf
    Saxony
    Burkhard Jung
    born
    locatedIn
    1958-03-07
    isMemberOf
    Germany
    Social Democratic Party
  • 62. Literals
    • Representation of data values
    • 63. Serialization as strings
    • 64. Interpretation based on the datatype
    • 65. Literals without Datatype are treated as strings
    longitude
    12.3833
    Leipzig
    latitude
    51.3333
    isMayorOf
    hasMayor
    born
    1958-03-07
    Burkhard Jung
  • 66. RDF Serialization
    N3: "Notation 3" - extensive formalism
    N-Triples: part of N3
    Quelle:http://www.w3.org/DesignIssues/Notation3.html
    Turtle: Extension of N-Triples (shortcuts)
  • 67. Turtle Syntax
    • URIs in angle brackets
    • 68. Literals in quotes
    • 69. Triples separated by dot
    • 70. Whitespace is ignored
    33
  • 71. Turtle Syntax: Shortcuts
    http://dbpedia.org/resource/Leipzig http://dbpedia.org/property/hasMayor http://dbpedia.org/resource/Burkhard_Jung ;
    http://www.w3.org/2000/01/rdf-schema#label "Leipzig"@de ;
    http://www.w3.org/2003/01/geo/wgs84_pos#lat "51.333332"^^xsd:float ;
    http://www.w3.org/2003/01/geo/wgs84_pos#lon "12.383333"^^xsd:float .
    Shortcuts for namespace prefixes:
    @prefixrdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefixrdfs:<http://www.w3.org/2000/01/rdf-schema#> .
    @prefixdbp:<http://dbpedia.org/resource/> .
    @prefixdbpp:<http://dbpedia.org/property/> .
    @prefixgeo:<http://www.w3.org/2003/01/geo/wgs84_pos#> .
    dbp:Leipzigdbpp:hasMayordbp:Burkhard_Jung .
    dbp:Leipzigrdfs:label "Leipzig"@de .
    dbp:Leipziggeo:lat "51.333332"^^xsd:float .
    dbp:Leipziggeo:lon "12.383333"^^xsd:float .
  • 72. Turtle Syntax: Shortcuts
    Group triples with same subject using “;” instead of “.”:
    @prefixrdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefixrdfs="http://www.w3.org/2000/01/rdf-schema#> .
    @prefixdbp="http://dbpedia.org/resource/> .
    @prefixdbpp="http://dbpedia.org/property/> .
    @prefixgeo="http://www.w3.org/2003/01/geo/wgs84_pos#> .
    dbp:Leipzigdbpp:hasMayordbp:Burkhard_Jung ;
    rdfs:label "Leipzig"@de ;
    geo:lat "51.333332"^^xsd:float ;
    geo:lon "12.383333"^^xsd:float .
    also Triple with same subject and predicate:
    @prefixdbp="http://dbpedia.org/resource/> .
    @prefixdbpp="http://dbpedia.org/property/> .
    dbp:Leipzigdbp:locatedIndbp:Saxony, dbp:Germany;
    dbpp:hasMayordbp:Burkhard_Jung .
  • 73. XML-Syntax von RDF
    • Turtle intuitively readable and machine processable
    • 74. but: better tool support and programming libraries for XML
    <?xmlversion="1.0"?>
    <rdf:RDFxmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dbpp="http://dbpedia.org/property/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
    <rdf:Descriptionrdf:about="http://dbpedia.org/resource/Leipzig">
    <property:hasMayor
    rdf:resource="http://dbpedia.org/resource/Burkhard_Jung" />
    <rdfs:labelxml:lang="de">Leipzig</rdfs:label>
    <geo:latrdf:datatype="float">51.3333</geo:lat>
    <geo:lonrdf:datatype="float">12.3833</geo:lon>
    </rdf:Description>
    </rdf:RDF>
  • 75. RDF/JSON
    • JSON = JavaScript Object Notation
    • 76. Compact formatfordataexchangebetweenapplications
    • 77. JSON documentsare valid JavaScript
    • 78. Programminglanguageindependent, sinceparserexistfor all popularprogramminglanguages
    • 79. Lessoverheadwhenparsingandserialisingthan XML
    { "S" : { "P" : [ O ] } }
    • Subject: URI, BNode
    • 80. Predicate: URI
    • 81. Object:
    Type: „URI“, „Literal“ or „bnode“
    Value:datavalue
    Lang: language tag
    Datatype: URI ofthedatatype.
  • 82. JSON Example
    {
    "http://dbpedia.org/resource/Leipzig" : {
    "http://dbpedia.org/property/hasMayor":
    [ { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],
    "http://www.w3.org/2000/01/rdf-schema#label":
    [ { "type":"literal", "value":"Leipzig", "lang":"en" } ] ,
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat":
    [ { "type":"literal", "value":"51.3333",
    "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
    "http://www.w3.org/2003/01/geo/wgs84_pos#lon":
    % [ { "type":"literal", "value":"12.3833",
    "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
    }
    }
  • 83. RDFa Syntax
    • RDFa = Resource Description Framework – in –attributes
    • 84. Embedding RDF in XHTML
    • 85. UTF-8 and UTF-16, since Extension of XML based XHTML
    • 86. Due toembedding in HTML moreoverheadthanotherserialisations
    • 87. Lessreadable
    <?xmlversion="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
    <htmlversion="XHTML+RDFa 1.0" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dbpp="http://dbpedia.org/property/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
    <head><title>Leipzig</title></head>
    <bodyabout="http://dbpedia.org/resource/Leipzig">
    <h1 property="rdfs:label" xml:lang="de">Leipzig</h1>
    <p>Leipzig is a city in Germany. Leipzig'smayoris
    <a href="Burkhard_Jung" rel="dbpp:hasMayor">Burkhard Jung</a>. Itislocated
    atlatitude <span property="geo:lat" datatype="xsd:float">51.3333</span>
    andlongitude <span property="geo:lon" datatype="xsd:float">12.3833</span>.</p>
    </body>
    </html>
  • 88. Vocabularies
    Schema layer of RDF
    Defines terms (classes and properties)
    Typically RDFS or OWL family
    Common vocabularies:
    Dublin Core, SKOS
    FOAF, SIOC, vCard
    DOAP
    Core Organization Ontology
    VoID
    http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies
  • 89. 41
    Vokabulare: Friend-of-a-Friend (FOAF)
    defines classes and properties for representinginformation about people and theirrelationships
    Soerenrdf:typefoaf:Person .
    SoerencurrentProject http://OntoWiki.net .
    Soerenfoaf:homepage http://aksw.org/Soeren .
    Soerenfoaf:knows http://sembase.at/Tassilo .
    Soeren foaf:sha1 09ac456515dee .
  • 90. 42
    Vokabulare: SemanticallyInterlinked Online Communities.
    Represent content from Blogs, Wikis, Forums, Mailinglists, Chats etc.
  • 91. 43
    Vokabulare: Simple Knowledge Organization System (SKOS)
    support the use of thesauri, classification schemes, subjectheading systems and taxonomies
  • 92. Instance data
    Instancesareassociatedwithoneorseveralclasses:
    Boddingtonsrdf:type Ale .
    Grafentrunkrdf:type Bock .
    Hoegaardenrdf:type White .
    Jeverrdf:type Pilsner .
  • 93. The Linked Open Data cloud
  • 94. Linked Open Data cloud
    2009
    2007
    2008
    2008
    2010
    2008
    2008
    2009
    46
  • 95. Linked Open Data cloud
    Media
    User-generated
    Government
    Publications
    Cross-domain
    Geo
    Life sciences
    http://lod-cloud.net/
  • 96. LOD cloudstats
    triples distribution
    links distribution
    http://lod-cloud.net/state/
  • 97. TimBL’s 5-star plan for open data
    ★ Make your data available on the Web under an open license
    ★★ Make it available as structured data(Excel sheet instead of image scan of a table)
    ★★★ Use a non-proprietary format(CSV file instead of an Excel sheet)
    ★★★★ Use Linked Data format(URIs to identify things, RDF to represent data)
    ★★★★★ Link your data to other people’s data to provide context
    More: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
  • 98. Why going for the 5th star?
    Central Contractor Registration (CCR)
    Geonames
    http://webofdata.wordpress.com/2011/05/22/why-we-link/
  • 99. Effort distribution
    Fix Overall Data IntegrationEffort
  • 100. Datasets
    A dataset is a set of RDF triples that are published,
    maintained or aggregated by a single provider
  • 101. Linksets
    An RDF link is an RDF triple whose subject and object are described in different datasets
    A linksetis a collection of such RDF links between two datasets
  • 102. Describing Datasets - VoID
    General dataset metadata
    Access metadata
    Structural metadata
    Describing linksets
    Deployment and discovery of voiD files
    http://www.w3.org/TR/void/
  • 103. General dataset metadata
    Dataset homepage
    Publisher
    Title and description
    Categorisation
    Licensing
    Technical features
  • 104. General dataset metadata
    :DBpedia a void:Dataset ;
    dcterms:title "DBpedia” ;
    dcterms:description "RDF data extracted from Wikipedia” ;
    dcterms:contributor :FU_Berlin ;
    dcterms:contributor :Uni_Leipzig ;
    dcterms:contributor :Openlink ;
    dcterms:source <http://dbpedia.org/resource/Wikipedia> ;
    void:feature <http://www.w3.org/ns/formats/RDF_XML> ;
    dcterms:modified "2008-11-17"^^xsd:date .
    :Geonames a void:Dataset ;
    dcterms:subject <http://dbpedia.org/resource/Location> .
    :GeoSpecies a void:Dataset ;
    dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/us/> .
  • 105. Access metadata
    SPARQL endpoints
    RDF data dumps
    Root resources
    URI lookup endpoints
    OpenSearch description documents
  • 106. Access metadata
    :exampleDSvoid:Dataset ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump <http://example.org/dump1.rdf> ;
    void:uriLookupEndpoint <http://api.example.org/search?qt=term> .
  • 107. Structural metadata
    Provides high-level information about the schema and internal structure of a dataset and can be helpful when exploring or querying datasets:
    Example resources
    Patterns for resource URIs
    Vocabularies
    Dataset partitions
    Statistics
  • 108. Structural metadata
    :DBpedia a void:Dataset;
    void:exampleResource <http://dbpedia.org/resource/Berlin> .
    :LiveJournal a void:Dataset;
    void:vocabulary <http://xmlns.com/foaf/0.1/> .
    :DBpedia a void:Dataset;
    void:classPartition [
    void:classfoaf:Person;
    void:entities 312000;
    ];
    void:propertyPartition [
    void:propertyfoaf:name;
    void:triples 312000;
    ];
    .
  • 109. Describing linksets
  • 110. Describing linksets
    :DBpedia a void:Dataset ;
    void:subset :DBpedia2Geonames .
    :Geonames a void:Dataset .
    :DBpedia2Geonames a void:Linkset ;
    void:target :DBpedia ;
    void:target :Geonames ;
    void:linkPredicateowl:sameAs .
  • 111. Deployment and discovery
    Choosing URIs for datasets
    Publishing a VoID file alongside a dataset
    Turtle
    RDFa
    SPARQL Service Description Vocabularyhttp://www.w3.org/TR/sparql11-service-description/
    Discovery (well-known URI), based on of RFC5758], registered with IANAhttp://www.example.com/.well-known/void
  • 112. Consumption - Essentials
    Linked Data provides for a global data-space with a uniform API (due to RDF as the data model)
    Access methods
    Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)
    SPARQL (‘the SQL of RDF’)
    Data dumps (RDF/XML, etc.)
  • 113. Consumption - Technologies
    Linked Data access mechanisms widely supported
    all major platforms and languages (HTTP interface & RDF parsing), such as Java, Python, PHP, C/C++/.NET, etc.
    Command line tools (curl, rapper, etc.)
    Online tools
    http://redbot.org/ (HTTP/low-level)
    http://sindice.com/developers/inspector (RDF/data-level)
    Structured query: SPARQL (more later)
  • 114. Consumption - Technologies
    Distributed setup  need for central point of access (indexer, aggregator)
    Sindice, an index of the Web of Data
    http://sindice.com/
    Sig.ma, Web of Data aggregator & browser
    http://sig.ma/
    Relationship discovery
    http://relfinder.semanticweb.org/
  • 115. Technologies – FYN
    http://dbpedia.org/resource/Galway
    67
  • 116. Technologies – Sig.ma
    Sig.ma is a Web of Data platform enabling entity visualisation and consolidation both for humans and machines (API)
    http://sig.ma/search?q=Galway
    68
  • 117. Technologies – sameas.org
    Sameas.org is a service to find co-references on the Web of Data
    http://sameas.org/html?q=Galway
  • 118.
    • All Linked Data datasets share a uniform data model, the RDF statement data model
    • 119. Information is represented in facts expressed as (subject, predicate, object) triples
    • 120. Components: globally unique IRI/URI entity identifiers & typed data values (literals) as objects
    Linked Data Benefits: Uniformity
  • 121.
    • URIs not just used for identifying entities, but also (as URLs) for locating and retrieving resources that describe these entities on the Web
    Linked Data Benefits: De-referencability
  • 122. Linked Data Benefits: Coherence
    Berlin
    Germany
    isCapitalOf
    Knowledgebase 2
    isMemberOf
    Knowledgebase 1
    European Union
    • triples containing URIs from different namespaces as subject and object, establish a link between (the entity identified by the) subject with (the entity identified by the) object (typed RDF links)
    • RDF data model, is based on a single mechanism for representing information (triples) -> very easy to attain a syntactic and simple semantic integration of different Linked Data sets.
    • 123. higher level semantic integration can be achieved by employing schema and instance matching techniques and expressing found matches again as additional triple facts
    Linked Data Benefits: Integrateability
  • 124.
    • Publishing and updating Linked Data is relatively simple thus facilitating a timely availability
    • 125. once a Linked Data source is updated it is straightforward to access and use the updated data source (time consuming and error prune extraction, transformation and loading not required)
    Linked Data Benefits: Timeliness
  • 126. The Vision & Big Picture
    Linked Data 101
    The LinkedData Life-cycle
    Agenda
  • 127. What works now? What has to be done?
    • Web - a global, distributed platform for data, information and knowledge integration
    • 128. exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
    Achievements
    Extension of the Web with a data commons (25B facts
    vibrant, global RTD community
    Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
    Emerging governmental adoption in sight
    Establishing Linked Data as a deployment path for the Semantic Web.
    • Challenges
    Coherence: Relatively few, expensively maintained links
    Quality: partly low quality data and inconsistencies
    Performance: Still substantial penalties comparedto relational
    Data consumption: large-scale processing, schema mapping and data fusion still in its infancy
    Usability: Establishing direct end-user tools and network effect
    April 2008
    July 2007
    September 2008
    July 2009
  • 129. Linked DataLifecycle
  • 130. Extraction
  • 131. From unstructured sources
    • NLP, text mining, annotation
    From semi-structured sources
    • DBpedia, LinkedGeoData, SCOVO/DataCube
    From structured sources
    • RDB2RDF
    Extraction
  • 132. extract structured information from Wikipedia & make this information available on the Web as LOD:
    • ask sophisticated queries against Wikipedia (e.g. universities in brandenburg, mayors of elevated towns, soccer players),
    • 133. link other data sets on the Web to Wikipedia data
    • 134. Represents a community consensus
    Recently launched DBpedia Live transforms Wikipedia into a structured knowledge base
    Transforming Wikipedia into an Knowledge Base
  • 135. Structure in Wikipedia
    Title
    Abstract
    Infoboxes
    Geo-coordinates
    Categories
    Images
    Links
    other language versions
    other Wikipedia pages
    To the Web
    Redirects
    Disambiguations
  • 136. Infobox templates
    Wikitext-Syntax
    {{Infobox Korean settlement
    | title = Busan Metropolitan City
    | img = Busan.jpg
    | imgcaption = A view of the [[Geumjeong]] district in Busan
    | hangul = 부산 광역시
    ...
    | area_km2 = 763.46
    | pop = 3635389
    | popyear = 2006
    | mayor = Hur Nam-sik
    | divs = 15 wards (Gu), 1 county (Gun)
    | region = [[Yeongnam]]
    | dialect = [[Gyeongsang]]
    }}
    http://dbpedia.org/resource/Busan
    dbp:Busan dbpp:title ″Busan Metropolitan City″
    dbp:Busan dbpp:hangul ″부산 광역시″@Hang
    dbp:Busan dbpp:area_km2 ″763.46“^xsd:float
    dbp:Busan dbpp:pop ″3635389“^xsd:int
    dbp:Busan dbpp:region dbp:Yeongnam
    dbp:Busan dbpp:dialect dbp:Gyeongsang
    ...
    RDF representation
  • 137. A vast multi-lingual, multi-domain knowledge base
    DBpedia extraction results in:
    descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases
    labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages;4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
    altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English edition, 766M from other language editions
    DBpediaLive (http://live.dbpedia.org/sparql/) &Mappings Wiki (http://mappings.dbpedia.org)integrate the community into a refinement cycle
    UpcommingDBpedia inline
  • 138. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    84
    DBpedia Architecture
    Extraction Manager
    Wikipedia
    N-Triple
    Dumps
    Extraction Job
    Destinations
    Article-
    Queue
    Extractors
    Update
    Stream
    N-Triple
    Serializer
    Image
    Label
    Category
    PageCollections
    SPARQL-
    Update
    Destination
    Redirect
    Disambiguation
    Wikipedia
    Dumps
    Database
    Wikipedia
    Pagelink
    Geo
    Abstract
    Generic Infobox
    Triple Store
    Virtuoso
    Wikipedia
    OAI-PMH
    Live
    Wikipedia
    Mapping-based Infobox
    Parsers
    Ontology-
    Mappings
    DateTime
    Units
    Geo
    String-List
    Numbers
    Linked Data
    SPARQL endpoint
    The Web
    SPARQL clients
    HTML browser
    DBpedia apps
    RDF browser
  • 139. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    85
    Hierarchies
    DBpedia Ontology Schema:
    manually created for DBpedia (infoboxes)
    275 classes + 1335 properties; 20mio triples
    YAGO:
    large hierarchy linking Wikipedia leaf categories to WordNet
    250,000 classes
    UMBEL (Upper Mapping and Binding Exchange Layer):
    20000 classes derived from OpenCyc
    Wikipedia Categories:
    Not a class hierarchy (e.g. cycles), represented using SKOS
    415,000+ categories
  • 140. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    86
    DBpedia SPARQL Endpoint
    http://dbpedia.org/sparql
    hosted on a OpenLink Virtuososerver
    can answer SPARQL queries like
    Give me all Sitcoms that are set in NYC?
    All tennis players from Moscow?
    All films by Quentin Tarentino?
    All German musicians that were born in Berlin in the 19th century?
    All soccer players with tricot number 11, playing for a club having a stadium with over 40,000 seats and is born in a country with over 10 million inhabitants?
  • 141. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    87
    DBpedia SPARQL Endpoint
    SELECT ?name ?birth ?description ?person WHERE {
    ?person dbp:birthPlace dbp:Berlin .
    ?person skos:subject dbp:Cat:German_musicians .
    ?person dbp:birth ?birth .
    ?person foaf:name ?name .
    ?person rdfs:comment ?description .
    FILTER (LANG(?description) = 'en') .
    } ORDER BY ?name
  • 142. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    88
    DBpedia Applications
    DBpedia Mobile: location aware mobile client for DBpedia
    Uses current location and DBpedia to display map
    Can navigate into other knowledge bases
    DBpedia Query Builder: user front end for building queries
    DBpedia Relationship Finder finds relation between two objects in DBpedia
  • 143. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    89
    DBpedia Applications
  • 144. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    90
    DBpediaApplications: Relfinder
    http://www.visualdataweb.org/relfinder.php
  • 145. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    91
    DBpediaApplications: Zemanta
  • 146. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    92
    DBpediaApplications: Faceted-Browser
  • 147. 2011/05/12
    CONSEGI - Sören Auer: DBpedia
    93
    DBpedia Applications (3rd party)
    Muddy Boots (BBC): Annotate actors in BBC News with DBpedia identifiers
    Open Calais (Reuters): named entity recognition; entities are connected via owl:sameAs to DBpedia, Freebase, Geonames
    Faviki: Social Bookmarking Tool uses DBpedia in backend to group tags etc. and multi-language support
    Topbraid Composer: ontology editor, which links entities to DBpedia based on their labels
  • 148. LinkedGeoData
    Conversion, interlinking and publishing of OpenStreetMap.org* data sets as RDF.
    * ”Wikipedia for geographic data”
  • 149. Motivation
    Ease information integration tasks that require spatial knowledge, such as
    • Offerings of bakeries next door
    • 150. Map of distributed branches of a company
    • 151. Historical sights along a bicycle track
    Therefore use RDF/OWL in order overcome structural and semantic heterogeneity.
    • Requires a vocabulary – which we try to establish.
    LOD cloud contains data sets with spatial features
    • e.g. Geonames, DBpedia, US census, EuroStat
    • 152. But: they are restricted to popular or large entities like countries, famous places etc.
    Therefore they lack buildings, roads, mailboxes, etc.
  • 153. OpenStreetMap - Datamodel
    Basic entities are:
    NodesLatitude, Longitude
    WaysSequence of nodes
    RelationsAssociations between any number of nodes, ways and relations.
    Each entity may be described with tags (= key-value pairs)
  • 154. Example: Leipzig's zoo
  • 155. Data/Mapping Example
  • 156. Data/Mapping Example
  • 157. Mapping Types
    Three Mapping Types
    Text
    (5, name, Leipzig) -> lgd:node5 rdfs:label ”Leipzig”
    (5, name:de, Leipzig) -> lgd:node5 rdfs:label ”Leipzig”@de
    Datatypes
    (6, seats, 4) -> lgd:node6 lgdo:seats ”4”^^xsd:integer
    Classes/Object Properties
    (7, place, city) -> lgdn:7 a lgdo:City
    (7, religion, pastafarian) -> lgdn:7 lgdo:religion lgdo:Pastafarian
  • 158. Access
    Rest Interface (based on Postgis DB, full osm dataset loaded, > 1billion triples)
    Supports limited queries (e.g. circular/rectangular area, filtering by labels)
    Sparql Endpoints (based on Virtuoso DB, subset of osm dataset, ~222M triples)
    Static (http://linkedgeodata.org/sparql)
    Live (http://live.linkedgeodata.org/sparql)
    Downloads (http://downloads.linkedgeodata.org)
    Monthly updates on the above datasets envisioned
  • 159. LinkedGeoData Live
    OpenStreetMap provides full dumps and minutely changesets for download
    Changesets are numbered, e.g. ”001/234/567.osc.gz”
    We also convert the changesets to sets of added and removed triples (relative to our store) and publish them
    001/234/567.added.nt.gz
    001/234/567.removed.nt.gz
    Advantage: Other users could easily sync their RDF store with LinkedGeoData
  • 160. DBpedia Mapping – Step By Step
    Given a DBpedia point, query LGD points within type specific maximum distance
    Basic idea (performed with Silk):
    Compute spatial score
    Compute name similarity (rdfs:label)
  • 161. DBpedia Mapping – Step By Step
    Given a DBpedia point, query LGD points within type specific maximum distance
    Basic idea (performed with Silk):
    Compute spatial score
    Compute name similarity (rdfs:label)
    Combine both scores
    Depending on final score, either automatically accept/reject links or mark for manual verification.
  • 162. Statistics (2011-Feb-23)
    222.539.712 Triples
    6.666.865 Ways
    5.882.306 Nodes
    Among them
    352.673 PlaceOfWorship
    60.573 RailwayStation
    59.468 Recycling
    50.955 Town
    30.099 Toilet
    7.222 City
  • 163. Conclusion
    OpenStreetMap
    immensely successful project for collaboratively creating free spatial data
    Community uses key value structures, which provide a rich source of information
    Key strength: broad coverage
    LGD Contributions
    Established mapping to Dbpedia
    Geonames mapping partially done (37 different entity types cities, churches, ...)
    Facet-based LGD Browser provides an interface for OSM/LGD, which highlights its structural aspects
    Live sync
    Goal: Make LGD as useful (succesful) as DBpedia for the geospatial domain
  • 164. Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)
    No agreement on a formalsemantics of RDF2RDFmapping
    • LOD readiness, SPARQL-SQL translation
    W3C RDB2RDF WG
    Extraction Relational Data
  • 165. From unstructured sources
    • Deploy existing NLP approaches (OpenCalais, Ontos API)
    • 166. Develop standardized, LOD enabled interfaces between NLP tools (NLP2RDF)
    From semi-structured sources
    • Efficient bi-directional synchronization
    From structured sources
    • Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)
    Orthogonal challenges
    • Using LOD as background knowledge
    • 167. Provenance
    Extraction Challenges
  • 168. Storage and Querying
  • 169. Still by a factor 5-50 slower than relational data management (BSBM, DBpedia Benchmark)
    Performance increases steadily
    Comprehensive, well-supported open-soure and commercial implementations are available:
    • OpenLink’s Virtuoso (os+commercial)
    • 170. Big OWLIM (commercial), Swift OWLIM (os)
    • 171. 4store (os)
    • 172. Talis (hosted)
    • 173. Bigdata (distributed)
    • 174. Allegrograph (commercial)
    • 175. Mulgara (os)
    RDF Data Management
  • 176.
    • UsesDBpediaasdataand a selectionof 25 frequentlyexecutedqueries
    • 177. Can generatefractionsand multiples ofDBpedia‘ssize
    • 178. Does not resemble relational data
    Performance differences, observedwithotherbenchmarksareamplified
    DBpedia Benchmark
    GeometricMean
  • 179.
    • Reduce the performance gap between relational and RDF data management
    • 180. SPARQL Query extensions
    • 181. Spatial/semantic/temporal data management
    • 182. More advanced query result caching
    • 183. View maintenance / adaptive reorganization based on common access patterns
    • 184. More realistic benchmarks
    Storage and Querying Challenges
  • 185. Authoring
  • 186. 1. Semantic (Text) Wikis
    • Authoring of semanticallyannotated texts
    2. Semantic Data Wikis
    • Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
    Two Kinds of Semantic Wikis
  • 187. Versatile domain-independent tool
    Serves as Linked Data / SPARQL endpoint on the Data Web
    Open-source project hosted at Google code
    Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
    Developed in PHP based on the Zend framework
    Very active developer and user community
    More than 500 downloads monthly
    Large number of use cases
    OntoWiki – a semantic data wiki
  • 188. Dynamic views on knowledgebases
    OntoWiki
  • 189. OntoWiki
    RDF triples on resourcedetailspage
  • 190. OntoWiki
    Dynamische Vorschläge aus dem Daten Web
  • 191. CatalogusProfessorumLipsiensis
  • 192. OntoWiki: Caucasian Spiders
  • 193. RDFauthor in OntoWiki
  • 194. Semantic Portal with OntoWiki: Vakantieland
  • 195. RDFaCE- RDFa Content Editor
  • 196. RDFaCEArchitecture
  • 197. Integratingvarious NLP APIs
  • 198. Linking
    © CC-BY-NC-ND by ~Dezz~ (residae on flickr)
  • 199. Automatic
    Semi-automatic
    Manual
    • Sindice integration into UIs
    • 201. Semantic Pingback
    LOD Linking
  • 202. LIMES 0.3: Basic Idea
    Usesthecharacteristicsofmetricspaces
    Especiallyconsequencesoftriangleinequality
    d(x, y) < d(x, z) + d(z, y)
    d(x, z) - d(z, y) < d(x, y) < d(x, z) + d(z, y)
    Basic idea
    Usepessimisticapproximationsofdistancesinsteadofcomputingthem
    Onlycomputedistanceswhenneeded
  • 203. Overview
    Knowledgesources
  • 204. Computationof Exemplars
    Assumption: numberofexemplarsisgiven
    Goal: Segment targetdataset
  • 205. Computationof Exemplars
  • 206. Computationof Exemplars
  • 207. Computationof Exemplars
  • 208. Computationof Exemplars
  • 209. Computationof Exemplars
    NB: Distancesfromexemplarsto all otherpointsareknown
  • 210. Filtering
    y
    x
    z
    1. Measuredistancefromeach x toeachexemplar
  • 211. Filtering
    y
    x
    z
    2. Applyd(x, y) - d(y, z) > t d(x, z) > t
  • 212. SimilarityComputation
    y
    x
    z
    d(x, y) - d(y, z) <t Compute d(x, z)
  • 213. Serialization
    Resultsarereturnedas RDF
    ForexamplemappingDBpediaandDrugbank
    @prefixdrugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> .
    @prefixdbpedia: <http://dbpedia.org/ontology/> .
    @prefixowl: <http://www.w3.org/2002/07/owl#> .
    dbpedia:Cefaclorowl:sameAs drugbank:DB00833 .
    dbpedia:Clortermineowl:sameAs drugbank:DB01527 .
    dbpedia:Prednicarbateowl:sameAs drugbank:DB01130 .
    dbpedia:Linezolidowl:sameAs drugbank:DB00601 .
    dbpedia:Valaciclovirowl:sameAs drugbank:DB00577 .
    ….
  • 214. Experiments
    Q1: Whatisthebestnumberofexemplars?
    Q2: Whatistherelationbetweenthesimilaritythresholdqandthe total numberofcomparisons?
    Q3: Doestheassignmentof S and T matter?
    Q4:Howdoes LIMES compareto SILK?
  • 215. Q1 and Q2
    Experiments on syntheticdata
    Knowledgebasesofsizes 2000, 3000, 5000, 7500 and 10000
    Variednumberofexemplars
    Variedthresholds
    Experiments wererepeated 5 times
    Averageresultsarepresented
  • 216. Q1 and Q2
  • 217. Q1 and Q2
    Q1
    Best numberofexemplarsdepends on q
    Forq> 0.9, bestnumber lies around |T|1/2
    Q2
    As expected, numberofcomparisonsdiminisheswithgrowingq
  • 218. Q3 (order of S and T)
    Experiments on syntheticdata
    Knowledgebasesofsizes 1000, 2000, 3000, …, 10000
    Numberofexemplars was |T|1/2
    Experiments wererepeated 5 times
    Averageresultsarepresented
  • 219. Q3
  • 220. Q3
    • Green = S firstismore time-efficient
    • 221. Overall lessthan 5% difference
  • Q4 (comparisonwith SILK)
    3 Experiments on real data
    Drugs
    Diseases
    SimCities
    Numberofexemplars was |T|1/2
    Comparisonofruntimewith SILK
    Experiments wererepeatedthrice
    Best runtimesarepresented
  • 222. Q4
  • 223. Q4
    Weoutperform SILK 2 by 1.5 ordersofmagnitude
    The larger thedatasources, thehigherourspeedup (64 forSimCities)
  • 224. LIMES Future Work
    Extension ofapproachto semi-metrics
    Adaption ofexemplarcomputationtoskew in data
    Automatic computationoftheaccuratenumberofexemplars
    ParallelizationusingHadoopshowshighspeedupgain
    Combinationwithactivelearning
  • 225. update and notification services for LOD
    Downward compatible with Pingback (blogosphere)
    http://aksw.org/Projects/SemanticPingBack
    Creating a network effect aroundLinking Data: Semantic Pingback
  • 226. Visualizing Pingbacks in OntoWiki
  • 227. Only 5% of the information on the Data Web is actually linked
    • Make sense of work in the de-duplication/record linkage literature
    • 228. Consider the open world nature of Linked Data
    • 229. Use LOD background knowledge
    • 230. Zero-configuration linking
    • 231. Explore active learning approaches, which integrate users in a feedback loop
    • 232. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu)
    Interlinking Challenges
  • 233. Enrichment
  • 234. Linked Data is mainly instance data and !!!
    ORE (Ontology Repair and Enrichment) tool allows to improve an OWL ontology by fixing inconsistencies & making suggestions for adding further axioms.
    • Ontology Debugging: OWL reasoning to detect inconsistencies and satisfiable classes + detect the most likely sources for the problems. user can create a repair plan, while maintaining full control.
    • 235. Ontology Enrichment: uses the DL-Learner framework to suggest definitions & super classes for existing classes in the KB. works if instance data is available for harmonising schema and data.
    http://aksw.org/Projects/ORE
    Enrichment & Repair
  • 236. Quality
    Analysis
  • 237. Quality on the Data Web is varying a lot
    • Hand crafted or expensively curated knowledge base (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia)
    Research Challenge
    • Establish measures for assessing the authority, provenance, reliability of Data Web resources
    Linked Data Quality Analysis
  • 238. Evolution
    © CC-BY-SA by alasis on flickr)
  • 239.
    • unified method, for both data evolution and ontology refactoring.
    • 240. modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution
    • 241. allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks
    • 242. Combined with RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem
    • 243. patterns can be shared and reused on the Data Web.
    • 244. declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.
    EvoPat – Pattern based KB Evolution
  • 245. Evolution Patterns
  • 246.
  • 247. Exploration
  • 248. An ecosystemof LOD visualizations
    Statistical
    visualization
    Faceted-browsing
    Spatialfaceted-browsing
    Entity-/faceted-
    Basedbrowsing
    Domain specificvisualizations
    LOD Exploration
    Widgets


    • Dataset analysis (size, vocabularies, propertyhistograms etc.)
    • 249. Selectionofsuitablevisualizationwidgets
    Choreographylayer
    LOD Datasets
  • 250. TODO: Put ULEI slides
    Faceted spatial-semantic browsing component
  • 251. Pure JavaScript, requires onlySPARQL Endpoint for data access, Cross-Origin Resource Sharing (CORS) enabled.
    operates on local spatial regions, doednot depend on global meta-data about the data
    Source code:
    • https://github.com/AKSW/SpatialSemanticBrowsingWidgets
    Online Demo - LinkedGeoData Browser:
    • http://browser.linkedgeodata.org
    Next steps
    • Polygone/curvemarkers, domainspecificvisualizationtemplates, integrationofothersources, mobile interface
    Publication:
    • Claus Stadler, Jens Lehmann, Konrad Höffner, Sören Auer: LinkedGeoData: A Core for a Web of Spatial Open Data. Toappear in Semantic Web Journal - Special Issue on Linked Spatiotemporal Data and Geo-Ontologies.
    Faceted spatial-semantic browsing - Availability
  • 252. Genericentity-basedexplorationwithOntoWiki
    http://fintrans.publicdata.eu
  • 253. Domain-specificvisualization:http://energy.publicdata.eu
  • 254. Visualization ofstatisticdata (datacubevocab.)
    http://scoreboard.lod2.eu
  • 255.
  • 256. 05.10.2011
    Sören Auer - The emerging Web of Linked Data
    170
  • 257. 05.10.2011
    Sören Auer - The emerging Web of Linked Data
    171
  • 258.
  • 259. Visual Query Builder
  • 260. Relationship Finder in CPL
  • 261. Distributed Social Semantic Networking
  • 262. DistributedSocialSemanticNetworking
    Social Networks are walled gardens
    • Take users' data out of their hands,
    • 263. predefined privacy & data security regulations
    • 264. infrastructure of a single provider (lock-in)
    • 265. Facebook (600M+ users) = Web inside the Web
    • 266. Interoperability is limited to proprietary APIs
    Social networks should be open and evolving
    • allow users to control what to enter & keep control over their data
    • 267. users should be able to host the data on infrastructure, which is under their direct control, the same way as they host their own website (TBL)
    We need a truly Distributed Social Semantic Network (DSSN)
    • Initial approaches appeared with GNU social and more recently Diaspora
    • 268. a DSSN should be based on semantic resource descriptions and de-referenceabilityso as to ensure versatility, reusability and openness in order to accommodate unforeseen usage scenarios
    • 269. a number of standards and best-practices for social, Semantic Web applications such as FOAF, WebID and Semantic Pingback emerged.
  • Resources announce services and feeds, feeds announce services – in particular a push service.
    Applications initiate ping requests to spin the Linked Data network
    Applications subscribe to feeds on push services and receive instant notifications on updates.
    Update services are able to modify resources and feeds (e.g. on request of an application)
    Personal and global search services index social network resources and are used by applications
    Access to resources & services can be delegated to applications by a WebID, i.e. application can act in name of WebID owner
    The majority of all access operations is executed through standard web requests.
    DSSN Architecture
  • 270.
    • Open-source, MVC architecture
    • 271. Plattform independent, based on HTML5, CSS, Javascript
    • 272. jQuery, jQuery Mobile, jQueryUI
    • 273. rdfQuery – simple triplestore in Javascript
    • 274. PhoneGap (Apache Device ready) native appsforiOS, Android, Blackberry OS, WebOS, Symbian, Bada
    • 275. http://aksw.org/Projects/MobileSocialSemanticWeb
    DSSN Mobile Client
  • 276.
  • 277. DSSN Mobile Browsing
  • 278. DSSN Mobile Editing
  • 279. LOD Lifecycle
    supported byDebian based
    LOD2 Stack
    (released next week)
  • 280. First releaseofthe LOD2 Stack: stack.lod2.eu & demo.lod2.eu/lod2demo
  • 281.
  • 282. AKSW Team
  • 283. Thanksforyourattention!
    Sören Auer
    http://www.uni-leipzig.de/~auer/ | http://aksw.org | http://lod2.org
    auer@uni-leipzig.de