Your SlideShare is downloading. ×
Keynote at AImWD
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Keynote at AImWD

108
views

Published on

Keynote given at the workshop for Artificial Intelligence meets the Web of Data on Pragmatic Semantics. …

Keynote given at the workshop for Artificial Intelligence meets the Web of Data on Pragmatic Semantics.

In this keynote I argue that the Web of Data is a Complex System or Marketplace of Ideas rather than a classical Database, and that the model theory on which classical semantics are based is not appropriate in all situations, and propose an alternative "Pragmatic Semantics" based on optimisation of possible interpretations. .

Published in: Education

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
108
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Pragmatic Semantics for theWeb of DataAImWD -- Montpellier 2013Stefan Schlobach(based on work of and using slides from ChristopheGueret, Kathrin Denthler and Wouter Beek)VU Amsterdam
  • 2. Postulates• The Web of Data requires semantics• The Web of Data is not a database• The Web of Data is a complex system• Semantics for a database are not (always)suitable for complex systems• We need new semantic paradigms– Voila: Pragmatic Semantics
  • 3. CLASSICAL SEMANTICS FOR THEWEB OF DATAPart1
  • 4. 4/18Linked DataGraph/facts based knowledge representationConnect resources to properties / otherresourcesWeb-based: resources have a URITry http://dbpedia.org/resource/Amsterdam
  • 5. Model theory for Semantic WebLanguages: RDF, RDFS, OWL• Ontology and Data: set of formulas S• Model: formal structure satisfying all formulasin S• Entailment: formula f entailed by S iff f in truein all models of S• If contradiction, no models…• No models, everything is entailed.
  • 6. THE WEB OF DATA AS ACOMPLEX SYSTEMPart2
  • 7. Since 2006, people are creating linked dataBut publication and interpretationare distributed processes.The Web of Data is a Complex System.Not a database.It is a Marketplace of ideas.
  • 8. 13/27Key observationsThe Web of Data is more than the sum ofits triples – its a Complex SystemDifferent actorsDifferent scalesDynamic
  • 9. October 2007
  • 10. Evolution of the Web of DataNow
  • 11. The WoD is a complex system!• Countless extremely heterogeneous datasetso general-purposed datasets, such as DBpediao domain-oriented datasets, such as Bio2RDFo government data, music data, geological data, socialnetwork data, etc.Hundrets of billions of RDF tripleso Billions of links within the datasetso More than Million links between the datasetsEmbedded rich semantics in the datao data points are typedo links are typedo links is what makes the statements usefulInformation has impact on different scales
  • 12. A new way of seeing the WoDConsider the WoD as network
  • 13. Relevant (Network) Properties of WoD• Average path length• Degree distribution• Strongly connected components• Degree centrality• Between centrality• Closeness centrality
  • 14. Scales of observation of the WoD1. Graphs scale
  • 15. Graph-scale WoD network• Each dataset is a node• Edges are weighted, directed connectionsbetween the datasetso if there is at least one triple having a subjectwithin dataset 1 and an object within dataset2, then there is an edge between these twodatasets.o the number of such triples is the weight ofthe edge.
  • 16. • 110 nodes with 350 edges• Average path length is 2.16• 50 components
  • 17. The degree of 7 is criticalpoint after which thenetwork is not scale-freeany more.
  • 18. Top central nodesNode ValueDBpedia 0.332DBLP Berlin 0.108DBLP (RKB) 0.100DBLP Hannover 0.097FOAF profiles 0.075Betweenness centralityNode ValueDBpedia 0.762Geonames 0.614Drug Bank 0.576Linked MDB 0.544Flickr wrappr 0.526Closeness centralityNode ValueDBpedia 0.505UniProt 0.266DBLP (RKB) 0.266ACM (RKB) 0.229GeneID 0.211Degree centralityEvery centrality has a specific meaning...
  • 19. Scales of observation of the WoD2. Triple scale
  • 20. Triple-scale WoD network• We took the 10 million triples from the dataset crawledfrom the WoD, provided by the billion triple challenge2009• This "BTC" network is defined as G=(V, (E, L)), whereo V is a set of nodes, and each node is a URI or aliteralo E is a set of edgeso L is a set of labels, each label characterising arelation between nodes• We applied a few strategies to aggregate data forcomparison.
  • 21. Network Nodes EgesAverage pathlengthComponentsBTC 605K 860K 2.15 602KBTC aggregated 14K 31K 2.80 7KBTC aggregated +filter37 91 1.88 17Triple-scale network and its aggregations• BTC aggregated: triples are aggregated by thedomain names• BTC aggregated + filter: only domain namesshared with the graph-scale network
  • 22. Degree distributionBTC BTC aggregatedPower-law distribution
  • 23. Monitoring and Improving the WoD• Linked data is meant to be browsed, jumping from oneresource to another• The presence of Hubs is critical for the paths• Create alternate paths to be used in case of failureGuéret, Groth, van Harmelen, Schlobach, "Finding the Achilles Heel of the Web of Data:using network analysis for link-recommendation”
  • 24. AmsterdamThe NetherlandsisLocatedInChristophe VU AmsterdamworkInisLocatedInworkInworkInThe links have explicit semantics, which brings implicitlinks deduced after the reasoning processChallenges:
  • 25. Challenges:• Multi-relations links• FOAF (social networks + personal information)• SIOC (relations characterising blogs)• SWRC (describing research work)• …Different filtering produce different networksCentrality status of nodes changes w.r.t the networks• Dynamics• Data will be continuously added and linked.
  • 26. FORMAL INTERACTIONS WITH THEWEB OF DATAPart3
  • 27. 32/18Interacting with Linked DataCommon semantic paradigmCommon goals:Completeness: all the answersSoundness: only exact answers
  • 28. 33/18When solutions do not (quite) fit the problem ...Copyright: sfllaw (Flickr, image 222795669)
  • 29. 34/18MotivationIn the context of Web data ?Issues with scaleIssues with lack of consistencyIssues with contextualised views over the WorldRevise the goalsAs many answers as possible (or needed)Answers as accurate as possible (or needed)
  • 30. 35/18From logic to optimisationOptimise towards the revised goalsNeed methods that cope withuncertainty, context, noise, scale, ...
  • 31. Nature inspired methods forinteracting with complex systems• Advantageous properties– Adaptation– Simplicity– Interactivity: Anytime, user in the loop– Scalability and robustness– Good for dealing with dynamic information• Studied for different interaction types
  • 32. 37/18Answering queries over the dataCopyright: jepoirrier (Flickr, image 829293711)
  • 33. 38/18The problemMatch a graph pattern to the dataMost common approachJoin partial results for each edge of the query
  • 34. 39/18Solving approachesLogic-basedFind all the answers matching all of the query patternOptimisationFind answers matching as much of the query as possibleImportant implications of the optimisationOnly some of the answers will be foundSome of the answers found will be partially true
  • 35. Data LayerSE1Cache??SE2SE3candidate solutions Offspring1ERDF: An evolutionary algorithm under the hood2334Query ResultsWeb of DataInputSet of property/value pairs
  • 36. Data LayerSE1Cache??SE2SE3candidate solutions Offspring1ERDF: An evolutionary algorithm under the hood2334Query ResultsWeb of DataInitial PopulationRandomly chosen to fit thequery graph
  • 37. Data LayerSE1Cache??SE2SE3candidate solutions Offspring1ERDF: An evolutionary algorithm under the hood2334Query ResultsWeb of DataDeterminingfitness byquerying theWeb of DataSingle assertions aresent to SPARQLendpoints
  • 38. Data LayerSE1Cache??SE2SE3candidate solutions Offspring1ERDF: An evolutionary algorithm under the hood2334Query ResultsWeb of DataSelectionFitness determines the bestcandidate which is chosen asparent of the nextgenerationCreate offspringLoop:
  • 39. Data LayerSE1Cache??SE2SE3candidate solutions Offspring1ERDF: An evolutionary algorithm under the hood2334Query ResultsWeb of Data
  • 40. Data LayerSE1Cache??SE2SE3candidate solutions Offspring1ERDF: An evolutionary algorithm under the hood2334Query ResultsWeb of Data
  • 41.  Scalable Lean Robust Anytime ApproximateProperties of eRDF Arbitrary SPARQL endpoints Join-free, so scaling to moreendpoints is comparably painfree
  • 42. 48/18Some resultsTested on queries withvaried complexityWorks best with morecomplex queriesFind exact answerswhen there are some
  • 43. 49/18Finding implicit facts in the dataCopyright: givingnot@rocketmail.com (Flickr, image 6990161491)
  • 44. 50/18The problemDeduce new facts from othersMost common approachCentralise all the facts, batch process deductions
  • 45. 51/18Solving approachesLogic-basedFind all the facts that can be derived from the dataOptimisationFind as many facts as possible while preservingconsistencyImportant implications of the optimisationOnly some of the facts will be foundUnstable content
  • 46. 53/18An optimisation approach: SwarmsSwarm of micro-reasonersBrowse the graph, applying rules when possibleDeduced facts disappear after some timeEvery author of apaper is a personEvery person isalso an agent
  • 47. 54/18Some resultsIf they stay, most ofthe implicit factsare derivedAnts need to followeach other to dealwith precedence ofrulesSeveral ants perrule are needed
  • 48. Related findings and approaches• Storage optimisation using swarms(SwarmLinda from FU Berlin)• Join optimisation with swarms(RCQ-ACS Erasmus Rotterdam)• Emergent Semantics(eXascale Infolab Fribourg)• Previous speaker (argumentation basedsemantics)
  • 49. The day Semantics died…. ?AImWD -- Montpellier 2013Stefan Schlobach(based on work of and using slides from ChristopheGueret, Kathrin Denthler and Wouter Beek)VU Amsterdam
  • 50. PRAGMATIC SEMANTICS FOR THEWEB OF DATAPart4
  • 51. There is meaning in the structure
  • 52. Requirements• Standard languages• Standard semantics still valid (for simple data)• Integrate structural properties– Popularity of nodes/triples– “Distance” between triples– Frequency of triplesSemantics not strict, but pragmaticIntuitively: a statement twenty times made is moretrue than a statement once made
  • 53. Approach• Entailment defined through optimality overdifferent (possibly competing) notions of truth• Make as much information in the data explicit,and turn it into first-class semantics citizens(truth orderings)• Pragmatic entailment is defined through multi-objective optimisation.• Interoperability is then achieved by enriching anontology with meta-information about semanticorderings, as well as agreement on the weightingof orderings.
  • 54. Subset based truth orderings– the size of the minimal entailing subontology– ratio of sub-models in which a formula is satisfiedversus the total number of sub-models– ratio between sub-ontologies of O in which aformula holds holds versus the number of all sub-ontologiesTruth based on part of the given information
  • 55. Graph-based truth orderings• A shortest path ordering (diameter of theinduced sub-graphs). Such a notion is a proxy forconfidence of derivation. A• A random-walk distance or edge-weights, induceorderings that are clustering-aware, with sub-ontologies entailing a formula have morecohesion than others.• PageRank orderings can be used as proxies forpopularityTruth given on the structure of given information
  • 56. Pragmatic Entailment• A pragmatic closure C for an ontology O andorderings f1 to fn is then a set of formulas thatis Pareto-optimal w.r.t. the optimisationproblem max*f1 (C),…,fn (C)+.
  • 57. PraSem• Project title : Pragmatic Semantics for the Web ofData• Acronym: PraSem• Runtime: Nov 2012-Oct 2016• Main researcher: Wouter Beek• People involved: Stefan Schlobach, ChristopheGueret, Kathrin Denthler, Pepijn Kroes, Frank vanHarmelen, and hopefully more people soon.
  • 58. Deal with Open World AssumptionJune 3, 2013 IS: Web of Data 66
  • 59. Deal with incompletenessJune 3, 2013 IS: Web of Data 67
  • 60. Formalise approximationsJune 3, 2013 IS: Web of Data 68
  • 61. Take home message• The Web of Data requires semantics• The Web of Data is not a database• The Web of Data is a complex system• Semantics for a database are not (always)suitable for complex systems• We need new semantic paradigms– Voila: Pragmatic Semantics