Linked Open Data
Dan Brickley, Google
Denny Vrandečić, Wikimedia
Session Linked data, Tuesday, 9:45-11:15
ESWC SS 2012, Tuesday tutorial: Dan Brickley and Denny Vrandečić, Linked Open Data

  1. Linked Open Data
      Dan Brickley, Google
      Denny Vrandečić, Wikimedia
      Session: Linked data, Tuesday, 9:45-11:15
  2. Agenda (22/05/2012)
      - Notation
      - Linked Open Data principles
      - Applied LOD principles
      - Application: schema.org
      - Application: Wikidata
      - Open questions
      - Hands-on intro: On links
      - Hands-on: Exploration
      - Hands-on: SPARQL
      - Hands-on: Spark
  3. Notation
      - URIs are generally abbreviated with CURIEs (e.g. http://dbpedia.org/resource/Kalamaki = dbpedia:Kalamaki)
      - Entities and literals are labeled rectangles
      - Blank nodes are circles
      - Triples are arrows, labeled with the property, connecting subject and object
      (Figure: example diagram using the property fb:likes and the entity dbpedia:Kalamaki)
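The CURIE convention is plain prefix substitution. A minimal Python sketch, assuming a hand-written prefix map (only the dbpedia: binding comes from the slide) and ignoring refinements of the CURIE syntax such as bracketed safe CURIEs:

```python
# Prefix map: dbpedia: is the binding used on the slide; rdfs: is the
# standard RDF Schema namespace, added here for illustration.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'prefix:reference' into a full URI by prefix substitution."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no known prefix in {curie!r}")
    return prefixes[prefix] + reference


def compact_uri(uri: str, prefixes: dict = PREFIXES) -> str:
    """Abbreviate a URI to a CURIE; return it unchanged if no namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


print(expand_curie("dbpedia:Kalamaki"))                     # http://dbpedia.org/resource/Kalamaki
print(compact_uri("http://dbpedia.org/resource/Kalamaki"))  # dbpedia:Kalamaki
```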
  4. Background: LOD principles
  5. Linked Open Data principles
      1. Use URIs as names for things
      2. Use HTTP URIs so that they can be looked up
      3. Provide results in standard formats (e.g. RDF, SPARQL)
      4. Link to other URIs
  6. LOD application: Why semantic computing?
  8. Card records (example from Donald Knuth, The Art of Computer Programming, Chapter 1)
      (Figure: a card record with the field values 1 | 2 | 10 | 100 | "10 D")
      - Field 1, Tag: 0 = face up, 1 = face down
      - Field 2, Suit: 1 = Clubs, 2 = Diamonds, 3 = Hearts, 4 = Spades
      - Field 3, Rank: 1 = Ace, 2..10 = 2..10, 11 = Jack, 12 = Queen, 13 = King
      - Field 4: Address of the next card
      - Field 5: "Human-readable" name
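In Knuth's record, the meaning of each number lives outside the data, in whatever program decodes it. A small Python sketch of such a decoder, using the field codes listed above; the class and field names are hypothetical:

```python
from dataclasses import dataclass

# Decoding tables for fields 2 and 3: this knowledge lives in the program, not in the record.
SUITS = {1: "Clubs", 2: "Diamonds", 3: "Hearts", 4: "Spades"}
RANK_NAMES = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}  # 2..10 print as numbers


@dataclass
class CardRecord:
    tag: int        # field 1: 0 = face up, 1 = face down
    suit: int       # field 2: 1..4, see SUITS
    rank: int       # field 3: 1..13
    next_card: int  # field 4: address of the next card
    title: str      # field 5: "human-readable" name

    def describe(self) -> str:
        rank = RANK_NAMES.get(self.rank, str(self.rank))
        side = "face down" if self.tag == 1 else "face up"
        return f"{rank} of {SUITS[self.suit]}, {side}, next card at {self.next_card}"


card = CardRecord(tag=1, suit=2, rank=10, next_card=100, title="10 D")
print(card.describe())  # 10 of Diamonds, face down, next card at 100
```

If the suit coding changes, every program that hard-codes these tables has to change with it, which is the question slide 18 raises ("Where is the knowledge?").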
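  9. (Figure: the same card record, 1 | 2 | 10 | 100 | "10 D")
  10. (Figure: the card record, with the next-card field labeled cards:next)
  11. (Figure: the record as a graph, with the labels cards:d10, cards:card and cards:next)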
  12. http://example.org/cards/d10
      - Oh, an unknown term
      - It is an HTTP URI!
      Request:
          GET /cards/d10 HTTP/1.1
          Host: example.org
          Accept: text/n3, application/rdf+xml
      Response:
          HTTP/1.1 200 OK
          Content-Type: text/n3; charset=UTF-8

          cards:d10 rdf:type cards:Card ;
              rdfs:label "10 of diamonds"@en ;
              cards:suit cards:diamonds ;
              cards:rank cards:rank-10 .
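The lookup on slide 12 is ordinary HTTP content negotiation. A Python sketch with the standard library's urllib.request that builds, but does not send, such a request. The Accept value is an assumption: it prefers text/turtle, the registered media type for Turtle, and falls back to the N3 and RDF/XML types the slide asks for.

```python
from urllib.request import Request

# Assumed preference order: Turtle first, then N3, then RDF/XML.
ACCEPT_RDF = "text/turtle, text/n3;q=0.9, application/rdf+xml;q=0.8"


def lookup_request(uri: str) -> Request:
    """Build (not send) a GET that asks the server for an RDF description of `uri`."""
    return Request(uri, headers={"Accept": ACCEPT_RDF}, method="GET")


req = lookup_request("http://example.org/cards/d10")
print(req.get_method(), req.selector, "Host:", req.host)  # GET /cards/d10 Host: example.org
print("Accept:", req.get_header("Accept"))
```

Calling urllib.request.urlopen(req) would send it; example.org is a placeholder domain and does not actually serve this card data.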
  13. (Figure: the graph gains cards:diamonds, cards:rank and cards:rank-10, next to cards:d10, cards:card and cards:next)
  14. (Figure: adds rdfs:label "10 of Diamonds"@en)
  15. (Figure: adds cards:facedown)
  16. (Figure: adds color:red, "10"^^xsd:int and a German label, "Karo 10"@de)
  17. (Figure: adds the property chain axiom cards:suit ○ cards:color ⊑ cards:cardcolor)
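The axiom cards:suit ○ cards:color ⊑ cards:cardcolor is a property chain: a card's color is whatever color its suit has. A naive forward-chaining sketch in Python over a set of (subject, predicate, object) tuples. The function name is hypothetical, and the skat:color triple is an assumed reading of the color:yellow node on slide 20.

```python
def apply_chain(triples, p1, p2, inferred):
    """Apply the axiom p1 ○ p2 ⊑ inferred once: from (x p1 y) and (y p2 z) infer (x inferred z)."""
    objects_by_subject = {}
    for s, p, o in triples:
        if p == p2:
            objects_by_subject.setdefault(s, []).append(o)
    return {(x, inferred, z)
            for x, p, y in triples if p == p1
            for z in objects_by_subject.get(y, [])}


graph = {
    ("cards:d10", "cards:suit", "cards:diamonds"),
    ("cards:diamonds", "cards:color", "color:red"),
    ("cards:diamonds", "skat:color", "color:yellow"),  # hypothetical alternative color scheme
}

print(apply_chain(graph, "cards:suit", "cards:color", "cards:cardcolor"))
# {('cards:d10', 'cards:cardcolor', 'color:red')}
print(apply_chain(graph, "cards:suit", "skat:color", "cards:cardcolor"))
# {('cards:d10', 'cards:cardcolor', 'color:yellow')}
```

Switching color schemes means switching a triple and an axiom, not editing a color() function. A real reasoner would also iterate to a fixpoint and support longer chains; this sketch applies one two-step chain a single time.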
  18. Programming: Where is the knowledge? How do I edit it?
      Classic:
          function color(card) {
            // field 2 is the suit: 1 = clubs, 4 = spades
            if ((card[2] == 1) || (card[2] == 4)) {
              return 1;  // black
            } else {
              return 2;  // red
            }
          }
      Wannabe hacker:
          function color(card) {
            return 2 - Number((card[2] == 1) || (card[2] == 4));
          }
      Symbolic constants:
          function color(card) {
            if ((card.suit == cards.clubs) || (card.suit == cards.spades)) {
              return cards.black;
            } else {
              return cards.red;
            }
          }
      Semantic (via the property cards:cardcolor):
          SELECT ?color WHERE { cards:d10 cards:cardcolor ?color }
  20. (Figure: adds color:yellow and the axiom cards:suit ○ skat:color ⊑ cards:cardcolor)
  22. (Figure: adds color:purple and the axiom cards:suit ○ poker:color ⊑ cards:cardcolor)
  23. BUT THESE ARE KNOWLEDGE-BASED SYSTEMS, AS DONE FOR DECADES!
  24. "In the Semantic Web, it is not the 'Semantic' which is new, it is the 'Web' which is new." (Chris Welty, IBM)
  25. (Figure: the card graph next to other data, with labels such as fb:like, aifb:Elena, poker:color, color:purple, color:red and color:yellow)
  26. (Figure: a web of interlinked concepts: Semantic Web, Animal, Vegetarian restaurant, Human, Carbon, Queen, Restaurant, Enterprise, Hotel, 10-Diamond, Culture, Advertisement, Queen-Diamond, Diamond, King Diamond, TV Show, Asia, Cosmos, Purple, University, AIFB, KIT, Incheon, Mumbai Airport, China, India, Mumbai, Education, Tatort, Elena, Airport, Airline, Lao Tse, Ceylon, Religion, Karlsruhe, Philosophy)
  27. Semantic Web, 2007
  28. Semantic Web, 2008
  29. 2009
  30. 2010
  31. 2011
  32. Applications: schema.org
  33. Schema.org: a quick look
  37. Yandex
  38. (Figure: schema.org types, including CreativeWork, Event, UserInteraction, Intangible, LocalBusiness, Place, Landform, Organization and CivicStructure)
  39. For example?
  41. (Microdata markup and the data extracted from it)
      Markup:
          <div itemscope itemtype="http://schema.org/VideoObject">
            <h2>Video: <span itemprop="name">My Title</span></h2>
            <meta itemprop="duration" content="PT1M33S" />
            <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
            <meta itemprop="embedUrl"
              content="http://example.com/videoplayer.swf?video=123" />
            <object ...>
              <embed type="application/x-shockwave-flash" ...>
            </object>
            <span itemprop="description">Video description</span>
          </div>
      Extracted:
          Type: http://schema.org/VideoObject
          name = My Title
          duration = PT1M33S
          thumbnailUrl = thumbnail.jpg
          embedUrl = http://example.com/videoplayer.swf?video=123
          description = Video description
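Extracting the item on slide 41 only needs an HTML parser and the itemscope/itemprop attributes. A deliberately simplified Python sketch built on the standard html.parser module. It handles one flat item, takes <meta> values from the content attribute and other values from element text, and does not implement the full microdata algorithm (nested items, itemref, URL-valued elements such as <a href>). The class name is hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class MicrodataExtractor(HTMLParser):
    """Flat microdata reader: one top-level item, no nested itemscope, no itemref.
    An illustration of the idea, not a conforming microdata parser."""

    def __init__(self) -> None:
        super().__init__()
        self.itemtype: Optional[str] = None
        self.props = {}
        self._open_prop: Optional[str] = None  # itemprop whose text we are collecting
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a and self.itemtype is None:
            self.itemtype = a.get("itemtype")
        prop = a.get("itemprop")
        if prop is None:
            return
        if tag == "meta":
            self.props[prop] = a.get("content") or ""  # <meta> carries its value in content
        else:
            self._open_prop, self._text = prop, []     # other elements: value is their text

    def handle_data(self, data):
        if self._open_prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first end tag after an itemprop closes it,
        # so markup nested inside a property value is not supported.
        if self._open_prop is not None:
            self.props[self._open_prop] = "".join(self._text).strip()
            self._open_prop = None


SNIPPET = """
<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">My Title</span></h2>
  <meta itemprop="duration" content="PT1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedUrl" content="http://example.com/videoplayer.swf?video=123" />
  <span itemprop="description">Video description</span>
</div>
"""

parser = MicrodataExtractor()
parser.feed(SNIPPET)
print("Type:", parser.itemtype)
for name, value in parser.props.items():
    print(name, "=", value)
```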
  42. (This is almost all you need to know about RDF, incidentally.)
  43. Applications: Wikidata
  45. (Mockup: a Wikidata item page, English interface)
      Berlin: Capital of Germany. Also known as: City of Berlin
      - Continent: Europe [3 sources]
      - Country: Germany [2 sources]
      - Population: 3,499,879, as of November 30 2011, method: extrapolation [2 sources]; 3,500,000, as of 2012, method: estimate [1 source]
      - Phone prefix: 030 since June 1973; 0311 before June 1973
      - Mayor: the value is being entered; typing "Klaus W" suggests Klaus Wowereit (German politician), Klaus Wunderlich (German musician), Klaus Wagner (stalker of the British royal family), Klaus Wagner (German mathematician) and Klaus Waldeck (Austrian musician and lawyer)
      - Registration license: B; Area: 891.85 km²; Twin city: Los Angeles
  46. (Mockup: the same item page in the German interface, with localized labels and number formats)
      Berlin: Hauptstadt von Deutschland (Capital of Germany). Auch bekannt als (also known as): Stadt Berlin
      - Kontinent (Continent): Europa [3 Quellen (sources)]
      - Land (Country): Deutschland [2 Quellen]
      - Einwohner (Population): 3.499.879, Stand 30. November 2011, Methode Fortschreibung (as of November 30 2011, method: extrapolation) [2 Quellen]; 3.500.000, Stand 2012, Methode Schätzung (as of 2012, method: estimate) [1 Quelle]
      - Telefonvorwahl (Phone prefix): 030 seit Juni 1973 (since June 1973); 0311 vor Juni 1973 (before June 1973)
      - Bürgermeister (Mayor): the same suggestions, with German descriptions, e.g. Klaus Wowereit (Deutscher Politiker, German politician)
  47. Application: Infoboxes
      - Now: every article calls an infobox with local values
      - In Wikidata: one page with values
      - Wikipedias fill their infoboxes with Wikidata values
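The infobox idea fits in a few lines: one shared item, rendered by each language edition with its own labels and number format. A Python sketch under assumptions: the dictionary layout and function names are hypothetical, not Wikidata's data model or any Wikipedia template; the values and labels are those of the Berlin mockups on slides 45 and 46.

```python
# Hypothetical central item: one set of values, shared by every language edition.
ITEM = {
    "label": {"en": "Berlin", "de": "Berlin"},
    "description": {"en": "Capital of Germany", "de": "Hauptstadt von Deutschland"},
    "claims": {"population": 3499879},
}
# Property labels per language, as seen in the two mockups.
PROPERTY_LABELS = {"population": {"en": "Population", "de": "Einwohner"}}


def format_number(n: int, lang: str) -> str:
    """Group thousands with ',' for English and '.' for German."""
    grouped = f"{n:,}"
    return grouped.replace(",", ".") if lang == "de" else grouped


def render_infobox(item: dict, lang: str) -> str:
    """Build one language's infobox text from the shared item."""
    lines = [f"{item['label'][lang]}: {item['description'][lang]}"]
    for prop, value in item["claims"].items():
        lines.append(f"{PROPERTY_LABELS[prop][lang]}: {format_number(value, lang)}")
    return "\n".join(lines)


print(render_infobox(ITEM, "en"))  # Berlin: Capital of Germany / Population: 3,499,879
print(render_infobox(ITEM, "de"))  # Berlin: Hauptstadt von Deutschland / Einwohner: 3.499.879
```

Updating the population once in ITEM changes both renderings, which is the point of moving the values out of the individual articles.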
  49. Open questions, or: a few dozen possible paper, project and thesis topics
  50. Open questions: Unfinished work
  52. Unfinished work
      - What does a unifying logic look like?
      - How do we export proofs?
      - How do we validate proofs?
      - How do we express trust?
      - How does the crypto stack really work?
      - What are usable interfaces to the Semantic Web?
      - How are Semantic Web applications created?
  53. Open questions: Identity and representation
  54. (Figure: many names and identifiers for the same character)
      - Bart Simpson
      - Bart
      - 4030 (character ID on ComicbookDB)
      - http://rdf.freebase.com/id/en.bart_simpson
      - http://dbpedia.org/resource/Bart_Simpson
      - http://en.wikipedia.org/wiki/Bart_Simpson
      - http://simpsons.com/id/Bart
  55. Identity and representation
      - Is there anything out there?
      - How to find the right identifier?
      - How to know what an identifier identifies?
      - What about the multitude of identifiers?
      - How do we know that two identifiers identify the same entity?
      - How do we know that two identifiers identify different entities?
      - Without this, can we still usefully apply statistical techniques?
      - What about creating new identifiers?
      - What if identifiers are ambiguous?
      - How to find representations for entities fitting my UI?
      - How to choose a representation?
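One common way to handle several identifiers for one entity is "smushing": group identifiers linked by owl:sameAs and rewrite triples to a single canonical identifier. A Python sketch using union-find. The sameAs links between the Bart identifiers from slide 54 are assumed for illustration, and the canonical choice (lexically smallest URI) is arbitrary.

```python
class SameAsIndex:
    """Union-find over owl:sameAs links: every identifier maps to one canonical representative."""

    def __init__(self) -> None:
        self.parent = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            keep, drop = sorted((ra, rb))  # deterministic: lexically smallest wins
            self.parent[drop] = keep

    def canonical(self, term: str) -> str:
        """Canonical identifier for known terms; anything else (e.g. literals) is unchanged."""
        return self.find(term) if term in self.parent else term


def smush(triples, index: SameAsIndex):
    """Rewrite every subject and object to its canonical identifier."""
    return {(index.canonical(s), p, index.canonical(o)) for s, p, o in triples}


DBPEDIA = "http://dbpedia.org/resource/Bart_Simpson"
FREEBASE = "http://rdf.freebase.com/id/en.bart_simpson"
SIMPSONS = "http://simpsons.com/id/Bart"

index = SameAsIndex()
index.same_as(DBPEDIA, FREEBASE)   # hypothetical link
index.same_as(FREEBASE, SIMPSONS)  # hypothetical link

triples = {
    (DBPEDIA, "rdfs:label", "Bart Simpson"),
    (SIMPSONS, "family:parent", "http://simpsons.com/id/Marge"),
}
for t in sorted(smush(triples, index)):
    print(t)
```

A single wrong link merges everything on both sides of it (compare "GDR sameAs Germany" on slide 73), which is part of why these identity questions remain open.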
  56. Open questions: Trust and diversity
  57. (Mockup: the Berlin item page with all statements filled in)
      Berlin: Capital of Germany. Also known as: City of Berlin
      - Continent: Europe [3 sources]
      - Country: Germany [2 sources]
      - Population: 3,499,879, as of November 30 2011, method: extrapolation [2 sources]; 3,500,000, as of 2012, method: estimate [1 source]
      - Phone prefix: 030 since June 1973 [2 sources]; 0311 before June 1973 [1 source]
      - Mayor: Klaus Wowereit [no source]
      - Registration license: B [1 source]
      - Area: 891.85 km² [2 sources]
      - Twin city: Los Angeles [no sources]
  58. A statement in Wikidata
      Berlin, Population:
      - 3,499,879, as of November 30 2011, method: extrapolation [2 sources]
      - 3,500,000, as of 2012, method: estimate [1 source]
  59. A statement in Wikidata
      (Figure: the statement becomes a node of its own)
      - Berlin (item) --population (property)--> Statement1
      - Statement1 --population (value)--> 3499879
      - Statement1 --as of--> 2011-11-30
      - Statement1 --method--> Extrapolation
  60. A statement in Wikidata
      (Figure: a second statement, Population 8,000, as of 15th century, method: estimate)
      - Berlin --population--> Statement2
      - Statement2 --population (value)--> 8000
      - Statement2 --as of--> 15th century
      - Statement2 --method--> Estimate
  61. A statement in Wikidata
      (Figure: references become further links)
      - Statement1 --reference--> Source1, Source2
      - Statement2 --reference--> Source3
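The pattern in slides 59 to 61 turns each statement into a node, so that qualifiers (as of, method) and references can hang off it. A Python sketch, assuming plain string labels taken from the figures rather than real Wikidata identifiers; it is not Wikidata's actual data model or export format.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One Wikidata-style statement: a value plus qualifiers and references."""
    node: str                 # name of the statement node, e.g. "Statement1"
    item: str
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)

    def to_triples(self):
        """Flatten to triples around the statement node, following the figures."""
        triples = [(self.item, self.prop, self.node), (self.node, self.prop, self.value)]
        triples += [(self.node, q, v) for q, v in self.qualifiers.items()]
        triples += [(self.node, "reference", r) for r in self.references]
        return triples


s1 = Statement("Statement1", "Berlin", "population", 3499879,
               {"as of": "2011-11-30", "method": "Extrapolation"},
               ["Source1", "Source2"])
s2 = Statement("Statement2", "Berlin", "population", 8000,
               {"as of": "15th century", "method": "Estimate"})

for t in s1.to_triples() + s2.to_triples():
    print(t)
```

Both population values coexist without contradiction, because each is scoped by its own node; the cost is one extra node and a few extra triples per statement, which is the reification trade-off raised on slide 62.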
  62. Trust and diversity
      - How to express provenance information?
      - How to store provenance of data?
      - Can provenance information be expressed such that the data is still easily accessible?
      - How to query data with provenance information?
      - How to deal with genuinely diverse data?
      - How to match diverse vocabularies?
      - How to deal with noisy data?
      - Is reification really necessary?
      - Do named graphs provide solutions?
      - Use one graph per statement?
  63. Open questions: Units and accuracy
  64. Units and accuracy
      - How to express "17th century" next to literal dates?
      - How to express heterogeneous accuracies?
      - Is a functional value of 40,000 km really inconsistent with 39,987 km?
      - How to express confidence values?
      - How to express units?
      - Is 176 cm equal to 5 ft 9 in? 177 cm too? Is equality transitive?
      - How to express ranges (e.g. the property "active" for bands)?
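One way to read "Is 176 cm equal to 5 ft 9?" is as equality up to a tolerance. A Python sketch, assuming a 1 cm tolerance (an illustrative choice), shows why such equality is not transitive:

```python
CM_PER_INCH = 2.54


def feet_inches_to_cm(feet: int, inches: int) -> float:
    return (feet * 12 + inches) * CM_PER_INCH


def roughly_equal(a_cm: float, b_cm: float, tolerance_cm: float = 1.0) -> bool:
    """Treat two lengths as equal if they differ by at most the (assumed) tolerance."""
    return abs(a_cm - b_cm) <= tolerance_cm


five_nine = feet_inches_to_cm(5, 9)   # 69 inches, about 175.26 cm
print(roughly_equal(176, five_nine))  # True: difference about 0.74 cm
print(roughly_equal(177, 176))        # True: difference exactly 1 cm
print(roughly_equal(177, five_nine))  # False: difference about 1.74 cm
```

176 cm "equals" 5 ft 9 in and 177 cm "equals" 176 cm, yet 177 cm does not "equal" 5 ft 9 in, so any data model that treats tolerant matches as owl:sameAs-style identity will derive wrong conclusions.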
  65. Open questions: Serializations
  66. (Figure: the family as an RDF graph)
      - http://simpsons.com/id/Marge, labeled "Marge", of type http://family.org/id/Adult ("Adult")
      - http://simpsons.com/id/Bart, labeled "Bart", and http://simpsons.com/id/Lisa, labeled "Lisa", both of type http://family.org/id/Child ("Child")
      - Bart --http://family.org/id/parent--> Marge; Bart --http://family.org/id/sibling--> Lisa
      - Types and labels use http://www.w3.org/1999/02/22-rdf-syntax-ns#type and http://www.w3.org/2000/01/rdf-schema#label
  67. (The graph in RDF/XML)
          <?xml version="1.0" encoding="UTF-8"?>
          <rdf:RDF
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
              xmlns:family="http://family.org/id/">
            <rdf:Description rdf:about="http://simpsons.com/id/Marge">
              <rdf:type rdf:resource="http://family.org/id/Adult"/>
              <rdfs:label>Marge</rdfs:label>
            </rdf:Description>
            <rdf:Description rdf:about="http://simpsons.com/id/Bart">
              <rdfs:label>Bart</rdfs:label>
              <rdf:type rdf:resource="http://family.org/id/Child"/>
              <family:parent rdf:resource="http://simpsons.com/id/Marge"/>
              <family:sibling rdf:resource="http://simpsons.com/id/Lisa"/>
            </rdf:Description>
            <rdf:Description rdf:about="http://simpsons.com/id/Lisa">
              <rdfs:label>Lisa</rdfs:label>
              <rdf:type rdf:resource="http://family.org/id/Child"/>
              <family:parent rdf:resource="http://simpsons.com/id/Marge"/>
            </rdf:Description>
          </rdf:RDF>
  68. (The graph in Turtle, and as JSON)
      Turtle:
          @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
          @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
          @prefix family: <http://family.org/id/> .
          @prefix simpsons: <http://simpsons.com/id/> .

          simpsons:Marge rdf:type family:Adult ;
              rdfs:label "Marge" .
          simpsons:Bart rdf:type family:Child ;
              rdfs:label "Bart" ;
              family:parent simpsons:Marge ;
              family:sibling simpsons:Lisa .
          simpsons:Lisa rdf:type family:Child ;
              rdfs:label "Lisa" ;
              family:parent simpsons:Marge .
      JSON:
          { "id": "Bart", "type": "Child", "sibling": "Lisa", "parent": "Marge" }
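The RDF/XML and Turtle on slides 67 and 68 encode the same nine triples. A minimal Python sketch that writes them in N-Triples, another W3C RDF syntax with one triple per line. It handles only IRIs and plain string literals, escaping just backslash, double quote and newline; the Literal marker class is a hypothetical helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FAM = "http://family.org/id/"
SIM = "http://simpsons.com/id/"


class Literal(str):
    """Marks a string as an RDF literal rather than an IRI."""


GRAPH = [
    (SIM + "Marge", RDF_TYPE, FAM + "Adult"),
    (SIM + "Marge", RDFS_LABEL, Literal("Marge")),
    (SIM + "Bart", RDF_TYPE, FAM + "Child"),
    (SIM + "Bart", RDFS_LABEL, Literal("Bart")),
    (SIM + "Bart", FAM + "parent", SIM + "Marge"),
    (SIM + "Bart", FAM + "sibling", SIM + "Lisa"),
    (SIM + "Lisa", RDF_TYPE, FAM + "Child"),
    (SIM + "Lisa", RDFS_LABEL, Literal("Lisa")),
    (SIM + "Lisa", FAM + "parent", SIM + "Marge"),
]


def nt_term(t: str) -> str:
    """Serialize one term: literals in double quotes, IRIs in angle brackets."""
    if isinstance(t, Literal):
        escaped = t.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{t}>"


def to_ntriples(triples) -> str:
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)


print(to_ntriples(GRAPH), end="")
```

By contrast, the JSON object on slide 68 keeps only Bart's local names: the URIs, Marge's type and all the labels cannot be recovered from it, one concrete answer to "Are all serializations lossless?".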
  69. (The same family in other formats)
      GEDCOM:
          0 HEAD
          1 FILE simpsons
          1 GEDC
          2 VERS 5.5
          0 @I1@ INDI
          1 NAME Marge /Bouvier/
          2 SURN Simpson
          1 SEX F
          1 FAMS @F1@
          0 @I2@ INDI
          1 NAME Bart /Simpson/
          1 SEX M
          1 FAMC @F1@
          0 @I3@ INDI
          1 NAME Lisa /Simpson/
          1 SEX F
          1 FAMC @F1@
          0 @F1@ FAM
          1 WIFE @I1@
          1 CHIL @I2@
          1 CHIL @I3@
          0 TRLR
      Semantic MediaWiki markup:
          Bart is a son of [[parent::Marge]] and the brother of [[sibling::Lisa]].
      Logic:
          Child(Bart). sibling(Bart, Lisa). parent(Bart, Marge).
      JSON:
          { "id": "Bart", "type": "Child", "sibling": "Lisa", "parent": "Marge" }
  70. Serializations
      - Do all tools need to understand all serializations?
      - Are all serializations lossless?
      - How to ensure they are up-to-date?
      - What about current tools that don't understand anything?
      - Is the data sufficiently complete?
      - How to seamlessly ground and lift data to RDF?
  71. Open questions: Ontologies
  72. Ontologies
      - "An ontology is a formal specification of a shared conceptualization"
      - Defines concepts and their formal relations to each other
      - You can understand a concept without having a word for it
      - Axiom not possible in OWL L, can only be approximated:
        (parent ○ brother) ∨ (parent ○ sister ○ husband) = uncle
      (Figure: family tree with Herb, Homer ⚭ Marge, Bart, Selma ⚭ Sideshow Bob, and a sibling link)
  73. Ontologies
      - "An ontology is a formal specification of a shared conceptualization"
      - Strict taxonomies: Bart a FictionalPerson
      - owl:sameAs: GDR sameAs Germany
      - Classes as individuals: Eagle a EndangeredSpecies
      - rdfs:domain and rdfs:range: family:child rdfs:range foaf:Person
      - "Unauthorized" extensions: foaf:favouriteMovie
  74. Ontologies
      - How to achieve and measure sharedness?
      - Who defines the semantics of a term?
      - How to achieve correctness?
      - Does sharedness mean correctness?
      - How to overcome limitations on expressivity?
      - How to deal with wishes for more expressivity?
      - How to deal with undecidability?
      - What does inconsistency mean?
      - How to deal with brittleness?
  75. Open questions: Privacy
  78. Privacy
      - How to ensure privacy?
      - What does privacy mean?
      - How to publish linked data that is not open?
      - What about the ethics of combining data?
  79. Open questions: Scalability
  80. Web Data Commons
      - Extracts data from Common Crawl (5 billion pages, 20 TB compressed)
      - 65,408,946 domains with triples
      - 1,222,563,749 typed entities
      - 3,294,248,653 triples
      - www.webdatacommons.org
  81. Scalability
      - How to efficiently use Semantic Web data?
      - How to select the appropriate set?
      - How to cache it?
      - How to deal with frequent updates?
      - How to deal with SPARQL endpoints vs RDF?
      - How to do federated queries?
      - Who pays for it and when?
  82. Questions?
  83. Introduction to hands-on: What about the links?
  84. What are the links in "linked data"?
      - Are they links between things?
      - Are they links between documents?
      - How exactly do the "Web hyperlinks" we know and love relate to the factual "typed links" of data modeling?
  85. Links and links
      - These questions motivate and drive the Linked Data project, and have been with the Web from the start.
      - They explain our most boring debates ("httpRange-14").
      - And they show how the 'Semantic Web' is a project to improve the mainstream Web itself.
  87. In the beginning... (1989, 1994, ...)
  96. What's in a (hyper)link?
      - Does a node in the graph stand for 'Stephen Fry'-the-person, or for 'a page about Stephen Fry'?
      - What about when there are multiple pages about the same person, in different voices, sometimes disagreeing?
      - RDF thinks in triples, but data management is often in quads: asking who-said-what in SPARQL
  97. 1989 again: One flat graph? What if we disagree?
  98. A graph of graphs?
      - Classic WWW hypertext is a top-level document graph.
      - Those documents make claims about the world: factual graphs, e.g. schema.org, RDFa.
      - SPARQL lets us store and query all this.
      - Each Web 'node' may give us its own 'nodes and links' description, including links.
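"Who said what" becomes queryable once each triple carries a fourth element naming the graph (source) it came from. A Python sketch of a tiny quad list, with the corresponding SPARQL GRAPH pattern noted in a docstring; the ex: terms and source URLs are hypothetical.

```python
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "predicate", "object", "graph"])

# Hypothetical claims; each graph name stands for the source document that made them.
SOURCE_A = "http://source-a.example/fry"
SOURCE_B = "http://source-b.example/fry"
QUADS = [
    Quad("ex:StephenFry", "ex:occupation", "ex:Actor", SOURCE_A),
    Quad("ex:StephenFry", "ex:occupation", "ex:Writer", SOURCE_A),
    Quad("ex:StephenFry", "ex:occupation", "ex:Writer", SOURCE_B),
]


def who_said(quads, s=None, p=None, o=None):
    """Graphs containing a matching triple (None acts as a wildcard).
    Roughly: SELECT DISTINCT ?g WHERE { GRAPH ?g { s p o } }"""
    return sorted({q.graph for q in quads
                   if (s is None or q.subject == s)
                   and (p is None or q.predicate == p)
                   and (o is None or q.object == o)})


def flatten(quads):
    """The 'one flat graph' view: drop the graph names and merge duplicate triples."""
    return {(q.subject, q.predicate, q.object) for q in quads}


print(who_said(QUADS, o="ex:Writer"))  # both sources
print(len(QUADS), "quads ->", len(flatten(QUADS)), "triples")  # 3 quads -> 2 triples
```

Flattening loses exactly the information the "Internet detectives" questions later ask for: after it, nothing records that two independent sources agreed on ex:Writer.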
  100. (Figure: sites and datasets: BBC, Freebase, sameas.org, IMDb, stephenfry.com, VIAF, dbpedia.org, Rotten Tomatoes, New York Times)
  101. (No single 'correct' view.) We can emphasize the landscape of sites/datasets...
  102. (No single 'correct' view.) We can emphasize the landscape of sites/datasets...
  103. Or we can zoom in, and see how records can be merged / flattened into a single set of triples...
  104. Summary
      - Linked datasets, pages, real-world things...
      - ... all of these are represented in RDF datasets.
      - To query this hands-on, we can use SPARQL to ask questions, and 'named graphs' to organize factual claims into groups.
  105. Hands-on: Exploration
  106. Hands-on
      - You will use SPARQL to explore datasets about Stephen Fry
      - SPARQL yourself and your colleagues
      - Spark: SPARQL on the Web
  107. Thinking about data
      - We made a data/ folder for you
      - Real public RDF data about a real person
      - Sources: DBpedia, Freebase, VIAF, sameas.org, New York Times, Identi.ca, BBC, Rotten Tomatoes, IMDb and us.
      - I'll briefly introduce the data now; then see info/data-and-queries-intro.txt
      http://192.168.0.20:8080/openrdf-workbench/repositories/Tuesday
  108. What to do
      - "Get your hands dirty" with real Linked Data
      - If you hit a problem, make a note of it, and ask!
      - Most files have RDF describing Stephen Fry; he is real and human, please bear that in mind.
      - Study the shape and patterns of the data, ask yourself questions, and use SPARQL to explore.
  109. Questions
      - What RDF schemas/ontologies do you see?
      - How are people and other things identified?
      - Are there common patterns across sources?
      - Can you write queries that integrate these?
      - What bugs are there in the data? How do you think they got there?
  110. Internet detectives
      - For each triple, can you figure out "how it got there"? In whose voice is it?
      - Is there a real schema? (if the Wi-Fi is up)
      - How would you check its truth? Who "said" it, and how could a machine tell?
      - Which sources (or parts of them) aggregate different points of view within a single RDF graph?
  111. data-and-queries-intro.txt
      - See the info/ folder for more details: SPARQL setup and a short querying tutorial.
      - The goal is to study the Linked Data Web and understand how it might evolve.
      - Identify project and research topics, and ways of helping to improve the Web.
  112. Hands-on: SPARQL yourself
  113. SPARQL yourself
      - SPARQL endpoint: http://192.168.0.20:8080/openrdf-sesame/repositories/Students
      - SPARQL web form: http://192.168.0.20:8080/openrdf-workbench/repositories/Students/query
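Under the SPARQL protocol, a query can be sent to an endpoint like the one above as the URL-encoded query parameter of an HTTP GET. A Python sketch that only builds the request. The query (triples per named graph, which needs SPARQL 1.1 aggregates) and the JSON results Accept type are assumptions, and the address only worked on the tutorial network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Endpoint address from slide 113; reachable only on the tutorial network.
ENDPOINT = "http://192.168.0.20:8080/openrdf-sesame/repositories/Students"

# Illustrative query (an assumption, not from the slides): count triples per named graph.
QUERY = """SELECT ?g (COUNT(*) AS ?n)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request: the query travels URL-encoded in ?query=."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)
# The encoding round-trips: decoding the query parameter gives back the original text.
print(parse_qs(urlsplit(req.full_url).query)["query"][0] == QUERY)  # True
# Against a live endpoint: urllib.request.urlopen(req).read()
```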
  114. Hands-on: Spark
  115. Spark
  116. Spark visualizations
  117. Spark visualizations
  118. Exercise
  119. Exercise
  120. Semantic MediaWiki
  121. Semantic MediaWiki: export
  122. Task
      - Let's add semanticweb.org as an additional source, in order to add Dan from there to the lists of the "Friends of Spark".
      - Expand spark.zip, then check test/index.html