Cigs lod docext_kb_20131118

Can documents be Linked Data? / Kate Byrne, School of Informatics, University of Edinburgh, CIGS LOD Workshop

Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013

Published in: Education, Technology

  1. 1. vision txt2rdf grounding Can Documents be Linked Data? Kate Byrne, School of Informatics, University of Edinburgh. CIGS LOD Workshop, 18th November 2013
  2. 2. Outline: 1 The semantic web vision; 2 Extracting structured knowledge from free text; 3 Respect for authority, or, Why we need ontologies
  3. 3. The semantic web vision. W3C RDF Concepts, 2002 draft: “RDF ... allows anyone to say anything about anything.” Tim Berners-Lee, 2006: “The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines, leaving humans to provide the inspiration and intuition.” Tim Berners-Lee, 2009: “The web as I envisaged it, we have not seen it yet.”
  9. 9. Simple declarative sentences. “In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.” Graph drawn from the text: hobbit → lives in → hole → located in → the ground; does not have → nastiness; has type → hobbit hole; has characteristic → comfort
  10. 10. A lot of information is in textual form!
  19. 19. Nouns and verbs. Subject and object are nouns; the predicate is a verb. Nouns: hobbit, hole, the ground, nastiness, hobbit hole, comfort. Verbs: lives in, located in, does not have, has type, has characteristic
  21. 21. Extracting structured knowledge from free text: fancy NLP processing and RDFisation
  22. 22. Natural Language Processing pipeline [diagram]. Stages: Text documents → Pre-processing → Named Entity Recognition → Relation Extraction → RDF translation → Graph of triples. Labels in the diagram: tokenise; sentence and para split; POS tag; multi-word tokens and features; trained NER model; list of NEs and classes; attach siteids; set of NE pairs and features; trained RE model; list of relations and classes; remove unwanted relations; generate triples
  23. 23. Named entities and relations (site 20). Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site. Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972. Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983. HP50NW 11.00 was recorded as a stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists. Historic Scotland Archive Project (SW) 2002.
  25. 25. Converting text relations to RDF – 1 (site 20). site20 − hasEvent − excavationX; excavationX − hasLocation − SandsOfBreckon; excavationX − hasDate − 1982
  26. 26. Converting text relations to RDF – 2. site20 − hasEvent − excavationX; excavationX − hasLocation − SandsOfBreckon; excavationX − hasDate − 1982. [Graph figure] Node labels: event:excavation, event:excavation20w158, date:1982, sitetype:stone+settings20w179, siteid:site20, sitename:sands+of+breckon, address:hp50nw+11.00, address:breckon. Edge labels: rdf:type, :hasPeriod, :hasEvent, :hasClassn, :hasLocation (three times)
  29. 29. Let’s remind ourselves what’s the point of Linked Data. Archaeological site archive: siteid: 47919; sitename: Cairnpapple; site number: NS97SE 16; classification: Cairn, henge; “A complex site on the summit of Cairnpapple Hill excavated by Piggot in 1947...” Museum database: objectid: X.EP 167; find spot: Cairnpapple; “This stone flake from the cutting edge of a ground stone axehead was found at Cairnpapple in West Lothian. The stone is from...” [Graph figure] Node labels: :Objectid#x.ep+167, Classn/Sitetype#cairn%20+henge, :Siteid#site47919, :Classn/Objtype#axe+flake, Id#ns97se+16, :Loc/Sitename#cairnpapple, :Event#excavated47919w10, :Loc/Place#west+lothian, :Agent/Person#piggot, :Time/Date#1947, :Loc/Place#cairnpapple+hill. Edge labels: :hasClassn (twice), :hasFindSpot, :hasId, :hasLocation (three times), :hasEvent, :hasAgent, :hasPeriod
  30. 30. But linking Linked Data is actually pretty hard (same Cairnpapple records and graph as on the previous slide). A direct link means spotting an identical node in a separate graph. How? String matching? Clues from context?
  33. 33. Using LOD cloud “Authority Nodes” as intermediaries: grounding local URIs against "authority" nodes is the next big challenge!
  34. 34. Grounding site20 against Monument Thesaurus. [Graph figure] Thesaurus node labels: sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle, sitetype:stone+row, sitetype:stone+setting, sitetype:. Literals: "An arrangement of two or more standing stones", "stone setting". Local node labels: event:excavation, event:excavation20w158, date:1982, sitetype:stone+settings20w179, siteid:site20, sitename:sands+of+breckon, address:hp50nw+11.01+hp+5304+0519, address:breckon. Edge labels: skos:broader, skos:scopeNote, skos:related, rdfs:label, rdfs:subClassOf, rdf:type (twice), :hasPeriod, :hasClassn, :hasEvent, :hasLocation (three times)
  36. 36. Grounding against various authorities/ontologies. Placename authorities: Geonames, OS gazetteer, Pleiades. Period: EH draft ontology. Monument classifications: Seneschal project. Bibliographic: LCSH, FRBR. ...hundreds of LOD datasets in the cloud. Informatics projects: Edina “Unlock” service – spatial and temporal grounding; GAP projects – grounding against maps of the ancient world
  38. 38. Unlock Text – find placenames and plot on map: http://unlock.edina.ac.uk/
  39. 39. GapVis interface: http://nrabinowitz.github.com/gapvis/
  40. 40. Questions?
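
Illustrative sketch (not from the talk): the three site20 triples on the "Converting text relations to RDF" slides, and the "String matching?" question on the linking slide, could be tried out roughly as below. The namespace `http://example.org/`, the helper names `local_uri`, `to_ntriples`, `normalise` and `ground`, and the four-entry authority table are all assumptions made for this example; the crude plural stripping stands in for whatever matching a real grounding step would use.

```python
from urllib.parse import quote_plus

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def local_uri(kind, label):
    """Mint a local URI in the style of the slides, e.g. sitename/sands+of+breckon."""
    return f"{BASE}{kind}/{quote_plus(label.lower())}"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) URI triples, one N-Triples line each."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def normalise(label):
    """Lowercase, collapse whitespace, and drop one trailing plural 's' (very crude)."""
    text = " ".join(label.lower().split())
    if text.endswith("s") and not text.endswith("ss"):
        text = text[:-1]
    return text


# Toy "authority" table: a few labels that appear on the Monument Thesaurus slide.
AUTHORITY = {
    normalise(label): local_uri("sitetype", label)
    for label in ("stone setting", "stone circle", "stone row", "standing stone")
}


def ground(label):
    """Return the authority URI whose normalised label matches, or None."""
    return AUTHORITY.get(normalise(label))


# The three relations shown for site20.
site = local_uri("siteid", "site20")
excavation = local_uri("event", "excavationX")
triples = [
    (site, BASE + "hasEvent", excavation),
    (excavation, BASE + "hasLocation", local_uri("sitename", "Sands of Breckon")),
    (excavation, BASE + "hasDate", local_uri("date", "1982")),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
    print(ground("Stone settings"))
```

Exact string matching after normalisation is the simplest possible answer to "How?"; it finds "stone settings" but returns nothing for any label the table does not contain, which is why the slides also ask about clues from context.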
