LO(D) and behold: issues, tips and techniques for extending to the giant global graph
 



Presentation given to the Cataloguing and Indexing Group Scotland seminar on Linked Open Data practices in archives and libraries, 18 November 2013. I explained the issues associated with discovering vocabulary URIs from literals, and tips and techniques that can help with the discovery of URIs.


Usage rights: CC Attribution-ShareAlike License

  • Are your strings in a fankle or your things unmentionable? This session will cover practical issues associated with mapping linked open datasets from a local environment to the global semantic web. Issues will be illustrated by examples encountered at the National Library of Scotland. Topics covered will include: DODLOD@NLS; LoC(h)-Getty; two other lochs; Wikipedia wickedness and the elusive loch of Shandon (a watery Brigadoon); innumerate Scots and their thirds, forths and firths; and The Germans (a Basil Fawlty moment).
  • NLS @ Edinburgh, Digital Access Manager. Reorganisation & responsibilities: overseeing Library systems and strategic development of the collecting database; resource discovery; the website; strategic development on open data and the semantic web. There are 3 of us! Web editor/systems librarian. Busy with ELD; an open data and content policy (draft); a Wikipedian in Residence to improve access to, and knowledge of, our collections in the open sphere. We have undertaken modest steps with linked open data. No books in the Library, 'coz as Gildas says … books are boring! We only have pictures.
  • Represents 50,000 digitised maps, millions of pages of books and directories (Post Office directories, military lists), papers about the medical history of India, broadsides, photographs, posters and manuscripts
  • It’s inefficient: looking things up, typing them in and then correcting the text. It should be look up and link.
  • So the DOD is old. Old doesn’t mean it’s not functional, though! It’s relevant. It pre-dates Linked Open Data. THE big issue that I want to focus on is how we get from the historical strings stored in our database to their modern-day URIs. Of course, if we started now we’d record the URIs.
  • We use 5 vocabularies in the DOD. Actually we use more … we have a few local terms, but we try very hard to avoid these.
  • Take what we have, which is a string assigned a local URI. The string then needs to be matched with the string in the vocabulary we used, to discover its URI. That links our URI to the vocabulary URI.
  • Getty doesn’t have linked data representations. If we started today we’d store the URIs and not the strings (we’d take a data dump for the strings, and use the URIs for lookup by cataloguers).
  • So far we’ve been looking inwards … at the local. What if we want to extend to the global? How does NLS get from the local to the global? Do we just publish our data and let it be so, like the DNB? Do we try to start making links, like the BnF to Wikipedia? How do we know which vocabularies have been mapped to which other vocabularies? Where is LCNAF mapped? Where is LCSH mapped? Should we map the Scottish stuff? What about those vocabularies that don’t have URIs (TGN)? Where do I map? Or am I trapped in the local?
  • Le Chef tells me that the links just need to be made once and once only, for all time. I have a hope …. that the vocabularies I use are mapped to other vocabularies, so that my Library’s stuff will link to others. That’s how it’s meant to work, isn’t it? So the people manufacturing kilts and looking at kilts on Wikipedia find these soldiers!
  • So let’s look at some techniques to connect the local to the global graph and try to find those elusive URIs
  • Use tools like Google Refine and other string matching tools and algorithms
  • No hits means no URI: needs humans. Multiple hits means no URI: needs humans. Exact match means YEAH! But .. needs humans.
  • We could use humans. We could use humans and machines to give humans assistance. Examples?
  • Why use humans? Lots of them are looking for something interesting to do …. There are lots of people not being exploited in your organisation. They are bored. They want to contribute. Receptionists. Night watchmen. Simple short tasks: if you don’t know, move on to the next.
  • Groups: we had library school placement students come, and we decided to get them to help discover URIs for the Haig collection. So I got everything set up, they came to see me, and then I had a Basil Fawlty moment: Don’t mention the war! But I had to, and then I had to explain why there were racist comments in the data.
  • I requested that we build a module that enables tagging of places, names
  • Change the focus on this: getting people to say that this IS that. Then say that we are planning to implement semantic tagging as part of other projects, building a tagging module for the transcription service.
  • These are the results at id.loc.gov if you search for horses. In the DOD this picture is only described as “horses”, but of course the crowd can enhance this while tagging. Run a string against this.
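The triage in the notes above (no hits, multiple hits, exact match, and humans needed in every case) can be sketched in Python. This is a minimal sketch under stated assumptions, not the NLS workflow: the vocabulary is a toy in-memory list of (label, URI) pairs, normalise and match_literal are hypothetical helpers, and apart from the LCSH identifier sh85072341 that the slides give for “Kilts”, every URI is a made-up example.org placeholder. Two entries share the label “Cambrai” to mimic the "which Cambrai?" question put to the crowd.

```python
import re
import unicodedata

# Toy vocabulary of (preferred label, URI) pairs; a list, so labels may repeat.
# sh85072341 is the LCSH identifier the slides give for "Kilts"; every
# example.org URI is a hypothetical placeholder.
VOCAB = [
    ("Kilts", "http://id.loc.gov/authorities/subjects/sh85072341"),
    ("Cambrai", "http://example.org/place/cambrai-1"),
    ("Cambrai", "http://example.org/place/cambrai-2"),
    ("War horses", "http://example.org/voc/war-horses"),
    ("Toy horses", "http://example.org/voc/toy-horses"),
]


def normalise(label: str) -> str:
    """Fold case, drop accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def match_literal(literal: str, vocab: list[tuple[str, str]]) -> tuple[str, list[str]]:
    """Triage a local string against a vocabulary.

    Returns a status and the candidate URIs. Whatever the status, the
    candidates are meant to go to a person for checking.
    """
    key = normalise(literal)
    exact = [uri for label, uri in vocab if normalise(label) == key]
    if len(exact) == 1:
        return "exact match", exact    # YEAH! But .. still check it
    if len(exact) > 1:
        return "multiple hits", exact  # e.g. two places called Cambrai
    # No exact hit: look for labels that contain every word of the literal.
    words = set(key.split())
    partial = [uri for label, uri in vocab
               if words and words <= set(normalise(label).split())]
    if partial:
        return "partial hits", partial
    return "no hits", []
```

With this toy data, match_literal("Spud", VOCAB) returns ("no hits", []), "CAMBRAI" gives two candidates, and "horses" only partially matches "War horses" and "Toy horses". Plain normalised string equality is a deliberately simple choice; fuzzier matching (as in Google Refine) finds more candidates but makes human review even more necessary.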

Presentation Transcript

  • LO(D) and Behold! Issues, tips and techniques for extending to the Giant Global Graph. Gill Hamilton, Digital Access Manager, g.hamilton@nls.uk
  • agenda • the Library and DOD • The Big Issue : mapping local instances to the global graph • practical techniques • discussion
  • The DOD • descriptive, technical, administrative & preservation metadata • http://digital.nls.uk/ • > 15 million records
  • Cataloguing in the DOD • Screen shot of DOD
  • cataloguing like this is so last ….
  • but to change causes
  • What IS the URI for “Spud” anyway? The big issue is the mapping of historical strings to their modern-day things, or URIs … the ones we would have used if we were starting now.
  • the DOD and its vocs: Name Authority File (names); Subject Authority File (keyword); Thesaurus for Graphic Materials (keyword); Thesaurus of Geographic Names (place); Art & Architecture Thesaurus (keyword)
  • subjects (http://digital.nls.uk/74548618): D74548618 Keyword D1790; D1790 Keyword-keyword “Kilts”; D1790 exactMatch sh85072341 ?
  • names (http://digital.nls.uk/74548618): D74548618 tWhoType-Depicted D9886; D9886 tWho-who “Great Britain. Army. Women’s Army Auxiliary Corps”; D9886 exactMatch no2006000034 ?
  • places (http://digital.nls.uk/74549224): D74549224 tPlaceType-Placedepicted D575; D575 tPlace-place “Cambrai”; D575 exactMatch _________
  • art & architecture (http://digital.nls.uk/74546504): D74546504 Keyword D1234; D1234 Keyword-keyword “War photography”; D1234 exactMatch _________
  • LoC: Name Authority File, Subject Authority File, Thesaurus for Graphic Materials. Getty: Thesaurus of Geographic Names, The Art and Architecture Thesaurus … well …. real soon now
  • the extended graph http://www.math.uh.edu/~tomforde/images/UniverseAndMan.jpg
  • once for all time! LCSH:sh85072341 exactMatch productontology.org/doc/Kilt exactMatch dbPedia:Kilt; DOD:D575 exactMatch geoNames:3029030
  • matching local to global: statistical, literals, humans, influences
  • statistical the law of large numbers
  • literals String matching …. Yeah baby! What do we want? UNIQUE MATCHES! When do we want them? NOW! NOW! NOW! OH NO!!! hang on a minute ….
  • literals String matching …. No hits: DAMN YOU! No URI, needs humans. Multiple hits: no URI, needs humans. EXACT MATCH: WOO-HOO! Means we’ve gotta URI, no need for humans … but can you really REALLY trust it? Innumerate?
  • an aside …. the innumerate Scots: The first bridge, the Forth bridge, is the Forth, neither the Fourth nor the 4th. The 2nd bridge is the Forth Road bridge.
  • an aside …. the innumerate Scots Fourth Forth bridge Third Forth bridge
  • an aside …. the innumerate Scots: The First Bridge on the Firth of Forth is the Forth bridge. The 2nd Bridge on the Firth of Forth is the Forth Road bridge. There’s a third Forth bridge on the Firth of Forth. And on the Firth of Forth there’s a Fourth bridge, but it’s not the Forth bridge. Did I tell you about the FIFTH Forth bridge yet? And finally, there’s the Firth of Fifth.
  • humans: individuals, groups, crowds (image: http://www.inboundmarketingagents.com/Portals/160334/images/ID-10025253-resized-600.jpg)
  • individuals versus
  • groups: Captured Boche plane (image: http://upload.wikimedia.org/wikipedia/en/f/fb/Basil_Fawlty.jpg)
  • groups: a table mapping DOD keywords (running alphabetically from Earth (soil) to Explosions) to LCSH headings and Wikipedia articles. Columns: Keyword | LCSH | Match | Wikipedia | Match | Originating voc | DODid | keywordID, with match types exact, close, broad and related. Sample rows:
    Events | sh96009616 | close | http://en.wikipedia.org/wiki/Competition | close | AAT | 74547864 | 1148
    Exhibitions (events) | sh85046354 | close | http://en.wikipedia.org/wiki/Exhibition | exact | AAT | 74546188 | 1163
    Explosions | sh85046465 | exact | http://en.wikipedia.org/wiki/Explosion | close | AAT | 74549252 | 3750
  • crowds: Which person is Old Fox? Is Glencoe here? Tags: <name>Capt. Campbell</name>, <name>Maj Duncanson</name>, <name>old Fox</name>, <name>McDonalds</name>, <place>Glencoe</place> (http://en.wikipedia.org/wiki/Massacre_of_Glencoe). Text: “Order to Capt. Campbell by Maj. Duncanson. You are hereby ordered to fall upon the rebells, the McDonalds of Glencoe, and put all to the sword under seventy. you are to have a speciall care that the old Fox and his sones doe upon no account escape your hands”
  • crowds & geonames: We think this is Cambrai … Do you think this Cambrai is here? Or do you think it’s here?
  • crowds & dbPedia: Is the horse in this picture …? none of these
  • crowds & LCSH: Would you describe this horse in any of these ways? Show jumpers (horses); Horses in motion pictures; Toy Horses; Horses; War horses; Travel with horses; none of these
  • Things to think about …. • using a voc without URIs? – should we change? • are there good ways to string match? – are they trustworthy? • are crowds helpful? • what vocs are mapped to what other vocs? – can/should we help map vocs beyond our domain?
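Once a person has confirmed a match, recording it is the easy part. The sketch below writes one N-Triples statement per confirmed mapping, like the slides' D1790 exactMatch sh85072341 and DOD:D575 exactMatch geoNames:3029030. It assumes the W3C SKOS mapping properties (exactMatch, closeMatch, broadMatch, relatedMatch) as the link vocabulary, the GeoNames URI pattern http://sws.geonames.org/<id>/, and a hypothetical base URI for local DOD terms, since the talk does not give one. The match types mirror the exact/close/broad/related column of the groups' mapping table; reading “broad” as “the external concept is broader” is an assumption.

```python
# Hypothetical base URI for local DOD terms such as D1790; the talk does not give one.
LOCAL_BASE = "http://example.org/dod/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Match types from the mapping table -> SKOS mapping properties.
# Assumption: "broad" means the external concept is broader than the local one,
# which is how "local skos:broadMatch external" reads.
MATCH_PROPERTIES = {
    "exact": SKOS + "exactMatch",
    "close": SKOS + "closeMatch",
    "broad": SKOS + "broadMatch",
    "related": SKOS + "relatedMatch",
}


def mapping_triple(local_id: str, match: str, target_uri: str) -> str:
    """Return one N-Triples statement linking a local term to an external URI."""
    if match not in MATCH_PROPERTIES:
        raise ValueError(f"unknown match type: {match!r}")
    subject = f"<{LOCAL_BASE}{local_id}>"
    return f"{subject} <{MATCH_PROPERTIES[match]}> <{target_uri}> ."


if __name__ == "__main__":
    # Mappings quoted in the slides (the Kilts one is still marked "?" there),
    # to be written out only after a person has confirmed them.
    confirmed = [
        ("D1790", "exact", "http://id.loc.gov/authorities/subjects/sh85072341"),
        ("D575", "exact", "http://sws.geonames.org/3029030/"),
    ]
    for row in confirmed:
        print(mapping_triple(*row))
```

Emitting plain N-Triples strings keeps the sketch dependency-free; an RDF library would be the more robust choice for escaping and serialising at scale.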