Ksim keizer 2010-10-19

Presentation on Linked Data and Thesauri given to the KSIM meeting of the UN

Notes
  • This graph, elaborated by Nova Spivack of Radar Networks, is popular at the moment. The Y-axis shows the increase of information connections; the X-axis shows the increase of social connections. Whereas the Web Operating System in 2030 is still a brilliant guess about the future, the development of the Semantic Web, or Web 3.0, has now gained considerable momentum.
  • One of the key developments in the Semantic Web is “Linked Open Data”. The Linked Open Data paradigm claims that existing structured data need to be released from the proprietary silos in which they currently sit. With RDF (Resource Description Framework), the semantic tools to do so exist. There is also technology to use RDF. More on this later.
  • This is a snapshot one year later. The growth is enormous. A central point is DBpedia, “triplified” information from Wikipedia. The different colours represent the different information types; “life sciences” and “publications” are the most populated areas, but the area “government” is growing strongly. Interesting newcomers in the last months are the two VIVO datasets from the United States describing expertise in science. VIVO is actually a project that started at the agricultural library of Cornell University.
  • What does this mean in practice? I will show this with an example from the BBC. The biggest consumers (and producers) of LOD are, as far as I know, the BBC and the New York Times (but now also the US government).
  • During the Web 1.0 phase, web pages were composed by humans. Today most web pages are driven by databases that can be dynamically queried. Through RSS feeds they also contain data from other websites. This BBC web page is a big jump further. It has not been composed by humans and it is not generated from one database. It is generated from different data sources that were present as linked open data, linked only through common URIs.
  • The “technology” that makes linked open data possible is RDF. Everything in RDF is made of “triples”. A triple is a statement of the form “Subject-Predicate-Object”, as shown in this example. Ideally, all elements of a triple are represented by a URI, an unambiguous, machine-readable definition of a concept, but triples can also be built from simple character strings.
  • What, then, is the role of thesauri, and specifically of our thesauri, in this setup?
  • In our team we had the idea very early that thesauri would become important in the development of Web information management. Within the AOS (Agricultural Ontology Service) initiative we have gone a long and winding road. The Google search shows our 2003 paper in JODI. But now AGROVOC has become a showcase for the use of thesauri to build concept schemes.
  • Some self-appreciation.
  • This is the AGROVOC SKOS model that was developed and decided in April 2010 in active collaboration with Tom Baker, who was a member of the W3C SKOS working group.
  • SKOS-XL was published as a W3C standard one year ago. The initial versions of SKOS were not sufficient to express the complexities of multilingual thesauri. Margherita Sini from FAO was a member of the SKOS working group, and we are very satisfied that in the end a standard emerged that caters for our needs.
  • Here you can see the AGROVOC encoding in SKOS.
  • During the discussions on the AGROVOC model, we also did some software engineering. The result is the concept scheme workbench. It is a web-based working environment for managing the AGROVOC Concept Server. It facilitates the collaborative editing of multilingual terminology and semantic concept information. It includes administration and group management features. It includes workflows for maintenance, validation and quality assurance of the data pool. The CS is freely accessible to everybody to facilitate collaborative editing. Already now, not only AGROVOC is on the workbench but also the FAO Open Archive authority data. We can host any concept scheme.
  • The table shows three descriptors that are in AGROVOC, EUROVOC and UNBIS. In AGROVOC and EUROVOC they are already encoded as URIs. We could easily establish relationships like owl:sameAs between the concepts or skos:exactMatch between labels.
  • In a bibliographical record there is much more hidden information than is displayed with the metadata. Many of the highly structured data link to other information on the web. In AGRIS we have now introduced something we call “naive linking”. An AGRIS record links automatically to Google Maps for the location of the center, and to Google to retrieve the full text of the resource, citation lists or other publications from the authors. This often works, but clearly not always, as it is not controlled by semantics but only through identity of strings. For an uneducated machine, unfortunately, COW and C.O.W. are the same, whereas peanuts and groundnuts are something different.
  • If resources are marked up with semantically defined and machine-readable concepts, they can be linked and mashed up precisely, as we have seen in the example from the BBC. In this example we start with an AGRIS record on hazardous waste, which is indexed with AGROVOC. Already now we can easily link to material indexed with Eurovoc, here an example from EUR-Lex. If the UNBIS thesaurus were restructured into a concept scheme and published as LOD, related UN documents could be attached automatically by the machine.
  • How does this work? A resource is connected with each concept URI on the web. The concepts in the three vocabularies have the same literal and are connected by an owl:sameAs/exactMatch relationship. As we are speaking about thesauri and not ontologies, we purposely kept the choice of relation vague. The concepts could be matched with owl:sameAs, or the terms could be matched with skos:exactMatch. A lot of discussion on this is ongoing.
  • One of the groundbreaking enterprises in this area is Thomson Reuters’ “Open Calais”. This is a web service that provides semantic markup for any unstructured text that you feed into it. The service is free of charge. Why? I will show you later.
  • My team, in collaboration with the Indian Institute of Technology in Kanpur, is developing a similar service for our subject area.
  • Here we have a text from 1964 about a plant protection issue, without a bibliographic record at hand.
  • Open Calais is very good in those areas in which they have their own elaborated concept scheme against which the texts are analyzed (“Places”, “Persons”, “Business Processes”, “Industry Terms”), but it is weak in the specific topic analysis, what they call “social tags”.
  • AgroTagger still lacks many of the sophisticated features of “Open Calais”, but is much, much better in the subject analysis of the text.
  • We will now try a live demo.
  • Ksim keizer 2010-10-19

    The role of Thesauri and Standard Vocabularies in linking data: AGROVOC, UNBIS, EUROVOC
    A proposal for collaboration between agencies
    Dr. Johannes Keizer
    FAO of the United Nations
    Office of Knowledge Exchange, Research and Extension
    Knowledge and Capacity for Development
    The Development of the Internet
    Open World Mindset
      • “Closed” (“normal”) IT environments
          • Data sources carefully controlled.
          • Data formats “custom-defined” for an application.
      • Linked data based on an “open world mindset”
          • Integrating data from the open Web
          • Systems designed to incorporate new information incrementally
          • By design, tolerance of incomplete information
    The Linked Data Universe: http://www.linkeddata.org (July 2009)
    The Linked Data Universe: http://www.linkeddata.org (July 2010)
    Example: BBC Wildlife Finder
    Humboldt Squid page, pulled together from a diversity of Linked Data sources
      • BBC TV Documentary
      • BBC News item
      • Wikipedia
      • Animal Diversity Web: Nocturnal way of life
    RDF: a grammar for the language of data
      [Diagram: ResourceA relatedTo ResourceB; ResourceA describedBy “Some text”]
      • Describe resources using interrelated “statements” (“triples”).
      • Use URIs (unique, globally managed identifiers) as the “words” of statements.
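To make the triple idea on this slide concrete, here is a minimal sketch in plain Python. It is not an RDF library and not anything FAO uses; the `http://example.org/` URIs and the helper `objects_of` are assumptions chosen only to mirror the labels in the diagram.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a URI
    predicate: str  # a URI
    obj: str        # a URI, or a plain literal string


# Hypothetical namespace, for illustration only.
EX = "http://example.org/"

triples = [
    Triple(EX + "ResourceA", EX + "relatedTo", EX + "ResourceB"),
    Triple(EX + "ResourceA", EX + "describedBy", "Some text"),
]


def objects_of(data, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


related = objects_of(triples, EX + "ResourceA", EX + "relatedTo")
```

The second triple shows the case where the object is a simple string rather than a URI.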
    RDF as a common format for merging data
      • http://www.w3.org/2007/Talks/0221-Bangalore-IH/
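Why RDF makes merging easy can be sketched in a few lines: if two sources use the same URI for the same thing, merging them is just a set union of their triples. The sketch below assumes plain Python tuples instead of a real RDF store, and every URI in it is hypothetical.

```python
# Hypothetical URIs, for illustration only.
SQUID = "http://example.org/species/humboldt-squid"
NAME = "http://example.org/vocab/name"
FEATURED_IN = "http://example.org/vocab/featuredIn"
HABIT = "http://example.org/vocab/habit"

# Two independently published datasets that share the subject URI.
dataset_a = {
    (SQUID, NAME, "Humboldt squid"),
    (SQUID, FEATURED_IN, "http://example.org/tv/documentary-1"),
}
dataset_b = {
    (SQUID, NAME, "Humboldt squid"),
    (SQUID, HABIT, "nocturnal"),
}

# Merging is a set union; the identical statement collapses into one.
merged = dataset_a | dataset_b


def describe(graph, subject):
    """Collect predicate -> sorted list of objects for one subject."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return {p: sorted(values) for p, values in out.items()}


squid_page = describe(merged, SQUID)
```

`squid_page` then holds statements from both sources under one subject, which is the effect the BBC example relies on.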
    Role of thesauri / concept schemes
      • Born as tools to assure consistency in the indexing of library collections
      • Thesauri were based on “terms”, but terms already represented concepts in a non-explicit way
      • Hierarchical and associative relationships represented generic ontological domain knowledge
      • Candidate building blocks for the semantic web
    ... from thesaurus to ontologies ...
    AGROVOC today
      • Around 30,000 concepts
      • 600,000 labels in around 20 languages
      • One-stop shop for terminological knowledge related to agriculture in general
      • A knowledge base of related concepts organized in ontological relationships (hierarchical, associative, equivalence)
      • A concept/term/string-based system
      • Concepts may be organized in multiple categories.
    Semantic Relationships
    AGROVOC conceptual model, in SKOS-XL
      [Diagram. Labels recovered from the figure: concepts 6211, 8171, 1474 and 12332, typed (rdf:type) as SKOS Concept and connected by skos:broader; skos:inScheme and skos:topConceptOf pointing to the AGROVOC Concept Scheme and to other schemes in FAO; skosxl:prefLabel and skosxl:altLabel pointing to nodes typed as SKOS Label; two labels, :foo and :bar, with skos:literalForm “maize” and “corn”, related by has_synonym, and has_translation pointing to maïs (fr).]
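The point of the SKOS-XL model is that labels are resources in their own right, so they can carry relations such as has_synonym or has_translation. A rough sketch of that shape in Python follows. The class names, the label URIs and the choice of “maize” as the preferred English label are assumptions made for illustration; only the concept URI `c_12332` and the three literal forms come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    """A label as a first-class resource with its own URI (the SKOS-XL idea)."""
    uri: str
    literal_form: str
    lang: str


@dataclass
class Concept:
    uri: str
    pref_labels: List[Label] = field(default_factory=list)
    alt_labels: List[Label] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)  # URIs of broader concepts

    def pref_label(self, lang: str) -> Optional[str]:
        """Return the preferred literal form in the given language, if any."""
        for label in self.pref_labels:
            if label.lang == lang:
                return label.literal_form
        return None


BASE = "http://aims.fao.org/aos/agrovoc/"

# Hypothetical label URIs; real AGROVOC label identifiers look different.
maize_en = Label(BASE + "xl_en_maize", "maize", "en")
corn_en = Label(BASE + "xl_en_corn", "corn", "en")
mais_fr = Label(BASE + "xl_fr_mais", "maïs", "fr")

maize = Concept(
    uri=BASE + "c_12332",
    pref_labels=[maize_en, mais_fr],
    alt_labels=[corn_en],
)

# Relations between labels, which plain SKOS literals could not carry.
label_relations = [
    (maize_en.uri, "has_synonym", corn_en.uri),
    (maize_en.uri, "has_translation", mais_fr.uri),
]
```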
    http://www.w3.org/2004/02/skos/
    SKOS-XL output
      <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
        <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
        <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
        <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      </rdf:Description>
      (Callout on the slide: URI of AGROVOC concept)
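A fragment like the one on this slide can be read with nothing more than Python's standard `xml.etree.ElementTree`. The sketch below is only illustrative: the enclosing `rdf:RDF` element and its namespace declarations are not on the slide and are assumed here so that the fragment parses, and the helper `resources_of_type` is a made-up name. A real application would more likely use a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The rdf:RDF wrapper is an assumption; the inner elements follow the slide.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
    <rdf:type rdf:resource="{SKOS}Concept"/>
    <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
    <literalForm xmlns="{SKOSXL}" xml:lang="en">subjects</literalForm>
    <rdf:type rdf:resource="{SKOSXL}Label"/>
  </rdf:Description>
</rdf:RDF>"""


def resources_of_type(root, type_uri):
    """Return the rdf:Description elements whose rdf:type is type_uri."""
    found = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        types = [t.get(f"{{{RDF}}}resource") for t in desc.findall(f"{{{RDF}}}type")]
        if type_uri in types:
            found.append(desc)
    return found


root = ET.fromstring(DOC)

# URIs of everything typed as skos:Concept.
concepts = [d.get(f"{{{RDF}}}about") for d in resources_of_type(root, SKOS + "Concept")]

# Label URI -> (literal form, language) for everything typed as skosxl:Label.
labels = {
    d.get(f"{{{RDF}}}about"): (lf.text, lf.get(XML_LANG))
    for d in resources_of_type(root, SKOSXL + "Label")
    for lf in d.findall(f"{{{SKOSXL}}}literalForm")
}
```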
    The concept scheme workbench
    Linking vocabularies
    http://agris.fao.org/agris-search/search/display.do?f=2004/ZA/ZA04002.xml;ZA2004000049
    http://eurovoc.europa.eu/218754
    http://aims.fao.org/aos/agrovoc/c_7825
    Linking data through common URIs
      [Diagram: the label “Maize” (skosxl:literalForm) in AGROVOC, Eurovoc and UNBIS, with the concepts connected by owl:sameAs/exactMatch and each linked to a document]
      • AGROVOC: http://aims.fao.org/aos/agrovoc/c_12332
      • Eurovoc: http://eurovoc.europa.eu/219871
      • http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026
      • http://unbisnet.un.org:8080/ipac20/ipac.jsp?session=128F308557F34.283092&profile=bib&uri=full=3100001~!685149~!1&ri=1&aspect=subtab124&menu=search&source=~!horizon
      • http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF
      • http://aims.fao.org/aos/agrovoc/c_12332 owl:sameAs http://eurovoc.europa.eu/219871
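How such links let a machine pull related documents together can be sketched as follows. This is an illustration under stated assumptions, not the AGRIS implementation: the UNBIS concept URI is hypothetical (UNBIS concepts had no URIs at the time), the second equivalence pair is assumed, the document keys are shorthand for the three records on the slide, and owl:sameAs is treated simply as symmetric and transitive.

```python
AGROVOC_MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"
EUROVOC_MAIZE = "http://eurovoc.europa.eu/219871"
UNBIS_MAIZE = "http://example.org/unbis/maize"  # hypothetical URI

# Equivalence statements between vocabularies. Only the first pair is on the slide.
same_as = [
    (AGROVOC_MAIZE, EUROVOC_MAIZE),
    (EUROVOC_MAIZE, UNBIS_MAIZE),  # assumed, to show the three-way case
]

# Which concept each document is indexed with (shorthand document keys).
indexed_with = {
    "agris:TR9600026": AGROVOC_MAIZE,
    "eurlex:OJ-L-2010-202": EUROVOC_MAIZE,
    "unbisnet:685149": UNBIS_MAIZE,
}


def equivalents(uri, pairs):
    """All URIs reachable from uri via the pairs, read in both directions."""
    seen = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for a, b in pairs:
            for src, dst in ((a, b), (b, a)):
                if src == current and dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return seen


def related_documents(doc_id):
    """Other documents indexed with a concept equivalent to this document's concept."""
    concepts = equivalents(indexed_with[doc_id], same_as)
    return sorted(d for d, c in indexed_with.items() if c in concepts and d != doc_id)


related = related_documents("agris:TR9600026")
```

No string comparison of labels is involved; the documents are connected only because their concept URIs are declared equivalent.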
    What are we doing with unstructured data?
      • We have enormous amounts of unstructured material
      • Most of the documents that we are producing are still semantically unstructured
      • Human work to catalogue and index is becoming ever rarer
      • We need machines to do automatic semantic markup of text
      • If machines are trained and based on concept schemes, they are able to do so
    AgroTagger
      • Does concept identification in unstructured texts
      • Uses AGROVOC as a controlled vocabulary
      • Prototype under testing with excellent results (entire repository of ICARDA indexed)
      • Will in future produce structured RDF files that can be used to link data, like “Open Calais”
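The slides do not describe how AgroTagger works internally, so the following is only a toy illustration of the general idea of concept identification against a controlled vocabulary: labels are looked up in the text and mapped to concept URIs, so that synonyms end up on the same concept. The function `tag` and all `example.org` URIs are hypothetical; only the maize concept URI comes from the slides.

```python
import re

# A tiny stand-in for a controlled vocabulary: label -> concept URI.
LABEL_TO_CONCEPT = {
    "maize": "http://aims.fao.org/aos/agrovoc/c_12332",
    "corn": "http://aims.fao.org/aos/agrovoc/c_12332",
    "groundnuts": "http://example.org/concept/groundnut",              # hypothetical
    "peanuts": "http://example.org/concept/groundnut",                 # hypothetical
    "plant protection": "http://example.org/concept/plant-protection", # hypothetical
}


def tag(text):
    """Return {concept URI: [labels found]} for labels matched as whole words."""
    lowered = text.lower()
    hits = {}
    for label, uri in LABEL_TO_CONCEPT.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            hits.setdefault(uri, []).append(label)
    return hits


sample = "Corn and peanuts need plant protection; maize yields rose."
tags = tag(sample)
```

Because “corn” and “maize” map to one URI, and “peanuts” to the same URI as “groundnuts”, the output is keyed by concept rather than by string, which is what makes it usable for linking.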
    Live demo: semantic markup
      • http://viewer.opencalais.com/
      • http://agropedialabs.iitk.ac.in/Tagger/Agrotagger_text.php
    Collaboration
      Some points about what we need to do and what we could do together
    01: Open Archives + Linked Open Data
      • Our agencies have a wealth of important information
      • We should publish it as fast as possible as “Linked Open Data” and create links among the datasets
      • Metadata (from databases and vocabularies) can be published without major investment and with little delay.
      • Our data need to become reference points in the linked data environment.
    02 Concept schemes!
      • A SKOS-XL model to transform complex multilingual thesauri into concept schemes and publish them as LOD
      • A cutting-edge workbench to enrich and maintain the concept schemes/vocabularies
      • Semantic interoperability! Mapping!
    03 Semantic Technologies!
      • Development of production-level machine indexing to substitute human indexing of agency publications
      • Adapting AgroTagger for UNBIS
      • Methodologies to adapt the system to any agency thesaurus and document corpus
      • Web services to access the semantic markup engines
      • Customization of search engines
    Possible Steps
      • Working group with interested colleagues from different agencies
      • Discussion forum to elaborate a project proposal (can be hosted on aims.fao.org)
      • Workshop in spring to discuss and decide details
    Thank You!
    Giving a try to the workbench
      • A demo version of the AWB, with all functionalities, available to users for testing purposes: http://202.73.13.50:55234/agrovocdevv10d/
      • Latest stable release version 1.0 (read/write): http://202.73.13.50:55381/agrovocv10i/
      • Latest stable release version 1.0 (read only, for visitors with view privilege only): http://202.73.13.50:55481/agrovocv10i/
    ... and more: http://aims.fao.org