Istic thesaurus ws-keizer_2010-10-22

Presentation given at the ISTIC workshop on thesauri: an enlarged and revised version of the presentation given to UNKSIM.

  • This graph, elaborated by Nova Spivack of Radar Networks, is popular at the moment. The Y-axis shows the increase of information connections, the X-axis the increase of social connections. Whereas the “Web Operating System” of 2030 is still a brilliant guess about the future, the development of the Semantic Web, or Web 3.0, has now gained considerable momentum.
  • One of the key developments in the Semantic Web is “Linked Open Data”. The Linked Open Data paradigm holds that existing structured data need to be released from the proprietary silos in which they currently sit. With RDF (Resource Description Framework) we have the semantic tools to do so, and there is also the technology to use RDF. More on this later.
  • This is a snapshot one year later. The growth is enormous. A central point is DBpedia, “triplified” information from Wikipedia. The different colours represent the different information types: “life sciences” and “publications” are the most populated areas, but the “government” area is growing strongly. Interesting newcomers in the last months are the two VIVO datasets from the United States describing expertise in science. VIVO is actually a project that started at the agricultural library of Cornell University.
  • What does this mean in practice? I will show this with an example from the BBC. As far as I know, the biggest consumers (and producers) of LOD are the BBC and the New York Times (but now also the US government).
  • During the Web 1.0 phase, web pages were composed by humans. Today most web pages are driven by databases that can be dynamically queried, and through RSS feeds they also contain data from other websites. This BBC web page is a big jump further: it has not been composed by humans, and it is not generated from one database. It is generated from different data sources that were available as linked open data, linked only through common URIs.
  • The “technology” that makes linked open data possible is RDF. Everything in RDF is made of “triples”: a triple is a statement of the form “Subject-Predicate-Object”, as shown in this example. Ideally, all elements of a triple are represented by a URI, an unambiguous, machine-readable identifier of a concept, but the object of a triple can also be a simple text string (a literal).
  • What, then, is the role of thesauri, and specifically of our thesauri, in this setup?
  • Very early on, our team had the idea that thesauri would become important in the development of web information management. With the AOS (Agricultural Ontology Service) initiative we have travelled a long and winding road; the Google search shows our 2003 paper in JODI. But now AGROVOC has become a showcase for the use of thesauri to build concept schemes.
  • Some self-appreciation.
  • This is the AGROVOC SKOS model that was developed and agreed in April 2010 in active collaboration with Tom Baker, who was a member of the W3C SKOS working group.
  • SKOS-XL was published as a W3C standard one year ago. The initial versions of SKOS were not sufficient to express the complexities of multilingual thesauri. Margherita Sini from FAO was a member of the SKOS working group, and we are very satisfied that in the end a standard emerged that caters for our needs.
  • Here you can see the AGROVOC encoding in SKOS.
  • The table shows 3 descriptors that are in AGROVOC, EUROVOC and UNBIS. In AGROVOC and EUROVOC they are already encoded as URIs, so we could easily establish relationships between the concepts, such as owl:sameAs or the weaker skos:exactMatch.
  • A bibliographic record contains much more hidden information than is displayed in its metadata, and many of the highly structured data link to other information on the web. In AGRIS we have now introduced something we call “naive linking”: an AGRIS record links automatically to Google Maps for the location of the center, and to Google to retrieve the full text of the resource, citation lists or other publications by the authors. This often works, but clearly not always, as it is not controlled by semantics but only by the identity of strings. For an uneducated machine, unfortunately, COW and C.O.W. are the same, whereas peanuts and groundnuts are something different.
  • If resources are marked up with semantically defined, machine-readable concepts, they can be linked and mashed up precisely, as we have seen in the BBC example. In this example we start with an AGRIS record on hazardous waste, which is indexed with AGROVOC. Already now we can easily link to material indexed with EuroVoc, here an example from EUR-Lex. If the UNBIS thesaurus were restructured into a concept scheme and published as LOD, related UN documents could be attached automatically by the machine.
  • How does this work? A resource is connected with each concept URI on the web. The concepts of the three vocabularies share the same literal and are connected with an owl:sameAs/skos:exactMatch relationship. As we are speaking about thesauri and not ontologies, we purposely kept the choice of relation vague: the concepts could be matched with owl:sameAs, which asserts that both URIs identify the same thing, or with the weaker skos:exactMatch, which only asserts that the concepts can be used interchangeably. A lot of discussion on this is ongoing.
  • One of the groundbreaking enterprises in this area is Thomson Reuters' “Open Calais”. This is a web service that provides semantic markup for any unstructured text you feed into it. The service is free of charge. Why? I will show you later.
  • My team, in collaboration with the Indian Institute of Technology in Kanpur, is developing a similar service for our subject area.
  • Here we have a text from 1964 about a plant protection issue, with no bibliographic record at hand.
  • Open Calais is very good in those areas for which they have their own elaborated concept scheme against which the texts are analyzed: “Places”, “Persons”, “Business Processes”, “Industry Terms”. But it is weak in specific topic analysis, what they call “social tags”.
  • AgroTagger still lacks many of the sophisticated features of Open Calais, but it is much, much better at the subject analysis of the text.
  • We will now try a live demo.
  • During the discussions on the AGROVOC model, we also did some software engineering. The result is the concept scheme workbench. It is a web-based working environment for managing the AGROVOC Concept Server (CS); it facilitates the collaborative editing of multilingual terminology and semantic concept information; it includes administration and group management features, as well as workflows for maintenance, validation and quality assurance of the data pool. The CS is freely accessible to everybody to facilitate collaborative editing. Already now, not only AGROVOC is on the workbench, but also the FAO OpenArchive authority data. We can host any concept scheme.
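The notes on the BBC page and on RDF describe data from different sources being joined only through common URIs, with every statement stored as a subject-predicate-object triple. A minimal Python sketch of that idea follows; it assumes plain strings stand in for both URIs and literals (a real RDF library distinguishes them), and every example.org identifier and predicate name is hypothetical, not taken from the BBC or any real dataset.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the triples that fit a pattern; None works as a wildcard."""
    return {
        (ts, tp, to)
        for ts, tp, to in store
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


# Hypothetical identifiers: one shared URI for the species, one namespace for predicates.
SQUID = "http://example.org/species/humboldt_squid"
VOCAB = "http://example.org/vocab/"

# Two independently published sources that both say something about the same URI.
wildlife_site = {
    (SQUID, VOCAB + "name", "Humboldt squid"),
    (SQUID, VOCAB + "featuredIn", "http://example.org/tv/documentary"),
}
species_database = {
    (SQUID, VOCAB + "activityPattern", "nocturnal"),
}

# Merging is a set union: statements about the same URI line up without a mapping step.
merged = wildlife_site | species_database

for _, predicate, obj in sorted(match(merged, s=SQUID)):
    print(predicate, obj)
```

The point of the design is that a triple carries its own context, so merging needs no schema alignment as long as both sources use the same URI; a production system would use an RDF library and a triple store instead of Python sets.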

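The “naive linking” note contrasts matching by string identity with matching by concept. The Python sketch below assumes a tiny hypothetical concept scheme (the example.org URIs and label sets are illustrative, not actual AGROVOC data): string normalization merges COW and C.O.W. while keeping peanuts and groundnuts apart, whereas a label-to-concept index does the opposite. The `tag` function is a deliberately simple dictionary lookup in the spirit of concept identification against a controlled vocabulary; it is an assumption for illustration, not how Open Calais or AgroTagger work.

```python
import re
from typing import Dict, List, Optional, Set


def naive_key(term: str) -> str:
    """String identity only: lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


# Hypothetical concept scheme: concept URI -> preferred and alternative labels.
SCHEME: Dict[str, Dict[str, List[str]]] = {
    "http://example.org/concept/groundnuts": {"pref": ["groundnuts"], "alt": ["peanuts"]},
    "http://example.org/concept/cattle": {"pref": ["cattle"], "alt": ["cow", "cows"]},
}

# Index every label (preferred and alternative) back to its concept URI.
LABEL_INDEX: Dict[str, str] = {
    label.lower(): uri
    for uri, labels in SCHEME.items()
    for label in labels["pref"] + labels["alt"]
}


def concept_of(term: str) -> Optional[str]:
    """Concept-based matching: look the whole label up in the scheme."""
    return LABEL_INDEX.get(term.lower())


def tag(text: str) -> Set[str]:
    """Toy tagger: concepts whose single-word labels occur as words in the text."""
    return {LABEL_INDEX[w] for w in re.findall(r"[a-z]+", text.lower()) if w in LABEL_INDEX}


# String matching conflates COW and C.O.W. but separates peanuts and groundnuts ...
print(naive_key("COW") == naive_key("C.O.W."))            # True
print(naive_key("peanuts") == naive_key("groundnuts"))    # False
# ... while concept matching does the opposite.
print(concept_of("peanuts") == concept_of("groundnuts"))  # True
print(concept_of("C.O.W."))                               # None
print(tag("Peanuts are fed to cattle in some regions"))
```

A real tagger would also need multi-word labels, inflection handling and disambiguation; the sketch only shows why linking through concept URIs is more precise than linking through strings.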
    1. The Role of Thesauri and Standard Vocabularies in Linking Data. Dr. Johannes Keizer, FAO of the United Nations, Office of Knowledge Exchange, Research and Extension, Knowledge and Capacity for Development
    2. dr johannes keizer - FAO of the United Nations - knowledge and capacity for development | Thesaurus Workshop – CAS Beijing, 2010-10-22 | The Development of the Internet
    3. Open World Mindset: “Closed” (“normal”) IT environments: data sources carefully controlled; data formats “custom-defined” for an application. Linked data based on an “open world mindset”: integrating data from the open Web; systems designed to incorporate new information incrementally; by design, tolerance of incomplete information.
    4. The Linked Data Universe: http://www.linkeddata.org (July 2009)
    5. The Linked Data Universe: http://www.linkeddata.org (July 2010)
    6. Example: BBC Wildlife Finder
    7. Humboldt Squid page, pulled together from a diversity of Linked Data sources: Animal Diversity Web (nocturnal way of life), BBC TV documentary, BBC News item, Wikipedia
    8. RDF: a grammar for the language of data [diagram: ResourceA relatedTo ResourceB; ResourceA describedBy “Some text”] 1. Describe resources using interrelated “statements” (“triples”). 2. Use URIs (unique, globally managed identifiers) as the “words” of statements.
    9. RDF as a common format for merging data (source: http://www.w3.org/2007/Talks/0221-Bangalore-IH/)
    10. Finding things related to “genes” across databases (source: Joanne Luciano, Mitre, and the W3C HCLS IG)
    11. Role of thesauri/concept schemes: born as tools to assure consistency in the indexing of library collections; thesauri were based on “terms”, but the terms already represented concepts in a non-explicit way; hierarchical and associative relationships represented generic ontological domain knowledge; candidate building blocks for the Semantic Web.
    12. …from thesaurus to ontologies…
    13. AGROVOC today: around 30,000 concepts; 600,000 labels in around 20 languages; a one-stop shop for terminological knowledge related to agriculture in general; a knowledge base of related concepts organized in ontological relationships (hierarchical, associative, equivalence); a concept/term/string-based system; concepts may be organized in multiple categories.
    14. Semantic relationships: concept to concept: isA (hierarchy), isPestOf, hasPest; concept to term: has_lexicalization (links concepts to their lexical realizations); term to term: isSynonymOf, isTranslationOf, hasAcronym, hasAbbreviation; term to string: hasSpellingVariant, hasSingular.
    15. The AGROVOC SKOS-XL Model [diagram: AGROVOC concepts (8171, 1474, 12332, 6211) of rdf:type skos:Concept, linked by skos:broader and attached to the AGROVOC concept scheme (rdf:type skos:ConceptScheme) via skos:inScheme and skos:topConceptOf; SKOS-XL Label resources :foo and :bar, attached via skosxl:prefLabel and skosxl:altLabel, with skosxl:literalForm “maize” and “corn”]
    16. http://www.w3.org/2004/02/skos/
    17. SKOS-XL output (callout: URI of AGROVOC concept):
        <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
          <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
        </rdf:Description>
        <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
          <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
          <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
          <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
        </rdf:Description>
        <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
          <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
          <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
        </rdf:Description>
    18. Linking vocabularies
        AGROVOC | EUROVOC | UNBIS | Relationship
        http://aims.fao.org/aos/agrovoc/c_207 | http://eurovoc.europa.eu/219055 | agroforestry | skos:exactMatch / owl:sameAs
        http://aims.fao.org/aos/agrovoc/c_4826 | http://eurovoc.europa.eu/220018 | MILK | skos:exactMatch / owl:sameAs
        http://aims.fao.org/aos/agrovoc/c_12332 | http://eurovoc.europa.eu/219871 | MAIZE | skos:exactMatch / owl:sameAs
    19. http://agris.fao.org/agris-search/search/display.do?f=2004/ZA/ZA04002.xml;ZA2004000049
    20. http://aims.fao.org/aos/agrovoc/c_7825 and http://eurovoc.europa.eu/218754
    21. Linking data through common URIs [diagram: the AGROVOC concept http://aims.fao.org/aos/agrovoc/c_12332, the EuroVoc concept http://eurovoc.europa.eu/219871 and the UNBIS descriptor, each labelled “Maize” (skosxl:literalForm), connected by owl:sameAs/exactMatch; linked resources: AGRIS record http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026, EUR-Lex document http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF, UNBIS record http://unbisnet.un.org:8080/ipac20/ipac.jsp?session=128F308557F34.283092&profile=bib&uri=full=3100001~!685149~!1&ri=1&aspect=subtab124&menu=search&source=~!horizon]
    22. What are we doing with unstructured data? • We have enormous amounts of unstructured material • Most of the documents we produce are still semantically unstructured • Human cataloguing and indexing work is becoming ever more rare • We need machines to do automatic semantic markup of text • If machines are trained on and based on concept schemes, they are able to do so
    23. (no text)
    24. AgroTagger • Does concept identification in unstructured texts • Uses AGROVOC as a controlled vocabulary • Prototype under testing with excellent results (entire repository of ICARDA indexed) • Will in future produce structured RDF files that can be used to link data, like Open Calais
    25. (no text)
    26. (no text)
    27. (no text)
    28. Live demo: semantic markup: http://viewer.opencalais.com/ and http://agropedialabs.iitk.ac.in/Tagger/Agrotagger_text.php
    29. The concept scheme workbench
    30. The workbench • Is a web-based working environment for managing the AGROVOC Concept Server • Facilitates the collaborative editing of multilingual terminology and semantic concept information • Includes administration and group management features • Includes workflows for maintenance, validation and quality assurance of the data pool • The Concept Server (CS) is freely accessible to everybody to facilitate collaborative editing
    31. Group/Action/Status. GROUP: non-registered users, term editors, ontology editors, validators, publishers, administrators. ACTION: concept-create, concept-delete, concept-edit, term-create, term-edit, term-delete, … STATUS: Proposed by guest, Proposed, Revised by guest, Revised, Validated, Published, Proposed deprecated, Deprecated.
    32. Concept Life Cycle: GUEST <concept-create> → Proposed by guest; VALIDATOR <validates> → Validated; PUBLISHER <publishes> → Published; TERM EDITOR <concept-edit> → Revised; ADMINISTRATOR <validates> → Published; ONTOLOGY EDITOR <concept-delete> → Proposed deprecated; PUBLISHER <validates> → Deprecated
    33. Modules • Home • Search • Concept/Term Management • Relationship Management • Classification Scheme Management • Validation • Consistency Check • Import/Export • User/Group Management • Statistics/Preferences
    34. Search • by string: the user can specify whether the system should search by exact match, beginning with, contains or fuzzy • by URI or term code, or by a range of term codes (e.g. between 123 and 9876) • by classification scheme • by creation or modification date • by specific relationships (e.g. all concepts using “has_pest”) • by status, language or notes/attributes
    35. Graph Visualization • Java-applet-based TouchGraph • Visualizes concepts and their relationships with other concepts in a graphical view
    36. Web services [diagram: AGROVOC CS Workbench, SKOS triple store and other applications, connected by arrows labelled maintain, access, response and uses]
    37. AGROVOC Web Services
    38. Architecture of the System
    39. System Overview [diagram: front end: Google Web Toolkit (GWT), Graph Visualization (GWT Incubator), Web services; intermediate layer: Gilead; middleware: Hibernate layer, Protégé OWL API; back end: administrative database (MySQL), Protégé triple store (MySQL)]
    40. Giving it a try… A demo version of the AWB, with all functionalities, available to users for testing purposes: http://202.73.13.50:55234/agrovocdevv10d/ • Latest stable release, version 1.0 (read/write): http://202.73.13.50:55381/agrovocv10i/ • Latest stable release, version 1.0 (read-only; visitors with view privilege only): http://202.73.13.50:55481/agrovocv10i/
    41. …and more: http://aims.fao.org
    42. Thank You!
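Slides 15, 17, 18 and 21 show concepts whose labels are separate skosxl:Label resources carrying a skosxl:literalForm, plus mappings to EuroVoc. As a hedged sketch, the Python below emits N-Triples for the AGROVOC maize concept (c_12332) and its match to EuroVoc 219871, both taken from the slides, using the SKOS and SKOS-XL namespaces shown there. The example.org label URIs are hypothetical, treating “maize” as the preferred and “corn” as the alternative label is an assumption, and skos:exactMatch is chosen here over owl:sameAs only for illustration.

```python
from typing import Iterator, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"

Label = Tuple[str, str, str]  # (label URI, literal form, language tag)


def iri(u: str) -> str:
    return f"<{u}>"


def literal(text: str, lang: str) -> str:
    # N-Triples requires backslashes and double quotes inside literals to be escaped.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_ntriples(concept: str, scheme: str, pref: Label,
                     alts: List[Label], matches: List[str]) -> Iterator[str]:
    """Yield N-Triples lines describing one SKOS concept with SKOS-XL labels."""
    c = iri(concept)
    yield f"{c} {iri(RDF_TYPE)} {iri(SKOS + 'Concept')} ."
    yield f"{c} {iri(SKOS + 'inScheme')} {iri(scheme)} ."
    for prop, labels in (("prefLabel", [pref]), ("altLabel", alts)):
        for label_uri, text, lang in labels:
            # In SKOS-XL the label is a resource of its own, not a plain string.
            yield f"{c} {iri(SKOSXL + prop)} {iri(label_uri)} ."
            yield f"{iri(label_uri)} {iri(RDF_TYPE)} {iri(SKOSXL + 'Label')} ."
            yield f"{iri(label_uri)} {iri(SKOSXL + 'literalForm')} {literal(text, lang)} ."
    for other in matches:
        yield f"{c} {iri(SKOS + 'exactMatch')} {iri(other)} ."


lines = list(concept_ntriples(
    "http://aims.fao.org/aos/agrovoc/c_12332",
    "http://aims.fao.org/aos/agrovoc/agrovocScheme",
    pref=("http://example.org/label/maize_en", "maize", "en"),  # hypothetical label URI
    alts=[("http://example.org/label/corn_en", "corn", "en")],  # hypothetical label URI
    matches=["http://eurovoc.europa.eu/219871"],
))
```

Joining `lines` with newlines gives a small N-Triples document that an RDF parser should be able to load; a production pipeline would build the graph with an RDF library rather than formatting strings by hand.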

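Slides 31 and 32 describe the workbench's groups, actions and statuses, and a concept life cycle. The life-cycle slide lists each step as group, action and resulting status without naming the starting status, so the transition table below is one plausible reading and an assumption; the action identifiers “validate” and “publish” are hypothetical, and the sketch ignores the other statuses on slide 31 (Proposed, Revised by guest) and any permission inheritance between groups. A minimal Python sketch of such a workflow as a state machine:

```python
from typing import Dict, Optional, Tuple

# Assumed reading of the life-cycle slide: (current status, group, action) -> new status.
TRANSITIONS: Dict[Tuple[Optional[str], str, str], str] = {
    (None, "guest", "concept-create"): "Proposed by guest",
    ("Proposed by guest", "validator", "validate"): "Validated",
    ("Validated", "publisher", "publish"): "Published",
    ("Published", "term editor", "concept-edit"): "Revised",
    ("Revised", "administrator", "validate"): "Published",
    ("Published", "ontology editor", "concept-delete"): "Proposed deprecated",
    ("Proposed deprecated", "publisher", "validate"): "Deprecated",
}


def transition(status: Optional[str], group: str, action: str) -> str:
    """Return the new status, or raise if this group may not do this action now."""
    try:
        return TRANSITIONS[(status, group, action)]
    except KeyError:
        raise PermissionError(
            f"{group!r} may not {action!r} a concept in status {status!r}"
        ) from None


# Walk one concept from creation to deprecation.
status: Optional[str] = None
for group, action in [("guest", "concept-create"), ("validator", "validate"),
                      ("publisher", "publish"), ("ontology editor", "concept-delete"),
                      ("publisher", "validate")]:
    status = transition(status, group, action)
    print(f"{group} {action} -> {status}")
```

Keeping the rules in one table makes the workflow easy to audit and to change without touching the editing code.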
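Slide 34 lists four ways of searching by string: exact match, beginning with, contains and fuzzy. A minimal Python sketch over a flat list of labels follows; the labels are illustrative, and using the standard library's difflib similarity ratio with a 0.8 cutoff for the fuzzy mode is an assumption, since the slides do not say which fuzzy algorithm the workbench uses.

```python
import difflib
from typing import List


def search(labels: List[str], query: str, mode: str = "exact", cutoff: float = 0.8) -> List[str]:
    """Case-insensitive label search in one of four modes."""
    q = query.lower()
    if mode == "exact":
        return [label for label in labels if label.lower() == q]
    if mode == "begins":
        return [label for label in labels if label.lower().startswith(q)]
    if mode == "contains":
        return [label for label in labels if q in label.lower()]
    if mode == "fuzzy":
        # difflib ranks candidates by SequenceMatcher ratio and drops those below cutoff.
        by_lower = {label.lower(): label for label in labels}
        hits = difflib.get_close_matches(q, list(by_lower), n=5, cutoff=cutoff)
        return [by_lower[h] for h in hits]
    raise ValueError(f"unknown search mode: {mode}")


labels = ["maize", "maize flour", "corn", "sweet corn", "milk", "agroforestry"]
print(search(labels, "maize", "begins"))   # ['maize', 'maize flour']
print(search(labels, "corn", "contains"))  # ['corn', 'sweet corn']
print(search(labels, "maiz", "fuzzy"))     # ['maize']
```

A workbench backed by a database would push the first three modes into SQL (`=`, `LIKE 'q%'`, `LIKE '%q%'`) and keep only the fuzzy mode in application code or a dedicated index.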