The role of Thesauri and Standard Vocabularies in linking data
Dr. Johannes Keizer
FAO of the United Nations
Office of Knowledge Exchange, Research and Extension
Knowledge and Capacity for Development
Dr. Johannes Keizer - FAO of the United Nations - Knowledge and Capacity for Development
Thesaurus Workshop, CAS Beijing, 2010-10-22
The Development of the Internet
Open World Mindset
“Closed” (“normal”) IT environments:
• Data sources carefully controlled
• Data formats “custom-defined” for an application
Linked data, based on an “open world mindset”:
• Integrating data from the open Web
• Systems designed to incorporate new information incrementally
• Tolerance of incomplete information, by design
The Linked Data Universe: http://www.linkeddata.org (July 2009)
The Linked Data Universe: http://www.linkeddata.org (July 2010)
Example: BBC Wildlife Finder
Humboldt Squid page, pulled together from a diversity of Linked Data sources:
• Animal Diversity Web: nocturnal way of life
• BBC TV documentary
• BBC News item
• Wikipedia
RDF: a grammar for the language of data
Statement relating two resources: ResourceA relatedTo ResourceB
Statement describing a resource with text: ResourceA describedBy “Some text”
1. Describe resources using interrelated “statements” (“triples”).
2. Use URIs (unique, globally managed identifiers) as the “words” of statements.
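A minimal sketch of the two principles above, using only the Python standard library: each statement is a (subject, predicate, object) tuple, URIs are plain strings, and text values are wrapped in a small `Literal` type so they cannot be confused with URIs. The `http://example.org/` namespace and the `objects` helper are hypothetical stand-ins for the slide's placeholder names.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class Literal:
    """A text value such as "Some text"; literals may only appear as objects."""
    value: str


# Subjects and predicates are URIs (plain strings); objects are URIs or literals.
Node = Union[str, Literal]


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Node


EX = "http://example.org/"  # hypothetical namespace for the slide's placeholders

graph = {
    Triple(EX + "ResourceA", EX + "relatedTo", EX + "ResourceB"),        # resource -> resource
    Triple(EX + "ResourceA", EX + "describedBy", Literal("Some text")),  # resource -> text
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate (hypothetical helper)."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}
```

For example, `objects(graph, EX + "ResourceA", EX + "relatedTo")` returns `{"http://example.org/ResourceB"}`. Real RDF toolkits also handle datatypes, language tags and blank nodes, which this sketch leaves out.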
RDF as a common format for merging data
Source: http://www.w3.org/2007/Talks/0221-Bangalore-IH/
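Because every source uses the same URIs, merging RDF graphs that contain no blank nodes amounts to taking the set union of their triples: identical statements collapse, and shared URIs become the join points. The sketch below assumes two hypothetical repositories whose records are tagged with the AGROVOC maize concept through Dublin Core `dct:subject`; the record URIs and the title are invented for illustration.

```python
# Shared URIs: the AGROVOC maize concept (from the slides) and two Dublin Core terms.
AGROVOC_MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
DCT_TITLE = "http://purl.org/dc/terms/title"

# Hypothetical records from two independent repositories, as (s, p, o) tuples.
repository_a = {
    ("http://example.org/a/doc1", DCT_SUBJECT, AGROVOC_MAIZE),
    ("http://example.org/a/doc1", DCT_TITLE, "Maize yields in dry areas"),
}
repository_b = {
    ("http://example.org/b/report7", DCT_SUBJECT, AGROVOC_MAIZE),
    ("http://example.org/a/doc1", DCT_SUBJECT, AGROVOC_MAIZE),  # same statement as in A
}


def merge(*graphs):
    """Merge graphs without blank nodes: the result is the set union of their triples."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def subjects_with(graph, predicate, obj):
    """All subjects that carry the given predicate/object pair."""
    return {s for s, p, o in graph if p == predicate and o == obj}


merged = merge(repository_a, repository_b)
```

After the merge, a single query over `merged` finds the maize records of both repositories. Graphs containing blank nodes would additionally need those nodes renamed apart before taking the union.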
Finding things related to “genes” across databases
Source: Joanne Luciano, Mitre, and the W3C HCLS IG
Role of thesauri/concept schemes
• Born as tools to ensure consistency in the indexing of library collections
• Thesauri were based on “terms”, but terms already represented concepts implicitly
• Hierarchical and associative relationships represented generic ontological domain knowledge
• Candidate building blocks for the Semantic Web
…from thesauri to ontologies…
AGROVOC today
• Around 30,000 concepts
• 600,000 labels in around 20 languages
• A one-stop shop for terminological knowledge related to agriculture in general
• A knowledge base of related concepts organized in ontological relationships (hierarchical, associative, equivalence)
• A concept/term/string-based system
• Concepts may be organized in multiple categories
Semantic Relationships
• Concept to concept: isA (hierarchy), isPestOf, hasPest
• Concept to term: has_lexicalization (links concepts to their lexical realizations)
• Term to term: isSynonymOf, isTranslationOf, hasAcronym, hasAbbreviation
• Term to string: hasSpellingVariant, hasSingular
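One way to read the four groups above is as domain/range constraints: each relationship may only connect nodes of the stated kinds. The sketch below encodes that reading with a hypothetical `ConceptStore` that rejects, say, an `isA` link from a term to a concept. The node identifiers (`concept:maize`, `term:corn@en` and similar) and the maize/cereals link are invented for illustration and are not AGROVOC data.

```python
from enum import Enum


class Level(Enum):
    CONCEPT = "concept"
    TERM = "term"
    STRING = "string"


# (domain, range) of each relationship, following the four groups on the slide.
RELATIONSHIPS = {
    "isA": (Level.CONCEPT, Level.CONCEPT),
    "isPestOf": (Level.CONCEPT, Level.CONCEPT),
    "hasPest": (Level.CONCEPT, Level.CONCEPT),
    "has_lexicalization": (Level.CONCEPT, Level.TERM),
    "isSynonymOf": (Level.TERM, Level.TERM),
    "isTranslationOf": (Level.TERM, Level.TERM),
    "hasAcronym": (Level.TERM, Level.TERM),
    "hasAbbreviation": (Level.TERM, Level.TERM),
    "hasSpellingVariant": (Level.TERM, Level.STRING),
    "hasSingular": (Level.TERM, Level.STRING),
}


class ConceptStore:
    """Hypothetical store that keeps concepts, terms and strings apart."""

    def __init__(self):
        self.levels = {}    # node id -> Level
        self.edges = set()  # (source, relationship, target)

    def add(self, node, level):
        self.levels[node] = level

    def relate(self, source, relationship, target):
        domain, range_ = RELATIONSHIPS[relationship]
        if self.levels[source] is not domain or self.levels[target] is not range_:
            raise ValueError(f"{relationship} links a {domain.value} to a {range_.value}")
        self.edges.add((source, relationship, target))


# Hypothetical node ids for the maize example.
store = ConceptStore()
store.add("concept:maize", Level.CONCEPT)
store.add("concept:cereals", Level.CONCEPT)
for term in ("term:maize@en", "term:corn@en", "term:maïs@fr"):
    store.add(term, Level.TERM)

store.relate("concept:maize", "isA", "concept:cereals")
store.relate("concept:maize", "has_lexicalization", "term:maize@en")
store.relate("term:corn@en", "isSynonymOf", "term:maize@en")
store.relate("term:maïs@fr", "isTranslationOf", "term:maize@en")
```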
The AGROVOC SKOS-XL Model
[Figure: concept 12332 (rdf:type skos:Concept) has the label resources :foo and :bar (rdf:type skosxl:Label), attached via skosxl:prefLabel and skosxl:altLabel and carrying the skosxl:literalForm values “maize” and “corn”. skos:broader links connect concepts 12332, 6211, 1474 and 8171, and the hierarchy is attached to the AGROVOC concept scheme (rdf:type skos:ConceptScheme) through skos:inScheme and skos:topConceptOf.]
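In SKOS-XL a label is a resource of its own, so reading a concept's name takes two hops: concept to label resource (via `skosxl:prefLabel` or `skosxl:altLabel`), then label resource to text (via `skosxl:literalForm`). The sketch below writes the diagram as triples using the SKOS and SKOS-XL namespace URIs; `:foo` and `:bar` are the diagram's placeholder label identifiers, and assigning “maize” to the preferred label and “corn” to the alternative label is our reading of the figure. The `labels` helper is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"

# The diagram as triples; ":foo" and ":bar" are the diagram's placeholder label ids.
triples = {
    (MAIZE, RDF_TYPE, SKOS + "Concept"),
    (MAIZE, SKOSXL + "prefLabel", ":foo"),
    (MAIZE, SKOSXL + "altLabel", ":bar"),
    (":foo", RDF_TYPE, SKOSXL + "Label"),
    (":foo", SKOSXL + "literalForm", "maize"),
    (":bar", RDF_TYPE, SKOSXL + "Label"),
    (":bar", SKOSXL + "literalForm", "corn"),
}


def labels(graph, concept, kind):
    """Follow concept -skosxl:<kind>-> label resource -skosxl:literalForm-> text."""
    label_nodes = {o for s, p, o in graph if s == concept and p == SKOSXL + kind}
    return {o for s, p, o in graph if s in label_nodes and p == SKOSXL + "literalForm"}
```

Because `:foo` and `:bar` are nodes, further statements, such as the term-to-term relationships listed earlier, can be attached to the labels themselves; a plain SKOS `skos:prefLabel` points straight at a string and cannot carry them.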
http://www.w3.org/2004/02/skos/
SKOS-XL output
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
    <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
    <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
    <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
  </rdf:Description>
</rdf:RDF>
URI of AGROVOC concept: http://aims.fao.org/aos/agrovoc/c_330829
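To see the snippet as triples, it can be flattened with the standard library's `xml.etree.ElementTree`. This is a deliberately narrow sketch: the hypothetical `simple_triples` helper handles only the two patterns that occur in the output above (`rdf:resource` objects and plain literals with `xml:lang`) and is not a general RDF/XML parser; a real application would use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The SKOS-XL output above, wrapped in an rdf:RDF root element.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
    <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
    <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
    <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag form into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def simple_triples(xml_text):
    """Flatten rdf:Description elements into (subject, predicate, object) tuples.

    Objects are a URI string (from rdf:resource) or a (text, language) pair
    for literals. Other RDF/XML features are not supported.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = resource
            else:
                obj = (prop.text, prop.get(XML_LANG))
            triples.append((subject, expand(prop.tag), obj))
    return triples
```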
Linking vocabularies
AGROVOC, EUROVOC and UNBIS relationships (AGROVOC and EUROVOC URIs shown):
• agroforestry: http://aims.fao.org/aos/agrovoc/c_207 skos:exactMatch / owl:sameAs http://eurovoc.europa.eu/219055
• milk: http://aims.fao.org/aos/agrovoc/c_4826 skos:exactMatch / owl:sameAs http://eurovoc.europa.eu/220018
• maize: http://aims.fao.org/aos/agrovoc/c_12332 skos:exactMatch / owl:sameAs http://eurovoc.europa.eu/219871
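SKOS defines `skos:exactMatch` as symmetric (and transitive), so a mapping table like this one can be used in both directions, for example to re-express a record indexed with AGROVOC in Eurovoc terms. The sketch below indexes the three pairs from the table both ways; the `symmetric_index` and `translate` helpers are hypothetical.

```python
AGROVOC = "http://aims.fao.org/aos/agrovoc/"
EUROVOC = "http://eurovoc.europa.eu/"

# skos:exactMatch pairs from the table above.
EXACT_MATCHES = [
    (AGROVOC + "c_207", EUROVOC + "219055"),    # agroforestry
    (AGROVOC + "c_4826", EUROVOC + "220018"),   # milk
    (AGROVOC + "c_12332", EUROVOC + "219871"),  # maize
]


def symmetric_index(pairs):
    """exactMatch is symmetric, so index every pair in both directions."""
    index = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


def translate(subjects, index, target_namespace):
    """Hypothetical helper: map subject URIs to their matches in target_namespace."""
    return {
        match
        for uri in subjects
        for match in index.get(uri, ())
        if match.startswith(target_namespace)
    }


INDEX = symmetric_index(EXACT_MATCHES)
```

A subject with no mapping simply yields nothing, which fits the open-world reading: a missing link means “unknown”, not “no equivalent exists”.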
http://agris.fao.org/agris-search/search/display.do?f=2004/ZA/ZA04002.xml;ZA2004000049
http://aims.fao.org/aos/agrovoc/c_7825
http://eurovoc.europa.eu/218754
Linking data through common URIs
[Figure: the concept “Maize” in AGROVOC (http://aims.fao.org/aos/agrovoc/c_12332), Eurovoc (http://eurovoc.europa.eu/219871) and UNBIS, each with skosxl:literalForm “Maize”, connected by owl:sameAs/skos:exactMatch links. Documents indexed with each vocabulary are thereby linked:
• AGRIS: http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026
• EUR-Lex: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF
• UNBIS: http://unbisnet.un.org:8080/ipac20/ipac.jsp?session=128F308557F34.283092&profile=bib&uri=full=3100001~!685149~!1&ri=1&aspect=subtab124&menu=search&source=~!horizon]
http://aims.fao.org/aos/agrovoc/c_12332 owl:sameAs http://eurovoc.europa.eu/219871
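`owl:sameAs` is symmetric and transitive, so these links partition URIs into equivalence classes; a union-find structure is one straightforward way to compute them. The sketch below then retrieves every document indexed with any URI equivalent to the AGROVOC maize concept. The slide gives no UNBIS concept URI, so `http://example.org/unbis/maize` and the Eurovoc-to-UNBIS link are hypothetical, as are the short document keys standing in for the three records shown.

```python
AGROVOC_MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"
EUROVOC_MAIZE = "http://eurovoc.europa.eu/219871"
UNBIS_MAIZE = "http://example.org/unbis/maize"  # hypothetical: no UNBIS URI on the slide

# owl:sameAs links; the second one is an assumption for illustration.
SAME_AS = [(AGROVOC_MAIZE, EUROVOC_MAIZE), (EUROVOC_MAIZE, UNBIS_MAIZE)]

# Hypothetical keys for the AGRIS, EUR-Lex and UNBIS records, each indexed
# with its own vocabulary's URI for maize.
DOCUMENTS = {
    "agris_record": {AGROVOC_MAIZE},
    "eurlex_document": {EUROVOC_MAIZE},
    "unbis_record": {UNBIS_MAIZE},
}


def equivalence(pairs):
    """Union-find over sameAs pairs; returns a function mapping a URI to its class root."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes
    return find


def documents_about(uri, documents, find):
    """Documents indexed with any URI equivalent to `uri`."""
    root = find(uri)
    return {doc for doc, subjects in documents.items() if any(find(s) == root for s in subjects)}


find = equivalence(SAME_AS)
```

Querying with the AGROVOC URI returns all three records, although only one of them was indexed with AGROVOC.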
What are we doing with unstructured data?
• We have enormous amounts of unstructured material
• Most of the documents we produce are still semantically unstructured
• Human cataloguing and indexing is becoming ever rarer
• We need machines to do automatic semantic markup of text
• Machines trained on concept schemes are able to do so
AgroTagger
• Identifies concepts in unstructured texts
• Uses AGROVOC as a controlled vocabulary
• Prototype under testing with excellent results (the entire ICARDA repository has been indexed)
• Will in future produce structured RDF files that can be used to link data, like “OpenCalais”
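The slide does not describe how AgroTagger identifies concepts, so the following is only a generic illustration of vocabulary-based tagging, not its algorithm: the text is tokenised and matched, longest label first, against a dictionary from labels to AGROVOC concept URIs. The labels and URIs come from earlier slides; treating “corn” as a label of concept c_12332 follows the SKOS-XL diagram. The `tag` function is hypothetical.

```python
import re

AGROVOC = "http://aims.fao.org/aos/agrovoc/"

# Label -> concept URI; several labels may share one concept.
LABELS = {
    "maize": AGROVOC + "c_12332",
    "corn": AGROVOC + "c_12332",
    "milk": AGROVOC + "c_4826",
    "agroforestry": AGROVOC + "c_207",
}


def tag(text, labels=LABELS):
    """Count concept mentions, trying the longest (multi-word) label first at each position."""
    tokens = re.findall(r"\w+", text.lower())
    max_len = max(len(label.split()) for label in labels)
    found = {}
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in labels:
                uri = labels[candidate]
                found[uri] = found.get(uri, 0) + 1
                i += n  # skip the tokens consumed by the match
                break
        else:
            i += 1  # no label starts at this token
    return found
```

Because several labels may point to the same URI, “corn” and “maize” are counted as one concept. Without stemming, inflected forms such as plurals would be missed; term-to-string relationships like hasSingular and hasSpellingVariant are the kind of data that could supply those variants.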
Live demo: semantic markup:
http://viewer.opencalais.com/
http://agropedialabs.iitk.ac.in/Tagger/Agrotagger_text.php
The concept scheme workbench
The workbench
• A web-based working environment for managing the AGROVOC Concept Server (CS)
• Facilitates the collaborative editing of multilingual terminology and semantic concept information
• Includes administration and group management features
• Includes workflows for maintenance, validation and quality assurance of the data pool
• The CS is freely accessible to everybody, to facilitate collaborative editing
Group/Action/Status
Groups: Non-registered users, Term editors, Ontology editors, Validators, Publishers, Administrators
Actions: concept-create, concept-delete, concept-edit, term-create, term-edit, term-delete, …
Statuses: Proposed by guest, Proposed, Revised by guest, Revised, Validated, Published, Proposed deprecated, Deprecated
Concept Life Cycle
• Guest: <concept-create> → Proposed by guest
• Validator: <validates> → Validated
• Publisher: <publishes> → Published
• Term editor: <concept-edit> → Revised
• Administrator: <validates> → Published
• Ontology editor: <concept-delete> → Proposed deprecated
• Publisher: <validates> → Deprecated
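The life cycle can be read as a table from (group, action) to the resulting status. The slide does not state which status each action starts from, so the sketch below checks only that the group may perform the action, not the concept's current status. The `replay` helper and the lowercase group strings are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PROPOSED_BY_GUEST = "Proposed by guest"
    PROPOSED = "Proposed"
    REVISED_BY_GUEST = "Revised by guest"
    REVISED = "Revised"
    VALIDATED = "Validated"
    PUBLISHED = "Published"
    PROPOSED_DEPRECATED = "Proposed deprecated"
    DEPRECATED = "Deprecated"


# (group, action) -> resulting status, as listed on the Concept Life Cycle slide.
TRANSITIONS = {
    ("guest", "concept-create"): Status.PROPOSED_BY_GUEST,
    ("validator", "validates"): Status.VALIDATED,
    ("publisher", "publishes"): Status.PUBLISHED,
    ("term editor", "concept-edit"): Status.REVISED,
    ("administrator", "validates"): Status.PUBLISHED,
    ("ontology editor", "concept-delete"): Status.PROPOSED_DEPRECATED,
    ("publisher", "validates"): Status.DEPRECATED,
}


def replay(events):
    """Hypothetical helper: apply (group, action) events in order, return each resulting status."""
    history = []
    for group, action in events:
        status = TRANSITIONS.get((group, action))
        if status is None:
            raise PermissionError(f"{group!r} cannot perform {action!r}")
        history.append(status)
    return history
```

Replaying a guest proposal, a validation and a publication yields Proposed by guest, Validated, Published; an action the table does not list for a group is rejected.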
Modules
• Home
• Search
• Concept/Term Management
• Relationship Management
• Classification Scheme Management
• Validation
• Consistency Check
• Import/Export
• User/Group Management
• Statistics/Preferences
Search
• By string: the user can choose whether the system searches by exact match, beginning with, contains, or fuzzy match
• By URI or term code, or by a range of term codes (e.g. between 123 and 9876)
• By classification scheme
• By creation or modification date
• By specific relationship (e.g. all concepts using the “has_pest” relationship)
• By status or language
• By notes/attributes
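The four string modes and the code-range search can be sketched in a few lines of Python. The workbench's actual fuzzy-matching method is not given here, so `difflib.get_close_matches` is swapped in as a stand-in; the sample labels (including “maize flour”), the cutoff value and both helper names are hypothetical.

```python
import difflib

# Hypothetical sample data: labels, plus term codes taken from the earlier slides.
LABELS = ["maize", "maize flour", "corn", "milk", "agroforestry"]
TERM_CODES = [207, 4826, 12332]


def search_string(labels, query, mode="exact", cutoff=0.75):
    """Case-insensitive label search in one of the four string modes."""
    q = query.lower()
    if mode == "exact":
        return [label for label in labels if label.lower() == q]
    if mode == "beginning with":
        return [label for label in labels if label.lower().startswith(q)]
    if mode == "contains":
        return [label for label in labels if q in label.lower()]
    if mode == "fuzzy":
        # difflib similarity is a stand-in for the workbench's own fuzzy matching.
        by_lower = {label.lower(): label for label in labels}
        matches = difflib.get_close_matches(q, list(by_lower), n=10, cutoff=cutoff)
        return [by_lower[m] for m in matches]
    raise ValueError(f"unknown search mode: {mode!r}")


def search_code_range(codes, low, high):
    """Term codes within an inclusive range, e.g. between 123 and 9876."""
    return [code for code in codes if low <= code <= high]
```

With these samples, a fuzzy search for the misspelling “maise” finds “maize”, and the range 123 to 9876 returns codes 207 and 4826.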
Graph Visualization
• Based on TouchGraph Java applets
• Visualizes concepts and their relationships with other concepts in a graphical view
Web services
[Diagram: the AGROVOC CS Workbench maintains the SKOS triple store; web services use the same store to answer access requests from other applications and return responses.]
AGROVOC Web Services
Architecture of the System
System Overview
[Diagram: system architecture, from front end to back end.
• Front end: Google Web Toolkit (GWT); graph visualization (GWT Incubator)
• Middleware: Hibernate layer; Protégé OWL API; Gilead intermediate layer
• Back end: administrative database (MySQL); Protégé triple store (MySQL)
• Web services]
Giving it a try…
• A demo version of the AWB, with all functionalities, available to users for testing purposes: http://202.73.13.50:55234/agrovocdevv10d/
• Latest stable release, version 1.0 (read/write): http://202.73.13.50:55381/agrovocv10i/
• Latest stable release, version 1.0 (read-only; visitors have view privileges only): http://202.73.13.50:55481/agrovocv10i/
…and more: http://aims.fao.org
Thank You!