Ph d thesis-ahsan_slidesv3
1. http://www.fao.org/aims/
Aligning Controlled vocabularies for enabling
semantic matching in a distributed knowledge
management system
Ahsan Morshed
Doctoral Candidate
University of Trento
ahsan.morshed@fao.org
PhD Supervisor: Professor Fausto Giunchiglia
fausto@dit.unitn.it
Ahsan Morshed, FAO 1 / 54
Publications (1-3)
A. Morshed. Controlled Vocabulary Matching in Distributed Systems, at
BNCOD 2009 Conference, UK.
A. Morshed and M. Sini. Aligning Controlled Vocabularies: Algorithm and
Architecture, at Workshop on Advanced Technologies for Digital Libraries
2009, AT4DL, Trento, Italy.
M. Sini, J. Keizer, G. Johannsen, A. Morshed, S. Rajbhandari and M.
Amirhosseini. The AGROVOC Concept Server Workbench System:
Empowering management of agricultural vocabularies with semantics, at
International Association of Agricultural Information Specialists (IAALD),
France, 2010.
Publications (4-6)
A. Morshed, G. Johannsen, J. Keizer and M. Zeng. Bridging End Users’
Terms and AGROVOC Concept Server Vocabularies. International
Conference on Dublin Core and Metadata Applications (DC-2010),
Pittsburgh, USA, 2010 (submitted).
A. Morshed, M. Sini and J. Keizer. Aligning Controlled Vocabularies using a
facet based approach. (Technical Paper at FAO).
A. Morshed and R. Singh. Evaluation and Ranking of Ontology Construction
Tools (Technical Paper).
Agenda
Background: the role of controlled vocabulary in semantic matching
The overall goal: Aligning Controlled Vocabularies in a distributed
system
Facet-based matching
An architecture for the matching system
A running prototype of the matching system
Evaluation Methodology and Results
Limitations and Related Works
Conclusions and Future work
Some matching techniques
Element Matching techniques
ex: edit distance
Corpus-based techniques
ex: token or extension of classes
Structure-based techniques
ex: graph matching
Knowledge-based techniques
ex: external resources
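As an illustration of the element-level technique named above (edit distance), the following minimal sketch computes the Levenshtein distance between two term labels. The function name `edit_distance` and the choice of Python are assumptions made for this example; the slides do not specify an implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


# Two labels that differ only by a plural "s" are one edit apart.
print(edit_distance("cutting", "cuttings"))  # → 1
```

A small distance suggests two labels may be variants of the same term, but it says nothing about meaning, which is why such matchers are usually combined with the other techniques in the list.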
Some matching systems
Cupid
- element-level and structure-level matching
RiMOM
- based on edit distance and vector distance
FALCON-AO
- based on linguistic and structure matching
CTXMatch, S-Match
- based on knowledge-based matching
Some matching projects
HILT (High Level Thesaurus Project)
- JISC-funded project, UK
- aims to facilitate the cross-searching of distributed information
services by subject in a multi-schema environment.
- datasets used (e.g., DDC, LCSH, IPSV, AAT)
CAT to AGROVOC
Dr. Chan chung
64,638 Chinese terms: 51,614 descriptors and 13,024 non-descriptors
13,105 exact matches, 11,408 BT matches, 173 NT matches, and 17,47
other matches
Matching in Distributed System
Edutella
Edutella is an open source project that creates an infrastructure for sharing
metadata in RDF format.
It applies the peer-to-peer model using the JXTA protocol.
SWAP
aims at overcoming the lack of semantics in current peer-to-peer systems.
Semantic Matching in Lightweight
ontologies
To use lightweight ontologies for matching purposes, all entities need to
agree on the exact meaning of the concepts.
Descriptive lightweight ontologies
- used for defining the meaning of terms as well as the nature and structure of a
domain.
Classification lightweight ontologies
- used for describing, classifying, and accessing collections of documents.
[Fausto et al., 2007]
Controlled Vocabulary (CV)
A vocabulary stores words, synonyms, word sense definitions (i.e.,
glosses), and relations between word senses and concepts. Such a
vocabulary is generally referred to as a Controlled Vocabulary (CV)
if the selection of terms is done by domain specialists [ahsan et
al.,2009]
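The definition above can be pictured as a small data structure. The sketch below is only an illustration: the class and field names (`Concept`, `ControlledVocabulary`, `broader`, `narrower`, `related`) are assumptions, not the AGROVOC schema, and the single sample entry is taken from the "frog farms" example on the Limitations slide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One entry of a controlled vocabulary (field names are assumptions)."""
    label: str                                   # preferred term
    gloss: str = ""                              # word sense definition
    synonyms: set = field(default_factory=set)   # non-preferred terms
    broader: set = field(default_factory=set)    # BT relations
    narrower: set = field(default_factory=set)   # NT relations
    related: set = field(default_factory=set)    # RT relations


@dataclass
class ControlledVocabulary:
    name: str
    concepts: dict = field(default_factory=dict)  # lower-cased label -> Concept

    def add(self, concept: Concept) -> None:
        self.concepts[concept.label.lower()] = concept

    def lookup(self, term: str):
        """Return the concept whose label or synonym equals term, else None."""
        key = term.lower()
        if key in self.concepts:
            return self.concepts[key]
        for concept in self.concepts.values():
            if key in {s.lower() for s in concept.synonyms}:
                return concept
        return None


cv = ControlledVocabulary("example")
cv.add(Concept("frog culture", synonyms={"frog farms"}, broader={"aquaculture"}))
print(cv.lookup("Frog Farms").label)  # → frog culture
```

Resolving a synonym to its preferred term is what makes the vocabulary "controlled": every variant leads to one concept chosen by the specialists.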
Controlled Vocabulary
General controlled vocabulary:
Example: Thesaurus, WordNet, Classification, Directories, Lightweight
Ontologies
Subject specific controlled vocabulary (SSCV)
Library of Congress and Authors List
Uniform List
Series List
Applications for managing
controlled vocabularies
Traditional Controlled Vocabulary tools
Ex: Old AGROVOC Thesaurus
Modern Controlled Vocabulary
Ex: AGROVOC Concept Server
AGROVOC Concept Server
- store concepts
- edit concepts
- visualize the concepts
modern controlled vocabulary
Ref: http://nais.cpe.ku.ac.th/agrovoc/
Applications for exploiting
controlled vocabularies
Background Knowledge
Document annotation
Information retrieval and extraction
Audio and Video retrieval
Challenges of Matching
Factors of heterogeneity problem
Time
Place
Structure
Culture diversity
Different vocabulary specialists
FACET
A facet is like a diamond that consists of different faces.
Its distinct features allow thesauri, classifications or taxonomies to
be organized in different ways.
A facet is composed of collectively exhaustive aspects, properties or
characteristics of a domain.
For example, a collection of rice might be classified using cultural
and seasonal facets.
[Fausto et al.,2009] [ahsan et al.,2009]
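To make the rice example concrete, the sketch below groups the same collection under two different facets. Everything in it is hypothetical: the variety names, the facet values ("wet season", "lowland", and so on) and the helper `group_by_facet` are invented for illustration and are not taken from AGROVOC or the thesis.

```python
from collections import defaultdict

# Hypothetical collection: each item carries one value per facet.
rice = {
    "variety A": {"seasonal": "wet season", "cultural": "lowland"},
    "variety B": {"seasonal": "dry season", "cultural": "lowland"},
    "variety C": {"seasonal": "wet season", "cultural": "upland"},
}


def group_by_facet(items: dict, facet: str) -> dict:
    """Organize the same items under the values of one chosen facet."""
    groups = defaultdict(list)
    for name, facets in items.items():
        groups[facets[facet]].append(name)
    return dict(groups)


print(group_by_facet(rice, "seasonal"))
print(group_by_facet(rice, "cultural"))
```

The items never change; only the facet chosen for grouping does, which is the sense in which a faceted vocabulary can be "organized in different ways".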
Faceted Controlled vocabulary
Seasonal rice type Cultural rice type
Creation of a Facet
Domain Analysis
terms are analysed in consultation with domain experts
simple concepts are identified.
Term collection and organization
terms are ordered according to their characteristics and in a meaningful sequence
ex: cow and milk form a facet called Dairy system (part-of relationship)
[Fausto et al., 2009]
Existing Methodologies
PMEST : Personality(P), Matter(M), Energy(E), Space (S), and Time(T)
[Ranganathan]
DEPA : Discipline(D), Entity (E), Property (P), Action(A)
[Bhattachary and Fausto et al., 2009 ]
Properties of facets
Hospitality
Compactness
Flexibility
Reusability
The Methodology
Homogeneity
[Bhattachary and Fausto et al., 2009 ]
Concept Facet Matcher
Based on DEPA model
CF = {mg, lg, R}, where mg is the set of more general concepts, lg is the set of
less general concepts, and R is the set of related concepts.
Based on element-level matchers
[Ahsan, 2009 and Ahsan et al., 2009 ]
Concept Facet Matcher
Algorithm 1 buildCFacet(CV)
for i = 0 to CV.size - 1 do
  store cF[i] = (Mg, Lg, R) of concept CV[i]
end for
return cF
[Ahsan et al., 2009 ]
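A minimal Python sketch of Algorithm 1 follows. It assumes the controlled vocabulary is available as a dictionary mapping each concept label to its thesaurus relations, and that BT, NT and RT links supply the more general, less general and related concepts respectively. That mapping, the input layout and the names `ConceptFacet` and `build_cfacet` are assumptions of this sketch, not details given on the slides.

```python
from typing import NamedTuple


class ConceptFacet(NamedTuple):
    """The (Mg, Lg, R) triple stored for one concept."""
    concept: str
    mg: frozenset  # more general concepts (assumed: BT links)
    lg: frozenset  # less general concepts (assumed: NT links)
    r: frozenset   # related concepts (assumed: RT links)


def build_cfacet(cv: dict) -> list:
    """Build one concept facet per concept in the vocabulary."""
    facets = []
    for label, relations in cv.items():
        facets.append(ConceptFacet(
            concept=label,
            mg=frozenset(relations.get("BT", ())),
            lg=frozenset(relations.get("NT", ())),
            r=frozenset(relations.get("RT", ())),
        ))
    return facets


# Toy vocabulary with two concepts linked by a BT/NT pair.
toy_cv = {
    "aquaculture": {"NT": ["frog culture"]},
    "frog culture": {"BT": ["aquaculture"]},
}
facets = build_cfacet(toy_cv)
print(facets[1])
```

Missing relation types simply yield empty sets, so a concept with no related terms still gets a complete triple.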
Concept Facet Matcher
Algorithm 2 MatchingFacet(CV1, CV2)
cF1 = buildCFacet(CV1)
cF2 = buildCFacet(CV2)
for i = 0 to cF1.size - 1 do
  for j = 0 to cF2.size - 1 do
    cfmatcher = elementLevelMatcher(cF1[i], cF2[j])
  end for
end for
[Ahsan et al., 2009 ]
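The nested loop of Algorithm 2 can be sketched as below. The slides do not say how the element-level matcher decides, so the placeholder used here is an assumption: it compares only the concept labels, reporting "exact" for equal normalized labels, "partial" when they share a token, and "none" otherwise. Facets are represented as plain `(label, mg, lg, r)` tuples, and the context sets are carried along but ignored by this simplified matcher.

```python
def element_level_matcher(f1: tuple, f2: tuple) -> str:
    """Placeholder matcher (assumption): compares concept labels only."""
    a, b = f1[0].strip().lower(), f2[0].strip().lower()
    if a == b:
        return "exact"
    if set(a.split()) & set(b.split()):
        return "partial"
    return "none"


def matching_facet(cf1: list, cf2: list) -> list:
    """Compare every facet of the first vocabulary with every facet of the second."""
    results = []
    for f1 in cf1:
        for f2 in cf2:
            relation = element_level_matcher(f1, f2)
            if relation != "none":
                results.append((f1[0], f2[0], relation))
    return results


# Toy facets: (label, more general, less general, related).
cf1 = [("frog farming", {"aquaculture"}, set(), set()),
       ("UHT milk", {"milk products"}, set(), set())]
cf2 = [("Frog Farming", set(), set(), set()),
       ("milk products", set(), set(), set())]
print(matching_facet(cf1, cf2))
```

Because every pair is compared, the number of comparisons is the product of the two vocabulary sizes, which is why "No Match" outcomes dominate on large vocabularies.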
Results
Facet based approach
               Experiment 1   Experiment 2
Exact Match    5976           6021
Partial Match  164255         164278
No Match       69800745       69800745
Results
Standard Tool
               Experiment 1   Experiment 2
Exact Match    8795           8795
Partial Match  334255         334258
No Match       N/A            N/A
Results
Min Max Min Max
Overall 25.8065 31.4496 21.7391 21.7391
Positive 18.6047 14.0814 10.4895 14.6154
Negative 97.1831 52.1495 94.7368 99.1304
Advantage of Facet based
System
No knowledge base required
Based on hidden semantics: semantic meaning is retrieved during
processing
Limitations
Structure Problems
AGROVOC SQL Format and CABI Text Format
Provided CABI file does not contain chemical and scientific concepts
Term Variants
In AGROVOC, we found "frog farms", which should have been "frog farming",
because "frog farms" is used for "frog culture" and its BT is "aquaculture". Also, we
found the abbreviated term "UHT milk" (one kind of milk product) which should
have been "UHT milk".
There were some ambiguous terms with different meanings, for example
"cutting" (i.e., slicing of bread or meat) and "cuttings" (i.e., propagation material).
There were some term spellings whose meaning is too difficult to capture, for
example “2.4.4-T”, “2.4.5-TP 2.4-D”, “2.4 DES”, “2.4 dinitrohenol”. Similarly, CABI
contained the term “4-H Clubs”. These terms did not make sense during any
mapping experiments.
Limitations
Domain expert
To evaluate our results, we were able to find one domain expert from
FAO but we did not get any domain expert from CABI. The results may
have been different if we had another domain expert.
Lack of consistency
Since the relationships in thesauri lack precise semantics, they are
applied inconsistently, both creating ambiguity in the interpretation
of the relationships and resulting in an overall internal structure that
is irregular and unpredictable.
Limitations
Limited automated processing
Traditional thesauri are designed for indexing and query formulation by
people and not for automated processing. The ambiguous semantics that
characterizes many thesauri makes them unsuitable for automated
processing.
Related Works
[Fausto et al., 2004] apply element-level matching techniques
for semantic matching
[Stamou et al.] apply string matching techniques for ontology
matching
[Karin Koogan Breitman et al., 2005] apply string matching
techniques for lightweight ontology matching
[Paul Buitelaar et al., 2009] apply string matching for a linguistic
matching system
[Maria Teresa Pazienza et al., 2007] apply string matching for a
semi-automatic matching system
Conclusion and Future work
To build the extended knowledge base
Conclusion and Future work
Integrating the mapping into the AGROVOC Concept Server
Conclusion and Future work
We have described the facet based matching system for a large dataset.
We have shown a running prototype for this system.
The majority of this work was done under the supervision of the FAO and
the CABI. At the moment, a prototype is running at the FAO.
We will integrate this mapping file for search purposes in the AGROVOC
Concept Server.
References
[Fausto et al., 2003]: F. Giunchiglia and P. Shvaiko. Semantic matching.
In Ontologies and Distributed Systems workshop, IJCAI, 2003.
[Fausto et al., 2004]: F. Giunchiglia, P. Shvaiko, and M. Yatskevich.
S-Match: An algorithm and an implementation of semantic matching.
In Proceedings of ESWS’04, 2004.
[Fausto et al., 2004]: F. Giunchiglia and M. Yatskevich. Element level
semantic matching. In Meaning Coordination and Negotiation
workshop, ISWC, 2004.
[Pavel et al., 2006]: P. Shvaiko, F. Giunchiglia and M. Yatskevich.
Discovering missing background knowledge in ontology matching. In
17th European Conference on Artificial Intelligence (ECAI 2006),
volume 141, pages 382-386, 2006.
References (cont)
[Fausto et al., 2007]: F. Giunchiglia and I. Zaihrayeu. Lightweight
Ontologies. Technical report at DIT, University of Trento, Italy, October
2007.
[Pavel et al., 2007]: P. Shvaiko and J. Euzenat. Ontology matching.
Springer, 1st edition, 2007.
[S.R. Ranganathan]: S.R. Ranganathan. Elements of library classification.
Asia Publishing House.
References (cont)
[Fausto et al., 2009]: F. Giunchiglia, B. Dutta, and V. Maltese. Faceted
lightweight ontologies. In LNCS, 2009.
[Bhattachary 1979]: G. Bhattacharyya. POPSI: its fundamentals and
procedure based on a general theory of subject indexing language. In
Library Science with a Slant to Documentation, volume 16, pages.
[Pavel]: P. Shvaiko. Iterative schema-based semantic matching (PhD
thesis), Technical report DIT-06-102, December 2006.
[Morshed 2009]: A. Morshed and M. Sini. Aligning Controlled
Vocabularies: Algorithm and Architecture, at Workshop on Advanced
Technologies for Digital Libraries 2009, AT4DL, Trento, Italy.
[Morshed 2009]: A. Morshed, M. Sini and J. Keizer. Aligning
Controlled Vocabularies using a facet based approach. (Technical Paper
at FAO).