Ph d thesis-ahsan_slidesv3

My PhD thesis slides, which I presented on 12 April at the University of Trento, Trento, Italy.


  1. http://www.fao.org/aims/
     Aligning controlled vocabularies for enabling semantic matching in a distributed knowledge management system
     Ahsan Morshed, Doctoral Candidate, University of Trento, ahsan.morshed@fao.org
     PhD Supervisor: Professor Fausto Giunchiglia, fausto@dit.unitn.it
     Ahsan Morshed, FAO
  2. Publications (1-3)
     • A. Morshed. Controlled Vocabulary Matching in Distributed Systems. BNCOD 2009 Conference, UK.
     • A. Morshed and M. Sini. Aligning Controlled Vocabularies: Algorithm and Architecture. Workshop on Advanced Technologies for Digital Libraries 2009 (AT4DL), Trento, Italy.
     • M. Sini, J. Keizer, G. Johannsen, A. Morshed, S. Rajbhandari and M. Amirhosseini. The AGROVOC Concept Server Workbench System: Empowering management of agricultural vocabularies with semantics. International Association of Agricultural Information Specialists (IAALD), France, 2010.
  3. Publications (4-6)
     • A. Morshed, G. Johannsen, J. Keizer and M. Zeng. Bridging End Users' Terms and AGROVOC Concept Server Vocabularies. International Conference on Dublin Core and Metadata Applications (DC-2010), Pittsburgh, USA, 2010 (submitted).
     • A. Morshed, M. Sini and J. Keizer. Aligning Controlled Vocabularies using a facet based approach. (Technical Paper at FAO).
     • A. Morshed and R. Singh. Evaluation and Ranking of Ontology Construction Tools (Technical Paper).
  4. Agenda
     • Background: the role of controlled vocabularies in semantic matching
     • The overall goal: aligning controlled vocabularies in a distributed system
     • Facet based matching
     • An architecture for the matching system
     • A running prototype of the matching system
     • Evaluation methodology and results
     • Limitations and related work
     • Conclusions and future work
  5. Some matching techniques
     • Element-level matching techniques, e.g. edit distance
     • Corpus-based techniques, e.g. tokens or extensions of classes
     • Structure-based techniques, e.g. graph matching
     • Knowledge-based techniques, e.g. external resources
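Edit distance is the example the slide gives for element-level matching. The following is a minimal sketch of the classic Levenshtein distance; the function name `edit_distance` is an illustrative choice and is not taken from the thesis or from any of the systems listed in these slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]
```

A small distance between two labels, such as "cutting" and "cuttings", suggests that they may be variants of one term, although the limitations slides note that these two can denote different concepts.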
  6. Some matching systems
     • Cupid: element-level and structure-level matching
     • RiMOM: based on edit distance and vector distance
     • FALCON-AO: based on linguistic and structural matching
     • CTXMatch, S-Match: knowledge-based
  7. Some matching projects
     • HILT (High Level Thesaurus Project)
       - JISC-funded project, UK
       - aims to facilitate the cross-searching of distributed information services by subject in a multi-schema environment
       - datasets used: e.g. DDC, LCSH, IPSV, AAT
     • CAT to AGROVOC
       - Dr. Chan chung
       - 64,638 Chinese terms: 51,614 descriptors and 13,024 non-descriptors
       - 13,105 exact matches, 11,408 BT matches, 173 NT matches, and 17,47 other matches
  8. Some matching projects
     • OAEI 2007 (Ontology Alignment Evaluation Initiative), Food Track, AGROVOC-NALT thesauri [Willem, 2008]

     | System    | Alignments | Alignment type |
     | --------- | ---------- | -------------- |
     | Falcon-AO | 15,300     | exactMatch     |
     | RiMOM     | 18,420     | exactMatch     |
     | X-SOM     | 6,583      | exactMatch     |
     | DSSim     | 14,962     | exactMatch     |
  9. Matching in Distributed Systems
     • Edutella
       - an open source project that creates an infrastructure for sharing metadata in RDF format
       - applies the peer-to-peer model using the JXTA protocol
     • Swap
       - aims at overcoming the lack of semantics in current peer-to-peer systems
  10. Semantic Matching in Lightweight Ontologies [Fausto et al., 2007]
      • To use lightweight ontologies for matching purposes, all entities need to agree on the exact meaning of the concepts.
      • Descriptive lightweight ontologies: used for defining the meaning of terms as well as the nature and structure of a domain.
      • Classification lightweight ontologies: used for describing, classifying, and accessing collections of documents.
  11. Controlled Vocabulary (CV)
      • A vocabulary stores words, synonyms, word sense definitions (i.e. glosses), and relations between word senses and concepts. Such a vocabulary is generally referred to as a Controlled Vocabulary (CV) if the choice or selection of terms is made by domain specialists [Ahsan et al., 2009].
  12. Controlled Vocabulary
      • General controlled vocabulary, for example: thesaurus, WordNet, classification, directories, lightweight ontologies
      • Subject specific controlled vocabulary (SSCV)
        - Library of Congress and Authors List
        - Uniform List
        - Series List
  13. Applications for managing controlled vocabularies
      • Traditional controlled vocabulary tools, e.g. the old AGROVOC Thesaurus
      • Modern controlled vocabulary tools, e.g. the AGROVOC Concept Server
  14. AGROVOC Concept Server (a modern controlled vocabulary)
      • Store concepts
      • Edit concepts
      • Visualize concepts
      Ref: http://nais.cpe.ku.ac.th/agrovoc/
  15. Applications for exploiting controlled vocabularies
      • Background knowledge
      • Document annotation
      • Information retrieval and extraction
      • Audio and video retrieval
  16. Challenges of Matching
      • Factors behind the heterogeneity problem
        - Time
        - Place
        - Structure
        - Cultural diversity
        - Different vocabulary specialists
  17. Challenges of Matching
      • Different kinds of heterogeneity [Pavel, 2006]
        - Syntactic heterogeneity
        - Lexical heterogeneity
        - Semantic heterogeneity
        - Pragmatic heterogeneity
        - Metadata heterogeneity
  18. Problem of CV
  19. Facet [Fausto et al., 2009] [Ahsan et al., 2009]
      • A facet is like a diamond that consists of different faces.
      • Its distinct features allow thesauri, classifications or taxonomies to be organized in different ways.
      • A facet is composed of collectively exhaustive aspects, properties or characteristics of a domain. For example, a collection of rice might be classified using cultural and seasonal facets.
  20. Faceted Controlled Vocabulary
  21. Faceted Controlled Vocabulary (figure labels: seasonal rice type; cultural rice type)
  22. Creation of a Facet [Fausto et al., 2009]
      • Domain analysis
        - terms are analysed in consultation with domain experts
        - simple concepts are identified
      • Term collection and organization
        - terms are ordered according to their characteristics and in a meaningful sequence, e.g. cow and milk form a facet called Dairy system (part-of relationship)
  23. Existing Methodologies
      • PMEST: Personality (P), Matter (M), Energy (E), Space (S), and Time (T) [Ranganathan]
      • DEPA: Discipline (D), Entity (E), Property (P), Action (A) [Bhattachary and Fausto et al., 2009]
  24. Properties of facets [Bhattachary and Fausto et al., 2009]
      • Hospitality
      • Compactness
      • Flexibility
      • Reusability
      • The Methodology
      • Homogeneity
  25. Concept Facet Matcher [Ahsan, 2009 and Ahsan et al., 2009]
      • Based on the DEPA model
      • CF = {mg, lg, R}, where mg is the set of more general concepts, lg is the set of less general concepts, and R is the set of related concepts
      • Based on element-level matchers
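The triple CF = {mg, lg, R} can be pictured as a small record attached to each concept. The sketch below is one possible representation, not the data structure used in the thesis: the class name `ConceptFacet`, the `term` field, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptFacet:
    """One concept together with its facet CF = {mg, lg, R}."""
    term: str                                   # label of the concept (assumed field)
    mg: set[str] = field(default_factory=set)   # more general concepts
    lg: set[str] = field(default_factory=set)   # less general concepts
    r: set[str] = field(default_factory=set)    # related concepts


# Illustrative values only; they are not taken from AGROVOC or CABI.
milk = ConceptFacet("milk", mg={"dairy system"}, r={"cow"})
```

Using sets keeps each slot free of duplicates and makes overlap checks between two facets a single intersection.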
  26. Concept Facet Matcher [Ahsan et al., 2009]
      Algorithm 1: buildCFacet(CV)
        for i = 0 to CV.size do
          store cF[i] = (Mg, Lg, R) for the i-th concept of CV
        end for
        return cF
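Algorithm 1 can be sketched in a few lines once an input format is fixed. The slides do not specify one, so the version below assumes the controlled vocabulary is available as (term, relation, target) triples and that BT, NT and RT links populate mg, lg and R respectively; both the input format and that mapping are assumptions, as is the name `build_cfacet`.

```python
from collections import defaultdict

# Assumed mapping from thesaurus relations to facet slots.
SLOT_FOR_RELATION = {"BT": "mg", "NT": "lg", "RT": "r"}


def build_cfacet(cv):
    """Build one concept facet per term.

    cv: iterable of (term, relation, target) triples (assumed format).
    Returns a dict mapping each term to {"mg": set, "lg": set, "r": set}.
    """
    facets = defaultdict(lambda: {"mg": set(), "lg": set(), "r": set()})
    for term, relation, target in cv:
        slot = SLOT_FOR_RELATION.get(relation)
        if slot is not None:  # other relations (e.g. UF) are skipped in this sketch
            facets[term][slot].add(target)
    return dict(facets)
```

Called on a handful of triples for a single term, this returns a dictionary with one entry whose three sets hold the broader, narrower and related targets of that term.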
  27. Concept Facet Matcher [Ahsan et al., 2009]
      Algorithm 2: MatchingFacet(CV1, CV2)
        cF1 = buildCFacet(CV1)
        cF2 = buildCFacet(CV2)
        for i = 0 to cF1.size do
          for j = 0 to cF2.size do
            cfmatcher = elementLevelMatcher(cF1[i], cF2[j])
          end for
        end for
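The nested loops of Algorithm 2 compare every facet of one vocabulary with every facet of the other. The sketch below shows that loop with a stand-in matcher. The slides do not say how the element-level matcher decides, so the rule used here (equal labels give "exact"; a shared word or a shared neighbouring concept gives "partial") is an assumption, and the facet layout (dicts with `term`, `mg`, `lg`, `r` keys) is hypothetical.

```python
def element_level_matcher(f1, f2):
    """Stand-in matcher (assumed rule, not the one used in the thesis)."""
    t1, t2 = f1["term"].casefold(), f2["term"].casefold()
    if t1 == t2:
        return "exact"
    shared_words = set(t1.split()) & set(t2.split())
    # Neighbours are all concepts found in the mg, lg and r slots.
    n1 = {c.casefold() for slot in ("mg", "lg", "r") for c in f1[slot]}
    n2 = {c.casefold() for slot in ("mg", "lg", "r") for c in f2[slot]}
    if shared_words or (n1 & n2):
        return "partial"
    return "none"


def matching_facet(cf1, cf2, matcher=element_level_matcher):
    """Compare every facet in cf1 with every facet in cf2 and
    return (term1, term2, label) for each pair that matches."""
    matches = []
    for f1 in cf1:
        for f2 in cf2:
            label = matcher(f1, f2)
            if label != "none":
                matches.append((f1["term"], f2["term"], label))
    return matches
```

The loop performs one comparison per pair of facets, so the number of comparisons is the product of the two vocabulary sizes.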
  28. System Architecture
  29. Data Model: AGROVOC database. Ref: http://aims.fao.org/website/Download/sub
  30. Data Model: CABI database. Ref: http://cabi.org
  31. A Running Prototype (figure labels: search string; validators / domain specialists)
  32. An architecture for a semantic search
  33. Running Prototype for search (figure label: user's choice)
  34. Evaluation and Results
      • A domain expert
      • Exact match
      • Partial match
      • No match
  35. Datasets: Comparison

      | Characteristics | AGROVOC | CAB   |
      | --------------- | ------- | ----- |
      | Tree leaves     | 29172   | 47805 |
      | Term counts     | 18200   | 32884 |
      | Single words    | 6842    | 11720 |
      | Multi-words     | 11358   | 21161 |
      | Hierarchy depth | 7       | 14    |
      | Multiple BT     | 2546    | 1207  |
      | Redundant BT    | 57      | 76    |
  36. Datasets

      |                     |             |             |
      | ------------------- | ----------- | ----------- |
      | AGROVOC version     | 2007-08-10  | 2007-08-10  |
      | CABI version        | 2009-11-01  | 2009-11-01  |
      | AGROVOC term-leaves | 35036       | 35036       |
      | CABI term-leaves    | 29172       | 29172       |
      | Conversion          | hierarchy   | hierarchy   |
      | Knowledge base      | WordNet 2.1 | SWN 400.000 |
  37. Datasets: Relationships

      | Relationship | BT     | NT     | RT     | UF    |
      | ------------ | ------ | ------ | ------ | ----- |
      | AGROVOC      | 228466 | 228424 | 326389 | 54370 |
      | CABI         | 15154  | 15841  | 41239  | 7094  |
  38. Input files (figures: AGROVOC input file; CAB input file)
  39. Results: Facet based approach

      |               | Experiment 1 | Experiment 2 |
      | ------------- | ------------ | ------------ |
      | Exact Match   | 5976         | 6021         |
      | Partial Match | 164255       | 164278       |
      | No Match      | 69800745     | 69800745     |
  40. Results: Standard Tool

      |               | Experiment 1 | Experiment 2 |
      | ------------- | ------------ | ------------ |
      | Exact Match   | 8795         | 8795         |
      | Partial Match | 334255       | 334258       |
      | No Match      | N/A          | N/A          |
  41. Results

      |          | Min     | Max     | Min     | Max     |
      | -------- | ------- | ------- | ------- | ------- |
      | Overall  | 25.8065 | 31.4496 | 21.7391 | 21.7391 |
      | Positive | 18.6047 | 14.0814 | 10.4895 | 14.6154 |
      | Negative | 97.1831 | 52.1495 | 94.7368 | 99.1304 |
  42. Advantages of the Facet based System
      • No knowledge base required
      • Based on hidden semantics: semantic meaning is retrieved during processing
  43. Limitations
      • Structure problems
        - AGROVOC is in SQL format and CABI is in text format
        - The provided CABI file does not contain chemical and scientific concepts
      • Term variants
        - In AGROVOC, we found "frog farms", which should have been "frog farming", because "frog farms" is used for "frog culture" and its BT is "aquaculture". Also, we found the abbreviated term "UHT milk" (one kind of milk product), which should have been "UHT milk".
        - There were some ambiguous terms with different meanings, for example "cutting" (i.e., slicing of bread or meat) and "cuttings" (i.e., propagation material).
        - There were some terms whose spelling makes their meaning too difficult to capture, for example "2.4.4-T", "2.4.5-TP 2.4-D", "2.4 DES", "2.4 dinitrohenol". Similarly, CABI contained the term "4-H Clubs". These terms did not make sense during any mapping experiments.
  44. Limitations
      • Domain expert
        - To evaluate our results, we were able to find one domain expert from FAO, but we did not get any domain expert from CABI. The results may have been different if we had had another domain expert.
      • Lack of consistency
        - Since the relationships in thesauri lack precise semantics, they are applied inconsistently, both creating ambiguity in the interpretation of the relationships and resulting in an overall internal structure that is irregular and unpredictable.
  45. Limitations
      • Limited automated processing
        - Traditional thesauri are designed for indexing and query formulation by people and not for automated processing. The ambiguous semantics that characterizes many thesauri makes them unsuitable for automated processing.
  46. Related Work
      • [Fausto et al., 2004] apply element-level matching techniques for semantic matching
      • [Stamou et al.] apply string matching techniques for ontology matching
      • [Karin Koogan Breitman et al., 2005] apply string matching techniques for lightweight ontology matching
      • [Paul Buitelaar et al., 2009] apply string matching for a linguistic matching system
      • [Maria Teresa Pazienza et al., 2007] apply string matching for a semi-automatic matching system
  47. Conclusion and Future Work
      • To build the extended knowledge base
  48. Conclusion and Future Work
      • Integrating the mapping into the AGROVOC Concept Server
  49. Conclusion and Future Work
      • We have described the facet based matching system for a large dataset.
      • We have shown a running prototype of this system.
      • The majority of this work was done under the supervision of the FAO and the CABI. At the moment, a prototype is running at the FAO.
      • We will integrate this mapping file into the AGROVOC Concept Server for search purposes.
  50. Questions
  51. References
      • [Fausto et al., 2003]: F. Giunchiglia and P. Shvaiko. Semantic matching. In Ontologies and Distributed Systems workshop, IJCAI, 2003.
      • [Fausto et al., 2004]: F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-Match: An algorithm and an implementation of semantic matching. In Proceedings of ESWS'04, 2004.
      • [Fausto et al., 2004]: F. Giunchiglia and M. Yatskevich. Element level semantic matching. In Meaning Coordination and Negotiation workshop, ISWC, 2004.
      • [Pavel et al., 2006]: P. Shvaiko, F. Giunchiglia and M. Yatskevich. Discovering missing background knowledge in ontology matching. In 17th European Conference on Artificial Intelligence (ECAI 2006), volume 141, pages 382-386, 2006.
  52. References (cont.)
      • [Fausto et al., 2007]: F. Giunchiglia and I. Zaihrayeu. Lightweight ontologies. Technical report, DIT, University of Trento, Italy, October 2007.
      • [Pavel et al., 2007]: P. Shvaiko and J. Euzenat. Ontology matching. Springer, 1st edition, 2007.
      • [S.R. Ranganathan]: S.R. Ranganathan. Elements of library classification. Asia Publishing House.
  53. References (cont.)
      • [Fausto et al., 2009]: F. Giunchiglia, B. Dutta, and V. Maltese. Faceted lightweight ontologies. In LNCS, 2009.
      • [Bhattachary 1979]: G. Bhattachary. POPSI: its fundamentals and procedure based on a general theory of subject indexing language. In Library Science with a Slant to Documentation, volume 16, pages.
      • [Pavel]: P. Shvaiko. Iterative schema-based semantic matching (PhD thesis). Technical report DIT-06-102, December 2006.
      • [Morshed 2009]: A. Morshed and M. Sini. Aligning Controlled Vocabularies: Algorithm and Architecture. Workshop on Advanced Technologies for Digital Libraries 2009 (AT4DL), Trento, Italy.
      • [Morshed 2009]: A. Morshed, M. Sini and J. Keizer. Aligning Controlled Vocabularies using a facet based approach. (Technical Paper at FAO).
  54. Thank You
