http://www.fao.org/aims/




 Aligning Controlled vocabularies for enabling
semantic matching in a distributed knowledge
             management system

                               Ahsan Morshed
                              Doctoral Candidate
                              University of Trento
                         ahsan.morshed@fao.org
             PhD Supervisor: Professor Fausto Giunchiglia
                         fausto@dit.unitn.it

Ahsan Morshed, FAO   1 / 54



                   Publications (1-3)
        A. Morshed. Controlled Vocabulary Matching in Distributed Systems, at
         BNCOD 2009 Conference, UK.
        A. Morshed and M. Sini. Aligning Controlled Vocabularies: Algorithm and
         Architecture, at Workshop on Advanced Technologies for Digital Libraries
         2009 (AT4DL), Trento, Italy.
        M. Sini, J. Keizer, G. Johannsen, A. Morshed, S. Rajbhandari and M.
         Amirhosseini. The AGROVOC Concept Server Workbench System:
         Empowering management of agricultural vocabularies with semantics, at
         International Association of Agricultural Information Specialists (IAALD),
         France, 2010.







                   Publications (4-6)
       A. Morshed, G. Johannsen, J. Keizer and M. Zeng. Bridging End Users’
        Terms and AGROVOC Concept Server Vocabularies. International
        Conference on Dublin Core and Metadata Applications (DC-2010),
        Pittsburgh, USA, 2010 (submitted).

       A. Morshed, M. Sini and J. Keizer. Aligning Controlled Vocabularies using a
        facet based approach. (Technical Paper at FAO).

       A. Morshed and R. Singh. Evaluation and Ranking of Ontology Construction
        Tools (Technical Paper).







                                Agenda
       Background: the role of controlled vocabulary in semantic matching
       The overall goal: Aligning Controlled Vocabularies in a distributed
        system
       A facet based matching
       An Architecture for matching system
       A running prototype for matching system
       Evaluation Methodology and Results
       Limitations and Related Works
       Conclusions and Future work







        Some matching techniques

       Element Matching techniques
            ex: edit distance
       Corpus-based techniques
            ex: token or extension of classes
       Structure-based techniques
            ex: graph matching
       Knowledge-based techniques
            ex: external resources
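
To illustrate the element-level technique named above, edit distance between two term labels can be computed with the standard Levenshtein dynamic programme. This is a minimal sketch: the function name and the sample terms in the usage note are illustrative assumptions, not code from the thesis.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (before the current character) and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance("frog farms", "frog farming")` counts the edits separating two near-identical labels; a matcher could treat a small distance as a hint that the labels are variants of one term.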







          Some matching systems
        Cupid
         - element level and structure level matching
        RiMOM
         - based on edit distance and Vector distance
        FALCON-AO
         - based on Linguistic and structure matching
        CTXMatch, S-Match
         - knowledge-based matching







             Some matching projects
         HILT (High Level Thesaurus Project)
         - JISC-funded project, UK
         - aims to facilitate cross-searching of distributed information
           services by subject in a multi-schema environment
         - datasets used include DDC, LCSH, IPSV and AAT

        CAT to AGROVOC
            Dr. Chan chung
            64,638 Chinese terms: 51,614 descriptors and 13,024 non-descriptors
            13,105 exact matches, 11,408 BT matches, 173 NT matches, and 17,47
             other matches





           Some matching projects
       OAEI 2007 (Ontology Alignment Evaluation Initiative) -
        Food Track
        - AGROVOC-NALT thesauri


        System                 Alignment   Alignment Type
        Falcon-AO              15,300      exactMatch
        RiMOM                  18,420      exactMatch
        X-SOM                  6,583       exactMatch
        DSSim                  14,962      exactMatch

                 [Willem , 2008]



   Matching in Distributed Systems

     Edutella
       Edutella is an open source project that creates an infrastructure for sharing
        metadata in RDF format
       It applies the peer-to-peer model using the JXTA protocol


     Swap
        Aims at overcoming the lack of semantics in current peer-to-peer systems





  Semantic Matching in Lightweight
            ontologies
     To use lightweight ontologies for matching purposes, all entities need to
      agree on the exact meaning of the concepts.

     Descriptive lightweight ontologies
      - used for defining the meaning of terms as well as the nature and
      structure of a domain.

      Classification lightweight ontologies
      - used for describing, classifying, and accessing collections of documents.
                    [Fausto et al.,2007]







      Controlled Vocabulary (CV)
     A vocabulary stores words, synonyms, word sense definitions (i.e.
      glosses), and relations between word senses and concepts; such a
      vocabulary is generally referred to as a Controlled Vocabulary (CV)
      if the choice or selection of terms is made by domain specialists [Ahsan et
      al., 2009]







            Controlled Vocabulary
      General controlled vocabulary:
       Example: Thesaurus, WordNet, Classification, Directories, Lightweight
       Ontologies

      Subject specific controlled vocabulary (SSCV)

      Library of Congress and Authors List
            Uniform List

            Series List






          Applications for managing
           controlled vocabularies

         Traditional Controlled Vocabulary tools
          Ex: Old Agrovoc Thesaurus



         Modern Controlled Vocabulary
          Ex: AGROVOC Concept Server







         AGROVOC Concept Server



                                                             -store concepts
                                                             -Edit concepts
                                                             -visualize the
                                                             concepts




          modern controlled vocabulary
          Ref: http://nais.cpe.ku.ac.th/agrovoc/



         Applications for exploiting
          controlled vocabularies

        Background Knowledge

        Document annotation

        Information retrieval and extraction

        Audio and Video retrieval







             Challenges of Matching

       Factors behind the heterogeneity problem

           Time
           Place
           Structure
           Cultural diversity
           Different vocabulary specialists







             Challenges of Matching

       Different kinds of heterogeneity

           Syntactic heterogeneity
           Lexical heterogeneity
           Semantic heterogeneity
           Pragmatic heterogeneity
           Metadata heterogeneity

                                  [Pavel, 2006 ]





                     Problem of CV






                                   FACET
       A facet is like a diamond that consists of different faces.

       Its distinct features allow thesauri, classifications or taxonomies to
        be organized in different ways.

       A facet is composed of collectively exhaustive aspects, properties or
        characteristics of a domain.
        For example, a collection of rice might be classified using cultural
        and seasonal facets.
           [Fausto et al., 2009] [Ahsan et al., 2009]
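
The rice example can be sketched in code: the same items are regrouped depending on which facet is chosen. This is only an illustration; the variety names, the facet values, and the helper `group_by_facet` are hypothetical placeholders, not AGROVOC data or code from the thesis.

```python
from collections import defaultdict

# Hypothetical rice varieties, each tagged with one value per facet.
RICE = {
    "variety A": {"seasonal": "wet season", "cultural": "irrigated"},
    "variety B": {"seasonal": "dry season", "cultural": "irrigated"},
    "variety C": {"seasonal": "wet season", "cultural": "upland"},
}

def group_by_facet(items: dict, facet: str) -> dict:
    """Organize the same items under the values of one chosen facet."""
    groups = defaultdict(list)
    for name, facets in items.items():
        groups[facets[facet]].append(name)
    return dict(groups)
```

Calling `group_by_facet(RICE, "seasonal")` and `group_by_facet(RICE, "cultural")` yields two different organizations of the same collection, which is the property the slide attributes to facets.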







     Faceted Controlled vocabulary







     Faceted Controlled vocabulary




   Seasonal rice type          Cultural rice type





                   Creation of a Facet
     Domain Analysis
        analysis of terms is done by consulting domain experts
        simple concepts are identified.

     Term collection and organization
        terms are ordered according to their characteristics and in a meaningful sequence
          ex: cow and milk form a facet called Dairy system (part-of relationship)
               [Fausto et al., 2009]







               Existing Methodologies

       PMEST : Personality(P), Matter(M), Energy(E), Space (S), and Time(T)
          [Ranganathan]

       DEPA : Discipline(D), Entity (E), Property (P), Action(A)
          [Bhattachary and Fausto et al., 2009 ]







                 Properties of facets

             Hospitality

             Compactness

             Flexibility

             Reusability

             The Methodology

             Homogeneity

           [Bhattachary and Fausto et al., 2009 ]






             Concept Facet Matcher

       Based on DEPA model

       CF = {mg, lg, R}, where mg is the set of more general concepts, lg is the
        set of less general concepts, and R is the set of related concepts.

       Based on Element Level Matchers

        [Ahsan, 2009 and Ahsan et al., 2009 ]







             Concept Facet Matcher

       Algorithm 1 buildCFacet(CV)
       for i = 0 to CV.size - 1 do
           store cF[i] = (Mg, Lg, R) for concept i
       end for
       return cF



       [Ahsan et al., 2009 ]
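
Algorithm 1 can be sketched in Python as follows. The sketch assumes, as an illustration only, that a vocabulary is a dictionary from concept label to its thesaurus links and that mg, lg and R correspond to BT, NT and RT links; the slides do not state this mapping, and the names `ConceptFacet`, `build_c_facet` and `toy_cv` are hypothetical.

```python
from typing import NamedTuple

class ConceptFacet(NamedTuple):
    concept: str
    mg: frozenset  # more general concepts (assumed here: BT links)
    lg: frozenset  # less general concepts (assumed here: NT links)
    r: frozenset   # related concepts (assumed here: RT links)

def build_c_facet(cv: dict) -> list:
    """Collect one (mg, lg, R) facet for every concept of the vocabulary."""
    facets = []
    for concept, links in cv.items():
        facets.append(ConceptFacet(
            concept,
            frozenset(links.get("BT", ())),
            frozenset(links.get("NT", ())),
            frozenset(links.get("RT", ())),
        ))
    return facets

# Toy vocabulary; the entries are illustrative, not real thesaurus records.
toy_cv = {
    "milk": {"BT": ["dairy products"], "NT": ["UHT milk"], "RT": ["cows"]},
    "cows": {"BT": ["cattle"], "RT": ["milk"]},
}
```

`build_c_facet(toy_cv)` returns one facet per concept; missing link types simply become empty sets.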







             Concept Facet Matcher

      Algorithm 2 MatchingFacet(CV1, CV2)
      cF1 = buildCFacet(CV1)
      cF2 = buildCFacet(CV2)
      for i = 0 to cF1.size - 1 do
          for j = 0 to cF2.size - 1 do
              cfmatcher = elementLevelMatcher(cF1[i], cF2[j])
          end for
      end for


      [Ahsan et al., 2009 ]
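
The pairwise loop of Algorithm 2 might look like this in Python. The slides do not say how the element-level matcher works or how mg, lg and R enter the comparison, so the matcher below is a stand-in that compares concept labels only, and its "partial" rule (at least one shared word) is an assumption made for the example.

```python
from itertools import product

def element_level_matcher(label1: str, label2: str) -> str:
    """Stand-in matcher: compares normalised concept labels only."""
    a, b = label1.lower().strip(), label2.lower().strip()
    if a == b:
        return "exact"
    if set(a.split()) & set(b.split()):
        return "partial"  # assumed rule: the labels share at least one word
    return "none"

def matching_facet(cf1: list, cf2: list) -> list:
    """Compare every facet of cf1 with every facet of cf2."""
    results = []
    for (c1, _mg1, _lg1, _r1), (c2, _mg2, _lg2, _r2) in product(cf1, cf2):
        results.append((c1, c2, element_level_matcher(c1, c2)))
    return results

# Hypothetical facets in the form (concept, mg, lg, R).
cf_a = [("milk", {"dairy products"}, {"UHT milk"}, {"cows"})]
cf_b = [
    ("Milk", {"milk products"}, set(), set()),
    ("milk powder", {"milk"}, set(), set()),
    ("rice", set(), set(), set()),
]
```

Because every facet of one vocabulary is compared with every facet of the other, the number of comparisons is the product of the two facet counts, which is why most pairs in a large run end up as no match.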






               System Architecture






                         Data Model




            Agrovoc database
            Ref: http://aims.fao.org/website/Download/sub




                     DATA Model




                     CABI database

                     Ref: http://cabi.org



               A Running Prototype
                                         Search String




                               Validators / domain
                               specialists


 An architecture for a semantic search







   Running Prototype for search




                               user’s choice







            Evaluation and Results




         A domain Expert
         Exact Match
         Partial Match
         No Match




                       Datasets
                         Comparison
     Characteristics         AGROVOC   CAB
     Tree leaves             29172     47805
     Term counts             18200     32884
     Single words            6842      11720
     Multi-words             11358     21161
     Hierarchy depth         7         14
     Multiple BT             2546      1207
     Redundant BT            57        76







                         Datasets


      AGROVOC version              2007-08-10    2007-08-10

      CABI version                 2009-11-01    2009-11-01
      AGROVOC term-leaves          35036         35036
      CABI term-leaves             29172         29172
      Conversion                   hierarchy     hierarchy
      Knowledge base               WordNet 2.1   SWN 400.000








                             Datasets

                                    Relationship
                     BT             NT             RT       UF
        AGROVOC      228466         228424         326389   54370
        CABI         15154          15841          41239    7094







                     Input files




   Agrovoc input file             CAB input file






                                Results
                          Facet-based approach
                                    Experiment 1   Experiment 2
          Exact Match               5976           6021
          Partial Match             164255         164278
          No Match                  69800745       69800745
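
To put these counts in proportion, the short calculation below totals each experiment and expresses every category as a percentage. It assumes that the three categories together cover all compared pairs, which the slide does not state explicitly; the names `results`, `shares` and `totals` are illustrative.

```python
# Counts copied from the facet-based results table above.
results = {
    "Experiment 1": {"exact": 5976, "partial": 164255, "none": 69800745},
    "Experiment 2": {"exact": 6021, "partial": 164278, "none": 69800745},
}

def shares(counts: dict) -> dict:
    """Return each category's share of the summed counts, in percent."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}

# Sum of the three categories per experiment.
totals = {name: sum(counts.values()) for name, counts in results.items()}
```

Under that assumption, exact matches make up well under one hundredth of a percent of the summed counts in both experiments, so nearly all compared pairs fall in the no-match category.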







                                Results
                                    Standard Tool
                                    Experiment 1    Experiment 2
          Exact Match               8795            8795
          Partial Match             334255          334258
          No Match                  N/A             N/A







                                  Results
                     Min           Max       Min       Max


        Overall      25.8065       31.4496   21.7391   21.7391


        Positive     18.6047       14.0814   10.4895   14.6154


        Negative     97.1831       52.1495   94.7368   99.1304






          Advantages of the Facet-based
                   System

        No knowledge base required

        Based on hidden semantics: semantic meaning is retrieved during
         processing







                               Limitations
     Structure Problems

         AGROVOC SQL Format and CABI Text Format
         Provided CABI file does not contain chemical and scientific concepts

     Term Variants
     In AGROVOC, we found "frog farms", which should have been "frog farming",
      because "frog farms" is used for "frog culture" and its BT is "aquaculture". Also, we
      found the abbreviated term "UHT milk" (one kind of milk product), which should
      have been "UHT milk".
     There were some ambiguous terms which had different meanings, for example
      "cutting" (i.e., slicing of bread or meat) or "cuttings" (i.e., propagation material).
     There were some term spellings whose meaning is too difficult to capture, for
      example “2.4.4-T”, “2.4.5-TP 2.4-D”, “2.4 DES”, “2.4 dinitrohenol”. Similarly, CABI
      contained the term “4-H Clubs”. These terms did not make sense during any
      mapping experiments.






                            Limitations

     Domain expert
     To evaluate our results, we were able to find one domain expert from
      FAO but we did not get any domain expert from CABI. The results may
      have been different if we had another domain expert.
     Lack of consistency
     Since the relationships in thesauri lack precise semantics, they are
      applied inconsistently, both creating ambiguity in the interpretation
      of the relationships and resulting in an overall internal structure that
      is irregular and unpredictable.







                            Limitations

     Limited automated processing
     Traditional thesauri are designed for indexing and query formulation by
      people and not for automated processing. The ambiguous semantics that
      characterizes many thesauri makes them unsuitable for automated
      processing.







                       Related Works
          [Fausto et al., 2004] apply element-level matching techniques
           for semantic matching
          [Stamou et al.] apply string matching techniques for ontology
           matching
          [Karin Koogan Breitman et al., 2005] apply string matching
           techniques for lightweight ontology matching
          [Paul Buitelaar et al., 2009] apply string matching for a linguistic
           matching system
          [Maria Teresa Pazienza et al., 2007] apply string matching for a
           semi-automatic matching system







       Conclusion and Future work
          To build the extended knowledge base







       Conclusion and Future work
      Integrating the mapping into the AGROVOC Concept Server







        Conclusion and Future work

       We have described the facet-based matching system for a large dataset.
       We have shown a running prototype for this system.

       The majority of this work was done under the supervision of the FAO and
        the CABI. At the moment, a prototype is running at the FAO.

       We will integrate this mapping file into the AGROVOC Concept Server
        for search purposes.







                           Questions






                          References
     [Fausto et al., 2003]: F. Giunchiglia and P. Shvaiko. Semantic matching.
      In Ontologies and Distributed Systems workshop, IJCAI, 2003.
     [Fausto et al., 2004]: F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-Match:
      An algorithm and an implementation of semantic matching.
      In Proceedings of ESWS’04, 2004.
     [Fausto et al., 2004]: F. Giunchiglia and M. Yatskevich. Element level
      semantic matching. In Meaning Coordination and Negotiation
      workshop, ISWC, 2004.
     [Pavel et al., 2006]: P. Shvaiko, F. Giunchiglia and M. Yatskevich.
      Discovering missing background knowledge in ontology matching. In
      17th European Conference on Artificial Intelligence (ECAI 2006),
      volume 141, pages 382-386, 2006.







                  References (cont)

     [Fausto et al., 2007]: F. Giunchiglia and I. Zaihrayeu. Lightweight
      ontologies. Technical report at DIT, University of Trento, Italy, October
      2007.
     [Pavel et al., 2007]: P. Shvaiko and J. Euzenat. Ontology matching.
      Springer, 1st edition, 2007.
     [S.R. Ranganathan]: S.R. Ranganathan. Elements of library classification.
      Asia Publishing House.






                  References (cont)
     [Fausto et al., 2009]: F. Giunchiglia, B. Dutta, and V. Maltese. Faceted
      lightweight ontologies. In LNCS, 2009.
     [Bhattachary 1979]: G. Bhattacharyya. POPSI: its fundamentals and
      procedure based on a general theory of subject indexing languages. In
      Library Science with a Slant to Documentation, volume 16, pages.
     [Pavel]: P. Shvaiko. Iterative schema-based semantic matching (PhD
      thesis), Technical report DIT-06-102, December 2006.

     [Morshed 2009]: A. Morshed and M. Sini. Aligning Controlled
      Vocabularies: Algorithm and Architecture, at Workshop on Advanced
      Technologies for Digital Libraries 2009 (AT4DL), Trento, Italy.
     [Morshed 2009]: A. Morshed, M. Sini and J. Keizer. Aligning
      Controlled Vocabularies using a facet based approach. (Technical Paper
      at FAO).






                        Thank You




Ahsan Morshed, FAO   54 / 54

More Related Content

Similar to Ph d thesis-ahsan_slidesv3

Similar to Ph d thesis-ahsan_slidesv3 (20)

Laboratory for applied ontology
Laboratory for applied ontologyLaboratory for applied ontology
Laboratory for applied ontology
 
The agricultural ontology service a proposal to create a knowledge organizati...
The agricultural ontology service a proposal to create a knowledge organizati...The agricultural ontology service a proposal to create a knowledge organizati...
The agricultural ontology service a proposal to create a knowledge organizati...
 
Semantic standards for the web
Semantic standards for the webSemantic standards for the web
Semantic standards for the web
 
Integrating controlled vocabularies in information management systems : the n...
Integrating controlled vocabularies in information management systems : the n...Integrating controlled vocabularies in information management systems : the n...
Integrating controlled vocabularies in information management systems : the n...
 
Developing an integrated thesaurus for the cornell genomics initiative digita...
Developing an integrated thesaurus for the cornell genomics initiative digita...Developing an integrated thesaurus for the cornell genomics initiative digita...
Developing an integrated thesaurus for the cornell genomics initiative digita...
 
Ontology development and use for efficient information input and retrieval
Ontology development and use for efficient information input and retrievalOntology development and use for efficient information input and retrieval
Ontology development and use for efficient information input and retrieval
 
Ontology development and use for efficient information input and retrieval
Ontology development and use for efficient information input and retrievalOntology development and use for efficient information input and retrieval
Ontology development and use for efficient information input and retrieval
 
RDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsRDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developments
 
The Agricultural Ontology Server (AOS)
The Agricultural Ontology Server (AOS)The Agricultural Ontology Server (AOS)
The Agricultural Ontology Server (AOS)
 
Thesaurus alignment for linked data publishing DC 2011
Thesaurus alignment for linked data publishing DC 2011Thesaurus alignment for linked data publishing DC 2011
Thesaurus alignment for linked data publishing DC 2011
 
A Collaborative Framework for Managing and Publishing KOS
A Collaborative  Framework for  Managing and Publishing KOS A Collaborative  Framework for  Managing and Publishing KOS
A Collaborative Framework for Managing and Publishing KOS
 
February 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdfFebruary 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdf
 
WORD RECOGNITION MASLP
WORD RECOGNITION MASLPWORD RECOGNITION MASLP
WORD RECOGNITION MASLP
 
Educational Multimedia Dictionary
Educational Multimedia DictionaryEducational Multimedia Dictionary
Educational Multimedia Dictionary
 
September 2021: Top10 Cited Articles in Natural Language Computing
September 2021: Top10 Cited Articles in Natural Language ComputingSeptember 2021: Top10 Cited Articles in Natural Language Computing
September 2021: Top10 Cited Articles in Natural Language Computing
 
AgriOcean DSpace
AgriOcean DSpace AgriOcean DSpace
AgriOcean DSpace
 
A statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfA statistical approach to term extraction.pdf
A statistical approach to term extraction.pdf
 
April 2022 - Top 10 cited articles.pdf
April 2022 - Top 10 cited articles.pdfApril 2022 - Top 10 cited articles.pdf
April 2022 - Top 10 cited articles.pdf
 
Recent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performancesRecent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performances
 
Hybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic textsHybrid approaches for automatic vowelization of arabic texts
Hybrid approaches for automatic vowelization of arabic texts
 

Ph d thesis-ahsan_slidesv3

  • 1. http://www.fao.org/aims/ Aligning Controlled vocabularies for enabling semantic matching in a distributed knowledge management system Ahsan Morshed Doctoral Candidate University of Trento ahsan.morshed@fao.org PhD Supervisor: Professor Fausto Giunchiglia fausto@dit.unitn.it Ahsan Morshed, FAO 1 / 54
  • 2. http://www.fao.org/aims/ Publications (1-3)  A. Morshed. Controlled Vocabulary Matching in Distributed Systems, at BNCOD 2009 Conference,UK.  A. Morshed and M. Sini. Aligning Controlled vocabularies: Algorithm and Architecture at Workshop on Advance Technologies for Digital Libraries 2009, AT4DL, Trento, Italy.  M. Sini, J. Keizer, G. Johannsen, A. Morshed, S. Rajbhandari and M. Amirhosseini.The AGROVOC Concept Server Workbench System: Empowering management of agricultural vocabularies with semantics at International Association of Agricultural Information Specialists (IAALD), France, 2010. Ahsan Morshed, FAO 2 / 54
  • 3. http://www.fao.org/aims/ Publications (4-6)  A. Morshed, G. Johanssen, J. Keizer and M. Zeng,. Bridging End Users’ Terms and AGROVOC Concept Server Vocabularies. International Conference on Dublin Core and Metadata Applications (DC-2010), Pittsburgh, USA, 2010 (submitted).  A. Morshed, M. Sini and J. Keizer. Aligning Controlled Vocabularies using a facet based approach. (Technical Paper at FAO).  A. Morshed and R. Singh. Evaluation and Ranking of Ontology Construction Tools (Technical Paper). Ahsan Morshed, FAO 3 / 54
  • 4. http://www.fao.org/aims/ Agenda  Background: the role of controlled vocabulary in semantic matching  The overall goal: Aligning Controlled Vocabularies in a distributed system  A facet based matching  An Architecture for matching system  A running prototype for matching system  Evaluation Methodology and Results  Limitations and Related Works  Conclusions and Future work Ahsan Morshed, FAO 4 / 54
  • 5. http://www.fao.org/aims/ Some matching techniques  Element Matching techniques ex: edit distance  Corpus-based techniques ex: token or extension of classes  Structure-based tecniques ex: graph matching  Knowledge-based techniques ex: external resources Ahsan Morshed, FAO 5 / 54
  • 6. http://www.fao.org/aims/ Some matching systems  Cupid - element level and structure level matching  RiMOM - based on edit distance and Vector distance  FALCON-AO - based on Linguistic and structure matching  CTXMatch, S-match -based on knowledge based Ahsan Morshed, FAO 6 / 54
  • 7. http://www.fao.org/aims/ Some matching projects  HILT (High Level Thesaurus Project) -JISC funded project, UK -to facilitate the cross-searching of distributed information services by subject in a multi-schema environment. -used datasets (e.g.,DDC,LCSH, IPSV, AAT)  CAT to AGROVOC  Dr. Chan chung  64,638 Chinese terms, 51,614 descriptors and 13,024 non- descriptors  13,105 exact matches,11,408 BT match, 173 NT match, and 17,47 other matches Ahsan Morshed, FAO 7 / 54
  • 8. http://www.fao.org/aims/ Some matching project  OAEI 2007 (Ontology Alignment Evaluation Initiative) - Food Track - AGROVOC-NALT thesauri System Alignment Alignment Type Falcon-AO 15,300 exactMatch RiMOM 18,420 exactMatch X-SOM 6,583 exactMatch DSSim 14,962 exactMatch [Willem , 2008] Ahsan Morshed, FAO 8 / 54
  • 9. http://www.fao.org/aims/ Matching in Distributed System  Edutella  Edutella is an open source project that creates an infrastructure for sharing metadata in RDF format  It applies the peer-to-peer model using the JXTA protocol  Swap  aims at overcoming the lack of semantics in current Peer-to-Peer system Ahsan Morshed, FAO 9 / 54
  • 10. http://www.fao.org/aims/ Semantic Matching in Lighweight ontologies  To use of lightweight ontologies for matching purpose, all entities need to agree on the exact meaning of the concepts.  Descriptive lightweight ontologies -used for defining the meaning of terms as well the nature and structure of a domain.  Classification lightweight ontologies -used for describing, classifying, and accessing collection of document. [Fausto et al.,2007] Ahsan Morshed, FAO 10 / 54
  • 11. http://www.fao.org/aims/ Controlled Vocabulary (CV)  A vocabulary stores words, synonyms, word sense definitions (i.e. glosses), relations between word senses and concepts; such a vocabulary is generally referred to as the Controlled Vocabulary (CV) if choice or selection of terms are done by domain specialists [ahsan et al.,2009] Ahsan Morshed, FAO 11 / 54
  • 12. http://www.fao.org/aims/ Controlled Vocabulary  General controlled vocabulary: Example: Thesaurus, WordNet, Classification, Directories, Lightweight Ontologies  Subject specific controlled vocabulary (SSCV)  Library of Congress and Authors List  Uniform List  Series List Ahsan Morshed, FAO 12 / 54
  • 13. http://www.fao.org/aims/ Applications for managing controlled vocabularies  Traditional Controlled Vocabulary tools Ex: Old Agrovoc Thesaurus  Modern Controlled Vocabulary Ex: AGROVOC Concept Server Ahsan Morshed, FAO 13 / 54
  • 14. http://www.fao.org/aims/ AGROVOC Concept Server -store concepts -Edit concepts -visualize the concepts modern controlled vocabulary Ref: http://nais.cpe.ku.ac.th/agrovoc/ Ahsan Morshed, FAO 14 / 54
  • 15. http://www.fao.org/aims/ Applications for exploiting controlled vocabularies  Background Knowledge  Document annotation  Information retrieval and extraction  Audio and Video retrieval Ahsan Morshed, FAO 15 / 54
  • 16. http://www.fao.org/aims/ Challenges of Matching  Factors of heterogeneity problem  Time  Place  Structure  Culture diversity  Different vocabulary specialists Ahsan Morshed, FAO 16 / 54
  • 17. http://www.fao.org/aims/ Challenges of Matching  Different heterogeneity  Syntactic heterogeneity  Lexical heterogeneity  Semantic heterogeneity  Pragmatic heterogeneity  Metadata heterogeneity [Pavel, 2006 ] Ahsan Morshed, FAO 17 / 54
  • 18. http://www.fao.org/aims/ Problem of CV Ahsan Morshed, FAO 18 / 54
  • 19. http://www.fao.org/aims/ FACET  A facet is like a diamond that consists of different faces.  Its distinct features allow thesauri, classifications or taxonomies to be organized in different ways.  composed of collectively exhaustive aspects of properties or characteristics of a domain. For example, a collection of rice might be classified using cultural and seasonal facets. [Fausto et al.,2009] [ahsan et al.,2009] Ahsan Morshed, FAO 19 / 54
  • 20. http://www.fao.org/aims/ Faceted Controlled vocabulary Ahsan Morshed, FAO 20 / 54
  • 21. http://www.fao.org/aims/ Faceted Controlled vocabulary Seasonal rice type Cultural rice type Ahsan Morshed, FAO 21 / 54
  • 22. http://www.fao.org/aims/ Creation of a Facet  Domain Analysis  analysis of terms are done by consulting domain experts  simple concept are identified.  Term collections and organization  terms are order according to their characteristic and meaningful sequence ex: cow and milk form a facet called Diary system(part of relationship) [Fausto et al., 2009] Ahsan Morshed, FAO 22 / 54
  • 23. Existing Methodologies
      - PMEST: Personality (P), Matter (M), Energy (E), Space (S), and Time (T) [Ranganathan]
      - DEPA: Discipline (D), Entity (E), Property (P), Action (A) [Bhattachary and Fausto et al., 2009]
  • 24. Properties of Facets
      - Hospitality
      - Compactness
      - Flexibility
      - Reusability
      - The Methodology
      - Homogeneity
      [Bhattachary and Fausto et al., 2009]
  • 25. Concept Facet Matcher
      - Based on the DEPA model
      - CF = {mg, lg, R}, where mg is the set of more general concepts, lg is the set of less general concepts, and R is the set of related concepts.
      - Based on element-level matchers
      [Ahsan, 2009 and Ahsan et al., 2009]
  • 26. Concept Facet Matcher
      Algorithm 1: buildCFacet(CV)
          for i = 0 to CV.size do
              store cF(mg, lg, R)
          end for
          return cF
      [Ahsan et al., 2009]
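Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the input format (a list of `(term, relation, target)` triples using the thesaurus relations BT, NT and RT) and the names `ConceptFacet` and `build_c_facet` are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ConceptFacet:
    """Facet of one concept: CF = {mg, lg, R}."""
    mg: set = field(default_factory=set)  # more general concepts (BT)
    lg: set = field(default_factory=set)  # less general concepts (NT)
    r: set = field(default_factory=set)   # related concepts (RT)


def build_c_facet(cv):
    """Build one concept facet per term of a controlled vocabulary.

    `cv` is assumed to be an iterable of (term, relation, target) triples,
    where relation is "BT", "NT" or "RT"; other relations are ignored.
    """
    slot = {"BT": "mg", "NT": "lg", "RT": "r"}
    facets = defaultdict(ConceptFacet)
    for term, relation, target in cv:
        if relation in slot:
            getattr(facets[term], slot[relation]).add(target)
    return dict(facets)


# Toy vocabulary; the triples are illustrative, not taken from AGROVOC.
toy_cv = [
    ("rice", "BT", "cereals"),
    ("rice", "NT", "upland rice"),
    ("rice", "RT", "paddy"),
]
facets = build_c_facet(toy_cv)
```

Each term ends up with its own facet, so a later matching step can compare a term together with its broader, narrower and related concepts instead of the bare label alone.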
  • 27. Concept Facet Matcher
      Algorithm 2: MatchingFacet(CV1, CV2)
          cF1 = buildCFacet(CV1)
          cF2 = buildCFacet(CV2)
          for i = 0 to cF1.size do
              for j = 0 to cF2.size do
                  cfmatcher = elementLevelMatcher(cF1[i], cF2[j])
              end for
          end for
      [Ahsan et al., 2009]
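The pairwise loop of Algorithm 2 can be sketched as below. The slides do not specify how the element-level matcher works, so `element_level_matcher` here is a hypothetical stand-in: it reports an exact match when the normalized labels are equal, a partial match when the labels share a word or the facets share a concept, and no match otherwise. The dictionary layout of the facets and the toy terms are also assumptions.

```python
def element_level_matcher(term1, facet1, term2, facet2):
    """Hypothetical stand-in for the element-level matcher."""
    a, b = term1.lower().strip(), term2.lower().strip()
    if a == b:
        return "exact"
    shared_words = set(a.split()) & set(b.split())
    shared_context = any(facet1[k] & facet2[k] for k in ("mg", "lg", "r"))
    if shared_words or shared_context:
        return "partial"
    return "none"


def matching_facet(cf1, cf2):
    """Compare every concept facet of cf1 with every concept facet of cf2."""
    results = {}
    for term1, facet1 in cf1.items():
        for term2, facet2 in cf2.items():
            results[(term1, term2)] = element_level_matcher(
                term1, facet1, term2, facet2
            )
    return results


def facet(mg=(), lg=(), r=()):
    """Small helper that builds a facet as a dict of sets."""
    return {"mg": set(mg), "lg": set(lg), "r": set(r)}


# Toy facets; illustrative only.
cf1 = {"Rice": facet(mg=["cereals"]), "frog farming": facet(mg=["aquaculture"])}
cf2 = {"rice": facet(mg=["cereals"]), "frog culture": facet(mg=["aquaculture"]),
       "milk": facet(mg=["dairy products"])}
results = matching_facet(cf1, cf2)
```

Because the nested loop compares every pair of terms, the number of comparisons grows with the product of the two vocabulary sizes.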
  • 28. System Architecture
  • 29. Data Model: AGROVOC database. Ref: http://aims.fao.org/website/Download/sub
  • 30. Data Model: CABI database. Ref: http://cabi.org
  • 31. A Running Prototype (figure labels: Search string; Validators / domain specialists)
  • 32. An architecture for a semantic search
  • 33. Running Prototype for search (figure label: user's choice)
  • 34. Evaluation and Results
      - A domain expert
          - Exact match
          - Partial match
          - No match
  • 35. Datasets Comparison
      Characteristics    AGROVOC    CAB
      Tree leaves        29172      47805
      Term counts        18200      32884
      Single words       6842       11720
      Multi-words        11358      21161
      Hierarchy depth    7          14
      Multiple BT        2546       1207
      Redundant BT       57         76
  • 36. Datasets
      AGROVOC version        2007-08-10     2007-08-10
      CABI version           2009-11-01     2009-11-01
      AGROVOC term-leaves    35036          35036
      CABI term-leaves       29172          29172
      Conversion             hierarchy      hierarchy
      Knowledge base         WordNet 2.1    SWN 400.000
  • 37. Datasets
      Relationship    BT        NT        RT        UF
      AGROVOC         228466    228424    326389    54370
      CABI            15154     15841     41239     7094
  • 38. Input files (figures: AGROVOC input file; CAB input file)
  • 39. Results
      Facet-based approach    Experiment 1    Experiment 2
      Exact Match             5976            6021
      Partial Match           164255          164278
      No Match                69800745        69800745
  • 40. Results
      Standard tool    Experiment 1    Experiment 2
      Exact Match      8795            8795
      Partial Match    334255          334258
      No Match         N/A             N/A
  • 41. Results
                  Min        Max        Min        Max
      Overall     25.8065    31.4496    21.7391    21.7391
      Positive    18.6047    14.0814    10.4895    14.6154
      Negative    97.1831    52.1495    94.7368    99.1304
  • 42. Advantages of the Facet-based System
      - No knowledge base is required.
      - It is based on hidden semantics: semantic meaning is retrieved during processing.
  • 43. Limitations
      - Structure problems
          - AGROVOC is provided in SQL format and CABI in text format.
          - The provided CABI file does not contain chemical and scientific concepts.
      - Term variants
          - In AGROVOC, we found “frog farms”, which should have been “frog farming”, because “frog farms” is used for “frog culture” and its BT is “aquaculture”. We also found the abbreviated term “UHT milk” (one kind of milk product), which should have been “UHT milk”.
          - Some terms were ambiguous and had different meanings, for example “cutting” (i.e., slicing of bread or meat) and “cuttings” (i.e., propagation material).
          - Some terms had spellings whose meaning is too difficult to capture, for example “2.4.4-T”, “2.4.5-TP 2.4-D”, “2.4 DES”, “2.4 dinitrohenol”. Similarly, CABI contained the term “4-H Clubs”. These terms did not make sense during any mapping experiment.
  • 44. Limitations
      - Domain expert
          - To evaluate our results, we were able to find one domain expert from FAO, but we did not get any domain expert from CABI. The results might have been different with another domain expert.
      - Lack of consistency
          - Since the relationships in thesauri lack precise semantics, they are applied inconsistently, which both creates ambiguity in the interpretation of the relationships and results in an overall internal structure that is irregular and unpredictable.
  • 45. Limitations
      - Limited automated processing
          - Traditional thesauri are designed for indexing and query formulation by people, not for automated processing. The ambiguous semantics that characterize many thesauri make them unsuitable for automated processing.
  • 46. Related Work
      - [Fausto et al., 2004] apply element-level matching techniques for semantic matching.
      - [Stamou et al.] apply string matching techniques for ontology matching.
      - [Karin Koogan Breitman et al., 2005] apply string matching techniques for lightweight ontology matching.
      - [Paul Buitelaar et al., 2009] apply string matching in a linguistic matching system.
      - [Maria Teresa Pazienza et al., 2007] apply string matching in a semi-automatic matching system.
  • 47. Conclusion and Future Work
      - To build the extended knowledge base
  • 48. Conclusion and Future Work
      - Integrating the mapping into the AGROVOC Concept Server
  • 49. Conclusion and Future Work
      - We have described a facet-based matching system for a large dataset.
      - We have shown a running prototype of this system.
      - The majority of this work was done under the supervision of the FAO and CABI. At the moment, a prototype is running at the FAO.
      - We will integrate this mapping file into the AGROVOC Concept Server for search purposes.
  • 50. Questions
  • 51. References
      - [Fausto et al., 2003]: F. Giunchiglia and P. Shvaiko. Semantic matching. In Ontologies and Distributed Systems workshop, IJCAI, 2003.
      - [Fausto et al., 2004]: F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-Match: An algorithm and an implementation of semantic matching. In Proceedings of ESWS’04, 2004.
      - [Fausto et al., 2004]: F. Giunchiglia and M. Yatskevich. Element level semantic matching. In Meaning Coordination and Negotiation workshop, ISWC, 2004.
      - [Pavel et al., 2006]: P. Shvaiko, F. Giunchiglia and M. Yatskevich. Discovering missing background knowledge in ontology matching. In 17th European Conference on Artificial Intelligence (ECAI 2006), volume 141, pages 382-386, 2006.
  • 52. References (cont.)
      - [Fausto et al., 2007]: F. Giunchiglia and I. Zaihrayeu. Lightweight ontologies. Technical report at DIT, University of Trento, Italy, October 2007.
      - [Pavel et al., 2007]: P. Shvaiko and J. Euzenat. Ontology matching. Springer, 1st edition, 2007.
      - [S.R. Ranganathan]: S.R. Ranganathan. Elements of library classification. Asia Publishing House.
  • 53. References (cont.)
      - [Fausto et al., 2009]: F. Giunchiglia, B. Dutta, and V. Maltese. Faceted lightweight ontologies. In LNCS, 2009.
      - [Bhattachary 1979]: G. Bhattachary. POPSI: its fundamentals and procedure based on a general theory of subject indexing language. In Library Science with a Slant to Documentation, volume 16, pages.
      - [Pavel]: P. Shvaiko. Iterative schema-based semantic matching (PhD thesis). Technical report DIT-06-102, December 2006.
      - [Morshed 2009]: A. Morshed and M. Sini. Aligning Controlled vocabularies: Algorithm and Architecture. At Workshop on Advance Technologies for Digital Libraries 2009, AT4DL, Trento, Italy.
      - [Morshed 2009]: A. Morshed, M. Sini and J. Keizer. Aligning Controlled Vocabularies using a facet based approach. (Technical Paper at FAO).
  • 54. Thank You