SlideShare a Scribd company logo
1 of 60
Lecture Notes
                                                  Birzeit University, Palestine
                                                                          2012




Advanced Topics in Ontology Engineering


        Arabic Ontology



            Dr. Mustafa Jarrar
          Sina Institute, University of Birzeit
                  mjarrar@birzeit.edu
                     www.jarrar.info


                       Jarrar © 2012                                          1
Reading


Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper) . In proceedings of
the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League.
Tunis, July 26-28, 2011.Article http://www.jarrar.info/publications/J11.pdf.htm
Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html


Mustafa Jarrar: Towards The Notion Of Gloss, And The Adoption Of Linguistic
Resources In Formal Ontology Engineering. In proceedings of the 15th International World
Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN:
1595933239. May 2006.http://www.jarrar.info/publications/J06.pdf.htm




                                         Jarrar © 2012                                     2
Arabic Ontology Team (Current)




 Fateh Badran                     Collaborators
 Maather Alkqam                   Jamal Daher
 Rabaa Yusuf
                                  Prof. Mahi Arrar
 Razan Marwan
 Haneen Somoom
 Naser Musleh
                  Jarrar © 2012                      3
The Arabic Ontology Project
                                                         http://sites.birzeit.edu/comp/ArabicOntology


       •   A project started in 2010, at Birzeit University, Palestine.
       •   The ArabicOntology is more than an Arabic WordNet
       •   Unlike WordNet, the ArabicOntology is logically and philosophically well-
           founded, as it follows strict ontological principles.  but can be used an
           Arabic WordNet.




 The project is partially funded
  (Seed funding) by Birzeit
  University (VP academic
  Office, Research Committee).




                                         Jarrar © 2012                                            4
Arabic Ontology: Data Model (Simplified)
     • ConceptID (as a synsetID in WordNet) to identify a concept.
     • Polysemy and synonymy: like in WordNet, several words (i.e., lexical
       units) can be used to lexicalize one concept (synonymy); and one word
       might be used to lexicalize several concepts.




                                         Lexical Unit




Gloss: describes a concept



                                                       Semantic Relations
Concept ID: concept unique reference

                                       Jarrar © 2012                        5
Lexical vs. Semantic Relationships
     • Semantic relations are relationships between concepts (not
       words), e.g., subtype, part-of, etc.
     • Lexical relations are relationships between words (not
       concepts), e.g., synonym-of, root-of, abbreviation-of, etc.
     • Ontologies are mainly concerned with semantic relations.

                                       Lexical Unit




Gloss: describes a concept
                                                  Semantic Relations



Concept ID: concept unique reference
                                       Jarrar © 2012                   6
Arabic Ontology

• Arabic Ontology: the set of concepts (of all Arabic terms), and the
  semantic (not lexical) relationships between these concepts.
• To build an Arabic Ontology: Identify the set of concepts for every
  Arabic word (Polysemy), and define semantic relations between these
  concepts.
• Most important relation is the subtype relation,
    which leads to a (tree of concepts) .




                              Jarrar © 2012                             7
Arabic Ontology: Subtype Relationships
    • Subtype relation: is a mathematical relations (subset: A B ), such
      that every instance in A must also be an instance of B.
    • Inheritance: subtypes inherit all properties of their super types.
    • “Hyponymy” in WordNet is close to (but not the same as) the subtype relation.
    • “General-Specific” relations, as in thesauri, are not subtype relations.




           world
      . . . ..           .
        .     .      .
   . . .. . .. . . .
. . . 10 . . 3 . . .
          . 6 .            . .
 .     . . . .. . 4 . .. .
  . . . . .. . . . . .
    .       .
         . . . .. .
                                       Jarrar © 2012                                  8
Arabic Ontology: Subtype Relationships

• It is recommended to use proper subtypes, as it is more strict.
• That is, A and B are never equal, B is always a super set of A.
• It is recommended to classify concepts based on “rigidity”.
• For example it is wrong to say that a „WorkTable‟ is type of „Table‟.
  as being a work table is a non-rigid property.
• As such, subtypes form a tree.




                                  Jarrar © 2012                           9
Arabic Ontology: Core (Top Levels).

     Arabic Core Ontology: the top levels of the Arabic Ontology, - built
     manually based on DOLCE and SUMO upper level ontologies, and
     taking into account, carefully, the philosophical and historical aspects of
     the Arabic conceptsterms.
          Top 3 levels shown here, for simplicity




                                                                     10 levels, 550 concepts
• The 10th level of this core ontology should top all Arabic concepts and levels.
• This allow us to detect any problems in the tree/relations!
• The core Ontology governs the correctness and the evolution of the whole
  Arabic Ontology.
                                                    Jarrar © 2012                              10
Arabic Ontology: Glosses
               according to strict ontological guidelines[J06]

     A gloss: is an auxiliary informal (but controlled) account of the intended
     meaning of a linguistic term, for the commonsense perception of humans.




A gloss is supposed to render factual knowledge that is critical to understand a concept, but that
e.g. is implausible, unreasonable, or very difficult to formalize and/or articulate explicitly. (NOT) to
catalogue general information and comments, as e.g. conventional dictionaries and encyclopedias
usually do, or as <rdfs:comment>.               Jarrar © 2012                                          11
Arabic Ontology: Gloss Guidelines

What should and what should not be provided in a gloss:
1. Start with the principal/super type of the concept being defined.
   E.g. „Search engine‟: “A computer program that …”, „Invoice‟: “A business document
         that…”, „University‟: “An institution of …”.


3. Focus on distinguishing characteristics and intrinsic prosperities that
   differentiate the concept out of other concepts.
   E.g. Compare, „Laptop computer‟:
   “A computer that is designed to do pretty much anything a desktop computer can do, it runs for a
        short time (usually two to five hours) on batteries”.
   “A portable computer small enough to use in your lap…”.


2. Written in a form of propositions, offering the reader inferential
  knowledge that help him to construct the image of the concept.
   E.g. Compare „Search engine‟:
   “A computer program for searching the internet, it can be defined as one of the most useful aspects
        of the World Wide Web. Some of the major ones are Google, ….”;
   A computer program that enables users to search and retrieves documents or data from a database
        or from a computer network…”.
                                        Jarrar © 2012                                               12
Arabic Ontology: Gloss Guidelines

4. Use supportive examples :
  - To clarify cases that are commonly known to be false but they are true, or
    that are known to be true but they are false;

  - To strengthen and illustrate distinguishing characteristics (e.g. define by
    examples, counter-examples).

  Examples can be types and/or instances of the concept being defined.


5. Be consistent with formal definitions/axioms.


6. Be sufficient, clear, and easy to understand.


 WordNet glosses do not follow such ontological guidelines

                                 Jarrar © 2012                                    13
Arabic Ontology: Gloss Guidelines

As a gloss starts with a supertype of concept being defined, try to read
the gloss as the following, to verify what you do is correct:




                               Jarrar © 2012                               14
ArabicOntology Vs WordNet

Unlike WordNet, the Arabic Ontology is:

    1. Philosophically well founded:
        • Focuses on intrinsic properties;
        • All types are rigid;
        • The top level is derived from known Top Level Ontologies.

    2. Strictly formal:
         • Semantic relations are well-defined mathematical relations.

    3. Strictly-controlled glosses
        • The content and structure of the glosses is strictly based on
           ontological principles.




                                Jarrar © 2012                             15
Our Approach to Building the
      ArabicOntology
Roughly:

Step1:
 Mine Arabic concepts/glosses from dictionaries.

Step 2:
 Automatically map between these Arabic concepts and WordNet
 concepts, thus inherit semantic relations from WordNet.

Step 3:
 Link all concepts with the Arabic Core Ontology.

Step 4:
Re-formulate these glosses, according to strict ontological guidelines.




                               Jarrar © 2012                              16
Step1-Mining Arabic Concepts from
     Dictionaries
• Collect as much glosses/concepts as possible from specialized and
  general dictionaries.
• Manual extraction from dictionaries, then basic cleaning done
  automatically.



              Mining
              concepts



  • 75k glosses ready.
  • We have ~100 students typing dictionaries now!
  • +150K more glosses (expected this year)


                             Jarrar © 2012                            17
Step1-Mining Arabic Concepts from
     Dictionaries
• Collect as much glosses/concepts as possible from specialized and
  general dictionaries.
• Manual extraction from dictionaries, then basic cleaning done
  automatically.



              Mining
              concepts




                             Jarrar © 2012                            18
Step1-Mining Arabic Concepts from
      Dictionaries
• Most Arabic dictionaries are not useful, but some are a good start.
                               The dictionaries we need should:
                                  Focus on the semantic aspects.
                                  Multiple meanings are not mixed up.
                                  Structure of quality of the meaning.


               Mining
               concepts




                               Jarrar © 2012                            19
Examples (Good & Bad Resources)

    Wiktionary 
                                 
        
                                       
                       
    


                   
                                           

                        




                       Jarrar © 2012           20
The Matching Function is used for:

 1- Based on the previous mapping, we can inherit Semantic Relations
    from WordNet.
                       Arabic Concepts                    WordNet Concepts

                                 A
                                                                    L
                                     H
                        J                                       B
                                                                    R
                                 C                          D              Q


 Remark: This is only a good start, as these inherited relations need to be
   cleaned using the Arabic Top Levels, and using the OnToClean Methodology.

2- Same function is used to detect redundant concepts, within the
   Arabic Ontology itself.
                                  Jarrar © 2012                                21
Step2: Map Arabic concepts to WordNet
                   (Matching Function)


We developed a smart algorithm, such that:
     Input: (Arabic gloss, 117k English glosses in WordNet).
   Output: (best match, rank)
 Accuracy: +90% (being improved)
                                               WordNet (English)
                                              The territory occupied by one of the constituent
                                              administrative districts of a nation
                                               The way something is with respect to its main
                                              attributes
                                               The group of people comprising the government
                                              of a sovereign state
                                               A politically organized body of people under a
                                              single government
                                               A compilation of the known facts regarding
                                              something or someone
                                                                    ….




                   A politically organized
                   body of people under a
                   single government
                              Jarrar © 2012                                                      22
Step 3: Link concepts with the Arabic Core
     Ontology
Each Arabic concept (from previous steps) is mapped to a concept in the
10th level.
That is, the 10th level of this core ontology should top all Arabic concepts
and levels, so to enable automatic detection of problems in the hierarchy.

        Top 3 levels shown here, for simplicity




                                                     A        C
                                          J
                                                                  J
                                                  Jarrar © 2012           23
Automatic Detection of Inconsistencies
 Subtype links from Arabic concepts to the core ontology (done manually)
Subtypes links between Arabic concepts (derived via the mappings to WordNet)
Now we can automatically detect whether the links                         are correct?
 If (J A) and (A                  ) then it‟s most likely true that (J
       ) , thus no need to have (J               ).
 However, as H and C don‟t share a supertype, (H C) is likely incorrect.
       Top 3 levels shown here, for simplicity




                                                 X!
                                                      A      C   X!
                                      J
                                                 Jarrar © 2012
                                                                      H                  24
Step 4- Re-Formulate Glosses,
      according to strict ontological guidelines[J06]




Glosses are re-formulated semi-manually, to meet our strict rules.

Gloss-cleaning can be done automatically to a certain point.

 While the manual-cleaning (=re-formulating) glosses, mistakes in
  subtype relation can be detected.




                               Jarrar © 2012                         25
Outline

  • Arabic Ontology Project

  • Design principles

  • Methodology and progress

  • Ongoing Research

     • Matching Function

     • The Top levels

     • Finding Arabic Synonyms


                  Jarrar © 2012   26
Why the Matching Function

Arabic concepts                                                     WordNet:
without subtype    Arabic Concepts            WordNet Concepts      English concepts
links   between                                                     with      subtype
them                                                                relations between
                             A                                      them
                                                          L
                                 H
                    J                                 B
                                                          R
                             C                   D            Q



    Assumption: If we could know that the (English concept B = Arabic concept A)
    and (English concept D = Arabic concept C), and (D SubtypeOf B) then we can
    automatically deduce that (C SubtypeOf A).

    Research Problem: given an Arabic concept, automatically find its equivalent
    English concept (if exists).

                                      Jarrar © 2012                                27
The Matching Function
     (Map Arabic concepts to WordNet)

A smart algorithm that maps a gloss of a concept written in Arabic into
its English equivalent in WordNet, such that:
       Input: (Arabic gloss, 117k English glosses in WordNet).
     Output: (best match, rank)

                                                WordNet (English)
                                               The territory occupied by one of the constituent
                                               administrative districts of a nation
                                                The way something is with respect to its main
                                               attributes
                                                The group of people comprising the government
                                               of a sovereign state
                                                A politically organized body of people under a
                                               single government
                                                A compilation of the known facts regarding
                                               something or someone
                                                                     ….




                   Country
                    A politically organized
                    body of people under a
                    single government
                               Jarrar © 2012                                                      28
Matching Function: Main Steps


Translate the Arabic gloss into English (using Google or Bing)



Find the minimal set of English Concepts(Synsets) that contains
the best match for an Arabic concept (i.e. Search Domain).


Rank the concepts in the search domain (according to how
much they are close to the Arabic Gloss)



 The best match of an Arabic concept is the English concept
  with the highest score.

                           Jarrar © 2012                          29
Translating Arabic Glosses into English

                                      A country with defined
                                      borders, the people and
                                      Government and institutions


 This step is done easily, as Google and Bing provide APIs for
  automatic translation.

 The translation is not always good, but satisfactory. It depends on
  the quality of the gloss being translated.




                              Jarrar © 2012                             30
Determine the Search Domain

  Matching the translated gloss into an English gloss among (117k gloss)
    is:
         Time consuming.
         The algorithm might be misled as the many English glosses.

  Instead, Find the minimal set of English Concepts(Synsets) that
      contains the best match for an Arabic concept (i.e. Search Domain).

   Translate the word “ ” into English using the Google and Bing
  {state, country, nation, polity, land } =


Now, Search Domain = the set of all glosses of this set words + two
levels up and to levels down.




                                Jarrar © 2012                           31
Determine the Search Domain
                                                 Search Domain
                           The territory occupied by one of the constituent
                           administrative districts of a nation




                                                                                           (~2000 glosses)
                            The way something is with respect to its main attributes
        English Terms       The group of people comprising the government of a sovereign
                           state
        country            A politically organized body of people under a single
                           government
        state               A compilation of the known facts regarding something or
                           someone
        land
                            Put before
        nation
                            A state of depression or agitation
        polity
                            The territory occupied by a nation
        earth
                            Express in words


                                                          …

 Our translated target gloss should be in in this search
  domain
                              Jarrar © 2012                                                32
Determine the Search Domain
                                                          Search Domain
                                     The territory occupied by one of the constituent
                                     administrative districts of a nation




                                                                                                     (~2000 glosses)
                                     The way something is with respect to its main attributes
              English Terms           The group of people comprising the government of a sovereign
                                     state
              country                A politically organized body of people under a single
                                     government
              state                   A compilation of the known facts regarding something or
                                     someone
              land
                                     Put before
              nation
                                     A state of depression or agitation
              polity
                                     The territory occupied by a nation
              earth
                                     Express in words



• How many levels to go for?
                                                                   …
    – Too many levels increases the number of irrelevant synsets.
    – Too few levels lowers the possibility of finding the best match.
• We need to find the minimal set of synsets that contains the best match
  for the Arabic concept.       Jarrar © 2012                             33
The Ranking Step

                                                Search Domain
                          The territory occupied by one of the constituent
                          administrative districts of a nation




                                                                                          (~2000 glosses)
                           The way something is with respect to its main attributes
        English Terms      The group of people comprising the government of a sovereign
                          state
        country           A politically organized body of people under a single
                          government
        state              A compilation of the known facts regarding something or
                          someone
        land
                           Put before
        nation
                           A state of depression or agitation
        polity
                           The territory occupied by a nation
        earth
                           Express in words


                                                         …
Rank the concepts in the search domain (according to how
much they are close to the Arabic Gloss)

                             Jarrar © 2012                                                34
The Ranking Step

Rank the concepts in the search domain (according to how
much they are close to the Arabic Gloss)
      The ranking step is divided into several steps
 Convert the translated gloss into an Array of words: original
 words, their Synonyms, super/subtypes.

 Give a weight for each word depending on its type. (Synonym,
 direct Subtype, direct Supertype, second level Supertype)

 Match the words in each concept in the search domain with the
 word in the array.

 For each hit, add the weight of the word to the synset score.
    The synset with the highest score is the best match.
                              Jarrar © 2012                      35
Convert the translated gloss into an Array of words
                                                 Array
      A country with defined                     country
      borders, the people                        defined
                                                                    1-m
      and Government and                                            Original words
                                                 …
      institutions                               Body politic
                                                                    m-n
                                                 Commonwheath
                                                                    Synonyms of
                   Array                         …                  Original words
  Stop words
                   country        1              People
  are removed                                                       n-o
                   defined                       Political entity
                                                                    subtypes of
                                                 …                  Original words
                   borders
                                                 Asian country
                   people                                           o-p
                                                 City state
                   Government                                       subtypes of
                                                 …                  Original words
                   institutions   m              countries
                                                 defines            p-q
The array 1-q might be 1500 words or             defining           stems of all
  more. the idea is to have the relevant         …                  words (1-p)
  words that one may to describe the
  concept being matched.         Jarrar © 2012                                 36
Convert the translated gloss into an Array of words
                                                Array
                                                country
                                                                   1-m
                                                defined
                                                                   Original words
                                                …                  weight=a
                                                Body politic
Give a weight for each word depending on        Commonwheath
                                                                   m-n
its type. (Synonym, direct Subtype, direct                         Synonyms of
                                                …                  Original words
Supertype, second level Supertype)              People             weight=b
                                                                   n-o
                                                Political entity
                                                                   subtypes of
                                                …                  Original words
                                                Asian country       weight=c
                                                                   o-p
                                                City state
                                                                   subtypes of
                                                …                  Original words
                                                countries          weight=d
                                                defines            p-q
                                                defining           stems of all
                                                …                  words (1-p)
                                                                   weight=e

                                Jarrar © 2012                                 37
Convert the translated gloss into an Array of words
                                                                                        Array
              Match the words in each concept in the                                    country
              search domain with the word in the array.                                 defined
                                                                                                           1-m
                                                                                                           Original words
                                                                                        …                  weight=a
                                Search Domain                                           Body politic
The territory occupied by one of the constituent administrative districts of a nation
                                                                                                           m-n
                                                                                        Commonwheath
                                                                                                           Synonyms of
The way something is with respect to its main attributes
                                                                                        …                  Original words
The group of people comprising the government of a sovereign state
                                                                                        People             weight=b
A politically organized body of people under a single government                                           n-o
                                                                                        Political entity
 A compilation of the known facts regarding something or someone                                           subtypes of
Put before
                                                                                        …                  Original words
                                                                                        Asian country       weight=c
A state of depression or agitation
                                                                                                           o-p
The territory occupied by a nation                                                      City state
                                                                                                           subtypes of
Express in words                                                                        …                  Original words
                                          …                                             countries          weight=d
                                                                                        defines            p-q
                                                                                        defining           stems of all
                                                                                        …                  words (1-p)
 For each hit, add the weight of the word to the array.                                                    weight=e

                                                                        Jarrar © 2012                                 38
Ordered Search Domain

 The synset with the highest score is the best match.
                               Search Domain                                          rank
     The territory occupied by one of the constituent administrative districts of a
                                                                                       90
     nation
     The way something is with respect to its main attributes                          89
     The group of people comprising the government of a sovereign state                89
     A politically organized body of people under a single government                  78
      A compilation of the known facts regarding something or someone                  70
     Put before                                                                        65
     A state of depression or agitation                                                50
     The territory occupied by a nation                                                20
     Express in words                                                                  0

                                          …




                                                     Jarrar © 2012                           39
Ordered Search Domain

    The synset with the highest score is the best match.
                                  Search Domain                                          rank
        The territory occupied by one of the constituent administrative districts of a
                                                                                          90
        nation
         The way something is with respect to its main attributes                         89
         The group of people comprising the government of a sovereign state               89
         A politically organized body of people under a single government                 78
         A compilation of the known facts regarding something or someone                  70
         Put before                                                                       65
         A state of depression or agitation                                               50
         The territory occupied by a nation                                               20
         Express in words                                                                 0

                                              …

 The results (until this step) were not too bad, but we added an extra
step, called “Rank x Centrality” to improve the accuracy.


                                                        Jarrar © 2012                           40
The Rank X Centrality Step (Rank+)

   Rebuild the subtype links between concepts
          Search Domain                      rank
                                                            50            65
 The territory occupied by one of the                                                          P        S
 constituent administrative districts of a   90             Q                 A
 nation                                                                               90
  The way something is with respect to its
                                             89                      70
 main attributes                                        T                             B
  The group of people comprising the                                 W                              Q
 government of a sovereign state             89
  A politically organized body of people
                                             78                           20              89
 under a single government                          F       U
  A compilation of the known facts                               0        R           E
 regarding something or someone              70
                                                                 O            78               89
 Put before                                  65
 A state of depression or agitation          50                                   G            J
 The territory occupied by a nation          20
 Express in words                             0
                     …

R: the rank computed in the previous step
C: the centrality C of each concept in its tree, how many nodes above/under
Rank+ = R x C (the idea is to use centrality to compromise ranks that are
close to each other)             Jarrar © 2012                            41
The Experiment




Exp N°           N° of glosses        Match percentage
                 160 gloss            More than 90%
1st experiment
                 1000 gloss           In progress…
2 experiment
 nd




                      Jarrar © 2012                      42
Ongoing Research (Use Machine learning and Neural networks)
                                                    By Mohammed Mellhem



       Static weights                              Dynamic weights

  For each word found, a
 weight is given as follows
Type                    Weight             Learning to find the best
                        Value
                                           weights and levels to expand
Keyword 1st 3 words               3
                                           (using neural networks).
Keyword 4th and above             1

Super-type                       0.7        The algorithm tries to find the
Sub-Type                         0.4       maximum level of expansion that
Synonym                          0.3       can help to find the best match
Sub-word                         0.2

Bigram (not-imp.)                1.2




                                   Jarrar © 2012                               43
Progress so far

• Considering three parts:
    1.   Find best depth for Search Space expansion
    2.   Find best depth for Query expansion
    3.   Find best weights for best matching


•    For point 1 and 2; the algorithm try to find the maximum level of
     expansion that can help to find best match.

•    For Point 3; the algorithm is still in the learning process, initial
     results shows that in more than 85% of 162 processed till now,
     found by concept keyword or keyword stem.

•    Improve performance by loading matching data on demand.




                               Jarrar © 2012                                44
Outline

  • Arabic Ontology Project

  • Design principles

  • Methodology and progress

  • Ongoing Research

     • The Matching Function

     • The Top Levels

     • Finding Arabic Synonyms


                  Jarrar © 2012   45
Arabic Ontology: Core (Top Levels).

     What are the top 10 levels of the ArabicOntology? And how to build it?


          Top 3 levels shown here, for simplicity




                                                                     10 levels, 550 concepts
• The 10th level of this core ontology should top all Arabic concepts and levels.
• This allow us to detect any problems in the tree/relations!
• The core Ontology governs the correctness and the evolution of the whole
  Arabic Ontology.
                                                    Jarrar © 2012                              46
Arabic Ontology: Core (Top Levels).

       What are the top 10 levels of the ArabicOntology? And how to build it?

                                               Methodology
                                               (by Rana Rashmawi)
             Top 3 levels shown here, for simplicity




                                                                         10 levels, 550 concepts
    Arabic Core Ontology: the top levels of the Arabic Ontology, - built manually
    based on DOLCE and SUMO upper level ontologies, and taking into
    account, carefully, the philosophical and historical aspects of the Arabic
    conceptsterms.

    Phase 1. Determining the top (Core) Arabic concepts.

    Phase 2. Constructing the Top Levels.

•   Phase 3. Verifyingthis core ontology should top all Arabic concepts and levels.
    The 10th level of the correctness and completeness of the top levels.
•   This allow us to detect any problems in the tree/relations!
•   The core Ontology governs the correctness and the evolution of the whole
    Arabic Ontology.
                                                       Jarrar © 2012                               47
Phase 1. Determining the top (Core) Arabic
              concepts
1.     Both SUMO & DOLCE were translated to Arabic separately.
       Where each English upper term was mapped to (3-5) Arabic
       Concepts, based on the relevance of the meaning.
         The result of this step was a pool of 1200 Arabic terms, that
          necessarily demonstrate the upper terms of the Arabic Language.

≈         SUMO Terms
          Entity                                Arabic Terms   ≈
          Abstract
          Physical
          Attribute
          …


= 80     DOLCE Terms
         Entity
         Abstract
         Object
         Event
         …                      Jarrar © 2012                               48
Phase 1. Determining the top (Core) Arabic
              concepts
2. We investigated all the meanings for each Arabic term in the resulted
   pool of terms, using Arabic Lexicons. Allowing some editing and
   reforming of the glosses following strict Ontological guidelines.

    The result of this step was a larger pool of glosses (about 6000 glosses)
     of the most general and comprehensive concepts in the Arabic Language.

                                               Arabic Meaning   ≈
Arabic Terms   ≈
                          …




                                                                           .


                                   .


                          …    Jarrar © 2012                                49
Phase 1. Determining the top (Core) Arabic
                  concepts
   3. The last step in this phase was to choose one Arabic concept for
      each English concept for both SUMO & DOLCE.
         This step could be considered as a process of semantic mapping
          between English and Arabic Concepts.

Arabic Terms                                     Arabic Meaning
                          …




                                                                           .


                                   .


                          …


                                 Jarrar © 2012                                 50
Phase 2. Constructing the Top Levels

           • After determining the Arabic upper (core) Concepts that correspond
             to the English Upper Concepts from both SUMO & DOLCE. We
             derived the relations between these Arabic concepts depending on
             the existing relations in both foreign Ontologies. As both of them
             depend on the (sub/super-type relation) in their structure.

                       Entity




Abstract    Object   Info. Entity   Event   Quality




           • The same methodology was used to extract the semantic relations
             from both DOLCE & SUMO to complete all the lower levels, up until
             the 10th.


                                            Jarrar © 2012                     51
Phase 3. Verifying the correctness and
             completeness of the top levels



• A manual mapping of 6000 Arabic meaning was
  done to verify the top levels.




                    Jarrar © 2012               52
Outline

  • Arabic Ontology Project

  • Design principles

  • Methodology and progress

  • Ongoing Research

     • The Matching Function

     • The Top Levels

     • Finding Arabic Synonyms


                  Jarrar © 2012   53
Further/Ongoing Research

   Given many Arabic-English, Arabic-French, Arabic-Italian dictionaries

   Can we derive an Arabic-Arabic thesaurus? For example



   Then Categorize very-related words (maybe using WordNet) as the
    following:




This will help finding possible Arabic synsets, which help detecting
 possible subtype relations and/or validate the existing relations.



                                 Jarrar © 2012                          54
Table
 sort            ‫رتت‬      tidy          ‫أنيق‬
 arrange         ‫رتت‬      tidy         ‫ضخم‬
 order           ‫رتت‬      tidy        ‫نظيف‬
 set             ‫رتت‬      tidy       ‫منهجي‬
 tidy            ‫رتت‬      tidy         ‫منظم‬
 pack            ‫رتت‬      tidy         ‫مهندم‬
 shape           ‫رتت‬      pack        ‫حشمخ‬
 form            ‫رتت‬      pack         ‫رسمخ‬
 sort              ‫ نىع‬pack           ‫وضت‬
 sort             ‫ شكم‬pack             ‫تكىم‬
 sort            ‫ ضزة‬pack               ‫حشب‬
 sort         ‫ هذا اننىع‬shape          ‫شكم‬
 sort            ‫ صنف‬shape             ‫حبنخ‬
 sort            ‫ طزيقخ‬shape            ‫هيئخ‬
 arrange            ‫ نظم‬shape         ‫مظهز‬
 arrange           ‫ اتخذ‬shape          ‫تجسد‬
 arrange   ‫ سىي انخالف‬form             ‫شكم‬
 arrange           ‫ ػدل‬form          ‫استمبرح‬
 order           ‫ اننظبم‬form         ‫صىرح‬
 order           ‫ تزتيت‬form             ‫نىع‬
 order              ‫ أمز‬form            ‫هيئخ‬
 set           ‫ مجمىػخ‬form            ‫صيغخ‬
 set             ‫ وضغ‬tune              ‫رتت‬
 set              ‫ ضجط‬form            ‫ضزة‬
 set                 ‫2102 © ثدأ‬
                    Jarrarorganize     ‫رتت‬     55
Level 1




          Jarrar © 2012   56
Level 2




          Jarrar © 2012   57
Level 3




          Jarrar © 2012   58
Level 4




          Jarrar © 2012   59
Cycle




        Jarrar © 2012   60

More Related Content

Similar to Jarrar.lecture notes.arabicontology

Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityijaia
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYijaia
 
Jarrar.lecture notes.lexicalsemanticsandmultilingualism
Jarrar.lecture notes.lexicalsemanticsandmultilingualismJarrar.lecture notes.lexicalsemanticsandmultilingualism
Jarrar.lecture notes.lexicalsemanticsandmultilingualismBayan Maali
 
Pal gov.tutorial4.session8 2.stepwisemethodologies
Pal gov.tutorial4.session8 2.stepwisemethodologiesPal gov.tutorial4.session8 2.stepwisemethodologies
Pal gov.tutorial4.session8 2.stepwisemethodologiesMustafa Jarrar
 
Principles of parameters
Principles of parametersPrinciples of parameters
Principles of parametersVelnar
 
A Comparative Study of Ontology building Tools in Semantic Web Applications
A Comparative Study of Ontology building Tools in Semantic Web Applications A Comparative Study of Ontology building Tools in Semantic Web Applications
A Comparative Study of Ontology building Tools in Semantic Web Applications dannyijwest
 
A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications IJwest
 
A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications dannyijwest
 
02 ai-one - content analytics business cases
02 ai-one - content analytics business cases02 ai-one - content analytics business cases
02 ai-one - content analytics business casesdiggelmann
 
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...IJECEIAES
 
Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.Janet Leu
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNetSeid Hassen
 
Building an Ontology in Educational Domain Case Study for the University of P...
Building an Ontology in Educational Domain Case Study for the University of P...Building an Ontology in Educational Domain Case Study for the University of P...
Building an Ontology in Educational Domain Case Study for the University of P...IJRES Journal
 
Intro to oop.pptx
Intro to oop.pptxIntro to oop.pptx
Intro to oop.pptxUmerUmer25
 
Updated concordancer
Updated concordancerUpdated concordancer
Updated concordancerbenhemsen
 
Concordancer
ConcordancerConcordancer
Concordancerbenhemsen
 

Similar to Jarrar.lecture notes.arabicontology (20)

Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivity
 
Patterns of Value
Patterns of ValuePatterns of Value
Patterns of Value
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
 
Jarrar.lecture notes.lexicalsemanticsandmultilingualism
Jarrar.lecture notes.lexicalsemanticsandmultilingualismJarrar.lecture notes.lexicalsemanticsandmultilingualism
Jarrar.lecture notes.lexicalsemanticsandmultilingualism
 
Tutorial 1-Ontologies
Tutorial 1-OntologiesTutorial 1-Ontologies
Tutorial 1-Ontologies
 
Pal gov.tutorial4.session8 2.stepwisemethodologies
Pal gov.tutorial4.session8 2.stepwisemethodologiesPal gov.tutorial4.session8 2.stepwisemethodologies
Pal gov.tutorial4.session8 2.stepwisemethodologies
 
Principles of parameters
Principles of parametersPrinciples of parameters
Principles of parameters
 
A Comparative Study of Ontology building Tools in Semantic Web Applications
A Comparative Study of Ontology building Tools in Semantic Web Applications A Comparative Study of Ontology building Tools in Semantic Web Applications
A Comparative Study of Ontology building Tools in Semantic Web Applications
 
A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications
 
A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications A Comparative Study Ontology Building Tools for Semantic Web Applications
A Comparative Study Ontology Building Tools for Semantic Web Applications
 
02 ai-one - content analytics business cases
02 ai-one - content analytics business cases02 ai-one - content analytics business cases
02 ai-one - content analytics business cases
 
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
 
Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
 
NLP todo
NLP todoNLP todo
NLP todo
 
Building an Ontology in Educational Domain Case Study for the University of P...
Building an Ontology in Educational Domain Case Study for the University of P...Building an Ontology in Educational Domain Case Study for the University of P...
Building an Ontology in Educational Domain Case Study for the University of P...
 
Intro to oop.pptx
Intro to oop.pptxIntro to oop.pptx
Intro to oop.pptx
 
oops.pdf
oops.pdfoops.pdf
oops.pdf
 
Updated concordancer
Updated concordancerUpdated concordancer
Updated concordancer
 
Concordancer
ConcordancerConcordancer
Concordancer
 

Recently uploaded

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Recently uploaded (20)

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

Jarrar.lecture notes.arabicontology

  • 1. Lecture Notes Birzeit University, Palestine 2012 Advanced Topics in Ontology Engineering Arabic Ontology Dr. Mustafa Jarrar Sina Institute, University of Birzeit mjarrar@birzeit.edu www.jarrar.info Jarrar © 2012 1
  • 2. Reading Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper) . In proceedings of the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League. Tunis, July 26-28, 2011.Article http://www.jarrar.info/publications/J11.pdf.htm Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html Mustafa Jarrar: Towards The Notion Of Gloss, And The Adoption Of Linguistic Resources In Formal Ontology Engineering. In proceedings of the 15th International World Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN: 1595933239. May 2006.http://www.jarrar.info/publications/J06.pdf.htm Jarrar © 2012 2
  • 3. Arabic Ontology Team (Current) Fateh Badran Collaborators Maather Alkqam Jamal Daher Rabaa Yusuf Prof. Mahi Arrar Razan Marwan Haneen Somoom Naser Musleh Jarrar © 2012 3
  • 4. The Arabic Ontology Project http://sites.birzeit.edu/comp/ArabicOntology • A project started in 2010, at Birzeit University, Palestine. • The ArabicOntology is more than an Arabic WordNet • Unlike WordNet, the ArabicOntology is logically and philosophically well- founded, as it follows strict ontological principles.  but can be used an Arabic WordNet.  The project is partially funded (Seed funding) by Birzeit University (VP academic Office, Research Committee). Jarrar © 2012 4
  • 5. Arabic Ontology: Data Model (Simplified) • ConceptID (as a synsetID in WordNet) to identify a concept. • Polysemy and synonymy: like in WordNet, several words (i.e., lexical units) can be used to lexicalize one concept (synonymy); and one word might be used to lexicalize several concepts. Lexical Unit Gloss: describes a concept Semantic Relations Concept ID: concept unique reference Jarrar © 2012 5
  • 6. Lexical vs. Semantic Relationships • Semantic relations are relationships between concepts (not words), e.g., subtype, part-of, etc. • Lexical relations are relationships between words (not concepts), e.g., synonym-of, root-of, abbreviation-of, etc. • Ontologies are mainly concerned with semantic relations. Lexical Unit Gloss: describes a concept Semantic Relations Concept ID: concept unique reference Jarrar © 2012 6
  • 7. Arabic Ontology • Arabic Ontology: the set of concepts (of all Arabic terms), and the semantic (not lexical) relationships between these concepts. • To build an Arabic Ontology: Identify the set of concepts for every Arabic word (Polysemy), and define semantic relations between these concepts. • Most important relation is the subtype relation, which leads to a (tree of concepts) . Jarrar © 2012 7
  • 8. Arabic Ontology: Subtype Relationships • Subtype relation: is a mathematical relations (subset: A B ), such that every instance in A must also be an instance of B. • Inheritance: subtypes inherit all properties of their super types. • “Hyponymy” in WordNet is close to (but not the same as) the subtype relation. • “General-Specific” relations, as in thesauri, are not subtype relations. world . . . .. . . . . . . .. . .. . . . . . . 10 . . 3 . . . . 6 . . . . . . . .. . 4 . .. . . . . . .. . . . . . . . . . . .. . Jarrar © 2012 8
  • 9. Arabic Ontology: Subtype Relationships • It is recommended to use proper subtypes, as it is more strict. • That is, A and B are never equal, B is always a super set of A. • It is recommended to classify concepts based on “rigidity”. • For example it is wrong to say that a „WorkTable‟ is type of „Table‟. as being a work table is a non-rigid property. • As such, subtypes form a tree. Jarrar © 2012 9
  • 10. Arabic Ontology: Core (Top Levels). Arabic Core Ontology: the top levels of the Arabic Ontology, - built manually based on DOLCE and SUMO upper level ontologies, and taking into account, carefully, the philosophical and historical aspects of the Arabic conceptsterms. Top 3 levels shown here, for simplicity 10 levels, 550 concepts • The 10th level of this core ontology should top all Arabic concepts and levels. • This allow us to detect any problems in the tree/relations! • The core Ontology governs the correctness and the evolution of the whole Arabic Ontology. Jarrar © 2012 10
  • 11. Arabic Ontology: Glosses according to strict ontological guidelines[J06] A gloss: is an auxiliary informal (but controlled) account of the intended meaning of a linguistic term, for the commonsense perception of humans. A gloss is supposed to render factual knowledge that is critical to understand a concept, but that e.g. is implausible, unreasonable, or very difficult to formalize and/or articulate explicitly. (NOT) to catalogue general information and comments, as e.g. conventional dictionaries and encyclopedias usually do, or as <rdfs:comment>. Jarrar © 2012 11
  • 12. Arabic Ontology: Gloss Guidelines What should and what should not be provided in a gloss: 1. Start with the principal/super type of the concept being defined. E.g. „Search engine‟: “A computer program that …”, „Invoice‟: “A business document that…”, „University‟: “An institution of …”. 3. Focus on distinguishing characteristics and intrinsic prosperities that differentiate the concept out of other concepts. E.g. Compare, „Laptop computer‟: “A computer that is designed to do pretty much anything a desktop computer can do, it runs for a short time (usually two to five hours) on batteries”. “A portable computer small enough to use in your lap…”. 2. Written in a form of propositions, offering the reader inferential knowledge that help him to construct the image of the concept. E.g. Compare „Search engine‟: “A computer program for searching the internet, it can be defined as one of the most useful aspects of the World Wide Web. Some of the major ones are Google, ….”; A computer program that enables users to search and retrieves documents or data from a database or from a computer network…”. Jarrar © 2012 12
  • 13. Arabic Ontology: Gloss Guidelines 4. Use supportive examples : - To clarify cases that are commonly known to be false but they are true, or that are known to be true but they are false; - To strengthen and illustrate distinguishing characteristics (e.g. define by examples, counter-examples). Examples can be types and/or instances of the concept being defined. 5. Be consistent with formal definitions/axioms. 6. Be sufficient, clear, and easy to understand.  WordNet glosses do not follow such ontological guidelines Jarrar © 2012 13
  • 14. Arabic Ontology: Gloss Guidelines As a gloss starts with a supertype of concept being defined, try to read the gloss as the following, to verify what you do is correct: Jarrar © 2012 14
  • 15. ArabicOntology Vs WordNet Unlike WordNet, the Arabic Ontology is: 1. Philosophically well founded: • Focuses on intrinsic properties; • All types are rigid; • The top level is derived from known Top Level Ontologies. 2. Strictly formal: • Semantic relations are well-defined mathematical relations. 3. Strictly-controlled glosses • The content and structure of the glosses is strictly based on ontological principles. Jarrar © 2012 15
  • 16. Our Approach to Building the ArabicOntology Roughly: Step1: Mine Arabic concepts/glosses from dictionaries. Step 2: Automatically map between these Arabic concepts and WordNet concepts, thus inherit semantic relations from WordNet. Step 3: Link all concepts with the Arabic Core Ontology. Step 4: Re-formulate these glosses, according to strict ontological guidelines. Jarrar © 2012 16
  • 17. Step1-Mining Arabic Concepts from Dictionaries • Collect as much glosses/concepts as possible from specialized and general dictionaries. • Manual extraction from dictionaries, then basic cleaning done automatically. Mining concepts • 75k glosses ready. • We have ~100 students typing dictionaries now! • +150K more glosses (expected this year) Jarrar © 2012 17
  • 18. Step1-Mining Arabic Concepts from Dictionaries • Collect as much glosses/concepts as possible from specialized and general dictionaries. • Manual extraction from dictionaries, then basic cleaning done automatically. Mining concepts Jarrar © 2012 18
  • 19. Step1-Mining Arabic Concepts from Dictionaries • Most Arabic dictionaries are not useful, but some are a good start. The dictionaries we need should:  Focus on the semantic aspects.  Multiple meanings are not mixed up.  Structure of quality of the meaning. Mining concepts Jarrar © 2012 19
  • 20. Examples (Good & Bad Resources) Wiktionary           Jarrar © 2012 20
  • 21. The Matching Function is used for: 1- Based on the previous mapping, we can inherit Semantic Relations from WordNet. Arabic Concepts WordNet Concepts A L H J B R C D Q  Remark: This is only a good start, as these inherited relations need to be cleaned using the Arabic Top Levels, and using the OnToClean Methodology. 2- Same function is used to detect redundant concepts, within the Arabic Ontology itself. Jarrar © 2012 21
  • 22. Step2: Map Arabic concepts to WordNet (Matching Function) We developed a smart algorithm, such that: Input: (Arabic gloss, 117k English glosses in WordNet). Output: (best match, rank) Accuracy: +90% (being improved) WordNet (English) The territory occupied by one of the constituent administrative districts of a nation The way something is with respect to its main attributes The group of people comprising the government of a sovereign state A politically organized body of people under a single government A compilation of the known facts regarding something or someone …. A politically organized body of people under a single government Jarrar © 2012 22
  • 23. Step 3: Link concepts with the Arabic Core Ontology Each Arabic concept (from previous steps) is mapped to a concept in the 10th level. That is, the 10th level of this core ontology should top all Arabic concepts and levels, so to enable automatic detection of problems in the hierarchy. Top 3 levels shown here, for simplicity A C J J Jarrar © 2012 23
  • 24. Automatic Detection of Inconsistencies Subtype links from Arabic concepts to the core ontology (done manually) Subtypes links between Arabic concepts (derived via the mappings to WordNet) Now we can automatically detect whether the links are correct?  If (J A) and (A ) then it‟s most likely true that (J ) , thus no need to have (J ).  However, as H and C don‟t share a supertype, (H C) is likely incorrect. Top 3 levels shown here, for simplicity X! A C X! J Jarrar © 2012 H 24
  • 25. Step 4- Re-Formulate Glosses, according to strict ontological guidelines[J06] Glosses are re-formulated semi-manually, to meet our strict rules. Gloss-cleaning can be done automatically to a certain point.  While the manual-cleaning (=re-formulating) glosses, mistakes in subtype relation can be detected. Jarrar © 2012 25
  • 26. Outline • Arabic Ontology Project • Design principles • Methodology and progress • Ongoing Research • Matching Function • The Top levels • Finding Arabic Synonyms Jarrar © 2012 26
  • 27. Why the Matching Function Arabic concepts WordNet: without subtype Arabic Concepts WordNet Concepts English concepts links between with subtype them relations between A them L H J B R C D Q Assumption: If we could know that the (English concept B = Arabic concept A) and (English concept D = Arabic concept C), and (D SubtypeOf B) then we can automatically deduce that (C SubtypeOf A). Research Problem: given an Arabic concept, automatically find its equivalent English concept (if exists). Jarrar © 2012 27
  • 28. The Matching Function (Map Arabic concepts to WordNet) A smart algorithm that maps a gloss of a concept written in Arabic into its English equivalent in WordNet, such that: Input: (Arabic gloss, 117k English glosses in WordNet). Output: (best match, rank) WordNet (English) The territory occupied by one of the constituent administrative districts of a nation The way something is with respect to its main attributes The group of people comprising the government of a sovereign state A politically organized body of people under a single government A compilation of the known facts regarding something or someone …. Country A politically organized body of people under a single government Jarrar © 2012 28
  • 29. Matching Function: Main Steps Translate the Arabic gloss into English (using Google or Bing) Find the minimal set of English Concepts(Synsets) that contains the best match for an Arabic concept (i.e. Search Domain). Rank the concepts in the search domain (according to how much they are close to the Arabic Gloss)  The best match of an Arabic concept is the English concept with the highest score. Jarrar © 2012 29
  • 30. Translating Arabic Glosses into English A country with defined borders, the people and Government and institutions  This step is done easily, as Google and Bing provide APIs for automatic translation.  The translation is not always good, but satisfactory. It depends on the quality of the gloss being translated. Jarrar © 2012 30
  • 31. Determine the Search Domain Matching the translated gloss into an English gloss among (117k gloss) is:  Time consuming.  The algorithm might be misled as the many English glosses. Instead, Find the minimal set of English Concepts(Synsets) that contains the best match for an Arabic concept (i.e. Search Domain).  Translate the word “ ” into English using the Google and Bing {state, country, nation, polity, land } = Now, Search Domain = the set of all glosses of this set words + two levels up and to levels down. Jarrar © 2012 31
  • 32. Determine the Search Domain Search Domain The territory occupied by one of the constituent administrative districts of a nation (~2000 glosses) The way something is with respect to its main attributes English Terms The group of people comprising the government of a sovereign state country A politically organized body of people under a single government state A compilation of the known facts regarding something or someone land Put before nation A state of depression or agitation polity The territory occupied by a nation earth Express in words …  Our translated target gloss should be in in this search domain Jarrar © 2012 32
  • 33. Determine the Search Domain Search Domain The territory occupied by one of the constituent administrative districts of a nation (~2000 glosses) The way something is with respect to its main attributes English Terms The group of people comprising the government of a sovereign state country A politically organized body of people under a single government state A compilation of the known facts regarding something or someone land Put before nation A state of depression or agitation polity The territory occupied by a nation earth Express in words • How many levels to go for? … – Too many levels increases the number of irrelevant synsets. – Too few levels lowers the possibility of finding the best match. • We need to find the minimal set of synsets that contains the best match for the Arabic concept. Jarrar © 2012 33
  • 34. The Ranking Step Search Domain The territory occupied by one of the constituent administrative districts of a nation (~2000 glosses) The way something is with respect to its main attributes English Terms The group of people comprising the government of a sovereign state country A politically organized body of people under a single government state A compilation of the known facts regarding something or someone land Put before nation A state of depression or agitation polity The territory occupied by a nation earth Express in words … Rank the concepts in the search domain (according to how much they are close to the Arabic Gloss) Jarrar © 2012 34
  • 35. The Ranking Step Rank the concepts in the search domain (according to how much they are close to the Arabic Gloss) The ranking step is divided into several steps Convert the translated gloss into an Array of words: original words, their Synonyms, super/subtypes. Give a weight for each word depending on its type. (Synonym, direct Subtype, direct Supertype, second level Supertype) Match the words in each concept in the search domain with the word in the array. For each hit, add the weight of the word to the synset score. The synset with the highest score is the best match. Jarrar © 2012 35
  • 36. Convert the translated gloss into an Array of words Array A country with defined country borders, the people defined 1-m and Government and Original words … institutions Body politic m-n Commonwheath Synonyms of Array … Original words Stop words country 1 People are removed n-o defined Political entity subtypes of … Original words borders Asian country people o-p City state Government subtypes of … Original words institutions m countries defines p-q The array 1-q might be 1500 words or defining stems of all more. the idea is to have the relevant … words (1-p) words that one may to describe the concept being matched. Jarrar © 2012 36
  • 37. Convert the translated gloss into an Array of words Array country 1-m defined Original words … weight=a Body politic Give a weight for each word depending on Commonwheath m-n its type. (Synonym, direct Subtype, direct Synonyms of … Original words Supertype, second level Supertype) People weight=b n-o Political entity subtypes of … Original words Asian country weight=c o-p City state subtypes of … Original words countries weight=d defines p-q defining stems of all … words (1-p) weight=e Jarrar © 2012 37
  • 38. Convert the translated gloss into an Array of words Array Match the words in each concept in the country search domain with the word in the array. defined 1-m Original words … weight=a Search Domain Body politic The territory occupied by one of the constituent administrative districts of a nation m-n Commonwheath Synonyms of The way something is with respect to its main attributes … Original words The group of people comprising the government of a sovereign state People weight=b A politically organized body of people under a single government n-o Political entity A compilation of the known facts regarding something or someone subtypes of Put before … Original words Asian country weight=c A state of depression or agitation o-p The territory occupied by a nation City state subtypes of Express in words … Original words … countries weight=d defines p-q defining stems of all … words (1-p) For each hit, add the weight of the word to the array. weight=e Jarrar © 2012 38
  • 39. Ordered Search Domain  The synset with the highest score is the best match. Search Domain rank The territory occupied by one of the constituent administrative districts of a 90 nation The way something is with respect to its main attributes 89 The group of people comprising the government of a sovereign state 89 A politically organized body of people under a single government 78 A compilation of the known facts regarding something or someone 70 Put before 65 A state of depression or agitation 50 The territory occupied by a nation 20 Express in words 0 … Jarrar © 2012 39
  • 40. Ordered Search Domain  The synset with the highest score is the best match. Search Domain rank The territory occupied by one of the constituent administrative districts of a 90 nation The way something is with respect to its main attributes 89 The group of people comprising the government of a sovereign state 89 A politically organized body of people under a single government 78 A compilation of the known facts regarding something or someone 70 Put before 65 A state of depression or agitation 50 The territory occupied by a nation 20 Express in words 0 …  The results (until this step) were not too bad, but we added an extra step, called “Rank x Centrality” to improve the accuracy. Jarrar © 2012 40
  • 41. The Rank X Centrality Step (Rank+)  Rebuild the subtype links between concepts Search Domain rank 50 65 The territory occupied by one of the P S constituent administrative districts of a 90 Q A nation 90 The way something is with respect to its 89 70 main attributes T B The group of people comprising the W Q government of a sovereign state 89 A politically organized body of people 78 20 89 under a single government F U A compilation of the known facts 0 R E regarding something or someone 70 O 78 89 Put before 65 A state of depression or agitation 50 G J The territory occupied by a nation 20 Express in words 0 … R: the rank computed in the previous step C: the centrality C of each concept in its tree, how many nodes above/under Rank+ = R x C (the idea is to use centrality to compromise ranks that are close to each other) Jarrar © 2012 41
  • 42. The Experiment Exp N° N° of glosses Match percentage 160 gloss More than 90% 1st experiment 1000 gloss In progress… 2 experiment nd Jarrar © 2012 42
  • 43. Ongoing Research (Use Machine learning and Neural networks) By Mohammed Mellhem Static weights Dynamic weights For each word found, a weight is given as follows Type Weight Learning to find the best Value weights and levels to expand Keyword 1st 3 words 3 (using neural networks). Keyword 4th and above 1 Super-type 0.7  The algorithm tries to find the Sub-Type 0.4 maximum level of expansion that Synonym 0.3 can help to find the best match Sub-word 0.2 Bigram (not-imp.) 1.2 Jarrar © 2012 43
  • 44. Progress so far • Considering three parts: 1. Find best depth for Search Space expansion 2. Find best depth for Query expansion 3. Find best weights for best matching • For point 1 and 2; the algorithm try to find the maximum level of expansion that can help to find best match. • For Point 3; the algorithm is still in the learning process, initial results shows that in more than 85% of 162 processed till now, found by concept keyword or keyword stem. • Improve performance by loading matching data on demand. Jarrar © 2012 44
  • 45. Outline • Arabic Ontology Project • Design principles • Methodology and progress • Ongoing Research • The Matching Function • The Top Levels • Finding Arabic Synonyms Jarrar © 2012 45
  • 46. Arabic Ontology: Core (Top Levels). What are the top 10 levels of the ArabicOntology? And how to build it? Top 3 levels shown here, for simplicity 10 levels, 550 concepts • The 10th level of this core ontology should top all Arabic concepts and levels. • This allow us to detect any problems in the tree/relations! • The core Ontology governs the correctness and the evolution of the whole Arabic Ontology. Jarrar © 2012 46
  • 47. Arabic Ontology: Core (Top Levels). What are the top 10 levels of the ArabicOntology? And how to build it? Methodology (by Rana Rashmawi) Top 3 levels shown here, for simplicity 10 levels, 550 concepts Arabic Core Ontology: the top levels of the Arabic Ontology, - built manually based on DOLCE and SUMO upper level ontologies, and taking into account, carefully, the philosophical and historical aspects of the Arabic conceptsterms. Phase 1. Determining the top (Core) Arabic concepts. Phase 2. Constructing the Top Levels. • Phase 3. Verifyingthis core ontology should top all Arabic concepts and levels. The 10th level of the correctness and completeness of the top levels. • This allow us to detect any problems in the tree/relations! • The core Ontology governs the correctness and the evolution of the whole Arabic Ontology. Jarrar © 2012 47
  • 48. Phase 1. Determining the top (Core) Arabic concepts 1. Both SUMO & DOLCE were translated to Arabic separately. Where each English upper term was mapped to (3-5) Arabic Concepts, based on the relevance of the meaning.  The result of this step was a pool of 1200 Arabic terms, that necessarily demonstrate the upper terms of the Arabic Language. ≈ SUMO Terms Entity Arabic Terms ≈ Abstract Physical Attribute … = 80 DOLCE Terms Entity Abstract Object Event … Jarrar © 2012 48
  • 49. Phase 1. Determining the top (Core) Arabic concepts 2. We investigated all the meanings for each Arabic term in the resulted pool of terms, using Arabic Lexicons. Allowing some editing and reforming of the glosses following strict Ontological guidelines.  The result of this step was a larger pool of glosses (about 6000 glosses) of the most general and comprehensive concepts in the Arabic Language. Arabic Meaning ≈ Arabic Terms ≈ … . . … Jarrar © 2012 49
  • 50. Phase 1. Determining the top (Core) Arabic concepts 3. The last step in this phase was to choose one Arabic concept for each English concept for both SUMO & DOLCE.  This step could be considered as a process of semantic mapping between English and Arabic Concepts. Arabic Terms Arabic Meaning … . . … Jarrar © 2012 50
  • 51. Phase 2. Constructing the Top Levels • After determining the Arabic upper (core) Concepts that correspond to the English Upper Concepts from both SUMO & DOLCE. We derived the relations between these Arabic concepts depending on the existing relations in both foreign Ontologies. As both of them depend on the (sub/super-type relation) in their structure. Entity Abstract Object Info. Entity Event Quality • The same methodology was used to extract the semantic relations from both DOLCE & SUMO to complete all the lower levels, up until the 10th. Jarrar © 2012 51
  • 52. Phase 3. Verifying the correctness and completeness of the top levels • A manual mapping of 6000 Arabic meaning was done to verify the top levels. Jarrar © 2012 52
  • 53. Outline • Arabic Ontology Project • Design principles • Methodology and progress • Ongoing Research • The Matching Function • The Top Levels • Finding Arabic Synonyms Jarrar © 2012 53
  • 54. Further/Ongoing Research  Given many Arabic-English, Arabic-French, Arabic-Italian dictionaries  Can we derive an Arabic-Arabic thesaurus? For example  Then Categorize very-related words (maybe using WordNet) as the following: This will help finding possible Arabic synsets, which help detecting possible subtype relations and/or validate the existing relations. Jarrar © 2012 54
  • 55. Table sort ‫رتت‬ tidy ‫أنيق‬ arrange ‫رتت‬ tidy ‫ضخم‬ order ‫رتت‬ tidy ‫نظيف‬ set ‫رتت‬ tidy ‫منهجي‬ tidy ‫رتت‬ tidy ‫منظم‬ pack ‫رتت‬ tidy ‫مهندم‬ shape ‫رتت‬ pack ‫حشمخ‬ form ‫رتت‬ pack ‫رسمخ‬ sort ‫ نىع‬pack ‫وضت‬ sort ‫ شكم‬pack ‫تكىم‬ sort ‫ ضزة‬pack ‫حشب‬ sort ‫ هذا اننىع‬shape ‫شكم‬ sort ‫ صنف‬shape ‫حبنخ‬ sort ‫ طزيقخ‬shape ‫هيئخ‬ arrange ‫ نظم‬shape ‫مظهز‬ arrange ‫ اتخذ‬shape ‫تجسد‬ arrange ‫ سىي انخالف‬form ‫شكم‬ arrange ‫ ػدل‬form ‫استمبرح‬ order ‫ اننظبم‬form ‫صىرح‬ order ‫ تزتيت‬form ‫نىع‬ order ‫ أمز‬form ‫هيئخ‬ set ‫ مجمىػخ‬form ‫صيغخ‬ set ‫ وضغ‬tune ‫رتت‬ set ‫ ضجط‬form ‫ضزة‬ set ‫2102 © ثدأ‬ Jarrarorganize ‫رتت‬ 55
  • 56. Level 1 Jarrar © 2012 56
  • 57. Level 2 Jarrar © 2012 57
  • 58. Level 3 Jarrar © 2012 58
  • 59. Level 4 Jarrar © 2012 59
  • 60. Cycle Jarrar © 2012 60