Jarrar.lecture notes.arabicontology

  1. Lecture Notes, Birzeit University, Palestine, 2012
     Advanced Topics in Ontology Engineering: Arabic Ontology
     Dr. Mustafa Jarrar, Sina Institute, University of Birzeit
     mjarrar@birzeit.edu, www.jarrar.info
     Jarrar © 2012
  2. Reading
     Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper). In Proceedings of the Experts Meeting on Arabic Ontologies and Semantic Networks. ALECSO, Arab League. Tunis, July 26-28, 2011.
     Article: http://www.jarrar.info/publications/J11.pdf.htm
     Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html
     Mustafa Jarrar: Towards the Notion of Gloss, and the Adoption of Linguistic Resources in Formal Ontology Engineering. In Proceedings of the 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, pages 497-503. ACM Press, ISBN 1595933239, May 2006.
     http://www.jarrar.info/publications/J06.pdf.htm
  3. Arabic Ontology Team (Current)
     Fateh Badran
     Collaborators: Maather Alkqam, Jamal Daher, Rabaa Yusuf, Prof. Mahi Arrar, Razan Marwan, Haneen Somoom, Naser Musleh
  4. The Arabic Ontology Project
     http://sites.birzeit.edu/comp/ArabicOntology
     • A project started in 2010 at Birzeit University, Palestine.
     • The Arabic Ontology is more than an Arabic WordNet: unlike WordNet, it is logically and philosophically well-founded, as it follows strict ontological principles, but it can still be used as an Arabic WordNet.
     • The project is partially funded (seed funding) by Birzeit University (Office of the Academic VP, Research Committee).
  5. Arabic Ontology: Data Model (Simplified)
     • ConceptID (like a synsetID in WordNet) identifies a concept.
     • Polysemy and synonymy: as in WordNet, several words (i.e., lexical units) can lexicalize one concept (synonymy), and one word can lexicalize several concepts (polysemy).
     [Diagram: Concept ID (the concept's unique reference), Gloss (describes the concept), Lexical Units, Semantic Relations]
  6. Lexical vs. Semantic Relationships
     • Semantic relations are relationships between concepts (not words), e.g., subtype, part-of.
     • Lexical relations are relationships between words (not concepts), e.g., synonym-of, root-of, abbreviation-of.
     • Ontologies are mainly concerned with semantic relations.
     [Diagram: the data model of slide 5]
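A minimal Python sketch of the simplified data model, assuming an in-memory store. The class and method names (`Concept`, `LexicalUnit`, `Lexicon`, `concepts_of`, `words_of`), the concept IDs and the short glosses are hypothetical, not the project's actual schema. Semantic relations hang off concepts, lexical relations off words, and the two lookups show polysemy and synonymy.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str  # unique reference, like a WordNet synset ID
    gloss: str       # informal description of the intended meaning
    # Semantic relations link concepts, e.g. {"subtypeOf": {"C_ORGAN"}}
    semantic_relations: dict = field(default_factory=dict)


@dataclass
class LexicalUnit:
    word: str
    concept_id: str  # the concept this word lexicalizes
    # Lexical relations link words, e.g. {"rootOf": {...}}
    lexical_relations: dict = field(default_factory=dict)


class Lexicon:
    def __init__(self):
        self.concepts = {}  # concept_id -> Concept
        self.units = []     # all LexicalUnit entries

    def add_concept(self, concept):
        self.concepts[concept.concept_id] = concept

    def lexicalize(self, word, concept_id):
        self.units.append(LexicalUnit(word, concept_id))

    def concepts_of(self, word):
        """Polysemy: one word may lexicalize several concepts."""
        return {u.concept_id for u in self.units if u.word == word}

    def words_of(self, concept_id):
        """Synonymy: several words may lexicalize one concept."""
        return {u.word for u in self.units if u.concept_id == concept_id}


# Toy data (illustrative IDs and glosses).
lex = Lexicon()
lex.add_concept(Concept("C_EYE", "An organ of sight ..."))
lex.add_concept(Concept("C_SPRING", "A natural flow of ground water ..."))
lex.add_concept(Concept("C_COUNTRY", "A politically organized body of people ..."))
lex.lexicalize("عين", "C_EYE")      # 'ayn: eye
lex.lexicalize("عين", "C_SPRING")   # 'ayn: water spring
lex.lexicalize("دولة", "C_COUNTRY")
lex.lexicalize("بلد", "C_COUNTRY")
```

Here `lex.concepts_of("عين")` returns both concept IDs, while `lex.words_of("C_COUNTRY")` returns both Arabic words.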
  7. Arabic Ontology
     • Arabic Ontology: the set of concepts (of all Arabic terms) and the semantic (not lexical) relationships between these concepts.
     • To build an Arabic Ontology: identify the set of concepts for every Arabic word (polysemy), and define semantic relations between these concepts.
     • The most important relation is the subtype relation, which leads to a tree of concepts.
  8. Arabic Ontology: Subtype Relationships
     • Subtype relation: a mathematical relation (subset: A ⊆ B) such that every instance of A must also be an instance of B.
     • Inheritance: subtypes inherit all properties of their supertypes.
     • "Hyponymy" in WordNet is close to (but not the same as) the subtype relation.
     • "General-specific" relations, as in thesauri, are not subtype relations.
     [Figure: nested sets of instances within the "world"]
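The subset reading of the subtype relation, and inheritance along it, can be illustrated with a small Python sketch over toy extensions. The type names, instances and properties are invented for illustration; a finite sample of instances can only refute a claimed subtype link, never prove it.

```python
# Toy extensions: the instances of each type (illustrative data only).
extension = {
    "Agent": {"ali", "mona", "sami", "acme_corp"},
    "Person": {"ali", "mona", "sami"},
    "Student": {"ali", "mona"},
}
# Properties declared directly on each type, and each type's direct supertype.
declared = {"Agent": {"can_act"}, "Person": {"has_birth_date"}, "Student": {"enrolled_in"}}
supertype = {"Student": "Person", "Person": "Agent", "Agent": None}


def is_subtype(a, b):
    """A is a subtype of B iff every instance of A is an instance of B (A ⊆ B)."""
    return extension[a] <= extension[b]


def properties(t):
    """Inheritance: a type has its own properties plus those of all its supertypes."""
    props = set()
    while t is not None:
        props |= declared.get(t, set())
        t = supertype[t]
    return props
```

For example, `properties("Student")` collects `enrolled_in`, `has_birth_date` and `can_act`.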
  9. Arabic Ontology: Subtype Relationships (continued)
     • It is recommended to use proper subtypes, as they are stricter: A and B are never equal, and B is always a proper superset of A.
     • It is recommended to classify concepts based on rigidity. For example, it is wrong to say that 'WorkTable' is a subtype of 'Table', as being a work table is a non-rigid property.
     • As such, subtypes form a tree.
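A minimal Python sketch (function name hypothetical) of how the tree requirement could be checked, assuming subtype links are stored as a map from each concept to its single direct supertype. Rigidity cannot be checked this way; it needs human, OntoClean-style analysis.

```python
def check_tree(supertype_of):
    """Check that subtype links form a tree.

    supertype_of maps each concept to its single direct supertype (the root
    maps to None), so every concept has at most one supertype by construction.
    Returns a list of human-readable problems; an empty list means a tree.
    """
    problems = []
    roots = sorted(c for c, s in supertype_of.items() if s is None)
    if len(roots) != 1:
        problems.append(f"expected one root, found {roots}")
    for concept, parent in supertype_of.items():
        if parent is not None and parent not in supertype_of:
            problems.append(f"{concept}: unknown supertype {parent}")
    for concept in supertype_of:
        seen, current = set(), concept
        # Walk up to the root; revisiting a node means a cycle. A concept listed
        # as its own supertype (A = B, i.e. not a proper subtype) is a cycle too.
        while current is not None and current in supertype_of:
            if current in seen:
                problems.append(f"cycle reached from {concept}")
                break
            seen.add(current)
            current = supertype_of[current]
    return problems
```

For example, `check_tree({"Entity": None, "Table": "Table"})` reports a cycle, because 'Table' is not a proper subtype of itself.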
  10. Arabic Ontology: Core (Top Levels)
     Arabic Core Ontology: the top levels of the Arabic Ontology, built manually based on the DOLCE and SUMO upper-level ontologies, carefully taking into account the philosophical and historical aspects of the Arabic concepts/terms.
     [Figure: the top 3 levels, shown here for simplicity; the full core has 10 levels and 550 concepts]
     • The 10th level of this core ontology should top all Arabic concepts and levels.
     • This allows us to detect any problems in the tree/relations.
     • The core ontology governs the correctness and the evolution of the whole Arabic Ontology.
  11. Arabic Ontology: Glosses, according to strict ontological guidelines [J06]
     A gloss is an auxiliary informal (but controlled) account of the intended meaning of a linguistic term, for the commonsense perception of humans.
     A gloss is supposed to render factual knowledge that is critical to understanding a concept but that is, e.g., implausible, unreasonable, or very difficult to formalize and/or articulate explicitly. It is NOT meant to catalogue general information and comments, as conventional dictionaries and encyclopedias usually do, or as <rdfs:comment> does.
  12. Arabic Ontology: Gloss Guidelines
     What should and what should not be provided in a gloss:
     1. Start with the principal/supertype of the concept being defined. E.g., 'Search engine': "A computer program that …"; 'Invoice': "A business document that …"; 'University': "An institution of …".
     2. Write in the form of propositions, offering the reader inferential knowledge that helps them construct the image of the concept. E.g., compare two glosses of 'Search engine': "A computer program for searching the internet, it can be defined as one of the most useful aspects of the World Wide Web. Some of the major ones are Google, …." vs. "A computer program that enables users to search and retrieve documents or data from a database or from a computer network …".
     3. Focus on distinguishing characteristics and intrinsic properties that differentiate the concept from other concepts. E.g., compare two glosses of 'Laptop computer': "A computer that is designed to do pretty much anything a desktop computer can do, it runs for a short time (usually two to five hours) on batteries" vs. "A portable computer small enough to use in your lap …".
  13. Arabic Ontology: Gloss Guidelines (continued)
     4. Use supportive examples:
        - to clarify cases that are commonly believed to be false but are true, or commonly believed to be true but are false;
        - to strengthen and illustrate distinguishing characteristics (e.g., define by examples and counter-examples).
        Examples can be types and/or instances of the concept being defined.
     5. Be consistent with formal definitions/axioms.
     6. Be sufficient, clear, and easy to understand.
     WordNet glosses do not follow such ontological guidelines.
  14. Arabic Ontology: Gloss Guidelines (continued)
     As a gloss starts with a supertype of the concept being defined, try to read the gloss as follows to verify that what you wrote is correct: [figure not preserved]
  15. Arabic Ontology vs. WordNet
     Unlike WordNet, the Arabic Ontology is:
     1. Philosophically well-founded:
        • it focuses on intrinsic properties;
        • all types are rigid;
        • the top level is derived from well-known top-level ontologies.
     2. Strictly formal:
        • semantic relations are well-defined mathematical relations.
     3. Built on strictly controlled glosses:
        • the content and structure of the glosses are strictly based on ontological principles.
  16. Our Approach to Building the Arabic Ontology
     Roughly:
     Step 1: Mine Arabic concepts/glosses from dictionaries.
     Step 2: Automatically map these Arabic concepts to WordNet concepts, and thus inherit semantic relations from WordNet.
     Step 3: Link all concepts to the Arabic Core Ontology.
     Step 4: Re-formulate the glosses according to strict ontological guidelines.
  17. Step 1: Mining Arabic Concepts from Dictionaries
     • Collect as many glosses/concepts as possible from specialized and general dictionaries.
     • Extract them manually from dictionaries, then apply basic automatic cleaning.
     Mining concepts:
     • 75K glosses are ready.
     • About 100 students are typing dictionaries now.
     • 150K+ more glosses are expected this year.
  18. Step 1: Mining Arabic Concepts from Dictionaries (continued)
     [Figure: mining concepts]
  19. Step 1: Mining Arabic Concepts from Dictionaries (continued)
     • Most Arabic dictionaries are not useful, but some are a good start. The dictionaries we need should:
        - focus on the semantic aspects;
        - not mix up multiple meanings;
        - have a good structure and quality of meanings.
  20. Examples (Good & Bad Resources)
     Wiktionary
  21. The Matching Function is used for:
     1. Inheriting semantic relations from WordNet, based on the mapping of Arabic concepts to WordNet concepts.
        [Figure: Arabic concepts mapped to WordNet concepts (labelled A, B, C, D, H, J, L, Q, R)]
        Remark: this is only a good start, as the inherited relations need to be cleaned using the Arabic top levels and the OntoClean methodology.
     2. Detecting redundant concepts within the Arabic Ontology itself, using the same function.
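The second use, redundancy detection, can be sketched in Python under the assumption that the matching function has already produced one best WordNet synset per Arabic concept. Arabic concepts that map to the same synset become candidates for merging, to be confirmed by a human. The function name and the concept/synset IDs are hypothetical.

```python
from collections import defaultdict


def redundant_candidates(best_match):
    """best_match maps an Arabic concept ID to the WordNet synset ID chosen
    by the matching function (or None if no match was found). Returns
    {synset_id: [arabic_ids]} for synsets matched by more than one concept."""
    by_synset = defaultdict(list)
    for arabic_id, synset_id in best_match.items():
        if synset_id is not None:
            by_synset[synset_id].append(arabic_id)
    return {s: sorted(ids) for s, ids in by_synset.items() if len(ids) > 1}
```

For example, if `AR1` and `AR2` both match `syn_country`, they are reported together as possible duplicates.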
  22. Step 2: Map Arabic Concepts to WordNet (Matching Function)
     We developed a smart algorithm such that:
     Input: an Arabic gloss and the 117K English glosses in WordNet.
     Output: the best match and its rank.
     Accuracy: 90%+ (being improved).
     [Figure: candidate WordNet (English) glosses, e.g., "The territory occupied by one of the constituent administrative districts of a nation"; "The way something is with respect to its main attributes"; "The group of people comprising the government of a sovereign state"; "A politically organized body of people under a single government"; "A compilation of the known facts regarding something or someone"; …. The selected match is "A politically organized body of people under a single government".]
  23. Step 3: Link Concepts with the Arabic Core Ontology
     Each Arabic concept (from the previous steps) is mapped to a concept in the 10th level. That is, the 10th level of the core ontology should top all Arabic concepts and levels, to enable automatic detection of problems in the hierarchy.
     [Figure: the top 3 levels, shown for simplicity, with Arabic concepts A, C, J linked below them]
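A Python sketch of the check this step enables, with hypothetical function names and toy data: every Arabic concept should reach a core-ontology concept by following its subtype links upward, and concepts that do not are reported.

```python
def core_anchor(concept, supertype_of, core):
    """Follow subtype links upward until a core-ontology concept is reached.
    Returns that core concept, or None if the chain ends (or loops) first."""
    seen = set()
    while concept is not None and concept not in core:
        if concept in seen:
            return None  # a cycle never reaches the core
        seen.add(concept)
        concept = supertype_of.get(concept)
    return concept


def unanchored(supertype_of, core):
    """Non-core concepts that are not topped by any core concept."""
    return sorted(c for c in supertype_of
                  if c not in core and core_anchor(c, supertype_of, core) is None)
```

With core concepts `{"Country", "Organization"}` and links `{"A": "Country", "J": "A", "H": "Y"}`, concept `J` is topped by `Country`, while `H` is reported because its chain ends at an unknown `Y`.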
  24. Automatic Detection of Inconsistencies
     • Subtype links from Arabic concepts to the core ontology (done manually).
     • Subtype links between Arabic concepts (derived via the mappings to WordNet).
     Now we can automatically check whether the links are correct:
     • If (J SubtypeOf A) and (A SubtypeOf a core concept), then it is most likely true that J is a subtype of that core concept, so there is no need for a direct link from J to it.
     • However, as H and C do not share a supertype, (H SubtypeOf C) is likely incorrect.
     [Figure: the top 3 levels, shown for simplicity; suspicious links are marked "X!"]
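The two checks can be sketched in Python, assuming each Arabic concept has one manual link to a core concept (`anchor`), the core hierarchy is a parent map, and derived links are (subtype, supertype) pairs. All names and data are illustrative. A derived link is flagged when the supertype's core concept is neither the subtype's core concept nor one of its core ancestors; a direct link is flagged as redundant when transitivity already implies it.

```python
def ancestors(node, parent):
    """All (transitive) supertypes of node in a parent map; roots map to None."""
    result = []
    while parent.get(node) is not None:
        node = parent[node]
        result.append(node)
    return result


def suspicious_links(derived, anchor, core_parent):
    """derived: (sub, sup) subtype links between Arabic concepts (from WordNet).
    anchor: the manual link from each Arabic concept to a core concept.
    Flags a link when sup's core concept is neither sub's core concept nor
    one of its core supertypes."""
    flagged = []
    for sub, sup in derived:
        core_sub, core_sup = anchor[sub], anchor[sup]
        if core_sup != core_sub and core_sup not in ancestors(core_sub, core_parent):
            flagged.append((sub, sup))
    return flagged


def redundant_links(links):
    """A direct link (a, c) is redundant if c is still reachable from a
    through the other links, since the subtype relation is transitive."""
    redundant = []
    for a, c in links:
        others = [link for link in links if link != (a, c)]
        stack, reached = [a], set()
        while stack:
            node = stack.pop()
            for s, t in others:
                if s == node and t not in reached:
                    reached.add(t)
                    stack.append(t)
        if c in reached:
            redundant.append((a, c))
    return redundant
```

For example, with J and A both anchored under a core `Object`, H under `Event` and C under `Object`, the link (H, C) is flagged, and in the links J→A, A→X, J→X the direct J→X is redundant.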
  25. Step 4: Re-formulate Glosses, according to strict ontological guidelines [J06]
     Glosses are re-formulated semi-manually to meet our strict rules. Gloss cleaning can be automated up to a certain point. While manually cleaning (= re-formulating) glosses, mistakes in subtype relations can be detected.
  26. Outline
     • Arabic Ontology Project
     • Design principles
     • Methodology and progress
     • Ongoing research
        - Matching Function
        - The Top Levels
        - Finding Arabic Synonyms
  27. Why the Matching Function?
     [Figure: Arabic concepts (A, C, …) without subtype links between them, beside WordNet English concepts (B, D, …) with subtype relations between them]
     Assumption: if we know that (English concept B = Arabic concept A) and (English concept D = Arabic concept C), and (D SubtypeOf B), then we can automatically deduce that (C SubtypeOf A).
     Research problem: given an Arabic concept, automatically find its equivalent English concept (if it exists).
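The assumption above can be turned into a small Python sketch, assuming the matching function gives one English synset per Arabic concept and WordNet's direct supertype links are available as a dict. The function name and data are hypothetical, and proposed links would still need the cleaning step described earlier.

```python
def inherit_subtype_links(arabic_to_english, english_supertype):
    """arabic_to_english: Arabic concept -> equivalent English synset (from the
    matching function). english_supertype: English synset -> direct supertype.
    If C maps to D, A maps to B and D SubtypeOf B, propose C SubtypeOf A."""
    arabic_for = {}
    for arabic, english in arabic_to_english.items():
        arabic_for.setdefault(english, []).append(arabic)
    proposed = []
    for c, d in arabic_to_english.items():
        b = english_supertype.get(d)
        for a in arabic_for.get(b, []):
            proposed.append((c, a))
    return sorted(proposed)
```

With the slide's letters, `inherit_subtype_links({"A": "B", "C": "D"}, {"D": "B"})` proposes `("C", "A")`, i.e. C SubtypeOf A.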
  28. The Matching Function (Map Arabic Concepts to WordNet)
     A smart algorithm that maps the gloss of a concept written in Arabic to its English equivalent in WordNet, such that:
     Input: an Arabic gloss and the 117K English glosses in WordNet.
     Output: the best match and its rank.
     [Figure: candidate WordNet (English) glosses, e.g., "The territory occupied by one of the constituent administrative districts of a nation"; "The way something is with respect to its main attributes"; "The group of people comprising the government of a sovereign state"; "A politically organized body of people under a single government"; "A compilation of the known facts regarding something or someone"; …. For 'Country', the best match is "A politically organized body of people under a single government".]
  29. Matching Function: Main Steps
     1. Translate the Arabic gloss into English (using Google or Bing).
     2. Find the minimal set of English concepts (synsets) that contains the best match for the Arabic concept (i.e., the search domain).
     3. Rank the concepts in the search domain according to how close they are to the Arabic gloss.
     The best match for an Arabic concept is the English concept with the highest score.
  30. Translating Arabic Glosses into English
     Example translation: "A country with defined borders, the people and Government and institutions"
     This step is easy, as Google and Bing provide APIs for automatic translation. The translation is not always good, but it is satisfactory; its quality depends on the quality of the gloss being translated.
  31. Determine the Search Domain
     Matching the translated gloss against all 117K English glosses is:
     • time-consuming;
     • error-prone, as the algorithm might be misled by the large number of English glosses.
     Instead, find the minimal set of English concepts (synsets) that contains the best match for the Arabic concept (i.e., the search domain):
     • Translate the Arabic term into English using Google and Bing, e.g., {state, country, nation, polity, land}.
     • Search domain = the set of all glosses of these words, plus two levels up and two levels down.
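A Python sketch of building the search domain, assuming a toy hypernym/hyponym graph in place of WordNet (the synset IDs and links below are made up, not real WordNet data): collect the synsets of the translated terms, then add everything within two levels up and two levels down, using breadth-first search.

```python
from collections import deque


def expand(seeds, graph, depth):
    """Breadth-first search: all nodes within `depth` steps of any seed."""
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, dist + 1))
    return visited


def search_domain(terms, synsets_of, hypernyms, hyponyms, up=2, down=2):
    """Synsets of the English terms, plus `up` levels of hypernyms and
    `down` levels of hyponyms."""
    seeds = {s for t in terms for s in synsets_of.get(t, ())}
    return expand(seeds, hypernyms, up) | expand(seeds, hyponyms, down)


# Toy graph (illustrative IDs and links only).
synsets_of = {"country": ["country#1"], "state": ["state#1", "state#2"]}
hypernyms = {"country#1": ["political_unit#1"],
             "political_unit#1": ["organization#1"],
             "organization#1": ["group#1"]}
hyponyms = {"country#1": ["city_state#1"],
            "city_state#1": ["sub_a#1"],
            "sub_a#1": ["sub_b#1"]}
domain = search_domain(["country", "state", "polity"], synsets_of, hypernyms, hyponyms)
```

Here `group#1` (three levels up) and `sub_b#1` (three levels down) fall outside the domain; the `up` and `down` parameters are the depth knobs that slide 33 asks how to choose.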
  32. Determine the Search Domain (continued)
     English terms: country, state, land, nation, polity, earth
     Search domain (~2000 glosses), e.g.: "The territory occupied by one of the constituent administrative districts of a nation"; "The way something is with respect to its main attributes"; "The group of people comprising the government of a sovereign state"; "A politically organized body of people under a single government"; "A compilation of the known facts regarding something or someone"; "Put before"; "A state of depression or agitation"; "The territory occupied by a nation"; "Express in words"; …
     Our translated target gloss should be in this search domain.
  33. Determine the Search Domain (continued)
     [Same English terms and search domain as slide 32]
     • How many levels should we go? Too many levels increase the number of irrelevant synsets; too few levels lower the chance of finding the best match.
     • We need to find the minimal set of synsets that contains the best match for the Arabic concept.
  34. The Ranking Step
     [Same English terms and search domain as slide 32]
     Rank the concepts in the search domain according to how close they are to the Arabic gloss.
  35. The Ranking Step (continued)
     Rank the concepts in the search domain according to how close they are to the Arabic gloss. The ranking step is divided into several sub-steps:
     1. Convert the translated gloss into an array of words: the original words, their synonyms, and their super/subtypes.
     2. Give each word a weight depending on its type (synonym, direct subtype, direct supertype, second-level supertype).
     3. Match the words of each concept in the search domain against the words in the array. For each hit, add the weight of the word to the synset's score.
     The synset with the highest score is the best match.
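The three sub-steps can be sketched in Python as below. The numeric weights are hypothetical (the slides call them a to e), the stop-word list is an assumption, there is no stemming, and each distinct array word is counted once per gloss; the three glosses come from the search-domain example.

```python
import re

# A small illustrative stop-word list (an assumption; the notes do not give one).
STOP_WORDS = {"a", "an", "the", "and", "of", "with", "by", "to", "in"}


def tokens(text):
    """Lower-case words of a gloss, without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_weighted_array(translated_gloss, expansions, weights):
    """Map each array word to a weight. Original words get weights['original'];
    words in expansions[category] get weights[category]. A word that appears
    under several categories keeps the highest weight (an assumption)."""
    array = {}
    entries = [("original", translated_gloss)]
    entries += [(cat, phrase) for cat, phrases in expansions.items() for phrase in phrases]
    for category, text in entries:
        for word in tokens(text):
            array[word] = max(weights[category], array.get(word, 0))
    return array


def rank(search_domain, array):
    """Score each synset by summing the weights of the distinct array words
    found in its gloss; return (synset_id, score) pairs, best first."""
    scores = {sid: sum(array.get(w, 0) for w in set(tokens(gloss)))
              for sid, gloss in search_domain.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


translated = "A country with defined borders, the people and Government and institutions"
expansions = {"synonym": ["nation", "body politic"], "supertype": ["political entity"]}
weights = {"original": 4, "synonym": 3, "supertype": 2}  # hypothetical values
domain = {
    "S1": "A politically organized body of people under a single government",
    "S2": "The territory occupied by a nation",
    "S3": "Express in words",
}
ranking = rank(domain, build_weighted_array(translated, expansions, weights))
```

`S1` wins with 11 (body 3 + people 4 + government 4), `S2` scores 3 (nation) and `S3` scores 0.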
  36. Convert the Translated Gloss into an Array of Words
     Translated gloss: "A country with defined borders, the people and Government and institutions". Stop words are removed.
     Array layout:
     • 1-m: original words (country, defined, borders, people, Government, institutions, …)
     • m-n: synonyms of the original words (e.g., body politic, commonwealth, …)
     • n-o: supertypes of the original words (e.g., political entity, …)
     • o-p: subtypes of the original words (e.g., Asian country, city state, …)
     • p-q: stems of all words 1-p (e.g., countries, defines, defining, …)
     The array 1-q might contain 1500 words or more; the idea is to include the relevant words that one may use to describe the concept being matched.
  37. Convert the Translated Gloss into an Array of Words (continued)
     Give each word a weight depending on its type (synonym, direct subtype, direct supertype, second-level supertype):
     • 1-m, original words: weight = a
     • m-n, synonyms of the original words: weight = b
     • n-o, supertypes of the original words: weight = c
     • o-p, subtypes of the original words: weight = d
     • p-q, stems of all words 1-p: weight = e
  38. Convert the Translated Gloss into an Array of Words (continued)
     Match the words of each concept in the search domain against the words in the array. For each hit, add the weight of the word to the synset's score.
     [Figure: the weighted array of slide 37 beside the search domain of slide 32]
  39. Ordered Search Domain
     The synset with the highest score is the best match.
     Search domain gloss: rank
     • The territory occupied by one of the constituent administrative districts of a nation: 90
     • The way something is with respect to its main attributes: 89
     • The group of people comprising the government of a sovereign state: 89
     • A politically organized body of people under a single government: 78
     • A compilation of the known facts regarding something or someone: 70
     • Put before: 65
     • A state of depression or agitation: 50
     • The territory occupied by a nation: 20
     • Express in words: 0
     • …
  40. Ordered Search Domain (continued)
     [Same ranked search domain as slide 39]
     The results up to this step were not bad, but we added an extra step, called "Rank × Centrality", to improve accuracy.
  41. The Rank × Centrality Step (Rank+)
     Rebuild the subtype links between the concepts in the search domain.
     [Figure: the ranked synsets of slide 39 arranged into subtype trees; nodes are labelled with letters and their ranks]
     • R: the rank computed in the previous step.
     • C: the centrality of each concept in its tree, i.e., how many nodes are above/under it.
     • Rank+ = R × C. The idea is to use centrality to discriminate between ranks that are close to each other.
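A Python sketch of Rank+ under two assumptions the slide does not fix: the centrality C of a synset counts the synset itself plus all nodes above and below it in its rebuilt subtype tree (so an isolated synset keeps its rank R), and the tree is given as a parent map without cycles. Names and data are illustrative.

```python
def centrality(node, parent):
    """1 + number of ancestors + number of descendants of `node`.
    parent maps each synset to its supertype (None for a root); assumes no cycles."""
    above, current = 0, parent.get(node)
    while current is not None:
        above += 1
        current = parent.get(current)
    children = {}
    for child, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(child)
    below, stack = 0, list(children.get(node, []))
    while stack:
        n = stack.pop()
        below += 1
        stack.extend(children.get(n, []))
    return 1 + above + below


def rank_plus(ranks, parent):
    """Rank+ = R x C, best first."""
    scored = [(s, r * centrality(s, parent)) for s, r in ranks.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)


# Two synsets tied at R = 89; T sits inside a small tree, B is isolated.
parent = {"P": None, "T": "P", "U": "T", "B": None}
result = rank_plus({"T": 89, "B": 89}, parent)
```

`T` has one node above and one below, so C = 3 and Rank+ = 267, while the isolated `B` keeps 89: centrality breaks the tie.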
  42. The Experiment
     Experiment   No. of glosses   Match percentage
     1st          160              More than 90%
     2nd          1000             In progress
  43. Ongoing Research (Using Machine Learning and Neural Networks), by Mohammed Mellhem
     Static weights: for each word found, a weight is given as follows:
        Type                          Weight value
        Keyword (first 3 words)       3
        Keyword (4th and above)       1
        Supertype                     0.7
        Subtype                       0.4
        Synonym                       0.3
        Sub-word                      0.2
        Bigram (not implemented)      1.2
     Dynamic weights: learning to find the best weights and levels to expand (using neural networks).
     • The algorithm tries to find the maximum level of expansion that can help find the best match.
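The static weights in the table can be encoded as in this Python sketch. The function names, the 0-based keyword positions, and the rule that a word found under several types keeps its highest weight are assumptions; the bigram weight is left out because the table marks it as not implemented.

```python
# Static weights from the table (the 1.2 bigram weight is not implemented).
RELATED_WEIGHTS = {"supertype": 0.7, "subtype": 0.4, "synonym": 0.3, "subword": 0.2}


def keyword_weight(position):
    """Keywords are the gloss's own words; position is 0-based."""
    return 3 if position < 3 else 1


def static_weights(keywords, related):
    """keywords: gloss keywords in order; related: {type: [words]}.
    A word reachable in several ways keeps its highest weight."""
    weights = {}
    for i, word in enumerate(keywords):
        weights[word] = max(weights.get(word, 0), keyword_weight(i))
    for kind, words in related.items():
        for word in words:
            weights[word] = max(weights.get(word, 0), RELATED_WEIGHTS[kind])
    return weights


w = static_weights(["country", "defined", "borders", "people"],
                   {"synonym": ["nation"], "supertype": ["political_entity"]})
```

The first three keywords get 3, `people` (the 4th) gets 1, `nation` gets 0.3 and `political_entity` gets 0.7. A learned (dynamic) variant would replace these constants.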
  44. Progress So Far
     • Considering three parts:
        1. Find the best depth for search-space expansion.
        2. Find the best depth for query expansion.
        3. Find the best weights for the best matching.
     • For parts 1 and 2, the algorithm tries to find the maximum level of expansion that can help find the best match.
     • For part 3, the algorithm is still in the learning process; initial results show that, in more than 85% of the 162 glosses processed so far, the match was found by the concept keyword or the keyword stem.
     • Improve performance by loading matching data on demand.
  45. Outline
     • Arabic Ontology Project
     • Design principles
     • Methodology and progress
     • Ongoing research
        - The Matching Function
        - The Top Levels
        - Finding Arabic Synonyms
  46. Arabic Ontology: Core (Top Levels)
     What are the top 10 levels of the Arabic Ontology, and how do we build them?
     [Figure: the top 3 levels, shown here for simplicity; the full core has 10 levels and 550 concepts]
     • The 10th level of this core ontology should top all Arabic concepts and levels.
     • This allows us to detect any problems in the tree/relations.
     • The core ontology governs the correctness and the evolution of the whole Arabic Ontology.
  47. Arabic Ontology: Core (Top Levels) (continued)
     What are the top 10 levels of the Arabic Ontology, and how do we build them?
     Methodology (by Rana Rashmawi):
     Arabic Core Ontology: the top levels of the Arabic Ontology, built manually based on the DOLCE and SUMO upper-level ontologies, carefully taking into account the philosophical and historical aspects of the Arabic concepts/terms.
     Phase 1. Determining the top (core) Arabic concepts.
     Phase 2. Constructing the top levels.
     Phase 3. Verifying the correctness and completeness of the top levels.
  48. Phase 1. Determining the Top (Core) Arabic Concepts
     1. Both SUMO and DOLCE were translated into Arabic separately, where each English upper term was mapped to 3-5 Arabic concepts based on the relevance of the meaning.
        The result of this step was a pool of 1200 Arabic terms that represent the upper terms of the Arabic language.
     [Figure: SUMO terms (Entity, Abstract, Physical, Attribute, …) and DOLCE terms (Entity, Abstract, Object, Event, …) mapped to Arabic terms; annotation "= 80"]
  49. Phase 1. Determining the Top (Core) Arabic Concepts (continued)
     2. We investigated all the meanings of each Arabic term in the resulting pool, using Arabic lexicons, allowing some editing and re-forming of the glosses following strict ontological guidelines.
        The result of this step was a larger pool of about 6000 glosses of the most general and comprehensive concepts in the Arabic language.
     [Figure: Arabic terms mapped to Arabic meanings]
  50. Phase 1. Determining the Top (Core) Arabic Concepts (continued)
     3. The last step in this phase was to choose one Arabic concept for each English concept in both SUMO and DOLCE.
        This step can be considered a process of semantic mapping between English and Arabic concepts.
     [Figure: Arabic terms and their Arabic meanings]
  51. Phase 2. Constructing the Top Levels
     • After determining the Arabic upper (core) concepts that correspond to the English upper concepts of SUMO and DOLCE, we derived the relations between these Arabic concepts from the existing relations in both foreign ontologies, as both are structured by the sub/supertype relation.
     [Figure: Entity, with subtypes Abstract, Object, Info. Entity, Event, Quality]
     • The same methodology was used to extract the semantic relations from DOLCE and SUMO to complete all the lower levels, down to the 10th.
  52. Phase 3. Verifying the Correctness and Completeness of the Top Levels
     • A manual mapping of 6000 Arabic meanings was done to verify the top levels.
  53. Outline
     • Arabic Ontology Project
     • Design principles
     • Methodology and progress
     • Ongoing research
        - The Matching Function
        - The Top Levels
        - Finding Arabic Synonyms
  54. Further/Ongoing Research
     Given many Arabic-English, Arabic-French, and Arabic-Italian dictionaries, can we derive an Arabic-Arabic thesaurus? For example: [figure not preserved]
     Then categorize very-related words (maybe using WordNet) as follows: [figure not preserved]
     This will help find possible Arabic synsets, which in turn helps detect possible subtype relations and/or validate the existing relations.
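A minimal Python sketch of the pivot idea, assuming the bilingual dictionaries are flattened into (Arabic word, language, foreign word) entries: two Arabic words that share a translation become candidate synonyms, scored by how many translations they share. The function name is hypothetical and the entries are illustrative; candidates would still need the categorization step before being accepted.

```python
from collections import defaultdict
from itertools import combinations


def pivot_synonym_candidates(entries):
    """entries: (arabic_word, language, foreign_word) triples from bilingual
    dictionaries. Returns {frozenset({ar1, ar2}): number_of_shared_translations}."""
    arabic_by_translation = defaultdict(set)
    for arabic, lang, foreign in entries:
        arabic_by_translation[(lang, foreign)].add(arabic)
    shared = defaultdict(int)
    for arabic_words in arabic_by_translation.values():
        for a, b in combinations(sorted(arabic_words), 2):
            shared[frozenset((a, b))] += 1
    return dict(shared)


entries = [
    ("رتب", "en", "sort"), ("نوع", "en", "sort"), ("ضرب", "en", "sort"),
    ("نوع", "en", "form"), ("ضرب", "en", "form"), ("شكل", "en", "form"),
    ("شكل", "en", "shape"), ("هيئة", "en", "shape"), ("هيئة", "en", "form"),
]
candidates = pivot_synonym_candidates(entries)
```

Pairs sharing several translations (for example نوع and ضرب via "sort" and "form") score higher than pairs linked by a single, possibly ambiguous, English word.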
  55. Table (English word: Arabic translations)
     sort: رتب, نوع, شكل, ضرب, هذا النوع, صنف, طريقة
     arrange: رتب, نظم, اتخذ, سوى الخلاف, عدل
     order: رتب, النظام, ترتيب, أمر
     set: رتب, مجموعة, وضع, ضبط, بدأ
     tidy: رتب, أنيق, ضخم, نظيف, منهجي, منظم, مهندم
     pack: رتب, حزمة, رزمة, وضب, تكوم, حشا
     shape: رتب, شكل, حالة, هيئة, مظهر, تجسد
     form: رتب, شكل, استمارة, صورة, نوع, هيئة, صيغة, ضرب
     tune: رتب
     organize: رتب
  56. Level 1
  57. Level 2
  58. Level 3
  59. Level 4
  60. Cycle
