Pal gov.tutorial4.session13.arabicontology



  1. The Palestinian eGovernment Academy (أكاديمية الحكومة الإلكترونية الفلسطينية)
     www.egovacademy.ps
     Tutorial 4: Ontology Engineering & Lexical Semantics
     Session 13: ArabicOntology
     Dr. Mustafa Jarrar, University of Birzeit
     PalGov © 2011
  2. About
     This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
     Project Consortium:
     • Birzeit University, Palestine
     • University of Trento, Italy (Coordinator)
     • Palestine Polytechnic University, Palestine
     • Vrije Universiteit Brussel, Belgium
     • Palestine Technical University, Palestine
     • Université de Savoie, France
     • Ministry of Telecom and IT, Palestine
     • University of Namur, Belgium
     • Ministry of Interior, Palestine
     • TrueTrust, UK
     • Ministry of Local Government, Palestine
     Coordinator: Dr. Mustafa Jarrar, Birzeit University, P.O. Box 14, Birzeit, Palestine. Telfax: +972 2 2982935, mjarrar@birzeit.edu
  3. © Copyright Notes
     Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website), and the author of that part. No part of this tutorial may be reproduced or modified in any form or by any means, without prior written permission from the project, who have the full copyrights on the material.
     Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA): This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
  4. Tutorial Map
     Topic (Time)
     • Session 1_1: The Need for Sharing Semantics (1.5)
     • Session 1_2: What is an ontology (1.5)
     • Session 2: Lab - Build a Population Ontology (3)
     • Session 3: Lab - Build a BankCustomer Ontology (3)
     • Session 4: Lab - Build a BankCustomer Ontology (3)
     • Session 5: Lab - Ontology Tools (3)
     • Session 6_1: Ontology Engineering Challenges (1.5)
     • Session 6_2: Ontology Double Articulation (1.5)
     • Session 7: Lab - Build a Legal-Person Ontology (3)
     • Session 8_1: Ontology Modeling Challenges (1.5)
     • Session 8_2: Stepwise Methodologies (1.5)
     • Session 9: Lab - Build a Legal-Person Ontology (3)
     • Session 10: Zinnar – The Palestinian eGovernment Interoperability Framework (3)
     • Session 11: Lab - Using Zinnar in web services (3)
     • Session 12_1: Lexical Semantics and Multilinguality (1.5)
     • Session 12_2: WordNets (1.5)
     • Session 13: ArabicOntology (3)
     • Session 14: Lab - Using Linguistic Ontologies (3)
     • Session 15: Lab - Using Linguistic Ontologies (3)
     Intended Learning Objectives
     A: Knowledge and Understanding
     • 4a1: Demonstrate knowledge of what is an ontology, how it is built, and what it is used for.
     • 4a2: Demonstrate knowledge of ontology engineering and evaluation.
     • 4a3: Describe the difference between an ontology and a schema, and an ontology and a dictionary.
     • 4a4: Explain the concept of language ontologies, lexical semantics and multilingualism.
     B: Intellectual Skills
     • 4b1: Develop quality ontologies.
     • 4b2: Tackle ontology engineering challenges.
     • 4b3: Develop multilingual ontologies.
     • 4b4: Formulate quality glosses.
     C: Professional and Practical Skills
     • 4c1: Use ontology tools.
     • 4c2: (Re)use existing language ontologies.
     D: General and Transferable Skills
     • d1: Working with team.
     • d2: Presenting and defending ideas.
     • d3: Use of creativity and innovation in problem solving.
     • d4: Develop communication skills and logical reasoning abilities.
  5. Session ILOs
     This session will help students to:
     • 4a4: Explain the concept of language ontologies, lexical semantics and multilingualism.
     • 4b4: Formulate quality glosses.
     • 4b3: Develop multilingual ontologies.
  6. Reading
     • Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper). In proceedings of the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League. Tunis, July 26-28, 2011.
     • Mustafa Jarrar: Towards The Notion Of Gloss, And The Adoption Of Linguistic Resources In Formal Ontology Engineering. In proceedings of the 15th International World Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN: 1595933239. May 2006.
     • Aldo Gangemi, Nicola Guarino, Alessandro Oltramari, Stefano Borgo: Cleaning-up WordNet's Top-Level. In Proc. of the 1st International WordNet Conference (2002).
  7. The Arabic Ontology Project
     • A project started in 2010, at Birzeit University, Palestine.
     • The ArabicOntology is more than an Arabic WordNet.
     • Unlike WordNet, the ArabicOntology is logically and philosophically well-founded, as it follows strict ontological principles.
     • It can nevertheless be used as an Arabic WordNet.
     The project is partially funded (seed funding) by Birzeit University (VP Academic Office, Research Committee).
  8. Arabic Ontology: Data Model (Simplified)
     • ConceptID (like a synsetID in WordNet) identifies a concept.
     • Polysemy and synonymy: as in WordNet, several words (i.e., lexical units) can be used to lexicalize one concept (synonymy), and one word might be used to lexicalize several concepts (polysemy).
     [Figure: data model diagram with the labels Lexical Unit; Gloss (describes a concept); Semantic Relations; Concept ID (unique reference of a concept)]
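The simplified data model above can be sketched in Python. This is only an illustration under stated assumptions: the class name `Concept`, its field names, the helper `senses`, and the two sample concepts are hypothetical and are not taken from the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Concept:
    concept_id: int  # plays the role of a WordNet synset ID
    gloss: str       # informal description of the concept
    lexical_units: Set[str] = field(default_factory=set)  # words that lexicalize it (synonymy)
    supertypes: Set[int] = field(default_factory=set)     # semantic relation: subtype-of


def senses(word: str, concepts: List[Concept]) -> List[int]:
    """Return the IDs of every concept that the word lexicalizes (polysemy)."""
    return sorted(c.concept_id for c in concepts if word in c.lexical_units)


# Hypothetical sample data: "table" is polysemous, "matrix" is a synonym in one sense.
concepts = [
    Concept(1, "An arrangement of data in rows and columns.", {"table", "matrix"}),
    Concept(2, "A piece of furniture with a flat top.", {"table"}),
]
```

Here `senses("table", concepts)` finds both concepts, while `senses("matrix", concepts)` finds only the first, which mirrors the polysemy and synonymy bullets.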
  9. Lexical vs. Semantic Relationships
     • Semantic relations are relationships between concepts (not words), e.g., subtype, part-of, etc.
     • Lexical relations are relationships between words (not concepts), e.g., synonym-of, root-of, abbreviation-of, etc.
     • Ontologies are mainly concerned with semantic relations.
     [Figure: data model diagram with the labels Lexical Unit, Gloss, Semantic Relations, Concept ID]
  10. Arabic Ontology
      • Arabic Ontology: the set of concepts (of all Arabic terms), and the semantic (not lexical) relationships between these concepts.
      • To build an Arabic Ontology: identify the set of concepts for every Arabic word (polysemy), and define semantic relations between these concepts.
      • The most important relation is the subtype relation, which leads to a tree of concepts.
  11. Arabic Ontology: Subtype Relationships
      • The subtype relation is a mathematical relation (subset: A ⊆ B), such that every instance of A must also be an instance of B.
      • Inheritance: subtypes inherit all properties of their supertypes.
      • "Hyponymy" in WordNet is close to (but not the same as) the subtype relation.
      • "General-Specific" relations, as in thesauri, are not subtype relations.
      [Figure: instances in the "world", drawn as points]
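The subset reading of the subtype relation and the inheritance of properties can be written out as follows. The symbol P, standing for an arbitrary property of instances, is an assumption introduced here for illustration.

```latex
\[
A \subseteq B \;\iff\; \forall x\,\bigl(x \in A \rightarrow x \in B\bigr)
\]
\[
\bigl(A \subseteq B \;\wedge\; \forall y\,(y \in B \rightarrow P(y))\bigr)
\;\Rightarrow\; \forall x\,\bigl(x \in A \rightarrow P(x)\bigr)
\]
```

The second line is the inheritance bullet: whatever holds for every instance of the supertype B also holds for every instance of the subtype A.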
  12. Arabic Ontology: Subtype Relationships
      • It is recommended to use proper subtypes, as they are stricter.
      • That is, A and B are never equal; B is always a proper superset of A.
      • It is recommended to classify concepts based on "rigidity".
      • For example, it is wrong to say that a "WorkTable" is a type of "Table", as being a work table is a non-rigid property.
      • As such, subtypes form a tree.
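A minimal sketch of how the "subtypes form a tree" requirement could be checked mechanically. It assumes each concept stores at most one direct supertype, so the hierarchy is a dictionary from concept to parent; the function name `is_tree` and the sample concept names are hypothetical.

```python
def is_tree(parent):
    """Check that a child -> supertype mapping forms a single rooted tree.

    `parent` maps each concept to its one direct supertype, or to None
    for the root. A cycle would mean A is a subtype of B and B of A,
    which proper subtyping forbids.
    """
    roots = [concept for concept, sup in parent.items() if sup is None]
    if len(roots) != 1:
        return False  # a tree has exactly one root
    for start in parent:
        seen = set()
        node = start
        while node is not None:
            if node in seen or node not in parent:
                return False  # cycle, or a supertype that is not declared
            seen.add(node)
            node = parent[node]
    return True
```

For example, `is_tree({"Entity": None, "Object": "Entity", "Table": "Object"})` holds, while a mapping in which two concepts are each other's supertype does not.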
  13. Arabic Ontology: Core (Top Levels)
      أمهات المعاني لجميع الكلمات العربية (the top-level meanings for all Arabic words)
      Arabic Core Ontology: the top levels of the Arabic Ontology, built manually based on the DOLCE and SUMO upper-level ontologies, and carefully taking into account the philosophical and historical aspects of the Arabic concepts/terms.
      [Figure: top 3 levels shown for simplicity; the core ontology has 10 levels and 550 concepts]
      • The 10th level of this core ontology should top all Arabic concepts and levels.
      • This allows us to detect any problems in the tree/relations.
      • The core ontology governs the correctness and the evolution of the whole Arabic Ontology.
  14. Arabic Ontology: Glosses according to strict ontological guidelines [J06]
      A gloss is an auxiliary informal (but controlled) account of the intended meaning of a linguistic term, for the commonsense perception of humans.
      A gloss is supposed to render factual knowledge that is critical to understanding a concept, but that is, e.g., implausible, unreasonable, or very difficult to formalize and/or articulate explicitly.
      A gloss is NOT meant to catalogue general information and comments, as conventional dictionaries and encyclopedias usually do, or as an <rdfs:comment> does.
  15. Arabic Ontology: Gloss Guidelines
      What should and what should not be provided in a gloss:
      1. Start with the principal/supertype of the concept being defined. E.g., "Search engine": "A computer program that …"; "Invoice": "A business document that …"; "University": "An institution of …".
      2. Write it in the form of propositions, offering the reader inferential knowledge that helps them construct the image of the concept. E.g., for "Search engine", compare "A computer program for searching the internet, it can be defined as one of the most useful aspects of the World Wide Web. Some of the major ones are Google, …" with "A computer program that enables users to search and retrieve documents or data from a database or from a computer network …".
      3. Focus on distinguishing characteristics and intrinsic properties that differentiate the concept from other concepts. E.g., for "Laptop computer", compare "A computer that is designed to do pretty much anything a desktop computer can do, it runs for a short time (usually two to five hours) on batteries" with "A portable computer small enough to use in your lap …".
  16. Arabic Ontology: Gloss Guidelines
      4. Use supportive examples:
         - To clarify cases that are commonly known to be false but are true, or that are known to be true but are false;
         - To strengthen and illustrate distinguishing characteristics (e.g., define by examples and counter-examples).
         Examples can be types and/or instances of the concept being defined.
      5. Be consistent with formal definitions/axioms.
      6. Be sufficient, clear, and easy to understand.
      WordNet glosses do not follow such ontological guidelines.
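  17. Arabic Ontology: Gloss Guidelines
      As a gloss starts with a supertype of the concept being defined, try to read the gloss as follows, to verify that what you wrote is correct:
      • جدول: مصفوفة بيانات مكونة من صفوف وأعمدة. (Table: a data matrix made up of rows and columns.)
      • جدول: ترتيب بيانات جنباً الى جنب على شكل صفوف وأعمدة. (Table: an arrangement of data side by side in the form of rows and columns.)
      • جدول: تنظيم بيانات بصورة ممنهجة جنباً الى جنب على شكل صفوف وأعمدة. (Table: a systematic organization of data side by side in the form of rows and columns.)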
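The first guideline (a gloss starts with its supertype) lends itself to a rough automatic pre-check. The sketch below is an assumption-laden illustration for English glosses only: the function name `starts_with_supertype`, the article list, and the plain prefix test are hypothetical simplifications, not the project's tooling.

```python
ARTICLES = ("a ", "an ", "the ")


def starts_with_supertype(gloss, supertype_terms):
    """Return True if the gloss opens with one of the supertype's terms.

    A leading English article is skipped first; the comparison is a
    simple case-insensitive prefix test.
    """
    text = gloss.strip().lower()
    for article in ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
            break
    return any(text.startswith(term.lower()) for term in supertype_terms)
```

A gloss such as "A computer program that enables users to search …" passes when the supertype is lexicalized as "computer program"; a gloss that opens with commentary does not. A human reviewer would still need to judge the rest of the gloss.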
  18. ArabicOntology vs. WordNet
      Unlike WordNet, the Arabic Ontology is:
      1. Philosophically well-founded:
         • Focuses on intrinsic properties;
         • All types are rigid;
         • The top level is derived from known top-level ontologies.
      2. Strictly formal:
         • Semantic relations are well-defined mathematical relations.
      3. Based on strictly controlled glosses:
         • The content and structure of the glosses are strictly based on ontological principles.
  19. Methodology and Progress
  20. Our Approach to Building the ArabicOntology
      Roughly:
      • Step 1: Mine Arabic concepts/glosses from dictionaries.
      • Step 2: Automatically map these Arabic concepts to WordNet concepts, and thus inherit semantic relations from WordNet.
      • Step 3: Link all concepts with the Arabic Core Ontology.
      • Step 4: Re-formulate the glosses according to strict ontological guidelines.
  21. Step 1: Mining Arabic Concepts from Dictionaries
      • Collect as many glosses/concepts as possible from specialized and general dictionaries.
      • Extraction from dictionaries is manual; basic cleaning is then done automatically.
      [Figure: Mining concepts]
      • 35k glosses ready.
      • We have ~100 students typing dictionaries now.
      • 100K+ more glosses (expected this year).
  23. Step 1: Mining Arabic Concepts from Dictionaries
      • Most Arabic dictionaries are not useful, but some are a good start. The dictionaries we need should:
         - Focus on the semantic aspects.
         - Keep multiple meanings separate, not mixed up.
         - Present the meanings with good structure and quality.
      [Figure: Mining concepts]
  24. Examples (Good & Bad Resources)
      • Wiktionary
      • معجم البلدان
      • معجم مصطلح الأصول
      • المعجم الإسلامي
      • معجم الحاسبات
      • المترادف والمتوارد
      • معجم تعريف مصطلحات القانون الخاص
      • معجم الألفاظ المشتركة في اللغة العربية
      • أقرب الموارد
      • المعجم الوجيز
  25. Step 2: Map Arabic Concepts to WordNet (Matching Function)
      We developed a matching algorithm such that:
      • Input: an Arabic gloss and the 117k English glosses in WordNet.
      • Output: (best match, rank).
      • Accuracy: above 90% (being improved).
      Example: the Arabic gloss "بلد لها حدود معروفة وشعب وفيها حكومة ومؤسسات منظمة" (a country with known borders and a people, with a government and organized institutions) is compared against WordNet (English) glosses such as:
      • The territory occupied by one of the constituent administrative districts of a nation
      • The way something is with respect to its main attributes
      • The group of people comprising the government of a sovereign state
      • A politically organized body of people under a single government
      • A compilation of the known facts regarding something or someone
      • …
      Best match: "A politically organized body of people under a single government".
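The slides do not describe how the matching function works internally, so the following is not the project's algorithm. It is a deliberately simple stand-in that ranks candidate English glosses by Jaccard token overlap, under the assumption that the Arabic gloss has already been translated into English; the names `tokens`, `jaccard`, and `rank_matches` and the translated query are hypothetical.

```python
import re


def tokens(text):
    """Lower-case a gloss and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_matches(query_gloss, candidate_glosses):
    """Return (score, gloss) pairs sorted from best to worst match."""
    q = tokens(query_gloss)
    scored = [(jaccard(q, tokens(g)), g) for g in candidate_glosses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Assumed English rendering of the Arabic gloss from the example above.
query = ("A country with known borders and a people, "
         "having a government and organized institutions")
candidates = [
    "The territory occupied by one of the constituent administrative districts of a nation",
    "The way something is with respect to its main attributes",
    "The group of people comprising the government of a sovereign state",
    "A politically organized body of people under a single government",
    "A compilation of the known facts regarding something or someone",
]
best_score, best_gloss = rank_matches(query, candidates)[0]
```

With these inputs the top-ranked candidate is the "politically organized body of people" gloss. A realistic matcher would also need stop-word handling, stemming, and a proper cross-lingual step, none of which is modeled here.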
  26. The Matching Function is used for:
      1. Inheriting semantic relations from WordNet, based on the previous mapping.
         [Figure: Arabic concepts mapped to WordNet concepts]
         Remark: This is only a good start, as these inherited relations need to be cleaned using the Arabic top levels and the OntoClean methodology.
      2. Detecting redundant concepts within the Arabic Ontology itself, using the same function.
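Inheriting relations through the mapping can be sketched as below. This assumes a one-to-one mapping from Arabic concept IDs to WordNet synset IDs and a set of (child, parent) hypernym pairs; the function name `inherit_subtypes` and the letter IDs are hypothetical placeholders.

```python
def inherit_subtypes(mapping, wordnet_hypernyms):
    """Project WordNet hypernym links onto the mapped Arabic concepts.

    mapping: Arabic concept ID -> WordNet synset ID (assumed one-to-one).
    wordnet_hypernyms: set of (child_synset, parent_synset) pairs.
    Returns candidate (arabic_child, arabic_parent) subtype links.
    """
    by_synset = {synset: arabic for arabic, synset in mapping.items()}
    inherited = set()
    for child, parent in wordnet_hypernyms:
        if child in by_synset and parent in by_synset:
            inherited.add((by_synset[child], by_synset[parent]))
    return inherited


# Hypothetical IDs: Arabic concepts A, J, H mapped to synsets L, R, Q.
mapping = {"A": "L", "J": "R", "H": "Q"}
hypernyms = {("R", "L"), ("Q", "D")}
links = inherit_subtypes(mapping, hypernyms)
```

Only the R-under-L link survives, because synset D has no Arabic counterpart in this toy mapping. As the remark on the slide says, such inherited links are candidates that still need cleaning.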
  27. Step 3: Link Concepts with the Arabic Core Ontology
      Each Arabic concept (from the previous steps) is mapped to a concept in the 10th level. That is, the 10th level of this core ontology should top all Arabic concepts and levels, so as to enable automatic detection of problems in the hierarchy.
      [Figure: top 3 levels of the core ontology shown for simplicity, with Arabic concepts A, C, and J linked to it]
  28. Until this stage
      • We have many concepts extracted from linguistic resources, but the glosses are not well written.
      • We have many possible subtype relations between concepts, derived via the mappings to WordNet concepts.
      • We have a sample of 6000 Arabic concepts mapped to the 10th level in the core ontology.
      We need to:
      • Clean the glosses,
      • Clean/correct the subtype links.
  29. Automatic Detection of Inconsistencies
      • Subtype links from Arabic concepts to the core ontology (done manually).
      • Subtype links between Arabic concepts (derived via the mappings to WordNet).
      Now we can automatically detect whether the links are correct:
      • If (J ⊆ A) and (A ⊆ اصطلاح لغوي) then it is most likely true that (J ⊆ اصطلاح لغوي), thus there is no need to have an explicit link (J ⊆ اصطلاح لغوي). Here اصطلاح لغوي ("linguistic term") is a concept in the core ontology.
      • However, as H and C do not share a supertype, (H ⊆ C) is likely incorrect.
      [Figure: top 3 levels of the core ontology shown for simplicity, with concepts A, C, J, and H; links found to be problematic are marked with X]
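The two detection rules can be sketched in a few lines. The sketch assumes every concept has one manually assigned core-ontology concept at the same (10th) level, so "sharing a supertype" reduces to equality of those core concepts; the function name `check_links`, the core labels T and U, and the sample links are hypothetical.

```python
def check_links(derived, core_link):
    """Classify WordNet-derived subtype links using the manual core links.

    derived: set of (child, parent) subtype links between Arabic concepts.
    core_link: concept -> its manually assigned core-ontology concept.
    Returns (redundant, suspicious):
      redundant  - (child, core) links already implied via the parent;
      suspicious - (child, parent) links whose ends sit under different core concepts.
    """
    redundant, suspicious = [], []
    for child, parent in sorted(derived):
        child_core = core_link.get(child)
        parent_core = core_link.get(parent)
        if child_core is None or parent_core is None:
            continue  # no manual link to compare against
        if child_core == parent_core:
            redundant.append((child, child_core))
        else:
            suspicious.append((child, parent))
    return redundant, suspicious


# Hypothetical data: J and A share core concept T; H and C do not.
core_link = {"A": "T", "J": "T", "C": "T", "H": "U"}
derived = {("J", "A"), ("H", "C")}
redundant, suspicious = check_links(derived, core_link)
```

In this toy run the direct link from J to T is reported as implied by J's link to A, and the link from H to C is flagged for review. Flagged links are only "likely" wrong, in keeping with the hedged wording of the slide, so a person should confirm them.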
  30. Step 4: Re-Formulate Glosses, according to strict ontological guidelines [J06]
      • Glosses are re-formulated semi-manually, to meet our strict rules.
      • Gloss cleaning can be done automatically up to a certain point.
      • During the manual cleaning (i.e., re-formulation) of glosses, mistakes in the subtype relations can be detected.
  31. Further Research (ongoing)
      • Given many Arabic-English, Arabic-French, and Arabic-Italian dictionaries, can we derive an Arabic-Arabic thesaurus? For example:
        جدول: مصفوفة، نهر، قائمة، قناة ماء (jadwal: matrix, river, list, water channel)
      • Then categorize closely related words (maybe using WordNet) as follows:
        جدول: مصفوفة، قائمة، نهر، قناة ماء (jadwal: matrix and list grouped together; river and water channel grouped together)
      This will help find possible Arabic synsets, which helps detect possible subtype relations and/or validate the existing relations.
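One way to approach the thesaurus question is to pivot through shared translations: two Arabic words become candidates for the same thesaurus entry if some bilingual dictionary gives them a common translation. This is a sketch of that idea only, not the project's method; the function name `pivot_thesaurus` and the sample dictionary pairs are assumptions.

```python
from collections import defaultdict


def pivot_thesaurus(bilingual_pairs):
    """Relate Arabic words that share at least one foreign translation.

    bilingual_pairs: iterable of (arabic_word, foreign_translation).
    Returns (related, by_translation): the related-word sets per Arabic
    word, and the Arabic words grouped under each translation.
    """
    by_translation = defaultdict(set)
    for arabic, foreign in bilingual_pairs:
        by_translation[foreign].add(arabic)
    related = defaultdict(set)
    for words in by_translation.values():
        for word in words:
            related[word] |= words - {word}
    return related, by_translation


# Hypothetical dictionary entries for the example word above.
pairs = [
    ("جدول", "table"), ("قائمة", "table"),
    ("جدول", "stream"), ("نهر", "stream"),
]
related, groups = pivot_thesaurus(pairs)
```

Here `related["جدول"]` contains both قائمة and نهر, while `groups` keeps them apart under "table" and "stream", which corresponds to the categorization step: each translation acts as a rough sense label.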
  32. References
      • Mustafa Jarrar: Building A Formal Arabic Ontology (Invited Paper). In proceedings of the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League. Tunis, July 26-28, 2011.
      • [J06] Mustafa Jarrar: Towards The Notion Of Gloss, And The Adoption Of Linguistic Resources In Formal Ontology Engineering. In proceedings of the 15th International World Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN: 1595933239. May 2006.
      • [MBC93] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller: Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, Vol. 3, Nr. 4. Pages 235-244. (1990)
      • [GGO02] Aldo Gangemi, Nicola Guarino, Alessandro Oltramari, Stefano Borgo: Cleaning-up WordNet's Top-Level. In Proc. of the 1st International WordNet Conference (2002).
      • Roche Christophe, Calberg-Challot Marie (2010): "Synonymy in Terminology: the Contribution of Ontoterminology", Re-thinking synonymy: semantic sameness and similarity in languages and their description, Helsinki, 2010.
      • Roche Christophe, Calberg-Challot Marie, Damas Luc, Rouard Philippe (2009): "Ontoterminology: A new paradigm for terminology". KEOD, Madeira.