Semantic Analysis and Concept-based Translation for Multilingual Information Systems

Talk given at GAL 2007, Hildesheim, Germany

  1. Semantic Analysis and Concept-based Translation for Multilingual Information Systems
     Johannes Leveling, Sven Hartrumpf, and Rainer Osswald
     Intelligent Information and Communication Systems (IICS), University of Hagen (FernUniversität in Hagen), 58084 Hagen, Germany
     firstname.lastname@fernuni-hagen.de
     GAL 2007, Hildesheim, Germany
  2. Outline
     1 Concept-based Representation: MultiNet
     2 Three Phases for a Concept-Based Multilingual IR System
     3 Concept-Based Information Systems
     4 Applications
     5 Conclusion and Outlook
     References
  3. Motivation for Concept-Based Translation
     • Example 1: Query expansion in information retrieval (IR) with elements from the same synset
       → needs word sense disambiguation (differentiation of concepts); otherwise precision is lost
     • Example 2: Question answering (QA): questions on relations between concepts (situations, events, etc.)
       Example: Who killed Lee Harvey Oswald?
       → needs a semantic representation; bag-of-words information retrieval is not enough
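A minimal Python sketch of the precision problem in Example 1, assuming a toy sense inventory: the concept IDs bank.1.1 and bank.2.1 and their synsets are hypothetical, not taken from HaGenLex or GermaNet. Expanding the surface word pulls in the synsets of every sense, while expanding the disambiguated concept keeps only one.

```python
# Toy sense inventory (hypothetical): concept ID -> synset members.
SYNSETS = {
    "bank.1.1": {"bank", "financial institution"},  # money sense
    "bank.2.1": {"bank", "riverbank", "shore"},     # river sense
}


def senses(word):
    """Concept IDs whose synset contains the surface word."""
    return sorted(cid for cid, members in SYNSETS.items() if word in members)


def expand_word(word):
    """Expansion without disambiguation: union of the synsets of all senses."""
    terms = set()
    for cid in senses(word):
        terms |= SYNSETS[cid]
    return terms


def expand_concept(concept_id):
    """Expansion after disambiguation: only the chosen concept's synset."""
    return set(SYNSETS[concept_id])


print(sorted(expand_word("bank")))         # includes the river sense
print(sorted(expand_concept("bank.1.1")))  # money sense only
```

For a query about finance, the undisambiguated expansion adds "riverbank" and "shore", which is exactly the kind of precision loss the slide warns about.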
  4. The MultiNet Paradigm
     • Meaning and knowledge representation: Multilayered Extended Semantic Networks (Helbig, 2001, 2006)
     • Semantic network of nodes (concepts) and edges (semantic relations from a fixed set)
     • In addition: semantic sorts, semantic features, layer information
     • Different types of concepts: lexicalized vs. non-lexicalized
     • Language independence: annotation of English/Czech sentences from the Wall Street Journal with MultiNet (Charles University, Prague)
  5. Selected Semantic Relations
     Relation  Description
     ASSOC     association
     ATTCH     attachment of object to object
     CHPA      change of sorts (property → abstract object)
     EXP       experiencer
     MCONT     an informational process or object
     OBJ       neutral object
     PRED      predicative concept specifying a plurality
     PROP      property relationship
     PARS      meronymy
     SCAR      carrier of a state
     SSPE      state specifier
     SUB       conceptual subordination for objects
     SUBS      conceptual subordination for situations
     SYNO      synonymy
     TEMP      temporal restriction for a situation
     ALTN1     an introduction of alternatives
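The paradigm can be pictured as a small data structure: a set of (node, relation, node) edges whose relation labels must come from a closed inventory. The Python sketch below is an illustration, not the MultiNet or WOCADI implementation; it admits only the relations from the table above, and the class name SemanticNetwork and the fragment modeled on "psychisches Problem" are assumptions.

```python
from dataclasses import dataclass, field

# Closed relation inventory: only the relations listed in the table above.
# The full MultiNet inventory is larger; this subset is for illustration.
RELATIONS = frozenset({
    "ASSOC", "ATTCH", "CHPA", "EXP", "MCONT", "OBJ", "PRED", "PROP",
    "PARS", "SCAR", "SSPE", "SUB", "SUBS", "SYNO", "TEMP", "ALTN1",
})


@dataclass
class SemanticNetwork:
    """Nodes are concepts or instances; edges are (source, relation, target)."""
    edges: set = field(default_factory=set)

    def add(self, source, relation, target):
        # Edges may only use labels from the fixed relation inventory.
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges.add((source, relation, target))

    def nodes(self):
        return {n for s, _, t in self.edges for n in (s, t)}

    def outgoing(self, node):
        return sorted((r, t) for s, r, t in self.edges if s == node)


# Hypothetical fragment for "psychisches Problem" (singular; a plural would
# use PRED instead of SUB): instance c6 is a problem with property psychisch.
sn = SemanticNetwork()
sn.add("c6", "SUB", "problem.1.1")
sn.add("c6", "PROP", "psychisch.1.1")
print(sn.outgoing("c6"))  # [('PROP', 'psychisch.1.1'), ('SUB', 'problem.1.1')]
```

Rejecting unknown labels at insertion time is one way to enforce the "fixed set" of relations; the real system additionally checks sorts and features, which this sketch omits.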
  6. The Computational Lexicon – HaGenLex
     • Semantically oriented (German) lexical resource (Hartrumpf et al., 2003)
     • Consists of multiple lexicons:
       • full syntactico-semantic information (26,000 entries)
       • flat lexicon (50,000 entries)
       • compound lexicon (30,000 entries; structure and semantics)
       • name lexicons (250,000 entries)
     • Support for the lexicographer: LIAplus workbench
  7. Sample Concepts (German)
     • essen.1.1: (Der Student) (ißt) (eine Schokolade). ['The student eats a chocolate bar.']
     • essen.1.2: (Der Student) (ißt) sich (satt). ['The student eats his fill.']
     • essen.2.1: Das Kind hat kein Essen bekommen. ['The child got no food.']
     • essen.2.2: Das Essen am Abend dauerte 2 Stunden. ['The evening meal lasted 2 hours.']
     • fressen.1.1: (Der Hund) (frißt) (einen Knochen). ['The dog eats a bone.']
     • fressen.1.2: (Die Großmutter) (frißt) (einen Narren) (an den Blumen). [idiom: 'Grandmother dotes on the flowers.']
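The sample concepts follow a lemma.number.number naming scheme. Judging from the examples (essen.1.x is the verb, essen.2.x the noun Essen), the sketch below assumes that the first number distinguishes homographs and the second distinguishes readings; the parser and the ConceptId type are hypothetical helpers, not part of HaGenLex.

```python
from collections import defaultdict
from typing import NamedTuple


class ConceptId(NamedTuple):
    lemma: str
    homograph: int  # assumed: first number separates homographs (verb vs. noun)
    reading: int    # assumed: second number separates readings


def parse_concept_id(cid: str) -> ConceptId:
    """Split 'lemma.H.R' from the right, so only the last two dots are used."""
    lemma, homograph, reading = cid.rsplit(".", 2)
    return ConceptId(lemma, int(homograph), int(reading))


samples = ["essen.1.1", "essen.1.2", "essen.2.1", "essen.2.2",
           "fressen.1.1", "fressen.1.2"]

# Group the readings under (lemma, homograph).
groups = defaultdict(list)
for cid in samples:
    c = parse_concept_id(cid)
    groups[(c.lemma, c.homograph)].append(c.reading)

print(dict(groups))
# {('essen', 1): [1, 2], ('essen', 2): [1, 2], ('fressen', 1): [1, 2]}
```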
  8. Lexicon Entry (German): essen.1.1
     [ n-sign
       morph   [ base       "essen"
                 infl-para  i129g ]
       syn     [ v-syn
                 v-type     main
                 perf-aux   haben
                 v-control  nocontr ]
       sem     [ sem
                 entity     nonment-action
                 c-id       "essen.1.1" ]
       select  [ rel  agt
                 sel  [ syn  [ np-syn
                               cat  np
                               agr  [ case nom ] ]
                        sem  [ semsel
                               sem  [ entity human-object ] ] ] ]
               [ rel  aff
                 sel  [ syn  [ np-syn
                               cat  np
                               agr  [ case acc ] ]
                        sem  [ semsel
                               sem  [ entity [ sort co ] ] ] ] ] ]
  9. Lexicon Entry (German): fressen.1.1
     [ n-sign
       morph   [ base       "fressen"
                 infl-para  i139g ]
       syn     [ v-syn
                 v-type     main
                 perf-aux   haben
                 v-control  nocontr ]
       sem     [ sem
                 entity     nonment-action
                 c-id       "fressen.1.1" ]
       select  [ rel  agt
                 sel  [ syn  [ np-syn
                               cat  np
                               agr  [ case nom ] ]
                        sem  [ semsel
                               sem  [ entity animal-object ∨ human-object ] ] ] ]
               [ rel  aff
                 sel  [ syn  [ np-syn
                               cat  np
                               agr  [ case acc ] ]
                        sem  [ semsel
                               sem  [ entity [ sort co ] ] ] ] ] ]
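The essen.1.1 and fressen.1.1 entries differ semantically only in the agent's selectional restriction. A simplified Python sketch, assuming a flat encoding where each role maps to a set of admissible semantic values (the real entries are full feature structures; LEXICON, compatible, and readings_for are invented names):

```python
# Simplified, hypothetical encoding of the two entries: each role
# (agt = agent, aff = affected object) maps to its admissible semantic values.
# The real entries are full feature structures; this keeps only selection.
LEXICON = {
    "essen.1.1": {"agt": {"human-object"}, "aff": {"co"}},
    "fressen.1.1": {"agt": {"animal-object", "human-object"}, "aff": {"co"}},
}


def compatible(concept_id, role, value):
    """True if a filler with this semantic value satisfies the role's restriction."""
    return value in LEXICON[concept_id][role]


def readings_for(agent_value):
    """Concepts whose agent restriction admits the given agent."""
    return sorted(c for c in LEXICON if compatible(c, "agt", agent_value))


print(readings_for("human-object"))   # ['essen.1.1', 'fressen.1.1']
print(readings_for("animal-object"))  # ['fressen.1.1']
```

A dog as agent rules out essen.1.1, while a human agent leaves both readings open; the parser has to use further context to decide in that case.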
  10. Semantic Analysis – The WOCADI Parser
      • Produces semantic network representations from (German) texts (Hartrumpf, 2003):
        • resolves coreferences,
        • analyzes idioms,
        • decompounds nouns and adjectives,
        • identifies metonymy,
        • resolves deictic expressions, etc.
      • Applied to large corpora, including the CLEF-NEWS newspaper corpus (275,000 articles) and the German Wikipedia (500,000 articles)
  11. SN Example (German)
      [Figure: semantic network for the query below. Concept nodes: du.1.1, dokument.1.1, finden.1.1, berichten.2.2, problem.1.1, psychisch.1.1, streß.1.1, kandidat.1.1, prüfling.1.1, prüfungskandidat.1.1, prüfung.1.1; instance nodes c1 to c10; relations: SUB, SUBS, PRED, PROP, EXP, OBJ, MCONT, ATTCH, SCAR, SSPE, ASSOC, *ALTN1.]
      Finde Dokumente, die über psychische Probleme oder Stress von Prüfungskandidaten oder Prüflingen berichten. (GIRT topic 116)
  12. SN Example (English)
      [Figure: semantic network for the query below. Concept nodes: you, document, find, report, problem, mental, stress, candidate, examinee, exam; instance nodes c1 to c10; relations: SUB, SUBS, PRED, PROP, EXP, OBJ, MCONT, ATTCH, SCAR, SSPE, ASSOC, *ALTN1.]
      'Find documents reporting on mental problems or stress of examination candidates or examinees.' (GIRT topic 116)
  13. Phase 1: Using Statistical MT and Web Services
      • Employ (statistical) machine translation (MT) web services for IR experiments (translation of queries/questions): Systran, Promt, ...
      • Problems:
        • translating questions: most systems are trained on declarative sentences; imperative forms are often misunderstood (Find documents ... → Fund Dokument ...)
        • named entity recognition: not reliable (Neuengland → new narrow country)
      • Performance loss from off-the-shelf translation tools for QA@CLEF: 50%; further examples: Ligozat et al. (2006)
  14. Phase 2: Aligning Concept-based Tools and Resources
      • Morphology and syntax differ between languages
      • Semantics is (in general) the same
      • Our approach:
        • create lexicons for different languages; fast construction parallel to existing lexicon(s), e.g. HaGenLex → HaEnLex
        • develop parsers for different languages
        • apply methods from IR/QA on the SN representation
      • General idea: replace concepts (labels) in the semantic network representation (as a form of translation)
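The general idea of Phase 2, translation by relabeling concept nodes while the relations stay fixed, can be sketched as follows. The DE_EN mapping is a hypothetical excerpt (not actual HaGenLex/HaEnLex data), and the rule that concept labels contain a dot while instance nodes (c1, c6, ...) do not is an assumption of this sketch.

```python
# Hypothetical excerpt of a German -> English concept mapping
# (illustrative only, not actual HaGenLex/HaEnLex data).
DE_EN = {
    "problem.1.1": "problem.1.1",
    "psychisch.1.1": "mental.1.1",
    "streß.1.1": "stress.1.1",
    "prüfling.1.1": "examinee.1.1",
}


def is_concept(node):
    # Assumption: concept labels contain a dot, instance nodes (c1, c6) do not.
    return "." in node


def translate_network(edges, mapping):
    """Relabel concept nodes; instance nodes and relations stay unchanged."""
    def relabel(node):
        return mapping.get(node, node) if is_concept(node) else node
    return {(relabel(s), r, relabel(t)) for s, r, t in edges}


def untranslated(edges, mapping):
    """Concept labels that the mapping does not cover."""
    found = {n for s, _, t in edges for n in (s, t) if is_concept(n)}
    return sorted(found - set(mapping))


de = {("c6", "SUB", "problem.1.1"), ("c6", "PROP", "psychisch.1.1"),
      ("c7", "SUB", "prüfung.1.1")}
en = translate_network(de, DE_EN)
print(sorted(en))
# [('c6', 'PROP', 'mental.1.1'), ('c6', 'SUB', 'problem.1.1'), ('c7', 'SUB', 'prüfung.1.1')]
print(untranslated(de, DE_EN))  # ['prüfung.1.1']
```

Unmapped concepts are passed through unchanged and reported separately, so gaps in the aligned lexicons become visible instead of silently breaking the network.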
  15. Status of Alignment of Lexical Resources
      • German-to-English dictionaries: about 100,000 word/phrase translations
      • Mapping between HaGenLex concepts and GermaNet concepts, plus GermaNet-to-EuroWordNet mapping: about 14,000 concept translations
      • Wikipedia articles (in German and English): about 3,000 proper noun translations for cities, countries, persons, organizations, etc.
      • HaEnLex (parallel English version of HaGenLex) with full morphological, syntactic, and semantic description of concepts: about 7,000 English entries
  16. Linguistic Phenomena (1/6)
      Compounds (rare in English):
      • with regular semantics: Kinderernährung → nutrition of children
      • with irregular semantics: Frauenzimmer → dame (?); ladies’ room (?)
      • borderline cases: Bankwesen → banking (system) (?)
      → a compound-less semantic representation is possible
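A toy dictionary-based decompounder illustrates the difference between regular and irregular compounds: regular ones are split into modifier and head, which a semantic network can then link by a relation, while irregular or borderline ones must be listed as wholes. The lexicon, the linking elements, and the function name are assumptions for illustration; the slide does not show how WOCADI itself decompounds.

```python
# Toy resources (assumptions): a tiny lexicon of simple words, German linking
# elements (Fugenelemente), and compounds that must be looked up as a whole.
LEXICON = {"kind", "frau", "zimmer", "bank", "wesen", "ernährung"}
LINKERS = ("", "er", "en", "n", "s", "es")
LEXICALIZED = {"frauenzimmer", "bankwesen"}  # irregular or borderline cases


def split_compound(word):
    """Return (modifier, head) for a regular two-part compound, else None."""
    w = word.lower()
    if w in LEXICALIZED or w in LEXICON:
        return None  # listed as a whole: use its own lexicon entry
    for i in range(1, len(w)):
        prefix, head = w[:i], w[i:]
        if head not in LEXICON:
            continue
        for linker in LINKERS:
            stem = prefix[:len(prefix) - len(linker)]
            if prefix.endswith(linker) and stem in LEXICON:
                return stem, head
    return None


print(split_compound("Kinderernährung"))  # ('kind', 'ernährung')
print(split_compound("Frauenzimmer"))     # None (lexicalized)
```

Without the LEXICALIZED entry, the same procedure would split Frauenzimmer into frau + zimmer and suggest a purely literal reading, which is why irregular compounds need their own entries.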
  17. Linguistic Phenomena (2/6)
      Idioms:
      • with a corresponding idiom:
        in den Sinn kommen (DE) → to start thinking about sth.
        to come to mind (EN) → to start thinking about sth.
      • without an equivalent idiom:
        to be someone’s cup of tea (EN) → to like
      → semantic representation of idioms
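One way to picture a semantic representation of idioms is a table that maps each multiword expression to a language-independent meaning label; translation then looks for a target-language idiom with the same meaning and falls back to the meaning itself. The table entries and the meaning labels below are hypothetical.

```python
# Hypothetical idiom table: (language, expression) -> shared meaning label.
# The meaning labels are invented stand-ins for a semantic representation.
IDIOMS = {
    ("de", "in den Sinn kommen"): "start-thinking-about",
    ("en", "to come to mind"): "start-thinking-about",
    ("en", "to be someone's cup of tea"): "like",
}


def translate_idiom(lang, expr, target):
    """Prefer a target-language idiom with the same meaning, else the meaning."""
    meaning = IDIOMS[(lang, expr)]
    matches = sorted(e for (lg, e), m in IDIOMS.items()
                     if lg == target and m == meaning)
    return matches[0] if matches else meaning


print(translate_idiom("de", "in den Sinn kommen", "en"))          # to come to mind
print(translate_idiom("en", "to be someone's cup of tea", "de"))  # like
```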
  18. Linguistic Phenomena (3/6)
      Metonymy:
      • with a corresponding metonymy pattern (for regular metonymy):
        The White House agreed that ... (EN) → place-for-government
        Das Weiße Haus stimmte zu, dass ... (DE) → place-for-government
      • without: ?
      → no problems so far
  19. Linguistic Phenomena (4/6)
      Proper nouns:
      • transcriptions and transliterations, historic name variants
      • Böll → Boell; Gorbatschow → Gorbatchev, Gorbatchov
      → can be solved using aligned online resources, e.g. Wikipedia
      → treat name variants as elements of the same synset
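Treating name variants as one synset can be sketched with a small normalization step (umlaut and ß transliteration plus case folding) and a list of variant sets. The sets here are hand-written stand-ins for what aligned resources such as Wikipedia might provide; the function names are hypothetical.

```python
# German umlauts and ß transliterated as in Böll -> Boell.
TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})

# Hypothetical name synsets, e.g. harvested from aligned Wikipedia articles.
NAME_SYNSETS = [
    {"Böll", "Boell"},
    {"Gorbatschow", "Gorbatchev", "Gorbatchov"},
]


def normalize(name):
    """Transliterate umlauts and ß, then case-fold for comparison."""
    return name.translate(TRANSLIT).casefold()


def variants(name):
    """All known variants of a name; unknown names form a singleton synset."""
    key = normalize(name)
    for synset in NAME_SYNSETS:
        if key in {normalize(v) for v in synset}:
            return set(synset)
    return {name}


print(sorted(variants("BOELL")))       # ['Boell', 'Böll']
print(sorted(variants("Gorbatchev")))  # ['Gorbatchev', 'Gorbatchov', 'Gorbatschow']
```

Transliteration catches systematic spelling differences such as Böll/Boell, but transcription variants such as Gorbatschow/Gorbatchev still need the aligned variant sets.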
  20. Linguistic Phenomena (5/6)
      Semantic gaps/lexical gaps:
      • Fohlen (DE) → colt (if male)
      • Fohlen (DE) → filly (if female)
      • Alignment of lexicon entries: morpho-syntactic and syntactic features differ between languages, semantic features (in general) do not. However, semantic net entries, rules, and entailments may differ slightly, because they already involve other concepts, which have to be translated as well.
  21. Linguistic Phenomena (6/6)
      Semantic gaps/lexical gaps: essen.1.1 → eat.1.1 AND fressen.1.1 → eat.1.1
      [ n-sign
        morph   [ base       "eat"
                  infl-para  i20 ]
        syn     [ v-syn
                  v-type     main ]
        sem     [ sem
                  entity     nonment-action
                  c-id       "eat.1.1" ]
        select  [ rel  agt
                  sel  [ syn  [ np-syn
                                cat  np ]
                         sem  [ semsel
                                sem  [ entity animal-object ∨ human-object ] ] ] ]
                [ rel  aff
                  sel  [ syn  [ np-syn
                                cat  np ]
                         sem  [ semsel
                                sem  [ entity [ sort co ] ] ] ] ] ]
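Because eat.1.1 merges the agent restrictions of essen.1.1 and fressen.1.1, translating back into German requires a choice based on the agent. A hedged sketch, assuming a hypothetical reverse mapping EN_DE and the heuristic of preferring the candidate with the narrowest compatible restriction (so human agents get the neutral essen.1.1):

```python
# Hypothetical reverse mapping: English concept -> German candidates with the
# agent restrictions from the essen.1.1 / fressen.1.1 entries (simplified).
EN_DE = {
    "eat.1.1": [
        ("essen.1.1", {"human-object"}),
        ("fressen.1.1", {"animal-object", "human-object"}),
    ],
}


def to_german(concept, agent_value):
    """Pick a German concept whose agent restriction admits the agent."""
    candidates = [(c, allowed) for c, allowed in EN_DE[concept]
                  if agent_value in allowed]
    if not candidates:
        raise LookupError(f"no German reading of {concept} for {agent_value}")
    # Heuristic (assumption): prefer the narrowest compatible restriction.
    return min(candidates, key=lambda pair: len(pair[1]))[0]


print(to_german("eat.1.1", "human-object"))   # essen.1.1
print(to_german("eat.1.1", "animal-object"))  # fressen.1.1
```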
  22. Phase 3: Towards a Concept-Based Translation
      • Assumption: the same inventory of relations (about 140 relations) holds for different languages
      • Natural language generation (for German)
      • Possible solution: English parser; generate natural language from the semantic network representation
  23. Monolingual Concept-Based IR
      • Techniques of standard IR: stemming and stopword removal
      • Monolingual concept-based IR:
        • represent queries (and documents) as semantic networks
        • (translate concepts)
        • employ methods on the semantic network representation
      • Advantages:
        • semantics of compounds (relation to their constituents)
        • semantics of prepositions is typically represented by a semantic relation or function (no full translation needed)
        • lemmatizing (instead of stemming)
        • query expansion with elements of synsets
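A minimal sketch of matching on semantic networks instead of bags of words, assuming a hypothetical scoring scheme: instance nodes are collapsed onto the concepts they are subordinated to (SUB/SUBS), and shared concept-relation-concept triples are weighted above shared concepts. The weights, example networks, and function names are illustrative, not taken from the authors' systems.

```python
SUBORDINATION = ("SUB", "SUBS")


def collapse(edges):
    """Replace instance nodes by the concept they are subordinated to."""
    head = {s: t for s, r, t in edges if r in SUBORDINATION}
    return {(head.get(s, s), r, head.get(t, t))
            for s, r, t in edges if r not in SUBORDINATION}


def concepts(edges):
    # Assumption: concept labels contain a dot, instance nodes do not.
    return {n for s, _, t in edges for n in (s, t) if "." in n}


def score(query, doc, w_concept=1.0, w_triple=2.0):
    """Hypothetical score: shared concepts plus (weighted) shared triples."""
    shared_concepts = concepts(query) & concepts(doc)
    shared_triples = collapse(query) & collapse(doc)
    return w_concept * len(shared_concepts) + w_triple * len(shared_triples)


query = {("c6", "SUB", "problem.1.1"), ("c6", "PROP", "psychisch.1.1")}
doc1 = {("d1", "SUB", "problem.1.1"), ("d1", "PROP", "psychisch.1.1")}
doc2 = {("d2", "SUB", "problem.1.1"),
        ("d3", "SUB", "zustand.1.1"), ("d3", "PROP", "psychisch.1.1")}

print(score(query, doc1), score(query, doc2))  # 4.0 2.0
```

Both documents contain the same two concepts, so a bag-of-concepts comparison cannot separate them; only the shared PROP triple ranks doc1, where the problem itself is psychological, above doc2.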
  24. Multilingual Concept-Based IR
      • Three different approaches to supporting multilingual search:
        1 translate queries into the document language
        2 translate documents into the query language
        3 translate both queries and documents into an interlingua
      • Multilingual concept-based IR: same as the monolingual approach, but translate concepts (1, 2, or 3)
        → towards an interlingua
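The three approaches differ only in the direction of the concept mapping. The sketch below uses hypothetical alignment tables into invented interlingua IDs (prefixed IL:); approaches 1 and 2 map through the interlingua into the other language, while approach 3 compares interlingua IDs directly.

```python
# Hypothetical alignments of German and English concepts to invented
# interlingua IDs ("IL:..."); not actual HaGenLex/HaEnLex data.
DE_TO_IL = {"problem.1.1": "IL:problem", "psychisch.1.1": "IL:mental",
            "prüfling.1.1": "IL:examinee"}
EN_TO_IL = {"problem.1.1": "IL:problem", "mental.1.1": "IL:mental",
            "examinee.1.1": "IL:examinee"}


def to_interlingua(concepts, to_il):
    """Approach 3: map concepts of either language to interlingua IDs."""
    return {to_il.get(c, c) for c in concepts}


def translate(concepts, src_to_il, tgt_to_il):
    """Approaches 1 and 2: map source concepts into the target language."""
    il_to_tgt = {il: c for c, il in tgt_to_il.items()}
    # Concepts without an alignment are kept unchanged.
    return {il_to_tgt.get(src_to_il.get(c), c) for c in concepts}


query_de = {"problem.1.1", "psychisch.1.1"}
doc_en = {"problem.1.1", "mental.1.1", "examinee.1.1"}

q_in_en = translate(query_de, DE_TO_IL, EN_TO_IL)  # approach 1
d_in_de = translate(doc_en, EN_TO_IL, DE_TO_IL)    # approach 2
print(q_in_en <= doc_en, query_de <= d_in_de)      # True True
print(sorted(to_interlingua(query_de, DE_TO_IL)
             & to_interlingua(doc_en, EN_TO_IL)))  # ['IL:mental', 'IL:problem']
```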
  25. Projects and Evaluations
      • GeoCLEF (Leveling and Veiel, 2006): web service for MT (query translation)
      • GIRT-4 experiments (Leveling, 2004, 2006a): combined concept and word translation
      • NLI-Z39.50 (Leveling, 2006b): replace terminal concepts in the SN, then treat translation alternatives as a synset for query expansion (no decision for a single reading necessary)
      • QA@CLEF (Hartrumpf and Leveling, 2007): web service for MT, then analysis; concept-based translation with a rudimentary English parser (preliminary experiments)
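The NLI-Z39.50 strategy of keeping all translation alternatives can be pictured as turning each concept into an OR-group of its alternatives. The query string syntax and the alternatives below are assumptions for illustration; this is not the query format defined by Z39.50 itself.

```python
def boolean_query(concepts, alternatives):
    """AND over concepts, OR over each concept's translation alternatives.

    Concepts without listed alternatives are kept as they are.
    """
    groups = []
    for concept in concepts:
        alts = sorted(alternatives.get(concept, [concept]))
        groups.append("(" + " OR ".join(alts) + ")")
    return " AND ".join(groups)


# Hypothetical translation alternatives for two German concepts.
ALTERNATIVES = {
    "streß.1.1": ["stress", "strain"],
    "prüfling.1.1": ["examinee", "candidate"],
}

print(boolean_query(["streß.1.1", "prüfling.1.1"], ALTERNATIVES))
# (strain OR stress) AND (candidate OR examinee)
```

Since every alternative stays in the query, no single translation has to be chosen up front; the retrieval step decides which alternatives actually match.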
  26. Conclusion
      • General approach:
        • parse queries
        • translate concepts in the SN representation
        • operate on the SN representation
      • Aims at multilingual information systems for different purposes: IR, QA
      • 3 phases (currently phase 2)
  27. Outlook
      • Create a repository of interlingua concepts: this allows concept-based machine translation of text
        → natural language generation
        → MT
      • Outlook for IR/QA: index semantic relations as well
  28. References
      Hartrumpf, Sven (2003). Hybrid Disambiguation in Natural Language Analysis. Osnabrück, Germany: Der Andere Verlag.
      Hartrumpf, Sven; Hermann Helbig; and Rainer Osswald (2003). The semantically based computer lexicon HaGenLex – Structure and technological environment. Traitement automatique des langues, 44(2):81–105.
      Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Peters, Carol; Paul Clough; Fredric C. Gey; Jussi Karlgren; Bernardo Magnini; Douglas W. Oard; Maarten de Rijke; and Maximilian Stempfhuber), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.
      Helbig, Hermann (2001). Die semantische Struktur natürlicher Sprache: Wissensrepräsentation mit MultiNet. Berlin: Springer.
      Helbig, Hermann (2006). Knowledge Representation and the Semantics of Natural Language. Berlin: Springer.
      Leveling, Johannes (2004). University of Hagen at CLEF 2003: Natural language access to the GIRT4 data. In Comparative Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003 (edited by Peters, Carol; Julio Gonzalo; Martin Braschler; and Michael Kluck), volume 3237 of LNCS, pp. 412–424. Berlin: Springer.
      Leveling, Johannes (2006a). A baseline for NLP in domain-specific information retrieval. In Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005 (edited by Peters, Carol; Fredric C. Gey; Julio Gonzalo; Gareth J. F. Jones; Michael Kluck; Bernardo Magnini; Henning Müller; and Maarten de Rijke), volume 4022 of LNCS, pp. 222–225. Berlin: Springer.
      Leveling, Johannes (2006b). Formale Interpretation von Nutzeranfragen für natürlichsprachliche Interfaces zu Informationsangeboten im Internet. Tönning, Germany: Der Andere Verlag.
      Leveling, Johannes and Dirk Veiel (2006). University of Hagen at GeoCLEF 2006: Experiments with metonymy recognition in documents. In Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2006 Workshop (edited by Nardi, Alessandro; Carol Peters; and José Luis Vicedo). Alicante, Spain.
      Ligozat, Anne-Laure; Brigitte Grau; Isabelle Robba; and Anne Vilnat (2006). Evaluation and improvement of cross-lingual question answering strategies. In Proceedings of the EACL 2006 Workshop on Multilingual Question Answering (MLQA’06), pp. 23–30. Trento, Italy.
