The document discusses concept-based representation and translation for multilingual information systems. It presents the MultiNet paradigm for semantic representation using multilayered extended semantic networks. Key elements include the HaGenLex computational lexicon, semantic analysis using the WOCADI parser, and representing concepts and relations between them. The goal is concept-based information retrieval and question answering across languages.
Towards Interaction Models Derived From Eye-tracking Data (jacekg)
This document discusses using eye tracking data to develop interaction models. It proposes a two-state reading model to analyze eye movement patterns and correlate them with higher-level constructs like task characteristics and user knowledge. Two user studies are described that use eye tracking to measure cognitive effort across different tasks and assess how well eye tracking data can predict a user's self-reported domain knowledge. The goal is to develop models from eye tracking data that can be used to better understand, adapt, and enable user interactions.
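The two-state idea above can be sketched as a simple classifier over fixations. This is a hypothetical illustration, not the study's actual model: the thresholds and the (duration, saccade length) features are invented for the example.

```python
# Hypothetical two-state reading classifier: a fixation is attributed to a
# "reading" state when it is long and followed by a short saccade, otherwise
# to a "scanning" state. Both thresholds are invented, not from the study.
READ_MIN_DURATION_MS = 180   # assumed threshold
READ_MAX_SACCADE_PX = 60     # assumed threshold

def classify_fixations(fixations):
    """fixations: list of (duration_ms, saccade_len_px) tuples."""
    states = []
    for duration, saccade in fixations:
        if duration >= READ_MIN_DURATION_MS and saccade <= READ_MAX_SACCADE_PX:
            states.append("reading")
        else:
            states.append("scanning")
    return states

print(classify_fixations([(220, 30), (90, 250), (200, 45)]))
# ['reading', 'scanning', 'reading']
```

Sequences of such state labels could then be correlated with task characteristics or self-reported knowledge, as the studies describe.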
DynaLearn: Problem-based learning supported by semantic techniques (Oscar Corcho)
This document describes a system that supports problem-based learning through semantic techniques. The system grounds learner models in semantic repositories to enable semantic-based feedback. It analyzes learner models and reference models to identify discrepancies in terminology, taxonomy, and qualitative reasoning structures. Suggestions are generated and filtered based on agreement across multiple reference models. The system aims to bridge gaps between learner and expert terminology and provide automated feedback to support the learning process.
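The agreement-based filtering step can be illustrated with a small sketch. This is not DynaLearn's actual code; the terms and the majority threshold are invented for the example.

```python
# Illustrative sketch: generate terminology suggestions from each reference
# model (terms the learner's model is missing), then keep only suggestions
# on which enough reference models agree.
from collections import Counter

def filter_suggestions(learner_terms, reference_models, min_agreement=2):
    suggestions = Counter()
    for ref_terms in reference_models:
        for term in ref_terms - learner_terms:  # terms missing from the learner model
            suggestions[term] += 1
    return {t for t, votes in suggestions.items() if votes >= min_agreement}

refs = [{"evaporation", "condensation"},
        {"evaporation", "precipitation"},
        {"evaporation", "condensation"}]
print(filter_suggestions({"precipitation"}, refs))
# {'evaporation', 'condensation'}
```

Filtering by cross-model agreement keeps feedback focused on discrepancies that several experts would flag, rather than idiosyncrasies of a single reference model.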
Characterising the Emergent Semantics in Twitter Lists (Oscar Corcho)
This document summarizes research analyzing the emergent semantics of lists and list names on Twitter. The researchers investigated whether related keywords can be identified from list names according to how they are used by different user roles (curators, subscribers, members). They used a dataset of over 297,000 lists to extract keywords from list names and model their relationships based on these user roles. Their experiments analyzed the semantics of related keyword pairs using techniques like WordNet searches and found that relationships identified based on members had the highest percentage of direct semantic relations like synonyms.
This presentation focuses on three main components that are relevant to implementing and achieving language competencies: the acquisition of word meaning, the formation of concepts, and the understanding of the socio-cultural meaning of language.
This document discusses concepts of equivalence and similarity in translation. It begins by defining equivalence and similarity, noting that similarity is not necessarily symmetrical, reversible, or transitive. It then examines approaches to equivalence in translation theory, including the equative view, taxonomic view, and relativist view which rejects equivalence as an identity assumption. Models of equivalence proposed by Vinay and Darbelnet, Jakobson, and Nida are outlined, noting tensions between formal correspondence and dynamic equivalence. The document emphasizes that equivalence is a complex concept that depends on context and perspective.
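The claim that similarity need not be symmetrical can be made concrete with a containment-style measure, a minimal sketch (the word sets are invented): sim(A, B) = |A ∩ B| / |A| scores how much of A is covered by B, so sim(A, B) and sim(B, A) generally differ.

```python
# Asymmetric similarity: how much of set `a` is covered by set `b`.
def containment(a, b):
    return len(a & b) / len(a)

source = {"house", "home", "dwelling"}
target = {"home"}
print(containment(target, source))  # 1.0: all of target appears in source
print(containment(source, target))  # ~0.33: little of source appears in target
```

A translation unit can thus be fully "covered" by its source while covering only a fraction of it, which is one way the equative view of equivalence breaks down.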
Natural language processing (NLP) involves analyzing and understanding human language to allow interaction between computers and humans. The document outlines key steps in NLP including morphological analysis, syntactic analysis, semantic analysis, and pragmatic analysis to convert text into structured representations. It also discusses statistical NLP and real-world applications such as machine translation, question answering, and speech recognition.
Natural language processing (NLP) is introduced, including its definition, common steps like morphological analysis and syntactic analysis, and applications like information extraction and machine translation. Statistical NLP aims to perform statistical inference for NLP tasks. Real-world applications of NLP are discussed, such as automatic summarization, information retrieval, question answering and speech recognition. A demo of a free NLP application is presented at the end.
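The staged pipeline outlined above (morphological, syntactic, semantic analysis) can be sketched as a chain of stubs. Each stage here is a toy stand-in for a real component: the crude plural stripping, flat parse, and bag-of-lemmas "meaning" are all invented simplifications.

```python
# Toy sketch of a staged NLP pipeline; each function is a placeholder for a
# real morphological / syntactic / semantic analyzer.
import re

def morphological(text):
    # tokenize and crudely strip a plural "s" as a stand-in for stemming
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def syntactic(tokens):
    # placeholder parse: bracket the token sequence as one flat phrase
    return ("S", tokens)

def semantic(tree):
    # placeholder semantics: a bag-of-lemmas "meaning representation"
    return set(tree[1])

def pipeline(text):
    return semantic(syntactic(morphological(text)))

print(pipeline("Computers understand human languages"))
```

Real systems replace each stub with a trained component, but the data flow (text to tokens to structure to meaning representation) is the same.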
This document proposes a systemic interpretation language to bridge the systemic and semantic spheres. It involves using patterns and pattern languages at different levels of abstraction across domains. The language uses a grammar of elementary components including dynamics, statics, and heuristics. Patterns are explored using various contexts and methodologies in an open network, with a focus on observation, relationships, and multiple perspectives. The goal is to facilitate systemic coherence while appreciating multiple solutions and organizing knowledge in a pattern repository.
HC-4016, Heterogeneous Implementation of Neural Network Algorithms, by Dmitri Yudanov (AMD Developer Central)
Presentation HC-4016, Heterogeneous Implementation of Neural Network Algorithms, by Dmitri Yudanov and Leon Reznik at the AMD Developer Summit (APU13) November 11-13, 2013.
This document presents a distributed framework for performing natural language processing (NLP) on large collections of journal articles and integrating the results with existing structured knowledge bases. The framework uses a scaled NLP pipeline to extract structured annotations from unstructured text. It provides massively parallel access to these structured annotations and integrates them with ontologies and databases in a knowledge base. This allows applications to leverage both the unstructured text and existing structured knowledge for tasks like visualization, natural language understanding, and validation of other methods.
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space (Vijay Prakash Dwivedi)
Word embeddings are commonly used in NLP tasks but embedding phrases while maintaining semantic meaning has been challenging. The authors present a novel method using Siamese neural networks to embed words and multi-word units in the same vector space. The model learns to generate phrase representations based on their semantic similarity to single words. It is trained on a dataset to predict similarity between words and phrases and outperforms previous models on phrase similarity and composition tasks.
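The Siamese model itself requires a neural framework; sketched below instead is the simple averaging baseline that such compositional models are typically compared against. The phrase vector is the mean of its word vectors, and similarity is cosine; the embeddings here are invented toy numbers, not trained vectors.

```python
# Averaging baseline for phrase representations: phrase vector = mean of
# word vectors, compared by cosine similarity. Toy 2-d embeddings.
import math

EMB = {"big": [1.0, 0.0], "large": [0.9, 0.1], "city": [0.0, 1.0]}

def phrase_vector(words):
    dim = len(next(iter(EMB.values())))
    return [sum(EMB[w][i] for w in words) / len(words) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

print(cosine(phrase_vector(["big", "city"]), phrase_vector(["large", "city"])))
```

A Siamese architecture replaces the fixed averaging with a learned composition function, trained so that phrase representations land near semantically similar single words.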
The document discusses opinion mining and sentiment analysis. It describes how opinion mining uses natural language processing techniques on user input from internet sources to understand opinions. Sentiment analysis is used to extract emotions, subjects, and the impact of opinions. The key modules of an opinion mining and sentiment analysis system include opinion retrieval, sentiment classification, and summary generation. Sentiment classification applies a semi-supervised naive Bayes classifier using linguistic features to determine the polarity of opinions. While current systems can effectively analyze sentiments, challenges remain in handling ambiguity and analyzing opinions in different languages.
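The sentiment classification step can be illustrated with a compact naive Bayes classifier. Note this sketch is fully supervised and omits the semi-supervised extension and linguistic features the summary mentions; the training examples are invented.

```python
# Compact naive Bayes polarity classifier with Laplace (add-one) smoothing.
import math
from collections import Counter, defaultdict

def train(docs):  # docs: list of (tokens, label)
    prior, counts, totals = Counter(), defaultdict(Counter), Counter()
    vocab = set()
    for tokens, label in docs:
        prior[label] += 1
        counts[label].update(tokens)
        totals[label] += len(tokens)
        vocab.update(tokens)
    return prior, counts, totals, vocab

def classify(model, tokens):
    prior, counts, totals, vocab = model
    n = sum(prior.values())
    def logp(label):
        lp = math.log(prior[label] / n)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / (totals[label] + len(vocab)))
        return lp
    return max(prior, key=logp)

model = train([(["good", "great"], "pos"), (["bad", "awful"], "neg")])
print(classify(model, ["good"]))  # pos
```

A semi-supervised variant would additionally self-train on unlabeled opinions, folding confidently classified documents back into the training counts.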
The document discusses the development of OpenWN-PT, a Brazilian Portuguese Wordnet. Key points:
- OpenWN-PT is being created as part of a joint project between CPDOC and EMAp to apply formal logical tools to Portuguese text.
- It is based on the Universal Wordnet (UWN) which projects WordNet concepts into over 200 languages using statistical methods. The UWN provides an initial automated version of a Portuguese Wordnet.
- The creators are working to improve the initial UWN-based Portuguese Wordnet by combining it with data from Princeton WordNet, UWN, MENTA, and EuroWordNet to generate a new OpenWN-PT file.
The document describes the MONK project which provides over 1400 works of English literature from the 16th-19th century tagged and stored in a database. It discusses using the data for various types of text analysis including predictive modeling, sentiment analysis, and information extraction. Specific techniques are described like named entity recognition, co-reference resolution, and semantic role analysis. Visualization of results is also mentioned.
Individual Brain Charting, a high-resolution fMRI dataset for cognitive mapping (Ana Luísa Pinho)
Linking brain systems and mental functions requires accurate descriptions of behavioral tasks and fine demarcations of brain regions. Functional Magnetic Resonance Imaging (fMRI) has contributed to the investigation of brain regions involved in a variety of cognitive processes. However, to date, no data collection has systematically addressed the functional mapping of cognitive mechanisms at a fine spatial scale. The Individual Brain Charting (IBC) project is a high-resolution multi-task fMRI dataset intended to provide the objective basis toward a comprehensive functional atlas of the human brain. The data refer to a permanent cohort performing many different tasks. The large amount of task-fMRI data on the same subjects yields a precise mapping of the underlying functions, free from both inter-subject and inter-site variability. The first release of the IBC dataset consists of data acquired from thirteen participants while performing a dozen tasks. Raw data from this release are publicly available in the OpenNeuro repository, and derived statistical maps can be found in NeuroVault. These maps reveal a successful cognitive encoding of many psychological domains in large areas of the human brain. Indeed, main findings of the original studies were replicated at higher resolution. Our results thus provide a comprehensive revision of the neural correlates underlying behavior, highlighting nonetheless the spatial variability of functional signatures between participants. In addition, this dataset supports investigations using alternative approaches to group-level analysis of task-specific studies. For instance, such a rich task-wise dataset can be applied to mega-analytic encoding models towards the development of a brain-atlasing framework, by systematically mapping functional signatures associated with the cognitive components of the tasks.
Development, distribution and use of open source software comprise a market of data (source code, bug reports, documentation, number of downloads, etc.) from projects, developers and users. This large amount of data makes it difficult for the people involved to make sense of implicit links between software projects, e.g., dependencies, patterns, licenses. This context raises the question of what techniques and mechanisms can be used to help users and developers to link related pieces of information across software projects. In this paper, we propose a framework for a marketplace enhanced using linked open data (LOD) technology for linking software artifacts within projects as well as across software projects. The marketplace provides the infrastructure for collecting and aggregating software engineering data as well as developing services for mining, statistics, analytics and visualization of software data. Based on cross-linking software artifacts and projects, the marketplace enables developers and users to understand the individual value of components and their relationship to bigger software systems. Improved understanding creates new business opportunities for software companies: users will be better able to analyze and compare projects, developers can increase the visibility of their products, and hosts may offer plug-ins and services over the data to paying customers.
This paper reports our first attempt at integrating eSPERTo's paraphrastic engine, which is based on the NooJ platform, with two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo's base resources and the modifications to these resources that enabled the production of the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to guide further improvement in future experiments.
The document discusses formal language theory and its applications in natural language processing (NLP). It covers two main goals in computational linguistics - theoretical interest in formally characterizing natural language and practical interest in using well-understood frameworks like finite state models to solve NLP problems. Finite state devices are widely used in NLP tasks due to their efficiency and ability to model linguistic phenomena like words through dictionaries and rules. While finite state models provide a useful approximation of language, natural languages pose challenges like ambiguity, long distance dependencies and non-regular features that require extensions to basic finite state models.
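The finite-state machinery described above can be made concrete with a tiny deterministic automaton. This is a hedged illustration (the tag set and pattern are invented): a DFA over part-of-speech tags accepting the regular pattern DET ADJ* NOUN, the kind of local phrase structure finite-state NLP tools capture well.

```python
# Tiny DFA accepting DET ADJ* NOUN over part-of-speech tags.
TRANSITIONS = {
    ("start", "DET"): "det",
    ("det", "ADJ"): "det",     # loop on any number of adjectives
    ("det", "NOUN"): "accept",
}

def accepts(tags):
    state = "start"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state == "accept"

print(accepts(["DET", "ADJ", "ADJ", "NOUN"]))  # True
print(accepts(["DET", "NOUN", "NOUN"]))        # False
```

Phenomena like center-embedding or long-distance agreement fall outside what such a machine can express exactly, which is why the extensions mentioned in the summary are needed.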
Tweeting beyond Facts – The Need for a Linguistic Perspective (Data Science Society)
The document discusses applying linguistic principles to natural language processing tasks. It argues that a trigger-scope approach to analyzing negation, modality, and speculative language has proven effective. The approach uses general linguistic modules as preprocessing before applying domain-specific models. Underappreciated linguistic elements like numbers, amounts, locations, and modifiers provide useful information for tasks. A suite of language-oriented preprocessing modules could improve downstream specialized processing by adapting general linguistic treatments to specific domains.
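The trigger-scope idea for negation can be sketched minimally. This is a simplified illustration in the spirit of NegEx-style rules, not the document's actual module: the trigger list and the punctuation-bounded scope are invented simplifications.

```python
# Minimal trigger-scope negation marking: a trigger opens a scope that runs
# until the next scope breaker; tokens inside the scope are marked negated.
TRIGGERS = {"not", "no", "never"}
SCOPE_BREAKERS = {".", ",", ";", "but"}

def mark_negation(tokens):
    negated, in_scope = [], False
    for tok in tokens:
        if tok in SCOPE_BREAKERS:
            in_scope = False
        elif tok in TRIGGERS:
            in_scope = True
            continue            # the trigger itself is not marked
        if in_scope:
            negated.append(tok)
    return negated

print(mark_negation("the drug is not effective , dosage unchanged".split()))
# ['effective']
```

A general-purpose module like this can run as preprocessing, with domain-specific models consuming the scope annotations downstream, as the summary suggests.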
Towards comprehensive syntactic and semantic annotations of the clinical narrative (Jinho Choi)
Objective To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.
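Agreement figures like those above are typically chance-corrected; one common such measure is Cohen's kappa, sketched here for two annotators labelling the same items (the toy labels are invented, and the abstract does not specify which agreement statistic it used).

```python
# Cohen's kappa for two annotators over the same items:
# kappa = (observed agreement - expected agreement) / (1 - expected agreement)
from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in ca) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["Disorder", "Drug", "Disorder", "Procedure"]
ann2 = ["Disorder", "Drug", "Drug", "Procedure"]
print(round(cohens_kappa(ann1, ann2), 3))
```

Chance correction matters here because with few semantic groups, two annotators can agree often by accident alone.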
The document summarizes research on modeling multiple sequence processing using an unsupervised neural network approach based on the Hypermap Model. Key points:
- The researcher extends previous models to handle complex sequences with repeating subsequences and multiple sequences occurring together without interference.
- Modifications include incorporating short-term memory to dynamically encode time-varying sequence context and inhibitory links to enable competitive queuing during recall.
- Experimental evaluation shows the network can correctly recall sequences using partial context and when sequences overlap.
- Future work aims to model the transition from single-word to two-word child speech and incorporate temporal processing of multimodal inputs like gestures.
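The context-dependent recall described in the points above can be illustrated with a toy sketch (this is not the Hypermap model itself): each next item is keyed by the running context, here the full prefix so far, so sequences that share a subsequence do not interfere during recall.

```python
# Toy context-keyed sequence memory: the prefix plays the role of the
# short-term-memory context, disambiguating shared subsequences.
def learn(sequences):
    memory = {}
    for seq in sequences:
        for i in range(1, len(seq)):
            memory[tuple(seq[:i])] = seq[i]
    return memory

def recall(memory, start, length):
    out = list(start)
    while len(out) < length:
        out.append(memory[tuple(out)])
    return out

mem = learn([["a", "b", "c", "d"], ["x", "b", "c", "y"]])
print(recall(mem, ["a"], 4))  # ['a', 'b', 'c', 'd']
print(recall(mem, ["x"], 4))  # ['x', 'b', 'c', 'y']
```

Both learned sequences contain the subsequence "b c", yet each is recalled correctly from its own starting cue, which is the behavior the summarized model achieves with a decaying context and competitive queuing rather than an explicit prefix table.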
Recurrent Neural Network (Mohammad Sabouri; ACRRL, Applied Control & Robotics Research Laboratory, Department of Power and Control Engineering, Shiraz University, Fars, Iran; https://sites.google.com/view/acrrl/)
Monotonic Multihead Attention (review by June-Woo Kim). Ma, Xutai, et al. "Monotonic Multihead Attention." International Conference on Learning Representations, 2020.
RNA sequencing analysis tutorial with NGS (HAMNAHAMNA8)
This document provides an overview of RNA-seq data analysis. It discusses quality control of sequencing data using tools like FastQC, mapping reads to a reference genome or transcriptome using aligners like BWA and TopHat, and summarizing reads using counting tools to obtain read counts for each gene. These counts can then be used to estimate gene expression levels and perform differential expression analysis to identify genes with different expression between samples or conditions.
Colloquium talk on modal sense classification using a convolutional neural network (Ana Marasović)
Modal sense classification (MSC) is a special case of sense disambiguation relevant for distinguishing facts from hypotheses and speculations, or apprehended, planned and desired states of affairs. Prior approaches showed that even with carefully designed semantic feature sets, the models have difficulties beating the majority sense baseline in cases of difficult sense distinctions and when applying the models to heterogeneous text genres. Another drawback of former approaches is that feature implementation heavily depends on external language-specific resources such as dependency or constituency parse trees and lexical databases such as WordNet or CELEX. To alleviate manual crafting of the features and to obtain a model which is easily portable to novel languages, we propose to cast MSC as a sentence classification task with a fixed sense inventory in a convolutional neural network (CNN) architecture. Our performance study shows that CNN is an appropriate model for MSC, and its special properties motivate us to investigate it as a formal framework for general word sense disambiguation tasks.
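The CNN forward pass used for sentence classification can be sketched without a neural framework. This is a toy illustration of the architecture, not the talk's model: the embeddings and filter weights below are invented numbers, and training is omitted.

```python
# Framework-free sketch of a CNN sentence-classification feature: slide one
# filter of width 2 over the word-vector sequence, apply ReLU, then take the
# max over all positions (max-over-time pooling).
def conv_feature(vectors, filt, width=2):
    scores = []
    for i in range(len(vectors) - width + 1):
        window = [x for vec in vectors[i:i + width] for x in vec]
        s = sum(w * x for w, x in zip(filt, window))
        scores.append(max(0.0, s))          # ReLU
    return max(scores)                      # max-over-time pooling

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three 2-d word vectors
filt = [0.5, -0.5, 0.5, 0.5]                      # one filter of width 2
print(conv_feature(sentence, filt))
```

A full model stacks many such filters, concatenates their pooled features, and feeds them to a softmax layer over the fixed sense inventory.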
The document summarizes research on modeling multiple sequence processing using an unsupervised neural network approach based on the Hypermap Model. Key points:
- The researcher extends previous models to handle complex sequences with repeating subsequences and multiple sequences occurring together without interference.
- Modifications include incorporating short-term memory to dynamically encode time-varying sequence context and inhibitory links to enable competitive queuing during recall.
- Experimental evaluation shows the network can correctly recall sequences using partial context and when sequences overlap.
- Future work aims to model the transition from single-word to two-word child speech and incorporate temporal processing of multimodal inputs like gestures.
Recurrent Neural Network
ACRRL
Applied Control & Robotics Research Laboratory of Shiraz University
Department of Power and Control Engineering, Shiraz University, Fars, Iran.
Mohammad Sabouri
https://sites.google.com/view/acrrl/
Monotonic Multihead Attention, Ma, Xutai, et al. "Monotonic Multihead Attention." International Conference on Learning Representations. 2020. review by June-Woo Kim
RNA sequencing analysis tutorial with NGSHAMNAHAMNA8
This document provides an overview of RNA-seq data analysis. It discusses quality control of sequencing data using tools like FastQC, mapping reads to a reference genome or transcriptome using aligners like BWA and TopHat, and summarizing reads using counting tools to obtain read counts for each gene. These counts can then be used to estimate gene expression levels and perform differential expression analysis to identify genes with different expression between samples or conditions.
Colloquium talk on modal sense classification using a convolutional neural ne...Ana Marasović
Modal sense classification (MSC) is a special case of sense disambiguation relevant for distinguishing facts from hypotheses and speculations, or apprehended, planned and desired states of affairs. Prior approaches showed that even with carefully designed semantic feature sets, the models have difficulties beating the majority sense baseline in cases of difficult sense distinctions and when applying the models to heterogeneous text genres. Another drawback of former approaches is that feature implementation heavily depends on a external language-specific resources such as dependency or constituency parse trees and lexical databases such as WordNet or CELEX. To alleviate manual crafting of the features and to obtain a model which is easily portable to novel languages, we propose to cast MSC as a sentence classification task with a fixed sense inventory in a convolutional neural network (CNN) architecture. Our performance study shows that CNN is an appropriate model for MSC and its special properties motivate us to investigate it as a formal framework for general word sense disambiguation tasks.
Semantic Analysis and Concept-based Translation for Multilingual Information Systems
1. Semantic Analysis and Concept-based Translation for Multilingual Information Systems
Johannes Leveling, Sven Hartrumpf, and Rainer Osswald
Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
firstname.lastname@fernuni-hagen.de
GAL 2007, Hildesheim, Germany
2. Outline
1 Concept-based Representation: MultiNet
2 Three Phases for a Concept-Based Multilingual IR System
3 Concept-Based Information Systems
4 Applications
5 Conclusion and Outlook
J. Leveling, S. Hartrumpf, R. Osswald Semantic Analysis and Concept-based Translation 2 / 27
3. Motivation for Concept-Based Translation
• Example 1: Query expansion in information retrieval (IR) with elements from the same synset
  → needs word sense disambiguation (differentiation of concepts), otherwise loss of precision
• Example 2: Question answering (QA): questions on relations between concepts (situations, events, etc.)
  Example: Who killed Lee Harvey Oswald?
  → needs a semantic representation; bag-of-words information retrieval is not enough
4. The MultiNet Paradigm
• Meaning and knowledge representation: Multilayered Extended Semantic Networks (Helbig, 2001, 2006)
• Semantic network of nodes (concepts) and edges (semantic relations from a fixed set)
• In addition: semantic sorts, semantic features, layer information
• Different types of concepts: lexicalized vs. non-lexicalized
• Language independence: annotation of English/Czech sentences from the Wall Street Journal with MultiNet (Charles University, Prague)
5. Selected Semantic Relations

Relation   Description
ASSOC      association
ATTCH      attachment of object to object
CHPA       change of sorts (property → abstract object)
EXP        experiencer
MCONT      an informational process or object
OBJ        neutral object
PRED       predicative concept specifying a plurality
PROP       property relationship
PARS       meronymy
SCAR       carrier of a state
SSPE       state specifier
SUB        conceptual subordination for objects
SUBS       conceptual subordination for situations
SYNO       synonymy
TEMP       temporal restriction for a situation
ALTN1      introduction of alternatives
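The relation inventory above can be read as the edge types of a labeled graph. A minimal sketch in Python (not the actual MultiNet implementation; the node and concept names follow the SN example on a later slide):

```python
from collections import defaultdict

# Relation inventory taken from the table above (a subset of MultiNet's fixed set).
RELATIONS = {"ASSOC", "ATTCH", "CHPA", "EXP", "MCONT", "OBJ", "PRED", "PROP",
             "PARS", "SCAR", "SSPE", "SUB", "SUBS", "SYNO", "TEMP", "ALTN1"}

class SemanticNetwork:
    """Minimal semantic network: concept nodes connected by typed edges."""

    def __init__(self):
        self.out = defaultdict(set)  # node -> {(relation, target), ...}

    def add(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.out[source].add((relation, target))

    def targets(self, source, relation):
        return {t for (r, t) in self.out[source] if r == relation}

# A fragment of the SN example: c1 is a 'report' situation whose
# informational content c5 is subordinate to the concept 'problem'.
net = SemanticNetwork()
net.add("c1", "SUBS", "berichten.2.2")
net.add("c1", "MCONT", "c5")
net.add("c5", "SUB", "problem.1.1")
print(net.targets("c1", "MCONT"))  # {'c5'}
```

The fixed relation set is what distinguishes this from an arbitrary graph: every edge must carry one of the predefined semantic relations.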
6. The Computational Lexicon – HaGenLex
• Semantically oriented (German) lexical resource (Hartrumpf et al., 2003)
• Consists of multiple lexicons:
  • full syntactico-semantic information (26,000 entries)
  • flat lexicon (50,000 entries)
  • compound lexicon (30,000 entries; structure and semantics)
  • name lexicons (250,000 entries)
• Support for the lexicographer: LIAplus workbench
7. Sample Concepts (German)
• essen.1.1: (Der Student) (ißt) (eine Schokolade). ['The student eats a chocolate.']
• essen.1.2: (Der Student) (ißt) sich (satt). ['The student eats his fill.']
• essen.2.1: Das Kind hat kein Essen bekommen. ['The child did not get any food.']
• essen.2.2: Das Essen am Abend dauerte 2 Stunden. ['The evening meal lasted 2 hours.']
• fressen.1.1: (Der Hund) (frißt) (einen Knochen). ['The dog eats a bone.']
• fressen.1.2: (Die Großmutter) (frißt) (einen Narren) (an den Blumen). [idiom: 'The grandmother takes a great fancy to the flowers.']
8. Lexicon Entry (German): essen.1.1

n-sign
  morph   [ base "essen", infl-para i129g ]
  syn     v-syn [ v-type main, perf-aux haben, v-control nocontr ]
  sem     [ entity nonment-action, c-id "essen.1.1" ]
  select  semsel [ rel agt,
                   syn np-syn [ cat np, agr case nom ],
                   sel sem [ entity human-object ] ]
          semsel [ rel aff,
                   syn np-syn [ cat np, agr case acc ],
                   sel sem [ entity [ sort co ] ] ]
9. Lexicon Entry (German): fressen.1.1

n-sign
  morph   [ base "fressen", infl-para i139g ]
  syn     v-syn [ v-type main, perf-aux haben, v-control nocontr ]
  sem     [ entity nonment-action, c-id "fressen.1.1" ]
  select  semsel [ rel agt,
                   syn np-syn [ cat np, agr case nom ],
                   sel sem [ entity animal-object ∨ human-object ] ]
          semsel [ rel aff,
                   syn np-syn [ cat np, agr case acc ],
                   sel sem [ entity [ sort co ] ] ]
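The two entries above differ only in their morphology, concept id, and the selectional restriction on the agent. A hedged sketch in Python, with plain dicts standing in for HaGenLex's typed feature structures (the helper `licenses_agent` is illustrative, not part of HaGenLex):

```python
# Nested dicts approximating the feature structures on the two slides above.
essen_1_1 = {
    "morph": {"base": "essen", "infl-para": "i129g"},
    "syn": {"v-type": "main", "perf-aux": "haben", "v-control": "nocontr"},
    "sem": {"entity": "nonment-action", "c-id": "essen.1.1"},
    "select": [
        {"rel": "agt", "syn": {"cat": "np", "case": "nom"},
         "sem": {"entity": {"human-object"}}},
        {"rel": "aff", "syn": {"cat": "np", "case": "acc"},
         "sem": {"entity": {"co"}}},
    ],
}
fressen_1_1 = {
    "morph": {"base": "fressen", "infl-para": "i139g"},
    "syn": {"v-type": "main", "perf-aux": "haben", "v-control": "nocontr"},
    "sem": {"entity": "nonment-action", "c-id": "fressen.1.1"},
    "select": [
        {"rel": "agt", "syn": {"cat": "np", "case": "nom"},
         # set membership stands in for the disjunction animal-object ∨ human-object
         "sem": {"entity": {"animal-object", "human-object"}}},
        {"rel": "aff", "syn": {"cat": "np", "case": "acc"},
         "sem": {"entity": {"co"}}},
    ],
}

def licenses_agent(entry, agent_sort):
    """Does the verb entry's agt slot admit an NP of the given semantic sort?"""
    agt = next(s for s in entry["select"] if s["rel"] == "agt")
    return agent_sort in agt["sem"]["entity"]

print(licenses_agent(essen_1_1, "animal-object"))    # False
print(licenses_agent(fressen_1_1, "animal-object"))  # True
```

This makes the essen/fressen contrast computable: the entries share one semantic type but impose different sortal restrictions on their agents.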
10. Semantic Analysis – The WOCADI Parser
• Produces a semantic network representation from (German) texts (Hartrumpf, 2003):
  • resolves coreferences,
  • analyzes idioms,
  • decompounds nouns and adjectives,
  • identifies metonymy,
  • resolves deictic expressions, etc.
• Applied to large corpora, including the CLEF-NEWS newspaper corpus (275,000 articles) and the German Wikipedia (500,000 articles)
11. SN Example (German)
[Figure: semantic network for the query below; concept nodes du.1.1, streß.1.1, psychisch.1.1, dokument.1.1, problem.1.1, prüfling.1.1, kandidat.1.1, prüfungskandidat.1.1, prüfung.1.1, finden.1.1, berichten.2.2, linked by relations such as SUB, SUBS, PROP, PRED, *ALTN1, EXP, OBJ, MCONT, ATTCH, SCAR, SSPE, ASSOC]
Finde Dokumente, die über psychische Probleme oder Stress von Prüfungskandidaten oder Prüflingen berichten. (GIRT topic 116)
12. SN Example (English)
[Figure: the same semantic network with English concept labels (you, stress, mental, document, problem, examinee, candidate, exam, find, report) and identical relations]
'Find documents reporting on mental problems or stress of examination candidates or examinees.' (GIRT topic 116)
13. Phase 1: Using Statistical MT and Web Services
• Employ (statistical) machine translation (MT) web services for IR experiments (translation of queries/questions): Systran, Promt, ...
• Problems:
  • translating questions: most systems are trained on declarative sentences; imperative forms are often misunderstood (Find documents ... → Fund Dokument ...)
  • named entity recognition: not reliable (Neuengland → new narrow country)
• Performance loss from off-the-shelf translation tools for QA@CLEF: 50%; further examples: Ligozat et al. (2006)
14. Phase 2: Aligning Concept-based Tools and Resources
• Morphology and syntax differ across languages
• Semantics is the same (in general)
• Our approach:
  • create lexicons for different languages; fast construction parallel to existing lexicon(s), e.g. HaGenLex → HaEnLex
  • develop parsers for different languages
  • apply methods from IR/QA on the SN representation
• General idea: replace concepts (labels) in the semantic network representation (as a form of translation)
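The "replace concepts" idea can be sketched as a relabeling over the network's triples, assuming a bilingual concept map is available (the entries below follow the German/English SN example pair; node identifiers like c1 and the relation labels are left untouched):

```python
# Illustrative concept map drawn from the two SN example slides.
CONCEPT_MAP = {
    "dokument.1.1": "document.1.1",
    "psychisch.1.1": "mental.1.1",
    "problem.1.1": "problem.1.1",
    "berichten.2.2": "report.2.2",
    "finden.1.1": "find.1.1",
}

def translate_network(edges, concept_map):
    """Relabel lexicalized concepts; unmapped labels (e.g. c-nodes) stay as-is."""
    relabel = lambda n: concept_map.get(n, n)
    return {(relabel(s), r, relabel(t)) for (s, r, t) in edges}

german = {("c1", "SUBS", "berichten.2.2"),
          ("c1", "MCONT", "c5"),
          ("c5", "SUB", "problem.1.1"),
          ("c5", "PROP", "psychisch.1.1")}
english = translate_network(german, CONCEPT_MAP)
print(("c1", "SUBS", "report.2.2") in english)  # True
```

Because only the concept labels change, the relational structure, and hence the meaning representation, is preserved across languages.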
15. Status of Alignment of Lexical Resources
• German-to-English dictionaries: about 100,000 word/phrase translations
• Mapping between HaGenLex concepts and GermaNet concepts, plus a GermaNet-to-EuroWordNet mapping: about 14,000 concept translations
• Wikipedia articles (in German and English): about 3,000 proper noun translations for cities, countries, persons, organizations, etc.
• HaEnLex (parallel English version of HaGenLex) with full morphological, syntactic, and semantic description of concepts: about 7,000 English entries
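These four resources can be viewed as concept-translation dictionaries of decreasing depth of description. A sketch of a fallback lookup chain; the resource contents below are illustrative placeholders (only essen.1.1 → eat.1.1 is attested on a later slide):

```python
# Placeholder samples standing in for the four aligned resources.
HAENLEX    = {"essen.1.1": "eat.1.1"}          # curated parallel lexicon
WORDNET    = {"hund.1.1": "dog.1.1"}           # HaGenLex->GermaNet->EuroWordNet
WIKIPEDIA  = {"münchen.1.1": "munich.1.1"}     # aligned article titles
DICTIONARY = {"schnell.1.1": "fast.1.1"}       # plain bilingual dictionary

def translate_concept(concept):
    """Try resources from richest to shallowest; None if untranslatable."""
    for resource in (HAENLEX, WORDNET, WIKIPEDIA, DICTIONARY):
        if concept in resource:
            return resource[concept]
    return None

print(translate_concept("essen.1.1"))  # eat.1.1
print(translate_concept("hund.1.1"))   # dog.1.1
```

Ordering the resources lets a fully described HaEnLex entry win over a bare dictionary translation whenever both exist.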
16. Linguistic Phenomena (1/6)
Compounds (rare in English):
• with regular semantics: Kinderernährung → nutrition of children
• with irregular semantics: Frauenzimmer → dame (?); ladies' room (?)
• borderline cases: Bankwesen → banking (system) (?)
→ a compound-less semantic representation is possible
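The "compound-less" representation can be sketched as expanding a regular compound into its head concept plus a relation to the modifier concept, so that Kinderernährung and "nutrition of children" share one structure. The relation used below (ASSOC) is purely illustrative; the actual analysis would pick a more specific MultiNet relation:

```python
def decompose(node, head, modifier, relation="ASSOC"):
    """Edges meaning: node is an instance of <head>, related to <modifier>."""
    return {(node, "SUB", head), (node, relation, modifier)}

# Same structure for the German compound and its English paraphrase:
german  = decompose("c1", "ernährung.1.1", "kind.1.1")
english = decompose("c1", "nutrition.1.1", "child.1.1")
print(sorted(german))
```

Irregular compounds (Frauenzimmer) would instead need their own lexicalized concept, since no such decomposition yields the right meaning.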
17. Linguistic Phenomena (2/6)
Idioms:
• with a corresponding idiom:
  in den Sinn kommen (DE) → to start thinking about sth.
  to come into mind (EN) → to start thinking about sth.
• without an equivalent idiom:
  to be someone's cup of tea (EN) → to like
→ semantic representation of idioms
18. Linguistic Phenomena (3/6)
Metonymy:
• with a corresponding metonymy pattern (for regular metonymy):
  The White House agreed that ... (EN) → place-for-government
  Das Weiße Haus stimmte zu, dass ... (DE) → place-for-government
• without a corresponding pattern: ?
→ no problems yet
19. Linguistic Phenomena (4/6)

Proper nouns:
• transcriptions and transliterations, historic name variants
• Böll → Boell;
  Gorbatschow → Gorbatchev, Gorbatchov
→ can be solved using aligned online resources, e.g. Wikipedia
→ treat name variants as elements of the same synset
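Treating name variants as one synset can be sketched as below; the variant lists are illustrative stand-ins for what would be harvested from aligned resources such as Wikipedia language links.

```python
# Sketch: name variants as elements of the same synset.
# Variant lists are illustrative; a real system would harvest them
# from aligned online resources (e.g. Wikipedia).
NAME_SYNSETS = [
    {"Böll", "Boell"},
    {"Gorbatschow", "Gorbatchev", "Gorbatchov"},
]

def canonical(name):
    """Map any variant to a frozenset identifying its synset."""
    for synset in NAME_SYNSETS:
        if name in synset:
            return frozenset(synset)
    return frozenset({name})
```

A query for "Gorbatschow" then matches documents using "Gorbatchev", since both variants resolve to the same synset.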
20. Linguistic Phenomena (5/6)

Semantic gaps / lexical gaps:
• Fohlen (DE) → colt (if male)
• Fohlen (DE) → filly (if female)
• Alignment of lexicon entries: morpho-syntactic and syntactic features differ across languages; semantic features generally do not, but net entries/rules/entailments may differ slightly, because they already involve other concepts (which have to be translated)
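The Fohlen example can be sketched as a feature-driven choice at translation time; the feature name `sex` and the concept IDs are assumptions, since HaGenLex encodes such semantic features in its own format.

```python
# Sketch: bridging a lexical gap via semantic features.
# Feature name "sex" and concept IDs are hypothetical.
def translate_fohlen(features):
    """German 'Fohlen' is underspecified for sex; English forces a
    choice between 'colt' and 'filly'."""
    if features.get("sex") == "male":
        return "colt.1.1"
    if features.get("sex") == "female":
        return "filly.1.1"
    return None  # stay with the underspecified interlingua concept
```

When the source text does not fix the feature, the system can keep the underspecified concept rather than committing to a wrong translation.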
21. Linguistic Phenomena (6/6)

Semantic gaps / lexical gaps:
essen.1.1 → eat.1.1 AND fressen.1.1 → eat.1.1

HaGenLex entry for "eat" (feature structure, simplified):

  n-sign
    morph:  [ base: "eat", infl-para: i20 ]
    syn:    [ v-syn, v-type: main ]
    sem:    [ entity: nonment-action, c-id: "eat.1.1" ]
    select: [ rel: agt, syn: [ np-syn, cat: np ],
              sel sem: [ entity: animal-object ∨ human-object ] ]
            [ rel: aff, syn: [ np-syn, cat: np ],
              sel sem: [ entity sort: co ] ]
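The feature structure above can be rendered as nested data with a selectional-restriction check; the attribute names follow the slide, but the encoding as Python dicts and sets is of course only a sketch of the HaGenLex representation.

```python
# Sketch: the "eat" entry as nested dicts (simplified from the
# HaGenLex feature structure shown above).
EAT = {
    "base": "eat", "infl-para": "i20", "v-type": "main",
    "sem": {"entity": "nonment-action", "c-id": "eat.1.1"},
    "select": [
        {"rel": "agt", "cat": "np",
         "sem": {"entity": {"animal-object", "human-object"}}},
        {"rel": "aff", "cat": "np",
         "sem": {"entity": {"co"}}},
    ],
}

def fills_role(entry, rel, entity):
    """Check whether an argument of the given entity sort satisfies
    the selectional restriction for role `rel`."""
    for slot in entry["select"]:
        if slot["rel"] == rel:
            return entity in slot["sem"]["entity"]
    return False
```

Both essen.1.1 and fressen.1.1 map to eat.1.1; what distinguishes them is the selectional restriction on the AGT role (human vs. animal eater), which this kind of check makes explicit.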
22. Phase 3: Towards a Concept-Based Translation

• Assumption: the same inventory of relations (about 140 relations) holds for different languages
• Natural language generation (for German)
• Possible solution: English parser, then generate natural language from the semantic network representation
23. Monolingual Concept-Based IR

• Techniques of standard IR: stemming and stopword removal
• Monolingual concept-based IR:
  • represent queries (and documents) as semantic networks
  • (translate concepts)
  • employ methods on the semantic network representation
• Advantages:
  • semantics of compounds (relation to their constituents)
  • semantics of prepositions is typically represented by a semantic relation or function (no full translation needed)
  • lemmatizing (instead of stemming)
  • query expansion with elements of synsets
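Lemmatizing plus synset-based query expansion can be sketched as follows; the lemma table and synsets are toy stand-ins for the HaGenLex resources.

```python
# Sketch: query expansion with synset members after lemmatization.
# The lemma table and synset list are toy examples.
LEMMAS = {"cars": "car", "automobiles": "automobile"}
SYNSETS = [{"car", "automobile"}]

def expand(term):
    """Lemmatize a query term, then expand it with all members
    of its synset."""
    lemma = LEMMAS.get(term, term)
    for synset in SYNSETS:
        if lemma in synset:
            return sorted(synset)
    return [lemma]
```

Unlike stemming, this keeps real word forms, so a query for "cars" also retrieves documents that only mention "automobile".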
24. Multilingual Concept-Based IR

• Three different approaches to supporting a multilingual search:
  1. translate queries into the document language
  2. translate documents into the query language
  3. translate both queries and documents into an interlingua
• Multilingual concept-based IR: same as the monolingual approach, but translate concepts (in approach 1, 2, or 3)
  → towards an interlingua
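The interlingua variant (approach 3) can be sketched with per-language concept alignment tables; the interlingua IDs and tables below are invented for illustration.

```python
# Sketch: approach 3, mapping concepts of both sides into an
# interlingua. Concept IDs and tables are hypothetical.
DE_TO_INTERLINGUA = {"haus.1.1": "C-HOUSE"}
EN_TO_INTERLINGUA = {"house.1.1": "C-HOUSE"}

def to_interlingua(concepts, table):
    """Map language-specific concept IDs to interlingua IDs;
    unknown concepts pass through unchanged."""
    return [table.get(c, c) for c in concepts]

query = to_interlingua(["haus.1.1"], DE_TO_INTERLINGUA)
doc = to_interlingua(["house.1.1"], EN_TO_INTERLINGUA)
```

Approaches 1 and 2 compose two such tables in a single direction (query→document language or vice versa) instead of meeting in the middle.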
25. Projects and Evaluations

• GeoCLEF (Leveling and Veiel, 2006): Web service for MT (query translation)
• GIRT-4 experiments (Leveling, 2004, 2006a): combined concept and word translation
• NLI-Z39.50 (Leveling, 2006b): replace terminal concepts in the SN, then treat translation alternatives as a synset for query expansion (no decision for a single reading necessary)
• QA@CLEF (Hartrumpf and Leveling, 2007): Web service for MT, then analysis; concept-based translation with a rudimentary English parser (preliminary experiments)
26. Conclusion

• General approach:
  • parse queries
  • translate concepts in the SN representation
  • operate on the SN representation
• Aims at multilingual information systems for different purposes: IR, QA
• 3 phases (currently phase 2)
27. Outlook

• Create a repository of interlingua concepts:
  allow for concept-based machine translation of text
  → natural language generation
  → MT
• Outlook for IR/QA: index semantic relations as well
References

Hartrumpf, Sven (2003). Hybrid Disambiguation in Natural Language Analysis. Osnabrück, Germany: Der Andere Verlag.

Hartrumpf, Sven; Hermann Helbig; and Rainer Osswald (2003). The semantically based computer lexicon HaGenLex – Structure and technological environment. Traitement automatique des langues, 44(2):81–105.

Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Peters, Carol; Paul Clough; Fredric C. Gey; Jussi Karlgren; Bernardo Magnini; Douglas W. Oard; Maarten de Rijke; and Maximilian Stempfhuber), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.

Helbig, Hermann (2001). Die semantische Struktur natürlicher Sprache: Wissensrepräsentation mit MultiNet. Berlin: Springer.

Helbig, Hermann (2006). Knowledge Representation and the Semantics of Natural Language. Berlin: Springer.

Leveling, Johannes (2004). University of Hagen at CLEF 2003: Natural language access to the GIRT4 data. In Comparative Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003 (edited by Peters, Carol; Julio Gonzalo; Martin Braschler; and Michael Kluck), volume 3237 of LNCS, pp. 412–424. Berlin: Springer.

Leveling, Johannes (2006a). A baseline for NLP in domain-specific information retrieval. In Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005 (edited by Peters, Carol; Fredric C. Gey; Julio Gonzalo; Gareth J. F. Jones; Michael Kluck; Bernardo Magnini; Henning Müller; and Maarten de Rijke), volume 4022 of LNCS, pp. 222–225. Berlin: Springer.

Leveling, Johannes (2006b). Formale Interpretation von Nutzeranfragen für natürlichsprachliche Interfaces zu Informationsangeboten im Internet. Tönning, Germany: Der Andere Verlag.

Leveling, Johannes and Dirk Veiel (2006). University of Hagen at GeoCLEF 2006: Experiments with metonymy recognition in documents. In Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2006 Workshop (edited by Nardi, Alessandro; Carol Peters; and José Luis Vicedo). Alicante, Spain.

Ligozat, Anne-Laure; Brigitte Grau; Isabelle Robba; and Anne Vilnat (2006). Evaluation and improvement of cross-lingual question answering strategies. In Proceedings of the EACL 2006 Workshop on Multilingual Question Answering (MLQA’06), pp. 23–30. Trento, Italy.