Exploiting Distributional Semantic Models in Question Answering

This paper investigates the role of Distributional
Semantic Models (DSMs) in Question Answering (QA), and
specifically in a QA system called QuestionCube. QuestionCube is
a framework for QA that combines several techniques to retrieve
passages containing the exact answers for natural language questions.
It exploits Information Retrieval models to seek candidate
answers and Natural Language Processing algorithms for the
analysis of questions and candidate answers both in English and
Italian. The data source for the answer is an unstructured text
document collection stored in search indices.
In this paper we propose to exploit DSMs in the QuestionCube
framework. In DSMs words are represented as mathematical
points in a geometric space, also known as semantic space. Words
are similar if they are close in that space. Our idea is that DSM
approaches can help to compute the relatedness between users'
questions and candidate answers by exploiting paradigmatic
relations between words. Results of an experimental evaluation
carried out on the CLEF 2010 QA dataset prove the effectiveness of
the proposed approach.


Exploiting Distributional Semantic Models in Question Answering

  1. 1. Exploiting Distributional Semantic Models in Question Answering – Piero Molino (1,2), Pierpaolo Basile (1,2), Annalina Caputo (1), Pasquale Lops (1), Giovanni Semeraro (1) {basilepp@di.uniba.it} – (1) Dept. of Computer Science, University of Bari Aldo Moro; (2) QuestionCube s.r.l. – 6th IEEE International Conference on Semantic Computing, September 19-21, 2012, Palermo, Italy
  2. 2. Background A bottle of Tesgüino is on the table. Everyone likes Tesgüino. Tesgüino makes you drunk. We make Tesgüino out of corn.
  3. 3. Background A bottle of Tesgüino is on the table. Everyone likes Tesgüino. Tesgüino makes you drunk. We make Tesgüino out of corn. It is a corn beer.
  4. 4. Background: You shall know a word by the company it keeps! The meaning of a word is determined by its usage.
  5. 5. Background: Distributional Semantic Models (DSMs) – represent words as points in a geometric space – do not require specific text operations – are corpus/language independent. Widely used in IR and Computational Linguistics – semantic text similarity, synonym detection, query expansion, … Not directly used in Question Answering (QA).
  6. 6. Our idea: using DSMs in a QA system – QuestionCube: a framework for QA. Exploit DSMs to compare the semantic similarity between the user's question and each candidate answer.
  7. 7. Our idea: using DSMs in a QA system – QuestionCube: a framework for QA. Exploit DSMs to compare the semantic similarity between the user's question and each candidate answer. Q: Which beverages contain alcohol? A: Tesgüino makes you drunk.
  8. 8. Outline • Introduction to the QA problem • Distributional Semantic Models – semantic composition • DSMs into QuestionCube • Evaluation and results • Final remarks
  9. 9. QA vs. IR
      |                         | Question Answering               | Information Retrieval                |
      | Information need        | Natural language question        | Keywords                             |
      | Result                  | Factoid answer or short passages | Whole documents                      |
      | Information acquisition | Reading answers                  | Searching manually inside documents  |
  10. 10. DISTRIBUTIONAL SEMANTIC MODELS
  11. 11. Distributional Semantic Models… Defined as: <T, C, R, W, M, d, S> – T: target elements (words) – C: context – R: relation between T and C – W: weighting schema – M: geometric space T×C – d: space reduction M -> M' – S: similarity function defined in M'
  12. 12. …Distributional Semantic Models – Term-term co-occurrence matrix: each cell contains the co-occurrences between two terms within a fixed distance (a minimal co-occurrence sketch follows after the transcript):
      |          | dog | cat | computer | animal | mouse |
      | dog      |  0  |  4  |    0     |   2    |   1   |
      | cat      |  4  |  0  |    0     |   3    |   5   |
      | computer |  0  |  0  |    0     |   0    |   3   |
      | animal   |  2  |  3  |    0     |   0    |   2   |
      | mouse    |  1  |  5  |    3     |   2    |   0   |
  13. 13. …Distributional Semantic Models – TTM: term-term co-occurrence matrix; Latent Semantic Analysis (LSA): relies on the Singular Value Decomposition (SVD) of the co-occurrence matrix; Random Indexing (RI): based on Random Projection; Latent Semantic Analysis over Random Indexing (LSARI).
  14. 14. Latent Semantic Analysis – Given M, the TTM matrix: apply the SVD decomposition M = U Σ V^T, where Σ is the matrix of singular values; choose k singular values from Σ and approximate the matrix M as M* = U* Σ* V*^T. SVD discovers high-order relations between terms and reduces the sparseness of the original matrix (a minimal LSA sketch follows after the transcript).
  15. 15. Random Indexing: 1. Generate and assign a context vector to each context element (e.g. document, passage, term, …) 2. The term vector is the sum of the context vectors of the contexts in which the term occurs – the context vector can also be boosted by a score (e.g. term frequency, PMI, …).
  16. 16. Context Vector, e.g. (0 0 0 0 0 0 0 -1 0 0 0 0 1 0 0 -1 0 1 0 0 0 0 1 0 0 0 0 -1) • sparse • high-dimensional • ternary {-1, 0, +1} • small number of randomly distributed non-zero elements
  17. 17. Random Indexing (formal): B_{n,k} = A_{n,m} R_{m,k}, with k << m. B nearly preserves the distances between points (Johnson-Lindenstrauss lemma): d_r = c · d. RI is a locality-sensitive hashing method which approximates the cosine distance between vectors.
  18. 18. Random Indexing (example) – sentence: "John eats a red apple". CV_john = (0, 0, 0, 0, 0, 0, 1, 0, -1, 0), CV_eat = (1, 0, 0, 0, -1, 0, 0, 0, 0, 0), CV_red = (0, 0, 0, 1, 0, 0, 0, -1, 0, 0); TV_apple = CV_john + CV_eat + CV_red = (1, 0, 0, 1, -1, 0, 1, -1, -1, 0). (A minimal Random Indexing sketch follows after the transcript.)
  19. 19. Latent Semantic Analysis over Random Indexing: 1. Reduce the dimension of the co-occurrence matrix using RI 2. Perform LSA over RI (LSARI) – reduction of LSA computation time: the RI matrix contains fewer dimensions than the co-occurrence matrix.
  20. 20. QUESTIONCUBE FRAMEWORK
  21. 21. QuestionCube framework… Natural Language pipeline for Italian and English – PoS-tagging, lemmatization, Named Entity Recognition, Phrase Chunking, Dependency Parsing, Word Sense Disambiguation (WSD). IR model based on the classical VSM or BM25 – several query expansion strategies. Passage retrieval.
  22. 22. …QuestionCube framework [pipeline diagram: question analysis, question expansion and categorization, document retrieval over the index, candidate passages, then the passage filter chain (term filter, n-gram filter, density filter, sequence filter, score normalization) producing the ranked list of passages]
  23. 23. DSMS INTO QUESTIONCUBE
  24. 24. Integration [pipeline diagram: the same QuestionCube pipeline with a new distributional filter, based on DSMs, added to the passage filter chain before score normalization]
  25. 25. Distributional filter… Compute the similarity between the user's question and each candidate answer. Both the user's question and the candidate answer are represented by more than one term – a method to compose words is necessary.
  26. 26. …Distributional filter… [diagram: the question and the ranked candidate answers enter the distributional filter, which applies a DSM compositional approach and re-ranks the candidate answers]
  27. 27. …Distributional filter (compositional approach) – Addition (+): pointwise sum of components – represent complex structures by summing the words which compose them. Given q = q1 q2 … qn and a = a1 a2 … am: question vector q = q1 + q2 + … + qn; candidate answer vector a = a1 + a2 + … + am; similarity sim(q, a) = (q · a) / (|q| |a|). (A minimal composition sketch follows after the transcript.)
  28. 28. EVALUATION AND RESULTS
  29. 29. Evaluation… Dataset: 2010 CLEF QA Competition – 10,700 documents • from European Union legislation and European Parliament transcriptions – 200 questions – Italian and English. DSMs – 1,000 vector dimensions (TTM/LSA/RI/LSARI) – 50,000 most frequent words – co-occurrence distance: 4.
  30. 30. …Evaluation: 1. Effectiveness of DSMs in QuestionCube 2. Comparison between the several DSMs adopted. Metrics – a@n: accuracy taking into account only the first n answers – MRR = (1/N) Σ_{i=1}^{N} 1/rank_i, based on the rank of the correct answer. (A sketch of both metrics follows after the transcript.)
  31. 31. Evaluation (alone scenario): only the distributional filter is adopted, no other filters in the pipeline [diagram: passage filter chain with only the distributional filter and score normalization enabled]
  32. 32. Evaluation (combined): the distributional filter is combined with the other filters – using the CombSum function – baseline: the distributional filter is removed [diagram: full passage filter chain including the distributional filter]. (A CombSum sketch follows after the transcript.)
  33. 33. Results (English)…
      | Scenario | Run       | a@1  | a@5  | a@10 | a@30 | MRR     |
      | alone    | TTM       | .060 | .145 | .215 | .345 | .107    |
      | alone    | RI        | .180 | .370 | .425 | .535 | .267    |
      | alone    | LSA       | .205 | .415 | .490 | .600 | .300    |
      | alone    | LSARI     | .190 | .405 | .490 | .620 | .295    |
      | combined | baseline* | .445 | .635 | .690 | .780 | .549    |
      | combined | TTM       | .535 | .715 | .775 | .810 | .614¹   |
      | combined | RI        | .550 | .730 | .785 | .870 | .637¹,² |
      | combined | LSA       | .560 | .725 | .790 | .855 | .637¹   |
      | combined | LSARI     | .555 | .730 | .790 | .870 | .634¹   |
      ¹ significant wrt. baseline; ² significant wrt. TTM; * without distributional filter
  34. 34. …Results (Italian)
      | Scenario | Run       | a@1  | a@5  | a@10 | a@30 | MRR     |
      | alone    | TTM       | .060 | .140 | .175 | .280 | .097    |
      | alone    | RI        | .175 | .305 | .385 | .465 | .241    |
      | alone    | LSA       | .155 | .315 | .390 | .480 | .229    |
      | alone    | LSARI     | .180 | .335 | .400 | .500 | .254    |
      | combined | baseline* | .365 | .530 | .630 | .715 | .441    |
      | combined | TTM       | .405 | .565 | .645 | .740 | .539¹   |
      | combined | RI        | .465 | .645 | .720 | .785 | .555¹   |
      | combined | LSA       | .470 | .645 | .690 | .785 | .551¹   |
      | combined | LSARI     | .480 | .635 | .690 | .785 | .557¹,² |
      ¹ significant wrt. baseline; ² significant wrt. TTM; * without distributional filter
  35. 35. Final remarks… Alone: all the proposed DSMs are better than the TTM – in particular LSA and LSARI. Combined: all the combinations are able to overcome the baseline – English +16% (RI/LSA), Italian +26% (LSARI). Slight difference in performance between LSA and LSARI.
  36. 36. …Final remarks – The distributional filter shows a higher improvement in the alone scenario than in the combined scenario – find a more effective way to combine filters (learning-to-rank approach) – exploit more complex semantic compositional strategies.
  37. 37. Thank you for your attention! Questions? <basilepp@di.uniba.it>
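
Illustrative sketches (not from the original slides or the QuestionCube code base). First, a minimal term-term co-occurrence matrix (TTM) in Python, assuming a corpus already tokenized into lists of words; the window of 4 mirrors the co-occurrence distance reported in the evaluation slide, and `build_ttm` and the toy sentences are hypothetical names used only for illustration.

```python
import numpy as np

def build_ttm(sentences, window=4):
    """Count co-occurrences of term pairs within a fixed window."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    ttm = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ttm[index[w], index[sent[j]]] += 1
    return vocab, ttm

# toy usage
vocab, ttm = build_ttm([["tesguino", "makes", "you", "drunk"],
                        ["we", "make", "tesguino", "out", "of", "corn"]])
```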
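A sketch of LSA over the TTM using plain numpy: decompose M = U Σ V^T and keep the top k singular values, so that the rows of U_k Σ_k serve as reduced term vectors. The function name `lsa` and the value of k are assumptions for illustration.

```python
import numpy as np

def lsa(ttm, k=100):
    """Truncated SVD of the co-occurrence matrix (M = U S V^T)."""
    U, s, Vt = np.linalg.svd(ttm, full_matrices=False)
    return U[:, :k] * s[:k]   # reduced term vectors, one row per term

reduced = lsa(np.random.rand(500, 500), k=100)
```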
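A sketch of Random Indexing, assuming the context of a term is every other term in the same sentence: each context term gets a sparse ternary {-1, 0, +1} context vector, and a term vector is accumulated as the sum of the context vectors it co-occurs with, as in the John/apple example on slide 18. The dimension and number of non-zero entries are illustrative, not the values used in the paper.

```python
import numpy as np

def random_indexing(sentences, dim=1000, nonzero=10, seed=0):
    rng = np.random.default_rng(seed)
    vocab = sorted({w for s in sentences for w in s})

    def context_vector():
        cv = np.zeros(dim)
        idx = rng.choice(dim, size=nonzero, replace=False)
        cv[idx] = rng.choice([-1.0, 1.0], size=nonzero)
        return cv

    cvs = {w: context_vector() for w in vocab}   # one random CV per term
    tvs = {w: np.zeros(dim) for w in vocab}      # term vectors start at zero
    for sent in sentences:
        for i, w in enumerate(sent):
            for j, c in enumerate(sent):
                if j != i:
                    tvs[w] += cvs[c]             # TV(w) = sum of co-occurring CVs
    return tvs

tvs = random_indexing([["john", "eats", "a", "red", "apple"]], dim=10, nonzero=2)
```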
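A sketch of the addition-based composition and cosine similarity used by the distributional filter on slide 27: the question and the candidate answer are each represented as the pointwise sum of their word vectors, then compared with the cosine. `term_vectors` stands for any of the DSM spaces above; skipping words missing from the space is an assumption of this sketch.

```python
import numpy as np

def compose(words, term_vectors):
    """Pointwise sum of the vectors of the words that compose the text."""
    vecs = [term_vectors[w] for w in words if w in term_vectors]
    return np.sum(vecs, axis=0)

def similarity(question, answer, term_vectors):
    q, a = compose(question, term_vectors), compose(answer, term_vectors)
    return float(np.dot(q, a) / (np.linalg.norm(q) * np.linalg.norm(a)))

# toy usage with hand-made 2-dimensional term vectors
tv = {"beer": np.array([1.0, 0.0]), "drunk": np.array([0.8, 0.2]),
      "alcohol": np.array([0.9, 0.1])}
score = similarity(["alcohol"], ["beer", "drunk"], tv)
```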
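A sketch of the two evaluation metrics on slide 30, a@n and MRR, assuming that for every question we know the 1-based rank of the first correct answer (or None when it was not retrieved):

```python
def accuracy_at_n(ranks, n):
    """Fraction of questions whose correct answer appears within the first n results."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)

def mrr(ranks):
    """Mean Reciprocal Rank; questions with no correct answer contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

ranks = [1, 3, None, 2]
print(accuracy_at_n(ranks, 5), mrr(ranks))   # 0.75 and ~0.458
```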
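Finally, a sketch of a CombSum-style combination as mentioned on slide 32, assuming each filter in the chain produces a normalized score per candidate passage; the combined ranking simply sums the scores. The dictionary-based interface is an assumption, not the QuestionCube API.

```python
def comb_sum(filter_scores):
    """Sum normalized scores from several filters and rank the passages."""
    combined = {}
    for scores in filter_scores:
        for passage, score in scores.items():
            combined[passage] = combined.get(passage, 0.0) + score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

ranking = comb_sum([{"p1": 0.9, "p2": 0.4, "p3": 0.1},   # e.g. density filter scores
                    {"p1": 0.2, "p2": 0.7, "p3": 0.3}])  # e.g. distributional filter scores
```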
