Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia

Seokhwan Kim, Scientist at Institute for Infocomm Research

EMNLP 2015

Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Human Language Technology Department, Institute for Infocomm Research (I²R), Singapore
Wikification on Spoken Dialogues
Linking mentions to the relevant concepts in Wikipedia
Differences between spoken dialogues and written texts: number of speakers, dependence on background knowledge, degree of informal and noisy expressions
Examples of Wikification on Singapore tour guide dialogues
Guide: How can I help you?
Tourist: Can you recommend some good places to visit in Singapore?
Guide: Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit.
Tourist: That is a symbol for your country, right?
Guide: Yes, we use that to symbolise Singapore.
Tourist: Okay.
Guide: The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village.
Tourist: How can I get there from Orchard Road?
Guide: You can take the red line train from Orchard and stop at Raffles Place.
Tourist: Is this walking distance from the station to the destination?
Guide: Yes, it’ll take only ten minutes on foot.
Tourist: Alright.
Guide: Well, you can also enjoy some seafoods at the riverside near the place.
Tourist: What food do you have any recommendations to try there?
Guide: If you like spicy foods, you must try chilli crab which is one of our favourite dishes here.
Tourist: Great! I’ll try that.
Linked Wikipedia concepts: Singapore, Merlion Park, Orchard Road, North South MRT Line, Raffles Place MRT Station, Singapore River, Chilli crab
Three-step Approach for Wikification on Dialogues
[Figure: three-step pipeline. Input mention m_i; Step 1: linking validity, in-dialogue reference, domain relevance, and speaker relatedness analyses; Step 2: candidate generation over Wikipedia concepts; Step 3: candidate ranking; history <m_j, f(m_j)>, j = 0..(i-1); output concept f(m_i).]
Step 1: Mention Analysis
Analyzing four binary properties of a given mention: linking validity (LV), in-dialogue reference (ID), domain relevance (DR), and speaker relatedness (SRG, SRT)
Guide: In the morning I suggest to you to go to Botanical Garden.
  LV  ID  DR  SRG  SRT
  -   -   -   -    -
  +   -   +   +    -
Tourist: Oh, we also have Botanical Garden.
  LV  ID  DR  SRG  SRT
  +   -   -   -    +
Tourist: That is actually one of my favourite places here.
  LV  ID  DR  SRG  SRT
  +   +   -   -    +
  +   -   -   -    +
Guide: If so, you might like this place also.
  LV  ID  DR  SRG  SRT
  +   +   +   +    -
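As a rough illustration of how one row of the labels above could be represented in code, the following Python sketch parses a "+ - + + -" row into a small record. The class and field names are hypothetical, and reading SRG and SRT as relatedness to the guide and to the tourist is an assumption based on the example rows, not something the slides state.

```python
from dataclasses import dataclass

# Column order as it appears in the examples above.
LABELS = ("LV", "ID", "DR", "SRG", "SRT")


@dataclass(frozen=True)
class MentionProperties:
    """Binary properties of one mention (hypothetical container)."""
    linking_valid: bool     # LV: linking validity
    in_dialogue_ref: bool   # ID: in-dialogue reference
    domain_relevant: bool   # DR: domain relevance
    srg: bool               # SRG: assumed to mean relatedness to the guide
    srt: bool               # SRT: assumed to mean relatedness to the tourist


def parse_properties(row: str) -> MentionProperties:
    """Turn a row such as '+ - + + -' into a MentionProperties record."""
    marks = row.split()
    if len(marks) != len(LABELS) or any(m not in ("+", "-") for m in marks):
        raise ValueError(f"expected {len(LABELS)} '+'/'-' marks, got: {row!r}")
    return MentionProperties(*(m == "+" for m in marks))


if __name__ == "__main__":
    print(parse_properties("+ - + + -"))
```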
Step 2: Candidate Generation
Candidate retrieval from a Lucene index on the Wikipedia collection
Filtering constraints are based on the properties analyzed in Step 1
Multiple constraints are combined by intersection or union
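The difference between the two combination modes can be sketched in Python as follows. This is only an illustration under stated assumptions: candidates are plain dictionaries, and the two example constraints (`in_domain`, `in_history`) and their data are hypothetical stand-ins, since the slides do not say how each analyzed property is turned into a filter.

```python
from typing import Callable, Dict, Iterable, List

Candidate = Dict[str, object]
Constraint = Callable[[Candidate], bool]


def filter_candidates(
    candidates: Iterable[Candidate],
    constraints: List[Constraint],
    mode: str = "intersection",
) -> List[Candidate]:
    """Keep candidates that satisfy all (intersection) or any (union) constraints."""
    if not constraints:
        return list(candidates)  # nothing to filter on
    if mode == "intersection":
        combine = all
    elif mode == "union":
        combine = any
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [c for c in candidates if combine(check(c) for check in constraints)]


if __name__ == "__main__":
    # Hypothetical candidates and constraints, for illustration only.
    candidates = [
        {"title": "Merlion Park", "categories": {"Singapore"}},
        {"title": "Botanical garden", "categories": set()},
        {"title": "Singapore Botanic Gardens", "categories": {"Singapore"}},
    ]
    history = {"Merlion Park"}  # concepts linked earlier in the dialogue
    in_domain = lambda c: "Singapore" in c["categories"]
    in_history = lambda c: c["title"] in history

    for mode in ("intersection", "union"):
        kept = filter_candidates(candidates, [in_domain, in_history], mode)
        print(mode, [c["title"] for c in kept])
```

Intersection keeps fewer candidates than union, which is consistent with the higher precision and lower recall reported for the intersection setting in the ranking evaluation.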
Step 3: Candidate Ranking
Ranking SVM: a supervised learning-to-rank algorithm
\[
s(m, c) =
\begin{cases}
4 & \text{if } c \text{ is exactly the same as } g(m), \\
3 & \text{if } c \text{ is the parent article of } g(m), \\
2 & \text{if } c \text{ belongs to the same article but a different section of } g(m), \\
1 & \text{otherwise.}
\end{cases}
\]
m: a mention
c: a candidate concept
g(m): the manual annotation for the most relevant concept of m
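The graded score s(m, c) can be written as a small Python function. This is a minimal sketch, assuming a concept is identified by an article title plus an optional section name (the `Concept` type, the section names, and the treatment of an article-level g(m) with a section-level c as score 2 are assumptions, not details given on the slides).

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    """A Wikipedia concept: an article, or a section within an article."""
    article: str
    section: Optional[str] = None  # None means the article as a whole


def relevance(candidate: Concept, gold: Concept) -> int:
    """Graded relevance s(m, c) of a candidate c against the annotation g(m)."""
    if candidate == gold:
        return 4  # exactly the annotated concept
    if candidate.article == gold.article:
        if gold.section is not None and candidate.section is None:
            return 3  # candidate is the parent article of the annotated section
        return 2      # same article, different section
    return 1          # unrelated concept


if __name__ == "__main__":
    gold = Concept("Merlion Park", "History")  # hypothetical annotation
    for cand in (
        gold,
        Concept("Merlion Park"),
        Concept("Merlion Park", "Location"),
        Concept("Singapore"),
    ):
        print(cand, relevance(cand, gold))
```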
Datasets
Singapore tour guide dialogues
Human-human mixed-initiative dialogues
35 sessions, 21 hours, 31,034 utterances
Manually annotated with relevant Wikipedia concepts
Preprocessed with the Stanford CoreNLP toolkit
Wikipedia collection
4,797,927 articles and 25,577,464 sections in total
Collected from the Wikipedia database dump as of January 2015
Indexed into a Lucene index
Evaluation: Mention Analysis
SVMlight was used for training four mention analyzers
With four sets of features: mention (M), utterance (U), dialogue (D), and Wikipedia-based (W) features
Five-fold cross validation with F-measure
Features   LV     ID     SRG    SRT
M          86.29  69.15  71.10  72.94
M+U        86.90  70.43  70.43  68.85
M+D        86.17  71.09  70.56  71.52
M+W        86.21  68.96  70.66  71.86
M+U+D      86.82  72.37  70.12  68.30
M+U+W      86.84  70.13  70.19  68.78
M+U+D+W    86.77  72.20  69.94  68.10
Evaluation: Candidate Generation
Four sets of candidates were prepared for each mention
Baseline: Retrieved with no filtering
Intersection: Filtered with intersection of analyzed properties
Union: Filtered with union of analyzed properties
Oracle: Filtered with manually annotated properties
Top 100 candidates were retrieved from a Lucene index for each set
Evaluation: Candidate Ranking
SVMrank was used for training ranking functions
The top-ranked item in the list is considered as the result of Wikification
Five-fold cross validation with Precision/Recall/F-measure
Method                      P      R      F
Baseline                    26.85  22.52  21.24
Intersection                44.37  27.35  33.84
Union                       38.04  31.97  34.74
Oracle (manual filtering)   39.90  34.72  37.13
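For readers unfamiliar with SVMrank, the sketch below shows how training examples for one mention could be serialized and how the top-ranked candidate is picked at prediction time. SVMrank reads SVMlight-style lines of the form `<target> qid:<qid> <index>:<value> ...`, where candidates of the same mention share a `qid`. The helper names and the feature values here are hypothetical; the slides do not list the ranking features.

```python
from typing import List, Sequence


def svmrank_line(target: int, qid: int, features: Sequence[float], comment: str = "") -> str:
    """Format one candidate as an SVMrank training line (feature indices start at 1)."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    line = f"{target} qid:{qid} {pairs}"
    return f"{line} # {comment}" if comment else line


def top_ranked(candidates: List[str], scores: Sequence[float]) -> str:
    """Return the candidate with the highest ranking score (the Wikification result)."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and aligned")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best_index]


if __name__ == "__main__":
    # One mention (qid 1) with two candidates; targets follow the 4..1 scale of s(m, c).
    print(svmrank_line(4, 1, [0.5, 1.0], "Merlion Park"))
    print(svmrank_line(1, 1, [0.25, 0.0], "Singapore"))
    print(top_ranked(["Merlion Park", "Singapore"], [1.7, -0.3]))
```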
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg
