Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Towards Improving Dialogue Topic Tracking Performances
with Wikification of Concept Mentions
Seokhwan Kim, Rafael E. Banchs...
Upcoming SlideShare
Loading in …5

Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions


Published on

Poster for SIGDIAL 2015

  • Be the first to comment

  • Be the first to like this

Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions

  1. 1. Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions Seokhwan Kim, Rafael E. Banchs, Haizhou Li Human Language Technology Department, Institute for Infocomm Research (I2 R), Singapore Dialogue Topic Tracking Detecting topic transitions and predicting topic categories In on-going dialogues which address more than a single topic Classification problem f(xt) = (yt−1, yt) Examples of dialogue topic tracking Speaker Utterance Topic Transition Guide How can I help you? NONE→NONE Tourist Can you recommend some good places to visit in Sin- gapore? NONE→ATTR Guide Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit. Tourist That is a symbol for your country, right? ATTR→ATTR Guide Yes, we use that to symbolise Singapore. Tourist Okay. ATTR→ATTR Guide The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village. Tourist How can I get there from Orchard Road? ATTR→TRSP Guide You can take the red line train from Orchard and stop at Raffles Place. Tourist Is this walking distance from the station to the destina- tion? TRSP→TRSP Guide Yes, it’ll take only ten minutes on foot. Tourist Alright. TRSP→FOOD Guide Well, you can also enjoy some seafoods at the riverside near the place. Tourist What food do you have any recommendations to try there? FOOD→FOOD Guide If you like spicy foods, you must try chilli crab which is one of our favourite dishes here. Tourist Great! I’ll try that. FOOD→FOOD Wikipedia-based Dialogue Topic Tracking To leverage on Wikipedia as an external knowledge source Obtained without significant effort toward building resources Kernel methods based on dialogue segment-level correspondences to Wikipedia articles obtained with vector space models Kim et al. (ICASSP 2014, ACL 2014) Limitations Mismatches between spoken dialogues and written encyclopedia Multiple relevances from a single dialogue segment to more than one concept Lack of semantic or discourse aspects in concept retrieval Wikification of Concept Mentions in Spoken Dialogues Wikification aims at linking mentions to the relevant entries in Wikipedia Examples of Wikification results on the previous example Singapore, Merlion Park, Orchard Road, North South MRT Line, Raffles Place MRT Station Singapore River, Chilli crab Dealing with co-references, ambiguities, and variabilities of mentions Every noun phrase is defined as a single mention For each mention, candidates are retrieved from a Lucene index Ranking SVM: Supervised learning to rank algorithm s(m, c) =    4 if c is the exactly same as g(m), 3 if c is the parent article of g(m), 2 if c belongs to the same article but different section of g(m), 1 otherwise. m: a mention c: a candidate concept g(m): the manual annotation for the most relevant concept of m The top-ranked item in the list of candidates scored by this model is considered as the result of Wikification for a given mention Wikification-based Features for Topic Tracking Baseline based on vector space model φ(x) = α1, α2, · · · , α|W| ∈ R|W| αi = h j=0 λj · tfidf(wi, u(t−j)) Wikipedia-based features (ICASSP 2014, ACL 2014) φ (x) = α1, α2, · · · , α|W|, β1, β2, · · · , β|D| βi = sim(x, ci) = cos (θ) = φ(x) · φ(ci) |φ(x)||φ(ci)| Wikification-based features γi = h j=0 λj · mk ∈ u(t−j)|g(mk) = ci Experimental Setup Singapore tour guide dialogues Human-human mixed initiative dialogues 35 sessions, 21 hours, 31,034 utterances Manually annotated with nine topic categories Wikipedia collection 4,797,927 articles and 25,577,464 sections in total Collected from Wikipedia database dump as of January 2015 Indexed into a Lucene index Wikification of Concept Mentions Kim et al. (EMNLP 2015) Preprocessed by Stanford CoreNLP toolkit For every noun phrase, top 100 candidates are retrieved from Lucene index SVMrank was used to train a ranking function Wikification Performances Five-fold cross validation 38.04/31.97/34.74 in Precision/Recall/F-measure Dialogue Topic Tracking SVM models were tranined using SVMlight 3 schedules were considered: every turns, only tourist turns, and only guide turns All the evaluations were done in five-fold cross validation Evaluation metrics Accuracy of the predicted topic label for every turn Precision/Recall/F-measure for each event of topic transition occurred Experimental Results Schedule Features Transition Turn P R F ACC all α 42.08 53.48 47.10 67.97 α + β 42.12 53.38 47.08 67.98 α + γ 47.36 50.19 48.73 72.38 α + β + γ 47.35 50.24 48.75 72.43 α + γ 50.77 49.36 50.06 79.12 α + β + γ 50.82 49.41 50.10 79.15 tourist turns α 41.88 52.59 46.63 67.15 α + β 41.84 52.75 46.67 67.08 α + γ 46.58 51.09 48.73 71.99 α + β + γ 46.57 51.09 48.72 71.99 α + γ 50.51 49.58 50.04 81.10 α + β + γ 50.43 49.58 50.00 81.10 guide turns α 41.96 52.11 46.49 67.13 α + β 41.91 52.03 46.42 67.13 α + γ 47.10 48.44 47.76 71.94 α + β + γ 47.02 48.21 47.61 71.93 α + γ 50.94 49.10 50.00 78.92 α + β + γ 50.98 49.02 49.98 78.92 γ is the oracle features with the manual annotations for Wikification 1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: