A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia


A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia.
Seokhwan Kim, Rafael E. Banchs, Haizhou Li.
The 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, Jun 2014


Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Human Language Technology Department, Institute for Infocomm Research, Singapore

Introduction

  • Most previous dialog systems have focused only on a single target task on a single topic and domain.
  • Some researchers have tried to identify dialog topics with text categorization or external knowledge-based approaches.
  • In this work, we propose a composite kernel approach for dialog topic tracking with structured domain knowledge from Wikipedia.

Dialog Topic Tracking

  • A classification problem: $f(x_t) = (y_{t-1}, y_t)$
  • $x_t$ contains the input features obtained at turn $t$.
  • $y_t \in C$, where $C$ is a closed set of topic categories.
  • If a topic transition occurs at $t$, $y_t$ should be different from $y_{t-1}$. Otherwise, $y_t$ and $y_{t-1}$ have the same value.

Examples of dialog topic tracking

t | Speaker | Utterance | Topic Transition
--|---------|-----------|-----------------
0 | Guide   | How can I help you? | NONE→NONE
1 | Tourist | Can you recommend some good places to visit in Singapore? | NONE→ATTR
  | Guide   | Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit. |
2 | Tourist | Merlion is a symbol for Singapore, right? | ATTR→ATTR
  | Guide   | Yes, we use that to symbolise Singapore. |
3 | Tourist | Okay. | ATTR→ATTR
  | Guide   | The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village. |
4 | Tourist | How can I get there from Orchard Road? | ATTR→TRSP
  | Guide   | You can take the north-south line train from Orchard Road and stop at Raffles Place station. |
5 | Tourist | Is this walking distance from the station to the destination? | TRSP→TRSP
  | Guide   | Yes, it’ll take only ten minutes on foot. |
6 | Tourist | Alright. | TRSP→FOOD
  | Guide   | Well, you can also enjoy some seafoods at the riverside near the place. |
7 | Tourist | What food do you have any recommendations to try there? | FOOD→FOOD
  | Guide   | If you like spicy foods, you must try chilli crab which is one of our favourite dishes here in Singapore. |
8 | Tourist | Great! I’ll try that. | FOOD→FOOD

  • $f$ detects three topic transitions, at $t = 1$, $t = 4$, and $t = 6$.
  • Topic sequence: Attraction→Transportation→Food

Wikipedia-based Dialog Topic Tracking

  • Supervised machine learning
      • Trained on examples annotated with topic labels
      • Based on the information obtained solely from an ongoing dialog
      • Not sufficient to identify both user-initiative and system-initiative topic transitions
  • Wikipedia-based dialog topic tracking
      • Leverages Wikipedia as an external knowledge source
      • Obtained without significant effort toward building resources

Wikipedia-based Composite Kernel

  • Composite kernel for dialog topic tracking
      • Relevant Wikipedia paragraphs are selected using cosine similarity:
        $\mathrm{sim}(x, p_i) = \dfrac{\phi(x) \cdot \phi(p_i)}{|\phi(x)|\,|\phi(p_i)|}$
      • The term vector $\phi(x)$ is computed by accumulating the weights in the previous turns:
        $\phi(x) = (\alpha_1, \alpha_2, \cdots, \alpha_{|W|}) \in \mathbb{R}^{|W|}$
      • Two kernel structures are constructed using the information from the highly-ranked paragraphs.
  • History Sequence Kernel
      • A sequence of the most similar paragraphs: $S = (s_0, \cdots, s_t)$, where $s_j = \arg\max_i \mathrm{sim}(x_j, p_i)$
      • Sub-sequence kernel:
        $K_s(S_1, S_2) = \sum_{u \in A^n} \sum_{\mathbf{i}: u = S_1[\mathbf{i}]} \sum_{\mathbf{j}: u = S_2[\mathbf{j}]} \lambda^{l(\mathbf{i}) + l(\mathbf{j})}$
  • Domain Context Tree Kernel
      • A tree structure incorporates various types of domain knowledge from Wikipedia.
      • Subset tree kernel:
        $K_t(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2)$
  • Kernel Composition
      • Linear combination of three kernels:
        $K(x_1, x_2) = \alpha \cdot K_l(V_1, V_2) + \beta \cdot K_s(S_1, S_2) + \gamma \cdot K_t(T_1, T_2)$

Experimental Setup

  • Singapore tour guide dialogs
      • Human-human mixed-initiative dialogs
      • 35 sessions, 21 hours, 19,651 utterances
      • Manually annotated with nine topic categories
  • Wikipedia collection
      • 3,155 articles related to Singapore
      • Collected from the Wikipedia database dump as of February 2013
  • Models
      • Baselines: $K_l$ only, $K_l$ with P as features
      • Proposed approaches: $K_l + K_s$, $K_l + K_t$, $K_l + K_s + K_t$
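The history sequence and the sub-sequence kernel described under Wikipedia-based Composite Kernel can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions that the poster does not state: term vectors are plain dictionaries that are taken to be already accumulated over previous turns, l(i) is taken to be the span covered by an index tuple (as in the standard string sub-sequence kernel), and the function names, the sub-sequence length n = 2, and the decay λ = 0.5 are illustrative only.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dict: term -> weight)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def history_sequence(turn_vectors, paragraphs):
    """For each turn vector x_j, return the id of the most similar paragraph (s_j)."""
    return [
        max(paragraphs, key=lambda pid: cosine(x, paragraphs[pid]))
        for x in turn_vectors
    ]


def subsequence_weights(seq, n, lam):
    """Map each length-n sub-sequence u to the sum of lam ** l(i) over its occurrences."""
    weights = defaultdict(float)
    for idx in combinations(range(len(seq)), n):
        u = tuple(seq[k] for k in idx)
        span = idx[-1] - idx[0] + 1  # assumed definition of l(i)
        weights[u] += lam ** span
    return weights


def subsequence_kernel(s1, s2, n=2, lam=0.5):
    """Brute-force K_s: sum over shared sub-sequences of lam ** (l(i) + l(j))."""
    w1 = subsequence_weights(s1, n, lam)
    w2 = subsequence_weights(s2, n, lam)
    return sum(w * w2[u] for u, w in w1.items() if u in w2)


# Hypothetical toy data: two "paragraphs" and three accumulated turn vectors.
paragraphs = {
    "merlion": {"merlion": 1.0, "park": 1.0},
    "mrt": {"train": 1.0, "station": 1.0},
}
turns = [{"merlion": 1.0}, {"merlion": 1.0, "symbol": 0.5}, {"train": 1.0}]
S = history_sequence(turns, paragraphs)
```

The brute-force enumeration is exponential in n and only suitable for short sequences. It relies on the fact that the double sum over index tuples factorizes into a product of per-sequence weights for each shared sub-sequence u.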
  • Measures
      • 5-fold cross-validation against the manual annotations
      • Accuracy of the predicted topic label for every turn
      • Precision/Recall/F-measure for each topic transition event

Experimental Results

Comparison of the performances among the models (%)

Model           | Turn-level Accuracy | Transition-level P | Transition-level R | Transition-level F
----------------|---------------------|--------------------|--------------------|-------------------
$K_l$           | 62.45               | 42.77              | 24.77              | 31.37
$K_l$ + P       | 62.44               | 42.76              | 24.77              | 31.37
$K_l + K_s$     | 67.19               | 39.94              | 40.59              | 40.26
$K_l + K_t$     | 68.54               | 45.55              | 35.69              | 40.02
$K_l + K_s + K_t$ | 69.98             | 44.82              | 39.83              | 42.18

Error distributions of topic transitions

[Figure: bar chart of the number of transition errors (axis range 0 to 3000) for $K_l$, $K_l$ + P, $K_l + K_s$, $K_l + K_t$, and ALL, broken down into FP(SYS), FN(SYS), FP(USR), and FN(USR).]

1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632
Email: kims@i2r.a-star.edu.sg
WWW: http://hlt.i2r.a-star.edu.sg/
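The turn-level and transition-level measures listed under Measures can be sketched as follows. This is an illustration rather than the authors' evaluation script. The poster does not say when a predicted transition counts as correct, so the sketch assumes it must occur at the same turn as a gold transition and have the same (previous topic, current topic) pair. The function names and the predicted label sequence are hypothetical; the gold sequence follows the example dialog.

```python
def transitions(labels, start="NONE"):
    """Return {turn index: (previous topic, current topic)} for turns where the topic changes."""
    events = {}
    prev = start
    for t, cur in enumerate(labels):
        if cur != prev:
            events[t] = (prev, cur)
        prev = cur
    return events


def evaluate(gold, pred):
    """Turn-level accuracy plus transition-level precision, recall, and F-measure."""
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    gold_ev, pred_ev = transitions(gold), transitions(pred)
    # Assumed matching rule: same turn and same (previous, current) topic pair.
    correct = sum(1 for t, ev in pred_ev.items() if gold_ev.get(t) == ev)
    precision = correct / len(pred_ev) if pred_ev else 0.0
    recall = correct / len(gold_ev) if gold_ev else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return accuracy, precision, recall, f_measure


# Gold topic labels for turns 0..8 of the example dialog.
gold = ["NONE", "ATTR", "ATTR", "ATTR", "TRSP", "TRSP", "FOOD", "FOOD", "FOOD"]
# Hypothetical prediction that detects the ATTR→TRSP transition one turn late.
pred = ["NONE", "ATTR", "ATTR", "ATTR", "ATTR", "TRSP", "FOOD", "FOOD", "FOOD"]

accuracy, precision, recall, f_measure = evaluate(gold, pred)
```

In this toy case a single late detection costs one turn of accuracy but counts as both a false positive and a false negative at the transition level, which is one reason the transition-level scores in the results table are much lower than the turn-level accuracy.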