Exploring Convolutional and Recurrent Neural Networks
in Sequential Labelling for Dialogue Topic Tracking
Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Poster for ACL 2016

Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore

Dialogue Topic Tracking

Categorizing the topic state at each time step:

f(t) = \begin{cases}
  \text{B-}\{c \in C\} & \text{if } u_t \text{ is at the beginning of a segment belonging to } c, \\
  \text{I-}\{c \in C\} & \text{if } u_t \text{ is inside a segment belonging to } c, \\
  \text{O} & \text{otherwise}
\end{cases}

Example of dialogue topic tracking:

Speaker | Utterance | Topic
Guide   | How can I help you? | B-OPEN
Tourist | Can you recommend some good places to visit in Singapore? | B-ATTR
Guide   | Well if you like to visit an icon of Singapore, Merlion will be a nice place to visit. | I-ATTR
Tourist | Okay. But I'm particularly interested in amusement parks. | B-ATTR
Guide   | Then, what about Universal Studio? | I-ATTR
Tourist | Good! How can I get there from Orchard Road by public transportation? | B-TRSP
Guide   | You can take the red line train from Orchard and transfer to the purple line at Dhoby Ghaut. Then, you could reach HarbourFront where Sentosa Express departs. | I-TRSP
Tourist | How long does it take in total? | I-TRSP
Guide   | It'll take around half an hour. | I-TRSP
Tourist | Alright. | I-TRSP
Guide   | Or, you can use the shuttle bus service from the hotels in Orchard, which is free of charge. | B-TRSP
Tourist | Great! That would be definitely better. | I-TRSP
Guide   | After visiting the park, you can enjoy some seafoods at the riverside on the way back. | B-FOOD
Tourist | What food do you have any recommendations to try there? | I-FOOD
Guide   | If you like spicy foods, you must try chilli crab which is one of our favourite dishes. | I-FOOD
Tourist | Great! I'll try that. | I-FOOD
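The labelling scheme f(t) can be made concrete with a small Python helper. This is a hypothetical sketch, not code from the poster: it assigns B-/I-prefixed labels given gold topic segments, following the definition above.

```python
# Hypothetical sketch of the labelling scheme f(t): the first utterance
# of each topic segment gets B-<category>, every later one gets
# I-<category>. Utterances outside any segment would get "O" (omitted
# here, since every utterance in the example lies inside a segment).

def bio_labels(segments):
    """segments: list of (category, utterances) pairs in dialogue order.
    Returns one topic-state label per utterance."""
    labels = []
    for category, utterances in segments:
        for i, _ in enumerate(utterances):
            labels.append(("B-" if i == 0 else "I-") + category)
    return labels

# First three turns of the example dialogue:
segments = [
    ("OPEN", ["How can I help you?"]),
    ("ATTR", ["Can you recommend some good places to visit in Singapore?",
              "Well if you like to visit an icon of Singapore, "
              "Merlion will be a nice place to visit."]),
]
print(bio_labels(segments))  # ['B-OPEN', 'B-ATTR', 'I-ATTR']
```

Note that a new segment of the same category (as in the B-ATTR / B-ATTR rows of the example) is handled naturally, since the B- prefix marks segment boundaries rather than category changes.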
Model 1: Convolutional Neural Networks (CNNs)

(Figure: CNN architecture for dialogue topic tracking. Input utterances u_{t-h+1} ... u_t within window size h; embedding layer with three channels for current, previous, and history utterances; convolutional layer with multiple kernel sizes; max pooling layer; dense layer with softmax output.)

- An utterance is represented as a matrix with n rows of k-dimensional word vectors
- Each input has three channels: the current, the previous, and the history utterances
- A convolutional filter has the same width k as the word vectors and a window size m as its height
- The max pooling layer selects the maximum value from each feature map
- The pooled values are forwarded to a fully-connected softmax layer

Model 2: Recurrent Neural Networks (RNNs)

(Figure: RNN architecture for dialogue topic tracking. Input utterances u_{t-h+1} ... u_t; utterance-level embedding layer; forward states s^f_{t-h+1} ... s^f_t; backward states s^b_{t-h+1} ... s^b_t; output labels y_{t-h+1} ... y_t.)

- Each utterance is represented with k-dimensional pre-trained embeddings
- A sequence of the utterance vectors within h time steps is connected
- Hidden states from uni-/bi-directional recurrent layers are passed to a softmax layer

Model 3: Recurrent Convolutional Networks (RCNNs)

(Figure: RCNN architecture for dialogue topic tracking. The CNN's convolutional and max pooling layers feed the RNN's forward and backward layers, which emit the output labels y_{t-h+1} ... y_t.)

- Each feature vector generated after the max pooling layers in the CNN architecture is connected to the recurrent layers in the RNN architecture

Evaluation

TourSG corpus:
- Human-human mixed-initiative dialogues
- 35 sessions, 21 hours, 31,034 utterances
- Manually annotated with nine topic categories

Models:
- Baselines: Support Vector Machines (SVM), Conditional Random Fields (CRF)
- CNNs: learned from scratch / pre-trained word2vec
- RNNs: uni-/bi-directional RNNs/LSTMs
- RCNNs: uni-/bi-directional RCNNs/LRCNs

Results

Model | Features                        | P     | R     | F
SVM   | bag-of-ngrams, speaker          | 59.85 | 59.94 | 59.90
SVM   | doc2vec, speaker                | 46.66 | 52.31 | 49.32
SVM   | bag-of-ngrams, speaker, doc2vec | 59.91 | 60.01 | 59.96
CRF   | bag-of-ngrams, speaker          | 60.05 | 60.97 | 60.51
CRF   | doc2vec, speaker                | 61.77 | 49.57 | 55.00
CRF   | bag-of-ngrams, speaker, doc2vec | 60.08 | 61.00 | 60.54
CNN   | learned from scratch            | 63.88 | 62.87 | 63.37
CNN   | pre-trained word2vec            | 66.91 | 68.61 | 67.75
RNN   | uni-directional                 | 49.51 | 53.75 | 51.55
RNN   | bi-directional                  | 48.73 | 49.82 | 49.27
LSTM  | uni-directional                 | 49.45 | 50.23 | 49.84
LSTM  | bi-directional                  | 48.42 | 48.77 | 48.59
RCNN  | uni-directional                 | 67.08 | 68.67 | 67.86
RCNN  | bi-directional                  | 67.25 | 69.39 | 68.30
LRCN  | uni-directional                 | 67.50 | 69.04 | 68.26
LRCN  | bi-directional                  | 67.60 | 69.62 | 68.59

Error Distributions

(Figure: number of errors per model, for SVM, CRF, CNN, and LRCN, broken down into missing, extraneous, wrong category, and wrong boundary errors.)

1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email:
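To illustrate how the CNN and RNN pieces compose in Model 3, here is a minimal plain-numpy sketch of an RCNN-style forward pass. All weights are random placeholders and the dimensions (k, m, d, number of labels) are assumptions for illustration, not the trained configuration from the poster; the three-channel input, multiple kernel sizes, and bi-directional variants are omitted.

```python
import numpy as np

# Minimal, untrained sketch of an RCNN forward pass (Model 3).
# Assumed shapes: k-dim word vectors, one bank of d convolutional
# filters with window m, max pooling over positions, then a simple
# uni-directional recurrent layer over utterances with softmax output.
rng = np.random.default_rng(0)
k, m, d, n_labels = 8, 3, 16, 9  # embedding dim, kernel window, hidden size, topic labels

W_conv = rng.standard_normal((m * k, d))  # convolutional filter bank
W_in   = rng.standard_normal((d, d))      # recurrent input weights
W_rec  = rng.standard_normal((d, d))      # recurrent state weights
W_out  = rng.standard_normal((d, n_labels))

def utterance_vector(words):
    """CNN part: slide width-k filters of window m over the (n x k) word
    matrix, then max-pool each of the d feature maps over positions."""
    feats = np.stack([words[i:i + m].reshape(-1) @ W_conv
                      for i in range(len(words) - m + 1)])
    return feats.max(axis=0)  # max pooling -> (d,)

def rcnn_labels(utterances):
    """RNN part: feed the pooled utterance vectors through a recurrent
    layer and emit a softmax label distribution at every time step."""
    s, out = np.zeros(d), []
    for u in utterances:
        s = np.tanh(utterance_vector(u) @ W_in + s @ W_rec)
        logits = s @ W_out
        p = np.exp(logits - logits.max())
        out.append(p / p.sum())
    return np.stack(out)  # (h, n_labels)

# A toy dialogue of 3 utterances with 5, 7, and 4 words:
dialogue = [rng.standard_normal((n, k)) for n in (5, 7, 4)]
probs = rcnn_labels(dialogue)
print(probs.shape)  # (3, 9): one label distribution per utterance
```

The sketch mirrors the poster's description: each feature vector produced by the max pooling layer of the CNN part is connected directly to the recurrent layer, so the topic label at each time step depends on both the current utterance's local n-gram features and the hidden state carried over from earlier utterances.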