Dynamic Memory Networks for Dialogue Topic Tracking



Presented at AAAI 2019 DSTC7 Workshop as a General Technical Track Paper

Published in: Software


  1. Dynamic Memory Networks for Dialogue Topic Tracking
Seokhwan Kim
Adobe Research, San Jose, CA, USA

Dialogue Topic Tracking

Categorizing the topic state at each time step:

\[
f(t) =
\begin{cases}
\text{B-}\{c \in C\} & \text{if } u_t \text{ is at the beginning of a segment that belongs to } c, \\
\text{I-}\{c \in C\} & \text{else if } u_t \text{ is inside a segment that belongs to } c, \\
\text{O} & \text{otherwise.}
\end{cases}
\]

Examples of dialogue topic tracking

Speaker | Utterance (u_t) | f(t)
Guide | How can I help you? | B-OPEN
Tourist | Can you recommend some good places to visit in Singapore? | B-ATTR
Guide | Well if you like to visit an icon of Singapore, Merlion will be a nice place to visit. | I-ATTR
Tourist | Okay. But I’m particularly interested in amusement parks. | B-ATTR
Guide | Then, what about Universal Studio? | I-ATTR
Tourist | Good! How can I get there from Orchard Road by public transportation? | B-TRSP
Guide | You can take the red line train from Orchard and transfer to the purple line at Dhoby Ghaut. Then, you could reach HarbourFront where Sentosa Express departs. | I-TRSP
Tourist | How long does it take in total? | I-TRSP
Guide | It’ll take around half an hour. | I-TRSP
Tourist | Alright. | I-TRSP
Guide | You could spend a whole afternoon at the park by its closing time at 6pm. | B-ATTR
Tourist | Sounds good! | I-ATTR
Guide | Then, I recommend you enjoy dinner at the riverside on the way back. | B-FOOD
Tourist | What do you recommend there? | I-FOOD
Guide | If you like spicy foods, you must try chili crab which is one of our favorite dishes. | I-FOOD
Tourist | Great! I’ll try that. | I-FOOD

Baselines: CNN and RCNN (Kim et al., 2016)

[Figure: CNN and RCNN architectures. CNN: inputs u_{t-w+1}, …, u_{t-1}, u_t; convolutional layer; max pooling layer; prediction y_t. RCNN: inputs u_{t-w+1}, …, u_{t-1}, u_t; convolutional layer; max pooling layer; recurrent layer with states h_{t-w+1}, …, h_{t-1}, h_t; prediction y_t.]

Convolutional Neural Network (CNN) for Dialogue Topic Tracking
  • An utterance is represented as a matrix with n rows of k-dimensional word vectors
  • A convolutional filter has the same width k as the word vectors and a window size m as its height
  • The maximum value is selected from each feature map in the max pooling layer
  • The values from max pooling are forwarded to the fully-connected softmax layer

Recurrent CNN (RCNN) for Dialogue Topic Tracking
  • Each feature vector generated after the max pooling layers in the CNN architecture is connected to the recurrent layers in the RNN architecture

Proposed Model: Dynamic Memory Network

Dynamic Memory Network (DMN) for Dialogue Topic Tracking

[Figure: DMN architecture. Labels: inputs u_{t-w+1}, …, u_{t-1}, u_t; convolutional layer; max pooling; dynamic memories with slots h^1, h^2, …, h^m at each time step from t-w+1 to t; prediction y_t.]

  • Our models represent the latent dialogue state at each given time step as a set of read-writable memory slots
  • Each memory slot is updated through a given dialogue sequence by the content-based operations in gated recurrent networks

Gating mechanisms: Single Gate (Henaff et al. 2016), Update and Reset Gates, Cross-slot Interactions

Update gate z_i^j
  • Single Gate; Update and Reset Gates:
    \[ z_i^j = \sigma\left( u_i^T w^j + u_i^T h_{i-1}^j \right) \]
  • Cross-slot Interactions:
    \[ z_i^j = \sigma\left( \sum_k \left( \alpha_z^{kj}\, u_i^T w^k + \beta_z^{kj}\, u_i^T h_{i-1}^k \right) \right) \]

Reset gate r_i^j
  • Single Gate: not used
  • Update and Reset Gates:
    \[ r_i^j = \sigma\left( u_i^T W_r w^j + u_i^T U_r h_{i-1}^j \right) \]
  • Cross-slot Interactions:
    \[ r_i^j = \sigma\left( \sum_k \left( \alpha_r^{kj}\, u_i^T w^k + \beta_r^{kj}\, u_i^T h_{i-1}^k \right) \right) \]

Candidate memory \tilde{h}_i^j
  • Single Gate:
    \[ \tilde{h}_i^j = \tanh\left( U h_{i-1}^j + V w^j + W u_i \right) \]
  • Update and Reset Gates; Cross-slot Interactions:
    \[ \tilde{h}_i^j = \tanh\left( U \left( r_i^j \circ h_{i-1}^j \right) + V w^j + W u_i \right) \]

Memory update h_i^j (all three mechanisms)
    \[ h_i^j = \left( 1 - z_i^j \right) \circ h_{i-1}^j + z_i^j \circ \tilde{h}_i^j \]

Evaluation: Data

TourSG corpus
  • Human-human mixed initiative dialogues
  • 35 sessions, 21 hours, 31,034 utterances
  • Manually annotated with eight topic categories: ‘attraction’, ‘transportation’, ‘food’, ‘accommodation’, ‘shopping’, ‘opening’, ‘closing’, ‘other’
  • 15 classes: ({B-, I-} × {c : c ∈ C and c ≠ ‘other’}) ∪ {O}, that is, 2 × 7 topic labels plus O

Data Statistics

Set | # sessions | # segments | # utterances
Train | 14 | 2,104 | 12,759
Dev | 6 | 700 | 4,812
Test | 15 | 2,210 | 13,463
Total | 35 | 5,014 | 31,034

Evaluation: Implementation Details

Word Embedding
  • Initialized with word2vec vectors pre-trained on 2.9M sentences from a travel forum
  • Fine-tuned while the whole model is trained

Convolutional Layer
  • Learned 100 feature maps for each of three different filter sizes {3, 4, 5}
  • For CNN, applied over the current, previous, and history utterances (w = 10)
  • For RCNN and DMN, applied to each single utterance

Recurrent Layer
RCNNs
  • We compared two variants: vanilla RNNs and Gated Recurrent Units (GRUs)
  • The hidden layer dimensions were 150 for the vanilla RNN and 50 for the GRU
DMNs
  • Three dynamic memory networks were trained based on the proposed gating mechanisms
  • The number of memory slots was m = 5 for the first two distributed models and m = 10 for the other with cross-slot interactions

Model Training
  • Adam optimizer, minimizing the categorical cross entropy loss on softmax
  • Mini-batch size of 50 and dropout after max pooling with a rate of 0.25
  • Training stopped after 150 epochs, by which point the CNN baseline had saturated
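The gated memory-slot update listed under "Gating mechanisms" can be sketched in NumPy as follows. This is a minimal illustration rather than the author's implementation: the function name `update_memory_slots`, the `params` dictionary, the random initialization, and the treatment of z and r as one scalar per slot are assumptions made for the sketch, and w^j is taken to be a per-slot key vector as in Henaff et al.'s recurrent entity networks. The cross-slot variant is not covered.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def update_memory_slots(u, h_prev, w, params, use_reset=True):
    """One update step over all m memory slots (hypothetical helper).

    u:      (d,)   encoded utterance u_i
    h_prev: (m, d) slot contents h_{i-1}^j
    w:      (m, d) slot key vectors w^j (assumed to be learned parameters)
    params: dict of (d, d) matrices U, V, W and, if use_reset, W_r, U_r
    """
    U, V, W = params["U"], params["V"], params["W"]

    # Update gate: z^j = sigma(u^T w^j + u^T h^j_{i-1}), one scalar per slot.
    z = sigmoid(w @ u + h_prev @ u)

    if use_reset:
        # Reset gate: r^j = sigma(u^T W_r w^j + u^T U_r h^j_{i-1}).
        r = sigmoid((w @ params["W_r"].T) @ u + (h_prev @ params["U_r"].T) @ u)
        gated_prev = r[:, None] * h_prev
    else:
        # Single-gate variant: the previous content is used unchanged.
        gated_prev = h_prev

    # Candidate: h~^j = tanh(U (r^j * h^j_{i-1}) + V w^j + W u).
    h_tilde = np.tanh(gated_prev @ U.T + w @ V.T + W @ u)

    # Interpolation: h^j = (1 - z^j) * h^j_{i-1} + z^j * h~^j.
    return (1.0 - z)[:, None] * h_prev + z[:, None] * h_tilde


# Usage: run the slots over a short sequence of random "utterance" vectors.
rng = np.random.default_rng(0)
d, m = 8, 5
params = {name: 0.1 * rng.normal(size=(d, d)) for name in ("U", "V", "W", "W_r", "U_r")}
keys = rng.normal(size=(m, d))
memory = np.zeros((m, d))
for utterance in rng.normal(size=(3, d)):
    memory = update_memory_slots(utterance, memory, keys, params)
```

Because the new slot content is a convex combination of the previous content and a tanh output, slot values that start in [-1, 1] stay in that range.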
Evaluation: Results

Sequential labelling is reported as precision (P), recall (R), and F-measure (F); segmentation is reported as Pk and WindowDiff (WD).

Models | P | R | F | Pk | WD
CNN | 0.6691 | 0.6861 | 0.6775 | 0.3799 | 0.4884
RCNN (RNN) | 0.6825 | 0.6572 | 0.6696 | 0.3970 | 0.4634
RCNN (GRU) | 0.6936 | 0.6826 | 0.6880 | 0.3888 | 0.4619
DMN (single) | 0.6877 | 0.7105 | 0.6989 | 0.3782 | 0.4393
DMN (reset & update) | 0.6959 | 0.7035 | 0.6997 | 0.3781 | 0.4427
DMN (cross-slot) | 0.7008 | 0.7090 | 0.7049† | 0.3532‡ | 0.4223‡

[Figure: Number of errors for CNN, RCNN, and Dynamic Memory, broken down into missing, extraneous, wrong boundary, and wrong category.]

[Figure: Heat map of memory slots (Slot 0 to Slot 9) against labels B-ATTR, B-TRSP, B-FOOD, B-ACCO, B-SHOP, I-ATTR, I-TRSP, I-FOOD, I-ACCO, I-SHOP; color scale from 0.6 to 0.9.]

345 Park Avenue, San Jose, CA 95110, USA Email:
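For readers unfamiliar with the two segmentation columns, the sketch below computes Pk and WindowDiff from B-/I-/O label sequences, following the usual definitions of these metrics. It is an illustration only: the helper names, the rule for turning labels into segments (a new segment at every B- label or topic change, with consecutive O labels grouped together), and the default window size of half the average reference segment length are assumptions, not the evaluation protocol used for the poster. Both metrics are error rates, so lower is better.

```python
def segment_ids(labels):
    """Map a B-/I-/O label sequence to increasing segment ids (hypothetical helper)."""
    ids = []
    current = -1
    prev_topic = None
    for label in labels:
        topic = None if label == "O" else label[2:]
        # A new segment starts at the first label, at any B- label,
        # or whenever the topic changes (including to or from O).
        if not ids or label.startswith("B-") or topic != prev_topic:
            current += 1
        ids.append(current)
        prev_topic = topic
    return ids


def _prepare(ref_labels, hyp_labels, k):
    if len(ref_labels) != len(hyp_labels):
        raise ValueError("reference and hypothesis must have the same length")
    ref, hyp = segment_ids(ref_labels), segment_ids(hyp_labels)
    if k is None:
        # Assumed default: half the average reference segment length.
        k = max(1, round(len(ref) / (ref[-1] + 1) / 2))
    if len(ref) - k <= 0:
        raise ValueError("sequence is too short for window size k")
    return ref, hyp, k


def pk(ref_labels, hyp_labels, k=None):
    """Share of probes (i, i+k) where ref and hyp disagree on 'same segment'."""
    ref, hyp, k = _prepare(ref_labels, hyp_labels, k)
    probes = len(ref) - k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(probes)
    )
    return errors / probes


def window_diff(ref_labels, hyp_labels, k=None):
    """Share of windows where ref and hyp contain different boundary counts."""
    ref, hyp, k = _prepare(ref_labels, hyp_labels, k)
    probes = len(ref) - k
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(probes)
    )
    return errors / probes


# Usage: the hypothesis places the topic boundary one utterance too late.
reference = ["B-ATTR", "I-ATTR", "I-ATTR", "B-TRSP", "I-TRSP", "I-TRSP"]
hypothesis = ["B-ATTR", "I-ATTR", "I-ATTR", "I-ATTR", "B-TRSP", "I-TRSP"]
print(pk(reference, hypothesis, k=2), window_diff(reference, hypothesis, k=2))
```

Segment ids only increase at boundaries, so the difference between two ids equals the number of boundaries between the two positions, which is what WindowDiff compares.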