The Eighth Dialog System Technology Challenge (DSTC8)

Seokhwan Kim, Scientist at Institute for Infocomm Research

Poster presented at the 3rd NeurIPS workshop on Conversational AI: Today's Practice and Tomorrow's Potential, Vancouver, Dec 2019.

The Eighth Dialog System Technology Challenge (DSTC8)
Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz,
Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki,
Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta
• DSTC8 Timeline
• Nov 2018 – Jan 2019: 7 Proposals Received
• Jan 2019: DSTC8 Planning @ DSTC7 Workshop
• Mar 2019: 4 Tracks Selected
• Apr – May 2019: Track Preparation
• Jun – Oct 2019: Challenge Period
• Feb 8, 2020: DSTC8 Workshop @ AAAI-20
• Four Main Tracks
• Multi-domain Task Completion
(Microsoft Research AI & Tsinghua University)
• NOESIS II: Predicting Responses, Identifying Success, and Managing
Complexity in Task-Oriented Dialogue
(IBM Research AI & University of Michigan)
• Audio Visual Scene-Aware Dialog
(Mitsubishi Electric Research Laboratories)
• Schema-Guided State Tracking
(Google Research)
• Workshop@AAAI-20
• One-day Workshop
• Feb 8, 2020 in New York
• Online registration will be
open until Jan 10, 2020.
• DSTC9 Planning Session
• Call for DSTC9 Track
Proposals by Jan 25, 2020
• Proposal Presentations @
DSTC8 Workshop
Track 3: Audio Visual Scene-Aware Dialog Track 4: Schema-Guided Dialogue State Tracking
Track 1: Multi-Domain Task-Completion
• Task #1: End-to-End Multi-Domain Task
• Goal: Train an end-to-end dialog system that takes natural language as
input and outputs natural language as its response
• Data: MultiWOZ 2.0 from Budzianowski et al. (EMNLP 2018)
• 7 domains: attraction, hospital, police, hotel, restaurant, taxi, and train.
• Additional annotation on the user act is also provided.
• Evaluation: E2E performance with both automatic & human evaluation
• Results
• 12 submissions, each using either a conventional pipeline approach or an end-to-end approach.
• Some differences between human and automatic evaluation.
• The winning team (by human evaluation) leverages GPT-2.
• Task #2: Fast Adaptation Task
• Goal: Train an end-to-end dialog system to adapt to a new goal-oriented
domain given very few in-domain sample dialogs
• Data
• Large Reddit dataset of 5 million dialogs over 1000 domains (subreddits).
• MetaLWOz dataset with over 38,000 goal-oriented dialogs, covering 47 diverse
domains divided into 226 tasks.
• Evaluation
• Automatic Evaluation on MultiWOZ
• Human Evaluation on held-out MetaLWOz-domains
• Results
• Four submissions, using transfer learning or fine-tuning with LSTM and Transformer models.
• Significant difference between automatic and human evaluation.
• Human evaluation ordering robust under bootstrapping.
• Winning team (by human evaluation) used hybrid GPT-2 generator and ranker.
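The poster does not say how the bootstrapping check on the human evaluation ordering was carried out. As a rough illustration of the general idea only, the sketch below resamples per-dialog human ratings with replacement and reports how often the resampled system ordering matches the ordering on the full data. The function name, the rating scale, and the example scores are all hypothetical and are not taken from the challenge.

```python
import random
from statistics import mean


def bootstrap_rank_stability(ratings, n_resamples=1000, seed=0):
    """Estimate how stable a system ranking is under bootstrap resampling.

    ratings: dict mapping a system name to a list of per-dialog human scores
             (hypothetical input format, not the official DSTC8 one).
    Returns (ordering on the full data, fraction of resamples with that ordering).
    """
    rng = random.Random(seed)

    def ordering(samples):
        # Rank systems from highest to lowest mean score.
        return tuple(sorted(samples, key=lambda name: mean(samples[name]), reverse=True))

    full_order = ordering(ratings)
    matches = 0
    for _ in range(n_resamples):
        # Resample each system's scores with replacement, keeping the sample size.
        resampled = {
            name: rng.choices(scores, k=len(scores))
            for name, scores in ratings.items()
        }
        if ordering(resampled) == full_order:
            matches += 1
    return full_order, matches / n_resamples


if __name__ == "__main__":
    # Made-up scores for three made-up systems.
    example = {
        "system_a": [4, 5, 4, 3, 5, 4],
        "system_b": [3, 3, 4, 2, 3, 3],
        "system_c": [2, 1, 2, 3, 2, 1],
    }
    order, stability = bootstrap_rank_stability(example)
    print(order, stability)
```

A stability value close to 1.0 would suggest that the ordering does not depend much on which dialogs happened to be rated.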
Track 2: NOESIS II: Predicting Responses
• Task: Dialogue generation via next utterance selection
• Input: Multi-Party Conversation Context + 100 Candidate Utterances
• Output: Rankings including ‘none’
• Data in two domains: Ubuntu support & student advising.
• Evaluation Metrics
• Recall@N: In what fraction of cases is the true answer in the top N
produced by the system?
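• Mean Reciprocal Rank: the average, over all test cases, of the reciprocal of the rank at which the system places the true answer.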
• Ubuntu data from Kummerfeld et al. (ACL 2019)
• Advising data from DSTC7
• Results (Ubuntu) and Results (Advising): [result charts not reproduced here]
• Almost all teams used BERT-based approaches.
• Data augmentation methods and balancing positive and negative samples were a crucial part of the best approaches.
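The two Track 2 metrics, Recall@N and Mean Reciprocal Rank, can be sketched in a few lines. This is a minimal illustration, not the official scorer: it assumes each system output is a ranked list of candidate IDs, and that the "none" outcome is represented by the literal string "none" as both a rankable item and a possible gold label. The candidate IDs are made up.

```python
def recall_at_n(rankings, gold, n):
    """Fraction of cases where the true answer is among the top n ranked items."""
    hits = sum(1 for ranked, answer in zip(rankings, gold) if answer in ranked[:n])
    return hits / len(gold)


def mean_reciprocal_rank(rankings, gold):
    """Average of 1/rank of the true answer; a missing answer contributes 0."""
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)  # ranks are 1-based
    return total / len(gold)


if __name__ == "__main__":
    # Three hypothetical test cases; the second has no correct candidate.
    rankings = [
        ["a", "b", "c"],
        ["none", "x", "y"],
        ["p", "q", "r"],
    ]
    gold = ["b", "none", "r"]
    print(recall_at_n(rankings, gold, 1))        # 1 of 3 cases correct at rank 1
    print(recall_at_n(rankings, gold, 2))        # 2 of 3 cases correct within top 2
    print(mean_reciprocal_rank(rankings, gold))  # (1/2 + 1 + 1/3) / 3
```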
DSTC mailing list: list@dstc.community
https://sites.google.com/dstc.community/dstc8
Track 3: Audio Visual Scene-Aware Dialog
• Task
• Build machines that can converse with humans about the objects
and events around them, interacting with the real world by
understanding dynamic audiovisual scenes.
• Input: a video, dialog history about the video, a follow-up question
• Output: Generate a correct response to the follow-up question
• Data
• Video Content: Charades Dataset [Sigurdsson et al. 2016]
• Dialog Collection: AVSD dialogs collected via MTurk
• Two MTurk workers hold a dialog consisting of 10 rounds of Q&A.
• Then the Questioner writes a summary of the video.
• Results
• Received 27 system submissions from 12 teams
• The best system applied "Universal Multimodal Transformer: Fine tuned
seq-to-seq model with GPT-2 embedding".
Track 4: Schema-Guided Dialogue State Tracking
• Task
• Develop dialogue state tracking systems suitable for the scale and
complexity of large-scale virtual assistants
• Support a wide variety of APIs or services over many domains
• Zero-shot or few-shot generalization to new services
• Data
• Largest public corpus of multi-domain task-oriented conversations
• Contains annotations for spoken language understanding, dialogue
state tracking, policy imitation learning, and language generation.
• Evaluation Metrics
• Joint Goal Accuracy, Average Goal Accuracy, Requested Slots F1, Active
Intent Accuracy
• Results
• Participation from 25 teams
• Mostly based on large pre-trained models like BERT or RoBERTa
• Good performance without any domain- or service-specific parameters
• Winning team: machine-reading comprehension for non-categorical slots;
wide and deep network for categorical slots; data augmentation by back
translation; additional hand-crafted features
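To make two of the Track 4 metrics concrete, the sketch below computes Joint Goal Accuracy and Average Goal Accuracy over per-turn dialogue states. It is a simplified illustration under stated assumptions: states are plain dicts from slot name to value, values are compared by exact match, and Average Goal Accuracy is taken over the slots present in the reference state. The official challenge scorer may differ in details, and the slot names and values here are invented.

```python
def joint_goal_accuracy(predicted, reference):
    """Fraction of turns whose entire predicted state equals the reference state."""
    correct = sum(1 for pred, ref in zip(predicted, reference) if pred == ref)
    return correct / len(reference)


def average_goal_accuracy(predicted, reference):
    """Fraction of reference slots, pooled over turns, with the correct value."""
    matched = 0
    total = 0
    for pred, ref in zip(predicted, reference):
        for slot, value in ref.items():
            total += 1
            if pred.get(slot) == value:
                matched += 1
    return matched / total if total else 0.0


if __name__ == "__main__":
    # Two hypothetical turns of one dialogue.
    reference = [
        {"city": "New York"},
        {"city": "New York", "date": "Friday"},
    ]
    predicted = [
        {"city": "New York"},
        {"city": "New York", "date": "Monday"},  # one slot wrong in turn 2
    ]
    print(joint_goal_accuracy(predicted, reference))    # 1 of 2 turns fully correct
    print(average_goal_accuracy(predicted, reference))  # 2 of 3 slots correct
```

The example shows why the joint metric is the stricter of the two: a single wrong slot fails the whole turn, while the average metric still credits the slots that were tracked correctly.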
