The Fifth Dialog State Tracking Challenge (DSTC5)
Seokhwan Kim (1), Luis Fernando D’Haro (1), Rafael E. Banchs (1), Jason D. Williams (2), Matthew Henderson (3), Koichiro Yoshino (4)
(1) Institute for Infocomm Research, Singapore. (2) Microsoft Research, USA. (3) Google, USA. (4) Nara Institute of Science and Technology, Japan.
Problems
Goal
Human-human dialogs on tourist information in English and Chinese
Focusing on the problem of adaptation to a new language
Main Task
Dialog State Tracking (DST)
Pilot Tasks
Spoken Language Understanding (SLU)
Speech Act Prediction (SAP)
Spoken Language Generation (SLG)
End-to-end System (EES)
Datasets
Dialogs
Set Task Language # dialogs # utterances
Train ALL English 35 31,304 ← DSTC4 datasets
Dev ALL Chinese 2 3,130
Test MAIN Chinese 10 14,878
Test SLU Chinese 8 12,655
Test SAP Chinese 8 11,456
Test SLG Chinese 8 12,346
Translations
5-best translations with word alignments were provided for each utterance, generated by English-to-Chinese and Chinese-to-English MT systems
The ontology for DSTC4 was given with its automatic translation to Chinese
Main Task: Dialog State Tracking
Task Definition
Dialog state tracking at the sub-dialog level
Input
Transcribed utterances from the beginning of the session to each timestep
Manually segmented by sub-dialogs and annotated with topic categories
Output
Frame structures defined with slot-value pairs
For 5 major topic categories: Accommodation, Attraction, Food, Shopping, Transportation
Example
Each sub-dialog is listed with its dialog state first, followed by its utterances.
Dialog State: TOPIC: Attraction; TYPE OF PLACE: Ethnic enclave; NEIGHBORHOOD: Kampong Glam
Guide: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Tourist: 对。 (Right.)
Guide: 你看,它是个-它是马来村嘛 (You see, it is a- it’s a Malay Village)
Tourist: 对,甘榜- (Right, Kampong-)
Dialog State: TOPIC: Food; CUISINE: Malay cuisine; NEIGHBORHOOD: Kampong Glam
Guide: 它就卖了很多马来食物。 (It sells a lot of Malay food.)
Tourist: 比较有特色的食物, (It’s quite a unique food,)
Guide: 对,哦。 (Right.)
Guide: 马来食物,基本上,它是香。 (Malay food, basically, it smells very nice.)
Dialog State: TOPIC: Accommodation; INFO: Pricerange; NAME: V Hotel
Tourist: 那我们住宿呢? (Then, where do we stay?)
Guide: 我介绍一间呵,叫V Hotel的。 (Let me recommend to you, the V Hotel.)
Guide: 这个酒店,价格这个不贵。 (This hotel, the price is not expensive.)
Tourist: 好的。 (Okay.)
Dialog State: TOPIC: Transportation; INFO: Duration; TYPE: Walking; FROM: V Hotel; TO: Kampong Glam
Guide: 如果要去,我建议的这个马来文化村, (If you want to go, I suggest this Malay cultural village,)
Tourist: 马来村? (Malay village?)
Guide: 步行大概我看十五分钟吧。 (I think it takes fifteen minutes on foot.)
Tourist: 好。 (That’s good.)
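The frame structures in the example above can be pictured as one record per sub-dialog. The following is only a minimal sketch: the class name `SubDialogState`, the helper `topics_with_value`, and the choice to store each slot as a list of values are assumptions made for illustration, not the official DSTC5 data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubDialogState:
    """One sub-dialog: a topic plus slot -> values (hypothetical layout)."""
    topic: str
    slots: Dict[str, List[str]] = field(default_factory=dict)


# The four sub-dialogs of the example, copied from the slot-value pairs above.
STATES = [
    SubDialogState("Attraction", {"TYPE OF PLACE": ["Ethnic enclave"],
                                  "NEIGHBORHOOD": ["Kampong Glam"]}),
    SubDialogState("Food", {"CUISINE": ["Malay cuisine"],
                            "NEIGHBORHOOD": ["Kampong Glam"]}),
    SubDialogState("Accommodation", {"INFO": ["Pricerange"],
                                     "NAME": ["V Hotel"]}),
    SubDialogState("Transportation", {"INFO": ["Duration"],
                                      "TYPE": ["Walking"],
                                      "FROM": ["V Hotel"],
                                      "TO": ["Kampong Glam"]}),
]


def topics_with_value(states: List[SubDialogState], value: str) -> List[str]:
    """Return the topics of all sub-dialogs whose frame contains `value`."""
    return [s.topic for s in states
            if any(value in values for values in s.slots.values())]


print(topics_with_value(STATES, "Kampong Glam"))
```

Keeping the topic separate from the slots mirrors the example, where the same value (Kampong Glam) fills different slots depending on the topic of the sub-dialog.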
Baselines
Fuzzy string matching between ontology entries and utterances (DSTC4)
Baseline 1: Translations in English with the original ontology in English
Baseline 2: Original utterances in Chinese with the translated ontology in Chinese
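The idea behind the baselines can be sketched as follows. This is an illustration only, not the official baseline code: it uses Python's standard `difflib` as a stand-in matcher, and the threshold of 0.8, the function names, and the toy ontology are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def partial_ratio(needle: str, haystack: str) -> float:
    """Best similarity between `needle` and any same-length window of `haystack`."""
    needle, haystack = needle.lower(), haystack.lower()
    if not needle or not haystack:
        return 0.0
    if len(needle) >= len(haystack):
        return SequenceMatcher(None, needle, haystack).ratio()
    best = 0.0
    for start in range(len(haystack) - len(needle) + 1):
        window = haystack[start:start + len(needle)]
        best = max(best, SequenceMatcher(None, needle, window).ratio())
    return best


def match_ontology(utterance: str,
                   ontology: Dict[str, List[str]],
                   threshold: float = 0.8) -> Dict[str, List[str]]:
    """Fill each slot with the ontology values that fuzzily occur in the utterance."""
    frame: Dict[str, List[str]] = {}
    for slot, values in ontology.items():
        hits = [v for v in values if partial_ratio(v, utterance) >= threshold]
        if hits:
            frame[slot] = hits
    return frame


# Toy ontologies (hypothetical): English for the Baseline 1 setting,
# Chinese for the Baseline 2 setting.
ONTOLOGY_EN = {"NEIGHBORHOOD": ["Kampong Glam", "Chinatown"],
               "CUISINE": ["Malay cuisine", "Thai cuisine"]}
ONTOLOGY_ZH = {"NEIGHBORHOOD": ["甘榜格南"]}

frame_en = match_ontology("I recommend you this Kampong Glam.", ONTOLOGY_EN)
frame_zh = match_ontology("我介绍你这个甘榜格南。", ONTOLOGY_ZH)
print(frame_en, frame_zh)
```

The same matcher serves both settings: only the language of the utterance and of the ontology changes, which is the contrast the two baselines are set up to measure.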
Evaluation
Schedules: (1) every turn; (2) only at the end of each sub-dialog
Metrics: (1) Frame-level Accuracy; (2) Slot-level Precision/Recall/F-measure
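The two schedules and two metric families can be illustrated with a short sketch. It assumes that a frame is a mapping from slot to a list of values and that slot-level scores are pooled over all evaluated turns; the function names and the toy data are hypothetical, and the official scoring script may differ in detail.

```python
from typing import Dict, List, Set, Tuple

Frame = Dict[str, List[str]]


def to_pairs(frame: Frame) -> Set[Tuple[str, str]]:
    """Flatten a frame into a set of (slot, value) pairs."""
    return {(slot, v) for slot, values in frame.items() for v in values}


def evaluate(predicted: List[Frame], reference: List[Frame]) -> Dict[str, float]:
    """Frame-level accuracy and slot-level precision/recall/F-measure."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    exact = tp = n_pred = n_ref = 0
    for pred, ref in zip(predicted, reference):
        p, r = to_pairs(pred), to_pairs(ref)
        exact += int(p == r)          # the whole frame must match
        tp += len(p & r)              # correctly predicted slot-value pairs
        n_pred += len(p)
        n_ref += len(r)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_ref if n_ref else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    accuracy = exact / len(reference) if reference else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f": f}


def schedule2_indices(segment_ids: List[int]) -> List[int]:
    """Indices of the last turn of each sub-dialog (Schedule 2)."""
    last = len(segment_ids) - 1
    return [i for i, seg in enumerate(segment_ids)
            if i == last or segment_ids[i + 1] != seg]


# Toy data: three turns, the first two in one sub-dialog.
A = {"TYPE OF PLACE": ["Ethnic enclave"], "NEIGHBORHOOD": ["Kampong Glam"]}
B = {"CUISINE": ["Malay cuisine"], "NEIGHBORHOOD": ["Kampong Glam"]}
reference = [A, A, B]
predicted = [{"NEIGHBORHOOD": ["Kampong Glam"]}, A, {"CUISINE": ["Malay cuisine"]}]
segments = [0, 0, 1]

s1 = evaluate(predicted, reference)                       # Schedule 1: every turn
keep = schedule2_indices(segments)
s2 = evaluate([predicted[i] for i in keep],
              [reference[i] for i in keep])               # Schedule 2
print(s1, s2)
```

In this toy case the tracker is only complete by the end of the first sub-dialog, so Schedule 2 scores it higher than Schedule 1. Exact frame matching is much stricter than slot-level scoring, which is one plausible reason the accuracy figures in the results are far lower than the F-measures.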
Results (32 entries from 9 teams)
Columns: Team, Entry, Schedule 1 Accuracy, Schedule 1 F-measure, Schedule 2 Accuracy, Schedule 2 F-measure
0 0 0.0250 0.1124 0.0321 0.1462 ← Baseline 1
0 1 0.0161 0.1475 0.0222 0.1871 ← Baseline 2
1 0 0.0397 0.3115 0.0551 0.3565
1 1 0.0386 0.3032 0.0597 0.3540
1 2 0.0393 0.3071 0.0551 0.3563
1 3 0.0387 0.3052 0.0597 0.3580
1 4 0.0417 0.3166 0.0612 0.3675
2 0 0.0736 0.3966 0.0964 0.4430
2 1 0.0567 0.3764 0.0712 0.4267
2 2 0.0529 0.3756 0.0681 0.4259
2 3 0.0788 0.4047 0.0956 0.4519
2 4 0.0699 0.4024 0.0872 0.4499
3 0 0.0351 0.2060 0.0505 0.2539
3 1 0.0303 0.2424 0.0367 0.2830
3 2 0.0289 0.2074 0.0406 0.2573
3 3 0.0341 0.2442 0.0451 0.2895
4 0 0.0583 0.3280 0.0765 0.3658
4 1 0.0407 0.3405 0.0413 0.3572
4 2 0.0515 0.3708 0.0635 0.3945
4 3 0.0552 0.3649 0.0681 0.3913
4 4 0.0454 0.3572 0.0559 0.3758
5 0 0.0330 0.2749 0.0520 0.3314
5 1 0.0187 0.1804 0.0230 0.1967
5 2 0.0183 0.1520 0.0168 0.1371
5 3 0.0313 0.1574 0.0413 0.1880
5 4 0.0093 0.0945 0.0115 0.0977
6 0 0.0389 0.2849 0.0482 0.3230
6 1 0.0340 0.3070 0.0383 0.3532
6 2 0.0491 0.2988 0.0643 0.3381
7 0 0.0092 0.0783 0.0107 0.0794
7 1 0.0085 0.0767 0.0115 0.0809
8 0 0.0192 0.1570 0.0214 0.1554
8 1 0.0068 0.0554 0.0069 0.0577
9 0 0.0231 0.1114 0.0314 0.1449
Pilot Task: Spoken Language Understanding
Task Definition
Input: Transcribed utterance at each timestep
Output
Speech Act: 4 main categories with 21 attributes
Semantic Tags: 8 main categories with subcategories, relative modifiers and from-to modifiers
Example
Input: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Speech Act: INI (RECOMMEND)
Semantic Tags: 我介绍你这个<LOC CAT="CULTURAL">甘榜格南</LOC>。
(I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.)
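The inline semantic-tag notation in the example can be read back into structured form with a small parser. This is a sketch under assumptions: it handles only flat, non-nested tags written as in the example, and the function names `extract_tags` and `strip_tags` are hypothetical.

```python
import re
from typing import Dict, List

# Matches <TAG attrs>text</TAG>; nesting is not handled in this sketch.
TAG_RE = re.compile(r"<(?P<tag>[A-Z_]+)(?P<attrs>[^>]*)>(?P<text>.*?)</(?P=tag)>")
# Accepts straight or curly double quotes around attribute values.
ATTR_RE = re.compile(r"(\w+)=[\"“”]([^\"“”]*)[\"“”]")


def extract_tags(tagged: str) -> List[Dict[str, object]]:
    """Return one record per tagged span: tag name, attributes, covered text."""
    records = []
    for m in TAG_RE.finditer(tagged):
        attrs = dict(ATTR_RE.findall(m.group("attrs")))
        records.append({"tag": m.group("tag"),
                        "attrs": attrs,
                        "text": m.group("text")})
    return records


def strip_tags(tagged: str) -> str:
    """Remove the markup and return the plain utterance."""
    return re.sub(r"</?[A-Z_]+[^>]*>", "", tagged)


example = 'I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.'
print(extract_tags(example))
print(strip_tags(example))
```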
Baselines: SVM for Speech Acts and CRF for Semantic Tags
Evaluation Metrics: Precision/Recall/F-measure
Results on Speech Acts (12 entries from 4 teams)
Columns: Team, Entry, Guide P, Guide R, Guide F, Tourist P, Tourist R, Tourist F
0 0 0.4588 0.2480 0.3219 0.3694 0.1828 0.2446 ← SVM baseline
2 0 0.5450 0.3911 0.4554 0.5001 0.5501 0.5239
2 1 0.5305 0.3969 0.4540 0.5331 0.5263 0.5297
2 2 0.5533 0.3829 0.4526 0.5107 0.5425 0.5261
2 3 0.5127 0.4251 0.4648 0.5605 0.4999 0.5285
3 0 0.4279 0.3583 0.3900 0.4591 0.4241 0.4409
3 1 0.4340 0.3635 0.3956 0.4498 0.4119 0.4300
5 0 0.4085 0.3364 0.3690 0.5026 0.4484 0.4739
5 1 0.3905 0.3216 0.3527 0.4519 0.4031 0.4261
5 2 0.4639 0.3820 0.4190 0.4916 0.4385 0.4635
5 3 0.4540 0.3739 0.4101 0.4871 0.4346 0.4594
5 4 0.4459 0.3672 0.4028 0.4984 0.4446 0.4700
7 0 0.5007 0.2976 0.3733 0.5079 0.4156 0.4571
Results on Semantic Tags (8 entries from 3 teams)
Columns: Team, Entry, Guide P, Guide R, Guide F, Tourist P, Tourist R, Tourist F
0 0 0.4666 0.3187 0.3787 0.5259 0.2659 0.3532 ← CRF baseline
3 0 0.4650 0.3182 0.3779 0.5331 0.2620 0.3513
3 1 0.4650 0.3182 0.3779 0.5331 0.2620 0.3513
5 0 0.5006 0.2923 0.3691 0.5083 0.3110 0.3859
5 1 0.5469 0.1893 0.2813 0.5121 0.3081 0.3847
5 2 0.3577 0.2476 0.2926 0.3031 0.2237 0.2574
5 3 0.3486 0.2541 0.2939 0.2932 0.2149 0.2480
5 4 0.3395 0.2111 0.2603 0.2947 0.2072 0.2433
7 0 0.4400 0.3207 0.3710 0.4408 0.2926 0.3517
Pilot Task: Spoken Language Generation
Task Definition
Input: Speech act and semantic tags at each time step
Output: Generated utterance
Example
Input: INI (RECOMMEND), <LOC CAT="CULTURAL">Kampong Glam</LOC>
Output: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Baseline
Example-based language generation
Using k-nearest neighbors algorithm on speech acts and semantic tags
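An example-based generator of this kind can be sketched as retrieval plus template filling. The sketch below is not the baseline implementation: the Jaccard similarity, the majority vote over the k nearest examples, the placeholder templates, and every speech act label other than INI (RECOMMEND) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (speech act, semantic tag names, delexicalized utterance): toy training examples.
Example = Tuple[str, List[str], str]

EXAMPLES: List[Example] = [
    ("INI (RECOMMEND)", ["LOC"], "I recommend you this <LOC>."),
    ("INI (RECOMMEND)", ["LOC"], "I recommend you this <LOC>."),
    ("FOL (ACK)", [], "Right."),                 # hypothetical label
    ("QST (WHERE)", ["LOC"], "Where is <LOC>?"),  # hypothetical label
]


def features(speech_act: str, tags: List[str]) -> Set[str]:
    """Represent an input as a set of symbolic features."""
    return {"act=" + speech_act} | {"tag=" + t for t in tags}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generate(speech_act: str, tag_values: Dict[str, str],
             examples: List[Example], k: int = 3) -> str:
    """Pick the most common template among the k nearest examples and fill it."""
    query = features(speech_act, list(tag_values))
    ranked = sorted(examples,
                    key=lambda ex: jaccard(query, features(ex[0], ex[1])),
                    reverse=True)
    votes = Counter(ex[2] for ex in ranked[:k])
    template = votes.most_common(1)[0][0]
    for tag, value in tag_values.items():
        template = template.replace("<" + tag + ">", value)
    return template


output = generate("INI (RECOMMEND)", {"LOC": "Kampong Glam"}, EXAMPLES)
print(output)
```

Delexicalizing the stored utterances lets one training example serve many inputs that share a speech act and tag set but differ in the tagged values.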
Evaluation Metrics
BLEU: Geometric average of n-gram precision of system outputs to references
AM-FM: Linear interpolation of cosine similarity and normalized n-gram probability
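Both metrics can be illustrated with a short sketch. The BLEU function below is a single-reference, sentence-level version with the brevity penalty that standard BLEU includes. The AM-FM part only shows the interpolation step: the bag-of-words cosine is a simplified stand-in for the AM similarity, the FM score is passed in as a given number, and the weight `alpha=0.5` is an assumption.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_sum += math.log(clipped / total)
    if len(candidate) >= len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(log_sum / max_n)


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity between bag-of-words count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def am_fm(am: float, fm: float, alpha: float = 0.5) -> float:
    """Linear interpolation of an adequacy (AM) and a fluency (FM) score."""
    return alpha * am + (1 - alpha) * fm


ref = "I recommend you this Kampong Glam .".split()
hyp = "I recommend this Kampong Glam .".split()
print(bleu(ref, ref), bleu(hyp, ref, max_n=2), am_fm(cosine(hyp, ref), 0.2))
```

Because BLEU rewards exact n-gram overlap while AM-FM combines similarity and fluency scores, the two can rank systems differently, as in the results where the baseline has the highest BLEU but the lowest AM-FM.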
Results (4 entries from 1 team)
Columns: Team, Entry, Guide AM-FM, Guide BLEU, Tourist AM-FM, Tourist BLEU
0 0 0.1981 0.3854 0.2602 0.5921 ← Baseline
5 0 0.2818 0.3264 0.3221 0.4850
5 1 0.3180 0.3371 0.3635 0.5249
5 2 0.2737 0.2852 0.3100 0.4741
5 3 0.2405 0.2758 0.4258 0.5302
* More details can be found in our paper in the SLT proceedings, on the DSTC5 official website (http://workshop.colips.org/dstc5/), and in the DSTC5 GitHub repository (https://github.com/seokhwankim/dstc5).

Task: Spoken Language Generation Task Definition Input: Speech act and semantic tags at each time step Output: Generated utterance Example Input: INI (RECOMMEND), <LOC CAT=“CULTURAL”>Kampong Glam</LOC> Output: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.) Baseline Example-based language generation Using k-nearest neighbors algorithm on speech acts and semantic tags Evaluation Metrics BLEU: Geometric average of n-gram precision of system outputs to references AM-FM: Linear interpolation of cosine similarity and normalized n-gram probability Results (4 entries from 1 team) Guide Tourist Team Entry AM-FM BLEU AM-FM BLEU 0 0 0.1981 0.3854 0.2602 0.5921 ← Baseline 5 0 0.2818 0.3264 0.3221 0.4850 5 1 0.3180 0.3371 0.3635 0.5249 5 2 0.2737 0.2852 0.3100 0.4741 5 3 0.2405 0.2758 0.4258 0.5302 * More details can be found from our paper in the SLT proceeding, DSTC5 official website (http://workshop.colips.org/dstc5/) and DSTC5 GitHub repository (https://github.com/seokhwankim/dstc5).
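
The two main-task metrics named above, frame-level accuracy and slot-level precision/recall/F-measure, can be illustrated with a minimal sketch. It assumes that accuracy is the fraction of frames whose slot-value pairs match the reference exactly, and that precision and recall are pooled over all slot-value pairs; this is one common reading, and the official scoring scripts in the DSTC5 repository remain the authoritative definition. The function names (`slot_value_pairs`, `evaluate`) and the predicted frames are hypothetical; the gold frames reuse the Attraction and Food states from the poster's example.

```python
from typing import Dict, List, Set, Tuple

# A frame maps each slot name to the set of values filled for it.
Frame = Dict[str, Set[str]]


def slot_value_pairs(frame: Frame) -> Set[Tuple[str, str]]:
    """Flatten a frame into a set of (slot, value) pairs."""
    return {(slot, value) for slot, values in frame.items() for value in values}


def evaluate(predicted: List[Frame], gold: List[Frame]) -> Dict[str, float]:
    """Frame-level accuracy and pooled slot-level precision/recall/F-measure.

    Assumption: one predicted frame per gold frame, compared in order.
    """
    exact = true_pos = n_pred = n_gold = 0
    for pred_frame, gold_frame in zip(predicted, gold):
        pred_pairs = slot_value_pairs(pred_frame)
        gold_pairs = slot_value_pairs(gold_frame)
        exact += int(pred_pairs == gold_pairs)      # whole frame must match
        true_pos += len(pred_pairs & gold_pairs)    # correct slot-value pairs
        n_pred += len(pred_pairs)
        n_gold += len(gold_pairs)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    accuracy = exact / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Gold frames taken from the poster's example dialog.
gold = [
    {"TYPE OF PLACE": {"Ethnic enclave"}, "NEIGHBORHOOD": {"Kampong Glam"}},
    {"CUISINE": {"Malay cuisine"}, "NEIGHBORHOOD": {"Kampong Glam"}},
]
# Hypothetical tracker output: the second frame misses the CUISINE slot.
predicted = [
    {"TYPE OF PLACE": {"Ethnic enclave"}, "NEIGHBORHOOD": {"Kampong Glam"}},
    {"NEIGHBORHOOD": {"Kampong Glam"}},
]

scores = evaluate(predicted, gold)
print(scores)
```

In this toy case one of two frames matches exactly, while three of the four gold slot-value pairs are recovered with no wrong predictions. The frame-level score is therefore much harsher than the slot-level one, which is consistent with the gap between the Accuracy and F-measure columns in the main-task results.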