The Fifth Dialog State Tracking Challenge (DSTC5)
Seokhwan Kim (1), Luis Fernando D’Haro (1), Rafael E. Banchs (1), Jason D. Williams (2), Matthew Henderson (3), Koichiro Yoshino (4)
(1) Institute for Infocomm Research, Singapore. (2) Microsoft Research, USA. (3) Google, USA. (4) Nara Institute of Science and Technology, Japan.
Problems
Goal
Human-human dialogs on tourist information in English and Chinese
Focusing on the problem of adaptation to a new language
Main Task
Dialog State Tracking (DST)
Pilot Tasks
Spoken Language Understanding (SLU)
Speech Act Prediction (SAP)
Spoken Language Generation (SLG)
End-to-end System (EES)
Datasets
Dialogs
Set    Task  Language  # dialogs  # utterances
Train  ALL   English   35         31,304        ← DSTC4 datasets
Dev    ALL   Chinese   2          3,130
Test   MAIN  Chinese   10         14,878
Test   SLU   Chinese   8          12,655
Test   SAP   Chinese   8          11,456
Test   SLG   Chinese   8          12,346
Translations
For each utterance, 5-best translations with word alignments were provided, generated by English-to-Chinese and Chinese-to-English MT systems
The DSTC4 ontology was provided together with its automatic translation into Chinese
Main Task: Dialog State Tracking
Task Definition
Dialog state tracking at the sub-dialog level
Input
Transcribed utterances from the beginning of the session to each timestep
Manually segmented by sub-dialogs and annotated with topic categories
Output
Frame structures defined with slot-value pairs
For 5 major topic categories: Accommodation, Attraction, Food, Shopping, Transportation
Example
Dialog State: TOPIC: Attraction; TYPE OF PLACE: Ethnic enclave; NEIGHBORHOOD: Kampong Glam
  Guide: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
  Tourist: 对。 (Right.)
  Guide: 你看,它是个-它是马来村嘛 (You see, it is a- it’s a Malay Village)
  Tourist: 对,甘榜- (Right, Kampong-)
Dialog State: TOPIC: Food; CUISINE: Malay cuisine; NEIGHBORHOOD: Kampong Glam
  Guide: 它就卖了很多马来食物。 (It sells a lot of Malay food.)
  Tourist: 比较有特色的食物, (It’s quite a unique food,)
  Guide: 对,哦。 (Right.)
  Guide: 马来食物,基本上,它是香。 (Malay food, basically, it smells very nice.)
Dialog State: TOPIC: Accommodation; INFO: Pricerange; NAME: V Hotel
  Tourist: 那我们住宿呢? (Then, where do we stay?)
  Guide: 我介绍一间呵,叫V Hotel的。 (Let me recommend to you, the V Hotel.)
  Guide: 这个酒店,价格这个不贵。 (This hotel, the price is not expensive.)
  Tourist: 好的。 (Okay.)
Dialog State: TOPIC: Transportation; INFO: Duration; TYPE: Walking; FROM: V Hotel; TO: Kampong Glam
  Guide: 如果要去,我建议的这个马来文化村, (If you want to go, I suggest this Malay cultural village,)
  Tourist: 马来村? (Malay village?)
  Guide: 步行大概我看十五分钟吧。 (I think it takes fifteen minutes on foot.)
  Tourist: 好。 (That’s good.)
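The frame structures in this example can be written down as plain data. The sketch below is only an illustration: the dictionary layout, the `sub_dialog_states` name, and the `slot_value_pairs` helper are assumptions made here, not the challenge's official file format.

```python
# Hypothetical in-memory representation of the four sub-dialog states
# from the example. Each frame maps a slot name to a list of values.
sub_dialog_states = [
    {"topic": "Attraction",
     "frame": {"TYPE OF PLACE": ["Ethnic enclave"],
               "NEIGHBORHOOD": ["Kampong Glam"]}},
    {"topic": "Food",
     "frame": {"CUISINE": ["Malay cuisine"],
               "NEIGHBORHOOD": ["Kampong Glam"]}},
    {"topic": "Accommodation",
     "frame": {"INFO": ["Pricerange"],
               "NAME": ["V Hotel"]}},
    {"topic": "Transportation",
     "frame": {"INFO": ["Duration"], "TYPE": ["Walking"],
               "FROM": ["V Hotel"], "TO": ["Kampong Glam"]}},
]


def slot_value_pairs(state):
    """Flatten one frame into a set of (slot, value) pairs."""
    return {(slot, value)
            for slot, values in state["frame"].items()
            for value in values}
```

Flattening a frame into a set of pairs makes it easy to compare a tracker's output against a reference frame with ordinary set operations.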
Baselines
Fuzzy string matching between ontology entries and utterances (DSTC4)
Baseline 1: Translations in English with the original ontology in English
Baseline 2: Original utterances in Chinese with the translated ontology in Chinese
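As a rough illustration of the idea behind these baselines, the sketch below matches ontology values against an utterance with approximate string matching. It is not the official baseline tracker: the toy `ONTOLOGY`, the sliding-window strategy, the use of Python's `difflib.SequenceMatcher`, and the 0.8 threshold are all assumptions chosen for this example.

```python
from difflib import SequenceMatcher

# Toy ontology fragment (values borrowed from the example dialog; "Chinatown"
# is an extra illustrative entry, not taken from the poster).
ONTOLOGY = {
    "NEIGHBORHOOD": ["Kampong Glam", "Chinatown"],
    "NAME": ["V Hotel"],
}


def best_window_ratio(value, utterance):
    """Best similarity between `value` and any word window of the same length."""
    words = utterance.lower().split()
    n = len(value.split())
    # At least one (possibly empty) window, so max() below never sees an empty list.
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return max(SequenceMatcher(None, value.lower(), w).ratio() for w in windows)


def track(utterance, ontology=ONTOLOGY, threshold=0.8):
    """Return a frame with every ontology value that fuzzily matches the utterance."""
    frame = {}
    for slot, values in ontology.items():
        matched = [v for v in values if best_window_ratio(v, utterance) >= threshold]
        if matched:
            frame[slot] = matched
    return frame
```

Under this sketch, Baseline 1 would correspond to calling `track` on the English translation with an English ontology, and Baseline 2 to calling it on the original Chinese utterance with the translated ontology (which would need character-level rather than whitespace-based windows).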
Evaluation
Schedules: (1) every turn; (2) only at the end of each sub-dialog
Metrics: (1) Frame-level Accuracy; (2) Slot-level Precision/Recall/F-measure
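The two metrics can be sketched as follows, with each frame represented as a set of (slot, value) pairs. This is a minimal illustration, not the official scoring script; in particular, micro-averaging the slot-level counts over all evaluated frames is an assumption made here.

```python
def frame_accuracy(refs, hyps):
    """Fraction of evaluated frames where the hypothesis equals the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def slot_prf(refs, hyps):
    """Precision, recall and F-measure over (slot, value) pairs (micro-averaged)."""
    tp = fp = fn = 0
    for r, h in zip(refs, hyps):
        tp += len(r & h)   # pairs predicted and present in the reference
        fp += len(h - r)   # pairs predicted but not in the reference
        fn += len(r - h)   # reference pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Two evaluated frames: the first hypothesis misses the CUISINE slot.
refs = [{("CUISINE", "Malay cuisine"), ("NEIGHBORHOOD", "Kampong Glam")},
        {("INFO", "Pricerange"), ("NAME", "V Hotel")}]
hyps = [{("NEIGHBORHOOD", "Kampong Glam")},
        {("INFO", "Pricerange"), ("NAME", "V Hotel")}]
accuracy = frame_accuracy(refs, hyps)
precision, recall, f_measure = slot_prf(refs, hyps)
```

Because frame-level accuracy requires every slot-value pair in a frame to be correct, it is much stricter than the slot-level F-measure, which gives partial credit for each correct pair.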
Results (32 entries from 9 teams)
Team Entry | Schedule 1 Accuracy | Schedule 1 F-measure | Schedule 2 Accuracy | Schedule 2 F-measure
0 0 0.0250 0.1124 0.0321 0.1462 ← Baseline 1
0 1 0.0161 0.1475 0.0222 0.1871 ← Baseline 2
1 0 0.0397 0.3115 0.0551 0.3565
1 1 0.0386 0.3032 0.0597 0.3540
1 2 0.0393 0.3071 0.0551 0.3563
1 3 0.0387 0.3052 0.0597 0.3580
1 4 0.0417 0.3166 0.0612 0.3675
2 0 0.0736 0.3966 0.0964 0.4430
2 1 0.0567 0.3764 0.0712 0.4267
2 2 0.0529 0.3756 0.0681 0.4259
2 3 0.0788 0.4047 0.0956 0.4519
2 4 0.0699 0.4024 0.0872 0.4499
3 0 0.0351 0.2060 0.0505 0.2539
3 1 0.0303 0.2424 0.0367 0.2830
3 2 0.0289 0.2074 0.0406 0.2573
3 3 0.0341 0.2442 0.0451 0.2895
4 0 0.0583 0.3280 0.0765 0.3658
4 1 0.0407 0.3405 0.0413 0.3572
4 2 0.0515 0.3708 0.0635 0.3945
4 3 0.0552 0.3649 0.0681 0.3913
4 4 0.0454 0.3572 0.0559 0.3758
5 0 0.0330 0.2749 0.0520 0.3314
5 1 0.0187 0.1804 0.0230 0.1967
5 2 0.0183 0.1520 0.0168 0.1371
5 3 0.0313 0.1574 0.0413 0.1880
5 4 0.0093 0.0945 0.0115 0.0977
6 0 0.0389 0.2849 0.0482 0.3230
6 1 0.0340 0.3070 0.0383 0.3532
6 2 0.0491 0.2988 0.0643 0.3381
7 0 0.0092 0.0783 0.0107 0.0794
7 1 0.0085 0.0767 0.0115 0.0809
8 0 0.0192 0.1570 0.0214 0.1554
8 1 0.0068 0.0554 0.0069 0.0577
9 0 0.0231 0.1114 0.0314 0.1449
Pilot Task: Spoken Language Understanding
Task Definition
Input: Transcribed utterance at each timestep
Output
Speech Act: 4 main categories with 21 attributes
Semantic Tags: 8 main categories with subcategories, relative modifiers and from-to modifiers
Example
Input: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Speech Act: INI (RECOMMEND)
Semantic Tags: 我介绍你这个<LOC CAT="CULTURAL">甘榜格南</LOC>。
(I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.)
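The inline tag notation in this example can be read back into structured form with a small parser. The sketch below is illustrative only: the regular expressions and the `parse_semantic_tags` helper are assumptions, and they expect attributes written with plain ASCII double quotes and no nested tags.

```python
import re

# <TAG ATTR="value" ...>text</TAG>, non-nested; tag names assumed to be upper-case.
TAG_RE = re.compile(r'<(?P<tag>[A-Z_]+)(?P<attrs>[^>]*)>(?P<text>.*?)</(?P=tag)>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_semantic_tags(tagged):
    """Extract each tagged span with its tag name, attributes and surface text."""
    return [
        {"tag": m.group("tag"),
         "attrs": dict(ATTR_RE.findall(m.group("attrs"))),
         "text": m.group("text")}
        for m in TAG_RE.finditer(tagged)
    ]


spans = parse_semantic_tags(
    'I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.')
```

Here `spans` holds one entry for the `LOC` span with its `CAT` attribute, which is the unit over which precision, recall and F-measure for semantic tags would naturally be counted.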
Baselines: SVM for Speech Acts and CRF for Semantic Tags
Evaluation Metrics: Precision/Recall/F-measure
Results on Speech Acts (12 entries from 4 teams)
Team Entry | Guide P | Guide R | Guide F | Tourist P | Tourist R | Tourist F
0 0 0.4588 0.2480 0.3219 0.3694 0.1828 0.2446 ← SVM baseline
2 0 0.5450 0.3911 0.4554 0.5001 0.5501 0.5239
2 1 0.5305 0.3969 0.4540 0.5331 0.5263 0.5297
2 2 0.5533 0.3829 0.4526 0.5107 0.5425 0.5261
2 3 0.5127 0.4251 0.4648 0.5605 0.4999 0.5285
3 0 0.4279 0.3583 0.3900 0.4591 0.4241 0.4409
3 1 0.4340 0.3635 0.3956 0.4498 0.4119 0.4300
5 0 0.4085 0.3364 0.3690 0.5026 0.4484 0.4739
5 1 0.3905 0.3216 0.3527 0.4519 0.4031 0.4261
5 2 0.4639 0.3820 0.4190 0.4916 0.4385 0.4635
5 3 0.4540 0.3739 0.4101 0.4871 0.4346 0.4594
5 4 0.4459 0.3672 0.4028 0.4984 0.4446 0.4700
7 0 0.5007 0.2976 0.3733 0.5079 0.4156 0.4571
Results on Semantic Tags (8 entries from 3 teams)
Team Entry | Guide P | Guide R | Guide F | Tourist P | Tourist R | Tourist F
0 0 0.4666 0.3187 0.3787 0.5259 0.2659 0.3532 ← CRF baseline
3 0 0.4650 0.3182 0.3779 0.5331 0.2620 0.3513
3 1 0.4650 0.3182 0.3779 0.5331 0.2620 0.3513
5 0 0.5006 0.2923 0.3691 0.5083 0.3110 0.3859
5 1 0.5469 0.1893 0.2813 0.5121 0.3081 0.3847
5 2 0.3577 0.2476 0.2926 0.3031 0.2237 0.2574
5 3 0.3486 0.2541 0.2939 0.2932 0.2149 0.2480
5 4 0.3395 0.2111 0.2603 0.2947 0.2072 0.2433
7 0 0.4400 0.3207 0.3710 0.4408 0.2926 0.3517
Pilot Task: Spoken Language Generation
Task Definition
Input: Speech act and semantic tags at each time step
Output: Generated utterance
Example
Input: INI (RECOMMEND), <LOC CAT="CULTURAL">Kampong Glam</LOC>
Output: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Baseline
Example-based language generation
Using k-nearest neighbors algorithm on speech acts and semantic tags
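A minimal sketch of example-based generation with nearest neighbors is shown below. It is not the official baseline: the feature encoding, the Jaccard similarity, and the tiny `EXAMPLES` pool are assumptions made for illustration, and the annotation attached to the second example ("FOL" with "ACK") is hypothetical.

```python
def features(speech_act, attributes, tags):
    """Encode a speech act, its attributes and (tag, category) pairs as a feature set."""
    feats = {f"act={speech_act}"}
    feats |= {f"attr={a}" for a in attributes}
    for tag, cat in tags:
        feats |= {f"tag={tag}", f"cat={cat}"}
    return feats


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy example pool: (features, utterance). Annotations are illustrative only.
EXAMPLES = [
    (features("INI", ["RECOMMEND"], [("LOC", "CULTURAL")]),
     "I recommend you this Kampong Glam."),
    (features("FOL", ["ACK"], []), "Right."),
]


def generate(speech_act, attributes, tags, examples=EXAMPLES, k=1):
    """Return the utterances of the k stored examples most similar to the input."""
    query = features(speech_act, attributes, tags)
    ranked = sorted(examples, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return [utterance for _, utterance in ranked[:k]]
```

A fuller system would also substitute the tagged values from the input into the retrieved utterance; this sketch only performs the retrieval step.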
Evaluation Metrics
BLEU: Geometric average of n-gram precision of system outputs to references
AM-FM: Linear interpolation of cosine similarity and normalized n-gram probability
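Both metrics can be sketched in a few lines. The BLEU function below is a simplified single-reference, sentence-level version that adds the standard brevity penalty to the geometric mean of clipped n-gram precisions. For AM-FM, only the interpolation step is shown: the bag-of-words `cosine` stands in for the adequacy component, the fluency score is passed in as a number, and the weight `alpha=0.5` is an assumed value, not one stated on the poster.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hyp, ref, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    log_precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        total = sum(h.values())
        clipped = sum(min(count, r[gram]) for gram, count in h.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


def cosine(hyp, ref):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(hyp), Counter(ref)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def am_fm(am, fm, alpha=0.5):
    """Linear interpolation of an adequacy score (AM) and a fluency score (FM)."""
    return alpha * am + (1 - alpha) * fm
```

Without smoothing, any hypothesis shorter than four tokens, or with no matching 4-gram, scores 0.0 under this `bleu`, which is one reason corpus-level aggregation is usually preferred for reporting.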
Results (4 entries from 1 team)
Team Entry | Guide AM-FM | Guide BLEU | Tourist AM-FM | Tourist BLEU
0 0 0.1981 0.3854 0.2602 0.5921 ← Baseline
5 0 0.2818 0.3264 0.3221 0.4850
5 1 0.3180 0.3371 0.3635 0.5249
5 2 0.2737 0.2852 0.3100 0.4741
5 3 0.2405 0.2758 0.4258 0.5302
* More details can be found in our paper in the SLT proceedings, on the DSTC5 official website (http://workshop.colips.org/dstc5/), and in the DSTC5 GitHub repository (https://github.com/seokhwankim/dstc5).

Task: Spoken Language Generation Task Definition Input: Speech act and semantic tags at each time step Output: Generated utterance Example Input: INI (RECOMMEND), <LOC CAT=“CULTURAL”>Kampong Glam</LOC> Output: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.) Baseline Example-based language generation Using k-nearest neighbors algorithm on speech acts and semantic tags Evaluation Metrics BLEU: Geometric average of n-gram precision of system outputs to references AM-FM: Linear interpolation of cosine similarity and normalized n-gram probability Results (4 entries from 1 team) Guide Tourist Team Entry AM-FM BLEU AM-FM BLEU 0 0 0.1981 0.3854 0.2602 0.5921 ← Baseline 5 0 0.2818 0.3264 0.3221 0.4850 5 1 0.3180 0.3371 0.3635 0.5249 5 2 0.2737 0.2852 0.3100 0.4741 5 3 0.2405 0.2758 0.4258 0.5302 * More details can be found from our paper in the SLT proceeding, DSTC5 official website (http://workshop.colips.org/dstc5/) and DSTC5 GitHub repository (https://github.com/seokhwankim/dstc5).
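
The main-task baseline is described only as fuzzy string matching between ontology entries and utterances. The following is a minimal sketch of that idea, not the official baseline code: it assumes an ontology shaped as a mapping from slot names to candidate values, uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for whatever matcher the organizers used, and the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Frame = Dict[str, List[str]]  # slot name -> list of values (assumed shape)


def best_window_ratio(value: str, utterance: str) -> float:
    """Best similarity between `value` and any same-length window of `utterance`."""
    value, utterance = value.lower(), utterance.lower()
    if not value or not utterance:
        return 0.0
    if len(utterance) <= len(value):
        return SequenceMatcher(None, value, utterance).ratio()
    width = len(value)
    return max(
        SequenceMatcher(None, value, utterance[i:i + width]).ratio()
        for i in range(len(utterance) - width + 1)
    )


def track(utterance: str, ontology: Frame, threshold: float = 0.8) -> Frame:
    """Return every ontology value that fuzzily occurs in the utterance."""
    frame: Frame = {}
    for slot, values in ontology.items():
        matched = [v for v in values if best_window_ratio(v, utterance) >= threshold]
        if matched:
            frame[slot] = matched
    return frame


# Toy ontology (hypothetical); the utterance is taken from the poster's example.
ontology = {
    "NEIGHBORHOOD": ["Kampong Glam", "Chinatown"],
    "CUISINE": ["Malay cuisine"],
}
print(track("I recommend you this Kampong Glam.", ontology))
# → {'NEIGHBORHOOD': ['Kampong Glam']}
```

A matcher of this kind only recovers values that appear almost verbatim in the utterance, which is one plausible reason the two baselines score low on frame-level accuracy.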
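
The main-task metrics (frame-level accuracy and slot-level precision/recall/F-measure) can be sketched as follows. This is an illustrative reading of the metric names, not the official scorer: it assumes frames are mappings from slot names to lists of values, that slot-level counts are pooled over all evaluated turns, and that the caller has already selected the turns for the chosen schedule (every turn, or only the last turn of each sub-dialog). The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Frame = Dict[str, List[str]]  # slot name -> list of values (assumed shape)


def slot_value_pairs(frame: Frame) -> Set[Tuple[str, str]]:
    """Flatten a frame into a set of (slot, value) pairs."""
    return {(slot, value) for slot, values in frame.items() for value in values}


def evaluate(predictions: List[Frame], references: List[Frame]) -> Dict[str, float]:
    """Frame-level accuracy plus pooled slot-level precision, recall and F-measure."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    exact = true_pos = n_pred = n_ref = 0
    for pred, ref in zip(predictions, references):
        p, r = slot_value_pairs(pred), slot_value_pairs(ref)
        exact += int(p == r)      # a frame counts only if it matches exactly
        true_pos += len(p & r)    # correctly predicted slot-value pairs
        n_pred += len(p)
        n_ref += len(r)
    accuracy = exact / len(references) if references else 0.0
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Reference frames from the poster's example; the second prediction misses one slot.
references = [
    {"TYPE OF PLACE": ["Ethnic enclave"], "NEIGHBORHOOD": ["Kampong Glam"]},
    {"CUISINE": ["Malay cuisine"], "NEIGHBORHOOD": ["Kampong Glam"]},
]
predictions = [
    {"TYPE OF PLACE": ["Ethnic enclave"], "NEIGHBORHOOD": ["Kampong Glam"]},
    {"CUISINE": ["Malay cuisine"]},
]
print(evaluate(predictions, references))
# accuracy 0.5, precision 1.0, recall 0.75
```

The example shows why the accuracy column in the results sits far below the F-measure column: a single missing slot-value pair zeroes out the whole frame for accuracy but costs only partial credit at the slot level.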
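
The poster summarizes BLEU as a geometric average of n-gram precisions. A minimal sentence-level sketch of that computation is given below. It rests on several assumptions that the poster does not state: a single reference per output, n-grams up to length 4, clipped n-gram counts, and the brevity penalty from the standard BLEU definition. The official scorer may differ (for example, by scoring at corpus level), and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Clipped n-gram precision of the candidate against one reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    """Geometric mean of 1..max_n gram precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean


# Character-level tokens are one simple (assumed) choice for Chinese text.
reference = list("我介绍你这个甘榜格南。")
print(bleu(reference, reference))  # → 1.0
```

Because this version is unsmoothed, short outputs that share no 4-gram with the reference score exactly zero, so sentence-level values are best read as a rough indication only.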