SlideShare a Scribd company logo
1 of 1
Download to read offline
A Semi-supervised Method for Efficient Construction of Statistical Spoken
  Language Understanding Resources
                                                                    Seokhwan Kim, Minwoo Jeong, and Gary Geunbae Lee
                                                             Pohang University of Science and Technology (POSTECH), South Korea

                      ABSTRACT                                                                     EXTRACTING CONTEXT PATTERNS                                                                                         SCORING CANDIDATES
 We present a semi-supervised framework to construct spoken                                                                                                                                                    ● The score of the alignment between a raw utterance and a
                                                                    To overcome the context sparseness problem of spoken utterances, we make use of not sub-phrases of an utterance, but the full              context pattern
language understanding resources with very low cost. We            utterance itself as a context pattern for extracting named entities in the utterance. First, we assume that each entry in the entity list
generate context patterns with a few seed entities and a large     is absolutely precise and uniquely belong to only a category whether it is a seed entity or an extended entity as an intermediate of
amount of unlabeled utterances. Using these context patterns,      the overall procedure. For each entry in the entity list, we find out utterances containing it, and make an utterance template by
we extract new entities from the unlabeled utterances. The         replacing the part of entity in the utterance with the defined entity label. In this replacing task, we exclude the entities which are      where ref is a context pattern, tar is a raw utterance, n is the
extracted entities are appended to the seed entities, and we can   located at the beginning or end of the utterance, because context patterns containing the entities located in such positions can lead       number of words in ref, m is the number of words in tar, t is the
obtain the extended entity list by repeating these steps. Our                                                                                                                                                  number of aligned entity labels, and e is the number of words
                                                                   to confusion of determining the boundaries of each entity in the later procedure.
                                                                                                                                                                                                               extracted as entity candidates
method is based on an utterance alignment algorithm which is a
variant of the biological sequence alignment algorithm. Using                                                                                                                                                  ● The score of an entity candidate ej which is extracted by a
this method, we can obtain precise entity lists with high                                                                                                                                                      context pattern refi
coverage, which is of help to reduce the cost of building
                                                                                  ALIGNMENT-BASED NAMED ENTITY RECOGNITION
resources for statistical spoken language understanding systems.
                                                                    We firstly align a raw utterance with a context pattern containing entity labels. Then, from the result of the best alignment
                                                                   between them, we extract the parts of the raw utterance which are aligned to the entity labels in the context pattern as an entity          ● The final score of an entity candidate ej
                   MOTIVATION                                      candidate belonging to the category of the corresponding entity label.

 Spoken Language Understanding (SLU) is a problem of
extracting semantic meanings from recognized utterances and
filling the correct values into a semantic frame structure. Most
of the statistical SLU approaches require a sufficient amount of
                                                                                                                                                                                                                                 EXPERIMENTS
training data which consist of labeled utterances with
                                                                                                                                                                                                               We evaluated our method on the CU-Communicator corpus,
corresponding semantic concepts. The task of building the                                                                                                                                                      which consists of 13,983 utterances. We chose the three most
training data for the SLU problem is one of the most important                                                                                                                                                 frequently occurring semantic categories in the corpus, CITY
and high-cost required tasks for managing the spoken dialog                  MATRIX COMPUTATION                                                              TRACE BACK                                        NAME, MONTH, and DAY NUMBER. we empirically set the
                                                                                                                                                                                                               entity selection threshold value to 0.3.
systems. We concentrate on utilizing a semi-supervised
information extraction method for building resources for                                                                                      The traceback step is started at the position with               ● Result of automatic entity list extension
statistical SLU modules in spoken dialog systems.                                                                                            maximum score from among the first column and the
                                                                                                                                             first row. Then, the next position of the position [i, j] is                                 # of     # of
                                                                                                                                             determined by following policies.                                                     # of
                                                                                                                                                                                                                   Category             extended total Precision Recall
         OVERALL PROCEDURE                                                                                                                    • If tar(i) and ref(j) are identical, then the next position                        seeds
                                                                                                                                                                                                                                         entities entities
                                                                                                                                             is [i + 1, j + 1].
1. Prepare seed entity list E and unlabeled corpus C                                                                                          • Otherwise, the position with maximum score from                 CITY_NAME          20      123       209     65.04% 37.91%
2. Find utterances containing lexical of entities in E in the                                                                                among [i + 1, j + 1] ~ [i + 1, n] and [i + 1, j + 1] ~ [m, j +
corpus C, and replace the parts of matched entities in the                                                                                   1] is the next position.                                              MONTH            1       10        12        100%     83.33%
found utterances with a label which indicates the location
                                                                                                                                                                                                               DAY_NUMBER           3       27        34        100%     79.41%
of entities. Add partially labeled utterances to the
context pattern set P.
                                                                                                                                                                                                               ● Result of corpus labeling experiment
3. Align each utterance in the corpus C with each context
pattern in P, and extract new entity candidates in the utterance
                                                                                                                                                                                                                      Category           Precision     Recall      F-measure
which is matched with the entity label in the context
pattern.                                                                                                                                                                                                           CITY_NAME               91.30        86.83          89.01
4. Compute the score of extracted entity candidates in step                                                                                                                                                           MONTH                98.98        87.24          92.74
3, and add only high-scored candidates to E.                                                                                                                                                                      DAY_NUMBER               92.00        82.03          86.73
5. If there is no additional entities to E in step 4, terminate                                                                                                                                                        Overall             93.24        85.53          89.22
the process with entity list E, context pattern set P, and
partially labeled corpus C as results. Otherwise, return
to step 2 and repeat the process.

More Related Content

What's hot

4.instructuional design theories
4.instructuional design theories4.instructuional design theories
4.instructuional design theoriesShyamanta Baruah
 
Speaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained DataSpeaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained Datasipij
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSkevig
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSijnlc
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...ijnlc
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Networkkevig
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGkevig
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
 
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELEXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELijaia
 
Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Modelgerogepatton
 
Arabic named entity recognition using deep learning approach
Arabic named entity recognition using deep learning approachArabic named entity recognition using deep learning approach
Arabic named entity recognition using deep learning approachIJECEIAES
 
SCHEME OF WORK 2010
SCHEME OF WORK 2010SCHEME OF WORK 2010
SCHEME OF WORK 2010SMS
 

What's hot (15)

4.instructuional design theories
4.instructuional design theories4.instructuional design theories
4.instructuional design theories
 
Speaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained DataSpeaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained Data
 
Semantic Search Trend
Semantic Search TrendSemantic Search Trend
Semantic Search Trend
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELEXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
 
Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Model
 
Nd2421622165
Nd2421622165Nd2421622165
Nd2421622165
 
Arabic named entity recognition using deep learning approach
Arabic named entity recognition using deep learning approachArabic named entity recognition using deep learning approach
Arabic named entity recognition using deep learning approach
 
SCHEME OF WORK 2010
SCHEME OF WORK 2010SCHEME OF WORK 2010
SCHEME OF WORK 2010
 

Similar to A semi-supervised method for efficient construction of statistical spoken language understanding resources

Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesAlgoscale Technologies Inc.
 
Contextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based ModelsContextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based ModelsIRJET Journal
 
Context Driven Technique for Document Classification
Context Driven Technique for Document ClassificationContext Driven Technique for Document Classification
Context Driven Technique for Document ClassificationIDES Editor
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
 
C2mala2janani 111110022452-phpapp02
C2mala2janani 111110022452-phpapp02C2mala2janani 111110022452-phpapp02
C2mala2janani 111110022452-phpapp02Harnav Harivasan
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGkevig
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGijnlc
 
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITIONSEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITIONkevig
 
IRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET Journal
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languageiosrjce
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similaritypathsproject
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsIJCSIS Research Publications
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...TELKOMNIKA JOURNAL
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...AIRCC Publishing Corporation
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Edmond Lepedus
 

Similar to A semi-supervised method for efficient construction of statistical spoken language understanding resources (20)

Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian Languages
 
Contextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based ModelsContextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based Models
 
Context Driven Technique for Document Classification
Context Driven Technique for Document ClassificationContext Driven Technique for Document Classification
Context Driven Technique for Document Classification
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
C2 mala2 janani
C2 mala2 jananiC2 mala2 janani
C2 mala2 janani
 
C2mala2janani 111110022452-phpapp02
C2mala2janani 111110022452-phpapp02C2mala2janani 111110022452-phpapp02
C2mala2janani 111110022452-phpapp02
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITIONSEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
 
IRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET - Audio Emotion Analysis
IRJET - Audio Emotion Analysis
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...
 
A018110108
A018110108A018110108
A018110108
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 

More from Seokhwan Kim

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)Seokhwan Kim
 
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...Seokhwan Kim
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingSeokhwan Kim
 
The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)Seokhwan Kim
 
Natural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionNatural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionSeokhwan Kim
 
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Seokhwan Kim
 
The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)Seokhwan Kim
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Seokhwan Kim
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Seokhwan Kim
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...Seokhwan Kim
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSeokhwan Kim
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingSeokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...Seokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...Seokhwan Kim
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionSeokhwan Kim
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessSeokhwan Kim
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...Seokhwan Kim
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionSeokhwan Kim
 
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템Seokhwan Kim
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionSeokhwan Kim
 

More from Seokhwan Kim (20)

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)
 
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic Tracking
 
The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)
 
Natural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionNatural Language in Human-Robot Interaction
Natural Language in Human-Robot Interaction
 
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
 
The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog States
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic Tracking
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognition
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information access
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information Extraction
 
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation Detection
 

Recently uploaded

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 

Recently uploaded (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

A semi-supervised method for efficient construction of statistical spoken language understanding resources

  • 1. A Semi-supervised Method for Efficient Construction of Statistical Spoken Language Understanding Resources Seokhwan Kim, Minwoo Jeong, and Gary Geunbae Lee Pohang University of Science and Technology (POSTECH), South Korea ABSTRACT EXTRACTING CONTEXT PATTERNS SCORING CANDIDATES We present a semi-supervised framework to construct spoken ● The score of the alignment between a raw utterance and a To overcome the context sparseness problem of spoken utterances, we make use of not sub-phrases of an utterance, but the full context pattern language understanding resources with very low cost. We utterance itself as a context pattern for extracting named entities in the utterance. First, we assume that each entry in the entity list generate context patterns with a few seed entities and a large is absolutely precise and uniquely belong to only a category whether it is a seed entity or an extended entity as an intermediate of amount of unlabeled utterances. Using these context patterns, the overall procedure. For each entry in the entity list, we find out utterances containing it, and make an utterance template by we extract new entities from the unlabeled utterances. The replacing the part of entity in the utterance with the defined entity label. In this replacing task, we exclude the entities which are where ref is a context pattern, tar is a raw utterance, n is the extracted entities are appended to the seed entities, and we can located at the beginning or end of the utterance, because context patterns containing the entities located in such positions can lead number of words in ref, m is the number of words in tar, t is the obtain the extended entity list by repeating these steps. Our number of aligned entity labels, and e is the number of words to confusion of determining the boundaries of each entity in the later procedure. extracted as entity candidates method is based on an utterance alignment algorithm which is a variant of the biological sequence alignment algorithm. Using ● The score of an entity candidate ej which is extracted by a this method, we can obtain precise entity lists with high context pattern refi coverage, which is of help to reduce the cost of building ALIGNMENT-BASED NAMED ENTITY RECOGNITION resources for statistical spoken language understanding systems. We firstly align a raw utterance with a context pattern containing entity labels. Then, from the result of the best alignment between them, we extract the parts of the raw utterance which are aligned to the entity labels in the context pattern as an entity ● The final score of an entity candidate ej MOTIVATION candidate belonging to the category of the corresponding entity label. Spoken Language Understanding (SLU) is a problem of extracting semantic meanings from recognized utterances and filling the correct values into a semantic frame structure. Most of the statistical SLU approaches require a sufficient amount of EXPERIMENTS training data which consist of labeled utterances with We evaluated our method on the CU-Communicator corpus, corresponding semantic concepts. The task of building the which consists of 13,983 utterances. We chose the three most training data for the SLU problem is one of the most important frequently occurring semantic categories in the corpus, CITY and high-cost required tasks for managing the spoken dialog MATRIX COMPUTATION TRACE BACK NAME, MONTH, and DAY NUMBER. we empirically set the entity selection threshold value to 0.3. systems. We concentrate on utilizing a semi-supervised information extraction method for building resources for The traceback step is started at the position with ● Result of automatic entity list extension statistical SLU modules in spoken dialog systems. maximum score from among the first column and the first row. Then, the next position of the position [i, j] is # of # of determined by following policies. # of Category extended total Precision Recall OVERALL PROCEDURE • If tar(i) and ref(j) are identical, then the next position seeds entities entities is [i + 1, j + 1]. 1. Prepare seed entity list E and unlabeled corpus C • Otherwise, the position with maximum score from CITY_NAME 20 123 209 65.04% 37.91% 2. Find utterances containing lexical of entities in E in the among [i + 1, j + 1] ~ [i + 1, n] and [i + 1, j + 1] ~ [m, j + corpus C, and replace the parts of matched entities in the 1] is the next position. MONTH 1 10 12 100% 83.33% found utterances with a label which indicates the location DAY_NUMBER 3 27 34 100% 79.41% of entities. Add partially labeled utterances to the context pattern set P. ● Result of corpus labeling experiment 3. Align each utterance in the corpus C with each context pattern in P, and extract new entity candidates in the utterance Category Precision Recall F-measure which is matched with the entity label in the context pattern. CITY_NAME 91.30 86.83 89.01 4. Compute the score of extracted entity candidates in step MONTH 98.98 87.24 92.74 3, and add only high-scored candidates to E. DAY_NUMBER 92.00 82.03 86.73 5. If there is no additional entities to E in step 4, terminate Overall 93.24 85.53 89.22 the process with entity list E, context pattern set P, and partially labeled corpus C as results. Otherwise, return to step 2 and repeat the process.