A Graph-based Cross-lingual Projection Approach for Spoken Language
Understanding Portability to a New Language
Seokhwan Kim
Human Language Technology Department, Institute for Infocomm Research, Singapore
Introduction
Statistical approaches to SLU require a sufficient number of
training examples to obtain good results
Cross-lingual SLU using SMT technologies can improve the
portability of SLU to a new language
Previous work on cross-lingual SLU has focused on filtering
out or correcting noisy translations as a post-processing step
We propose a graph-based projection approach to improve
robustness to translation errors in cross-lingual SLU
Cross-lingual SLU Using SMT
TrainOnTarget
[Figure: TrainOnTarget pipeline. Training: Dataset in Ls → SMT from Ls to Lt → Translated Dataset in Lt → SLU in Lt. Test: User Input in Lt → SLU in Lt.]
Annotations for a given word sequence x = {x1, · · · , xn}
NE: an NE tag sequence y = {y1, · · · , yn}
DA: a class variable z
Example
xs: Show me flights to New York on Nov 18th
ys: o o o o to.city-b to.city-i o month-b day-b
zs: show_flight
xt: [Korean translation of xs, 9 tokens; original characters garbled]
yt: to.city-b, month-b and day-b on the corresponding tokens, o on the other six
zt: show_flight
Direct Projection
The simplest way of projection
It propagates annotations using only the word alignments
It considers the translation of each utterance in isolation
It is performed in a single pass
The results of direct projection can be unreliable because of
erroneous translations and word alignments
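
A minimal sketch of direct projection, assuming word alignments are given as (source index, target index) pairs. The function name `direct_projection`, the choice of "o" as the default tag, and the alignment pairs in the example are illustrative assumptions, not details from the poster.

```python
from typing import List, Tuple


def direct_projection(
    source_tags: List[str],
    target_length: int,
    alignments: List[Tuple[int, int]],
) -> List[str]:
    """Copy each source NE tag onto the target token aligned with it.

    One pass over the alignment links of a single utterance; no other
    utterance in the dataset is consulted.
    """
    # Assumption: target tokens without an aligned NE tag default to "o".
    target_tags = ["o"] * target_length
    for src_idx, tgt_idx in sorted(alignments):
        tag = source_tags[src_idx]
        # Keep the first non-"o" tag that reaches a target token.
        if tag != "o" and target_tags[tgt_idx] == "o":
            target_tags[tgt_idx] = tag
    return target_tags


# Source tags for "Show me flights to New York on Nov 18th".
source_tags = ["o", "o", "o", "o", "to.city-b", "to.city-i", "o", "month-b", "day-b"]

# Hypothetical alignment links (source index, target index).
clean_links = [(4, 0), (5, 0), (7, 2), (8, 3)]
projected = direct_projection(source_tags, 9, clean_links)

# The same links with one alignment error: the city now points at token 1.
noisy_links = [(4, 1), (7, 2), (8, 3)]
projected_noisy = direct_projection(source_tags, 9, noisy_links)
```

With `noisy_links`, the city tag lands on the wrong target token and nothing in the single pass can correct it, which is the weakness described above.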
Graph-based Projection
Graph Construction for NE
Nodes
All trigrams in the dataset
Edges
Monolingual: $w(v_i, v_j) = \mathrm{sim}_{\mathrm{cosine}}(f(v_i), f(v_j)) = \frac{f(v_i) \cdot f(v_j)}{|f(v_i)|\,|f(v_j)|}$
Bilingual: $w(v_s^k, v_t^l) = \frac{\mathrm{count}(v_s^k, v_t^l)}{\sum_{v_t^m} \mathrm{count}(v_s^k, v_t^m)}$
Initial values
Based on the manual annotations of NE in Ls
[Figure: NE graph linking source-language trigram nodes vs and target-language trigram nodes vt]
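
The two edge weights of the NE graph can be sketched as follows. The poster does not say which features f(v) contains, so the feature dictionaries below are hypothetical, as are the helper names `trigrams`, `cosine`, and `bilingual_weights`. The bilingual weight is assumed to be the alignment count normalized over all target trigrams aligned with the same source trigram.

```python
import math
from collections import Counter, defaultdict


def trigrams(tokens):
    """Return every trigram of a token list as a tuple (the graph nodes)."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def cosine(f_i, f_j):
    """Monolingual edge weight: cosine similarity of two sparse feature dicts."""
    dot = sum(value * f_j.get(key, 0.0) for key, value in f_i.items())
    norm_i = math.sqrt(sum(v * v for v in f_i.values()))
    norm_j = math.sqrt(sum(v * v for v in f_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)


def bilingual_weights(aligned_pairs):
    """Bilingual edge weight: count(v_s, v_t) divided by the total count for v_s."""
    counts = Counter(aligned_pairs)
    totals = defaultdict(int)
    for (src, _tgt), c in counts.items():
        totals[src] += c
    return {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}


nodes = trigrams("show me flights to".split())

# Hypothetical feature vectors for two trigram nodes.
f_a = {"w0=to": 1.0, "w1=new": 1.0}
f_b = {"w0=to": 1.0, "w1=boston": 1.0}
mono_weight = cosine(f_a, f_b)  # one shared feature out of two: about 0.5

# Hypothetical aligned trigram pairs collected over a translated dataset.
pairs = [("s1", "t1"), ("s1", "t1"), ("s1", "t2"), ("s2", "t2")]
bi_weights = bilingual_weights(pairs)
```

For each source trigram, the bilingual weights over its aligned target trigrams sum to one, so a frequently observed alignment dominates occasional noisy ones.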
Graph Construction for DA
Nodes
Utterance nodes U = {u1, · · · , um}
Trigram nodes V
Edges
The edge between ui and vj has a binary weight indicating whether vj occurs in ui
Initial values
Based on the manual annotations of DA in Ls
[Figure: DA graph linking utterance nodes us and ut to trigram nodes vs and vt]
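
A small sketch of the utterance-to-trigram edges in the DA graph, assuming tokenized utterances. The function name `build_da_edges` and the example utterances are illustrative. Only edges of weight 1 are stored, and an absent edge stands for weight 0.

```python
def build_da_edges(utterances):
    """Link each utterance node u_i to every trigram node v_j it contains.

    Returns a dict mapping (utterance index, trigram) to the binary weight 1.
    """
    edges = {}
    for u_idx, tokens in enumerate(utterances):
        for i in range(len(tokens) - 2):
            edges[(u_idx, tuple(tokens[i:i + 3]))] = 1
    return edges


# Hypothetical tokenized utterances.
utterances = [
    "show me flights to boston".split(),
    "show me flights from denver".split(),
]
da_edges = build_da_edges(utterances)

# Trigrams shared by both utterances connect the two utterance nodes.
shared = {tri for (u, tri) in da_edges if u == 0} & {tri for (u, tri) in da_edges if u == 1}
```

Shared trigram nodes are what let a DA label travel from an annotated utterance to an unannotated one.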
Label Propagation
A graph-based semi-supervised learning algorithm
It induces labels for all of the unlabeled nodes on the graph
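
To illustrate the idea, here is a basic iterative label propagation with clamped seed nodes. This is a generic textbook variant written for illustration; it is not the algorithm implemented by any particular toolkit, and the function name, graph, and weights below are hypothetical.

```python
def label_propagation(edges, seeds, labels, iterations=50):
    """Propagate label distributions over an undirected weighted graph.

    edges:  list of (node_a, node_b, weight) tuples
    seeds:  dict mapping labeled nodes to their known label (kept fixed)
    labels: list of all possible labels
    """
    # Build a symmetric adjacency list.
    neighbors = {}
    for a, b, w in edges:
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    # Seeds start one-hot; unlabeled nodes start uniform.
    dist = {}
    for node in neighbors:
        if node in seeds:
            dist[node] = {l: 1.0 if l == seeds[node] else 0.0 for l in labels}
        else:
            dist[node] = {l: 1.0 / len(labels) for l in labels}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = dist[node]  # clamp the manual annotation
                continue
            total = sum(w for _, w in nbrs)
            # New distribution: weighted average of the neighbors' distributions.
            updated[node] = {
                l: sum(w * dist[n][l] for n, w in nbrs) / total for l in labels
            }
        dist = updated
    return dist


# Hypothetical graph: s1 and s2 are annotated source nodes, t1..t3 are target nodes.
edges = [("s1", "t1", 1.0), ("s2", "t2", 1.0), ("t1", "t3", 0.9), ("t2", "t3", 0.1)]
seeds = {"s1": "to.city-b", "s2": "o"}
result = label_propagation(edges, seeds, ["to.city-b", "o"])
induced = {n: max(d, key=d.get) for n, d in result.items() if n not in seeds}
```

Node `t3` has no bilingual edge of its own, yet it receives a label through its monolingual neighbors, which is the kind of evidence a per-utterance projection cannot use.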
Experimental Settings
Data
3,351 pairs of bi-utterances in English and Korean
Manually annotated with 30 DA classes and 30 NE classes
Toolkits
Moses and SRILM for SMT
Junto toolkit for Graph-based projection
Maximum Entropy for DA identification
Conditional Random Fields for NE recognition
Measures
5-fold cross validation against the manual annotations in Lt
Precision/recall/F-measure for NE recognition
Accuracy for DA identification
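
For reference, the measures can be computed as below. The poster does not specify the F-measure variant, so the balanced F1 is assumed here; the function names and the counts in the example are made up for illustration.

```python
def precision_recall_f(num_correct, num_predicted, num_gold):
    """Precision, recall, and balanced F-measure (F1, an assumption) from counts."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def accuracy(predicted, gold):
    """Fraction of utterances whose predicted DA class equals the gold class."""
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


# Hypothetical counts: 8 correct NEs among 10 predicted, 16 in the gold standard.
p, r, f = precision_recall_f(8, 10, 16)

# Hypothetical DA predictions for four utterances.
acc = accuracy(
    ["show_flight", "show_flight", "other", "other"],
    ["show_flight", "other", "other", "other"],
)
```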
Experimental results
NE
               Korean→English         English→Korean
               P      R      F        P      R      F
Supervised     97.6   95.4   96.4     97.1   96.9   97.0
TestOnSource   45.2   16.4   24.0     63.8   19.9   30.3
Direct         43.1   11.9   18.7     50.9   14.8   23.0
Graph-based    50.7   39.8   44.6     67.2   43.4   52.7
DA
Accuracy (%)
               Korean→English   English→Korean
Supervised     87.7             83.3
TestOnSource   58.9             70.2
Direct         56.5             69.6
Graph-based    63.5             74.3
Conclusion
This paper presented a graph-based projection approach for
cross-lingual SLU using SMT
Our approach runs a label propagation algorithm on a
graph built from the translations of the entire dataset
The feasibility of our approach was demonstrated on English
and Korean SLU models
Experimental results show that our graph-based projection
improved cross-lingual SLU performance compared with
previous approaches
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg WWW: http://hlt.i2r.a-star.edu.sg/
